AmvTek blog/blog/2014-08-01T00:00:00+03:00Extending coverage of the Python serializers benchmark2014-08-01T00:00:00+03:00AmvTek developerstag:,2014-08-01:blog/posts/2014/Aug/01/extending-coverage-of-the-python-serializers-benchmark/<p>Our <a class="reference external" href="/blog/posts/2014/Jul/12/comparing-python-performance-of-protobufthrift-serialization/">previous attempt</a> to compare the performance of <strong>Python</strong> implementations for
<strong>protocol buffers</strong> and <strong>thrift</strong> serializations has generated interesting
feedback and suggestions. The main request we received was to try to broaden the
coverage of the previous <a class="reference external" href="https://github.com/amvtek/PySerializers">benchmark</a> so as to cover the full range of available options…</p>
<p>We are not there yet, but the <a class="reference external" href="https://github.com/amvtek/PySerializers">benchmark</a> now allows comparing <strong>5</strong> different
frameworks :</p>
<blockquote>
<ul class="simple">
<li><a class="reference external" href="https://developers.google.com/protocol-buffers/">protocol buffers</a></li>
<li><a class="reference external" href="https://thrift.apache.org/">thrift</a></li>
<li><a class="reference external" href="http://kentonv.github.io/capnproto/index.html">capnp proto</a> (using the <a class="reference external" href="http://jparyani.github.io/pycapnp/">pycapnp</a> package)</li>
<li><a class="reference external" href="http://json.org/">json</a> (standard library package)</li>
<li><a class="reference external" href="http://msgpack.org/">msgpack</a></li>
</ul>
</blockquote>
<div class="section" id="comparing-apples-with-oranges">
<h2>Comparing apples with oranges</h2>
<p>Following our <a class="reference external" href="/blog/posts/2014/Jul/12/comparing-python-performance-of-protobufthrift-serialization/">previous post</a>, several people suggested that we have a look at
the <a class="reference external" href="http://kentonv.github.io/capnproto/index.html">capnp proto</a> serialization system, which is very similar in principle to
the thrift and protocol buffers systems we compared earlier. This similarity allowed us to
cover it with the previous benchmark in no time, and at first capnp proto
performance looked <strong>astonishingly good</strong>.</p>
<p>We refrained from publishing those results, however, as something did not look
correct. If we were to believe what was reported, capnp deserialization time
did not depend on the size or type of the messages being processed.
Assuming we had misunderstood how to use the pycapnp extension package we were
relying on, we contacted Jason Paryani, who supports it, to ask if he could
suggest anything.</p>
<p>Jason explained that our results did not surprise him: with capnp, <em>real
deserialization</em> takes place only when the inner content of a message is accessed.
Jason also observed that our previous approach to timing serialization was
probably favoring capnp unrealistically, as part of the serialization happens
when the message content is set.</p>
<p>In short, to allow a <em>fair comparison</em> between the different frameworks we
wanted to cover, and not be fooled by implementation choices made by library
developers, Jason advised us to revise our benchmarking approach, replacing :</p>
<blockquote>
<ul class="simple">
<li>serialize by construct <span class="amp">&</span> serialize</li>
<li>deserialize by deserialize <span class="amp">&</span> traverse</li>
</ul>
</blockquote>
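<p>The two replacement measurements can be sketched as follows. This is only an illustration written for Python 3 with the standard library json module; the <strong>FIELDS</strong> message and the function names are assumptions made for this sketch, not code taken from the benchmark repository.</p>

```python
import json
import timeit

# Hypothetical message used only for this sketch; the real benchmark
# relies on the messages defined in its own schema files.
FIELDS = {"id": 42, "name": "sample", "tags": ["a", "b", "c"],
          "position": {"x": 1.5, "y": -2.0}}


def construct_and_serialize():
    "Build the message from scratch, then encode it."
    message = {
        "id": FIELDS["id"],
        "name": FIELDS["name"],
        "tags": list(FIELDS["tags"]),
        "position": dict(FIELDS["position"]),
    }
    return json.dumps(message)


def traverse(value):
    "Visit every leaf value and return how many leaves were visited."
    if isinstance(value, dict):
        return sum(traverse(v) for v in value.values())
    if isinstance(value, list):
        return sum(traverse(v) for v in value)
    return 1


def deserialize_and_traverse(payload):
    "Decode the payload, then touch each of its fields."
    return traverse(json.loads(payload))


if __name__ == "__main__":
    payload = construct_and_serialize()
    n = 10000
    t_ser = timeit.timeit(construct_and_serialize, number=n)
    t_de = timeit.timeit(lambda: deserialize_and_traverse(payload), number=n)
    print("construct & serialize  : %.2f us/op" % (1e6 * t_ser / n))
    print("deserialize & traverse : %.2f us/op" % (1e6 * t_de / n))
```

<p>Timing construction together with encoding, and decoding together with a full traversal, means a library that defers work to attribute access is charged for that work as well.</p>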
<p>The new approach is probably not a very good one if one wants to establish the
<strong>absolute</strong> performance of a single framework. For example, the full traversal
requirement will unnecessarily harm the deserialization performance figure for
the json or msgpack libraries, which deliver fully deserialized dictionaries in one shot.</p>
<p>We believe, however, that by having each benchmark perform the same duties we make
meaningful comparisons possible. We invite any interested individual
to review the current <a class="reference external" href="https://github.com/amvtek/PySerializers">benchmark</a> and let us know what could be done to improve
the <strong>fairness</strong> of the comparisons we are trying to make.</p>
</div>
<div class="section" id="results-overview">
<h2>Results Overview</h2>
<p>You may run the benchmarks on your side and send us the results for publication
on the GitHub project. The machines we are relying on are low-end ones… :)</p>
<blockquote>
<ul class="simple">
<li><a class="reference external" href="https://github.com/amvtek/PySerializers/blob/master/results/linux_ubuntu-trusty_32b.rst">linux 32 bits results</a></li>
<li><a class="reference external" href="https://github.com/amvtek/PySerializers/blob/master/results/linux_ubuntu-trusty_64b.rst">linux 64 bits results</a></li>
</ul>
</blockquote>
<p>The two front-runners are <a class="reference external" href="https://thrift.apache.org/">thrift</a> and the new entrant <a class="reference external" href="http://msgpack.org/">msgpack</a>. There
is no point in trying to rank those <strong>2 winners</strong> against each other, as they are not playing in
the same category (schema versus schemaless systems…).</p>
</div>
Comparing Python performance of Protobuf/Thrift serialization…2014-07-12T00:00:00+03:00AmvTek developerstag:,2014-07-12:blog/posts/2014/Jul/12/comparing-python-performance-of-protobufthrift-serialization/<p>When two software processes need to exchange data, some sort of
protocol is necessary to define how to encode/decode the data to be
transported. A large number of serialization formats are available (json, xml,
<span class="caps">ASN</span>.1…) to tackle the <em>cross process/cross programming language</em> data
encoding problem, and Python provides a large number of libraries to leverage them…</p>
<p>What distinguishes the solution provided by <a class="reference external" href="https://developers.google.com/protocol-buffers/">protocol buffers</a> or <a class="reference external" href="https://thrift.apache.org/">thrift</a> is
the need to describe the data to be exchanged in a central schema file written
in an <strong>easy to read</strong> IDL. This schema file is then <em>compiled</em> to
provide a data representation for a given programming language. Not all
problems will benefit from this approach, but when developing services that
need to be accessed by clients written in different programming languages (e.g.
Objective C, Java…) we have found that relying on a well defined schema file
saves a lot of time.</p>
<div class="section" id="the-need-for-benchmarking">
<h2>The need for benchmarking</h2>
<p>We write a large part of our <strong>server side</strong> code using Python and some of the
projects we support are accessed mainly by <em>native</em> clients over <span class="caps">TCP</span> or <span class="caps">UDP</span>.
Over the years we moved gradually from our home grown custom serialization
solution built on top of the Python <a class="reference external" href="https://docs.python.org/2/library/struct.html">struct module</a> to protocol buffers…</p>
<p>The move to protocol buffers allowed us to cut the development time required to
support new types of client to a minimum. We also realized how valuable relying on a
central schema is, in that everybody can visualize what the data are.</p>
<p>One thing we miss from the previous home grown solution, however, is its
performance, and <strong>at the server</strong> performance matters tremendously, even
more so if you are using an asynchronous networking framework like <a class="reference external" href="https://twistedmatrix.com/trac/">twisted</a>.</p>
<p>For a long while we reassured ourselves by observing that a Google supported
extension module was available, and that deploying it would allow us to
accelerate serialization/deserialization by a factor of <strong>10</strong> at least. Deploying
this extension module was delayed until we reached the staging phase of development,
because it is quite cumbersome to do so. You need to build things from source
and manage some environment variables in your server processes to force the use
of the implementation it provides.</p>
<p>Once we activated the protobuf extension module on our staging server we
started to observe random crashes of the server processes. It took us time to
understand that those crashes were related to the use of this extension module.
Well, we should not have underestimated the fact that Google was labelling this
extension module as experimental, but here again we assumed that, Google playing
in a different category than the rest of us, they were probably referring to some
pretty advanced use cases :(</p>
<p>After all those hurdles, we realized that selecting the proper serialization
technology for your projects is a decision that should not be taken lightly.
Thrift provides an obvious alternative to Google protocol buffers, but how does
its Python implementation perform? There are extensive performance benchmarks
of <a class="reference external" href="https://github.com/eishay/jvm-serializers">java serialization</a> frameworks, but we found nothing similar for Python.</p>
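<p>As an illustration of the environment variable part, the protobuf Python package reads <strong>PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION</strong> when it is first imported. The sketch below is written for Python 3; the two helper names are hypothetical, and which values are accepted or actually available depends on the protobuf release and on how it was built, so treat it as an assumption to check against your installed version.</p>

```python
import os


def select_protobuf_backend(name="cpp"):
    """Request a protobuf Python backend through the environment.

    Must run before google.protobuf is first imported by the process.
    """
    os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"] = name
    return os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"]


def active_protobuf_backend():
    "Return the backend protobuf reports, or None if it is not installed."
    try:
        from google.protobuf.internal import api_implementation
    except ImportError:
        return None
    return api_implementation.Type()


if __name__ == "__main__":
    select_protobuf_backend("cpp")
    print("requested:", os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"])
    print("active   :", active_protobuf_backend())
```

<p>Comparing the requested and the active backend at startup is a cheap way to notice that a server process silently fell back to the pure Python implementation.</p>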
</div>
<div class="section" id="the-benchmark">
<h2>The benchmark</h2>
<p>We have published on GitHub what we consider to be a good basis for comparing the
various serialization frameworks one may want to use. The repository
for the project can be <a class="reference external" href="https://github.com/amvtek/PbThriftBenchmark">reached here</a>.</p>
<p>The benchmark for now compares the performance of protobuf and thrift
serialization for messages defined in the <a class="reference external" href="https://github.com/amvtek/PbThriftBenchmark/blob/master/idl/StuffToTest.proto">StuffToTest schema</a>. We welcome
suggestions to extend this reference schema so as to explore performance
variations in more detail, or external help to cover more
serialization frameworks…</p>
<div class="section" id="preliminary-results">
<h3>Preliminary results</h3>
<p>We have published on GitHub a <a class="reference external" href="https://github.com/amvtek/PbThriftBenchmark/blob/master/results.rst">result run</a> obtained on a low end development
machine. If we take performance to be the average of the serialization
and deserialization times for a given message of the schema, Thrift :</p>
<ul class="simple">
<li>outperforms protocol buffers in 75% of the cases.</li>
<li>is <strong>stable</strong>.</li>
<li>is much easier to deploy (pip install thrift and you are done…)</li>
</ul>
<p>So there is currently a <strong>clear winner</strong> to this benchmark. We will be happy to
rerun it to check whether things have changed.</p>
</div>
</div>
Making good use of random in your python unit tests2014-06-20T00:00:00+03:00AmvTek developerstag:,2014-06-20:blog/posts/2014/Jun/20/making-good-use-of-random-in-your-python-unit-tests/<p>Writing efficient <strong>UnitTest</strong> to validate that your code performs as expected
is a difficult endeavor. A lot has been written about the benefits of <strong>Test
Driven</strong> development and on how to best approach testing, and a lot can be learned
by reading the available literature. One thing, however, that we don’t often see
mentioned is that architecting efficient <strong>UnitTest</strong> is pretty <strong>hard</strong> and
that no tool or testing framework is of much value without a fair
understanding of the code base that needs to be tested.</p>
<p>The techniques we will be briefly introducing now are no different. You may use
them to impress your colleagues by showing them a <strong>TestSuite</strong> you have just
written that contains millions of tests. Be aware though that increasing the
TestSuite <strong>test count</strong> may not be sufficient to meaningfully change code base
<strong>coverage</strong>.</p>
<div class="section" id="basic-idea">
<h2>Basic idea</h2>
<p>Assume you wish to provide tests for a hypothetical <strong>func_to_test</strong> that
looks like so :</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">func_to_test</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">):</span>
    <span class="o">...</span>
    <span class="k">return</span> <span class="n">result</span>
</pre></div>
<p>To proceed with unit testing <strong>func_to_test</strong>, our first goal is to generate values
that optimally cover the expected domain.</p>
<p>Assume that x and y are <em>float numbers</em> varying between [xmin, xmax] and
[ymin, ymax].</p>
<p>You may generate a range of values for calling <strong>func_to_test</strong> like so :</p>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">math</span><span class="o">,</span> <span class="nn">itertools</span>
<span class="k">def</span> <span class="nf">float_range</span><span class="p">(</span><span class="n">vmin</span><span class="p">,</span> <span class="n">vmax</span><span class="p">,</span> <span class="n">n</span><span class="p">):</span>
    <span class="s2">"yield n regularly spaced values between vmin and vmax..."</span>
    <span class="n">s</span> <span class="o">=</span> <span class="nb">float</span><span class="p">(</span><span class="n">vmax</span><span class="o">-</span><span class="n">vmin</span><span class="p">)</span><span class="o">/</span><span class="n">n</span>
    <span class="n">v</span> <span class="o">=</span> <span class="n">vmin</span>
    <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">xrange</span><span class="p">(</span><span class="n">n</span><span class="p">):</span>
        <span class="k">yield</span> <span class="n">v</span>
        <span class="n">v</span> <span class="o">+=</span> <span class="n">s</span>
<span class="k">def</span> <span class="nf">gen_func_to_test_sample</span><span class="p">(</span><span class="n">m</span><span class="p">):</span>
    <span class="s2">"yield at least m tuples covering func_to_test domain..."</span>
    <span class="c1"># calculate optimal number of values along each axis</span>
    <span class="n">n</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">math</span><span class="o">.</span><span class="n">ceil</span><span class="p">(</span><span class="n">math</span><span class="o">.</span><span class="n">sqrt</span><span class="p">(</span><span class="n">m</span><span class="p">)))</span>
    <span class="c1"># define value range for x and y</span>
    <span class="n">rx</span> <span class="o">=</span> <span class="n">float_range</span><span class="p">(</span><span class="n">xmin</span><span class="p">,</span> <span class="n">xmax</span><span class="p">,</span> <span class="n">n</span><span class="p">)</span>
    <span class="n">ry</span> <span class="o">=</span> <span class="n">float_range</span><span class="p">(</span><span class="n">ymin</span><span class="p">,</span> <span class="n">ymax</span><span class="p">,</span> <span class="n">n</span><span class="p">)</span>
    <span class="c1"># yield regularly spaced tuples covering func_to_test domain</span>
    <span class="k">for</span> <span class="n">t</span> <span class="ow">in</span> <span class="n">itertools</span><span class="o">.</span><span class="n">product</span><span class="p">(</span><span class="n">rx</span><span class="p">,</span> <span class="n">ry</span><span class="p">):</span>
        <span class="k">yield</span> <span class="n">t</span>
</pre></div>
<p>In this simple case, it would be simpler to use 2 <strong>nested loops</strong> to generate
the values covering the <strong>func_to_test</strong> domain. However, if the number of axes of <strong>func_to_test</strong>
is large, itertools.product keeps things manageable.</p>
<p>The basic idea of <strong>randomization</strong> consists in covering the problem space with
randomly generated values. <strong>Randomization</strong> has <strong>2</strong> benefits over the previous
approach :</p>
<ul class="simple">
<li>The code to generate values over the problem domain is <strong>much simpler</strong>.</li>
<li>Test values being irregularly spaced, you will not be trapped by a singularity that a regular grid happens to step over.</li>
</ul>
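<p>To illustrate the point about a larger number of axes, here is a minimal sketch written for Python 3 (range instead of xrange). The <strong>gen_grid_sample</strong> name and the list of (vmin, vmax) domains it receives are assumptions made for this example.</p>

```python
import itertools
import math


def float_range(vmin, vmax, n):
    "Yield n regularly spaced values starting at vmin, stopping before vmax."
    step = float(vmax - vmin) / n
    for i in range(n):
        yield vmin + i * step


def gen_grid_sample(domains, m):
    "Return an iterator over at least m points regularly spaced over domains."
    # number of values per axis so that n ** len(domains) >= m
    n = int(math.ceil(m ** (1.0 / len(domains))))
    while n ** len(domains) < m:  # guard against floating point rounding
        n += 1
    axes = [float_range(vmin, vmax, n) for (vmin, vmax) in domains]
    return itertools.product(*axes)


if __name__ == "__main__":
    # three axes, at least 20 points requested
    for point in gen_grid_sample([(0.0, 8.0), (2.0, 6.0), (-1.0, 1.0)], 20):
        print(point)
```

<p>Adding an axis only means adding one (vmin, vmax) pair to the list, where nested loops would need one more level of nesting.</p>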
<p>To generate a random range of values for calling <strong>func_to_test</strong> you may proceed
like so :</p>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">random</span>
<span class="k">def</span> <span class="nf">gen_func_to_test_random_sample</span><span class="p">(</span><span class="n">m</span><span class="p">):</span>
    <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">xrange</span><span class="p">(</span><span class="n">m</span><span class="p">):</span>
        <span class="k">yield</span> <span class="n">random</span><span class="o">.</span><span class="n">uniform</span><span class="p">(</span><span class="n">xmin</span><span class="p">,</span><span class="n">xmax</span><span class="p">),</span> <span class="n">random</span><span class="o">.</span><span class="n">uniform</span><span class="p">(</span><span class="n">ymin</span><span class="p">,</span><span class="n">ymax</span><span class="p">)</span>
</pre></div>
<p>In case you are not familiar with the standard library <a class="reference external" href="https://docs.python.org/2/library/random.html">random module</a>, we invite you
to explore it as it has a lot of features to help generate objects covering
complex domains…</p>
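<p>As a small taste of those features, the sketch below (Python 3) combines a few random module functions to build objects that are richer than a pair of floats. The "order" structure and both function names are invented for this illustration.</p>

```python
import random
import string


def random_username(max_len=12):
    "Return a lowercase name of 1 to max_len characters."
    length = random.randint(1, max_len)  # both bounds are included
    return "".join(random.choice(string.ascii_lowercase) for _ in range(length))


def random_order():
    "Return a dict mixing several kinds of random values."
    return {
        "user": random_username(),
        "quantity": random.randint(1, 100),
        "priority": random.choice(["low", "normal", "high"]),
        "items": random.sample(range(1000), 3),  # 3 distinct item ids
        "discount": random.uniform(0.0, 0.5),
    }


if __name__ == "__main__":
    print(random_order())
```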
</div>
<div class="section" id="be-repeatable">
<h2>Be repeatable</h2>
<p>By now you should have understood the basic idea of <strong>tests randomization</strong>
pretty well. What we want is to cover the <strong>problem space</strong> in an efficient way,
minimizing the risks of being trapped by singularities…</p>
<p>There is one <strong>big problem</strong> though with the approach we take: a
<strong>tests suite</strong> must be <strong>repeatable</strong>. Imagine that one developer reports that
he has observed a failure of <em>test 100</em>. If <em>test 100</em> can never be rerun <em>as is</em>,
our randomized <strong>tests suite</strong> will generate more confusion than value.</p>
<p>Fortunately, the <em>Mersenne Twister</em> random generator exported by the standard
library random module can be initialized so that the same random sequences are
generated. Let’s modify our sample generator to make use of this :</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">random</span> <span class="kn">import</span> <span class="n">Random</span>
<span class="k">def</span> <span class="nf">gen_func_to_test_random_sample</span><span class="p">(</span><span class="n">seed</span><span class="p">,</span><span class="n">m</span><span class="p">):</span>
    <span class="n">random</span> <span class="o">=</span> <span class="n">Random</span><span class="p">((</span><span class="n">seed</span><span class="p">,</span><span class="n">m</span><span class="p">))</span>
    <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">xrange</span><span class="p">(</span><span class="n">m</span><span class="p">):</span>
        <span class="k">yield</span> <span class="n">random</span><span class="o">.</span><span class="n">uniform</span><span class="p">(</span><span class="n">xmin</span><span class="p">,</span><span class="n">xmax</span><span class="p">),</span> <span class="n">random</span><span class="o">.</span><span class="n">uniform</span><span class="p">(</span><span class="n">ymin</span><span class="p">,</span><span class="n">ymax</span><span class="p">)</span>
</pre></div>
<p>We use a dedicated instance of Random to avoid interfering with other threads
which may also need random values at the very moment we are
generating the test sequence.</p>
<p>Using the same seed value for each run of the tests suite guarantees that
the same sample sequence will be generated…</p>
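<p>The repeatability claim can be checked with a short sketch. It is written for Python 3, where recent releases only accept None, int, float, str, bytes or bytearray as a seed, so the (seed, m) pair is folded into a string here. The bounds and the <strong>nth_point</strong> helper are assumptions made for this example.</p>

```python
from itertools import islice
from random import Random

XMIN, XMAX = 0.0, 8.0  # example bounds, chosen for this sketch
YMIN, YMAX = 2.0, 6.0


def gen_random_sample(seed, m):
    "Yield m pseudo random points, always the same for a given (seed, m)."
    # seed and m are combined into a string, a seed type Python 3 accepts
    rng = Random("%s:%d" % (seed, m))
    for _ in range(m):
        yield rng.uniform(XMIN, XMAX), rng.uniform(YMIN, YMAX)


def nth_point(seed, m, index):
    "Regenerate the point that was used by test number index."
    return next(islice(gen_random_sample(seed, m), index, None))


if __name__ == "__main__":
    first_run = list(gen_random_sample("my test suite", 200))
    second_run = list(gen_random_sample("my test suite", 200))
    print("same sequence:", first_run == second_run)
    print("point of test 100:", nth_point("my test suite", 200, 100))
```

<p>With such a helper, the developer who saw <em>test 100</em> fail can regenerate exactly the point that triggered the failure.</p>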
</div>
<div class="section" id="testcase-factories">
<h2>TestCase factories</h2>
<p>As you have written tests before, by now you should be asking yourself how to use
this large sequence of (random) objects which you have been advised to generate.</p>
<p>The obvious approach would be to write a single test method that iterates over
the <em>sample sequence</em> and applies the desired assertions to <strong>func_to_test</strong> results. We
advise against doing so, as your test method would have
to apply a very large number of assertions and would exit at the
first encountered problem without checking the remaining points.</p>
<p>Instead you can use a <strong>factory function</strong> which will take care of generating
your TestCase like so :</p>
<div class="highlight"><pre><span></span><span class="s2">"Your test module"</span>
<span class="kn">import</span> <span class="nn">unittest</span>
<span class="kn">from</span> <span class="nn">random</span> <span class="kn">import</span> <span class="n">Random</span>
<span class="kn">from</span> <span class="nn">somewhere</span> <span class="kn">import</span> <span class="n">func_to_test</span>
<span class="n">XDOMAIN</span> <span class="o">=</span> <span class="p">(</span><span class="mf">0.0</span><span class="p">,</span><span class="mf">8.0</span><span class="p">)</span> <span class="c1"># example (xmin,xmax)</span>
<span class="n">YDOMAIN</span> <span class="o">=</span> <span class="p">(</span><span class="mf">2.0</span><span class="p">,</span><span class="mf">6.0</span><span class="p">)</span> <span class="c1"># example (ymin,ymax)</span>
<span class="k">def</span> <span class="nf">gen_func_to_test_random_sample</span><span class="p">(</span><span class="n">seed</span><span class="p">,</span><span class="n">m</span><span class="p">):</span>
    <span class="s2">"yield random point over func_to_test domain..."</span>
    <span class="n">random</span> <span class="o">=</span> <span class="n">Random</span><span class="p">((</span><span class="n">seed</span><span class="p">,</span><span class="n">m</span><span class="p">))</span>
    <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">xrange</span><span class="p">(</span><span class="n">m</span><span class="p">):</span>
        <span class="k">yield</span> <span class="n">random</span><span class="o">.</span><span class="n">uniform</span><span class="p">(</span><span class="o">*</span><span class="n">XDOMAIN</span><span class="p">),</span> <span class="n">random</span><span class="o">.</span><span class="n">uniform</span><span class="p">(</span><span class="o">*</span><span class="n">YDOMAIN</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">build_TestFuncTestCase</span><span class="p">(</span><span class="n">seed</span><span class="p">,</span><span class="n">m</span><span class="p">):</span>
    <span class="s2">"return TestCase class for func_to_test..."</span>
    <span class="c1"># test method factory</span>
    <span class="k">def</span> <span class="nf">make_test_method</span><span class="p">(</span><span class="n">test_point</span><span class="p">):</span>
        <span class="s2">"return func_to_test test..."</span>
        <span class="k">def</span> <span class="nf">a_test</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
            <span class="n">result</span> <span class="o">=</span> <span class="n">func_to_test</span><span class="p">(</span><span class="o">*</span><span class="n">test_point</span><span class="p">)</span>
            <span class="c1"># all your asserts here, see unittest.TestCase documentation...</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">assertSomethingOn</span><span class="p">(</span><span class="n">result</span><span class="p">)</span>
            <span class="o">...</span>
        <span class="k">return</span> <span class="n">a_test</span>
    <span class="c1"># fill TestCase dict</span>
    <span class="n">count</span> <span class="o">=</span> <span class="mi">0</span>
    <span class="n">dico</span> <span class="o">=</span> <span class="p">{}</span>
    <span class="k">for</span> <span class="n">pt</span> <span class="ow">in</span> <span class="n">gen_func_to_test_random_sample</span><span class="p">(</span><span class="n">seed</span><span class="p">,</span><span class="n">m</span><span class="p">):</span>
        <span class="n">testname</span> <span class="o">=</span> <span class="s2">"test_func_to_test_</span><span class="si">%i</span><span class="s2">"</span> <span class="o">%</span> <span class="n">count</span>
        <span class="n">dico</span><span class="p">[</span><span class="n">testname</span><span class="p">]</span> <span class="o">=</span> <span class="n">make_test_method</span><span class="p">(</span><span class="n">pt</span><span class="p">)</span>
        <span class="n">count</span> <span class="o">+=</span> <span class="mi">1</span>
    <span class="k">return</span> <span class="nb">type</span><span class="p">(</span><span class="s2">"TestFuncTestCase"</span><span class="p">,(</span><span class="n">unittest</span><span class="o">.</span><span class="n">TestCase</span><span class="p">,),</span><span class="n">dico</span><span class="p">)</span>
<span class="c1"># this TestCase class will be picked up by Test Runner</span>
<span class="c1"># it will contain 1024 tests...</span>
<span class="n">TestFuncTestCase</span> <span class="o">=</span> <span class="n">build_TestFuncTestCase</span><span class="p">(</span><span class="s2">"my test suite"</span><span class="p">,</span><span class="mi">1024</span><span class="p">)</span>
</pre></div>
<p>It is our experience that <strong>randomization</strong> <em>when applicable</em> provides an
efficient way forward to unit test your module. This approach can be summarized
like this :</p>
<ol class="arabic simple">
<li>Write code that generates a <strong>repeatable</strong> (pseudo random) sequence of objects
over your problem domain.</li>
<li>Use a factory function to generate TestCase subclasses with one test method
for each object in your test sequence.</li>
</ol>
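<p>The two steps above can be tried end to end with the following sketch, written for Python 3. Here <strong>func_to_test</strong> is a stand-in based on math.hypot, the assertions are properties that happen to hold for that stand-in, and <strong>run_single</strong> is a hypothetical helper showing how one generated test can be rerun by name.</p>

```python
import math
import unittest
from random import Random


def func_to_test(x, y):
    "Stand-in for the function under test (an assumption of this sketch)."
    return math.hypot(x, y)


def build_test_case(seed, m):
    "Return a TestCase class holding one test method per random point."
    rng = Random("%s:%d" % (seed, m))  # string seed, accepted by Python 3

    def make_test_method(x, y):
        def a_test(self):
            result = func_to_test(x, y)
            # properties of the stand-in: max(|x|,|y|) <= result <= |x|+|y|
            self.assertGreaterEqual(result, max(abs(x), abs(y)))
            self.assertLessEqual(result, abs(x) + abs(y))
        return a_test

    methods = {}
    for count in range(m):
        point = rng.uniform(0.0, 8.0), rng.uniform(2.0, 6.0)
        methods["test_func_to_test_%i" % count] = make_test_method(*point)
    return type("TestFuncTestCase", (unittest.TestCase,), methods)


def run_single(case_class, test_name):
    "Run one generated test by name and return the unittest result."
    suite = unittest.TestSuite([case_class(test_name)])
    return unittest.TextTestRunner(verbosity=0).run(suite)


# picked up by the test runner like any handwritten TestCase
TestFuncTestCase = build_test_case("my test suite", 256)

if __name__ == "__main__":
    outcome = run_single(TestFuncTestCase, "test_func_to_test_100")
    print("test 100 passed:", outcome.wasSuccessful())
```

<p>Because each point gets its own named method, a failure report such as <em>test_func_to_test_100</em> identifies exactly one input, and the other tests still run.</p>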
</div>
Accessing multiple postgres schemas from Django2014-06-13T00:00:00+03:00AmvTek developerstag:,2014-06-13:blog/posts/2014/Jun/13/accessing-multiple-postgres-schemas-from-django/<p>One of the postgres features we have been lacking the most when working with the
Django <span class="caps">ORM</span> is direct support for <strong>postgres schemas</strong>. In the past
we tried several roads to explicitly target schemas other than <strong>public</strong> when
creating or accessing the database structures required by our django
applications, but those <strong>from code</strong> approaches were difficult to maintain.</p>
<p>It appears that this problem can be solved quite elegantly by leveraging
postgres <strong>search_path</strong> parameter.</p>
<div class="section" id="a-simple-example">
<h2>A simple example</h2>
<p>Assume we wish all the tables of our django project to be created in a schema
called <strong>django</strong> and that our project also requires mapping/accessing a few
tables in a schema called <strong>legacy</strong>. This may be achieved very easily by fine
tuning the <strong><span class="caps">DATABASES</span></strong> setting.</p>
<p>Let’s see 2 different ways to configure this :</p>
<div class="section" id="approach-1-setting-search-path-at-connection-time">
<h3>Approach 1, setting search_path at connection time :</h3>
<p>We assume that the django and legacy schemas already exist in the target database
and that the user we use to access it has the necessary permissions on those schemas.</p>
<p>On the django side, we will use a <strong>search_path</strong> connection <strong>option</strong> so that we
land in the correct schema. Two database aliases will be configured even though we are
connecting to the same database.</p>
<div class="highlight"><pre><span></span><span class="c1"># your project settings file</span>
<span class="n">DATABASES</span> <span class="o">=</span> <span class="p">{</span>
    <span class="s1">'default'</span><span class="p">:</span> <span class="p">{</span>
        <span class="s1">'ENGINE'</span><span class="p">:</span> <span class="s1">'django.db.backends.postgresql_psycopg2'</span><span class="p">,</span>
        <span class="s1">'OPTIONS'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'options'</span><span class="p">:</span> <span class="s1">'-c search_path=django,public'</span>
        <span class="p">},</span>
        <span class="s1">'NAME'</span><span class="p">:</span> <span class="s1">'multi_schema_db'</span><span class="p">,</span>
        <span class="s1">'USER'</span><span class="p">:</span> <span class="s1">'appuser'</span><span class="p">,</span>
        <span class="s1">'PASSWORD'</span><span class="p">:</span> <span class="s1">'secret'</span><span class="p">,</span>
    <span class="p">},</span>
    <span class="s1">'legacy'</span><span class="p">:</span> <span class="p">{</span>
        <span class="s1">'ENGINE'</span><span class="p">:</span> <span class="s1">'django.db.backends.postgresql_psycopg2'</span><span class="p">,</span>
        <span class="s1">'OPTIONS'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'options'</span><span class="p">:</span> <span class="s1">'-c search_path=legacy,public'</span>
        <span class="p">},</span>
        <span class="s1">'NAME'</span><span class="p">:</span> <span class="s1">'multi_schema_db'</span><span class="p">,</span>
        <span class="s1">'USER'</span><span class="p">:</span> <span class="s1">'appuser'</span><span class="p">,</span>
        <span class="s1">'PASSWORD'</span><span class="p">:</span> <span class="s1">'secret'</span><span class="p">,</span>
    <span class="p">},</span>
<span class="p">}</span>
</pre></div>
<p>This is a good approach for development as it requires minimum configuration.</p>
<p>If you run <strong>syncdb</strong> against the default database, all tables for the <strong>managed</strong>
models will get created in the <strong>django</strong> schema…</p>
</div>
<div class="section" id="approach-2-configuring-various-databases-users">
<h3>Approach 2, configuring various databases users :</h3>
<p>One drawback of the first approach is that the <strong>set search_path</strong> command will
be sent from client to server each time a new database connection is established.</p>
<p>To save some milliseconds on connection time, one can <strong>preassign</strong> the desired
search_path to the user used for the connection…</p>
<div class="section" id="preassigning-search-path-to-database-user">
<h4>Preassigning search_path to database user :</h4>
<p>As <strong>postgres</strong> user in psql shell…</p>
<div class="highlight"><pre><span></span><span class="c1">-- user accessing django schema...</span>
<span class="k">CREATE</span> <span class="k">USER</span> <span class="n">django_user</span> <span class="n">LOGIN</span> <span class="n">PASSWORD</span> <span class="s1">'secret'</span><span class="p">;</span>
<span class="k">GRANT</span> <span class="n">appuser</span> <span class="k">TO</span> <span class="n">django_user</span><span class="p">;</span>
<span class="k">ALTER</span> <span class="k">ROLE</span> <span class="n">django_user</span> <span class="k">SET</span> <span class="n">search_path</span> <span class="k">TO</span> <span class="n">django</span><span class="p">,</span> <span class="k">public</span><span class="p">;</span>
<span class="c1">-- user accessing legacy schema...</span>
<span class="k">CREATE</span> <span class="k">USER</span> <span class="n">legacy_user</span> <span class="n">LOGIN</span> <span class="n">PASSWORD</span> <span class="s1">'secret'</span><span class="p">;</span>
<span class="k">GRANT</span> <span class="n">appuser</span> <span class="k">TO</span> <span class="n">legacy_user</span><span class="p">;</span>
<span class="k">ALTER</span> <span class="k">ROLE</span> <span class="n">legacy_user</span> <span class="k">SET</span> <span class="n">search_path</span> <span class="k">TO</span> <span class="n">legacy</span><span class="p">,</span> <span class="k">public</span><span class="p">;</span>
</pre></div>
</div>
<div class="section" id="defining-databases-setting">
<h4>Defining <span class="caps">DATABASES</span> setting :</h4>
<div class="highlight"><pre><span></span><span class="c1"># your production project settings file</span>
<span class="n">DATABASES</span> <span class="o">=</span> <span class="p">{</span>
    <span class="s1">'default'</span><span class="p">:</span> <span class="p">{</span>
        <span class="s1">'ENGINE'</span><span class="p">:</span> <span class="s1">'django.db.backends.postgresql_psycopg2'</span><span class="p">,</span>
        <span class="s1">'NAME'</span><span class="p">:</span> <span class="s1">'multi_schema_db'</span><span class="p">,</span>
        <span class="s1">'USER'</span><span class="p">:</span> <span class="s1">'django_user'</span><span class="p">,</span>
        <span class="s1">'PASSWORD'</span><span class="p">:</span> <span class="s1">'secret'</span><span class="p">,</span>
    <span class="p">},</span>
    <span class="s1">'legacy'</span><span class="p">:</span> <span class="p">{</span>
        <span class="s1">'ENGINE'</span><span class="p">:</span> <span class="s1">'django.db.backends.postgresql_psycopg2'</span><span class="p">,</span>
        <span class="s1">'NAME'</span><span class="p">:</span> <span class="s1">'multi_schema_db'</span><span class="p">,</span>
        <span class="s1">'USER'</span><span class="p">:</span> <span class="s1">'legacy_user'</span><span class="p">,</span>
        <span class="s1">'PASSWORD'</span><span class="p">:</span> <span class="s1">'secret'</span><span class="p">,</span>
    <span class="p">},</span>
<span class="p">}</span>
</pre></div>
<p>That’s all it takes to support multiple PostgreSQL schemas from Django.</p>
<p>A custom Django <strong>Database Router</strong> may also be defined to automatically select
the correct database alias, and therefore the correct schema, for each model…</p>
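<p>As an illustration of the router idea, here is a minimal sketch. It assumes that the models living in the legacy schema belong to an application labelled <strong>legacy_app</strong> (a hypothetical name), and it only implements the optional router methods needed to pick an alias. The class name and the module path used to register it are also assumptions.</p>

```python
class SchemaRouter(object):
    """Send models of selected apps to the 'legacy' alias, others to 'default'."""

    # Hypothetical app label: list here the apps whose tables live in the
    # legacy schema.
    legacy_apps = frozenset(["legacy_app"])

    def _alias_for(self, app_label):
        return "legacy" if app_label in self.legacy_apps else "default"

    def db_for_read(self, model, **hints):
        return self._alias_for(model._meta.app_label)

    def db_for_write(self, model, **hints):
        return self._alias_for(model._meta.app_label)

    def allow_relation(self, obj1, obj2, **hints):
        # Only relate objects that are served by the same alias.
        return (self._alias_for(obj1._meta.app_label)
                == self._alias_for(obj2._meta.app_label))
```

<p>Such a router would be enabled by listing its dotted path in the <strong>DATABASE_ROUTERS</strong> setting, for example <em>['myproject.routers.SchemaRouter']</em>, where the path is again hypothetical.</p>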
</div>
</div>
</div>
Improving EventSource browser support2014-05-20T00:00:00+03:00AmvTek developerstag:,2014-05-20:blog/posts/2014/May/20/improving-eventsource-browser-support/<p>We are happy to <a class="reference external" href="https://github.com/amvtek/EventSource">open source</a> a polyfill that we use extensively, which
lets you use EventSource in all the browsers that matter today.</p>
<p>In case you have not heard about it, <a class="reference external" href="http://www.w3.org/TR/eventsource/">EventSource</a> (aka Server-Sent Events) is a
JavaScript API, part of the HTML5 suite, which lets you efficiently and
asynchronously <strong>stream</strong> a large number of event messages across a single <span class="caps">HTTP</span> connection.</p>
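<p>To give an idea of what travels over that connection, the sketch below formats one message in the text-based event stream format: optional <em>event</em> and <em>id</em> fields, one <em>data</em> line per line of payload, and a blank line to end the message. The helper name <strong>format_sse</strong> is ours and is not part of the polyfill.</p>

```python
def format_sse(data, event=None, event_id=None):
    """Return one event-stream message as a text string."""
    lines = []
    if event is not None:
        lines.append("event: %s" % event)
    if event_id is not None:
        lines.append("id: %s" % event_id)
    # A multi-line payload is sent as several consecutive data fields.
    for chunk in str(data).split("\n"):
        lines.append("data: %s" % chunk)
    # An empty line terminates the message.
    return "\n".join(lines) + "\n\n"


message = format_sse("temperature=21.5", event="sensor", event_id=42)
```

<p>A server would write such messages, one after the other, on a response served with the <em>text/event-stream</em> content type.</p>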
<p>We started to consider using EventSource in the context of a large scale realtime
field monitoring system, where users access web pages that let them
visualize the evolution of data coming from a large number of sensors. With
EventSource we can very cleanly keep <strong>web browsers</strong> updated in real time with
events distributed by a <strong>publish/subscribe</strong> system like the one provided by
Redis or RabbitMQ.</p>
<p>As we started experimenting with EventSource, we realized that it could not
currently be used in Internet Explorer 8, 9, 10, 11 and in most Android browsers.
See this <a class="reference external" href="http://caniuse.com/#feat=eventsource">report</a> for details.</p>
<p>We tested various polyfills that aim at widening the support of EventSource and,
after observing that they did not support some of the browsers we had to
target, we decided to build our own.</p>
<p>This <a class="reference external" href="https://github.com/amvtek/EventSource">project</a> is now available on GitHub and we hope it will help raise
awareness about this technology.</p>
Making use of twisted coiterate2014-05-12T00:00:00+03:00AmvTek developerstag:,2014-05-12:blog/posts/2014/May/12/making-use-of-twisted-coiterate/<p>Twisted provides various ways to integrate <strong><span class="caps">CPU</span> bound</strong> operations or
<strong>blocking libraries</strong> with the reactor. It provides a very clean integration path
for <a class="reference external" href="http://twistedmatrix.com/documents/current/core/howto/threading.html">threading</a> or <a class="reference external" href="http://twistedmatrix.com/documents/current/core/howto/process.html">external processes</a>.</p>
<p>In this post, we describe an <em>under-documented</em> alternative, where the long
running task is implemented using an iterator that will be consumed
<strong>directly</strong> in the reactor event loop after passing it to <a class="reference external" href="http://twistedmatrix.com/documents/current/api/twisted.internet.task.html#coiterate">coiterate</a>.</p>
<p>Where usable, coiteration lets you avoid threading altogether, and with it the
contention around the well-known Python <span class="caps">GIL</span>…</p>
<div class="section" id="basic-idea">
<h2>Basic idea</h2>
<p><strong>Coiteration</strong> requires developers to use a <strong>divide and conquer</strong> strategy to
plan their task execution. In Python, we code the task using a generator
function or, alternatively, a class implementing the iterator protocol.</p>
<p>The task will be executed <em>step by step</em> in the reactor event loop after passing
the iterator that represents it to <strong>coiterate</strong>.</p>
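<p>To make the <em>step by step</em> idea concrete without involving Twisted, here is a toy round-robin loop that advances several generators one step at a time. It is only a stand-in for what a cooperative scheduler does, not Twisted's implementation, and the names <strong>run_round_robin</strong> and <strong>count_to</strong> are ours.</p>

```python
from collections import deque


def run_round_robin(tasks):
    """Advance each iterator by one step in turn until all are exhausted."""
    pending = deque(tasks)
    while pending:
        task = pending.popleft()
        try:
            next(task)  # run the task up to its next yield
        except StopIteration:
            continue  # task is finished, do not reschedule it
        pending.append(task)


def count_to(name, n, log):
    """A task split in n small steps, each one recorded in log."""
    for i in range(n):
        log.append((name, i))
        yield None  # give the other tasks a chance to run


log = []
run_round_robin([count_to("a", 2, log), count_to("b", 3, log)])
# log now interleaves the steps of both tasks:
# [('a', 0), ('b', 0), ('a', 1), ('b', 1), ('b', 2)]
```

<p>Each <em>yield</em> marks a point where the task agrees to be suspended, which is exactly the contract a task iterator has to respect.</p>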
</div>
<div class="section" id="summing-integers">
<h2>Summing integers</h2>
<p>Let’s consider a Python function which sums the first N integers, N being
arbitrarily large.</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">sum_all_integers_until</span><span class="p">(</span><span class="n">N</span><span class="p">):</span>
    <span class="n">s</span> <span class="o">=</span> <span class="mi">0</span>
    <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">xrange</span><span class="p">(</span><span class="n">N</span><span class="p">):</span>
        <span class="n">s</span> <span class="o">+=</span> <span class="n">i</span>
    <span class="k">return</span> <span class="n">s</span>
</pre></div>
<p>For <em>very</em> large values of N, calling such a function from the thread in which
the <strong>reactor</strong> is running is not a good idea, as the event loop
will be blocked until the function returns…</p>
<div class="section" id="summing-using-coiteration">
<h3>Summing using coiteration</h3>
<div class="section" id="a-first-approach">
<h4>A first approach</h4>
<table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre> 1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20</pre></div></td><td class="code"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">twisted.internet</span> <span class="kn">import</span> <span class="n">reactor</span>
<span class="kn">from</span> <span class="nn">twisted.internet.task</span> <span class="kn">import</span> <span class="n">coiterate</span>

<span class="k">def</span> <span class="nf">make_iterator_to_sum_all_integers_until</span><span class="p">(</span><span class="n">N</span><span class="p">):</span>

    <span class="n">s</span> <span class="o">=</span> <span class="mi">0</span>
    <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">xrange</span><span class="p">(</span><span class="n">N</span><span class="p">):</span>
        <span class="n">s</span> <span class="o">+=</span> <span class="n">i</span>
        <span class="k">print</span> <span class="s2">"Adding </span><span class="si">%i</span><span class="s2"> to result"</span> <span class="o">%</span> <span class="n">i</span>
        <span class="k">yield</span> <span class="bp">None</span> <span class="c1"># event loop can look after other things...</span>

    <span class="k">print</span> <span class="s2">"result is </span><span class="si">%i</span><span class="s2"> "</span> <span class="o">%</span> <span class="n">s</span>

<span class="k">def</span> <span class="nf">sum_all_integers_being_nice_to_reactor</span><span class="p">(</span><span class="n">N</span><span class="p">):</span>

    <span class="n">all_sum_steps</span> <span class="o">=</span> <span class="n">make_iterator_to_sum_all_integers_until</span><span class="p">(</span><span class="n">N</span><span class="p">)</span>
    <span class="n">coiterate</span><span class="p">(</span><span class="n">all_sum_steps</span><span class="p">)</span>

<span class="n">reactor</span><span class="o">.</span><span class="n">callLater</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">sum_all_integers_being_nice_to_reactor</span><span class="p">,</span> <span class="mi">8</span><span class="p">)</span>
<span class="n">reactor</span><span class="o">.</span><span class="n">run</span><span class="p">()</span>
</pre></div>
</pre></div>
</td></tr></table><p>See gist <a class="reference external" href="https://gist.github.com/amvtek/0aa479fbfb6f83621b40#file-coiterate_01-py">coiterate_01.py</a></p>
<p>At <strong>line 4</strong> a generator function is defined, which returns an iterator that
calculates the sum of all integers below a given value N. At <strong>line 17</strong>, this
iterator is passed to <strong>coiterate</strong>, which consumes it step by step in the
reactor event loop, interleaved with the other work the reactor has to do.</p>
<p>Note that this does not magically make your iterator non-blocking. As everywhere
else in Twisted, the developer must ensure that each iteration step returns quickly.</p>
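<p>Keeping each step short does not mean that each step has to be tiny. As a sketch of one possible trade-off, the generator below sums a whole chunk of integers between two yields, so that the per-step overhead is spread over more work. The <strong>chunk_size</strong> parameter, its default value and the helper <strong>chunk_bounds</strong> are assumptions made for this illustration, and the code is written so that it runs on Python 2 and Python 3.</p>

```python
def chunk_bounds(N, chunk_size):
    """Yield (start, stop) pairs that cover range(N) in consecutive chunks."""
    for start in range(0, N, chunk_size):
        yield start, min(start + chunk_size, N)


def make_iterator_to_sum_in_chunks(N, chunk_size=10000):
    """Sum the integers below N, yielding once per chunk instead of per integer."""
    s = 0
    for start, stop in chunk_bounds(N, chunk_size):
        s += sum(range(start, stop))  # bounded amount of work per step
        yield None  # event loop can look after other things...
    print("result is %i" % s)
```

<p>The right chunk size depends on how long one step may reasonably hold the event loop, which has to be measured for the task at hand.</p>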
</div>
<div class="section" id="obtaining-the-result">
<h4>Obtaining the result</h4>
<p>If you took the time to run the above <em>sum_all_integer…</em>, you have probably
been delighted to see the result being printed in the console.</p>
<p>Retrieving this result to make use of it requires some additional effort, which
we detail now.</p>
<p>Let’s first observe that <strong>coiterate</strong> is a well behaved Twisted citizen. As it
is starting an operation <em>(consumption of the argument iterator…)</em> that will
take some time to complete, it returns a <strong>Deferred</strong>. As you may expect, this
Deferred will fire when iteration is over.</p>
<p>If we attach a <strong>callback</strong> function to this Deferred we will not receive our
result, but the same iterator that coiterate has consumed. Let’s see a possible
solution to obtain a result from the iterator.</p>
<table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre> 1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33</pre></div></td><td class="code"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">twisted.internet</span> <span class="kn">import</span> <span class="n">reactor</span>
<span class="kn">from</span> <span class="nn">twisted.internet.task</span> <span class="kn">import</span> <span class="n">coiterate</span>

<span class="k">def</span> <span class="nf">make_iterator_to_sum_all_integers_until</span><span class="p">(</span><span class="n">N</span><span class="p">,</span> <span class="n">context</span><span class="p">):</span>

    <span class="n">s</span> <span class="o">=</span> <span class="mi">0</span>
    <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">xrange</span><span class="p">(</span><span class="n">N</span><span class="p">):</span>
        <span class="n">s</span> <span class="o">+=</span> <span class="n">i</span>
        <span class="k">print</span> <span class="s2">"Adding </span><span class="si">%i</span><span class="s2"> to result"</span> <span class="o">%</span> <span class="n">i</span>
        <span class="k">yield</span> <span class="bp">None</span> <span class="c1"># event loop can look after other things...</span>

    <span class="n">context</span><span class="p">[</span><span class="s1">'result'</span><span class="p">]</span> <span class="o">=</span> <span class="n">s</span>

<span class="k">def</span> <span class="nf">sum_all_integers_being_nice_to_reactor</span><span class="p">(</span><span class="n">N</span><span class="p">):</span>
    <span class="s2">"return Deferred firing calculated sum..."</span>

    <span class="k">def</span> <span class="nf">extract_result_cb</span><span class="p">(</span><span class="n">ignored</span><span class="p">,</span> <span class="n">context</span><span class="p">):</span>
        <span class="s2">"return context['result']"</span>

        <span class="n">rv</span> <span class="o">=</span> <span class="n">context</span><span class="p">[</span><span class="s1">'result'</span><span class="p">]</span>
        <span class="k">print</span> <span class="s2">"Got result = </span><span class="si">%s</span><span class="s2">"</span> <span class="o">%</span> <span class="n">rv</span>
        <span class="k">return</span> <span class="n">rv</span>

    <span class="n">context</span> <span class="o">=</span> <span class="p">{}</span>
    <span class="n">all_sum_steps</span> <span class="o">=</span> <span class="n">make_iterator_to_sum_all_integers_until</span><span class="p">(</span><span class="n">N</span><span class="p">,</span> <span class="n">context</span><span class="p">)</span>

    <span class="n">deferred</span> <span class="o">=</span> <span class="n">coiterate</span><span class="p">(</span><span class="n">all_sum_steps</span><span class="p">)</span>
    <span class="n">deferred</span><span class="o">.</span><span class="n">addCallback</span><span class="p">(</span><span class="n">extract_result_cb</span><span class="p">,</span> <span class="n">context</span><span class="p">)</span>

    <span class="k">return</span> <span class="n">deferred</span>

<span class="n">reactor</span><span class="o">.</span><span class="n">callLater</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">sum_all_integers_being_nice_to_reactor</span><span class="p">,</span> <span class="mi">8</span><span class="p">)</span>
<span class="n">reactor</span><span class="o">.</span><span class="n">run</span><span class="p">()</span>
</pre></div>
</pre></div>
</td></tr></table><p>See gist <a class="reference external" href="https://gist.github.com/amvtek/0aa479fbfb6f83621b40#file-coiterate_02-py">coiterate_02.py</a></p>
<p>Let’s summarize how this code proceeds :</p>
<p>At <strong>line 4</strong>, we define a <strong>generator function</strong> which returns an iterator that
lets us execute our task <em>step by step</em>. As we also need a result from this
iterator, we pass it an additional <strong>context object</strong> which provides a way to
“return” any result obtained during iteration.</p>
<p>At <strong>line 14</strong>, we construct a well-behaved Python function that returns a
Deferred that will fire with the result we are awaiting. Internally this
function takes care of all the gory details of constructing the iterator that
will be passed to coiterate and extracting the result we need.</p>
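<p>The context object is not the only way to carry a result out of the iteration. Since the task may also be written as a class implementing the iterator protocol, the result can simply live on the instance. The sketch below shows this variant in plain Python. The class name <strong>SumAllIntegersUntil</strong> and its attributes are ours.</p>

```python
class SumAllIntegersUntil(object):
    """Iterator adding one integer per step; the running sum is kept in .result"""

    def __init__(self, N):
        self._remaining = iter(range(N))
        self.result = 0
        self.done = False

    def __iter__(self):
        return self

    def __next__(self):
        try:
            i = next(self._remaining)
        except StopIteration:
            self.done = True  # the result is now final
            raise
        self.result += i
        return None  # nothing to wait for, the next step may run at once

    next = __next__  # Python 2 spelling of the iterator protocol


task = SumAllIntegersUntil(8)
for _ in task:
    pass  # a scheduler would do other work between two steps
```

<p>Because the Deferred returned by coiterate fires with the iterator it has consumed, a callback such as <em>lambda it: it.result</em> attached to that Deferred would then be enough to extract the sum.</p>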
</div>
<div class="section" id="waiting-for-deferred">
<h4>Waiting for Deferred…</h4>
<p>While executing a <em>long-running task</em>, it is quite common to have to wait
until some external operation completes. Twisted lets our <em>coiterable</em>
tasks indicate that they shall be paused until a certain <strong>Deferred</strong> fires. To
do so, the only thing required is to <strong>yield</strong> the Deferred of interest out of
the task iterator.</p>
<p>Let’s see how we could have our <em>sum_all_integer…</em> wait <strong>1 second</strong> in
between each iteration step.</p>
<table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre> 1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26</pre></div></td><td class="code"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">twisted.internet</span> <span class="kn">import</span> <span class="n">reactor</span>
<span class="kn">from</span> <span class="nn">twisted.internet.task</span> <span class="kn">import</span> <span class="n">coiterate</span><span class="p">,</span> <span class="n">deferLater</span>

<span class="k">def</span> <span class="nf">make_iterator_to_sum_all_integers_until</span><span class="p">(</span><span class="n">N</span><span class="p">,</span> <span class="n">context</span><span class="p">):</span>

    <span class="k">def</span> <span class="nf">wait_some_time</span><span class="p">(</span><span class="n">t</span><span class="p">):</span>
        <span class="s2">"return Deferred firing after t seconds"</span>

        <span class="k">return</span> <span class="n">deferLater</span><span class="p">(</span><span class="n">reactor</span><span class="p">,</span><span class="n">t</span><span class="p">,</span><span class="k">lambda</span> <span class="p">:</span><span class="s2">"I was paused </span><span class="si">%.02f</span><span class="s2"> seconds"</span><span class="o">%</span><span class="n">t</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">print_pause_cb</span><span class="p">(</span><span class="n">msg</span><span class="p">):</span>
        <span class="s2">"callback printing result message..."</span>

        <span class="k">print</span> <span class="n">msg</span>

    <span class="n">s</span> <span class="o">=</span> <span class="mi">0</span>
    <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">xrange</span><span class="p">(</span><span class="n">N</span><span class="p">):</span>
        <span class="n">s</span> <span class="o">+=</span> <span class="n">i</span>
        <span class="k">print</span> <span class="s2">"Adding </span><span class="si">%i</span><span class="s2"> to result"</span> <span class="o">%</span> <span class="n">i</span>

        <span class="n">d</span> <span class="o">=</span> <span class="n">wait_some_time</span><span class="p">(</span><span class="mf">1.0</span><span class="p">)</span>
        <span class="n">d</span><span class="o">.</span><span class="n">addCallback</span><span class="p">(</span><span class="n">print_pause_cb</span><span class="p">)</span>

        <span class="k">yield</span> <span class="n">d</span> <span class="c1"># we will be paused until d fires...</span>

    <span class="n">context</span><span class="p">[</span><span class="s1">'result'</span><span class="p">]</span> <span class="o">=</span> <span class="n">s</span>
</pre></div>
</td></tr></table><p>See gist <a class="reference external" href="https://gist.github.com/amvtek/0aa479fbfb6f83621b40#file-coiterate_03-py">coiterate_03.py</a></p>
<p>At <strong>line 4</strong> is the modified <strong>generator function</strong> that will pause some time
in between each step. The wait_some_time function at <strong>line 6</strong> could be
anything that returns a Deferred.</p>
<p>It is our experience that the <strong>yield to wait</strong> approach which coiterate allows
greatly simplifies coding <em>complex tasks</em> with Twisted.</p>
</div>
<div class="section" id="cancelling-coiteration">
<h4>Cancelling coiteration</h4>
<p>When clients have to wait a long time for the result of a long-running
operation, we shall expect situations where the client gives up. In such
situations, we normally want to clean up as soon as possible any resources
allocated to service this client.</p>
<p>Before showing how this can be achieved in the context of this example, let’s
mention that if you need to control your task from the outside to pause it or
stop it, you should consider using <a class="reference external" href="http://twistedmatrix.com/documents/current/api/twisted.internet.task.html#cooperate">cooperate</a> instead of coiterate. Like
coiterate, cooperate shall be called with an iterator which will be consumed in
the reactor <strong>event loop</strong>. Unlike coiterate, which returns a Deferred that fires
when iteration is completed, cooperate returns a CooperativeTask object that can be used to
<strong>pause</strong> or <strong>stop</strong> the ongoing task…</p>
<table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre> 1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84</pre></div></td><td class="code"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">twisted.internet</span> <span class="kn">import</span> <span class="n">reactor</span>
<span class="kn">from</span> <span class="nn">twisted.internet.defer</span> <span class="kn">import</span> <span class="n">Deferred</span><span class="p">,</span> <span class="n">CancelledError</span>
<span class="kn">from</span> <span class="nn">twisted.internet.task</span> <span class="kn">import</span> <span class="n">coiterate</span><span class="p">,</span> <span class="n">deferLater</span>

<span class="k">def</span> <span class="nf">make_iterator_to_sum_all_integers_until</span><span class="p">(</span><span class="n">N</span><span class="p">,</span> <span class="n">context</span><span class="p">):</span>

    <span class="k">def</span> <span class="nf">wait_some_time</span><span class="p">(</span><span class="n">t</span><span class="p">):</span>
        <span class="s2">"return Deferred firing after t seconds"</span>

        <span class="k">return</span> <span class="n">deferLater</span><span class="p">(</span><span class="n">reactor</span><span class="p">,</span><span class="n">t</span><span class="p">,</span><span class="k">lambda</span> <span class="p">:</span><span class="s2">"I was paused </span><span class="si">%.02f</span><span class="s2"> seconds"</span><span class="o">%</span><span class="n">t</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">print_pause_cb</span><span class="p">(</span><span class="n">msg</span><span class="p">):</span>
        <span class="s2">"callback printing result message..."</span>

        <span class="k">print</span> <span class="n">msg</span>

    <span class="n">d</span> <span class="o">=</span> <span class="bp">None</span>
    <span class="n">s</span> <span class="o">=</span> <span class="mi">0</span>

    <span class="k">try</span><span class="p">:</span>
        <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">xrange</span><span class="p">(</span><span class="n">N</span><span class="p">):</span>
            <span class="n">s</span> <span class="o">+=</span> <span class="n">i</span>
            <span class="k">print</span> <span class="s2">"Adding </span><span class="si">%i</span><span class="s2"> to result"</span> <span class="o">%</span> <span class="n">i</span>

            <span class="n">d</span> <span class="o">=</span> <span class="n">wait_some_time</span><span class="p">(</span><span class="mf">1.0</span><span class="p">)</span>
            <span class="n">d</span><span class="o">.</span><span class="n">addCallback</span><span class="p">(</span><span class="n">print_pause_cb</span><span class="p">)</span>

            <span class="k">yield</span> <span class="n">d</span> <span class="c1"># we will be paused until d fires...</span>

        <span class="n">context</span><span class="p">[</span><span class="s1">'result'</span><span class="p">]</span> <span class="o">=</span> <span class="n">s</span>

    <span class="k">except</span> <span class="ne">GeneratorExit</span><span class="p">:</span>

        <span class="k">print</span> <span class="s2">"---"</span>
        <span class="k">print</span> <span class="s2">"Early termination..."</span>

        <span class="c1"># cancel pending Deferred</span>
        <span class="k">if</span> <span class="n">d</span> <span class="ow">and</span> <span class="ow">not</span> <span class="n">d</span><span class="o">.</span><span class="n">called</span><span class="p">:</span>
            <span class="n">d</span><span class="o">.</span><span class="n">cancel</span><span class="p">()</span>


<span class="k">def</span> <span class="nf">sum_all_integers_being_nice_to_reactor</span><span class="p">(</span><span class="n">N</span><span class="p">):</span>
    <span class="s2">"return Deferred firing calculated sum..."</span>

    <span class="k">def</span> <span class="nf">extract_result_cb</span><span class="p">(</span><span class="n">ignored</span><span class="p">,</span> <span class="n">context</span><span class="p">):</span>
        <span class="s2">"return context['result']"</span>

        <span class="n">rv</span> <span class="o">=</span> <span class="n">context</span><span class="p">[</span><span class="s1">'result'</span><span class="p">]</span>
        <span class="k">return</span> <span class="n">rv</span>

    <span class="k">def</span> <span class="nf">suppress_cancel_log_eb</span><span class="p">(</span><span class="n">error</span><span class="p">):</span>
        <span class="s2">"trap CancelledError"</span>

        <span class="c1"># this suppresses the UnhandledError warning...</span>
        <span class="n">error</span><span class="o">.</span><span class="n">trap</span><span class="p">(</span><span class="n">CancelledError</span><span class="p">)</span>

    <span class="n">context</span> <span class="o">=</span> <span class="p">{}</span>
    <span class="n">all_sum_steps</span> <span class="o">=</span> <span class="n">make_iterator_to_sum_all_integers_until</span><span class="p">(</span><span class="n">N</span><span class="p">,</span> <span class="n">context</span><span class="p">)</span>

    <span class="n">deferred</span> <span class="o">=</span> <span class="n">Deferred</span><span class="p">(</span><span class="k">lambda</span> <span class="n">_</span><span class="p">:</span><span class="n">all_sum_steps</span><span class="o">.</span><span class="n">close</span><span class="p">())</span>
    <span class="n">coiterate</span><span class="p">(</span><span class="n">all_sum_steps</span><span class="p">)</span><span class="o">.</span><span class="n">chainDeferred</span><span class="p">(</span><span class="n">deferred</span><span class="p">)</span>
    <span class="n">deferred</span><span class="o">.</span><span class="n">addCallback</span><span class="p">(</span><span class="n">extract_result_cb</span><span class="p">,</span> <span class="n">context</span><span class="p">)</span>
    <span class="n">deferred</span><span class="o">.</span><span class="n">addErrback</span><span class="p">(</span><span class="n">suppress_cancel_log_eb</span><span class="p">)</span>

    <span class="k">return</span> <span class="n">deferred</span>

<span class="k">def</span> <span class="nf">main</span><span class="p">():</span>
    <span class="s2">"start summing integers and stop after 3 seconds..."</span>

    <span class="k">def</span> <span class="nf">print_result_cb</span><span class="p">(</span><span class="n">res</span><span class="p">):</span>
        <span class="s2">"print result if any..."</span>

        <span class="k">if</span> <span class="n">res</span> <span class="ow">is</span> <span class="ow">not</span> <span class="bp">None</span><span class="p">:</span>
            <span class="k">print</span> <span class="s2">"Got result = </span><span class="si">%s</span><span class="s2">"</span> <span class="o">%</span> <span class="n">res</span>

    <span class="c1"># start sum calculation using coiteration...</span>
    <span class="n">d</span> <span class="o">=</span> <span class="n">sum_all_integers_being_nice_to_reactor</span><span class="p">(</span><span class="mi">8</span><span class="p">)</span>
    <span class="n">d</span><span class="o">.</span><span class="n">addCallback</span><span class="p">(</span><span class="n">print_result_cb</span><span class="p">)</span>

    <span class="c1"># schedule cancellation after 3.00 seconds</span>
    <span class="n">reactor</span><span class="o">.</span><span class="n">callLater</span><span class="p">(</span><span class="mf">3.0</span><span class="p">,</span> <span class="n">d</span><span class="o">.</span><span class="n">cancel</span><span class="p">)</span>

<span class="n">reactor</span><span class="o">.</span><span class="n">callLater</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">main</span><span class="p">)</span>
<span class="n">reactor</span><span class="o">.</span><span class="n">run</span><span class="p">()</span>
</pre></div>
</pre></div>
</td></tr></table><p>See gist <a class="reference external" href="https://gist.github.com/amvtek/0aa479fbfb6f83621b40#file-coiterate_04-py">coiterate_04.py</a></p>
<p>Our generator function was modified again. At <strong>line 32</strong>, a
handler block for the <strong>GeneratorExit</strong> exception was added. This block is
reached when <strong>close</strong> is called on an iterator object returned by our
generator function. In this block, we clean up any pending Deferred that
the task may be waiting for.</p>
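<p>The behaviour this relies on is plain Python and can be observed without Twisted. In the sketch below, calling <strong>close</strong> on a suspended generator raises <strong>GeneratorExit</strong> at the <em>yield</em> where it is paused, which gives the handler a chance to run. The names <strong>task_with_cleanup</strong> and <strong>log</strong> are ours.</p>

```python
def task_with_cleanup(log):
    """Three-step task recording in log what happens to it."""
    try:
        for i in range(3):
            log.append("step %d" % i)
            yield None
        log.append("completed")
    except GeneratorExit:
        # reached when close() is called while the task is suspended
        log.append("cleanup")


log = []
steps = task_with_cleanup(log)
next(steps)    # run until the first yield
steps.close()  # raises GeneratorExit inside the generator
# log is now ["step 0", "cleanup"]
```

<p>Note that closing a generator which was never started, or which has already finished, does not enter the handler, so the cleanup code only has to deal with a task that is actually suspended.</p>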
<p>One would expect that cancelling the Deferred returned by coiterate would
automatically <strong>close</strong> the related iterator, but this is not the case. Let’s modify
the <em>sum_all_integer…</em> function for this to happen. At <strong>line 60</strong>, we
construct the Deferred that <em>sum_all_integer…</em> will return, providing it with a
cancellation function. This function simply <strong>closes</strong> the iterator returned by
the generator function. This <em>helper</em> Deferred is <strong>chained</strong> to the Deferred
that coiterate returns, so that when no cancellation occurs, we get our result…</p>
</div>
</div>
</div>