<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Zeta-Puppis.com &#187; Python</title>
	<atom:link href="http://zeta-puppis.com/category/python/feed/" rel="self" type="application/rss+xml" />
	<link>http://zeta-puppis.com</link>
	<description>my very own personal corner</description>
	<lastBuildDate>Sat, 18 Feb 2012 12:53:23 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Clustering coordinate points together with quad-trees</title>
		<link>http://zeta-puppis.com/2010/10/02/clustering-coordinate-points-together-with-quad-trees/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=clustering-coordinate-points-together-with-quad-trees</link>
		<comments>http://zeta-puppis.com/2010/10/02/clustering-coordinate-points-together-with-quad-trees/#comments</comments>
		<pubDate>Sat, 02 Oct 2010 18:15:20 +0000</pubDate>
		<dc:creator>kratorius</dc:creator>
				<category><![CDATA[Coding]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[algorithm]]></category>
		<category><![CDATA[Algorithms]]></category>
		<category><![CDATA[dataset]]></category>
		<category><![CDATA[openheatmap]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[quad tree]]></category>
		<category><![CDATA[wikipedia]]></category>

		<guid isPermaLink="false">http://zeta-puppis.com/?p=329</guid>
		<description><![CDATA[<a href="http://zeta-puppis.com/2010/10/02/clustering-coordinate-points-together-with-quad-trees/" title="Clustering coordinate points together with quad-trees"></a>Recently I needed to show a heat map of quite a lot of coordinate points for a little project of mine that ended up in a data visualization contest (which, unfortunately, I didn&#8217;t win, even though I made it to &#8230;<p class="read-more"><a href="http://zeta-puppis.com/2010/10/02/clustering-coordinate-points-together-with-quad-trees/">Read more &#187;</a></p>]]></description>
			<content:encoded><![CDATA[<a href="http://zeta-puppis.com/2010/10/02/clustering-coordinate-points-together-with-quad-trees/" title="Clustering coordinate points together with quad-trees"></a><p>Recently I needed to show a heat map of quite a lot of coordinate points for a little project of mine that ended up in a <a href="http://thisweekinrelevance.com/2010/09/07/twir-contest/">data visualization contest</a> (which, unfortunately, I didn&#8217;t win, even though I was among the finalists). The idea was to show the distribution of the georeferenced Wikipedia pages through a heat map, so when I first heard about openheatmap.com I knew it was the tool to use. OpenHeatMap.com is an excellent project by <a href="http://petewarden.typepad.com">Pete Warden</a> that takes a dataset as a CSV, Excel or Google Spreadsheet file and converts it into a nice, browsable heat map.<br />
<span id="more-329"></span><br />
The first step was to obtain a dataset I could work on. I first tried to work directly on the whole <a href="http://en.wikipedia.org/wiki/Wikipedia:Database_download">Wikipedia database dump</a>, extracting all the georeferenced pages into a smaller dataset. I actually succeeded, but the result only included the English georeferenced pages. Also, extracting and converting the coordinates to a common format would have been a real pain. So instead I decided to use the dump from the <a href="http://de.wikipedia.org/wiki/Wikipedia:WikiProjekt_Georeferenzierung/Wikipedia-World/en">wikipedia-world project</a>, which already provides the data as a CSV file built from all the downloadable Wikipedia dumps, including languages other than English. This dataset includes roughly 1,300,000 points, so I had to weigh a few options for processing&nbsp;it.</p>
<p>Once I had the dataset ready and knew how big it was, I realized I had three&nbsp;options:</p>
<ol>
<li>the naive approach: just add every coordinate to the CSV&nbsp;file</li>
<li>use a reverse geocoding service to get the country each point belongs&nbsp;to</li>
<li>cluster sets of points&nbsp;together</li>
</ol>
<p>It was clear that the first approach wouldn&#8217;t work, for two reasons. First, there were just too many points for <a href="http://openheatmap.com">OHM</a> (the rendering is done on the client side, and that many points would slow things down a lot). Second, I would just be drawing points onto a map without actually creating a &#8220;heat map&#8221;, so I discarded that option early.<br />
Using a reverse geocoding service wouldn&#8217;t have worked either: I would have had to run far too many requests against such a service, and it would have taken ages. Also, I would have ended up with per-country rather than per-city highlighting, which would have distorted the final result. So the only viable option was to cluster sets of points together and then produce a CSV file that OHM would understand. I soon realized I needed some sort of spatial index for 2D space, and that turned out to be a quad&nbsp;tree.</p>
<p>Before we dive deeper into how to cluster the points together, we need to understand what a quad-tree is. Following the classic recursive definition of a tree, a quad-tree is a tree where each node, which represents a coordinate point, has up to four children. Each child represents a position relative to its parent: north west, north east, south west or south&nbsp;east.</p>
<p><img src="http://zeta-puppis.com/wp-content/uploads/2010/10/quadtree.png" alt="" title="Quad Tree" width="300" height="200" class="alignnone size-full wp-image-333" /></p>
<p>One requirement for generating the heat map is knowing how many points we clustered together. So it&#8217;s natural to define a node as an object storing the coordinates of the point and the number of points that have been aggregated into it. We can define a class like this (all the following code examples are in&nbsp;Python):</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">class</span> PointNode<span style="color: black;">&#40;</span><span style="color: #008000;">object</span><span style="color: black;">&#41;</span>:
    NW, NE, SW, SE = <span style="color: #ff4500;">0</span>, <span style="color: #ff4500;">1</span>, <span style="color: #ff4500;">2</span>, <span style="color: #ff4500;">3</span>
    <span style="color: #ff7700;font-weight:bold;">def</span> <span style="color: #0000cd;">__init__</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, lat, lon<span style="color: black;">&#41;</span>:
        <span style="color: #008000;">self</span>.<span style="color: black;">lat</span>, <span style="color: #008000;">self</span>.<span style="color: black;">lon</span> = <span style="color: #008000;">float</span><span style="color: black;">&#40;</span>lat<span style="color: black;">&#41;</span>, <span style="color: #008000;">float</span><span style="color: black;">&#40;</span>lon<span style="color: black;">&#41;</span>
        <span style="color: #008000;">self</span>.<span style="color: black;">nodes</span> = <span style="color: black;">&#91;</span><span style="color: #008000;">None</span><span style="color: black;">&#93;</span> <span style="color: #66cc66;">*</span> <span style="color: #ff4500;">4</span>
        <span style="color: #008000;">self</span>.<span style="color: black;">aggregate_no</span> = <span style="color: #ff4500;">1</span>
&nbsp;
    <span style="color: #ff7700;font-weight:bold;">def</span> <span style="color: #0000cd;">__str__</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
        <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #483d8b;">&quot;%s, %s&quot;</span> <span style="color: #66cc66;">%</span> <span style="color: black;">&#40;</span><span style="color: #008000;">self</span>.<span style="color: black;">lat</span>, <span style="color: #008000;">self</span>.<span style="color: black;">lon</span><span style="color: black;">&#41;</span></pre></td></tr></table></div>

<p>We&#8217;ll perform two operations on the quad-tree: inserting a new node and visiting the whole tree. Insertion has a natural recursive definition. Given a&nbsp;node:</p>
<ol>
<li>if the current tree node is empty, just insert the new node there, with its number of clustered points set to&nbsp;1</li>
<li>if the new node is near the current tree node (closer than a given distance limit), compute a &#8220;middle node&#8221;, substitute it for the tree node and increment its number of clustered&nbsp;points</li>
<li>otherwise, find out which quadrant of the current tree node the new node belongs to (north west, north east, south west or south east) and recursively insert it&nbsp;there</li>
</ol>
<p>The insert operation on the quad-tree can thus be coded like&nbsp;this:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">def</span> qtree_insert<span style="color: black;">&#40;</span>root, node<span style="color: black;">&#41;</span>:
    <span style="color: #483d8b;">&quot;&quot;&quot;
    Insert a point into the quad tree substituting a node with its
    midpoint if the nodes are near to each other (less than DISTANCE_LIMIT)
    &quot;&quot;&quot;</span>
    <span style="color: #ff7700;font-weight:bold;">if</span> <span style="color: #ff7700;font-weight:bold;">not</span> root: <span style="color: #ff7700;font-weight:bold;">return</span> node
&nbsp;
    <span style="color: #808080; font-style: italic;"># if we are under the distance limit, replace the root node with the</span>
    <span style="color: #808080; font-style: italic;"># midpoint of the two nodes</span>
    <span style="color: #ff7700;font-weight:bold;">if</span> point_distance<span style="color: black;">&#40;</span>root, node<span style="color: black;">&#41;</span> <span style="color: #66cc66;">&lt;</span> DISTANCE_LIMIT:
        c = point_midpoint<span style="color: black;">&#40;</span>root, node<span style="color: black;">&#41;</span>
        c.<span style="color: black;">nodes</span> = root.<span style="color: black;">nodes</span>
        c.<span style="color: black;">aggregate_no</span> = root.<span style="color: black;">aggregate_no</span> + <span style="color: #ff4500;">1</span>
        root = c
    <span style="color: #ff7700;font-weight:bold;">else</span>:
        <span style="color: #808080; font-style: italic;"># otherwise just insert the node where it belongs</span>
&nbsp;
        <span style="color: #808080; font-style: italic;"># exploit PointNode child indexing (with NW being 0 we just need to add</span>
        <span style="color: #808080; font-style: italic;"># the proper number: +2 selects the south row, +1 the east column)</span>
        pos = PointNode.<span style="color: black;">NW</span>
        <span style="color: #ff7700;font-weight:bold;">if</span> node.<span style="color: black;">lat</span> <span style="color: #66cc66;">&lt;</span> root.<span style="color: black;">lat</span>:
            pos += <span style="color: #ff4500;">2</span>
        <span style="color: #ff7700;font-weight:bold;">if</span> node.<span style="color: black;">lon</span> <span style="color: #66cc66;">&gt;</span> root.<span style="color: black;">lon</span>:
            pos += <span style="color: #ff4500;">1</span>
&nbsp;
        root.<span style="color: black;">nodes</span><span style="color: black;">&#91;</span>pos<span style="color: black;">&#93;</span> = qtree_insert<span style="color: black;">&#40;</span>root.<span style="color: black;">nodes</span><span style="color: black;">&#91;</span>pos<span style="color: black;">&#93;</span>, node<span style="color: black;">&#41;</span>
    <span style="color: #ff7700;font-weight:bold;">return</span> root</pre></td></tr></table></div>

<p>The distance between two nodes can be computed with the Pythagorean formula on a plane, treating meridians as parallel lines. This formula gives the distance between two points (in kilometers, if the radius is in kilometers) and is defined as: <img src='http://s.wordpress.com/latex.php?latex=D%3DR%5Csqrt%7B%28%5CDelta%5Cphi%29%5E2%2B%28%5CDelta%5Clambda%29%5E2%7D%5C%21&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='D=R\sqrt{(\Delta\phi)^2+(\Delta\lambda)^2}\!' title='D=R\sqrt{(\Delta\phi)^2+(\Delta\lambda)^2}\!' class='latex' /> where <img src='http://s.wordpress.com/latex.php?latex=R&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='R' title='R' class='latex' /> is the Earth&#8217;s radius and <img src='http://s.wordpress.com/latex.php?latex=%28%5Cphi_0%2C%5Clambda_0%29%5C%2C%5C%21&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='(\phi_0,\lambda_0)\,\!' title='(\phi_0,\lambda_0)\,\!' class='latex' />, <img src='http://s.wordpress.com/latex.php?latex=%28%5Cphi_1%2C%5Clambda_1%29%5C%2C%5C%21&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='(\phi_1,\lambda_1)\,\!' title='(\phi_1,\lambda_1)\,\!' class='latex' /> are the two points&#8217; (latitude, longitude) coordinates in radians, so <img src='http://s.wordpress.com/latex.php?latex=%5CDelta%5Cphi&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\Delta\phi' title='\Delta\phi' class='latex' /> and <img src='http://s.wordpress.com/latex.php?latex=%5CDelta%5Clambda&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\Delta\lambda' title='\Delta\lambda' class='latex' /> are the differences between the two points&#8217; latitudes and longitudes, respectively.<br />
Note that we can get the distance in miles, or in any other unit, just by expressing the Earth&#8217;s radius in that unit. Keep in mind, though, that this formula is not very accurate: it ignores the fact that meridians converge towards the poles, so east-west distances are overestimated away from the equator. If we need better accuracy, we have to find a better alternative, such as the haversine&nbsp;formula.</p>
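<p>The insertion code relies on <code>point_distance</code>, <code>point_midpoint</code> and <code>DISTANCE_LIMIT</code>, which the post doesn&#8217;t show. Here is a minimal sketch of how they could look, assuming coordinates stored in decimal degrees (as <code>PointNode</code> does), a mean Earth radius of 6371&nbsp;km and a hypothetical 10&nbsp;km clustering threshold. The midpoint is a plain average of the two coordinates; that is an assumption, not necessarily what the original script does, and it ignores points straddling the 180th meridian.</p>
<pre><code>import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; use about 3959.0 to get miles
DISTANCE_LIMIT = 10.0     # hypothetical clustering threshold, in km


class PointNode(object):
    """Same fields as the PointNode class shown earlier."""
    NW, NE, SW, SE = 0, 1, 2, 3

    def __init__(self, lat, lon):
        self.lat, self.lon = float(lat), float(lon)
        self.nodes = [None] * 4
        self.aggregate_no = 1


def point_distance(a, b):
    """D = R * sqrt(dphi^2 + dlambda^2), the 'parallel meridians' formula.

    PointNode stores degrees, so the differences are converted to radians.
    """
    dphi = math.radians(b.lat - a.lat)
    dlambda = math.radians(b.lon - a.lon)
    return EARTH_RADIUS_KM * math.hypot(dphi, dlambda)


def point_midpoint(a, b):
    """New, childless node halfway between a and b (plain coordinate average)."""
    return PointNode((a.lat + b.lat) / 2.0, (a.lon + b.lon) / 2.0)


# usage: two points one degree of latitude apart, and a midpoint
d = point_distance(PointNode(0, 0), PointNode(1, 0))
m = point_midpoint(PointNode(40, 10), PointNode(42, 12))
</code></pre>
<p>Two points one degree of latitude apart come out at about 111.2&nbsp;km, so with this threshold they would stay separate nodes. Since <code>qtree_insert</code> overwrites <code>nodes</code> and <code>aggregate_no</code> on the node returned by <code>point_midpoint</code>, returning a fresh, childless node is enough.</p>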
<p>Visiting the whole tree can be done with a classic tree traversal and, since for my purposes there&#8217;s no need to visit the nodes in any particular order, I chose DFS to save some memory. The final source code of the Python script can be found in <a href="http://github.com/kratorius/wikipedia-fun/blob/master/wikicoords.py">my repository on GitHub</a>. The whole process takes slightly more than one minute on my desktop machine. Finally, this is the heat map that took part in the <a href="http://thisweekinrelevance.com/">This Week In Relevance</a>&nbsp;contest:</p>
<p><iframe width="600" height="450" src="http://www.openheatmap.com/embed.html?map=ChacmaFoliaceousnessTarata" ></iframe></p>
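<p>The traversal itself isn&#8217;t shown in the post (it lives in the linked script). As a rough idea, an iterative depth-first visit with an explicit stack, which also sidesteps Python&#8217;s recursion limit on deep trees, followed by a CSV export could look like the sketch below. The function names and the <code>lat,lon,value</code> column headers are hypothetical, so check which headers OpenHeatMap actually expects. <code>PointNode</code> is repeated so the snippet runs on its own.</p>
<pre><code>import csv
import io


class PointNode(object):
    """Minimal copy of the post's PointNode, so this snippet runs alone."""
    NW, NE, SW, SE = 0, 1, 2, 3

    def __init__(self, lat, lon):
        self.lat, self.lon = float(lat), float(lon)
        self.nodes = [None] * 4
        self.aggregate_no = 1


def qtree_iter(root):
    """Iterative depth-first visit: yields every node of the tree once."""
    stack = [root] if root is not None else []
    while stack:
        node = stack.pop()
        yield node
        # push the non-empty children; the visiting order doesn't matter here
        stack.extend(child for child in node.nodes if child is not None)


def qtree_to_csv(root, out):
    """Write a header plus one row per clustered point (hypothetical columns)."""
    writer = csv.writer(out)
    writer.writerow(["lat", "lon", "value"])
    for node in qtree_iter(root):
        writer.writerow([node.lat, node.lon, node.aggregate_no])


# usage: a two-node tree written to an in-memory buffer
root = PointNode(41.9, 12.5)
root.aggregate_no = 3
root.nodes[PointNode.SE] = PointNode(40.8, 14.3)
buf = io.StringIO()
qtree_to_csv(root, buf)
</code></pre>
<p>With a real file, open it with <code>newline=''</code>, as the <code>csv</code> module documentation recommends, and pass the file object instead of the <code>StringIO</code> buffer.</p>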
]]></content:encoded>
			<wfw:commentRss>http://zeta-puppis.com/2010/10/02/clustering-coordinate-points-together-with-quad-trees/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>My Italian PyCon experience</title>
		<link>http://zeta-puppis.com/2009/05/11/my-italian-pycon-experience/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=my-italian-pycon-experience</link>
		<comments>http://zeta-puppis.com/2009/05/11/my-italian-pycon-experience/#comments</comments>
		<pubDate>Mon, 11 May 2009 10:42:00 +0000</pubDate>
		<dc:creator>kratorius</dc:creator>
				<category><![CDATA[Geekness]]></category>
		<category><![CDATA[Me]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[BDFL]]></category>
		<category><![CDATA[conf]]></category>
		<category><![CDATA[florence]]></category>
		<category><![CDATA[pycon]]></category>
		<category><![CDATA[pycon3]]></category>
		<category><![CDATA[python italia]]></category>

		<guid isPermaLink="false">http://zeta-puppis.com/?p=216</guid>
		<description><![CDATA[<a href="http://zeta-puppis.com/2009/05/11/my-italian-pycon-experience/" title="My Italian PyCon experience"></a>I came back yesterday from the third Italian PyCon (aka pycon3), which was held in Florence, and all I can say is that it has been an amazing experience. I had the chance to meet a lot of great new people &#8230;<p class="read-more"><a href="http://zeta-puppis.com/2009/05/11/my-italian-pycon-experience/">Read more &#187;</a></p>]]></description>
			<content:encoded><![CDATA[<a href="http://zeta-puppis.com/2009/05/11/my-italian-pycon-experience/" title="My Italian PyCon experience"></a><p>I came back yesterday from the third Italian <a href="http://www.pycon.it">PyCon</a> (aka pycon3), which was held in Florence, and all I can say is that it has been an <strong>amazing experience</strong>. I had the chance to meet a lot of great new people as well as the <a href="http://neopythonic.blogspot.com">BDFL</a> (who, as he said, won&#8217;t be back in Europe for quite some time). Here is a summary of what I think were the most interesting&nbsp;talks.</p>
<p><span id="more-216"></span>On the first day there were two keynotes: &#8220;A retrospective of how the community helped build Python 3.0&#8221; by <strong>Guido van Rossum</strong> and &#8220;Zen and the art of Abstractions&#8217; maintenance&#8221; by Alex Martelli. All I can say is that they were two extremely interesting talks which, by the way, didn&#8217;t dive much into code (Guido&#8217;s didn&#8217;t touch code at&nbsp;all).</p>
<p>On the second day I really enjoyed two talks: &#8220;Erlang + Python, joining two worlds&#8221; by <a href="http://www.pycon.it/conference/speakers/lawrence-oluyede">Lawrence Oluyede</a> and a really great talk by Raymond Hettinger, &#8220;Easy AI with Python.&#8221; The former left me with a great curiosity about the world of functional languages, while the latter really impressed me with <strong>how easy it is to solve certain AI problems with Python</strong> (I had already solved many of the problems Raymond talked about, but never in Python, and I had never even thought about trying&nbsp;to).</p>
<p>On the third day, <strong>Antonio Cangiano&#8217;s talk</strong> was enlightening. Even though it wasn&#8217;t really Python-specific, he gave great insight into how you can, well, &#8220;become rich with&nbsp;Python.&#8221;</p>
<p>Unfortunately I missed the Sunday afternoon talks, since my flight was leaving at 3:00pm, but in the end I can say that this was an incredible experience that I hope to repeat next year. And as a side note: <strong>the food was&nbsp;marvelous</strong>.</p>
]]></content:encoded>
			<wfw:commentRss>http://zeta-puppis.com/2009/05/11/my-italian-pycon-experience/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Optimize your programs</title>
		<link>http://zeta-puppis.com/2008/12/02/optimize-your-programs/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=optimize-your-programs</link>
		<comments>http://zeta-puppis.com/2008/12/02/optimize-your-programs/#comments</comments>
		<pubDate>Tue, 02 Dec 2008 20:17:39 +0000</pubDate>
		<dc:creator>kratorius</dc:creator>
				<category><![CDATA[Coding]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Algorithms]]></category>
		<category><![CDATA[compression]]></category>
		<category><![CDATA[information retrieval]]></category>
		<category><![CDATA[optimizations]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[speed]]></category>
		<category><![CDATA[zlib]]></category>

		<guid isPermaLink="false">http://zeta-puppis.com/?p=185</guid>
		<description><![CDATA[<a href="http://zeta-puppis.com/2008/12/02/optimize-your-programs/" title="Optimize your programs"></a>Last time I blogged about a new course I&#8217;m following at my university. This course, held by Pasquale Lops and Giovanni Semeraro, is so interesting that I&#8217;ll be developing a custom information retrieval engine as part &#8230;<p class="read-more"><a href="http://zeta-puppis.com/2008/12/02/optimize-your-programs/">Read more &#187;</a></p>]]></description>
			<content:encoded><![CDATA[<a href="http://zeta-puppis.com/2008/12/02/optimize-your-programs/" title="Optimize your programs"></a><p><a href="http://zeta-puppis.com/2008/10/19/what-i-learned-by-information-retrieval-in-one-week/">Last time</a> I blogged about a new course I&#8217;m following at my university. This course, held by <a href="http://www.di.uniba.it/~lops/lops.html">Pasquale Lops</a> and <a href="http://lacam.di.uniba.it:8000/people/semeraro.htm">Giovanni Semeraro</a>, is so interesting that I&#8217;ll be developing a <strong>custom information retrieval engine</strong> as part of my internship project. I can&#8217;t say much more at this point, since the internship hasn&#8217;t started yet and I&#8217;m not sure I can disclose more details about the project (we&#8217;re still deciding if and how the whole thing will be released to the&nbsp;world).</p>
<p>In the meantime, I&#8217;ve been doing several experiments on this topic, mostly about the memory usage and performance of such a system on limited hardware. In practice, this means implementing the algorithms you&#8217;ll be using and measuring the computation time they&nbsp;require.</p>
<p><span id="more-185"></span>One of the most common things our information retrieval engine has to do is take a document and compress it. Keep in mind&nbsp;that:</p>
<ul>
<li>this is a fundamental piece of this IR&nbsp;engine</li>
<li>it will be used very&nbsp;often</li>
<li>it&#8217;s not rare to process very large&nbsp;documents</li>
</ul>
<p>So this operation has to be as efficient as&nbsp;possible.</p>
<p>I chose zlib as my compression library, mainly for two&nbsp;reasons:</p>
<ul>
<li>it&#8217;s already included in Python (admittedly not a strong point, since Python also ships modules for algorithms with better compression&nbsp;ratios)</li>
<li>it offers the best compromise between speed and compression&nbsp;ratio</li>
</ul>
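<p>If you want to check the speed/compression-ratio trade-off on your own data, a quick sketch like this one (assuming Python&nbsp;3; <code>compare_levels</code> is a hypothetical helper) times <code>zlib.compress</code> at a few levels. Level 6 is zlib&#8217;s default, 1 favors speed and 9 favors ratio; the actual numbers depend heavily on the input.</p>
<pre><code>import time
import zlib


def compare_levels(data, levels=(1, 6, 9)):
    """Return a (level, compressed_size, seconds) tuple for each level."""
    results = []
    for level in levels:
        start = time.perf_counter()
        compressed = zlib.compress(data, level)
        elapsed = time.perf_counter() - start
        # sanity check: the round trip must give back the original bytes
        if zlib.decompress(compressed) != data:
            raise ValueError("round trip failed at level %d" % level)
        results.append((level, len(compressed), elapsed))
    return results


# usage on some repetitive sample data (replace with open(path, "rb").read())
sample = b"information retrieval engine " * 20000
for level, size, secs in compare_levels(sample):
    print("level %d: %d bytes (%.2f%% of the input) in %.4fs"
          % (level, size, 100.0 * size / len(sample), secs))
</code></pre>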
<p>Given the above considerations, let&#8217;s start coding our compression&nbsp;system.</p>
<p>As our sample document we will use the PDF specification, available at the <a href="http://www.adobe.com/devnet/pdf/pdf_reference.html">Adobe Development Center</a> (<a href="http://www.adobe.com/devnet/acrobat/pdfs/PDF32000_2008.pdf">this is the file</a>), which is 8.6&nbsp;MB in&nbsp;size.</p>
<p>So let&#8217;s start by doing things the basic&nbsp;way:</p>
<pre><code>#!/usr/bin/env python
# compress1.py
import zlib

def compress(input_path, output_path, compression_level=6):
    input_fd = open(input_path, 'rb')
    output_fd = open(output_path, 'wb')

    cobj = zlib.compressobj(compression_level)
    out = b''  # zlib returns bytes, so accumulate into a bytes object
    for line in input_fd:
        out += cobj.compress(line)
    out += cobj.flush()

    output_fd.write(out)

    input_fd.close()
    output_fd.close()

def decompress(input_path, output_path):
    input_fd = open(input_path, 'rb')
    output_fd = open(output_path, 'wb')

    dobj = zlib.decompressobj()
    out = b''  # zlib returns bytes, so accumulate into a bytes object
    for line in input_fd:
        out += dobj.decompress(line)
    out += dobj.flush()

    output_fd.write(out)

    input_fd.close()
    output_fd.close()

if __name__ == '__main__':
    import sys
    args = sys.argv[1:]

    options = { 'compress': compress,
                'decompress': decompress,
    }

    try:
        # unpack inside the try so missing arguments are reported too
        input_path, output_path = args[1], args[2]
        options[args[0]](input_path, output_path)
    except (KeyError, IndexError):
        print("Invalid arguments")
</code></pre>
<p>Running this program with some very basic profiling gives us a first&nbsp;indication:</p>
<pre>
kratorius@becks:~/compress$ time ./compress1.py compress PDF32000_2008.pdf compr.zlib
real    0m2.517s
user    0m1.496s
sys     0m0.060s

kratorius@becks:~/compress$ time ./compress1.py decompress compr.zlib decompr.pdf
real    0m0.640s
user    0m0.537s
sys     0m0.085s
</pre>
<p>We need 2.5 seconds to compress a file smaller than 10&nbsp;MB. This is hardly acceptable, since it means we&#8217;re processing only about 3.4&nbsp;MB per second (8.6&nbsp;MB in 2.5&nbsp;s), so we need to understand what we&#8217;re doing wrong. I can spot at least two big mistakes in this&nbsp;script:</p>
<ol>
<li>we&#8217;re reading the input file line by line, which isn&#8217;t very efficient, since this way <strong>we&#8217;re accessing the disk many times</strong> (not to mention that we&#8217;re also feeding the compressor line by line, which is inefficient and makes little sense for a binary file like our&nbsp;PDF)</li>
<li><strong>we keep the whole compressed output in memory</strong> until the compression is finished, which means that even if the script ran faster, its memory usage would still be needlessly&nbsp;high</li>
</ol>
<p>So here is a new version of our compression script that addresses the first issue&nbsp;above:</p>
<pre><code>#!/usr/bin/env python
# compress2.py
import zlib

def compress(input_path, output_path, compression_level=6):
    input_fd = open(input_path, 'rb')
    output_fd = open(output_path, 'wb')

    out = zlib.compress(input_fd.read(), compression_level)
    output_fd.write(out)

    input_fd.close()
    output_fd.close()

def decompress(input_path, output_path):
    input_fd = open(input_path, 'rb')
    output_fd = open(output_path, 'wb')

    out = zlib.decompress(input_fd.read())
    output_fd.write(out)

    input_fd.close()
    output_fd.close()

if __name__ == '__main__':
    import sys
    args = sys.argv[1:]

    options = { 'compress': compress,
                'decompress': decompress,
    }

    try:
        # unpack inside the try so missing arguments are reported too
        input_path, output_path = args[1], args[2]
        options[args[0]](input_path, output_path)
    except (KeyError, IndexError):
        print("Invalid arguments")
</code></pre>
<p>Let&#8217;s run our basic profiling&nbsp;again:</p>
<pre>kratorius@becks:~/compress$ time ./compress2.py compress PDF32000_2008.pdf compr.zlib
real    0m1.668s
user    0m1.337s
sys     0m0.079s

kratorius@becks:~/compress$ time ./compress2.py decompress compr.zlib decompr.pdf
real    0m0.561s
user    0m0.394s
sys     0m0.086s
</pre>
<p>We are now reading the whole input file into memory (minimizing disk accesses), compressing everything in memory and writing the compressed data to the output in a single shot. This gave us a noticeable speedup, but <strong>we have actually increased our memory usage</strong>, since we&#8217;re now keeping both the input and the compressed data in memory at the same time. This could be fine for small files, but since we need a general approach, this solution is not that&nbsp;good.</p>
<p>We can do better. And we&#8217;ll do better in the third&nbsp;try:</p>
<pre><code>#!/usr/bin/env python
# compress3.py
import zlib

READ_BYTES = 2097152  # 2 MB (2 * 1024 * 1024 bytes)

def compress(input_path, output_path, compression_level=6):
    input_fd = open(input_path, 'rb')
    output_fd = open(output_path, 'wb')

    cobj = zlib.compressobj(compression_level)
    done = False
    while not done:
        rd = input_fd.read(READ_BYTES)
        done = not rd  # read() returns an empty bytes object at end of file

        output_fd.write(cobj.compress(rd))

    output_fd.write(cobj.flush())

    input_fd.close()
    output_fd.close()

def decompress(input_path, output_path):
    input_fd = open(input_path, 'rb')
    output_fd = open(output_path, 'wb')

    dobj = zlib.decompressobj()
    done = False
    while not done:
        rd = input_fd.read(READ_BYTES)
        done = not rd  # read() returns an empty bytes object at end of file

        output_fd.write(dobj.decompress(rd))

    output_fd.write(dobj.flush())

    input_fd.close()
    output_fd.close()

if __name__ == '__main__':
    import sys
    args = sys.argv[1:]

    options = { 'compress': compress,
                'decompress': decompress,
    }

    try:
        # unpack inside the try so missing arguments are reported too
        input_path, output_path = args[1], args[2]
        options[args[0]](input_path, output_path)
    except (KeyError, IndexError):
        print("Invalid arguments")
</code></pre>
<p>And we finally reached our&nbsp;goal:</p>
<pre>kratorius@becks:~/compress$ time ./compress3.py compress PDF32000_2008.pdf compr.zlib
real    0m1.325s
user    0m1.226s
sys     0m0.070s

kratorius@becks:~/compress$ time ./compress3.py decompress compr.zlib decompr.pdf
real    0m0.534s
user    0m0.404s
sys     0m0.119s
</pre>
<p>This last try works because <strong>we&#8217;re still minimizing disk accesses</strong> (we read the input in 2&nbsp;MB chunks, so a small file is read in a single pass) and this time <strong>we&#8217;re also reducing memory usage</strong>,&nbsp;since:</p>
<ul>
<li>we read a 2&nbsp;MB block from our input&nbsp;file</li>
<li>we compress that&nbsp;block</li>
<li>we write it directly to our output&nbsp;file</li>
</ul>
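<p>The same read/compress/write loop can also be packaged as a pair of reusable functions that work on any binary file-like object. This is a sketch assuming Python&nbsp;3: <code>compress_stream</code>, <code>decompress_stream</code> and <code>CHUNK_SIZE</code> are hypothetical names, while the two-argument form of <code>iter()</code> and <code>functools.partial</code> are standard library features that replace the <code>done</code> flag.</p>
<pre><code>import functools
import io
import zlib

CHUNK_SIZE = 2 * 1024 * 1024  # 2 MB, the same block size as READ_BYTES


def compress_stream(src, dst, chunk_size=CHUNK_SIZE, level=6):
    """Compress the binary file-like src into dst, one chunk at a time."""
    cobj = zlib.compressobj(level)
    # iter(callable, sentinel) calls src.read(chunk_size) until it returns b""
    for chunk in iter(functools.partial(src.read, chunk_size), b""):
        dst.write(cobj.compress(chunk))
    dst.write(cobj.flush())


def decompress_stream(src, dst, chunk_size=CHUNK_SIZE):
    """Inverse of compress_stream, also reading chunk by chunk."""
    dobj = zlib.decompressobj()
    for chunk in iter(functools.partial(src.read, chunk_size), b""):
        dst.write(dobj.decompress(chunk))
    dst.write(dobj.flush())


# usage: an in-memory round trip with small chunks
original = b"some highly repetitive text " * 100000
packed, unpacked = io.BytesIO(), io.BytesIO()
compress_stream(io.BytesIO(original), packed, chunk_size=64 * 1024)
packed.seek(0)
decompress_stream(packed, unpacked, chunk_size=64 * 1024)
</code></pre>
<p>With real files you would pass objects returned by <code>open(path, 'rb')</code> and <code>open(path, 'wb')</code> instead of the in-memory buffers.</p>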
<p>I&#8217;m sure there&#8217;s still room for improvement, but at this point we can be quite happy with the result. You can find the final script, which performs error checking and file locking, <a href="http://zeta-puppis.com/wp-content/uploads/2008/12/compress.py">here</a> (file locking only works on UNIX systems, though; on Windows you should just comment out the <code>fcntl</code> lines). As always, suggestions are&nbsp;welcome.</p>
]]></content:encoded>
			<wfw:commentRss>http://zeta-puppis.com/2008/12/02/optimize-your-programs/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>What I learned by information retrieval in one week</title>
		<link>http://zeta-puppis.com/2008/10/19/what-i-learned-by-information-retrieval-in-one-week/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=what-i-learned-by-information-retrieval-in-one-week</link>
		<comments>http://zeta-puppis.com/2008/10/19/what-i-learned-by-information-retrieval-in-one-week/#comments</comments>
		<pubDate>Sun, 19 Oct 2008 16:38:24 +0000</pubDate>
		<dc:creator>kratorius</dc:creator>
				<category><![CDATA[Coding]]></category>
		<category><![CDATA[Django]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[information retrieval]]></category>
		<category><![CDATA[IR]]></category>
		<category><![CDATA[machine learning]]></category>
		<category><![CDATA[natural language processing]]></category>
		<category><![CDATA[nlp]]></category>
		<category><![CDATA[search engine]]></category>
		<category><![CDATA[text categorization]]></category>
		<category><![CDATA[tf-idf]]></category>

		<guid isPermaLink="false">http://zeta-puppis.com/?p=159</guid>
		<description><![CDATA[<a href="http://zeta-puppis.com/2008/10/19/what-i-learned-by-information-retrieval-in-one-week/" title="What I learned by information retrieval in one week"></a>It has been about a week since I began a deeper study of information retrieval. Actually, it all began with a new course on the subject at my university, and I fell in love with it almost immediately. The fact is &#8230;<p class="read-more"><a href="http://zeta-puppis.com/2008/10/19/what-i-learned-by-information-retrieval-in-one-week/">Read more &#187;</a></p>]]></description>
			<content:encoded><![CDATA[<a href="http://zeta-puppis.com/2008/10/19/what-i-learned-by-information-retrieval-in-one-week/" title="What I learned by information retrieval in one week"></a><p>It has been about a week since I began a deeper study of information retrieval. Actually, it all began with a new course on the subject at my university, and I fell in love with it almost immediately. The fact is that this subject really got me interested, and I began doing some experiments (one of them involves Django as well; keep reading to learn&nbsp;more).</p>
<p>This week I learned a lot about information retrieval, text categorization, natural language processing and machine learning. But the most important lesson is: <strong>the principles are easy, their implementation is not</strong>. Most of the techniques are relatively simple, but you usually have to deal with very large datasets, and that can be challenging, since one of the main requirements in information retrieval is speed. It&#8217;s much more important to return fewer results in one second than better results in one hour: no one will ever bother using your system if it takes an hour to return anything. And if you&#8217;re considering storing your data in a database, forget about normalization; it won&#8217;t really get you&nbsp;anywhere.</p>
<p><span id="more-159"></span>Speaking of storing information: when you&#8217;re dealing with documents, most of the words are so-called <em>stop words</em>. Stop words don&#8217;t really carry meaning on their own, but they help the reader follow the text. Classic examples are articles like &#8220;the&#8221;, &#8220;a&#8221; and &#8220;an&#8221;, or logical connectives like &#8220;or&#8221; and &#8220;and&#8221;. <strong>These words are so common that their presence is nearly useless, since they&#8217;re&#8230; everywhere</strong>. If you study information retrieval you&#8217;ll learn about a weighting technique called <a href="http://en.wikipedia.org/wiki/Tf-idf">tf-idf</a> that gives these words a weight close to 0. But since you&#8217;d probably use an inverted (reverse) index for words (an index that, given a word, tells you which documents that word appears in), you can see that including stop words would take up a lot of&nbsp;space.</p>
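<p>To make these two ideas concrete, here is a minimal sketch for Python 3 (not taken from any real search engine) that builds an inverted index while skipping stop words, and computes one common tf-idf variant: raw term frequency times <code>log(N / df)</code>. The stop-word list, the letters-only tokenizer and the sample documents are illustrative assumptions.</p>
<pre><code>import math
import re
from collections import Counter, defaultdict

# Illustrative stop-word list; real systems use much longer ones.
STOP_WORDS = {"the", "a", "an", "and", "or"}


def tokenize(text):
    """Lowercase the text and keep only runs of letters."""
    return re.findall(r"[a-z]+", text.lower())


def build_inverted_index(docs):
    """Map every non-stop word to the sorted ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            if word not in STOP_WORDS:
                index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}


def tf_idf(term, doc_tokens, all_docs_tokens):
    """Raw term frequency in one document times log(N / df) over the collection."""
    tf = Counter(doc_tokens)[term]
    df = sum(1 for tokens in all_docs_tokens if term in tokens)
    if df == 0:
        return 0.0
    return tf * math.log(len(all_docs_tokens) / df)


docs = {1: "the fish and the cat", 2: "the cat in a garden", 3: "the dog"}
tokens = [tokenize(text) for text in docs.values()]

index = build_inverted_index(docs)
print(index["cat"])    # [1, 2]
print("the" in index)  # False: stop words never enter the index

# "the" appears in every document, so log(3 / 3) = 0 cancels its weight
print(tf_idf("the", tokens[0], tokens))   # 0.0
print(tf_idf("fish", tokens[0], tokens))  # log(3), roughly 1.0986</code></pre>
<p>Note that &#8220;the&#8221; gets a weight of exactly zero even though it is the most frequent word in the first document, so keeping it in the index costs space without helping the ranking.</p>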
<p>So one of the biggest issues so far is that you have to deal with extremely large datasets, and you need to strip out as much as possible. Now consider these words: &#8220;fishing&#8221;, &#8220;fishes&#8221;, &#8220;fish&#8221;. They are all about &#8220;fish&#8221;, and a user searching for &#8220;fish&#8221; would probably be interested in &#8220;fishes&#8221; or &#8220;fishing&#8221; as well. Besides, it&#8217;s wasteful to store three almost identical words. This is where <em>stemming</em> comes in: quoting the related <a href="http://en.wikipedia.org/wiki/Stemming">wikipedia page</a>, it is the <cite>process for reducing inflected (or sometimes derived) words to their stem, base or root form – generally a written word form</cite>. Fortunately, if you&#8217;re dealing with English texts, there&#8217;s the <a href="http://tartarus.org/~martin/PorterStemmer/">Porter algorithm</a>, the classic and widely used algorithm for this sort of thing. But it works only for English, so <strong>if your documents are written in another language, or in multiple languages, things get&nbsp;complicated</strong>.</p>
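<p>The Porter algorithm itself is too long to reproduce here, but the basic idea of conflating inflected forms can be shown with a deliberately naive suffix stripper. This is <strong>not</strong> the Porter algorithm: the suffix list and the minimum stem length below are arbitrary assumptions chosen to fit the &#8220;fish&#8221; example.</p>
<pre><code># A deliberately naive suffix stripper (NOT the Porter algorithm).
# Hypothetical suffix list, checked longest first.
SUFFIXES = ("ing", "es", "s")
MIN_STEM_LENGTH = 3  # avoid mangling short words like "is" or "as"


def naive_stem(word):
    """Strip the first matching suffix, if a long enough stem remains."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LENGTH:
            return word[:-len(suffix)]
    return word


print([naive_stem(w) for w in ["fishing", "fishes", "fish"]])
# ['fish', 'fish', 'fish']</code></pre>
<p>Rules this crude break quickly: &#8220;news&#8221; becomes &#8220;new&#8221; and &#8220;during&#8221; becomes &#8220;dur&#8221;. That is exactly why a carefully designed algorithm is worth using; NLTK, for example, ships an implementation of Porter&#8217;s algorithm as <code>PorterStemmer</code> in its <code>nltk.stem</code> module.</p>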
<p>This leads to the problem of language identification. How do you know which language a text is written in just by looking at it? Of course you can describe a document&#8217;s language with some kind of metadata, but not all documents have it; just think about the web. There are statistical methods based on the classification of <a href="http://en.wikipedia.org/wiki/N-gram">n-grams</a>, but I haven&#8217;t investigated them in depth yet, so I can&#8217;t really say much about&nbsp;them.</p>
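<p>For the curious, here is a toy illustration of the general n-gram idea (a sketch, not a tested method): build a profile of character trigram counts for each language from some sample text, then pick the language whose profile shares the most trigrams with the unknown text. The two tiny sample sentences are assumptions standing in for the large training corpora a real system would need.</p>
<pre><code>from collections import Counter


def char_ngrams(text, n=3):
    """Count overlapping character n-grams, padding the text with spaces."""
    padded = " " + text.lower() + " "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


# Toy samples standing in for real training corpora (an assumption:
# a usable profile needs far more text per language).
SAMPLES = {
    "english": "the quick brown fox jumps over the lazy dog and the cat",
    "italian": "il gatto e il cane giocano nella casa con la bambina",
}
PROFILES = {lang: char_ngrams(text) for lang, text in SAMPLES.items()}


def guess_language(text):
    """Return the language whose profile shares the most trigram occurrences."""
    grams = char_ngrams(text)

    def overlap(profile):
        # Counter returns 0 for missing keys, so unseen trigrams add nothing
        return sum(min(count, profile[gram]) for gram, count in grams.items())

    return max(PROFILES, key=lambda lang: overlap(PROFILES[lang]))


print(guess_language("the dog and the fox"))  # english
print(guess_language("il cane e il gatto"))   # italian</code></pre>
<p>Raw overlap favours languages with longer samples, so a real implementation would also normalize the profiles before comparing them.</p>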
<p>Now you have the collection of documents that <em>match</em> a given query. How do you know which document is more relevant than another (in other words: how do you <em>rank</em> pages)? You have two alternatives (well, probably more, but these are the ones I know at the moment): <strong>the tf-idf weighting mentioned above and <a href="http://en.wikipedia.org/wiki/Cosine_similarity">cosine similarity</a></strong>. The latter is an interesting one: take the tf-idf vectors of the documents and treat the query as a document too, with its own tf-idf vector. Then measure the cosine of the angle between the query vector and each document vector. The closer it is to 1, the more relevant the&nbsp;document.</p>
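<p>A minimal sketch of ranking by cosine similarity, written for Python 3. It assumes plain <code>tf * log(N / df)</code> weights, a letters-only tokenizer, and, following the &#8220;treat the query as a document&#8221; idea literally, counts the query when computing document frequencies. The function names and sample documents are illustrative.</p>
<pre><code>import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep only runs of letters."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf_vectors(token_lists):
    """Return one sparse {term: weight} dict per token list, tf * log(N / df)."""
    n = len(token_lists)
    df = Counter(term for tokens in token_lists for term in set(tokens))
    return [
        {t: tf * math.log(n / df[t]) for t, tf in Counter(tokens).items()}
        for tokens in token_lists
    ]


def cosine(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, docs):
    """Treat the query as one more document; sort docs by cosine, best first."""
    token_lists = [tokenize(d) for d in docs] + [tokenize(query)]
    vectors = tf_idf_vectors(token_lists)
    query_vec = vectors[-1]
    scores = [(cosine(query_vec, vec), doc) for vec, doc in zip(vectors, docs)]
    return sorted(scores, reverse=True)


docs = ["the cat sat", "fish and chips", "fresh fish"]
print([doc for score, doc in rank("fish", docs)])
# ['fresh fish', 'fish and chips', 'the cat sat']</code></pre>
<p>&#8220;fresh fish&#8221; wins because &#8220;fish&#8221; makes up a larger share of its (shorter) vector. A common alternative design computes idf from the document collection alone and reuses those weights for the query.</p>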
<p>There are many other important things to say, like the concepts of precision and recall, but that&#8217;s enough for now. I&#8217;ll talk about them another&nbsp;time.</p>
<p>Anyway, I&#8217;m working on an experimental project named <a href="http://code.google.com/p/django-searchable/">django searchable</a>. It&#8217;s a pluggable app for django that implements an information retrieval engine based on tf-idf weighting. Play with it if you&#8217;re brave&nbsp;enough.</p>
]]></content:encoded>
			<wfw:commentRss>http://zeta-puppis.com/2008/10/19/what-i-learned-by-information-retrieval-in-one-week/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Announcing Pytagram</title>
		<link>http://zeta-puppis.com/2008/08/21/announcing-pytagram/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=announcing-pytagram</link>
		<comments>http://zeta-puppis.com/2008/08/21/announcing-pytagram/#comments</comments>
		<pubDate>Thu, 21 Aug 2008 14:42:11 +0000</pubDate>
		<dc:creator>kratorius</dc:creator>
				<category><![CDATA[Coding]]></category>
		<category><![CDATA[Django]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Web]]></category>
		<category><![CDATA[pytagram]]></category>
		<category><![CDATA[svg]]></category>
		<category><![CDATA[toc]]></category>

		<guid isPermaLink="false">http://zeta-puppis.com/?p=138</guid>
		<description><![CDATA[<a href="http://zeta-puppis.com/2008/08/21/announcing-pytagram/" title="Announcing Pytagram"></a>Today I finished one of my side projects: pytagram. Basically, it generates an SVG file (which can later be saved as eps/pdf/whatever and possibly edited by hand) from a tree-like plain text file. This can be useful for generating &#8230;<p class="read-more"><a href="http://zeta-puppis.com/2008/08/21/announcing-pytagram/">Read more &#187;</a></p>]]></description>
			<content:encoded><![CDATA[<a href="http://zeta-puppis.com/2008/08/21/announcing-pytagram/" title="Announcing Pytagram"></a><p>Today I finished one of my side projects: pytagram. Basically, it generates an SVG file (which can later be saved as eps/pdf/whatever and possibly edited by hand) from a tree-like plain text file. This can be useful for generating <strong>cheat sheets or quick references</strong> to the classes or functions that belong to a&nbsp;project.</p>
<p>I did this to generate a <a href="http://djangoproject.com">django</a> quick reference (<a href="http://zeta-puppis.com/wp-content/uploads/2008/08/django1.svg">here it is</a>): django has a lot of functions, and I know what they do, but I can never remember their names (now two A4 sheets are right in front of&nbsp;me).</p>
<p>If you&#8217;re interested in this, check out the <a href="http://code.google.com/p/pytagram/">google code project page</a> and grab your copy from the SVN&nbsp;repository.</p>
<p>There are <strong>tons of things that can be changed/optimized</strong> (e.g. adding an optional short explanation of each function, adding more examples, an easier way to change colors, &#8230;), but the code already works well enough to be useful to people out&nbsp;there.</p>
]]></content:encoded>
			<wfw:commentRss>http://zeta-puppis.com/2008/08/21/announcing-pytagram/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Google, codejam and number conversions</title>
		<link>http://zeta-puppis.com/2008/06/26/google-codejam-and-number-conversions/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=google-codejam-and-number-conversions</link>
		<comments>http://zeta-puppis.com/2008/06/26/google-codejam-and-number-conversions/#comments</comments>
		<pubDate>Thu, 26 Jun 2008 11:23:19 +0000</pubDate>
		<dc:creator>kratorius</dc:creator>
				<category><![CDATA[Coding]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[alien numbers]]></category>
		<category><![CDATA[base]]></category>
		<category><![CDATA[codejam]]></category>
		<category><![CDATA[conversion]]></category>
		<category><![CDATA[google]]></category>
		<category><![CDATA[number]]></category>

		<guid isPermaLink="false">http://zeta-puppis.com/?p=94</guid>
		<description><![CDATA[<a href="http://zeta-puppis.com/2008/06/26/google-codejam-and-number-conversions/" title="Google, codejam and number conversions"></a>The decimal numeral system is composed of ten digits, which we represent as &#8220;0123456789&#8221; (the digits in a system are written from lowest to highest). Imagine you have discovered an alien numeral system composed of some number of digits, which &#8230;<p class="read-more"><a href="http://zeta-puppis.com/2008/06/26/google-codejam-and-number-conversions/">Read more &#187;</a></p>]]></description>
			<content:encoded><![CDATA[<a href="http://zeta-puppis.com/2008/06/26/google-codejam-and-number-conversions/" title="Google, codejam and number conversions"></a><p>The decimal numeral system is composed of ten digits, which we represent as &#8220;0123456789&#8221; (the digits in a system are written from lowest to highest). Imagine you have discovered an alien numeral system composed of some number of digits, which may or may not be the same as those used in decimal. For example, if the alien numeral system were represented as &#8220;oF8&#8221;, then the numbers one through ten would be (F, 8, Fo, FF, F8, 8o, 8F, 88, Foo, FoF). We would like to be able to work with numbers in arbitrary alien systems. More generally, we want to be able <strong>to convert an arbitrary number that&#8217;s written in one alien system into a second alien&nbsp;system</strong>.</p>
<p><span id="more-94"></span>The above was one of the practice problems of the <a href="http://code.google.com/codejam">google codejam</a> (I still don&#8217;t know whether I&#8217;ll be able to take part, since I&#8217;ll probably be very busy with university exams on the day of the qualification round). More generally, the problem is converting a number (not necessarily written with the usual digits) from any base to any base. I worked out a solution that passed both the small and the large input tests. My solution is simple: <strong>convert the number from the source base to base 10, then convert that base 10 number to the target base</strong>. There are well-known algorithms for both steps (just think that 1986, for example, is nothing but 1 * 10^3 + 9 * 10^2 + 8 * 10^1 + 6 * 10^0), and I implemented my own solution in&nbsp;python.</p>
<p>The biggest issue here is that the source base <strong>can have symbols instead of digits</strong>. I solved this by using the position of each symbol in the list of digits as its <em>digit value</em>. Here is my&nbsp;solution:</p>
<pre><code>#!/usr/bin/env python
import sys


def main(argv=None):
    if not argv:
        argv = sys.argv

    if len(argv) < 2:
        print('usage: %s INPUT_FILE' % argv[0])
        return 1

    try:
        f = open(argv[1])
    except IOError:
        print("File doesn't exist")
        return 1

    with f:
        # the first line holds the number of test cases that follow
        cases = int(f.readline())
        for i in range(1, cases + 1):
            # split() on whitespace also drops a trailing '\n' or '\r\n'
            number, input_b, output_b = f.readline().split()
            print('Case #%d: %s' % (i, convert(number, input_b, output_b)))

    # 0 is the conventional "success" exit status
    return 0


def convert(number, input_b, output_b):
    """
    Convert a number from any base to any base
    """
    return convert_from_10(convert_to_10(number, input_b), output_b)


def convert_to_10(number, base):
    """
    ``number`` can be written in any base, even in an 'alien' one.
    For example: 'Foo' could be a number in a numerical system
    whose digits are 'oF8'.

    ``base`` is exactly the digits representation, listed from the
    lowest digit to the highest. To convert that 'Foo' to base 10
    call ``convert_to_10('Foo', 'oF8')``.

    Returns the value as a Python integer.
    """
    current_base = len(base)

    # each symbol's position in ``base`` is its digit value
    i = len(number) - 1
    base_10 = 0
    for digit in number:
        base_10 += base.index(digit) * current_base ** i
        i -= 1

    return base_10


def convert_from_10(value, base):
    """
    ``value`` is a non-negative integer, while ``base`` is the digit
    representation of the new base (for example, for base 16 this
    could be '0123456789ABCDEF' or for an alien base 3 it could be
    'oF8').

    Returns the number written in the specified base.
    """
    if value == 0:
        # without this check zero would become an empty string
        return base[0]

    base_n = ''
    current = value
    while current != 0:
        base_n = base[current % len(base)] + base_n
        current //= len(base)

    return base_n


if __name__ == '__main__':
    sys.exit(main())</code></pre>
<p>Given this input (the first line is the number of test cases that&nbsp;follow):</p>
<pre><code>4
9 0123456789 oF8
Foo oF8 0123456789
13 0123456789abcdef 01
CODE O!CDE? A?JM!.</code></pre>
<p>I get the correct&nbsp;output:</p>
<pre><code>Case #1: Foo
Case #2: 9
Case #3: 10011
Case #4: JAM!</code></pre>
<p>Of course this is one of the practice problems, and you should try to solve it on your own (otherwise there&#8217;s little point in joining the&nbsp;event).</p>
]]></content:encoded>
			<wfw:commentRss>http://zeta-puppis.com/2008/06/26/google-codejam-and-number-conversions/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>And djangodash is ended&#8230;</title>
		<link>http://zeta-puppis.com/2008/06/11/and-djangodash-is-ended/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=and-djangodash-is-ended</link>
		<comments>http://zeta-puppis.com/2008/06/11/and-djangodash-is-ended/#comments</comments>
		<pubDate>Wed, 11 Jun 2008 14:43:23 +0000</pubDate>
		<dc:creator>kratorius</dc:creator>
				<category><![CDATA[Coding]]></category>
		<category><![CDATA[Django]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Web]]></category>
		<category><![CDATA[dash]]></category>
		<category><![CDATA[djangodash]]></category>
		<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://zeta-puppis.com/?p=93</guid>
		<description><![CDATA[<a href="http://zeta-puppis.com/2008/06/11/and-djangodash-is-ended/" title="And djangodash is ended..."></a>And I came 6th. So I won a shared 2 hosting plan at webfaction and a 12-pack of G33K B33R caffeinated root beer (still trying to understand what exactly this is) from bawls. Anyway, here is a short &#8230;<p class="read-more"><a href="http://zeta-puppis.com/2008/06/11/and-djangodash-is-ended/">Read more &#187;</a></p>]]></description>
			<content:encoded><![CDATA[<a href="http://zeta-puppis.com/2008/06/11/and-djangodash-is-ended/" title="And djangodash is ended..."></a><p>And I came 6th. So I won a shared 2 hosting plan at <a href="http://webfaction.com">webfaction</a> and a 12-pack of G33K B33R caffeinated root beer (still trying to understand what exactly this is) from <a href="http://www.bawlstyle.com">bawls</a>. Anyway, here is <strong>a short summary of what happened</strong> from Saturday through Tuesday (if you&#8217;re wondering why it didn&#8217;t end on Sunday, well, keep&nbsp;reading).</p>
<p>The competition started well: I worked normally for the first part of the day, but then I had to stop for a while. When I came back, <strong>the svn server and the <a href="http://djangodash.com">djangodash</a> website were no longer working</strong>. At first I thought it was a connection issue on my side, but other sites were working properly, so the problem was definitely on their&nbsp;end.</p>
<p><span id="more-93"></span>I waited for a while, then went to sleep. In the morning I found a message in my mailbox about a big power outage at <a href="http://www.theplanet.com">The Planet</a> datacenter, where webfaction hosts many of its servers (including the <a href="http://djangodash.com">djangodash</a> one), caused by an explosion in the power generator. So <strong>the competition deadline was pushed back by two days</strong>, and I decided to take a breather and wait until svn came back. That didn&#8217;t happen on Sunday, so after a while I chose (as the mail suggested) to work locally without committing anything, at least until svn&nbsp;returned.</p>
<p>Then Monday came and I had other things to do, so I had to put off <a href="http://djangodash.com">djangodash</a> until the evening, once I&#8217;d dealt with other, more urgent things. On Monday <strong>I did very little coding</strong>, and the same goes for Tuesday. So by the end of the competition I couldn&#8217;t complete my project, and didn&#8217;t even reach the 50%&nbsp;milestone.</p>
<p>Today I discovered that <strong>I was one of the winners</strong> (ok, not really: 6th place is not a great result, but at least I tried). I really have to thank the organizers for this event, and I hope to join another <a href="http://djangodash.com">djangodash</a> next year (maybe, as I suggested to one of them in an email thread, with the site and svn hosted in two different datacenters, just to be insured against thunderstorms, tornadoes, earthquakes and so on&#8230;). I have to say that I really enjoyed the whole thing, and I hope there will be more competitors next&nbsp;year!</p>
<p>If you want more news about how <a href="http://djangodash.com">djangodash</a> wrapped up, with some stats, <a href="http://www.toastdriven.com/fresh/django-dash-factoids/">read this article</a> on the <a href="http://www.toastdriven.com">Toast Driven</a> website (that&#8217;s the company that ran the&nbsp;dash).</p>
]]></content:encoded>
			<wfw:commentRss>http://zeta-puppis.com/2008/06/11/and-djangodash-is-ended/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Let meet at djangodash</title>
		<link>http://zeta-puppis.com/2008/05/04/let-meet-at-djangodash/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=let-meet-at-djangodash</link>
		<comments>http://zeta-puppis.com/2008/05/04/let-meet-at-djangodash/#comments</comments>
		<pubDate>Sun, 04 May 2008 10:20:59 +0000</pubDate>
		<dc:creator>kratorius</dc:creator>
				<category><![CDATA[Coding]]></category>
		<category><![CDATA[Django]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Web]]></category>
		<category><![CDATA[competition]]></category>
		<category><![CDATA[dash]]></category>
		<category><![CDATA[djangodash]]></category>
		<category><![CDATA[prizes]]></category>

		<guid isPermaLink="false">http://zeta-puppis.com/?p=89</guid>
		<description><![CDATA[<a href="http://zeta-puppis.com/2008/05/04/let-meet-at-djangodash/" title="Let meet at djangodash"></a>As many of you probably already know, the Django dash competition begins on May 31. Djangodash&#160;is: [&#8230;] is a chance for Django enthusiasts to flex their coding skills a little and put a fine point on “perfectionists with deadlines” &#8230;<p class="read-more"><a href="http://zeta-puppis.com/2008/05/04/let-meet-at-djangodash/">Read more &#187;</a></p>]]></description>
			<content:encoded><![CDATA[<a href="http://zeta-puppis.com/2008/05/04/let-meet-at-djangodash/" title="Let meet at djangodash"></a><p>As many of you probably already know, the Django dash competition begins <strong>on May 31</strong>. <a href="http://djangodash.com">Djangodash</a>&nbsp;is:</p>
<blockquote><p>[&#8230;] is a chance for Django enthusiasts to flex their coding skills a little and put a fine point on “perfectionists with deadlines” by giving you a REAL deadline. 48 hours from start to stop to produce the best app you can and have a little fun in the&nbsp;process.</p></blockquote>
<p>I&#8217;ll be participating, so if you haven&#8217;t registered yet, <strong>do it now</strong>! And don&#8217;t forget to check out <a href="http://djangodash.com/sponsors/">the cool prizes</a>&nbsp;:)</p>
]]></content:encoded>
			<wfw:commentRss>http://zeta-puppis.com/2008/05/04/let-meet-at-djangodash/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Inclusive range() in Python</title>
		<link>http://zeta-puppis.com/2008/03/06/inclusive-range-in-python/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=inclusive-range-in-python</link>
		<comments>http://zeta-puppis.com/2008/03/06/inclusive-range-in-python/#comments</comments>
		<pubDate>Thu, 06 Mar 2008 21:55:56 +0000</pubDate>
		<dc:creator>kratorius</dc:creator>
				<category><![CDATA[Coding]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[interval]]></category>
		<category><![CDATA[range]]></category>

		<guid isPermaLink="false">http://zeta-puppis.com/2008/03/06/inclusive-range-in-python/</guid>
		<description><![CDATA[<a href="http://zeta-puppis.com/2008/03/06/inclusive-range-in-python/" title="Inclusive range() in Python"></a>Python&#8217;s built-in range() is an extremely useful function, but it has a small problem: it doesn&#8217;t include the right endpoint of the range. For example, a call to range(1, 10) evaluates to a list of numbers from &#8230;<p class="read-more"><a href="http://zeta-puppis.com/2008/03/06/inclusive-range-in-python/">Read more &#187;</a></p>]]></description>
			<content:encoded><![CDATA[<a href="http://zeta-puppis.com/2008/03/06/inclusive-range-in-python/" title="Inclusive range() in Python"></a><p>Python&#8217;s built-in <code>range()</code> is an extremely useful function, but it has a small problem: <strong>it doesn&#8217;t include the right endpoint of the range</strong>. For example, a call to <code>range(1, 10)</code> evaluates to a list of numbers from 1 to 9 (not including&nbsp;10):</p>
<pre><code>>>> range(1, 10)
[1, 2, 3, 4, 5, 6, 7, 8, 9]
</code></pre>
<p>Today I needed a <code>range()</code> function that includes the right endpoint for a work project, so I had to write my own. Here it&nbsp;is:</p>
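<pre><code>def inclusive_range(start, stop, step=1):
    """
    A range() clone that also includes the right endpoint (stop)
    """
    if step == 0:
        raise ValueError('step must not be zero')
    l = []
    x = start
    # walk towards stop in the direction of step, stopping once we pass it
    while (step > 0 and x <= stop) or (step < 0 and x >= stop):
        l.append(x)
        x += step
    return l</code></pre>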
<p>Of course there are faster implementations of this function around (and if you know one, please let me know), and <strong>this one is surely not one of the fastest</strong>, but it works and it solves my problem right&nbsp;now.</p>
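<p>One faster alternative for integer arguments, sketched here for Python 3 (where <code>range()</code> returns a lazy sequence, hence the <code>list()</code> call), is to let the built-in do the work and move <code>stop</code> one unit further in the direction of <code>step</code>:</p>
<pre><code>def inclusive_range(start, stop, step=1):
    """Integer-only inclusive range built on top of the built-in range()."""
    if step == 0:
        raise ValueError("step must not be zero")
    # Moving stop one unit past itself (in the direction of step) makes
    # range() include it whenever it is reachable from start.
    return list(range(start, stop + (1 if step > 0 else -1), step))


print(inclusive_range(1, 10))      # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(inclusive_range(10, 1, -3))  # [10, 7, 4, 1]</code></pre>
<p>The trade-off: this version only accepts integers, because the built-in <code>range()</code> rejects floats, while the <code>while</code> loop version also works with float steps (with the usual floating-point rounding caveats).</p>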
]]></content:encoded>
			<wfw:commentRss>http://zeta-puppis.com/2008/03/06/inclusive-range-in-python/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>When the &#8220;Python Vs PHP&#8221; war matters</title>
		<link>http://zeta-puppis.com/2008/02/21/when-the-python-vs-php-war-matters/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=when-the-python-vs-php-war-matters</link>
		<comments>http://zeta-puppis.com/2008/02/21/when-the-python-vs-php-war-matters/#comments</comments>
		<pubDate>Thu, 21 Feb 2008 19:45:52 +0000</pubDate>
		<dc:creator>kratorius</dc:creator>
				<category><![CDATA[Coding]]></category>
		<category><![CDATA[Django]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Web]]></category>
		<category><![CDATA[development]]></category>
		<category><![CDATA[framework]]></category>
		<category><![CDATA[IT]]></category>
		<category><![CDATA[mvc]]></category>
		<category><![CDATA[symfony]]></category>

		<guid isPermaLink="false">http://zeta-puppis.com/2008/02/21/when-the-python-vs-php-war-matters/</guid>
		<description><![CDATA[<a href="http://zeta-puppis.com/2008/02/21/when-the-python-vs-php-war-matters/" title="When the &quot;Python Vs PHP&quot; war matters"></a>Yesterday I had a meeting with a customer about a new site I should develop for them. Since they&#8217;re a book publisher, they wanted an online book store. Apart from the technical details (the site isn&#8217;t as simple as you &#8230;<p class="read-more"><a href="http://zeta-puppis.com/2008/02/21/when-the-python-vs-php-war-matters/">Read more &#187;</a></p>]]></description>
			<content:encoded><![CDATA[<a href="http://zeta-puppis.com/2008/02/21/when-the-python-vs-php-war-matters/" title="When the &quot;Python Vs PHP&quot; war matters"></a><p>Yesterday I had a meeting with a customer about a new site I should develop for them. Since they&#8217;re a book publisher, they wanted an online book store. Apart from the technical details (the site isn&#8217;t as simple as you may believe, they need a lot of not-so-easy-to-do stuff), the most important point we focused on is the fact that <strong>they have an internal IT technician</strong> that handles all their computer needs. If you&#8217;re asking yourself why this matters, keep&nbsp;reading:</p>
<ul>
<li>we (to be precise, my company) stopped developing PHP sites about a year ago in favor of&nbsp;Python</li>
<li>we release the web site&#8217;s code to&nbsp;them</li>
<li>for this project, <strong>we haven&#8217;t been asked for any kind of future support</strong>; this means that once the site is finished, we won&#8217;t touch the product anymore (unless they pay us to make the changes they&nbsp;need)</li>
<li>but they don&#8217;t want to pay us for those changes, because they have their internal IT&nbsp;technician</li>
<li><strong>their technician knows only PHP</strong> (and he hadn&#8217;t even heard of Python until&nbsp;yesterday)</li>
</ul>
<p><span id="more-82"></span></p>
<p>So I had to explain why my company and I chose Python for our web development needs, and here&#8217;s a summary of what I told them yesterday. Note that <strong>I&#8217;m not talking about why one language is better than the other</strong>, because that would take us in another direction, but about <strong>why we chose Python as our main programming language</strong> (even if this implicitly leads us to say why, for us, Python is better than PHP, but that&#8217;s another&nbsp;story):</p>
<ul>
<li><strong>Time</strong>: remember that time is money. Building an application in PHP would take me about a third more time than building the same application in&nbsp;Python</li>
<li><strong>Frameworks</strong>: nowadays all the popular languages have their web frameworks; Ruby has <a href="http://www.rubyonrails.org/">Ruby on Rails</a>, PHP has <a href="http://www.symfony-project.org/">Symfony</a>, <a href="http://www.phpmvc.net/">php.MVC</a>, Python has <a href="http://djangoproject.com/">Django</a>, <a href="http://www.cherrypy.org/">CherryPy</a>, <a href="http://pylonshq.com/">Pylons</a>, and so on. But none of them (apart from Rails, but we&#8217;re talking about PHP vs Python) comes close to the completeness and functionality of django. And this brings us back to the previous point: less time to develop the same&nbsp;application</li>
<li><strong>Future modifications</strong>: even if it isn&#8217;t the case for this project, it&#8217;s much easier to modify a Python application than a PHP one, thanks to the language&#8217;s syntax and its strong <a href="http://en.wikipedia.org/wiki/Object-oriented_programming">OOP</a> orientation. You may argue that PHP 5 introduced deep OOP support too, but it&#8217;s not the same thing, and you know it. PHP was born as a procedural programming language, and even with the OOP support introduced in version 5, it doesn&#8217;t come close to Python from this point of&nbsp;view</li>
</ul>
<p>With this I&#8217;m not saying that PHP is useless: what I mean is that <strong>Python is more convenient from (our) business point of view</strong>. So if someone asks me why I use Python to do the same things I can do with PHP, the answer is: because with Python I can do the same thing in less time and, consequently, on a lower&nbsp;budget.</p>
]]></content:encoded>
			<wfw:commentRss>http://zeta-puppis.com/2008/02/21/when-the-python-vs-php-war-matters/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
	</channel>
</rss>

