Recently I needed to show a heat map of quite a lot of coordinate points for a little project of mine that ended up in a data visualization contest (which unfortunately I didn’t win, even though I made it to the finalists). The idea was to show the distribution of the georeferenced Wikipedia pages through a heat map, so when I first heard about openheatmap.com I knew it was the tool to use. OpenHeatMap.com is an excellent project by Pete Warden that takes a dataset as a CSV, Excel or Google Spreadsheet file and converts it to a nice, browsable heat map presentation.
Category Archives: Python
Clustering coordinate points together with quad-trees
My Italian PyCon experience
I came back yesterday from the third Italian PyCon (aka pycon3), which was held in Florence, and all I can say is that it has been an amazing experience. I had the chance to meet a lot of great new people as well as the BDFL (who won’t be back in Europe for quite some time, as he said). Here follows a summary of what I think were the most interesting talks.
Optimize your programs
Last time I blogged about a new course I’m following at my university. This course, held by Pasquale Lops and Giovanni Semeraro, is so interesting that I’ll be developing a custom information retrieval engine as part of my internship project. I can’t tell much more at this point since the internship hasn’t started yet and I’m not sure I can release more details about this project (we’re still in the process of deciding if and how the whole thing will be released to the world).
In the meantime, I’ve been doing several experiments on this topic, mostly about the memory usage and the performance of such a system on limited hardware. In practice this means implementing the algorithms you’ll be using and measuring the computational time they require.
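As a rough illustration of that kind of experiment, here is a minimal sketch using only Python's standard library (`timeit` and `tracemalloc`). The `build_term_counts` workload and the `measure` helper are hypothetical names made up for this example, not the code from the actual project.

```python
import timeit
import tracemalloc


def build_term_counts(documents):
    """Hypothetical workload: count term occurrences across documents."""
    counts = {}
    for doc in documents:
        for term in doc.split():
            counts[term] = counts.get(term, 0) + 1
    return counts


def measure(func, *args, repeat=3):
    """Return (best wall-clock seconds, peak traced bytes) for one call."""
    # Taking the best of several runs reduces noise from other processes.
    best = min(timeit.repeat(lambda: func(*args), number=1, repeat=repeat))
    # tracemalloc only tracks memory allocated through Python's allocator.
    tracemalloc.start()
    try:
        func(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return best, peak


if __name__ == "__main__":
    # Made-up sample data, just to have something to measure.
    docs = ["the quick brown fox jumps over the lazy dog"] * 10000
    seconds, peak = measure(build_term_counts, docs)
    print(f"best time: {seconds:.4f}s, peak memory: {peak / 1024:.1f} KiB")
```

Timing and memory are measured in separate runs on purpose, since tracing allocations slows the code down and would skew the timing.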
What I learned about information retrieval in one week
It has been about a week since I began studying information retrieval in more depth. Actually, everything began with a new course at my university on the subject, and I fell in love with it almost immediately. The fact is that this thing really got me interested, and I began doing some experiments (one involves Django as well, keep reading to know more).
This week I learned a lot about information retrieval, text categorization, natural language processing and machine learning. But the most relevant thing is: the principles are easy, their implementation is not. Most of the techniques are relatively simple, but you usually have to deal with very large datasets, and that can be challenging, since one of the main requirements of information retrieval is speed. It’s much more important to give fewer results in one second than better results in one hour. No one will ever care to use your system if it takes an hour to get some result. And if you’re considering storing your data in a database, forget about normalization: it won’t really get you anywhere.