It has been about a week since I began studying information retrieval in more depth. It all started with a new course on the subject at my university, and I fell in love with it almost immediately. It really caught my interest, and I began doing some experiments (one involves Django as well; keep reading to know more).
In this week I learned a lot about information retrieval, text categorization, natural language processing and machine learning. But the most relevant lesson is this: the principles are easy, their implementation is not. Most of the techniques are relatively simple, but you usually have to deal with very large datasets, and that can be challenging, since one of the main requirements in information retrieval is response time. It is much more important to return fewer results in one second than better results in one hour. No one will care to use your system if it takes an hour to return a result. And if you’re considering storing your data in a database, forget about normalization; it won’t get you anywhere.
(Continue reading…)
Today I finished one of my side projects: pytagram. Basically, it generates an SVG file (which can then be saved as EPS/PDF/whatever and, if needed, edited by hand) from a tree-like plain text file. This can be useful for generating cheat sheets or quick references to the classes or functions that belong to a project.
I did this to generate a Django quick reference (here it is), since Django has a lot of functions and I know what they do, but I can never remember their names (and now two A4 sheets are right in front of me).
If you’re interested in this, check out the Google Code project page and grab your copy from the SVN repository.
There are tons of things that could be changed or optimized (e.g. adding an optional short explanation of each function, adding more examples, an easier way to change colors, …), but the code already works quite well, so it can be useful to people out there.
The decimal numeral system is composed of ten digits, which we represent as “0123456789” (the digits in a system are written from lowest to highest). Imagine you have discovered an alien numeral system composed of some number of digits, which may or may not be the same as those used in decimal. For example, if the alien numeral system were represented as “oF8”, then the numbers one through ten would be (F, 8, Fo, FF, F8, 8o, 8F, 88, Foo, FoF). We would like to be able to work with numbers in arbitrary alien systems. More generally, we want to be able to convert an arbitrary number that’s written in one alien system into a second alien system.
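To make the problem concrete, here is a minimal sketch of one way to do the conversion, not necessarily the solution from the full post. It assumes each numeral system is given as a string of distinct symbols ordered from lowest to highest; the function name convert and its parameter names are made up for this illustration. The idea is to decode the number into an ordinary integer first and then re-encode it in the target system.

```python
def convert(number, source, target):
    """Convert `number`, written in the `source` system, into the `target` system.

    `source` and `target` are strings listing each system's digits from
    lowest to highest (for example "0123456789" or "oF8").
    """
    # Step 1: decode the source representation into a plain integer.
    source_base = len(source)
    value = 0
    for symbol in number:
        # str.index raises ValueError if the symbol is not a valid digit.
        value = value * source_base + source.index(symbol)

    # Step 2: encode the integer using the digits of the target system.
    target_base = len(target)
    if value == 0:
        return target[0]
    digits = []
    while value > 0:
        value, remainder = divmod(value, target_base)
        digits.append(target[remainder])
    # Digits were produced from least to most significant, so reverse them.
    return "".join(reversed(digits))


if __name__ == "__main__":
    print(convert("9", "0123456789", "oF8"))
    print(convert("Foo", "oF8", "0123456789"))
```

Going through an integer keeps the two halves independent, so any pair of systems works as long as each one has at least two distinct digits.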
(Continue reading…)
And I came 6th. So I won a shared 2 hosting plan at WebFaction and a 12-pack of G33K B33R caffeinated root beer (still trying to understand what exactly this is) from Bawls. Anyway, here is a short summary of what happened from Saturday through Tuesday (if you’re wondering why it didn’t end on Sunday, well, keep reading).
The competition began very well. I worked normally for the first part of the day, but then I had to stop for a while. When I came back, svn and the djangodash website were no longer working. I initially thought it was a connection issue on my side, but when I saw that other sites were working properly, it was clear that the problem was on their end.
(Continue reading…)