What I learned about information retrieval in one week

October 19th, 2008

It has been about a week since I began studying information retrieval in more depth. Actually, it all started with a new course on the subject at my university, and I fell in love with it almost immediately. The topic really caught my interest, and I began running some experiments (one involves Django as well; keep reading to know more).

This week I learned a lot about information retrieval, text categorization, natural language processing and machine learning. But the most relevant lesson is this: the principles are easy, their implementation is not. Most of the techniques are relatively simple, but you usually have to deal with very large datasets, and that can be challenging, since one of the main requirements of information retrieval is response time. It is much more important to give fewer results in one second than better results in one hour: no one will bother using your system if it takes an hour to return a result. And if you’re considering storing your data in a database, forget about normalization; it won’t get you anywhere.
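To give an idea of how simple the principles can be, here is a minimal sketch of an inverted index, a common structure for fast lookups in information retrieval. It is not taken from the experiments mentioned above: the function names `build_index` and `search` and the sample documents are made up for illustration, and a real system would also need tokenization, ranking and on-disk storage.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase word to the set of ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of the documents that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    # Start from the documents matching the first word, then intersect.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical sample data, only for illustration.
docs = {
    1: "Django is a web framework",
    2: "information retrieval with Python",
    3: "Python web development with Django",
}
index = build_index(docs)
```

The index is built once, so each query only touches the sets for its own words instead of scanning every document. That trade (more work up front, less at query time) is the kind of choice the time requirement pushes you towards.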

(Continue reading…)

3 Comments, tagged with Coding, Django, Python

Running Django with FastCGI

October 8th, 2008

Running Django with FastCGI is not a difficult task, thanks in part to the excellent documentation provided. However, the docs offer only a very basic script to automate starting and stopping the fcgi process, so today I wrote my own. It handles the various situations itself, so I don’t have to fix things manually if something goes wrong.

(Continue reading…)

0 Comments, tagged with Django

Announcing Pytagram

August 21st, 2008

Today I finished one of my side projects: pytagram. Basically, it generates an SVG file (which can subsequently be saved as eps/pdf/whatever and, if needed, manipulated by hand) starting from a tree-like plain text file. This can be useful for generating cheat sheets or quick references to the classes or functions that belong to a project.

I did this to generate a Django quick reference (here it is), since Django has a lot of functions and I know what they’re for, but I can never remember their names (and now two A4 sheets are right in front of me).

If you’re interested, check out the Google Code project page and grab your copy from the SVN repository.

There are tons of things that can be changed or optimized (e.g. adding an optional short explanation of each function, adding more examples, an easier way to change colors, …), but the code already works quite well, so it can be useful to people out there.

0 Comments, tagged with Coding, Django, Python, Web

Practical Django Projects

July 5th, 2008

Due to my devotion to the Django web framework, I finally got my copy of Practical Django Projects, by James Bennett. I wasn’t really expecting to have it so soon, but it was a beautiful surprise anyway (to tell the truth, I didn’t buy it: it was sent to me as a replacement prize for djangodash, because I was not eligible to get the G33K beers since I live outside the US. Thanks to the generosity of Daniel Lindsley).

(Continue reading…)

0 Comments, tagged with Django, Me, Web

Microblogging

  • Funny thing: last night I had an idea for a good blog post. But now I have completely forgotten what that idea was about. 12 hours ago
  • I think Pownce has a little issue with caching: if I delete a message and write a new one, it doesn't appear on my homepage. Nov 16, 6:34pm
  • I didn't know that something like [(x,y) for x in range(10) for y in range(x)] was possible in Python. Nov 16, 3:45pm
  • I'm about to go to the local LUG dinner: pizza for everyone. Nov 14, 9:15pm
  • Lately I've been very interested in fast data structures with minimal memory usage. I was surprised to find out that list comprehensions in Python are sometimes slower than classic for loops for large quantities of data. Still trying to understand why (if someone has a clue, please let me know). Nov 12, 12:44pm
  • So WordPress was silently modifying HTTP request headers and I was getting a 400 when fetching the Pownce RSS feed. Now everything works as expected on my blog, shame on WP. Nov 9, 3:58pm
  • Experimenting with document language identification. Nov 6, 10:23pm
  • So it looks like I finally found an interesting topic apart from web development: information retrieval. Nov 3, 5:13pm
  • Planning a trip to Bologna in December. Nov 1, 5:24pm
  • After today, I want to get as far away from Italy as I can. Oct 29, 11:49am
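
For anyone curious about the nested comprehension mentioned in one of the notes above, here is a small sketch of what it does. The comprehension itself comes from the note; the variable names `pairs` and `pairs_loop` are only illustrative. The second `for` clause behaves like an inner loop, so it can use the `x` bound by the first one.

```python
# Nested comprehension: the second "for" can refer to x from the first.
pairs = [(x, y) for x in range(10) for y in range(x)]

# The same result written as classic nested for loops.
pairs_loop = []
for x in range(10):
    for y in range(x):
        pairs_loop.append((x, y))
```

Both versions produce the same list, with the clauses of the comprehension read left to right in the same order as the nested loops.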

« Authored by Giuliani Vito Ivan »