<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: What I learned by information retrieval in one week</title>
	<atom:link href="http://zeta-puppis.com/2008/10/19/what-i-learned-by-information-retrieval-in-one-week/feed/" rel="self" type="application/rss+xml" />
	<link>http://zeta-puppis.com/2008/10/19/what-i-learned-by-information-retrieval-in-one-week/</link>
	<description>my very own personal corner</description>
	<lastBuildDate>Thu, 29 Jul 2010 09:55:57 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Zeta-Puppis.com &#187; Optimize your&#160;programs</title>
		<link>http://zeta-puppis.com/2008/10/19/what-i-learned-by-information-retrieval-in-one-week/comment-page-1/#comment-732</link>
		<dc:creator>Zeta-Puppis.com &#187; Optimize your&#160;programs</dc:creator>
		<pubDate>Tue, 02 Dec 2008 20:17:49 +0000</pubDate>
		<guid isPermaLink="false">http://zeta-puppis.com/?p=159#comment-732</guid>
		<description>[...] The last time I blogged about a new course I&#8217;m following at my university. This course, held by Pasquale Lops and Giovanni Semeraro, is so interesting that I&#8217;ll be developing a custom information retrieval engine as part of my internship project. I can&#8217;t say much more at this point since the internship hasn&#8217;t started yet and I&#8217;m not sure I can release more details about this project (we&#8217;re still in the process of deciding if and how the whole thing will be released to&#160;the world). [...]</description>
		<content:encoded><![CDATA[<p>[&#8230;] The last time I blogged about a new course I&#8217;m following at my university. This course, held by Pasquale Lops and Giovanni Semeraro, is so interesting that I&#8217;ll be developing a custom information retrieval engine as part of my internship project. I can&#8217;t say much more at this point since the internship hasn&#8217;t started yet and I&#8217;m not sure I can release more details about this project (we&#8217;re still in the process of deciding if and how the whole thing will be released to&nbsp;the world).&nbsp;[&#8230;]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Giuliani Vito, Ivan</title>
		<link>http://zeta-puppis.com/2008/10/19/what-i-learned-by-information-retrieval-in-one-week/comment-page-1/#comment-584</link>
		<dc:creator>Giuliani Vito, Ivan</dc:creator>
		<pubDate>Mon, 20 Oct 2008 17:04:54 +0000</pubDate>
		<guid isPermaLink="false">http://zeta-puppis.com/?p=159#comment-584</guid>
		<description>I just made research.google.com my homepage :)
Anyway, thanks for the good advice; I&#039;m looking forward to this stuff right now. Thanks again!</description>
		<content:encoded><![CDATA[<p>I just made research.google.com my homepage :)<br />
Anyway, thanks for the good advice; I&#8217;m looking forward to this stuff right now. Thanks&nbsp;again!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: CJ</title>
		<link>http://zeta-puppis.com/2008/10/19/what-i-learned-by-information-retrieval-in-one-week/comment-page-1/#comment-581</link>
		<dc:creator>CJ</dc:creator>
		<pubDate>Mon, 20 Oct 2008 09:24:55 +0000</pubDate>
		<guid isPermaLink="false">http://zeta-puppis.com/?p=159#comment-581</guid>
		<description>P.S.: you&#039;ll learn an awful lot by reading Google patents and looking at how their algorithm works (within the limits of what they have shared). Look at SIGIR papers too, and go if you can; it&#039;s a great conference.

cj</description>
		<content:encoded><![CDATA[<p>P.S.: you&#8217;ll learn an awful lot by reading Google patents and looking at how their algorithm works (within the limits of what they have shared). Look at SIGIR papers too, and go if you can; it&#8217;s a great&nbsp;conference.</p>
<p>cj</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: CJ</title>
		<link>http://zeta-puppis.com/2008/10/19/what-i-learned-by-information-retrieval-in-one-week/comment-page-1/#comment-580</link>
		<dc:creator>CJ</dc:creator>
		<pubDate>Mon, 20 Oct 2008 09:22:34 +0000</pubDate>
		<guid isPermaLink="false">http://zeta-puppis.com/?p=159#comment-580</guid>
		<description>Hi,

I&#039;m finishing a PhD in natural language processing (natural language generation and understanding).  I spent my first few years building search engines, and studied them in great depth.  You&#039;ll find that the guys who work in SEO (search engine optimisation) have a pretty good understanding of how the Google algorithm works, PageRank, stemming, tagging, and all that stuff. 

I also wrote a stemmer in my first year which is pretty simple really but has been really useful for me, as it stems to actual words, in their proper natural form.  Quite important for NLG.

http://fizz.cmp.uea.ac.uk/Research/stemmer/

You&#039;ll find that tf-idf has some limitations, such as not working very well on large document collections. It&#039;s also not very good with synonyms, so it&#039;s hard for this method to find relationships between words.

N-grams - look at Witten-Bell discounting and smoothing.  

Check out LSA too; it also has some limitations, but fewer. You can use neural networks as well.

On my blog you&#039;ll find some links to good NLP/AI tools:

http://scienceforseo.blogspot.com/2008/10/top-freeware-stuff.html

Good work, and enjoy your journey in NLP/IR!</description>
		<content:encoded><![CDATA[<p>Hi,</p>
<p>I&#8217;m finishing a PhD in natural language processing (natural language generation and understanding).  I spent my first few years building search engines, and studied them in great depth.  You&#8217;ll find that the guys who work in SEO (search engine optimisation) have a pretty good understanding of how the Google algorithm works, PageRank, stemming, tagging, and all that&nbsp;stuff. </p>
<p>I also wrote a stemmer in my first year which is pretty simple really but has been really useful for me, as it stems to actual words, in their proper natural form.  Quite important for&nbsp;NLG.</p>
<p><a href="http://fizz.cmp.uea.ac.uk/Research/stemmer/" rel="nofollow">http://fizz.cmp.uea.ac.uk/Research/stemmer/</a></p>
<p>You&#8217;ll find that tf-idf has some limitations, such as not working very well on large document collections. It&#8217;s also not very good with synonyms, so it&#8217;s hard for this method to find relationships between&nbsp;words.</p>
<p>N-grams - look at Witten-Bell discounting and&nbsp;smoothing.  </p>
<p>Check out LSA too; it also has some limitations, but fewer. You can use neural networks as&nbsp;well.</p>
<p>On my blog you&#8217;ll find some links to good NLP/AI&nbsp;tools:</p>
<p><a href="http://scienceforseo.blogspot.com/2008/10/top-freeware-stuff.html" rel="nofollow">http://scienceforseo.blogspot.com/2008/10/top-freeware-stuff.html</a></p>
<p>Good work, and enjoy your journey in&nbsp;NLP/IR!</p>
]]></content:encoded>
	</item>
</channel>
</rss>
