<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>César&#039;s geek-side &#187; pagerank</title>
	<atom:link href="http://crodas.org/tag/pagerank/feed" rel="self" type="application/rss+xml" />
	<link>http://crodas.org</link>
	<description>bits coming in and out.</description>
	<lastBuildDate>Sun, 24 Oct 2010 08:32:55 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Weird but cool Pagerank&#8217;s usage</title>
		<link>http://crodas.org/weird-but-cool-pageranks-usage.php</link>
		<comments>http://crodas.org/weird-but-cool-pageranks-usage.php#comments</comments>
		<pubDate>Tue, 06 Oct 2009 17:42:33 +0000</pubDate>
		<dc:creator>crodas</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[algorithms]]></category>
		<category><![CDATA[pagerank]]></category>
		<category><![CDATA[weirdness]]></category>

		<guid isPermaLink="false">http://crodas.org/?p=3</guid>
		<description><![CDATA[Yesterday a good friend told me he needed a good algorithm to detect keywords (relevant words) in a document. The first algorithm that came to mind was a simple word-frequency counter that discards common words using a stop-word list built in a previous learning step. This algorithm is pretty [...]]]></description>
			<content:encoded><![CDATA[<p>Yesterday a good friend told me he needed a good algorithm to detect keywords (relevant words) in a document. The first algorithm that came to mind was a simple word-frequency counter that discards common words using a stop-word list built in a previous learning step. This algorithm is pretty obvious and I&#8217;m sure it is widely used out there.</p>
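<p>A minimal sketch of that frequency-counter idea, in Python. The stop-word list here is a tiny hand-written placeholder (an assumption, not the learned list described above), and the name top_keywords is made up for illustration.</p>

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real one would come from a prior learning step.
STOP_WORDS = frozenset({"the", "a", "an", "of", "to", "and", "is", "in", "it", "for"})


def top_keywords(text, stop_words=STOP_WORDS, limit=3):
    """Return the `limit` most frequent non-stop-words as (word, count) pairs."""
    # Lowercase the text and keep runs of letters (and apostrophes) as words.
    words = re.findall(r"[a-z']+", text.lower())
    # Count every word that is not in the stop-word list.
    counts = Counter(w for w in words if w not in stop_words)
    return counts.most_common(limit)
```

<p>Calling top_keywords(document_text, limit=5) would return the five most frequent remaining words with their counts.</p>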
<p>Then, Googling for some papers (I have a bunch on my laptop but I do not recall where I stored them), I found a paper that opened my mind (<a href="http://www.cse.unt.edu/~rada/papers/mihalcea.emnlp04.pdf">TextRank: Bringing Order into Texts</a>). It suggests building a graph of words and then applying the <a href="http://en.wikipedia.org/wiki/PageRank">PageRank algorithm</a> to that graph to find the relevant words. I haven&#8217;t read it in depth yet, but I got the idea from a quick read, and it makes sense; I&#8217;m wondering why I never thought of it.</p>
<p>I&#8217;m planning to code it this week, just as a proof of concept. Basically I will reuse some old code that I wrote (but never finished) a while ago. I remember I built it in a very modular way using classes, so adapting it to these needs should be pretty straightforward. In the graph of words (and probably of sets of 1, 2 and 3 words), each word will reference the word that follows it (if you have no idea what I mean, just take a look <a href="http://en.wikipedia.org/wiki/File:PageRanks-Example.svg">here</a>).</p>
<p>I will post the results here.</p>]]></content:encoded>
			<wfw:commentRss>http://crodas.org/weird-but-cool-pageranks-usage.php/feed</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
	</channel>
</rss>

