<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>

<channel>
	<title>Segmentation Fault &#187; wildon</title>
	<atom:link href="http://www.cesaroliveira.net/tea/archives/tag/wildon/feed" rel="self" type="application/rss+xml" />
	<link>http://www.cesaroliveira.net</link>
	<description>Analyzing the core</description>
	<pubDate>Tue, 02 Sep 2008 06:11:59 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.5</generator>
	<language>en</language>
			<item>
		<title>An overly-complex diabolical plan</title>
		<link>http://www.cesaroliveira.net/tea/archives/18</link>
		<comments>http://www.cesaroliveira.net/tea/archives/18#comments</comments>
		<pubDate>Fri, 06 Jun 2008 00:28:50 +0000</pubDate>
		<dc:creator>Cesar</dc:creator>
		
		<category><![CDATA[addons]]></category>

		<category><![CDATA[intern]]></category>

		<category><![CDATA[seneca]]></category>

		<category><![CDATA[wildon]]></category>

		<guid isPermaLink="false">http://www.cesaroliveira.net/?p=18</guid>
		<description><![CDATA[So here is a diagram of the plan I had in mind to take over the world and catalog all of the extensions on the web:

Click for a larger image
Thank you Dia for letting me express my thoughts in boxes and stick figures. Here is a quick breakdown of some of the components

A [...]]]></description>
			<content:encoded><![CDATA[<p>So here is a diagram of the plan I had in mind to take over the world and catalog all of the extensions on the web:<br />
<a href="/images/misc/2008-06-05/theplan.png"><img src="/images/misc/2008-06-05/theplan-resize.png"/></a><br />
<i>Click for a larger image</i></p>
<p>Thank you <a href="http://www.gnome.org/projects/dia/">Dia</a> for letting me express my thoughts in boxes and stick figures. Here is a quick breakdown of some of the components:</p>
<ol>
<li>A <strong>URL list</strong> is simply a list of URLs that are known to contain extensions, for example source repositories such as AMO and mozdev.</li>
<li><strong>Google API</strong> for more scattered addons, such as those on blogs and personal sites.</li>
<li><strong>Manual entries</strong> for addons not hosted on webpages. These are usually commercial addons such as McAfee.</li>
<li><strong>Site-specific</strong> and <strong>generic</strong> refer to the rules that the crawler must obey. For example, a generic crawler would crawl a personal site such as example.com, while a site-specific policy would handle sites such as AMO, where experimental addons require a login.</li>
<li><strong>Crawler</strong> is a web crawler. I have been having difficulty finding the best tool for the job.</li>
<li><strong>Parser</strong> parses .xpi files. We should also save the HTML files to extract contextual information wherever possible.</li>
<li><strong>Site-specific persistent storage</strong> is just a database for each site we visit. This may have to be rethought, but I want some sort of redundancy plan to keep files saved even if something horrendous happens to a central database. Especially when dealing with beta software and unfamiliar technology such as web crawlers.</li>
<li><strong>Compared</strong> compares what is stored with a central database. Addons are updated all the time, so we want the most up-to-date versions available.</li>
<li><strong>View</strong> is used by the <strong>website</strong> to provide information for the <strong>user</strong>.</li>
</ol>
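<p>One way to picture the split between site-specific and generic rules is a lookup keyed on hostname. This is only a minimal Python sketch: the <code>CrawlPolicy</code> class, its <code>needs_login</code> field, and the <code>policy_for</code> function are hypothetical names for illustration, not part of the actual WildOn sources.</p>

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class CrawlPolicy:
    """Rules a crawler obeys for one site (fields are illustrative)."""
    name: str
    needs_login: bool = False


# Fallback used for personal sites and blogs.
GENERIC = CrawlPolicy(name="generic")

# Hosts that need special handling, e.g. AMO's login-only experimental addons.
SITE_SPECIFIC = {
    "addons.mozilla.org": CrawlPolicy(name="amo", needs_login=True),
}


def policy_for(url):
    """Return the site-specific policy for a URL, or the generic one."""
    host = urlparse(url).hostname or ""
    return SITE_SPECIFIC.get(host.lower(), GENERIC)
```

<p>With this shape, adding a new repository means adding one dictionary entry, and anything unknown falls back to the generic rules.</p>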
<p>There are still some quirks which have to be figured out:</p>
<ul>
<li>Version bumping on AMO doesn&#8217;t change the actual install.rdf in the xpi file. Instead, Firefox does some update magic to fix that. I either need to work with said magic, or leave it alone (I don&#8217;t think it is entirely a big deal. But it should be noted).</li>
<li>JSpider is a Java spider that I have had my eye on. Yeah, it&#8217;s Java, but many other crawlers are too. Many other crawlers both crawl and index, and I need different functionality (I need a flexible crawler. Forget the indexer). Unfortunately, JSpider doesn&#8217;t support POST data or web form authentication. Which means I&#8217;m going to have to fix that if I want to use it.</li>
<li><a href="http://code.google.com/apis/ajaxsearch/terms.html">Google&#8217;s Search API TOS</a> doesn&#8217;t seem to be spider friendly. I may have to try out other web search engines.</li>
</ul>
<p>On a brighter note, I put up the <a href="http://repository.cesaroliveira.net/index.cgi/wildon/">sources of my project</a> on the web. And even a nice place to <a href="http://www.cesaroliveira.net/wildon/frontend/">play in</a>. It&#8217;s a bit slow, but I&#8217;m probably into the &#8220;<a href="http://www.sqlite.org/whentouse.html">this isn&#8217;t what you should use sqlite for</a>&#8221; territory.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.cesaroliveira.net/tea/archives/18/feed</wfw:commentRss>
		</item>
		<item>
		<title>Taming the beast from within</title>
		<link>http://www.cesaroliveira.net/tea/archives/15</link>
		<comments>http://www.cesaroliveira.net/tea/archives/15#comments</comments>
		<pubDate>Thu, 22 May 2008 17:09:14 +0000</pubDate>
		<dc:creator>Cesar</dc:creator>
		
		<category><![CDATA[addons]]></category>

		<category><![CDATA[intern]]></category>

		<category><![CDATA[seneca]]></category>

		<category><![CDATA[wildon]]></category>

		<guid isPermaLink="false">http://www.cesaroliveira.net/?p=15</guid>
		<description><![CDATA[The next 5 paragraphs are me whining. To get to the real important stuff, start at paragraph 6
So I have been pouring two weeks into WildOn, which is finding out how many addons exist out there in the wild. But before I start unleashing web crawlers on the web causing havoc and chaos, it will [...]]]></description>
			<content:encoded><![CDATA[<p><em>The next 5 paragraphs are me whining. To get to the real important stuff, start at paragraph 6</em></p>
<p>So I have been pouring two weeks into <a href="http://www.cesaroliveira.net/tea/archives/9">WildOn</a>, which is finding out how many addons exist out there in the wild. But before I start unleashing web crawlers on the web causing havoc and chaos, it will be helpful if we could compare what&#8217;s out there with what we know. What we know is everything from <acronym title="addons.mozilla.org"><a href="https://addons.mozilla.org">AMO</a></acronym>, so we start there. The point of this extra work is to have some results, so that when we release a web crawler on AMO and tell it to find all the extensions, we&#8217;ll have something to compare its results to.</p>
<p>Actually, even this was a bit confusing. AMO provides an <a href="http://wiki.mozilla.org/Update:Remora_API_Docs">API</a> to view its addons (well actually, two versions of the API, with the older being slightly more useful). But that information was eventually scrapped for several reasons. The main one is that there is a lot of information on AMO that isn&#8217;t in the extension itself (such as which operating systems are supported, and whether the addon is a theme or an extension. While the former has been supported since <a href="http://developer.mozilla.org/en/docs/install.rdf#targetPlatform">Firefox 2</a>, I have rarely seen it used, and the latter is completely <a href="http://developer.mozilla.org/en/docs/install.rdf#type">optional</a>). This makes any sort of conclusion inconclusive because you don&#8217;t have enough information.</p>
<p>Then there was the problem of having too much information in the database, to the point where ~4000 addons took up ~1.8 gigs. For an sqlite database, this can get slow. When you try some queries, such as the number of extensions that support the &#8216;jp-JP&#8217; locale, this becomes an even more intensive process, as you build a table that consists of tens of thousands of rows (one row for each guid/locale combination). The reason is that older versions were being included in the same table as the newest version of the addon. Some addons had something like <a href="https://addons.mozilla.org/en-US/firefox/addons/versions/166">50+ different versions</a>. The solution seemed to be to move old versions to a different table. SQL queries now seem to go much faster.</p>
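<p>The idea of splitting old versions into their own table can be sketched with Python&#8217;s built-in <code>sqlite3</code> module. The schema here is an assumption made for illustration: a hypothetical <code>addon_versions(guid, version_order, locale)</code> table with one row per guid/locale combination, where a larger <code>version_order</code> means a newer release. The real WildOn schema may look quite different.</p>

```python
import sqlite3


def archive_old_versions(conn: sqlite3.Connection) -> None:
    """Move every row that is not an addon's newest version to old_versions.

    Assumes a hypothetical table addon_versions(guid, version_order, locale).
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS old_versions "
        "(guid TEXT, version_order INTEGER, locale TEXT)"
    )
    # Copy rows whose version is not the newest one for their guid.
    conn.execute(
        "INSERT INTO old_versions "
        "SELECT cur.guid, cur.version_order, cur.locale "
        "FROM addon_versions AS cur "
        "WHERE cur.version_order != ("
        "  SELECT MAX(n.version_order) FROM addon_versions AS n "
        "  WHERE n.guid = cur.guid)"
    )
    # Remove the copied rows from the main table.
    conn.execute(
        "DELETE FROM addon_versions WHERE EXISTS ("
        "  SELECT 1 FROM old_versions AS o "
        "  WHERE o.guid = addon_versions.guid "
        "  AND o.version_order = addon_versions.version_order)"
    )
    conn.commit()


def count_addons_for_locale(conn: sqlite3.Connection, locale: str) -> int:
    """Count distinct addons whose newest version ships the given locale."""
    row = conn.execute(
        "SELECT COUNT(DISTINCT guid) FROM addon_versions WHERE locale = ?",
        (locale,),
    ).fetchone()
    return row[0]
```

<p>After archiving, a locale query only scans rows for the newest version of each addon, which is also why a locale that only appeared in an older version would then count as zero.</p>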
<p>Another issue that makes me loathe <a href="http://developer.mozilla.org/en/docs/RDF">RDF</a> is <a href="http://developer.mozilla.org/en/docs/install.rdf">install.rdf</a>. I <strong>strongly</strong> disagree with the use of rdf for anything :) It becomes difficult to parse with a regular xml parser (there are a few python rdf libraries out there. But <a href="http://rdflib.net/">rdflib</a>, the most promising, seems to like not working and not having good examples. Only <a href="http://www.bitstampede.com/">sheppy</a> can save them now, but he&#8217;s working on <acronym title="Mozilla Developer Center">mdc</acronym>). Especially with rdf:resource, which I am completely ignoring right now. So it seems that AMO editors like to get creative with install.rdf, which has caused problems for me (eg. I cannot rely on targetPlatform. Some extensions actually have their targetPlatform in the Description tag. I know this because one of the extensions had Firefox&#8217;s GUID :(). Also, there are some other quirks, like having the id as an attribute of Description instead of a new tag. All things that are probably perfectly valid, but make my life significantly more difficult.</p>
<p><acronym title="Yet Another Problem">YAP</acronym> was that many early extensions did not use chrome.manifest. And some newer ones don&#8217;t. So to look up locale information, I have to look in either <em>install.rdf</em> or <em>contents.rdf</em>. This makes me (and by extension, kittens and baby Jesus) sad. I don&#8217;t have a fix for this yet.</p>
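<p>The id-as-attribute quirk can be handled with a plain XML parser by checking both forms. The following is a minimal sketch using Python&#8217;s standard <code>xml.etree.ElementTree</code>, not a full RDF parser: it ignores rdf:resource entirely, and the helper names <code>read_em_property</code> and <code>read_addon_id</code> are made up for illustration. The two namespace URIs are the standard RDF and Mozilla em ones.</p>

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EM_NS = "http://www.mozilla.org/2004/em-rdf#"


def read_em_property(description, name):
    """Read an em: property written as a child element or as an attribute."""
    child = description.find("{%s}%s" % (EM_NS, name))
    if child is not None and child.text:
        return child.text.strip()
    # Attribute form: the Description element itself carries em:id="...".
    return description.get("{%s}%s" % (EM_NS, name))


def read_addon_id(install_rdf_text):
    """Return the id from the install-manifest Description, or None."""
    root = ET.fromstring(install_rdf_text)
    for description in root.findall("{%s}Description" % RDF_NS):
        about = description.get("{%s}about" % RDF_NS) or description.get("about")
        if about == "urn:mozilla:install-manifest":
            return read_em_property(description, "id")
    return None
```

<p>Only top-level Description elements are inspected, so the nested ones that describe target applications are not mistaken for the addon itself.</p>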
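<p>For the extensions that do ship a chrome.manifest, locale lines have the form <code>locale packagename localename path</code>, so collecting them is a matter of splitting each line. A minimal Python sketch (the function name is hypothetical, and it does nothing for the older install.rdf and contents.rdf cases):</p>

```python
def locales_from_manifest(manifest_text):
    """Return the sorted locale names declared in a chrome.manifest."""
    found = set()
    for line in manifest_text.splitlines():
        fields = line.split()
        # Skip blank lines, comments, and non-locale instructions.
        if not fields or fields[0] != "locale":
            continue
        # A locale line needs at least: locale, package, locale name, path.
        if len(fields) >= 4:
            found.add(fields[2])
    return sorted(found)
```

<p>Malformed locale lines with too few fields are skipped rather than raising an error, which seems the safer default when parsing thousands of files.</p>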
<p>But enough about problems, what about SUCCESS!?</p>
<p>Ok. So I managed to get a local copy of every extension that is on AMO. Since parsing and analyzing and writing to persistent storage takes a long time, I decided to save myself some trouble and just do the first 2500 extensions (out of the ~7K folders that I have).</p>
<p>Of the 2500 &#8216;extension numbers&#8217;, <b>1630</b> were successfully analyzed. This is mainly because extension numbers don&#8217;t increment perfectly (eg. there is no <a href="https://services.addons.mozilla.org/en-US/firefox/api/1.1/addon/1">addon #1</a>. The first one starts at <a href="https://services.addons.mozilla.org/en-US/firefox/api/1.1/addon/4">#4</a>). Only about 100 addons failed to parse, giving me a success rate of 94%. Some extensions had quirks in them (eg. bad RDF) that were either invalid or that I couldn&#8217;t figure out.</p>
<p>Out of the 1630 extensions, these are the xulrunner-like applications they supported:<br />
<img src="/images/misc/2008-05-21/addons.png"/><br />
And here are the approximate numbers:</p>
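<p>The 94% figure follows from dividing the analyzed addons by the addons that actually existed, not by all 2500 extension numbers. A quick check in Python, taking the &#8220;about 100&#8221; failures as exactly 100 for the sake of the arithmetic:</p>

```python
analyzed = 1630   # extensions parsed successfully (from the post)
failed = 100      # approximate number that failed to parse (from the post)
attempted = analyzed + failed

# Success rate over addons that were actually present.
success_rate = analyzed / attempted
print(round(success_rate * 100))
```

<p>Under that assumption, the remaining extension numbers out of the 2500 would simply be gaps in the numbering.</p>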
<table>
<tr>
<th>Name</th>
<th>Count</th>
</tr>
<tr>
<td>Prism/Webrunner</td>
<td>2</td>
</tr>
<tr>
<td>Songbird (old)</td>
<td>2</td>
</tr>
<tr>
<td>Instant</td>
<td>1</td>
</tr>
<tr>
<td>Midbrowser</td>
<td>3</td>
</tr>
<tr>
<td>toolkit (any gecko 1.9 application)</td>
<td>7</td>
</tr>
<tr>
<td>eMusic DLM</td>
<td>12</td>
</tr>
<tr>
<td>Seamonkey (broken GUID)</td>
<td>2</td>
</tr>
<tr>
<td>Nvu</td>
<td>11</td>
</tr>
<tr>
<td>Sunbird</td>
<td>16</td>
</tr>
<tr>
<td>Thunderbird</td>
<td>256</td>
</tr>
<tr>
<td>Songbird</td>
<td>13</td>
</tr>
<tr>
<td>Seamonkey</td>
<td>101</td>
</tr>
<tr>
<td>Flock</td>
<td>159</td>
</tr>
<tr>
<td>Netscape Navigator</td>
<td>68</td>
</tr>
<tr>
<td>Mozilla Suite</td>
<td>166</td>
</tr>
<tr>
<td>Firefox</td>
<td>1466</td>
</tr>
</table>
<p>This looks ok so far. One expects a few non-Firefox extensions. The Thunderbird numbers seem a little low. Reminder that this is only ~33% of the total addons.</p>
<p>Locales seem to be a bigger mess, as there are many early extensions that don&#8217;t use chrome.manifest, so I decided to skip it, but now realize I have to fix it. Out of 1630 addons, only 464 addons had chrome.manifest files that I was able to read. But here is the breakdown anyways :</p>
<p>Number of locales: 173 (en, en-US, en-GB are all considered different locales). There are some invalid locales. For example, <a href="https://addons.mozilla.org/en-US/firefox/addons/versions/2155">Xultris</a> has an invalid locale called xultrisLocale. This can be fixed with a regular expression, but anyways.</p>
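<p>A regular expression along these lines could filter out names like xultrisLocale and also fold case variants such as sr-Yu into one spelling. This is a sketch with a hypothetical <code>normalize_locale</code> helper. It checks only the shape of the name, so a well-formed but wrong code such as jp-JP still passes.</p>

```python
import re

# Two- or three-letter language, optional two-letter region, optional variant
# such as the "mac" in ja-JP-mac. This checks shape only, not whether the
# codes are actually registered.
LOCALE_SHAPE = re.compile(r"^[a-z]{2,3}(-[A-Z]{2})?(-[a-z0-9]+)?$")


def normalize_locale(raw):
    """Return a case-normalized locale, or None if it is not locale-shaped."""
    parts = raw.strip().split("-")
    parts[0] = parts[0].lower()
    if len(parts) > 1:
        parts[1] = parts[1].upper()
    candidate = "-".join(parts)
    return candidate if LOCALE_SHAPE.match(candidate) else None
```

<p>Catching codes that are well-formed but not real would need a lookup against a list of known language and region codes, which this sketch does not attempt.</p>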
<div style="height:300px; overflow:scroll">
<table>
<tr>
<th>Locale</th>
<th>Supported Extensions</th>
</tr>
<tr>
<td>en-US</td>
<td>439</td>
</tr>
<tr>
<td>sv-SE</td>
<td>57</td>
</tr>
<tr>
<td>it-IT</td>
<td>190</td>
</tr>
<tr>
<td>de-DE</td>
<td>189</td>
</tr>
<tr>
<td>pl-PL</td>
<td>137</td>
</tr>
<tr>
<td>es-ES</td>
<td>181</td>
</tr>
<tr>
<td>fi-FI</td>
<td>64</td>
</tr>
<tr>
<td>ru-RU</td>
<td>129</td>
</tr>
<tr>
<td>nl-NL</td>
<td>145</td>
</tr>
<tr>
<td>pt-BR</td>
<td>162</td>
</tr>
<tr>
<td>fr-FR</td>
<td>204</td>
</tr>
<tr>
<td>ja-JP</td>
<td>124</td>
</tr>
<tr>
<td>zh-CN</td>
<td>126</td>
</tr>
<tr>
<td>zh-TW</td>
<td>114</td>
</tr>
<tr>
<td>ko-KR</td>
<td>86</td>
</tr>
<tr>
<td>cs-CZ</td>
<td>90</td>
</tr>
<tr>
<td>en-GB</td>
<td>29</td>
</tr>
<tr>
<td>es-AR</td>
<td>54</td>
</tr>
<tr>
<td>mn-MN</td>
<td>4</td>
</tr>
<tr>
<td>ro-RO</td>
<td>30</td>
</tr>
<tr>
<td>sk-SK</td>
<td>118</td>
</tr>
<tr>
<td>ca-AD</td>
<td>56</td>
</tr>
<tr>
<td>el-GR</td>
<td>38</td>
</tr>
<tr>
<td>pt-PT</td>
<td>49</td>
</tr>
<tr>
<td>ar</td>
<td>18</td>
</tr>
<tr>
<td>uk-UA</td>
<td>61</td>
</tr>
<tr>
<td>sr-YU</td>
<td>12</td>
</tr>
<tr>
<td>bg-BG</td>
<td>28</td>
</tr>
<tr>
<td>hu-HU</td>
<td>84</td>
</tr>
<tr>
<td>hr-HR</td>
<td>64</td>
</tr>
<tr>
<td>da-DK</td>
<td>92</td>
</tr>
<tr>
<td>nb-NO</td>
<td>32</td>
</tr>
<tr>
<td>sl-SI</td>
<td>23</td>
</tr>
<tr>
<td>lt-LT</td>
<td>21</td>
</tr>
<tr>
<td>tr-TR</td>
<td>72</td>
</tr>
<tr>
<td>ar-TN</td>
<td>0</td>
</tr>
<tr>
<td>de-AT</td>
<td>10</td>
</tr>
<tr>
<td>he-IL</td>
<td>41</td>
</tr>
<tr>
<td>el</td>
<td>6</td>
</tr>
<tr>
<td>ja-JA</td>
<td>1</td>
</tr>
<tr>
<td>mk-MK</td>
<td>10</td>
</tr>
<tr>
<td>be-BY</td>
<td>25</td>
</tr>
<tr>
<td>sq-AL</td>
<td>8</td>
</tr>
<tr>
<td>en</td>
<td>19</td>
</tr>
<tr>
<td>de</td>
<td>22</td>
</tr>
<tr>
<td>es</td>
<td>7</td>
</tr>
<tr>
<td>km-KH</td>
<td>6</td>
</tr>
<tr>
<td>th-TH</td>
<td>14</td>
</tr>
<tr>
<td>it</td>
<td>13</td>
</tr>
<tr>
<td>az-AZ</td>
<td>2</td>
</tr>
<tr>
<td>id-ID</td>
<td>8</td>
</tr>
<tr>
<td>fy-NL</td>
<td>13</td>
</tr>
<tr>
<td>fa-IR</td>
<td>33</td>
</tr>
<tr>
<td>af-ZA</td>
<td>8</td>
</tr>
<tr>
<td>ar-SA</td>
<td>4</td>
</tr>
<tr>
<td>cy-GB</td>
<td>0</td>
</tr>
<tr>
<td>gl-ES</td>
<td>11</td>
</tr>
<tr>
<td>ms-MY</td>
<td>3</td>
</tr>
<tr>
<td>ar-JO</td>
<td>1</td>
</tr>
<tr>
<td>es-CH</td>
<td>0</td>
</tr>
<tr>
<td>es-CL</td>
<td>6</td>
</tr>
<tr>
<td>am-HY</td>
<td>1</td>
</tr>
<tr>
<td>hi-IN</td>
<td>5</td>
</tr>
<tr>
<td>vi-VN</td>
<td>4</td>
</tr>
<tr>
<td>en-AU</td>
<td>5</td>
</tr>
<tr>
<td>cz-CZ</td>
<td>1</td>
</tr>
<tr>
<td>he</td>
<td>1</td>
</tr>
<tr>
<td>fa</td>
<td>1</td>
</tr>
<tr>
<td>ur</td>
<td>1</td>
</tr>
<tr>
<td>ja</td>
<td>18</td>
</tr>
<tr>
<td>fr</td>
<td>23</td>
</tr>
<tr>
<td>nl</td>
<td>9</td>
</tr>
<tr>
<td>pl</td>
<td>9</td>
</tr>
<tr>
<td>ru</td>
<td>14</td>
</tr>
<tr>
<td>sk</td>
<td>15</td>
</tr>
<tr>
<td>eu-EU</td>
<td>1</td>
</tr>
<tr>
<td>de-CH</td>
<td>5</td>
</tr>
<tr>
<td>ko</td>
<td>4</td>
</tr>
<tr>
<td>hr</td>
<td>1</td>
</tr>
<tr>
<td>sr-Yu</td>
<td>3</td>
</tr>
<tr>
<td>ga-IE</td>
<td>7</td>
</tr>
<tr>
<td>pt-PR</td>
<td>0</td>
</tr>
<tr>
<td>tr</td>
<td>3</td>
</tr>
<tr>
<td>cs</td>
<td>4</td>
</tr>
<tr>
<td>hu</td>
<td>7</td>
</tr>
<tr>
<td>en-BZ</td>
<td>3</td>
</tr>
<tr>
<td>en-CA</td>
<td>4</td>
</tr>
<tr>
<td>en-IE</td>
<td>3</td>
</tr>
<tr>
<td>en-JM</td>
<td>3</td>
</tr>
<tr>
<td>en-NZ</td>
<td>3</td>
</tr>
<tr>
<td>en-PH</td>
<td>3</td>
</tr>
<tr>
<td>en-TT</td>
<td>3</td>
</tr>
<tr>
<td>en-ZA</td>
<td>3</td>
</tr>
<tr>
<td>en-ZW</td>
<td>3</td>
</tr>
<tr>
<td>es-BO</td>
<td>1</td>
</tr>
<tr>
<td>es-CO</td>
<td>1</td>
</tr>
<tr>
<td>es-CR</td>
<td>1</td>
</tr>
<tr>
<td>es-DO</td>
<td>1</td>
</tr>
<tr>
<td>es-EC</td>
<td>1</td>
</tr>
<tr>
<td>es-SV</td>
<td>1</td>
</tr>
<tr>
<td>es-GT</td>
<td>1</td>
</tr>
<tr>
<td>es-HN</td>
<td>1</td>
</tr>
<tr>
<td>es-NI</td>
<td>1</td>
</tr>
<tr>
<td>es-PA</td>
<td>1</td>
</tr>
<tr>
<td>es-PY</td>
<td>1</td>
</tr>
<tr>
<td>es-PE</td>
<td>1</td>
</tr>
<tr>
<td>es-PR</td>
<td>1</td>
</tr>
<tr>
<td>es-MX</td>
<td>2</td>
</tr>
<tr>
<td>es-UY</td>
<td>1</td>
</tr>
<tr>
<td>es-VE</td>
<td>1</td>
</tr>
<tr>
<td>fr-BE</td>
<td>2</td>
</tr>
<tr>
<td>fr-CA</td>
<td>2</td>
</tr>
<tr>
<td>fr-CH</td>
<td>2</td>
</tr>
<tr>
<td>fr-LU</td>
<td>2</td>
</tr>
<tr>
<td>fr-MC</td>
<td>2</td>
</tr>
<tr>
<td>eu-ES</td>
<td>3</td>
</tr>
<tr>
<td>zw-TH</td>
<td>0</td>
</tr>
<tr>
<td>da-DA</td>
<td>1</td>
</tr>
<tr>
<td>be</td>
<td>1</td>
</tr>
<tr>
<td>eo</td>
<td>1</td>
</tr>
<tr>
<td>ca</td>
<td>7</td>
</tr>
<tr>
<td>pt</td>
<td>2</td>
</tr>
<tr>
<td>ar-DZ</td>
<td>1</td>
</tr>
<tr>
<td>jp-JP</td>
<td>0</td>
</tr>
<tr>
<td>et-EE</td>
<td>2</td>
</tr>
<tr>
<td>nl-BE</td>
<td>1</td>
</tr>
<tr>
<td>eu</td>
<td>1</td>
</tr>
<tr>
<td>en-EN</td>
<td>0</td>
</tr>
<tr>
<td>sr-CS</td>
<td>1</td>
</tr>
<tr>
<td>ua-UA</td>
<td>1</td>
</tr>
<tr>
<td>no-NO</td>
<td>1</td>
</tr>
<tr>
<td>mn-MK</td>
<td>0</td>
</tr>
<tr>
<td>sl-SL</td>
<td>2</td>
</tr>
<tr>
<td>is</td>
<td>2</td>
</tr>
<tr>
<td>nn-NO</td>
<td>1</td>
</tr>
<tr>
<td>lv-LV</td>
<td>0</td>
</tr>
<tr>
<td>uk-AU</td>
<td>1</td>
</tr>
<tr>
<td>ja-JP-mac</td>
<td>2</td>
</tr>
<tr>
<td>ml-IN</td>
<td>1</td>
</tr>
<tr>
<td>wa-BE</td>
<td>1</td>
</tr>
<tr>
<td>is-IS</td>
<td>2</td>
</tr>
<tr>
<td>ca-ES</td>
<td>0</td>
</tr>
<tr>
<td>sv</td>
<td>1</td>
</tr>
<tr>
<td>fr-fR</td>
<td>0</td>
</tr>
<tr>
<td>da</td>
<td>7</td>
</tr>
<tr>
<td>fi</td>
<td>2</td>
</tr>
<tr>
<td>ro</td>
<td>1</td>
</tr>
<tr>
<td>ar-LB</td>
<td>0</td>
</tr>
<tr>
<td>sr-RS</td>
<td>3</td>
</tr>
<tr>
<td>en-UK</td>
<td>2</td>
</tr>
<tr>
<td>es-US</td>
<td>1</td>
</tr>
<tr>
<td>de-LI</td>
<td>1</td>
</tr>
<tr>
<td>de-LU</td>
<td>1</td>
</tr>
<tr>
<td>ko-Kr</td>
<td>1</td>
</tr>
<tr>
<td>no</td>
<td>1</td>
</tr>
<tr>
<td>zh</td>
<td>1</td>
</tr>
<tr>
<td>bg</td>
<td>1</td>
</tr>
<tr>
<td>tl</td>
<td>1</td>
</tr>
<tr>
<td>sr</td>
<td>1</td>
</tr>
<tr>
<td>sq</td>
<td>1</td>
</tr>
<tr>
<td>sl</td>
<td>2</td>
</tr>
<tr>
<td>xultrisLocale</td>
<td>1</td>
</tr>
<tr>
<td>ca-CD</td>
<td>1</td>
</tr>
<tr>
<td>se-SV</td>
<td>1</td>
</tr>
<tr>
<td>mn</td>
<td>0</td>
</tr>
<tr>
<td>mk</td>
<td>1</td>
</tr>
<tr>
<td>pa-IN</td>
<td>0</td>
</tr>
<tr>
<td>ka</td>
<td>1</td>
</tr>
<tr>
<td>lt</td>
<td>1</td>
</tr>
<tr>
<td>uk</td>
<td>2</td>
</tr>
<tr>
<td>ar-AR</td>
<td>1</td>
</tr>
<tr>
<td>he-HL</td>
<td>0</td>
</tr>
<tr>
<td>convertLocale</td>
<td>1</td>
</tr>
</table>
</div>
<p>Some locales will have 0 supported extensions. This is because we are only counting the most up-to-date version of each extension, and not counting previous versions which may have supported that locale. While doing a graph for each locale would be unwise, a much wiser choice would be to break it down by language.</p>
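<p>Grouping by language means taking the part of each locale before the first hyphen and counting every extension once per language, even if it ships several locales of that language. A minimal Python sketch, assuming a hypothetical mapping from extension id to its list of locales:</p>

```python
from collections import defaultdict


def extensions_per_language(locales_by_extension):
    """Count extensions per language, counting each extension only once.

    locales_by_extension maps an extension id to the locales it ships,
    e.g. {"ext-a": ["en-US", "en-GB"]}.
    """
    supporters = defaultdict(set)
    for extension, locales in locales_by_extension.items():
        for locale in locales:
            language = locale.split("-")[0]
            supporters[language].add(extension)
    return {language: len(ids) for language, ids in supporters.items()}
```

<p>Because sets are used, an extension that ships both en-US and en-GB adds one to the count for en, not two, so a language total can be smaller than the sum of its locale rows.</p>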
<p>So which languages are best supported?</p>
<div style="height:300px; overflow:scroll">
<table>
<tr>
<th>Language</th>
<th>Extensions supported</th>
</tr>
<tr>
<td>en</td>
<td>462</td>
</tr>
<tr>
<td>sv</td>
<td>58</td>
</tr>
<tr>
<td>it</td>
<td>202</td>
</tr>
<tr>
<td>de</td>
<td>212</td>
</tr>
<tr>
<td>pl</td>
<td>145</td>
</tr>
<tr>
<td>es</td>
<td>192</td>
</tr>
<tr>
<td>fi</td>
<td>66</td>
</tr>
<tr>
<td>ru</td>
<td>143</td>
</tr>
<tr>
<td>nl</td>
<td>154</td>
</tr>
<tr>
<td>pt</td>
<td>165</td>
</tr>
<tr>
<td>fr</td>
<td>225</td>
</tr>
<tr>
<td>ja</td>
<td>142</td>
</tr>
<tr>
<td>zh</td>
<td>148</td>
</tr>
<tr>
<td>ko</td>
<td>91</td>
</tr>
<tr>
<td>cs</td>
<td>94</td>
</tr>
<tr>
<td>mn</td>
<td>4</td>
</tr>
<tr>
<td>ro</td>
<td>31</td>
</tr>
<tr>
<td>sk</td>
<td>133</td>
</tr>
<tr>
<td>ca</td>
<td>64</td>
</tr>
<tr>
<td>el</td>
<td>44</td>
</tr>
<tr>
<td>ar</td>
<td>21</td>
</tr>
<tr>
<td>uk</td>
<td>64</td>
</tr>
<tr>
<td>sr</td>
<td>19</td>
</tr>
<tr>
<td>bg</td>
<td>29</td>
</tr>
<tr>
<td>hu</td>
<td>91</td>
</tr>
<tr>
<td>hr</td>
<td>65</td>
</tr>
<tr>
<td>da</td>
<td>100</td>
</tr>
<tr>
<td>nb</td>
<td>32</td>
</tr>
<tr>
<td>sl</td>
<td>27</td>
</tr>
<tr>
<td>lt</td>
<td>22</td>
</tr>
<tr>
<td>tr</td>
<td>75</td>
</tr>
<tr>
<td>he</td>
<td>42</td>
</tr>
<tr>
<td>mk</td>
<td>11</td>
</tr>
<tr>
<td>be</td>
<td>26</td>
</tr>
<tr>
<td>sq</td>
<td>9</td>
</tr>
<tr>
<td>km</td>
<td>6</td>
</tr>
<tr>
<td>th</td>
<td>14</td>
</tr>
<tr>
<td>az</td>
<td>2</td>
</tr>
<tr>
<td>id</td>
<td>8</td>
</tr>
<tr>
<td>fy</td>
<td>13</td>
</tr>
<tr>
<td>fa</td>
<td>34</td>
</tr>
<tr>
<td>af</td>
<td>8</td>
</tr>
<tr>
<td>cy</td>
<td>0</td>
</tr>
<tr>
<td>gl</td>
<td>11</td>
</tr>
<tr>
<td>ms</td>
<td>3</td>
</tr>
<tr>
<td>am</td>
<td>1</td>
</tr>
<tr>
<td>hi</td>
<td>5</td>
</tr>
<tr>
<td>vi</td>
<td>4</td>
</tr>
<tr>
<td>cz</td>
<td>1</td>
</tr>
<tr>
<td>ur</td>
<td>1</td>
</tr>
<tr>
<td>eu</td>
<td>5</td>
</tr>
<tr>
<td>ga</td>
<td>7</td>
</tr>
<tr>
<td>zw</td>
<td>0</td>
</tr>
<tr>
<td>eo</td>
<td>1</td>
</tr>
<tr>
<td>jp</td>
<td>0</td>
</tr>
<tr>
<td>et</td>
<td>2</td>
</tr>
<tr>
<td>ua</td>
<td>1</td>
</tr>
<tr>
<td>no</td>
<td>2</td>
</tr>
<tr>
<td>is</td>
<td>4</td>
</tr>
<tr>
<td>nn</td>
<td>1</td>
</tr>
<tr>
<td>lv</td>
<td>0</td>
</tr>
<tr>
<td>ml</td>
<td>1</td>
</tr>
<tr>
<td>wa</td>
<td>1</td>
</tr>
<tr>
<td>tl</td>
<td>1</td>
</tr>
<tr>
<td>xultrisLocale</td>
<td>1</td>
</tr>
<tr>
<td>se</td>
<td>1</td>
</tr>
<tr>
<td>pa</td>
<td>0</td>
</tr>
<tr>
<td>ka</td>
<td>1</td>
</tr>
<tr>
<td>convertLocale</td>
<td>1</td>
</tr>
</table>
</div>
<p>And here is the obligatory graph for those numerically challenged by high school mathematics teachers.</p>
<p><img src="http://www.cesaroliveira.net/images/misc/2008-05-21/addons2.png" alt="top 10 languages for 464 analyzed extensions"/></p>
<p>So what does this lead to? First I need to fix locales. We need to get the vast majority of them. Next, I want to profile all the extensions and not just the first 2500. And then, I want to start looking at web crawlers and learning how to crawl a simple website before unleashing a monster on AMO.</p>
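<p>As a starting point for crawling a simple website, the core step is pulling .xpi links out of a fetched page. Here is a sketch using only Python&#8217;s standard library. The class and function names are hypothetical, and fetching pages, following non-.xpi links, and authentication are all left out.</p>

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class XpiLinkFinder(HTMLParser):
    """Collect absolute URLs of links whose path ends in .xpi."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL.
                url = urljoin(self.base_url, value)
                if urlparse(url).path.lower().endswith(".xpi"):
                    self.links.append(url)


def find_xpi_links(base_url, html_text):
    """Return the .xpi links found in one page, in document order."""
    finder = XpiLinkFinder(base_url)
    finder.feed(html_text)
    return finder.links
```

<p>Checking the URL path rather than the whole URL means a link with a query string after the file name is still recognized.</p>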
]]></content:encoded>
			<wfw:commentRss>http://www.cesaroliveira.net/tea/archives/15/feed</wfw:commentRss>
		</item>
		<item>
		<title>Taking on the WildOn(es)</title>
		<link>http://www.cesaroliveira.net/tea/archives/9</link>
		<comments>http://www.cesaroliveira.net/tea/archives/9#comments</comments>
		<pubDate>Thu, 15 May 2008 23:33:37 +0000</pubDate>
		<dc:creator>Cesar</dc:creator>
		
		<category><![CDATA[addons]]></category>

		<category><![CDATA[intern]]></category>

		<category><![CDATA[mozilla]]></category>

		<category><![CDATA[seneca]]></category>

		<category><![CDATA[wildon]]></category>

		<guid isPermaLink="false">http://www.cesaroliveira.net/?p=9</guid>
		<description><![CDATA[I started writing this a week and a half ago, but just finished it today.
First day at interning at Mozilla. I finally found out what I get to do this summer. I got the OK to blog about it, because you know how secret them Mozilla folks are about their secret in-house project (ie. What [...]]]></description>
			<content:encoded><![CDATA[<p><em>I started writing this a week and a half ago, but just finished it today.</em></p>
<p>First day at interning at Mozilla. I finally found out what I get to do this summer. I got the OK to blog about it, because you know how secret them Mozilla folks are about their secret in-house project (ie. <a href="http://starkravingfinkle.org/blog/">What is this guy up to?</a> ;)).</p>
<p>The actual wiki page was apparently out in the open, but no-one heard about it. It&#8217;s called <a href="http://wiki.mozilla.org/Update:WildOn">WildOnAddons</a>. While a new name is, <acronym title="In My Opinion">IMO</acronym>, mandatory, it&#8217;s actually a pretty neat idea. There are many great extensions such as Ted&#8217;s <a href="http://ted.mielczarek.org/code/mozilla/extensiondev/">Extension Developer&#8217;s Extension</a> that aren&#8217;t hosted on AMO. Some other extensions are hosted on AMO, but frequently have updates much sooner on their website before it goes public.</p>
<p>Sometimes, extensions come bundled with packages such as Norton and McAfee. <a href="http://www.google.com/tools/firefox/">Google Notebook</a> is one of many Google Labs extensions hosted on their own servers.</p>
<p>In short, they&#8217;re hosted everywhere. But that presents a problem: how many are out there, and can we find and index them?</p>
<p>This is actually a lot harder than going on Google and typing <a href="http://www.google.com/search?hl=en&amp;q=filetype%3Axpi&amp;btnG=Google+Search">filetype:xpi</a>, because according to those results, <a href="http://www.google.com/search?hl=en&amp;q=filetype%3Axpi+site%3Aaddons.mozilla.org&amp;btnG=Google+Search">AMO only has 78 extensions</a>. In fact, there are <a href="http://www.addonsmirror.net/">several</a> <a href="http://addons.sociz.com/">repositories</a> <a href="http://en.addons.pl/">of</a> <a href="http://www.foxiewire.com/">addons</a> <a href="http://addons.songbirdnest.com/">each</a> <a href="https://extensions.flock.com/">catering</a> to a different crowd (yes, we are counting <strong>all</strong> addons). While I don&#8217;t think that AMO can satisfy everyone all the time, this project might help us figure out how many extensions are out there and how many are hosted on our servers. Actually figuring this out will take a lot of work, and is not as straightforward as it sounds (ie. all of AMO&#8217;s sandboxed addons require authentication, so a web crawler would have to know about that if we were crawling through the web), but it will be worth it in the end.</p>
<p>I&#8217;ll keep blogging about it under the <a href="http://www.cesaroliveira.net/tea/archives/tag/wildon/feed">wildon tag RSS feed</a> if you&#8217;re interested in how progress goes.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.cesaroliveira.net/tea/archives/9/feed</wfw:commentRss>
		</item>
	</channel>
</rss>
