Archive for June, 2008

As an AMO editor, one thing you have to do is review code for security flaws. When doing update reviews, the best way to do this is to download both the extension update that is currently in the sandbox and the last public release, unzip the zippy and jar files (unless you’re lucky and your diff program does this for you), then compare the results using a tool such as KDiff3, Meld, or WinMerge.
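To make the manual routine above concrete, here is a minimal Python sketch of it. It assumes the two packages are already in memory as bytes and that any nested .jar files are ordinary zip archives; the function names `read_archive` and `diff_archives` are made up for illustration and are not part of any existing tool.

```python
import difflib
import io
import zipfile


def read_archive(data, prefix=""):
    """Return {path: bytes} for every file in a zip archive (xpi or jar).

    Nested .jar files are opened too, and their contents are listed
    under "outer.jar!/inner/path".
    """
    files = {}
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # directory entry, nothing to read
            content = archive.read(name)
            path = prefix + name
            if name.lower().endswith(".jar"):
                try:
                    files.update(read_archive(content, path + "!/"))
                    continue
                except zipfile.BadZipFile:
                    pass  # not really a zip; fall through and keep it as a plain file
            files[path] = content
    return files


def diff_archives(old_data, new_data):
    """Return unified diff lines for every file that differs between two packages."""
    old_files = read_archive(old_data)
    new_files = read_archive(new_data)
    lines = []
    for path in sorted(set(old_files) | set(new_files)):
        old = old_files.get(path, b"").decode("utf-8", errors="replace").splitlines()
        new = new_files.get(path, b"").decode("utf-8", errors="replace").splitlines()
        lines.extend(
            difflib.unified_diff(old, new, "old/" + path, "new/" + path, lineterm="")
        )
    return lines
```

Calling `diff_archives(open("old.xpi", "rb").read(), open("new.xpi", "rb").read())` (file names are placeholders) would give roughly what a desktop diff tool shows after you unpack everything by hand.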

I’m trying to change that by starting a project that will let you compare two files online. I’ve done some work and think it’s a good time to get my idea out to those who will use it.

Here is a screenshot of the output. You can test a sample output at this page:

One of the first things you might notice is that this isn’t a side-by-side diff. The reason is that editors typically aren’t worried about what was taken out, but about what was put in (what was taken out might be more useful for extension developers). The unchanged code is shown as well, which is useful for looking up functions if that is ever needed.
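The tool itself is a PHP file and isn’t reproduced here, so the following is only a rough Python sketch of the idea described above: keep unchanged lines for reference, mark added lines, and leave removed lines out. The function name `additions_view` and the `+ ` marker are assumptions made for this example.

```python
import difflib


def additions_view(old_text, new_text):
    """List the new file's lines, marking the ones that were added or changed.

    Unchanged lines get a two-space prefix, added or replaced lines get "+ ",
    and lines that only existed in the old file are omitted.
    """
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    view = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag == "delete":
            continue  # removed code is not what a reviewer is looking for
        marker = "  " if tag == "equal" else "+ "
        for line in new_lines[j1:j2]:
            view.append(marker + line)
    return view
```

A web page could then render the marked lines with a highlight colour and the rest as plain context.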

It’s a simple PHP file. I hope to get some feedback on whether people will use this tool or not (and saying “I use x because I know it best” is totally fine too; I’m trying to focus my energy on what will be used).

I rarely bookmark any webpages I visit. The awesomebar has cleverly replaced the traditional paradigm of what bookmarks were. If I need anything, the awesomebar handles it. HOWEVER, one thing that it doesn’t do is remind me. The only time I use bookmarks is when I know I’ll never visit the site unless something reminds me to. For example, I just made a comment on a blog, and I want to see other comments or whether the blogger replied to me. This particular blog doesn’t email me when people reply. It does have an RSS feed for comments, but I don’t want to go through the trouble of subscribing to it. So I chose to check manually. I would never remember to do so, so I bookmarked it.

At the very least, I would have a record that I should have checked it. Hmm, maybe something should automate this for me? ;)

For anyone who is interested in distributing extensions without SSL enabled on their website and cannot use the x86 binaries (I really shouldn’t be the only one), here is one compiled for Linux (unofficial) and some official docs to help you out.

Something very unfortunate happened last Friday. Due to some lack of intern foresight, we actually lost a good-to-great intern perk. I think all the interns came out of that meeting shaken up a bit, maybe even a bit angry for losing something that they weren’t responsible for.

But that got me thinking about all the perks that Mozilla does give. Indeed, they don’t have to supply interns with an apartment for the summer. They don’t have to give us transportation to take us to and from work, and let us drive it wherever on the weekends. And they don’t have to stock the place with free beverages and snacks, a Wii console and a ping pong table. But they do, and it’s often easy to take for granted something that is a privilege.

For a good moment, I forgot what this internship means to me, and how lucky I am to be back. While I don’t think any of my mentors will start a start-up with me, it’s great to witness part of the process which makes a great company. So while the lost intern perk was unfortunate, it’s a very small price to pay considering what we’re still getting in return.

So here is a diagram of the plan I had in mind to take over the world and catalog all of the extensions on the web:


Thank you, Dia, for letting me express my thoughts in boxes and stick figures. Here is a quick breakdown of some of the components:

  1. A URL list is simply a list of URLs that are known to contain extensions, for example source repositories such as AMO and mozdev.
  2. Google API for more separated addons, such as those on blogs and personal sites
  3. Manual entries for addons not hosted on webpages. These are usually commercial addons such as McAfee.
  4. Site-specific and generic refer to the rules that the crawler must obey. For example, a generic policy would cover a personal site such as example.com, while a site-specific policy would handle sites such as AMO, where experimental addons require a login.
  5. Crawler is a web crawler. I have been having difficulty finding the best tool for the job.
  6. Parser parses .xpi files. We should also save the HTML files to extract contextual information wherever possible.
  7. Site-specific persistent storage is just a database for each site we visit. This may have to be rethought, but I want some sort of redundancy plan to keep files saved even if something horrendous happens to a central database, especially when dealing with beta software and unfamiliar technology such as web crawlers.
  8. Comparer compares what is stored with a central database. Addons are updated all the time, so we want the most up-to-date versions available.
  9. View is used by the website to provide information for the user.
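As a rough illustration of items 6 and 8, here is a small Python sketch of a parser that reads the id and version out of an .xpi’s install.rdf, plus a comparison against what is already stored. It assumes install.rdf sits at the top level of the package and uses the usual `em` namespace. The names `parse_xpi` and `needs_update`, and the plain dict standing in for the central database, are hypothetical and only for this example.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

EM_NS = "http://www.mozilla.org/2004/em-rdf#"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
MANIFEST = "urn:mozilla:install-manifest"


def parse_xpi(data):
    """Return {"id": ..., "version": ...} read from the install.rdf inside an xpi."""
    with zipfile.ZipFile(io.BytesIO(data)) as xpi:
        root = ET.fromstring(xpi.read("install.rdf"))
    for desc in root.iter("{%s}Description" % RDF_NS):
        # The "about" attribute may be written with or without the rdf: prefix.
        about = desc.get("{%s}about" % RDF_NS) or desc.get("about")
        if about != MANIFEST:
            continue  # e.g. a targetApplication entry, not the addon itself
        info = {}
        for field in ("id", "version"):
            key = "{%s}%s" % (EM_NS, field)
            # Values can be given as attributes or as child elements.
            info[field] = desc.get(key) or desc.findtext(key)
        return info
    raise ValueError("no install manifest found")


def needs_update(central, crawled):
    """True if the crawled addon is new or its version differs from the stored one.

    central is a dict of addon id -> stored version (a stand-in for the database).
    """
    return central.get(crawled["id"]) != crawled["version"]
```

The comparison only checks whether the version string changed; it does not try to decide which version is newer.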

There are still some quirks which have to be figured out:

  • Version bumping on AMO doesn’t change the actual install.rdf in the xpi file. Instead, Firefox does some update magic to fix that. I either need to work with said magic, or leave it alone (I don’t think it’s a big deal, but it should be noted).
  • JSpider is a Java spider that I have had my eye on. Yeah, it’s Java, but many other crawlers are too. Many other crawlers both crawl and index, and I need different functionality (I need a flexible crawler; forget the indexer). Unfortunately, JSpider doesn’t support POST data or web form authentication, which means I’m going to have to fix that if I want to use it.
  • Google’s Search API TOS doesn’t seem to be spider friendly. I may have to try out other web search engines.
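For reference, this is roughly what “POST data and web form authentication” means for a crawler, sketched in Python with only the standard library. The login URL and the form field names are placeholders, not AMO’s real ones, and `make_session` and `build_login_request` are hypothetical helper names.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session():
    """Return an opener that remembers cookies between requests, plus its cookie jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(login_url, fields):
    """Build a form-encoded POST request for a login form."""
    body = urllib.parse.urlencode(fields).encode("ascii")
    # Supplying data makes urllib send this as a POST instead of a GET.
    return urllib.request.Request(login_url, data=body)
```

A crawler would call `opener.open(build_login_request(...))` once, then fetch the protected pages through the same opener so the session cookie is sent along.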

On a brighter note, I put up the sources of my project on the web. And even a nice place to play in. It’s a bit slow, but I’m probably into the “this isn’t what you should use sqlite for” territory.