I’ve been spending a few days on trying to develop a few tools for editors to use to quickly reject addons for obvious defects, such as loading remote scripts. But I wanted to get deeper into the javascript stuff mainly because it’s a) it’s harder and b) it’s where the real problems lie.
But as anyone can tell you, it’s not an easy task (going towards damn near impossible). Firstly, you can’t really use a lexical parser. Well, you can, but it won’t be dependable. Let’s take an example out of the Reviewer’s guide :
document["crea" + "teElement"]("s" + "c" + "r" + ["i", "p", "t"].join(""));
Which is sneaky way of creating a script element, though I question the competence of someone who will use this as their main line of attack (it’s practically spelled out for you). But taking this as a use case, and ignoring the fact that they can use document[cheese] instead, I wondering if parsing the javascript would make figuring this stuff out any better.
Happily, I have spidermonkey and a js shell to do any parsing I wish. But I found out some cool things that you can do in the shell, such as looking at the bytecode for an entire function using the dis() command.
This was interesting. Certainly, there are some optimizations you can do for :
document["crea" + "teElement"]("s" + "c" + "r" + ["i", "p", "t"].join(""));
I would be shocked if it didn’t end up spelling out :
document["createElement"]("script");
I had a few hurdles to overcome. Firstly, document is not defined in the javascript shell. Thinking it was defined in the xpcshell (owww. I was misled by some apparently unused tests and my general ignorance that xpcshell tests went into unit/ and not js/ directory) I went through the added trouble of coping dis() and related functions from js.cpp to xpcshell.cpp. Once I realized that document wasn’t defined, I made a document mock object just to see what the blasted bytecode would look like.
I was a little disappointed. This source:
Ended up being this bytecode :
var document = {
createElement : function(s) {
print("damn");
}
};
function foo() {
document["crea" + "teElement"]("s" + "c" + "r" + ["i", "p", "t"].join(""));
}
dis(foo);
00000: name "document"
00003: string "createElement"
00006: callelem
00007: string "s"
00010: string "c"
00013: add
00014: string "r"
00017: add
00018: newinit 3
00020: zero
00021: string "i"
00024: initelem
00025: one
00026: string "p"
00029: initelem
00030: int8 2
00032: string "t"
00035: initelem
00036: endinit
00037: callprop "join"
00040: string ""
00043: call 1
00046: add
00047: call 1
00050: pop
00051: stop
Source notes:
0: 0 [ 0] newline
1: 6 [ 6] pcbase offset 6
3: 37 [ 31] xdelta
4: 37 [ 0] pcbase offset 19
6: 43 [ 6] pcbase offset 25
8: 47 [ 4] pcbase offset 47
So, almost. The document[”createElement”] part was correct, but the .join() wasn’t optimized. Although I wasn’t overly estatic, I knew that this was just one (somewhat lame) use case in the countless of possible others.
This is making me rethink whether lexical tools are the way to go. While they don’t give you any definitive proof that there is a possible security hole, they can still be useful. For example, if you want to use XMLHttpRequest, then you have to call it at least once in your program (even if you say var Widget = XMLHttpRequest). And at least that can bring up warning flags, or at least give editors a place to look.
I don’t think any tool can completely replace a human being. But hopefully, tools will help make the review process easier because you can start looking at high-risk areas first rather than starting from a arbitrary point and not coming across something until 10 minutes later.