The Current and Future Performance of the Mobile Web
- Processor speed of mobile ARM processors versus desktop x86 processors
- Memory consumption in particular related to garbage collection
The two bottlenecks that Drew describes are CPU and memory. CPU-boundedness has two aspects: the power of your CPU and efficiency of execution. Drew points out that current generation x86 processors are ten times faster than current generation ARM processors which power today's mobile devices such as the iPhone and high-end Android devices.
I am not a hardware engineer, but I once worked for a major semiconductor company, and the people there tell me that these days performance is mostly a function of your process (e.g., the thing they measure in “nanometers”). The iPhone 5's impressive performance is due in no small part to a process shrink from 45nm to 32nm, a reduction of about a third. But to do it again, Apple would have to shrink to a 22nm process.
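The arithmetic behind the quoted figures can be checked with a short sketch. It treats the process names as plain linear feature sizes, which is a simplification, and the helper name `shrink_ratio` is made up for illustration.

```python
def shrink_ratio(old_nm: float, new_nm: float) -> float:
    """Fractional reduction in feature size when moving from old_nm to new_nm."""
    return (old_nm - new_nm) / old_nm


# The shrink quoted for the iPhone 5: 45nm -> 32nm.
iphone5_shrink = shrink_ratio(45, 32)

# Applying the same proportional shrink a second time, starting from 32nm.
next_node_nm = 32 * (32 / 45)

print(f"45nm -> 32nm: {iphone5_shrink:.1%} smaller")                  # 28.9% smaller
print(f"Same shrink from 32nm lands near {next_node_nm:.1f}nm")       # 22.8nm
```

Under that simplification, repeating the 45nm to 32nm step lands just under 23nm, which is consistent with the 22nm figure in the quote.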
Knowledge of and investment in shrinking transistors are mostly in the hands of Intel. Drew claims it is unlikely that ARM will catch up in the foreseeable future. In fact, he considers it far more likely that Intel will produce an x86 processor that competes with ARM on power consumption than that ARM will close the performance gap.
Here’s Chrome v8 on my Mac (the earliest one that still ran, Dec 2010.) Now here’s v26.
If the web feels faster to you than it did in 2010, that is probably because you’re running a faster computer, but it has nothing to do with improvements to Chrome.
The second limitation of the mobile web is memory. Once again, there are two aspects to memory usage: the amount available and efficiency of usage.
While modern mobile devices have a fair amount of memory (usually 512MB or 1GB), the operating system does not allow a single application to use all of it. Much of that memory is taken by the operating system itself and by other running applications (multitasking):
[...] essentially on the iPhone 4S, you start getting warned around 40MB and you get killed around 213MB. On the iPad 3, you get warned around 400MB and you get killed around 550MB.
Drew notes that at the iPhone 4S' resolution, a single picture taken with the camera takes up 30MB of bitmap data. That means there is space available for 7 photos in RAM before the operating system kills the application because it runs out of memory. Therefore, especially if an application handles media like pictures and video, it has to be extremely careful in what it keeps in memory and for how long, as memory is extremely limited.
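The photo count follows directly from the numbers quoted above. A minimal sketch of that arithmetic, assuming every photo costs the full 30MB and nothing else in the application uses memory (the constant and function names are illustrative only):

```python
PHOTO_MB = 30   # bitmap size of one camera photo, as quoted above
WARN_MB = 40    # iPhone 4S: memory warnings start around here
KILL_MB = 213   # iPhone 4S: the application gets killed around here


def photos_that_fit(budget_mb: int, photo_mb: int = PHOTO_MB) -> int:
    """Number of whole photos that fit in the given memory budget."""
    return budget_mb // photo_mb


print(photos_that_fit(KILL_MB))  # 7 photos before the app is killed
print(photos_that_fit(WARN_MB))  # 1 photo before warnings begin
```

By the same numbers, a second decoded photo is already enough to cross the warning threshold, and a real application has even less room because its own code and data count against the same budget.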
What this chart says is “As long as you have about 6 times as much memory as you really need, you’re fine. But woe betide you if you have less than 4x the required memory.”
The ground truth is that in a memory constrained environment garbage collection performance degrades exponentially. If you write Python or Ruby or JS that runs on desktop computers, it’s possible that your entire experience is in the right hand of the chart, and you can go your whole life without ever experiencing a slow garbage collector. Spend some time on the left side of the chart and see what the rest of us deal with.
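To put the chart's rule of thumb next to the iPhone 4S figures quoted earlier, here is a rough sketch. It optimistically assumes that the whole 213MB kill limit is available to a garbage-collected heap, and the function name is hypothetical.

```python
KILL_MB = 213  # iPhone 4S kill threshold quoted earlier


def live_data_budget_mb(available_mb: float, headroom_factor: float) -> float:
    """Memory the program can actually need if the collector wants
    headroom_factor times that amount to stay fast."""
    return available_mb / headroom_factor


comfortable = live_data_budget_mb(KILL_MB, 6)  # "about 6 times": you're fine
painful = live_data_budget_mb(KILL_MB, 4)      # "less than 4x": woe betide you

print(f"Fine up to about {comfortable:.1f}MB of live data")      # 35.5MB
print(f"Trouble begins beyond about {painful:.2f}MB of live data")  # 53.25MB
```

Read this way, the comfortable zone on that device is roughly 35MB of live data, barely more than a single 30MB photo.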
This behavior could be the reason why Apple has never supported Objective-C's garbage collector on iOS and is replacing it on the Mac with ARC, which is also used on iOS and is not a garbage collector.
While Drew makes interesting points in his article, not all applications are CPU or memory bound, as Brendan Eich pointed out in a tweet. Only certain categories of applications hit these problems, for instance games and multimedia applications. Nevertheless, Drew's article (10,000 words long) is worth a read for anybody interested in performance on the mobile web.
Good Objective View
I wrote a simple 3D flight-simulator-type game using XNA and F# on WP7 (wp.me/p1buGO-3R). The performance is actually quite good even on my 2-year-old single-core WP7 phone. Many other XNA 3D games run very well on the same phone.
One thing to note is that the 'garbage generation signature' of functional languages is different because immutability is the default. Some of this difference is seen in the Google paper on performance comparison between Scala and Java. See multi-language-bench.googlecode.com/svn/trunk/d...
Fair assessment, but depends on the application
Working through the DOM, CSS, and WebGL allows for deferring a lot of computation to native code, and even hardware acceleration. I see asm.js in a similar vein, only now developers are able to control what is available instead of just relying on the browser to implement it. You can call it cheating, but it's not so different from the way Apple uses Objective-C to interface with and orchestrate C code. And keep in mind that most dynamic languages commonly use FFI to talk to native libraries for things like image manipulation.
"Never bet against the open web"
Did you notice what the benchmark actually compares?
first executes the Java program to calculate object lifetimes and generate the program heap trace. The system processes the program heap trace uses the Merlin algorithm to compute object reachability times and generate the reachability-based oracle. The lifetime-based oracle comes directly from the lifetimes computed during the profiling run.
So this is not something that could be done when running your regular programs. And it says nothing about the performance of reference counting.