The Acid3 test says "To pass the test, a browser must use its default settings, the animation has to be smooth, the score has to end on 100/100, and the final page has to look exactly, pixel for pixel, like this reference rendering". (Emphasis mine.)
There has been some question as to what "the animation has to be smooth" means.
The idea is to make sure that browsers focus on performance as well as standards. Performance isn't a standards-compliance issue, but it is something that affects all Web authors and users. If a browser passes all 100/100 subtests and gets the rendering pixel-for-pixel correct (including the favicon!), then it has passed the standards-compliance parts of the Acid3 test. The rest is just a competition for who can be the fastest.
To determine the "score" for performance in a browser that gets 100/100, click on the "A" of "Acid3" after having run the test twice (so that the test uses the browser's cache). An alert should pop up, giving the total time elapsed and reporting any subtests that took longer than 33ms. Test 26 is the only one that should take any significant amount of time, as it contains a tight loop doing some common DOM and JS operations. The test has "passed", for the purposes of the "smoothness" criterion, if every subtest took less than 33ms (in that case it'll give you the message "No JS errors and no timing issues."). Then the only remaining question is the total time: is it faster than all the other browsers?
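The pass/fail logic of that alert can be sketched roughly as follows. This is a hypothetical harness, not the actual Acid3 code; `timeSubtest`, the workloads, and the messages are illustrative only.

```javascript
// Hypothetical per-subtest timing check, in the spirit of the Acid3
// alert; names and workloads are illustrative, not the test's own code.
const BUDGET_MS = 33; // the per-subtest budget the alert reports against

function timeSubtest(name, fn) {
  const start = Date.now();
  fn();
  const elapsed = Date.now() - start;
  return { name, elapsed, slow: elapsed > BUDGET_MS };
}

const results = [
  timeSubtest("test 1", () => { /* a cheap subtest */ }),
  timeSubtest("test 26", () => {
    // stand-in for the tight loop of common DOM/JS operations
    let s = 0;
    for (let i = 0; i < 1e6; i++) s += i % 7;
  }),
];

const slow = results.filter(r => r.slow).map(r => r.name);
const total = results.reduce((sum, r) => sum + r.elapsed, 0);
console.log("total: " + total + "ms");
console.log(slow.length === 0
  ? "No JS errors and no timing issues."
  : "slow subtests: " + slow.join(", "));
```

The total is what the "who is fastest" competition compares; the per-subtest budget is what decides smoothness.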
An important question is "using what hardware?". Performance tests vary depending on the hardware, so some "reference platform" has to be picked to make a decision. Since "computer" browsers are the first priority with Acid3, as opposed to browsers for phones or other small devices, and since we want the hardware to be able to run the three major platforms of today, I have decided that the "reference hardware" is whatever the top-of-the-line Apple laptop is at the time the test is run.
As hardware improves, performance improves too, so to take this into account test 26 is set up to take longer and longer over time. Today I calibrated the test so that the performance it expects is plausible and will remain so for the next few years, based on results that browsers get on the past few years of Mac laptops.
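The idea of a workload that grows with the calendar can be sketched like this. The baseline iteration count and the doubling rate below are made-up assumptions for illustration; the real calibration formula isn't shown here.

```javascript
// Sketch of a workload that grows with the date, so the test stays
// demanding as hardware improves. BASE and the yearly doubling are
// hypothetical assumptions, not Acid3's actual calibration.
function iterationsFor(date) {
  const BASE = 100000;                            // hypothetical 2008 baseline
  const years = date.getFullYear() - 2008;
  return BASE * Math.pow(2, Math.max(0, years));  // assume a yearly doubling
}

console.log(iterationsFor(new Date(2008, 0, 1))); // 100000
console.log(iterationsFor(new Date(2010, 0, 1))); // 400000
```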
The Acid3 test has a rendering subtest that checks the positioning of text in particular conditions (absolutely-positioned generated content with embedded fonts). To get precise results, I used a single glyph from the Ahem font, which has well-known metrics. My plan was to have the glyph set up so that a perfect white 20×20 pixel square glyph from the font would be overlaid exactly on a 20×20 pixel square red background, thus hiding everything when things lined up. I positioned this test in the upper right hand corner, snug against the black border of the test.
The problem with this test is that on some platforms, specifically Mac platforms with LCD antialiasing, the font rendering system actually renders the glyph using sub-pixel effects, which ends up overlapping the border and makes the test not look the same as the reference rendering.
This would affect any browser, but only on Mac. Unfortunately, the WebKit team "fixed" this problem by simply hard-coding the Ahem font and making it not antialias.
Now, I argue that this is a bug in the antialiasing, but sadly there's no real spec for the antialiasing and so other people argue that it shouldn't be in the test in the first place, whether I'm right or wrong. What we all agree on is that the font-specific hack is lame. (It's especially bad with this font because Ahem is supposed to be a testing font and we specifically don't want it going down different codepaths!)
So Hyatt and I came to a deal: I would move the test down and to the left by one pixel, so it no longer touches the border; he agreed to remove the hack and to fix one additional bug (a background-position rounding bug).
The test will probably see a few more minor changes as people track down the last few remaining problems, in particular in the SVG subtests and in the performance part of test 26. (Test 26 is supposed to track the incremental speed of computer hardware, but it hasn't really been well calibrated yet; I just estimated what the numbers should be.)
It's great to see WebKit and Opera work so hard on interoperability issues such as those brought up by Acid3. The Microsoft and Mozilla teams are currently in the "crunch time" of their respective browsers' releases, so it's expected that they wouldn't be working on this right now: at the end of a release cycle, stability, performance, and user experience are usually much more critical. Hopefully once IE8 and Firefox 3 are released they will be able to turn once more to the world of standards, and we'll see big improvements to the Web platform again.
Just as Reddit is celebrating Opera reaching 100/100, with the misleading headline Opera the first browser to pass the Acid3 test (hey, submitter: it wouldn't hurt to read the Opera blog post before submitting it to Reddit), the Apple guys track me down and point out that there's yet another bug in the test. With heycam's help, we have now fixed the test. Again. This presumably means Opera is now at 99/100... the race continues!
I have to say, by the way, that the relevant parts of the SVG spec are truly worthless. Where are the UA conformance criteria? You'd think a spec that was so verbose and detailed would actually tell you stuff, instead of just rambling on without actually saying what the requirements were...
Since we announced the Acid3 test a few short weeks ago, two major browser vendors have been tripping over themselves trying to pass the test, with both Opera and Safari getting very close in the last few hours.
Of course, with a test as complex as this one, I was bound to make some mistakes, which the browser vendors were very quick to tell me about. Sometimes they were mistaken, and the test was actually correct; other times I was wrong, and had to fix the test, and in one case, the spec changed.
Here's a quick overview of the more major changes I made to the test. Luckily, none of the errors were too serious.
Sub-pixel testing
It turns out that the original test accidentally required that browsers implement sub-pixel positioning and layout (and in fact the reference rendering got it wrong too, and relied on the same kind of rounding as Firefox does), which is somewhat dubious. I've changed the test to not rely on sub-pixel layout. However, it is very likely that this will be tested in Acid4, if we can get the specs to be clearer on this.
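The rounding issue at the heart of this is easy to demonstrate with made-up numbers (this is not the actual Acid3 geometry):

```javascript
// Three columns sharing 100 CSS pixels: rounding each column loses a
// pixel, while sub-pixel layout preserves the total. Numbers are
// illustrative only.
const w = 100 / 3;               // 33.333... px per column
console.log(Math.round(w) * 3);  // 99 — per-box rounding drops a pixel
console.log(Math.round(w * 3));  // 100 — sub-pixel layout keeps it intact
```

Browsers that round at different points in layout end up disagreeing by a pixel or so, which is exactly the kind of disagreement the revised test now avoids relying on.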
Surrogate pairs in SVG APIs
One of the submitted tests assumed that SVG APIs worked on Unicode codepoints, but the SVG spec changed to work on UTF-16 codepoints, like the rest of the DOM API, so the test was changed there. (The test changed a couple of times, because I originally got the fix wrong.)
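The distinction is easy to see in JavaScript itself, since DOM strings are sequences of UTF-16 code units and characters outside the Basic Multilingual Plane occupy two indices:

```javascript
// U+1D11E MUSICAL SYMBOL G CLEF is outside the BMP, so it is stored
// as a surrogate pair (two UTF-16 code units).
const s = "a𝄞b";
console.log(s.length);                       // 4 — UTF-16 code units
console.log([...s].length);                  // 3 — Unicode codepoints
console.log(s.charCodeAt(1).toString(16));   // "d834" — the high surrogate
console.log(s.codePointAt(1).toString(16));  // "1d11e" — the full codepoint
```

An API that counts codepoints and one that counts code units disagree on exactly these strings, which is why the spec change forced a change to the test.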
The click() method
The test originally assumed that the click() method was reentrant, but the specs were vague on this and someone suggested making it fail if calls to it were nested, so I removed this part of the test (the spec hasn't been updated yet). I replaced it with an attribute test (the new second part of subtest 64).
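The behaviour in question can be sketched without a real DOM: if an onclick handler calls click() again, does the nested call dispatch? A non-reentrant implementation simply ignores it. (This is a hypothetical model, not any browser's actual code.)

```javascript
// Hypothetical non-reentrant click(): a guard flag drops nested calls
// made from inside the handler.
function makeElement() {
  const el = { onclick: null, _clicking: false };
  el.click = function () {
    if (el._clicking) return;            // non-reentrant: drop nested calls
    el._clicking = true;
    try { if (el.onclick) el.onclick(); }
    finally { el._clicking = false; }
  };
  return el;
}

const el = makeElement();
let calls = 0;
el.onclick = () => { calls += 1; if (calls < 3) el.click(); };
el.click();
console.log(calls); // 1 — the nested click() was suppressed
```

A reentrant implementation would have reported 3 here; since the specs were vague about which behaviour is correct, the test stopped depending on either.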
The Performance Test
I made the loop counter in the performance test (a part of subtest 63) less complicated and shorter, to make it at least plausible that browsers could be fixed to pass that test quickly enough that it wouldn't always feel jerky. At the same time, I updated the test's infrastructure to report more details about pass and fail conditions and how long each subtest takes to run.
Namespace bug
Someone noticed that http://www.w3.org/1998/XML/namespace should have been http://www.w3.org/XML/1998/namespace in one of the subtests.
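The two strings are easy to transpose at a glance, which is presumably how the bug crept in; the correct constant (per Namespaces in XML) is:

```javascript
// The reserved namespace for the "xml" prefix; the path segments are
// easy to swap, which was exactly the typo in the subtest.
const XML_NAMESPACE = "http://www.w3.org/XML/1998/namespace";
const TRANSPOSED    = "http://www.w3.org/1998/XML/namespace"; // wrong
console.log(XML_NAMESPACE === TRANSPOSED); // false
```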
Linktest timeout
I made the linktest more resilient to slow network conditions. However, the test is still going to give you major issues if you are on a network with multi-second latency, or if the acidtests.org site is being slow.
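One common way to make a network check resilient to latency is to retry with a growing timeout rather than failing on the first slow response. This is a hedged sketch of that general technique; `fetchResource` is a stand-in, and none of this is the linktest's actual code.

```javascript
// Race a promise against a timer; clear the timer so it doesn't leak.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error("timeout")), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Retry with a growing budget: 1s, then 2s, then 3s (hypothetical values).
async function resilientCheck(fetchResource, attempts = 3, baseMs = 1000) {
  for (let i = 0; i < attempts; i++) {
    try {
      return await withTimeout(fetchResource(), baseMs * (i + 1));
    } catch (e) {
      if (i === attempts - 1) throw e;   // out of retries: report failure
    }
  }
}

resilientCheck(() => Promise.resolve("ok")).then(r => console.log(r)); // "ok"
```

Even with this kind of allowance, multi-second latency eventually exhausts any sensible budget, which is why the test still fails on very slow networks.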
When we released Acid2, the first browser passed it in about a fortnight. Acid3 is orders of magnitude more complicated. I really didn't expect to see passing browsers this side of August, let alone within a month. I am really impressed by the level of commitment to standards that Opera and the WebKit team (and Apple in particular) are showing here.
After baking it for several weeks, we have finally decided that Acid3 is stable enough to announce that it is ready. We'll be working on a guide and other commentary in the coming weeks and months, but it'll take a while: Acid3 is far more complex than Acid2 was.
I have to say straight up that I've been really impressed with the WebKit team. Even before the test was finished, they were actively following up every single bug the test showed. Safari 3 (the last released version) scores 39/100 with bad rendering errors. At the start of last month their nightly builds were scoring about 60/100 with serious rendering errors. Now, barely a month later, the nightly builds are already up to 87/100 and most of the rendering errors are fixed. That's a serious testament to their commitment to standards. (Also, I have to say, it was quite difficult to find standards-compliance bugs in WebKit to use in the test. I had to go the extra mile to get WebKit to score low! This was not the case with most of the other browsers.)
Speaking of standards, and of good news from browser development teams, Microsoft's IE team announced today that they were changing their mind about their mode switch, and that bug fixes they make to their rendering engine will be applied to pages in what HTML5 calls "no-quirks" mode (what has historically been known as "standards mode"). I'd like to congratulate the IE team on this brave decision. It's the right thing for the Web.