Hixie's Natural Log: The CSS2.1 test suite: Day 1

2003-05-01 16:28 UTC The CSS2.1 test suite: Day 1

(Well, it's not really day 1, but it's the first day I made any real progress, so I'm going to use artistic license and call it day 1. To be honest, it's not even really a day; I spent about two hours on it, tops...)

Today I worked on converting the CSS1 Test Suite into the format we've decided upon for the CSS2.1 Test Suite. This is a distinctly non-trivial job. To start with, I decided to just do overall changes, converting all the tests into roughly the right shape before looking at any of them in detail.

The first step of this plan was extracting the stylesheet and test content from the 106 original test files. Getting the stylesheet is easy enough, but in most test files the content appears twice, usually preceded by some comments and another copy of the stylesheet. I ended up using a simple (albeit long) Perl regular expression to do this:

$page =~ m~  <title>(.+)</title>
             .+
             <style\ type="text/css">(.+)</style>
             .+
         (?: <pre>.+</pre> .*? <hr> \s*
           | <body> )
             (.+?)                                                                \n*
             <table\ border\ cellspacing="0"\ cellpadding="3"\ class="tabletest"> \s*
             <tr>                                                                 \s*
             <td\ colspan="2"\ bgcolor="silver">                                  \s*
             <strong>TABLE\ Testing\ Section</strong></td>                        \s*
             </tr>                                                                \s*
             <tr>                                                                 \s*
             <td\ bgcolor="silver">&nbsp;</td>                                    \s*
             <td>                                                                 \s*
             (.+?)                                                                \n*
             </td>                                                                \s*
             </tr>                                                                \s*
             </table>                                                             \s*
             </body>                                                              \s*
             </html>                                                              \s* $
          ~xios or die "file is not in the expected format\n";
my $title = $1;
my $stylesheet = $2;
my $content1 = $3;
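To show why the pattern above writes every literal space as `\ ` and captures content with non-greedy `(.+?)`, here is a minimal sketch of the same technique run against a made-up, much shorter page. The sample markup and the cut-down pattern are hypothetical, not taken from the CSS1 suite; the sketch uses the same `x`, `i` and `s` flags as the original (the `o` flag, which only affects how interpolated patterns are compiled, is left out).

```perl
use strict;
use warnings;

# Hypothetical miniature test page: NOT one of the real CSS1 test files.
my $page = <<'HTML';
<HTML><HEAD><TITLE>Sample Test</TITLE>
<STYLE TYPE="text/css">P {color: green;}</STYLE>
</HEAD><BODY>
<P>This should be green.</P>
<TABLE><TR><TD>
<P>This should be green.</P>
</TD></TR></TABLE>
</BODY></HTML>
HTML

# /x: unescaped whitespace in the pattern is ignored, so a literal
#     space in the markup must be written as "\ ".
# /i: tag names match regardless of case (<TITLE> vs <title>).
# /s: "." also matches newlines, so captures can span lines.
$page =~ m~ <title>(.+?)</title>
            .+
            <style\ type="text/css">(.+?)</style>
            .+?
            <body> \s*
            (.+?) \s*        # first copy of the content
            <table>
          ~xis or die "file is not in the expected format\n";

my ($title, $stylesheet, $content) = ($1, $2, $3);
print "title:      $title\n";
print "stylesheet: $stylesheet\n";
print "content:    $content\n";
```

The non-greedy `(.+?)` stops at the first `<table>`, which is what keeps the second copy of the content (inside the table) out of the capture.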

I did originally try to use HTML::TokeParser to parse the HTML files and read out the information I needed that way, but a combination of HTML::TokeParser's non-compliant behaviour, and the somewhat unpredictable nature of the source files, made that more trouble than it was worth and I quickly dropped that plan in favour of the regular expression above.

There were only two files that didn't match this regular expression. The first appears to have gotten corrupted at some point. The second is Todd Fahrner's famous Box Acid Test. Those of you who have been in the CSS community for a long time (in relative Internet terms, that is: I'm talking 1998 here) will almost certainly remember this test. It is probably the single most successful test that the CSS community has ever seen. When Todd wrote it, pretty much every user agent implementor raced to fix their rendering engine to nail the test, and now it is the one page on the Web that you can almost guarantee all UAs will render identically, to the pixel.

The next step in converting the CSS1 test suite to the new format is turning the HTML 4.0 Transitional content into XHTML 1.1 content. I made a brief attempt at doing this by hand, then decided to not reinvent the wheel, and instead opened a two-way pipe to HTML Tidy in "output body only" mode:

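use IPC::Open2;
# Run HTML Tidy as a filter (tidy.cfg presumably enables body-only output).
my $pid = open2(*TidyOut, *TidyIn, '/home/ianh/bin/tidy/bin/tidy -i -n -asxhtml -q -config tidy.cfg');
# Wrap the fragment in a minimal HTML 4.01 document so Tidy can parse it.
print TidyIn "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01//EN\"><title></title>$content1";
close(TidyIn);
my $content = do { local $/; <TidyOut> }; # read everything in at once, not one line at a time
close(TidyOut);
waitpid($pid, 0);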

Amusingly, the code I've described so far ended up catching a bunch of errors in the CSS1 tests.

The next step, and the last one my script performs, is to write the stylesheet and content out to new files, ready to be turned into actual CSS2.1-format tests. Turning them into tests will entail splitting the longer tests into multiple files and making the tests quicker to use.

That can wait for another day, though.
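For the file-writing step described above, a minimal sketch might look like the following. The helper `write_test_files`, the `out/` directory and the `<basename>.css` / `<basename>.xht` naming scheme are all hypothetical; the post doesn't say what names or layout the script actually uses.

```perl
use strict;
use warnings;
use File::Path qw(make_path);
use File::Spec;

# Hypothetical helper: writes one extracted test out as a stylesheet file
# and a content file. The directory and file-name scheme are assumptions.
sub write_test_files {
    my ($dir, $basename, $stylesheet, $content) = @_;
    make_path($dir) unless -d $dir;
    my %files = (
        "$basename.css" => $stylesheet,
        "$basename.xht" => $content,
    );
    for my $name (sort keys %files) {
        my $path = File::Spec->catfile($dir, $name);
        open(my $fh, '>:encoding(UTF-8)', $path)
            or die "cannot write $path: $!\n";
        print {$fh} $files{$name};
        close($fh) or die "cannot close $path: $!\n";
    }
    return;
}

# Hypothetical example values standing in for $stylesheet and $content
# produced by the extraction and Tidy steps.
write_test_files('out', 'sec5-3', "p { color: green; }\n",
                 "<p>This should be green.</p>\n");
```

Keeping the stylesheet and content in separate files leaves each piece easy to inspect or split further before it is assembled into a finished test.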