Hixie's Natural Log

2023-01-27 23:58 UTC Deciding which bugs to fix

Software has an infinite number of bugs. How can we tell which ones to fix?

I propose that it makes the most sense to optimize for people-happiness per unit bug fixing time, maximizing how much our effort improves the product for our users.

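To put it in mathematical terms, we want to fix bugs with the highest N × ΔH / T, where N is the number of people affected by the bug, ΔH is how much happier each of those people becomes once it is fixed, and T is the time it takes us to fix it.

(These metrics are very hard to estimate. Don't worry too much about precision here.)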
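As a rough illustration, the ratio can be computed for a handful of candidate bugs and used to sort them. This is only a sketch: the `Bug` class, its field names, and every number below are invented for the example, and "happiness" is just an arbitrary unitless guess.

```python
from dataclasses import dataclass


@dataclass
class Bug:
    """Hypothetical record; every field is a rough estimate, not a measurement."""
    title: str
    people_affected: float  # N
    happiness_gain: float   # ΔH per person, in arbitrary units
    fix_hours: float        # T

    def score(self) -> float:
        # People-happiness per unit of bug-fixing time: N × ΔH / T.
        return self.people_affected * self.happiness_gain / self.fix_hours


def prioritize(bugs):
    """Return the bugs sorted from highest to lowest score."""
    return sorted(bugs, key=lambda bug: bug.score(), reverse=True)


# Made-up estimates, purely for illustration.
bugs = [
    Bug("typo in error message", people_affected=1000, happiness_gain=0.1, fix_hours=0.5),
    Bug("crash after error message", people_affected=1000, happiness_gain=5.0, fix_hours=8.0),
    Bug("glitch on a rare device", people_affected=20, happiness_gain=2.0, fix_hours=40.0),
]

for bug in prioritize(bugs):
    print(f"{bug.score():8.1f}  {bug.title}")
```

With these particular guesses the crash comes out ahead of the typo even though it takes sixteen times longer to fix, because the estimated happiness gain is so much larger. Different guesses would give a different order, which is why the estimates only need to be roughly right.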

Bugs that improve T for future bugs

The best bugs to fix are those that make us more productive in the future. Reducing test flakiness, reducing technical debt, increasing the number of team members who are able to review code confidently and well: this all makes future bugs easier to fix, which is a huge multiplier to our overall effectiveness and thus to developer happiness.

Bugs affecting more people are more valuable (maximize N)

We will make more people happier if we fix a bug experienced by more people.

One thing to be careful about is to think about the number of people we are ignoring in our metrics. For example, if we had a bug that prevented our product from working on Windows, we would have no Windows users, so the bug would affect nobody. However, fixing the bug would enable millions of developers to use our product, and that's the number that counts.

Bugs with greater impact on developers are more valuable (maximize ΔH)

A slight improvement to the user experience is less valuable than a greater improvement. For example, if our application, under certain conditions, shows a message with a typo, and then crashes because of an off-by-one error in the code, fixing the crash is a higher priority than fixing the typo.

Bugs that are easier to fix are more valuable (minimize T)

The less time we spend working on something, the more time we will have to work on other things. Naturally, therefore, all else being equal, easier bugs are more impactful than harder bugs because we can fix more of the easier bugs in the same time.

This can feel counterintuitive. Surely fixing hard things is more valuable? Well, no. Having impact is better, and all other things being equal, it's more impactful to fix two easy bugs than one hard bug.

Steps to reproduce make a bug more valuable

If a bug has steps to reproduce, we will have a much easier time fixing it. In general, we should focus on bugs like that rather than those where the first step will be determining what the problem even is, because in the time it would take us to figure out a problem, we could have fixed multiple issues where the problem was clear.

Again, we will make more users happier if we fix more bugs each affecting X people than if we fix fewer (but gnarlier) bugs each affecting X people.


A high-profile hard-to-reproduce bug may warrant the extra effort, because the number of people affected is high. We want to take into account the total impact of fixing the bug as well as the time it will take to fix it.

Deciding when to move on

Sometimes, T can turn out to be bigger than estimated. Something looks easy, but turns out to be hard. The right choice may be to dump all one has learnt into the tracking issue and move on to something that one can solve more quickly.

Deciding between tasks of equal merit

Sometimes, it's not easy to decide which of two or three or ten tasks should be prioritized. The icon button's splash radius is too large on a toolbar. Users can't tap on menu items that haven't appeared yet during a popup menu animation. The shadow on the toolbar doesn't quite extend to the far left of the screen. Which of these should we work on, if we only have the time to work on one? It can seem difficult to decide.

The key realization to solving this conundrum is both freeing and mildly unsettling: it doesn't matter. We can do whichever one we feel like.

It doesn't matter because they are (by definition) equally important, and (by definition) we can do only one. Whichever one we do, some people will be happier. Assuming that, across the project, we pick among these choices more or less randomly, we will avoid introducing any particular bias and the product as a whole will get better.

To put it another way: in either case, we are improving the product by the same people-happiness per unit bug fixing time. So the product gets better by the same amount.

This doesn't mean any one of these bugs or features is not important. It just means that they are equally important, and one won the lottery and got fixed.
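If it helps to see the "lottery" spelled out, here is a minimal sketch of picking uniformly at random among the tasks that tie for the top score. The function name, the task names, and the scores are all hypothetical. In practice the estimates are far too fuzzy for exact ties, so "roughly equal" is decided by judgment rather than by `==`.

```python
import random


def pick_task(scores, rng=random):
    """Pick uniformly at random among the tasks that share the top score.

    `scores` maps a task name to its estimated N × ΔH / T (hypothetical values).
    """
    best = max(scores.values())
    tied = [task for task, score in scores.items() if score == best]
    return rng.choice(tied)


scores = {
    "icon button splash radius too large": 3.0,
    "menu items not tappable during animation": 3.0,
    "toolbar shadow stops short of the edge": 3.0,
    "rename an internal helper": 0.5,
}

print(pick_task(scores))
```

Because every tied task is equally likely to be chosen, no one kind of bug is systematically favoured across many such decisions.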

2022-08-10 23:28 UTC Flutter: Static analysis of sample code snippets in API docs

One of the things I am particularly proud of with Flutter is the quality of our API documentation. With Flutter's web support, we're even able to inline full sample applications into the API docs and have them be editable and executable in place. For example, the docs for the AppBar widget have a diagram followed by some samples.

Here's a neat trick: I can even embed these samples into my blog.

These samples are just code in our repo, which has the advantage of meaning we run static analysis on them, and even have unit tests to make sure they actually work. (Side note: this means contributing samples is really easy and really impactful if you're looking for a way to get started with open source. It requires no more skill than just writing simple Flutter apps and tests, and people love sample code; it's hugely helpful. If you're interested, see our CONTRIBUTING.md.)

Anyway, sometimes a full application is overkill for sample code and instead we inline the sample code using ```dart markdown. For example, the documentation for ListTile has a bunch of samples, but nonetheless starts with a smaller-scale snippet to convey the relationship between Material and ListTile.

This leads to a difficulty, though. How can we ensure that these snippets are also valid code? This is not an academic question; it's very hard to write code correctly without a compiler and sample code is no exception. What if a typo sneaks in? What if we later change an API in some way that makes the sample code no longer valid?

We've had a variety of answers to this over the years, but as of today the answer is that we actually run the Dart static analyzer against _all_ our sample code, even the little snippets in ```dart blocks!

Our continuous integration (and precommit tests) read every Dart file, extracting any blocks of code in documentation. Each block is then examined using some heuristics to determine if it looks like an expression, a group of statements, a class or function, etc, and is embedded into a file in the temporary directory with suitable framing to make the code compile (if it's correct). We then call out to the Dart analyzer, and report the results.
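To give a flavour of the extraction step, here is a minimal sketch in Python. It is not the actual tool: the function name, the regular expression, and the shape of the result are assumptions made for illustration, and it only handles `///` doc comments.

```python
import re

FENCE = "`" * 3  # built this way so the example can itself sit inside a fenced block
DOC_LINE = re.compile(r"^(\s*///\s?)(.*)$")


def extract_snippets(source):
    """Return a list of snippets; each is a list of (line, column, text) tuples.

    `line` is the 1-based line in the original file and `column` is the
    1-based column at which the snippet text starts (after the `/// ` prefix).
    """
    snippets = []
    current = None  # lines collected so far, or None when outside a fence
    for number, line in enumerate(source.splitlines(), start=1):
        match = DOC_LINE.match(line)
        if match is None:
            current = None  # leaving the doc comment abandons any open fence
            continue
        prefix, text = match.groups()
        if current is None:
            if text.strip() == FENCE + "dart":
                current = []
        elif text.strip() == FENCE:
            snippets.append(current)
            current = None
        else:
            current.append((number, len(prefix) + 1, text))
    return snippets


example = "\n".join([
    "/// A list tile.",
    "///",
    "/// " + FENCE + "dart",
    "/// const ListTile(",
    "///   title: Text('Hello'),",
    "/// )",
    "/// " + FENCE,
    "class ListTile {}",
])

for snippet in extract_snippets(example):
    for number, column, text in snippet:
        print(number, column, text)
```

Keeping the original line and column alongside each extracted line is what later makes it possible to report problems against the comment rather than against a temporary file.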

To make it easier for us to understand the results, the tool that does this keeps track of the source of every line of code it puts in these temporary files, and then tweaks the analyzer's output so that instead of pointing to the temporary file, it points to the right line and column in the original source location (i.e. the comment). (It's kind of fun to see error messages point right at a comment and correctly find an error.)
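The bookkeeping described above might look something like the following sketch. The class names, the `translate` method, and the file path are hypothetical; the real tool may track this quite differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Origin:
    path: str    # original source file containing the comment
    line: int    # 1-based line in that file
    column: int  # 1-based column where the snippet text starts on that line


class GeneratedFile:
    """Accumulates lines for a temporary file and remembers where each came from."""

    def __init__(self):
        self.lines = []
        self.origins = []  # parallel to self.lines: (origin or None, indent)

    def add(self, text, origin=None, indent=0):
        # Framing lines (imports, wrappers) have no origin.
        self.lines.append(" " * indent + text)
        self.origins.append((origin, indent))

    def translate(self, line, column):
        """Map a 1-based (line, column) in the generated file back to the source."""
        origin, indent = self.origins[line - 1]
        if origin is None:
            return None  # the diagnostic points at our own framing code
        offset = max(column - 1 - indent, 0)
        return Origin(origin.path, origin.line, origin.column + offset)


generated = GeneratedFile()
generated.add("void snippet() {")
generated.add("print(greeting)", Origin("lib/example.dart", 42, 5), indent=2)
generated.add("}")

# Suppose the analyzer complains about line 2, column 9 of the temporary file.
print(generated.translate(2, 9))
```

The column arithmetic has to undo whatever indentation the framing added, which is why the indent is stored next to each origin.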

The code to do all this is pretty hacky. To make sure the code doesn't get compiled wrong (e.g. embedding a class declaration into a function because we think it's a statement), there's a whole bunch of regular expressions and other heuristics. If the sample code starts with `class` then we assume it's a top-level declaration, and stick it in a file after a bunch of imports. If the last line ends with a semicolon or if any line starts with a keyword like `if`, then we stick it into a function declaration.
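A stripped-down version of those heuristics could look like the sketch below. Only the `class` rule, the trailing-semicolon rule, and the `if` keyword come from the description above. The other keywords, the function names, and the exact framing text are guesses for illustration.

```python
import re

# Only `if` is named in the text above; the other keywords are assumptions.
STATEMENT_START = re.compile(r"^\s*(if|for|while|switch|return)\b")


def classify(snippet):
    """Guess whether a snippet is a declaration, statements, or an expression."""
    lines = [line for line in snippet.splitlines() if line.strip()]
    if not lines:
        return "empty"
    if lines[0].lstrip().startswith("class "):
        return "declaration"
    if lines[-1].rstrip().endswith(";") or any(STATEMENT_START.match(line) for line in lines):
        return "statements"
    return "expression"


def frame(snippet):
    """Wrap a snippet in just enough Dart for the analyzer to accept it."""
    kind = classify(snippet)
    header = "import 'package:flutter/material.dart';\n\n"
    if kind == "declaration":
        return header + snippet + "\n"
    indented = "\n".join("  " + line if line.strip() else line for line in snippet.splitlines())
    if kind == "statements":
        return header + "void snippet() {\n" + indented + "\n}\n"
    if kind == "expression":
        return header + "Object? snippet = (\n" + indented + "\n);\n"
    raise ValueError("nothing to analyze")


print(frame("Material(\n  child: ListTile(),\n)"))
```

The order of the checks matters: a class body usually contains lines ending in semicolons, so the `class` test has to come first or the declaration would be wrapped in a function.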

Some of the more elaborate code snippets chain together, so to make that work we support a load-bearing magical comment (hopefully those words strike fear in your heart) that indicates to the tool that it should embed the earlier example into this one. We also treat // ... as a magical comment: if the snippet contains such a comment, we tell the analyzer to ignore the non_abstract_class_inherits_abstract_member error, so that you don't have to implement every last member of an abstract class in a code snippet. We also have a special Flutter-specific magical comment that tells the tool to embed the snippet into a State subclass, so that you can write snippets with build() methods that call setState et al.

My favourite part of this is that to make it easier to just throw ignores into the mix without worrying too much about whether they're redundant, the tool injects // ignore_for_file: duplicate_ignore into every generated file.
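Here is a small sketch of how the ignores at the top of a generated file might be assembled. The function name is made up, and expressing the `// ...` rule as an extra `ignore_for_file` line is an assumption: the text above only says that the analyzer is told to ignore that error.

```python
def ignore_header(snippet):
    """Build the `ignore_for_file` lines for one generated file."""
    # Always present, so that redundant ignores elsewhere are harmless.
    ignores = ["duplicate_ignore"]
    # A `// ...` comment means "members elided", so missing overrides are fine.
    if any("// ..." in line for line in snippet.splitlines()):
        ignores.append("non_abstract_class_inherits_abstract_member")
    return "".join(f"// ignore_for_file: {name}\n" for name in ignores)


snippet = "class MyPainter extends CustomPainter {\n  // ...\n}"
print(ignore_header(snippet), end="")
```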

As might be expected, turning all this on found a bunch of errors. Some of these were trivial (e.g. an extra stray ) in an expression), some were amusing (e.g. the sample code for smallestButton() called the function rightmostButton() instead), and some were quite serious (e.g. it turns out the sample code for some of the localizations logic didn't compile at all, either because it was always wrong, or because it was written long ago and the API changed in an incompatible way without us updating the API docs).

2021-11-20 04:05 UTC Assertions

We're pretty aggressive about assertions in the Flutter framework.

There are several reasons for this.

The original reason was that when I wrote a bunch of this code, I had nowhere for it to run. I literally wrote the first few thousand(?) lines of framework code before we had the Dart compiler and Skia hooked up. Because of this, I had no way to test it, and so the only way I had to sanity check what I was doing was to be explicit about all the invariants I was assuming. Once we eventually ran that code, it was invaluable because it meant that many mistakes I'd made were immediately caught, rather than lingering in the core only for us to discover years later that some basic assumption of the whole system was just wrong.

Those asserts also ended up being useful when we extended the framework, because they helped catch mistakes we were making in new code. For example, the render object system has many assumptions about what order things are run in and what must be decided when. It's really helpful when creating a new RenderObject subclass for the system to flag any mistakes you might make in terms of violating those invariants.

Similarly, when Adam did the widgets layer rewrite, he made liberal use of asserts to verify invariants, basically to prove to ourselves that the new model was internally consistent.

We rely on these asserts in the tests, too. Many tests are just "if we put the lego bricks in this order, does anything throw an exception?". That only works because of the sheer number of asserts we have, verifying the internal logic at every step.

Anyway, because of how successful this was internally, we also started using them even more to catch mistakes in code that's _using_ the framework. Hopefully these are helpful. Nowadays we try to make the asserts have useful error messages when we think that they might report an error you'll run into.
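To make the idea concrete, here is a toy sketch of invariants written down as asserts with messages aimed at whoever trips them. It is in Python rather than Dart, and the `Box` class is entirely hypothetical; it is not Flutter's render object API.

```python
class Box:
    """Hypothetical render-object-like class; not Flutter's actual API."""

    def __init__(self):
        self._size = None
        self._needs_layout = True

    def layout(self, max_width, max_height):
        assert max_width >= 0 and max_height >= 0, (
            f"Constraints must be non-negative, but got {max_width}x{max_height}."
        )
        # Pretend the box wants to be 100x50 but respects its constraints.
        self._size = (min(100, max_width), min(50, max_height))
        self._needs_layout = False

    def paint(self):
        assert not self._needs_layout, (
            "paint() was called before layout(). "
            "Call layout() first so that the box has a size."
        )
        assert self._size is not None
        return f"box {self._size[0]}x{self._size[1]}"


box = Box()
box.layout(80, 200)
print(box.paint())
```

A test that simply builds a `Box` and calls `paint()` in the wrong order fails immediately with a message that says what to do instead, which is the "does anything throw?" style of test described earlier. Python skips `assert` statements when run with `-O`, so, as with any assert-based checking, these only help in builds where asserts are enabled.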

Should you be doing the same in your own code?

It's really a personal preference. Personally I find that documenting every assumption and invariant using asserts is hugely helpful to my own productivity. It's like writing tests: having comprehensive test coverage gives you much more confidence that things are correct. It also really helps when you do code refactors: if you change some class somewhere to work a different way, the asserts and tests for the rest of the code will tell you quickly if anything you did breaks some requirement of the overall system.

On the other hand, some people find that the benefits of tests and asserts do not make up for the amount of time spent writing them, debugging the issues they bring up, and so on. They'll point to the fact that sometimes asserts and tests are wrong, and you can spend hours debugging a failure only to find that the only error is in the test/assert logic itself. People who prefer not to have tests and asserts typically rely more on black-box testing or QA testing (running the app and testing it directly). There's certainly a strong argument to be made that testing the actual app is a better use of your time since that's what your customers will actually be using.

This was originally posted as a comment on reddit.

2021-06-07 07:37 UTC Extracts from a private Q&A retrospective about the WHATWG

Several years ago, a group involved in standardisation in an industrial field reached out to me to learn more about our experience with the WHATWG. I thought some of my responses might have broader interest, and saved them for publication at some later date, and then promptly forgot all about it. I came across my notes today and figured that today is later, so why not publish them now?

Other than some very minor edits for grammar and spelling, the questions and answers from that interaction are reproduced here verbatim.

What were the original objectives/goals and success metrics that underpinned the design of the WHATWG organisation (processes, systems and governance)?

At the start we really had very little in the way of governance. We were a group of like-minded individuals from different vendors, all concerned with the direction of the organisation that at the time presented itself as the venue for Web standards work. We created a public mailing list and a private mailing list. All work happened on the public mailing list (and in a public IRC channel, and later in a bug database as well). The private mailing list barely had any communication (for most of the time I was involved it averaged about one e-mail a year).

There was very little process: there was originally just one document, then three or four documents, that I edited, taking into account all input from the public and coming up with the best technical solution, disregarding political pressure. For much of the time I was active in the group, and certainly at the start, the reality was that there was one player in this space, Microsoft, with 99% of the user base, and the remaining players, mostly Apple, Mozilla, and Opera, later also Google, shared the remaining 1%. This grew over time, but slowly. (It's now Chromium that has the bulk of the market, and the dynamics are very different. I stopped participating a few years ago, while the numbers were much more mixed between multiple vendors. At the time, the dynamics were already changing, with everyone much more interested in competing and much less interested in cooperating, which is one of the reasons I eventually disengaged.)

Microsoft was invited to participate many times. These invitations were sincere. Microsoft never took us up on this offer while I was editor. They have since taken a more active role, though their position in the market has declined significantly (to <10% by some metrics) and this may be why. I have since then heard anecdotes that paint Microsoft's internal motivations at the time as being strongly anti-WHATWG (and anti-me specifically), which is consistent with their outward behaviour: a lot of what we saw could be interpreted (apparently accurately) as intentionally designed to waste time, sow confusion, or otherwise disrupt the work.

Over the years we added other documents, edited by other people, but in each case our process was basically to have one benevolent dictator for each document, whose job it was to make all the technical decisions. The private mailing list was theoretically empowered to depose any of the editors, in case they went "rogue", but they never did (get deposed, or go rogue). The private mailing list's membership was the people who were active right at the start, with one or two additions over the years, but by the time the WHATWG was really having serious influence on the Web, most of the people on the private mailing list were really not that closely involved any more, which meant they were more absent parents than active supervisors.

In practice, Microsoft's disruption efforts failed completely because there was nothing to really disrupt: the editors (including myself) were working with honest and genuinely objective intent, taking all feedback, examining it critically, and making technical decisions without any process. Sending lots of useless feedback would have been the most effective way to waste our time but that was not a technique they used. Instead they tried to play process games, to which we were largely immune: given the lack of process, there was nothing to game.

Since my departure the governance model has changed; the organization now has a legal entity and some contractual agreements, but I'm not familiar with the details.

At the start, our goals and success metrics were implicit; really, just to create specifications that advanced the development of the Web in a way that browser vendors were in agreement with (something the W3C was not doing).

Has the WHATWG approach and governance model implemented achieved all the outcomes/objectives desired?

While I was involved I would say it was remarkably successful. We developed an extremely detailed and precise specification that was orders of magnitude more useful to implementers than anything the W3C had done to date (I think, to be honest, that even to this day the W3C does not realize that this was the key difference between our approaches). We changed the way specifications are thought of in the Web space, going from these vague documents written in pay-to-play meetings to very precise technical documents that define all behaviour (including error handling, theretofore unheard of in this space, and quite controversial at the time), written in the open. We changed the default model of specifications from one where you would write the specification then set it in stone to one where specifications are living documents that are continually updated for decades. We didn't set out to do these things explicitly at the start (our earliest plans in fact set out clear milestones along the lines of the "set in stone" model), but they were natural outcomes of our intent to create technically precise documents as opposed to what I have previously characterised as "especially dry science fiction".

What is the volume of work for WHATWG resources that have official roles in processing requests, managing, executing work to make changes/updates? i.e. how many requests and changes are managed in a given regular period monthly/quarterly/yearly?

From 2006 (when I started using a version control system) to 2015 (when I stopped being an active editor) I made about 8,874 commits to the specification. Some were trivial typo fixes, some were giant new sections. That's an average of one commit every 10 hours or so. I don't know what the current rate is, but the team uses GitHub now so you can probably find out quite easily.

How many resources/people work full time on WHATWG?

At the time I was involved, it was 1 person, me, who worked full-time on it, with lots of people contributing their time. I've no idea what the current investment is. I imagine nobody is full-time on the WHATWG now but I could be wrong.

How is WHATWG funded?

At the time of my involvement, I was paid by Opera at first and then Google, as a member of their technical staff whose role was to work on the WHATWG mostly full-time (I had some other responsibilities at both companies, but they were a small fraction of my work).

I personally paid for the expenses of the WHATWG itself out of pocket. These amounted to very little, just Web hosting and the domain name registration.

I don't know how it's funded now. I don't pay for the hosting any more, but I'm not sure who does.

Are resources volunteers and/or paid by their parent companies to fulfil obligations/work for WHATWG?

Both. The WHATWG is set up to pay no attention to how someone is participating, because it has no impact on the technical value of their contributions.

I understand part of the decision making process for making a change to the specification is sufficient “implementer” support. I understand this means two or more browser engines? I assume the implementers are the 4 companies in the steering group?

I don't really understand the current governance style.

At the time I was involved, it was informal. The editor was responsible for making sure that what they specified would in due course match what all the browser vendors implemented. If they did this by writing amazingly compelling specifications that the browser vendors felt obligated to implement by sheer force of technical superiority, or whether they did this by specifying the lowest common denominator that they could get each vendor to individually commit to, or if they did it by the political means of convincing one vendor to publicly commit by privately telling them how another vendor had privately committed if the first vendor would publicly commit, etc, was a matter for the editor.

Personally I did all of the above. Sometimes things I specified turned out to be universally disliked and I (or my successors) ended up removing them. Sometimes things I specified were just describing what the browser vendors all already implemented as a matter of fact, and in those cases there was little for them to argue about.

In our industry we may have 100s of implementers of various sizes. How does the number of implementers scale the challenges we may be faced with? Our users are also industry/companies rather than individuals. I imagine we will have to assess the governance and decision rights with that in mind.

I have no idea to be honest. The WHATWG worked in part because of the esoteric and unique situation that was the Web space at the time. A small number of companies, creating products that were used by billions of people of which thousands were interested enough in the technical details to directly participate, but where weirdly very little money obviously changed hands.

(As a side note: the economics of Web browsers and Web standards are not obvious -- for example, why did Google pay me to do this work? Why did Google tell me I should ignore Google's own interests and just focus on what is technically right for the Web as a whole? The reasoning is amusing in its simplicity: if the Web gets better, then more people will browse the Web; if more people browse the Web, Google can show more ads; if Google shows more ads, more ads might get clicked on; if more ads get clicked on, Google makes more money. Nothing in this reasoning requires that the Web change specifically to help Google's direct interests. This freed me to be actually vendor neutral to an extent that few of my contemporaries truly believed, I think.)

What are the big lessons learned/areas not to overlook?

The biggest lesson I would say I learnt is that it can work. You can create an organization that is truly open, truly technically-driven, does not have really any process at all, yet creates technically-relevant high-quality documents that move an industry. You don't need a big staff, you don't need annual events, you don't need in-person meetings or voice-conference calls. You don't need to make decisions based on "consensus" or majority vote. You don't need to be pay-to-play. You can make decisions that are entirely based on technical soundness.

What would you do differently if you established WHATWG again and why?

One of the things that we did around 2007 (we started in 2004) was to agree to work with the W3C to develop HTML, instead of continuing our independent path. We maintained enough independence that we were able to mostly disengage after that effort went predictably nowhere, but it was definitely a distraction. I wish that we had had more confidence back then in our ability to just ignore the W3C. In retrospect it's more obvious to me now that the W3C really had nothing to give us. (At the time, some of us still viewed the W3C as being the logical place for this work to happen and we viewed the WHATWG as a temporary workaround until the W3C adapted to the new world, but they never really adapted. Even today, where the W3C actually redirects their Web site to the WHATWG for the specifications the WHATWG is working on, the W3C's own processes have not changed in a meaningful way to really fix the problems we saw in 2003 that led to the WHATWG's creation. They just gave up competing on some fronts.)

The other big thing that I wish we had done much earlier is establish a patent policy whereby each vendor would share their relevant patents. This is pretty common in various industries, but we did not pay it enough attention and it hurt our credibility for much longer than it should have. (In practice I believe this is mostly theatre, but in this case it's theatre that matters so we should have done it much earlier.)

One more thing I would do differently is have a much stronger code of conduct from the start, which doesn't just disallow bad behaviour but actively requires positive interactions. There's no excuse for being cranky on a mailing list. Being obtuse or just unpleasant is not necessary. We had many people over the years who would push right to the limit of what was acceptable, and I wish I had been much, much stricter, stepping in and removing participants as soon as they were even slightly obnoxious. I think we would have made much more rapid progress and grown a much bigger community much quicker if we had done that.

2021-01-14 19:55 UTC Ask for forgiveness, not permission

A colleague of mine asked me to explicitly put an LGTM on their design doc so that they could go ahead and implement it. The design doc was one I had previously reviewed and commented on, and had indicated that it seemed like a good idea, but I hadn't filled in the box saying that "my TL has said LGTM".

My answer: no. You don't need my permission.

Ask yourself: why do you want explicit permission? Is anyone asking you to get permission? What would happen if you just... did the thing?

Some people want LGTMs because that way they feel like if they make a mistake, they'll be covered. But that's flawed thinking in two ways. First of all, mistakes are fine. People make mistakes, we all make mistakes, mistakes are how we learn. If you're not making mistakes, then you're not taking enough risks to be successful. Secondly, even if making a mistake was bad, getting some people to sign off on something doesn't mean they are taking any more responsibility than if they didn't. You'd still be responsible for your decisions even if you got permission, and your leadership would still be responsible for your decisions if you didn't get permission.

I have a friend who used to work in Google Search on a tool called "Janitor". It was a tool that would garbage collect the results of processing our Web indexing: there are a lot of temporary files created in indexing the Web, and Janitor would go around deleting them when they weren't needed any more. He literally deleted petabytes of data regularly. One day, Larry Page was visiting his team and asked about this project. Larry asked, "how many files have you accidentally deleted?". My friend very proudly answered "I have never deleted a file that should not have been deleted! I have a 100% success record!".

Larry apparently responded "I think you should take more risks".

Some people want LGTMs because they feel that without them they aren't entitled to do their job. But... it's your job. That's why you were hired. You don't need additional permission to do your job. Your biweekly paycheck is all the permission you need.

Some people want LGTMs because they are not confident enough in their idea to execute it. Having leaders on the team put a stamp on their design doc gives them the confidence that the idea was good enough to execute. The thing is though, we won't know if it was a good idea or not until we try it. These stamps aren't saying "it's a good idea", they're saying "it's not an idea so terrible that I can predict its failure already based on my past experience"... and your leaders and team mates will tell you that something is a bad idea if they see it. That's why you ask for review. If they didn't arch their eyebrows and grimace when you explained your idea, then it's probably fine, and you don't need any more permission.

In conclusion: ask for forgiveness, not permission. Get reviews of your design docs, by all means. But don't wait for a stamp of approval to implement them.