7 Comments

There probably isn't a lesson from your side of the problem other than that if you had waited long enough to work on it, it would have disappeared with the next release of V8. I have no idea how long that would be.

It is disappointing that the V8 team at Google was shipping without even a small set of unit tests for Math.abs(). Inputs [-1, 0, 1] would have caught the problem.
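To make that concrete, here is a minimal sketch of what such a check could look like. The helper name `checkAbs` and its shape are my own invention for illustration, not anything from V8's actual test suite, and it does not try to reproduce the real bug; it only shows how little code three inputs would take.

```javascript
// Hypothetical helper (not part of V8 or any real test suite):
// runs an abs implementation against the three inputs mentioned above
// and returns a list of mismatches. An empty list means every case passed.
function checkAbs(abs = Math.abs) {
  const cases = [
    [-1, 1],
    [0, 0],
    [1, 1],
  ];
  const failures = [];
  for (const [input, expected] of cases) {
    const actual = abs(input);
    // Object.is distinguishes +0 from -0 and treats NaN as equal to NaN,
    // which is stricter than === for numeric results.
    if (!Object.is(actual, expected)) {
      failures.push({ input, expected, actual });
    }
  }
  return failures;
}

// Usage: report any mismatches for the engine's built-in Math.abs.
const failures = checkAbs();
if (failures.length > 0) {
  console.error("Math.abs mismatches:", failures);
}
```

Whether a test this small would have exercised the code path that actually broke is an assumption on my part.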

Again, it's scary how much of our software is built on unverified and untested code. Binary searching across the dependencies for the source of the problem might have helped, but you had already established that the problem started with a specific Chrome release. That meant you could have looked at what had changed in the dependencies of that specific release, but in my experience the problem is much more likely to be in my code than in another team's code.

Not every single time, but starting by suspecting someone else rarely helps to find the problem.
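For what it's worth, the binary search I mentioned can be sketched roughly like this. The names `findFirstBad` and `isBad` are hypothetical, and the sketch assumes the versions are ordered oldest to newest, that every version after the first bad one is also bad, and that `isBad` gives the same answer every time it is asked. That last assumption is exactly what a non-deterministic bug breaks, so in practice `isBad` would probably need to repeat the reproduction several times.

```javascript
// Hypothetical sketch: find the index of the first "bad" version.
// Assumes `versions` is ordered oldest to newest and that once a version
// is bad, all later versions are bad too. Returns -1 if no version is bad.
function findFirstBad(versions, isBad) {
  let lo = 0;               // everything before lo is known to be good
  let hi = versions.length; // everything from hi onward is known to be bad
  while (lo < hi) {
    const mid = Math.floor((lo + hi) / 2);
    if (isBad(versions[mid])) {
      hi = mid;             // first bad version is at mid or earlier
    } else {
      lo = mid + 1;         // first bad version is after mid
    }
  }
  return lo < versions.length ? lo : -1;
}
```

With N candidate versions this takes about log2(N) reproductions instead of N, which matters when each reproduction is slow.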


The lesson is: hair pulling IS an integral part of this career.

One must expect to be super productive 80% of the time, then drop to 20% over some insane, unexplainable, embarrassing, energy-wasting, and blood-pressure-increasing bug.

Not giving up during this time means you have what it takes 😁


I really wonder how many people ran into that Math.abs() issue in that version of V8 and actually discovered it was Math.abs(). Definitely would have needed to resort to human compiling.


Always nice hearing how the dev side of the house handles troubleshooting in the trenches.

I don't think many people without years of experience in roles that touch directly on troubleshooting and debugging would really get the context, and the horror, of issues like this being non-deterministic.

It is a problem domain that is almost uncharacterizable, unpredictable, and unknowable in detail when you are working backwards from the symptoms, and often indistinguishable from random chaos without complete knowledge of the whole system, which Ops often didn't have.

There is an orders-of-magnitude difference in cost between deterministic and non-deterministic issues that just can't be conveyed well. That said, you did pretty well at conveying it in your post.


That was an excellent explanation. I love the way you think and write.


40 seconds was slow, I guess? :D I had to debug a memory leak in a kiosk application in the very early 2000s. It only ran on IE6. It took *over an hour* for us to even be able to notice the memory going up. And this was way before debugging tools were built into browsers. We had to use third-party, hacker-ish DOM inspectors to get anywhere.

It took us *weeks* to solve this one. And to this day we have never figured out why it happened. The fix was to take the button that our robot was constantly pressing and wrap it in a link. The memory leak went away. We have no idea why.

That's pretty much my only "when I was your age" geezer story.


I have never read such an exciting text about a bug. There was always the question in my head: "What will happen next?"

Nice! Thank you!
