Maximizing the success of A/B testing – book review
One of the benefits of web technology is that it is relatively easy to make design changes to a web site or intranet, both during development and even in production. The same is of course true of open source enterprise applications, such as e-commerce and enterprise search. In principle it seems so easy: measure the performance of Version A, make some changes, and then measure the performance of Version B. All you then have to do is compare and implement. Easy!
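To make the ‘compare’ step concrete, here is a minimal sketch in Python of the kind of significance test that sits behind even the simplest A/B comparison. The visitor and conversion counts are hypothetical, and the two-proportion z-test shown is a generic illustration rather than a method taken from the book.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates between variants A and B."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis of no difference
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_a, p_b, z, p_value

# Hypothetical counts: 10,000 visitors per variant, 520 vs 575 conversions
p_a, p_b, z, p = two_proportion_z_test(520, 10_000, 575, 10_000)
print(f"A: {p_a:.2%}  B: {p_b:.2%}  z = {z:.2f}  p = {p:.3f}")
```

Even with 10,000 visitors per variant, an apparent lift from 5.2% to 5.75% is not significant at the conventional 5% level, which is a first hint of why the subject merits a book-length treatment.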
Not according to a recently published book from Cambridge University Press. The full title is Trustworthy Online Controlled Experiments – A Practical Guide to A/B Testing, and the authors are Ron Kohavi, Diane Tang and Ya Xu. The very fact that the book runs to almost 300 pages is an initial indication that A/B testing is not as easy as many might think. The authors have extensive experience from working at Microsoft, Google and LinkedIn, and this experience is very visible throughout the book, coupled with references to around 300 research papers. The blend of perspectives among the authors, and between practice and research, is exemplary in all respects.
Part 1 of the book is a general introduction to testing, illustrated with some case studies from Google and Microsoft. Part 2 then goes more deeply into organizational metrics, metrics for experimentation and the overall evaluation criterion, institutional memory and meta-analysis, and finally offers a thoughtful chapter on ethics in controlled experiments.
In Part 3 the authors consider complementary techniques and observational causal studies. Part 4 goes into very considerable detail on building an experimentation platform, and the book concludes with a 60-page section on advanced topics for analysing experiments. One of the features of the book that intrigued me was the number of named laws it invokes, such as Simpson’s Paradox, Goodhart’s Law, Campbell’s Law, the Lucas Critique and Twyman’s Law.
The depth and clarity of the exposition on many quite complex issues is exceptional, and this is clearly a direct result of many years of experience and experimentation. However, the authors are not prescriptive in setting out a ‘best practice’ testing regime, instead guiding the reader through the decisions they need to make in developing a robust A/B testing programme. It is difficult to see how any other author is going to match this, and I would guess that this book is going to be the benchmark title for some years to come. There is an associated web site on which you can currently find a PDF version of Chapter 1.
I have just two criticisms of this book. The first is the way that four pages of ‘recommendations’ from the good and the great of the web design world are presented in the front of the book. They are unnecessary and look like a triumph of PR over good editorial judgement, and you can only see these recommendations once you have bought the book! The second is that I was unimpressed with the index, which offers just a long alphabetical list of topics under the headings of ‘experiments’ and ‘metrics’. These entries, and some others, are crying out for some clustering of the terms. I would expect better from Cambridge University Press on both counts.
Martin White