Relevant Search – Doug Turnbull and John Berryman
The user requirement for a successful search is very easy to state. Users want the items that are most relevant to their query to appear on the first page (or at worst the first two pages!) of results. Meeting this requirement is a far greater challenge than users and search managers imagine. The very fact that Relevant Search, written by Doug Turnbull and John Berryman, runs to over 330 pages gives an immediate illustration of the scope and scale of relevance management. I often use the metaphor of looking at an automobile engine. In principle we all know how the engine works, but when it doesn’t work to perfection all we can do is look at the collection of modules and wires and wonder just what we have to do to restore the performance. That’s when an engineer with plug-in diagnostic equipment is essential. They not only can spot the problem but also know the systems well enough to sort it out.
The reason for presenting this metaphor is that the authors have written this book for relevance engineers. This to me is a new job profile but one that I can immediately relate to. The book presents all that a relevance engineer requires to understand how to go about improving relevancy, and this requires a good knowledge of information retrieval principles and also of how these principles are best translated into software code. I should state up front that the examples in the book show code from the open source Elasticsearch or Solr software, but that should not be seen as limiting the book to open source implementations. Indeed, seeing the code will help the reader understand what is going on in any enterprise search application. After all, SharePoint 2010/2013 uses the same BM25 ranking model that is now in Lucene v6.
The eleven chapters in the book cover topics including debugging a relevance problem, understanding the role of tokens, basic multi-field search, term-centric search, shaping the relevance function, providing relevance feedback, designing a relevance-focused search application, the relevance-centered enterprise and advanced search techniques. There is no other book that I know of that manages to integrate both information retrieval and search management so successfully, with just enough IR fundamentals to show the origin of a relevance problem and the basis for a solution which can be expressed in code. I especially value the way in which the examples are based on a ‘real’ collection of information, the Movie Database. Since we all have a familiarity with movies, this for me makes the book come alive.
The quality of the content is not matched by the quality of the publishing format. This review is based on the e-book version. Although there is a list of sub-headings in the PDF version, the lack of an index makes it almost impossible to dip into the book to find an explanation of a feature or a solution to a problem. The writing style is very conversational, but this results in a lot of words with apostrophes, often where they are not needed. Overall the copy editing is patchy.
I cannot recommend this book strongly enough. It is certainly not just for ‘developers’. Search managers, and of course relevance engineers, need to appreciate the fundamentals of search technology and good practice in relevance management even if they are working with commercial applications. Students on computer and information science courses will also find it of great value and hopefully be inspired to follow a career in relevance engineering. All I missed was a consideration of relevance management in federated search implementations, but I’m sure that the authors are saving this for the next edition.
Martin White