So how does search work?

by Martin White | Jan 15, 2017 | Intranets, Search

When I wrote the first edition of Enterprise Search in 2012, I provided only enough of an outline of the inner workings of search to support the recommendations I was making in the rest of the book. Comments from many readers encouraged me to write about what I would call the technology of search in more detail in the 2015 2nd edition. In many respects this was the most difficult section of the book to write, as I needed to work out how best to present the technology in a way that made sense to a business manager. The phrase ‘the technology of search’ is a misnomer, because it is actually about the mathematics of search as delivered in some quite sophisticated software programs, often using a blend of computational linguistics and applied probability theory. My contention has always been that a search team has to understand much more about how search works than the teams supporting any other enterprise application, as most of these are built around a relational database plus a lot of extras. If enterprise search is complicated, then the step up to text mining is quite substantial. The subject is covered in depth by ChengXiang Zhai and Sean Massung, and Deep Text by Tom Reamy is a very good introduction to the topic.

Over the last few months I have noted a number of blogs that provide very good descriptions of how search works. These include a series by Daniel Tunkelang published on the Query Understanding website and one by Maish Nishani on the Olasearch website. Patrick Lambe has written a very good 15-page summary of search technology in Behind the Curtain: Understanding the Search and Discovery Technology Stack on the Green Chameleon website. In addition, Charlie Hull and Udo Kruschwitz are in the process of writing a book on this topic, which should be available from Now Publishing in 2017. If you want a really deep dive into the technology, the recently published book Text Data Management and Analysis by Professor ChengXiang Zhai and Sean Massung runs to 500 pages, which gives you a sense of how much detail there really is!

The importance of understanding what is going on under the bonnet has been emphasised by two recent blog posts, one from Mark D. Anderson and another from Marcel Meth. Both posts illustrate that sorting out problems in search applications requires a significant amount of technical knowledge and a lot of patient detective work. First, you need to know when there is a problem, because search does not break; it just fails to return the results you were expecting, and that requires you to know what you were expecting. Second, you need to work out where in the complex series of search processes and sub-processes the solution to the problem may lie. One of the attractions of the Google Search Appliance (GSA) was that the underlying technology was behind a very robust security firewall, so there was no point in having technical expertise on the search team because there was nothing they could do except dust the server casing from time to time. That is now just history as the GSA begins to fade away, and replacing it is going to need more than rack space and a feather duster.

All the signs are that search, in its various forms, is gradually being recognised for the critical role it plays in enhancing business performance, especially in supporting decision making and identifying expertise and knowledge. Taking advantage of search does require an investment in understanding how it does what it does. That will also stand you in good stead for the arrival of artificial intelligence. I would argue that it is even more important to understand the principles of AI than of search, because you are going to need to trust the black box you have invested in. A good place to start is the 3rd edition of Artificial Intelligence: A Modern Approach by Stuart Russell and Peter Norvig. This is now available in paperback but runs to over 1000 pages. So start with search and be well prepared for an interesting future.

Martin White