Precision is vanity – recall is sanity. But can you be sure you have found everything?

Almost without exception, enterprise search software vendors focus their marketing on the extent to which they can deliver search precision: “Our AI/ML technology can ensure that all the results you need are on the first page of results”. The assumption is that the search user is looking for a known document, but in my experience this is not a core use case for search unless the intranet, or a personal folder structure, is a complete mess. The challenge of finding a specific document is more often a quest to find the latest version of that document, which depends on there being an information management strategy that sets out guidelines for version management. If only! My response tends to be along the lines of “If your technology is that good, why are you offering filters and facets?”

A far more important challenge is finding all the relevant documents: 100% recall! The problem here is that if the required information cannot be found, the user does not know whether that is because it does not exist, or because the search technology and/or an absence of information management rules mean that it cannot be found. High recall performance is essential for what are termed professional searchers, such as lawyers, clinicians, recruitment teams and patent agents. Discovering that a patent has been challenged because your in-house team could not find the prior art makes for significant unhappiness!
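The precision/recall distinction can be made concrete with a small sketch. All the numbers below are invented for illustration; in a real enterprise collection the full set of relevant documents is, of course, exactly what you do not know.

```python
# Precision and recall for a single search query, given the set of
# retrieved document IDs and the (rarely knowable in practice) set of
# all truly relevant documents. Illustrative only.

def precision_recall(retrieved: set, relevant: set) -> tuple:
    """Return (precision, recall) for one query."""
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall

# A search returns 10 documents, 6 of which are relevant, but
# 20 relevant documents actually exist in the collection.
retrieved = set(range(1, 11))   # documents 1..10
relevant = set(range(5, 25))    # documents 5..24
p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.60 recall=0.30
```

The first page of results can look excellent (high precision) while most of the relevant material never surfaces at all (low recall), which is exactly the trap the vendors' marketing glosses over.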

This is especially the case with eDiscovery software, which tends to promise the levels of recall that enterprise search vendors claim for precision. Whether or not you are in the eDiscovery business, you should read a paper with the brilliant title of The eDiscovery Medicine Show, which looks in detail, from a somewhat cynical but well-informed perspective, at how recall performance is, and should be, measured. This is a discussion that has been going on since the days of the Cranfield experiments in the late 1950s – and that is not a misprint.
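One widely used way of estimating recall in eDiscovery (sometimes called an elusion test; this is a generic sketch, not necessarily the approach the paper discusses) is to review a random sample of the documents the search did not retrieve and project how many relevant documents were missed. All the figures below are invented:

```python
# Simplified recall estimation from a sample of unretrieved documents.
# Assumes the sample is a true random sample of the unretrieved set;
# a real exercise would also report a confidence interval.

def estimate_recall(retrieved_relevant: int,
                    unretrieved_count: int,
                    sample_relevant: int,
                    sample_size: int) -> float:
    """Estimate recall by projecting relevant documents missed."""
    # Project the sample's relevance rate across all unretrieved documents.
    missed_estimate = unretrieved_count * (sample_relevant / sample_size)
    return retrieved_relevant / (retrieved_relevant + missed_estimate)

# 800 relevant documents retrieved; 100,000 documents not retrieved;
# reviewers find 5 relevant documents in a random sample of 1,000.
recall = estimate_recall(800, 100_000, 5, 1_000)
print(f"estimated recall = {recall:.2f}")  # estimated recall = 0.62
```

Even a tiny relevance rate in the unretrieved pile translates into hundreds of missed documents in a large collection, which is why sampling discipline matters so much in these exercises.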

The size of enterprise collections is not often appreciated. CIOs know the amount of server and index space because they have to pay for it, but that is not the same as knowing the scale of the number of documents. (I’m using ‘document’ as a generic term for a content object.) I have worked for a number of clients with collections of the order of 500 million documents in multiple languages, and the idea that any search technology could deliver the most relevant documents on the first page of results is just laughable. If you want to get a sense of what it takes to manage very large collections of documents, take a look at how the Pandora Papers have been processed.

We have to face the fact that enterprise search is a wicked problem, and there are no easy answers. Even in smaller organisations it takes a dedicated team to optimise search performance, and every day there will be lessons to be learned from users and their managers. When did you last sit down with a range of your users? That is much easier to accomplish in a WFH situation than on-site. Or you could ask me to run a SearchCheck analysis.

Martin White