Are ‘relevant documents’ all we need to make excellent decisions?
Every month I add 10-15 academic papers to my personal digital library. Most relate to some aspect of enterprise search (a collection that already tops 1100 items!), but there are also many on information management, collaboration and virtual teams, my other main research interests. Most I scan and file away with a descriptive title and a marker that indicates whether I have read them in depth. From time to time I come across a paper that breaks new ground in a research area, and I download and read it immediately. One recent paper in this category has the title ‘Do better search engines really equate to better clinical decisions? If not, why not?’ Since one of the common propositions about search is that it supports better decision making, this title immediately caught my attention.
The paper has been published online in the Journal of the Association for Information Science and Technology but has not yet appeared in one of the monthly issues. In it, Anton van der Vegt and Guido Zuccon of the University of Queensland, together with Bevan Koopman (CSIRO), question some of the core beliefs of search performance measurement. Historically, search performance has been measured through an IR systems approach, of which the TREC programs are a good example. In this approach the relevance assessments are not made by the users of the system, and for some years there has been concern about the extent to which these tests translate into a better search experience for users.
In the project described in this paper, 109 clinicians and final-year medical students were asked to answer clinical questions using two different search applications. One was a very basic BM25 model and the other a much more sophisticated, state-of-the-art application. The two applications were also assessed with a traditional offline batch test. The same test collection was used in all three cases.
To my surprise, the research showed no significant difference between the results obtained with the two search applications. Another outcome of the project was that half of the clinical questions were answered incorrectly, which in clinical medicine is not exactly ideal! When the research team considered the relative contribution of search engine effectiveness to overall end-task success, they found that the ability to interpret the documents correctly was a much more important factor. The analysis showed that searchers could find and view the same relevant documents and yet come to different conclusions, some correct and some not.
To quote from the paper's conclusions:
“For medical search, this study confirms that providing clinicians with a ranked list of relevant documents is insufficient. To help clinicians to correctly answer more of their questions the IR system may need to help them to interpret information, potentially from within documents and across them. … The notion of relevance, also, may need to encompass subdocument and cross-document assessment of interpretability”
As with all research papers, it is very important not to generalize from the outcomes of a single study (take a look at ‘cold fusion’ as a warning!), especially in the clinical domain. I should add that this summary inevitably skates over the considerable detail and statistical validation in the paper.
For me, the initial takeaways are that search performance on its own does not guarantee better decisions from documents judged to be relevant, and that there is considerable scope for better support in interpreting those documents, perhaps through text analytics. It could be that two documents that are not entirely relevant, considered together, would lead the searcher to a better decision. Hopefully other research teams will take up this line of research in other subject areas.
In a sentence, it is not about information technology or information retrieval, but about information management.