Federated enterprise search – ten questions for search vendors and one for the search manager

Implementing federated search is a seriously challenging project. It is not until every repository is on-boarded and there has been six months of user operation that you will be confident that you have a satisfactory search experience. That is probably 18 months after the initial approval to explore the options. You do not want to be in the position, in 18 months' time, of having to explain to stakeholders that it has not worked out as the vendor promised you and as you promised them. Hardly a week goes by without a (relatively) new entrant to the enterprise search business talking about the amazing federated-search experience they offer across all your enterprise repositories. This is not a new phenomenon. The same claims were made by the Enterprise Information Portal vendors in the early 2000s.

Below are 10 questions (there are more!) to ask a prospective vendor of a federated search solution. Asking the questions is the easy bit. The challenge is understanding the responses, as in many cases a semblance of federated search can be delivered, but only with compromises on scalability and extensibility. The decision you have to make is whether those compromises are acceptable for your current requirements and will not create barriers to future business or information expansion.

Before starting to work through these questions, there is a question you have to ask yourself. What is the business case for investing in a federated search application? Is it based on a detailed review of the analytics of your current search applications and an in-depth analysis of user requirements across the organisation? Or is it based on an unsubstantiated belief that every employee should, as a matter of principle, have access to all the information in the organisation that they have permission to use, or perhaps on a senior stakeholder who wants the organisation’s search to be like Google?

I covered the two different approaches to federated search in a CMSWire column published in April 2019. These are commonly referred to as index time federation, which involves the creation of a single index, and query time federation, in which the query is passed to each search application and the results are then integrated. Both have advantages and disadvantages, and in reality it is not uncommon to have a hybrid situation, with a substantial central index alongside some specialized applications.

Q1 Are you offering index time federation, query time federation or a hybrid approach?
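A minimal Python sketch of query time federation: the same query is sent to each repository in parallel, and the per-repository result lists are collected for a later merging step. The connector functions (intranet, crm) and the (repository, id, title) result tuple are hypothetical stand-ins, not any vendor's API. With index time federation this fan-out step disappears, because all the content has already been crawled into one index.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Hypothetical result shape: (repository name, document id, title).
Result = tuple[str, str, str]
# Hypothetical connector: takes the query text, returns that repository's results.
Connector = Callable[[str], list[Result]]


def federated_query(query: str, connectors: dict[str, Connector]) -> dict[str, list[Result]]:
    """Send the same query to every repository in parallel (query time federation)."""
    with ThreadPoolExecutor(max_workers=max(1, len(connectors))) as pool:
        futures = {name: pool.submit(search, query) for name, search in connectors.items()}
        # Keep each repository's list separate; merging and ranking are a later step.
        return {name: future.result() for name, future in futures.items()}


# Two stand-in repositories, for illustration only.
def intranet(query: str) -> list[Result]:
    return [("intranet", "i1", f"Intranet page about {query}")]


def crm(query: str) -> list[Result]:
    return [("crm", "c7", f"Customer record mentioning {query}")]


results = federated_query("pricing", {"intranet": intranet, "crm": crm})
print(results)
```

In a real deployment each connector would also need a timeout, because the slowest repository sets the response time for the whole results page.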

One way or another, the application will deliver a list of results. An important feature of enterprise search is that content can be security-trimmed so that employees only see the content that is appropriate to their role and responsibilities. The options are early binding, in which the permissions are captured in the index when the content is indexed; late binding, in which the permissions are checked against the source repository when the results are returned; or a hybrid solution. Of these, late binding is computationally expensive. Checking results from multiple repositories at query time is going to create latency challenges and, in cloud applications, additional processing costs.

Q2 Can you describe in detail how you enable us to manage security trimming across multiple applications and repositories while keeping the latency of result presentation at an acceptable level?
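The difference between early and late binding can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the Doc class, the group names and the check_permission callback are hypothetical, and the callback stands in for whatever permission interface each source repository exposes. Early binding is a cheap filter on permissions stored in the index but can be out of date. Late binding is current but costs one check per result, per repository.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Doc:
    doc_id: str
    # Early binding: the groups allowed to see the document, copied into the index at crawl time.
    indexed_groups: set[str] = field(default_factory=set)


def early_binding(docs: list[Doc], user_groups: set[str]) -> list[Doc]:
    """Filter on permissions stored in the index: fast, but only as current as the last crawl."""
    return [d for d in docs if d.indexed_groups & user_groups]


def late_binding(docs: list[Doc], user_groups: set[str],
                 check_permission: Callable[[str, set[str]], bool]) -> list[Doc]:
    """Ask the source repository about every result at query time: current, but one check per result."""
    return [d for d in docs if check_permission(d.doc_id, user_groups)]


# Illustrative data: d2 was open to all staff when crawled but has since been restricted to HR.
docs = [Doc("d1", {"finance"}), Doc("d2", {"all-staff"}), Doc("d3", {"hr"})]
live_permissions = {"d1": {"finance"}, "d2": {"hr"}, "d3": {"hr"}}
calls: list[str] = []


def check_permission(doc_id: str, user_groups: set[str]) -> bool:
    calls.append(doc_id)  # each call stands in for a round trip to the source system
    return bool(live_permissions[doc_id] & user_groups)


user = {"all-staff"}
print([d.doc_id for d in early_binding(docs, user)])                   # ['d2']: stale permission still grants access
print([d.doc_id for d in late_binding(docs, user, check_permission)])  # []: current permission denies it
print(len(calls))                                                      # 3 checks for 3 results
```

One hybrid approach is to use early binding to cut the results list down quickly and then apply late binding only to the results actually displayed.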

Then there is the question of whether the federated search offers multi-lingual search or cross-language search. In multi-lingual search the user has the option of searching in different languages, using queries in the language of the repository being searched. Cross-language search takes the query, translates it (as best it can!) into each of the relevant languages and then returns results in those languages.

Q3 Are you offering multi-lingual or cross-language search, and in which languages?
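The cross-language step can be sketched in Python as building one query per repository, in that repository's language. This assumes queries are entered in English, and the TERM_TABLES dictionary is a hypothetical stand-in for a real machine translation service; the repository names are illustrative only.

```python
# Hypothetical term tables standing in for a real machine translation service.
# Assumes queries are entered in English.
TERM_TABLES = {
    "de": {"contract": "vertrag", "renewal": "verlängerung"},
    "fr": {"contract": "contrat", "renewal": "renouvellement"},
}


def translate_query(query: str, target_lang: str) -> str:
    """Word-by-word lookup; words missing from the table pass through untranslated."""
    table = TERM_TABLES.get(target_lang, {})
    return " ".join(table.get(word, word) for word in query.lower().split())


def cross_language_queries(query: str, repo_languages: dict[str, str]) -> dict[str, str]:
    """Build one query per repository, in that repository's language."""
    return {
        repo: query if lang == "en" else translate_query(query, lang)
        for repo, lang in repo_languages.items()
    }


queries = cross_language_queries(
    "Contract renewal",
    {"uk-intranet": "en", "de-dms": "de", "fr-share": "fr"},
)
print(queries)
```

Word-by-word substitution is the crudest possible approach, but it shows why cross-language results depend so heavily on translation quality, and why it is worth asking which languages are genuinely supported.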

How are the results from the query going to be presented in the user interface? The options are tabbed, interleaved, panels, side-bar or sequential by repository. Each has benefits and challenges, and there is no definitive approach. An associated challenge is how many results are presented from each repository. Most applications present 10 results on the first page, but if you are searching across a dozen repositories then you might (at first sight) get only one result from each repository. Is that good enough for you?

Q4 What UI display options do you offer, and can the user change the view after the query has been run?
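The "one result per repository" arithmetic is easy to see in a sketch of the interleaved option. The Python below is an illustration under stated assumptions, not a description of any product: it uses simple round-robin interleaving over twelve hypothetical repositories, which ignores relevance entirely.

```python
from itertools import chain, zip_longest

_GAP = object()  # marks the slot where a repository has run out of results


def interleave(per_repo: dict[str, list[str]], page_size: int = 10) -> list[str]:
    """Round-robin merge: the top result from each repository, then the second from each, and so on."""
    rounds = zip_longest(*per_repo.values(), fillvalue=_GAP)
    merged = [hit for hit in chain.from_iterable(rounds) if hit is not _GAP]
    return merged[:page_size]


# Twelve repositories with three hits each, named only for illustration.
per_repo = {f"repo{i:02d}": [f"repo{i:02d}-hit{j}" for j in range(1, 4)] for i in range(1, 13)}
first_page = interleave(per_repo)
print(first_page)  # top hit from repo01 to repo10 only; repo11 and repo12 miss page one
```

With a page size of 10 and twelve repositories, two repositories contribute nothing to the first page, however good their top results are.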

If there are multiple applications and repositories, the source of the result may be important. I have been involved in a number of federated search projects where it was not possible to identify the source of a result and therefore to take a view on how much credibility to place in that source.

Q5 Is the source of the result visible to the user?

It is in the nature of things in an enterprise that the same content (especially intranet content) is duplicated in multiple repositories. The result can be several entries in the results list for the same content, which can be quite frustrating. Duplicates often end up being given different rankings, something that is not easy to explain to users.

Q6 Can you de-dup identical results from the repositories being searched?
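A minimal Python sketch of de-duplicating identical results: each result's text is normalised (case and whitespace), hashed, and only the first, highest-ranked copy is kept. The (repository, text) input shape and the sample data are hypothetical. Exact-match fingerprinting only catches copies that are identical after normalisation; copies with different headers, footers or versions need near-duplicate detection (for example shingling), which is beyond this sketch.

```python
import hashlib
import re


def fingerprint(text: str) -> str:
    """Hash the text after normalising case and whitespace, so trivially different copies match."""
    normalised = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def dedupe(results: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Keep the first (highest-ranked) copy of each document; input is (repository, text) in rank order."""
    seen: set[str] = set()
    unique = []
    for repo, text in results:
        key = fingerprint(text)
        if key not in seen:
            seen.add(key)
            unique.append((repo, text))
    return unique


ranked = [
    ("intranet", "Travel policy for all staff"),
    ("sharepoint", "travel  policy for ALL staff "),
    ("dms", "Expenses claim procedure"),
]
print(dedupe(ranked))
```

Keeping the highest-ranked copy is only one possible policy; an organisation might prefer to keep the copy from the repository it regards as authoritative.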

There is no doubt that metadata supported by a taxonomy is very effective in enhancing search performance. This metadata is often presented as filters or facets. If the metadata schemas differ from one repository to another, how will these metadata values be presented and used? And of course they take up space on the UI (go back to Q4).

Q7 How can your application manage search across a number of different metadata schemas (and perhaps in multiple languages) and present the metadata values in a way that enables the user to filter and facet the results set?
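One way to present a single facet over different metadata schemas is to map each repository's field name onto a shared facet name before counting values. The Python below is a sketch under stated assumptions: FIELD_MAP, the field names and the sample results are all hypothetical.

```python
from collections import Counter

# Hypothetical mappings from each repository's own field name to one shared facet name.
FIELD_MAP = {
    "sharepoint": {"Dept": "department"},
    "dms": {"owning_unit": "department"},
    "crm": {"BusinessArea": "department"},
}


def facet_counts(results: list[tuple[str, dict[str, str]]], facet: str) -> Counter:
    """Count values of one shared facet across repositories that name the field differently."""
    counts: Counter = Counter()
    for repo, metadata in results:
        for local_name, shared_name in FIELD_MAP.get(repo, {}).items():
            if shared_name == facet and local_name in metadata:
                counts[metadata[local_name]] += 1
    return counts


results = [
    ("sharepoint", {"Dept": "Finance"}),
    ("dms", {"owning_unit": "Finance"}),
    ("crm", {"BusinessArea": "Sales"}),
    ("dms", {"author": "J. Smith"}),  # no department field, so it is not counted
]
print(facet_counts(results, "department"))
```

This only maps field names. If the values themselves differ ("Finance" in one repository, "FIN" in another), they also have to be mapped, which is one reason metadata supported by a taxonomy matters so much in federated search.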

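Then we come to the most challenging question of all. How will the results be ranked if the approach is not to create a master integrated index? Each search application calculates relevance against its own content, so the scores returned by one application cannot simply be compared with those from another. A further complication arises when the federated search is working across databases, where conventional TF.IDF ranking does not work effectively or predictably.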

Q8 Can the search user be absolutely certain that they are seeing the most relevant (based on the search query) results from across all the applications and repositories?
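One widely used way of merging ranked lists whose scores cannot be compared is reciprocal rank fusion, which gives each document the sum of 1/(k + rank) over every list it appears in, using rank positions only. It is named here as one illustrative technique, not as the method any particular vendor uses. The repository lists are hypothetical, and k = 60 is the value commonly used in the literature.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists by summing 1 / (k + rank) for each document; raw scores are ignored."""
    fused: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document id so the order is repeatable.
    return sorted(fused, key=lambda doc_id: (-fused[doc_id], doc_id))


intranet = ["a", "b", "c"]
dms = ["b", "d"]
wiki = ["d", "b", "e"]
print(reciprocal_rank_fusion([intranet, dms, wiki]))  # ['b', 'd', 'a', 'c', 'e']
```

Because only positions are used, a repository's weak top result earns the same credit as another repository's strong one. This gives a defensible merged order, not the certainty that Q8 asks about.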

Little enough attention is usually paid to search analytics when only a single repository is involved. As you add more repositories and applications, there is probably a power law that relates the complexity to the number of repositories. Yet the optimization of a federated search application is even more important than that of a single-repository application, as there are so many more points of potential failure.

Q9 Can you demonstrate the options and reports that you offer in your analytics package?

Finally, there is Question 10.

Q10 Can we talk to a couple of your clients who have the same information management architecture as our own and gain their assessment of the workload involved in implementing your solution and the quality of the search experience?

As with all enterprise search applications, start with what the user interface has to offer users so that they can take full advantage of the information resources of the organisation. Then work backwards to define the technology needed to deliver that UI.

Martin White