Glossary of Search Terms
Ensuring that a specified document always appears at the same point in a results set, or always appears on the first page of results
Access control list (ACL)
Defines access permissions at a user or group level (often based on Active Directory) to a specific repository, a set of documents, or a section of a document
The provision of a search user interface which prompts the user to enter additional terms to assist in retrieving results, often using Boolean operators.
The Apache Software Foundation provides support for a wide range of open source applications, including Lucene and Solr
A search application pre-installed on a server ready for insertion into a standard server rack
The presentation of related content items (often referred to as verticals) from a single index in a specific area of a page of search results
A set of technologies that enable machines to sense, comprehend, act and learn in a manner that seeks to emulate a human response to a situation
An automated process for creating a classification system (or taxonomy) from a collection of nominally related documents
An automated process for assigning metadata or index values to documents, usually in conjunction with an existing taxonomy
Average response time
An average of the time taken for the search engine to respond to a query, or the average end-to-end time of a query
Bidirectional Encoder Representations from Transformers (BERT) is a transformer-based machine learning technique for pre-training language models that improves performance on a wide range of natural language processing tasks.
Results that are selected to appear at the top of a list of results, providing a context for the other documents generated and ranked by the search application
BM25 (Best Match 25)
A ranking algorithm developed in the 1990s, of which there are now multiple variants. It has its origins in the tf.idf ranking function and is widely used as the basis for ranking in enterprise search applications
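The BM25 entry can be illustrated with a short Python sketch. It assumes documents are already tokenised into lists of words and uses the commonly quoted defaults k1 = 1.2 and b = 0.75 together with a non-negative IDF form; other variants differ in these details, and the function name `bm25_scores` is purely illustrative.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document (a list of tokens) against a list of query tokens.

    Uses the non-negative IDF variant log(1 + (N - n + 0.5) / (n + 0.5)),
    one of several BM25 variants in use.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: the number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            n = df[term]
            idf = math.log(1 + (n_docs - n + 0.5) / (n + 0.5))
            # k1 controls term-frequency saturation; b controls length normalisation.
            norm = tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(doc) / avg_len))
            score += idf * norm
        scores.append(score)
    return scores

docs = [
    "enterprise search ranking".split(),
    "search search search".split(),
    "records management".split(),
]
print(bm25_scores(["search"], docs))
```

Setting b to 0 turns off document-length normalisation, while a larger k1 lets repeated occurrences of a term keep adding to the score for longer before it saturates.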
A widely used approach to creating search queries; examples include AND, OR, and NOT (for example, information AND management)
A search query using Boolean operators
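A Boolean search can be sketched in Python with set operations over a toy index that maps each term to the set of documents containing it. The index contents are hypothetical; AND corresponds to set intersection, OR to union and NOT to set difference.

```python
# Hypothetical toy index: each term maps to the set of IDs of documents containing it.
index = {
    "information": {1, 2, 3},
    "management": {2, 3, 4},
    "records": {3, 5},
}
all_docs = {1, 2, 3, 4, 5}

def postings(term):
    """Documents containing the term (an empty set if the term is not indexed)."""
    return index.get(term, set())

and_hits = postings("information") & postings("management")  # AND: intersection
or_hits = postings("information") | postings("records")       # OR: union
not_hits = all_docs - postings("records")                     # NOT: complement
# information AND management NOT records
combined = and_hits - postings("records")

print(and_hits)   # {2, 3}
print(combined)   # {2}
```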
Changing search ranking parameters to ensure that certain documents or categories of documents appear higher in the results than the raw algorithm would suggest.
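A minimal sketch of category boosting, assuming each result arrives with a base relevance score and a category label. The `BOOSTS` table and its multipliers are hypothetical and would in practice be set by whoever manages the search application.

```python
# Hypothetical boost rules: multiply the base score of documents in a category.
BOOSTS = {"policy": 2.0, "archive": 0.5}

def apply_boosts(results):
    """results: list of (doc_id, base_score, category) tuples.

    Returns (doc_id, boosted_score) pairs re-ranked by boosted score.
    """
    boosted = [(doc_id, score * BOOSTS.get(category, 1.0))
               for doc_id, score, category in results]
    return sorted(boosted, key=lambda item: item[1], reverse=True)

results = [("d1", 3.0, "news"), ("d2", 2.0, "policy"), ("d3", 4.0, "archive")]
print(apply_boosts(results))
# [('d2', 4.0), ('d1', 3.0), ('d3', 2.0)]
```

Here the policy document overtakes a news item that the raw scores placed above it, and the archive item drops from first to last.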
A chatbot application is able to conduct a voice query against a search index in lieu of providing direct contact with (for example) a call-centre operator
The placing of boundaries around objects that share similarities (e.g., taxonomy)
A process employed to generate groupings of related words by identifying patterns in a document index
A description loosely applied by search vendors to applications using machine learning and AI techniques to determine the work context of the user and deliver personalized results
A group of objects methodically sorted and placed into a category
The use of computer-based statistical analysis of language to determine patterns and rules that aid semantic understanding
The process of determining concepts from text using linguistic analysis
A software application that enables a search application to index content in another application
An organized list of words, phrases, or some other set employed to identify and retrieve documents
Commercial off-the-shelf software
Conversational search applications respond to a spoken request or query with a spoken response. See also Chatbot.
A program used to index documents
A query in one language is translated into other indexed languages (often using a multi-lingual thesaurus) so that all documents relevant to the concept of the query are returned no matter what language is used for the content
Deep learning builds on machine learning principles but makes use of artificial neural networks to manage very large collections of data with real-time responses
A brief summary, often generated automatically, that provides a description of a document in the list of results
See also Key sentence
A structured sequence of text information, but often used as a generic description of any content item in an information-based application such as a content management system or enterprise search
The deconstruction of a document into a form that can be tokenised and indexed
A site where source documents or other content objects are stored, generally a folder or folders
See also Information source
A search conducted only across documents that a user has permission to access
See also Late binding
The automatic detection of defined items in a document, such as dates, times, locations, names, and acronyms
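A very small rule-based sketch of entity extraction, assuming only two entity types (ISO-format dates and all-capital acronyms) recognised with regular expressions. The patterns are illustrative; production extractors typically combine richer rules, lists of known names and trained models.

```python
import re

# Hypothetical patterns for two entity types.
PATTERNS = {
    "date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),   # e.g. 2024-03-31
    "acronym": re.compile(r"\b[A-Z]{2,}\b"),        # two or more capitals
}

def extract_entities(text):
    """Return a dict mapping each entity type to the matches found in text."""
    return {label: rx.findall(text) for label, rx in PATTERNS.items()}

print(extract_entities("The ACL review for the NHS intranet is due on 2024-03-31."))
# {'date': ['2024-03-31'], 'acronym': ['ACL', 'NHS']}
```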
Two or more words considered mutually inclusive in a search, often indicated by enclosing them in quotation marks, for example “United Nations”
In exploratory search the search goal is imprecise and open-ended and there is no unique single answer that meets the user’s information needs and no clear criterion on when to end the search.
Presentation of topic categories and content metadata, generated from the search index, on the search user interface to support the refinement of a search query as the process of query exploration proceeds
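Faceted refinement can be sketched as counting metadata values across the current result set and then filtering on a selected value. The `results` records and field names below are hypothetical.

```python
from collections import Counter

# Hypothetical result set: each hit carries metadata fields.
results = [
    {"title": "Search strategy", "format": "pdf", "year": 2021},
    {"title": "Search governance", "format": "docx", "year": 2021},
    {"title": "Index tuning", "format": "pdf", "year": 2019},
]

def facet_counts(hits, field):
    """Count how many hits have each value of a metadata field."""
    return Counter(hit[field] for hit in hits)

def refine(hits, field, value):
    """Apply a facet selection by keeping only hits with that field value."""
    return [hit for hit in hits if hit[field] == value]

print(facet_counts(results, "format"))
pdfs = refine(results, "format", "pdf")
print(facet_counts(pdfs, "year"))
```

Recomputing the counts after each selection is what lets the facet values on the interface track the narrowing result set.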
A quantity representing the percentage of all irrelevant documents in a collection that are retrieved by a search
A search carried out across multiple repositories, indexes and/or applications
A search that is limited to a specific field in a document (e.g., a title or date)
A function that offers specific criteria for search result selection that is independent of the query. For example, file format or publication date
The time period between a document being crawled and the index being updated so that a user will be able to find the document
A search allowing a degree of flexibility for generating hits (i.e., matches that are phonetically or typographically similar)
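One common way to implement fuzzy matching is Levenshtein edit distance: a term matches if it can be reached from the query term with at most a given number of single-character insertions, deletions or substitutions. The sketch assumes a threshold of one edit and a small hypothetical vocabulary; some engines use the Damerau variant, which also counts swapping two adjacent characters as a single edit.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def fuzzy_match(query, vocabulary, max_distance=1):
    """Return indexed terms within max_distance edits of the query term."""
    return [t for t in vocabulary if edit_distance(query, t) <= max_distance]

vocab = ["search", "research", "starch", "church"]
print(fuzzy_match("serch", vocab))  # ['search']
```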
A set of queries and documents, already marked as relevant by topic experts and representative of the content that will be searched on a regular basis, used to benchmark search performance
A search in which the system prompts the user for information that will refine the search results
A search result matching given criteria; sometimes used to denote the number of occurrences of a search term in a document
List containing data and/or metadata indicating the identity and location of a given file or document
A file that stores data in a format capable of retrieval by a search engine
The rate at which documents can be indexed, usually specified in documents per second or as a data volume per unit of time (e.g., GB per hour)
Inverse document frequency (IDF)
A measure of the rarity of a given term in a file or document collection
A list of the words contained within a set of documents that records which documents each word is present in, so that each entry acts as a pointer to those documents
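A minimal Python sketch of building such an index, assuming documents are split on whitespace and lower-cased with no further text processing. The document IDs and texts are hypothetical.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each word to the sorted list of IDs of documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}

docs = {
    1: "Enterprise search applications",
    2: "Search user interface",
    3: "Enterprise content management",
}
index = build_inverted_index(docs)
print(index["search"])      # [1, 2]
print(index["enterprise"])  # [1, 3]
```

A query term is then answered by a single dictionary lookup rather than a scan of every document.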
An index of every word, entity and associated metadata, created as the outcome of a crawl and structured in a way that facilitates the very fast retrieval of documents.
A brief statement that effectively summarizes a document, often employed to annotate search results
A word used in a query to search for documents
A search that compares an input word against an index and returns matching results
A knowledge graph is a representation of entities, their attributes, and the relationships between them
The indexing process identifies the language (or languages) of the content and assigns it to appropriate language-specific indexes
Access permission checking carried out immediately before the presentation of the document to the user
See also Early binding
Learning to rank (LTR)
Learning to Rank is a class of techniques that apply supervised machine learning to solve ranking problems by presenting a relative re-ordering of relevant items
A process that identifies the root form of words contained within a given document based on grammatical analysis (e.g., run from running)
See also Stemming
An analysis that reduces text to a set of discrete words, sentences, and paragraphs
The study of the structure, use, and development of language
The classification of a set of words into grammatical classes, such as nouns or verbs
A feature of text-based search in which a large number of low-use queries form a long tail, making it difficult to optimize results for any individual query. The distribution of query frequencies is an example of a Zipf curve.
Machine learning is a method of data analysis that automates analytical model building.
An HTML element located within the head of a web page that provides additional or referential data about the page that is not displayed on the page itself
Data that supplements and/or clarifies the index terms generated from the text of the document, for example the date of publication, the author or specific controlled terms.
The analysis of the structure of language
Natural language processing
A process that identifies content by using grammatical and semantic rules to understand the intent of a sequence of words in a specified language
Natural language query
A search input entered using conventional language (e.g., a sentence)
Neural ranking models for information retrieval (IR) use shallow or deep neural networks to rank search results in response to a query.
A search that adheres to predefined attributes present within a given data source
The process of analyzing text to determine its semantic structure
A type of matching that recognizes naturally occurring patterns (word usage, frequency of use, etc.) within a document
The extraction of linguistic concepts, generally phrases, from a given document
The proportion of the documents returned in a given search that are relevant
A term applied to groups of professionals (for example, lawyers and patent agents) who spend a significant proportion of their time using search applications, often in situations where high levels of recall are required.
A search whose results are returned based on the proximity of given words (e.g., ‘pressure’ within four words of ‘testing’)
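A proximity check can be sketched using word positions. This version treats "within n words" as a difference of at most n positions, in either order; engines differ on this convention, so treat it as an assumption.

```python
def within_n_words(tokens, first, second, n):
    """True if `first` and `second` occur within n positions of each other."""
    pos_a = [i for i, t in enumerate(tokens) if t == first]
    pos_b = [i for i, t in enumerate(tokens) if t == second]
    return any(abs(i - j) <= n for i in pos_a for j in pos_b)

tokens = "the pressure vessel passed hydraulic testing last week".split()
print(within_n_words(tokens, "pressure", "testing", 4))  # True
print(within_n_words(tokens, "pressure", "testing", 3))  # False
```

An engine would normally read these positions from a positional index rather than rescanning the text.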
Query by example
A search in which a previously returned result is used to obtain similar results
The process of analyzing the semantic structure of a query prior to processing in order to improve search performance
Search applications calculate a relevance score for each content item and return results in decreasing order of relevance
A percentage representing the number of relevant results returned by a query relative to the total number of relevant results within an index
The value that a user places on a specific document or item of information. Both precision and recall are defined in terms of relevance.
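Precision and recall can both be computed from the set of retrieved documents and the set of documents judged relevant (for example, from a gold-standard test collection). The document IDs below are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Precision: share of retrieved documents that are relevant.
    Recall: share of relevant documents that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# 4 results returned, 3 of them relevant, out of 6 relevant documents in the index.
p, r = precision_recall(retrieved=["d1", "d2", "d3", "d4"],
                        relevant=["d1", "d2", "d3", "d7", "d8", "d9"])
print(p, r)  # 0.75 0.5
```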
The documents or data that are returned from a search
The terms used within a search query. Sometimes incorrectly referred to as ‘keywords’
An analysis based upon grammatical or syntactical constraints that attempts to decipher information contained in a document
The use of natural language processing, computational linguistics, and text analytics to identify and extract subjective information in documents
The time a user spends between entering a query term, reviewing the results and closing down the application.
The text that is presented to give a concise representation of the content of a search result sufficient for a user to assess its relevance to their query. It may be generated by the author of the document, extracted from text associated with a specific index term or derived algorithmically from the text of the document
A search in which users receive results that are phonetically similar to their query
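Soundex is one well-known phonetic algorithm that could support this kind of search; others, such as Metaphone, are also used, and a given product may use neither. The sketch below implements American Soundex, which codes a word as its first letter plus three digits so that similar-sounding names share a code.

```python
def soundex(word):
    """American Soundex: first letter plus three digits, e.g. Robert -> R163."""
    codes = {}
    for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6")):
        for ch in letters:
            codes[ch] = digit
    letters_only = [ch for ch in word.lower() if ch.isalpha()]
    if not letters_only:
        return ""
    result = letters_only[0].upper()
    prev = codes.get(letters_only[0], "")
    for ch in letters_only[1:]:
        digit = codes.get(ch, "")
        if digit and digit != prev:
            result += digit
            if len(result) == 4:
                break
        # Vowels reset `prev`, so a repeated code after a vowel is kept again;
        # h and w are transparent and leave `prev` unchanged.
        if ch not in "hw":
            prev = digit
    return result.ljust(4, "0")

names = ["Smith", "Smyth", "Jones"]
print([n for n in names if soundex(n) == soundex("Smith")])  # ['Smith', 'Smyth']
```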
An automated process that presents documents to a data extraction or parsing engine by following links on web pages
See also Crawler
A process based on a set of heuristic rules that identifies the root form of words contained within a given document (e.g., run from running)
See also Lemmatization
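A heuristic stemmer can be sketched with a handful of suffix-stripping rules. The rule set below is hypothetical and deliberately tiny; established stemmers such as the Porter algorithm apply many more rules in ordered steps, and unlike lemmatization they do not rely on grammatical analysis.

```python
# Hypothetical, deliberately tiny rule set, tried in this order.
SUFFIXES = ("ing", "ed", "s")

def stem(word):
    """Strip one common suffix using simple heuristic rules."""
    word = word.lower()
    for suffix in SUFFIXES:
        if suffix == "s" and word.endswith("ss"):
            continue  # leave words such as "class" alone
        # Only strip when at least three letters would remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            # Undo consonant doubling ("runn" -> "run"), except after
            # l, s or z, as in "call" or "buzz".
            if word[-1] == word[-2] and word[-1] not in "aeioulsz":
                word = word[:-1]
            break
    return word

print([stem(w) for w in ["running", "searched", "documents", "class"]])
# ['run', 'search', 'document', 'class']
```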
Words that are deemed to have no value in an index
See also Word exclusion
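Stop-word removal amounts to filtering tokens against a list. The short list below is illustrative only; real lists are language-specific and longer, and removing stop words can make a phrase such as "to be or not to be" unsearchable.

```python
# Illustrative stop list only.
STOP_WORDS = {"a", "an", "the", "of", "and", "in", "to"}

def remove_stop_words(tokens):
    """Drop tokens whose lower-cased form is on the stop list."""
    return [t for t in tokens if t.lower() not in STOP_WORDS]

print(remove_stop_words("The management of an enterprise search application".split()))
# ['management', 'enterprise', 'search', 'application']
```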
The point in a search query session where the user decides that time and effort spent in examining further results is not going to result in additional relevant results
Data that can be represented according to specific descriptive parameters—for example, rows and columns in a relational database, or hierarchical nodes in an XML document or fragment
An automated process for producing a short summary of a document and presenting it in the list of results
Automatically expanding a search by adding synonyms of the query terms derived from a thesaurus
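Synonym expansion can be sketched as rewriting each query term into an OR group of the term and its synonyms, with the groups combined using AND. The `THESAURUS` mapping is hypothetical.

```python
# Hypothetical thesaurus: term -> list of synonyms.
THESAURUS = {
    "car": ["automobile", "vehicle"],
    "film": ["movie"],
}

def expand_query(terms, thesaurus=THESAURUS):
    """Rewrite each term as (term OR synonym ...) and AND the groups together."""
    groups = []
    for term in terms:
        alternatives = [term] + [s for s in thesaurus.get(term, []) if s != term]
        if len(alternatives) > 1:
            groups.append("(" + " OR ".join(alternatives) + ")")
        else:
            groups.append(term)
    return " AND ".join(groups)

print(expand_query(["car", "insurance"]))
# (car OR automobile OR vehicle) AND insurance
```

Grouping the synonyms keeps the original intent of the query: a document about vehicles but not insurance still fails to match.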
An analysis capable of associating a word with its respective part of speech by determining its context in a given statement
With respect to search, the broad categorization of objects (typically a tree structure of classifications for a given set of objects) in order to make them easier to retrieve and possibly sort
A quantity representing how often a term appears in a document
The “term frequency.inverse document frequency” formulation gives a score that is proportional to the number of times a word appears in a document, offset by the frequency of the word across the collection of documents.
See also BM25
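A minimal tf.idf sketch, assuming raw term counts for tf and idf = log(N / df), where N is the number of documents and df the number containing the term. Many weighting variants exist (log-scaled tf, smoothed idf), and the function name is illustrative.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document (each a list of tokens)."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: count * math.log(n_docs / df[t])
                        for t, count in tf.items()})
    return weights

docs = [
    "search engine search index".split(),
    "search user interface".split(),
    "content management".split(),
]
w = tf_idf(docs)
print(w[0])
```

In the first document, "index" outweighs "search" even though "search" occurs twice, because "search" also appears in another document; a term present in every document would receive a weight of zero.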
A collection of words in a cross-reference system that refers to multiple taxonomies and provides a meta-classification, thereby facilitating document retrieval
An HTML rendition of a page from a document, displayed in response to a user action (often a mouse roll-over), that gives the user additional information about the potential relevance of the result.
The process of identifying the elements of a sentence, such as phrases, words, abbreviations, and symbols, prior to the creation of an index
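A simple tokenizer can be sketched with a regular expression. The pattern below is an assumption chosen for illustration: it keeps dotted abbreviations such as "U.S." together, keeps internal apostrophes and hyphens, and treats other punctuation as a separator.

```python
import re

# Hypothetical pattern: dotted abbreviations, or words with internal ' or -.
TOKEN_PATTERN = re.compile(r"(?:[A-Za-z]\.){2,}|\w+(?:['-]\w+)*")

def tokenize(text):
    """Split text into tokens according to TOKEN_PATTERN."""
    return TOKEN_PATTERN.findall(text)

print(tokenize("The U.S. e-commerce team's search isn't working."))
# ['The', 'U.S.', 'e-commerce', "team's", 'search', "isn't", 'working']
```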
Removal of a prefix or suffix
Information that is without document or data structure (i.e., cannot be effectively decomposed into constituent elements or chunks for atomic storage and management)
A model that enables documents to be ranked for relevance against a query by comparing an algebraic expression of a set of documents with that of the query
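The vector space model can be sketched by representing the query and each document as term-count vectors and ranking documents by the cosine of the angle between them. Raw counts are used here for brevity; in practice the vectors usually carry tf.idf weights. The documents are hypothetical.

```python
import math
from collections import Counter

def cosine_similarity(a, b):
    """Cosine of the angle between the term-count vectors of token lists a and b."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

query = "enterprise search".split()
docs = {
    "d1": "enterprise search strategy".split(),
    "d2": "search engine optimization".split(),
    "d3": "records management".split(),
}
ranked = sorted(docs, key=lambda d: cosine_similarity(query, docs[d]), reverse=True)
print(ranked)  # ['d1', 'd2', 'd3']
```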
The process of boosting index terms in specific areas of a document (for example the title) or on specific topics
A notation, generally an asterisk or question mark, that when used in a query, represents all possible characters (e.g., a search for boo* would return book, boom, boot, etc.)
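Wildcard matching against indexed terms can be sketched by translating the pattern into a regular expression, assuming * stands for any run of characters and ? for exactly one character. The vocabulary is hypothetical.

```python
import re

def wildcard_to_regex(pattern):
    """Translate a wildcard pattern: * matches any run of characters
    (including none) and ? matches exactly one character."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # treat everything else literally
    return re.compile("".join(parts))

def wildcard_search(pattern, vocabulary):
    """Return indexed terms that match the whole pattern."""
    regex = wildcard_to_regex(pattern)
    return [term for term in vocabulary if regex.fullmatch(term)]

vocab = ["book", "boom", "boot", "booking", "bob", "taboo"]
print(wildcard_search("boo*", vocab))  # ['book', 'boom', 'boot', 'booking']
print(wildcard_search("boo?", vocab))  # ['book', 'boom', 'boot']
```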
A list containing words that will not be indexed; this usually comprises words that are excessively common (e.g., a, an, the). See also Stop List
eXplainable AI is a set of machine learning techniques that produce more explainable models while maintaining a high level of learning performance and enable humans to understand, appropriately trust, and effectively use AI applications.