Types of Queries in IR Systems
Different keywords are associated with the document set during the
process of indexing. These keywords generally consist of words, phrases, and
other characterizations of documents such as date created, author names, and
type of document. They are used by an IR system to build an inverted index (see
Section 27.5), which is then consulted during the search. The queries
formulated by users are compared to the set of index keywords. Most IR systems
also allow the use of Boolean and other operators to build a complex query. A
query language with these operators lets users express their information need
more precisely.
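As a rough illustration of the indexing step described above, the following sketch builds a toy inverted index that maps each keyword to the documents containing it. The function name build_inverted_index, the sample documents, and the whitespace tokenization are assumptions made for this example; a real IR system would also index phrases and metadata such as author and date.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lowercase term to the sorted list of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Assumption: terms are separated by whitespace; no stemming is applied.
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

# Hypothetical three-document collection.
docs = {
    1: "database concepts and design",
    2: "conceptual database design",
    3: "operating system concepts",
}
index = build_inverted_index(docs)
print(index["database"])  # [1, 2]
print(index["concepts"])  # [1, 3]
```

A query keyword is then answered by looking up its entry in the index rather than scanning every document.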
1. Keyword Queries
Keyword-based queries are the simplest and most commonly used forms of
IR queries: the user just enters keyword combinations to retrieve documents.
The query keyword terms are implicitly connected by a logical AND operator. A
query such as ‘database concepts’ retrieves documents that contain both the
words ‘database’ and ‘concepts’ at the top of the retrieved results. In
addition, most systems also retrieve documents that contain only ‘database’ or
only ‘concepts’ in their text, ranking them lower. Some systems remove the
most commonly occurring words (such as a, the, of, and so on, called
stopwords) as a preprocessing step before sending the filtered query keywords
to the IR engine. Most IR systems do not pay attention to the ordering of
these words in the query. All retrieval models provide support for keyword
queries.
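The behavior described above can be sketched as follows: stopwords are filtered out of the query, word order is ignored, and documents that contain more of the remaining keywords are listed first. The stopword list, the function name keyword_search, and the scoring rule (number of distinct matching keywords) are simplifying assumptions for illustration, not the ranking used by any particular system.

```python
STOPWORDS = {"a", "an", "the", "of", "and"}  # tiny illustrative list

def keyword_search(query, docs):
    """Return doc ids ranked by how many distinct query keywords they contain."""
    terms = {t for t in query.lower().split() if t not in STOPWORDS}
    scored = []
    for doc_id, text in docs.items():
        words = set(text.lower().split())
        matches = len(terms & words)
        if matches:
            scored.append((matches, doc_id))
    # More matched keywords first; ties broken by ascending doc id.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]

# Hypothetical collection.
docs = {
    1: "the concepts of a database",
    2: "database tuning",
    3: "concepts of operating systems",
    4: "compiler construction",
}
print(keyword_search("the database concepts", docs))  # [1, 2, 3]
print(keyword_search("concepts database", docs))      # [1, 2, 3]
```

Document 1 contains both keywords and is listed first; documents 2 and 3 each contain only one keyword and follow it.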
2. Boolean Queries
Some IR systems allow using the AND, OR, NOT, ( ), +, and – Boolean operators in combinations of keyword
formulations. AND requires that both terms be found. OR lets either term be found. NOT means any record containing the second
term will be excluded. ‘( )’ means the Boolean operators can be nested using
parentheses. ‘+’ is equivalent to AND,
requiring the term; the ‘+’ should be placed directly in front of the search
term. ‘–’ is equivalent to AND NOT and means to exclude the term; the ‘–’
should be placed directly in front of the search term not wanted. Complex
Boolean queries can be built out of these operators and their combinations, and
they are evaluated according to the classical rules of Boolean algebra. No
ranking is possible, because a document either satisfies such a query (is
“relevant”) or does not satisfy it (is “nonrelevant”). A document is retrieved
for a Boolean query if the query is logically true as an exact match in the
document. Users generally do not use combinations of these complex Boolean
operators, and IR systems support a restricted version of these set operators.
Boolean retrieval models can directly support different Boolean operator
implementations for these kinds of queries.
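Because each keyword maps to a set of documents, the Boolean operators above correspond to set operations. The sketch below assumes a small hand-built index of posting sets; the helper plus_minus_search is a hypothetical name, and it uses the ASCII hyphen '-' in place of the '–' character for the exclusion prefix.

```python
# Hypothetical posting sets: term -> ids of documents containing the term.
index = {
    "database": {1, 2, 4},
    "design": {2, 3},
    "concepts": {1, 3, 4},
}
all_docs = {1, 2, 3, 4}

# AND is set intersection, OR is union, NOT is set difference.
both = index["database"] & index["design"]            # database AND design
either = index["database"] | index["design"]          # database OR design
excluded = index["database"] - index["design"]        # database NOT design
nested = (index["database"] | index["design"]) - index["concepts"]

def plus_minus_search(query, index, all_docs):
    """Evaluate a query made of +term (required) and -term (excluded) tokens."""
    result = set(all_docs)
    for token in query.split():
        if token.startswith("+"):
            result &= index.get(token[1:], set())
        elif token.startswith("-"):
            result -= index.get(token[1:], set())
    return result

print(sorted(both))      # [2]
print(sorted(either))    # [1, 2, 3, 4]
print(sorted(excluded))  # [1, 4]
print(sorted(nested))    # [2]
print(sorted(plus_minus_search("+database -design", index, all_docs)))  # [1, 4]
```

Every result is a plain set: a document is either in it or not, which is why a pure Boolean query yields no ranking.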
3. Phrase Queries
When documents are represented using an inverted keyword index for
searching, the relative order of the terms in the document is lost. In order to
perform exact phrase retrieval, these phrases should be encoded in the inverted
index or implemented differently (with relative positions of word occurrences
in documents). A phrase query consists of a sequence of words that makes up a
phrase. The phrase is generally enclosed within double quotes. Each retrieved
document must contain at least one instance of the exact phrase. Phrase
searching is a more restricted and specific version of proximity searching
that we mention below. For example, a phrase searching query could be
“conceptual database design”. If phrases are indexed by the retrieval model,
any retrieval model can be used for these query types. A phrase thesaurus may
also be used in semantic models for fast dictionary searching for phrases.
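One way to keep the relative positions of word occurrences is a positional index, which stores the positions of each term within each document. The sketch below is a minimal version under that assumption; the function names, the sample documents, and the whitespace tokenization are illustrative choices only.

```python
from collections import defaultdict

def build_positional_index(docs):
    """Map term -> doc id -> list of word positions where the term occurs."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            index[term][doc_id].append(pos)
    return index

def phrase_search(phrase, index):
    """Return ids of documents that contain the words of phrase consecutively."""
    terms = phrase.lower().split()
    if not terms or any(t not in index for t in terms):
        return []
    # Candidate documents must contain every term of the phrase.
    candidates = set(index[terms[0]])
    for t in terms[1:]:
        candidates &= set(index[t])
    hits = []
    for doc_id in sorted(candidates):
        # The i-th term must occur exactly i positions after the first term.
        for start in index[terms[0]][doc_id]:
            if all(start + i in index[t][doc_id] for i, t in enumerate(terms)):
                hits.append(doc_id)
                break
    return hits

# Hypothetical collection: all three documents contain the same words.
docs = {
    1: "conceptual database design",
    2: "database design is conceptual",
    3: "design conceptual database",
}
index = build_positional_index(docs)
print(phrase_search("conceptual database design", index))  # [1]
print(phrase_search("database design", index))             # [1, 2]
```

A plain keyword index could not tell these three documents apart, since they share the words of the phrase; only the stored positions make the exact match possible.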
4. Proximity Queries
Proximity search refers to a search that accounts for how close within a
record multiple terms should be to each other. The most commonly used
proximity search option is a phrase search that requires terms to be in the
exact order. Other proximity operators can specify how close terms should be
to each other. Some will also specify the order of the search terms. Each
search engine can define proximity operators differently, and the search
engines use various operator names such as NEAR, ADJ (adjacent), or AFTER. In
some cases, a sequence of single words is given, together with a maximum
allowed distance between them. Vector space models that also maintain
information about positions and offsets of tokens (words) have robust
implementations for this query type. However, providing support for complex
proximity operators becomes computationally expensive because it requires the
time-consuming preprocessing of documents, and is thus suitable for smaller
document collections rather than for the Web.
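A proximity operator with a maximum allowed distance can be sketched over word positions as shown below. The function near, its max_distance and ordered parameters, and the sample sentence are assumptions for this example; as noted above, each search engine defines operators such as NEAR or AFTER in its own way.

```python
def positions(text, term):
    """Return the word positions at which term occurs in text."""
    return [i for i, w in enumerate(text.lower().split()) if w == term]

def near(text, a, b, max_distance, ordered=False):
    """True if a and b occur within max_distance words of each other.

    With ordered=True, a must also appear before b.
    """
    for pa in positions(text, a):
        for pb in positions(text, b):
            gap = pb - pa
            if ordered:
                if 0 < gap <= max_distance:
                    return True
            elif 0 < abs(gap) <= max_distance:
                return True
    return False

# Hypothetical document text.
text = "database systems support conceptual design"
print(near(text, "database", "design", 4))                # True
print(near(text, "database", "design", 3))                # False
print(near(text, "design", "database", 4, ordered=True))  # False
```

With max_distance set to 1 and ordered=True, the check reduces to an adjacency test for a two-word phrase. This version rescans the text on every call; a system serving many queries would precompute the positions, which is the preprocessing cost mentioned above.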
5. Wildcard Queries
Wildcard searching is generally meant to support regular expressions and
pattern matching-based searching in text. In IR systems, certain kinds of
wildcard search support may be implemented—usually words with any trailing
characters (for example, ‘data*’ would retrieve data, database, datapoint,
dataset, and so on). Providing support
for wildcard searches in IR systems involves preprocessing overhead and is not
considered worth the cost by many Web search engines today. Retrieval models do
not directly provide support for this query type.
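Trailing-wildcard matching can be sketched as a prefix lookup over a sorted vocabulary of index terms. Keeping the vocabulary sorted is one form of the preprocessing overhead mentioned above. The function name prefix_matches and the sample vocabulary are assumptions for this example, and only a single trailing ‘*’ is handled.

```python
from bisect import bisect_left

def prefix_matches(sorted_vocab, pattern):
    """Return all terms in sorted_vocab matching a pattern such as 'data*'."""
    if not pattern.endswith("*") or "*" in pattern[:-1]:
        raise ValueError("only a single trailing * is supported in this sketch")
    prefix = pattern[:-1]
    # Binary search for the first term that is >= the prefix.
    start = bisect_left(sorted_vocab, prefix)
    matches = []
    for term in sorted_vocab[start:]:
        if not term.startswith(prefix):
            break  # sorted order: no later term can match
        matches.append(term)
    return matches

# Hypothetical vocabulary of index terms, sorted once up front.
vocab = sorted(["data", "database", "datapoint", "dataset", "date", "design"])
print(prefix_matches(vocab, "data*"))
# ['data', 'database', 'datapoint', 'dataset']
```

Each matching term would then be looked up in the index and the results combined, so a single wildcard query can expand into many keyword lookups.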
6. Natural Language Queries
There are a few natural language search engines that aim to understand
the structure and meaning of queries written in natural language text,
generally as a question or narrative. This is an active area of research that
employs techniques like shallow semantic parsing of text, or query
reformulations based on natural language understanding. The system tries to
formulate answers for such queries from retrieved results. Some search systems
are starting to provide natural language interfaces to provide answers to
specific types of questions, such as definition and factoid questions, which
ask for definitions of technical terms or common facts that can be retrieved
from specialized databases. Such questions are usually easier to answer because
there are strong linguistic patterns giving clues to specific types of
sentences—for example, ‘defined as’ or ‘refers to’. Semantic models can
provide support for this query type.
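The linguistic patterns mentioned above can be illustrated with a simple regular expression that looks for ‘is defined as’ or ‘refers to’ in a sentence. This is only a toy sketch: the pattern, the function name extract_definition, and the sample sentences are assumptions made here, and real question-answering systems rely on far richer linguistic analysis.

```python
import re

# Matches "<term> is defined as <definition>" or "<term> refers to <definition>".
DEFINITION_PATTERN = re.compile(
    r"^(?P<term>.+?)\s+(?:is defined as|refers to)\s+(?P<definition>.+?)\.?$",
    re.IGNORECASE,
)

def extract_definition(sentence):
    """Return (term, definition) if the sentence looks like a definition, else None."""
    match = DEFINITION_PATTERN.match(sentence.strip())
    if match is None:
        return None
    return match.group("term"), match.group("definition")

# Hypothetical sentences from retrieved documents.
print(extract_definition(
    "Recall refers to the fraction of relevant documents that are retrieved."
))
# ('Recall', 'the fraction of relevant documents that are retrieved')
print(extract_definition("Most systems ignore word order."))  # None
```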