Standard Boolean model
The Boolean model of information retrieval (BIR)^{[1]} is a classical information retrieval (IR) model and, at the same time, the first and most widely adopted one. It is used by many IR systems to this day.^{[citation needed]}
Definitions
The BIR is based on Boolean logic and classical set theory in that both the documents to be searched and the user's query are conceived as sets of terms. Retrieval is based on whether or not the documents contain the query terms. Given a finite set
 T = {t1, t2, ..., tj, ..., tm}
of elements called index terms (e.g. words or expressions, possibly stemmed, that describe or characterise documents, such as keywords given for a journal article), and a finite set
 D = {D1, ..., Di, ..., Dn}, where Di is an element of the powerset of T
of elements called documents, a query Q is a Boolean expression in normal form:
 Q = (Wi OR Wk OR ...) AND ... AND (Wj OR Ws OR ...),
 where each W is either a term (Wi=ti, Wk=tk, Wj=tj, Ws=ts) or a negated term (Wi=NOT ti, Wk=NOT tk, Wj=NOT tj, Ws=NOT ts).
Here ti means that the term ti is present in a given document, whereas NOT ti means that it is not.
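Equivalently, Q can be given in disjunctive normal form. An operation called retrieval, consisting of two steps, is defined as follows:
 1. For each Wj, the set Sj of documents that contain term tj (if Wj=tj) or do not contain it (if Wj=NOT tj) is obtained:

 Sj = {Di | tj is an element of Di}      if Wj = tj
 Sj = {Di | tj is not an element of Di}  if Wj = NOT tj
 2. The documents retrieved in response to Q are the result of the corresponding set operations. For Q in disjunctive normal form, the answer is the union, over the conjunctions, of the intersections of their sets:

 UNION ( INTERSECTION Sj)
 For Q in the conjunctive form shown above, the roles of the two operations are exchanged, giving INTERSECTION ( UNION Sj).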
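The two retrieval steps can be sketched in Python for a query written in the conjunctive form of Q above (an AND of OR-clauses). This is only an illustrative sketch: the names `literal_set` and `retrieve`, the `(term, positive)` encoding of a literal, and the toy documents are assumptions made for the example, not part of the model's definition.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# A literal Wj is encoded as (term, positive); positive=False stands for "NOT term".
Literal = Tuple[str, bool]
# A query is a list of clauses (joined by AND); each clause is a list of literals (joined by OR).
Query = List[List[Literal]]


def literal_set(docs: Dict[str, FrozenSet[str]], literal: Literal) -> Set[str]:
    """Step 1: the set Sj of document ids that satisfy a single literal Wj."""
    term, positive = literal
    return {doc_id for doc_id, terms in docs.items() if (term in terms) == positive}


def retrieve(docs: Dict[str, FrozenSet[str]], query: Query) -> Set[str]:
    """Step 2: intersect, over all clauses, the union of the sets Sj of each clause."""
    result = set(docs)  # all documents: the neutral element for intersection
    for clause in query:
        clause_docs: Set[str] = set()
        for literal in clause:
            clause_docs |= literal_set(docs, literal)
        result &= clause_docs
    return result


# Toy collection (hypothetical terms a, b, c).
docs = {
    "D1": frozenset({"a", "b"}),
    "D2": frozenset({"b", "c"}),
    "D3": frozenset({"c"}),
}
# Q = (a OR c) AND (NOT b)
query = [[("a", True), ("c", True)], [("b", False)]]
print(sorted(retrieve(docs, query)))  # ['D3']
```

Note that, with this encoding, an empty query matches every document, because the intersection starts from the whole collection.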
Example
Let the set of original (real) documents be, for example
O = {O1, O2, O3}
where
O1 = Bayes' Principle: The principle that, in estimating a parameter, one should initially assume that each possible value has equal probability (a uniform prior distribution).
O2 = Bayesian Decision Theory: A mathematical theory of decisionmaking which presumes utility and probability functions, and according to which the act to be chosen is the Bayes act, i.e. the one with highest subjective expected utility. If one had unlimited time and calculating power with which to make every decision, this procedure would be the best way to make any decision.
O3 = Bayesian Epistemology: A philosophical theory which holds that the epistemic status of a proposition (i.e. how well proven or well established it is) is best measured by a probability and that the proper way to revise this probability is given by Bayesian conditionalisation or similar procedures. A Bayesian epistemologist would use probability to define, and explore the relationship between, concepts such as epistemic status, support or explanatory power.
Let the set T of terms be:
T = {t1 = Bayes' Principle, t2 = probability, t3 = decisionmaking, t4 = Bayesian Epistemology}
Then, the set D of documents is as follows:
D = {D1, D2, D3}
where
D1 = {Bayes' Principle, probability}
D2 = {probability, decisionmaking}
D3 = {probability, Bayesian Epistemology}
Let the query Q be:
Q = probability AND decisionmaking
1. First, the sets S1 and S2 of documents Di that contain the terms probability and decisionmaking, respectively, are obtained:
S1 = {D1, D2, D3}
S2 = {D2}
2. Finally, the following documents Di are retrieved in response to Q: {D1, D2, D3} INTERSECTION {D2} = {D2}
This means that the original document O2 (corresponding to D2) is the answer to Q.
If more than one document has the same representation, every such document is retrieved. In the BIR, such documents are indistinguishable (in other words, equivalent).
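The example can be reproduced with Python's built-in sets. The helper name `docs_containing` and the extra document D4 are hypothetical, introduced only to illustrate the point about documents with identical representations.

```python
# Document representations from the example: each document is a set of index terms.
D = {
    "D1": {"Bayes' Principle", "probability"},
    "D2": {"probability", "decisionmaking"},
    "D3": {"probability", "Bayesian Epistemology"},
}


def docs_containing(term, docs):
    """Return the names of the documents whose term set contains `term`."""
    return {name for name, terms in docs.items() if term in terms}


# Step 1: one set per query term.
S1 = docs_containing("probability", D)
S2 = docs_containing("decisionmaking", D)

# Step 2: Q = probability AND decisionmaking corresponds to the intersection.
answer = S1 & S2
print(sorted(S1), sorted(S2), sorted(answer))
# ['D1', 'D2', 'D3'] ['D2'] ['D2']

# A hypothetical D4 with the same representation as D2 is retrieved together with it.
D_ext = dict(D, D4={"probability", "decisionmaking"})
answer_ext = docs_containing("probability", D_ext) & docs_containing("decisionmaking", D_ext)
print(sorted(answer_ext))  # ['D2', 'D4']
```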
Advantages
 Clean formalism
 Easy to implement
 Intuitive concept
Disadvantages
 Exact matching may retrieve too few or too many documents
 Hard to translate a query into a Boolean expression
 All terms are equally weighted
 More like data retrieval than information retrieval
Data structures and algorithms
From a purely formal, mathematical point of view, the BIR is straightforward. From a practical point of view, however, several further problems must be solved that relate to algorithms and data structures, for example the choice of terms (manual selection, automatic selection, or both), stemming, hash tables, and the inverted file structure.^{[2]}
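As an illustration of one of the structures mentioned above, the following is a minimal sketch of an inverted file, assuming each document is already reduced to a set of index terms. The function names `build_inverted_index` and `and_query` are hypothetical, and the "smallest set first" ordering is a common heuristic rather than something the source prescribes.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            index[term].add(doc_id)
    return index


def and_query(index, terms):
    """Answer `t1 AND t2 AND ...` by intersecting the sets stored for each term."""
    postings = [index.get(term, set()) for term in terms]
    if not postings:
        return set()
    # Start from the smallest set so intermediate results stay small.
    postings.sort(key=len)
    result = set(postings[0])
    for other in postings[1:]:
        result &= other
    return result


docs = {
    "D1": {"Bayes' Principle", "probability"},
    "D2": {"probability", "decisionmaking"},
    "D3": {"probability", "Bayesian Epistemology"},
}
index = build_inverted_index(docs)
print(sorted(index["probability"]))                               # ['D1', 'D2', 'D3']
print(sorted(and_query(index, ["probability", "decisionmaking"])))  # ['D2']
```

With this layout a query only touches the entries for its own terms, instead of scanning every document.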
Hash sets
Another possibility is to use hash sets. Each document is represented by a hash table that contains every term of that document. Because the size of a hash table grows and shrinks dynamically as terms are added and removed, each document occupies much less memory than with a bit vector. However, performance suffers because the operations are more complex than with bit vectors. In the worst case, performance can degrade from O(n) to O(n^{2}). In the average case, the slowdown relative to bit vectors is modest, and the space usage is much more efficient.
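The contrast between the two representations can be sketched as follows. This is a toy illustration under stated assumptions: the bit vector is modelled as a Python integer with one bit per vocabulary term, the hash set as a `frozenset`, and the names `VOCABULARY`, `TERM_BIT`, `to_bit_vector` and `has_term_bits` are hypothetical. It does not measure the space or time costs discussed above.

```python
# Fixed vocabulary: a bit vector needs one bit position per index term,
# whether or not the document contains that term.
VOCABULARY = ["Bayes' Principle", "probability", "decisionmaking", "Bayesian Epistemology"]
TERM_BIT = {term: position for position, term in enumerate(VOCABULARY)}


def to_bit_vector(terms):
    """Encode a collection of terms as an integer with one bit set per term."""
    vector = 0
    for term in terms:
        vector |= 1 << TERM_BIT[term]
    return vector


def has_term_bits(vector, term):
    """Test a single term by checking its bit."""
    return bool((vector >> TERM_BIT[term]) & 1)


doc_terms = {"probability", "decisionmaking"}
bits = to_bit_vector(doc_terms)   # bit-vector representation
hashed = frozenset(doc_terms)     # hash-set representation: stores only the terms present

query_terms = ["probability", "decisionmaking"]

# Bit vector: an AND query is a single mask comparison.
mask = to_bit_vector(query_terms)
match_bits = (bits & mask) == mask

# Hash set: an AND query is one membership lookup per query term.
match_hash = all(term in hashed for term in query_terms)

print(format(bits, "04b"), match_bits, match_hash)  # 0110 True True
```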
References
 [1] Lancaster, F.W. (1973), Information Retrieval On-Line, Melville Publishing Co., Los Angeles, California.
 [2] Wartik, Steven (1992). "Boolean operations". Information Retrieval Data Structures & Algorithms. Prentice-Hall, Inc. ISBN 0134638379.
 Lashkari, A.H. (2009), A Boolean Model in Information Retrieval for Search Engines, doi:10.1109/ICIME.2009.101.