Text Indexes
Overview
MongoDB provides text indexes to support text search queries on string content. text
indexes can include any field whose value is a string or an array of string elements.
Versions
text Index Version | Description |
---|---|
Version 3 | MongoDB 3.2 introduces version 3 of the text index. Version 3 is the default version of text indexes created in MongoDB 3.2 and later. |
Version 2 | MongoDB 2.6 introduces version 2 of the text index. Version 2 is the default version of text indexes created in the MongoDB 2.6 and 3.0 series. |
Version 1 | MongoDB 2.4 introduces version 1 of the text index. MongoDB 2.4 supports only version 1. |
To override the default version and specify a different version, include the option { "textIndexVersion": <version> } when creating the index.
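As a minimal sketch of the option above, the following helper creates a text index pinned to an explicit version. The helper name createVersionedTextIndex, the reviews collection, and the comments field are illustrative assumptions; only the textIndexVersion option itself comes from the text above.

```javascript
// Hypothetical helper: create a text index with an explicit textIndexVersion.
// `db` is a database handle such as the `db` object in the mongo shell.
// The collection name `reviews` and field `comments` are illustrative.
function createVersionedTextIndex(db, version) {
  return db.reviews.createIndex(
    { comments: "text" },           // index the string field as text
    { textIndexVersion: version }   // override the default index version
  );
}
```

In the shell, a call such as createVersionedTextIndex(db, 2) would request a version 2 index instead of the server default.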
Create Text Index
Important
A collection can have at most one text index.
To create a text index, use the db.collection.createIndex() method. To index a field that contains a string or an array of string elements, include the field and specify the string literal "text" in the index document, as in the following example:
db.reviews.createIndex( { comments: "text" } )
You can index multiple fields for the text index. The following example creates a text index on the fields subject and comments:
db.reviews.createIndex(
   {
     subject: "text",
     comments: "text"
   }
)
A compound index can include text index keys in combination with ascending/descending index keys. For more information, see Compound Index.
To drop a text index, use the index name. See Use the Index Name to Drop a text Index for more information.
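The following sketch shows one way to look up the name of a collection's text index and drop it by that name. The helper name dropTextIndex and the reviews collection are assumptions for illustration. The lookup relies on getIndexes() reporting a text index with a key document that contains _fts: "text"; because a collection has at most one text index, the first match is enough.

```javascript
// Hypothetical helper: find the text index on `reviews` and drop it by name.
// `db` is a database handle such as the `db` object in the mongo shell.
function dropTextIndex(db) {
  // getIndexes() returns one document per index, including its name and key.
  const textIndex = db.reviews
    .getIndexes()
    .find((ix) => ix.key && ix.key._fts === "text");

  if (!textIndex) {
    return null; // the collection has no text index
  }
  // A text index is dropped by name, for example "comments_text".
  return db.reviews.dropIndex(textIndex.name);
}
```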
Specify Weights
For a text index, the weight of an indexed field denotes the significance of the field relative to the other indexed fields in terms of the text search score.
For each indexed field in the document, MongoDB multiplies the number of matches by the weight and sums the results. Using this sum, MongoDB then calculates the score for the document. See $meta operator for details on returning and sorting by text scores.
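The weighted sum described above can be illustrated with a small, self-contained sketch. It covers only the "multiply matches by weight and sum" step; how MongoDB derives the final score from that sum is not shown here. The function name weightedMatchSum and the sample numbers are hypothetical.

```javascript
// Illustrative only: computes the weighted sum of matches per field.
// This is not MongoDB's scoring code; the final text score is derived
// from this sum in a way this sketch does not model.
//
// matchCounts: matches per indexed field for one document, e.g. { subject: 2 }
// weights: per-field weights; a field without an entry uses weight 1
function weightedMatchSum(matchCounts, weights = {}) {
  let sum = 0;
  for (const [field, matches] of Object.entries(matchCounts)) {
    const weight = weights[field] ?? 1;
    sum += matches * weight;
  }
  return sum;
}

// Two matches in `subject` (weight 10) and three in `comments` (weight 1):
const exampleSum = weightedMatchSum({ subject: 2, comments: 3 }, { subject: 10 });
// exampleSum is 2 * 10 + 3 * 1 = 23
```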
The default weight is 1 for the indexed fields. To adjust the weights for the indexed fields, include the weights option in the db.collection.createIndex() method.
For more information on using weights to control the results of a text search, see Control Search Results with Weights.
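As a sketch of how the weights option and the $meta operator fit together, the helpers below create a weighted text index and then return matches ordered by text score. The helper names, the index name ReviewsTextIndex, and the weight values 10 and 5 are illustrative assumptions rather than recommended settings.

```javascript
// Hypothetical helper: text index on two fields with explicit weights.
// `db` is a database handle such as the `db` object in the mongo shell.
function createWeightedTextIndex(db) {
  return db.reviews.createIndex(
    { subject: "text", comments: "text" },
    {
      weights: { subject: 10, comments: 5 }, // subject counts twice as much
      name: "ReviewsTextIndex"               // illustrative index name
    }
  );
}

// Hypothetical helper: run a $text search and sort by the text score.
function searchByScore(db, terms) {
  return db.reviews
    .find(
      { $text: { $search: terms } },
      { score: { $meta: "textScore" } }      // project the score
    )
    .sort({ score: { $meta: "textScore" } }); // highest score first
}
```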
Wildcard Text Indexes
When creating a text index on multiple fields, you can also use the wildcard specifier ($**). With a wildcard text index, MongoDB indexes every field that contains string data for each document in the collection. The following example creates a text index using the wildcard specifier:
db.collection.createIndex( { "$**": "text" } )
This index allows for text search on all fields with string content. Such an index can be useful with highly unstructured data if it is unclear which fields to include in the text index or for ad-hoc querying.
Wildcard text indexes are text indexes on multiple fields. As such, you can assign weights to specific fields during index creation to control the ranking of the results. For more information on using weights to control the results of a text search, see Control Search Results with Weights.
Wildcard text indexes, as with all text indexes, can be part of a compound index. For example, the following creates a compound index on the field a as well as the wildcard specifier:
db.collection.createIndex( { a: 1, "$**": "text" } )
As with all compound text indexes, because a precedes the text index key, a $text search that uses this index must include an equality match condition on a in the query predicate. For information on compound text indexes, see Compound Text Indexes.
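A query against such a compound index might look like the following sketch, which combines an equality condition on a with a $text search. The helper name searchWithinA and its parameters are hypothetical; collection is the same placeholder collection name used in the example above.

```javascript
// Hypothetical helper: $text search restricted to documents with a given `a`.
// `db` is a database handle such as the `db` object in the mongo shell.
function searchWithinA(db, aValue, terms) {
  return db.collection.find({
    a: aValue,                 // equality match on the key that precedes the text key
    $text: { $search: terms }  // text search over the wildcard text index
  });
}
```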
Case Insensitivity
Changed in version 3.2.
The version 3 text index supports the common C, simple S, and, for Turkish languages, the special T case foldings as specified in Unicode 8.0 Character Database Case Folding.
The case folding expands the case insensitivity of the text index to include characters with diacritics, such as é and É, and characters from non-Latin alphabets, such as “И” and “и” in the Cyrillic alphabet.
Version 3 of the text index is also diacritic insensitive. As such, the index also does not distinguish between é, É, e, and E.
Previous versions of the text index are case insensitive for [A-z] only; i.e. case insensitive for non-diacritic Latin characters only. Earlier versions of the text index treat all other characters as distinct.
Diacritic Insensitivity
Changed in version 3.2.
With version 3, the text index is diacritic insensitive. That is, the index does not distinguish between characters that contain diacritical marks and their non-marked counterparts, such as é, ê, and e. More specifically, the text index strips the characters categorized as diacritics in Unicode 8.0 Character Database Prop List.
Version 3 of the text index is also case insensitive to characters with diacritics. As such, the index also does not distinguish between é, É, e, and E.
Previous versions of the text index treat characters with diacritics as distinct.
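The combined effect of case and diacritic insensitivity can be approximated in plain JavaScript, as in the sketch below. It is not MongoDB's implementation: it uses toLowerCase() as a stand-in for Unicode case folding, does not model the Turkish special T folding, and relies on the JavaScript engine's Unicode data rather than Unicode 8.0 specifically. The function name foldForComparison is hypothetical.

```javascript
// Approximation only: decompose characters, strip code points that carry
// the Unicode Diacritic property, then lowercase the result.
function foldForComparison(s) {
  return s
    .normalize("NFD")                // split "é" into "e" + combining accent
    .replace(/\p{Diacritic}/gu, "")  // drop the diacritic code points
    .toLowerCase();                  // stand-in for case folding
}

// After folding, é, É, e, and E all compare equal.
const folded = ["é", "É", "e", "E"].map(foldForComparison);
```

Under this approximation, every entry in folded is the string "e", which mirrors how a version 3 index does not distinguish between those four characters.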
Tokenization Delimiters
Changed in version 3.2.
For tokenization, the version 3 text index uses the delimiters categorized under Dash, Hyphen, Pattern_Syntax, Quotation_Mark, Terminal_Punctuation, and White_Space in Unicode 8.0 Character Database Prop List.
For example, if given a string "Il a dit qu'il «était le meilleur joueur du monde»", the text index treats «, », and spaces as delimiters.
Previous versions of the index treat « as part of the term "«était" and » as part of the term "monde»".
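The delimiter categories can be approximated with JavaScript's Unicode property escapes, as in the sketch below. It only illustrates delimiter-based splitting and is not MongoDB's tokenizer: it performs no stemming or stop-word removal, uses the JavaScript engine's Unicode data rather than Unicode 8.0, and leaves out the Hyphen property because JavaScript's \p{...} escapes do not appear to expose it (the common hyphen-minus character is still covered by Dash). The names DELIMITERS and tokenize are hypothetical.

```javascript
// Approximation only: split on runs of characters from the delimiter
// categories listed above (Hyphen omitted, see the note in the lead-in).
const DELIMITERS =
  /[\p{Dash}\p{Pattern_Syntax}\p{Quotation_Mark}\p{Terminal_Punctuation}\p{White_Space}]+/u;

function tokenize(text) {
  // Drop the empty strings that split() yields at the string boundaries.
  return text.split(DELIMITERS).filter((term) => term.length > 0);
}

const tokens = tokenize("Il a dit qu'il «était le meilleur joueur du monde»");
// The guillemets act as delimiters, so "était" and "monde" appear as bare terms.
```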
Index Entries
A text index tokenizes and stems the terms in the indexed fields to produce its index entries. It stores one index entry for each unique stemmed term in each indexed field for each document in the collection. The index uses simple language-specific suffix stemming.
Supported Languages and Stop Words
MongoDB supports text search for various languages. text indexes drop language-specific stop words (e.g. in English, the, an, a, and, etc.) and use simple language-specific suffix stemming. For a list of the supported languages, see Text Search Languages.
If you specify a language value of "none", then the text index uses simple tokenization with no list of stop words and no stemming.
To specify a language for the text index, see Specify a Language for Text Index.
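As a brief sketch, the language for a text index can be set with the default_language option of createIndex(). The helper below passes "none" to request simple tokenization without stop words or stemming; the helper name createUnstemmedTextIndex and the reviews collection with its comments field are illustrative assumptions.

```javascript
// Hypothetical helper: text index that skips stop-word removal and stemming.
// `db` is a database handle such as the `db` object in the mongo shell.
function createUnstemmedTextIndex(db) {
  return db.reviews.createIndex(
    { comments: "text" },
    { default_language: "none" } // simple tokenization only
  );
}
```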
sparse Property
text indexes are always sparse and ignore the sparse option. If a document lacks a text index field (or the field is null or an empty array), MongoDB does not add an entry for the document to the text index. For inserts, MongoDB inserts the document but does not add it to the text index.
For a compound index that includes a text index key along with keys of other types, only the text index field determines whether the index references a document. The other keys do not affect whether the index references the document.
Restrictions
Text Index and Sort
Sort operations cannot obtain sort order from a text index, even from a compound text index; i.e. sort operations cannot use the ordering in the text index.
Compound Index
A compound index can include a text index key in combination with ascending/descending index keys. However, these compound indexes have the following restrictions:
- A