Model Data to Support Keyword Search

Note

Keyword search is not the same as text search or full text search, and does not provide stemming or other text-processing features. See the Limitations of Keyword Indexes section for more information.

In 2.4, MongoDB provides a text search feature. See Text Indexes for more information.

If your application needs to perform queries on the content of a field that holds text you can perform exact matches on the text or use $regex to use regular expression pattern matches. However, for many operations on text, these methods do not satisfy application requirements.

This pattern describes one method for supporting keyword search using MongoDB to support application search functionality, that uses keywords stored in an array in the same document as the text field. Combined with a multi-key index, this pattern can support application’s keyword search operations.

Pattern

To add structures to your document to support keyword-based queries, create an array field in your documents and add the keywords as strings in the array. You can then create a multi-key index on the array and create queries that select values from the array.

Example

Given a collection of library volumes that you want to provide topic-based search. For each volume, you add the array topics, and you add as many keywords as needed for a given volume.

For the Moby-Dick volume you might have the following document:

{ title : "Moby-Dick" ,
  author : "Herman Melville" ,
  published : 1851 ,
  ISBN : 0451526996 ,
  topics : [ "whaling" , "allegory" , "revenge" , "American" ,
    "novel" , "nautical" , "voyage" , "Cape Cod" ]
}

You then create a multi-key index on the topics array:

db.volumes.createIndex( { topics: 1 } )

The multi-key index creates separate index entries for each keyword in the topics array. For example the index contains one entry for whaling and another for allegory.

You then query based on the keywords. For example:

db.volumes.findOne( { topics : "voyage" }, { title: 1 } )

Note

An array with a large number of elements, such as one with several hundreds or thousands of keywords will incur greater indexing costs on insertion.

Limitations of Keyword Indexes

MongoDB can support keyword searches using specific data models and multi-key indexes; however, these keyword indexes are not sufficient or comparable to full-text products in the following respects:

  • Stemming. Keyword queries in MongoDB can not parse keywords for root or related words.
  • Synonyms. Keyword-based search features must provide support for synonym or related queries in the application layer.
  • Ranking. The keyword look ups described in this document do not provide a way to weight results.
  • Asynchronous Indexing. MongoDB builds indexes synchronously, which means that the indexes used for keyword indexes are always current and can operate in real-time. However, asynchronous bulk indexes may be more efficient for some kinds of content and workloads.