Perform Incremental Map-Reduce
Map-reduce operations can handle complex aggregation tasks. To perform map-reduce operations, MongoDB provides the mapReduce command and, in the mongo shell, the db.collection.mapReduce() wrapper method.
If the map-reduce data set is constantly growing, you may want to perform an incremental map-reduce rather than performing the map-reduce operation over the entire data set each time.
To perform incremental map-reduce:
- Run a map-reduce job over the current collection and output the result to a separate collection.
- When you have more data to process, run a subsequent map-reduce job with:
  - the query parameter, which specifies conditions that match only the new documents.
  - the out parameter, which specifies the reduce action to merge the new results into the existing output collection.
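The steps above work only if the reduce function can be re-applied to its own output: merging a stored result with newly reduced values should give the same answer as reducing everything in one pass. The following is a minimal in-memory sketch of that property. The reducer name sumReduce and the sample numbers are illustrative assumptions, not part of the MongoDB API.

```javascript
// Hypothetical reducer: sums the values emitted for one key.
function sumReduce(key, values) {
  return values.reduce((total, v) => total + v, 0);
}

// First run: reduce the values seen so far and keep the result
// (this plays the role of the output collection).
const stored = sumReduce("a", [95, 110]);

// Later run: reduce only the new values, then merge with the stored
// result by applying the same reducer again, as the reduce action
// does for keys present in both the new and the existing results.
const fresh = sumReduce("a", [105, 120]);
const merged = sumReduce("a", [stored, fresh]);

// Reducing the whole data set in one pass gives the same total.
const fullRerun = sumReduce("a", [95, 110, 105, 120]);
console.log(merged === fullRerun); // true
```

A reducer that does not have this property (for example, one that averages its inputs directly) would likely drift from the full-rerun result after each incremental merge.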
Consider the following example where you schedule a map-reduce operation on a sessions collection to run at the end of each day.
The sessions collection contains documents that log users’ sessions each day, for example:
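The documents might look like the following sketch. Only the userid field is named in the text; the ts (timestamp) and length (session length) fields and all values are assumptions made for illustration.

```javascript
// Illustrative session documents; ts and length are assumed field names.
const sessionDocs = [
  { userid: "a", ts: new Date("2024-03-01T14:17:00Z"), length: 95 },
  { userid: "b", ts: new Date("2024-03-01T14:23:00Z"), length: 110 },
  { userid: "a", ts: new Date("2024-03-02T11:05:00Z"), length: 105 },
  { userid: "b", ts: new Date("2024-03-02T13:14:00Z"), length: 120 }
];

// In the mongo shell, these could be loaded with:
//   db.sessions.insertMany(sessionDocs);
```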
Run the first map-reduce operation as follows:
Define the map function that maps the userid to an object that contains the fields needed to compute the total time and the count.
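A map function along these lines could be sketched as follows. The emitted field names total_time and count, and the length field read from each document, are assumptions chosen for illustration.

```javascript
// Map function: MongoDB calls it once per document, with `this` bound
// to the document. emit() is supplied by the server at run time.
var mapFunction = function () {
  var key = this.userid;
  var value = {
    total_time: this.length, // this session's length (assumed field)
    count: 1                 // each document represents one session
  };
  emit(key, value);
};
```

The emitted value has the same shape that the reduce step returns, which is what lets a later run merge new results with stored ones.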
Define the corresponding reduce function with two arguments, key and values, to calculate the total time and the count. The key corresponds to the userid, and values is an array whose elements correspond to the individual objects mapped to that userid.
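One way to write such a reduce function is sketched below. It assumes each element of values carries numeric total_time and count fields, which are illustrative names rather than ones fixed by the text.

```javascript
// Reduce function: combines all values emitted for one key.
// It returns an object of the same shape as its inputs, so it can be
// re-applied to previously reduced results.
var reduceFunction = function (key, values) {
  var reduced = { total_time: 0, count: 0 };
  values.forEach(function (value) {
    reduced.total_time += value.total_time;
    reduced.count += value.count;
  });
  return reduced;
};

var example = reduceFunction("a", [
  { total_time: 95, count: 1 },
  { total_time: 105, count: 1 }
]);
console.log(example.total_time, example.count); // 200 2
```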
Define the finalize function with two arguments, key and reducedValue. The function modifies the reducedValue document to add another field, average, and returns the modified document.
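A finalize function of this kind might look like the sketch below. The total_time and count field names are assumptions; the average field follows the description above.

```javascript
// Finalize function: runs once per key after reduction and derives
// the average from the accumulated totals.
var finalizeFunction = function (key, reducedValue) {
  if (reducedValue.count > 0) {
    reducedValue.average = reducedValue.total_time / reducedValue.count;
  }
  return reducedValue;
};

var finalized = finalizeFunction("a", { total_time: 200, count: 2 });
console.log(finalized.average); // 100
```

Computing the average here, rather than in the reduce step, keeps the reduce output mergeable: totals and counts can be added across runs, while averages cannot.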
Perform map-reduce on the sessions collection using the mapFunction, the reduceFunction, and the finalizeFunction functions. Output the results to a collection named session_stat. If the session_stat collection already exists, the operation replaces its contents:
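The call could be sketched as below. The wrapper name runInitialMapReduce is hypothetical; it takes the shell's db handle and the three functions as arguments so that the sketch does not depend on how they were defined.

```javascript
// Hypothetical helper wrapping the first, full map-reduce run.
function runInitialMapReduce(db, mapFn, reduceFn, finalizeFn) {
  return db.sessions.mapReduce(mapFn, reduceFn, {
    out: "session_stat", // a plain collection name replaces existing contents
    finalize: finalizeFn
  });
}

// In the mongo shell:
//   runInitialMapReduce(db, mapFunction, reduceFunction, finalizeFunction);
```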
Later, as the sessions collection grows, you can run additional map-reduce operations. For example, add new documents to the sessions collection:
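For instance, a later day's sessions might be added as in this sketch. As before, the ts and length field names and all values are illustrative assumptions.

```javascript
// Illustrative documents for a later day; ts and length are assumed fields.
const newSessionDocs = [
  { userid: "a", ts: new Date("2024-03-03T14:17:00Z"), length: 100 },
  { userid: "b", ts: new Date("2024-03-03T14:23:00Z"), length: 115 }
];

// In the mongo shell, these could be loaded with:
//   db.sessions.insertMany(newSessionDocs);
```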
At the end of the day, perform incremental map-reduce on the sessions collection, but use the query field to select only the new documents. Output the results to the session_stat collection, but reduce its contents with the results of the incremental map-reduce:
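The incremental run could be sketched as follows. The helper name runIncrementalMapReduce, the since parameter, and the ts timestamp field used in the query are assumptions for illustration.

```javascript
// Hypothetical helper wrapping an incremental map-reduce run.
// `since` is the timestamp of the previous run; only newer documents match.
function runIncrementalMapReduce(db, mapFn, reduceFn, finalizeFn, since) {
  return db.sessions.mapReduce(mapFn, reduceFn, {
    query: { ts: { $gt: since } },   // select only the new documents
    out: { reduce: "session_stat" }, // merge into the existing output collection
    finalize: finalizeFn
  });
}

// In the mongo shell:
//   runIncrementalMapReduce(db, mapFunction, reduceFunction, finalizeFunction,
//                           ISODate("2024-03-03T00:00:00Z"));
```

Tracking the cutoff timestamp between runs is left to the caller; if it is set too early, the same documents are reduced twice and the totals are overcounted.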