Aggregation Pipeline Optimization
Aggregation pipeline operations have an optimization phase which attempts to reshape the pipeline for improved performance.
To see how the optimizer transforms a particular aggregation pipeline, include the explain option in the db.collection.aggregate() method.
Optimizations are subject to change between releases.
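The explain option described above takes one extra argument. The sketch below wraps it in a small function for mongosh. The function name explainAggregation and the collection name orders in the usage comment are hypothetical. With { explain: true }, aggregate() should return explain output in place of result documents, and that output reflects the pipeline as the optimizer rewrote it.

```javascript
// Sketch for mongosh. `coll` is any collection handle, e.g. db.orders
// (hypothetical name). The { explain: true } option asks the server to
// describe how it will run the pipeline instead of returning documents.
function explainAggregation(coll, pipeline) {
  return coll.aggregate(pipeline, { explain: true });
}

// Usage in mongosh (needs a running deployment):
// explainAggregation(db.orders, [
//   { $sort: { age: -1 } },
//   { $match: { status: "A" } }
// ]);
```

Because explain output can change between releases, the same is true of the optimized pipeline it shows.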
Projection Optimization
The aggregation pipeline can determine if it requires only a subset of the fields in the documents to obtain the results. If so, the pipeline will only use those required fields, reducing the amount of data passing through the pipeline.
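The sketch below gives a rough illustration of how a pipeline can tell which fields it needs. It is not MongoDB's dependency analyzer. The hypothetical helper collectFieldRefs walks expression values and records the top-level field names referenced through "$field" paths. It skips "$$" variables such as $$ROOT. It does not inspect query-style keys like those inside $match, so it only suits expression stages such as $group.

```javascript
// Toy dependency scan (hypothetical helper, not MongoDB's implementation).
// Collects top-level field names referenced as "$field" or "$field.sub"
// anywhere in the values of an expression tree.
function collectFieldRefs(value, refs = new Set()) {
  if (typeof value === "string") {
    // "$name.first" depends on field "name"; "$$..." are variables, not fields.
    if (value.startsWith("$") && !value.startsWith("$$")) {
      refs.add(value.slice(1).split(".")[0]);
    }
  } else if (Array.isArray(value)) {
    for (const v of value) collectFieldRefs(v, refs);
  } else if (value !== null && typeof value === "object") {
    // Object keys are operators or output names, so only values are scanned.
    for (const v of Object.values(value)) collectFieldRefs(v, refs);
  }
  return refs;
}
```

Take a pipeline whose first stage is { $group: { _id: "$status", total: { $sum: "$amount" } } }. For it, the helper reports status and amount, the only fields that stage would need to read from each document.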
Pipeline Sequence Optimization
($project or $addFields) + $match Sequence Optimization
For an aggregation pipeline that contains a projection stage ($project or $addFields) followed by a $match stage, MongoDB moves any filters in the $match stage that do not require values computed in the projection stage to a new $match stage before the projection.
If an aggregation pipeline contains multiple projection and/or $match stages, MongoDB performs this optimization for each $match stage, moving each $match filter before all projection stages that the filter does not depend on.
Consider a pipeline of the following stages:
{ $addFields: {
    maxTime: { $max: "$times" },
    minTime: { $min: "$times" }
} },
{ $project: {
    _id: 1, name: 1, times: 1, maxTime: 1, minTime: 1,
    avgTime: { $avg: ["$maxTime", "$minTime"] }
} },
{ $match: {
    name: "Joe Schmoe",
    maxTime: { $lt: 20 },
    minTime: { $gt: 5 },
    avgTime: { $gt: 7 }
} }
The optimizer breaks up the $match stage into four individual filters, one for each key in the $match query document. The optimizer then moves each filter before as many projection stages as possible, creating new $match stages as needed. Given this example, the optimizer produces the following optimized pipeline:
{ $match: { name: "Joe Schmoe" } },
{ $addFields: {
    maxTime: { $max: "$times" },
    minTime: { $min: "$times" }
} },
{ $match: { maxTime: { $lt: 20 }, minTime: { $gt: 5 } } },
{ $project: {
    _id: 1, name: 1, times: 1, maxTime: 1, minTime: 1,
    avgTime: { $avg: ["$maxTime", "$minTime"] }
} },
{ $match: { avgTime: { $gt: 7 } } }
The $match filter { avgTime: { $gt: 7 } } depends on the $project stage to compute the avgTime field. The $project stage is the last projection stage in this pipeline, so the $match filter on avgTime could not be moved.
The maxTime and minTime fields are computed in the $addFields stage but have no dependency on the $project stage. The optimizer created a new $match stage for the filters on these fields and placed it before the $project stage.
The $match filter { name: "Joe Schmoe" } does not use any values computed in either the $project or $addFields stages, so it was moved to a new $match stage before both of the projection stages.
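The rewrite walked through above can be sketched in JavaScript. This is a hypothetical illustration, not MongoDB's optimizer, and the names hoistMatchFilters and passesThrough are invented. It assumes the following:
- Only top-level field names are moved. Keys that start with $ (such as $or or $expr) and dotted paths stay where they are.
- Every $project is conservatively treated as inclusion-style. A filter passes a $project only when its field is set to 1 or true.

```javascript
// Hypothetical sketch of the ($project or $addFields) + $match rewrite.
// Assumptions: top-level field names only; "$"-prefixed and dotted keys are
// never moved; $project is treated as inclusion-style.

// True if a filter on `field` sees the same value before and after `stage`.
function passesThrough(field, stage) {
  if ("$addFields" in stage) {
    // $addFields leaves every field it does not name untouched.
    return !Object.prototype.hasOwnProperty.call(stage.$addFields, field);
  }
  if ("$project" in stage) {
    // Only fields passed through unchanged (1 or true) keep their value.
    const spec = stage.$project[field];
    return spec === 1 || spec === true;
  }
  return false; // any other stage acts as a barrier in this sketch
}

function hoistMatchFilters(pipeline) {
  const out = [];
  for (const stage of pipeline) {
    if (!("$match" in stage)) {
      out.push(stage);
      continue;
    }
    // Break the $match into one filter per key and place each one separately.
    for (const [key, cond] of Object.entries(stage.$match)) {
      const movable = !key.startsWith("$") && !key.includes(".");
      let pos = out.length;
      // Walk left past other $match stages (filters commute) and past
      // projection stages that neither compute nor drop `key`.
      while (
        movable &&
        pos > 0 &&
        ("$match" in out[pos - 1] || passesThrough(key, out[pos - 1]))
      ) {
        pos--;
      }
      const target = out[pos];
      const canMerge =
        target !== undefined &&
        "$match" in target &&
        !Object.prototype.hasOwnProperty.call(target.$match, key);
      if (canMerge) {
        target.$match[key] = cond; // join an existing $match at that position
      } else {
        out.splice(pos, 0, { $match: { [key]: cond } }); // create a new $match
      }
    }
  }
  return out;
}
```

Run it on the three-stage pipeline from this section and it returns the same five-stage pipeline as the optimized version above. A filter on a field that a projection computes stops just after that projection. Filters that stop at the same position share one $match.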
Note
After optimization, the filter { name: "Joe Schmoe" } is in a $match stage at the beginning of the pipeline. This has the added benefit of allowing the aggregation to use an index on the name field when initially querying the collection. See Pipeline Operators and Indexes for more information.
$sort + $match Sequence Optimization
When a $sort stage is immediately followed by a $match stage, the $match moves before the $sort to minimize the number of documents to sort. For example, if the pipeline consists of the following stages:
{ $sort: { age : -1 } },
{ $match: { status: 'A' } }
During the optimization phase, the optimizer transforms the sequence to the following:
{ $match: { status: 'A' } },
{ $sort: { age : -1 } }
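The swap above can be sketched as a single pass over the stage list. The function name moveMatchBeforeSort is hypothetical, and the sketch only illustrates the rule stated here. Whether a document passes a $match does not depend on its position in the stream, so filtering first and then sorting gives the same output while sorting fewer documents.

```javascript
// Hypothetical sketch: move each $match ahead of any $sort stages that
// directly precede it. Non-adjacent pairs (e.g. $sort, $limit, $match)
// are left alone, because $limit depends on the sort order.
function moveMatchBeforeSort(pipeline) {
  const out = pipeline.slice(); // do not modify the caller's array
  for (let i = 1; i < out.length; i++) {
    let j = i;
    // Bubble the $match left while the stage right before it is a $sort.
    while (j > 0 && "$match" in out[j] && "$sort" in out[j - 1]) {
      [out[j - 1], out[j]] = [out[j], out[j - 1]];
      j--;
    }
  }
  return out;
}
```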
$redact + $match Sequence Optimization
When the pipeline has a $redact stage immediately followed by a $match stage, the aggregation can sometimes add a portion of the $match stage before the $redact stage. If the added $match stage is at the start of a pipeline, the aggregation can use an index as well as query the collection to limit the number of documents that enter the pipeline. See Pipeline Operators and Indexes for more information.
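The text does not say which portion of the $match gets copied. The sketch below uses a hypothetical rule, chosen to reproduce the example that follows rather than taken from MongoDB's source. It copies only plain scalar equality predicates on top-level fields, such as { year: 2014 }, into a new $match placed in front of $redact. The original $match stays after $redact, so the output is unchanged and the copy only pre-filters documents. The function name addMatchBeforeRedact is also hypothetical.

```javascript
// Hypothetical sketch of the $redact + $match rewrite. Assumed rule: copy
// only equality predicates with a string, number, or boolean value on
// top-level fields. Operator conditions such as { $ne: "Z" } and null
// comparisons are not copied, because $redact may remove embedded content
// and change how those conditions evaluate.
function addMatchBeforeRedact(pipeline) {
  const out = [];
  for (let i = 0; i < pipeline.length; i++) {
    const stage = pipeline[i];
    const next = pipeline[i + 1];
    if ("$redact" in stage && next !== undefined && "$match" in next) {
      const copied = {};
      for (const [key, cond] of Object.entries(next.$match)) {
        const isScalar = cond !== null && typeof cond !== "object";
        if (!key.startsWith("$") && !key.includes(".") && isScalar) {
          copied[key] = cond;
        }
      }
      if (Object.keys(copied).length > 0) out.push({ $match: copied });
    }
    out.push(stage); // the original stages keep their order
  }
  return out;
}
```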
For example, if the pipeline consists of the following stages:
{ $redact: { $cond: { if: { $eq: [ "$level", 5 ] }, then: "$$PRUNE", else: "$$DESCEND" } } },
{ $match: { year: 2014, category: { $ne: "Z" } } }
The optimizer can add a $match stage containing part of that filter before the $redact stage:
{ $match: { year: 2014 } },
{ $redact: { $cond: { if: { $eq: [ "$level", 5 ] }, then: "$$PRUNE", else: "$$DESCEND" } } },
{ $match: