Aggregation with the Zip Code Data Set
Each document in the zipcodes collection has the following form:
The _id field holds the zip code as a string.
The city field holds the city name. A city can have more than one zip code associated with it, as different sections of the city can each have a different zip code.
The state field holds the two-letter state abbreviation.
The pop field holds the population.
The loc field holds the location as a longitude latitude pair.
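As an illustration of the fields described above, a document might look like the following sketch. The values are made up for this example and are not taken from the data set.

```javascript
// Illustrative document only: every value below is hypothetical.
const sampleZipDocument = {
  _id: "00001",          // zip code, stored as a string
  city: "EXAMPLE CITY",  // city name; one city may span several zip codes
  state: "NY",           // two-letter state abbreviation
  pop: 5574,             // population
  loc: [-74.0, 40.7]     // location as [longitude, latitude]
};
```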
The aggregate() method uses the aggregation pipeline to process documents into aggregated results. An aggregation pipeline consists of stages, with each stage processing the documents as they pass along the pipeline. Documents pass through the stages in sequence.
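The idea of documents passing through stages in sequence can be sketched in plain JavaScript. This is only a conceptual model, not how the server implements pipelines; the names runStages, keepLarge, and onlyState are hypothetical.

```javascript
// Model each stage as a function from an array of documents to a new array.
// The output of one stage becomes the input of the next.
function runStages(documents, stages) {
  return stages.reduce((docs, stage) => stage(docs), documents);
}

// Two hypothetical stages: keep documents with pop > 100, then keep only the state.
const keepLarge = (docs) => docs.filter((d) => d.pop > 100);
const onlyState = (docs) => docs.map((d) => ({ state: d.state }));

const stagedResult = runStages(
  [{ state: "AA", pop: 50 }, { state: "BB", pop: 500 }],
  [keepLarge, onlyState]
);
// stagedResult is [{ state: "BB" }]
```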
The aggregate() method in the mongo shell provides a wrapper around the aggregate database command. See the documentation for your driver for a more idiomatic interface for data aggregation operations.
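To make the wrapper relationship concrete, the sketch below builds the kind of command document that the aggregate database command accepts. The helper name buildAggregateCommand is hypothetical, and the empty cursor option is an assumption that suits recent server versions; check the command reference for the version you run.

```javascript
// Builds a command document for the aggregate database command.
// buildAggregateCommand is a hypothetical helper, not part of the shell.
function buildAggregateCommand(collectionName, pipeline) {
  return {
    aggregate: collectionName, // name of the collection to aggregate
    pipeline: pipeline,        // array of stage documents
    cursor: {}                 // return results through a cursor, default batch size
  };
}

const aggregateCommand = buildAggregateCommand("zipcodes", [
  { $group: { _id: "$state", totalPop: { $sum: "$pop" } } }
]);

// In the mongo shell, the following two calls should be roughly equivalent:
//   db.zipcodes.aggregate(aggregateCommand.pipeline)
//   db.runCommand(aggregateCommand)
```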
The following aggregation operation returns all states with a total population of at least 10 million:
The $group stage groups the documents of the zipcodes collection by the state field, calculates the totalPop field for each state, and outputs a document for each unique state.
The new per-state documents have two fields: the _id field and the totalPop field. The _id field contains the value of the state; i.e. the group by field. The totalPop field is a calculated field that contains the total population of each state. To calculate the value, $group uses the $sum operator to add the population field (pop) for each state.
After the $group stage, the documents in the pipeline resemble the following:
The $match stage filters these grouped documents to output only those documents whose totalPop value is greater than or equal to 10 million. The $match stage does not alter the matching documents but outputs the matching documents unmodified.
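The two stages described above can be sketched as a pipeline, using $gte to match the "greater than or equal to" wording of the $match description. The plain JavaScript function that follows only emulates the stages on a few made-up documents so the result can be checked by hand; the names largeStatesPipeline and statesAtOrAbove are hypothetical.

```javascript
// Pipeline sketch: group by state and sum pop, then keep totals >= 10 million.
const largeStatesPipeline = [
  { $group: { _id: "$state", totalPop: { $sum: "$pop" } } },
  { $match: { totalPop: { $gte: 10 * 1000 * 1000 } } }
];
// In the mongo shell: db.zipcodes.aggregate(largeStatesPipeline)

// Plain JavaScript emulation of the same two stages, for illustration only.
function statesAtOrAbove(documents, threshold) {
  const totals = new Map();
  for (const doc of documents) {
    // $group with $sum: accumulate pop per state.
    totals.set(doc.state, (totals.get(doc.state) || 0) + doc.pop);
  }
  // $match: keep only grouped documents at or above the threshold, unmodified.
  return Array.from(totals, ([state, totalPop]) => ({ _id: state, totalPop }))
    .filter((doc) => doc.totalPop >= threshold);
}

// Hypothetical input documents (not real census figures).
const largeStatesSample = [
  { _id: "00001", city: "A", state: "XX", pop: 6000000 },
  { _id: "00002", city: "B", state: "XX", pop: 5000000 },
  { _id: "00003", city: "C", state: "YY", pop: 9000000 }
];
const bigStates = statesAtOrAbove(largeStatesSample, 10 * 1000 * 1000);
// bigStates is [{ _id: "XX", totalPop: 11000000 }]
```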
The equivalent SQL for this aggregation operation is:
The following aggregation operation returns the average populations for cities in each state:
The first $group stage groups the documents by the combination of city and state, uses the $sum expression to calculate the population for each combination, and outputs a document for each city and state combination.
After this stage in the pipeline, the documents resemble the following:
The second $group stage groups the documents in the pipeline by the _id.state field (i.e. the state field inside the _id document), uses the $avg expression to calculate the average city population (avgCityPop) for each state, and outputs a document for each state.
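A sketch of the two grouping stages described above follows. The intermediate field name pop in the first stage is an assumption, since the original pipeline is not shown here; the emulation function averageCityPopulation and the sample documents are hypothetical and exist only to make the arithmetic checkable.

```javascript
// Pipeline sketch: sum pop per (state, city), then average those sums per state.
const avgCityPopPipeline = [
  { $group: { _id: { state: "$state", city: "$city" }, pop: { $sum: "$pop" } } },
  { $group: { _id: "$_id.state", avgCityPop: { $avg: "$pop" } } }
];
// In the mongo shell: db.zipcodes.aggregate(avgCityPopPipeline)

// Plain JavaScript emulation of the same two stages, for illustration only.
function averageCityPopulation(documents) {
  // Stage 1: total population for each state and city combination.
  const cityTotals = new Map();
  for (const doc of documents) {
    const key = JSON.stringify([doc.state, doc.city]);
    cityTotals.set(key, (cityTotals.get(key) || 0) + doc.pop);
  }
  // Stage 2: average the city totals within each state.
  const perState = new Map();
  for (const [key, pop] of cityTotals) {
    const [state] = JSON.parse(key);
    const entry = perState.get(state) || { sum: 0, count: 0 };
    entry.sum += pop;
    entry.count += 1;
    perState.set(state, entry);
  }
  return Array.from(perState, ([state, { sum, count }]) => ({
    _id: state,
    avgCityPop: sum / count
  }));
}

// Hypothetical input: city A has two zip codes, city B has one.
const avgCitySample = [
  { city: "A", state: "XX", pop: 100 },
  { city: "A", state: "XX", pop: 300 },
  { city: "B", state: "XX", pop: 200 }
];
const avgByState = averageCityPopulation(avgCitySample);
// City A totals 400 and city B totals 200, so avgCityPop for XX is 300.
```

Grouping by city first matters because a city can span several zip codes: averaging the zip code documents directly would give 200 for the sample above, which is the average per zip code rather than per city.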
The documents that result from this aggregation operation resemble the following:
The following aggregation operation returns the smallest and largest cities by population for each state:
The $group stage groups the documents by the combination of the