This document provides a number of practical examples that display the capabilities of the aggregation framework.
The Aggregations using the Zip Codes Data Set examples uses a publicly available data set of all zipcodes and populations in the United States. These data are available at: zips.json.
Let's check if everything is installed.
Use the following command to load zips.json data set into mongod instance:
$ mongoimport −−drop −d test −c zipcodes zips.json
Let's use the MongoDB shell to verify that everything was imported successfully.
$ mongo test
connecting to: test
> db.zipcodes.count()
29467
> db.zipcodes.findOne()
{
"_id" : "35004",
"city" : "ACMAR",
"loc" : [
−86.51557,
33.584132
],
"pop" : 6055,
"state" : "AL"
}
Each document in this collection has the following form:
{
"_id" : "35004",
"city" : "Acmar",
"state" : "AL",
"pop" : 6055,
"loc" : [−86.51557, 33.584132]
}
In these documents:
|
• |
The _id field holds the zipcode as a string. |
|||
|
• |
The city field holds the city name. |
|||
|
• |
The state field holds the two letter state abbreviation. |
|||
|
• |
The pop field holds the population. |
|||
|
• |
The loc field holds the location as a [latitude, longitude] array. |
To get all states with a population greater than 10 million, use the following aggregation pipeline:
aggregation1.c
#include
<mongoc/mongoc.h>
#include <stdio.h>
static void
print_pipeline (mongoc_collection_t *collection)
{
mongoc_cursor_t *cursor;
bson_error_t error;
const bson_t *doc;
bson_t *pipeline;
char *str;
pipeline =
BCON_NEW ("pipeline",
"[",
"{",
"$group",
"{",
"_id",
"$state",
"total_pop",
"{",
"$sum",
"$pop",
"}",
"}",
"}",
"{",
"$match",
"{",
"total_pop",
"{",
"$gte",
BCON_INT32 (10000000),
"}",
"}",
"}",
"]");
cursor = mongoc_collection_aggregate (collection, MONGOC_QUERY_NONE, pipeline, NULL, NULL);
while
(mongoc_cursor_next (cursor, &doc)) {
str = bson_as_canonical_extended_json (doc, NULL);
printf ("%s\n", str);
bson_free (str);
}
if
(mongoc_cursor_error (cursor, &error)) {
fprintf (stderr, "Cursor Failure: %s\n",
error.message);
}
mongoc_cursor_destroy
(cursor);
bson_destroy (pipeline);
}
int
main (void)
{
mongoc_client_t *client;
mongoc_collection_t *collection;
const char *uri_string =
"mongodb://localhost:27017/?appname=aggregation−example";
mongoc_uri_t *uri;
bson_error_t error;
mongoc_init ();
uri =
mongoc_uri_new_with_error (uri_string, &error);
if (!uri) {
fprintf (stderr,
"failed to parse URI: %s\n"
"error message: %s\n",
uri_string,
error.message);
return EXIT_FAILURE;
}
client =
mongoc_client_new_from_uri (uri);
if (!client) {
return EXIT_FAILURE;
}
mongoc_client_set_error_api
(client, 2);
collection = mongoc_client_get_collection (client,
"test", "zipcodes");
print_pipeline (collection);
mongoc_uri_destroy
(uri);
mongoc_collection_destroy (collection);
mongoc_client_destroy (client);
mongoc_cleanup ();
return
EXIT_SUCCESS;
}
You should see a result like the following:
{
"_id" : "PA", "total_pop" :
11881643 }
{ "_id" : "OH", "total_pop" :
10847115 }
{ "_id" : "NY", "total_pop" :
17990455 }
{ "_id" : "FL", "total_pop" :
12937284 }
{ "_id" : "TX", "total_pop" :
16986510 }
{ "_id" : "IL", "total_pop" :
11430472 }
{ "_id" : "CA", "total_pop" :
29760021 }
The above aggregation pipeline is build from two pipeline operators: $group and $match.
The $group pipeline operator requires _id field where we specify grouping; remaining fields specify how to generate composite value and must use one of the group aggregation functions: $addToSet, $first, $last, $max, $min, $avg, $push, $sum. The $match pipeline operator syntax is the same as the read operation query syntax.
The $group process reads all documents and for each state it creates a separate document, for example:
{ "_id" : "WA", "total_pop" : 4866692 }
The total_pop field uses the $sum aggregation function to sum the values of all pop fields in the source documents.
Documents created by $group are piped to the $match pipeline operator. It returns the documents with the value of total_pop field greater than or equal to 10 million.
To get the first three states with the greatest average population per city, use the following aggregation:
pipeline =
BCON_NEW ("pipeline", "[",
"{", "$group", "{",
"_id", "{", "state",
"$state", "city", "$city",
"}", "pop", "{",
"$sum", "$pop", "}",
"}", "}",
"{", "$group", "{",
"_id", "$_id.state",
"avg_city_pop", "{", "$avg",
"$pop", "}", "}",
"}",
"{", "$sort", "{",
"avg_city_pop", BCON_INT32 (−1),
"}", "}",
"{", "$limit", BCON_INT32 (3)
"}",
"]");
This aggregate pipeline produces:
{
"_id" : "DC", "avg_city_pop" :
303450.0 }
{ "_id" : "FL", "avg_city_pop"
: 27942.29805615551 }
{ "_id" : "CA", "avg_city_pop"
: 27735.341099720412 }
The above aggregation pipeline is build from three pipeline operators: $group, $sort and $limit.
The first $group operator creates the following documents:
{ "_id" : { "state" : "WY", "city" : "Smoot" }, "pop" : 414 }
Note, that the $group operator can't use nested documents except the _id field.
The second $group uses these documents to create the following documents:
{ "_id" : "FL", "avg_city_pop" : 27942.29805615551 }
These documents are sorted by the avg_city_pop field in descending order. Finally, the $limit pipeline operator returns the first 3 documents from the sorted set.
MongoDB, Inc
2017-present, MongoDB, Inc