Introduction to Aggregation Pipeline in MongoDB (Part 1)

Introduction to Aggregation Pipeline in MongoDB (Part 1)

Hello and Welcome back readers to this amazing series in which we are going to explore deeply the MongoDB aggregation pipeline and how it reduces the number of steps and simplifies the process of data extraction from your database.

But Before moving forward with Aggregation Pipeline, let us learn what are the diiferent types of aggregation that are available or provided by MongoDB.

Types of Aggregation in MongoDB:

So MongoDB provides three types of Aggregation

  1. Map Reduce Function

  2. Single Purpose Aggregation

  3. Aggregation Pipeline

Map Reduce Function:

Map Reduce is used for aggregating results for a large volume of data. Map reduce has two main functions one is a map that groups all the documents and the second one is the reduce which operates on the grouped data.

Although the Map Reduce type of Aggregation is not generally used, because MongoDB provide an Advance Aggregation Pipeline, we will see this Map Reduce for learning purposes and how it is used, the basic syntax for map-reduce aggregation is as follows,

db.collectionName.mapReduce(mappingFunction, reduceFunction, {out:'Result'})

You can see from the above expression, that for the map-reduce type of aggregation, we have to provide three arguments, which are mappingFunction, reduceFunction and out Results format.

The mappingFunction generally selects a key or the fields from the document that is generally mentioned inside the mappingFunction.

These selected types of Documents are then fed as the input to the reduceFunction which performs operations to those selected documents and provides the out Results in the form new Collection.

Now Let us See an Example using Map Reduce, Where we will group the documents based on age and find total marks in each age group.

//let the students collections have documents as
{
id:ObjectId("1384y125y1857405745c6d"),
Name:"zack",
age:19,
totalMarks:30
},
{
id:ObjectId("1384y125y1857405745c6d"),
Name:"ray",
age:27,
totalMarks:15
}
{
id:ObjectId("1384y125y1857405745c6d"),
Name:"hiro",
age:27,
totalMarks:30
},
.........
//we can apply map reduce, method of aggregation to 
//calculate the total marks with a particular age group
//here we are mapping the selected key such as age and marks
const mapfunction = function(){emit(this.age, this.totalMarks)}
const reducefunction = function(key, values){return Array.sum(values)}
db.studentsMarks.mapReduce(mapfunction, reducefunction, {'out':'Result'})
//So, we will create two variables first mapfunction 
//which will emit age as a key (expressed as “_id” in the output) and 
//marks as value this emitted data is passed to our reducefunction, 
//which takes key and value as grouped data, and then it performs 
//operations over it. After performing reduction the results are stored 
//in a collection here in this case the collection is Results.
//the results of the operation are stored inside a new collection created as
//Results and when we find the document inside these Results collections as
db.Results.find()
{_id:19, value:30}
{_id:27, value:45}
//here we calculated the total marks for each age group

Single Purpose Aggregation:

This type of aggregation is generally used for simple operations such as counting the number of documents present inside the collections and such as fetching the distinct student's name inside our student collection.

The Basic syntax for single-purpose aggregation is shown

//Single Purpose Aggregation
//for finding all distinct student name
db.students.distinct("name")
//will return an array of distinct names present inside the collection
["zack","ray","hiro"]

//for finding the total count of the document
db.students.count()
//return a number of a total count of documents inside the particular collection
3

Aggregation Pipeline:

Now, we move forward to the next type of aggregation, which is most widely used and is recommended to use for simple as well as advanced aggregation.

As the name suggests, a pipeline is generally a series of stages where the output of one stage is fed as the input of another stage, depending upon your requirements you can put as many stages as you want until you get your desired results.

The basic definition of MongoDB Aggregation Pipeline is mentioned in their official documentation as follows:

"An aggregation pipeline consists of one or more stages that process documents"

  • Each stage operates on the input documents. For example, a stage can filter documents, group documents, and calculate values.

  • The documents that are output from a stage are passed to the next stage.

  • An aggregation pipeline can return results for groups of documents. For example, return the total, average, maximum, and minimum values.

The general syntax for the aggregation pipeline is shown

db.collectionName.aggregate([])
//this array inside an aggregate function is called the pipeline array which
//generally carries your stages of pipeline and expression inside it.

//Advnaced syntax breakdown
db.collectionName.aggregate([{$group:  //stage
                             { _id:"$id", //Expression
                              total:{$sum:"$fare"}}}])
                            //Accumulator

From the Above Expression, it is clear for using the aggregation pipeline in MongoDB we have to use the .aggregate method as well as we have to provide the pipeline array inside it where all the stages and filtering happens.

The advanced Aggregation pipeline generally uses multiple stages but they have some common things such as there should be the compulsory a Stage operator and there should be also the Expression present which generally refers to the input field inside the actual document in which the aggregation is going to occur such as age and totalMarks field and the third one is called accumulator which generally refers to the thing you want after you filter or grouped the document, such as sometimes we have to $sort the output or we have to $count of all the document present after the pipeline happens.

Conclusion:

So, till now we only got to see what is the aggregation and what are the different aggregation methods that are provided by MongoDB and the most widely used technique of aggregation,

In the further part of this series, we will be going to explore in depth the different stages and grouping operators provided by MongoDB to carry out aggregation pipelines such as $match,$group,$unwind, and $lookup, and how to optimize these aggregation stages to get the required results in the optimized format.

So stay tuned for the further part of the series, as we will explore this beautiful concept of aggregation pipeline in MongoDB.

Do Like, Comment, Share and Subscribe to my Newsletter for getting my work directly in your inbox. You can also sponsor my work by donating in the below Link.

Did you find this article valuable?

Support Ganesh's Blog by becoming a sponsor. Any amount is appreciated!