Aggregation Pipeline in MongoDB and the use of $match and $group operator (Part 2)

Hello and welcome back, readers, to the second part of the MongoDB Aggregation Pipeline series, where we explore the power of the Aggregation Pipeline that MongoDB provides to make a developer's life easier.

If you are new to this series, I recommend reading the first part before continuing. For readers following along, let's first revise what we have learned so far.

So far, we have covered what aggregation is in MongoDB and the different types of aggregation it provides:

  1. Map Reduce Function

  2. Single Purpose Aggregation

  3. Aggregation Pipeline

In this article, we will explore the aggregation pipeline in depth using the $match and $group operators.

Aggregation Pipeline using the $match operator:

The $match operator filters the documents and passes only those that match the specified conditions to the next pipeline stage.

In other words, when you apply $match with a query, each document is checked against the fields and values in that query. Only the documents that satisfy it are passed on; the rest are dropped.

//Syntax
{ $match: { <query> } }

Let us understand the $match operator with a basic example using a collection of student data:

//Collection Students
{ "_id": ObjectId("512bc95fe835e68f199c8686"), "student": "Dave Smith", "score": 80}
{ "_id": ObjectId("512bc962e835e68f199c8687"), "student": "Dave Smith", "score": 85}
{ "_id": ObjectId("55f5a192d4bede9ac365b257"), "student": "Ahn ben", "score": 60}
{ "_id": ObjectId("55f5a192d4bede9ac365b258"), "student": "li xin", "score": 55}
{ "_id": ObjectId("55f5a1d3d4bede9ac365b259"), "student": "Ben hue", "score": 60}
{ "_id": ObjectId("55f5a1d3d4bede9ac365b25a"), "student": "li xin", "score": 94}
{ "_id": ObjectId("55f5a1d3d4bede9ac365b25b"), "student": "Peter parker", "score": 95}

//Now if we apply the $match operator to the above collection with the student name
//"Dave Smith", we get only the documents in which the student is Dave Smith

//Note: string matching is case-sensitive, so the value must be "Dave Smith", not "Dave smith"
db.students.aggregate([{$match:{student:"Dave Smith"}}])
//Result:
{ "_id": ObjectId("512bc95fe835e68f199c8686"), "student": "Dave Smith", "score": 80}
{ "_id": ObjectId("512bc962e835e68f199c8687"), "student": "Dave Smith", "score": 85}
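To make the filtering logic concrete, here is a minimal sketch in plain JavaScript (the language of the mongo shell) that mimics an equality $match on an in-memory copy of the students collection. It is not the MongoDB driver: `matchEquals` is a hypothetical helper, the ObjectIds are omitted, and only exact-equality queries are handled (no operators such as $gte). It also shows why the query value must match the stored value exactly: string comparison is case-sensitive, as it is in MongoDB under the default collation.

```javascript
// In-memory copy of the students collection (ObjectIds omitted).
const students = [
  { student: "Dave Smith", score: 80 },
  { student: "Dave Smith", score: 85 },
  { student: "Ahn ben", score: 60 },
  { student: "li xin", score: 55 },
  { student: "Ben hue", score: 60 },
  { student: "li xin", score: 94 },
  { student: "Peter parker", score: 95 },
];

// Hypothetical helper: keeps a document only if every field in the query
// is strictly equal (===) to the document's value for that field.
function matchEquals(docs, query) {
  return docs.filter((doc) =>
    Object.entries(query).every(([field, value]) => doc[field] === value)
  );
}

const exact = matchEquals(students, { student: "Dave Smith" });
console.log(exact); // the two Dave Smith documents (scores 80 and 85)

// Comparison is case-sensitive, so a lowercase "s" matches nothing.
const wrongCase = matchEquals(students, { student: "Dave smith" });
console.log(wrongCase.length); // 0
```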

Aggregation Pipeline using the $group operator:

Next, let's move on to the $group operator. As we know, an aggregation pipeline is a series of stages, and we can add whichever stages we need to extract the data our requirements call for.

The $group stage separates documents into groups according to a "group key". The output is one document for each unique group key.

syntax:     { $group: {
                _id: <expression>, // Group key
                <field1>: { <accumulator1> : <expression1> },
                ...
              }
            }

In other words, when $group is applied to a set of documents, it returns one document per distinct group key. Each output document has the group key value in its _id field, followed by any fields computed by the accumulator expressions (for example $sum or $avg) that you list in the stage.
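The accumulator idea can be sketched in plain JavaScript as well. The following is a hypothetical in-memory approximation of a stage like `{ $group: { _id: "$student", totalScore: { $sum: "$score" }, avgScore: { $avg: "$score" } } }`; `groupSumAvg` and the output field names `totalScore` and `avgScore` are assumptions made for illustration, not part of MongoDB. It assumes the group key is a primitive value (such as a string) and that every document has a numeric score.

```javascript
// In-memory copy of the students collection (ObjectIds omitted).
const students = [
  { student: "Dave Smith", score: 80 },
  { student: "Dave Smith", score: 85 },
  { student: "Ahn ben", score: 60 },
  { student: "li xin", score: 55 },
  { student: "Ben hue", score: 60 },
  { student: "li xin", score: 94 },
  { student: "Peter parker", score: 95 },
];

// Hypothetical helper approximating $group with $sum and $avg accumulators.
// keyField plays the role of the group key (_id); valueField is accumulated.
function groupSumAvg(docs, keyField, valueField) {
  const groups = new Map(); // group key -> { sum, count }
  for (const doc of docs) {
    const key = doc[keyField];
    if (!groups.has(key)) groups.set(key, { sum: 0, count: 0 });
    const acc = groups.get(key);
    acc.sum += doc[valueField];
    acc.count += 1;
  }
  // One output document per distinct key: _id first, then the accumulated fields.
  return [...groups].map(([key, acc]) => ({
    _id: key,
    totalScore: acc.sum,
    avgScore: acc.sum / acc.count,
  }));
}

const stats = groupSumAvg(students, "student", "score");
console.log(stats);
```

For Dave Smith this gives a total of 165 and an average of 82.5 (from the scores 80 and 85), while students with a single document simply keep their own score.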

For example, below the $group operator is applied to the students collection.

//Our students collection, where the student's name is stored in the field "student"
{ "_id": ObjectId("512bc95fe835e68f199c8686"), "student": "Dave Smith", "score": 80}
{ "_id": ObjectId("512bc962e835e68f199c8687"), "student": "Dave Smith", "score" : 85}
{ "_id": ObjectId("55f5a192d4bede9ac365b257"), "student": "Ahn ben", "score": 60}
{ "_id": ObjectId("55f5a192d4bede9ac365b258"), "student": "li xin", "score": 55}
{ "_id": ObjectId("55f5a1d3d4bede9ac365b259"), "student": "Ben hue", "score": 60}
{ "_id": ObjectId("55f5a1d3d4bede9ac365b25a"), "student": "li xin", "score": 94}
{ "_id": ObjectId("55f5a1d3d4bede9ac365b25b"), "student": "Peter parker", "score": 95}

//Now we group the documents using the student field as the group key (_id)
db.students.aggregate([{$group:{_id:"$student"}}])

//_id is a mandatory field in $group; it holds the expression (here the field
//path "$student") by which you want the documents to be grouped.
//The result is one document per distinct student name in the collection
//(note that $group does not guarantee the order of its output documents)
{"_id": "Dave Smith"}
{"_id": "Ahn ben"}
{"_id": "li xin"}
{"_id": "Ben hue"}
{"_id": "Peter parker"}
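The same grouping can be sketched in plain JavaScript to show why the output holds exactly one document per distinct name. `groupByKey` is a hypothetical helper, not a MongoDB API; it relies on a JavaScript `Set`, so it only handles primitive keys such as strings, whereas MongoDB can also group by compound keys built from several fields.

```javascript
// In-memory copy of the students collection (ObjectIds omitted).
const students = [
  { student: "Dave Smith", score: 80 },
  { student: "Dave Smith", score: 85 },
  { student: "Ahn ben", score: 60 },
  { student: "li xin", score: 55 },
  { student: "Ben hue", score: 60 },
  { student: "li xin", score: 94 },
  { student: "Peter parker", score: 95 },
];

// Hypothetical helper approximating { $group: { _id: "$<keyField>" } }:
// collect each distinct key once, then emit one { _id } document per key.
function groupByKey(docs, keyField) {
  const seen = new Set();
  for (const doc of docs) seen.add(doc[keyField]);
  return [...seen].map((key) => ({ _id: key }));
}

const distinctStudents = groupByKey(students, "student");
console.log(distinctStudents); // five documents, one per distinct name
```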

From the above two operators, it is clear that $match is used when you want to filter documents based on certain fields, and $group is used to group the documents of a collection by a "group key".

As we know, an aggregation pipeline can have multiple stages, and we can add as many as we need. In the next example, we combine both stages in one pipeline that uses the $match and $group operators together.

Problem Statement for Two-stage Pipeline:

We want to find the names of the students who have scored greater than or equal to 80.

From the problem statement, we can use the $match operator to find the documents with a score greater than or equal to 80. However, the same student can appear in more than one document in our collection, so we also need the $group operator to return each name only once.

//Our students collection, where the student's name is stored in the field "student"
{ "_id": ObjectId("512bc95fe835e68f199c8686"), "student": "Dave Smith", "score": 80}
{ "_id": ObjectId("512bc962e835e68f199c8687"), "student": "Dave Smith", "score" : 85}
{ "_id": ObjectId("55f5a192d4bede9ac365b257"), "student": "Ahn ben", "score": 60}
{ "_id": ObjectId("55f5a192d4bede9ac365b258"), "student": "li xin", "score": 55}
{ "_id": ObjectId("55f5a1d3d4bede9ac365b259"), "student": "Ben hue", "score": 60}
{ "_id": ObjectId("55f5a1d3d4bede9ac365b25a"), "student": "li xin", "score": 94}
{ "_id": ObjectId("55f5a1d3d4bede9ac365b25b"), "student": "Peter Parker", "score": 95}

//we will use two stages to extract the data
db.students.aggregate([
                     {  //first stage: keep only the documents with a score
                        //greater than or equal to 80
                       $match:{"score":{ $gte:80 }}
                     },
                     {  //second stage: group the documents from the first stage
                        //by student name, so each name appears only once
                       $group:{ _id: "$student"}
                     }
                     ])

//The result of this two-stage pipeline:
//the first stage returns four documents, in which Dave Smith appears twice;
//the second stage groups those documents by distinct student name
{"_id": "Dave Smith"}
{"_id": "li xin"}
{"_id": "Peter Parker"}
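Conceptually, each stage takes a list of documents and hands its output to the next stage. The sketch below models that with plain JavaScript functions chained by `reduce`; `matchScoreGte`, `groupByField` and `runPipeline` are hypothetical names for illustration, not MongoDB APIs, and the sketch assumes every document has a numeric `score`.

```javascript
// In-memory copy of the students collection (ObjectIds omitted).
const students = [
  { student: "Dave Smith", score: 80 },
  { student: "Dave Smith", score: 85 },
  { student: "Ahn ben", score: 60 },
  { student: "li xin", score: 55 },
  { student: "Ben hue", score: 60 },
  { student: "li xin", score: 94 },
  { student: "Peter Parker", score: 95 },
];

// Stage 1 (like $match with $gte): keep documents whose score is >= min.
const matchScoreGte = (min) => (docs) => docs.filter((doc) => doc.score >= min);

// Stage 2 (like $group with only _id): one document per distinct field value.
const groupByField = (field) => (docs) =>
  [...new Set(docs.map((doc) => doc[field]))].map((key) => ({ _id: key }));

// Feed the output of each stage into the next one, in order.
function runPipeline(docs, stages) {
  return stages.reduce((current, stage) => stage(current), docs);
}

const afterMatch = matchScoreGte(80)(students);
const result = runPipeline(students, [matchScoreGte(80), groupByField("student")]);
console.log(afterMatch.length); // 4
console.log(result); // three documents: Dave Smith, li xin, Peter Parker
```

Putting $match before $group matters in real MongoDB too: it reduces the number of documents $group has to process, and a $match at the start of a pipeline can use an index.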

Conclusion:

In this article, we gained a basic understanding of the $match and $group operators: what they do, why they are used, and how filtering on the database side spares us a tedious filtering process on the frontend.

MongoDB's aggregation pipeline does this with a single aggregate() command made of just two stages, extracting exactly the data we need. That is the essence of MongoDB's aggregation power: combining simple stages to solve bigger problems.

I hope you liked this article. I learned this material from resources on the internet as well, so do check the link below, and stay tuned for the next part of this series.

Do like, comment, share, and subscribe to my newsletter to get my work directly in your inbox. You can also sponsor my work by donating through the link below.

Support Ganesh Yadav by becoming a sponsor. Any amount is appreciated!