I have a collection in mongodb with below data:
collection name: runState
runId: 1
startTime:2020-09-16T20:56:06.598+00:00
endTime:2020-09-16T20:57:09.196+00:00
product_action: org_rhel_oracle_install
Task: completed
ranBy:David
runId: 2
startTime:2021-01-11T20:56:06.598+00:00
endTime:2021-01-11T20:56:09.196+00:00
product_action: org_rhel_oracle_install
Task: completed
ranBy:John
runId: 2
startTime:2021-01-27T20:56:06.598+00:00
endTime:2021-01-27T20:56:09.196+00:00
product_action: org_rhel_oracle_install
Task: completed
ranBy:John
runId: 3
startTime:2021-01-11T20:56:06.598+00:00
endTime:2021-01-11T20:57:09.196+00:00
product_action: org_rhel_postgres_install
Task: completed
ranBy:John
runId: 4
startTime:2021-02-09T20:56:06.598+00:00
endTime:2021-02-09T20:57:09.196+00:00
product_action: org_rhel_oracle_install
Task: completed
ranBy:John
runId: 5
startTime:2021-02-09T20:56:06.598+00:00
endTime:2021-02-09T20:57:09.196+00:00
product_action: org_rhel_postgres_install
Task: completed
ranBy:John
runId: 6
startTime:2021-09-09T20:56:06.598+00:00
endTime:2021-09-09T20:57:09.196+00:00
product_action: org_rhel_postgres_install
Task: completed
ranBy:John
runId: 7
startTime:2022-01-09T20:56:06.598+00:00
endTime:2022-01-09T20:57:09.196+00:00
product_action: org_rhel_oracle_install
Task: completed
ranBy:David
runId: 8
startTime:2022-01-10T20:56:06.598+00:00
endTime:2022-01-10T20:57:09.196+00:00
product_action: org_rhel_oracle_install
Task: failed
ranBy:David
I want the output as count for last 12 months (Jan 2021 to Jan 2022) for each products where task is completed( product is gettable from product_action)
Output should be in below format:
{
"_id" : "postgres",
completed: [
{
"month" : "FEB-2021",
"count" : 1
},
{
"month" : "SEP-2021",
"count" : 1
},
{
"month" : "JAN-2021",
"count" : 1
}
]
},
{
"_id" : "oracle",
"completed" : [
{
"month" : "FEB-2021",
"count" : 1
},
{
"month" : "JAN-2021",
"count" : 2
}
]
}
I have started with below, but not sure how to get count for month wise like above.
{"product_action":{$regex:"postgres|oracle"},"Task":"completed"}
As this is new to me, can someone help me with mongo DB query to get the result and also code to acheive this in Java springboot?
Java code I tried using aggregation, but this is not yielding the result I want.
Aggregation agg = Aggregation.newAggregation(
Aggregation.project("endTime","Task","product_action").and(DateOperators.Month.monthOf("endTime")).as("month"),
Aggregation.match(Criteria.where("product_action").regex("postgres|oracle").and("Task").is("completed")
.and("endTime").gte(parseDate("2021-02-01"))),
Aggregation.group("month","Task").count().as("count")
);
Try this on for size:
db.foo.aggregate([
// Get easy stuff out way. Filter for the desired date range and only
// those items that are complete:
{$match: {$and: [
{"endTime":{$gte:new ISODate("2021-01-01")}},
{"endTime":{$lt:new ISODate("2022-01-01")}},
{"Task":"completed"}
]} }
// Now group by product and date expressed as month-year. The product
// is embedded in the field value so there are a few approaches to digging
// it out. Here, we split on underscore and take the [2] item.
,{$group: {_id: {
p: {$arrayElemAt:[{$split:["$product_action","_"]},2]},
d: {$dateToString: {date: "$endTime", format: "%m-%Y"}}
},
n: {$sum: 1}
}}
// The OP seeks to make the date component nested inside the product
// instead of having it as a two-part grouping. We will "regroup" and
// create an array. This is slightly different than the format indicated
// by the OP but values as keys (e.g. "Jan-2021: 2") is in general a
// poor idea so instead we construct an array of proper name:value pairs.
,{$group: {_id: '$_id.p',
completed: {$push: {d: '$_id.d', n: '$n'}}
}}
]);
which yields
{
"_id" : "postgres",
"completed" : [
{
"d" : "02-2021",
"n" : 1
},
{
"d" : "09-2021",
"n" : 1
},
{
"d" : "01-2021",
"n" : 1
}
]
}
{
"_id" : "oracle",
"completed" : [
{
"d" : "02-2021",
"n" : 1
},
{
"d" : "01-2021",
"n" : 2
}
]
}
UPDATED
It has come up before that the $dateToString
function does not have a format argument to produce the 3 letter abbreviation for a month e.g. JAN
(or a long form e.g. January
for that matter). Sorting still works with 01-2021,02-2021,04-2021
vs. JAN-2021,FEB-2021,APR-2021
but if such output is really desired directly from the DB instead of post-processing in the client-side code, then the second group is replaced by a $sort
and $group
as follows:
// Ensure the NN-YYYY dates are going in increasing order. The product
// component _id.p does not matter here -- only the dates have to be
// increasing. NOTE: This is OPTIONAL with respect to changing
// NN-YYYY into MON-YYYY but almost always the follow on question is
// how to get the completed list in date order...
,{$sort: {'_id.d':1}}
// Regroup as before but index the NN part of NN-YYYY into an
// array of 3 letter abbrevs, then reconstruct the string with the
// dash and the year component. Remember: the order of the _id
// in the doc stream coming out of $group is not deterministic
// but the array created by $push will preserve the order in
// which it was pushed -- which is the date-ascending sorted order
// from the prior stage.
,{$group: {_id: '$_id.p',
completed: {$push: {
d: {$concat: [
{$arrayElemAt:[ ['JAN','FEB','MAR',
'APR','MAY','JUN',
'JUL','AUG','SEP',
'OCT','NOV','DEC'],
// minus 1 to adjust for zero-based array:
{$subtract:[{$toInt: {$substr:['$_id.d',0,2]}},1]}
]},
"-",
{$substr:['$_id.d',3,4]}
]},
n: '$n'}}
}}
which yields:
{
"_id" : "postgres",
"completed" : [
{
"d" : "JAN-2021",
"n" : 1
},
{
"d" : "FEB-2021",
"n" : 1
},
{
"d" : "SEP-2021",
"n" : 1
}
]
}
{
"_id" : "oracle",
"completed" : [
{
"d" : "JAN-2021",
"n" : 2
},
{
"d" : "FEB-2021",
"n" : 1
}
]
}
As for converting this to Java, there are several approaches but unless a great deal of programmatic control is required, then capturing the query as "relaxed JSON" (quotes not required around keys) in a string in Java and calling Document.parse()
seems to be the easiest way. A full example including helper functions and the appropriate Java drivers calls can be found here: https://moschetti.org/rants/mongoaggcvt.html but the gist of it is:
private static class StageHelper {
private StringBuilder txt;
public StageHelper() {
this.txt = new StringBuilder();
}
public void add(String expr, Object ... subs) {
expr.replace("'", "\""); // This is the helpful part.
if(subs.length > 0) {
expr = String.format(expr, subs); // this too
}
txt.append(expr);
}
public Document fetch() {
Document b = Document.parse(txt.toString());
return b;
}
}
private List<Document> makePipeline() {
List<Document> pipeline = new ArrayList<Document>();
StageHelper s = new StageHelper();
s.add("{$match: {$and: [ ");
// Note use of EJSON here plus string substitution of dates:
s.add(" {endTime:{$gte: {$date: '%s'}} }", "2021-01-01");
s.add(" {endTime:{$lt: {$date: '%s'}} }", "2022-01-01");
s.add(" {Task:'completed'} ");
s.add("]} } ");
pipeline.add(s.fetch());
s = new StageHelper();
s.add("{$group: {_id: { ");
s.add(" p: {$arrayElemAt:[{$split:['$product_action','_']},2]}, ");
s.add(" d: {$dateToString: {date: '$endTime', 'format': '%m-%Y'}} ");
s.add(" }, ");
s.add(" n: {$sum: 1} ");
s.add("}} ");
pipeline.add(s.fetch());
s = new StageHelper();
s.add("{$sort: {'_id.d':1}} ");
pipeline.add(s.fetch());
s = new StageHelper();
s.add("{$group: {_id: '$_id.p', ");
s.add(" completed: {$push: { ");
s.add(" d: {$concat: [ ");
s.add(" {$arrayElemAt:[ ['JAN','FEB','MAR', ");
s.add(" 'APR','MAY','JUN', ");
s.add(" 'JUL','AUG','SEP', ");
s.add(" 'OCT','NOV','DEC'], ");
s.add(" {$subtract:[{$toInt: {$substr:['$_id.d',0,2]}},1]} ");
s.add(" ]}, ");
s.add(" '-', ");
s.add(" {$substr:['$_id.d',3,4]} ");
s.add(" ]}, ");
s.add(" n: '$n'}} ");
s.add(" }} ");
pipeline.add(s.fetch());
return pipeline;
}
...
import com.mongodb.client.MongoCursor;
import com.mongodb.client.AggregateIterable;
AggregateIterable<Document> output = coll.aggregate(pipeline);
MongoCursor<Document> iterator = output.iterator();
while (iterator.hasNext()) {
Document doc = iterator.next();