My JavaRDD looks like this:
[
ObjectHandler [username=KAJAL, properties={}, event_name=INSTALL, pname=null, ptype=null, pvalue=null, date=2016-08-02T06:48:10.108Z],
ObjectHandler [username=KAJAL, properties={}, event_name=INSTALL, pname=null, ptype=null, pvalue=null, date=2016-08-02T06:51:12.089Z],
ObjectHandler [username=KAJAL, properties={}, event_name=INSTALL, pname=null, ptype=null, pvalue=null, date=2016-08-02T06:52:44.285Z],
ObjectHandler [username=KAJAL, properties={}, event_name=INSTALL, pname=null, ptype=null, pvalue=null, date=2016-08-02T06:54:23.250Z],
ObjectHandler [username=KAJAL, properties={}, event_name=INSTALL, pname=null, ptype=null, pvalue=null, date=2016-08-02T06:55:35.045Z],
ObjectHandler [username=Hello, properties={}, event_name=INSTALL, pname=null, ptype=null, pvalue=null, date=2016-08-02T10:40:07.929Z],
ObjectHandler [username=Hello, properties={}, event_name=INSTALL, pname=null, ptype=null, pvalue=null, date=2016-08-02T10:40:54.602Z],
ObjectHandler [username=neelam, properties={}, event_name=INSTALL, pname=null, ptype=null, pvalue=null, date=2016-08-03T07:16:23.085Z]
]
Now I want only one element per username, like this:
[
ObjectHandler [username=KAJAL, properties={}, event_name=INSTALL, pname=null, ptype=null, pvalue=null, date=2016-08-02T06:48:10.108Z],
ObjectHandler [username=Hello, properties={}, event_name=INSTALL, pname=null, ptype=null, pvalue=null, date=2016-08-02T10:40:54.602Z],
ObjectHandler [username=neelam, properties={}, event_name=INSTALL, pname=null, ptype=null, pvalue=null, date=2016-08-03T07:16:23.085Z]
]
I have used JavaRDD.distinct(), but the result was the same as the input.
Please help.
JavaRDD.distinct() relies on ObjectHandler.equals (and hashCode) to check for distinctness. If you haven't overridden them, every ObjectHandler instance is considered different, so you end up with the same RDD.
You therefore have two options:
1. Override equals and hashCode so that they compare only username: this is simple to do, and distinct would then return the expected result. The downside is that other parts of your program might need two ObjectHandler instances with the same username to be considered different, that is, they might require stricter equality for these objects. If that's the case, use the second approach.
2. Reduce by username: extract the username into the RDD's "key", reduce by that key while "randomly" choosing one of the matching values, and then drop the keys. With Java 8 this would look like:
final JavaRDD<ObjectHandler> result = rdd
    .keyBy(v -> v.username)
    .reduceByKey((ObjectHandler v1, ObjectHandler v2) -> v1)
    .values();
With Java 7 this looks a bit messier, but the logic is identical:
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.Function2;

final JavaRDD<ObjectHandler> result = rdd.keyBy(new Function<ObjectHandler, String>() {
    @Override
    public String call(ObjectHandler v1) throws Exception {
        return v1.username; // the username becomes the key
    }
}).reduceByKey(new Function2<ObjectHandler, ObjectHandler, ObjectHandler>() {
    @Override
    public ObjectHandler call(ObjectHandler v1, ObjectHandler v2) throws Exception {
        return v1; // choosing one "randomly"
    }
}).values();
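For the first option, here is a minimal sketch of what overriding equals and hashCode on username alone could look like. The ObjectHandler below is a hypothetical, trimmed-down stand-in for the class in the question (only username and date, with date kept as a String), and the distinct helper and DistinctByUsernameDemo class are made up for illustration. The helper uses a plain LinkedHashSet rather than Spark so the example runs on its own. It depends on the same two methods that distinct() does, but unlike Spark it always keeps the first element seen for each username.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;

// Hypothetical, trimmed-down stand-in for the ObjectHandler class from the question.
class ObjectHandler implements Serializable {
    final String username;
    final String date;

    ObjectHandler(String username, String date) {
        this.username = username;
        this.date = date;
    }

    // Two handlers are equal when their usernames match; every other field is ignored.
    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof ObjectHandler)) return false;
        return Objects.equals(username, ((ObjectHandler) other).username);
    }

    // Must agree with equals: equal usernames produce equal hash codes.
    @Override
    public int hashCode() {
        return Objects.hashCode(username);
    }

    @Override
    public String toString() {
        return "ObjectHandler [username=" + username + ", date=" + date + "]";
    }
}

class DistinctByUsernameDemo {
    // Plain-Java stand-in for distinct(): the set drops elements that are
    // equal (by username) to one it already holds, keeping the first seen.
    static List<ObjectHandler> distinct(List<ObjectHandler> input) {
        return new ArrayList<>(new LinkedHashSet<>(input));
    }

    public static void main(String[] args) {
        List<ObjectHandler> events = Arrays.asList(
            new ObjectHandler("KAJAL", "2016-08-02T06:48:10.108Z"),
            new ObjectHandler("KAJAL", "2016-08-02T06:51:12.089Z"),
            new ObjectHandler("Hello", "2016-08-02T10:40:07.929Z"),
            new ObjectHandler("Hello", "2016-08-02T10:40:54.602Z"),
            new ObjectHandler("neelam", "2016-08-03T07:16:23.085Z"));

        // One element per username remains: KAJAL, Hello, neelam.
        System.out.println(distinct(events));
    }
}
```

With equals and hashCode defined this way, rdd.distinct() should likewise collapse the RDD to one element per username, though which of the matching elements survives is not something to rely on in Spark.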