After a couple of thousand tweets, my app, which collects tweets from the streaming API with Twitter4J, throws an OutOfMemoryError.
On receiving a status, my code does the following:
- convert the status into a TwitterStatus object of my own. The reason is that the Status returned by Twitter4J is an interface, which can't be serialized to MongoDB.
- add this status to a list.
- if the size of the list is above 25 or 100 (depending on how fast tweets are arriving), save the list to the db.
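The three steps above can be sketched in isolation, without Twitter4J or MongoDB. This is only a minimal illustration: `StatusBatcher`, `BatchDemo`, and the `sink` callback are hypothetical names standing in for the real listener and the database save, and the fixed batch size replaces the speed-dependent 25/100 switch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper, not part of Twitter4J or Morphia:
// buffers items and hands them to a sink in batches.
class StatusBatcher<T> {
    private final int batchSize;
    private final Consumer<List<T>> sink; // stands in for the database save
    private List<T> buffer = new ArrayList<>();

    StatusBatcher(int batchSize, Consumer<List<T>> sink) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.batchSize = batchSize;
        this.sink = sink;
    }

    // Add one item; flush once the buffer holds MORE than batchSize items,
    // mirroring the "size of the list is above 25 or 100" rule.
    void add(T item) {
        buffer.add(item);
        if (buffer.size() > batchSize) {
            flush();
        }
    }

    // Hand the current batch to the sink and start a fresh list,
    // so the saved items are no longer referenced from here.
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        List<T> full = buffer;
        buffer = new ArrayList<>();
        sink.accept(full);
    }

    int pending() {
        return buffer.size();
    }
}

class BatchDemo {
    public static void main(String[] args) {
        List<Integer> savedSizes = new ArrayList<>();
        StatusBatcher<String> batcher =
                new StatusBatcher<>(25, batch -> savedSizes.add(batch.size()));
        for (int i = 0; i < 60; i++) {
            batcher.add("status-" + i);
        }
        batcher.flush(); // save whatever is left at the end
        System.out.println("batch sizes: " + savedSizes);
    }
}
```

Note that the buffer itself never grows beyond one batch, so if memory still runs out, the growth has to come from somewhere other than this list.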
So it is all pretty simple: I don't store anything locally, and yet I get this OutOfMemoryError. Any clue how I could keep my memory footprint low?
The code:
StatusListener listener = new StatusListener() {
    @Override
    public void onStatus(Status status) {
        nbTweets++;

        // The Status returned by Twitter4J is an interface, not serializable.
        // I convert it into my own TwitterStatus object: same fields, serializable.
        twitterStatus = convertStatus.convertOneToTwitterStatus(status);
        twitterStatus.setJobId(jobUUID);
        twitterStatuses.add(twitterStatus);
        statusesIds.add(status.getId());

        timeSinceLastStatus = System.currentTimeMillis() - timeLastStatus;

        //**************************************
        // adjusting the frequency of saves to DB, function of number of statuses received per second
        if (timeSinceLastStatus < 200) {
            sizeBatch = 100;
        } else {
            sizeBatch = 25;
        }
        timeLastStatus = System.currentTimeMillis();

        progressLong = (Long) ((System.currentTimeMillis() - startDateTime.getMillis()) * 100 / (stopTime - startDateTime.getMillis()));

        if (statusesIds.size() > sizeBatch || progressLong.intValue() > progress) {

            //**************************************
            // saving statuses to the db.
            dsTweets.save(twitterStatuses);
            twitterStatuses = new ArrayList();

            //**************************************
            // updating list of status ids of the job.
            opsJob = dsJobs.createUpdateOperations(Job.class).addAll("statuses", statusesIds, true);
            dsJobs.update(updateQueryJob, opsJob);
            statusesIds = new ArrayList();

            // updating progress.
            System.out.println("progress: " + progressLong);
            progress = progressLong.intValue();
            opsJobInfo = dsJobsInfo.createUpdateOperations(JobInfo.class).set("progress", progress).set("nbTweets", nbTweets);
            dsJobsInfo.update(updateQueryJobInfo, opsJobInfo);
        }
    }

    // (other StatusListener callbacks omitted)
};
Got it.
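Since v. 2.6, MongoDB's default write concern is "acknowledged" instead of "unacknowledged". This slows down write operations considerably: each save and update in onStatus now waits for the server's reply, so presumably incoming statuses arrived faster than they could be processed and piled up in memory.
Just adding WriteConcern.UNACKNOWLEDGED to all db writing operations solved the problem.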