I have one million rows of data in .txt format. the format is very simple. For each row:
user1,value1 user2,value2 user3,value3 user1,value4 ...
You know what I mean. For each user, it could appear many times, or appear only once (you never know). I need to find out all the values for each user. Because user may appear randomly, I used Hashmap to do it. That is: HashMap(key: String, value: ArrayList). But to add data to the arrayList, I have to constantly use HashMap get(key) to get the arrayList, add value to it, then put it back to HashMap. I feel it is not that very efficient. Anybody knows a better way to do that?
You don't need to re-add the ArrayList back to your Map. If the ArrayList already exists then just add your value to it.
An improved implementation might look like:
Map<String, Collection<String>> map = new HashMap<String, Collection<String>>();
while processing each line:
String user = user field from line
String value = value field from line
Collection<String> values = map.get(user);
if (values==null) {
values = new ArrayList<String>();
map.put(user, values)
}
values.add(value);
Follow-up April 2014 - I wrote the original answer back in 2009 when my knowledge of Google Guava was limited. In light of all that Google Guava does, I now recommend using its Multimap
instead of reinvent it.
Multimap<String, String> values = HashMultimap.create();
values.put("user1", "value1");
values.put("user2", "value2");
values.put("user3", "value3");
values.put("user1", "value4");
System.out.println(values.get("user1"));
System.out.println(values.get("user2"));
System.out.println(values.get("user3"));
Outputs:
[value4, value1]
[value2]
[value3]