I have a CSV file with datapoints as
student, year, subject, score1, score2, score3, ..., score100
Alex, 2010, Math, 23, 56, 43, ..., 89
Alex, 2011, Science, 45, 32, 45, ..., 65
Matt, 2009, Art, 34, 56, 75, ..., 43
Matt, 2010, Math, 43, 54, 54, ..., 32
What would be the best way to load such a CSV into a Map in Java? The data is used by a lookup service, hence the choice of a map. The key would be the Tuple (student, year), which maps to a list of subjects with their scores (SubjectScore.class). So the idea is: given the name of a student and a year, get all subjects and scores.
While searching, I didn't find an elegant way to read the CSV file into a Map of user-defined classes such as Map<Tuple, List<SubjectScore>>
class Tuple {
private String student;
private int year;
}
class SubjectScore {
private String subject;
private int score1;
private int score2;
private int score3;
// more fields here
private int score100;
}
Additional details: the CSV file is large (~2 GB) but static, which is why I decided to load it into memory.
Please find below a first example, which may serve as a starting point. I have removed the dots in your example input data and assume a simplified example with 4 scores.
student, year, subject, score1, score2, score3, score4
Alex, 2010, Math, 23, 56, 43, 89
Alex, 2011, Science, 45, 32, 45, 65
Matt, 2009, Art, 34, 56, 75, 43
Matt, 2010, Math, 43, 54, 54, 32
Alex, 2010, Art, 43, 54, 54, 32
I also assume that you have overridden the equals and hashCode methods in your Tuple class and implemented a suitable constructor
class Tuple {
private String student;
private int year;
public Tuple(String student, int year) {
this.student = student;
this.year = year;
}
@Override
public int hashCode() {
int hash = 7;
hash = 79 * hash + Objects.hashCode(this.student);
hash = 79 * hash + this.year;
return hash;
}
@Override
public boolean equals(Object obj) {
if (this == obj) {
return true;
}
if (obj == null) {
return false;
}
if (getClass() != obj.getClass()) {
return false;
}
final Tuple other = (Tuple) obj;
if (this.year != other.year) {
return false;
}
return Objects.equals(this.student, other.student);
}
@Override
public String toString() {
return "Tuple{" + "student=" + student + ", year=" + year + '}';
}
}
and a SubjectScore class with a suitable constructor
class SubjectScore {
private String subject;
private int score1;
private int score2;
private int score3;
// more fields here
private int score4;
public SubjectScore(String row) {
String[] data = row.split(",");
this.subject = data[0].trim();
this.score1 = Integer.parseInt(data[1].trim());
this.score2 = Integer.parseInt(data[2].trim());
this.score3 = Integer.parseInt(data[3].trim());
this.score4 = Integer.parseInt(data[4].trim());
}
}
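A side note on why the equals and hashCode overrides in Tuple matter: a HashMap locates a key by hash code and equality, so a lookup with a freshly created key only succeeds if two keys with the same student and year compare equal. Below is a minimal sketch of that behaviour. The class names KeyDemo, PlainKey and ValueKey are hypothetical and only serve the illustration; they are not part of the code above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class KeyDemo {

    // Hypothetical key WITHOUT equals/hashCode: HashMap falls back to object identity.
    static final class PlainKey {
        final String student;
        final int year;

        PlainKey(String student, int year) {
            this.student = student;
            this.year = year;
        }
    }

    // Hypothetical key WITH value-based equals/hashCode, analogous to Tuple.
    static final class ValueKey {
        final String student;
        final int year;

        ValueKey(String student, int year) {
            this.student = student;
            this.year = year;
        }

        @Override
        public boolean equals(Object obj) {
            if (this == obj) return true;
            if (!(obj instanceof ValueKey)) return false;
            ValueKey other = (ValueKey) obj;
            return year == other.year && Objects.equals(student, other.student);
        }

        @Override
        public int hashCode() {
            return Objects.hash(student, year);
        }
    }

    // Stores a value under one key instance and looks it up with a second instance.
    static boolean plainKeyLookupSucceeds() {
        Map<PlainKey, String> map = new HashMap<>();
        map.put(new PlainKey("Alex", 2010), "Math");
        return map.containsKey(new PlainKey("Alex", 2010));
    }

    // Same experiment with the value-based key.
    static boolean valueKeyLookupSucceeds() {
        Map<ValueKey, String> map = new HashMap<>();
        map.put(new ValueKey("Alex", 2010), "Math");
        return map.containsKey(new ValueKey("Alex", 2010));
    }

    public static void main(String[] args) {
        System.out.println("without equals/hashCode: " + plainKeyLookupSucceeds()); // false
        System.out.println("with equals/hashCode:    " + valueKeyLookupSucceeds()); // true
    }
}
```

Without the overrides the lookup service would never find an entry, because every request builds a new key object.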
Then you can create the desired map as follows:
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.AbstractMap;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import java.util.Objects;
import java.util.stream.Collectors;
import java.util.stream.Stream;
public class Example {
public static void main(String[] args) {
Map<Tuple, List<SubjectScore>> map = new HashMap<>();
try (Stream<String> content = Files.lines(Paths.get("path to your csv file"))) {
map = content.skip(1).map(line -> lineToEntry(line)) //skip header and map each line to a map entry
.collect(Collectors.groupingBy(
Map.Entry::getKey,
Collectors.mapping(Map.Entry::getValue, Collectors.toList()))
);
} catch (IOException ex) {
ex.printStackTrace();
}
map.forEach((k,v) -> {System.out.println(k + " : " + v);});
}
static Entry<Tuple, SubjectScore> lineToEntry(String line) {
// split each line at the first and second comma, producing an array with 3 elements:
// the first holds the name and the second the year, used to create a Tuple object;
// everything after the second comma stays in one element, used to create a SubjectScore object
String[] data = line.split(",", 3);
Tuple t = new Tuple(data[0].trim(), Integer.parseInt(data[1].trim()));
SubjectScore s = new SubjectScore(data[2]);
return new AbstractMap.SimpleEntry<>(t, s);
}
}
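To make the split with a limit of 3 in lineToEntry more concrete, here is a small sketch of what it produces for one of the sample rows. The class name SplitDemo and the helper splitKeyAndRest are hypothetical names chosen for this illustration.

```java
public class SplitDemo {

    // Splits a row into at most three parts: student, year and the untouched remainder.
    static String[] splitKeyAndRest(String line) {
        return line.split(",", 3);
    }

    public static void main(String[] args) {
        String[] parts = splitKeyAndRest("Alex, 2010, Math, 23, 56, 43, 89");
        // Brackets make the leading spaces visible.
        for (String part : parts) {
            System.out.println("[" + part + "]");
        }
        // prints:
        // [Alex]
        // [ 2010]
        // [ Math, 23, 56, 43, 89]
    }
}
```

The leading spaces in the second and third part are the reason for the trim() calls before parsing. Note that a plain split on commas assumes no field contains a quoted comma; if your file can contain such fields, a CSV parsing library is probably the safer choice.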
I don't know if you really need individual fields for each score in your SubjectScore class. If I were you, I would prefer a list of integers. To do so, just change your class to something like:
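import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
class SubjectScore {
private String subject;
private List<Integer> scores;
public SubjectScore(String row) {
String[] data = row.split(",");
this.subject = data[0].trim();
// parse every column after the subject as an integer score
this.scores = Arrays.stream(data, 1, data.length)
.map(item -> Integer.parseInt(item.trim()))
.collect(Collectors.toList());
}
}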
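Finally, a rough end-to-end sketch of the lookup itself: given a student and a year, return all subjects. To keep it self-contained it reads the sample rows from an in-memory list instead of a file, and it uses hypothetical stand-ins (LookupDemo, Key, Row, load, subjectsFor, SAMPLE) for the classes above, so treat it as an illustration rather than a drop-in replacement.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LookupDemo {

    // Sample rows from the simplified example, header included.
    static final List<String> SAMPLE = Arrays.asList(
            "student, year, subject, score1, score2, score3, score4",
            "Alex, 2010, Math, 23, 56, 43, 89",
            "Alex, 2011, Science, 45, 32, 45, 65",
            "Matt, 2009, Art, 34, 56, 75, 43",
            "Matt, 2010, Math, 43, 54, 54, 32",
            "Alex, 2010, Art, 43, 54, 54, 32");

    // Stand-in for Tuple: value-based equality on (student, year).
    static final class Key {
        final String student;
        final int year;

        Key(String student, int year) {
            this.student = student;
            this.year = year;
        }

        @Override
        public boolean equals(Object obj) {
            if (this == obj) return true;
            if (!(obj instanceof Key)) return false;
            Key other = (Key) obj;
            return year == other.year && Objects.equals(student, other.student);
        }

        @Override
        public int hashCode() {
            return Objects.hash(student, year);
        }
    }

    // Stand-in for the list-based SubjectScore.
    static final class Row {
        final String subject;
        final List<Integer> scores;

        Row(String row) {
            String[] data = row.split(",");
            this.subject = data[0].trim();
            this.scores = Arrays.stream(data, 1, data.length)
                    .map(item -> Integer.parseInt(item.trim()))
                    .collect(Collectors.toList());
        }
    }

    // Skips the header, then groups all rows by their (student, year) key.
    static Map<Key, List<Row>> load(Stream<String> lines) {
        return lines.skip(1)
                .map(line -> line.split(",", 3))
                .collect(Collectors.groupingBy(
                        (String[] p) -> new Key(p[0].trim(), Integer.parseInt(p[1].trim())),
                        Collectors.mapping((String[] p) -> new Row(p[2]), Collectors.toList())));
    }

    // Returns the subjects stored for a student and year, or an empty list if there are none.
    static List<String> subjectsFor(Map<Key, List<Row>> map, String student, int year) {
        List<Row> rows = map.getOrDefault(new Key(student, year), Collections.emptyList());
        return rows.stream().map(r -> r.subject).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<Key, List<Row>> map = load(SAMPLE.stream());
        System.out.println(subjectsFor(map, "Alex", 2010)); // [Math, Art]
        System.out.println(subjectsFor(map, "Matt", 2011)); // []
    }
}
```

Using getOrDefault with an empty list means callers do not have to handle null when a student/year combination is missing.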