I have 2 CSV files (district1.csv, district2.csv) in a directory, each containing a column schoolCode.
When I read both CSV files with the Apache Commons CSV library, I collect the distinct values of the schoolCode column and count them.
Here is my code:
public void getDistinctRecordCount() throws IOException {
    Set<String> uniqueSchools = new HashSet<>();
    int numOfSchools;
    String SchoolCode;
    //Filter to only read csv files.
    File[] files = Directory.listFiles(new FileExtensionFilter());
    for (File f : files) {
        CSVParser csvParser;
        CSVFormat csvFormat = CSVFormat.DEFAULT.withFirstRecordAsHeader().withIgnoreHeaderCase().withTrim();
        reader = Files.newBufferedReader(Paths.get(Directory + "\\" + f.getName() ), StandardCharsets.ISO_8859_1);
        csvParser = CSVParser.parse(reader, csvFormat);
        for (CSVRecord column : csvParser) {
            SchoolCode = column.get("School Code");
            uniqueSchools.add(SchoolCode);
        }
        Logger.info("The list of Schools for " + f.getName() + " are: " + uniqueSchools);
        numOfSchools = uniqueSchools.size();
        Logger.info("The total count of Schools for " + f.getName() + " are: " + numOfSchools);
        Logger.info("-----------------------");
    }
}
Here is my output:
[INFO ] [Logger] - The list of Schools for district1.csv are: [01-0003-002, 01-0003-001]
[INFO ] [Logger] - The total count of Schools for district1.csv are: 2
[INFO ] [Logger] - The list of Schools for district2.csv are: [01-0003-002, 01-0003-001, 01-0018-004, 01-0018-005, 01-0018-002, 01-0018-003, 01-0018-008, 01-0018-006]
[INFO ] [Logger] - The total count of Schools for district2.csv are: 8
Problem: The two values read in from the district1.csv result are appended to the district2.csv result, throwing off my count by 2 for district2.csv (actual correct value should be 6). How is it being appended?
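uniqueSchools is declared outside the loop, so it still holds the values from every file read so far when the next file is processed. If you don't need the set of all schools, you can just move uniqueSchools inside the loop, or clear it at the start of each iteration: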
for (File f : files) {
uniqueSchools.clear();
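To see the effect of scoping the set to one file, here is a minimal sketch. It is not the original method: the CSV parsing is replaced by in-memory lists so the example runs with the JDK alone, and the class name PerFileCount, the method countDistinctPerFile, and the sample values (including the repeated code in district1.csv) are made up for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class PerFileCount {

    // Hypothetical stand-in for the CSV reading step:
    // key = file name, value = raw "School Code" values found in that file.
    public static Map<String, Integer> countDistinctPerFile(Map<String, List<String>> codesByFile) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> file : codesByFile.entrySet()) {
            // Declared inside the loop, so every file starts with an empty set
            // and nothing carries over from the previous file.
            Set<String> uniqueSchools = new HashSet<>();
            uniqueSchools.addAll(file.getValue());
            counts.put(file.getKey(), uniqueSchools.size());
        }
        return counts;
    }

    public static void main(String[] args) {
        // Illustrative data only; the duplicate in district1.csv is an assumption.
        Map<String, List<String>> codesByFile = new LinkedHashMap<>();
        codesByFile.put("district1.csv",
                Arrays.asList("01-0003-001", "01-0003-002", "01-0003-001"));
        codesByFile.put("district2.csv",
                Arrays.asList("01-0018-002", "01-0018-003", "01-0018-004",
                              "01-0018-005", "01-0018-006", "01-0018-008"));

        // Prints {district1.csv=2, district2.csv=6}
        System.out.println(countDistinctPerFile(codesByFile));
    }
}
```

Calling uniqueSchools.clear() on a set declared before the loop should give the same per-file counts; declaring it inside the loop just makes the intended scope obvious.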
You can also save the schools per file in a Map<String, Set<String>>, or create a set per file, log the count, and then add that set to uniqueSchools with addAll:
Set<String> currentSchools = new HashSet<>();
..
currentSchools.add(SchoolCode);
Logger.info("The list of Schools for " + f.getName() + " are: " + currentSchools);
numOfSchools = currentSchools.size();
Logger.info("The total count of Schools for " + f.getName() + " are: " + numOfSchools);
uniqueSchools.addAll(currentSchools);
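If both the per-file counts and the overall set are needed, the two ideas can be combined. The sketch below is one possible shape, not the original code: it again uses in-memory lists instead of Commons CSV, and the names SchoolsByFile and groupByFile as well as the sample values are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SchoolsByFile {

    // Hypothetical helper: builds one set per file and merges each one into allSchools.
    public static Map<String, Set<String>> groupByFile(Map<String, List<String>> codesByFile,
                                                       Set<String> allSchools) {
        Map<String, Set<String>> schoolsByFile = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> file : codesByFile.entrySet()) {
            // Fresh set for this file only.
            Set<String> currentSchools = new HashSet<>(file.getValue());
            schoolsByFile.put(file.getKey(), currentSchools);
            // The running total across files lives in a separate set.
            allSchools.addAll(currentSchools);
        }
        return schoolsByFile;
    }

    public static void main(String[] args) {
        // Illustrative data only.
        Map<String, List<String>> codesByFile = new LinkedHashMap<>();
        codesByFile.put("district1.csv", Arrays.asList("01-0003-001", "01-0003-002"));
        codesByFile.put("district2.csv",
                Arrays.asList("01-0018-002", "01-0018-003", "01-0018-004",
                              "01-0018-005", "01-0018-006", "01-0018-008"));

        Set<String> uniqueSchools = new HashSet<>();
        Map<String, Set<String>> schoolsByFile = groupByFile(codesByFile, uniqueSchools);

        for (Map.Entry<String, Set<String>> e : schoolsByFile.entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue().size());
        }
        System.out.println("all files: " + uniqueSchools.size());
    }
}
```

Keeping the map around also makes it possible to look up which schools came from which file later, which a single shared set cannot tell you.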
Also consider renaming SchoolCode to schoolCode and Logger to logger, following the Java naming convention for variables.