Tags: java, csv, io, java-stream, nio

Selecting a particular Column in a CSV-file Dynamically


I have this CSV file:

id,name,mark
20203923380,Lisa Hatfield,62
20200705173,Jessica Johnson,59
20205415333,Adam Harper,41
20203326467,Logan Nolan,77

And I'm trying to process it with this code:

try (Stream<String> stream = Files.lines(Paths.get(String.valueOf(csvPath)))) {
    DoubleSummaryStatistics statistics = stream
            .map(s -> s.split(",")[index]).skip(1)
            .mapToDouble(Double::valueOf)
            .summaryStatistics();
} catch (IOException e) // more code

I want to get the column by its name.

I guess I need to validate the index to be the index of the column the user enters as an integer, like this:

int index = Arrays.stream(stream).indexOf(columnNS);

But it doesn't work.

The stream is supposed to have the following values, for example:

Column: "mark"

62, 59, 41, 77


Solution

  • I need to validate the index to be the index of the column the user enters as an integer ... But it doesn't work.

    Arrays.stream(stream).indexOf(columnNS)
    

    There is no method indexOf in the Stream API. I'm not sure what you meant by stream(stream), but this approach is wrong.

    To obtain a valid index, you need the name of the column, and you have to look for that name in the very first line read from the file. In your example with the column name "mark", you need to find out whether this name is present in the first row and, if so, what its index is.
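As a minimal sketch of that lookup, assuming the header is a plain comma-separated line without quoted fields, the index can be found like this. The names `HeaderLookup` and `indexOfColumn` are made up for illustration and are not part of the question's code.

```java
import java.util.Arrays;

class HeaderLookup {
    // Returns the zero-based position of columnName in a comma-separated
    // header line, or -1 if the header does not contain that name.
    static int indexOfColumn(String headerLine, String columnName) {
        return Arrays.asList(headerLine.split(",")).indexOf(columnName);
    }

    public static void main(String[] args) {
        String header = "id,name,mark"; // first line of the sample file
        System.out.println(indexOfColumn(header, "mark"));  // prints 2
        System.out.println(indexOfColumn(header, "grade")); // prints -1
    }
}
```

A result of -1 means the requested column does not exist, so it should be checked before the index is used to access an array element.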

    What I want is to get the column by its name ... The stream is supposed ...

    Streams are intended to be stateless. They were introduced in Java to provide an expressive and clear way of structuring code. Even if you manage to cram stateful conditional logic into a stream, you'll lose this advantage and end up with convoluted code that is less clear and less performant than a plain loop (reminder: an iterative solution almost always performs better).

    So if you want to keep your code clean, you have two options: solve the problem with an iterative approach, or relinquish the requirement to determine the index of the column dynamically inside the stream.

    Here's how you can read the file data dynamically based on the column name using a loop:

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.DoubleSummaryStatistics;
    import java.util.List;

    public class Main {
        // reads the values of the given column, skipping the header line
        public static List<String> readFile(Path path, String columnName) {
            List<String> result = new ArrayList<>();
            try (var reader = Files.newBufferedReader(path)) {
                int index = -1;
                String line;
                while ((line = reader.readLine()) != null) {
                    // split on the comma only; splitting on any punctuation
                    // would also break values such as "62.5"
                    String[] arr = line.split(",");
                    if (index == -1) {
                        index = getIndex(arr, columnName);
                        continue; // skipping the first line
                    }
                    result.add(arr[index]);
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
            return result;
        }

        // validation logic resides here
        public static int getIndex(String[] arr, String columnName) {
            int index = Arrays.asList(arr).indexOf(columnName);
            if (index == -1) {
                throw new IllegalArgumentException("Given column name '" + columnName + "' wasn't found");
            }
            return index;
        }

        // extracting statistics from the file data
        public static DoubleSummaryStatistics getStat(List<String> list) {
            return list.stream()
                .mapToDouble(Double::parseDouble)
                .summaryStatistics();
        }

        public static void main(String[] args) {
            DoubleSummaryStatistics stat = getStat(readFile(Path.of("test.txt"), "mark"));
            System.out.println(stat);
        }
    }
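
The other option mentioned above, resolving the column index before the stream pipeline starts, could be sketched roughly as follows. The header is read with a plain `readLine()` call, and only the remaining lines go through the stream, so the lambda uses a fixed index and stays stateless. This is an illustration under assumptions: the class and method name `ColumnStats.columnStats` are hypothetical, and the input is assumed to be comma-delimited with no quoted fields and a numeric target column.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;

class ColumnStats {
    // Hypothetical helper: resolves the column index from the header line,
    // then streams the remaining lines using that fixed index.
    static DoubleSummaryStatistics columnStats(BufferedReader reader, String columnName) {
        String header;
        try {
            header = reader.readLine(); // consumes the header line
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (header == null) {
            throw new IllegalArgumentException("Input is empty");
        }
        int index = Arrays.asList(header.split(",")).indexOf(columnName);
        if (index == -1) {
            throw new IllegalArgumentException("Given column name '" + columnName + "' wasn't found");
        }
        // reader.lines() continues after the header that was already read
        return reader.lines()
            .map(line -> line.split(",")[index])
            .mapToDouble(Double::parseDouble)
            .summaryStatistics();
    }

    public static void main(String[] args) {
        // sample data from the question, supplied in memory for the demo
        String csv = String.join("\n",
            "id,name,mark",
            "20203923380,Lisa Hatfield,62",
            "20200705173,Jessica Johnson,59",
            "20205415333,Adam Harper,41",
            "20203326467,Logan Nolan,77");
        DoubleSummaryStatistics stat =
            columnStats(new BufferedReader(new StringReader(csv)), "mark");
        // prints 41.0 77.0 59.75
        System.out.println(stat.getMin() + " " + stat.getMax() + " " + stat.getAverage());
    }
}
```

To read from a file instead, pass `Files.newBufferedReader(path)` inside a try-with-resources block so the reader is closed when the statistics have been computed.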