Search code examples
javaspring-bootbackendjhipsterimport-csv

Problems when importing a CSV file and persisting information in a database using Java


I'm having a problem with a CSV data import code into the database. I'm using Java 11 with SpringBoot 2.7.3 REST API

I have a Report class that has a Many-To-Many one-way relationship with the SensorError class, where only Report knows about SensorError and not the other way around:

@ManyToMany @JoinTable( name = "rel_report__sensor_error", joinColumns = @JoinColumn(name = "report_id"), inverseJoinColumns = @JoinColumn(name = "sensor_error_id") ) private Set<SensorError> sensorErrors = new HashSet<>();

I created a code to import CSV data:

public static List<Report> csvToReports(InputStream is, ReportRepository reportRepository,
        SensorErrorRepository sensorErrorRepository) {
    try (BufferedReader fileReader = new BufferedReader(new InputStreamReader(is, Charset.defaultCharset()));
            CSVParser csvParser = new CSVParser(fileReader,
                    CSVFormat.DEFAULT.withFirstRecordAsHeader().withIgnoreHeaderCase().withTrim());) {
        // Create an empty list to hold the SiraReports that will be parsed from the CSV
        // file
        List<Report> reports = new ArrayList<Report>();

        // Parse the SensorError objects from the CSVRecord
        Set<SensorError> sensorErrors = new HashSet<>();

        // Get an Iterable of CSVRecords from the CSVParser
        Iterable<CSVRecord> csvRecords = csvParser.getRecords();

        for (CSVRecord csvRecord : csvRecords) {

            Report report = new Report(csvRecord.get("final_state"), Instant.parse(csvRecord.get("datetime_begin")),
                    Instant.parse(csvRecord.get("datetime_end")), csvRecord.get("cutting_time"),
                    csvRecord.get("total_time"));

            // Check if sensor_name, error_code and error_detail fields have values
            if (!csvRecord.get("sensor_name").isEmpty() || !csvRecord.get("error_code").isEmpty()
                    || !csvRecord.get("error_detail").isEmpty()) {
                // Create a new SensorError object using the data from the CSVRecord
                SensorError sensorError = new SensorError(csvRecord.get("sensor_name"), csvRecord.get("error_code"),
                        csvRecord.get("error_detail"));

                // Add the SensorError object to the Set
                sensorErrors.add(sensorError);
            }

            // Set the SensorError objects for the SiraReport object
            report.setSensorErrors(sensorErrors);

            // Save the SensorError objects to the database using the SensorErrorRepository
            sensorErrorRepository.saveAll(sensorErrors);

            // Add the Report object to the List
            reports.add(report);
        }

        // Return the List of Reports
        return reports;
    } catch (IOException e) {
        throw new RuntimeException("Failed to parse CSV file: " + e.getMessage());
    }
}

This code works partially for my interest, in my CSV file:

final_state,datetime_begin,datetime_end,cutting_time,total_time,sensor_name,error_code,error_detail
TESTESENSOR,2022-09-30T18:15:00Z,2022-09-30T18:35:00Z,30min,40min, "sensor operador","operando","falha parcial","sensor navalha","erro mundial","falha total"

The code saves only the first SensorError Information inside a Report: ""sensor operador","operando","falha parcial"", but not the second information: ""sensor navalha","erro mundial","falha total"".

I already tried some things, like adding: Set sensorErrors = new HashSet<>();, inside the Loop, but then it saves the second SensorError Information and not the first one anymore. I also tried changing it to a List instead of Set, but it didn't solve the problem.

How to make it import more than one existing SensorError information in the CSV file to a single SiraReport?


Solution

  • Well, due to the questions made here in this topic, I could observe that my error was initially, in how the data were being sent, because if for 1 Report I have several SensorError information, then I would have to duplicate the data lines of the Report and include information other than SensorError to look like this:

    final_state,datetime_begin,datetime_end,cutting_time,total_time,sensor_name,error_code,error_detail TESTESENSOR,2022-09-30T18:15:00Z,2022-09-30T18:35:00Z,30min,40min, "sensor operador","operando","falha parcial" TESTESENSOR,2022-09-30T18:15:00Z,2022-09-30T18:35:00Z,30min,40min, "sensor navalha","erro mundial","falha total"

    From there, I had to review my code due to questions from @g00se and @Tom Elias, arriving at this code that works:

    public static List<Report> csvToReports(InputStream is, ReportRepository reportRepository, SensorErrorRepository sensorErrorRepository) {
        try (
            BufferedReader fileReader = new BufferedReader(new InputStreamReader(is, Charset.defaultCharset()));
            CSVParser csvParser = new CSVParser(fileReader, CSVFormat.DEFAULT.withFirstRecordAsHeader().withIgnoreHeaderCase().withTrim());
        ) {
            Map<String, Report> reportMap = new HashMap<>();
            List<Report> reports = new ArrayList<>();
    
            Iterable<CSVRecord> csvRecords = csvParser.getRecords();
    
            for (CSVRecord csvRecord : csvRecords) {
                String datetimeBegin = csvRecord.get("datetime_begin");
                String datetimeEnd = csvRecord.get("datetime_end");
                String reportKey = datetimeBegin + datetimeEnd; // Create a new key using final_state and datetime_begin
    
                Report report = reportMap.get(reportKey);
    
                if (report == null) {
                    report = new Report(
                        csvRecord.get("final_state"),
                        Instant.parse(datetimeBegin),
                        Instant.parse(datetimeEnd),
                        csvRecord.get("cutting_time"),
                        csvRecord.get("total_time"),
                    );
                }
    
                // Check if sensor_name, error_code and error_detail fields have values
                if (!csvRecord.get("sensor_name").isEmpty() || !csvRecord.get("error_code").isEmpty() || !csvRecord.get("error_detail").isEmpty()) {
                    // Create a new SensorError object using the data from the CSVRecord
                    SensorError sensorError = new SensorError(
                        csvRecord.get("sensor_name"),
                        csvRecord.get("error_code"),
                        csvRecord.get("error_detail")
                    );
    
                    // Add the SensorError object to the Report
                    report.addSensorError(sensorError);
    
                    // Save the SensorError object to the database using the SensorErrorRepository
                    sensorErrorRepository.save(sensorError);
                }
    
                // Save the Report object to the database using the ReportRepository
                reportRepository.save(report);
    
                // Add the Report object to the Map and List
                reportMap.put(reportKey, report);
                reports.add(report);
            }
    
            // Return the List of Reports
            return reports;
        } catch (IOException e) {
            throw new RuntimeException("Failed to parse CSV file: " + e.getMessage());
        }
    }
    

    I had to create a method that performs an identification if the lines sent in the CSV refer to the same Report and add the SensorError information to the same Report object.

    So now the code creates a new identifier using the datetime_begin and datetime_end field combination to check if there is an existing Report with the datetime_begin and datetime_end field combination. If there is no existing Report, it creates a new Report.

    In summary, it will check the duplicated lines with Report information, if it is the same Report object and then it will register the different SensorError information for that Report object.

    I am aware that a better check would be by ID, but this is generated sequentially by the application.