Tags: java, regex, replace, file-processing

Is there a way to use a regular expression to process data from a text file that contains main headings?


Below is a snippet showing the structure of the text file:

Historical Sales for: 12th of October  2019, 11:37 am

PRODUCT NAME      QUANTITY
Coke B            5

Historical Sales for: 21st of October  2019, 8:15 pm

PRODUCT NAME      QUANTITY
Peanuts           2

I want to process only the column labels and row values, excluding the main headings (in this case, lines such as Historical Sales for: 12th of October 2019, 11:37 am).

This is the code I wrote to process the text using the regex (\\b):

        StringBuilder temporary = new StringBuilder();

        InputStream inputStream = new FileInputStream(new File(FILE_NAME));
        BufferedReader readFile = new BufferedReader(new InputStreamReader(inputStream));

        String next;

        while ((next = readFile.readLine()) != null) {
            temporary.append(next).append("\n");
        }

        next = String.format("%13s", ""); // spacing for column headers
        System.out.println(temporary.toString().replaceAll("(\\b)", next));
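The (\\b) pattern matches a zero-width word boundary, so replaceAll inserts the padding at the start and end of every word, not only between the columns, and the heading lines are padded just like the data rows. A minimal sketch that makes those match positions visible by replacing each boundary with | instead of 13 spaces (the class and method names are hypothetical, and the row is shortened to "Coke B 5"):

```java
// Hypothetical demo class: shows where the question's (\\b) pattern matches.
class WordBoundaryDemo {

    // Replace every word boundary with a visible marker instead of 13 spaces.
    static String markBoundaries(String text) {
        return text.replaceAll("(\\b)", "|");
    }

    public static void main(String[] args) {
        // A marker lands on both sides of every word, so padding inserted
        // here shifts every token rather than just aligning the columns.
        System.out.println(markBoundaries("Coke B 5"));  // |Coke| |B| |5|
        System.out.println(markBoundaries("for: 12th")); // |for|: |12th|
    }
}
```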

Solution

  • If your intention is to print just the lines:

    PRODUCT NAME      QUANTITY
    Chips             2
    Coke B            5
    

    and so on, I suggest using Java 8 streams with the regular expressions below to filter out the heading lines and the blank lines:

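    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.stream.Collectors;
    import java.util.stream.Stream;

    public static void main(String[] args) throws Exception {
        try (Stream<String> lines = Files.lines(Paths.get("file.txt"))) { // closes the file
            String collect = lines
                    .filter(line -> !line.matches("^Historical Sales for.*$") && !line.matches("^\\s*$")) // drop headings and blank lines
                    .map(line -> line + "\n")
                    .collect(Collectors.joining());
            System.out.println(collect);
        }
    }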
    

    This way you'll have:

    PRODUCT NAME      QUANTITY
    Chips             2
    Coke B            5
    PRODUCT NAME      QUANTITY
    (...)
    

    One advantage of using streams is that .collect() can also gather the filtered lines directly into a List (with Collectors.toList()) instead of joining them into a single String.
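A sketch of that variant, keeping the same two filter regexes but collecting into a List<String>. To keep it self-contained, the sample lines from the question are hard-coded (an assumption); with the real file you would start from Files.lines(...) inside try-with-resources as above. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class: same filter as above, but collecting into a List.
class SalesLines {

    static List<String> withoutHeadings(List<String> lines) {
        return lines.stream()
                // drop the "Historical Sales for..." headings and blank lines
                .filter(line -> !line.matches("^Historical Sales for.*$") && !line.matches("^\\s*$"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Sample data copied from the question's file snippet
        List<String> sample = Arrays.asList(
                "Historical Sales for: 12th of October  2019, 11:37 am",
                "",
                "PRODUCT NAME      QUANTITY",
                "Coke B            5",
                "",
                "Historical Sales for: 21st of October  2019, 8:15 pm",
                "",
                "PRODUCT NAME      QUANTITY",
                "Peanuts           2");
        List<String> rows = withoutHeadings(sample);
        rows.forEach(System.out::println);
        System.out.println(rows.size() + " lines kept"); // 4 lines kept
    }
}
```

Each element of the list is one header or data line, which is convenient if you later want to split the rows into product name and quantity.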

    If you want to keep your original approach, you could do:

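    // Requires: java.io.BufferedReader, java.io.File, java.io.FileInputStream,
    // java.io.InputStreamReader
    StringBuilder temporaryData = new StringBuilder();

    // try-with-resources closes the reader and the underlying stream
    try (BufferedReader readFile = new BufferedReader(
            new InputStreamReader(new FileInputStream(new File("file.txt"))))) {
        String next;
        while ((next = readFile.readLine()) != null) {
            temporaryData.append(next).append("\n");
        }
    }

    // Remove headings and blank lines first: (?m) makes ^ match at the start of
    // every line, and doing this before the spacing step keeps ^Historical matchable.
    String stringWithoutHeaders = temporaryData.toString()
            .replaceAll("(?m)^Historical Sales for.*\n", "")
            .replaceAll("(?m)^[ \t]*\n", "");

    String spacing = String.format("%13s", ""); // spacing for column headers
    String formattedString = stringWithoutHeaders.replaceAll("\\b", spacing); // pad at each word boundary
    System.out.println(formattedString);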
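A caveat when running ^...$ patterns over the whole file contents: by default ^ matches only at the very start of the input and $ only at the end (or just before a final line terminator), so a heading that is followed by more lines is not removed. Multiline mode, via the inline (?m) flag or the equivalent Pattern.MULTILINE, makes them match at every line. A small sketch comparing the two (class and method names are hypothetical):

```java
// Hypothetical demo class: default vs. multiline anchors with replaceAll.
class MultilineDemo {

    // Default mode: ^ and $ anchor to the whole input, so headings are kept.
    static String stripDefault(String text) {
        return text.replaceAll("^Historical Sales for.*$", "");
    }

    // (?m): ^ matches at each line start; \n? also removes the heading's line break.
    static String stripMultiline(String text) {
        return text.replaceAll("(?m)^Historical Sales for.*\\n?", "");
    }

    public static void main(String[] args) {
        String text = "Coke B 5\nHistorical Sales for: 21st of October 2019, 8:15 pm\nPeanuts 2\n";
        System.out.print(stripDefault(text));   // heading line still present
        System.out.print(stripMultiline(text)); // heading line removed
    }
}
```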