
How to extract some data from a text file in Java SE 1.6?


Before anything else, please don't mind the names of my methods and constants: I'm French, and mixing French and English in my code is easier for me.

I'm working on graphs that are stored in a text file in a format like this (whitespace can be ignored):

A: (B, 4.5), (C, 5.8)
B: (A, 3)
C:

With my Graph implementation, I would build that same graph like this:

StdGraph<String> graph = new StdGraph<String>();
graph.addSommet("A"); //1 -> add the vertex named "A" to the graph
graph.addSommet("B"); //2 -> add the vertex named "B" to the graph
graph.addSommet("C"); //3 -> add the vertex named "C" to the graph
graph.addArete("A", "B", 4.5); //4 -> add an edge between A and B with a weight of 4.5
graph.addArete("A", "C", 5.8); //5 -> add an edge between A and C with a weight of 5.8
graph.addArete("B", "A", 3.); //6 -> add an edge between B and A with a weight of 3.0

//note that 3 and 4 can switch places, and technically 6 could go right after 2

It has to be in that order because I can't add an edge if one of its vertices doesn't exist. My Graph implementation works, but I have no idea how to get the data out of my text file. I also have to make sure the file follows the right pattern, because I need to check that the input isn't something else entirely.

I tried this to read the file into a string without whitespace, to make it easier to parse later, but I have never used a Scanner before, so I'm not sure it works:

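private String fileToString(File f) {
        StringBuilder str = new StringBuilder();
        Scanner s = null;
        try {
            s = new Scanner(f);
            while (s.hasNextLine()) {
                // Strip whitespace inside each line ("\\s", not "//s"), but keep
                // one '\n' per line because the graph regex expects it.
                str.append(s.nextLine().replaceAll("\\s", "")).append('\n');
            }
        } catch (FileNotFoundException e) {
            System.out.println("ERREUR => FileNotFoundException");
            e.printStackTrace();
        } finally {
            // Java 6 has no try-with-resources, so close the Scanner by hand.
            if (s != null) {
                s.close();
            }
        }
        return str.toString();
    }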

and my regex for the graphs is this (it's written this way so it's easier to read for my uni group and me):

//Regex for a vertex's name:
String SOMMETREGEX = "[a-zA-Z0-9\\-_/]*";
//Regex for an Edge's weight:
String POIDSREGEX = "[\\-+]?\\d+(\\.\\d+)?";
//Regex for one line composed of the vertex and its neighbors with the weight of the edges
String SOMANDSUIV = "("+SOMMETREGEX+":"
            + "(\\("+SOMMETREGEX+","+POIDSREGEX+"\\),)*"
            + "(\\("+SOMMETREGEX+","+POIDSREGEX+"\\))?\\n)";
//Regex for the full graph
String GRAPHREGEX = SOMANDSUIV + "*";

Lastly, this is the method I need to complete, but I don't know how:

private StdGraph<String> buildGraphFromFile(File f) {
        StdGraph<String> res = new StdGraph<String>();
        String stringOfFile = fileToString(f);
        if(!Pattern.matches(GRAPHREGEX, stringOfFile)) {
            return null;
        }
        
        // Stuff goes here but idk what
        
        return res;
    }

I've been searching for hours, but I don't even know where to start, because my regex is complex and I'm completely inexperienced with files, scanners and pattern matching.

If you have any suggestions I would love to hear them, because I'm completely lost.

I have to use Java 1.6, so anything introduced after that is not an option.


Solution

  • If you could not find a proper parser, you could use a pattern similar to:

    "^\\s*([A-Z]{1,})\\s*:|\\s*(?:\\(\\s*([A-Z])\\s*,\\s*([^)\\r\\n]+)\\))"
    

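    to find your nodes and weights. Note that, as written, it only accepts uppercase letters: one or more for the node before the colon, and exactly one for each adjacent node.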

    Code

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    
    public class RegularExpression {
        public static void main(String[] args) {
            final String regex = "^\\s*([A-Z]{1,})\\s*:|\\s*(?:\\(\\s*([A-Z])\\s*,\\s*([^)\\r\\n]+)\\))";
            final String string = "A: (B, 4.5), (C, 5.8)\n"
                    + "B: (A, 3)\n"
                    + "C:\n"
                    + "E: (B, 4.5), (E, 5.8), (F, 5.8)\n"
                    + "F: (A, 3)\n"
                    + "G: (A, 3) , (E, 3) , (F, 3), (G, 3)";
    
            final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
            final Matcher matcher = pattern.matcher(string);
    
            while (matcher.find()) {
                System.out.println("Full match: " + matcher.group(0));
    
                for (int i = 1; i <= matcher.groupCount(); i++) {
                    System.out.println("Group " + i + ": " + matcher.group(i));
                }
            }
        }
    }
    
    
    

    Prints

    Full match: A:
    Group 1: A
    Group 2: null
    Group 3: null
    Full match:  (B, 4.5)
    Group 1: null
    Group 2: B
    Group 3: 4.5
    Full match:  (C, 5.8)
    Group 1: null
    Group 2: C
    Group 3: 5.8
    Full match: B:
    Group 1: B
    Group 2: null
    Group 3: null
    Full match:  (A, 3)
    Group 1: null
    Group 2: A
    Group 3: 3
    Full match: C:
    Group 1: C
    Group 2: null
    Group 3: null
    Full match: E:
    Group 1: E
    Group 2: null
    Group 3: null
    Full match:  (B, 4.5)
    Group 1: null
    Group 2: B
    Group 3: 4.5
    Full match:  (E, 5.8)
    Group 1: null
    Group 2: E
    Group 3: 5.8
    Full match:  (F, 5.8)
    Group 1: null
    Group 2: F
    Group 3: 5.8
    Full match: F:
    Group 1: F
    Group 2: null
    Group 3: null
    Full match:  (A, 3)
    Group 1: null
    Group 2: A
    Group 3: 3
    Full match: G:
    Group 1: G
    Group 2: null
    Group 3: null
    Full match:  (A, 3)
    Group 1: null
    Group 2: A
    Group 3: 3
    Full match:  (E, 3)
    Group 1: null
    Group 2: E
    Group 3: 3
    Full match:  (F, 3)
    Group 1: null
    Group 2: F
    Group 3: 3
    Full match:  (G, 3)
    Group 1: null
    Group 2: G
    Group 3: 3
    

    Note

    • You can code the rest.
    • The first group is the node at the start of a line; the second group is an adjacent node and the third group is the weight of the edge to it.
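One way to code the rest is sketched below, under several assumptions. `GraphTextParser`, `parse`, and the nested map used in place of `StdGraph<String>` are names invented for illustration, since the asker's class is not shown. The sketch reuses the vertex-name and weight rules from the question rather than `[A-Z]`, with `+` instead of `*` on the name (assuming empty names are unwanted), and treats edges as directed. Each line is validated as a whole first, then vertices are registered before any edge is added, which mirrors the ordering constraint described in the question. Only Java 6 features are used.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GraphTextParser {
    // Vertex-name and weight rules taken from the question; '+' instead of '*'
    // on the name is an assumption (an empty vertex name is rejected).
    private static final String NAME = "[a-zA-Z0-9\\-_/]+";
    private static final String WEIGHT = "[\\-+]?\\d+(?:\\.\\d+)?";
    // One "(neighbor,weight)" pair: group 1 is the neighbor, group 2 the weight.
    private static final String PAIR = "\\((" + NAME + "),(" + WEIGHT + ")\\)";
    private static final Pattern PAIR_PATTERN = Pattern.compile(PAIR);
    // One whole line, whitespace already removed: "name:" then optional pairs.
    private static final Pattern LINE_PATTERN = Pattern.compile(
            "(" + NAME + "):(?:" + PAIR + "(?:," + PAIR + ")*)?");

    /**
     * Hypothetical stand-in for StdGraph<String>: vertex -> (neighbor -> weight).
     * Returns null if a line is malformed or an edge targets an undeclared vertex.
     */
    public static Map<String, Map<String, Double>> parse(String text) {
        Map<String, Map<String, Double>> graph =
                new LinkedHashMap<String, Map<String, Double>>();
        List<String[]> edges = new ArrayList<String[]>();

        // Phase 1: validate every line, register vertices, remember the edges.
        for (String rawLine : text.split("\\r?\\n")) {
            String line = rawLine.replaceAll("\\s", "");
            if (line.length() == 0) {
                continue; // skip blank lines
            }
            Matcher whole = LINE_PATTERN.matcher(line);
            if (!whole.matches()) {
                return null; // the line does not follow the expected format
            }
            String from = whole.group(1);
            if (!graph.containsKey(from)) {
                graph.put(from, new LinkedHashMap<String, Double>()); // like addSommet
            }
            Matcher pair = PAIR_PATTERN.matcher(line);
            while (pair.find()) {
                edges.add(new String[] { from, pair.group(1), pair.group(2) });
            }
        }

        // Phase 2: all vertices are known now, so edges can be added safely.
        for (String[] edge : edges) {
            if (!graph.containsKey(edge[1])) {
                return null; // edge to a vertex that has no line of its own
            }
            graph.get(edge[0]).put(edge[1], Double.valueOf(edge[2])); // like addArete
        }
        return graph;
    }

    public static void main(String[] args) {
        String text = "A: (B, 4.5), (C, 5.8)\nB: (A, 3)\nC:\n";
        System.out.println(parse(text));
    }
}
```

For the sample graph from the question, `main` prints `{A={B=4.5, C=5.8}, B={A=3.0}, C={}}`. With the real class, the two `graph.put` calls would presumably become `addSommet` and `addArete`; whether a missing target vertex should be an error or be created on the fly is a design choice this sketch settles by rejecting the input.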