
Java: avoid crashing on a "malformed" line containing a double comma


My task is to process the lines of a text file and sort them by address into their respective categories: "east", "west", "broadway", "avenue" and "bad ids". The code below does this correctly until it hits a malformed line that contains a double comma. I could replace every double comma with a single comma, but that doesn't really fix the issue: this line is supposed to count as "malformed" and be added to the badId category. Instead, it causes a NumberFormatException (full error and code below).

What I'm wondering is whether it's possible to get past the double comma in a way that doesn't throw this exception, so that I can still parse the rest of the file and add this line to the badId list as intended.

Input text file

123-ABC-4567, 15 W. 15th St., 50.1
456-BGT-9876,22 Broadway,24
QAZ-456-QWER, 100 East 20th Street,50
Q2Z-457-QWER, 200 East 20th Street, 49
678-FGH-9845 ,,45 5th Ave, 12.2,
678-FGH-9846 ,45 5th Ave, 12.2

123-ABC-9999, 46 Foo Bar, 220.0
347-poy-3465, 101 B'way,24
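To see where the extra field comes from, the short program below (a hypothetical SplitDemo class, not part of the original code) applies the same two replaceAll calls and split(",") used in readFile() below to the fifth line. The second replaceAll only rewrites commas that are followed by whitespace, so ",,45" survives untouched; split(",") then keeps the interior empty string (while dropping the trailing one), which pushes "45 5th Ave" into index 2.

```java
import java.util.Arrays;

// Hypothetical demo class: reproduces the preprocessing done in readFile().
public class SplitDemo {

    // Same two replaceAll calls and split(",") as the original loop body.
    static String[] preprocessAndSplit(String line) {
        // Remove whitespace before commas, then replace a run of commas
        // followed by one whitespace character with a single comma.
        String sample = line.replaceAll("\\s+,", ",").replaceAll(",+\\s", ",");
        // split(",") keeps interior empty strings but drops trailing ones.
        return sample.split(",");
    }

    public static void main(String[] args) {
        String[] result = preprocessAndSplit("678-FGH-9845 ,,45 5th Ave, 12.2,");
        System.out.println(result.length);           // 4
        System.out.println(Arrays.toString(result)); // [678-FGH-9845, , 45 5th Ave, 12.2]
    }
}
```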

Error

java.lang.NumberFormatException: For input string: "45 5th Ave"
    at sun.misc.FloatingDecimal.readJavaFormatString(Unknown Source)
    at sun.misc.FloatingDecimal.parseFloat(Unknown Source)
    at java.lang.Float.parseFloat(Unknown Source)
    at java.lang.Float.valueOf(Unknown Source)
    at csi311.HelloCsi311.readFile(HelloCsi311.java:99)
    at csi311.HelloCsi311.run(HelloCsi311.java:28)
    at csi311.HelloCsi311.main(HelloCsi311.java:240)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at edu.rice.cs.drjava.model.compiler.JavacCompiler.runCommand(JavacCompiler.java:267)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at edu.rice.cs.dynamicjava.symbol.JavaClass$JavaMethod.evaluate(JavaClass.java:362)
    at edu.rice.cs.dynamicjava.interpreter.ExpressionEvaluator.handleMethodCall(ExpressionEvaluator.java:92)
    at edu.rice.cs.dynamicjava.interpreter.ExpressionEvaluator.visit(ExpressionEvaluator.java:84)
    at koala.dynamicjava.tree.StaticMethodCall.acceptVisitor(StaticMethodCall.java:121)
    at edu.rice.cs.dynamicjava.interpreter.ExpressionEvaluator.value(ExpressionEvaluator.java:38)
    at edu.rice.cs.dynamicjava.interpreter.ExpressionEvaluator.value(ExpressionEvaluator.java:37)
    at edu.rice.cs.dynamicjava.interpreter.StatementEvaluator.visit(StatementEvaluator.java:106)
    at edu.rice.cs.dynamicjava.interpreter.StatementEvaluator.visit(StatementEvaluator.java:29)
    at koala.dynamicjava.tree.ExpressionStatement.acceptVisitor(ExpressionStatement.java:101)
    at edu.rice.cs.dynamicjava.interpreter.StatementEvaluator.evaluateSequence(StatementEvaluator.java:66)
    at edu.rice.cs.dynamicjava.interpreter.Interpreter.evaluate(Interpreter.java:77)
    at edu.rice.cs.dynamicjava.interpreter.Interpreter.interpret(Interpreter.java:47)
    at edu.rice.cs.drjava.model.repl.newjvm.InterpreterJVM.interpret(InterpreterJVM.java:249)
    at edu.rice.cs.drjava.model.repl.newjvm.InterpreterJVM.interpret(InterpreterJVM.java:222)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at sun.rmi.server.UnicastServerRef.dispatch(Unknown Source)
    at sun.rmi.transport.Transport$1.run(Unknown Source)
    at sun.rmi.transport.Transport$1.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.rmi.transport.Transport.serviceCall(Unknown Source)
    at sun.rmi.transport.tcp.TCPTransport.handleMessages(Unknown Source)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(Unknown Source)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)

Code

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.regex.Pattern;
import java.util.regex.Matcher;
import java.util.ArrayList;
/**
 * Hello world example.  Shows passing in command line arguments, in this case a filename. 
 * If the filename is given, read in the file and echo it to stdout.
 */
public class HelloCsi311 {

    /**
     * Class constructor.
     */
    public HelloCsi311() {
    }


    /**
     * @param filename the name of a file to read in 
     * @throws Exception on anything bad happening 
     */
    public void run(String filename) throws Exception {
     if (filename != null) {
      readFile(filename); 
     }
    }


    /**
     * @param filename the name of a file to read in 
     * @throws Exception on anything bad happening 
     */
    private void readFile(String filename) throws Exception {
     System.out.println("Processing file: " + filename); 
     // Open the file and connect it to a buffered reader.
     BufferedReader br = new BufferedReader(new FileReader(filename));  
     ArrayList<String> foundaddr = new ArrayList<String>();
     ArrayList<String> broadway = new ArrayList<String>();
     ArrayList<String> ave = new ArrayList<String>();
     ArrayList<String> east = new ArrayList<String>();
     ArrayList<String> west = new ArrayList<String>();
     ArrayList<String> overlb = new ArrayList<String>();
     ArrayList<String> badId = new ArrayList<String>();
     String line = null;  
     String pattern = "^\\d\\d\\d-[A-Za-z][A-Za-z][A-Za-z]-\\d\\d\\d\\d";
     String west1 = "^\\d{1,4}\\s\\b(West|west)\\b\\s\\d{1,3}\\w+\\s\\b(St|st)\\B";
     String west2 = "^\\d{1,4}\\s\\b(W|w)\\.\\s\\d{1,3}\\w+\\s\\b(St|st)\\.";
     String west3 = "^\\d{1,4}\\s\\b(W|w)\\s\\d{1,3}\\w+\\s\\b(St|st)";
     String east1 = "^\\d{1,4}\\s\\b(East|east)\\b\\s\\d{1,3}\\w+\\s\\b(St|st)\\B";
     String east2 = "^\\d{1,4}\\s\\b(E|e)\\.\\s\\d{1,3}\\w+\\s\\b(St|st)";
     String east3 = "^\\d{1,4}\\s\\b(E|e)\\s\\d{1,3}\\w+\\s\\b(St|st)";
     String broad1 = "^\\d{1,4}\\s\\b(B|b)\\B(Way|way)";
     String broad2 = "^\\d{1,4}\\s\\b(B|b)\\b(.|')(Way|way)";
     String broad3 = "^\\d{1,4}\\s\\b(Broadway|broadway)";
     String avenue1 = "^\\d{1,4}\\s\\w+\\s\\b(Ave|ave)";
     String avenue2 = "^\\d{1,4}\\s\\w+\\s\\b(Ave.|ave.)";
     String avenue3 = "^\\d{1,4}\\s\\w+\\s\\b(Avenue|avenue)";
     Pattern r = Pattern.compile(pattern);
     Pattern z = Pattern.compile(east1);
     Pattern zz = Pattern.compile(east2);
     Pattern zzz = Pattern.compile(east3);
     Pattern we = Pattern.compile(west1);
     Pattern wee = Pattern.compile(west2);
     Pattern weee = Pattern.compile(west3);
     Pattern broadc = Pattern.compile(broad1);
     Pattern broadcc = Pattern.compile(broad2);
     Pattern broadccc = Pattern.compile(broad3);
     Pattern avec = Pattern.compile(avenue1);
     Pattern avecc = Pattern.compile(avenue2);
     Pattern aveccc = Pattern.compile(avenue3);
     // Get lines from the file one at a time until there are no more.
     while ((line = br.readLine()) != null) {
       if(line.trim().isEmpty()) {
         continue;
       }
       String sample = line.replaceAll("\\s+,", ",").replaceAll(",+\\s",",");
       String[] result = sample.split(",");
       String pkgId = result[0].trim().toUpperCase();
       String pkgAddr = result[1].trim();
       // System.out.println(sample);
       //System.out.println(pkgId);
       //System.out.println(pkgAddr);
       Matcher easts = z.matcher(pkgAddr);
       Matcher eastss = zz.matcher(pkgAddr);
       Matcher eastsss = zzz.matcher(pkgAddr);
       Matcher wests = we.matcher(pkgAddr);
       Matcher westss = wee.matcher(pkgAddr);
       Matcher westsss = weee.matcher(pkgAddr);
       Matcher broadways = broadc.matcher(pkgAddr);
       Matcher broadwayss = broadcc.matcher(pkgAddr);
       Matcher broadwaysss = broadccc.matcher(pkgAddr);
       Matcher avenues = avec.matcher(pkgAddr);
       Matcher avenuess = avecc.matcher(pkgAddr);
       Matcher avenuesss = aveccc.matcher(pkgAddr);
       Float f = Float.valueOf(result[2]);
       for(String str : result){
         //System.out.println(str);
         // Trying to match for different types
         Matcher m = r.matcher(str);
         // REMEMBER TO ADD BROADWAYS AND AVENUES HERE TOO AND FIX SO IT DOESNT HAVE ALL THE IDS
         if (!pkgId.matches(pattern) || !pkgAddr.matches(west1) && !pkgAddr.matches(west2) && !pkgAddr.matches(west3)
            && !pkgAddr.matches(east1) && !pkgAddr.matches(east2) && !pkgAddr.matches(east3) && !pkgAddr.matches(broad1) 
            && !pkgAddr.matches(broad2) && !pkgAddr.matches(broad3) && !pkgAddr.matches(avenue1) && !pkgAddr.matches(avenue2)
            && !pkgAddr.matches(avenue3)) {
           if(!badId.contains(pkgId)){
             badId.add(pkgId);
           }
           //System.out.println(pkgId);
         } 
         if(f < 50){
           //System.out.println(str);
           if(m.find()) {
             //System.out.println(str);
             //System.out.println(pkgAddr);
             if(avenues.find() || avenuess.find() || avenuesss.find()){
               if(!ave.contains(pkgAddr)){
                 ave.add(pkgAddr);
               }
             }

             if(broadways.find() || broadwayss.find() || broadwaysss.find()){
               if(!broadway.contains(pkgAddr)){
                 broadway.add(pkgAddr);
               }
             }

             if(easts.find() || eastss.find() || eastsss.find()){
               if(!east.contains(pkgAddr)){
                 east.add(pkgAddr);
               }
             }

             if(wests.find() || westss.find() || westsss.find()){
               if(!west.contains(pkgAddr)){
                 west.add(pkgAddr);
               }
             }
           }
           //System.out.println(str);
         }
         if(f > 50){
           if(avenues.find() || avenuess.find() || avenuesss.find()){
             if(!ave.contains(pkgAddr)){
               ave.add(pkgAddr);
               if(!overlb.contains(pkgId)){
                 overlb.add(pkgId);
               }
             }
           }

           if(broadways.find() || broadwayss.find() || broadwaysss.find()){
             if(!broadway.contains(pkgAddr)){
               broadway.add(pkgAddr);
               if(!overlb.contains(pkgId)){
                 overlb.add(pkgId);
               }
             }
           }
           // System.out.println(str);
           if(easts.find() || eastss.find() || eastsss.find()){
             if(!east.contains(pkgAddr)){
               east.add(pkgAddr);
               if(!overlb.contains(pkgId)){
                 //System.out.println(pkgId);
                 overlb.add(pkgId);
               }
             }
           }
           if(wests.find() || westss.find() || westsss.find()){
             if(!west.contains(pkgAddr)){
               west.add(pkgAddr);
               if(!overlb.contains(pkgId)){
                 //System.out.println(pkgId);
                 overlb.add(pkgId);
               }
             }
           }
           //System.out.println(str);
         }
       }

     }

     if(west != null) {
      // System.out.println(west);
       System.out.println("West: " + west.size());
     }

     if(east != null){
      // System.out.println(east);
       System.out.println("East: " + east.size());
     }

     if(ave != null){
       //System.out.println(ave);
       System.out.println("Ave: " + ave.size());
     }

     if(broadway != null){
       //System.out.println(broadway);
       System.out.println("Bway: " + broadway.size());
     }

      if(overlb != null){
       // System.out.println(overlb);
        System.out.println(">50lbs: " + overlb.size());
      }

      if(badId != null){
        System.out.println("Ids?: " + badId);

      }


     // Close the buffer and the underlying file.
     br.close();
    }



    /**
     * @param args filename
     */
    public static void main(String[] args) {
     // Make an instance of the class.
     HelloCsi311 theApp = new HelloCsi311();
     String filename = null; 
     // If a command line argument was given, use it as the filename.
     if (args.length > 0) {
      filename = args[0]; 
     }
     try { 
      // Run the run(), passing in the filename, null if not specified.
      theApp.run(filename);
     }
     catch (Exception e) {
      // If anything bad happens, report it.
      System.out.println("Something bad happened!");
      e.printStackTrace();
     }

    }
}

Expected output

Processing file: test.in
West:   1
East:   0
Ave:    1
Bway:   2
>50lbs: 1
Ids?:   [QAZ-456-QWER, Q2Z-457-QWER, 678-FGH-9845, 123-ABC-9999]

Solution

  • The issue here is in the line:

    Float f = Float.valueOf(result[2]);
    

    Here you are trying to parse the element at index 2 (the third field) as a Float.

    For the first four rows of data this works, because the values at that index are 50.1, 24, 50 and 49.

    However, because of the double comma, split(",") produces an empty string between the two commas. That empty element becomes result[1], which shifts the address into result[2], so the call becomes Float.valueOf("45 5th Ave") and throws the NumberFormatException.

    Update, following a question in the comments about filtering out empty values in the array:

    You can filter out empty values in the array with the following (Java 8 and above):

    // Requires: import java.util.Arrays;
    String[] filteredResult = Arrays.stream(result).filter(o -> !o.isEmpty()).toArray(String[]::new);

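    That said, this workaround is specific to the problem you're facing in this scenario, and probably not a good solution. In particular, once the empty field is removed, the malformed line looks like a normal line, so 678-FGH-9845 would no longer end up in badId.

    The real fix is to validate (sanitize) each line, for example its number of fields and whether any of them is empty, before you start converting any of the fields.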
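A minimal sketch of that validation, assuming that a well-formed line has exactly three non-empty comma-separated fields (ID, address, weight) and that anything else, including a trailing comma, counts as malformed. LineValidation, ParsedLine and parseLine are hypothetical names, not part of the original code. Using split(",", -1) keeps trailing empty strings, so both the double comma and the trailing comma on the fifth line change the field count; the NumberFormatException from Float.parseFloat is still caught as a last line of defence.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: check a line's shape before converting any field,
// so malformed lines are reported instead of throwing.
public class LineValidation {

    // The three fields a well-formed line is assumed to have.
    static final class ParsedLine {
        final String id;
        final String address;
        final float weight;

        ParsedLine(String id, String address, float weight) {
            this.id = id;
            this.address = address;
            this.weight = weight;
        }
    }

    // Returns the parsed line, or null if it is malformed
    // (wrong number of fields, an empty field, or a non-numeric weight).
    static ParsedLine parseLine(String line) {
        // limit -1 keeps trailing empty strings, so "a,b,c," yields 4 fields.
        String[] fields = line.split(",", -1);
        if (fields.length != 3) {
            return null;
        }
        for (String field : fields) {
            if (field.trim().isEmpty()) {
                return null;
            }
        }
        try {
            float weight = Float.parseFloat(fields[2].trim());
            return new ParsedLine(fields[0].trim().toUpperCase(), fields[1].trim(), weight);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String[] lines = {
            "123-ABC-4567, 15 W. 15th St., 50.1",
            "678-FGH-9845 ,,45 5th Ave, 12.2,",
            "678-FGH-9846 ,45 5th Ave, 12.2"
        };
        List<String> badId = new ArrayList<String>();
        for (String line : lines) {
            ParsedLine parsed = parseLine(line);
            if (parsed == null) {
                // Malformed: record the first field as a bad ID and keep going.
                String id = line.split(",", -1)[0].trim().toUpperCase();
                if (!badId.contains(id)) {
                    badId.add(id);
                }
                continue;
            }
            System.out.println(parsed.id + " | " + parsed.address + " | " + parsed.weight);
        }
        System.out.println("Ids?: " + badId);
    }
}
```

Run on those three sample lines, only 678-FGH-9845 ends up in badId. In the original readFile() loop, a check like this would sit before the Float.valueOf call, with the existing ID and address pattern checks still running on the lines that pass it.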