
Java: Integer.parseInt(str) throws NumberFormatException even though the input string represents an integer


UPDATE: chardetect reports that the .srt file is encoded as UTF-8 with a confidence of 1.0, and I have been reading it with Files.readAllLines set to UTF-8. With help from others here, I found that the single-digit string actually has a length of 2, so the task now is to figure out where that extra character comes from. Here is an excerpt of the hex dump:

00000000: efbb bf31 0d0a 3030 3a30 303a 3034 2c35  ...1..00:00:04,5
00000010: 3031 202d 2d3e 2030 303a 3030 3a30 362c  01 --> 00:00:06,
00000020: 3439 300d 0ae3 8199 e381 9fe3 8198 e381  490.............
00000030: 8ae3 8198 e381 b6e3 828a e381 95e3 818f  ................
00000040: e381 b2e3 8293 0d0a 2d20 4c75 7069 6e20  ........- Lupin 
00000050: 3230 3033 2070 7265 7365 6e74 7320 2d0d  2003 presents -.
00000060: 0a0d 0a32 0d0a 3030 3a30 303a 3036 2c35  ...2..00:00:06,5
00000070: 3030 202d 2d3e 2030 303a 3030 3a30 382c  00 --> 00:00:08,
00000080: 3032 340d 0a41 2053 7475 6469 6f20 4768  024..A Studio Gh
00000090: 6962 6c69 2046 696c 6d0d 0a0d 0a33 0d0a  ibli Film....3..
000000a0: 3030 3a30 303a 3133 2c30 3437 202d 2d3e  00:00:13,047 -->
000000b0: 2030 303a 3030 3a31 352c 3132 360d 0a3c   00:00:15,126..<
000000c0: 666f 6e74 2063 6f6c 6f72 3d22 2338 3838  font color="#888
000000d0: 3838 3822 3ee3 81a1 e381 b2e3 828d e380  888">...........
000000e0: 80e3 8192 e382 93e3 818d e381 a7e3 81ad  ................
000000f0: e380 82e3 8080 e381 bee3 819f e380 80e3  ................
00000100: 8182 e381 8ae3 8186 e381 ade3 8080 e382  ................
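To pin down the extra character, it can help to print the string's code points rather than the string itself, because some characters are invisible on a console. A minimal sketch, assuming the suspect value is the first line returned by Files.readAllLines; the class name CodePointDump and the hard-coded sample value are hypothetical, chosen to mirror the first four bytes of the dump (ef bb bf 31):

```java
public class CodePointDump {

    // Formats every code point of s as U+XXXX, separated by single spaces
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for titleLines.get(0): the bytes ef bb bf 31
        // decoded as UTF-8 give these two characters
        String line = "\uFEFF1";
        System.out.println(line.length() + " chars: " + describe(line)); // 2 chars: U+FEFF U+0031
    }
}
```

Run against the real first line, output that starts with U+FEFF would point straight at the three bytes ef bb bf at offset 0 of the dump.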

Original question:

I'm reading in a .srt file using java.nio.file.Files.readAllLines().

In .srt files, every subtitle starts with a number line: an integer that indexes the subtitle. When read in, this line is a String. When I call Integer.parseInt( numberLineString ), I get

java.lang.NumberFormatException

I've troubleshot this as best I can, by:

  • homing in on one specific subtitle, so I know the error isn't caused by some stray subtitle whose index contains non-digit characters.

  • stripping any \n or \r characters from the index string.

  • printing the variable passed to Integer.parseInt() to verify that it really is the index string, representing an integer.
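One thing none of these steps would catch is an invisible U+FEFF character (the byte order mark the answer below identifies). The cleanup in the question only removes \n, \r and double quotes, and even String.trim() and String.strip() leave U+FEFF in place: trim() only drops characters up to U+0020, and strip() only drops characters for which Character.isWhitespace is true. A small check, using a hypothetical sample string that starts with U+FEFF (strip() needs Java 11 or later):

```java
public class CleanupCheck {

    // The same cleanup the question applies to titleLines.get(0)
    static String cleanLikeQuestion(String line) {
        return line.replace("\n", "").replace("\r", "").replace("\"", "");
    }

    public static void main(String[] args) {
        // Hypothetical first line of a UTF-8 file that starts with a BOM
        String line = "\uFEFF1";
        String cleaned = cleanLikeQuestion(line);

        System.out.println(cleaned.length());                 // 2
        System.out.println(cleaned.trim().length());          // 2: trim() only drops chars <= U+0020
        System.out.println(cleaned.strip().length());         // 2: U+FEFF is not Character.isWhitespace
        System.out.println(Character.isWhitespace('\uFEFF')); // false
    }
}
```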

As you can see, the exception message itself says it failed on the input string "1":

Exception in thread "main" java.lang.NumberFormatException: For input string: "1"
    at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:68)
    at java.base/java.lang.Integer.parseInt(Integer.java:658)
    at java.base/java.lang.Integer.parseInt(Integer.java:776)
    at japanese.engine.kana.JPSubsParser.parseTitles(JPSubsParser.java:218)
    at japanese.engine.kana.srtManager.main(srtManager.java:25)
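The confusing message can be reproduced in isolation. The sketch below (class and method names are hypothetical) shows that Integer.parseInt rejects a string made of U+FEFF followed by "1", and that the exception message really does contain the invisible character even though it usually displays as For input string: "1":

```java
public class BomParseRepro {

    // Returns the NumberFormatException message for s, or null if s parses
    static String parseFailure(String s) {
        try {
            Integer.parseInt(s);
            return null;
        } catch (NumberFormatException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(parseFailure("1"));            // null: a plain "1" parses fine

        String msg = parseFailure("\uFEFF1");
        // On a typical console U+FEFF renders as nothing, so this looks like: For input string: "1"
        System.out.println(msg);
        System.out.println(msg.contains("\uFEFF"));       // true: the BOM is really there
        System.out.println(parseFailure("\uFEFF1".replace("\uFEFF", ""))); // null once it is removed
    }
}
```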

I'm at a loss - any help would be appreciated.

Here is an extract from the file being read:

1
00:00:04,501 --> 00:00:06,490
すたじおじぶりさくひん
- Lupin 2003 presents -

2
00:00:06,500 --> 00:00:08,024
A Studio Ghibli Film

3
00:00:13,047 --> 00:00:15,126
<font color="#888888">ちひろ げんきでね。 また あおうね りさ</font>
<font color="#8888FF">Good Luck, Chihiro. We'll meet again</font>

Here is the relevant Java code:

    public int parseTitles( Map<String, ArrayList<String>> subsMap ) {
        /* Setup, get iterator for the Map
         * Instantiate a charMatch object for checking for the various characters */
        Iterator<String> keyIter = subsMap.keySet().iterator(); 
        JPcharMatch charMatch = new JPcharMatch();

        // Loop through the map 
        int tempCounter = 0;
        while (keyIter.hasNext() && tempCounter == 0) {

            //get the current title
            ArrayList<String> titleLines = subsMap.get(keyIter.next());

            String trimmedLine0 = titleLines.get(0).replace("\n", "").replace("\r", "").replace("\"", "");

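            int indexLineInt = Integer.parseInt( trimmedLine0 );
            // ... (rest of the loop body omitted)
        }
        // ... (rest of the method omitted)
    }

And here is the calling class:

import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class srtManager {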


    public static void main( String[] args ) {

        Path subsFilePath = Paths.get("/Volumes/Multimedia/Coding/experimenting_srts/Sen to Chihiro Kanji+Hir+Eng.srt");

        SubsFileDAO subsFileDAO = new SubsFileDAO(subsFilePath);
        List<String> fileLines = subsFileDAO.getLinesList();

        JPSubsParser parser = new JPSubsParser( fileLines );

        Map<String, ArrayList<String>> subsMap = parser.getSubsMap();

        // TODO analyze to detect kanji lines and store.
        parser.parseTitles(subsMap);


    }
}

Solution

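  • You are trying to parse the byte order mark (BOM) at the beginning of the file as part of the number. The hex dump starts with ef bb bf, the UTF-8 encoding of U+FEFF. Java's UTF-8 decoder does not strip it, so the first line returned by Files.readAllLines is "\uFEFF1": that is the second character in your string, and it prints as nothing, which is why the exception message appears to say "1".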

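    You can remove it by hand before parsing:

    import java.nio.charset.StandardCharsets;

    // Decode the BOM bytes explicitly as UTF-8; new String(byte[]) alone uses the platform
    // default charset, which is not guaranteed to be UTF-8 before Java 18
    String BOM = new String(new byte[] { (byte) 0xef, (byte) 0xbb, (byte) 0xbf }, StandardCharsets.UTF_8);

    int indexLineInt = Integer.parseInt( trimmedLine0.replace(BOM, ""));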
    

    Alternatively, Apache Commons IO provides BOMInputStream, which strips the BOM transparently from the InputStream it wraps.
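If you would rather not strip the BOM at every parse site or add the Commons IO dependency, a standard-library-only option is to remove it once, right after reading the file. A minimal sketch, assuming the file is UTF-8 as chardetect reported; the class and method names are hypothetical, and it relies on Java's UTF-8 decoder passing a leading BOM through as U+FEFF rather than discarding it:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class BomSafeReader {

    private static final char BOM = '\uFEFF';

    // Returns a copy of lines with one leading U+FEFF removed from the first line, if present
    static List<String> stripLeadingBom(List<String> lines) {
        List<String> result = new ArrayList<>(lines);
        if (!result.isEmpty()) {
            String first = result.get(0);
            if (!first.isEmpty() && first.charAt(0) == BOM) {
                result.set(0, first.substring(1));
            }
        }
        return result;
    }

    // Reads the whole file as UTF-8 and drops the BOM the decoder leaves in place
    static List<String> readLinesWithoutBom(Path path) throws IOException {
        return stripLeadingBom(Files.readAllLines(path, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Demo file: "\uFEFF" encodes to ef bb bf in UTF-8, like the start of the hex dump
        Path tmp = Files.createTempFile("bom-demo", ".srt");
        Files.write(tmp, "\uFEFF1\r\n00:00:04,501 --> 00:00:06,490\r\n".getBytes(StandardCharsets.UTF_8));

        List<String> lines = readLinesWithoutBom(tmp);
        System.out.println(Integer.parseInt(lines.get(0))); // 1
        Files.delete(tmp);
    }
}
```

In the question's code, something like readLinesWithoutBom would sit wherever SubsFileDAO.getLinesList() reads the file (its internals are not shown), so parseTitles never sees the BOM. Only the first line is checked, since a BOM can only legitimately appear at the very start of the file.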