Tags: java, parsing, java.util.scanner, buffering

Parse text: Scanner or BufferedReader?


For my data structures class, the first project requires a text file of songs to be parsed.

An example of input is:
ARTIST="unknown"
TITLE="Rockabye Baby"
LYRICS="Rockabye baby in the treetops
When the wind blows your cradle will rock
When the bow breaks your cradle will fall
Down will come baby cradle and all
"

I'm wondering what the best way is to extract the Artist, Title and Lyrics into their respective String fields in a Song class. My first reaction was to use a Scanner, read the first character, and, based on that letter, use skip() to advance past the required characters and read the text between the quotation marks.

If I do this, I lose out on buffering the input. The full song text file has over 422K lines of text. Can the Scanner handle this even without buffering?


Solution

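  • For something like this, you should probably just use regular expressions. A Matcher works on any CharSequence, so you can read the file through a BufferedReader and match against the buffered text.

    Matcher.find(int start) takes an offset, so you can parse each field starting from the offset where the previous match ended.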

    http://download.oracle.com/javase/1.4.2/docs/api/java/util/regex/Matcher.html

    Regex is a whole world unto itself. If you've never used regular expressions before, start here: http://download.oracle.com/javase/tutorial/essential/regex/ and be prepared. The result is very much worth the time required.
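
As a rough sketch of the regex approach suggested above, the following reads the input through a BufferedReader and pulls out each song with a single pattern, stepping through the text with Matcher.find(int). The Song and SongParser classes, their field names, and the readAll helper are all hypothetical. The sketch also assumes that every record lists ARTIST, TITLE and LYRICS in that order and that no value contains a double quote.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical holder for one song; the field names are assumptions. */
final class Song {
    final String artist;
    final String title;
    final String lyrics;

    Song(String artist, String title, String lyrics) {
        this.artist = artist;
        this.title = title;
        this.lyrics = lyrics;
    }
}

final class SongParser {
    // One record is ARTIST, TITLE, LYRICS in that order, each in double quotes.
    // [^"]* also matches line breaks, so multi-line lyrics are captured whole.
    // Assumption: a field value never contains a double quote.
    private static final Pattern SONG = Pattern.compile(
            "ARTIST=\"([^\"]*)\"\\s*TITLE=\"([^\"]*)\"\\s*LYRICS=\"([^\"]*)\"");

    /** Reads all characters from the source through a BufferedReader. */
    static String readAll(Reader source) throws IOException {
        StringBuilder text = new StringBuilder();
        try (BufferedReader in = new BufferedReader(source)) {
            char[] chunk = new char[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                text.append(chunk, 0, n);
            }
        }
        return text.toString();
    }

    /** Extracts every song record found in the text. */
    static List<Song> parse(CharSequence text) {
        List<Song> songs = new ArrayList<>();
        Matcher m = SONG.matcher(text);
        int offset = 0;
        // find(int) resets the matcher and searches from the given offset.
        while (m.find(offset)) {
            // trim() strips leading/trailing whitespace, such as the
            // newline before the closing quote in the sample input.
            songs.add(new Song(m.group(1), m.group(2), m.group(3).trim()));
            offset = m.end(); // resume just past the record that matched
        }
        return songs;
    }
}
```

A call might look like `SongParser.parse(SongParser.readAll(Files.newBufferedReader(Paths.get("songs.txt"))))`, where songs.txt is a placeholder file name. Calling find() with no argument in the loop would continue from the previous match in the same way; the explicit offset is used here only to mirror the answer. This version holds the whole file in memory, which is a simplification that should be acceptable for a few hundred thousand lines of text.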