Punctuation and the .next() method


Does anyone know how the Scanner's .next() method treats punctuation? I couldn't find the answer to this anywhere. I have a program that's reading each word in from a text file and I am unsure of how it treats parts like "that's" or "they are," or "her."

Are periods and commas counted as separate entities, or are they considered part of the word when they appear as in "her." or "her,"? Depending on the answer, does the Scanner consider "her" and "her." (or "her" and "her,") to be two different words?

For apostrophes, do they get accounted for or do they effectively split the word in two? For example, would "they're" be recognized as "they" "'" "re" or would it be recognized as "they're" altogether?

I hope I came across clearly on this question.


Solution

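  • Scanner's next() method returns the next token, and tokens are whatever lies between matches of the delimiter pattern. The default delimiter is whitespace, so punctuation stays attached to the word: "her", "her." and "her," are three different tokens, and "they're" comes back as a single token. Scanner has a useDelimiter method that lets you specify which characters are treated as 'word breakers' if you want different behavior.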
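
To make the difference concrete, here is a minimal sketch that tokenizes the same sentence twice: once with the default whitespace delimiter and once with a custom one. The class name, the method names, and the regex `[^\p{L}']+` (treat every run of characters that are not letters or apostrophes as a word breaker) are illustrative assumptions, not something from the question; adjust the pattern to whatever you want to count as part of a word.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical demo class; the names here are made up for illustration.
class ScannerPunctuationDemo {

    // Tokenizes with Scanner's default delimiter (whitespace),
    // so punctuation stays attached to the neighboring word.
    static List<String> defaultTokens(String text) {
        List<String> tokens = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.hasNext()) {
                tokens.add(scanner.next());
            }
        }
        return tokens;
    }

    // Tokenizes with a custom delimiter: any run of characters that are
    // neither letters nor apostrophes. This pattern is an assumption chosen
    // so that "they're" survives intact while '.' and ',' are dropped.
    static List<String> wordTokens(String text) {
        List<String> tokens = new ArrayList<>();
        try (Scanner scanner = new Scanner(text).useDelimiter("[^\\p{L}']+")) {
            while (scanner.hasNext()) {
                tokens.add(scanner.next());
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "I saw her. They're here, with her";
        System.out.println(defaultTokens(text)); // [I, saw, her., They're, here,, with, her]
        System.out.println(wordTokens(text));    // [I, saw, her, They're, here, with, her]
    }
}
```

The same calls work when the Scanner is built from a File instead of a String. If you would rather keep the default delimiter, another option is to strip leading and trailing punctuation from each token after calling next().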