I'm building a lexical analyzer in Java. This is what I have now:
import java.io.*;

enum TokenType { NUM, SOMA, MULT, APar, FPar, EOF }

class Token {
    char lexema;
    TokenType token;

    Token(char l, TokenType t) {
        lexema = l;
        token = t;
    }
}

class AnaliseLexica {
    BufferedReader arquivo;

    AnaliseLexica(String a) throws Exception {
        this.arquivo = new BufferedReader(new FileReader(a));
    }

    Token getNextToken() throws Exception {
        Token token;
        int eof = -1;
        char currchar;
        int currchar1;
        // Skip whitespace; stops at the first other character (or at end of file).
        do {
            currchar1 = arquivo.read();
            currchar = (char) currchar1;
        } while (currchar == '\n' || currchar == ' ' || currchar == '\t' || currchar == '\r');
        if (currchar1 != eof && currchar1 != 10) {
            if (currchar >= '0' && currchar <= '9')
                return (new Token(currchar, TokenType.NUM));
            else
                switch (currchar) {
                    case '(':
                        return (new Token(currchar, TokenType.APar));
                    case ')':
                        return (new Token(currchar, TokenType.FPar));
                    case '+':
                        return (new Token(currchar, TokenType.SOMA));
                    case '*':
                        return (new Token(currchar, TokenType.MULT));
                    default:
                        throw (new Exception("Invalid character: " + ((int) currchar)));
                }
        }
        arquivo.close();
        return (new Token(currchar, TokenType.EOF));
    }
}
With this code I can read single digits from '0' to '9' and operators such as '*' and '+', using this part of the code:
    do {
        currchar1 = arquivo.read();
        currchar = (char) currchar1;
    } while (currchar == '\n' || currchar == ' ' || currchar == '\t' || currchar == '\r');
How could I read natural numbers with more than one digit from the file and still keep reading the arithmetic operators?
Since whitespace is a valid separator between your tokens, you can make your code simpler. By default, the Scanner class splits its input on whitespace, so you just need to read the pieces one at a time. When the scanner has no more data to read, we close it and return an EOF token. Note that this relies on every token, parentheses included, being separated by whitespace, as in the test file shown at the end.
import java.io.FileReader;
import java.io.IOException;
import java.util.Scanner;

public class AnalisadorLexico {

    public enum TokenType {
        NUM,
        SOMA,
        MULT,
        APar,
        FPar,
        EOF
    }

    public static class Token {
        String lexema;
        TokenType token;

        Token( String l, TokenType t ) {
            lexema = l;
            token = t;
        }

        Token( char l, TokenType t ) {
            lexema = String.valueOf( l );
            token = t;
        }

        @Override
        public String toString() {
            return lexema + " (" + token + ")";
        }
    }

    private Scanner fileReader;
    private boolean scannerClosed;

    public AnalisadorLexico( String filePath ) throws IOException {
        fileReader = new Scanner( new FileReader( filePath ) );
    }

    public Token getNextToken() throws IOException {
        if ( !scannerClosed && fileReader.hasNext() ) {
            String currentData = fileReader.next();
            // A natural number is one or more decimal digits (no sign, any length).
            if ( currentData.matches( "[0-9]+" ) ) {
                return new Token( currentData, TokenType.NUM );
            }
            switch ( currentData ) {
                case "(":
                    return new Token( currentData, TokenType.APar );
                case ")":
                    return new Token( currentData, TokenType.FPar );
                case "+":
                    return new Token( currentData, TokenType.SOMA );
                case "*":
                    return new Token( currentData, TokenType.MULT );
                default:
                    // Fail loudly instead of returning null, which would
                    // cause a NullPointerException in the caller's loop.
                    throw new IOException( "Invalid token: " + currentData );
            }
        } else {
            // No more data: close the scanner and keep returning EOF.
            scannerClosed = true;
            fileReader.close();
            return new Token( "", TokenType.EOF );
        }
    }

    public static void main( String[] args ) throws IOException {
        AnalisadorLexico al = new AnalisadorLexico( "testAL.txt" );
        Token t = null;
        while ( ( t = al.getNextToken() ).token != TokenType.EOF ) {
            System.out.println( t );
        }
        // Further calls after the end of the file keep returning EOF.
        System.out.println( al.getNextToken() );
        System.out.println( al.getNextToken() );
        System.out.println( al.getNextToken() );
        System.out.println( al.getNextToken() );
    }
}
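To see what "separated by whitespace" means in practice, here is a minimal sketch that runs Scanner over in-memory strings instead of a file. The class name ScannerSplitDemo and the helper split are made up for this illustration and are not part of the code above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ScannerSplitDemo {

    // Hypothetical helper: collects every whitespace-delimited piece
    // that Scanner.next() returns for the given input.
    public static List<String> split(String input) {
        List<String> pieces = new ArrayList<>();
        try (Scanner scanner = new Scanner(input)) {
            while (scanner.hasNext()) {
                pieces.add(scanner.next());
            }
        }
        return pieces;
    }

    public static void main(String[] args) {
        // Tokens separated by spaces come out one by one.
        System.out.println(split("12 + ( 3 * 45 )")); // [12, +, (, 3, *, 45, )]
        // Without spaces the whole expression is a single piece.
        System.out.println(split("12+(3*45)"));       // [12+(3*45)]
    }
}
```

The second call shows the limitation: if the input is not whitespace-separated, the Scanner-based lexer receives one piece that is neither a number nor a known operator.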
If you can't use the Scanner class, you can keep using BufferedReader and split its data into tokens yourself:
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class AnalisadorLexico2 {

    public enum TokenType {
        NUM,
        SOMA,
        MULT,
        APar,
        FPar,
        EOF
    }

    public static class Token {
        String lexema;
        TokenType token;

        Token( String l, TokenType t ) {
            lexema = l;
            token = t;
        }

        Token( char l, TokenType t ) {
            lexema = String.valueOf( l );
            token = t;
        }

        @Override
        public String toString() {
            return lexema + " (" + token + ")";
        }
    }

    private BufferedReader fileReader;
    private boolean fileReaderClosed;

    public AnalisadorLexico2( String filePath ) throws IOException {
        fileReader = new BufferedReader( new FileReader( filePath ) );
    }

    public Token getNextToken() throws IOException {
        String currentData = nextBufferedReaderToken();
        if ( currentData != null ) {
            // A natural number is one or more decimal digits (no sign, any length).
            if ( currentData.matches( "[0-9]+" ) ) {
                return new Token( currentData, TokenType.NUM );
            }
            switch ( currentData ) {
                case "(":
                    return new Token( currentData, TokenType.APar );
                case ")":
                    return new Token( currentData, TokenType.FPar );
                case "+":
                    return new Token( currentData, TokenType.SOMA );
                case "*":
                    return new Token( currentData, TokenType.MULT );
                default:
                    // Fail loudly instead of returning null, which would
                    // cause a NullPointerException in the caller's loop.
                    throw new IOException( "Invalid token: " + currentData );
            }
        } else {
            // No more data: close the reader once and keep returning EOF.
            if ( !fileReaderClosed ) {
                fileReaderClosed = true;
                fileReader.close();
            }
            return new Token( "", TokenType.EOF );
        }
    }

    // Returns the next run of non-whitespace characters,
    // or null when the end of the file has been reached.
    public String nextBufferedReaderToken() throws IOException {
        StringBuilder data = null;
        while ( !fileReaderClosed ) {
            int d = fileReader.read();
            if ( d == -1 ) {
                break; // end of file
            }
            char c = (char) d;
            if ( c == '\n' || c == ' ' || c == '\t' || c == '\r' ) {
                if ( data != null ) {
                    break; // whitespace after a token ends it
                }
                // leading whitespace: discard
            } else {
                if ( data == null ) {
                    data = new StringBuilder();
                }
                data.append( c );
            }
        }
        return data == null ? null : data.toString();
    }

    public static void main( String[] args ) throws IOException {
        AnalisadorLexico2 al = new AnalisadorLexico2( "testAL.txt" );
        Token t = null;
        while ( ( t = al.getNextToken() ).token != TokenType.EOF ) {
            System.out.println( t );
        }
        // Further calls after the end of the file keep returning EOF.
        System.out.println( al.getNextToken() );
        System.out.println( al.getNextToken() );
        System.out.println( al.getNextToken() );
        System.out.println( al.getNextToken() );
    }
}
My testAL.txt file contents are:
1234 + 5 * 65 + ( 44 * 55555 ) * 444 + ( 2354 * ( 34 + 44 ) )
1234 + 5 * 65 + ( 44 * 55555 ) * 444 + ( 2354 * ( 34 + 44 ) )
1234 + 5 * 65 + ( 44 * 55555 ) * 444 + ( 2354 * ( 34 + 44 ) )
1234 + 5 * 65 + ( 44 * 55555 ) * 444 + ( 2354 * ( 34 + 44 ) )
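Both versions above assume that every token is surrounded by whitespace. If your input may also contain something like 12+(3*45), one possible alternative is to stay at the character level, as in the original code, and accumulate consecutive digits into one NUM token. The following is only a sketch under that assumption: the class name CharLexerSketch and the helper tokenize are invented for this example, and it uses java.io.PushbackReader to put back the first character that follows a number.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class CharLexerSketch {

    public enum TokenType { NUM, SOMA, MULT, APar, FPar, EOF }

    public static final class Token {
        final String lexema;
        final TokenType token;

        Token(String lexema, TokenType token) {
            this.lexema = lexema;
            this.token = token;
        }

        @Override
        public String toString() {
            return lexema + " (" + token + ")";
        }
    }

    private final PushbackReader in;

    public CharLexerSketch(Reader source) {
        in = new PushbackReader(source); // allows un-reading one character
    }

    public Token getNextToken() throws IOException {
        int c = in.read();
        while (c != -1 && Character.isWhitespace(c)) {
            c = in.read(); // skip whitespace between tokens
        }
        if (c == -1) {
            return new Token("", TokenType.EOF);
        }
        if (c >= '0' && c <= '9') {
            // Accumulate all consecutive digits into one lexeme.
            StringBuilder digits = new StringBuilder();
            while (c >= '0' && c <= '9') {
                digits.append((char) c);
                c = in.read();
            }
            if (c != -1) {
                in.unread(c); // this character belongs to the next token
            }
            return new Token(digits.toString(), TokenType.NUM);
        }
        switch (c) {
            case '(': return new Token("(", TokenType.APar);
            case ')': return new Token(")", TokenType.FPar);
            case '+': return new Token("+", TokenType.SOMA);
            case '*': return new Token("*", TokenType.MULT);
            default:  throw new IOException("Invalid character: " + (char) c);
        }
    }

    // Hypothetical convenience method: tokenizes a string and returns
    // the printed form of every token before EOF.
    public static List<String> tokenize(String text) {
        CharLexerSketch lexer = new CharLexerSketch(new StringReader(text));
        List<String> out = new ArrayList<>();
        try {
            Token t;
            while ((t = lexer.getNextToken()).token != TokenType.EOF) {
                out.add(t.toString());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("12+(3*45)"));
    }
}
```

To read from a file instead of a string, you could pass new BufferedReader(new FileReader("testAL.txt")) to the constructor. The lexer logic stays the same because it only depends on Reader.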