
Lexical analyser: how to identify the end of a token


I need a function that identifies the end of a token, so that I can save the token in an array and send it to my automaton for identification (operator, keyword, identifier).

The automaton works fine when I enter only one token, but when the input contains several tokens, possibly separated by spaces, it doesn't work. I need this function to skip the spaces, stop at the end of each token, and send each token in an array to my automaton function. I'm stuck.

I'm using C.

ex: ABC + D

: ABC token 1

: + token 2

: D token 3

ex2: ABC++D12*/z (ABC,+,+,D12,*,/,z) 7 tokens

ex3: AD ++ - C (AD,+,+,-,C) 5 tokens

Edit: I'm not using any tool, only C with a deterministic finite automaton.


Solution

  • void lirelexeme(char chaine[500]){
        int i,j=0,k;
        char tc,tc2;
        char lexeme[500];memset(lexeme,0,500);
        int length=strlen(chaine); // was undeclared; strlen() and memset() need <string.h>

        for(i=0;i<length;i++){
            tc=chaine[i];    // current character
            tc2=chaine[i+1]; // next character ('\0' when tc is the last one)

            if(tc!=' ' && tc!='\0' && tc!='\n'&& tc!='\t'){ // whitespace never belongs to a token

                if((tc==':' && tc2=='=') || (tc=='>' && tc2=='=') || (tc=='<' && tc2=='=') || (tc=='<' && tc2=='>')){  // two-character operators: := >= <= <>
                    lexeme[0]=tc;
                    lexeme[1]=tc2;
                    lex(lexeme);
                    memset(lexeme,0,500);
                    j=0;    // get ready to receive the next lexeme
                    i++;    // skip tc2, it has already been consumed
                }
                // ...
            }
        }
    }
    

    Here is the function that will split the tokens. Use puts() instead of lex() to see the result.

    Note: lex() is a lexical analyser function I wrote. It takes a token as its argument and returns its type (constant, identifier, keyword, arithmetic operator, logical operator...).
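    For reference, the splitting rule from the question's examples can be sketched as a small standalone function. This is only a sketch under assumptions, not the answerer's full code: the names split_tokens, is_two_char_op and MAX_TOKEN_LEN are made up for illustration, a run of letters and digits is treated as one token, the two-character operators are the four tested above, and every other non-space character is a token on its own.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_TOKEN_LEN 64   /* hypothetical limit, including the terminating '\0' */

/* Returns 1 if a followed by b is one of the two-character operators
   tested in the answer: := >= <= <> */
static int is_two_char_op(char a, char b)
{
    return (a == ':' && b == '=') || (a == '>' && b == '=') ||
           (a == '<' && (b == '=' || b == '>'));
}

/* Splits src into tokens and stores at most max_tokens of them in out.
   Returns the number of tokens stored. */
size_t split_tokens(const char *src, char out[][MAX_TOKEN_LEN], size_t max_tokens)
{
    size_t count = 0;
    size_t i = 0;

    assert(src != NULL && out != NULL);

    while (src[i] != '\0' && count < max_tokens) {
        size_t len = 0;

        /* Whitespace never belongs to a token: skip it. */
        if (isspace((unsigned char)src[i])) {
            i++;
            continue;
        }

        if (isalnum((unsigned char)src[i])) {
            /* Identifier, keyword or number: the token ends at the first
               character that is not a letter or digit. Characters beyond
               MAX_TOKEN_LEN - 1 are dropped. */
            while (isalnum((unsigned char)src[i])) {
                if (len < MAX_TOKEN_LEN - 1)
                    out[count][len++] = src[i];
                i++;
            }
        } else if (is_two_char_op(src[i], src[i + 1])) {
            /* Two-character operator: consume both characters. */
            out[count][len++] = src[i++];
            out[count][len++] = src[i++];
        } else {
            /* Any other character is a token on its own (+, -, *, /, ...). */
            out[count][len++] = src[i++];
        }

        out[count][len] = '\0';
        count++;
    }
    return count;
}
```

    With a buffer declared as char tokens[16][MAX_TOKEN_LEN], calling split_tokens("ABC++D12*/z", tokens, 16) should store the seven tokens from ex2, and each tokens[k] can then be passed to lex() (or puts()) in a loop. Collecting the tokens first, instead of calling lex() inside the scanner, keeps the splitting logic easy to test on its own.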