
Token Cleaner Java Method


I need to write the cleanToken method of the TokenCleaner class for the WordCount project I am doing. A token is a sequence of characters surrounded by whitespace, usually a word, that needs to be "cleaned" of any punctuation and capitalization. I have a template for it, but I'm not sure how to start.
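Since a token is whatever sits between whitespace, a line of text can be broken into tokens with String.split before each token is cleaned. A minimal sketch, assuming the input arrives one line at a time; TokenSplitter and splitTokens are hypothetical names, not part of the template:

```java
import java.util.Arrays;

public class TokenSplitter {
    // Hypothetical helper: splits a line into whitespace-separated tokens.
    static String[] splitTokens(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            // "".split(...) returns [""], so blank lines are handled explicitly
            return new String[0];
        }
        // "\\s+" matches one or more whitespace characters (spaces, tabs, newlines)
        return trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        String line = "  That's   empty-handed? 42 ";
        System.out.println(Arrays.toString(splitTokens(line)));
        // prints [That's, empty-handed?, 42]
    }
}
```

Trimming first matters: without it, leading whitespace would make split return an empty first element.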

public class TokenCleaner
{
    public static void main(String[] args)
    {
        String[] tokens = {"That's","empty-handed?","42","...idk...","\"quote\""};
        for(int i = 0; i < tokens.length; i++)
        {
            System.out.println("Original:\t"+tokens[i]);
            System.out.println("Cleaned:\t"+cleanToken(tokens[i]));
        }
    }

    private static String cleanToken(String token)
    {
        /** remove leading special characters and numbers **/
        // while the token's length is greater than zero AND the first character isn't a letter
            // remove the first character from the token
        /** remove trailing special characters and numbers **/
        // while the token's length is greater than zero AND the last character isn't a letter
            // remove the last character from the token
        // return a lowercase version of the token
        /** Note: It is possible for the cleaned token to be an empty String if the given token
            consisted of only non-letter characters */
        return null; // placeholder return statement
    }
}
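The template's comments map almost line for line onto Java. A minimal sketch, assuming "isn't a letter" means !Character.isLetter(c), so digits such as those in "42" are removable; cleanToken is left package-private here instead of private only so it can be called from other code:

```java
public class TokenCleaner {
    public static void main(String[] args) {
        String[] tokens = {"That's", "empty-handed?", "42", "...idk...", "\"quote\""};
        for (String token : tokens) {
            System.out.println("Original:\t" + token);
            System.out.println("Cleaned:\t" + cleanToken(token));
        }
    }

    static String cleanToken(String token) {
        // remove leading special characters and numbers:
        // while the token is non-empty AND its first character isn't a letter, drop it
        while (token.length() > 0 && !Character.isLetter(token.charAt(0))) {
            token = token.substring(1);
        }
        // remove trailing special characters and numbers:
        // while the token is non-empty AND its last character isn't a letter, drop it
        while (token.length() > 0 && !Character.isLetter(token.charAt(token.length() - 1))) {
            token = token.substring(0, token.length() - 1);
        }
        // return a lowercase version; "" if the token had no letters at all
        return token.toLowerCase();
    }
}
```

With the sample tokens this prints that's, empty-handed, an empty string for 42 (the case the template's note anticipates), idk, and quote. Inner punctuation survives because only the two ends are trimmed.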

Can someone please help?

Thank you


Solution

  • I suggest going through the token character by character: if a character is one you want to delete, skip it; otherwise, append its lowercase form to the result. For instance:

    // at the top of the file:
    import java.util.ArrayList;
    import java.util.List;

    private static String cleanToken(String token) {
        // characters you want to delete; add every character you need
        List<Character> toDelete = new ArrayList<Character>();
        toDelete.add('@');
        // builds the cleaned token one character at a time
        StringBuilder newToken = new StringBuilder();
        // go through your token character by character
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            if (toDelete.contains(c)) {
                // delete it: simply don't copy it into newToken
                continue;
            }
            // keep it, lowercased
            newToken.append(Character.toLowerCase(c));
        }
        return newToken.toString();
    }
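A different technique from the one in this answer: the answer deletes listed characters wherever they occur, whereas the template only strips non-letters from the two ends, so "That's" keeps its apostrophe. A regular expression can express that end-stripping directly. This is a sketch; the class name RegexTokenCleaner is hypothetical, and \p{L} (any Unicode letter) stands in for Character.isLetter:

```java
public class RegexTokenCleaner {
    // Matches a run of non-letters at the start (^...) OR at the end (...$) of the token.
    // [^\p{L}]+ means "one or more characters that are not letters".
    private static final String EDGE_NON_LETTERS = "^[^\\p{L}]+|[^\\p{L}]+$";

    static String cleanToken(String token) {
        // strip both ends in one pass, then lowercase what remains
        return token.replaceAll(EDGE_NON_LETTERS, "").toLowerCase();
    }

    public static void main(String[] args) {
        String[] tokens = {"That's", "empty-handed?", "42", "...idk...", "\"quote\""};
        for (String token : tokens) {
            System.out.println(token + " -> \"" + cleanToken(token) + "\"");
        }
    }
}
```

The regex version is shorter, but the explicit while loops in the template are easier to step through in a debugger, which may matter more for a class exercise.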