I need a TokenCleaner method for the WordCount project that I am working on. A token is a sequence of characters surrounded by whitespace, usually a word, that needs to be "cleaned" of any punctuation and capitalization. I have a template for it, but I'm not sure how to write it or where to start.
public class TokenCleaner
{
    public static void main(String[] args)
    {
        String[] tokens = {"That's","empty-handed?","42","...idk...","\"quote\""};
        for(int i = 0; i < tokens.length; i++)
        {
            System.out.println("Original:\t"+tokens[i]);
            System.out.println("Cleaned:\t"+cleanToken(tokens[i]));
        }
    }
    private static String cleanToken(String token)
    {
        /** remove leading special characters and numbers **/
        // while the token's length is greater than zero AND the first character isn't a letter
        // remove the first character from the token
        /** remove trailing special characters and numbers **/
        // while the token's length is greater than zero AND the last character isn't a letter
        // remove the last character from the token
        // return a lowercase version of the token
        /** Note: It is possible for the cleaned token to be an empty String if the given token
        consisted of only non-letter characters */
        return null; // placeholder return statement
    }
}
Can someone please help?
Thank you
I suggest going through the token character by character: if a character is one you want to delete, skip it, and if not, lowercase it and keep it. Note that this removes the listed characters anywhere in the token, not only at the ends. For instance:
// needs "import java.util.ArrayList;" at the top of the file
private static String cleanToken(String token) {
    // list holding the characters of the new token
    ArrayList<String> newtoken = new ArrayList<String>();
    // list of characters you want to delete
    ArrayList<Character> todelete = new ArrayList<Character>();
    todelete.add('@'); // add every character you want to delete
    // go through the token one character at a time
    for (int i = 0; i < token.length(); i++) {
        if (todelete.contains(token.charAt(i))) {
            // delete it by simply not copying it into newtoken
        }
        else {
            // lowercase it and keep it
            newtoken.add(String.valueOf(token.charAt(i)).toLowerCase());
        }
    }
    // now merge all elements of the newtoken list into one String
    String NewToken = "";
    for (String t : newtoken) {
        NewToken = NewToken + t;
    }
    return NewToken;
}
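If you would rather follow the comments in the question's template literally (trim non-letters from the front, then from the back, then lowercase), a minimal sketch could look like the one below. The class name TokenCleanerSketch is only a placeholder chosen for this example, and it assumes that "letter" means whatever Character.isLetter accepts, so an apostrophe or hyphen inside a word is kept.

```java
class TokenCleanerSketch {
    // Follows the template's comments step by step (a sketch, not the only way to do it).
    static String cleanToken(String token) {
        // Remove leading special characters and numbers:
        // keep dropping the first character while it is not a letter.
        while (token.length() > 0 && !Character.isLetter(token.charAt(0))) {
            token = token.substring(1);
        }
        // Remove trailing special characters and numbers:
        // keep dropping the last character while it is not a letter.
        while (token.length() > 0 && !Character.isLetter(token.charAt(token.length() - 1))) {
            token = token.substring(0, token.length() - 1);
        }
        // The result may be an empty String if the token had no letters at all.
        return token.toLowerCase();
    }

    public static void main(String[] args) {
        // Same sample tokens as in the question's template.
        String[] tokens = {"That's", "empty-handed?", "42", "...idk...", "\"quote\""};
        for (String token : tokens) {
            System.out.println("Original:\t" + token);
            System.out.println("Cleaned:\t" + cleanToken(token));
        }
    }
}
```

With this version, "That's" becomes "that's" and "42" becomes an empty String, because only the ends of the token are trimmed and the inside is left alone.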