Friday, May 25, 2012

StringTokenizer and Removing Punctuation from a String

Dear Reader,
In this short article I show how to strip punctuation
from a String and print only the words. This is useful when you
read a text document and want to keep only its words.

import java.util.StringTokenizer;

public class StringTokenTest {
    public static void main(String[] args) {
        String sentence="As you can tell from the characters passed into the StringTokenizer, " +
                "this approach handles a space, tab, newline and linefeed characters, period, colon, " +
                "semi-colon, question mark, exclamation mark, brackets, and single-quotes.";

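        // Every character in the second argument is a delimiter, so whitespace and the
        // listed punctuation marks are dropped. Characters that are not listed
        // (for example the hyphen in "semi-colon") stay inside their token.
        StringTokenizer tokenizer = new StringTokenizer(sentence, " \t\n\r\f,.:;?![]'");
        // To split on spaces only and keep punctuation attached to the words:
        //StringTokenizer tokenizer = new StringTokenizer(sentence," ");

        // Number of tokens still available from the tokenizer.
        System.out.println(tokenizer.countTokens());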
        while(tokenizer.hasMoreTokens()){
            System.out.println(tokenizer.nextToken());
        }
    }
}
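
If you want to reuse the words instead of printing them, the same idea can be wrapped in a small helper. The sketch below is only an illustration: the class name WordExtractor and both method names are my own and are not part of any library. The first method uses the same delimiter string as the program above. The second is an alternative that uses String.split with a regular expression that treats every run of characters other than letters and digits as a separator, so you do not have to list each punctuation mark by hand.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical helper class, named here only for illustration.
class WordExtractor {
    // Same delimiter set as the program above: whitespace plus some punctuation.
    private static final String DELIMITERS = " \t\n\r\f,.:;?![]'";

    // Collects the tokens into a list instead of printing them.
    static List<String> extractWords(String text) {
        List<String> words = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(text, DELIMITERS);
        while (tokenizer.hasMoreTokens()) {
            words.add(tokenizer.nextToken());
        }
        return words;
    }

    // Alternative: split on any run of characters that are not letters or digits.
    static List<String> extractWordsWithRegex(String text) {
        List<String> words = new ArrayList<>();
        for (String piece : text.split("[^\\p{L}\\p{Nd}]+")) {
            // split returns a leading empty string when the text starts with a separator
            if (!piece.isEmpty()) {
                words.add(piece);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        String sample = "Hello, world! Isn't this semi-colon [bracketed]?";
        System.out.println(extractWords(sample));
        System.out.println(extractWordsWithRegex(sample));
    }
}
```

Note the difference between the two: the tokenizer version keeps "semi-colon" as one word because the hyphen is not in the delimiter string, while the regex version splits it into "semi" and "colon". Both versions split "Isn't" into "Isn" and "t", since the single quote is treated as a separator.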
------------------END-------------
