
Tokenize my CSV into one list rather than separate lists using Python


I want to tokenize my CSV into one list rather than a separate list per line.

from nltk.tokenize import sent_tokenize

with open('train.csv') as file_object:
    for trainline in file_object:
        tokens_train = sent_tokenize(trainline)
        print(tokens_train)

This is the output I am getting:

['2.1 Separated of trains']
['Principle: The method to make the signal is different.']
['2.2 Context']

I want all of them in one list:

['2.1 Separated of trains','Principle: The method to make the signal is different.','2.2 Context']

Solution

  • Since sent_tokenize() returns a list, you could simply extend a starting list each time.

    from nltk.tokenize import sent_tokenize

    alltokens = []

    with open('train.csv') as file_object:
        for trainline in file_object:
            # sent_tokenize() returns a list of sentences for this line
            tokens_train = sent_tokenize(trainline)
            # extend() adds each sentence individually, keeping the result flat
            alltokens.extend(tokens_train)
    print(alltokens)
    

    Or with a list comprehension:

    with open('train.csv') as file_object:
        # Flatten every line's sentences into a single list
        alltokens = [token for trainline in file_object for token in sent_tokenize(trainline)]
    print(alltokens)
    

    Both solutions work even when sent_tokenize() returns more than one sentence for a line.
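
    To see why extend() (rather than append()) produces the flat list you want, here is a minimal sketch with a made-up sample line; sent_tokenize() comes from NLTK and needs the punkt models downloaded once:

    from nltk.tokenize import sent_tokenize
    # import nltk; nltk.download('punkt')  # run once if the punkt models are missing

    # A made-up line that contains two sentences
    line = "This is the first sentence. This is the second sentence."

    sentences = sent_tokenize(line)
    print(sentences)   # ['This is the first sentence.', 'This is the second sentence.']

    flat = []
    flat.extend(sentences)    # adds each sentence separately -> one flat list of strings
    print(flat)

    nested = []
    nested.append(sentences)  # adds the whole list as a single element -> nested list
    print(nested)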