python, dataframe, tokenize

Adding values to a DataFrame and exporting it


I am trying to add two values to a DataFrame: one is the sentence, and the other is the list of words I get after tokenizing that sentence.

So far, I have written the following code:

from nltk.tokenize import word_tokenize

example = ['Mary had a little lamb',
           'Jack went up the hill',
           'Jill followed suit',
           'i woke up suddenly',
           'it was a really bad dream...']


def hi():
    for i in example:
        # print(word_tokenize(i), i)
        a = [i, word_tokenize(i)]
        print(a)

The expected output is a DataFrame with two columns: the original sentence and the tokens of that sentence.

Example:

Original Sentence | Tokens

My name is max | My, name, is, max

This is windows | This, is, windows


Solution

  • df['Original Sentence'] = a[0]  
    df['Tokens'] = a[1]
    

    Or we can skip your function entirely:

    df['Original Sentence'] = example
    df['Tokens'] = [word_tokenize(i) for i in example]
    

    EDIT:
    Since it appears you do not have a dataframe to begin with, you can build one directly from the list:

    import pandas as pd

    df = pd.DataFrame.from_dict({'Original Sentence': example,
                                 'Tokens': [word_tokenize(i) for i in example]})
    print(df)  # inspect the dataframe
    df.to_csv('mydata.csv')  # write the dataframe to a CSV file
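
One thing to watch with the CSV route: each token list is written out as its string representation, so reading the file back gives strings rather than lists. The following is a minimal sketch of a round trip, assuming a small hand-written dataframe in place of the tokenized one and an in-memory buffer in place of `mydata.csv`. The names `buffer` and `restored` are illustrative only.

```python
import ast
import io

import pandas as pd

# Stand-in data shaped like the dataframe above; tokens are typed by hand
# here so the sketch does not depend on nltk being installed.
df = pd.DataFrame({
    'Original Sentence': ['Mary had a little lamb', 'Jill followed suit'],
    'Tokens': [['Mary', 'had', 'a', 'little', 'lamb'],
               ['Jill', 'followed', 'suit']],
})

# Write to an in-memory buffer instead of a file; index=False leaves out
# the row-number column that to_csv writes by default.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)

# Parse the stringified lists back into real Python lists while reading.
restored = pd.read_csv(buffer, converters={'Tokens': ast.literal_eval})
print(restored)
```

Without the `converters` argument, the `Tokens` column would come back as plain strings such as `"['Jill', 'followed', 'suit']"`.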
    

    Other format:

    df.to_sql(etc...) #Refer to comment below  
    

    To write the dataframe directly to a SQL database, you need a connection set up for your specific database. See the documentation for an example: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_sql.html
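
As a rough illustration of the SQL route, here is a sketch that assumes an in-memory SQLite database from Python's standard library. Your own database will need its own connection setup. The table name `sentences` and the hand-written sample data are assumptions made for the example. SQLite has no list type, so the sketch joins each token list into one comma-separated string before writing.

```python
import sqlite3

import pandas as pd

# Stand-in data shaped like the dataframe above; tokens are typed by hand
# here so the sketch does not depend on nltk being installed.
df = pd.DataFrame({
    'Original Sentence': ['Mary had a little lamb', 'Jill followed suit'],
    'Tokens': [['Mary', 'had', 'a', 'little', 'lamb'],
               ['Jill', 'followed', 'suit']],
})

# Lists cannot be stored in a SQLite column, so flatten each one to a string.
df_sql = df.assign(Tokens=df['Tokens'].apply(','.join))

# In-memory database, used only for illustration.
conn = sqlite3.connect(':memory:')
df_sql.to_sql('sentences', conn, index=False, if_exists='replace')

# Read the rows back to confirm what was written.
rows = conn.execute('SELECT * FROM sentences').fetchall()
conn.close()
print(rows)
```

Joining with a comma is only one option. If the tokens themselves may contain commas, a different separator or a JSON string would be safer.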