Tags: python, python-3.x, pandas, for-loop, nlp

How to iterate over all the rows in a dataframe and return the results for every row?


I have a dataframe with 1 column and 3 rows. The dataframe is below:

    Text
0   Provided by Hindustan Times Wuhan Institute of...
1   Kattappa continues to narrate how he ended up ...
2   National Commercial Bank (NCB), Saudi Arabia’s...

I'm trying to summarize each of the 3 rows and store the results in a new column, like this:

    Text                                               Summarize
0   Provided by Hindustan Times Wuhan Institute of...   It's related to virus
1   Kattappa continues to narrate how he ended up ...   It's a movie story
2   National Commercial Bank (NCB), Saudi Arabia’s...   Article related to finance

I tried the code below:

for index, row in df.iterrows():
    
    chunks = generate_chunks(row['Text'])
    
    res = summarizer(chunks, max_length=1000, min_length=20)

    text = ' '.join([summ['summary_text'] for summ in res])

print(text)

But the output is only the summary of the last row:

Article related to finance

Can anyone help me with this?


Solution

  • You overwrite the value of text at each iteration. It is first set to "It's related to virus", then replaced by "It's a movie story", and finally replaced by "Article related to finance", so the two earlier values are lost. Because print(text) runs only once, after the loop has finished, you see just the last summary.

    Instead of using a single string, use a list of strings and append to it at each iteration, like this:

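    summaries = []  # one summary per row, in row order
    for index, row in df.iterrows():
        chunks = generate_chunks(row['Text'])
        res = summarizer(chunks, max_length=1000, min_length=20)
        text = ' '.join([summ['summary_text'] for summ in res])
        summaries.append(text)  # keep this row's summary instead of overwriting it

    # The list has one entry per row, so it can become the new column
    df['Summarize'] = summaries
    print(summaries)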
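
As an alternative to the explicit loop, the per-row work can be wrapped in a function and passed to Series.apply, which returns one result per row. The following is only a minimal sketch: generate_chunks and summarizer here are hypothetical stand-ins (the real ones from the question are not shown), summarize_text is a helper name introduced for illustration, and the sample rows are placeholders built from the visible start of each text.

```python
import pandas as pd

# Hypothetical stand-ins for the question's helpers, so the sketch runs on its own.
def generate_chunks(text, size=50):
    """Split text into pieces of at most `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def summarizer(chunks, max_length=1000, min_length=20):
    """Fake summarizer: keeps the first three words of each chunk."""
    return [{'summary_text': ' '.join(chunk.split()[:3])} for chunk in chunks]

def summarize_text(text):
    """Summarize one row's text; same steps as the loop body in the answer."""
    chunks = generate_chunks(text)
    res = summarizer(chunks, max_length=1000, min_length=20)
    return ' '.join(summ['summary_text'] for summ in res)

# Placeholder rows (shortened), not the full articles from the question.
df = pd.DataFrame({'Text': [
    'Provided by Hindustan Times Wuhan Institute of',
    'Kattappa continues to narrate how he ended up',
    'National Commercial Bank (NCB), Saudi Arabia',
]})

# apply calls summarize_text once per row and returns a Series aligned with df's index.
df['Summarize'] = df['Text'].apply(summarize_text)
print(df)
```

With the real summarizer swapped in, the Summarize column would hold the actual summaries; the structure of the code stays the same.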