Search code examples
pythonexcelpandasloopsimport-from-excel

Python - Pandas - Import Excel file, iterate through each row, add new value, and add to dataframe


I have an Excel file with a project code and abstract fields I need to import so I can run a simple text summarizer on the abstract and then add to the dataframe.

My Excel dataset looks like this:

[Proj_Number] | [Abstract]

JJF-123          | Diabetes is a serious chronic condition.  
JFR-223          | Cardiovascular disease is also a chronic condition. 
JF3-334          | Don't forget about asthma and how much it sucks. 

once I import the data, I want to apply my text summarizer and get this:

[Proj_Number] | [Abstract]                        [Ab_keywords]

JJF-123       | Diabetes is a chronic condition.  |Diabetes, chronic condition                                                                 
JFR-223       | COPD is a also chronic condition. | COPD, chronic condition
JF3-334       | Don't forget about asthma too.    | asthma, forgot

I know my code is wrong, but I just don't know how to loop through each row, get the abstract keywords from the abstract, add it to the dataframe, and export it.

from gensim.summarization.summarizer import summarize
from gensim.summarization import keywords
import pandas as pd

dataset = pd.read_excel('abstracts.xlsx',encoding="ISO-8859-1")
df = pd.DataFrame(dataset)
cols = [1,2]
df = df[df.columns[cols]]

for d in df:
d =  keywords(d, ratio=0.15, split=True))
print(d)

Solution

  • You dont want to iterate over every row in the df with for d in df:

    Pandas has a way of applying a function to each row of a dataframe and returning a series via the apply function

    Providing you rename the columns of your dataframe appropriately,

    df['Ab_keywords'] = df['Abstract'].apply(lambda text: keywords(text, ratio=0.15, split=True))

    should work.

    Here the lambda function is applied to each row of df['Abstract'] and is given the value of each row as its argument.