python excel pandas languagetool

How do I apply LanguageTool to a pandas DataFrame and add the results as a new column?


I am trying to add a column to a DataFrame (a large Excel file imported with pandas). The new column should hold the errors that LanguageTool reports when it is applied to another column of the DataFrame. So each row would have either its errors or a blank (no errors) in the new column 'Issues'.

import language_tool_python
import pandas as pd
tool = language_tool_python.LanguageTool('en-US') 
fn = "Example.xlsx"
xlreader = pd.read_excel(fn, sheet_name="This is Starting File")
for row in xlreader:
    text= str(xlreader[['Description']])
    xlreader['Issues'] = tool.check(text)

The above results in a ValueError.

I also tried:

xlreader['Issues'] = xlreader.apply(lambda x: tool.check(text)) 

The result was NaN, even though there are errors.

Is there a way to accomplish the desired output?

Desired output:

ID     Description                     Issues (added column)
1-432  "The text withissues to check"  Possible spelling mistake

Solution

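  • Try the following changes. In the loop, iterating over a DataFrame yields column names rather than rows, and str(xlreader[['Description']]) turns the whole column into a single string, so no cell is ever checked on its own. Likewise, DataFrame.apply calls the function once per column by default, and the lambda ignores x and reuses that same text. Check each row's Description instead: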

    To cast as str (astype returns a new Series, so assign it back):

    xlreader['Description'] = xlreader['Description'].astype('str')
    

    To apply the function:

    xlreader['Issues'] = xlreader['Description'].apply(lambda x: tool.check(x))
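
One thing to keep in mind: tool.check returns a list of match objects rather than text, so the 'Issues' column above would hold lists. Below is a minimal sketch of one way to store readable messages instead, assuming each match exposes a message attribute (as the Match objects in language_tool_python do). The helper names format_issues, add_issues_column and check_workbook are hypothetical and not part of either library.

```python
def format_issues(matches):
    """Join the messages of LanguageTool matches; return '' when there are none."""
    return "; ".join(match.message for match in matches)


def add_issues_column(df, check, source="Description", target="Issues"):
    """Run `check` on every cell of `source` and store the messages in `target`.

    `check` is any callable that takes a string and returns match objects
    with a `message` attribute, for example `tool.check`.
    """
    # Empty cells become "" so they are not checked as the literal text "nan".
    texts = df[source].fillna("").astype(str)
    df[target] = texts.apply(lambda text: format_issues(check(text)))
    return df


def check_workbook(path, sheet_name):
    """Read one Excel sheet and add an 'Issues' column (hypothetical wrapper)."""
    # Imported here so the helpers above can be used without LanguageTool installed.
    import pandas as pd
    import language_tool_python

    tool = language_tool_python.LanguageTool("en-US")
    try:
        df = pd.read_excel(path, sheet_name=sheet_name)
        return add_issues_column(df, tool.check)
    finally:
        tool.close()  # shut down the background LanguageTool server
```

With the file from the question, check_workbook("Example.xlsx", "This is Starting File") should return the DataFrame with an 'Issues' column that is an empty string for rows without errors. Passing the checker in as an argument keeps the helpers independent of LanguageTool itself, which also makes them easy to try out with a stand-in function.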