
spaCy library to extract noun phrase - ValueError: [E866] Expected a string or 'Doc' as input, but got: <class 'float'>


Currently I'm trying to extract noun phrases from sentences. The sentences are stored in a column of an Excel file. Here is my Python code:

import pandas as pd
import spacy

df = pd.read_excel("xxx.xlsx")

nlp = spacy.load("en_core_web_md")
for row in range(len(df)):
    doc = nlp(df.loc[row, "Title"])
    for np in doc.noun_chunks:
        print(np.text)

But I got this error:

Traceback (most recent call last):
  File "/Users/pusinov/PycharmProjects/textsummarizer/paper_term_extraction.py", line 10, in <module>
    doc = nlp(df.loc[row, "Title"])
  File "/Users/pusinov/PycharmProjects/textsummarizer/venv/lib/python3.9/site-packages/spacy/language.py", line 1002, in __call__
    doc = self._ensure_doc(text)
  File "/Users/pusinov/PycharmProjects/textsummarizer/venv/lib/python3.9/site-packages/spacy/language.py", line 1093, in _ensure_doc
    raise ValueError(Errors.E866.format(type=type(doc_like)))
ValueError: [E866] Expected a string or 'Doc' as input, but got: <class 'float'>.
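To see where the float comes from, here is a minimal sketch, assuming the problem rows are empty cells in the "Title" column. The DataFrame is a made-up stand-in for the Excel file, so no spaCy model is needed to reproduce the type mismatch.

```python
import pandas as pd

# Hypothetical stand-in for pd.read_excel("xxx.xlsx"); the second title cell is empty
df = pd.DataFrame({"Title": ["Noun phrase extraction with spaCy", float("nan")]})

# Record the Python type of every value that would be passed to nlp(...)
value_types = [type(df.loc[row, "Title"]).__name__ for row in range(len(df))]
print(value_types)  # → ['str', 'float']

# isna() flags exactly the rows whose value is NaN (a float), i.e. the ones that trigger E866
missing = df["Title"].isna().tolist()
print(missing)  # → [False, True]
```

An empty cell becomes NaN, and NaN is a Python float, which matches the <class 'float'> reported in the traceback.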

Can anyone help me fix this code? Thank you very much.

P.S. I'm still a newbie in Python.


Solution

  • I faced a similar issue and fixed it with:

    df['Title']= df['Title'].astype(str)
    

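    This fixes the problem. spaCy only accepts a string (or a Doc) as input, so every value in the column has to be converted to str first. The non-string values usually come from cells that contain a number or are empty: pandas reads an empty cell as NaN, which is a float, and that is the <class 'float'> in the error. Note that astype(str) turns an empty cell into the literal text "nan", which spaCy will then process like any other title.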
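If the empty rows should be skipped rather than processed as the text "nan", one variant is to drop missing values before converting. This is a sketch under that assumption; titles_for_spacy is a hypothetical helper name, and the sample DataFrame stands in for the Excel file.

```python
import pandas as pd


def titles_for_spacy(df, column="Title"):
    """Hypothetical helper: return the non-empty cells of `column`, each as a str."""
    # dropna() removes NaN/None cells, so they never become the string "nan"
    values = df[column].dropna().astype(str).str.strip()
    # Also skip cells that contain only whitespace
    return values[values != ""].tolist()


# Made-up sample: a normal title, an empty cell, a number, and a blank cell
df = pd.DataFrame({"Title": ["Deep learning for text summarization", float("nan"), 2021, "   "]})
print(titles_for_spacy(df))  # → ['Deep learning for text summarization', '2021']
```

The returned list contains only strings, so it can be passed to spaCy directly, for example with for doc in nlp.pipe(titles_for_spacy(df)): followed by the same noun_chunks loop as in the question. nlp.pipe processes an iterable of texts in batches, which is usually faster than calling nlp once per row.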