
How to tokenize/parse data in an Excel sheet using spaCy


I'm trying to convert an Excel sheet into a Doc object using spaCy. I have spent the last couple of days trying to work around this, but it has proven challenging. I have opened the sheet with both openpyxl and pandas, and I can read it and output the content, but I couldn't integrate spaCy to create Doc/Token objects.

Is it possible to process Excel sheets in spaCy's pipeline?

Thank you!


Solution

  • spaCy has no built-in support for Excel files. You can use pandas to read the data, either from a CSV file or from an Excel file:

         import pandas as pd
         df = pd.read_csv(file)
    

    or

         df = pd.read_excel(file)
    

    respectively. Then select the text column you need, iterate over its values, and pass each one to spaCy's nlp() to get a Doc object.
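
To make the last step concrete, here is a minimal sketch of passing a text column to spaCy. It rests on a few assumptions that are not in the original answer: the column name "text", the helper name column_to_docs, and the sample sentences are all hypothetical, and a small in-memory DataFrame stands in for the result of pd.read_excel(file) so that the example runs without a spreadsheet. It also uses spacy.blank("en"), which only tokenizes; a trained pipeline loaded with spacy.load() would be needed for tagging, parsing, or entities.

```python
import pandas as pd
import spacy


def column_to_docs(df, column, nlp):
    """Return one spaCy Doc per non-empty cell in the given text column."""
    # Drop empty cells and coerce the rest to str, since spaCy expects text input.
    texts = df[column].dropna().astype(str).tolist()
    # nlp.pipe processes the texts in batches instead of calling nlp() once per row.
    return list(nlp.pipe(texts))


# Hypothetical stand-in for: df = pd.read_excel(file)
df = pd.DataFrame(
    {"text": ["Spacy reads plain strings.", None, "Excel cells become Doc objects."]}
)

# Tokenizer-only English pipeline; no model download is required for this.
nlp = spacy.blank("en")

docs = column_to_docs(df, "text", nlp)
for doc in docs:
    print([token.text for token in doc])
```

With a real spreadsheet, only the line that builds df should need to change; the empty cell in the sample data is there to show that missing values are skipped rather than turned into the string "nan".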