Tags: python, dictionary, nlp, spacy, named-entity-recognition

How to extract the output from an NLP model to a dataframe?


I have trained an NLP model (NER) and I get results in the format below:

for text, _ in TEST_DATA:
    doc = nlp(text)
    print([(ent.text, ent.label_) for ent in doc.ents])

#Output
[('1131547', 'ID'), ('12/9/2019', 'Date'), ('USA', 'ShippingAddress')]
[('567456', 'ID'), ('Hills', 'ShippingAddress')]

# I need the output in the following format:

ID       Date       ShippingAddress
1131547  12/9/2019  USA
567456   NA         Hills

Thanks for your help in advance


Solution

  • To load the results into a pandas DataFrame, build one dict per text that maps each entity label to its text, then create the DataFrame from the list of dicts:

    import pandas as pd

    data_array = []

    for text, _ in TEST_DATA:
        doc = nlp(text)
        # One row per text: entity label -> entity text
        data_array.append({ent.label_: ent.text for ent in doc.ents})

    # Labels missing from a text show up as NaN in that row
    df = pd.DataFrame(data_array)
    

    The test result:

    >>> pd.DataFrame.from_dict(data_array)
            ID       Date ShippingAddress
    0  1131547  12/9/2019             USA
    1   567456        NaN           Hills
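
One caveat with the dict comprehension above: if a text contains two entities with the same label, only the last one survives, because dict keys are unique. The following is a minimal sketch of one way to keep all of them by joining duplicates into a single string. The helper name `entities_to_row`, the `"; "` separator, and the extra `('Texas', 'ShippingAddress')` entity are assumptions made for illustration and are not part of the original question. The hard-coded `predictions` list stands in for `[(ent.text, ent.label_) for ent in doc.ents]`, so the sketch runs without a trained model.

```python
from collections import defaultdict


def entities_to_row(pairs, sep="; "):
    """Collapse (text, label) pairs into one dict per document.

    Entities that share a label are joined with `sep` instead of
    overwriting each other. (Hypothetical helper, not a spaCy API.)
    """
    grouped = defaultdict(list)
    for text, label in pairs:
        grouped[label].append(text)
    return {label: sep.join(texts) for label, texts in grouped.items()}


# Stand-in for the model output; the second 'ShippingAddress'
# entity in the second document is made up for this example.
predictions = [
    [("1131547", "ID"), ("12/9/2019", "Date"), ("USA", "ShippingAddress")],
    [("567456", "ID"), ("Hills", "ShippingAddress"), ("Texas", "ShippingAddress")],
]

rows = [entities_to_row(pairs) for pairs in predictions]
print(rows)
```

The resulting `rows` list can be passed to `pd.DataFrame(rows)` in the same way as `data_array`; documents that lack a label (here, `Date` in the second one) still get `NaN` in that column.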