Tags: python, dataframe, nltk, wordnet

How to change a list of synsets to list elements?


I have tried out the following snippet of code for my project:

import pandas as pd
import nltk
from nltk.corpus import wordnet as wn

nltk.download('wordnet')
# Set display options before printing so they take effect
pd.set_option('display.expand_frame_repr', False)

# Collect every synset related to 'science.n.01' into one list
science = wn.synset('science.n.01')
hypo = science.hyponyms()
hyper = science.hypernyms()
mero = science.part_meronyms()
holo = science.part_holonyms()
ent = science.entailments()
df = hypo + hyper + mero + holo + ent

df_agri_clean = pd.DataFrame(df)
df_agri_clean.columns = ["Items"]
print(df_agri_clean)

It has given me this output of a dataframe:

                             Items
0            Synset('agrobiology.n.01')
1               Synset('agrology.n.01')
2               Synset('agronomy.n.01')
3         Synset('architectonics.n.01')
4      Synset('cognitive_science.n.01')
5          Synset('cryptanalysis.n.01')
6    Synset('information_science.n.01')
7            Synset('linguistics.n.01')
8            Synset('mathematics.n.01')
9             Synset('metallurgy.n.01')
10             Synset('metrology.n.01')
11       Synset('natural_history.n.01')
12       Synset('natural_science.n.01')
13             Synset('nutrition.n.03')
14            Synset('psychology.n.01')
15        Synset('social_science.n.01')
16            Synset('strategics.n.01')
17           Synset('systematics.n.01')
18           Synset('thanatology.n.01')
19            Synset('discipline.n.01')
20     Synset('scientific_theory.n.01')
21  Synset('scientific_knowledge.n.01')

The underlying list, df, prints like this:

[Synset('agrobiology.n.01'), Synset('agrology.n.01'), Synset('agronomy.n.01'), Synset('architectonics.n.01'), Synset('cognitive_science.n.01'), Synset('cryptanalysis.n.01'), Synset('information_science.n.01'), Synset('linguistics.n.01'), Synset('mathematics.n.01'), Synset('metallurgy.n.01'), Synset('metrology.n.01'), Synset('natural_history.n.01'), Synset('natural_science.n.01'), Synset('nutrition.n.03'), Synset('psychology.n.01'), Synset('social_science.n.01'), Synset('strategics.n.01'), Synset('systematics.n.01'), Synset('thanatology.n.01'), Synset('discipline.n.01'), Synset('scientific_theory.n.01'), Synset('scientific_knowledge.n.01')]

I want to convert every entry under "Items" to a plain string, either Synset('agrobiology.n.01') => agrobiology.n.01 or Synset('agrobiology.n.01') => 'agrobiology'. Any help would be appreciated. Thanks!


Solution

  • To get the name of each item, call its name() method, e.g. synset.name(). You could use a list comprehension to update the column as follows:

    df_agri_clean['Items'] = [synset.name() for synset in df_agri_clean['Items']]
    df_agri_clean
    

    The output matches your first expected format:

        Items
    0   agrobiology.n.01
    1   agrology.n.01
    2   agronomy.n.01
    3   architectonics.n.01
    4   cognitive_science.n.01
    5   cryptanalysis.n.01
    6   information_science.n.01
    7   linguistics.n.01
    8   mathematics.n.01
    9   metallurgy.n.01
    10  metrology.n.01
    11  natural_history.n.01
    12  natural_science.n.01
    13  nutrition.n.03
    14  psychology.n.01
    15  social_science.n.01
    16  strategics.n.01
    17  systematics.n.01
    18  thanatology.n.01
    19  discipline.n.01
    20  scientific_theory.n.01
    21  scientific_knowledge.n.01
    

    To drop the ".n.01"-style suffix as well, split it off from the right instead of replacing a fixed string; replace('.n.01', '') would leave entries such as nutrition.n.03 untouched. Since the list df still holds the original Synset objects, you can rebuild the column from it:

    df_agri_clean['Items'] = [synset.name().rsplit('.', 2)[0] for synset in df]
    df_agri_clean
    

    Output (just like your second expected output)

    
    Items
    0   agrobiology
    1   agrology
    2   agronomy
    3   architectonics
    4   cognitive_science
    5   cryptanalysis
    6   information_science
    7   linguistics
    8   mathematics
    9   metallurgy
    10  metrology
    11  natural_history
    12  natural_science
    13  nutrition
    14  psychology
    15  social_science
    16  strategics
    17  systematics
    18  thanatology
    19  discipline
    20  scientific_theory
    21  scientific_knowledge
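
If you only have the name strings and want just the word part, whatever the sense number, here is a minimal sketch. It assumes every name follows the lemma.pos.nn pattern that Synset.name() returns. The helper split_synset_name is a hypothetical name used for illustration and is not part of NLTK.

```python
def split_synset_name(name):
    """Split a 'lemma.pos.nn' string into (lemma, pos, sense_number).

    Hypothetical helper, not an NLTK function. Splitting from the right
    keeps any dots inside the lemma itself intact.
    """
    lemma, pos, sense = name.rsplit(".", 2)
    return lemma, pos, int(sense)


# Sample names taken from the output above
names = ["agrobiology.n.01", "nutrition.n.03", "cognitive_science.n.01"]

# Keep only the lemma part of each name
lemmas = [split_synset_name(n)[0] for n in names]
print(lemmas)
```

Splitting from the right, rather than replacing a fixed suffix, means an entry with a different sense number, such as nutrition.n.03, is handled the same way as the others.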