python tuples counter frequency

Count frequency of second element of tuple across rows


I have a document containing more than 1,000 rows, each holding a list of (word, tag) tuples. I want to count how often each tag (the second element of each tuple) occurs across all rows, and then delete the tuples whose tag is "NN".

Here is my data:

pos_tag
[(semoga, SC), (saja, RB), (di, IN), (sini, PR), (bisa, MD), (cepat, JJ), (cair, NN), (semoga, NN), (saja, RB), (ini, PR), (beneran, NN), (ada, VB), (nya, NN), (bantuan, NN), (buat, JJ), (butuh, VB), (banget, NN)]
[(kak, VB), (kenapa, WH), (perbaikan, NN), (sistem, NN), (nya, PRP), (tidak, NEG), (selesai, VB)]
[(sangat, RB), (baik, JJ)]

I would like the frequency of each tag, like this:

tag frequency
SC 1
RB 3
IN 1
PR 2
MD 1
JJ 3
NN 8
etc. ...

After deleting the words tagged NN, the data should look like this:

pos_tag | pos_tag_clean
[(semoga, SC), (saja, RB), (di, IN), (sini, PR), (bisa, MD), (cepat, JJ), (cair, NN), (semoga, NN), (saja, RB), (ini, PR), (beneran, NN), (ada, VB), (nya, NN), (bantuan, NN), (buat, JJ), (butuh, VB), (banget, NN)] | [(semoga, SC), (saja, RB), (di, IN), (sini, PR), (bisa, MD), (cepat, JJ), (saja, RB), (ini, PR), (ada, VB), (buat, JJ), (butuh, VB)]
[(kak, VB), (kenapa, WH), (perbaikan, NN), (sistem, NN), (nya, PRP), (tidak, NEG), (selesai, VB)] | [(kak, VB), (kenapa, WH), (nya, PRP), (tidak, NEG), (selesai, VB)]
[(sangat, RB), (baik, JJ)] | [(sangat, RB), (baik, JJ)]

Really need help, thanks!


Solution

  • You can explode the lists, take the second item of each tuple, and count the results with value_counts:

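    out = (df['pos_tag']
           .explode()                       # one (word, tag) tuple per row
           .str[1]                          # keep only the tag
           .value_counts()                  # count occurrences of each tag
           .reset_index(name='frequency')   # turn the counts into a DataFrame
          )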
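
The snippet above covers only the frequency count. The second half of the question, dropping the tuples tagged "NN", could be handled along the lines of the sketch below. It is plain Python, not part of the original answer, and it assumes each row is a real list of (word, tag) tuples of strings (not the text representation of one). The names rows, tag_frequency and drop_tag are hypothetical, and the sample rows are abbreviated from the question's data.

```python
from collections import Counter

# Hypothetical sample rows, abbreviated from the question's data.
# Assumption: each row is a list of (word, tag) tuples of strings.
rows = [
    [("semoga", "SC"), ("saja", "RB"), ("cair", "NN"), ("nya", "NN")],
    [("kak", "VB"), ("perbaikan", "NN"), ("nya", "PRP")],
    [("sangat", "RB"), ("baik", "JJ")],
]


def tag_frequency(rows):
    """Count how often each tag (second tuple element) occurs across all rows."""
    return Counter(tag for row in rows for _word, tag in row)


def drop_tag(pairs, unwanted="NN"):
    """Return the pairs of one row whose tag differs from `unwanted`."""
    return [(word, tag) for word, tag in pairs if tag != unwanted]


frequency = tag_frequency(rows)
cleaned = [drop_tag(row) for row in rows]
```

If the data lives in a DataFrame column as in the answer, the same helper could be applied row by row, for example df['pos_tag_clean'] = df['pos_tag'].apply(drop_tag).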