
Can't append data from one Series to another Series in pandas


I am trying to find the types of vaccines used for Covid-19. I have a dataset of the vaccines being used in different countries; it contains only the vaccine types, not the numbers. Example of the column below.

[Image: sample of the vaccines column]

Many countries use more than one vaccine, so I want to split the entries, collect every vaccine name in a single Series, and then count the unique names.

typesofvaccine = vaccinations_df.vaccines.str.split(',',expand=True)
print(typesofvaccine)

[Image: printed output of typesofvaccine]

Then I created an empty Series to which I want to append the other Series in a loop.

Vaccine_one = pd.Series(dtype=object)

for v in typesofvaccine.iteritems():
  Vaccine_one.append(typesofvaccine[v].values)

print(Vaccine_one)
print(Vaccine_one.unique())

[Image: KeyError traceback]

Running this gives me a KeyError.


Solution

  • You're getting a KeyError because iteritems() yields (column label, Series) pairs, so v is a tuple rather than one of the integer column labels, and typesofvaccine[v] cannot look it up. Note also that Series.append returns a new Series instead of modifying Vaccine_one in place, so the loop would leave it empty even with a valid key.
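
For reference, the loop from the question could be repaired along these lines. This is only a minimal sketch, assuming a small made-up vaccinations_df in place of the real data. It uses items() and pd.concat, since iteritems() and Series.append were removed in pandas 2.0.

```python
import pandas as pd

# Hypothetical stand-in for the asker's vaccinations_df
vaccinations_df = pd.DataFrame(
    {"vaccines": ["Johnson,Moderna", "AstraZeneca,Moderna", "Pfizer"]}
)
typesofvaccine = vaccinations_df["vaccines"].str.split(",", expand=True)

# items() yields (column label, Series) pairs, so unpack both parts
# and keep only the column data
columns = [column for _, column in typesofvaccine.items()]

# pd.concat returns a new Series; dropna() removes the padding that
# expand=True adds for rows listing fewer vaccines
vaccine_one = pd.concat(columns, ignore_index=True).dropna()

print(vaccine_one.unique())
```

Collecting the columns in a list and concatenating once also avoids building a new Series on every iteration.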

    In general, if your goal is to find the number of unique vaccines across all countries, you're making your life harder with this approach. What you could do instead is something like this:

    import pandas as pd
    vaccines = pd.DataFrame({"vaccines":["Jonhson,Moderna","AstraZeneca,Moderna","Johnson,Pfizer"]})
    print(vaccines)
    

    Out:

                  vaccines
    0      Jonhson,Moderna
    1  AstraZeneca,Moderna
    2       Johnson,Pfizer
    

    Get a list of lists containing every vaccine combination:

    vaccines_split = [v.split(",") for v in vaccines["vaccines"].unique()]
    print(vaccines_split)
    

    Out:

    [['Jonhson', 'Moderna'], ['AstraZeneca', 'Moderna'], ['Johnson', 'Pfizer']]
    

    Flatten the list of lists into a single list of unique values:

    unique_names = list({v for i in vaccines_split for v in i})
    print(unique_names)
    

    Out:

    ['Moderna', 'AstraZeneca', 'Johnson', 'Pfizer', 'Jonhson']
    

    Now you can get the number of unique vaccines by printing the length of unique_names:

    print(len(unique_names))
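
The same count can also be obtained without leaving pandas. The following is a sketch of an alternative, not the approach used above: it relies on Series.str.split and Series.explode, and it reuses the sample data exactly as given, so the misspelled "Jonhson" still counts as its own entry. The str.strip() step is an added assumption to guard against names written with a space after the comma.

```python
import pandas as pd

vaccines = pd.DataFrame(
    {"vaccines": ["Jonhson,Moderna", "AstraZeneca,Moderna", "Johnson,Pfizer"]}
)

unique_vaccines = (
    vaccines["vaccines"]
    .str.split(",")  # each cell becomes a list of names
    .explode()       # one row per vaccine name
    .str.strip()     # drop stray spaces around each name
    .unique()        # distinct names, in order of first appearance
)

print(len(unique_vaccines))
```

If only the count is needed, nunique() can replace the final unique() call.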