I am trying to find the types of vaccines used for Covid-19. I have a dataset of the vaccines being used in different countries: not the numbers, just the types. An example of the column is below.
Many countries use multiple vaccines, so I want to separate them, keep them all in one Series, and then count the unique ones.
typesofvaccine = vaccinations_df.vaccines.str.split(',',expand=True)
print(typesofvaccine)
Then I created a Series to which I want to append the other Series with the help of a loop.
Vaccine_one = pd.Series(dtype=object)
for v in typesofvaccine.iteritems():
    Vaccine_one.append(typesofvaccine[v].values)
print(Vaccine_one)
print(Vaccine_one.unique())
I am getting this key error.
You're getting a KeyError because the columns of the new DataFrame are labelled with integers (0, 1, 2, ...), while the v you pass to typesofvaccine[v] is not one of those labels. iteritems() yields (column label, column values) pairs, so v is a tuple where it should be a number. Note also that Series.append returns a new Series instead of modifying Vaccine_one in place, so the result of each call in the loop is discarded.
In general, if your goal is to find the number of unique vaccines across all countries, you're making your life harder by using this approach. What you could do instead is something like this:
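If you want to stay close to the original loop, a minimal sketch of a working version could look like the following. The sample data is made up for illustration and stands in for the real vaccinations_df; it also assumes the names are separated by a plain comma with no surrounding spaces.

```python
import pandas as pd

# Hypothetical sample data standing in for the real vaccinations_df
vaccinations_df = pd.DataFrame(
    {"vaccines": ["Johnson,Moderna", "AstraZeneca,Moderna", "Johnson,Pfizer", "Pfizer"]}
)

typesofvaccine = vaccinations_df["vaccines"].str.split(",", expand=True)

# items() yields (column label, column Series) pairs, so unpack both parts
columns = [column for label, column in typesofvaccine.items()]

# concat returns a new Series; dropna removes the padding that expand=True
# adds for countries with fewer vaccines than the widest row
vaccine_one = pd.concat(columns, ignore_index=True).dropna()

print(vaccine_one.unique())
print(vaccine_one.nunique())
```

Collecting the columns first and concatenating once avoids growing a Series inside the loop, and it does not rely on Series.append, which was removed in pandas 2.0.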
import pandas as pd
vaccines = pd.DataFrame({"vaccines":["Jonhson,Moderna","AstraZeneca,Moderna","Johnson,Pfizer"]})
print(vaccines)
Out:
              vaccines
0      Jonhson,Moderna
1  AstraZeneca,Moderna
2       Johnson,Pfizer
Get a list of lists containing all vaccine combinations
vaccines_split = [v.split(",") for v in vaccines["vaccines"].unique()]
print(vaccines_split)
Out:
[['Jonhson', 'Moderna'], ['AstraZeneca', 'Moderna'], ['Johnson', 'Pfizer']]
Compress the list of lists into a single list with unique values
unique_names = list({v for i in vaccines_split for v in i})
print(unique_names)
Out:
['Moderna', 'AstraZeneca', 'Johnson', 'Pfizer', 'Jonhson']
Now you can get the number of unique vaccines just by printing the length of unique names:
print(len(unique_names))
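The same result can also be obtained without leaving pandas. The sketch below is one possible variant, not the only way to do it: it splits each entry into a list, uses explode to give every vaccine its own row, and strips stray whitespace in case the real data uses ", " as the separator (that is an assumption about the dataset). The sample names are illustrative.

```python
import pandas as pd

# Illustrative sample data; the real column may contain different names
vaccines = pd.DataFrame(
    {"vaccines": ["Johnson,Moderna", "AstraZeneca,Moderna", "Johnson,Pfizer"]}
)

# Split each entry into a list, then turn every list element into its own row
all_names = vaccines["vaccines"].str.split(",").explode().str.strip()

# Sort so the printed order is stable (plain sets have no guaranteed order)
unique_names = sorted(all_names.unique())

print(unique_names)
print(all_names.nunique())
```

Series.explode is available from pandas 0.25 onwards. Calling all_names.value_counts() on the same Series would additionally show in how many rows each vaccine appears.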