I have a master dataset in pandas usa_df
that looks like this
usa_df = {
"state_name": ["Alabama", "Alabama", "Alaska", "Alaska", "Arkansas"],
"persons": [1, 2, 1, 1, 1],
"day_week": ["monday", "monday", "tuesday", "monday", "tuesday"]
}
There are about 200k rows in the dataset, and each state name appears for several thousand rows. Each row is a recorded crime, with the state name, persons, and day of the week being datapoints out of about a hundred).
I have a code that filters the state_name column for a specific state and then exports it as a csv
state_df = usa_df[usa_df.state_name == "Alabama"]
state_df.reset_index(drop=True)
os.chdir("/path/to/file")
state_df.to_csv("./Alabama.csv", index=False)
This method works fine, but I am trying to find a way that will automatically create a file for each state without having to do this fifty times.
Being new, I don't know how to proceed from here beyond maybe setting a list of states as a list[str]
, but beyond that I'm lost. Is it possible to automatically create all 50 files for each row that matches a specific state? Thanks for your help!
A quick and simple solution would be using a for
loop over all unique state names. You can get them from your dataframe itself:
import pandas as pd
usa_df = {
"state_name": ["Alabama", "Alabama", "Alaska", "Alaska", "Arkansas"],
"persons": [1, 2, 1, 1, 1],
"day_week": ["monday", "monday", "tuesday", "monday", "tuesday"]
}
usa_df = pd.DataFrame(usa_df)
for current_name in sorted(set(usa_df.state_name)):
state_df = usa_df[usa_df.state_name == current_name]
state_df.to_csv(f"/path/to/file/{current_name}.csv", index=False)
The more efficient solution, as pointed out in @mozway's comment, would be using grouping, so as to not filter the dataframe multiple times:
import pandas as pd
usa_df = {
"state_name": ["Alabama", "Alabama", "Alaska", "Alaska", "Arkansas"],
"persons": [1, 2, 1, 1, 1],
"day_week": ["monday", "monday", "tuesday", "monday", "tuesday"]
}
usa_df = pd.DataFrame(usa_df)
for current_name, current_group in usa_df.groupby("state_name"):
current_group.to_csv(f"/path/to/file/{current_name}.csv", index=False)
Also note:
index=False
anyway.to_csv()
method.