Dataframe has 200+columns with repeated dates and empty columns
weight height total weight height total
2019 2019 2019 2020 2020 2020
Species jan1 jan1 '' jan1 jan1 ''
cat 1.0 2.0 3 4.0 3.0 7
dog 3.0 4.0 9 4.0 5.0 9
I tried:
[x for x in df.columns if df.columns.count(x) >1]
#error: 'MultiIndex' object has no attribute 'count'
df.stack(dropna=False)
#error: cannot reindex from a duplicate axis
Objective: To add any string value like 'a.jan1' repeated column and blank columns renamed as a, b.... so on.
Output required in tabular form for further processing and storage ..
class year Month cat dog
weight 2019 jan1 1 3
height 2019 jan1 2 4
weight 2020 jan1 4 4
height 2020 jan1 3 5
So, given the following dataframe:
import pandas as pd
df = pd.DataFrame(
{
("weight", 2019, "jan1"): {"cat": 1, "dog": 3},
("height", 2019, "jan1"): {"cat": 2, "dog": 4},
("total", 2019, ""): {"cat": 3, "dog": 9},
("weight", 2020, "jan1"): {"cat": 4, "dog": 4},
("height", 2020, "jan1"): {"cat": 3, "dog": 5},
("total", 2020, ""): {"cat": 7, "dog": 9},
}
)
print(df)
# Outputs
weight height total weight height total
2019 2019 2019 2020 2020 2020
jan1 jan1 jan1 jan1
cat 1 2 3 4 3 7
dog 3 4 9 4 5 9
You could try this:
# UNpivot the dataframe
new_df = df.reset_index().melt(
id_vars=[("index", "", "")],
value_vars=[
("weight", 2019, "jan1"),
("height", 2019, "jan1"),
("weight", 2020, "jan1"),
("height", 2020, "jan1"),
],
)
new_df.columns = ["species", "class", "year", "month", "value"]
# Make separate dataframes for "cats" and "dogs" and store them in a list
temp_dfs = []
for species in new_df["species"].unique():
temp_df = new_df.loc[new_df["species"] == species, :]
temp_df = temp_df.rename(columns={"value": species}).drop(columns="species")
temp_dfs.append(temp_df)
# Merge "cats" and "dogs"
final_df = temp_dfs[0]
for temp_df in temp_dfs[1:]:
final_df = pd.merge(final_df, temp_df, on=["class", "year", "month"], how="outer")
And so:
print(final_df)
# Output
class year month cat dog
0 weight 2019 jan1 1 3
1 height 2019 jan1 2 4
2 weight 2020 jan1 4 4
3 height 2020 jan1 3 5