I am trying to write a function that normalizes a given dataframe at a requested frequency.
Code:
import numpy as np
import pandas as pd

def timeseries_dataframe_normalized(df, normalization_freq = 'complete'):
    """
    Input:
        df : dataframe
            input dataframe
        normalization_freq : string
            'daily', 'weekly', 'monthly','quarterly','yearly','complete' (default)
    Return: normalized dataframe
    """
    # auxiliary dataframe
    adf = df.copy()
    # convert columns to float
    # Ref: https://stackoverflow.com/questions/15891038/change-column-type-in-pandas
    adf = adf.astype(float)
    # normalized columns
    nor_cols = adf.columns
    # add suffix to columns and create new names for maximum columns
    max_cols = adf.add_suffix('_max').columns
    # initialize maximum columns
    adf.loc[:,max_cols] = np.nan
    # check the requested frequency
    if normalization_freq =='complete':
        adf[max_cols] = adf[nor_cols].max()
    # compute and return the normalized dataframe
    print(adf[nor_cols])
    print(adf[max_cols])
    adf[nor_cols] = adf[nor_cols]/adf[max_cols]
    # return the normalized dataframe
    return adf[nor_cols]

# Example
df2 = pd.DataFrame(data={'A':[20,10,30],'B':[1,2,3]})
timeseries_dataframe_normalized(df2)
Expected output:
df2 =
          A         B
0  0.666667  0.333333
1  0.333333  0.666667
2  1.000000  1.000000
Present output:
I am surprised to get the following error. When I compute df2/df2.max() directly
I get the expected output, but this function raises an error:
ValueError: Columns must be same length as key
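Dividing one DataFrame by another aligns them on column labels. adf[nor_cols] has columns A and B, while adf[max_cols] has A_max and B_max, so no labels match and the result has four all-NaN columns, which cannot be assigned back to two columns. Change the line so that you divide the dataframe by a NumPy ndarray instead, which is applied by position:
    adf[nor_cols] = adf[nor_cols] / adf[max_cols].to_numpy()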
Then the return value is:
          A         B
0  0.666667  0.333333
1  0.333333  0.666667
2  1.000000  1.000000
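To see the difference in isolation, here is a minimal sketch that divides the example data by a frame of maxima, first as a DataFrame and then as an ndarray. The name max_df and its hard-coded values are illustrative stand-ins for the `_max` columns built inside the function, not part of the original code.

```python
import pandas as pd

df = pd.DataFrame({"A": [20, 10, 30], "B": [1, 2, 3]}, dtype=float)

# Hypothetical stand-in for the "_max" columns created in the question.
max_df = pd.DataFrame({"A_max": [30.0] * 3, "B_max": [3.0] * 3})

# DataFrame / DataFrame aligns on column labels. No labels match here,
# so the result holds the union of the columns, all filled with NaN.
aligned = df / max_df
print(sorted(aligned.columns))     # ['A', 'A_max', 'B', 'B_max']
print(aligned.isna().all().all())  # True

# Dividing by a plain ndarray is positional, so no label alignment occurs.
positional = df / max_df.to_numpy()
print(positional)
```

Assigning the four-column `aligned` frame to the two columns in `nor_cols` is what triggers "Columns must be same length as key". For the 'complete' case, `adf / adf.max()` would likely also work without any helper columns, since the Series returned by `max()` is indexed by the same column labels.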