I want to define a function as follow...
I have two dataframes, in all rows I have float values. Both data frames have date time as index. Both data frames have different columns.
I want to take the smallest data frame and copy those columns into biggest data frame. if smallest data frame start after biggest data frame then I'd like to fill the first dates in biggest data frame with the first values in the first row in smallest data frame.
If smallest data frame finish before, then I'd like to fill the last dates in the biggest data frame with the last values in the last row in the smallest data frame.
The result must be the biggest data frame with columns of the smallest df with the last mentioned details
Here's what I did, the problem is I have null values.
def copy_up(df1, df2):
''' df1 and df2 must finish with the same date when you pass them as argument
df1, df2: Date time index, all float values in all columns
Returns the biggest data frame with all columns of the smallest df, fill the initial days repeating the first row of smallest df
'''
#Checking columns are unique
columnas_df1 = set(df1.columns)
columnas_df2 = set(df2.columns)
columnas_repetidas = columnas_df1.intersection(columnas_df2)
if len(columnas_repetidas) > 0:
print("¡Hay columnas repetidas en los DataFrames!")
print("Columnas repetidas:", columnas_repetidas)
return 0
else:
print("No hay columnas repetidas en los DataFrames.")
# Compare which one is bigger
if len(df1) > len(df2):
df_grande = df1.copy()
df_pequeno = df2.copy()
else:
df_grande = df2.copy()
df_pequeno = df1.copy()
#N of times we will repeat the first row is restante
restante = len(df_grande) - len(df_pequeno)
#Repeating first row in biggest data frame
for i in df_pequeno.columns:
lista_ultimos = list(df_pequeno[i].values)
lista_primer = [lista_ultimos[0]] * restante
final = lista_primer
final.extend(lista_ultimos)
df_grande[i] = final
return df_grande
I did something similar with copy_down(df1, df2) function
then I did pd.concat([copy_down(df1, df2), copy_up(df1, df2)])
Assuming these two DataFrames are similar to the ones you refer to:
import pandas as pd
# Example DataFrame 1 (biggest)
df1 = pd.DataFrame(
{
"A": [1.1, 2.2, 3.3, 4.4, 5.5],
"B": [6.6, 7.7, 8.8, 9.9, 10.0],
},
index=pd.date_range(start="2024-01-01", periods=5),
)
# Example DataFrame 2 (smallest)
df2 = pd.DataFrame(
{
"C": [11.1, 12.2, 13.3],
"D": [14.4, 15.5, 16.6],
},
index=pd.date_range(start="2024-01-03", periods=3),
)
A B
2024-01-01 1.1 6.6
2024-01-02 2.2 7.7
2024-01-03 3.3 8.8
2024-01-04 4.4 9.9
2024-01-05 5.5 10.0
C D
2024-01-03 11.1 14.4
2024-01-04 12.2 15.5
2024-01-05 13.3 16.6
If I understand you logic, you can do an outer merge, and then back-fill and forward-fill the empty values:
df = pd.merge(df1, df2, how="outer", left_index=True, right_index=True).bfill().ffill()
For the example the result would be:
A B C D
2024-01-01 1.1 6.6 11.1 14.4
2024-01-02 2.2 7.7 11.1 14.4
2024-01-03 3.3 8.8 11.1 14.4
2024-01-04 4.4 9.9 12.2 15.5
2024-01-05 5.5 10.0 13.3 16.6