Tags: arrays, pandas, numpy, dataframe, smoothing

Pandas DataFrame from Numpy Array - column order


I'm trying to read data from a .csv file with Pandas, smooth it with a Savitzky-Golay filter, filter it, and then use Pandas again to write an output .csv file. The data must be converted from a DataFrame to an array for the smoothing, and then back to a DataFrame to create the output file.

I found a topic on creating a DataFrame from NumPy arrays (Creating a Pandas DataFrame from a Numpy array: How do I specify the index column and column headers?) and I used the line dataset = pd.DataFrame({'Column1': data[:, 0], 'Column2': data[:, 1]}) to create mine.

The problem is that when I rename the columns to 'time' for the first one and 'angle' for the second one, their order in the final DataFrame changes. It seems as if alphabetical order matters, which seems weird. Can someone help me with an explanation?

My complete code:

import scipy as sp
from scipy import signal
import numpy as np

import pandas as pd
import matplotlib.pyplot as plt

# Specify the input file
in_file = '0_chunk0_test.csv'

# Define min and max angle values
alpha_min = 35
alpha_max = 45

# Define Savitzky-Golay filter parameters
window_length = 15
polyorder = 1

# Read input .csv file, but only time and pitch values using usecols argument
data = pd.read_csv(in_file,usecols=[0,2])

# Replace ":" with "" in time values
data['time'] = data['time'].str.replace(':','')

# Convert pandas dataframe to a numpy array (object dtype, since the
# time column still holds strings), then use .astype to convert to float
data_arr = data.to_numpy(dtype=object,copy=True)
data_arr = data_arr.astype(float)

# Perform a Savitzky-Golay filtering with signal.savgol_filter
data_arr_smooth = signal.savgol_filter(data_arr[:,1],window_length,polyorder)

# Convert smoothed data array to dataframe and rename Pitch: to angle
data_fr = pd.DataFrame({'time': data_arr[:,0],'angle': data_arr_smooth})

print(data_fr)

Solution

  • If your data is already in a dataframe, it's much easier to just pass the values of the Pitch column to savgol_filter:

    data_arr_smooth = signal.savgol_filter(data.Pitch.values, window_length, polyorder)
    data_fr = pd.DataFrame({'time': data.time.values,'angle': data_arr_smooth})
    

    There's no need to explicitly convert your data to float. As long as the values are numeric, savgol_filter will do this for you. From its documentation:

    If x is not a single or double precision floating point array, it will be converted to type numpy.float64 before filtering.

    If you want both the original and the smoothed data in your original dataframe, just assign a new column to it:

    data['angle'] = signal.savgol_filter(data.Pitch.values, window_length, polyorder)
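
On the column order itself: as far as I know, older setups (pandas before 0.23, or Python before 3.6) sorted the keys of a dict passed to pd.DataFrame, which would explain 'angle' ending up before 'time'. Passing the columns argument fixes the order regardless of version. Below is a minimal sketch of that, assuming a column named Pitch as above. The synthetic time and pitch values are made up for illustration and are not the asker's data.

```python
import numpy as np
import pandas as pd
from scipy import signal

# Hypothetical stand-in for the CSV contents (not the real data)
time = np.arange(30, dtype=float)
pitch = 40.0 + np.sin(time / 5.0)
data = pd.DataFrame({'time': time, 'Pitch': pitch})

# Same filter parameters as in the question
window_length = 15
polyorder = 1
smoothed = signal.savgol_filter(data['Pitch'].to_numpy(), window_length, polyorder)

# The dict keys are deliberately given in alphabetical order here;
# columns=[...] decides the final column order explicitly.
data_fr = pd.DataFrame(
    {'angle': smoothed, 'time': data['time'].to_numpy()},
    columns=['time', 'angle'],
)

print(list(data_fr.columns))
```

With columns given explicitly, the result does not depend on how a particular pandas or Python version orders dict keys.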