I'm having trouble getting data from several .csv files into a single array. I can get all of the data from the .csv files fine, I just can't get everything into a simple numpy array. The name of each .csv file is important to me so in the end I'd like to have a Pandas DataFrame with the columns labeled by the initial name of the .csv file.
import glob
import numpy as np
import pandas as pd
files = glob.glob("*.csv")
temp_dict = {}
wind_dict = {}
for file in files:
data = pd.read_csv(file)
temp_dict[file[:-4]] = data['HLY-TEMP-NORMAL'].values
wind_dict[file[:-4]] = data['HLY-WIND-AVGSPD'].values
temp = []
wind = []
name = []
for word in temp_dict:
name.append(word)
temp.append(temp_dict[word])
for word in wind_dict:
wind.append(wind_dict[word])
temp = np.array(temp)
wind = np.array(wind)
When I print temp or wind I get something like this:
[array([ 32.1, 31.1, 30.3, ..., 34.9, 33.9, 32.9])
array([ 17.3, 17.2, 17.2, ..., 17.5, 17.5, 17.2])
array([ 41.8, 41.1, 40.6, ..., 44.3, 43.4, 42.6])
...
array([ 32.5, 32.2, 31.9, ..., 34.8, 34.1, 33.7])]
when what I really want is:
[[ 32.1, 31.1, 30.3, ..., 34.9, 33.9, 32.9]
[ 17.3, 17.2, 17.2, ..., 17.5, 17.5, 17.2]
[ 41.8, 41.1, 40.6, ..., 44.3, 43.4, 42.6]
...
[ 32.5, 32.2, 31.9, ..., 34.8, 34.1, 33.7]]
This does not work but is the goal of my code:
df = pd.DataFrame(temp, columns=name)
And when I try to use a DataFrame from Pandas each row is its own array which isn't helpful because it thinks every row has only element in it. I know the problem is with "array(...)" I just don't know how to get rid of it. Thank you in advance for your time and consideration.
I think you can use:
files = glob.glob("*.csv")
#read each file to list of DataFrames
dfs = [pd.read_csv(fp) for fp in files]
#create names for each file
lst4 = [x[:-4] for x in files]
#create one big df with MultiIndex by files names
df = pd.concat(dfs, keys=lst4)
If want separately DataFrame
s change last row above solution with reshape:
df = pd.concat(dfs, keys=lst4).unstack()
df_temp = df['HLY-TEMP-NORMAL']
df_wind = df['HLY-WIND-AVGSPD']