I'm trying to read multiple CSV files with Python. The header of the first column of the raw data has a small problem. A partial CSV file looks like this:
NoDemande;"NoUsager";"Sens";"IdVehiculeUtilise";"NoConducteur";"NoAdresse";"Fait";"HeurePrevue"
42210000003;"42210000529";"+";"265Véh";"42210000032";"42210002932";"1";"25/07/2015 10:00:04"
42210000005;"42210001805";"+";"265Véh";"42210000032";"42210002932";"1";"25/07/2015 10:00:04"
42210000004;"42210002678";"+";"265Véh";"42210000032";"42210002932";"1";"25/07/2015 10:00:04"
42210000003;"42210000529";"—";"265Véh";"42210000032";"42210004900";"1";"25/07/2015 10:50:03"
42210000004;"42210002678";"—";"265Véh";"42210000032";"42210007072";"1";"25/07/2015 11:25:03"
42210000005;"42210001805";"—";"265Véh";"42210000032";"42210004236";"1";"25/07/2015 11:40:03"
In the raw file, the first header field, NoDemande, has no surrounding "" while all the other header fields do. Yet after reading, that first column name comes out as "NoDemande", with literal quote characters, while the rest of the columns look just fine. The result looks like this (not the same lines as the sample above):
"NoDemande" NoUsager Sens IdVehiculeUtilise NoConducteur NoAdresse Fait HeurePrevue
42209000003 42209001975 + 245Véh 42209000002 42209005712 1 24/07/2015 06:30:04
42209000004 42209002021 + 245Véh 42209000002 42209005712 1 24/07/2015 06:30:04
42209000005 42209002208 + 245Véh 42209000002 42209005713 1 24/07/2015 06:45:04
42216000357 42216001501 - 190Véh 42216000139 42216001418 1 31/07/2015 17:15:03
42216000139 42216000788 - 309V7pVéh 42216000059 42216006210 1 31/07/2015 17:15:03
42216000118 42216000188 - 198Véh 42216000051 42216006374 1 31/07/2015 17:15:03
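Printing the column names directly makes the stray quotes visible (a minimal check; the commented output reflects the frame shown above):

print(df.columns.tolist())
#['"NoDemande"', 'NoUsager', 'Sens', 'IdVehiculeUtilise',
# 'NoConducteur', 'NoAdresse', 'Fait', 'HeurePrevue']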
This causes problems later on, whenever I refer to that column by name. How can I solve it? Here is my code for reading the files:
import pandas as pd
import glob

pd.set_option('expand_frame_repr', False)

path = r'D:\Python27\mypfe\data_test'
allFiles = glob.glob(path + "/*.csv")

frame = pd.DataFrame()
list_ = []
for file_ in allFiles:
    # Read one file, everything as strings
    df = pd.read_csv(file_, header=0, sep=';', dayfirst=True,
                     encoding='utf8', dtype='str')
    # Normalize the em dash in 'Sens' to a plain minus
    df['Sens'].replace(u'\u2014', '-', inplace=True)
    list_.append(df)
    print "fichier lu ", file_
frame = pd.concat(list_)
print frame
I think the simplest solution is to set new column names, replacing the first one and keeping the rest:
df.columns = ['NoDemande1'] + df.columns[1:].tolist()
print (df)
NoDemande1 NoUsager Sens IdVehiculeUtilise NoConducteur NoAdresse \
0 42210000003 42210000529 + 265Véh 42210000032 42210002932
1 42210000005 42210001805 + 265Véh 42210000032 42210002932
2 42210000004 42210002678 + 265Véh 42210000032 42210002932
3 42210000003 42210000529 - 265Véh 42210000032 42210004900
4 42210000004 42210002678 - 265Véh 42210000032 42210007072
5 42210000005 42210001805 - 265Véh 42210000032 42210004236
Fait HeurePrevue
0 1 25/07/2015;10:00:04
1 1 25/07/2015;10:00:04
2 1 25/07/2015;10:00:04
3 1 25/07/2015;10:50:03
4 1 25/07/2015;11:25:03
5 1 25/07/2015;11:40:03
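An equivalent way to rename only the first column, without rebuilding the whole name list, is rename; a small sketch (the target name NoDemande1 just mirrors the example above):

# Map whatever the first column is currently called to the new name
df = df.rename(columns={df.columns[0]: 'NoDemande1'})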
Another solution is to strip the " characters from the column names:
print (df)
"NoDemande" NoUsager Sens IdVehiculeUtilise NoConducteur NoAdresse \
0 42210000003 42210000529 + 265Véh 42210000032 42210002932
1 42210000005 42210001805 + 265Véh 42210000032 42210002932
2 42210000004 42210002678 + 265Véh 42210000032 42210002932
3 42210000003 42210000529 - 265Véh 42210000032 42210004900
4 42210000004 42210002678 - 265Véh 42210000032 42210007072
5 42210000005 42210001805 - 265Véh 42210000032 42210004236
Fait HeurePrevue
0 1 25/07/2015;10:00:04
1 1 25/07/2015;10:00:04
2 1 25/07/2015;10:00:04
3 1 25/07/2015;10:50:03
4 1 25/07/2015;11:25:03
5 1 25/07/2015;11:40:03
df.columns = df.columns.str.strip('"')
print (df)
NoDemande NoUsager Sens IdVehiculeUtilise NoConducteur NoAdresse \
0 42210000003 42210000529 + 265Véh 42210000032 42210002932
1 42210000005 42210001805 + 265Véh 42210000032 42210002932
2 42210000004 42210002678 + 265Véh 42210000032 42210002932
3 42210000003 42210000529 - 265Véh 42210000032 42210004900
4 42210000004 42210002678 - 265Véh 42210000032 42210007072
5 42210000005 42210001805 - 265Véh 42210000032 42210004236
Fait HeurePrevue
0 1 25/07/2015;10:00:04
1 1 25/07/2015;10:00:04
2 1 25/07/2015;10:00:04
3 1 25/07/2015;10:50:03
4 1 25/07/2015;11:25:03
5 1 25/07/2015;11:40:03
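Either fix needs to happen per file, inside your reading loop, so that every frame already has clean column names before concatenation. A minimal sketch built on the code from the question (ignore_index=True is my addition, so the combined frame gets a fresh index):

import glob
import pandas as pd

path = r'D:\Python27\mypfe\data_test'

list_ = []
for file_ in glob.glob(path + "/*.csv"):
    df = pd.read_csv(file_, header=0, sep=';', dayfirst=True,
                     encoding='utf8', dtype='str')
    # Drop stray double quotes (and any surrounding whitespace) from all names
    df.columns = df.columns.str.strip().str.strip('"')
    list_.append(df)

frame = pd.concat(list_, ignore_index=True)
print(frame)

After this, frame['NoDemande'] works as expected.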