I am using np.genfromtxt to read a CSV file, and I am not sure why it raises a ValueError on the data. When I open the file in Excel, it shows 23 columns for all 33 rows in the file.
Here is the code and error:
csv = np.genfromtxt(fname, delimiter=",", names=True)
Here is a snippet of the csv records:
,mean_fit_time,mean_score_time,mean_test_score,mean_train_score,param_NN__alpha,param_NN__hidden_layer_sizes,params,rank_test_score,split0_test_score,split0_train_score,split1_test_score,split1_train_score,split2_test_score,split2_train_score,split3_test_score,split3_train_score,split4_test_score,split4_train_score,std_fit_time,std_score_time,std_test_score,std_train_score
0,0.34166226387023924,0.0010362625122070312,0.842927342927343,0.8468980402379758,0.1,"(7,)","{'NN__alpha': 0.1, 'NN__hidden_layer_sizes': (7,)}",25,0.8420706295240185,0.8475292052871167,0.8398771660451854,0.8463774474853288,0.845360824742268,0.846158065046893,0.8385256691531373,0.8486892618185806,0.8488040377441299,0.8457362215519605,0.05093153997183547,0.00018195987247183776,0.0037378988316037944,0.0010747322296072162
1,0.5543142318725586,0.0018250465393066407,0.8465250965250966,0.8527554135893668,0.1,"(25, 7)","{'NN__alpha': 0.1, 'NN__hidden_layer_sizes': (25, 7)}",5,0.846018863785918,0.8530137662480118,0.846018863785918,0.8589919376953875,0.8479929809168677,0.8496681840618658,0.8400614304519526,0.851486234506965,0.8525345622119815,0.8506169454346038,0.10835399357094619,0.00018853748087819175,0.004013613789285713,0.003306836154659678
2,0.5266880512237548,0.0013680458068847656,0.8437609687609687,0.8478413817137904,0.1,"(11, 7)","{'NN__alpha': 0.1, 'NN__hidden_layer_sizes': (11, 7)}",17,0.842509322219785,0.8479679701639884,0.8354902390875192,0.8431964021280096,0.8455801710901514,0.8520265452750507,0.8433523475208424,0.851595919710431,0.8518762343647136,0.8444200712914725,0.1041624682160838,0.0003233587082439388,0.005278162504355272,0.0036030369022985215
3,0.49459095001220704,0.0011162281036376954,0.8406458406458407,0.845428443186931,0.1,"(7, 5)","{'NN__alpha': 0.1, 'NN__hidden_layer_sizes': (7, 5)}",32,0.8383417416100022,0.848461580650469,0.8429480149155516,0.8501617945483464,0.8468962491774512,0.8514780891789612,0.8312856516015796,0.8381046396841066,0.8437568575817423,0.8389361118727722,0.10397613499936685,0.00018889068500539376,0.005421511394261151,0.005726975087304059
4,0.6175418376922608,0.0024899959564208983,0.8449017199017199,0.8508140227747922,0.1,"(25, 11, 7)","{'NN__alpha': 0.1, 'NN__hidden_layer_sizes': (25, 11, 7)}",11,0.8414125904803685,0.8493939560138211,0.8427286685676684,0.8546591345362804,0.8501864443957008,0.8519716996654417,0.8459850811759544,0.8564769112646704,0.8441957428132544,0.8415684123937482,0.1940231074769015,0.00047604030307216253,0.003049662553913791,0.005209439647677219
Error Received:
ValueError: Some errors were detected !
Line #2 (got 26 columns instead of 22)
Line #3 (got 26 columns instead of 22)
Line #4 (got 26 columns instead of 22)
Line #5 (got 26 columns instead of 22)
Line #6 (got 28 columns instead of 22)
Line #7 (got 26 columns instead of 22)
Line #8 (got 28 columns instead of 22)
Line #9 (got 26 columns instead of 22)
Line #10 (got 26 columns instead of 22)
Line #11 (got 26 columns instead of 22)
Line #12 (got 26 columns instead of 22)
Line #13 (got 26 columns instead of 22)
Line #14 (got 28 columns instead of 22)
Line #15 (got 26 columns instead of 22)
Line #16 (got 28 columns instead of 22)
Line #17 (got 26 columns instead of 22)
Line #18 (got 26 columns instead of 22)
Line #19 (got 26 columns instead of 22)
Line #20 (got 26 columns instead of 22)
Line #21 (got 26 columns instead of 22)
Line #22 (got 28 columns instead of 22)
Line #23 (got 26 columns instead of 22)
Line #24 (got 28 columns instead of 22)
Line #25 (got 26 columns instead of 22)
Line #26 (got 26 columns instead of 22)
Line #27 (got 26 columns instead of 22)
Line #28 (got 26 columns instead of 22)
Line #29 (got 26 columns instead of 22)
Line #30 (got 28 columns instead of 22)
Line #31 (got 26 columns instead of 22)
Line #32 (got 28 columns instead of 22)
Line #33 (got 26 columns instead of 22)
You're passing a comma as the delimiter, while a lot of your column values contain commas themselves (the quoted "(25, 7)" tuples and the params dicts). np.genfromtxt has no notion of a quote character, so it splits on every comma it sees, which is why the data rows come back with 26 or 28 columns. You'd need a reader that understands quoting to get this to work.
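If you want to confirm that the quoting, and not the file itself, is the problem, the standard-library csv module does honour double quotes. A minimal check, assuming fname is the same path you pass to genfromtxt:

import csv

# Count the fields per row; the csv reader keeps "(25, 7)" and the params
# dict together as single fields, so every row should come back with the
# 23 fields that Excel shows.
with open(fname, newline="") as fh:
    widths = {len(row) for row in csv.reader(fh)}
print(widths)  # expected: {23}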
Fortunately, pandas handles this really well without much handholding. You could load your data with read_csv and then convert the resulting dataframe to an array.
import pandas as pd
array = pd.read_csv(fname, index_col=[0]).values  # first (unnamed) column is the index
The loaded dataframe (what you get before calling .values) looks like this:
df = pd.read_csv(fname, index_col=[0])
print(df)
mean_fit_time mean_score_time mean_test_score mean_train_score \
0 0.341662 0.001036 0.842927 0.846898
1 0.554314 0.001825 0.846525 0.852755
2 0.526688 0.001368 0.843761 0.847841
3 0.494591 0.001116 0.840646 0.845428
4 0.617542 0.002490 0.844902 0.850814
param_NN__alpha param_NN__hidden_layer_sizes \
0 0.1 (7,)
1 0.1 (25, 7)
2 0.1 (11, 7)
3 0.1 (7, 5)
4 0.1 (25, 11, 7)
params rank_test_score \
0 {'NN__alpha': 0.1, 'NN__hidden_layer_sizes': (... 25
1 {'NN__alpha': 0.1, 'NN__hidden_layer_sizes': (... 5
2 {'NN__alpha': 0.1, 'NN__hidden_layer_sizes': (... 17
3 {'NN__alpha': 0.1, 'NN__hidden_layer_sizes': (... 32
4 {'NN__alpha': 0.1, 'NN__hidden_layer_sizes': (... 11
split0_test_score split0_train_score ... split2_test_score \
0 0.842071 0.847529 ... 0.845361
1 0.846019 0.853014 ... 0.847993
2 0.842509 0.847968 ... 0.845580
3 0.838342 0.848462 ... 0.846896
4 0.841413 0.849394 ... 0.850186
split2_train_score split3_test_score split3_train_score \
0 0.846158 0.838526 0.848689
1 0.849668 0.840061 0.851486
2 0.852027 0.843352 0.851596
3 0.851478 0.831286 0.838105
4 0.851972 0.845985 0.856477
split4_test_score split4_train_score std_fit_time std_score_time \
0 0.848804 0.845736 0.050932 0.000182
1 0.852535 0.850617 0.108354 0.000189
2 0.851876 0.844420 0.104162 0.000323
3 0.843757 0.838936 0.103976 0.000189
4 0.844196 0.841568 0.194023 0.000476
std_test_score std_train_score
0 0.003738 0.001075
1 0.004014 0.003307
2 0.005278 0.003603
3 0.005422 0.005727
4 0.003050 0.005209
[5 rows x 22 columns]
And yes, columns are automatically converted to the appropriate datatypes.
print(df.dtypes)
mean_fit_time float64
mean_score_time float64
mean_test_score float64
mean_train_score float64
param_NN__alpha float64
param_NN__hidden_layer_sizes object
params object
rank_test_score int64
split0_test_score float64
split0_train_score float64
split1_test_score float64
split1_train_score float64
split2_test_score float64
split2_train_score float64
split3_test_score float64
split3_train_score float64
split4_test_score float64
split4_train_score float64
std_fit_time float64
std_score_time float64
std_test_score float64
std_train_score float64
dtype: object
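Note that because params and param_NN__hidden_layer_sizes are object columns, the array you get from .values will itself have dtype=object. If you only need the numeric scores as a float array, one option (a sketch, using the df loaded above) is to select the numeric columns first:

# Keep only the numeric columns (the floats plus the int rank column)
# before converting; the result is a plain float64 array rather than an
# object array.
numeric = df.select_dtypes(include="number").to_numpy()
print(numeric.dtype)  # float64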
Statutory warning: this data, owing to its mixed nature, will probably be more useful to you as a plain Python list of rows than as a numpy array (which is optimised for homogeneous numeric data, not for columns of strings and dicts).
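If you do treat the rows as plain Python objects, pandas can hand them to you directly; a small sketch using the same df:

# Each row becomes an ordinary dict keyed by column name, which keeps the
# string-valued params and hidden_layer_sizes fields easy to work with.
records = df.to_dict(orient="records")
print(records[0]["rank_test_score"])  # 25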