I get the following error:
exportStore.append(key, hdfStoreLocal, index = False, data_columns = True)
File "/usr/local/lib/python2.7/dist-packages/pandas-0.14.1-py2.7-linux-x86_64.egg/pandas/io/pytables.py", line 911, in append
File "/usr/local/lib/python2.7/dist-packages/pandas-0.14.1-py2.7-linux-x86_64.egg/pandas/io/pytables.py", line 1270, in _write_to_group
s.write(obj=value, append=append, complib=complib, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/pandas-0.14.1-py2.7-linux-x86_64.egg/pandas/io/pytables.py", line 3605, in write
File "/usr/local/lib/python2.7/dist-packages/pandas-0.14.1-py2.7-linux-x86_64.egg/pandas/io/pytables.py", line 3293, in create_axes
raise e
ValueError: invalid itemsize in generic type tuple
Any ideas on why this would happen? It's a rather large project, so I'm not sure what code I can offer, but this happens on the first append. Any help would be very much appreciated.
Show Version result:
commit: None
python: 2.7.6.final.0
python-bits: 64
OS: Linux
OS-release: 3.13.0-35-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
pandas: 0.14.1
nose: None
Cython: 0.20.2
numpy: 1.8.1
scipy: 0.13.3
statsmodels: None
IPython: 1.2.1
sphinx: 1.2.2
patsy: None
scikits.timeseries: None
dateutil: 1.5
pytz: 2012c
bottleneck: None
tables: 3.1.1
numexpr: 2.2.2
matplotlib: 1.3.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999
httplib2: 0.8
apiclient: None
rpy2: None
sqlalchemy: None
pymysql: None
psycopg2: None
Info result:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 61500 entries, 0 to 61499
Data columns (total 48 columns):
Sequential_Code_1 61500 non-null float64
Age_1 61500 non-null float64
Sex_1 61500 non-null object
Race_1 61500 non-null object
Ethnicity_1 61500 non-null object
Principal_Code_1 61500 non-null object
Admitting_Code_1 61500 non-null object
Principal_Code_2 61500 non-null object
Other_Codes_1 61500 non-null object
Other_Codes_2 61500 non-null object
Other_Codes_3 61500 non-null object
Other_Codes_4 61500 non-null object
Other_Codes_5 61500 non-null object
Other_Codes_6 61500 non-null object
Other_Codes_7 61500 non-null object
Other_Codes_8 61500 non-null object
Other_Codes_9 61500 non-null object
Other_Codes_10 61500 non-null object
Other_Codes_11 61500 non-null object
Other_Codes_12 61500 non-null object
Other_Codes_13 61500 non-null object
Other_Codes_14 61500 non-null object
Other_Codes_15 61500 non-null object
Other_Codes_16 61500 non-null object
Other_Codes_17 61500 non-null object
Other_Codes_18 61500 non-null object
Other_Codes_19 61500 non-null object
Other_Codes_20 61500 non-null object
Other_Codes_21 61500 non-null object
Other_Codes_22 61500 non-null object
Other_Codes_23 61500 non-null object
Other_Codes_24 61500 non-null object
External_Code_1 61500 non-null object
Place_Code_1 61500 non-null object
Head result:
Sequential_Number_1 Age_1 Sex_1 Race_1 \
1128 2.000000e+13 73 F 01
2185 2.000000e+13 52 M 01
2202 2.000000e+13 64 M 01
2283 2.000000e+13 72 F 01
4471 2.000000e+13 62 F 01
The problem is that you need to specify a min_itemsize; see docs here.
This controls how wide the column is for string-like columns. By default, the width is taken from the longest of the passed values. If every value in a column has zero length, the write fails (the error message could probably be better).
The reason to specify it is that you may be appending in multiple chunks. Chunk 2 could contain a longer string than anything in chunk 1, which means the column must be at least that wide, but looking only at chunk 1 doesn't tell you this.
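As a rough sketch of the idea above, the widths can be computed over all chunks up front and passed as min_itemsize on the first append. The helper names (string_column_widths, append_chunks) and the floor of 1 character are assumptions made for illustration, not part of pandas, and this assumes all chunks fit in memory at once.

```python
import pandas as pd


def string_column_widths(chunks, columns):
    """Return {column: longest string length} across all chunks.

    Hypothetical helper. Widths start at 1 so that a column holding only
    empty strings never ends up with an itemsize of 0.
    """
    widths = {col: 1 for col in columns}
    for chunk in chunks:
        for col in columns:
            # Ignore missing values, then measure each remaining string.
            lengths = chunk[col].dropna().astype(str).str.len()
            if len(lengths):
                widths[col] = max(widths[col], int(lengths.max()))
    return widths


def append_chunks(path, key, chunks, columns):
    """Append every chunk to one table, sizing string columns up front.

    Hypothetical helper; requires PyTables to be installed.
    """
    chunks = list(chunks)
    widths = string_column_widths(chunks, columns)
    store = pd.HDFStore(path)
    try:
        for i, chunk in enumerate(chunks):
            # The column widths are fixed when the table is created,
            # so min_itemsize only matters on the first append.
            store.append(
                key,
                chunk,
                index=False,
                data_columns=True,
                min_itemsize=widths if i == 0 else None,
            )
    finally:
        store.close()
```

Here columns would be the names of the object (string) columns, for example Sex_1 or Principal_Code_1 from the frame in the question. If the longest possible value is already known, a hand-written dict such as {"Sex_1": 1} does the same job without a pass over the data.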
Further, I would pre-process this data so that it has no zero-length strings; use np.nan as the missing value instead, which HDFStore / pandas handle properly.
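A minimal sketch of that pre-processing step, assuming the empty values are exactly the empty string "". The column names are borrowed from the question, but the sample values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up sample data: one column with some empty strings, one with only empty strings.
df = pd.DataFrame({
    "Sex_1": ["F", "", "M"],
    "Other_Codes_24": ["", "", ""],
})

# Swap every zero-length string for NaN so it is treated as missing data.
cleaned = df.replace("", np.nan)

# Count the missing values per column to confirm the replacement.
missing_per_column = cleaned.isna().sum()
```

If some values contain only whitespace, they would need to be stripped first; that case is not handled here.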