I have a strange issue when loading my CSV file in pandas (version 1.0.3).
I want to convert some columns to category automatically.
To this end, I created a dictionary with the column names and their type.
It works for one column but not for the others.
I don't get any error.
What might cause a column not to be parsed as a category?
Strange as it may seem, if I cast that column to category afterwards, the operation works perfectly.
So at first glance it doesn't seem to be a problem of a mistyped column.
import pandas as pd

# Columns that should be read as categoricals.
col_types = {
    'CURRENCY': "category",
    'PRODUCT': "category",
    'PRODUCT_TYPE': "category",
}

# Note: `converters` is defined elsewhere (not shown here).
def parse_csv(path_location):
    df = pd.read_csv(
        path_location,
        sep=';',
        engine='c',
        dtype=col_types,
        true_values=['Y', 'y'],
        false_values=['N', 'n'],
        converters=converters,
        usecols=['PRODUCT', 'PRODUCT_TYPE', 'PORTFOLIO_CURRENCY', 'NATIONALITY'],
        nrows=99)
    return df
The result I get from the function above is:
Data columns (total 5 columns):
 #   Column              Non-Null Count  Dtype
---  ------              --------------  -----
 0   PORTFOLIO_CURRENCY  198 non-null    category
 1   PRODUCT             198 non-null    object
 2   PRODUCT_TYPE        198 non-null    object
 3   AGE                 185 non-null    float64
 4   NATIONALITY         198 non-null    object
dtypes: category(1), float64(1), object(3)
Although I can't install 1.0.3 to check whether the version is the problem, I tested it on 1.1.4 and it works as expected. Please update pandas to the newest version, as v1.1.0 included many fixes for categoricals.
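To check which version is actually in use and whether it parses categories correctly, a minimal reproduction along these lines may help. The sample text is hypothetical and only borrows the column names from the question; swap in a few rows from the real file.

```python
import io

import pandas as pd

# Hypothetical sample data; replace with a few rows from the real file.
SAMPLE = "PRODUCT;PRODUCT_TYPE\nA;X\nB;Y\nA;X\n"

# Confirm which pandas version the interpreter really imports.
print("pandas version:", pd.__version__)

# Read the sample with the same dtype mapping style as in the question,
# but without converters, to isolate the dtype handling.
sample_df = pd.read_csv(
    io.StringIO(SAMPLE),
    sep=";",
    dtype={"PRODUCT": "category", "PRODUCT_TYPE": "category"},
)
print(sample_df.dtypes)
```

If both columns come back as category here but not with the real call, the difference most likely lies in the other arguments rather than in the version.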
If that doesn't help, check the converters you pass and verify that the CSV doesn't contain malformed data, such as invalid Unicode, although I wouldn't expect problems of that kind.
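One thing worth checking in the converters: the pandas documentation for `read_csv` states that converters, when specified, are applied instead of dtype conversion. So a column listed in both mappings would not become a category. The sketch below illustrates this with hypothetical data and a hypothetical converter (the real `converters` dict is not shown in the question, so this is only a guess at the cause), and then applies the explicit cast as a fallback.

```python
import io
import warnings

import pandas as pd

# Hypothetical sample data; the column names mirror the question.
CSV_TEXT = "PRODUCT;PRODUCT_TYPE;PORTFOLIO_CURRENCY\nA;X;EUR\nB;Y;USD\nA;X;EUR\n"

col_types = {"PRODUCT": "category", "PORTFOLIO_CURRENCY": "category"}
# Hypothetical converter that targets a column which also has a dtype entry.
converters = {"PRODUCT": str.strip}

# Columns present in both mappings are the first ones to inspect.
overlap = sorted(set(col_types) & set(converters))
print("columns with both dtype and converter:", overlap)

with warnings.catch_warnings():
    # pandas may emit a warning about the overlap; silence it for the demo.
    warnings.simplefilter("ignore")
    df = pd.read_csv(
        io.StringIO(CSV_TEXT),
        sep=";",
        dtype=col_types,
        converters=converters,
    )

# Record the dtypes right after loading, before any manual cast.
dtypes_after_read = df.dtypes.astype(str).to_dict()
print(dtypes_after_read)

# Fallback: cast the affected columns explicitly after loading.
df = df.astype({col: "category" for col in overlap})
print(df.dtypes)
```

If the overlap list is non-empty for the real dictionaries, either drop those columns from `converters` or keep the explicit cast after loading.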