When I run the code below, an IndexError occurs. The details are shown below.
def get_code(seq):
    return [x.split('.')[0] for x in seq if x]

all_codes = get_code(all_cats)
code_index = pd.Index(np.unique(all_codes))
dummy_frame = df(np.zeros((len(data), len(code_index))), index=data.index, columns=code_index)
for row, cat in zip(data.index, data.CATEGORY):
    codes = get_code(to_cat_list(cat))
    dummy_frame.iloc[row, codes] = 1
data = data.join(dummy_frame.add_prefix('category_'))
data.iloc[:, 10:15]
This is the IndexError that is raised:
---------------------------------------------------------------------------
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
---------------------------------------------------------------------------
The error occurs on the following line:
dummy_frame.iloc[row, codes] = 1
How can I resolve this error so that I get the output below?
category_1 100 non-null values
category_1a 100 non-null values
category_1b 100 non-null values
category_1c 100 non-null values
category_1d 100 non-null values
`iloc` is for integer-based (positional) indexing, and passing label strings such as `["1", "3"]` to it as the column indexer is why it fails. You can look up the integer positions of `["1", "3"]` in your frame's columns and pass those instead:
# these are integer positions of `codes` so that `iloc` works
codes_positions = dummy_frame.columns.get_indexer(codes)
# using `codes_positions` instead of `codes` directly
dummy_frame.iloc[row, codes_positions] = 1
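
To make the label-to-position translation concrete, here is a minimal self-contained sketch. The frame, its column labels and the `codes` list are made up for illustration and only stand in for the ones in the question. One thing worth keeping in mind: `get_indexer` returns -1 for a label it cannot find, and `iloc` would read -1 as "last column", so it may be worth checking for -1 if some codes could be missing.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's dummy_frame: string column labels.
code_index = pd.Index(["1", "2", "3"])
dummy_frame = pd.DataFrame(np.zeros((4, len(code_index))), columns=code_index)

codes = ["1", "3"]

# Translate column labels into integer positions.
# A label that is not present would be mapped to -1.
codes_positions = dummy_frame.columns.get_indexer(codes)

# Row 0 is a position, and so are the columns now, so `iloc` accepts them.
dummy_frame.iloc[0, codes_positions] = 1
```

After this runs, only the first row has ones, in the columns labelled "1" and "3".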
There is also `loc`, which does label-based indexing instead of positional indexing. Your row labels appear to be 0..N-1, so `loc` can work here, too:
# indexers remain the same but now using `loc`
dummy_frame.loc[row, codes] = 1
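
As a small sketch of the label-based variant, again on a made-up frame that assumes the default 0..N-1 row labels and string column labels:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with default row labels (0, 1, 2) and string column labels.
dummy_frame = pd.DataFrame(np.zeros((3, 3)), columns=["1", "2", "3"])

# Both indexers are labels here: row label 1, column labels "1" and "3".
dummy_frame.loc[1, ["1", "3"]] = 1
```

No position lookup is needed in this form because `loc` resolves the column labels itself.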
But please note that `loc` can stand in for `iloc` here only because the row labels are integers that match the row positions (which seems to be so in your case). Otherwise, the first approach is more generic, less error-prone, and makes the intent clearer.
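
If the row labels did not match the row positions, the row passed to `iloc` would have to be a position as well. The following is only a sketch under that assumption: the labels 10, 20, 30 and the per-row codes are invented, and `enumerate` is one possible way to obtain row positions.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose row labels are NOT 0..N-1.
frame = pd.DataFrame(np.zeros((3, 2)), index=[10, 20, 30], columns=["1", "2"])

# Made-up codes for each row, in row order.
per_row_codes = [["1"], ["2"], ["1", "2"]]

# enumerate yields the integer position of each row, which is what `iloc` expects;
# the column labels are translated to positions with get_indexer as before.
for row_pos, codes in enumerate(per_row_codes):
    frame.iloc[row_pos, frame.columns.get_indexer(codes)] = 1
```

Here both indexers are positions, so the assignment does not depend on what the row labels happen to be.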