I get an error when I try to compare every distinct pair of values in a dataframe column. Every value is written like this:
['http://dbpedia.org/resource/Category:American_books,http://dbpedia.org/resource/Category:American_literature_by_medium,http://dbpedia.org/resource/Category:Autobiographies,http://dbpedia.org/resource/Category:Bertelsmann_subsidiaries']
i = 0
j = 0
for i in range(len(book_dc.dc_term)):
    values_i = set(book_dc['dc_term'][i].split(','))
    for j in range(i+1, len(book_dc.dc_term)):
        values_j = set(book_dc['dc_term'][j].split(','))
        num_matching = len(values_i.intersection(values_j))
        print("i:", i, "j:", j, "num_matching:", num_matching)
    print('\n')
I expect to get the number of matching values between every two cells. Instead, I am getting this error:
KeyError                                  Traceback (most recent call last)
/usr/local/lib/python3.8/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3360             try:
   3361                 return self._engine.get_loc(casted_key)
   3362             except KeyError as err:

5 frames
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

KeyError: 1

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
/usr/local/lib/python3.8/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3361                 return self._engine.get_loc(casted_key)
   3362             except KeyError as err:
   3363                 raise KeyError(key) from err
   3364
   3365         if is_scalar(key) and isna(key) and not self.hasnans:

KeyError: 1
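Solved. book_dc['dc_term'][i] looks up the index label i, not the i-th row, so it raises KeyError: 1 when the label 1 is missing from the index (for example, after rows were filtered out without resetting the index). Iterating over the values directly avoids the label lookup: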
for i, item_i in enumerate(book_dc.dc_term):
    values_i = set(item_i.split(','))
    for j, item_j in enumerate(book_dc.dc_term[i+1:]):
        values_j = set(item_j.split(','))
        num_matching = len(values_i.intersection(values_j))
        print("i:", i, "j:", j+i+1, "num_matching:", num_matching)
    print('\n')
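For what it's worth, the same pairwise comparison can probably be written a bit more compactly with itertools.combinations, splitting each cell only once. This is just a sketch, not tested against the real data: the helper name pairwise_matches and the sample strings are made up for illustration, and it assumes every cell is a plain comma-separated string.

```python
from itertools import combinations


def pairwise_matches(cells):
    """Return {(i, j): shared-term count} for every distinct pair of cells.

    `cells` is any iterable of comma-separated strings, e.g. a pandas
    Series such as book_dc["dc_term"]; i and j are positions, not labels.
    """
    # Split each cell only once instead of once per pair.
    term_sets = [set(cell.split(",")) for cell in cells]
    return {
        (i, j): len(term_sets[i] & term_sets[j])
        for i, j in combinations(range(len(term_sets)), 2)
    }


# Hypothetical sample values standing in for the real column.
sample = ["a,b,c", "b,c,d", "c,d,e"]
for (i, j), num_matching in pairwise_matches(sample).items():
    print("i:", i, "j:", j, "num_matching:", num_matching)
```

Passing book_dc["dc_term"] instead of sample should work the same way, since iterating over a Series yields its values in order regardless of the index labels.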