Search code examples
pythonpandasdataframeapplylookup

Python Replace existing column value in df based on separate lookup Code/Value (df) KeyError on columns.get_loc


** Problem **

Replace existing column value in destination data frame based on separate lookup type operations to match Code/Value column in another separate source data frame, and update, e.g., replace text in destination data frame column with Code/Value of source data. Basically replacing something like '10' with 'Your Full Name'.

** Code attempt with Key Error **

This effort raised a Key Error.

countynames.set_index('CountyCode')
employee['County.Code'] = countynames.lookup(countynames.index, countynames['CountyCode'])

** Potential Solution Idea **

Something similar to have apply() function lookup employee['County.Code'] in dataframe 'countynames' and replace/overwite/update the existing employee['County.Code'] with the countynames['Value'].

Looking into alternative approaches, as my first attempt resulted in KeyError.

### potential approach 1: 
employee['County.Code'] = countynames.apply(lambda x: employee.loc[x['County.Code'], x['Value']], axis=1)

### potential approach 2: 
employee['County.Code']<- lapply(employee, function(x) look$class[match(x, look$CountyCode)])

** Experimental code **

employee = pd.read_csv("employee_data.csv")
countynames = pd.read_csv("County Codes.csv")
employee['County.Code']

0    34
1    34
2    34
3    34
4    55
Name: County.Code, dtype: int64

Source, lookup data frame:

countynames.head()
    CountyCode  Value
0   1   Alameda
1   2   Alpine
2   3   Amador
3   4   Butte
4   5   Calaveras

** Error: KeyError **

Error is raised on columns.get_loc(item)

KeyError                                  Traceback (most recent call last)
Input In [410], in <cell line: 2>()
      1 countynames.set_index('CountyCode')
----> 2 employee['County.Code'] = countynames.lookup(countynames.index, countynames['CountyCode'])

File ~\anaconda3\lib\site-packages\pandas\core\frame.py:4602, in DataFrame.lookup(self, row_labels, col_labels)
   4600     result = np.empty(n, dtype="O")
   4601     for i, (r, c) in enumerate(zip(row_labels, col_labels)):
-> 4602         result[i] = self._get_value(r, c)
   4604 if is_object_dtype(result):
   4605     result = lib.maybe_convert_objects(result)

File ~\anaconda3\lib\site-packages\pandas\core\frame.py:3615, in DataFrame._get_value(self, index, col, takeable)
   3612     series = self._ixs(col, axis=1)
   3613     return series._values[index]
-> 3615 series = self._get_item_cache(col)
   3616 engine = self.index._engine
   3618 if not isinstance(self.index, MultiIndex):
   3619     # CategoricalIndex: Trying to use the engine fastpath may give incorrect
   3620     #  results if our categories are integers that dont match our codes
   3621     # IntervalIndex: IntervalTree has no get_loc

File ~\anaconda3\lib\site-packages\pandas\core\frame.py:3931, in DataFrame._get_item_cache(self, item)
   3926 res = cache.get(item)
   3927 if res is None:
   3928     # All places that call _get_item_cache have unique columns,
   3929     #  pending resolution of GH#33047
-> 3931     loc = self.columns.get_loc(item)
   3932     res = self._ixs(loc, axis=1)
   3934     cache[item] = res

File ~\anaconda3\lib\site-packages\pandas\core\indexes\base.py:3623, in Index.get_loc(self, key, method, tolerance)
   3621     return self._engine.get_loc(casted_key)
   3622 except KeyError as err:
-> 3623     raise KeyError(key) from err
   3624 except TypeError:
   3625     # If we have a listlike key, _check_indexing_error will raise
   3626     #  InvalidIndexError. Otherwise we fall through and re-raise
   3627     #  the TypeError.
   3628     self._check_indexing_error(key)

KeyError: 1

Solution

  • Always hard without having the data.

    But try:

    employee['County.Code'].replace(countynames.set_index("CountyCode")["Value"].to_dict(), inplace=True)