python pandas dataframe sparse-matrix keyerror

Column is present in the data frame but cannot be selected when slicing it


After pivoting the data frame, I cannot select a column by its name; the lookup throws the error below. I cannot work out what is happening: the column is visible in the data frame, yet it is not returned when I slice it. The data set is the 'MovieLens 100K Dataset'.

The pivoted data frame is created with

movieRatings=ratings.pivot_table(index=['user_id'],columns=['title'],values=['rating'])

Output of 'movieRatings.head()' is as below

    rating
title   'Til There Was You (1997)   1-900 (1994)    101 Dalmatians (1996)   12 Angry Men (1957) 187 (1997)  2 Days in the Valley (1996) 20,000 Leagues Under the Sea (1954) 2001: A Space Odyssey (1968)    3 Ninjas: High Noon At Mega Mountain (1998) 39 Steps, The (1935)    ... Yankee Zulu (1994)  Year of the Horse (1997)    You So Crazy (1994) Young Frankenstein (1974)   Young Guns (1988)   Young Guns II (1990)    Young Poisoner's Handbook, The (1995)   Zeus and Roxanne (1997) unknown Á köldum klaka (Cold Fever) (1994)
user_id                                                                                 
1   NaN NaN 2.0 5.0 NaN NaN 3.0 4.0 NaN NaN ... NaN NaN NaN 5.0 3.0 NaN NaN NaN 4.0 NaN
2   NaN NaN NaN NaN NaN NaN NaN NaN 1.0 NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3   NaN NaN NaN NaN 2.0 NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4   NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
5   NaN NaN 2.0 NaN NaN NaN NaN 4.0 NaN NaN ... NaN NaN NaN 4.0 NaN NaN NaN NaN 4.0 NaN

Next statement

starWarsRatings = movieRatings["12 Angry Men (1957)"]
starWarsRatings.head()

Error message

KeyError                                  Traceback (most recent call last)
D:\Anaconda\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2645             try:
-> 2646                 return self._engine.get_loc(key)
   2647             except KeyError:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: '12 Angry Men (1957)'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-134-9c0f30e1dc98> in <module>
----> 1 starWarsRatings = movieRatings["12 Angry Men (1957)"]
      2 starWarsRatings.head()

D:\Anaconda\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   2797         if is_single_key:
   2798             if self.columns.nlevels > 1:
-> 2799                 return self._getitem_multilevel(key)
   2800             indexer = self.columns.get_loc(key)
   2801             if is_integer(indexer):

D:\Anaconda\lib\site-packages\pandas\core\frame.py in _getitem_multilevel(self, key)
   2847     def _getitem_multilevel(self, key):
   2848         # self.columns is a MultiIndex
-> 2849         loc = self.columns.get_loc(key)
   2850         if isinstance(loc, (slice, Series, np.ndarray, Index)):
   2851             new_columns = self.columns[loc]

D:\Anaconda\lib\site-packages\pandas\core\indexes\multi.py in get_loc(self, key, method)
   2660         if not isinstance(key, (tuple, list)):
   2661             # not including list here breaks some indexing, xref #30892
-> 2662             loc = self._get_level_indexer(key, level=0)
   2663             return _maybe_to_slice(loc)
   2664 

D:\Anaconda\lib\site-packages\pandas\core\indexes\multi.py in _get_level_indexer(self, key, level, indexer)
   2927         else:
   2928 
-> 2929             code = self._get_loc_single_level_index(level_index, key)
   2930 
   2931             if level > 0 or self.lexsort_depth == 0:

D:\Anaconda\lib\site-packages\pandas\core\indexes\multi.py in _get_loc_single_level_index(self, level_index, key)
   2596             return -1
   2597         else:
-> 2598             return level_index.get_loc(key)
   2599 
   2600     def get_loc(self, key, method=None):

D:\Anaconda\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2646                 return self._engine.get_loc(key)
   2647             except KeyError:
-> 2648                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2649         indexer = self.get_indexer([key], method=method, tolerance=tolerance)
   2650         if indexer.ndim > 1 or indexer.size > 1:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: '12 Angry Men (1957)'

Solution

  • Edited - Original answer wasn't what the OP wanted.

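    The problem is in how the pivot table was created. pivot_table treats a list passed as values differently from a plain string: with values=['rating'] the resulting columns are a MultiIndex whose top level is 'rating' and whose second level holds the titles. That is why movieRatings["12 Angry Men (1957)"] is looked up in the top level and raises a KeyError. In this case you only need the string form. If you keep the list, you have to index through the top level first, e.g. movieRatings['rating']['12 Angry Men (1957)'].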

    So just change the declaration of the table and it should work.

    movieRatings = ratings.pivot_table(index=['user_id'], columns=['title'], values='rating')
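
    As a minimal sketch of the difference described above, the snippet below builds a tiny, made-up stand-in for the ratings table (the four rows are hypothetical, not taken from MovieLens) and pivots it both ways. It assumes only that pandas is installed; the names multi, flat, via_top_level, via_tuple and direct are illustrative.

```python
import pandas as pd

# Hypothetical miniature stand-in for the MovieLens ratings table.
ratings = pd.DataFrame({
    "user_id": [1, 1, 2, 3],
    "title": ["12 Angry Men (1957)", "Young Guns (1988)",
              "12 Angry Men (1957)", "Young Guns (1988)"],
    "rating": [5.0, 3.0, 4.0, 2.0],
})

# values given as a list: the columns become a two-level MultiIndex,
# with 'rating' on top and the titles underneath.
multi = ratings.pivot_table(index=["user_id"], columns=["title"], values=["rating"])
print(multi.columns.nlevels)  # 2

# A bare title is looked up in the top level, which only contains 'rating',
# so this reproduces the KeyError from the question.
try:
    multi["12 Angry Men (1957)"]
except KeyError as err:
    print("KeyError:", err)

# Option 1: keep the MultiIndex and go through the top level.
via_top_level = multi["rating"]["12 Angry Men (1957)"]
via_tuple = multi[("rating", "12 Angry Men (1957)")]

# Option 2: pass values as a plain string to get flat, single-level columns.
flat = ratings.pivot_table(index=["user_id"], columns=["title"], values="rating")
direct = flat["12 Angry Men (1957)"]
```

    Option 2 is the change suggested in the answer; Option 1 may be handy when the pivot table already exists and rebuilding it is inconvenient.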