Search code examples
pythonpandaspandas-loc

Beginner Level - Python Pandas query working with Pokemon DB


I am learning Python (Pandas) and working with the Pokemon DB (following Youtube tutorial) I am currently learning .loc and query methods so that I can access data in different ways. I encounter an error while working with df.query.

Please find my information below:

  1. I imported all my libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
  1. Read my DB file
df = pd.read_csv(r"C:\XXXX\pokemon.csv")
  1. I print the head of my df. Please see the results here

When I try to access the Pokemons that are type 1 = fire and type 2= flying using .loc it works and I have no errors as shown below:

df.loc[(df['Type 1'] == "Fire") & (df['Type 2'] == "Flying")]

Please see the results here

But, when I try to do the same using .query I get an error

df.query("'Type 1' == 'Fire' and 'Type 2' == 'Flying'")

The error is the following:

KeyError                                  Traceback (most recent call last)
Cell In[30], line 3
      1 #if we want to bring their rows to see their names. We need to be aware that it is 6 Pokemons
      2 #df.loc[(df['Type 1'] == "Fire") & (df['Type 2'] == "Flying")]
----> 3 df.query("'Type 1' == 'Fire' and 'Type 2' == 'Flying'")

File ~\anaconda3\envs\env1\lib\site-packages\pandas\core\frame.py:4605, in DataFrame.query(self, expr, inplace, **kwargs)
   4602 res = self.eval(expr, **kwargs)
   4604 try:
-> 4605     result = self.loc[res]
   4606 except ValueError:
   4607     # when res is multi-dimensional loc raises, but this is sometimes a
   4608     # valid query
   4609     result = self[res]

File ~\anaconda3\envs\env1\lib\site-packages\pandas\core\indexing.py:1153, in _LocationIndexer.__getitem__(self, key)
   1150 axis = self.axis or 0
   1152 maybe_callable = com.apply_if_callable(key, self.obj)
-> 1153 return self._getitem_axis(maybe_callable, axis=axis)

File ~\anaconda3\envs\env1\lib\site-packages\pandas\core\indexing.py:1393, in _LocIndexer._getitem_axis(self, key, axis)
   1391 # fall thru to straight lookup
   1392 self._validate_key(key, axis)
-> 1393 return self._get_label(key, axis=axis)

File ~\anaconda3\envs\env1\lib\site-packages\pandas\core\indexing.py:1343, in _LocIndexer._get_label(self, label, axis)
   1341 def _get_label(self, label, axis: AxisInt):
   1342     # GH#5567 this will fail if the label is not present in the axis.
-> 1343     return self.obj.xs(label, axis=axis)

File ~\anaconda3\envs\env1\lib\site-packages\pandas\core\generic.py:4236, in NDFrame.xs(self, key, axis, level, drop_level)
   4234             new_index = index[loc]
   4235 else:
-> 4236     loc = index.get_loc(key)
   4238     if isinstance(loc, np.ndarray):
   4239         if loc.dtype == np.bool_:

File ~\anaconda3\envs\env1\lib\site-packages\pandas\core\indexes\range.py:418, in RangeIndex.get_loc(self, key)
    416         raise KeyError(key) from err
    417 if isinstance(key, Hashable):
--> 418     raise KeyError(key)
    419 self._check_indexing_error(key)
    420 raise KeyError(key)

KeyError: False

I searched for the error, and I understand that KeyError happens when I am calling the wrong key or unexisting key. Why I am getting this issue when trying to retrieve Keys that I already use before?

**What I have tried until now? **

  1. I used other queries in my documents and they run just fine. such as:
df.query("Speed <= 10")
  1. I tried different syntaxes for my query thinking that maybe the "" and '' were wrong, but it did not work. I checked the documentation for that.

  2. I checked if maybe the names are wrong and verify the names of my columns with:

print(df.columns.tolist()) 
  1. I restarted my kernels and loaded my file again, thinking I moved or did something to change some settings.

  2. Updated my libraries in Anaconda just in case is something related to that

  3. I haven't tried the exceptions yet, as I am very curious why the video I am following has no issues

Please let me know if you have any ideas,

Thank you in advance


Solution

  • You have to use backticks because Type 1 and Type 2 are not valid Python variable names:

    >>> df.query("`Type 1` == 'Fire' and `Type 2` == 'Flying'")
    
                              Name Type 1  Type 2  Total   HP  Attack  Defense  Sp. Atk  Sp. Def  Speed  Generation  Legendary
    #                                                                                                                         
    6                    Charizard   Fire  Flying    534   78      84       78      109       85    100           1      False
    6    CharizardMega Charizard Y   Fire  Flying    634   78     104       78      159      115    100           1      False
    146                    Moltres   Fire  Flying    580   90     100       90      125       85     90           1       True
    250                      Ho-oh   Fire  Flying    680  106     130       90      110      154     90           2       True
    662                Fletchinder   Fire  Flying    382   62      73       55       56       52     84           6      False
    663                 Talonflame   Fire  Flying    499   78      81       71       74       69    126           6      False
    

    From documentation:

    You can refer to column names that are not valid Python variable names by surrounding them in backticks.