I am trying to select from the data frame df
the rows that match a list of keys IDs
.
df.loc[IDs]
In rare cases they keys are not contained in the DataFrame and a KeyError is thrown:
KeyError: "None of [['001U000001c6OczIAE' '001U000000fgVR9IAM' '0015800000ecNcjAAE'\n '001U000000fgVRDIA2']] are in the [index]"
Can I easily get to the list of missing keys from that exception? Would it be better/cleaner/more pythonic to not do this by exception handling?
Without more info it looks like you may have forgotten a step- have you set your index to the Salesforce ID (assuming that's what it is)?
For example (random data):
df
a account
0 1 abc
1 3 abc
2 5 abc
3 7 def
4 7 def
5 34 gf
6 3 hj
7 24 hj
lis = ['abc', 'hj']
df.loc[lis]
KeyError: "None of [['abc', 'hj']] are in the [index]"
For pandas versions prior to 0.21.0
After setting index:
df.set_index('account').loc[lis]
a
account
abc 1
abc 3
abc 5
hj 3
hj 24
Missing values should not throw an error, rather np.nan
value:
lis = ['abc', 'hj', 'j']
df.set_index('account').loc[lis]
a
account
abc 1.0
abc 3.0
abc 5.0
hj 3.0
hj 24.0
j NaN
For pandas versions 0.21.0+
You will need to instead use the dataframe method reindex()
. With reindex, however, you will not be able to have duplicates in the index (so my example above will not work without some deduplication):
df.set_index('account')\
.groupby(level=0).first()\ # de-duplicate index here
.reindex(lis2)
a
account
abc 1.0
hj 3.0
j NaN
Unfortunately using reindex()
after deduplicating there is data loss if you're trying to continue working with the dataframe. Regardless, this is probably not the best way to find identifiers missing from your dataframe.