Run into "KeyError: 2L" assigning to dictionary


I have a pandas data frame and I'm trying to add each acct_id_adj number to a dictionary and search for all phone numbers associated with that id through the notes.

Example of the dataframe
Index    RowNum    acct_id_adj           NOTE_DT     NOTE_TXT
0        1         A20000000113301111    5/2/2017    t/5042222222 lm w/ 3rd jn
1        2         A20000000038002222    5/4/2017    OB CallLeft Message
3        4         A20000000107303333    5/4/2017    8211116411 FOR $18490 MLF


import pandas
import re
PhNum = pandas.read_csv('C:/PhoneNumberSearch.csv')
PhNum = PhNum[PhNum['NOTE_TXT'].notnull()]

D = {}
#for i in xrange(PhNum.shape[0]):
for i in xrange(3):
    ID = PhNum['acct_id_adj'][i]
    Note = re.sub(r'\W+', ' ', PhNum['NOTE_TXT'][i])
    print(Note)
    Numbers = [int(s) for s in Note.split() if s.isdigit()]
    print(Numbers)
    for j in xrange(len(Numbers)):
        if Numbers[j] > 1000000000:
            D[ID] = Numbers[j]

print(D)

Out = pandas.DataFrame(D.items(), columns=['acct_id_adj', 'Phone_Number'])

However, at the third row I keep running into the error "KeyError: 2L" at ID = PhNum['acct_id_adj'][i]. I can't find good documentation on it, and I can't figure out why the issue would not arise until then.

All help appreciated in cluing me into what might be causing this error or if I'm thinking about dictionaries in the wrong way.


Solution

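  • Analysis:

    It seems that your PhoneNumberSearch.csv file is malformed, with an extra delimiter at the end of each line. If so, each data row has one more field than the header, and pandas.read_csv will use the first column as the index. Suppose the CSV file is: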

    Index,RowNum,acct_id_adj,NOTE_DT,NOTE_TXT
    0,1,A20000000113301111,5/2/2017,t/5042222222 lm w/ 3rd jn,
    1,2,A20000000038002222,5/4/2017,OB CallLeft Message,
    3,4,A20000000107303333,5/4/2017,8211116411 FOR $18490 MLF,
    

    PhNum will then look like this:

       Index              RowNum  acct_id_adj                    NOTE_DT  NOTE_TXT
    0      1  A20000000113301111     5/2/2017  t/5042222222 lm w/ 3rd jn       NaN
    1      2  A20000000038002222     5/4/2017        OB CallLeft Message       NaN
    3      4  A20000000107303333     5/4/2017  8211116411 FOR $18490 MLF       NaN
    

    As you can see, the index has the labels 0, 1 and 3 but no label 2, which is why ID = PhNum['acct_id_adj'][2] raises the KeyError.
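
The difference between looking a row up by label and by position can be sketched as follows. This is a minimal illustration, not the asker's real data: the Series is a hypothetical stand-in for the acct_id_adj column, built with the index labels 0, 1 and 3 from the example above. Gaps like this can also appear after row filtering, such as the notnull() step in the question.

```python
import pandas as pd

# Hypothetical stand-in for the 'acct_id_adj' column, with index labels 0, 1, 3
# (label 2 is missing, as in the frame shown above).
acct = pd.Series(
    ["A20000000113301111", "A20000000038002222", "A20000000107303333"],
    index=[0, 1, 3],
    name="acct_id_adj",
)

# acct[2] looks up the *label* 2, which does not exist, so it raises KeyError.
try:
    acct[2]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# .iloc looks up by *position*, so position 2 is the third row.
third_by_position = acct.iloc[2]

# The same row is reachable through its actual label, 3.
third_by_label = acct.loc[3]

print(lookup_failed, third_by_position, third_by_label)
```

If the loop is meant to walk the rows in order regardless of their labels, .iloc[i] (or calling reset_index(drop=True) once before the loop) is one way to avoid depending on the labels at all.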

    Solution:

    You can pass index_col=False to force pandas not to use the first column as the index (see the official pandas.read_csv documentation):

    PhNum = pandas.read_csv('C:/PhoneNumberSearch.csv',index_col=False)
    

    PhNum will then have the correct index:

       Index  RowNum         acct_id_adj   NOTE_DT                   NOTE_TXT
    0      0       1  A20000000113301111  5/2/2017  t/5042222222 lm w/ 3rd jn
    1      1       2  A20000000038002222  5/4/2017        OB CallLeft Message
    2      3       4  A20000000107303333  5/4/2017  8211116411 FOR $18490 MLF
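
The effect of index_col=False can be reproduced without the original file. The sketch below assumes the CSV really has a trailing comma on every data row, as in the example above; csv_text is an in-memory stand-in for PhoneNumberSearch.csv, and the names default_read and fixed_read are only for illustration.

```python
import io
import warnings

import pandas as pd

# In-memory stand-in for the malformed file: every data row ends with an
# extra comma, so it has one more field than the header.
csv_text = (
    "Index,RowNum,acct_id_adj,NOTE_DT,NOTE_TXT\n"
    "0,1,A20000000113301111,5/2/2017,t/5042222222 lm w/ 3rd jn,\n"
    "1,2,A20000000038002222,5/4/2017,OB CallLeft Message,\n"
    "3,4,A20000000107303333,5/4/2017,8211116411 FOR $18490 MLF,\n"
)

# Default behaviour: the first column (0, 1, 3) is used as the index.
default_read = pd.read_csv(io.StringIO(csv_text))

# index_col=False keeps the first column as data and builds a 0..n-1 index.
# Some pandas versions may warn about the dropped trailing field, so any
# warning is silenced here.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    fixed_read = pd.read_csv(io.StringIO(csv_text), index_col=False)

print(list(default_read.index))
print(list(fixed_read.index))
print(fixed_read["acct_id_adj"][2])
```

With the default call the index labels come from the first column, so label 2 is missing; with index_col=False the labels run from 0 to 2 and PhNum['acct_id_adj'][2] style lookups find the third row.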