I have a pandas data frame and I'm trying to add each acct_id_adj number to a dictionary and search for all phone numbers associated with that id through the notes.
Example of the dataframe:
Index  RowNum  acct_id_adj         NOTE_DT   NOTE_TXT
0      1       A20000000113301111  5/2/2017  t/5042222222 lm w/ 3rd jn
1      2       A20000000038002222  5/4/2017  OB CallLeft Message
3      4       A20000000107303333  5/4/2017  8211116411 FOR $18490 MLF
import pandas
import re

PhNum = pandas.read_csv('C:/PhoneNumberSearch.csv')
PhNum = PhNum[PhNum['NOTE_TXT'].notnull()]
D = {}
#for i in xrange(PhNum.shape[0]):
for i in xrange(3):
    ID = PhNum['acct_id_adj'][i]
    Note = re.sub(r'\W+', ' ', PhNum['NOTE_TXT'][i])
    print(Note)
    Numbers = [int(s) for s in Note.split() if s.isdigit()]
    print(Numbers)
    for j in xrange(len(Numbers)):
        if Numbers[j] > 1000000000:
            D[ID] = Numbers[j]
print(D)
Out = pandas.DataFrame(D.items(), columns=['acct_id_adj', 'Phone_Number'])
However, at the third row I keep running into the error "KeyError: 2L" at ID = PhNum['acct_id_adj'][i]. I can't find good documentation on this, and I can't figure out why the issue would wait until the third row to arise.
All help appreciated in cluing me into what might be causing this error or if I'm thinking about dictionaries in the wrong way.
Analysis:
It seems that your PhoneNumberSearch.csv file is malformed, for example with a trailing delimiter at the end of each data row. If so, pandas.read_csv will use the first column as the index. For example, if the CSV file is:
Index,RowNum,acct_id_adj,NOTE_DT,NOTE_TXT
0,1,A20000000113301111,5/2/2017,t/5042222222 lm w/ 3rd jn,
1,2,A20000000038002222,5/4/2017,OB CallLeft Message,
3,4,A20000000107303333,5/4/2017,8211116411 FOR $18490 MLF,
then PhNum will look like this:
   Index  RowNum              acct_id_adj  NOTE_DT                    NOTE_TXT
0  1      A20000000113301111  5/2/2017     t/5042222222 lm w/ 3rd jn  NaN
1  2      A20000000038002222  5/4/2017     OB CallLeft Message        NaN
3  4      A20000000107303333  5/4/2017     8211116411 FOR $18490 MLF  NaN
As you can see, the index has no label 2; its labels are 0, 1 and 3. That is why ID = PhNum['acct_id_adj'][2] raises a KeyError.
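To make the failure concrete, here is a minimal sketch in Python 3. The frame `df` is a hand-built, hypothetical stand-in for PhNum that only copies the index labels 0, 1, 3 and the account IDs from the example. On a column with an integer index, `[2]` looks up the label 2, not the third row:

```python
import pandas as pd

# Hypothetical stand-in for PhNum, with the same index labels as above: 0, 1, 3.
df = pd.DataFrame(
    {"acct_id_adj": ["A20000000113301111",
                     "A20000000038002222",
                     "A20000000107303333"]},
    index=[0, 1, 3],
)

# With an integer index, column[2] is a label lookup, and label 2 does not exist.
try:
    df["acct_id_adj"][2]
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True

# .iloc is purely positional, so the third row is reachable whatever the labels are.
third_by_position = df["acct_id_adj"].iloc[2]

print(label_lookup_failed)
print(third_by_position)
```

If the loop really needs to walk rows by position, `.iloc[i]` is one option that does not depend on the index labels.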
Solution:
You can pass index_col=False to force pandas not to use the first column as the index (see the official read_csv documentation):
PhNum = pandas.read_csv('C:/PhoneNumberSearch.csv',index_col=False)
PhNum will then have the correct index:
   Index  RowNum  acct_id_adj         NOTE_DT   NOTE_TXT
0  0      1       A20000000113301111  5/2/2017  t/5042222222 lm w/ 3rd jn
1  1      2       A20000000038002222  5/4/2017  OB CallLeft Message
2  3      4       A20000000107303333  5/4/2017  8211116411 FOR $18490 MLF
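The difference can be reproduced without the original file. The sketch below (Python 3) assumes the CSV has trailing commas as in the example above, and feeds that text to read_csv through io.StringIO instead of a path; the names `csv_text`, `default` and `fixed` are only illustrative:

```python
import io
import pandas as pd

# Assumed file content: the sample rows from above, each ending in a trailing comma.
csv_text = (
    "Index,RowNum,acct_id_adj,NOTE_DT,NOTE_TXT\n"
    "0,1,A20000000113301111,5/2/2017,t/5042222222 lm w/ 3rd jn,\n"
    "1,2,A20000000038002222,5/4/2017,OB CallLeft Message,\n"
    "3,4,A20000000107303333,5/4/2017,8211116411 FOR $18490 MLF,\n"
)

# Default behaviour: one more field per row than header names,
# so the first column is taken as the index.
default = pd.read_csv(io.StringIO(csv_text))

# index_col=False: keep the first column as data and use a fresh 0..n-1 index.
fixed = pd.read_csv(io.StringIO(csv_text), index_col=False)

print(list(default.index))
print(list(fixed.index))
print(fixed["acct_id_adj"][2])
```

With the index running 0, 1, 2, the lookup PhNum['acct_id_adj'][i] in the original loop finds a row for every i in range(3).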