My specific question: can anyone tell me why I get the error below when I run this code, or better yet, how to fix it?

I'm trying to map the department description for a department number in df5 to the TrueDepartment column of a second data frame, df2.

df2 has 2 columns: Department, which I want to iterate through, and TrueDepartment, which I want to write to. The Department column has many blank cells and many cells with values. Some values contain no numbers, others contain several numbers, and some are a combination of digits, letters, and special characters.

df5 has 3 columns: Dept_Nbr, Dept_Desc_HR, and Dept_Desc_AD. The Dept_Nbr values go in ascending order from 1 to over 10000 with no blank rows, and there is a Dept_Desc_HR for every Dept_Nbr.

I want to use the cells in df2's Department column that contain 4 or 5 consecutive digits to identify a Dept_Nbr, and then write that Dept_Nbr's description to the TrueDepartment column of df2. If the Dept_Nbr has a value in Dept_Desc_AD, I want to use that value. If it does not, I want to write the contents of Dept_Desc_HD instead.

My code works on a sample data set, but on the larger data set, using the full Excel spreadsheet, it gives me the error shown at the bottom. I appreciate any help in solving this problem, and I will be happy to provide the spreadsheets or any other information if needed. Thanks
import pandas as pd
import numpy as np
import re
#reading my two data frames from 2 excel files
excel_file='/Users/j0t0174/anaconda3/Depts_sheets_withonlyAD_4columns.xlsx'
df2 = pd.read_excel(excel_file)
excel_file='/Users/j0t0174/anaconda3/dept_nbr.xlsx'
df5=pd.read_excel(excel_file)
df2=df2.replace(np.nan, "Empty",regex=True)
df5=df5.replace(np.nan, "Empty",regex=True)
numbers = df5['Dept_Nbr'].tolist()#-->adding dept_nbr's to list
df5['Dept_Nbr'] = [int(i) for i in df5['Dept_Nbr']]
df5 = df5.set_index('Dept_Nbr') #<--setting data frame 5 (df5) to the new index
for n in numbers:
    for i in range(len(df5.index)): #<--iterate through the number of elements not the elements themselves
        if str(n) == df2.loc[i, 'Department'][-4:]: #<-- convert n to str and slice df2 string for the last 4 chars
            if df5.loc[n, 'Dept_Desc_AD'] != "Empty": #<--checking against a string, not a NaN
                df2.loc[i, 'TrueDepartment'] = df5.loc[n, 'Dept_Desc_AD'] #<-- use .loc not .at
            else:
                df2.loc[i, 'TrueDepartment'] = df5.loc[n, 'Dept_Desc_HD']
TypeError Traceback (most recent call last)
<ipython-input-5-aa578c4c334c> in <module>()
     17 for n in numbers:
     18     for i in range(len(df5.index)): #<-- you want to iterate through the number of elements not the elements themselves
---> 19         if str(n) == df2.loc[i, 'Department'][-4:]: #<-- convert n to str and slice df2 string for the last 4 chars
     20             if df5.loc[n, 'Dept_Desc_AD'] != "Empty": #<-- you're actually checking against a string, not a NaN
     21                 df2.loc[i, 'TrueDepartment'] = df5.loc[n, 'Dept_Desc_AD'] #<-- use .loc not .at
TypeError: 'int' object is not subscriptable
Your error is raised because df2.loc[i, 'Department'] returns an int, which is not subscriptable. If you want the last 4 digits of this integer, convert it to a str first:
str(df2.loc[i, 'Department'])
Only then can you slice it:
str(df2.loc[i, 'Department'])[-4:]
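If the full spreadsheet mixes text cells with purely numeric cells, which would explain why the sample worked and the full file did not, it may be simpler to convert the whole column to str once and look the numbers up without the nested loops. The following is only a sketch under assumptions: the sample rows are made up, the column names are taken from the code in the question (Dept_Desc_HD, not Dept_Desc_HR), and the regex for "4 or 5 consecutive digits" is my reading of the requirement.

```python
import pandas as pd

# Hypothetical stand-ins for the two spreadsheets (not the real data).
# Note the bare int 10002: a cell like this is what breaks [-4:] slicing.
df2 = pd.DataFrame(
    {"Department": ["Sales 1001", 10002, "Empty", "HR-0042x", "abc 1003 def"]}
)
df5 = pd.DataFrame({
    "Dept_Nbr": [1001, 1003, 10002],
    "Dept_Desc_HD": ["Sales HD", "Ops HD", "Finance HD"],
    "Dept_Desc_AD": ["Sales AD", "Empty", "Finance AD"],
})

# Prefer Dept_Desc_AD; fall back to Dept_Desc_HD where AD is the "Empty" placeholder.
desc = df5["Dept_Desc_AD"].where(df5["Dept_Desc_AD"] != "Empty", df5["Dept_Desc_HD"])

# Lookup table keyed by the department number as a string.
lookup = pd.Series(desc.to_numpy(), index=df5["Dept_Nbr"].astype(int).astype(str))

# astype(str) turns every cell into a string, so int cells no longer raise.
# The pattern captures the first run of exactly 4 or 5 digits in each cell.
dept = (
    df2["Department"]
    .astype(str)
    .str.extract(r"(?<!\d)(\d{4,5})(?!\d)", expand=False)
)

# Cells with no match, or with a number that is not in df5, end up as NaN.
df2["TrueDepartment"] = dept.map(lookup)
print(df2)
```

Because the lookup is keyed by strings, a run such as 0042 is not treated as department 42. If leading zeros should be ignored, that part would need adjusting.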