I want to use np.where to derive a new column (CI) from two existing columns. The DataFrame looks like this (100,000+ rows in total).
Additional detail: "eNBID" is not always the third part of "ID"; the data is very dirty.
ID eNBID
460-00-2354-9 2354
4600023549 2354
46001368511 6789
4600332783112 32783
The result I want is:
ID eNBID CI
460-00-2354-9 2354 9
4600023549 2354 9
46001368511 6789 11
4600332783112 32783 112
My code is:
df['Ci'] = np.where(df['ID'].astype(str).str.contains(r'-',na=False,regex=True), \
df['ID'].apply(lambda x:re.split('-',str(x))[-1], \
df.apply(lambda x:re.findall('([\w]{5})'+'([\w]{%d}'%(len(str(x.eNBID)))+'(\w*)',str(x.ID))[0][-1], axis=1))
The error is:
IndexError:('list index out of range','occurred at index 0')
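A likely cause: np.where is not lazy. Python evaluates both candidate arrays for every row before np.where chooses between them, so the regex branch also runs on the dashed ID at index 0. There `\w{5}` finds nothing ('-' is not a word character), re.findall returns an empty list, and `[0]` raises. A minimal sketch on the four sample rows, using a hypothetical helper `ci_by_regex` that returns None when there is no match (it also closes the second group in the pattern, assuming the missing `)` was a transcription slip):

```python
import re

import numpy as np
import pandas as pd

# The four sample rows from the question
df = pd.DataFrame({
    "ID": ["460-00-2354-9", "4600023549", "46001368511", "4600332783112"],
    "eNBID": [2354, 2354, 6789, 32783],
})


def ci_by_regex(row):
    # Hypothetical helper: 5-char prefix, then len(eNBID) chars, then the CI.
    n = len(str(row["eNBID"]))
    matches = re.findall(r"(\w{5})(\w{%d})(\w*)" % n, str(row["ID"]))
    # "460-00-2354-9" yields no match; return None instead of indexing [0].
    return matches[0][-1] if matches else None


ids = df["ID"].astype(str)
cond = ids.str.contains("-", regex=False)
split_branch = ids.str.split("-").str[-1]
regex_branch = df.apply(ci_by_regex, axis=1)  # computed for ALL rows, dashed ones too

df["CI"] = np.where(cond, split_branch, regex_branch)
print(df)
```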
Here is my new code:
cond = df['ID'].astype(str).str.contains('-',na=False,regex=True)
df['CI'] = np.where(cond,df['ID'].apply(lambda x:re.split('-',str(x))[-1]), \
df[~cond].apply(lambda x:re.findall('([\w]{5})'+'([\w]{%d}'%(len(str(x.eNBID)))+'(\w*)',str(x.ID))[0][-1], axis=1)) if len(str(x.eNBID))<(len(str(x.ID))-5) else "null", axis=1))
The error is:
ValueError: operands could not be broadcast together with shapes (100883,) (100883,) (78,)
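np.where needs the condition and both choice arrays to broadcast to one shape. The condition and the split branch have one entry per row (100883), but `df[~cond].apply(...)` returns one entry per non-dashed row only (78 in the error above), so the three cannot be lined up. One way around it, sketched here on the sample rows, is to fill each subset separately with `.loc`, which aligns on the index. `ci_by_prefix` is a hypothetical helper that keeps the "null" fallback from the code above for IDs too short to contain a CI:

```python
import pandas as pd

# The four sample rows from the question
df = pd.DataFrame({
    "ID": ["460-00-2354-9", "4600023549", "46001368511", "4600332783112"],
    "eNBID": [2354, 2354, 6789, 32783],
})

ids = df["ID"].astype(str)
cond = ids.str.contains("-", regex=False)

df["CI"] = None
# Dashed IDs: the CI is the last dash-separated field.
df.loc[cond, "CI"] = ids[cond].str.split("-").str[-1]


def ci_by_prefix(row):
    # Hypothetical helper: skip the 5-char prefix plus the eNBID digits.
    s, n = str(row["ID"]), len(str(row["eNBID"]))
    return s[5 + n:] if n < len(s) - 5 else "null"


# Undashed IDs: this result has fewer rows than df, but .loc aligns it by index.
df.loc[~cond, "CI"] = df[~cond].apply(ci_by_prefix, axis=1)
print(df)
```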
Can anyone help me?
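Try this: remove the dashes, then skip the 5-character prefix plus as many characters as the eNBID has. Whatever remains is the CI.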
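# Remove the dashes so both ID formats share one layout
df['s'] = df['ID'].astype(str).str.replace('-', '', regex=False)
# CI = everything after the 5-char prefix and the eNBID digits
df['Ci'] = df.apply(lambda x: x['s'][5 + len(str(x['eNBID'])):], axis=1)
df.drop('s', axis=1, inplace=True)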
Output
ID eNBID Ci
0 460-00-2354-9 2354 9
1 4600023549 2354 9
2 46001368511 6789 11
3 4600332783112 32783 112
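With 100,000+ rows, `apply(axis=1)` can be slow because it builds a Series for every row. A sketch of the same slicing rule as a list comprehension over the two columns, which avoids the temporary column as well; `PREFIX_LEN = 5` is the same fixed-prefix assumption the answer makes:

```python
import pandas as pd

# The four sample rows from the question
df = pd.DataFrame({
    "ID": ["460-00-2354-9", "4600023549", "46001368511", "4600332783112"],
    "eNBID": [2354, 2354, 6789, 32783],
})

PREFIX_LEN = 5  # assumed fixed-width prefix such as "46000", as in the answer

# Strip dashes, then slice past the prefix and the eNBID digits, row by row.
df["Ci"] = [
    str(i).replace("-", "")[PREFIX_LEN + len(str(e)):]
    for i, e in zip(df["ID"], df["eNBID"])
]
print(df)
```

An ID shorter than the prefix plus the eNBID simply yields an empty string here rather than raising.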