I'm trying to find a substring (taken from one data frame) in a main string (from the main data frame), but I'm not getting the desired result. The file details and output are below.
First data frame
handleid
49483
51466
83821
94159
105068
I want to search for 49483 in the id column of the main data frame. The result is as follows.
id collection_id dc_language_iso
dli_ndli/49483 NaN English
dli_ndli/494830 NaN Kannada
dli_ndli/494831 NaN Kannada
dli_ndli/494832 NaN Kannada
The results above show that I am getting 49483, 494830, 494831 and 494832, but I only want the first row, i.e. dli_ndli/49483 NaN English. I don't want the rows where 49483 is only a substring of a longer value (494830, 494831, 494832).
I am using the contains function available in pandas.
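This should work. The $ anchors the pattern to the end of the string, so ids that merely start with 49483 (such as 494830) no longer match: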
newdf[newdf['id'].str.contains('49483$', regex=True)]
#Out[216]:
# id collection_id dc_language_iso
#0 dli_ndli/49483 NaN English
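
If you need to check every handleid at once rather than one value at a time, a possible variation is to compare the part of id after the last "/" exactly against the handle list. This is only a sketch: the frame name handles and the sample rows are assumptions reconstructed from the question, and it assumes every id has the form prefix/number.

```python
import pandas as pd

# Hypothetical reconstruction of the two frames shown in the question.
handles = pd.DataFrame({"handleid": [49483, 51466, 83821, 94159, 105068]})
newdf = pd.DataFrame({
    "id": ["dli_ndli/49483", "dli_ndli/494830",
           "dli_ndli/494831", "dli_ndli/494832"],
    "collection_id": [float("nan")] * 4,
    "dc_language_iso": ["English", "Kannada", "Kannada", "Kannada"],
})

# Keep only the text after the last "/" (assumes ids look like "prefix/number").
suffix = newdf["id"].str.rsplit("/", n=1).str[-1]

# Exact comparison against all handle ids, converted to strings first.
wanted = handles["handleid"].astype(str)
matches = newdf[suffix.isin(wanted)]
print(matches)
```

An exact comparison also avoids a left-side false match: the pattern '49483$' would still accept an id such as dli_ndli/149483, because nothing anchors the start of the number.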