My dataframe df:
index url
1 [{'url': 'http://bhandarkarscollegekdp.org/'}]
2 [{'url': 'http://cateringinyourhome.com/'}]
3 NaN
4 [{'url': 'http://muddyjunction.com/'}]
5 [{'url': 'http://ecskouhou.jp/'}]
6 [{'url': 'http://andersrice.com/'}]
7 [{'url': 'http://durager.cz/'}, {'url': 'http:andersrice.com'}]
8 [{'url': 'http://milenijum-osiguranje.rs/'}]
9 [{'url': 'http://form-kind.org/'}, {'url': 'https://osiguranje'},{'url': 'http://beseka.com.tr'}]
I would like to select the rows where the last item of the list in the url column contains 'https', while skipping missing values.
My current script
df[df['url'].str[-1].str.contains('https',na=False)]
returns False for every row, even though some of them actually contain https.
Can anybody help with this?
I think you can first replace NaN with an empty url and then use apply:
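import numpy as np
import pandas as pd

# small sample: one missing value and one row whose last url contains 'https'
df = pd.DataFrame({'url':[[{'url': 'http://bhandarkarscollegekdp.org/'}],
                          np.nan,
                          [{'url': 'http://cateringinyourhome.com/'}],
                          [{'url': 'http://durager.cz/'}, {'url': 'https:andersrice.com'}]]},
                  index=[1,2,3,4])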
print (df)
url
1 [{'url': 'http://bhandarkarscollegekdp.org/'}]
2 NaN
3 [{'url': 'http://cateringinyourhome.com/'}]
4 [{'url': 'http://durager.cz/'}, {'url': 'https...
df.loc[df.url.isnull(), 'url'] = [[{'url':''}]]
print (df)
url
1 [{'url': 'http://bhandarkarscollegekdp.org/'}]
2 [{'url': ''}]
3 [{'url': 'http://cateringinyourhome.com/'}]
4 [{'url': 'http://durager.cz/'}, {'url': 'https...
print (df.url.apply(lambda x: 'https' in x[-1]['url']))
1 False
2 False
3 False
4 True
Name: url, dtype: bool
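If you would rather not overwrite the NaN values, the same check can be wrapped in a small function that returns False for anything that is not a non-empty list. This is only a sketch: the helper name last_url_has_https is made up for illustration, and it assumes every list item is a dict that may or may not have a 'url' key. The resulting boolean Series is then used to select the rows, which is what the question asks for.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'url': [[{'url': 'http://bhandarkarscollegekdp.org/'}],
             np.nan,
             [{'url': 'http://cateringinyourhome.com/'}],
             [{'url': 'http://durager.cz/'}, {'url': 'https:andersrice.com'}]]},
    index=[1, 2, 3, 4])

def last_url_has_https(cell):
    # hypothetical helper: NaN (a float) and empty lists count as "no match"
    if not isinstance(cell, list) or not cell:
        return False
    # assumption: each list item is a dict; a missing 'url' key is treated as ''
    return 'https' in cell[-1].get('url', '')

mask = df['url'].apply(last_url_has_https)  # boolean Series, NaN rows are False
selected = df[mask]                         # keep only the matching rows
print(selected)
```

The original column is left unchanged, so the missing values are still visible as NaN afterwards.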
First solution, which leaves the NaN values untouched and stores the result in a new column a:
mask = df.url.notnull()
# apply the check only to the rows that actually hold a list
df.loc[mask, 'a'] = df.loc[mask, 'url'].apply(lambda x: 'https' in x[-1]['url'])
# rows that were NaN in url get False
df['a'] = df['a'].fillna(False)
print (df)
url a
1 [{'url': 'http://bhandarkarscollegekdp.org/'}] False
2 NaN False
3 [{'url': 'http://cateringinyourhome.com/'}] False
4 [{'url': 'http://durager.cz/'}, {'url': 'https... True
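As for why the script in the question returns False everywhere: df['url'].str[-1] gives back the last dict of each list, not a string, and str.contains most likely cannot match against a dict, so it produces NaN, which na=False then turns into False. A possible fix that stays close to the original one-liner is to pull the 'url' value out of the dict first. This sketch assumes a pandas version in which Series.str.get accepts a dict key.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'url': [[{'url': 'http://bhandarkarscollegekdp.org/'}],
             np.nan,
             [{'url': 'http://cateringinyourhome.com/'}],
             [{'url': 'http://durager.cz/'}, {'url': 'https:andersrice.com'}]]},
    index=[1, 2, 3, 4])

last = df['url'].str[-1]         # last dict of each list, NaN stays NaN
last_url = last.str.get('url')   # the string stored under the 'url' key
mask = last_url.str.contains('https', na=False)  # NaN rows become False

print(df[mask])
```

Here the missing values are skipped by na=False, so no replacement of NaN is needed.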