I'm trying to send a request to a website and then scrape the text out of the page. However, I get this warning:
SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
_df['text']=remove_one_words_from_list(website_text,_df['language']).copy()
I already tried .copy(), but the issue remains. With _df.loc I get a "Too many indexers" error instead. It's important to note that I call the get_the_text2 method in a for loop and pass it one row of the DataFrame each time.
def get_the_text2(_df):
    '''
    Send a request a second time with a different method to receive the text of the articles.

    Parameters
    ----------
    _df : DataFrame

    Returns
    -------
    only the text contained in the url
    '''
    df['text']=''
    # for k,i in enumerate(_df['url']):
    if str(_df):
        website_text=list()
        print(_df)
        #time.sleep(2)
        try:
            response=requests.get(_df['url'],headers={"User-Agent" : "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36"})
            status_code=response.status_code
            soup = BeautifulSoup(response.content, 'html.parser')
            if len(website_text)<=10:
                website_text=list()
                if soup.article:
                    # raw string so that \d is not treated as a string escape
                    if soup.article.find_all(['p',re.compile(r"^h\d{1}")]):
                        for data in soup.article.find_all(['p',re.compile(r"^h\d{1}")]):
                            website_text.append(data.get_text(strip=True))
                        #df.at[k,'text']=remove_one_words_from_list(website_text,df.at[k,'language'])
                        _df['text']=remove_one_words_from_list(website_text,_df['language']).copy()
                        print('****ARTICLE P & H{1}****',remove_one_words_from_list(website_text,_df['language']))
        # ... (rest of the function omitted)
for _index,item in enumerate(df['status_code']):
    if item !=200:
        get_the_text2(df.loc[_index])
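For context, `df.loc[_index]` hands the function a single row as a Series, so an assignment to `_df['text']` inside the function only touches that temporary object, which is what the warning is about. Below is a minimal sketch of one possible way around it, assuming the function is changed to return the text and the caller writes it into the original frame with `df.at`. `fetch_text` is a hypothetical stand-in for the scraping code, and the sample data is made up.

```python
import pandas as pd

# Made-up sample data standing in for the real DataFrame.
df = pd.DataFrame({
    "url": ["https://example.com/a", "https://example.com/b"],
    "language": ["en", "de"],
    "status_code": [200, 404],
})
df["text"] = ""


def fetch_text(row):
    """Hypothetical stand-in for the scraping logic: returns text, mutates nothing."""
    return f"text for {row['url']} ({row['language']})"


# Iterate over the index labels so the same label is used for reading and writing.
for idx in df.index:
    if df.at[idx, "status_code"] != 200:
        # Write into the original frame, not into the row that was passed around.
        df.at[idx, "text"] = fetch_text(df.loc[idx])

print(df[["url", "text"]])
```

Iterating over `df.index` also avoids mixing the positions produced by `enumerate` with the label-based `.loc`, which only line up when the frame has the default 0..n-1 index.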
Just to show the error message with .loc, here is my code:
_df['text']=remove_one_words_from_list(website_text,_df.loc[:,'language']).copy()
Error message:
IndexingError Traceback (most recent call last)
Cell In[14], line 102
100 for _index,item in enumerate(df['status_code']):
101 if item !=200:
--> 102 get_the_text2(df.loc[_index])
File c:\Users\\anaconda3\envs\GDELT\Lib\site-packages\pandas\core\indexing.py:939, in _LocationIndexer._validate_key_length(self, key)
937 raise IndexingError(_one_ellipsis_message)
938 return self._validate_key_length(key)
--> 939 raise IndexingError("Too many indexers")
940 return key
IndexingError: Too many indexers
I found out that if I use .loc['language'] it doesn't raise an error, although the SettingWithCopyWarning is still there.
_df['text']=remove_one_words_from_list(website_text,_df.loc['language']).copy()
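The difference between the two `.loc` calls can be reproduced in isolation. This is a small sketch with made-up data, assuming pandas 1.5 or later (where `IndexingError` is exposed as `pd.errors.IndexingError`): a single row taken with `df.loc[label]` is a one-dimensional Series, so it accepts one indexer, not two.

```python
import pandas as pd

# Made-up one-row frame, only for illustration.
df = pd.DataFrame({"url": ["https://example.com/a"], "language": ["en"]})

row = df.loc[0]              # one row -> a Series indexed by the column names
print(type(row).__name__)    # Series
print(row.loc["language"])   # en

message = None
try:
    row.loc[:, "language"]   # two indexers on a 1-D object
except pd.errors.IndexingError as err:
    message = str(err)

print("IndexingError:", message)
```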
According to this post I know why it happens, but I don't know how to fix it.
I tried assigning the new value to a new DataFrame, and that did the job.
_df2=pd.DataFrame(columns=list(df.columns)) # to get the columns from the original Dataframe
_df2['text']=remove_one_words_from_list(website_text,_df.loc['language']).copy()