python, pandas, dataframe, view, copy

SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation:


I'm trying to send a request to a website and then scrape the text out of the page. However, I get this warning:

SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

_df['text']=remove_one_words_from_list(website_text,_df['language']).copy()

I already tried .copy(), but the issue remains, and with _df.loc I get a "Too many indexers" error. It's important to note that I call the get_the_text2 method in a for loop and pass it one row of the DataFrame each time.

import re

import requests
from bs4 import BeautifulSoup

def get_the_text2(_df):
    '''
    Send a request a second time, with a different method, to receive the text of the articles.

    Parameters
    ----------
    _df : DataFrame

    Returns
    -------
    only the text contained in the url
    '''
    df['text']=''  # note: this resets the 'text' column of the global df on every call
    # for k,i in enumerate(_df['url']):
    if str(_df):
        website_text=list()
        print(_df)
        #time.sleep(2)
        try:
            response=requests.get(_df['url'],headers={"User-Agent" : "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36"})
            status_code=response.status_code
            soup = BeautifulSoup(response.content, 'html.parser')

            if len(website_text)<=10:
                website_text=list()
                if soup.article:
                    # collect <p> tags and headings (h1, h2, ...) inside <article>
                    if soup.article.find_all(['p',re.compile(r"^h\d{1}")]):
                        for data in soup.article.find_all(['p',re.compile(r"^h\d{1}")]):
                            website_text.append(data.get_text(strip=True))
                        #df.at[k,'text']=remove_one_words_from_list(website_text,df.at[k,'language'])
                        _df['text']=remove_one_words_from_list(website_text,_df['language']).copy()
                        print('****ARTICLE P & H{1}****',remove_one_words_from_list(website_text,_df['language']))
        except requests.RequestException as exc:
            # assumed handler: the except clause is not shown in the post
            print('request failed:', exc)

for _index,item in enumerate(df['status_code']):
  if item !=200:
    get_the_text2(df.loc[_index])

EDIT:

Just to show the error message with .loc:

My code:

_df['text']=remove_one_words_from_list(website_text,_df.loc[:,'language']).copy()

error message:

IndexingError                             Traceback (most recent call last)
Cell In[14], line 102
    100 for _index,item in enumerate(df['status_code']):
    101   if item !=200:
--> 102     get_the_text2(df.loc[_index])

File c:\Users\\anaconda3\envs\GDELT\Lib\site-packages\pandas\core\indexing.py:939, in _LocationIndexer._validate_key_length(self, key)
    937             raise IndexingError(_one_ellipsis_message)
    938         return self._validate_key_length(key)
--> 939     raise IndexingError("Too many indexers")
    940 return key

IndexingError: Too many indexers
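
To illustrate where "Too many indexers" likely comes from: df.loc[_index] with a single label returns one row as a Series, which has only one axis, so a two-part key such as .loc[:, 'language'] has one indexer too many. A minimal sketch, assuming a small made-up DataFrame (the URLs and values are placeholders, not the data from the question) and pandas 1.5 or newer for pandas.errors.IndexingError:

```python
import pandas as pd
from pandas.errors import IndexingError

# Made-up data standing in for the DataFrame in the question
df = pd.DataFrame({
    "url": ["https://example.com/a", "https://example.com/b"],
    "language": ["en", "de"],
    "status_code": [200, 404],
})

row = df.loc[1]                  # a single label selects one row as a Series
row_type = type(row).__name__    # "Series", not "DataFrame"
row_ndim = row.ndim              # a Series has exactly one axis

try:
    row.loc[:, "language"]       # two indexers on a one-dimensional object
    two_indexers_failed = False
except IndexingError:
    two_indexers_failed = True

language = row.loc["language"]   # one indexer is enough on a Series
```

This is consistent with .loc['language'] working on the row while .loc[:, 'language'] does not.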

EDIT 2

I found out that .loc['language'] doesn't throw an error, although the SettingWithCopyWarning is still there.

_df['text']=remove_one_words_from_list(website_text,_df.loc['language']).copy()

According to this post, I know why it happens, but I don't know how to fix it.
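
As a rough illustration of why the warning shows up here: the row passed into get_the_text2 is a separate Series built from the DataFrame, so assigning to it does not write back to df, and .copy() on the right-hand side does not change that. The sketch below assumes a made-up DataFrame with mixed column types; the warning itself is typically emitted by pandas versions before 3.0.

```python
import pandas as pd

# Made-up data standing in for the DataFrame in the question
df = pd.DataFrame({
    "url": ["https://example.com/a", "https://example.com/b"],
    "language": ["en", "de"],
    "status_code": [200, 404],
})
df["text"] = ""

row = df.loc[1]                # a new Series built from row 1 (mixed dtypes force a copy)
row["text"] = "scraped text"   # pandas before 3.0 typically warns here

changed_in_row = row["text"]
changed_in_df = df.loc[1, "text"]   # the original DataFrame is untouched
```

So the warning points at a real problem in this case: the assignment lands in the row object, not in df.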


Solution

  • I assigned the new value to a new DataFrame, and this did the job.

    _df2=pd.DataFrame(columns=list(df.columns)) # to get the columns from the original Dataframe
    _df2['text']=remove_one_words_from_list(website_text,_df.loc['language']).copy()
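
An alternative worth considering, in the spirit of the commented-out df.at line in the question: let the helper return the text and write it into the original DataFrame by index label, so nothing is assigned to a row copy. This is only a sketch under assumptions: extract_text is a hypothetical stand-in for the request, BeautifulSoup parsing and remove_one_words_from_list steps, the data is made up, and the text is kept as a single string per cell.

```python
import pandas as pd

def extract_text(url, language):
    # Hypothetical stand-in: the real version would fetch the page,
    # parse it and clean the words; here it just returns a marker string.
    return f"text of {url} ({language})"

# Made-up data standing in for the DataFrame in the question
df = pd.DataFrame({
    "url": ["https://example.com/a", "https://example.com/b"],
    "language": ["en", "de"],
    "status_code": [200, 404],
})
df["text"] = ""   # create the column once, outside the helper

# Visit only the rows whose first request did not return 200
for idx in df.index[df["status_code"] != 200]:
    # df.at addresses one cell of the original DataFrame by row label and column name
    df.at[idx, "text"] = extract_text(df.at[idx, "url"], df.at[idx, "language"])
```

Because the assignment targets df directly, the scraped text ends up in the original DataFrame and no copy of a slice is involved.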