I have extracted the tweets and the users' locations, along with other essential tweet information. The next step is extracting the water level data: if a tweet contains a number followed by 'm' or 'meter', that number could be treated as the water level.
A sample of the dataset is below ('text' is the column holding the extracted tweets, and 'df' is the data frame that contains it):
text
there is 12m water here
I saw a 5m wave height
I have tried to use the following code:
length = len(df['text'])
for i in range(length):
    if df.loc[df['text'].str.contains('%d'+ 'm')] or if df.loc[df['text'].str.contains('%d'+ 'meter')] :
        df.loc[df['remarks']]== 'YES'
    else:
        df.loc[df['remarks']] == 'NO'
The error I get is:
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I understand that '%d' is used for digits, but I am not an expert in Python. Can anyone help me fix the code above?
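The ValueError comes from `if` needing a single True/False value while `df.loc[...]` returns a whole DataFrame. Also, '%d' is a string-formatting placeholder, not a pattern; in a regular expression a digit is written `\d`. You should use regex, for example: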
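import re

txt = "The rain is 12m"
# One or more digits, optional whitespace, then "m" or "meter(s)" as a whole word.
# The trailing \b keeps "12 minutes" or "12mm" from matching.
x = re.findall(r"\d+\s*(?:meters?|m)\b", txt)
print(x)
if x:
    print("Yes, there is at least one match!")
else:
    print("No match")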
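To apply the same idea to the whole `text` column without a loop, something like the following might work. It is only a sketch: the third sample row, the `water_level_m` column name, and the exact pattern (which also accepts decimals and the "metre" spelling) are my own assumptions, not something from the question.

```python
import re

import numpy as np
import pandas as pd

# First two rows come from the question; the third is an added negative case.
df = pd.DataFrame({"text": [
    "there is 12m water here",
    "I saw a 5m wave height",
    "no measurement in this tweet",
]})

# A number (optionally with decimals), optional whitespace, then
# m / meter(s) / metre(s) as a whole word. The pattern is an assumption;
# adjust it to whatever your tweets actually look like.
pattern = r"(\d+(?:\.\d+)?)\s*(?:met(?:er|re)s?|m)\b"

# str.extract returns the first captured number per row, NaN when nothing matches.
# "water_level_m" is a hypothetical column name.
df["water_level_m"] = (
    df["text"].str.extract(pattern, flags=re.IGNORECASE, expand=False).astype(float)
)

# Vectorised replacement for the if/else loop: YES where a level was found.
df["remarks"] = np.where(df["water_level_m"].notna(), "YES", "NO")

print(df)
```

This only keeps the first number found in each tweet, and it cannot tell whether "5m" really refers to a water level, so the results are worth spot-checking by hand.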