I can't figure out why this error is happening:
TypeError: expected string or bytes-like object
I am trying to count how many times 'Huffington Post' appears as a sponsor, using this code:
polls = list(set(covid_approval_polls["sponsor"]))
Huff_Post_regexp = r"\bHuffington Post\b"
Huff_Post = [
    approval
    for approval in polls
    if re.search(Huff_Post_regexp, approval) is not None
]
The dataframe looks like:
   start_date    end_date         pollster    sponsor  sample_size population  \
0  2020-02-02  2020-02-04           YouGov  Economist       1500.0          a
1  2020-02-02  2020-02-04           YouGov  Economist        376.0          a
2  2020-02-02  2020-02-04           YouGov  Economist        523.0          a
3  2020-02-02  2020-02-04           YouGov  Economist        599.0          a
4  2020-02-07  2020-02-09  Morning Consult        NaN       2200.0          a
The arguments to re.search must be strings or bytes-like objects. Your "sponsor" column contains a NaN, which Python treats as a float, so in that iteration approval is neither a string nor bytes. That is why you get the TypeError.
Write this code to see that in action:
for item in list(set(covid_approval_polls["sponsor"])):
    print(item, type(item))
To fix this, you can either skip those values by checking pd.isna() before calling re.search, or replace the NaNs in the DataFrame with the empty string "".
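Here is a minimal sketch of both fixes. The small DataFrame is a hypothetical stand-in for covid_approval_polls (only the sponsor column, with made-up rows), so adapt it to your real data. The last line is an extra suggestion rather than part of the fixes above: since set() removes duplicates, the list can never tell you how often a sponsor appears, so if you want a count, something like Series.str.contains with na=False may be closer to what you need.

```python
import re

import numpy as np
import pandas as pd

# Hypothetical stand-in for covid_approval_polls (sponsor column only)
covid_approval_polls = pd.DataFrame(
    {"sponsor": ["Economist", "Huffington Post", np.nan, "Huffington Post", np.nan]}
)

Huff_Post_regexp = r"\bHuffington Post\b"

# Fix 1: skip missing values before they reach re.search
polls = list(set(covid_approval_polls["sponsor"]))
Huff_Post = [
    approval
    for approval in polls
    if not pd.isna(approval) and re.search(Huff_Post_regexp, approval) is not None
]

# Fix 2: replace NaN with "" so every value is a string
polls_filled = list(set(covid_approval_polls["sponsor"].fillna("")))
Huff_Post_filled = [
    approval
    for approval in polls_filled
    if re.search(Huff_Post_regexp, approval) is not None
]

# Extra (assumes the goal is a count of matching rows):
# na=False makes missing sponsors count as non-matches
n_huff_post = int(
    covid_approval_polls["sponsor"].str.contains(Huff_Post_regexp, na=False).sum()
)
```

Both list comprehensions give the same unique sponsors. Only the last line counts rows, because it works on the original column rather than on the de-duplicated set.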