I have a dataset of Reddit account names and the messages those accounts have written, along with the time and subreddit. Like this:
For my purpose, I need an array of [account name, all the messages that account has written], because each body (look at the picture) holds only one message, while the same author appears in many rows.
So I have written this program:
import pandas as pd

test_data = pd.read_csv("addres/test_data.csv", encoding="utf8")
test = test_data[['author', 'body']]
lista = [list(x) for x in test.values]
test=dict()
for i in range(1107946):
    if lista[i][0] in test:
        test[lista[i][0]].append(lista[i][1])
    else:
        test[lista[i][0]] = [lista[i][1]]
This gives me what I want: test["Name"] returns all the messages of that person. For example:
test["ZenDragon"]
['At 7680 by 4320 with 64x AA, right?', 'Wrong subreddit for this kind of post, but /r/frugal and /r/lifeprotips might be interested.', 'This is something GravityBox can do. (a module for XPosed Framework)',etc]
Now I want to join all these messages into a single string per author. For example: ["message1","message2","message3",etc..] -> ["message 1 message 2 etc..."]. I have tried this:
for i in test.keys():
    X.append(" ".join(line.strip() for line in test[i]))
But I get this error: 'float' object has no attribute 'strip'
I don't think I have any float objects, so where does it come from?
Well, obviously there exists a key i in your test dictionary whose associated value is a list of elements, at least one of which is not a string but a float.
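One common way for this to happen, assuming the data was loaded with pd.read_csv as in the question, is a row with an empty body: pandas represents missing values as NaN, which is a float. Whether that is the cause here is only a guess. The sketch below reproduces the same error with a small made-up dictionary (the names and messages are illustrative, not taken from your data):

```python
# Made-up data: float("nan") stands in for a missing message.
sample = {
    "alice": ["hello ", " world"],
    "bob": ["hi", float("nan")],
}

errors = {}
for author, messages in sample.items():
    try:
        " ".join(line.strip() for line in messages)
    except AttributeError as exc:
        # Only authors whose list contains a non-string end up here.
        errors[author] = str(exc)

print(errors)
```

Only "bob" ends up in errors, because his list contains a value that is not a string.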
You can wrap your code in a try/except to help narrow down the cause of your problem:
for i in test.keys():
    try:
        for line in test[i]:
            line.strip()
    except AttributeError:
        # Print the key and the first value that is not a string
        print(i)
        print(line)
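Once you know which entries are not strings, one possible fix is to skip them while joining. This is a minimal sketch, assuming that dropping the non-string values is acceptable for your use case; the test dictionary here is a hypothetical stand-in for yours, and joined is a name introduced only for illustration:

```python
# Hypothetical stand-in for the `test` dictionary from the question;
# float("nan") mimics a missing message.
test = {
    "alice": ["hello ", " world"],
    "bob": ["hi", float("nan")],
}

# Join each author's messages, keeping only values that are strings.
joined = {
    author: " ".join(m.strip() for m in messages if isinstance(m, str))
    for author, messages in test.items()
}

print(joined["alice"])
print(joined["bob"])
```

If you would rather keep a placeholder for the missing messages, you could convert every value with str() instead of filtering, at the cost of literal "nan" text appearing in the result.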