I have a pandas Series like this:
0 [['word1', 527], ['word2', 708]]
1 [['word3', 976], ['word1', 980], ['word3',...
where each value is a string. That is, the whole cell is a single str, for example: "[['word1', 527], ['word2', 708]]"
I want an array or a counter that is of the form
word1 number1
word2 number2
.
.
.
where number_i is the average of all the numeric values associated with word_i across the [word, value] pairs in the pandas Series.
I tried extracting and parsing each string to get the values and then, whenever a value is encountered, averaging it with the previous value. Is there a more efficient way?
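For reference, the manual approach described above can be sketched in plain Python. One caveat: repeatedly averaging a new value with the previous average gives the true mean only when a word occurs at most twice (for 1, 2, 3 it yields 2.25 rather than 2), so this sketch keeps a running sum and count per word instead. The names `mean_per_word` and `rows` are illustrative, not part of the original post.

```python
import ast
from collections import defaultdict

def mean_per_word(rows):
    """Average the numbers attached to each word across all rows."""
    totals = defaultdict(float)  # word -> sum of its numbers
    counts = defaultdict(int)    # word -> how many numbers were seen
    for row in rows:
        # Each row is a string such as "[['word1', 527], ['word2', 708]]"
        for word, number in ast.literal_eval(row):
            totals[word] += number
            counts[word] += 1
    return {word: totals[word] / counts[word] for word in totals}

rows = [
    "[['word1', 527], ['word2', 708]]",
    "[['word3', 976], ['word1', 980], ['word3', 100]]",
]
means = mean_per_word(rows)
```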
Here is one way:
import ast
import pandas as pd

series = pd.Series([
    "[['word1', 527], ['word2', 708]]",
    "[['word3', 976], ['word1', 980], ['word3', 100]]"
])

out = (
    series
    .apply(ast.literal_eval)  # Parse each string into a list of [word, number] lists
    .explode()                # Place each [word, number] pair on its own row
    .apply(pd.Series)         # Convert to a dataframe with 2 columns: 0 (word) and 1 (number)
    .groupby(0)[1]            # Group by word, select the number column
    .mean()                   # Take the mean per word
)
out:
0
word1 753.5
word2 708.0
word3 538.0
Name: 1, dtype: float64
but I would seriously reconsider your data formatting...
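As a rough illustration of what a friendlier format could look like, the sketch below parses the strings once and keeps the data in "long" form, one row per (word, number) pair. The column names `word` and `number` and the variable `long_df` are assumptions made for this example. Building the dataframe directly from a flat list also sidesteps `.apply(pd.Series)`, which creates one Series object per row.

```python
import ast
import pandas as pd

series = pd.Series([
    "[['word1', 527], ['word2', 708]]",
    "[['word3', 976], ['word1', 980], ['word3', 100]]",
])

# Parse each cell once and flatten everything into a single list of pairs.
pairs = [pair for cell in series for pair in ast.literal_eval(cell)]

# Long format: one row per pair, with named columns (names are illustrative).
long_df = pd.DataFrame(pairs, columns=["word", "number"])

# With the data in this shape, the per-word mean is a plain groupby.
means = long_df.groupby("word")["number"].mean()
```

If the data can be stored in this shape from the start, the parsing step disappears entirely and only the final `groupby` is needed.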