
pandas series average of values


I have a pandas series like this:

0         [['word1', 527], ['word2', 708]]

1         [['word3', 976], ['word1', 980], ['word3',...

Each value is a string, not a list. For example, the whole first entry is the str "[['word1', 527], ['word2', 708]]".

I want an array or a counter of the form

word1 number1
word2 number2
...

where number_i is the average of all the numeric values paired with word_i across the whole series.

I tried extracting and parsing the strings to get the values and then, as each value is encountered, averaging it with the previous value. Is there a more efficient way?
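For reference, the manual approach described above might look roughly like the following sketch. It is an illustration only: `average_by_word` is a hypothetical helper name, and it keeps a running sum and count per word rather than averaging each new value with the previous average, because the latter only gives the true mean when a word occurs at most twice.

```python
import ast
from collections import defaultdict

import pandas as pd


def average_by_word(series):
    """Return {word: mean of all values paired with that word}."""
    totals = defaultdict(float)  # running sum per word
    counts = defaultdict(int)    # number of values seen per word
    for raw in series:
        # Parse one string into a list of [word, value] pairs.
        for word, value in ast.literal_eval(raw):
            totals[word] += value
            counts[word] += 1
    return {word: totals[word] / counts[word] for word in totals}


series = pd.Series([
    "[['word1', 527], ['word2', 708]]",
    "[['word3', 976], ['word1', 980], ['word3', 100]]",
])

averages = average_by_word(series)
print(averages)
```

The result is a plain dict, which is close to the "counter" form asked for.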


Solution

  • Here is one way:

    import ast
    import pandas as pd
    
    series = pd.Series([
        "[['word1', 527], ['word2', 708]]",
        "[['word3', 976], ['word1', 980], ['word3', 100]]"
    ])
    
    out = (
        series
        .apply(ast.literal_eval) # Parse each string into a list of [word, number] pairs
        .explode()               # Put each [word, number] pair on its own row
        .apply(pd.Series)        # Expand the pairs into a DataFrame with 2 columns: 0 (word) and 1 (number)
        .groupby(0)[1]           # Group by word, compute on number
        .mean()                  # Take the mean
    )
    

    out:

    0
    word1    753.5
    word2    708.0
    word3    538.0
    Name: 1, dtype: float64
    

    That said, I would seriously reconsider your data format: storing lists as strings means re-parsing them every time you use them.
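One possible reading of that advice is to parse the strings once and keep the data in long form, one row per (word, value) pair, so that later aggregations need no string parsing. A minimal sketch under that assumption; the column names `word` and `value` are illustrative and not part of the original data:

```python
import ast

import pandas as pd

series = pd.Series([
    "[['word1', 527], ['word2', 708]]",
    "[['word3', 976], ['word1', 980], ['word3', 100]]",
])

# Parse each string once and flatten into one [word, value] row per pair.
rows = [pair for raw in series for pair in ast.literal_eval(raw)]
tidy = pd.DataFrame(rows, columns=["word", "value"])  # hypothetical column names

# With the data in long form, the average is a plain groupby.
means = tidy.groupby("word")["value"].mean()
print(means.to_dict())
```

Building the DataFrame directly from the flattened list also avoids the row-by-row `.apply(pd.Series)` step, which may matter on a large series; it is worth timing on real data before relying on that.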