Search code examples
pythonpandasdataframenumber-formatting

how could I create a mean value column with the multiple values column


I've created a new column with str.findall() method because the original value is a sentence, like "Preis ab 23,48 EUR pro Person/Tag" so I extracted the number ['23,48'] as a new value, but some sentences contain two prices, Like "from €83 to €120", so I will get the new value ['83' , '120'], now I want to replace the row with two values with the average, but I can't find a solution plan

The extract method is as follows:

df['pricerange_All'] = df.pricerange.str.findall(r'([0-9,.]+(?:\.[0-9,.]+)?)')

and the dataframe which looks like this:

       pricerange_All 
0         ['16.51']
1         ['17,61']
2         ['16,96']
3         ['13,70']
4         ['17,85']
.
100       ['690.94', '690.76']

as you can see some of the values have two prices

I've try the solution :

df2= pd.DataFrame([pd.Series(x) for x in df.pricerange_All])
df2.columns = ['price_{}'.format(x+1) for x in df2.columns]

but it didn't work

How to separate the column or create a new column that save the mean value of each row? like:

         mean_price    
0         16.51 
1         17,61 
2         16,96   
3         13,70   
4         17,85  
.           .
100       690.85


I've racked my brains and really need to solve this as soon as possible. Thanks.

Really appreciate for the reply.


Solution

  • If you do in fact want every comma and decimal to remain the same, then the final column must be of a string type because you can't have floats and strings together.

    You could use pd.apply and convert the elements of any lists longer than 1 to floats, take the mean, and convert the result back to a string type. For example:

    df['pricerange_All_new'] = df['pricerange_All'].apply(lambda row: row[0] if len(row) == 1 else str(np.mean([float(p.replace(',','.')) for p in row])))
    

    Result:

    df
         pricerange_All pricerange_All_new
    0           [16.51]              16.51
    1           [17,61]              17,61
    2           [16,96]              16,96
    3           [13,70]              13,70
    4           [17,85]              17,85
    5  [690.94, 690.76]             690.85