I've created a new column with str.findall() method because the original value is a sentence, like "Preis ab 23,48 EUR pro Person/Tag" so I extracted the number ['23,48'] as a new value, but some sentences contain two prices, Like "from €83 to €120", so I will get the new value ['83' , '120'], now I want to replace the row with two values with the average, but I can't find a solution plan
The extract method is as follows:
df['pricerange_All'] = df.pricerange.str.findall(r'([0-9,.]+(?:\.[0-9,.]+)?)')
and the dataframe which looks like this:
pricerange_All
0 ['16.51']
1 ['17,61']
2 ['16,96']
3 ['13,70']
4 ['17,85']
.
100 ['690.94', '690.76']
as you can see some of the values have two prices
I've try the solution :
df2= pd.DataFrame([pd.Series(x) for x in df.pricerange_All])
df2.columns = ['price_{}'.format(x+1) for x in df2.columns]
but it didn't work
How to separate the column or create a new column that save the mean value of each row? like:
mean_price
0 16.51
1 17,61
2 16,96
3 13,70
4 17,85
. .
100 690.85
I've racked my brains and really need to solve this as soon as possible. Thanks.
Really appreciate for the reply.
If you do in fact want every comma and decimal to remain the same, then the final column must be of a string type because you can't have floats and strings together.
You could use pd.apply
and convert the elements of any lists longer than 1 to floats, take the mean, and convert the result back to a string type. For example:
df['pricerange_All_new'] = df['pricerange_All'].apply(lambda row: row[0] if len(row) == 1 else str(np.mean([float(p.replace(',','.')) for p in row])))
Result:
df
pricerange_All pricerange_All_new
0 [16.51] 16.51
1 [17,61] 17,61
2 [16,96] 16,96
3 [13,70] 13,70
4 [17,85] 17,85
5 [690.94, 690.76] 690.85