I am trying to add another column that will contain combination of two columns (Total & percentage) into a result column(labels_value) which look like: (Total) percentage%
.
Basically to wrap bracket
strings on Total
column and add %
string at the end of combination of these two columns.
df
import polars as pl
so_df = pl.DataFrame(
[{'Flag': 'Outof Range',
'Category': 'Thyroid',
'len': 7,
'Total': 21,
'percentage': 33.33,
'value': 33.33},
{'Flag': 'Outof Range',
'Category': 'Inflammatory Marker',
'len': 2,
'Total': 8,
'percentage': 25.0,
'value': 25.0},
{'Flag': 'Outof Range',
'Category': 'Lipid',
'len': 12,
'Total': 63,
'percentage': 19.05,
'value': 19.05},
{'Flag': 'Outof Range',
'Category': 'LFT',
'len': 14,
'Total': 87,
'percentage': 16.09,
'value': 16.09},
{'Flag': 'Outof Range',
'Category': 'DLC',
'len': 11,
'Total': 126,
'percentage': 8.73,
'value': 8.73},
{'Flag': 'Outof Range',
'Category': 'Vitamin',
'len': 1,
'Total': 14,
'percentage': 7.14,
'value': 7.14},
{'Flag': 'Outof Range',
'Category': 'CBC',
'len': 2,
'Total': 45,
'percentage': 4.44,
'value': 4.44},
{'Flag': 'Outof Range',
'Category': 'KFT',
'len': 2,
'Total': 56,
'percentage': 3.57,
'value': 3.57},
{'Flag': 'Outof Range',
'Category': 'Urine Examination',
'len': 1,
'Total': 28,
'percentage': 3.57,
'value': 3.57},
{'Flag': 'Within Range',
'Category': 'Thyroid',
'len': 14,
'Total': 21,
'percentage': 66.67,
'value': -66.67},
{'Flag': 'Within Range',
'Category': 'Inflammatory Marker',
'len': 6,
'Total': 8,
'percentage': 75.0,
'value': -75.0},
{'Flag': 'Within Range',
'Category': 'Lipid',
'len': 51,
'Total': 63,
'percentage': 80.95,
'value': -80.95},
{'Flag': 'Within Range',
'Category': 'LFT',
'len': 73,
'Total': 87,
'percentage': 83.91,
'value': -83.91},
{'Flag': 'Within Range',
'Category': 'DLC',
'len': 115,
'Total': 126,
'percentage': 91.27,
'value': -91.27},
{'Flag': 'Within Range',
'Category': 'Vitamin',
'len': 13,
'Total': 14,
'percentage': 92.86,
'value': -92.86},
{'Flag': 'Within Range',
'Category': 'CBC',
'len': 43,
'Total': 45,
'percentage': 95.56,
'value': -95.56},
{'Flag': 'Within Range',
'Category': 'KFT',
'len': 54,
'Total': 56,
'percentage': 96.43,
'value': -96.43},
{'Flag': 'Within Range',
'Category': 'Urine Examination',
'len': 27,
'Total': 28,
'percentage': 96.43,
'value': -96.43},
{'Flag': 'Within Range',
'Category': 'Anemia',
'len': 38,
'Total': 38,
'percentage': 100.0,
'value': -100.0},
{'Flag': 'Within Range',
'Category': 'Diabetes',
'len': 22,
'Total': 22,
'percentage': 100.0,
'value': -100.0},
{'Flag': 'Within Range',
'Category': 'Electrolyte',
'len': 46,
'Total': 46,
'percentage': 100.0,
'value': -100.0}]
)
shape: (21, 6)
Flag Category len Total percentage value
str str i64 i64 f64 f64
"Outof Range" "Thyroid" 7 21 33.33 33.33
"Outof Range" "Inflammatory Marker" 2 8 25.0 25.0
"Outof Range" "Lipid" 12 63 19.05 19.05
"Outof Range" "LFT" 14 87 16.09 16.09
"Outof Range" "DLC" 11 126 8.73 8.73
… … … … … …
"Within Range" "KFT" 54 56 96.43 -96.43
"Within Range" "Urine Examination" 27 28 96.43 -96.43
"Within Range" "Anemia" 38 38 100.0 -100.0
"Within Range" "Diabetes" 22 22 100.0 -100.0
"Within Range" "Electrolyte" 46 46 100.0 -100.0
I have tried below three ways and none of them worked:
(so_df
# .with_columns(labels_value = "("+str(pl.col("Total"))+") "+str(pl.col("percentage"))+"%")
# .with_columns(labels_value = "".join(["(",str(pl.col("Total")),") ",str(pl.col("percentage")),"%"]))
# .with_columns(labels_value =pl.concat_str([pl.col("Total"),pl.col("percentage")])))
Desired Result would be like:
(21) 33.33% # 1st row of labels_value column
(8) 25%
(63) 19.05%
.
.
Here are a few options:
so_df.with_columns(
labels_value_1=pl.format("({}) {}%", "Total", "percentage"),
labels_value_2=pl.concat_str(
pl.lit("("), "Total", pl.lit(") "), "percentage", pl.lit("%")
),
labels_value_3=(
"("
+ pl.col("Total").cast(pl.String)
+ ") "
+ pl.col("percentage").cast(pl.String)
+ "%"
),
)
# Only Total, percentage and outputs shown for brevity
# ┌───────┬────────────┬────────────────┬────────────────┬────────────────┐
# │ Total ┆ percentage ┆ labels_value_1 ┆ labels_value_2 ┆ labels_value_3 │
# │ --- ┆ --- ┆ --- ┆ --- ┆ --- │
# │ i64 ┆ f64 ┆ str ┆ str ┆ str │
# ╞═══════╪════════════╪════════════════╪════════════════╪════════════════╡
# │ 21 ┆ 33.33 ┆ (21) 33.33% ┆ (21) 33.33% ┆ (21) 33.33% │
# │ 8 ┆ 25.0 ┆ (8) 25.0% ┆ (8) 25.0% ┆ (8) 25.0% │
# │ 63 ┆ 19.05 ┆ (63) 19.05% ┆ (63) 19.05% ┆ (63) 19.05% │
# │ 87 ┆ 16.09 ┆ (87) 16.09% ┆ (87) 16.09% ┆ (87) 16.09% │
# │ 126 ┆ 8.73 ┆ (126) 8.73% ┆ (126) 8.73% ┆ (126) 8.73% │
# │ … ┆ … ┆ … ┆ … ┆ … │
# │ 56 ┆ 96.43 ┆ (56) 96.43% ┆ (56) 96.43% ┆ (56) 96.43% │
# │ 28 ┆ 96.43 ┆ (28) 96.43% ┆ (28) 96.43% ┆ (28) 96.43% │
# │ 38 ┆ 100.0 ┆ (38) 100.0% ┆ (38) 100.0% ┆ (38) 100.0% │
# │ 22 ┆ 100.0 ┆ (22) 100.0% ┆ (22) 100.0% ┆ (22) 100.0% │
# │ 46 ┆ 100.0 ┆ (46) 100.0% ┆ (46) 100.0% ┆ (46) 100.0% │
# └───────┴────────────┴────────────────┴────────────────┴────────────────┘
pl.format
I would say is most idiomatic, and it does the casting to string for you (much like Python's f-strings). It uses {}
as a placeholder. This is what I'd personally recommend out of the options presented.
For concat_str
, you can add the parentheses and "%" with pl.lit
(strings are parsed as column names in this function). It will cast numeric columns to string for you.
Lastly, string concatenation with the +
operator does work, but you must cast the column(s) with Polars' .cast
method, rather than Python's builtin str
.
One thing to note is that row 2 of the output does differ slightly from your desired output. (8) 25%
(expected) vs (8) 25.0%
(actual). This is because the string representation of floating point columns in Polars contain at least one decimal place. If that is an issue, drop a comment and I will try to come up with a solution.