I have this dataframe:
import polars as pl

sample = pl.DataFrame({"equip": ['AmuletsMedals', 'Guns, CrossbowsOff-Hands', 'Melee WeaponsShieldsOff-Hands',
                                 'All Armor', 'Chest Armor', 'Shields', 'All WeaponsShieldsOff-Hands']})
print(sample)
shape: (7, 1)
┌───────────────────────────────┐
│ equip                         │
│ ---                           │
│ str                           │
╞═══════════════════════════════╡
│ AmuletsMedals                 │
│ Guns, CrossbowsOff-Hands      │
│ Melee WeaponsShieldsOff-Hands │
│ All Armor                     │
│ Chest Armor                   │
│ Shields                       │
│ All WeaponsShieldsOff-Hands   │
└───────────────────────────────┘
My aim is to insert a comma and a space wherever two words have been run together:
answer = pl.DataFrame({"equip": ['Amulets, Medals', 'Guns, Crossbows, Off-Hands', 'Melee Weapons, Shields, Off-Hands',
'All Armor', 'Chest Armor', 'Shields', 'All Weapons, Shields, Off-Hands']})
print(answer)
shape: (7, 1)
┌─────────────────────────────────────┐
│ equip                               │
│ ---                                 │
│ str                                 │
╞═════════════════════════════════════╡
│ Amulets, Medals                     │
│ Guns, Crossbows, Off-Hands          │
│ Melee Weapons, Shields, Off-Hand... │
│ All Armor                           │
│ Chest Armor                         │
│ Shields                             │
│ All Weapons, Shields, Off-Hands     │
└─────────────────────────────────────┘
I tried str.replace, but the replacement string is not treated as a pattern, so this inserts the literal text "[a-z], [A-Z]":
sample.with_columns(pl.col("equip").str.replace("[a-z][A-Z]", "[a-z], [A-Z]"))
I also tried a tip found on the Polars GitHub, but at each match it drops the last letter of the preceding word and the first letter of the following word:
sample.with_columns(pl.col("equip").str.replace("[a-z][A-Z]", ", "))
Any ideas?
Bonus question: I imagine the answer for the simple case would also solve the harder case, but in case it does not, here is the hard case:
I have another column that needs a slightly harder regex pattern than "[a-z][A-Z]", probably something like "[a-z][A-Z]|[a-z]+|[a-z][1-9]" (I have not worked out the exact regex yet). The aim is again just to put a comma between attributes:
sample2 = pl.DataFrame({"attributes": ['+10% Aether Damage+30 Defensive Ability16% Aether Resistance6% Less Damage from Aetherials6% Less Damage from Aether Corruptions',
'4-6 Aether Damage+25% Aether Damage10% Physical Damage converted to Aether DamageAether Tendril (Granted by Item)',
'2-8 Lightning Damage+25% Lightning Damage+25% Electrocute Damage10% Physical Damage converted to Lightning DamageEmpowered Lightning Nova (Granted by Item)',
'+10 Health Regenerated per Second+24 Armor20% Poison & Acid Resistance',
'+22 Defensive Ability10% Chance to Avoid Projectiles+18 Armor',
'+15 Physique+10% Shield Block ChanceShield Slam (Granted by Item)',
'+10% Chaos Damage+30 Defensive Ability16% Chaos Resistance6% Less Damage from Chthonics']})
You can use capture groups in your pattern and refer to them in the replacement as $1 and $2:
sample.with_columns(pl.col("equip").str.replace_all(r"([a-z])([A-Z])", "$1, $2"))
shape: (7, 1)
┌─────────────────────────────────────┐
│ equip                               │
│ ---                                 │
│ str                                 │
╞═════════════════════════════════════╡
│ Amulets, Medals                     │
│ Guns, Crossbows, Off-Hands          │
│ Melee Weapons, Shields, Off-Hand... │
│ All Armor                           │
│ Chest Armor                         │
│ Shields                             │
│ All Weapons, Shields, Off-Hands     │
└─────────────────────────────────────┘
You may also want to use the unicode classes \p{lower} and \p{upper} instead.
The regex syntax that polars supports is: https://docs.rs/regex/latest/regex/