I have to split the column game on the delimiter -
df:
      game                              home_team                  away_team
0     Bordj Menail – Hamra Annaba       Bordj Menail               Hamra Annaba
1     CA Batna – US Souf                CA Batna                   US Souf
2     Eulma – Ouargla                   Eulma                      Ouargla
1860  Bella Vista – Miramar             Bella Vista                Miramar
1861  U.A.N.L.- Tigres W – Club Leon W  U.A.N.L.- Tigres W         Club Leon W
1862  Queretaro – Toluca                Queretaro                  Toluca
0     Sport Recife - Imperatriz         Sport Recife - Imperatriz  None
1     ABC - America RN                  ABC - America RN           None
2     Frei Paulistano - Nautico         Frei Paulistano - Nautico  None
3     Botafogo PB - Confianca           Botafogo PB - Confianca    None
I am trying
df[team_cols] = df['game'].str.split(' – ', expand=True, n=1)
But this only splits some of the rows, as shown above: the last four rows are left unsplit.
When I look at the data in Excel, I can see that the delimiter "appears" differently,
e.g.
Sport Recife â Sport Recife ## Here delimiter is a special character?
Bordj Menail – Hamra Annaba
How can I split the values? And what is this behaviour?
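One way to find out which character actually sits between the two names is to print its Unicode code point and name. This is a minimal sketch using only the standard library; describe_separators is a hypothetical helper name. The last two lines rest on an assumption that may not match your file: that the data is UTF-8 and Excel decoded it as Windows-1252, which would turn an en dash into a sequence starting with "â".

```python
import unicodedata

def describe_separators(text):
    """Return (char, code point, Unicode name) for every dash-like character."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in text
        if unicodedata.category(ch) == "Pd"  # Pd = "Punctuation, dash"
    ]

samples = ["Bordj Menail – Hamra Annaba", "Sport Recife - Imperatriz"]
for sample in samples:
    print(sample, describe_separators(sample))

# Assumption: UTF-8 bytes of an en dash read back as Windows-1252 text
garbled = "–".encode("utf-8").decode("cp1252")
print(repr(garbled))
```

If the two groups of rows report different code points, a split on a single literal delimiter can only ever match one group.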
It is unclear what you mean, but I would do it this way:
import pandas as pd
data = {
    'game': [
        'Bordj Menail – Hamra Annaba',
        'CA Batna – US Souf',
        'Eulma – Ouargla',
        'Bella Vista – Miramar',
        'U.A.N.L.- Tigres W – Club Leon W',
        'Queretaro – Toluca',
        'Sport Recife - Imperatriz',
        'ABC - America RN',
        'Frei Paulistano - Nautico',
        'Botafogo PB - Confianca'
    ]
}
df = pd.DataFrame(data)
# Split the game column
pattern = r'\s*[-–â]\s*'
team_cols = ['home_team', 'away_team']
df[team_cols] = df['game'].str.split(pattern, expand=True, n=1)
# Print the result
print(df)
which gives
   game                              home_team        away_team
0  Bordj Menail – Hamra Annaba       Bordj Menail     Hamra Annaba
1  CA Batna – US Souf                CA Batna         US Souf
2  Eulma – Ouargla                   Eulma            Ouargla
3  Bella Vista – Miramar             Bella Vista      Miramar
4  U.A.N.L.- Tigres W – Club Leon W  U.A.N.L.         Tigres W – Club Leon W
5  Queretaro – Toluca                Queretaro        Toluca
6  Sport Recife - Imperatriz         Sport Recife     Imperatriz
7  ABC - America RN                  ABC              America RN
8  Frei Paulistano - Nautico         Frei Paulistano  Nautico
9  Botafogo PB - Confianca           Botafogo PB      Confianca
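Note that row 4 in the output above is split at the hyphen inside "U.A.N.L.-" rather than at the en dash, because \s* also matches zero spaces. A possible refinement, sketched here with the standard re module, is to require whitespace on both sides of the dash. It assumes a real separator is always surrounded by spaces and a hyphen inside a team name is not; split_game is a hypothetical helper name.

```python
import re

# Assumption: a real separator has whitespace on both sides, while a hyphen
# that belongs to a team name (e.g. "U.A.N.L.- Tigres W") does not.
SEPARATOR = re.compile(r"\s+[-–]\s+")

def split_game(game):
    """Split 'home - away' into (home, away); away is None if nothing matches."""
    parts = SEPARATOR.split(game, maxsplit=1)
    home = parts[0]
    away = parts[1] if len(parts) > 1 else None
    return home, away

games = [
    "U.A.N.L.- Tigres W – Club Leon W",
    "Sport Recife - Imperatriz",
    "Bordj Menail – Hamra Annaba",
]
for game in games:
    print(split_game(game))
```

The same pattern should also work in pandas as df['game'].str.split(r'\s+[-–]\s+', expand=True, n=1). Matching "â" in the pattern only helps if that character is really in the data; if it comes from a wrong encoding, fixing the encoding when the file is read is likely the cleaner fix.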