It seems the problem is not difficult, but somehow I am not able to make it work. My problem is as follows. I have a dataframe say as follows:
dfin
A B C
a 1 198q24
a 2 128q6
a 6 1456
b 7 67q22
b 1 56
c 3 451q2
d 11 1q789
So now what I want to do is as follows, whenever the script will encounter a 'q', it will split the values and create a separate column with values starting from 'q'. The part before 'q' will remain in the original ( or maybe can crate a new column). So my desired output should be as follows:
dfout
A B C D
a 1 198 q24
a 2 128 q6
a 6 1456
b 7 67 q22
b 1 56
c 3 451 q2
d 11 1 q789
So what I have tried till now is as follows:
dfout = dfin.replace('\q\d*', '', regex=True)
Its creating one column without q, but I am not able to create the column D and not working as expected.
Any help/ideas will help and be appreciated.
import pandas as pd
def get_input() -> pd.DataFrame:
csv_text = """
a 1 198q24
a 2 128q6
a 6 1456
b 7 67q22
b 1 56
c 3 451q2
d 11 1q789
""".strip()
return pd.DataFrame(map(str.split, csv_text.splitlines()), columns=["a", "b", "c"])
def split_on_q(df_in: pd.DataFrame) -> pd.DataFrame:
df = df_in.c.str.split("q", expand=True)
df_out = df_in.copy()
df_out["c"] = df[0]
df_out["d"] = _prepend_q(df[1])
return df_out
def _prepend_q(series: pd.Series) -> pd.Series:
return series.apply(lambda s: None if s is None else f"q{s}")
if __name__ == "__main__":
print(split_on_q(get_input()))
Output:
a b c d
0 a 1 198 q24
1 a 2 128 q6
2 a 6 1456 None
3 b 7 67 q22
4 b 1 56 None
5 c 3 451 q2
6 d 11 1 q789