I have three columns (col1, col2, col3) having string contents with line breaks inserted. I would like to concatenate col1,col2 and col3 with new column (DesiredCol) as shown below
Here is the dataset
import pandas as pd
d = {'col1': ["ABC1"+"\n"+"ABC2"+"\n"+"ABC3", "BBC1"+"\n"+"BBC2"+"\n"+"BBC3"], 'col2': ["A"+"\n"+"B"+"\n"+"C", "A"+"\n"+"B"+"\n"+"C"],'col3': ["YES"+"\n"+"NO"+"\n"+"YES", "NO"+"\n"+"NO"+"\n"+"YES"]}
df = pd.DataFrame(data=d)
I tried using lambda function as below, however its not giving the desired column
cols=['col1','col2','col3']
df['DesiredCol'] = df[cols].apply(lambda row: ' '.join(row.values.astype(str)), axis=1)
Here is one possible solution (the key being to use the zip
built-in function to join the respective elements, and then further apply
methods on the dataframe to insert the desired newline characters specifically):
(Thanks to mozway.) For an understanding of what this code achieves, and how each operation changes the data step by step into the desired column output, see solution "B" below. However, for large dataframes, it is better to avoid the use of
applymap
and instead combinemap
andzip
in a list comprehension as shown here.
df['DesiredCol'] = ['\n\n'.join(map(' '.join, zip(*(y.splitlines() for y in x))))
for x in zip(*(df[c] for c in cols))]
Providing a breakdown of the dataframe operations. *Note:
applymap
is a simple way of applying a function to every element (or cell) of a dataframe and is shown here for illustration of the intermittent states of the dataframe, but usually there is a faster approach.
However, solution "A" above will be faster and is recommended. Using the transpositions as shown here is not the most efficient approach.
df = pd.DataFrame(data=d)
df2 = df.T.apply(
lambda x: list(
map(" ".join, list(zip(*[y.split("\n") for y in x.values])))
)
).T
cols = df2.columns
df['col4'] = df2[cols].astype(str).apply('\n'.join, axis=1)
df2 = df.applymap(lambda s: s.split('\n'))
df3 = df2.T.apply(lambda x: list(zip(*x.values)))
df3 = df3.T.apply(lambda x: x.str.join(' '))
cols = df3.columns
df['col4'] = df3[cols].astype(str).apply('\n'.join, axis=1)
applymap
* to delimit each cell in cols 1-3import pandas as pd
d = {
"col1": [
"ABC1" + "\n" + "ABC2" + "\n" + "ABC3",
"BBC1" + "\n" + "BBC2" + "\n" + "BBC3",
],
"col2": ["A" + "\n" + "B" + "\n" + "C", "A" + "\n" + "B" + "\n" + "C"],
"col3": [
"YES" + "\n" + "NO" + "\n" + "YES",
"NO" + "\n" + "NO" + "\n" + "YES",
],
}
df = pd.DataFrame(data=d)
df2 = df.applymap(lambda s: s.split("\n"))
df2
zip
to respectively join the elements of the lists from the columnsdf3 = df2.T.apply(lambda x: list(zip(*x.values)))
df3 = df3.T.apply(lambda x: x.str.join(" "))
df3
cols = df3.columns
df["col4"] = df3[cols].astype(str).apply("\n".join, axis=1)
df
which gives the final solution: