Search code examples
pythonpandasdataframeconcatenation

How to combine DataFame column data and fixed text string


I want to combine 4 columns within a larger DataFrame with a custom (space) delimiter (which I have done with the code below) but then I want to add a fixed string to the start and end of each concatenation. The columns are pairs of X & Y coordinates, but they can be dealt with as str for this purpose (once I've trimmed to 3 decimal places).

I have found many options on this website for joining the columns, but none to join columns and a consistent fixed string.The lazy way would be for me to just make two more DataFrame columns, one for the start, one for the end, and cat everything. Is there a more sophisticated way to do it?

import pandas as pd
from pandas import DataFrame
import numpy as np

def str_join(df, sep, *cols):
    from functools import reduce
    return reduce (lambda x,y: x.astype(str).str.cat(y.astype(str), sep=sep),
                   [df[col] for col in cols])

data= pd.read_csv('/Users/XXXXXX/Desktop/Lines.csv')
df=pd.DataFrame(data, columns=['Name','SOLE','SOLN','EOLE','EOLN','EOLKP','Wind','Wave']) 

df['SOLE']=round(df['SOLE'],3)
df['SOLN']=round(df['SOLN'],3)
df['EOLE']=round(df['EOLE'],3)
df['EOLN']=round(df['EOLN'],3)

df['WKT']=str_join(df,' ','SOLE','SOLN','EOLE','EOLN')

df.to_csv('OutLine.csv') #turn on to create output file

which gives me.

WKT
476912.131 6670122.285 470329.949 6676260.271

What I want to do is add '(LINESTRING ' to the start of each concatenation and ')' to the end of each to give me.

WKT
(LINESTRING 476912.131 6670122.285 470329.949 6676260.271 )

Solution

  • You could also create a collection of the columns you want to export, do a quick data type format, and apply a join.

    target_cols = ['SOLE','SOLN','EOLE','EOLN',]
    
    
    # Make sure to use along axis 1 (columns) because default is 0
    # Also, if you're on Python 3.6+, I think you can use f-strings to format your floats.
    df['WKT'] = df[target_cols].apply(lambda x: '(LINESTRING ' + ' '.join(f"{i:.3f}" for i in x) + ')', axis=1)
    

    result:

    In [0]: df.iloc[:,-3:]
    
    Out [0]:
            Wind   Wave                                                WKT
        0  wind1  wave1  (LINESTRING 476912.131 6670122.285 470329.949 ...
    

    ** Sorry, I'm using Spyder, which is a terminal output miser. Here's a printout of 'WKT'

    In [1]: print(df['WKT'].values)
    Out [1]: ['(LINESTRING 476912.131 6670122.285 470329.949 6676260.271)']
    

    * **EDIT: To add a comma after 'SOLN', we could use an alternative route:

    target_cols = ['SOLE','SOLN','EOLE','EOLN',]
    
    # Format strings in advance
    # Set comma_col to our desired column name. This could also be a tuple for multiple names, then replace `==` with `in` in the loop below.
    
    comma_col = 'SOLN'
    
    
    # To find the last column, which doesn't need a space here, we just select the last value from our list.  I did it this way in case our list order doesn't match the dataframe order.
    
    last_col = df[target_cols].columns.values.tolist()[-1]
    
    
    # Traditional if-then method
    for col in df[target_cols]:
        if col == comma_col:
            df[col] = df[col].apply(lambda x: f"{x:.3f}" + ",") # Explicit comma
        elif col == last_col:
            df[col] = df[col].apply(lambda x: f"{x:.3f}")
        else:
            df[col] = df[col].apply(lambda x: f"{x:.3f}" + " ") # Explicit whitespace
    
    # Adding our 'WKT' column as before, but the .join() portion doesn't have a space in it now.
    df['WKT'] = df[target_cols].apply(lambda x: '(LINESTRING ' + ''.join(i for i in x) + ')', axis=1)
    

    Finally:

    In [0]: print(df['WKT'][0])
    Out [0]: (LINESTRING 476912.131 6670122.286,470329.950 6676260.271)