Tags: python, pandas, dataframe, numpy, runtime

Faster way of building string combinations (with separator) than using a for loop?


I am working with a relatively large dataset (in Python with Pandas) and am trying to build combinations of multiple columns as a string.

Let's say I have two lists, x and y, where x = ["sector_1", "sector_2", "sector_3", ...] and y = [7, 19, 21, ...].

I have been using a for loop to build combinations such that combined = ["sector_1--7", "sector_1--19", "sector_1--21", "sector_2--7", "sector_2--19", ...], with the separator here defined as --.

My current code looks like this:

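import numpy as np
import pandas as pd

sep = '--'
combined = np.empty(0, dtype='object')
for x_value in x:
    for y_value in y:
        # np.append returns a new array on every call
        combined = np.append(combined, str(x_value) + sep + str(y_value))
# Split the joined strings back into two columns
combined = pd.DataFrame(combined)
combined = combined.iloc[:, 0].str.split(sep, expand=True)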

The code above works but I was just wondering if there was a better way (perhaps more efficient in runtime).


Solution

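  • Use itertools.product in a list comprehension. np.append copies the whole array on every call, so calling it inside nested loops is slow: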

    import itertools as it
    combined = [f'{a}--{b}' for a, b in it.product(x, y)]
    

    Output (the Ellipsis entries come from the literal ... placeholders in the example lists):

    >>> combined
    ['sector_1--7',
     'sector_1--19',
     'sector_1--21',
     'sector_1--Ellipsis',
     'sector_2--7',
     'sector_2--19',
     'sector_2--21',
     'sector_2--Ellipsis',
     'sector_3--7',
     'sector_3--19',
     'sector_3--21',
     'sector_3--Ellipsis',
     'Ellipsis--7',
     'Ellipsis--19',
     'Ellipsis--21',
     'Ellipsis--Ellipsis']
    

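    If the end goal is the two-column DataFrame, though, you can skip the string join and split entirely and build the columns directly with np.repeat and np.tile. Each x value is repeated len(y) times, and the whole y list is tiled len(x) times:

    combined_df = pd.DataFrame({0: np.repeat(x, len(y)), 1: np.tile(y, len(x))})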
    

    Output:

    >>> combined_df
               0         1
    0   sector_1         7
    1   sector_1        19
    2   sector_1        21
    3   sector_1  Ellipsis
    4   sector_2         7
    5   sector_2        19
    6   sector_2        21
    7   sector_2  Ellipsis
    8   sector_3         7
    9   sector_3        19
    10  sector_3        21
    11  sector_3  Ellipsis
    12  Ellipsis         7
    13  Ellipsis        19
    14  Ellipsis        21
    15  Ellipsis  Ellipsis
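
As a quick sanity check, here is a minimal sketch that runs both approaches on lists of different lengths, where the repeat and tile counts actually matter. The sample values for x and y are made up for illustration, and product_df is a hypothetical name, not something from the question.

```python
import itertools

import numpy as np
import pandas as pd

# Hypothetical sample data; the lengths differ on purpose.
x = ["sector_1", "sector_2", "sector_3"]
y = [7, 19]

# Repeat each x value len(y) times; tile the whole y list len(x) times.
combined_df = pd.DataFrame({0: np.repeat(x, len(y)), 1: np.tile(y, len(x))})

# The same pairs built from itertools.product, without a string round trip.
product_df = pd.DataFrame(list(itertools.product(x, y)))

# The joined strings, in case the "--" form is still needed.
sep = "--"
combined = (combined_df[0] + sep + combined_df[1].astype(str)).tolist()
```

Both DataFrames should hold the same six rows in the same order. Which one is faster will depend on the data, so it is worth timing them on your own input.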