
Pythonic syntax for extended variable transformation (multiple lengthy method calls)


I'm looking for guidance on the best way to structure an extensive ETL process. My pipeline has a reasonably sleek extract section and loads into a designated file succinctly, but the only way I can think of to do the transformation steps is a series of variable assignments:

a = ['some','form','of','petl','data']
b = petl.addfield(a, 'NewStrField', str(a))
c = petl.addrownumbers(b)
d = petl.rename(c, 'row', 'ID')
.......

Reassigning the same variable name makes some sense, but doesn't aid readability:

a = ['some','form','of','petl','data']
a = petl.addfield(a, 'NewStrField', str(a))
a = petl.addrownumbers(a)
a = petl.rename(a, 'row', 'ID')
.......

I've read about chaining multiple method calls, like this:

a = ['some','form','of','petl','data']

result = (petl.addfield(a, 'NewStrField', str(a))
    .addrownumbers(a)
    .rename(a, 'row', 'ID'))
.......

but that won't work, because each of these functions requires the table as its first argument.

Is there some fundamental concept I am missing? I'm loath to believe that the right way to do this commercially involves 1000+ lines of code.


Solution

  • Create a list of partially applied functions (each one fixes every argument except the table), then loop over that list, feeding each function's output into the next.

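    import petl

    # Each entry takes a table and returns a new (lazily evaluated) table.
    transforms = [
        lambda x: petl.addfield(x, 'NewStrField', str(x)),
        petl.addrownumbers,
        lambda x: petl.rename(x, 'row', 'ID'),
    ]

    a = ['some', 'form', 'of', 'petl', 'data']  # placeholder for any petl table
    for f in transforms:
        a = f(a)  # the output of each step becomes the input of the next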
    

    Your "total" transformation is the composition of the functions in the list transforms. You can build that composition up front (at the cost of some additional function calls) using a library that provides function composition, or by rolling your own:

    def compose(*f):
        if not f:
            return lambda x: x  # Identity function, the identity for function composition
        # compose(f0, f1, ..., fn)(x) == f0(f1(...fn(x))); note the * to unpack the rest
        return lambda x: f[0](compose(*f[1:])(x))
    
    # Note the reversed order of the functions compared to 
    # the list above.
    transform = compose(
        lambda x: petl.rename(x, 'row', 'ID'),
        petl.addrownumbers,
        lambda x: petl.addfield(x, 'NewStrField', str(x)),
    )
    
    
    a = ['some', 'form', 'of', 'petl', 'data']
    result = transform(a)
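Rather than hand-rolling the recursion, the standard library's functools.reduce can apply the steps left to right, in the same order as the transforms list, so no reversal is needed. The sketch below swaps in small hypothetical pure-Python steps (add_field, add_row_numbers and rename_field, operating on a list-of-lists table whose first row is the header) so that it runs without petl; in practice the petl lambdas from the transforms list would go in their place.

```python
from functools import reduce
from typing import Any, Callable


def pipeline(*steps: Callable[[Any], Any]) -> Callable[[Any], Any]:
    """Chain steps left to right: pipeline(f, g)(x) == g(f(x))."""
    def run(table: Any) -> Any:
        # Feed the output of each step into the next; no steps means identity.
        return reduce(lambda acc, step: step(acc), steps, table)
    return run


# Hypothetical stand-ins for petl.addfield / petl.addrownumbers / petl.rename.
def add_field(name, value):
    def step(table):
        header, *rows = table
        return [list(header) + [name]] + [list(r) + [value] for r in rows]
    return step


def add_row_numbers(table):
    header, *rows = table
    return [['row'] + list(header)] + [
        [i] + list(r) for i, r in enumerate(rows, start=1)
    ]


def rename_field(old, new):
    def step(table):
        header, *rows = table
        return [[new if h == old else h for h in header]] + [list(r) for r in rows]
    return step


transform = pipeline(
    add_field('NewStrField', 'x'),
    add_row_numbers,
    rename_field('row', 'ID'),
)

data = [['word'], ['some'], ['petl'], ['data']]
result = transform(data)
print(result[0])  # ['ID', 'word', 'NewStrField']
```

Because the steps are plain values, the list can also be assembled conditionally or reused across pipelines before being passed to pipeline.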