Clean pipeline of operations in Python

I have a long pipeline that performs various operations on a list of strings, input_list. The pipeline maps each word to lowercase, replaces underscores with spaces, filters out a specific word, removes duplicates, and clips the result to a certain length.

result = list(set(filter(lambda x : x != word, map(lambda x : x.lower().replace('_',' '), input_list))))[:clip_length]

My problem with this is that it's not very readable: it's not clear what the input to this pipeline is or in what order the operations are applied. It hurts to look at a bit, and I probably won't know what it does later on unless it's been nicely commented.

Is there any way to write a pipeline in Python where I can clearly see which operations happen in what order, and what goes in and what comes out? To be more specific, I'd like to write it so that operations read either right-to-left or left-to-right, not inner-to-outer.

Solution

Well, it's functional, but it has no (consistent) style. The "problem" is the wide variety of syntax used for these expressions:

- calling a function uses normal prefix notation, f(arg)
- getting a sublist uses special syntax, arr[n:m] (either bound optional), instead of a function like slice(n, m)
- set is a completely different type, but it is used as an intermediate step because sets happen to have some of the behavior we want. What we want is the "unique" elements of an iterable, so our function should be called unique. If we happen to implement unique using a set, that's fine, but that is not the concern of the reader, whose mind is free from such distractions
- x.lower() is a dynamic (method) call with lower in infix position. Compare it to prefix position, lower(x). The same applies to s.replace(pat, rep) vs replace(s, pat, rep)
- map and filter, however, do have a functional interface: map(f, iter) and filter(f, iter)
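Python does offer prefix forms for most of these, which is handy when you want to pass them around as ordinary functions. A small sketch using only the standard library (calling str.lower through the class, operator.methodcaller, operator.getitem with a slice object, and itertools.islice); the variable names are just for illustration:

```python
import operator
from itertools import islice

s = 'Hello_World'

# Method calls have prefix equivalents: call the method through its class.
assert str.lower(s) == s.lower()
assert str.replace(s, '_', ' ') == s.replace('_', ' ')

# operator.methodcaller turns a method name into a one-argument function.
to_lower = operator.methodcaller('lower')
print(to_lower(s))  # hello_world

# operator.getitem with a slice object is the prefix form of arr[n:m].
print(operator.getitem([1, 2, 3, 4], slice(1, 3)))  # [2, 3]

# itertools.islice is a function form of "take the first n" for any iterable.
print(list(islice('abcdef', 3)))  # ['a', 'b', 'c']
```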

But writing a program like the one you've shared sort of misses out on functional style's strongest and most versatile trait: the function. Yes, functional programming is also about composing beautiful chains of expressions, but not at the cost of readability! If readability starts to hurt, make it better with... a function :D

Consider this program that uses a uniform functional style. It's still a regular python program.

def program (word = '', clip_length = 5, input = ''):
  make_words = \
    compose ( lower
            , partial (replace, '_', ' ')
            )

  process = \
    compose ( partial (map, make_words)
            , partial (filter, lambda x: x != word)
            , unique
            , partial (take, clip_length)
            )

  return process (input)

print (program ('b', 4, 'A_a_a_B_b_b_c_c_c_d_e'))
# ['d', ' ', 'e', 'a']
# Note: your output may vary. More on this later.
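One thing to keep in mind: program is called with a single string, so map walks it one character at a time and each "word" is really one character. With a list of strings, like input_list in the question, each element is processed whole. The sketch below is not the author's code; run_pipeline is a hypothetical name, and it spells out the same stages as named intermediate steps that read top to bottom, using an order-preserving unique (the set-based one is revisited below).

```python
def take(n, xs):
    return xs[:n]

def unique(xs):
    # Order-preserving: dict keys remember insertion order (Python 3.7+).
    return list(dict.fromkeys(xs))

def run_pipeline(word, clip_length, words):
    # Same stages as program, one named step per line.
    cleaned = map(lambda w: w.lower().replace('_', ' '), words)
    kept = filter(lambda w: w != word, cleaned)
    return take(clip_length, unique(kept))

# A list of strings: each element is treated as a whole word.
print(run_pipeline('baz', 2, ['Foo_Bar', 'baz', 'FOO_BAR', 'Qux', 'quux']))
# ['foo bar', 'qux']
```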

And now the dependencies. Each function operates solely on its arguments and returns an output. Define them before program is called, because program looks up these names when it runs.

def partial (f, *xs):
  return lambda *ys: f (*xs, *ys)

def compose (f = None, *fs):
  def comp (x):
    if f is None:
      return x
    else:
      return compose (*fs) (f (x))
  return comp

def take (n = 0, xs = []):
  return xs [:n]

def lower (s = ''):
  return s .lower ()

def replace (pat = '', rep = '', s = ''):
  return s .replace (pat, rep)

def unique (iter):
  return list (set (iter))
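For what it's worth, the standard library covers part of this: functools.partial behaves like the partial above for positional arguments (and also accepts keyword arguments). compose can also be written without recursion using functools.reduce, which avoids one nested call per function and so won't run into Python's recursion limit on very long chains. A minimal sketch with the same left-to-right order as the recursive version; this compose is an alternative, not a standard library function:

```python
from functools import partial, reduce

def compose(*fs):
    """Left-to-right composition: compose(f, g)(x) == g(f(x))."""
    # reduce starts from x and feeds each result into the next function.
    return lambda x: reduce(lambda acc, f: f(acc), fs, x)

def replace(pat, rep, s):
    return s.replace(pat, rep)

make_words = compose(str.lower, partial(replace, '_', ' '))
print(make_words('Hello_World'))  # hello world

# With no functions, compose() is the identity, like the recursive version.
print(compose()(42))  # 42
```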

Really, this question couldn't have set a better stage for some of these bullet points. I'm going to revisit the choice of set used in the original question (and in the program above) because there's a huge problem: if you re-run our program several times, you may get different output. In fancier words, we have no referential transparency. That's because Python's sets are unordered: when we convert an ordered list to a set and then back to a list, the elements are not guaranteed to come back in the same order, so clipping to clip_length can keep different elements on different runs. (For strings, CPython randomizes hashes per process by default, which is why the order can change from one run to the next.)
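You can observe this directly. The sketch below (standard library only; CHILD and run_with_seed are illustrative names) runs a condensed version of the set-based pipeline in fresh interpreter processes with different values of the PYTHONHASHSEED environment variable, which fixes the seed CPython uses for str hashing. With the same seed the output repeats; different seeds can produce a different order, and therefore different clipped elements.

```python
import ast
import os
import subprocess
import sys

# Mirrors program with the set-based unique: clean, drop 'b', set, take 4.
CHILD = (
    "chars = [c.lower().replace('_', ' ') for c in 'A_a_a_B_b_b_c_c_c_d_e']\n"
    "chars = [c for c in chars if c != 'b']\n"
    "print(list(set(chars))[:4])\n"
)

def run_with_seed(seed):
    """Run CHILD in a fresh interpreter with a fixed string-hash seed."""
    env = {**os.environ, 'PYTHONHASHSEED': str(seed)}
    out = subprocess.run([sys.executable, '-c', CHILD], env=env,
                         capture_output=True, text=True, check=True)
    return ast.literal_eval(out.stdout.strip())

if __name__ == '__main__':
    # Outputs are not shown here because they depend on the seed.
    for seed in range(5):
        print(seed, run_with_seed(seed))
```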

Using set this way shows good intuition for solving the uniques problem with existing language features, but we want to restore referential transparency. In our program above, we clearly encoded our intention of getting an input's unique elements by calling the unique function on it, so unique is the only thing we need to change.

# deterministic implementation of unique
def unique (iter):
  result = list ()
  seen = set ()
  for x in iter:
    if x not in seen:
      seen .add (x)
      result .append (x)
  return result

Now when we run our program, we always get the same result:

print (program ('b', 4, 'A_a_a_B_b_b_c_c_c_d_e'))
# ['a', ' ', 'c', 'd']
# always the same output now

This brings me to another point. Because we abstracted unique into its own function, we automatically get a scope in which to define its behavior. I chose an imperative style for unique's implementation, but that's fine: it is still a pure function, and the consumer of the function cannot tell the difference. You could come up with 100 other implementations of unique; so long as program still works, it doesn't matter which one you use.
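For example, one such alternative (a sketch, not the author's version) leans on the fact that dict preserves insertion order, a language guarantee since Python 3.7. dict.fromkeys keeps the first occurrence of each element, so this unique is deterministic and, like the set version, requires hashable elements.

```python
def unique(iterable):
    # dict keys keep insertion order (guaranteed since Python 3.7),
    # so only the first occurrence of each element survives, in order.
    return list(dict.fromkeys(iterable))

print(unique('A_a_a_B'.lower()))  # ['a', '_', 'b']
```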

Functional programming is about functions. The language is yours to tame. It's still a regular python program.

def fwd (x):
  return lambda k: fwd (k (x))

def program (word = '', clip_length = 5, input = ''):
  make_words = \
    compose ( lower
            , partial (replace, '_', ' ')
            )

  fwd (input)                               \
    (partial (map, make_words))             \
    (partial (filter, lambda x: x != word)) \
    (unique)                                \
    (partial (take, clip_length))           \
    (print)

program ('b', 4, 'A_a_a_B_b_b_c_c_c_d_e')
# ['a', ' ', 'c', 'd']
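Note that fwd never hands a value back: each call returns another lambda, so the chain has to end in a side effect such as print. If you want the result returned while still reading left to right, a minimal sketch is a pipe function built on functools.reduce. pipe is a hypothetical helper, not part of the original answer or the standard library, and the stages below inline the earlier helpers so the block runs on its own.

```python
from functools import partial, reduce

def pipe(x, *fs):
    """Thread x through fs from left to right and return the final value."""
    return reduce(lambda acc, f: f(acc), fs, x)

def take(n, xs):
    return xs[:n]

result = pipe(
    'A_a_a_B_b_b_c_c_c_d_e',
    partial(map, lambda c: c.lower().replace('_', ' ')),
    partial(filter, lambda c: c != 'b'),
    lambda xs: list(dict.fromkeys(xs)),  # order-preserving unique
    partial(take, 4),
)
print(result)  # ['a', ' ', 'c', 'd']
```

Because pipe returns the value, the caller decides whether to print it, store it, or feed it into another pipeline.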
