Search code examples
javascriptfunctional-programming

How to chain map and filter functions in the correct order


I really like chaining Array.prototype.map, filter and reduce to define a data transformation. Unfortunately, in a recent project that involved large log files, I could no longer get away with looping through my data multiple times...

My goal:

I want to create a function that chains .filter and .map methods by, instead of mapping over an array immediately, composing a function that loops over the data once. I.e.:

const DataTransformation = () => ({ 
    map: fn => (/* ... */), 
    filter: fn => (/* ... */), 
    run: arr => (/* ... */)
});

const someTransformation = DataTransformation()
    .map(x => x + 1)
    .filter(x => x > 3)
    .map(x => x / 2);

// returns [ 2, 2.5 ] without creating [ 2, 3, 4, 5] and [4, 5] in between
const myData = someTransformation.run([ 1, 2, 3, 4]); 

My attempt:

Inspired by this answer and this blogpost I started writing a Transduce function.

const filterer = pred => reducer => (acc, x) =>
    pred(x) ? reducer(acc, x) : acc;

const mapper = map => reducer => (acc, x) =>
    reducer(acc, map(x));

const Transduce = (reducer = (acc, x) => (acc.push(x), acc)) => ({
    map: map => Transduce(mapper(map)(reducer)),
    filter: pred => Transduce(filterer(pred)(reducer)),
    run: arr => arr.reduce(reducer, [])
});

The problem:

The problem with the Transduce snippet above, is that it runs “backwards”... The last method I chain is the first to be executed:

const someTransformation = Transduce()
    .map(x => x + 1)
    .filter(x => x > 3)
    .map(x => x / 2);

// Instead of [ 2, 2.5 ] this returns []
//  starts with (x / 2)       -> [0.5, 1, 1.5, 2] 
//  then filters (x < 3)      -> [] 
const myData = someTransformation.run([ 1, 2, 3, 4]);

Or, in more abstract terms:

Go from:

Transducer(concat).map(f).map(g) == (acc, x) => concat(acc, f(g(x)))

To:

Transducer(concat).map(f).map(g) == (acc, x) => concat(acc, g(f(x)))

Which is similar to:

mapper(f) (mapper(g) (concat))

I think I understand why it happens, but I can't figure out how to fix it without changing the “interface” of my function.

The question:

How can I make my Transduce method chain filter and map operations in the correct order?


Notes:

  • I'm only just learning about the naming of some of the things I'm trying to do. Please let me know if I've incorrectly used the Transduce term or if there are better ways to describe the problem.
  • I'm aware I can do the same using a nested for loop:

const push = (acc, x) => (acc.push(x), acc);
const ActionChain = (actions = []) => {
  const run = arr =>
    arr.reduce((acc, x) => {
      for (let i = 0, action; i < actions.length; i += 1) {
        action = actions[i];

        if (action.type === "FILTER") {
          if (action.fn(x)) {
            continue;
          }

          return acc;
        } else if (action.type === "MAP") {
          x = action.fn(x);
        }
      }

      acc.push(x);
      return acc;
    }, []);

  const addAction = type => fn => 
    ActionChain(push(actions, { type, fn }));

  return {
    map: addAction("MAP"),
    filter: addAction("FILTER"),
    run
  };
};

// Compare to regular chain to check if 
// there's a performance gain
// Admittedly, in this example, it's quite small...
const naiveApproach = {
  run: arr =>
    arr
      .map(x => x + 3)
      .filter(x => x % 3 === 0)
      .map(x => x / 3)
      .filter(x => x < 40)
};

const actionChain = ActionChain()
  .map(x => x + 3)
  .filter(x => x % 3 === 0)
  .map(x => x / 3)
  .filter(x => x < 40)


const testData = Array.from(Array(100000), (x, i) => i);

console.time("naive");
const result1 = naiveApproach.run(testData);
console.timeEnd("naive");

console.time("chain");
const result2 = actionChain.run(testData);
console.timeEnd("chain");
console.log("equal:", JSON.stringify(result1) === JSON.stringify(result2));

  • Here's my attempt in a stack snippet:

const filterer = pred => reducer => (acc, x) =>
  pred(x) ? reducer(acc, x) : acc;

const mapper = map => reducer => (acc, x) => reducer(acc, map(x));

const Transduce = (reducer = (acc, x) => (acc.push(x), acc)) => ({
  map: map => Transduce(mapper(map)(reducer)),
  filter: pred => Transduce(filterer(pred)(reducer)),
  run: arr => arr.reduce(reducer, [])
});

const sameDataTransformation = Transduce()
  .map(x => x + 5)
  .filter(x => x % 2 === 0)
  .map(x => x / 2)
  .filter(x => x < 4);
  
// It's backwards:
// [-1, 0, 1, 2, 3]
// [-0.5, 0, 0.5, 1, 1.5]
// [0]
// [5]
console.log(sameDataTransformation.run([-1, 0, 1, 2, 3, 4, 5]));


Solution

  • before we know better

    I really like chaining ...

    I see that, and I'll appease you, but you'll come to understand that forcing your program through a chaining API is unnatural, and more trouble than it's worth in most cases.

    const Transduce = (reducer = (acc, x) => (acc.push(x), acc)) => ({
      map: map => Transduce(mapper(map)(reducer)),
      filter: pred => Transduce(filterer(pred)(reducer)),
      run: arr => arr.reduce(reducer, [])
    });
    

    I think I understand why it happens, but I can't figure out how to fix it without changing the “interface” of my function.

    The problem is indeed with your Transduce constructor. Your map and filter methods are stacking map and pred on the outside of the transducer chain, instead of nesting them inside.

    Below, I've implemented your Transduce API that evaluates the maps and filters in correct order. I've also added a log method so that we can see how Transduce is behaving

    const Transduce = (f = k => k) => ({
      map: g =>
        Transduce(k =>
          f ((acc, x) => k(acc, g(x)))),
      filter: g =>
        Transduce(k =>
          f ((acc, x) => g(x) ? k(acc, x) : acc)),
      log: s =>
        Transduce(k =>
          f ((acc, x) => (console.log(s, x), k(acc, x)))),
      run: xs =>
        xs.reduce(f((acc, x) => acc.concat(x)), [])
    })
    
    const foo = nums => {
      return Transduce()
        .log('greater than 2?')
        .filter(x => x > 2)
        .log('\tsquare:')
        .map(x => x * x)
        .log('\t\tless than 30?')
        .filter(x => x < 30)
        .log('\t\t\tpass')
        .run(nums)
    }
    
    // keep square(n), forall n of nums
    //   where n > 2
    //   where square(n) < 30
    console.log(foo([1,2,3,4,5,6,7]))
    // => [ 9, 16, 25 ]


    untapped potential

    Inspired by this answer ...

    In reading that answer I wrote, you overlook the generic quality of Trans as it was written there. Here, our Transduce only attempts to work with Arrays, but really it can work with any type that has an empty value ([]) and a concat method. These two properties make up a category called Monoids and we'd be doing ourselves a disservice if we didn't take advantage of transducer's ability to work with any type in this category.

    Above, we hard-coded the initial accumulator [] in the run method, but this should probably be supplied as an argument – much like we do with iterable.reduce(reducer, initialAcc)

    Aside from that, both implementations are essentially equivalent. The biggest difference is that the Trans implementation provided in the linked answer is Trans itself is a monoid, but Transduce here is not. Trans neatly implements composition of transducers in the concat method whereas Transduce (above) has composition mixed within each method. Making it a monoid allows us to rationalize Trans the same way do all other monoids, instead of having to understand it as some specialized chaining interface with unique map, filter, and run methods.

    I would advise building from Trans instead of making your own custom API


    have your cake and eat it too

    So we learned the valuable lesson of uniform interfaces and we understand that Trans is inherently simple. But, you still want that sweet chaining API. OK, ok...

    We're going to implement Transduce one more time, but this time we'll do so using the Trans monoid. Here, Transduce holds a Trans value instead of a continuation (Function).

    Everything else stays the same – foo takes 1 tiny change and produces an identical output.

    // generic transducers
    const mapper = f =>
      Trans(k => (acc, x) => k(acc, f(x)))
    
    const filterer = f =>
      Trans(k => (acc, x) => f(x) ? k(acc, x) : acc)
    
    const logger = label =>
      Trans(k => (acc, x) => (console.log(label, x), k(acc, x)))
    
    // magic chaining api made with Trans monoid
    const Transduce = (t = Trans.empty()) => ({
      map: f =>
        Transduce(t.concat(mapper(f))),
      filter: f =>
        Transduce(t.concat(filterer(f))),
      log: s =>
        Transduce(t.concat(logger(s))),
      run: (m, xs) =>
        transduce(t, m, xs)
    })
    
    // when we run, we must specify the type to transduce
    //   .run(Array, nums)
    // instead of
    //   .run(nums)
    

    Expand this code snippet to see the final implementation – of course you could skip defining a separate mapper, filterer, and logger, and instead define those directly on Transduce. I think this reads nicer tho.

    // Trans monoid
    const Trans = f => ({
      runTrans: f,
      concat: ({runTrans: g}) =>
        Trans(k => f(g(k)))
    })
    
    Trans.empty = () =>
      Trans(k => k)
    
    const transduce = (t, m, xs) =>
      xs.reduce(t.runTrans((acc, x) => acc.concat(x)), m.empty())
    
    // complete Array monoid implementation
    Array.empty = () => []
    
    // generic transducers
    const mapper = f =>
      Trans(k => (acc, x) => k(acc, f(x)))
      
    const filterer = f =>
      Trans(k => (acc, x) => f(x) ? k(acc, x) : acc)
      
    const logger = label =>
      Trans(k => (acc, x) => (console.log(label, x), k(acc, x)))
    
    // now implemented with Trans monoid
    const Transduce = (t = Trans.empty()) => ({
      map: f =>
        Transduce(t.concat(mapper(f))),
      filter: f =>
        Transduce(t.concat(filterer(f))),
      log: s =>
        Transduce(t.concat(logger(s))),
      run: (m, xs) =>
        transduce(t, m, xs)
    })
    
    // this stays exactly the same
    const foo = nums => {
      return Transduce()
        .log('greater than 2?')
        .filter(x => x > 2)
        .log('\tsquare:')
        .map(x => x * x)
        .log('\t\tless than 30?')
        .filter(x => x < 30)
        .log('\t\t\tpass')
        .run(Array, nums)
    }
    
    // output is exactly the same
    console.log(foo([1,2,3,4,5,6,7]))
    // => [ 9, 16, 25 ]


    wrap up

    So we started with a mess of lambdas and then made things simpler using a monoid. The Trans monoid provides distinct advantages in that the monoid interface is known and the generic implementation is extremely simple. But we're stubborn or maybe we have goals to fulfill that are not set by us – we decide to build the magic Transduce chaining API, but we do so using our rock-solid Trans monoid which gives us all the power of Trans but also keeps complexity nicely compartmentalised.


    dot chaining fetishists anonymous

    Here's a couple other recent answers I wrote about method chaining