Search code examples
u-sql

U-SQL user defined combiner returns wrong data


I experiment with Data lake Analytics and have created a simple USQL combiner but it returns wrong data as long as I understand it. It returns 3 rows :

2R

3R

3R

but I'd expect it should return 6 rows : 1L, 2L, 3L, 1R, 2R, 3R.

This is the code :

@T1 = SELECT * FROM (VALUES ("1"), ("2"), ("3")) AS T(DummyValue);
@T2 = SELECT * FROM (VALUES ("1"), ("2"), ("3")) AS T(DummyValue);

@Result =
 COMBINE @T1 AS fis
 WITH @T2 AS frs
 ON fis.DummyValue == frs.DummyValue
 PRODUCE DummyValue string
 USING new Demo.MyCombiner();

OUTPUT @Result TO "/o.csv" USING Outputters.Csv();


 [SqlUserDefinedCombiner(Mode = CombinerMode.Full)]
 public class MyCombiner : ICombiner {

  public override IEnumerable<IRow> Combine(IRowset left, IRowset right, IUpdatableRow output) {
   var CopyLeft = left.Rows.ToList();
   var CopyRight = right.Rows.ToList();

   foreach (var Item in CopyLeft) {
    var X = Item.Get<string>("DummyValue");
    output.Set<string>("DummyValue", X + "L");
   }
   foreach (var Item in CopyRight) {
    var X = Item.Get<string>("DummyValue");
    output.Set<string>("DummyValue", X + "R");
   }

   yield return output.AsReadOnly();
  }

 }

Solution

  • The operation you are doing is really a UNION ALL so this could be done more simply with base U-SQL, eg something like:

    @Result =
        SELECT DummyValue + "L" AS DummyValue
        FROM @T1
        UNION ALL
        SELECT DummyValue + "R" AS DummyValue
        FROM @T2;
    

    Assuming you want to use a custom COMBINER for some specific reason, then you are calling the yield return... only once for the two loops, so that's why you are getting only three rows. Perhaps you could tell us more about what you are trying to do?

    However if you really need to do UNION ALL with custom combiner then this worked for me:

    [SqlUserDefinedCombiner]
        public class MyCombiner : ICombiner
        {
            public override IEnumerable<IRow> Combine(IRowset left, IRowset right, IUpdatableRow output)
            {
                foreach (IRow rowR in right.Rows)
                {
                    output.Set<string>("NewValue", rowR.Get<string>("DummyValue").ToString() + "R");
                    yield return output.AsReadOnly();
                }
    
                foreach (IRow rowL in left.Rows)
                {
                    output.Set<string>("NewValue", rowL.Get<string>("DummyValue").ToString() + "L");
                    yield return output.AsReadOnly();
                }
            }
        }
    

    U-SQL to call my custom combiner:

    @Result =
        COMBINE @T1 AS fis WITH @T2 AS frs
        ON fis.DummyValue == frs.DummyValue
        PRODUCE NewValue string
        USING new Demo.MyCombiner();
    

    My results: Results