c#sql-server t-sql sqlclr user-defined-aggregate

SQL CLR aggregate not terminating correctly when applied over huge amount of data

I have create and used a lot of times a SQL CLR aggregate which is concatenating values - it also order the values by specified number and use user input separator for concatenating the them.

I have used the same aggregate over large amount of data and notice that the separator is not used - the values are concatenated but without the separator.

After a lot of tests, I found that in the Terminate method, the delimiter is missing/not read. I have double check this using hard-coded separator - everything worked fine.

I guess that there is something wrong with my Read and Write method (used when large amount of data is handled) but not able to understand what.

Here is the function code:

[Serializable]
[
    Microsoft.SqlServer.Server.SqlUserDefinedAggregate
    (
        Microsoft.SqlServer.Server.Format.UserDefined,
        IsInvariantToNulls = true,
        IsInvariantToDuplicates = false,
        IsInvariantToOrder = false,
        IsNullIfEmpty = false,
        MaxByteSize = -1
    )
]
/// <summary>
/// Concatenates <int, string, string> values defining order using the specified number and using the given delimiter
/// </summary>
public class ConcatenateWithOrderAndDelimiter : Microsoft.SqlServer.Server.IBinarySerialize
{
    private List<Tuple<int, string>> intermediateResult;
    private string delimiter;
    private bool isDelimiterNotDefined;

    public void Init()
    {
        this.delimiter = ",";
        this.isDelimiterNotDefined = true;
        this.intermediateResult = new List<Tuple<int, string>>();
    }

    public void Accumulate(SqlInt32 position, SqlString text, SqlString delimiter)
    {
        if (this.isDelimiterNotDefined)
        {
            this.delimiter = delimiter.IsNull ? "," : delimiter.Value;
            this.isDelimiterNotDefined = false;
        }

        if (!(position.IsNull || text.IsNull))
        {
            this.intermediateResult.Add(new Tuple<int, string>(position.Value, text.Value));
        }
    }

    public void Merge(ConcatenateWithOrderAndDelimiter other)
    {
        this.intermediateResult.AddRange(other.intermediateResult);
    }

    public SqlString Terminate()
    {
        this.intermediateResult.Sort();
        return new SqlString(String.Join(this.delimiter, this.intermediateResult.Select(tuple => tuple.Item2)));
    }

    public void Read(BinaryReader r)
    {
        if (r == null) throw new ArgumentNullException("r");

        int count = r.ReadInt32();
        this.intermediateResult = new List<Tuple<int, string>>(count);

        for (int i = 0; i < count; i++)
        {
            this.intermediateResult.Add(new Tuple<int, string>(r.ReadInt32(), r.ReadString()));
        }

        this.delimiter = r.ReadString();
    }

    public void Write(BinaryWriter w)
    {
        if (w == null) throw new ArgumentNullException("w");

        w.Write(this.intermediateResult.Count);

        foreach (Tuple<int, string> record in this.intermediateResult)
        {
            w.Write(record.Item1);
            w.Write(record.Item2);
        }

        w.Write(this.delimiter);
    }
}

Solution

The Merge() method is invoked only when parallelism is used and a particular group is spread across more than 1 thread. In this case, Init() has been called, and 0 or more instances of Accumulate().

So, in the case of parallelism, if Init() has been called but no Accumulate() has been called yet, then the value in delimiter would be what was set in the Init() method. The code in the question shows that it is being set to ,, but I suspect that was added later in trying to figure this out. Of course, this assumes that a comma is being passed in as the delimiter into Accumulate(). Or perhaps the comma was always being set as the default in Init(), but another character was passed in via Accumulate() and that was not coming through the final output (the specific call to the UDA is not shown in the question, nor is the incorrect output, so there is some ambiguity here).

While the fix shown in the other answer appears to work, it is not a universal fix given that there could be a case where the current object has had Accumulate() called at least once, but the "other" object being merged into this one is still empty (no matching rows perhaps, or some other reason that values were not stored locally when Accumulate() was called). In that case, the current object would have the desired delimiter but the "other" object would still have the default. The ideal solution would be to also store the value of isDelimiterNotDefined in the Write() method, get it back out again in the Read() method, and compare the local value to other.isDelimiterNotDefined in the Merge() method so that you can determine if you should keep the local or other value of delimiter (depending on which one is set / defined).