Concatenating two string columns using Deedle in C#


The function add_ints correctly adds two integer columns

A,B
2,3
5,7
9,11

in a CSV file.

Why does the function add_strings not correctly concatenate two string columns

L,R
"a","b"
"c","d"
"e","f"

into a third column

L,R,C
"a","b","ab"
"c","d","cd"
"e","f","ef"

when starting from a similar CSV file?

using Deedle;
using System.IO;

namespace NS
{
    class TwoColumnOps
    {
        static void Main(string[] args)
        {
            string root = "path/to";
            add_ints(root);
            add_strings(root);
        }
        static void add_ints(string root)
        {
            Deedle.Frame<int, string> df = Frame.ReadCsv(Path.Combine(root, "data_ints.csv"));

            Series<int, int> a = df.GetColumn<int>("A");
            Series<int, int> b = df.GetColumn<int>("B");

            Series<int, int> c = a + b;
            df.AddColumn("C", c);
            df.Print();
        }
        static void add_strings(string root)
        {
            Deedle.Frame<int, string> df = Frame.ReadCsv(Path.Combine(root, "data_strings.csv"));

            Series<int, string> a = df.GetColumn<string>("L");
            Series<int, string> b = df.GetColumn<string>("R");

            // Series<int, string> c = a + b;
            // Series<int, string> c = $"{a} and {b}";
            Series<int, string> c = string.Concat(a, b);

            df.AddColumn("C", c);
            df.Print();
        }
    }
}

The error for all three styles of concatenation is:

Error CS0029: Cannot implicitly convert type 'string' to 'Deedle.Series<int, string>'

Solution

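  • The + operator works on series of numbers because the series type defines an overloaded + operator for numeric series. Sadly, that overload only covers numbers. string.Concat(a, b) and the interpolated string $"{a} and {b}" are ordinary .NET string operations: each produces a single string rather than a series, which is why the compiler reports CS0029.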

    For non-numeric series, the easiest option is to use ZipInner to align the two series on their row keys. This gives you a series of tuples. You can then use Select to transform the values element-wise:

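    var df = Frame.ReadCsv("/some/test/file.csv");
    var s1 = df.GetColumn<string>("first");
    var s2 = df.GetColumn<string>("second");
    // ZipInner pairs values that share a row key; t.Value is the resulting tuple.
    var added = s1.ZipInner(s2).Select(t => t.Value.Item1 + t.Value.Item2);
    // Attach the concatenated series to the frame as a new column.
    df.AddColumn("added", added);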
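
Applied to the data from the question, the same approach might look like the sketch below. It assumes the data_strings.csv file and the column names L, R and C from the question. The helper name AddConcatenatedColumn is made up for illustration and is not part of Deedle.

```csharp
using System.IO;
using Deedle;

namespace NS
{
    public static class StringColumnOps
    {
        // Hypothetical helper (not a Deedle API): adds a column named `target`
        // whose value in each row is the `left` value followed by the `right` value.
        public static void AddConcatenatedColumn(
            Frame<int, string> df, string left, string right, string target)
        {
            Series<int, string> l = df.GetColumn<string>(left);
            Series<int, string> r = df.GetColumn<string>(right);

            // ZipInner aligns the two series on their row keys and yields tuples;
            // Select then maps each tuple to the concatenated string.
            Series<int, string> joined =
                l.ZipInner(r).Select(kvp => kvp.Value.Item1 + kvp.Value.Item2);

            df.AddColumn(target, joined);
        }

        public static void Main(string[] args)
        {
            // Path and file name are taken from the question.
            string root = "path/to";
            Frame<int, string> df = Frame.ReadCsv(Path.Combine(root, "data_strings.csv"));

            AddConcatenatedColumn(df, "L", "R", "C");
            df.Print();
        }
    }
}
```

Because ZipInner only pairs values that share a row key, a row where either L or R has no value should end up with a missing value in C rather than a partial string.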