c++, c++20

Correct approach to convert std::vector<std::vector<std::string>> to std::vector<std::vector<std::variant<Types...>>>


template <class... Types>
class Dataframe
{
public:
    Dataframe(const std::vector<std::vector<std::string>>& input);
    std::vector<std::vector<std::variant<Types...>>> Data();

private:
    std::vector<std::vector<std::variant<Types...>>> data_;
};

template <class... Types>
std::vector<std::vector<std::variant<Types...>>> Dataframe<Types...>::Data() {
    return data_;
}


std::vector<std::vector<std::string>> input = {{"2024-03-03", "1.3", "1"}, {"2024-04-04", "3.4", "10"}};

auto df = Dataframe<std::string, double, int>(input);
df.Data(); //expected output: {{"2024-03-03", "2024-04-04"}, {1.3, 3.4}, {1, 10}}


std::vector<std::vector<std::string>> input_2 = {{"2024-03-03", "1", "1", "2.3"}, {"2024-04-04", "3.4", "10", "1.2"}};
auto df_2 = Dataframe<std::string, int, double, double>(input_2); // Can also handle this
df_2.Data(); // expected output: {{"2024-03-03", "2024-04-04"}, {1, 3}, {1.0, 10.0}, {2.3, 1.2}}

I am trying to create a dataframe class whose column types are given as a variadic template parameter list, so that it can handle multiple data types. I am new to templates (and to C++ in general). The goal is a class similar to a pandas DataFrame, so that I can later compute averages, rolling averages, etc. over the columns.

  1. Is this the right approach, or would std::vector<std::tuple<Types...>> data_; be better?
  2. If this is the right approach: the number of types in Dataframe<std::string, int, double> can vary, so how do I fill std::vector<std::vector<std::variant<Types...>>> data_? I want to loop over the input as shown below, but how do I convert each string to the type of its column?
for (std::size_t i = 0; i < input.size(); ++i) {
    for (std::size_t j = 0; j < input[i].size(); ++j) {
        data_[i].push_back(input[i][j]);
    }
}

I am not an expert in templates and keep getting errors.


Solution

  • Since the column types are known at compile time, std::variant does not seem necessary, and std::tuple seems to do the job:

    template <class... Types>
    class Dataframe
    {
    public:
        Dataframe(const std::vector<std::vector<std::string>>& input)
        {
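            // Builds one row: converts the i-th string to the i-th type in Types.
            // convert_to is not defined in this snippet; it is assumed to be an
            // overload set that parses a string into the type named by the tag.
            auto make_frame = [](const std::vector<std::string>& v){
                assert(v.size() == sizeof...(Types));
                return [&]<std::size_t...Is>(std::index_sequence<Is...>){
                    return std::tuple<Types...>(convert_to(std::type_identity<Types>{}, v[Is])...);
                }(std::make_index_sequence<sizeof...(Types)>());
            };
            data_.reserve(input.size());
            for (const auto& v : input) {
                data_.emplace_back(make_frame(v));
            }
        }
        std::vector<std::tuple<Types...>> Data() const { return data_; }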
    
    private:
        std::vector<std::tuple<Types...>> data_;
    };
    

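    You may also want to transform the row-wise std::vector<std::tuple<Types...>> into the column-wise std::tuple<std::vector<Types>...> (array of structs to struct of arrays), which is the layout of the expected output in the question: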

    template <typename... Types>
    std::tuple<std::vector<Types>...>
    AoS2SoA(const std::vector<std::tuple<Types...>>& v)
    {
        std::tuple<std::vector<Types>...> res;
        for (const auto& e : v)
        {
            [&]<std::size_t...Is>(std::index_sequence<Is...>){
                (std::get<Is>(res).push_back(std::get<Is>(e)), ...);
            }(std::make_index_sequence<sizeof...(Types)>());
        }
        return res;
    }
    
