How to convert CSV to Parquet in Julia


I have a CSV file that I want to convert to Parquet in Julia. I couldn't find anything about this conversion in the forums or the docs. Is such a conversion possible in Julia, or do I simply read the CSV as Parquet? If so, how can I go about doing that?

This is what I have so far.

begin
    using Pkg
    # Activate the local project first so the packages below are added to it
    Pkg.activate(".")
    Pkg.add("PlutoUI")
    Pkg.add("HTTP")
    Pkg.add("StatsModels")
    import CSV, DataFrames, Dates, StatsPlots, StatsModels
    import DataFrames.DataFrame
    using Plots, PlutoUI, HTTP, DelimitedFiles, Parquet
end

begin
    df = CSV.read("/home/onur/julia-assignment/temp.csv", DataFrame)
end

Solution

  • Use Parquet.jl, as in the code below:

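    using CSV, DataFrames, Parquet
    # Parse a small semicolon-delimited CSV from an in-memory buffer
    c = CSV.read(IOBuffer("a;b;c\n1;2.5;a\n2;3.5;b"), DataFrame, delim=";")
    # Write the DataFrame to a Parquet file
    Parquet.write_parquet("dat.parquet", c)
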

    As a test, let us read the file back:

    julia> Parquet.read_parquet("dat.parquet") |> DataFrame
    2×3 DataFrame
     Row │ a       b         c
         │ Int64?  Float64?  String?
    ─────┼───────────────────────────
       1 │      1       2.5  a
       2 │      2       3.5  b
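
To apply the same two calls to a CSV file on disk, as in the question, something along these lines may work. This is only a sketch: `csv_to_parquet` is a hypothetical helper name, not part of Parquet.jl, and the demo writes its own temporary CSV so it runs without the asker's file. It also assumes a recent CSV.jl, where string columns may be read as inline string types that the Parquet.jl writer might not accept; passing `stringtype=String` asks CSV.jl for plain `String` columns instead.

```julia
using CSV, DataFrames, Parquet

# Hypothetical helper (not part of Parquet.jl): read a CSV file from disk
# and write its contents to a Parquet file. Extra keyword arguments
# (for example delim=";") are forwarded to CSV.read.
function csv_to_parquet(csv_path::AbstractString, parquet_path::AbstractString; kwargs...)
    # stringtype=String is an assumption about the installed CSV.jl version;
    # it requests plain String columns rather than inline string types.
    df = CSV.read(csv_path, DataFrame; stringtype=String, kwargs...)
    Parquet.write_parquet(parquet_path, df)
    return parquet_path
end

# Demo on a temporary file so the sketch is self-contained.
dir = mktempdir()
csv_path = joinpath(dir, "temp.csv")
write(csv_path, "a,b,c\n1,2.5,x\n2,3.5,y\n")

parquet_path = csv_to_parquet(csv_path, joinpath(dir, "temp.parquet"))

# Read the Parquet file back to check the round trip.
df_back = Parquet.read_parquet(parquet_path) |> DataFrame
```

For the file in the question, the call would be `csv_to_parquet("/home/onur/julia-assignment/temp.csv", "temp.parquet")`, where the output name is only an example. Column types other than integers, floats, strings, and booleans may need converting before writing.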