Suppose I have CSV data with a categorical variable in it, like
Entry Color
0 -> 1 Red
1 -> 2 Blue
I would like to translate the variable into a discriminated union. I have tried row.GetAs<Color>, but this results in an InvalidCastException. If I use fromString/toString, I have to keep track of which variables have already been converted (read from records) and which have not (read from CSV data). Is there a better solution?
#r "nuget: Deedle"
open Deedle
//https://stackoverflow.com/questions/21559497/create-discriminated-union-case-from-string
module Util =
    open Microsoft.FSharp.Reflection
    let toString (x:'a) =
        let (case, _ ) = FSharpValue.GetUnionFields(x, typeof<'a>)
        case.Name
    let fromString<'a> (s:string) =
        match FSharpType.GetUnionCases typeof<'a> |> Array.filter (fun case -> case.Name = s) with
        |[|case|] -> (FSharpValue.MakeUnion(case,[||]) :?> 'a)
        |_ -> failwith $"Unknown union case {s}"
type Color =
    | Red
    | Blue
    | Green
    override this.ToString() = Util.toString this
    static member fromString s = Util.fromString<Color> s
let data = "Entry;Color\n1;Red\n2;Blue"
//https://stackoverflow.com/questions/44344061/converting-a-string-to-a-stream/44344794
let bytes = System.Text.Encoding.UTF8.GetBytes data
let stream = new System.IO.MemoryStream(bytes)
let df:Frame<int,string> =
    Frame.ReadCsv(
        stream = stream,
        separators = ";",
        hasHeaders = true
    )
df.Print()
//let col = df |> Frame.mapRowValues (fun row -> row.GetAs<Color>"Color")
//Invalid cast from 'System.String' to 'FSI_...+Color'.
let col' = df |> Frame.mapRowValues (fun row -> Color.fromString (row.GetAs<string> "Color"))
//works
df.ReplaceColumn("Color", col')
df.SaveCsv(__SOURCE_DIRECTORY__ + "/df.csv",includeRowKeys=false)
let df' = Frame.ReadCsv(__SOURCE_DIRECTORY__ + "/df.csv", schema="int,Color")
df |> Frame.mapRowValues (fun row -> row.GetAs<Color> "Color")
//works
df' |> Frame.mapRowValues (fun row -> row.GetAs<Color> "Color")
//breaks
Unfortunately, there is no way to tell Deedle to convert particular columns to a discriminated union when reading CSV data. (This would not really work with unions that have cases with arguments and Deedle also does not know what types are defined in your F# code.)
The best way is something along the lines of what you are currently doing: read the CSV file with the categorical values as strings, then parse those manually and replace the column. I would probably do this by getting the specified series and using Series.mapValues to transform the data (as that is a bit more direct than using Frame.mapRowValues):
let df = Frame.ReadCsv(stream = stream, separators = ";", hasHeaders = true)
let newCol = df.Columns.["Color"].As<string>() |> Series.mapValues Color.fromString
df.ReplaceColumn("Color", newCol)
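If several columns need this treatment, the same pattern can be wrapped in a small helper so the conversion happens once, right after loading, and the rest of the code only ever sees union values. The following is a minimal sketch, not part of Deedle: unionParser and parseUnionColumn are hypothetical names, and it assumes the union has only cases without arguments and that the frame has int row keys, as Frame.ReadCsv produces here.

```fsharp
#r "nuget: Deedle"
open System.IO
open Deedle
open Microsoft.FSharp.Reflection

type Color =
    | Red
    | Blue
    | Green

// Hypothetical helper: builds a name -> value lookup once, so reflection
// is not repeated for every cell. Assumes all cases of 'a take no arguments.
let unionParser<'a> () : string -> 'a =
    let cases =
        FSharpType.GetUnionCases typeof<'a>
        |> Array.map (fun c -> c.Name, (FSharpValue.MakeUnion(c, [||]) :?> 'a))
        |> dict
    fun s ->
        match cases.TryGetValue s with
        | true, v -> v
        | _ -> failwith $"Unknown union case {s}"

// Hypothetical helper: reads the named column as strings, parses each value
// into the union type, and replaces the column in the (mutable) frame.
let parseUnionColumn<'a> (column: string) (df: Frame<int, string>) =
    let parse = unionParser<'a> ()
    let parsed = df.GetColumn<string>(column) |> Series.mapValues parse
    df.ReplaceColumn(column, parsed)

let csv = "Entry;Color\n1;Red\n2;Blue"
let stream = new MemoryStream(System.Text.Encoding.UTF8.GetBytes csv)
let df = Frame.ReadCsv(stream = stream, separators = ";", hasHeaders = true)

// Convert immediately after reading; later code can use GetAs<Color> / GetColumn<Color>.
parseUnionColumn<Color> "Color" df
let colors = df.GetColumn<Color>("Color") |> Series.values |> List.ofSeq
```

Doing the conversion directly after Frame.ReadCsv keeps the "already parsed or not" question in one place. A frame that has been written with SaveCsv and read back still needs the helper applied again, since the CSV file only stores the case names as text.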