Column/Validation guarantees with Deedle


Is there a way to express the notion that a data frame has already been validated? The best way I can think of is creating a wrapper type to restrict access. I welcome any suggestions.

Simple example:

#r "nuget: Deedle"

open System.IO // needed for MemoryStream below
open Deedle

type CustomFrame = 
    | Some of Frame<int,string>
    | None
let map mapping option = match option with None -> None | Some x -> Some (mapping x)
let iter action option = match option with None -> () | Some x -> action x

let parse (df:Frame<_,_>) = 
    let keys = df.ColumnKeys |> Set.ofSeq
    if keys.Contains("Entry") && keys.Contains("Color") then
        df    
        |> Frame.indexRowsInt "Entry"
        |> CustomFrame.Some
    else
        CustomFrame.None

let go (df:CustomFrame) =
    df
    |> map (Frame.filterRowsBy "Color" "Red")

let data = "Entry;Color;N\n1;Red;7\n2;Blue;42\n3;Blue;21"
let bytes = System.Text.Encoding.UTF8.GetBytes data
let stream = new MemoryStream(bytes)

Frame.ReadCsv(stream = stream,separators = ";",hasHeaders = true)
|> parse
|> go
|> iter (fun d-> d.Print())

Output:

     Color N 
1 -> Red   7 

Solution

  • Two suggestions:

    • Have parse return a standard Option value, as Fyodor suggests.
    • Short-circuit the computation if validation fails. In other words, if parse returns None, don't call go and iter at all.
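
    The two suggestions above can be sketched together as follows. This is only a minimal sketch, not a definitive implementation: it assumes the same sample data as the question, reuses the question's names parse and go, and introduces a hypothetical binding called validated. The type annotations are assumptions about what Frame.ReadCsv returns for this input.

```fsharp
#r "nuget: Deedle"

open System.IO
open Deedle

// Returns Some frame (indexed by "Entry") when both required columns exist,
// otherwise the standard Option value None.
let parse (df: Frame<int, string>) : Frame<int, string> option =
    let keys = df.ColumnKeys |> Set.ofSeq
    if keys.Contains("Entry") && keys.Contains("Color") then
        df |> Frame.indexRowsInt "Entry" |> Some
    else
        None

// Works on a plain frame; it is only ever called after validation succeeded.
let go (df: Frame<int, string>) =
    df |> Frame.filterRowsBy "Color" "Red"

let data = "Entry;Color;N\n1;Red;7\n2;Blue;42\n3;Blue;21"
let stream = new MemoryStream(System.Text.Encoding.UTF8.GetBytes data)

// "validated" is a hypothetical name used only for this sketch.
let validated =
    Frame.ReadCsv(stream = stream, separators = ";", hasHeaders = true)
    |> parse

// Short-circuit: go and Print run only in the Some branch.
match validated with
| Some df -> (go df).Print()
| None -> printfn "Validation failed: expected columns Entry and Color"
```

    With this shape, go never has to deal with a missing frame, because the match decides once whether the rest of the pipeline runs.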

    If you really want to program defensively after parse has succeeded, and no further validation is required, the wrapper no longer needs a None case. So you could use a simplified wrapper type to ensure you always have a validated frame:

    type ValidFrame = Valid of Frame<int,string>
    
    let map f (Valid df) = f df |> Valid
    let iter (f : _ -> unit) (Valid df) = f df
    let go = map (Frame.filterRowsBy "Color" "Red")
    

    And then use it like this:

    Frame.ReadCsv(stream = stream,separators = ";",hasHeaders = true)
    |> parse
    |> Option.map (
        Valid
          >> go
          >> iter (fun df -> df.Print()))
    

    However, personally, I consider the wrapper type to be overkill unless there's a compelling reason for it.
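
    If there is a compelling reason, one possible refinement is to make the case constructor private, so that a ValidFrame can only be obtained through the validating function. The following is a sketch under assumptions rather than something the answer above prescribes: the module name Validated and the functions tryCreate and value are hypothetical names, and the validation rule is copied from the question's parse.

```fsharp
#r "nuget: Deedle"

open System.IO
open Deedle

module Validated =
    // The Valid constructor is private to this module, so code outside it
    // cannot wrap an unvalidated frame.
    type ValidFrame = private Valid of Frame<int, string>

    // Hypothetical smart constructor: the only way to obtain a ValidFrame.
    let tryCreate (df: Frame<int, string>) : ValidFrame option =
        let keys = df.ColumnKeys |> Set.ofSeq
        if keys.Contains("Entry") && keys.Contains("Color") then
            df |> Frame.indexRowsInt "Entry" |> Valid |> Some
        else
            None

    // Note: map trusts f to keep the validated columns; that is an
    // assumption this sketch does not enforce.
    let map f (Valid df) = Valid (f df)

    // Hypothetical accessor that unwraps the frame for read-only use.
    let value (Valid df) = df

let data = "Entry;Color;N\n1;Red;7\n2;Blue;42\n3;Blue;21"
let stream = new MemoryStream(System.Text.Encoding.UTF8.GetBytes data)

let result =
    Frame.ReadCsv(stream = stream, separators = ";", hasHeaders = true)
    |> Validated.tryCreate
    |> Option.map (Validated.map (Frame.filterRowsBy "Color" "Red"))

result |> Option.iter (fun v -> (Validated.value v).Print())
```

    Outside the Validated module, writing Validated.Valid someFrame does not compile, which is what gives the wrapper its guarantee. With a public constructor, as in the simpler version above, the wrapper is a convention rather than an enforced invariant.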