
Is it possible to use a custom type with the FSharp.Data.CsvProvider when specifying a Schema?


I'm working in F# with a CSV that looks like this:

When,Where,HowMuch
11/24/2019,Germany,100 EUR
11/25/2019,France,100 EUR
11/26/2019,Switzerland,50 CHF
11/27/2019,USA,75 USD

I'm using the CSV Type Provider in the FSharp.Data package to parse this data.

type CurrencyDetector = CsvProvider<"Currencies.csv">

Obviously the first column is a date, and the second is a string.

For the third, I'd like to use this type:

open System

type Money (amountAndDenomination : string) =

    // Split a value such as "100 EUR" into the amount and the currency code.
    let parts = amountAndDenomination.Split ' '

    member __.Amount = Decimal.Parse parts.[0]
    member __.Denomination = parts.[1]

I've tried a few permutations of the Schema argument in my CsvProvider line, but nothing has worked so far. For example:

type CurrencyDetector = CsvProvider<"Currencies.csv",Schema="When (date),Where (string),HowMuch (Money)">

When comes out as DateTime and Where as string, but HowMuch becomes a string property named HowMuch (Money):

[Screenshot: demonstration of type provider properties in use]

Is there a way to use my own classes with the CsvProvider, or is this just not possible?


Solution

  • According to the documentation for the CsvProvider, I don't think it is possible:

    Schema parameter: "Optional column types, in a comma separated list. Valid types are int, int64, bool, float, decimal, date, guid, string, int?, int64?, bool?, float?, decimal?, date?, guid?, int option, int64 option, bool option, float option, decimal option, date option, guid option and string option. You can also specify a unit and the name of the column like this: Name (type<unit>), or you can override only the name. If you don't want to specify all the columns, you can reference the columns by name like this: ColumnName=type."

    Note in the above, however, that there is the possibility of using units of measure. So you might explore creating units of measure for the currency denominations. The type provider might require a different format for that column though.

    In the documentation for the CSV Type Provider, you can find further information about units of measure and also how to "transform the columns themselves by using Map", which should enable you to map the string type to a custom type. The source code suggests that the Rows collection is a sequence of Row objects, each of which is a tuple, so you should be able to use Seq.map or any other function in the Seq module to post-process the generated collection.
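
    As a minimal sketch of the Seq.map approach, assuming an F# script with FSharp.Data referenced from NuGet: the provider leaves HowMuch as a string, and each row is converted to the Money type from the question afterwards. The inline Sample literal (standing in for Currencies.csv), the rows binding, and the use of CultureInfo.InvariantCulture are assumptions made for illustration.

```fsharp
#r "nuget: FSharp.Data"

open System
open System.Globalization
open FSharp.Data

// Inline copy of the data from the question, used here instead of Currencies.csv
// so that the script does not depend on a file on disk (an assumption of this sketch).
[<Literal>]
let Sample = """When,Where,HowMuch
11/24/2019,Germany,100 EUR
11/25/2019,France,100 EUR
11/26/2019,Switzerland,50 CHF
11/27/2019,USA,75 USD"""

// No custom type in the schema: HowMuch is inferred as a plain string.
type CurrencyDetector = CsvProvider<Sample>

// The Money type from the question, parsing with the invariant culture
// so the result does not depend on the machine's regional settings.
type Money (amountAndDenomination : string) =
    let parts = amountAndDenomination.Split ' '
    member __.Amount = Decimal.Parse(parts.[0], CultureInfo.InvariantCulture)
    member __.Denomination = parts.[1]

// Post-process the provided rows: wrap the raw HowMuch string in Money.
let rows =
    CurrencyDetector.GetSample().Rows
    |> Seq.map (fun row -> row.When, row.Where, Money(row.HowMuch))
    |> Seq.toList

for (date, place, money) in rows do
    printfn "%s %s %M %s" (date.ToString "yyyy-MM-dd") place money.Amount money.Denomination
```

    The conversion happens after the provider has done its work, so the Schema argument only ever sees built-in types. The result is a list of ordinary tuples rather than the provided Row type.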