Search code examples
ado.netf#haskellhdbc

Haskell HDBC Elegance in F#?


I'm struck by Haskell's terseness and elegance. But I work in a .Net house, so I use F# when I can get away with it--I may be the only one of hundreds across the country who uses it.

Does ADO.NET or F# offer something as terse and elegant as HDBC's executeMany? I'm making my way through Real World Haskell. In chapter 21 it offers this example:

ghci> conn <- connectSqlite3 "test1.db"
ghci> stmt <- prepare conn "INSERT INTO test VALUES (?, ?)"
ghci> executeMany stmt [[toSql 5, toSql "five's nice"], [toSql 6, SqlNull]]
ghci> commit conn
ghci> disconnect conn

I'd like to get this elegance and terseness in my F#. I've seen a lot of hype around using parameterized queries to avoid SQL injection attacks. I'm not using them in this case for three reasons:

  1. I find parameterized queries in .Net ugly and burdensome.
  2. My data comes from the corporate office, so it's (mostly) clean.
  3. My table has 34 columns. I despise the idea of parameterizing a query with 34 columns.

Here's my F# code:

module Data

open System
open System.Data
open System.Data.OleDb
open System.Text.RegularExpressions

type Period = Prior | Current

let Import period records db =
    use conn = new OleDbConnection(@"Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + db + ";Persist Security Info=False;")

    let execNonQuery s =
        let comm = new OleDbCommand(s, conn) in
        comm.ExecuteNonQuery() |> ignore

    let enquote = sprintf "\"%s\""
    let escapeQuotes s = Regex.Replace(s, "\"", "\"\"")
    let join (ss:string[]) = String.Join(",", ss)

    let table = match period with
                | Prior   -> "tblPrior"
                | Current -> "tblCurrent"
    let statements =
        [| for r in records do
               let vs = r |> Array.map (escapeQuotes >> enquote) |> join
               let vs' = vs + sprintf ",\"14\",#%s#" (DateTime.Now.ToString "yyyy-MM-dd") in
               yield sprintf "INSERT INTO %s ( [Field01], [Field02], [Field03] [Field04], [Field05], [Field06], [Field07], [Field08], [Field09], [Field10], [Field11], [Field12], [Field13], [Field14], [Field15], [Field16], [Field17], [Field18], [Field19], [Field20], [Field21], [Field22], [Field23], [Field24], [Field25], [Field26], [Field27], [Field28], [Field29], [Field30], [Field31], [Field32], [Field33], [Field34] ) VALUES (%s)" table vs' |] in

    do conn.Open()
    execNonQuery (sprintf "DELETE FROM %s" table)
    statements |> Array.iter execNonQuery

I've renamed the fields of the table(s) for security reasons.

Because all the fields on the table are text, I can easily Array.map them to escape and quote the values.

At between 9,000 and 10,000 records per day to import to each of the two tables, I want to do this as efficiently as possible. Hence my interest in the executeMany of Haskell. Too, though, I like the idea behind parameterized queries, and I like the way Hasekll has implemented them. Is there something equivalent in terseness and elegance in F#?


Solution

  • I agree with @JonnyBoats comment that generally using an F# SQL type provider like SqlDataConnection (LINQ-to-SQL) or SqlEntityConnection (Entity Framework) would be far more elegant than any kind of solution involving building insert statement strings by hand.

    But, there is one important qualifier to your question: "At between 9,000 and 10,000 records per day to import to each of the two tables, I want to do this as efficiently as possible." In a scenario like this, you'll want to use SqlBulkCopy for efficient bulk inserts (it leverages native database driver features for much faster inserts than you are likely getting with HDBC's executeMany).

    Here's a small example that should help you getting started using SqlBulkCopy with F#: https://stackoverflow.com/a/8942056/236255. Note that you'll be working with a DataTable to stage the data which though old and somewhat awkward to use from F#, is still superior to building insert statement strings in my opinion.

    Update in response to comment

    Here's a generalized approach to using SqlBulkCopy which is improved for your scenario (we pass in a column specification separately from the row data, and both are dynamic):

    //you must reference System.Data and System.Xml
    open System
    open System.Data
    open System.Data.SqlClient
    
    let bulkLoad (conn:SqlConnection) tableName (columns:list<string * Type>) (rows: list<list<obj>>) =
        use sbc = new SqlBulkCopy(conn, SqlBulkCopyOptions.TableLock, null, BatchSize=500, BulkCopyTimeout=1200, DestinationTableName=tableName)
        sbc.WriteToServer(
            let dt = new DataTable()
            columns
            |> List.iter (dt.Columns.Add>>ignore)
    
            for row in rows do
                let dr = dt.NewRow()
                row |> Seq.iteri(fun i value -> dr.[i] <- value)
                dt.Rows.Add(dr)
            dt)
    
    //example usage:
    
    //note: since you know all your columns are of type string, you could define columns like
    //let columns = ["Field1", "Field2", "Field3"] |> List.map (fun name -> name, typeof<String>)
    let columns = [
        "Field1", typeof<String>
        "Field2", typeof<String>
        "Field3", typeof<String>
    ]
    
    let rows = [
        ["a"; "b"; "c"]
        ["d"; "e"; "f"]
        ["g"; "h"; "i"]
        ["j"; "k"; "l"]
        ["m"; "n"; "o"]
    ]
    
    //a little funkiness to transform our list<list<string>> to list<list<obj>>, 
    //probably not needed in practice because you won't be constructing your lists literally
    let rows = rows |> List.map (fun row -> row |> List.map (fun value -> value :> obj))
    
    bulkLoad conn "tblPrior" columns rows
    

    You could get even fancier / more terse using an approach involving reflection. e.g. create a type like

    type RowData = { Field1:string; Field2:string; Field3:string }
    

    and make a bulkLoad with a signature that takes a list<'a> argument such that it reflects over the property names and types of typeof<'a> to build the DataTable Columns, and similarly uses reflection to iterate over all the properties of a row instance to create and add a new row to the DataTable. In fact, this question shows how to make a generic ToDataTable method that does it (in C#).