f#, type-providers, f#-data

Is it possible to count tables in the HTML Type Provider?


I have a Wikipedia page and, for specific reasons, I want to count the tables on it.

Apparently, deep inside the properties, Lists and Tables are represented as sequences.

Is there a way to retrieve those counts in code?

I have tried several horrible hacks:

open System
open FSharp.Data
open FSharp.Data.Runtime

type Wiki = HtmlProvider<"https://en.wikipedia.org/wiki/F_Sharp_(programming_language)">

let getTablesCount (url : string) =
    let data = Wiki.Load url
    let tables = data.Tables

    // won't compile - type constraint mismatch
    // let attempt1 = tables :> Map<string, HtmlTable> |> Map.count

    // won't compile - type is not compatible
    // let attempt2 = tables |> Seq.cast<Tuple<string, HtmlTable>> |> Seq.length

    // compiles, but throws InvalidCastException at runtime
    // let attempt3 = (box tables) :?> Map<string, HtmlTable> |> Map.count

    42 // placeholder: none of the attempts above yields a count

Nothing works, probably for good reason. Maybe I am missing something obvious?

I am ready to fall back to e.g. the FSharp.Data HTML Parser for this (rather than parsing HTML with regex); I just want to be sure there is no simpler way.


Solution

  • I'm not very familiar with the HtmlProvider. I guess you could use reflection to get at the non-public types, which is quite hacky, or use the HtmlAgilityPack instead.

    Staying within the HtmlProvider, searching the underlying document for "table" nodes gives me a count of 10:


    open FSharp.Data
    
    type Wiki = HtmlProvider<"https://en.wikipedia.org/wiki/F_Sharp_(programming_language)">
    
    [<EntryPoint>]
    let main argv = 
    
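        // Counts every <table> element in the page's underlying HTML document.
        let getTablesCount (url : string) =
            let data = Wiki.Load url
            let tables = data.Tables
            // tables.Html is the parsed document; Descendants finds nodes by tag name
            let tableNodes = tables.Html.Descendants("table")
            tableNodes |> Seq.length |> printfn "Table count is: %d"
    
        getTablesCount "https://en.wikipedia.org/wiki/F_Sharp_(programming_language)"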
        0
    

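The same count can probably be had without the type provider at all, using the plain FSharp.Data HTML parser that the question mentions. The following is only a minimal sketch: it assumes it is run as an F# script (`dotnet fsi`) with the FSharp.Data NuGet package available, and `countTables` and `sample` are hypothetical names introduced here for illustration. It parses an inline HTML string so that it runs without network access; for a live page, `HtmlDocument.Load` with a URL should work in place of `HtmlDocument.Parse`.

```fsharp
#r "nuget: FSharp.Data"
open FSharp.Data

// Hypothetical helper: counts the <table> elements found anywhere in a parsed document.
let countTables (doc : HtmlDocument) =
    doc.Descendants("table") |> Seq.length

// Small inline page (an assumption for illustration) containing two separate tables.
let sample = """<html><body>
<table><tr><td>a</td></tr></table>
<p>some text</p>
<table><tr><td>b</td></tr></table>
</body></html>"""

let tableCount = sample |> HtmlDocument.Parse |> countTables
printfn "Table count is: %d" tableCount
```

Note that this counts every `table` element in the markup, which is not necessarily the same set of tables that the provider chooses to expose as named properties under `Tables`, so the two numbers may differ.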