I have a Wiki page and for specific reasons I am interested in counting tables there.
Apparently, deep inside, the props Lists and Tables are represented as sequences.
Is there a way to retrieve those counts in code?
I have tried several horrible hacks:
open System
open FSharp.Data
open FSharp.Data.Runtime

type Wiki = HtmlProvider<"https://en.wikipedia.org/wiki/F_Sharp_(programming_language)">

let getTablesCount (url : string) =
    let data = Wiki.Load url
    let tables = data.Tables
    // won't compile - type constraint mismatch
    // let attempt1 = tables :> Map<string, HtmlTable> |> Map.count
    // won't compile - type is not compatible
    // let attempt2 = tables |> Seq.cast<Tuple<string, HtmlTable>> |> Seq.length
    // compiles - throws InvalidCastException at runtime
    // let attempt3 = (box tables) :?> Map<string, HtmlTable> |> Map.count
    42
Nothing works, likely for good reason. Maybe I am missing something obvious?
I am ready to use e.g. the FSharp.Data HTML Parser for it (rather than parsing HTML with regex); I just want to be sure.
I'm not very familiar with the HtmlProvider. I guess you could use reflection to get at the non-public types, which is quite hacky, or use the HtmlAgilityPack.
Within the HtmlProvider searching for the "table" nodes gives me a count of 10:
open FSharp.Data

type Wiki = HtmlProvider<"https://en.wikipedia.org/wiki/F_Sharp_(programming_language)">

[<EntryPoint>]
let main argv =
    let getTablesCount (url : string) =
        let data = Wiki.Load url
        let tables = data.Tables
        // Search the underlying HTML document for all <table> nodes
        let props = tables.Html.Descendants("table")
        props |> Seq.length |> printfn "Table count is: %d"
    getTablesCount("https://en.wikipedia.org/wiki/F_Sharp_(programming_language)")
    0
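
If the typed provider is not needed at all, the same search can probably be done with the plain FSharp.Data HTML parser. The following is only a minimal sketch: countTables and the inline sample document are hypothetical names made up for illustration, and it counts raw table elements in the markup, which may not match the set of tables the provider chooses to expose under Tables.

```fsharp
#r "nuget: FSharp.Data"

open FSharp.Data

// Hypothetical helper (not part of FSharp.Data): counts every <table>
// element found anywhere in an already-parsed document.
let countTables (doc : HtmlDocument) =
    doc.Descendants ["table"] |> Seq.length

// Small inline sample so the sketch runs without network access.
let sample =
    HtmlDocument.Parse """<html><body>
        <table><tr><td>a</td></tr></table>
        <ul><li>x</li></ul>
        <table><tr><td>b</td></tr></table>
    </body></html>"""

printfn "Table count is: %d" (countTables sample)

// For a live page, load the document first and pass it to the same helper:
// HtmlDocument.Load "https://en.wikipedia.org/wiki/F_Sharp_(programming_language)" |> countTables
```

Swapping "table" for "ul" or "ol" in the name list should give the corresponding list counts in the same way.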