Tags: c#, html, vb.net, parsing, href

How to extract HTML table contents to a DataTable


I have an HTML page that contains a table of file links.

I'm trying to load the contents of the page into a DataTable and display it in a grid.

For example, from

<a href='/exodus-5.1/bacon/exodus-5.1-20150612-NIGHTLY-bacon.zip'>exodus-5.1-20150612-NIGHTLY-bacon.zip</a>

I need to get both the name of the link and its URI:

name: exodus-5.1-20150612-NIGHTLY-bacon.zip
uri: /exodus-5.1/bacon/exodus-5.1-20150612-NIGHTLY-bacon.zip

This is what I have so far:

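 ' Requires: Imports System.IO and Imports System.Net
 Dim request As HttpWebRequest = DirectCast(WebRequest.Create(url), HttpWebRequest)
 request.Method = WebRequestMethods.Http.Get
 Dim webpageContents As String
 ' Using blocks dispose the response and reader even if reading fails
 Using response As HttpWebResponse = DirectCast(request.GetResponse(), HttpWebResponse)
     Using reader As New StreamReader(response.GetResponseStream())
         webpageContents = reader.ReadToEnd()
     End Using
 End Using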

Solution

  • Although it is not VB.NET, this is straightforward in another .NET language, F#, using the HTML Type Provider, which is part of the FSharp.Data project available via NuGet.

    The HTML Type Provider gives you typed access to HTML documents inside Visual Studio, for example:

    // Reference the FSharp.Data NuGet package
    #r @".\packages\FSharp.Data.2.2.3\lib\net40\FSharp.Data.dll"
    // Type provider over your HTML document; yourUrl must be a compile-time
    // constant (a string literal or a [<Literal>] value)
    type html = FSharp.Data.HtmlProvider<yourUrl>
    // Get the rows from the HTML table in the page
    let allRows = html.GetSample().Tables.Table1.Rows |> Seq.skip 1
    // Skip empty rows
    let validRows = allRows |> Seq.where (fun row -> row.Name <> "")
    

    Then load the valid rows into a DataTable:

    // Reference the System.Data assembly
    #r "System.Data.dll"
    // Create a DataTable
    let table = new System.Data.DataTable()
    // Add column names to the table
    for name in ["Parent";"Name";"Last modified";"Size"] do table.Columns.Add(name) |> ignore
    // Add row values to the table
    for row in validRows do
      table.Rows.Add(row.Column1, row.Name, row.``Last modified``, row.Size) |> ignore
    

    Finally, show the DataTable on a form:

    // Reference the Windows.Forms assembly
    #r "System.Windows.Forms.dll"
    open System.Windows.Forms
    // Create a form
    let form = new Form(Width=480,Height=320)
    // Initialise a grid
    let grid = new DataGridView(Dock=DockStyle.Fill)
    form.Controls.Add(grid)
    // Set the grid data source with the table
    form.Load.Add(fun _ -> grid.DataSource <- table)
    form.Show()
    

    The form then shows a DataGridView populated from the DataTable.
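
For readers who need to stay in VB.NET, the link name and URI asked about in the question can also be extracted without a type provider. The following is only a minimal sketch under stated assumptions: LinksToDataTable is a hypothetical helper name, the "Name" and "Uri" column names are chosen for illustration, and the regular expression assumes simple, well-formed anchors like the one in the question. A dedicated HTML parser would likely be more robust for arbitrary pages.

```vb
Imports System
Imports System.Data
Imports System.Net
Imports System.Text.RegularExpressions

Module LinkTableSketch

    ' Hypothetical helper: collects each <a href="...">text</a> pair into a DataTable.
    Function LinksToDataTable(html As String) As DataTable
        Dim table As New DataTable("Links")
        table.Columns.Add("Name", GetType(String))
        table.Columns.Add("Uri", GetType(String))

        ' Matches single- or double-quoted href values and the anchor's inner text.
        ' Group 1 is the opening quote; \1 requires the same closing quote.
        Dim pattern As String = "<a\s[^>]*?href\s*=\s*(['""])(?<uri>.*?)\1[^>]*>(?<name>.*?)</a>"
        Dim regexOptions As RegexOptions = RegexOptions.IgnoreCase Or RegexOptions.Singleline

        For Each m As Match In Regex.Matches(html, pattern, regexOptions)
            ' Strip nested tags (for example <img>) and decode entities such as &amp;.
            Dim innerText As String = Regex.Replace(m.Groups("name").Value, "<[^>]+>", "")
            Dim linkName As String = WebUtility.HtmlDecode(innerText).Trim()
            Dim linkUri As String = WebUtility.HtmlDecode(m.Groups("uri").Value)

            ' Skip anchors that have no visible text.
            If linkName.Length > 0 Then table.Rows.Add(linkName, linkUri)
        Next

        Return table
    End Function

    Sub Main()
        ' Sample input taken from the question.
        Dim sample As String =
            "<a href='/exodus-5.1/bacon/exodus-5.1-20150612-NIGHTLY-bacon.zip'>exodus-5.1-20150612-NIGHTLY-bacon.zip</a>"

        Dim table As DataTable = LinksToDataTable(sample)
        For Each row As DataRow In table.Rows
            Console.WriteLine("{0} -> {1}", row("Name"), row("Uri"))
        Next
    End Sub

End Module
```

In the question's code, the webpageContents string could be passed as the html argument, and the returned table could be assigned to a DataGridView's DataSource in the same way as in the F# example.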