Tags: regex, datagridview, screen-scraping, match

Web Scraper - Regex Match.Value returning string length not string itself


I'm having trouble configuring a web scraper that I'm working on for a project I have on the go at present.

I'm trying to scrape a series of links from a page in order to assess which ones I want to process. Here is my code:

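using System;
using System.IO;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;
using System.Windows.Forms;

public partial class Form1 : Form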
{
    private byte[] aRequestHTML;
    private string sourceString = null;
    string[] a;
    WebClient objWebClient = new WebClient();
    LinkScraper linkScraper = new LinkScraper();

    public Form1()
    {
        InitializeComponent();
    }

    private void button1_Click(object sender, EventArgs e)
    {
        ScrapeLinks(textBox1.Text);
    }


    public void ScrapeLinks(string sourceLink)
    {
        // gets the HTML from the url written in the textbox
        aRequestHTML = objWebClient.DownloadData(sourceLink);
        // creates a UTF-8 encoding object
        UTF8Encoding utf8 = new UTF8Encoding();
        // decodes the downloaded bytes (aRequestHTML) as UTF-8 text
        sourceString = utf8.GetString(aRequestHTML);
        // regular expression matching <a href=...>...</a> tags
        Regex r = new Regex("\\<a\\shref\\=(.*)\\>(.*)\\<\\/a\\>");
        // collect every match of the pattern in the page source
        MatchCollection mcl = r.Matches(sourceString);

        a = new string[mcl.Count];
        int i = 0;
        foreach (Match ml in mcl)
        {
            // store each matched anchor tag in the array
            a[i] = ml.ToString();
            Console.WriteLine(a[i]);
            i++;
        }

        // binds the array to the grid
        dataGridView1.DataSource = a;

        // The following lines of code write the extracted URLs to the file named test.txt
        StreamWriter sw = new StreamWriter("test.txt");
        foreach (string aElement in a)
        {
            sw.Write(aElement + "\n");
        }
        sw.Close();
    }
}

My issue arises when I set the DataGridView's DataSource. Instead of the grid being populated with the list of strings, it is populated with each string's length. As you can see, I also write the array out to test.txt to check whether I was doing something stupid, and the text file contains each string exactly as I'd expect to see it in the grid.

I've trawled the forums for 12 hours looking for a solution, with no joy.

Could someone be kind enough to explain why .Value (the loop actually calls ml.ToString(), which returns the same text as ml.Value) is not getting my strings into the string array 'a' for binding to the DataGridView?

Any help is as always greatly appreciated

Regards Barry
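Separately from the binding problem, the pattern in the code above uses greedy (.*) groups, so when several links sit on the same line of HTML a single match can run from the first <a to the last </a>. The sketch below is a hedged alternative using lazy quantifiers. It assumes reasonably well-formed anchor tags and that a regex is acceptable rather than an HTML parser. LinkRegexSketch and ExtractLinks are hypothetical names, not part of the original project.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper (not from the original post): pulls (href, text)
// pairs out of raw HTML. Assumes well-formed <a ...>...</a> tags;
// a real HTML parser is more robust for messy pages.
public static class LinkRegexSketch
{
    // Lazy quantifiers (*?) stop at the first closing '>' or "</a>",
    // so several links on one line produce separate matches.
    // In a verbatim string, "" is a literal double quote.
    private static readonly Regex AnchorPattern = new Regex(
        @"<a\s+[^>]*?href\s*=\s*[""']?([^""'\s>]+)[""']?[^>]*>(.*?)</a>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    public static List<(string Href, string Text)> ExtractLinks(string html)
    {
        var links = new List<(string Href, string Text)>();
        foreach (Match m in AnchorPattern.Matches(html))
        {
            // Group 1 is the href value, group 2 the anchor's inner text.
            links.Add((m.Groups[1].Value, m.Groups[2].Value));
        }
        return links;
    }
}
```

Calling LinkRegexSketch.ExtractLinks(sourceString) after the download in ScrapeLinks would yield one (Href, Text) pair per anchor, even when several anchors share a line.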


Solution

  • Found the solution just now, folks:

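    When a DataGridView is bound to a list of strings, it auto-generates its columns from the public properties of the item type, and the only bindable property on String is Length, so that is what the grid displays. The workaround is to bind to a DataTable, whose columns the grid uses instead: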

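     // needs: using System.Data;
     DataTable links = new DataTable();
     links.Columns.Add("Link URL");
    
     foreach (Match ml in mcl)
     {
       // add each matched tag to the table
       links.Rows.Add(new object[] {ml.Value});
     }

     // bind the table: the grid shows one "Link URL" column holding each match's text
     dataGridView1.DataSource = links;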
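To see why the grid shows Length, a small sketch can ask the component model which properties it would turn into columns. This assumes the grid builds its auto-generated columns from TypeDescriptor.GetProperties on the item type, which is how WinForms data binding discovers item properties. It also shows an alternative to the DataTable: wrapping each string in a class that exposes it as a public property. BindingColumnsSketch, LinkItem and Wrap are hypothetical names.

```csharp
using System.Collections.Generic;
using System.ComponentModel;
using System.Linq;

// Hypothetical wrapper type: the URL is exposed as a public property,
// so data binding can generate a "Link" column for it.
public class LinkItem
{
    public string Link { get; set; }
}

public static class BindingColumnsSketch
{
    // Returns the property names data binding can see on a type.
    // For string this is just "Length": the Chars indexer is skipped
    // because TypeDescriptor ignores indexed properties.
    public static string[] BindableProperties(System.Type type) =>
        TypeDescriptor.GetProperties(type)
            .Cast<PropertyDescriptor>()
            .Select(p => p.Name)
            .ToArray();

    // Wraps each scraped string so the grid shows the text itself.
    public static List<LinkItem> Wrap(IEnumerable<string> links) =>
        links.Select(s => new LinkItem { Link = s }).ToList();
}
```

In the form, dataGridView1.DataSource = BindingColumnsSketch.Wrap(a); would then show a single Link column with the full strings. The DataTable route avoids defining a class; the wrapper route keeps the data strongly typed.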