c# linq html-agility-pack

Working with HtmlAgilityPack, nested List and Linq


List<List<string>> table = playerDoc.DocumentNode
    .SelectSingleNode($"//*[@id='lg_team_user_leagues-{leagueId}']/div[4]/table/tbody")
    .Descendants("tr")
    .Skip(1)
    .Select(tr => tr.Elements("td").Select(td => td.InnerText.Trim()).ToList())
    .ToList();

I have this code block which gathers all the correct information from a table on a website. My issue is the data looks like this:

[image: data]

I'm trying to figure out how to search the data for rows that match two strings, for example S16 and Pre, and use each matching row to populate a class called CareerProperties (a class of properties I can post if needed). I have tried different variations of the LINQ statement and a foreach loop, but I either get exceptions thrown or I get everything in the table.

I'm trying to simplify my code because retrieving the data with a foreach loop and XPath queries (shown in the edit below) takes around 3-4 seconds, whereas the LINQ statement above came back with Elapsed: 00:00:00.0068306 when I tested it.

Any help would be appreciated as I'm still learning C# and such. If I need to post a sample web page or any other part of the code I will do so. Thank you.

Edit:

foreach (var careerStats in findCareerNode)
{
    if (careerStats
        .SelectSingleNode($"//*[@id='lg_team_user_leagues-{leagueId}']/div[{div}]/table/tbody/tr[{index}]/td[1]").InnerText.Trim() != seasonId)
    {
        index++;
        continue;
    }
    else if (careerStats
       .SelectSingleNode(
           $"//*[@id='lg_team_user_leagues-{leagueId}']/div[{div}]/table/tbody/tr[{index}]/td[2]")
       .InnerText.Trim() != "Reg")
    {
        index++;
        continue;
    }
    var type = careerStats
        .SelectSingleNode(
            $"//*[@id='lg_team_user_leagues-{leagueId}']/div[{div}]/table/tbody/tr[{index}]/td[2]")
        .InnerText;
    var record = careerStats
        .SelectSingleNode(
            $"//*[@id='lg_team_user_leagues-{leagueId}']/div[{div}]/table/tbody/tr[{index}]/td[3]")
        .InnerText;
    var amr = careerStats
        .SelectSingleNode(
            $"//*[@id='lg_team_user_leagues-{leagueId}']/div[{div}]/table/tbody/tr[{index}]/td[4]")
        .InnerText ?? "0.0";
    var goals = careerStats
        .SelectSingleNode(
            $"//*[@id='lg_team_user_leagues-{leagueId}']/div[{div}]/table/tbody/tr[{index}]/td[5]")
        .InnerText;
    var assists = careerStats
        .SelectSingleNode(
            $"//*[@id='lg_team_user_leagues-{leagueId}']/div[{div}]/table/tbody/tr[{index}]/td[6]")
        .InnerText;
    var sot = careerStats
        .SelectSingleNode(
            $"//*[@id='lg_team_user_leagues-{leagueId}']/div[{div}]/table/tbody/tr[{index}]/td[7]")
        .InnerText;
    var shots = careerStats
        .SelectSingleNode(
            $"//*[@id='lg_team_user_leagues-{leagueId}']/div[{div}]/table/tbody/tr[{index}]/td[8]")
        .InnerText;
    var passC = careerStats
        .SelectSingleNode(
            $"//*[@id='lg_team_user_leagues-{leagueId}']/div[{div}]/table/tbody/tr[{index}]/td[9]")
        .InnerText;
    var passA = careerStats
        .SelectSingleNode(
            $"//*[@id='lg_team_user_leagues-{leagueId}']/div[{div}]/table/tbody/tr[{index}]/td[10]")
        .InnerText;
    var keypass = careerStats
        .SelectSingleNode(
            $"//*[@id='lg_team_user_leagues-{leagueId}']/div[{div}]/table/tbody/tr[{index}]/td[11]")
        .InnerText;
    var interceptions = careerStats
        .SelectSingleNode(
            $"//*[@id='lg_team_user_leagues-{leagueId}']/div[{div}]/table/tbody/tr[{index}]/td[12]")
        .InnerText;
    var tac = careerStats
        .SelectSingleNode(
            $"//*[@id='lg_team_user_leagues-{leagueId}']/div[{div}]/table/tbody/tr[{index}]/td[13]")
        .InnerText;
    var tacA = careerStats
        .SelectSingleNode(
            $"//*[@id='lg_team_user_leagues-{leagueId}']/div[{div}]/table/tbody/tr[{index}]/td[14]")
        .InnerText;
    var blk = careerStats
        .SelectSingleNode(
            $"//*[@id='lg_team_user_leagues-{leagueId}']/div[{div}]/table/tbody/tr[{index}]/td[15]")
        .InnerText;
    var rc = careerStats
        .SelectSingleNode(
            $"//*[@id='lg_team_user_leagues-{leagueId}']/div[{div}]/table/tbody/tr[{index}]/td[16]")
        .InnerText;
    var yc = careerStats
        .SelectSingleNode(
            $"//*[@id='lg_team_user_leagues-{leagueId}']/div[{div}]/table/tbody/tr[{index}]/td[17]")
        .InnerText;
    ...
}

Solution

  • To filter the rows of the career stats table, you can use the LINQ method Where. The filtered rows can then be turned into a list of CareerProperties objects with the LINQ method Select.

    Here is how to get the career stats for the selected seasonId and the type "Reg":

    // Now the return type is a List of CareerProperties.
    List<CareerProperties> table = playerDoc.DocumentNode
        .SelectSingleNode($"//*[@id='lg_team_user_leagues-{leagueId}']/div[4]/table/tbody")
        .Descendants("tr")
        .Skip(1)
        // Up to here is your code. Here you select all rows from the table.
        // Each row is presented as List<string>.
        .Select(tr => tr.Elements("td").Select(td => td.InnerText.Trim()).ToList())
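        // Here we filter table rows by "seasonId" and "Reg".
        // The Count check skips rows with fewer than two <td> cells, which would
        // otherwise throw when indexed.
        .Where(tr => tr.Count > 1 && tr[0] == seasonId && tr[1] == "Reg")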
        // Here we create objects CareerProperties from filtered rows.
        .Select(tr => new CareerProperties
            {
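                // XPath td[n] is 1-based but the list is 0-based:
                // tr[0] is the season, tr[1] is the type, and so on.
                Type = tr[1],
                Record = tr[2],
                Amr = tr[3],
                Goals = tr[4],
                Assists = tr[5],
                // Fill other properties.
                // ...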
            })
        .ToList();
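
The same Where/Select pattern can be tried without HtmlAgilityPack by running it over hard-coded rows. The following is only a minimal sketch: the CareerProperties class here is a hypothetical stand-in (the asker's class was not posted), the helper name CareerTable.Filter is invented for illustration, and the sample rows are made-up values, not data from the real page. It also assumes that an empty AMR cell should fall back to "0.0", which is what the `?? "0.0"` in the question appears to intend.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the asker's class; the real one may differ.
public class CareerProperties
{
    public string Type { get; set; } = "";
    public string Record { get; set; } = "";
    public string Amr { get; set; } = "";
    public string Goals { get; set; } = "";
    public string Assists { get; set; } = "";
}

public static class CareerTable
{
    // Keeps only the rows whose first two cells match seasonId and type,
    // then maps each surviving row onto a CareerProperties object.
    public static List<CareerProperties> Filter(
        IEnumerable<List<string>> rows, string seasonId, string type)
    {
        return rows
            // Skip rows that are too short to index safely.
            .Where(r => r.Count >= 6 && r[0] == seasonId && r[1] == type)
            .Select(r => new CareerProperties
            {
                Type = r[1],
                Record = r[2],
                // Trimmed cell text is empty rather than null, so test for empty.
                Amr = string.IsNullOrEmpty(r[3]) ? "0.0" : r[3],
                Goals = r[4],
                Assists = r[5]
            })
            .ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        // Made-up sample rows: season, type, record, AMR, goals, assists.
        var rows = new List<List<string>>
        {
            new List<string> { "S16", "Pre", "3-1-0", "7.2", "2", "1" },
            new List<string> { "S16", "Reg", "10-2-1", "", "5", "4" },
            new List<string> { "S15", "Pre", "1-1-1", "6.8", "0", "0" },
            new List<string> { "S16" } // short row, ignored by the filter
        };

        foreach (var c in CareerTable.Filter(rows, "S16", "Pre"))
        {
            Console.WriteLine($"{c.Type} {c.Record} {c.Amr} {c.Goals} {c.Assists}");
        }
    }
}
```

Passing "S16" and "Pre" covers the example from the question; passing seasonId and "Reg" reproduces the filter used in the answer. Because the rows are already in memory as List<string>, each one is read once, which fits the much shorter elapsed time the asker measured for the LINQ version.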