I am trying to scrape some data from this webpage and am having trouble. I only want three values per row: the team name, the points, and the position. The console output should look similar to this:
Uta 23.52 Centers
Uta 29.22 Power Forwards
Uta 29.86 Point Guards
Uta 26.22 Small Forward
Uta 26.61 Shooting Guard
I wrote the code below, but the nested foreach loops duplicate the data: every team is printed with every points value and every position. Any help would be greatly appreciated!
    private void button1_Click(object sender, EventArgs e)
    {
        try
        {
            var doc = new HtmlWeb().Load("https://www.sportingcharts.com/nba/defense-vs-position/");
            HtmlAgilityPack.HtmlNodeCollection teams = doc.DocumentNode.SelectNodes("//div[@class='col col-md-3']//tr/td[2]");
            HtmlAgilityPack.HtmlNodeCollection points = doc.DocumentNode.SelectNodes(".//div[@class='col col-md-3']//tr/td[3]");
            HtmlAgilityPack.HtmlNodeCollection positions = doc.DocumentNode.SelectNodes(".//div[@class='col col-md-3']//span[1]");
            List<Record> lstRecords = new List<Record>();
            foreach (HtmlAgilityPack.HtmlNode teamnode in teams)
            {
                foreach (HtmlAgilityPack.HtmlNode pointsnode in points)
                {
                    foreach (HtmlAgilityPack.HtmlNode positionnode in positions)
                        Console.WriteLine(teamnode.InnerText + ' ' + pointsnode.InnerText + ' ' + positionnode.InnerText);
                }
            }
        }
        catch { }
    }
Your main problem is the nested foreach loops. They tell the code: for each team, give me all the points, and for each points value, give me all the positions. Since the teams and points collections have the same number of entries, a single for loop with a shared index pairs them correctly. The positions are trickier, but you know that every position table has 30 rows, so integer-dividing the row index by 30 gives the index of the matching position.
    // Requires: using System.Linq; using HtmlAgilityPack;
    var doc = new HtmlWeb().Load("https://www.sportingcharts.com/nba/defense-vs-position/");
    HtmlAgilityPack.HtmlNodeCollection teams = doc.DocumentNode.SelectNodes("//div[@class='col col-md-3']//tr/td[2]");
    HtmlAgilityPack.HtmlNodeCollection points = doc.DocumentNode.SelectNodes(".//div[@class='col col-md-3']//tr/td[3]");
    HtmlAgilityPack.HtmlNodeCollection positions = doc.DocumentNode.SelectNodes(".//div[@class='col col-md-3']//span[1]");
    // Keep only span texts of at least 6 characters, intended to leave just the position headings.
    string[] positions_aux = positions.Where(x => x.InnerText.Length >= 6).Select(y => y.InnerText).ToArray();
    // One shared index pairs each team with its points; use i < teams.Count so the last row is not skipped.
    for (int i = 0; i < teams.Count; i++)
    {
        // Integer division: rows 0-29 map to position 0, rows 30-59 to position 1, and so on.
        var aux = i / 30;
        Console.WriteLine(teams[i].InnerText + ' ' + points[i].InnerText + ' ' + positions_aux[aux]);
    }
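The index arithmetic above can be tried without hitting the website. The following is only a minimal sketch: the class PairingDemo, the method Pair, and the rowsPerPosition parameter are hypothetical names introduced for illustration, and the sample data is made up (two position tables of two rows each rather than 30). It uses plain string lists instead of HtmlAgilityPack nodes.

```csharp
using System;
using System.Collections.Generic;

public static class PairingDemo
{
    // Pairs teams[i] with points[i] and finds the position by integer division.
    // rowsPerPosition is the number of rows in each position table (30 in the answer above).
    public static List<string> Pair(
        IList<string> teams, IList<string> points,
        IList<string> positions, int rowsPerPosition)
    {
        if (teams.Count != points.Count)
            throw new ArgumentException("teams and points must have the same length.");

        var lines = new List<string>();
        for (int i = 0; i < teams.Count; i++)
        {
            // Integer division drops the remainder, so consecutive rows share a position.
            int positionIndex = i / rowsPerPosition;
            lines.Add(teams[i] + " " + points[i] + " " + positions[positionIndex]);
        }
        return lines;
    }

    public static void Main()
    {
        // Made-up sample data: 2 position tables with 2 rows each.
        var teams = new[] { "Uta", "Bos", "Uta", "Bos" };
        var points = new[] { "23.52", "24.10", "29.22", "30.05" };
        var positions = new[] { "Centers", "Power Forwards" };

        foreach (string line in Pair(teams, points, positions, 2))
            Console.WriteLine(line);
    }
}
```

With four rows and rowsPerPosition set to 2, the loop produces exactly four lines, one per row, instead of the 4 x 4 x 2 = 32 lines that three nested foreach loops would print over the same data.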