Hi, I need to get all the data from a page: specifically, the photo and the name of each topic. The page is here.
I know of two approaches. With the first one I can only get a single image out of the entire page. If anyone knows how to extend it to capture everything, that would be the best way:

    int startIndex = e.Result.IndexOf(@"><img");
    string result = e.Result;
    result = e.Result.Substring(startIndex, e.Result.Length - startIndex);
    startIndex = result.IndexOf(".php?src=") + 9;
    int endIndex = result.IndexOf(".jpg", startIndex);
    string link = result.Substring(startIndex, endIndex - startIndex) + ".jpg";
    MessageBox.Show(link);
    imagem.Source = new BitmapImage(new Uri(link));

The other way is this. I created a class to hold the data and I build a list of it, but the `pattern` string must be completely wrong, because I am not used to writing strings of this kind. I just copied one from another topic and tried to create my own based on it:

    private void ConsultaPopularVideos(string uri)
    {
        WebClient web2 = new WebClient();
        // Subscribe before starting the download so the completion event cannot be missed
        web2.DownloadStringCompleted += web2_DownloadStringCompleted;
        web2.DownloadStringAsync(new Uri(uri));
    }

    void web2_DownloadStringCompleted(object sender, DownloadStringCompletedEventArgs e)
    {
        if (!e.Cancelled && e.Error == null && !String.IsNullOrEmpty(e.Result))
        {
            _popVideos = new List<PopularVideos>();
            // Grab all the links on the page here
            // P.S.: if the page changes, the pattern here has to be updated.
            string pattern = @"\<a\shref\=[\""|\'](?<url>[^\""|\']+)[\""|\']\stitle\=[\""|\'](?<title>[^\""|\']+).php?src=[\""|\'](?<img>[^\""|\']+)[\""|\']\s\width='275'";
            // Search the HTML for all the links
            MatchCollection ms = Regex.Matches(e.Result, pattern, RegexOptions.Multiline);
            Debug.WriteLine("----- OK {0} links found", ms.Count);
            foreach (Match m in ms)
            {
                // The pattern above says where the URL and the artist name are,
                // and they are retrieved here
                Group url = m.Groups["url"];
                MessageBox.Show(m.Groups.ToString());
                Group title = m.Groups["title"];
                Group img = m.Groups["img"];
                if (url != null && title != null && img != null)
                {
                    //Debug.WriteLine("author: {0}\nUrl: {1}", author.Value, url.Value);
                    // Continue only if the artist link was found (there are other links on the page)
                    if (url.Value.ToLower().IndexOf("/") > -1)
                    {
                        // Add an Artist object to the list
                        PopularVideos video = new PopularVideos(title.Value, url.Value, img.Value);
                        _popVideos.Add(video);
                    }
                }
            }
            listBoxPopular.ItemsSource = _popVideos;
        }
    }

The class:

    class PopularVideos
    {
        public PopularVideos() { }

        public PopularVideos(string nome, string url, string img)
        {
            Nome = nome;
            Url = new Uri(url);
            // Img is a string property, so store the image URL itself
            Img = img;
        }

        public string Nome { get; set; }
        public string Img { get; set; }
        public Uri Url { get; set; }
    }

Using regex to scrape data from a web page is not a good solution: it is unreliable, fragile, and difficult to implement. I recommend using [HtmlAgilityPack](http://htmlagilitypack.codeplex.com/) to scrape the data instead. It is a mature library that supports Windows Phone; I used it in my own Windows Phone app and am very happy with it.
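
As a rough illustration of that suggestion, here is a minimal sketch of what the parsing step might look like with HtmlAgilityPack. It assumes, based only on the regex in the question, that each topic is an `<a href="..." title="...">` element wrapping an `<img src="...">`; the real page markup may differ. The names `ScrapedVideo` and `PopularVideoScraper` are hypothetical and only there for the example.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical container for one scraped topic (not part of the question's code).
class ScrapedVideo
{
    public string Title { get; set; }
    public string Url { get; set; }
    public string Img { get; set; }
}

static class PopularVideoScraper
{
    // Assumption: every topic is an <a href title> that contains an <img src>.
    public static List<ScrapedVideo> Parse(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var items = new List<ScrapedVideo>();
        foreach (HtmlNode link in doc.DocumentNode.Descendants("a"))
        {
            // Skip links that do not wrap an image (menus, footers, and so on).
            HtmlNode image = link.Descendants("img").FirstOrDefault();
            if (image == null)
                continue;

            string title = link.GetAttributeValue("title", "");
            string href = link.GetAttributeValue("href", "");
            string src = image.GetAttributeValue("src", "");

            // Keep only entries where all three pieces of data are present.
            if (title.Length == 0 || href.Length == 0 || src.Length == 0)
                continue;

            items.Add(new ScrapedVideo { Title = title, Url = href, Img = src });
        }
        return items;
    }
}
```

In the question's `web2_DownloadStringCompleted` handler, this could replace the regex loop with something like `listBoxPopular.ItemsSource = PopularVideoScraper.Parse(e.Result);`. If the `src` values go through a thumbnail script (the `.php?src=` part seen in the question), the substring logic from the first snippet could still be applied to `src` afterwards.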