I'm developing an app that scrapes data from a website with Jsoup. So far I can fetch the data from the page itself.
Now I need to add pagination. I was told this would have to be done with Selenium WebDriver, but I do not know how to work with it. Could someone tell me how I can do this?
public class MainActivity extends AppCompatActivity {
    private String url = "http://www.yudiz.com/blog/";
    private ArrayList<String> mAuthorNameList = new ArrayList<>();
    private ArrayList<String> mBlogUploadDateList = new ArrayList<>();
    private ArrayList<String> mBlogTitleList = new ArrayList<>();

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);
        new Description().execute();
    }

    private class Description extends AsyncTask<Void, Void, Void> {
        @Override
        protected Void doInBackground(Void... params) {
            try {
                // Connect to the web site
                Document mBlogDocument = Jsoup.connect(url).get();
                // Each author-date block corresponds to one blog post
                Elements mElementDataSize = mBlogDocument.select("div[class=author-date]");
                // Number of posts on this page
                int mElementSize = mElementDataSize.size();
                for (int i = 0; i < mElementSize; i++) {
                    Elements mElementAuthorName = mBlogDocument.select("span[class=vcard author post-author test]").select("a").eq(i);
                    String mAuthorName = mElementAuthorName.text();
                    Elements mElementBlogUploadDate = mBlogDocument.select("span[class=post-date updated]").eq(i);
                    String mBlogUploadDate = mElementBlogUploadDate.text();
                    Elements mElementBlogTitle = mBlogDocument.select("h2[class=entry-title]").select("a").eq(i);
                    String mBlogTitle = mElementBlogTitle.text();
                    mAuthorNameList.add(mAuthorName);
                    mBlogUploadDateList.add(mBlogUploadDate);
                    mBlogTitleList.add(mBlogTitle);
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
            return null;
        }

        @Override
        protected void onPostExecute(Void result) {
            // Show the scraped data in the RecyclerView
            RecyclerView mRecyclerView = (RecyclerView) findViewById(R.id.act_recyclerview);
            DataAdapter mDataAdapter = new DataAdapter(MainActivity.this, mBlogTitleList, mAuthorNameList, mBlogUploadDateList);
            RecyclerView.LayoutManager mLayoutManager = new LinearLayoutManager(getApplicationContext());
            mRecyclerView.setLayoutManager(mLayoutManager);
            mRecyclerView.setAdapter(mDataAdapter);
        }
    }
}
Problem statement (as I understand it): the scraper should move on to the next page, using the pagination links at the bottom of the blog page, until every page has been processed.
If we inspect the next button in the pagination, we see the following HTML: <a class="next_page" href="http://www.yudiz.com/blog/page/2/">
So we need Jsoup to pick up this URL on each iteration of the loop and scrape the page it points to. This can be done using the following approach:
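String url = "http://www.yudiz.com/blog/";
while (url != null) {
    try {
        Document doc = Jsoup.connect(url).get();
        // Perform your data extraction for the current page here.
        System.out.println(doc.getElementsByTag("title").text());
        // Follow the "next" link if the page has one; otherwise stop.
        Element next = doc.getElementsByClass("next_page").first();
        String nextUrl = (next != null) ? next.absUrl("href") : "";
        url = nextUrl.isEmpty() ? null : nextUrl;
    } catch (IOException e) {
        e.printStackTrace();
        // Stop here; otherwise the loop would retry the same failing URL forever.
        url = null;
    }
}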
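A loop like the one above runs until a page has no next link, so it may be worth adding two guards: a cap on the number of pages and a check against visiting the same URL twice. Below is a minimal sketch of that control flow only. To keep it runnable without a network or Jsoup, the fetching is hidden behind a hypothetical nextLink function, standing in for "download the page and return the absolute href of its next_page anchor, or null if there is none". The class name, method name, maxPages parameter and the in-memory fake site (including the page/3/ URL) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

class PaginationSketch {

    /**
     * Follows "next page" links starting at startUrl and returns the URLs
     * visited, in order. Stops when there is no next link, when maxPages
     * pages have been visited, or when a URL comes up a second time.
     */
    static List<String> crawl(String startUrl,
                              Function<String, String> nextLink,
                              int maxPages) {
        List<String> visitedInOrder = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        String url = startUrl;
        // seen.add(url) returns false for a URL that was already visited.
        while (url != null && !url.isEmpty()
                && visitedInOrder.size() < maxPages
                && seen.add(url)) {
            visitedInOrder.add(url); // data extraction for this page would go here
            url = nextLink.apply(url);
        }
        return visitedInOrder;
    }

    public static void main(String[] args) {
        // Made-up in-memory "site": each page maps to the page its next link points to.
        Map<String, String> fakeSite = new HashMap<>();
        fakeSite.put("http://www.yudiz.com/blog/", "http://www.yudiz.com/blog/page/2/");
        fakeSite.put("http://www.yudiz.com/blog/page/2/", "http://www.yudiz.com/blog/page/3/");
        // The last page has no entry, so the lookup returns null and the loop ends.
        List<String> pages = crawl("http://www.yudiz.com/blog/", fakeSite::get, 50);
        System.out.println(pages); // prints the three URLs in visiting order
    }
}
```

In the real app, nextLink would wrap the Jsoup calls shown above, and the per-page extraction would fill the author, date and title lists before moving on.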