I have an async method that fills the fields of a POJO class with data parsed via jsoup. I'm trying to collect the URLs of the mp3 files for the individual chapters of a book from this page in a for-each loop, but every query I've tried has failed.
http://www.loyalbooks.com/book/adventures-of-huckleberry-finn-by-mark-twain
A single element looks like this in the page source; the number in the id changes from chapter to chapter:
<div class="jp-free-media" style="font-size:xx-small;">(<a id="jp_playlist_1_item_0_mp3" href="http://www.archive.org/download/huckleberry_mfs_librivox/huckleberry_finn_01_twain_64kb.mp3" tabindex="1">download</a>)</div>
My AsyncTask (the mp3 URLs are searched for via mLines2):
public class FillBook extends AsyncTask<Void, Void, SingleBook> {
private String link;
private String imgLink;
private String title;
ArrayList<String> tmpChapters = new ArrayList<>();
private SingleBook book;
public FillBook(String link, String imgLink, String title) {
this.link = link;
this.imgLink = imgLink;
this.title = title;
}
@Override
protected SingleBook doInBackground(Void... params) {
Document doc = null;
book = new SingleBook(imgLink, title, false, false, null, new ArrayList<String>());
Elements mLines;
Elements mLines2;
try {
doc = Jsoup.connect(link).get();
} catch (IOException | RuntimeException e) {
e.printStackTrace();
}
if (doc != null) {
mLines = doc.getElementsByClass("book-description");
for (Element mLine : mLines) {
String description= mLine.text();
book.setDescription(description);
}
mLines2 = doc.select(".jp-free-media");
for (Element mLine2 : mLines2) {
tmpChapters.add(mLine2.attr("href"));
}
}else
System.out.println("ERROR");
book.setChapters(tmpChapters);
return book;
}
protected void onPostExecute(SingleBook book) {
super.onPostExecute(book);
Toast.makeText(BookActivity.this, book.getChapters().get(0), Toast.LENGTH_LONG).show();
Picasso.get().load(book.getImgUrl()).into(bookCover);
nameAndAuthor.setText(book.getTitleAndAuthor());
bookDescription.setText(book.getDescription());
I end up with an empty ArrayList. How can I get the string http://www.archive.org/download/huckleberry_mfs_librivox/huckleberry_finn_01_twain_64kb.mp3, given that the next chapter's link will have id="jp_playlist_1_item_1_mp3"?
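Tiarait from Russian Stack Overflow helped me find the solution. The point is that the element shown above is created by JavaScript, so it is not present in the HTML that jsoup downloads (jsoup does not execute scripts). Instead, I needed to get the document body and extract the following array from the inline script by splitting the string: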
var audioPlaylist = new Playlist("1", [ {name:"Chapter 01", free:true, mp3:"http://www.archive.org/download/huckleberry_mfs_librivox/huckleberry_finn_01_twain_64kb.mp3"}, {name:"Chapter 02", free:true, mp3:"http://www.archive.org/download/huckleberry_mfs_librivox/huckleberry_finn_02_twain_64kb.mp3"}, ...
The doInBackground method should be changed to this:
@Override
protected SingleBook doInBackground(Void... params) {
Document doc = null;
book = new SingleBook(imgLink, title, false, false, null, new ArrayList<String>());
Elements mLines;
try {
doc = Jsoup.connect(link).get();
} catch (IOException | RuntimeException e) {
e.printStackTrace();
}
if (doc != null) {
mLines = doc.getElementsByClass("book-description");
for (Element mLine : mLines) {
String description= mLine.text();
book.setDescription(description);
}
String arr = "";
String html = doc.body().html();
if (html.contains("var audioPlaylist = new Playlist(\"1\", ["))
arr = html.split("var audioPlaylist = new Playlist\\(\"1\", \\[")[1];
if (arr.contains("]"))
arr = arr.split("\\]")[0];
// Split the array into single entries. The regex tolerates optional
// whitespace around the comma, so both "},{" and "}, {" are handled.
// If there is only one entry, split() returns the whole string unchanged.
for (String mLine2 : arr.split("\\}\\s*,\\s*\\{")) {
if (mLine2.contains("mp3:\""))
tmpChapters.add(mLine2.split("mp3:\"")[1].split("\"")[0]);
}
}else
System.out.println("ERROR");
book.setChapters(tmpChapters);
return book;
}
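As an alternative to the chained splits, the same values could be pulled out with a regular expression. This is only a sketch under some assumptions: the class name Mp3Extractor and the method extractMp3Urls are made up for illustration, and it assumes every chapter entry has the form mp3:"..." as in the array shown above. Unlike the code above, it scans the whole string instead of isolating the Playlist array first, so it would also pick up mp3 entries from any other playlist on the page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original code.
public class Mp3Extractor {

    // Matches mp3:"<url>" (optional whitespace after the colon) and captures the URL.
    private static final Pattern MP3_ENTRY = Pattern.compile("mp3:\\s*\"([^\"]+)\"");

    // Returns every captured URL in the order it appears in the input.
    public static List<String> extractMp3Urls(String html) {
        List<String> urls = new ArrayList<>();
        Matcher matcher = MP3_ENTRY.matcher(html);
        while (matcher.find()) {
            urls.add(matcher.group(1));
        }
        return urls;
    }

    public static void main(String[] args) {
        // Sample input modelled on the inline script from the page.
        String sample = "var audioPlaylist = new Playlist(\"1\", [ "
                + "{name:\"Chapter 01\", free:true, mp3:\"http://www.archive.org/download/huckleberry_mfs_librivox/huckleberry_finn_01_twain_64kb.mp3\"}, "
                + "{name:\"Chapter 02\", free:true, mp3:\"http://www.archive.org/download/huckleberry_mfs_librivox/huckleberry_finn_02_twain_64kb.mp3\"} ]);";
        for (String url : extractMp3Urls(sample)) {
            System.out.println(url);
        }
    }
}
```

Inside doInBackground this would be used as tmpChapters.addAll(Mp3Extractor.extractMp3Urls(doc.body().html())); in place of the split logic.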