runBlocking {
    bookLinks.mapIndexed { ranking, bookLink ->
        val job = async { scrapeBookData(browser, bookLink, ranking) }
        val result = job.await()
        if (result != null) {
            bestsellers.add(result)
        }
    }
}
private suspend fun scrapeBookData(browser: Browser, bookUrl: String, ranking: Int): BookDTO? {
    val page = browser.newPage()
    page.navigate(bookUrl, Page.NavigateOptions().setWaitUntil(WaitUntilState.DOMCONTENTLOADED))
    printWithThread("Finished accessing ${bookUrl}")
    delay(3000)
    val data = page.evaluate(
        """ () => JSON.stringify({
            title: document.querySelector('.prod_title')?.innerText?.trim() || '',
            author: document.querySelector('.author')?.innerText?.trim() || '',
            isbn: document.querySelector('#scrollSpyProdInfo .product_detail_area.basic_info table tbody tr:nth-child(1) td')?.innerText?.trim() || '',
            description: document.querySelector('.intro_bottom')?.innerText?.trim() || '',
            image: document.querySelector('.portrait_img_box img')?.getAttribute('src') || ''
        }) """
    ).toString()
    val type = object : TypeToken<Map<String, String>>() {}.type
    val json: Map<String, String> = Gson().fromJson(data, type)
    page.close()
    printWithThread("Finished parsing data from ${bookUrl}")
    if (json.values.all { it.isBlank() }) {
        return null
    }
    return BookDTO(
        id = 0L,
        title = json["title"] ?: "",
        author = json["author"] ?: "",
        description = json["description"] ?: "",
        image = json["image"] ?: "",
        isbn = json["isbn"] ?: "",
        ranking = ranking + 1,
        favoriteCount = 0
    )
}
I expected that since scrapeBookData is a suspend function, the 3-second delay would suspend the current coroutine, letting another coroutine start scrapeBookData for the next link in the meantime. After the delay, I expected the first coroutine to resume and parse the page whose network response had completed. However, the coroutines are running sequentially:
[http-nio-8080-exec-2 @coroutine#2] Finished accessing https:S000215819502
[http-nio-8080-exec-2 @coroutine#2] Finished parsing data from https:S000215819502
[http-nio-8080-exec-2 @coroutine#3] Finished accessing https:S000215150862
[http-nio-8080-exec-2 @coroutine#3] Finished parsing data from https:S000215150862
[http-nio-8080-exec-2 @coroutine#4] Finished accessing https:S000215787651
The question is not quite clear, but I guess you expected scrapeBookData to be executed in parallel for each entry in bookLinks.

That's not what your code does, though, because after you launch a new coroutine with async, you immediately suspend your code by calling await, waiting for that coroutine to finish - regardless of how long you delay inside that coroutine. Calling async immediately followed by await is almost always a bug because it makes the coroutine superfluous; it is basically the same as just calling

val result = scrapeBookData(browser, bookLink, ranking)

What you want instead is to await the launched coroutines after the loop, once all coroutines were launched - not after each single one:
bookLinks
    .mapIndexed { ranking, bookLink ->
        async { scrapeBookData(browser, bookLink, ranking) }
    }
    .awaitAll()
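You can see the difference in a minimal, self-contained sketch. fakeScrape below is a hypothetical stand-in for your scrapeBookData that just suspends for 200 ms; the timings assume nothing else is loading the machine:

```kotlin
import kotlinx.coroutines.*
import kotlin.system.measureTimeMillis

// Hypothetical stand-in for scrapeBookData: suspends briefly, returns a value.
suspend fun fakeScrape(i: Int): Int {
    delay(200)
    return i
}

fun main() = runBlocking {
    // async immediately followed by await: each coroutine finishes
    // before the next one is launched, so this takes roughly 3 x 200 ms.
    val sequential = measureTimeMillis {
        (1..3).map { async { fakeScrape(it) }.await() }
    }

    // Launch all coroutines first, await them afterwards:
    // they suspend concurrently, so this takes roughly 200 ms in total.
    val concurrent = measureTimeMillis {
        (1..3).map { async { fakeScrape(it) } }.awaitAll()
    }

    println("sequential: $sequential ms, concurrent: $concurrent ms")
}
```

Both variants produce the same results; only the total time differs, because in the second variant the delays overlap instead of adding up.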
The loop now launches a coroutine for each bookLink and immediately continues with the next link, without waiting for the coroutine to finish. Since async returns a Deferred (and not a Job, as the variable name in your original code suggested), after all coroutines were launched the mapIndexed returns a list of Deferreds. Now you want to await all coroutines until they are finished, and luckily Kotlin provides a handy function for that: awaitAll.

awaitAll returns a simple List<BookDTO?> which you can process further. From the looks of your code you want to filter out all null values, so you should apply .filterNotNull() next. You can then do whatever you want with the resulting List<BookDTO>. If you want to add the entire list to another list bestsellers, you could append .also { bestsellers.addAll(it) }. But it probably suffices to simply do this:
val bestsellers = bookLinks
    .mapIndexed { ranking, bookLink ->
        async { scrapeBookData(browser, bookLink, ranking) }
    }
    .awaitAll()
    .filterNotNull()
You should also remove the delay in scrapeBookData; you want your coroutines to finish as fast as possible. If you just want to add a suspension point at that part of your code, you can call yield instead. But I don't see why that would be necessary here, so you should remove it completely.