java selenium-webdriver web-scraping web-crawler

How to load and collect all comments with Selenium and Java

I have a Java application that uses Selenium WebDriver to crawl/scrape information from Google Play Store application pages. I have about 30 app links, and I am having trouble collecting ALL comments from each application. For example, this application needs a lot of scrolling to load all comments, while other applications need less or more. How can I dynamically load all comments for each app?

Solution

Since you have not shared sample code, I will share a JavaScript snippet and then a C# implementation that you can use as a reference for your Java Selenium project.

Sample JavaScript code

// Scroll one loaded comment into view every 500 ms so the page keeps lazy-loading more.
const selector = "div>span[jsname='bN97Pc']";
let i = 0;
const timer = setInterval(function () {
    const element = document.querySelectorAll(selector)[i];
    if (element === undefined) {
        clearInterval(timer); // no comment at this index, so nothing more has loaded
        return;
    }
    console.log(element);
    element.scrollIntoView();
    i++;
}, 500);

Running the code above in the browser console, once you are on the comments page of the application you shared, scrolls to the end of the page while logging each comment element to the console. It stops as soon as no new comment has loaded by the next 500 ms tick.

Sample code with Selenium C# bindings:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

class Program
{
    static void Main(string[] args)
    {
        ChromeDriver driver = new ChromeDriver();
        driver.Navigate().GoToUrl("https://play.google.com/store/apps/details?id=com.plokia.ClassUp&hl=en&showAllReviews=true");

        ExtractComments(driver);
        driver.Quit();
    }

    private static void ExtractComments(ChromeDriver driver, int startingIndex = 0)
    {
        // Re-query on every call: the page appends comments as we scroll.
        IEnumerable<IWebElement> comments = driver.FindElements(By.CssSelector("div>span[jsname='bN97Pc']"));

        if (comments.Count() <= startingIndex)
            return; // no new comments, so we are done

        if (startingIndex > 0)
            comments = comments.Skip(startingIndex); // skip already processed elements

        // Process the newly located comments
        foreach (var comment in comments)
        {
            string commentText = comment.Text;
            Console.WriteLine(commentText);
            // Scrolling to the comment triggers loading of the next batch
            ((IJavaScriptExecutor)driver).ExecuteScript("arguments[0].scrollIntoView()", comment);
            Thread.Sleep(250);
            startingIndex++;
        }

        Thread.Sleep(2000); // let more comments load once the existing ones are consumed
        ExtractComments(driver, startingIndex); // recurse to process comments loaded after scrolling
    }
}
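
Since the question is about Java, here is a rough sketch of the same "process, scroll, wait, re-query" loop in Java. It is only an illustration, not a tested Play Store scraper: the class name CommentCollector and the method collectAll are hypothetical names of my own, and the loop is written against plain functional interfaces so it compiles without Selenium on the classpath. It also uses a while loop instead of recursion, which avoids deep call stacks on apps with many comments.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical helper; the names are illustrative and not part of Selenium.
final class CommentCollector {

    /**
     * Repeatedly locates all comment elements, reads the ones not seen yet,
     * scrolls each into view, then waits so the page can load more.
     * Stops when a fresh lookup returns no new elements.
     *
     * Assumed Selenium wiring (not verified against the live page):
     *   locate   = () -> driver.findElements(By.cssSelector("div>span[jsname='bN97Pc']"))
     *   textOf   = WebElement::getText
     *   scrollTo = el -> ((JavascriptExecutor) driver)
     *                        .executeScript("arguments[0].scrollIntoView()", el)
     *
     * locate must return a fresh list on every call, as findElements does.
     */
    static <E> List<String> collectAll(Supplier<List<E>> locate,
                                       Function<E, String> textOf,
                                       Consumer<E> scrollTo,
                                       long waitMillis) {
        List<String> texts = new ArrayList<>();
        int processed = 0; // number of elements already handled

        while (true) {
            List<E> found = locate.get();
            if (found.size() <= processed) {
                break; // nothing new was loaded since the last pass
            }
            // Only handle the elements that appeared after the previous pass.
            for (E element : found.subList(processed, found.size())) {
                texts.add(textOf.apply(element));
                scrollTo.accept(element);
            }
            processed = found.size();

            try {
                Thread.sleep(waitMillis); // give the page time to load more comments
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return texts;
    }
}
```

With a real driver you would pass the three lambdas from the comment above and a wait of around 2000 ms, as in the C# version. A fixed sleep is the simplest option; on a slow connection the loop may stop early, so you may want a longer wait or one retry before giving up.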

Hope this helps.