Algolia Crawler not fetching google-codelab-step label


Problem

I am using Algolia Crawler to fetch content from a web app built with Codelabs. I do not understand why the crawler ignores my selector, and I would like to know how to fetch the element.

What I have found so far:

The crawler cannot identify the element that I need for hierarchy lvl1.

What I need to fetch is either of these elements:

google-codelab-step label="" 
h2 is-upgraded="" class="step-title"

But I have not been able to do so. This is the HTML:

    <google-codelab-step label="" duration="" step="">
      <div class="">
        <div class="inner">
          <h2 is-upgraded="" class="step-title"> </h2>
        </div>
      </div>
    </google-codelab-step>

This is my crawler configuration:

    {
      indexName: "crawler_name",
      pathsToMatch: ["https://website.com/**"],
      recordExtractor: ({ helpers, $ }) => {
        const trainingLabel = $("#step-title").attr("h2");
        let returnLabel = trainingLabel ? trainingLabel : "";
        return helpers.docsearch({
          recordProps: {
            lvl1: returnLabel,
            content: "p, td, li",
            lvl0: {
              selectors: ["#codelab-title > h1", "h1.title", "title"],
              defaultValue: "default",
            },
            lvl2: "google-codelab-step h2",
            lvl3: "google-codelab-step h3",
            lvl4: "google-codelab-step h4",
            lvl5: "google-codelab-step h5, google-codelab-step td:first-child",
            lvl6: "google-codelab-step h6",
          },
        });
      },
    },

I am currently able to fetch content, lvl0, lvl2 and lvl3. I tried to get lvl1 using the following approaches, but none worked:

  • const trainingLabel = $("#step-title").attr("h2");

  • const trainingLabel = $("google-codelab-step").attr("label");

  • const trainingLabel = $("google-codelab-step > #step-title").attr("h2");

  • const trainingLabel = $("step-title").attr("h2");

  • const trainingLabel = $("step-title, h2").text();

  • lvl1: "google-codelab-step > label",

  • lvl1: "google-codelab-step > #step-title h2";

I would highly appreciate any help with fetching the lvl1 item.


Solution

  • I solved my issue by using the basic template instead of the Docusaurus template, which comes with a lot of settings I did not understand.

    This is what my action looks like now:

    {
      indexName: "crawler_name",
      pathsToMatch: ["https://url.com/**"],
      recordExtractor: ({ $, url }) => {
        const trainingLabel = $("google-codelab").attr("id");
        const trainingSubModules = $("google-codelab-step").attr("label");
        const titleHome = $("h1")
          .map((i, e) => $(e).text())
          .get();
        const moduleDescription = $("div.description")
          .map((i, e) => $(e).text())
          .get();
        const subtitle = $("h2")
          .map((i, e) => $(e).text())
          .get();
        const items = $("h3")
          .map((i, e) => $(e).text())
          .get();
        return [
          {
            titleHome: titleHome,
            trainingLabel: trainingLabel,
            trainingSubModules: trainingSubModules,
            moduleDescription: moduleDescription,
            subtitle: subtitle,
            items: items,
            objectID: url.href,
          },
        ];
      },
    },
    

    I had much more control by creating my own labels instead of using the preset hierarchy and helper.
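
A note on the selectors, with a possible refinement. In the markup from the question, `step-title` is a class, so it would be matched with `.step-title` rather than `#step-title`, and the heading text is read with `.text()`, since `.attr("h2")` looks for an attribute named `h2`. Also, Cheerio's `.attr()` on a selection with several elements returns the value of the first match only, so `$("google-codelab-step").attr("label")` yields a single label even when the page has many steps. The sketch below is one way to emit a record per step instead. It assumes the step headings are present in the HTML the crawler receives; `buildStepRecords`, the `stepLabel` and `stepTitle` field names, and the `#step-N` objectID scheme are hypothetical choices, not part of the Algolia Crawler API.

```javascript
// Hypothetical helper (not part of the Algolia Crawler API): builds one
// record per codelab step from plain data, so it can be tested in isolation.
function buildStepRecords(pageUrl, codelabId, steps) {
  return steps.map((step, index) => ({
    // Assumed objectID scheme: page URL plus the step index keeps IDs unique.
    objectID: `${pageUrl}#step-${index}`,
    trainingLabel: codelabId,
    stepLabel: step.label,
    stepTitle: step.title.trim(),
  }));
}

// Crawler action sketch: collect every step with Cheerio, then build records.
const action = {
  indexName: "crawler_name",
  pathsToMatch: ["https://url.com/**"],
  recordExtractor: ({ $, url }) => {
    const steps = $("google-codelab-step")
      .map((i, el) => ({
        // Read the label attribute of this particular step element.
        label: $(el).attr("label") || "",
        // Class selector (.step-title) and .text() for the heading content.
        title: $(el).find("h2.step-title").text(),
      }))
      .get();
    const codelabId = $("google-codelab").attr("id") || "";
    return buildStepRecords(url.href, codelabId, steps);
  },
};
```

Keeping the record-building logic in a plain function means it can be checked with sample data outside the crawler, for example `buildStepRecords("https://url.com/lab", "my-lab", [{ label: "Intro", title: " 1. Intro " }])`.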