Problem
I am using the Algolia Crawler to fetch content from a web app built with Codelabs. I do not understand why it ignores my selector, and I would like to know how to fetch the element.
What I have done:
The crawler cannot find the element I need in order to populate hierarchy lvl1.
I need to fetch either one of these:
google-codelab-step label=""
h2 is-upgraded="" class="step-title"
But I have not been able to do so. This is the HTML:
<google-codelab-step label="" duration="" step="">
  <div class="">
    <div class="inner">
      <h2 is-upgraded="" class="step-title"> </h2>
    </div>
  </div>
</google-codelab-step>
This is my crawler configuration:
{
  indexName: "crawler_name",
  pathsToMatch: ["https://website.com/**"],
  recordExtractor: ({ helpers, $ }) => {
    const trainingLabel = $("#step-title").attr("h2");
    let returnLabel = trainingLabel ? trainingLabel : "";
    return helpers.docsearch({
      recordProps: {
        lvl1: returnLabel,
        content: "p, td, li",
        lvl0: {
          selectors: ["#codelab-title > h1", "h1.title", "title"],
          defaultValue: "default",
        },
        lvl2: "google-codelab-step h2",
        lvl3: "google-codelab-step h3",
        lvl4: "google-codelab-step h4",
        lvl5: "google-codelab-step h5, google-codelab-step td:first-child",
        lvl6: "google-codelab-step h6",
      },
    });
  },
},
I am currently able to fetch content, lvl0, lvl2 and lvl3. I tried to get lvl1 using each of the following, but none worked:
const trainingLabel = $("#step-title").attr("h2");
const trainingLabel = $("google-codelab-step").attr("label");
const trainingLabel = $("google-codelab-step > #step-title").attr("h2");
const trainingLabel = $("step-title").attr("h2");
const trainingLabel = $("step-title, h2").text();
lvl1: "google-codelab-step > label",
lvl1: "google-codelab-step > #step-title h2";
I would highly appreciate any help with fetching the lvl1 item.
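A note on the selector syntax in these attempts, offered as a sketch rather than a confirmed fix. The `$` passed to `recordExtractor` is a Cheerio instance, so `#step-title` matches an element with `id="step-title"`, while the HTML above uses `class="step-title"`, which `.step-title` matches. Likewise, `.attr("h2")` reads an attribute named `h2`, which the element does not have, whereas `.text()` reads the heading's text. The helper name `extractStepTitle` is hypothetical, and the fallback assumes the `label` attribute is populated on the real page.

```javascript
// Hypothetical helper: returns the first step title found on the page.
// `$` is assumed to be the Cheerio instance passed to recordExtractor.
function extractStepTitle($) {
  // ".step-title" matches class="step-title"; "#step-title" would need id="step-title".
  const heading = $("google-codelab-step h2.step-title").first().text().trim();
  if (heading) {
    return heading;
  }
  // Fall back to the label attribute of the first step element.
  const label = $("google-codelab-step").first().attr("label");
  return label ? label.trim() : "";
}
```

Inside `recordExtractor` this would be called as `extractStepTitle($)`. The `recordProps` values of `helpers.docsearch` appear to be treated as CSS selectors rather than extracted text, so with the helper it may be the selector string `"google-codelab-step h2.step-title"` that belongs in `lvl1`, not the returned value.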
I solved my issue by using the basic template instead of the Docusaurus template, which comes with a lot of configuration I did not understand.
This is what my action looks like now:
{
  indexName: "crawler_name",
  pathsToMatch: ["https://url.com/**"],
  recordExtractor: ({ $, url }) => {
    const trainingLabel = $("google-codelab").attr("id");
    const trainingSubModules = $("google-codelab-step").attr("label");
    const titleHome = $("h1")
      .map((i, e) => $(e).text())
      .get();
    const moduleDescription = $("div.description")
      .map((i, e) => $(e).text())
      .get();
    const subtitle = $("h2")
      .map((i, e) => $(e).text())
      .get();
    const items = $("h3")
      .map((i, e) => $(e).text())
      .get();
    return [
      {
        titleHome: titleHome,
        trainingLabel: trainingLabel,
        trainingSubModules: trainingSubModules,
        moduleDescription: moduleDescription,
        subtitle: subtitle,
        items: items,
        objectID: url.href,
      },
    ];
  },
},
I gained much more control by creating my own labels instead of using the preset hierarchy and the docsearch helper.
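One thing to watch in this extractor: in Cheerio, `.attr()` on a selection with several elements returns the attribute of the first matched element only, so `trainingSubModules` would hold just the first step's label. Below is a minimal sketch for collecting every label with the same `.map().get()` pattern used for the other fields. The helper name `collectStepLabels` is hypothetical, and dropping empty labels is an assumption about what is wanted in the record.

```javascript
// Hypothetical helper: collects the label attribute of every step element.
// `$` is assumed to be the Cheerio instance passed to recordExtractor.
function collectStepLabels($) {
  return $("google-codelab-step")
    .map((i, el) => $(el).attr("label"))
    .get()
    // Keep only non-empty string labels.
    .filter((label) => typeof label === "string" && label.trim() !== "");
}
```

In the extractor this would replace the `trainingSubModules` line with `const trainingSubModules = collectStepLabels($);`, which yields an array like the other fields.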