regex · web-crawler · stormcrawler

Applying a regex filter to the crawler to crawl specific pages


I am using StormCrawler 1.10 and Elasticsearch 6.3.x. For example, I have a main website https://www.abce.org with subpages such as https://abce.org/def and https://abce.org/ghi. I want to crawl only the pages under https://www.abce.org/ghi.

My seed URL is https://www.abce.org/ghi/.

So far I have applied each of the following regex filters, one at a time:

  1. +^https:\/\/www.abce.org\/ghi*
  2. +^(?:https?:\/\/)www.abce.org\/ghi(.+)*$
  3. +^(?:https?:\/\/)?(?:www\.)?abce\.[a-zA-Z0-9.\S]+$
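For what it's worth, each of those patterns has a likely problem: the unescaped dots match any character, the trailing ghi* in pattern 1 means "gh followed by zero or more i" rather than the /ghi path, patterns 1 and 2 require the www. host even though the site also links to https://abce.org/..., and pattern 3 accepts any non-whitespace path under the domain, not just /ghi. In the default-regex-filters.txt syntax read by StormCrawler's RegexURLFilter, a pattern that accepts both host variants (a suggested line, not one from the original post) might look like:

    # accept any page under /ghi on either host variant
    +^https?:\/\/(www\.)?abce\.org\/ghi
    # reject everything else
    -.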

I tested my regular expressions on regexr and they show as valid, but when I check the status index it contains only the seed URL with status DISCOVERED and nothing else.
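
For reference, one quick way to see what is actually being recorded is to query the status index directly. The index name status and the uppercase status values are StormCrawler defaults, so adjust them if your setup differs:

    # list URLs currently recorded as DISCOVERED (Elasticsearch 6.x)
    curl "http://localhost:9200/status/_search?pretty&q=status:DISCOVERED"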


Solution

  • Try the FastURLFilter, which you might find more intuitive to use (a sketch of a possible configuration follows below). Run the topology in debug mode to check that URLs do get submitted to the URLFilters and that they behave as you expect.

    Before you ask, here's a tip on debugging Storm.
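
    As a concrete starting point, here is a sketch of what a FastURLFilter rules file could look like for this case. The scope and rule keywords (HOST, GLOBAL, AllowPath, DenyPath) follow the FastURLFilter rules format; the file name and the patterns are assumptions for this particular site rather than something given in the answer:

        # fast.urlfilter.txt (hypothetical name): scopes are matched per URL
        # and the first matching rule within a scope decides
        HOST www.abce.org
        AllowPath ^/ghi
        DenyPath .+

        HOST abce.org
        AllowPath ^/ghi
        DenyPath .+

        # reject URLs on any other host
        GLOBAL
        DenyPath .+

    The filter then needs to be declared in urlfilters.json in place of the regex filter. The class name below is the StormCrawler 1.x one; the params key pointing at the rules file is an assumption, so check it against your version's documentation:

        {
          "com.digitalpebble.stormcrawler.filtering.URLFilter": [
            {
              "class": "com.digitalpebble.stormcrawler.filtering.regex.FastURLFilter",
              "name": "FastURLFilter",
              "params": {
                "file": "fast.urlfilter.txt"
              }
            }
          ]
        }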