import.io

Does the Import.io API support checking the status of an extractor?


I've just created an extractor with import.io. This extractor uses chaining: first I extract some URLs from one page, then I use those extracted URLs to extract the detail pages. When the detail pages' extraction finishes, I want to get the results. But how can I be sure that the extraction has completed? Is there an API endpoint for checking the status of an extraction?

I found the legacy "GET /store/connector/{id}" endpoint, but when I try it I get a 404. You can take a look at the screenshot.

Another question: I want to schedule my extractor to run twice a day. Is this possible?

Thanks


Solution

  • Each extractor has associated crawl runs. A crawl run represents one run of an extractor with a specific configuration (training, list of URLs, etc.). The state of a crawl run can have one of the following values:

    • STARTED => Currently running
    • CANCELLED => Started but cancelled by the user
    • FINISHED => Run completed

    Each crawl run also includes the following metadata:

    • Started At - When the run started
    • Stopped At - When the run finished
    • Total URL Count - Total number of URLs in the run
    • Success URL Count - # of successful URLs queried
    • Failed URL Count - # of failed URLs queried
    • Row Count - Total number of rows returned in the run
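While a run is still STARTED, the URL counters can give a rough progress figure. The helper below is a minimal sketch: the function name `crawl_progress_percent` is hypothetical, and it assumes every URL the run has processed is counted exactly once, as either a success or a failure, which the list above implies but does not state outright.

```shell
#!/bin/sh
# Hypothetical helper: rough progress of a crawl run from its URL counters.
# ASSUMPTION: each processed URL is counted once, as a success or a failure.
# Usage: crawl_progress_percent TOTAL_URLS SUCCESS_URLS FAILED_URLS
crawl_progress_percent() {
  local total processed
  total="$1"
  processed=$(( $2 + $3 ))
  if [ "$total" -le 0 ]; then
    # Avoid dividing by zero when the run has no URLs.
    echo 0
    return 0
  fi
  # Integer (truncated) percentage of URLs attempted so far.
  echo $(( processed * 100 / total ))
}
```

For example, a run with Total URL Count 200, Success URL Count 150 and Failed URL Count 10 reports 80. The state field, not this percentage, remains the signal that the run is actually done.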

    The REST API call to get the list of crawl runs associated with an extractor is as follows:

    curl -s -X GET "https://store.import.io/store/crawlrun/_search?_sort=_meta.creationTimestamp&_page=1&_perPage=30&extractorId=$EXTRACTOR_ID&_apikey=$IMPORT_IO_API_KEY"

    where

    • $EXTRACTOR_ID - ID of the extractor whose crawl runs you want to list
    • $IMPORT_IO_API_KEY - Import.io API key from your account
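To wait for the chained extraction to finish, you can poll this endpoint until the newest crawl run leaves STARTED, and only then fetch the results. This is a minimal sketch, not an official client: the function names are hypothetical, it needs `curl` and `jq`, and it assumes the `_search` response is Elasticsearch-style, with each run under `.hits.hits[]._source` carrying `state` and `_meta.creationTimestamp`. Check a real response and adjust `LATEST_STATE_FILTER` if the shape differs.

```shell
#!/bin/sh
# Minimal polling sketch for Import.io crawl runs. Requires curl and jq.
# ASSUMPTION: the _search response looks like Elasticsearch output, i.e.
# {"hits": {"hits": [{"_source": {"state": ..., "_meta": {"creationTimestamp": ...}}}]}}.
# Adjust this jq filter if the real response differs.
LATEST_STATE_FILTER='.hits.hits | max_by(._source._meta.creationTimestamp) | ._source.state'

# Build the crawl-run search URL from the answer: $1 = extractor ID, $2 = API key.
crawlrun_search_url() {
  printf 'https://store.import.io/store/crawlrun/_search?_sort=_meta.creationTimestamp&_page=1&_perPage=30&extractorId=%s&_apikey=%s' \
    "$1" "$2"
}

# Succeeds (exit status 0) for states after which a run no longer changes.
is_terminal_state() {
  case "$1" in
    FINISHED|CANCELLED) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage: wait_for_crawl_run EXTRACTOR_ID API_KEY [INTERVAL_SECONDS] [MAX_POLLS]
# Exit status: 0 = FINISHED, 2 = CANCELLED, 1 = gave up after MAX_POLLS checks.
wait_for_crawl_run() {
  local interval max_polls poll state
  interval="${3:-60}"
  max_polls="${4:-120}"
  poll=0
  while [ "$poll" -lt "$max_polls" ]; do
    poll=$((poll + 1))
    state=$(curl -s -X GET "$(crawlrun_search_url "$1" "$2")" | jq -r "$LATEST_STATE_FILTER")
    echo "poll ${poll}: latest crawl run state is ${state:-unknown}" >&2
    if is_terminal_state "$state"; then
      echo "$state"
      if [ "$state" = "FINISHED" ]; then
        return 0
      fi
      return 2
    fi
    sleep "$interval"
  done
  echo "gave up after ${max_polls} polls" >&2
  return 1
}
```

For example, `wait_for_crawl_run "$EXTRACTOR_ID" "$IMPORT_IO_API_KEY" 60 120` checks once a minute for up to two hours and exits with 0 only when the newest run reached FINISHED. Picking the run with the largest creation timestamp in `jq`, rather than trusting the first hit, avoids depending on the sort direction of `_sort`.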