
FSCrawler: get the extracted text in the REST API response


I set up FSCrawler with Elasticsearch, with the REST service enabled. I can post a file to FSCrawler, and the text is correctly extracted and stored in the Elasticsearch index; I verified that with Kibana.

However, I'm not able to get the extracted text in the response.

I tried several setups in _settings.yaml, but I don't get the text back in the response unless I add debug=true as a query parameter when calling the FSCrawler endpoint:

http://localhost:8080/_document?debug=true

The endpoint is called directly with Postman.

Here is my _settings.yaml:

---
name: "idx"
fs:
  indexed_chars: 100%
  lang_detect: true
  continue_on_error: true
  logging: ERROR

  ocr:
    language: "eng"
    enabled: true
    pdf_strategy: "auto"
elasticsearch:
  nodes:
    - url: "https://elasticsearch:9200"
  username: "elastic"
  password: "Test123"
  ssl_verification: false
  store_source: true
  index_content: true
rest:
  url: "http://fscrawler:8080"

My FSCrawler image:

dadoonet/fscrawler:2.10-SNAPSHOT

Elastic Stack version: 8.6.2

Response:

{
    "ok": true,
    "filename": "JAVASCRIPT.pdf",
    "url": "https://elasticsearch:9200/idx/_doc/337d3e366ce4b765f650c5a87011e117"
}

I found no way to get the extracted text in the response other than, as mentioned, setting ?debug=true.


Solution

  • You can either call Elasticsearch to get the indexed document:

    curl https://localhost:9200/idx/_doc/337d3e366ce4b765f650c5a87011e117
    

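    Or call the simulate API of FSCrawler: as far as I know, adding simulate=true alongside debug=true on the _document endpoint returns the extracted content without indexing the file into Elasticsearch.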
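
    As a rough illustration of the first option, here is a minimal Python sketch that fetches the indexed document from the "url" field of the FSCrawler response and pulls out the extracted text. It assumes the text is stored in the content field of _source (FSCrawler's default, as far as I know) and that the cluster uses basic authentication. The function names fetch_document and extract_content are made up for this example.

```python
import base64
import json
import ssl
import urllib.request


def extract_content(es_doc: dict):
    """Return the extracted text from an Elasticsearch GET _doc response.

    Assumes FSCrawler stored the text under _source.content.
    Returns None if the document was not found or has no content field.
    """
    if not es_doc.get("found", False):
        return None
    return es_doc.get("_source", {}).get("content")


def fetch_document(doc_url: str, username: str, password: str,
                   verify_tls: bool = True) -> dict:
    """GET the document at doc_url (the "url" value FSCrawler returned)."""
    request = urllib.request.Request(doc_url)
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")

    context = ssl.create_default_context()
    if not verify_tls:
        # Mirrors ssl_verification: false; only sensible for local testing.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE

    with urllib.request.urlopen(request, context=context) as response:
        return json.load(response)
```

    With the setup from the question, something like extract_content(fetch_document(url, "elastic", "Test123", verify_tls=False)) should return the text, where url is the "url" value from the FSCrawler response. If you call it from outside the Docker network, you will probably need to replace the elasticsearch host name with localhost, as in the curl command above.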