I set up FSCrawler with Elasticsearch, and the REST service is enabled. I can post a file to FSCrawler, and the text is correctly extracted and stored in the Elasticsearch index; I verified that with Kibana.
What I am not able to do is get the extracted text back in the response.
I tried several setups in _settings.yaml, but I don't get the text back in the response unless I add debug=true as a query parameter when calling the FSCrawler endpoint:
http://localhost:8080/_document?debug=true
I call the endpoint directly from Postman.
Here is my _settings.yaml:
---
name: "idx"
fs:
  indexed_chars: 100%
  lang_detect: true
  continue_on_error: true
  logging: ERROR
  ocr:
    language: "eng"
    enabled: true
    pdf_strategy: "auto"
elasticsearch:
  nodes:
    - url: "https://elasticsearch:9200"
  username: "elastic"
  password: "Test123"
  ssl_verification: false
  store_source: true
  index_content: true
rest:
  url: "http://fscrawler:8080"
My FSCrawler image:
dadoonet/fscrawler:2.10-SNAPSHOT
Elastic Stack version: 8.6.2
Response:
{
  "ok": true,
  "filename": "JAVASCRIPT.pdf",
  "url": "https://elasticsearch:9200/idx/_doc/337d3e366ce4b765f650c5a87011e117"
}
I found no way to get the extracted text in the response other than, as mentioned, setting ?debug=true.
You can either call Elasticsearch to get the indexed document:
curl https://localhost:9200/idx/_doc/337d3e366ce4b765f650c5a87011e117
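The same lookup can be scripted. Below is a minimal Python sketch, not a definitive client: it takes the "url" field from the FSCrawler response, splits out the index and document id, and reads the extracted text from the document source. The function names are made up for illustration, and it assumes the text sits in the "content" field of _source, so check that against what Kibana shows for your index.

```python
import base64
import json
import ssl
import urllib.request
from urllib.parse import urlsplit


def split_doc_url(doc_url):
    """Return (index, doc_id) from a URL shaped like <base>/<index>/_doc/<id>."""
    parts = urlsplit(doc_url).path.strip("/").split("/")
    if len(parts) != 3 or parts[1] != "_doc":
        raise ValueError(f"unexpected document URL: {doc_url}")
    return parts[0], parts[2]


def extracted_text(es_doc):
    """Pull the text out of an Elasticsearch GET response body.

    Assumption: the extracted text is stored under _source.content.
    Returns None when the field is missing.
    """
    return es_doc.get("_source", {}).get("content")


def fetch_document(es_base, index, doc_id, username, password, verify_tls=True):
    """GET <es_base>/<index>/_doc/<doc_id> with basic auth and parse the JSON."""
    request = urllib.request.Request(f"{es_base}/{index}/_doc/{doc_id}")
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    context = ssl.create_default_context()
    if not verify_tls:
        # Only for local test setups with self-signed certificates.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(request, context=context) as response:
        return json.load(response)


# Example usage (needs a reachable cluster, so it is left commented out):
# index, doc_id = split_doc_url(fscrawler_response["url"])
# doc = fetch_document("https://localhost:9200", index, doc_id,
#                      "elastic", "Test123", verify_tls=False)
# print(extracted_text(doc))
```

The host in the returned URL (elasticsearch:9200) is the name used inside the Docker network, which is why the sketch takes the base URL as a separate argument rather than reusing the one from the response.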
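Alternatively, use FSCrawler's simulate option (the simulate=true query parameter on the upload endpoint), which is meant for getting the extracted content back without indexing it into Elasticsearch.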
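As a rough sketch of the simulate route, the helper below builds the upload URL with the debug and simulate query parameters and reads the text from the JSON that comes back. The helper names are hypothetical, and the response shape is an assumption: it expects the extracted document under a "doc" key with the text in "content", so verify that against the response you see in Postman for your FSCrawler version.

```python
from urllib.parse import urlencode


def document_endpoint(rest_base, debug=False, simulate=False):
    """Build the FSCrawler /_document upload URL with optional flags."""
    params = {}
    if debug:
        params["debug"] = "true"
    if simulate:
        params["simulate"] = "true"
    url = rest_base.rstrip("/") + "/_document"
    return f"{url}?{urlencode(params)}" if params else url


def text_from_response(body):
    """Read the extracted text from a parsed FSCrawler response.

    Assumption: the document is returned under "doc" and its text under
    "content". Returns None when that part of the response is absent.
    """
    return body.get("doc", {}).get("content")
```

For example, document_endpoint("http://localhost:8080", debug=True, simulate=True) gives a URL you can paste into Postman for the same multipart file upload as before.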