Tags: elasticsearch, reindex, elasticsearch-mapping

Reindex multiple types from one index to single type in another index


I have two indexes: twitter and reitwitter.

The twitter index holds documents of several different types, for example:

"hits": [
{
"_index": "twitter",
"_type": "tweet",
"_id": "1",
"_score": 1,
"_source": {
"message": "trying out Elasticsearch"
}
},
{
"_index": "twitter",
"_type": "tweet2",
"_id": "1",
"_score": 1,
"_source": {
"message": "trying out Elasticsearch2"
}
},
{
"_index": "twitter",
"_type": "tweet1",
"_id": "1",
"_score": 1,
"_source": {
"message": "trying out Elasticsearch1"
}
}
]

Now, when I reindex, I want to get rid of all the different types and use a single one, because they all have essentially the same field mappings.

I tried several different combinations, but I always end up with only one document instead of three.

First approach:

POST _reindex/
{
"source": {
"index": "twitter"
}
,
"dest": {
"index": "reitwitter",
"type": "reitweet"
}
}

Response:

{
"took": 12,
"timed_out": false,
"total": 3,
"updated": 3,
"created": 0,
"deleted": 0,
"batches": 1,
"version_conflicts": 0,
"noops": 0,
"retries": {
"bulk": 0,
"search": 0
},
"throttled_millis": 0,
"requests_per_second": -1,
"throttled_until_millis": 0,
"failures": []
}

Note: It reports "updated": 3, presumably because this was the second time I made the same call.

Second approach:

POST _reindex/
{
"source": {
"index": "twitter",
"query": {
"match_all": {
}
}
}
,
"dest": {
"index": "reitwitter",
"type": "reitweet"
}
}

Same response as first one.

In both cases when I make this GET call:

GET reitwitter/_search
{
"query": {
"match_all": {
}
}
}

I only get one document:

{
"_index": "reitwitter",
"_type": "reitweet",
"_id": "1",
"_score": 1,
"_source": {
"message": "trying out Elasticsearch1"
}
}

Is this use case even supported by reindex? If not, do I have to write a script that uses scan and scroll to fetch all the documents from the source index and reindex them under a single doc type in the destination?

PS: I don't want to use "_source": ["tweet1", "tweet"] because I have around a million doc types, each holding one document, that I want to map to the same doc type in the destination.


Solution

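  • The problem is that all the documents have the same id (1). Because the reindex writes them all to the same index and the same type, each document overwrites the previous one, and only the last one written survives.

    Index your documents with distinct ids and all three will show up in the destination.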
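
To make the collision concrete, here is a minimal Python sketch that models the destination index as a dictionary keyed by (type, id). It is only a toy model, not the Elasticsearch client: the `reindex` helper and the type-prefixed id scheme are hypothetical names chosen for illustration, and the three documents are the ones from the question.

```python
# Toy model of an index: documents are keyed by (type, id), so writing the
# same key twice replaces the earlier document, as described above.
source_hits = [
    {"_type": "tweet", "_id": "1", "_source": {"message": "trying out Elasticsearch"}},
    {"_type": "tweet2", "_id": "1", "_source": {"message": "trying out Elasticsearch2"}},
    {"_type": "tweet1", "_id": "1", "_source": {"message": "trying out Elasticsearch1"}},
]


def reindex(hits, dest_type, make_id):
    """Copy hits into a dict that stands in for the destination index.

    Hypothetical helper: make_id decides which id each document gets.
    """
    dest = {}
    for hit in hits:
        dest[(dest_type, make_id(hit))] = hit["_source"]
    return dest


# Keeping the original id: all three documents land on ("reitweet", "1").
collided = reindex(source_hits, "reitweet", lambda h: h["_id"])

# Assumed fix: prefix the id with the source type so every id stays unique.
unique = reindex(source_hits, "reitweet", lambda h: h["_type"] + "-" + h["_id"])

print(len(collided), len(unique))
```

In a real reindex, the same id rewrite could probably be done without re-ingesting the source data, by attaching a script to the reindex request that assigns something like `ctx._type + '-' + ctx._id` to `ctx._id`. The exact script syntax depends on the Elasticsearch version, so check the reindex documentation for yours.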