Migrate Elasticsearch data

jboothomas
4 min read · Aug 28, 2020

Clients have asked me about this several times, so here is a quick write-up on how to migrate Elasticsearch data from a legacy storage system to a Pure Storage FlashBlade.

Environment setup

I will be using an Elasticsearch cluster deployed with the ECK operator on Kubernetes. My initial operator definition YAML file contains two nodeSets: one for nodes with the master role, and the other for nodes with the ingest and data roles:

apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: elasticsearch
spec:
  version: 7.6.2
  nodeSets:
  - name: master-legacy
    count: 1
    config:
      node.master: true
      node.data: false
      node.ingest: false
      node.store.allow_mmap: false
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 5Gi
        storageClassName: legacy-block
  - name: data-legacy
    count: 2
    config:
      node.master: false
      node.data: true
      node.ingest: true
      node.store.allow_mmap: false
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
        storageClassName: legacy-block
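
Assuming the definition above is saved as elasticsearch.yaml (a filename I chose for illustration), it can be applied and the resulting objects inspected with kubectl:

kubectl apply -f elasticsearch.yaml
# watch the operator create the persistent volume claims and pods
kubectl get pvc
kubectl get pods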

The deployment creates three persistent volume claims, two for the data nodes and one for the master node:

NAME                                                  STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
elasticsearch-data-elasticsearch-es-data-legacy-0     Bound    pvc-86d4a99d-0f0d-4355-9e68-1b3dd0b6eaee   10Gi       RWO            legacy-block   9s
elasticsearch-data-elasticsearch-es-data-legacy-1     Bound    pvc-a8f6aadc-b84c-488b-905e-3db608f6a45d   10Gi       RWO            legacy-block   9s
elasticsearch-data-elasticsearch-es-master-legacy-0   Bound    pvc-b0dadd8a-ed5c-424d-b0fb-d97731892b4b   5Gi        RWO            legacy-block   9s

It also creates three pods:

NAME                                READY   STATUS    RESTARTS   AGE
elasticsearch-es-data-legacy-0      1/1     Running   0          65s
elasticsearch-es-data-legacy-1      1/1     Running   0          65s
elasticsearch-es-master-legacy-0    1/1     Running   0          65s

Using Filebeat, I push a few hundred megabytes of data into an index and check the status of my Elasticsearch cluster nodes and shards.
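
One way to check nodes and shards from the command line is the _cat API. Here is a minimal sketch, assuming the default ECK naming for a cluster called elasticsearch (secret elasticsearch-es-elastic-user, service elasticsearch-es-http):

# retrieve the password ECK generated for the built-in elastic user
PASSWORD=$(kubectl get secret elasticsearch-es-elastic-user \
  -o go-template='{{.data.elastic | base64decode}}')

# expose the cluster HTTP service locally (ECK serves HTTPS with a self-signed certificate)
kubectl port-forward service/elasticsearch-es-http 9200 &

# list the nodes with their roles, then the shard placement per node
curl -sk -u "elastic:$PASSWORD" "https://localhost:9200/_cat/nodes?v"
curl -sk -u "elastic:$PASSWORD" "https://localhost:9200/_cat/shards?v"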

In order to migrate my index shards from legacy storage to the Pure Storage FlashBlade, I first need to add some data nodes configured to use persistent volume claims on the FlashBlade. Note: these persistent volume claims are provisioned using Pure Service Orchestrator (check out this post for more details on PSO).

To do so, I add a new nodeSet to my Elasticsearch definition and apply it to Kubernetes; here is the added section:

- name: data-pure
  count: 2
  config:
    node.master: false
    node.data: true
    node.ingest: true
    node.store.allow_mmap: false
  volumeClaimTemplates:
  - metadata:
      name: elasticsearch-data
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi
      storageClassName: pure-file

Within Elasticsearch I can now see my newly added data nodes:

Data migration

In order to migrate my index shards, I exclude the data-legacy nodes from shard allocation using a transient cluster setting; this drains their shards onto the remaining Pure Storage backed data nodes, as shown in the sketch below.
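
Concretely, this can be done through the cluster settings API. A sketch, reusing the port-forward and $PASSWORD from earlier and assuming the Elasticsearch node names match the pod names (the ECK default):

curl -sk -u "elastic:$PASSWORD" -H 'Content-Type: application/json' \
  -X PUT "https://localhost:9200/_cluster/settings" -d '
{
  "transient": {
    "cluster.routing.allocation.exclude._name": "elasticsearch-es-data-legacy-0,elasticsearch-es-data-legacy-1"
  }
}'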

The operation takes a certain amount of time, depending on the amount of data to migrate. Once it completes, we can see that all shards are now held by our two Pure Storage backed data nodes.
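
Once the shards have drained, the transient exclusion can be cleared by setting it back to null; it is no longer needed once the legacy nodes are gone:

curl -sk -u "elastic:$PASSWORD" -H 'Content-Type: application/json' \
  -X PUT "https://localhost:9200/_cluster/settings" -d '
{
  "transient": {
    "cluster.routing.allocation.exclude._name": null
  }
}'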

I can now edit my Elasticsearch definition and remove the data-legacy nodeSet section. I then reapply the definition and let the Elasticsearch operator scale down those data nodes:

NAME                                READY   STATUS        RESTARTS   AGE
elasticsearch-es-data-legacy-0      1/1     Terminating   0          58m
elasticsearch-es-data-legacy-1      1/1     Terminating   0          58m
elasticsearch-es-data-pure-0        1/1     Running       0          15m
elasticsearch-es-data-pure-1        1/1     Running       0          15m
elasticsearch-es-master-legacy-0    1/1     Running       0          58m

Master node migration

I can proceed in a similar fashion for the master node(s), adding a new Pure Storage backed master nodeSet to my Elasticsearch operator definition. Here is the added nodeSet section:

- name: master-pure
  count: 1
  config:
    node.master: true
    node.data: false
    node.ingest: false
    node.store.allow_mmap: false
  volumeClaimTemplates:
  - metadata:
      name: elasticsearch-data
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 5Gi
      storageClassName: pure-file

I now have the following pods running:

NAME                                READY   STATUS    RESTARTS   AGE
elasticsearch-es-data-pure-0        1/1     Running   0          20m
elasticsearch-es-data-pure-1        1/1     Running   0          20m
elasticsearch-es-master-legacy-0    1/1     Running   0          63m
elasticsearch-es-master-pure-0      1/1     Running   0          41s

and persistent volume claims:

elasticsearch-data-elasticsearch-es-data-pure-0       Bound    pvc-4ade5d4b-c4cd-4544-aa81-1439c9e47f67   10Gi       RWO            pure-file      19m
elasticsearch-data-elasticsearch-es-data-pure-1       Bound    pvc-2c601757-f7a8-42c6-918d-1d525fe855c1   10Gi       RWO            pure-file      19m
elasticsearch-data-elasticsearch-es-master-legacy-0   Bound    pvc-b0dadd8a-ed5c-424d-b0fb-d97731892b4b   5Gi        RWO            legacy-block   63m
elasticsearch-data-elasticsearch-es-master-pure-0     Bound    pvc-82bc87c8-9e97-4a8c-88f2-0194d82476cc   5Gi        RWO            pure-file      10s

I can now remove the master-legacy nodeSet definition and reapply it so that the Elasticsearch operator removes my previous master. The pod is terminated, its storage is cleaned up, and in Kibana we can see it go offline.
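
As a final check, the _cat APIs (reusing the credentials and port-forward from earlier) can confirm that the elected master is now the Pure Storage backed node:

curl -sk -u "elastic:$PASSWORD" "https://localhost:9200/_cat/master?v"
curl -sk -u "elastic:$PASSWORD" "https://localhost:9200/_cat/nodes?v&h=name,node.role,master"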

In a few simple steps, I have migrated an Elasticsearch cluster's master and data nodes from a legacy storage backend to an all-flash, scalable file and object platform.
