Migrate Elasticsearch data

jboothomas
4 min read · Aug 28, 2020

Clients have asked me about this several times, so here is a quick write-up on how to migrate Elasticsearch data from a legacy storage system to a Pure Storage FlashBlade.

Environment setup

I will be using an Elasticsearch cluster deployed with the ECK operator on Kubernetes. My initial operator definition YAML file contains two nodeSets: one for nodes with the master role, and the other for nodes with the ingest and data roles:

apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: elasticsearch
spec:
  version: 7.6.2
  nodeSets:
  - name: master-legacy
    count: 1
    config:
      node.master: true
      node.data: false
      node.ingest: false
      node.store.allow_mmap: false
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 5Gi
        storageClassName: legacy-block
  - name: data-legacy
    count: 2
    config:
      node.master: false
      node.data: true
      node.ingest: true
      node.store.allow_mmap: false
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
        storageClassName: legacy-block
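
Assuming the definition above is saved as elasticsearch.yaml (a filename I chose for illustration), it can be applied and the resulting objects inspected with kubectl:

kubectl apply -f elasticsearch.yaml
# watch the operator create the persistent volume claims and pods
kubectl get pvc
kubectl get pods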

The deployment creates three persistent volume claims, two for the data nodes and one for the master node:

NAME                                                  STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
elasticsearch-data-elasticsearch-es-data-legacy-0     Bound    pvc-86d4a99d-0f0d-4355-9e68-1b3dd0b6eaee   10Gi       RWO            legacy-block   9s
elasticsearch-data-elasticsearch-es-data-legacy-1     Bound    pvc-a8f6aadc-b84c-488b-905e-3db608f6a45d   10Gi       RWO            legacy-block   9s
elasticsearch-data-elasticsearch-es-master-legacy-0   Bound    pvc-b0dadd8a-ed5c-424d-b0fb-d97731892b4b   5Gi        RWO            legacy-block   9s

It also creates three pods:

NAME                                READY   STATUS    RESTARTS   AGE
elasticsearch-es-data-legacy-0      1/1     Running   0          65s
elasticsearch-es-data-legacy-1      1/1     Running   0          65s
elasticsearch-es-master-legacy-0    1/1     Running   0          65s

Using Filebeat, I push a few hundred megabytes of data into an index and check the status of my Elasticsearch cluster nodes and shards.
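
One way to check nodes and shards from the command line is the _cat API. Here is a minimal sketch, assuming the default ECK naming for a cluster called elasticsearch (secret elasticsearch-es-elastic-user, service elasticsearch-es-http):

# retrieve the password ECK generated for the built-in elastic user
PASSWORD=$(kubectl get secret elasticsearch-es-elastic-user \
  -o go-template='{{.data.elastic | base64decode}}')

# expose the cluster HTTP service locally (ECK serves HTTPS with a self-signed certificate)
kubectl port-forward service/elasticsearch-es-http 9200 &

# list the nodes with their roles, then the shard placement per node
curl -sk -u "elastic:$PASSWORD" "https://localhost:9200/_cat/nodes?v"
curl -sk -u "elastic:$PASSWORD" "https://localhost:9200/_cat/shards?v"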

In order to migrate my index shards from legacy storage to the Pure Storage FlashBlade, I first need to add some data nodes configured to use persistent volume claims on the FlashBlade. Note: these persistent volume claims are provisioned using Pure Service Orchestrator (check out this post for more details on PSO).

To do so, I add a new nodeSet to my Elasticsearch definition and apply it to Kubernetes; here is the added section:

- name: data-pure
  count: 2
  config:
    node.master: false
    node.data: true
    node.ingest: true
    node.store.allow_mmap: false
  volumeClaimTemplates:
  - metadata:
      name: elasticsearch-data
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi
      storageClassName: pure-file

Within Elasticsearch I can now see my newly added data nodes:

Data migration

In order to migrate my index shards, I exclude the data-legacy nodes from shard allocation using a transient cluster setting; this drains their shards onto the remaining Pure Storage backed data nodes, as shown in the sketch below.
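
Concretely, this can be done through the cluster settings API. A sketch, reusing the port-forward and $PASSWORD from earlier and assuming the Elasticsearch node names match the pod names (the ECK default):

curl -sk -u "elastic:$PASSWORD" -H 'Content-Type: application/json' \
  -X PUT "https://localhost:9200/_cluster/settings" -d '
{
  "transient": {
    "cluster.routing.allocation.exclude._name": "elasticsearch-es-data-legacy-0,elasticsearch-es-data-legacy-1"
  }
}'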

The operation takes a certain amount of time, depending on the amount of data to migrate. Once it completes, we can see that all shards are now held by our two Pure Storage backed data nodes.
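
Once the shards have drained, the transient exclusion can be cleared by setting it back to null; it is no longer needed once the legacy nodes are gone:

curl -sk -u "elastic:$PASSWORD" -H 'Content-Type: application/json' \
  -X PUT "https://localhost:9200/_cluster/settings" -d '
{
  "transient": {
    "cluster.routing.allocation.exclude._name": null
  }
}'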

I can now edit my Elasticsearch definition and remove the data-legacy nodeSet section. I then reapply the definition and let the Elasticsearch operator scale down those data nodes:

NAME                                READY   STATUS        RESTARTS   AGE
elasticsearch-es-data-legacy-0      1/1     Terminating   0          58m
elasticsearch-es-data-legacy-1      1/1     Terminating   0          58m
elasticsearch-es-data-pure-0        1/1     Running       0          15m
elasticsearch-es-data-pure-1        1/1     Running       0          15m
elasticsearch-es-master-legacy-0    1/1     Running       0          58m

Master node migration

I can proceed in a similar fashion for the master node(s), adding a new Pure Storage backed master nodeSet to my Elasticsearch operator definition. Here is the added nodeSet section:

- name: master-pure
  count: 1
  config:
    node.master: true
    node.data: false
    node.ingest: false
    node.store.allow_mmap: false
  volumeClaimTemplates:
  - metadata:
      name: elasticsearch-data
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 5Gi
      storageClassName: pure-file

I now have the following pods running:

NAME                                READY   STATUS    RESTARTS   AGE
elasticsearch-es-data-pure-0        1/1     Running   0          20m
elasticsearch-es-data-pure-1        1/1     Running   0          20m
elasticsearch-es-master-legacy-0    1/1     Running   0          63m
elasticsearch-es-master-pure-0      1/1     Running   0          41s

and persistent volume claims:

elasticsearch-data-elasticsearch-es-data-pure-0       Bound    pvc-4ade5d4b-c4cd-4544-aa81-1439c9e47f67   10Gi       RWO            pure-file      19m
elasticsearch-data-elasticsearch-es-data-pure-1       Bound    pvc-2c601757-f7a8-42c6-918d-1d525fe855c1   10Gi       RWO            pure-file      19m
elasticsearch-data-elasticsearch-es-master-legacy-0   Bound    pvc-b0dadd8a-ed5c-424d-b0fb-d97731892b4b   5Gi        RWO            legacy-block   63m
elasticsearch-data-elasticsearch-es-master-pure-0     Bound    pvc-82bc87c8-9e97-4a8c-88f2-0194d82476cc   5Gi        RWO            pure-file      10s

I can now remove the master-legacy nodeSet definition and reapply it so that the Elasticsearch operator removes my previous master. The pod is terminated, its storage is cleaned up, and in Kibana we can see it go offline.
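
As a final check, the _cat APIs (reusing the credentials and port-forward from earlier) can confirm that the elected master is now the Pure Storage backed node:

curl -sk -u "elastic:$PASSWORD" "https://localhost:9200/_cat/master?v"
curl -sk -u "elastic:$PASSWORD" "https://localhost:9200/_cat/nodes?v&h=name,node.role,master"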

In a few simple steps, I have migrated an Elasticsearch cluster's master and data nodes from a legacy storage backend to an all-flash, scalable file and object platform.
