On-premises S3 with Cloudera v7

jboothomas
Aug 27, 2020

Part 1 of a series on Cloudera S3 access to a Pure Storage FlashBlade. This part covers the GUI configuration of Cloudera v7 to use on-premises S3 storage.

Part 2 covers credentials.

Part 3 covers workloads.

External S3 user

From within the Cloudera interface, we create an external user and provide a name, an API access key and an API secret key.

S3 connector service

We configure the S3 connector service to use our external account and set the Default S3 Endpoint (fs.s3a.endpoint) to our FlashBlade S3 data IP.

HDFS configuration settings

To link HDFS to S3 we need to provide configuration parameters. Go to the HDFS configuration page and add the following variables.

Disable SSL for s3a:

fs.s3a.connection.ssl.enabled        false

Set path style access:

fs.s3a.path.style.access             true

Provide our FlashBlade data IP for S3 access (endpoint):

fs.s3a.endpoint                     x.y.z.w 

Apply and deploy the configuration across the HDFS cluster.

See the Hadoop S3A documentation for these and some additional parameters.
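The three HDFS settings above can also be written in Hadoop's XML property format, which is handy if you keep configuration under version control or paste it into an XML snippet field. The following is a minimal sketch, not part of the original setup: the function name render_s3a_properties is hypothetical, the endpoint x.y.z.w is the same placeholder used above, and whether your Cloudera Manager version accepts a pasted XML snippet for these values is an assumption to verify.

```python
import xml.etree.ElementTree as ET


def render_s3a_properties(endpoint: str) -> str:
    """Render the three s3a settings from this post as Hadoop <property> XML.

    `endpoint` is the FlashBlade S3 data IP (placeholder value in this post).
    """
    settings = {
        "fs.s3a.connection.ssl.enabled": "false",  # plain HTTP to the data IP
        "fs.s3a.path.style.access": "true",        # bucket in the path, not the hostname
        "fs.s3a.endpoint": endpoint,
    }
    blocks = []
    for name, value in settings.items():
        prop = ET.Element("property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
        blocks.append(ET.tostring(prop, encoding="unicode"))
    return "\n".join(blocks)


if __name__ == "__main__":
    # Replace the placeholder with your own FlashBlade data IP.
    print(render_s3a_properties("x.y.z.w"))
```

Each printed line is one property element holding the same name and value pair entered in the HDFS configuration page.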

Replication job

We can now replicate from HDFS to S3 or the reverse. Make sure to provide a full path as the s3a location, for example s3a://my-bucket/
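To catch a malformed location before creating the replication job, a small check like the one below may help. It is only a sketch: the helper name is_full_s3a_path is hypothetical, and it assumes that "full path" means the s3a scheme, a bucket name, and a path that starts with a slash, as in the example above.

```python
from urllib.parse import urlparse


def is_full_s3a_path(location: str) -> bool:
    """Return True if `location` looks like s3a://<bucket>/<optional path>."""
    parsed = urlparse(location)
    return (
        parsed.scheme == "s3a"            # must use the s3a connector scheme
        and bool(parsed.netloc)           # bucket name must be present
        and parsed.path.startswith("/")   # at least the root path after the bucket
    )


if __name__ == "__main__":
    for candidate in ("s3a://my-bucket/", "s3a://my-bucket", "my-bucket/"):
        print(candidate, is_full_s3a_path(candidate))
```

Under these assumptions, only the first candidate passes, because the other two lack either the trailing path or the scheme.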
