On-premises S3 with Cloudera v7
Part 1 of a series on Cloudera S3 access to a Pure Storage FlashBlade. This part covers the GUI configuration of Cloudera v7 to use on-premises S3 storage.
External S3 user
From within the Cloudera interface, we create an external user and provide a name, an API access key, and an API secret key.
S3 connector service
We configure the S3 connector service to use our external account and set the Default S3 Endpoint (fs.s3a.endpoint) to our FlashBlade S3 data IP.
HDFS configuration settings
To link HDFS to S3 we need to provide configuration parameters. Go to the HDFS configuration page and add the following variables.
Disable SSL for S3A connections:
fs.s3a.connection.ssl.enabled false
Enable path-style access:
fs.s3a.path.style.access true
Provide our FlashBlade data IP for S3 access (the endpoint):
fs.s3a.endpoint x.y.z.w
Apply and deploy the configuration across the HDFS cluster.
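For reference, the three settings above can also be written as Hadoop-style property elements. The following is only a sketch: it assumes the values are pasted into an XML configuration snippet rather than entered field by field, and the names S3A_PROPERTIES and render_properties are hypothetical helpers, not part of Cloudera or Hadoop. The endpoint x.y.z.w is the same placeholder used above and must be replaced with the real data IP.

```python
import xml.etree.ElementTree as ET

# The three S3A properties from this post.
# x.y.z.w is a placeholder for the FlashBlade data IP, not a real address.
S3A_PROPERTIES = {
    "fs.s3a.connection.ssl.enabled": "false",
    "fs.s3a.path.style.access": "true",
    "fs.s3a.endpoint": "x.y.z.w",
}


def render_properties(props):
    """Return one Hadoop-style <property> element per line as a string."""
    blocks = []
    for name, value in props.items():
        prop = ET.Element("property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
        blocks.append(ET.tostring(prop, encoding="unicode"))
    return "\n".join(blocks)


if __name__ == "__main__":
    print(render_properties(S3A_PROPERTIES))
```

Keeping the values in one dictionary makes it easy to review them before applying the change in the GUI.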
These and some additional parameters are described in the Hadoop S3A documentation.
Replication job
We can now replicate from HDFS to S3 or the reverse. Make sure to provide a full path as the S3A location, for example: s3a://my-bucket/
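A quick way to catch an incomplete location before creating the job is to check it up front. The sketch below is an assumption about what "full path" means here: the s3a scheme, a bucket name, and a path starting with a slash. The function is_full_s3a_path is a hypothetical helper and not part of Cloudera.

```python
from urllib.parse import urlparse


def is_full_s3a_path(location):
    """True when location has the s3a scheme, a bucket name, and a path.

    Assumption: a "full path" is s3a://<bucket>/ optionally followed by a prefix.
    """
    parsed = urlparse(location)
    return (
        parsed.scheme == "s3a"
        and bool(parsed.netloc)          # the bucket name
        and parsed.path.startswith("/")  # at least the trailing slash
    )


if __name__ == "__main__":
    for candidate in ("s3a://my-bucket/", "s3a://my-bucket", "my-bucket"):
        print(candidate, is_full_s3a_path(candidate))
```

Only the first candidate passes; the other two are missing either the path or the scheme.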