Recently my company created an application for managing 3D printing projects, profiles, and slices. Check it out at layerkeep.com.
We wanted users to be able to keep track of all their file revisions and also be able to manage the files without having to go through the browser. To accomplish this, we decided to use Git, which meant we needed a scalable filesystem.
The first thing we did was set up a Kubernetes cluster on DigitalOcean.
Currently DigitalOcean only provides Volumes that are ReadWriteOnce. Since we have multiple services that need access to the files (api, nginx, slicers), we needed to be able to mount the same volume with ReadWriteMany.
I decided to try s3fs with DigitalOcean Spaces, since Spaces is an S3-compatible object store. I set up the CSI driver from https://github.com/ctrox/csi-s3 and tried both the s3fs and goofys mounters. Both worked, and both were way too slow. Most of our APIs access the filesystem multiple times per request, and each access took between 3 and 15 seconds, so I moved on to Ceph.
Ceph Preparation:
There is a great storage manager called Rook (https://rook.github.io/) that can be used to deploy many different storage providers to Kubernetes.
** Kubernetes on DigitalOcean doesn’t support FlexVolumes, so you need to use CSI instead.
Hardware Requirements.
You can check the ceph docs to see what you might need. http://docs.ceph.com/docs/jewel/start/hardware-recommendations/#minimum-hardware-recommendations
Create the Kubernetes Cluster
Follow the directions here to create the cluster: https://www.digitalocean.com/docs/kubernetes/how-to/create-clusters/
** Initially I tried a 3-node pool with 1 CPU and 2 GB of memory per node, but it wasn’t enough; Ceph needed more CPU on startup. I changed each node to 2 CPUs and 2 GB of memory, which worked.
We’ll keep all Ceph services constrained to this pool by naming it “storage-pool” (or whatever name you want) and adding a node affinity for that name later.
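If you prefer the CLI, here’s a sketch using doctl (the cluster name, region, and node size are just examples; the node-pool name is what matters for the affinity later):
doctl kubernetes cluster create layerkeep-cluster \
  --region nyc1 \
  --node-pool "name=storage-pool;size=s-2vcpu-2gb;count=3"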
Cluster Access
Make sure you follow DigitalOcean’s directions for connecting to the cluster with kubectl. (https://www.digitalocean.com/docs/kubernetes/how-to/connect-to-cluster/)
You also might want to add a Kubernetes Dashboard. (https://github.com/kubernetes/dashboard)
SSH:
Right now it doesn’t look like you can SSH into the droplets that DigitalOcean creates for a node pool. I wanted access just in case, so I went to the Droplets section and reset the root password for each of them. I was then able to add my SSH key and disable root login. I recommend doing this before adding any services.
Create Volumes
Go to the Volumes section in the DigitalOcean dashboard. Create a volume for each node in the node pool we just created (don’t format them), then attach each one to the correct droplet. Remember that volumes can only be increased in size, not decreased; shrinking means creating a new volume.
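This can also be scripted with doctl (a sketch; the volume name, region, and size are examples, and the volume and droplet IDs come from the list commands):
doctl compute droplet list        # note the droplet IDs of your storage-pool nodes
doctl compute volume create ceph-osd-0 --region nyc1 --size 100GiB
doctl compute volume list         # note the new volume's ID
doctl compute volume-action attach <volume-id> <droplet-id>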
Create the Ceph Cluster
Clone the Rook repository, or just copy the ceph directory from: https://github.com/rook/rook/tree/release-1.0/cluster/examples/kubernetes/ceph
cd cluster/examples/kubernetes/ceph
Modify the cluster.yaml file.
This is where we’ll add the node affinity to run the ceph cluster only on nodes with the “storage-pool” name.
placement:
  all:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: doks.digitalocean.com/node-pool
            operator: In
            values:
            - storage-pool
    podAffinity:
    podAntiAffinity:
    tolerations:
    - key: storage-pool
      operator: Exists
There are also other configs, commented out, that you might need to change. For example, if your disks are smaller than 100 GB, you’ll need to uncomment 'databaseSizeMB: "1024"'.
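For reference, that setting lives under storage.config in cluster.yaml; here is a sketch based on the Rook 1.0 example file (surrounding keys omitted, and your values may differ):
storage:
  config:
    databaseSizeMB: "1024"  # uncomment when the disks are smaller than 100 GB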
Modify the filesystem.yaml file if you want. (Filesystem Design)
Once you’re done configuring you can run:
kubectl apply -f ceph/common.yaml
kubectl apply -f ceph/csi/rbac/cephfs/
kubectl apply -f ceph/filesystem.yaml
kubectl apply -f ceph/operator-with-csi.yaml
kubectl apply -f ceph/cluster.yaml
If you want the ceph dashboard you can run:
kubectl apply -f ceph/dashboard-external-https.yaml
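To log in, the default username is admin, and Rook stores the generated password in a secret (the secret name below is taken from the Rook dashboard docs; double-check it for your release):
kubectl -n rook-ceph get secret rook-ceph-dashboard-password \
  -o jsonpath="{['data']['password']}" | base64 --decode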
The operator should now create your cluster. You should see 3 managers, 3 monitors, and 3 OSDs. Check here if you run into issues: https://rook.github.io/docs/rook/master/ceph-common-issues.html
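A quick way to check progress is to list the pods in the rook-ceph namespace and wait until the mon, mgr, and osd pods are all Running:
kubectl get pods -n rook-ceph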
Deploy the CSI
https://rook.github.io/docs/rook/master/ceph-csi-drivers.html
We need to create a secret to give the provisioner permission to create the volumes.
To get the adminKey we need to exec into the operator pod. We can print it out in one line with:
POD_NAME=$(kubectl get pods -n rook-ceph | grep rook-ceph-operator | awk '{print $1;}'); kubectl exec -it $POD_NAME -n rook-ceph -- ceph auth get-key client.admin
Create a secret.yaml file:
apiVersion: v1
kind: Secret
metadata:
  name: csi-cephfs-secret
  namespace: default
# stringData accepts plain-text values; with data: they would have to be base64-encoded
stringData:
  # Required if provisionVolume is set to true
  adminID: admin
  adminKey: {{ PUT THE RESULT FROM LAST COMMAND }}
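Alternatively, you can skip the YAML and let kubectl build the secret directly from the command output (a sketch, reusing $POD_NAME from above and assuming the same secret name and namespace):
ADMIN_KEY=$(kubectl exec $POD_NAME -n rook-ceph -- ceph auth get-key client.admin)
kubectl create secret generic csi-cephfs-secret -n default \
  --from-literal=adminID=admin \
  --from-literal=adminKey="$ADMIN_KEY"
If you go with the secret.yaml file instead, remember to apply it with kubectl apply -f secret.yaml.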
Create the CephFS StorageClass.
We’ll need to modify the example storageclass in ceph/csi/example/cephfs/storageclass.yaml.
The storageclass.yaml file should look like:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: csi-cephfs
provisioner: cephfs.csi.ceph.com
parameters:
  # Comma-separated list of Ceph monitors.
  # If using FQDNs, make sure the csi plugin's DNS policy is appropriate.
  monitors: rook-ceph-mon-a.rook-ceph:6789,rook-ceph-mon-b.rook-ceph:6789,rook-ceph-mon-c.rook-ceph:6789
  # For provisionVolume: "true":
  #   A new volume will be created along with a new Ceph user.
  #   Requires admin credentials (adminID, adminKey).
  # For provisionVolume: "false":
  #   It is assumed the volume already exists and the user is expected
  #   to provide the path to that volume (rootPath) and user credentials (userID, userKey).
  provisionVolume: "true"
  # Ceph pool into which the volume shall be created.
  # Required for provisionVolume: "true".
  pool: myfs-data0
  # The secrets have to contain user and/or Ceph admin credentials.
  csi.storage.k8s.io/provisioner-secret-name: csi-cephfs-secret
  csi.storage.k8s.io/provisioner-secret-namespace: default
  csi.storage.k8s.io/node-stage-secret-name: csi-cephfs-secret
  csi.storage.k8s.io/node-stage-secret-namespace: default
reclaimPolicy: Retain
allowVolumeExpansion: true
Change the storage class name to whatever you want.
*If you changed metadata.name in filesystem.yaml to something other than “myfs”, make sure you update the pool name here.
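Once you’re happy with it, apply the storage class and confirm it shows up (assuming you edited the example file in place):
kubectl apply -f ceph/csi/example/cephfs/storageclass.yaml
kubectl get storageclass csi-cephfs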
Create the PVC:
Remember that Persistent Volume Claims are accessible only from within the same namespace.
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: my-pv-claim
spec:
  storageClassName: csi-cephfs
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
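Save this as pvc.yaml (the filename is just an example), apply it, and check that the claim binds:
kubectl apply -f pvc.yaml
kubectl get pvc my-pv-claim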
Use the Storage
Now you can mount the volume in your Kubernetes resources using the persistent volume claim you just created. An example Deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: webserver
  namespace: default
  labels:
    k8s-app: webserver
spec:
  replicas: 2
  selector:
    matchLabels:
      k8s-app: webserver
  template:
    metadata:
      labels:
        k8s-app: webserver
    spec:
      containers:
      - name: web-server
        image: nginx
        volumeMounts:
        - name: my-persistent-storage
          mountPath: /var/www/assets
      volumes:
      - name: my-persistent-storage
        persistentVolumeClaim:
          claimName: my-pv-claim
Both deployment replicas will have access to the same data inside /var/www/assets.
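A quick sanity check (a sketch; the label selector and mount path match the example Deployment above) is to write a file from one replica and read it from the other:
POD_A=$(kubectl get pods -l k8s-app=webserver -o jsonpath='{.items[0].metadata.name}')
POD_B=$(kubectl get pods -l k8s-app=webserver -o jsonpath='{.items[1].metadata.name}')
kubectl exec $POD_A -- sh -c 'echo "hello from A" > /var/www/assets/test.txt'
kubectl exec $POD_B -- cat /var/www/assets/test.txt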
Additional Tools
You can also test and debug the filesystem using the Rook toolbox. (https://rook.io/docs/rook/v1.0/ceph-toolbox.html).
First, start the toolbox with:
kubectl apply -f ceph/toolbox.yaml
Shell into the pod.
TOOL_POD=$(kubectl get pods -n rook-ceph | grep tools | head -n 1 | awk '{print $1;}'); kubectl exec -it $TOOL_POD -n rook-ceph -- /bin/bash
Run Ceph commands: http://docs.ceph.com/docs/giant/rados/operations/control/
Validate the filesystem is working by mounting it directly into the toolbox pod.
From: https://rook.io/docs/rook/v1.0/direct-tools.html
# Create the directory
mkdir /tmp/registry
# Detect the mon endpoints and the user secret for the connection
mon_endpoints=$(grep mon_host /etc/ceph/ceph.conf | awk '{print $3}')
my_secret=$(grep key /etc/ceph/keyring | awk '{print $3}')
# Mount the file system
mount -t ceph -o mds_namespace=myfs,name=admin,secret=$my_secret $mon_endpoints:/ /tmp/registry
# See your mounted file system
df -h
Try writing and reading a file to the shared file system.
echo "Hello Rook" > /tmp/registry/hello
cat /tmp/registry/hello
# delete the file when you're done
rm -f /tmp/registry/hello
Unmount the Filesystem
To unmount the shared file system from the toolbox pod:
umount /tmp/registry
rmdir /tmp/registry
No data will be deleted by unmounting the file system.
Monitoring
Now that everything is working, you should add monitoring and alerts.
You can add the Ceph dashboard and/or Prometheus/Grafana to monitor your filesystem.
http://docs.ceph.com/docs/master/mgr/dashboard/
https://github.com/rook/rook/blob/master/Documentation/ceph-monitoring.md
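As a rough sketch (the file names come from the Rook monitoring doc linked above and assume the Prometheus Operator is already installed in your cluster; check the doc for your Rook version):
cd cluster/examples/kubernetes/ceph/monitoring
kubectl create -f service-monitor.yaml
kubectl create -f prometheus.yaml
kubectl create -f prometheus-service.yaml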