This post describes how I’ve put together a simple static content server for kubernetes clusters using a Pod with a persistent volume and multiple containers: an sftp server to manage contents, a web server to publish them with optional access control and another one to run scripts which need access to the volume filesystem.
The sftp server runs using MySecureShell, the web server is nginx and the script runner uses the webhook tool to publish endpoints to call them (the calls will come from other Pods that run backend servers or are executed from Jobs or CronJobs).
Note: This service has been developed for Kyso and the version used in our current architecture includes an additional container to index documents for Elasticsearch, but as it is not relevant for the description of the service as a general solution I’ve decided to ignore it on this post.
HistoryThe system was developed because we had a NodeJS API with endpoints to upload files and store them on S3 compatible services that were later accessed via HTTPS, but the requirements changed and we needed to be able to publish folders instead of individual files using their original names and apply access restrictions using our API.
Thinking about our requirements the use of a regular filesystem to keep the files and folders was a good option, as uploading and serving files is simple.
For the upload I decided to use the sftp protocol, mainly because I already had an sftp container image based on mysecureshell prepared; once we settled on that we added sftp support to the API server and configured it to upload the files to our server instead of using S3 buckets.
To publish the files we added a nginx container configured to work as a reverse proxy that uses the ngx_http_auth_request_module to validate access to the files (the sub request is configurable, in our deployment we have configured it to call our API to check if the user can access a given URL).
Finally we added a third container when we needed to execute some tasks directly on the filesystem (using kubectl exec with the existing containers did not seem a good idea, as that is not supported by CronJobs objects, for example).
The solution we found avoiding the NIH Syndrome (i.e. write our own tool) was to use the webhook tool to provide the endpoints to call the scripts; for now we have three:
one to get the disc usage of a PATH,one to hardlink all the files that are identical on the filesystem,one to copy files and folders from S3 buckets to our filesystem....