Serverless WebDAV
Recently I moved most of my AWS workloads to ECS backed by a spot fleet and managed with Terraform using the gitops model.
I wanted to create a quick solution for sharing files between my different computers such that:
- it is natively supported by the operating systems I use (macOS and Windows)
- it can work over the internet
- it can run on my ECS cluster
- adding/removing users does not require file-based config changes or redeployments
I thought back and remembered WebDAV, which I had used in the past, and wondered how I could bring it into the modern age.
I started by creating a Docker image with Apache httpd and mod_dav enabled and this worked well. Then I turned my attention to authentication.
Authentication
One of my requirements is that users can be added/removed without file config changes, so that ruled out .htpasswd files and other similar file-based mechanisms. I did some research into using modern approaches like SAML, OAuth2 or OIDC and found that these are largely incompatible with the WebDAV protocol.
So I settled on using HTTP basic authentication over HTTPS and backed by a store of user data in a DynamoDB table. Next came the task of figuring out how to do it.
PAM to the rescue
Luckily Linux has Pluggable Authentication Modules (PAM), which allow libraries to be written that provide authentication, account management and session features to consumers such as web servers. After reading about PAM modules I decided to implement one which would query a DynamoDB table to check whether a user's credentials are valid.
PAM modules are supposed to be written in C; however, AWS does not offer a C SDK for its APIs, only a C++ one. So I wrote the module in C++ but included C exports so that the PAM framework can call into it.
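To show the shape of this, here is a minimal sketch of a PAM authentication module in C++ with C exports. The pam_get_user and pam_get_authtok calls are the real Linux-PAM API, but check_credentials is a hypothetical stand-in for the DynamoDB lookup; see the repo below for the real thing.
// pam_sketch.cpp -- illustrative shape of a PAM module in C++ with C exports.
// Build (sketch): g++ -fPIC -shared -o pam_sketch.so pam_sketch.cpp -lpam
#include <string>
#include <security/pam_modules.h>
#include <security/pam_ext.h>

// Hypothetical stand-in for the real DynamoDB GetItem lookup.
static bool check_credentials(const std::string &user, const std::string &pass) {
  (void)user; (void)pass;
  return false;  // the real module compares a SHA3-256 hash fetched from DynamoDB
}

// PAM loads modules via dlopen(), so the entry points must have C linkage.
extern "C" {

int pam_sm_authenticate(pam_handle_t *pamh, int flags, int argc, const char **argv) {
  (void)flags; (void)argc; (void)argv;  // argv carries the module parameters (region, table, ...)
  const char *user = nullptr;
  const char *pass = nullptr;
  if (pam_get_user(pamh, &user, nullptr) != PAM_SUCCESS || user == nullptr)
    return PAM_AUTH_ERR;
  if (pam_get_authtok(pamh, PAM_AUTHTOK, &pass, nullptr) != PAM_SUCCESS || pass == nullptr)
    return PAM_AUTH_ERR;
  return check_credentials(user, pass) ? PAM_SUCCESS : PAM_AUTH_ERR;
}

// The "account" phase; this sketch imposes no extra account restrictions.
int pam_sm_acct_mgmt(pam_handle_t *pamh, int flags, int argc, const char **argv) {
  (void)pamh; (void)flags; (void)argc; (void)argv;
  return PAM_SUCCESS;
}

}  // extern "C"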
You can see the module here https://github.com/richardjkendall/pam-dynamo
The module builds with CMake and produces a deb archive that can be installed on Ubuntu; I've tested it with Ubuntu 18.04 LTS. It also requires the AWS C++ SDK for DynamoDB, which can take a while to build. For this reason you can use my pre-packaged Docker image here https://hub.docker.com/r/richardjkendall/ubuntu-pam-dynamo.
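If you'd rather build it yourself, the flow is the usual CMake one; the packaging step is an assumption on my part, so check the repo's README for the exact targets:
git clone https://github.com/richardjkendall/pam-dynamo.git
cd pam-dynamo && mkdir build && cd build
cmake ..   # assumes the AWS C++ SDK for DynamoDB is already installed
make       # builds libpam-dynamo.so
cpack      # produces the .deb archive (assumed target; see the README)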
The module takes its config in the form of parameters which you include in the pam.d config file. For example:
account required libpam-dynamo.so ${REGION} ${TABLE} ${REALM} ${CACHE_FOLDER} ${CACHE_DURATION}
auth required libpam-dynamo.so ${REGION} ${TABLE} ${REALM} ${CACHE_FOLDER} ${CACHE_DURATION}
where:
- REGION = the AWS region
- TABLE = the name of the DynamoDB table
- REALM = the realm (hash key) in the table containing the users
- CACHE_FOLDER = the folder where the cache database should be created
- CACHE_DURATION = the number of seconds that cache entries should live for
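As a concrete illustration, a filled-in service file (for example /etc/pam.d/aws, the name used later in this post) might look like this; the region, cache path and TTL are illustrative values, while the table name and realm match the Terraform example further down:
auth    required libpam-dynamo.so ap-southeast-2 basicAuthUsers dav /var/cache/pam-dynamo 120
account required libpam-dynamo.so ap-southeast-2 basicAuthUsers dav /var/cache/pam-dynamo 120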
Integration with Apache httpd
There is an existing module called mod_authnz_pam which allows the PAM system to be used for authentication tasks. Once enabled, it is relatively simple to configure basic authentication where the username and password provided are passed to the PAM system to be validated.
On Ubuntu this means first installing the module:
sudo apt-get install -y libapache2-mod-authnz-pam
and then adding the appropriate configuration to your site (if the package does not enable the module automatically, sudo a2enmod authnz_pam should do it). On Ubuntu you can find the enabled sites in /etc/apache2/sites-enabled (these are symbolic links to the files in the /etc/apache2/sites-available directory). Add the following config to the site you want to be protected:
<Location />
AuthType Basic
AuthName "private area"
AuthBasicProvider PAM
AuthPAMService aws
Require valid-user
</Location>
In this block of config, ensure you change the AuthPAMService entry to match the name of your config file in /etc/pam.d which contains the references to libpam-dynamo.so. In my example the file was called /etc/pam.d/aws.
After you restart httpd you should see a basic authentication prompt when you attempt to visit http://server:port. At this point all attempts should fail, though, because we have not yet set up the DynamoDB table with users.
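You can verify this from the command line; with no users in the table yet, every attempt should come back 401 (the hostname and port here are placeholders, and alice:secret is a made-up credential):
curl -i http://server:port/                    # expect HTTP 401 and a WWW-Authenticate: Basic header
curl -i -u alice:secret http://server:port/    # also 401 until the table is populated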
DynamoDB Table
The module uses a very simple table with the following setup:
Field | Type | Role
---|---|---
realm | String | hash key
username | String | sort key
password | String | n/a
scopes | List of strings | planned for future feature support
The password field should contain SHA3-256 hashed passwords.
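As an illustration, here is one way to create a user with the AWS CLI, using openssl (1.1.1 or later supports SHA3-256) to hash the password. The table name and realm match the Terraform example later in this post; the assumption that the module expects the hash as lowercase hex should be verified against the repo.
HASH=$(printf '%s' 'correct-horse-battery-staple' | openssl dgst -sha3-256 | awk '{print $2}')
aws dynamodb put-item \
  --table-name basicAuthUsers \
  --item '{
    "realm":    {"S": "dav"},
    "username": {"S": "alice"},
    "password": {"S": "'"$HASH"'"}
  }'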
AWS Permissions
Because the PAM module uses the AWS C++ SDK it can pick up credentials in the standard ways any AWS SDK client does:
- using the ~/.aws/credentials file
- using environment variables
- picking up EC2 instance roles / ECS task roles when running on AWS infrastructure
The role used needs to have the dynamodb:GetItem permission for the table where the user details are stored.
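A minimal identity policy granting this looks like the following; the wildcards and table name are illustrative, and in practice you should scope the resource ARN down to your account, region and table:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "dynamodb:GetItem",
      "Resource": "arn:aws:dynamodb:*:*:table/basicAuthUsers"
    }
  ]
}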
Caching
In my early tests of WebDAV working together with this PAM module I found that the credential check was too slow, and this was causing WebDAV clients to produce inconsistent results.
To resolve this I have included caching in a local SQLite database. You can read more about SQLite here https://www.sqlite.org/index.html and the C/C++ interface documentation is here https://www.sqlite.org/c3ref/intro.html.
The module creates a database file per realm in a configurable directory and caches the result of any successful authentication against DynamoDB. The username and hashed password are cached. The cache TTL is configurable, but defaults to 120 seconds.
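I haven't documented the exact schema here, but as a mental model you can picture each per-realm database holding a table something like the following (an assumption for illustration, not the module's actual DDL):
CREATE TABLE IF NOT EXISTS cache (
  username   TEXT PRIMARY KEY,  -- one row per user in this realm's database file
  password   TEXT NOT NULL,     -- the SHA3-256 hash, as cached from DynamoDB
  expires_at INTEGER NOT NULL   -- unix epoch seconds; rows past this are ignored
);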
File Storage
One of the complexities with containerised services is the storage of persistent data, that is, data that will survive the stopping and starting of the container. We can't store data on the EC2 hosts or on the container filesystems themselves because these are ephemeral and could disappear and be recreated at any time.
Fortunately AWS has an option to mount an EFS filesystem as a volume attached to an ECS task, and anything saved to this filesystem will survive a task or EC2 termination. I'm using the default bursting throughput mode of EFS as my use case does not need guaranteed performance, but you can also pay for provisioned throughput if you need it.
EFS also has lifecycle management policies which move files you have not recently accessed into the lower-cost Infrequent Access storage class, which helps manage the overall cost of the service.
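As a sketch, the Terraform for a bursting-mode filesystem with such a policy looks something like this; the creation token, tag and 30-day cutoff are illustrative choices, not values from my module:
resource "aws_efs_file_system" "dav_files" {
  creation_token = "webdav-files"

  # move files not accessed for 30 days to the Infrequent Access class
  lifecycle_policy {
    transition_to_ia = "AFTER_30_DAYS"
  }

  tags = {
    Name = "webdav-files"
  }
}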
Putting it all together
As I said at the start I use ECS to run most of my workloads. This is combined with an ALB (to offload SSL) and haproxy to manage automatic service discovery and routing of requests to backend targets.
I have a terraform module which deploys this on ECS and sets up the Cloud Map entries needed for my haproxy component to discover it. The module is here https://github.com/richardjkendall/tf-modules/tree/master/modules/webdav-server (release tag: v31).
Example config using this module:
module "test_haproxy" {
source = "../../tf-modules/modules/webdav-server"
aws_region = "***"
cluster_name = "***"
service_registry_id = "ns-***"
service_registry_service_name = "_files._tcp"
efs_filesystem_id = "fs-***"
dav_root_directory = "/files/root"
dav_lockdb_directory = "/files/db"
users_table = "basicAuthUsers"
auth_realm = "dav"
}
Problems with HTTPS
Running a WebDAV server behind a reverse proxy which is offloading HTTPS can cause issues: the WebDAV protocol uses absolute URLs in the Destination HTTP header, so when the server sees an https scheme where it expects http it returns errors.
This can be fixed by using mod_headers to rewrite any https URLs in the Destination header to http:
RequestHeader edit Destination ^https http early
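Note that mod_headers needs to be enabled for this directive to work; on Ubuntu:
sudo a2enmod headers
sudo systemctl restart apache2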
-- Richard, Mar 2020