26 March 2020/aws

Building serverless SSO

Recently I was building a static site hosted on AWS CloudFront which needed to sit behind some form of single sign-on.

I decided it was a great time to try out Lambda@Edge combined with JSON Web Tokens (JWT) stored in cookies, so I set out to build a solution which combined Lambda@Edge, JWT and OpenID Connect. I decided to use JBoss Keycloak as my identity provider (IdP) because it has the capability to support multiple identity providers, so it is a great flexibility point.

Components

Keycloak

Keycloak is an open source IAM platform (https://www.keycloak.org/) backed by the Red Hat JBoss project. It is really simple to set up and run: in fact it can run with a single Docker container and a database (which can also be provided by a Docker container). In my non-prod setup I'm using AWS ECS to run a task which consists of 2 containers: keycloak and postgres. If you want to get started quickly with this setup you can find a Terraform module here.

CloudFront

This is the AWS CDN solution which creates a globally available cache of an 'origin'. It is common to use S3 as the origin and there is tight integration between CloudFront and S3 to allow only the CloudFront distribution to access the S3 bucket. I also have a Terraform module here for setting up a static site backed by an S3 bucket.

Lambda@Edge

Lambda is AWS' well-known serverless function platform. It allows you to write code in one of several languages and have that code triggered by different events, including API calls through the AWS API Gateway (API G/W). Lambda also has a feature where functions can be published to edge locations and used in CloudFront distributions. However, there are some significant restrictions placed on those functions under certain scenarios.

Lambda@Edge only supports certain runtimes (e.g. python3.7) and does not allow you to use environment variables. Further, if you are attaching your lambda function to any of the 'viewer' events then the max execution time is 5 seconds, the max memory usage is 128MB and the max (compressed) function size is 1MB. This is because viewer events are executed for every request to CloudFront, whereas the origin events are only executed on requests to the origin, and the responses to those requests can be cached.
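To give a feel for what a viewer-request function looks like, here is a minimal sketch (not the code I actually deployed). It only checks that a cookie is present and redirects to a login page if it is not; real token validation is left out. The LOGIN_URL and COOKIE_NAME values are made up for illustration, and they are hard-coded because environment variables are not available.

```python
from http.cookies import SimpleCookie

# Lambda@Edge has no environment variables, so settings live in the code
# (or in a file bundled into the package). Both values are hypothetical.
LOGIN_URL = "https://auth.example.com/login"
COOKIE_NAME = "auth_token"


def get_cookie(request, name):
    """Return the named cookie from a CloudFront request, or None."""
    # CloudFront lower-cases header names and stores each header as a
    # list of {"key": ..., "value": ...} dictionaries.
    for header in request.get("headers", {}).get("cookie", []):
        cookie = SimpleCookie()
        cookie.load(header["value"])
        if name in cookie:
            return cookie[name].value
    return None


def handler(event, context):
    request = event["Records"][0]["cf"]["request"]
    if get_cookie(request, COOKIE_NAME) is None:
        # Returning a response object short-circuits the request.
        return {
            "status": "302",
            "statusDescription": "Found",
            "headers": {"location": [{"key": "Location", "value": LOGIN_URL}]},
        }
    # Returning the request lets CloudFront carry on to the cache/origin.
    return request
```

Returning the request object untouched is what lets an authenticated user through; returning a dictionary with a status makes CloudFront answer directly without going any further.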

JSON Web Tokens

Or JWT. These are a compact, URL-safe data structure to represent claims between two parties. They consist of a header, a payload and a signature. They are very good because they enable stateless session management: all the details you need to verify the user/token are in the token. There's loads of information out there on JWT and lots of good libraries so I won't cover all those details here. Just Google JWT :) Or read this.
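If you want to see the three parts for yourself, the sketch below builds a throwaway token and takes it apart again using only the standard library. It signs with HS256 and a dummy secret purely so the example is self-contained; that is an assumption for the demo, and tokens from an IdP like Keycloak are normally verified against a public key instead. Note that split_token does not verify anything, it just decodes.

```python
import base64
import hashlib
import hmac
import json


def b64url_encode(data: bytes) -> str:
    # JWTs use URL-safe base64 with the trailing '=' padding removed.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(segment: str) -> bytes:
    # Put the padding back before decoding.
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def make_demo_token(claims: dict, secret: bytes) -> str:
    """Build an HS256-signed token, only to have something to take apart."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url_encode(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, claims)
    )
    signature = hmac.new(secret, signing_input.encode(), hashlib.sha256).digest()
    return signing_input + "." + b64url_encode(signature)


def split_token(token: str):
    """Decode header and payload WITHOUT verifying the signature."""
    header_b64, payload_b64, signature_b64 = token.split(".")
    return (
        json.loads(b64url_decode(header_b64)),
        json.loads(b64url_decode(payload_b64)),
        b64url_decode(signature_b64),
    )


# Hypothetical claims and secret, for illustration only.
token = make_demo_token({"sub": "demo-user", "aud": "my-client"}, b"not-a-real-secret")
header, payload, signature = split_token(token)
```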

The Solution

Overall this hangs together as shown in the UML sequence diagram below. This depicts a basic, sunny day path where an unauthenticated user requests some content and then successfully authenticates before being shown the content.

serverless sso blog diagram

The problems... there are always problems

Audience validation

When validating a JWT it is important that you validate the signature against the public key, that the token has not expired and that you are the intended audience. I wanted to use the client ID as the audience ID, but Keycloak by default does not do this, so I had to create an 'Audience mapper' which added the client ID to the audience claim in the JWT access token.
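The expiry and audience checks are simple enough to sketch by hand. The function below is illustrative only (check_claims and the client ID are names I've made up here) and it assumes the signature has already been verified by a proper JWT library. One detail worth knowing: the 'aud' claim can be either a single string or a list of strings, so both need handling.

```python
import time


def check_claims(claims: dict, expected_audience: str, now: float = None) -> bool:
    """Check expiry and audience on an already signature-verified payload."""
    now = time.time() if now is None else now

    # 'exp' is seconds since the epoch; the token is only good before it.
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or now >= exp:
        return False

    # 'aud' may be a single string or a list of strings.
    aud = claims.get("aud", [])
    audiences = [aud] if isinstance(aud, str) else list(aud)
    return expected_audience in audiences
```

With the audience mapper in place, the client ID shows up in 'aud' and a check like check_claims(payload, "my-client") passes; without it the same check fails even though the token is otherwise fine.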

Function size

I decided to use Python (as I know it well) for my lambda function. I started off with a prototype as a local Flask application. Flask is great because:

it's really simple to get up and running
it has a built in web server for debugging which supports hot reloading
there are plenty of libraries which allow you to easily package a Flask app for use on AWS lambda

I got it working and then converted it to support CloudFront viewer-request events. Great I thought! Simply a case of packaging it as a zip and uploading it to lambda. Fail! Why - because my packaged function was 5MB :(

I tried a lot of ways to decrease the size, but could only get it to 2.5MB. In the end I found this is because the JWT signature verification needs the Python cryptography library, which includes a number of hefty binaries.

So I decided to create a dedicated API for JWT validation which I would deploy on lambda behind API G/W, that way my Lambda@Edge function could call that and not need all the heavy crypto libraries. Winner.
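On the Lambda@Edge side, calling out to an API needs nothing beyond the standard library, which keeps the package tiny. The sketch below shows the general idea; the URL and the request/response contract (POST a JSON body with a 'token' field, HTTP 200 means valid) are assumptions for illustration and may not match the real API. The opener parameter is only there so the function can be tested without a network.

```python
import json
import urllib.request

# Hypothetical endpoint and contract: POST {"token": "..."}; HTTP 200 means valid.
VALIDATE_URL = "https://api.example.com/validate"


def build_request(token: str) -> urllib.request.Request:
    body = json.dumps({"token": token}).encode("utf-8")
    return urllib.request.Request(
        VALIDATE_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def token_is_valid(token: str, opener=urllib.request.urlopen, timeout: float = 2.0) -> bool:
    """Ask the remote API to validate the token; treat any failure as invalid."""
    try:
        with opener(build_request(token), timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError (e.g. a 401/403 reply) and socket timeouts
        # are all OSError subclasses, so they all end up here.
        return False
```

The short timeout is deliberate: a viewer function only has 5 seconds in total, so the call to the API needs to fail fast and leave time to send a redirect.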

Native binaries on Lambda

So... I built my JWT validation API again as a local Flask app, got it working, packaged it as a lambda and deployed it along with the relevant API G/W config (I have lots of nice Terraform modules for this here).

The validation API code is here: https://github.com/richardjkendall/validate-jwt-api

What happened? It did not work :(

Deeper checks told me that the code was failing looking for various dynamic modules (.so files) for the crypto bindings and the cffi library. Lots of research (read... Stack Overflow) later and I found this is because I was building my function on macOS and getting Darwin-compatible libraries included in my package, not those which work on AWS Lambda (which is based on Amazon Linux).

So I created a Docker image based on Amazon Linux here and updated my lambda-function Terraform module to use this. I still had a problem where Lambda was saying it could not find the _cffi_backend module, and I found I had to rename this file: _cffi_backend.cpython-37m-x86_64-linux-gnu.so to _cffi_backend.so. It is not a solution I'm totally happy with, but it works for now.
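If you want to automate that rename as part of a build, something along these lines would do it. This is a sketch rather than what my Terraform module actually runs; rename_cffi_backend and the idea of pointing it at an unpacked package directory are assumptions for illustration.

```python
from pathlib import Path


def rename_cffi_backend(package_dir: str) -> list:
    """Rename any _cffi_backend.cpython-*.so under package_dir to _cffi_backend.so."""
    renamed = []
    # Collect the matches first so we are not renaming while still walking the tree.
    matches = list(Path(package_dir).rglob("_cffi_backend.cpython-*.so"))
    for path in matches:
        target = path.with_name("_cffi_backend.so")
        path.rename(target)
        renamed.append(target)
    return renamed
```

You would run this against the directory holding the installed dependencies just before zipping it up.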

Putting it together

Adding the validate API to the flow it looks like this:

serverless sso with validate api blog diagram

There's a Terraform module that can deploy a static site, connected to OIDC SSO, along with CI/CD to deploy the site content. You can find it here.

-- Richard, Mar 2020