
CI/CD with GitHub and Docker Hub

Some time ago I wrote a PAM (pluggable authentication module) to allow user details to be stored in a DynamoDB table. I did this because I was looking for a simple way to integrate a central repository of user credentials with HTTP basic authentication. The Apache httpd server has a good plugin which allows use of PAM for authentication, and so my module is used by this plugin.

I wrote about the creation of this module here: https://rjk.codes/post/building-a-serverless-webdav-server/

I now use it quite differently. My original reason for building this was as part of a web-based shared drive solution I was building. I don’t use this any longer, but I still use the basic authentication as part of some of my authenticating reverse proxies. I’ve written before about this, but I typically separate concerns of authentication from my application code and let a component which sits in front of the application do this. An example is my basic authenticating reverse proxy: https://github.com/richardjkendall/basicauth-rproxy.

If you look at the Dockerfile for this proxy, you’ll see from the first line that it is based on a base image which contains my DynamoDB enabled PAM:

FROM richardjkendall/ubuntu-pam-dynamo:630e9ca9

This introduces some complexity when code changes are made in my upstream component, and this article is about how I built the automation which keeps the downstream image up to date when upstream changes are made. This was not entirely simple, because the builds happen on two different platforms: GitHub Actions and Docker Hub.

DynamoDB PAM build

This is the part that happens using GitHub Actions, because the build is not trivial: two dependencies, the AWS SDK for C++ and sqlite3, need to be obtained and built before the module itself can be built. The build takes some time (between 10 and 15 minutes) because of these dependencies; I want to explore caching their outputs in a later piece of work.

Once the build is done, the image is pushed to Docker Hub. Before this work, the build and push were triggered for any branch, with builds from the master branch also tagging the built image as 'latest'.

I needed to modify the build and push process to also trigger an update to the basicauth-rproxy code to reference a specific newly built image. I also wanted some tests to be run on the new image to ensure it still works. Let’s look at how this testing works before we come back to this topic.

Tests on Docker Hub

The build of basicauth-rproxy happens on Docker Hub using their automated build service. I made this choice because it is very simple to set up, and the build of the basicauth-rproxy image is not complex (just installing Apache httpd and configuring it).

When I started this exercise I did not know how to run tests on the Docker Hub platform, and I thought I'd have to move this to GitHub Actions. However, reading the Docker Hub documentation, I found there is a feature you can use to do this: https://docs.docker.com/docker-hub/builds/automated-testing/. The immediate downside is that it only works with paid accounts; however, I have a paid account, so I decided to use this as a learning opportunity.

Autotests are enabled in the automated build settings page, and they run on pull requests as well as being run after any build. There is an option to enable them for internal pull requests or internal and external pull requests. I chose only internal as I don’t accept external pull requests to this repository.

[Screenshot: autotest settings in Docker Hub]

The tests are run from a docker-compose file called docker-compose.test.yml, and this needs to have a special service called sut defined. This is the service that runs the tests. The tests are considered to have passed if sut exits with code 0 and to have failed if it exits with a non-zero code.

In my example the file builds and runs the basicauth service and also builds and runs the sut service. This second service is based on a small container which contains curl and bash and which runs a set of tests defined in a shell script. My tests are all curl commands which make calls to the basicauth service with different usernames and passwords defined. The response codes are then checked against what is expected.

The docker-compose file also specifies a set of environment variables expected by the basicauth service, which includes some secrets which are needed to allow access to the DynamoDB table on AWS. The secrets are defined as environment variables in Docker Hub (note: Docker Hub does not seem to properly support secrets).

version: '3'
services:
  echo:
    image: richardjkendall/echo-headers
  basicauth:
    build: .
    environment: 
      - removed…
    links:
      - echo
  sut:
    build: test
    links:
      - basicauth

The echo service is just a very simple application which returns a JSON file containing all the request headers.
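Docker Hub runs this file for you, but the same suite can also be run locally for debugging. This is a sketch of my own, not from the post; it wraps the Docker Compose v1 CLI (matching the compose file format above) so that the exit code of the sut service becomes the exit code of the whole run, mirroring Docker Hub's pass/fail rule:

```shell
#!/bin/bash
# Run the test suite locally: build everything, start the services, and
# propagate sut's exit code. --exit-code-from also implies
# --abort-on-container-exit, so the other services stop once tests finish.
run_suite() {
  docker-compose -f docker-compose.test.yml up --build --exit-code-from sut
}
```

Called as `run_suite && echo "tests passed"` from the repository root.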

The test Dockerfile:

FROM richardjkendall/curl-bash
COPY . .
CMD [ "./test.sh" ]

And an example of one of the tests:

echo "TEST 1: user without salt, valid password"
STATUSCODE=$(curl --silent --output /dev/stderr --write-out "%{http_code}" -u "cinosalt:$CINOSALT_PW" http://basicauth/)
if [ "$STATUSCODE" -ne 200 ]; then
    echo "TEST 1: failed with status code $STATUSCODE, running again with verbose output"
    curl -v -u "cinosalt:$CINOSALT_PW" http://basicauth/
    exit 1
else
    echo "TEST 1: passed"
fi
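The per-test boilerplate could be factored into a small helper. This is a sketch of my own (the function name and layout are assumptions, not from the repository):

```shell
#!/bin/bash
# Hypothetical helper: make one basic-auth request and compare the HTTP
# status code against the expected value. Returns non-zero on mismatch
# so the calling script can fail the suite.
check_auth() {
  local name="$1" expected="$2" user="$3" pass="$4"
  local code
  code=$(curl --silent --output /dev/null --write-out "%{http_code}" \
    -u "$user:$pass" http://basicauth/)
  if [ "$code" -ne "$expected" ]; then
    echo "$name: failed with status code $code (expected $expected)"
    return 1
  fi
  echo "$name: passed"
}
```

Each test then collapses to one line, e.g. `check_auth "TEST 1" 200 cinosalt "$CINOSALT_PW" || exit 1`.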

With these tests defined, any PR which is raised against the ‘develop’ branch will trigger them to run. I’ve also set up branch protection rules in Github to require these tests to run and pass before any merge to develop can be allowed.

[Screenshot: branch protection rules in GitHub]

So this gives me a solution which can run tests when changes are made to the source code of the basicauth-rproxy solution.

Triggering updates to basicauth from the PAM code

Next I needed a way of triggering updates to basicauth-rproxy when changes are made to the PAM code. As I covered in an earlier section, when changes are made to the PAM code, this triggers a rebuild of the pam-dynamo container image.

To trigger the test and rebuild of the downstream component, I need to make a code change in its repository and raise a PR. So I decided to extend the pam-dynamo workflow to make this change to the basicauth-rproxy Dockerfile in a new branch and then raise a PR from that branch to the develop branch. This triggers the Docker Hub autotests, and once these pass, the PR can be merged into develop.

I added this as a new job in the workflow, but made it depend on the build job so that it will not run until that job successfully completes. I also made this job conditional on the branch being develop, so that we only make a change to the downstream repository when we need to.
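In the workflow file, that dependency and condition look roughly like the following (a sketch; the job names are assumptions, not taken from the real workflow):

```
update-basicauth:
  needs: build                            # do not start until the build job succeeds
  if: github.ref == 'refs/heads/develop'  # only update downstream from develop
  runs-on: ubuntu-latest
  steps:
    - run: echo "checkout, tag update, and PR steps go here"
```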

You can see the whole workflow here: https://github.com/richardjkendall/pam-dynamo/blob/master/.github/workflows/build_debs.yml but I’ve also included some snippets here.

This step pulls down the basicauth-rproxy code to a new folder and switches to the develop branch. You'll notice I'm providing a token parameter, supplied from a secret configured on the repository; this token is scoped to allow access to this repository.

- name: get basic auth rproxy source code
  uses: actions/checkout@v3
  with:
    path: 'basicauth'
    repository: 'richardjkendall/basicauth-rproxy'
    ref: develop
    token: ${{secrets.GH_TOKEN}}

The next block finds and replaces the existing image tag with the tag of the new image which has been built and pushed up to Docker Hub. This could be a bit more elegant, but it works: it uses sed (the stream editor) to do an in-place replacement of any 8-character letter-and-number string preceded by a colon with the latest image tag. The only thing which matches this regular expression in the file is the existing image tag. It could be tightened to only look for letters a to f, as the tag is a hexadecimal reference, but that's a future enhancement.

- name: update docker tag in Dockerfile
  run: |
    cd $GITHUB_WORKSPACE/basicauth
    sed -E -i 's/:[a-z0-9]{8}/:${{steps.vars.outputs.tag}}/g' Dockerfile
    cat Dockerfile
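To see what the replacement does, here is the same sed expression run against a sample FROM line (the new tag abcdef12 is made up for illustration, standing in for the workflow's tag output):

```shell
#!/bin/bash
# Apply the workflow's sed pattern to a sample Dockerfile FROM line.
LINE='FROM richardjkendall/ubuntu-pam-dynamo:630e9ca9'
NEW_TAG='abcdef12'   # stand-in for ${{steps.vars.outputs.tag}}
echo "$LINE" | sed -E "s/:[a-z0-9]{8}/:$NEW_TAG/g"
# → FROM richardjkendall/ubuntu-pam-dynamo:abcdef12
```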

This final block stages the changes, commits them, and pushes them up as a new branch to the origin. It then uses the hub CLI tool to open a pull request against develop. This tool is included as a standard feature on GitHub Actions runners. You can read more about it here: https://hub.github.com/hub.1.html.

- name: commit, push, and raise PR
  run: |
    cd $GITHUB_WORKSPACE/basicauth
    # a fresh runner has no git identity, and the branch must exist before pushing
    git config user.name "github-actions"
    git config user.email "github-actions@users.noreply.github.com"
    git checkout -b update-base-${{steps.vars.outputs.tag}}
    git add -A
    git commit -m "Updating Dockerfile base image to ${{steps.vars.outputs.tag}}"
    git push -u origin update-base-${{steps.vars.outputs.tag}}
    hub pull-request -b develop -m "Update base image to ${{steps.vars.outputs.tag}}"
  env:
    GITHUB_TOKEN: ${{secrets.GH_TOKEN}}

Putting it all together

So now I have the following:

A change to the PAM source code triggers a build of a new image. A successful build of the PAM image triggers an update to the basicauth-rproxy Dockerfile to reference the new base image and raises a PR. Docker Hub autotests then verify that the new image works.

The final steps are then done manually: merging the PR on basicauth-rproxy and raising a PR from develop to master on the same repository. This completes the process and performs a final build of the image, which is tagged as latest.

The same is then done on the PAM source code, with a PR raised from develop to master. This triggers a build which is tagged as latest, but it does not trigger the update process again, as that does not need to be repeated.

Finally, any running containers I have need to be updated. I run most of my AWS containers on ECS, and I have a script which triggers redeployments of those services, forcing new image pulls. These mostly reference latest, so they will only pull a new image after the master updates have been done.
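The redeployment script itself is not shown in the post, but its core can be a single AWS CLI call per service. A minimal sketch (the cluster and service names are placeholders, not from my setup):

```shell
#!/bin/bash
# Force ECS to replace the running tasks of a service; the replacement
# tasks pull the image again, picking up the new :latest.
redeploy() {
  aws ecs update-service \
    --cluster "$1" \
    --service "$2" \
    --force-new-deployment
}
```

It would be called as, for example, `redeploy my-cluster basicauth` for each service that needs refreshing.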

What next

There are some inefficiencies in this process, for example:

Merging code to master no longer really needs to run the build again. The builds can be very slow because of the expensive C++ SDK which has to be built each time; caching would be beneficial here. Updates to running containers could be triggered automatically as part of the GitHub Actions workflow.

-- Richard, Aug 2022