
Bringing it back... the Webalizer

Recently I needed to produce some simple stats for a CloudFront distribution. After doing some research I found that CloudFront access logs use the W3C extended format and that many tools can read and summarise them.
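To show what "W3C extended format" means in practice, here is a small Python sketch of a reader for it. It is not part of cf-stats. The idea is that `#`-prefixed lines are directives, `#Fields:` names the columns, and each data line holds one value per field (CloudFront separates them with tabs). The sample below is made up and uses only a subset of the fields a real CloudFront log contains.

```python
from typing import Dict, Iterable, Iterator, List


def parse_w3c_extended(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per log entry, keyed by the names in the #Fields directive."""
    fields: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue
        if line.startswith("#"):
            # Directives such as "#Version: 1.0" and "#Fields: date time ..."
            if line.startswith("#Fields:"):
                fields = line[len("#Fields:"):].split()
            continue
        # CloudFront separates the values in each entry with tabs
        values = line.split("\t")
        yield dict(zip(fields, values))


# Illustrative sample only: a real CloudFront log has many more fields
sample = [
    "#Version: 1.0",
    "#Fields: date time x-edge-location sc-bytes c-ip cs-method cs(Host) cs-uri-stem sc-status",
    "2020-12-01\t10:00:00\tSYD1-C1\t512\t192.0.2.10\tGET\td111111abcdef8.cloudfront.net\t/index.html\t200",
]

for entry in parse_w3c_extended(sample):
    print(entry["date"], entry["cs-uri-stem"], entry["sc-status"])  # prints: 2020-12-01 /index.html 200
```

CloudFront delivers its logs gzipped, so with real files you would pass `gzip.open(path, "rt")` as the `lines` argument.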

Among those tools is the Webalizer, which was very common back when I first started building websites. It's a great tool: free, fast, and it supports the W3C extended format.

Here's the solution I built...

I also took the opportunity to try the new AWS Public Container Registry.

Overview

Before we start: the code for the solution is here: https://github.com/richardjkendall/cf-stats.

The solution is quite simple:

  • A container image with the Apache httpd server, the Webalizer and their other dependencies
  • A script which downloads the logs from S3 based on a set of filter criteria (date fragments)
  • A Lambda function, triggered by a scheduled CloudWatch rule, which reloads the container every 6 hours
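The download script in the second bullet could look roughly like the Python sketch below. It is my own illustration, not the script from the repo. It relies on CloudFront naming its log files `<distribution-id>.YYYY-MM-DD-HH.<unique-id>.gz`, so a date fragment such as `2020-12` selects a month and `2020-12-01` a single day. The function names and the bucket/prefix parameters are hypothetical.

```python
import os
from typing import List, Sequence


def matches_fragments(key: str, fragments: Sequence[str]) -> bool:
    """True if a CloudFront log key's timestamp starts with any of the date fragments."""
    parts = os.path.basename(key).split(".")
    if len(parts) < 3:
        return False  # not a CloudFront log file name
    stamp = parts[1]  # e.g. "2020-12-01-10"
    return any(stamp.startswith(fragment) for fragment in fragments)


def download_logs(bucket: str, prefix: str, fragments: Sequence[str], dest: str) -> List[str]:
    """Download every matching log object under prefix into dest; return the local paths."""
    import boto3  # imported here so matches_fragments can be used without AWS access

    os.makedirs(dest, exist_ok=True)
    s3 = boto3.client("s3")
    paths: List[str] = []
    # list_objects_v2 returns at most 1000 keys per call, so page through the results
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            if matches_fragments(obj["Key"], fragments):
                path = os.path.join(dest, os.path.basename(obj["Key"]))
                s3.download_file(bucket, obj["Key"], path)
                paths.append(path)
    return paths
```

Filtering on the file name rather than on object metadata keeps the selection cheap: only the listing is needed to decide what to fetch.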

When the container starts, it runs the script, which downloads the logs from S3 and then runs the Webalizer over them to generate the stats. The resulting HTML pages are served from the document root of the Apache httpd server running in the same container.
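Because the stats are only rebuilt at start-up, the scheduled Lambda only has to restart the container. Here is a minimal Python sketch, assuming (my assumption, not necessarily how cf-stats does it) that the container runs as an Amazon ECS service. Forcing a new deployment makes ECS replace the running task with a fresh one. The environment variable names `CLUSTER_NAME` and `SERVICE_NAME` are hypothetical.

```python
import os
from typing import Any, Dict, Optional


def restart_service(ecs_client: Any, cluster: str, service: str) -> Dict[str, Any]:
    """Ask ECS to replace the service's running task(s) with new ones.

    Each new task runs the container's start-up script again, which
    re-downloads the logs and regenerates the stats.
    """
    return ecs_client.update_service(
        cluster=cluster,
        service=service,
        forceNewDeployment=True,
    )


def handler(event: Dict[str, Any], context: Optional[Any] = None) -> Dict[str, str]:
    """Lambda entry point, invoked by a scheduled rule such as rate(6 hours)."""
    import boto3  # included in the AWS Lambda Python runtime

    cluster = os.environ["CLUSTER_NAME"]  # hypothetical variable name
    service = os.environ["SERVICE_NAME"]  # hypothetical variable name
    restart_service(boto3.client("ecs"), cluster, service)
    return {"status": "restarted", "service": service}
```

Passing the client into `restart_service` keeps the restart logic testable with a stub, without touching a real cluster.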

Protecting it

There is no configuration on the webserver to authenticate/authorise any users.

I prefer to keep security separate from application concerns, so I have several other components which can sit in front of the stats container and act as user-aware reverse proxies.

There is a pre-built Terraform module at https://github.com/richardjkendall/tf-modules/tree/master/modules/simple-cf-stats which deploys the stats solution behind a basic-auth reverse proxy.

Using the AWS Public Container Registry

With the new pull-rate limits Docker has announced for Docker Hub, I decided to give the new AWS ECR Public container registry a try.

I use GitHub Actions to build most of my container images. Pushing to the new ECR Public repositories is quite easy, although there are a few tricks to the login. Here's an excerpt from my workflow YAML file:

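      - name: Get ECR password
        id: ecr_login
        run: |
          # ECR Public authentication is served from us-east-1, whatever AWS_REGION is set to
          ECR_PASSWORD="$(docker run -i --rm -e AWS_ACCESS_KEY_ID -e AWS_SECRET_ACCESS_KEY -e AWS_REGION amazon/aws-cli ecr-public get-login-password --region us-east-1)"
          # Mask the password so it never appears in the job log
          echo "::add-mask::$ECR_PASSWORD"
          # Expose it as a step output (the older ::set-output command is deprecated)
          echo "ecr-login-password=$ECR_PASSWORD" >> "$GITHUB_OUTPUT"

      - name: Login to ECR public
        run: |
          # Quote the value and pipe it to --password-stdin so it is not passed as a CLI argument
          echo "${{ steps.ecr_login.outputs.ecr-login-password }}" | docker login public.ecr.aws -u AWS --password-stdin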

I tried using docker/login-action (https://github.com/docker/login-action), but for some reason it did not work with ECR Public.

Per the AWS documentation, the IAM user you use for pushes needs permission to call ecr-public:GetAuthorizationToken and sts:GetServiceBearerToken for the login, in addition to the ECR Public permissions the push itself requires (such as ecr-public:PutImage).
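To see where those two permissions come into play, here is a Python equivalent of the `get-login-password` step, as a sketch rather than something from the original workflow. My understanding is that the CLI command calls the ECR Public GetAuthorizationToken API, whose token is a base64 string that decodes to `user:password`, and returns only the password part. The function names here are hypothetical.

```python
import base64


def decode_login_password(authorization_token: str) -> str:
    """Turn an ECR Public authorization token into the password docker login expects.

    The token decodes to "user:password" (the user is "AWS"); only the password is returned.
    """
    decoded = base64.b64decode(authorization_token).decode("utf-8")
    _user, sep, password = decoded.partition(":")
    if not sep or not password:
        raise ValueError("unexpected authorization token format")
    return password


def get_login_password() -> str:
    """Fetch a fresh login password from ECR Public (needs the IAM permissions above)."""
    import boto3

    # ECR Public authentication is served from us-east-1
    client = boto3.client("ecr-public", region_name="us-east-1")
    token = client.get_authorization_token()["authorizationData"]["authorizationToken"]
    return decode_login_password(token)
```

The result can be piped to `docker login public.ecr.aws -u AWS --password-stdin`, exactly as in the workflow step.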

-- Richard, Dec 2020