A couple of years ago my wife and I set out to build an online business to help people create good-looking CVs.
Sadly it did not take off and we shut it down last year, but we learned a lot about technology, design and what a business needs to be successful (the opposite of what we did). Recently we re-launched it in 'free play' mode so that people affected by COVID-19 related job losses can take advantage of it.
My wife did all the design work and my job was to build the tech to support it. This article is about the initial design and how we evolved it to make the platform as cheap as possible to run.
Our motivation
We have both worked in professional services our whole careers, and a big part of that involves talking to clients and creating proposals which include profiles of the people who would work on an engagement. We also did a lot of recruitment work within our companies and therefore saw a lot of CVs from candidates.
One thing was common across all the CVs we saw: they were often hard to read, poorly formatted and did not draw out the differentiating characteristics of the candidates. We decided to create the Rock Your CV (RYC) platform to help people solve these problems.
Conceptual design
The technology design was very simple: a web-based platform to edit and view CVs, a data storage backend, and an API layer to connect the web application to the data storage. It would all be hosted on AWS, using tools like CloudFormation so that creating environments was repeatable and easy, and CodePipeline to orchestrate build and deploy.
Iteration 1
I often like to use projects as a chance to learn new skills, and so I decided to use the following technologies as part of the platform:
- Front-end: React
- API: Python (I'd used it a lot, but this was my first large project in it)
- Database: MySQL
- Messaging: RabbitMQ
I decided to use containers for the front-end and API components (running on AWS ECS), AWS RDS for the database and RabbitMQ running on an EC2 instance for the messaging layer. I also had some other background components to run (e.g. a PDF generator) and decided to run those on ECS as well.
This all worked well, but it became clear it would be too costly to keep EC2 instances running all the time to support the ECS cluster, on top of an RDS instance running 24x7. All told, our bill was going to come to about $210 USD per month, which was too high for a small business.
Iteration 2
To reduce costs, I decided to explore what serverless components could be used to host parts of the solution. And so I landed on:
- Front-end: React hosted on S3/CloudFront
- API: Python Lambda functions exposed by API Gateway
- Database: MySQL (RDS)
- Messaging: RabbitMQ on EC2
- Background tasks: ECS on small EC2 instances
This significantly reduced our costs, and it was very quick to do, because the only things that needed to change were the build pipeline for the front-end and API. The code stayed substantially the same. But we were still left with the high costs of the database and EC2 instances.
Iteration 3
With more time on my hands I decided to re-write the data layer to use DynamoDB rather than a relational database like MySQL. This did take time, but because I'd written the code in layers, the core logic did not need to change, only the way data was read and written. I was able to leverage a number of features of DynamoDB to provide relational-like access patterns.
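To illustrate the layering idea, here is a minimal sketch, not the actual RYC code. The names `CV`, `CVRepository`, `InMemoryKeyValueRepository` and `cv_titles`, and the `USER#`/`CV#` key format, are all hypothetical. An in-memory dictionary stands in for DynamoDB so the example runs without AWS; the point is that the core logic only talks to the repository interface, and that a composite key can give a relational-style "all CVs for a user" lookup.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class CV:
    user_id: str
    cv_id: str
    title: str


class CVRepository(Protocol):
    """Data-access boundary: the only thing the core logic knows about storage."""

    def save(self, cv: CV) -> None: ...

    def list_for_user(self, user_id: str) -> list[CV]: ...


class InMemoryKeyValueRepository:
    """Stand-in for a key-value store such as DynamoDB (no AWS calls are made).

    Items live under a composite (partition key, sort key) pair, so "all CVs
    for a user" becomes one partition-key match plus a sort-key prefix match.
    The key format here is an assumption made for illustration.
    """

    def __init__(self) -> None:
        self._items: dict[tuple[str, str], CV] = {}

    def save(self, cv: CV) -> None:
        self._items[(f"USER#{cv.user_id}", f"CV#{cv.cv_id}")] = cv

    def list_for_user(self, user_id: str) -> list[CV]:
        pk = f"USER#{user_id}"
        matches = [
            (sk, cv)
            for (item_pk, sk), cv in self._items.items()
            if item_pk == pk and sk.startswith("CV#")
        ]
        # Return items ordered by sort key, as a key-value query typically would.
        return [cv for _, cv in sorted(matches, key=lambda pair: pair[0])]


def cv_titles(repo: CVRepository, user_id: str) -> list[str]:
    """Core logic: depends on the repository interface, not on any database."""
    return [cv.title for cv in repo.list_for_user(user_id)]
```

A MySQL-backed class with the same two methods could be passed to `cv_titles` unchanged, which is the property that made the swap manageable.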
At the same time I moved from RabbitMQ to SQS. So with these changes we eliminated most of the 'fixed' costs from the RDS and EC2 instances, but still had a small amount for the background tasks ECS cluster.
So at this point we had:
- Front-end: React hosted on S3/CloudFront
- API: Python Lambda functions exposed by API Gateway
- Database: DynamoDB
- Messaging: SQS
- Background tasks: ECS on small EC2 instances
Iteration 4
The final change I made was to link the running of background tasks to the usage of the platform, so that we only spin up the EC2 instances that the tasks run on when somebody is logged in and using the platform.
I did this using CloudWatch metrics, alarms and Auto Scaling groups to detect usage, and absence of usage, of the platform. When usage is detected (through the login API being successfully used) it triggers a scaling event which sets the number of EC2 instances supporting the ECS cluster to a fixed number. If no more activity is detected for 2 hours, the Auto Scaling group is set back to 0. Each time activity is detected (successful API calls), the timer is extended by 2 hours.
This has left us with a platform that costs ~$25 USD per month to run! Down from ~$210 originally, an 88% reduction!
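The scale-up and scale-down rule described above can be sketched as a small pure function. This is only an illustration of the decision logic, not the CloudWatch alarm or Auto Scaling configuration itself. The function name `desired_instances` and the value of `ACTIVE_INSTANCE_COUNT` are assumptions; the 2-hour window is the only value taken from the description.

```python
from datetime import datetime, timedelta
from typing import Optional

IDLE_WINDOW = timedelta(hours=2)  # idle period before scaling back to zero
ACTIVE_INSTANCE_COUNT = 1         # assumed value for the "fixed number" of instances


def desired_instances(last_activity: Optional[datetime], now: datetime) -> int:
    """Return how many EC2 instances should back the ECS cluster right now.

    `last_activity` is the time of the most recent successful API call, or
    None if nobody has used the platform yet. Each new call moves
    `last_activity` forward, which is what extends the 2-hour timer.
    """
    if last_activity is None:
        return 0
    if now - last_activity < IDLE_WINDOW:
        return ACTIVE_INSTANCE_COUNT
    return 0
```

For example, a login at 09:00 keeps the cluster up until 11:00, and a further successful API call at 10:30 pushes the scale-down to 12:30.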
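For anyone who wants to check the arithmetic, here is a quick sketch using the two approximate monthly figures quoted above. The helper name `percent_reduction` is just for illustration.

```python
def percent_reduction(before: float, after: float) -> float:
    """Percentage drop from `before` to `after`."""
    return (before - after) / before * 100


monthly_before = 210  # approximate USD per month after Iteration 1
monthly_after = 25    # approximate USD per month after Iteration 4

print(f"{percent_reduction(monthly_before, monthly_after):.0f}% reduction")  # 88% reduction
```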
What's next
There is still more we can do to get our running costs down. The next set of ideas includes:
- Moving our Lambda functions out of a VPC - they are in a VPC for historical reasons, because they needed to access an RDS instance. If we can move them out of the VPC then we no longer need a NAT instance to allow outbound access to the internet for API calls.
- Switching background tasks to Fargate Spot tasks - these are cheaper than on-demand EC2 instances and can start up and shut down faster, so we don't need to keep them running for as long.
Takeaways
If you take anything away from this article, it is that the more modular you make any application you build, the easier it is to swap out parts of it without requiring lots of code re-writing. Look at swapping out MySQL for DynamoDB - the core logic did not have to change because of the layered approach I took to writing the code.
-- Richard, Jul 2020