
Seeing through the fog...

In the past I've written about the Rock Your CV application, which my wife and I built a while ago and made free when COVID hit, as our contribution to helping people get back to work.

It is a simple app hosted on AWS that uses a lot of native AWS services in order to minimise costs.

Recently I've been thinking about moving it from AWS to another hosting service/model, mainly to get some experience with a different provider. That got me thinking: what are the available options, and what are the pros and cons of each?

This article serves as a 'literature survey' of those options along with my view of the pros and cons of each.

Refresher: the application

As I mentioned in my previous article, the app is really simple, consisting of a UI tier, an application logic tier and a data storage/persistence tier - a classic n-tier design. There is also a backend processing layer which performs some tasks like PDF rendering.

[Figure: Basic view of the RYC application]

At a high level the following services are used:

  • UI: AWS CloudFront managed hosting/CDN, built from source using CodePipeline/CodeBuild
  • Application: Python 3 code running in AWS Lambda functions, exposed via API Gateway
  • Persistence: AWS DynamoDB and S3 buckets for blob storage
  • Backend: Containerised services running on AWS Elastic Container Service

The application also makes use of some other AWS services like messaging (SQS) and image processing (Rekognition).

I will refer to each of these layers as we discuss the different approaches, so it is important to understand them.

Summary of the options

There are, of course, many possible combinations of services and deployment approaches that could be used for the application, so to keep it simple I'm going to constrain this analysis to four:

  1. Single Box: all the parts running on a single server (could be multiple servers if needed)
  2. Containerised (private): all the parts running in containers on a private cluster
  3. Containerised (public): as above, but utilising a public cloud provider's container management offering
  4. Cloud Native: utilising public cloud provider native services (the current deployment approach)

The first three options have a lot in common: in all of them we need to find non-cloud-native ways of performing each application task (e.g. web-hosting, running the API etc.).

Here's a quick summary of the design choices this would imply.

| Tier/function | Non-native | Native: AWS | Native: Azure |
|---|---|---|---|
| UI | Web-server, e.g. Apache httpd or Nginx | CloudFront | Azure CDN |
| Application | Python wrapped as a WSGI application and served by a web-server such as Apache httpd with mod_wsgi | Lambda + API Gateway | Azure Functions + API Management |
| Data | Relational (e.g. MySQL/PostgreSQL) or document (e.g. MongoDB) | DynamoDB + S3 | Cosmos DB + Azure Storage |
| Messaging | RabbitMQ or similar | SQS | Azure Queue Storage |
| Backend services | Headless Chrome triggered by a script listening for MQ messages | Headless Chrome with a sidecar orchestrator script running on ECS/EKS | Headless Chrome with a sidecar orchestrator script running on AKS |
| Tools | Self-hosted Jenkins/GoCD or similar | CodePipeline & CodeBuild | Azure DevOps |

Note: I know hybrids are possible; for example, nothing stops you from using cloud tooling in on-premises deployments. I've made this black-and-white for simplicity.
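To make the 'Application' row of the table more concrete: the same Python code can serve both the native and non-native columns if the business logic is kept apart from the hosting glue. The sketch below assumes a Lambda-style handler that returns API Gateway's proxy-integration response shape (`statusCode`, `headers`, `body`) and wraps it as a WSGI application using only the standard library. `get_cv` and the event fields it reads are hypothetical, not the actual Rock Your CV code.

```python
import json
from http import HTTPStatus
from urllib.parse import parse_qs


def get_cv(event, context):
    """Hypothetical Lambda-style handler using the API Gateway proxy format."""
    params = event.get("queryStringParameters") or {}
    cv_id = params.get("id", "unknown")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"id": cv_id}),
    }


def wsgi_adapter(handler):
    """Wrap a Lambda-style handler so any WSGI server (e.g. mod_wsgi) can call it."""

    def app(environ, start_response):
        # Build a minimal proxy-style event from the WSGI environ.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        raw_body = environ["wsgi.input"].read(length) if length else b""
        query = parse_qs(environ.get("QUERY_STRING", ""))
        event = {
            "httpMethod": environ["REQUEST_METHOD"],
            "path": environ.get("PATH_INFO", "/"),
            # First value of each parameter; None when there is no query string.
            "queryStringParameters": {k: v[0] for k, v in query.items()} or None,
            "body": raw_body.decode("utf-8") or None,
        }
        result = handler(event, None)  # there is no Lambda context outside AWS
        status = result.get("statusCode", 200)
        headers = list(result.get("headers", {}).items())
        start_response(f"{status} {HTTPStatus(status).phrase}", headers)
        return [(result.get("body") or "").encode("utf-8")]

    return app


# mod_wsgi looks for a module-level callable named 'application' by default.
application = wsgi_adapter(get_cv)
```

For a quick local check, `wsgiref.simple_server.make_server("localhost", 8000, application).serve_forever()` would serve it; in production the same `application` object would sit behind Apache httpd with mod_wsgi, as in the table.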

Single Box

Circa 1990-2000: this is the 'tin-foil hat' option, pulling my old PowerEdge T110-II server from under my desk and deploying all the various components on that single box. For a small site like Rock Your CV this would likely deliver adequate performance; however, there are a number of issues with such an approach.

Pros (good points)

  • Cost is limited, because you buy a server and then sweat that asset, with the only ongoing cost being power and network connectivity
  • Controlled environment, you know everything running on the server and what it is doing
  • No other tenants sharing the hardware, so no unpredictable performance caused by 'noisy neighbours'

Cons (not so good points)

  • There's a lot of work involved in getting all the components running well on the single server, including all the automation and associated scripting
  • Increased development time/cost as you must build some services you might have used from a cloud provider e.g., SQS
  • You may have security risks if you have all the different tiers of the application running in a single environment (no defence in depth)
  • Capacity can become constrained and there's no easy way to respond to this without buying a bigger and bigger server (scale-up), which will be wasted capacity for much of the time
  • You can have strange interactions between the different components running on the same machine as they contend for the same resources (CPU, memory, I/O)
  • You have single points of failure for power and network connectivity as well as the machine itself (one config error could take it out)
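To give a feel for what 'building some services you might have used from a cloud provider' means, here is a minimal in-memory sketch of the core semantics a managed queue such as SQS provides: a received message is hidden for a visibility timeout and reappears unless it is explicitly deleted. It is single-process, not durable and not thread-safe, so a real single-box deployment would still want something like RabbitMQ; `TinyQueue` and its method names are hypothetical.

```python
import itertools
import time
from dataclasses import dataclass, field
from typing import Callable, Optional, Tuple


@dataclass
class _Message:
    body: str
    visible_at: float = 0.0  # clock time after which the message can be received again


@dataclass
class TinyQueue:
    """Hypothetical in-memory queue with SQS-like visibility timeouts (sketch only)."""
    visibility_timeout: float = 30.0
    clock: Callable[[], float] = time.monotonic
    _messages: dict = field(default_factory=dict, init=False, repr=False)
    _ids: itertools.count = field(default_factory=itertools.count, init=False, repr=False)

    def send(self, body: str) -> int:
        msg_id = next(self._ids)
        self._messages[msg_id] = _Message(body)
        return msg_id

    def receive(self) -> Optional[Tuple[int, str]]:
        """Return (receipt, body) for one visible message, or None if none are visible."""
        now = self.clock()
        for msg_id, msg in self._messages.items():
            if msg.visible_at <= now:
                # Hide the message rather than removing it: if the worker dies
                # before calling delete(), it reappears after the timeout.
                msg.visible_at = now + self.visibility_timeout
                # Simplification: the message id doubles as the receipt
                # (SQS issues a fresh receipt handle on every receive).
                return msg_id, msg.body
        return None

    def delete(self, receipt: int) -> None:
        """Acknowledge successful processing so the message is not redelivered."""
        self._messages.pop(receipt, None)
```

A backend worker (for example the headless Chrome PDF renderer) would loop on `receive()`, do the work, then call `delete()`. Durability, retries, dead-lettering and access from multiple processes are exactly the parts you would still have to add yourself, or get from RabbitMQ.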

Private Containers

A step up from the 'tin-foil hat' option above, this is still self-hosted, but using a container approach and an orchestrator like Kubernetes. I could (for example) use my 5-node Raspberry Pi cluster to do this, with the minor annoyance of needing to make sure that all my images are built for arm64 (which should not be a problem given I'd be using well-known software).

Such an approach is a first step toward public cloud deployment, as typically containerised workloads can be deployed easily on all three major public cloud providers (AWS, Azure and GCP).

Pros

  • Cost is limited as the hardware is bought once and you only incur ongoing energy and network connectivity costs
  • A controlled environment where you know everything that is running and what it is doing
  • Scaling out is supported easily on non-stateful components e.g., UI web-servers

Cons

  • Setting up the cluster itself can take a lot of time and some of the parts are complex to master and configure correctly
  • There's a lot of work involved in getting all the components running well on the cluster, including all the automation and associated scripting
  • Increased development time/cost as you have to build some services you might have used from a cloud provider e.g., SQS
  • Capacity can only scale as far as the hardware available to the cluster will allow; beyond that, cluster upgrades are needed (which could be expensive)
  • You have single points of failure for power and network connectivity
  • State management (e.g., running a database) in containers is tricky and typically requires application-level support
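Scaling out a stateless tier on Kubernetes is typically driven by the Horizontal Pod Autoscaler, whose documented core rule is `desired = ceil(current * currentMetric / targetMetric)`, with no change when the ratio is within a tolerance (0.1 by default). The sketch below applies that rule and then caps the result at what a fixed private cluster can actually run, which is the capacity ceiling in the cons above. The function name and the `max_replicas_cluster` figure are assumptions for illustration.

```python
import math


def desired_replicas(current, current_metric, target_metric,
                     max_replicas_cluster, tolerance=0.1):
    """Sketch of the HPA scaling rule, capped by a fixed cluster's capacity.

    current_metric / target_metric could be, e.g., average CPU utilisation (%).
    max_replicas_cluster is a hypothetical limit: how many pods the hardware
    (e.g. a 5-node Raspberry Pi cluster) can actually hold.
    """
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current  # close enough to target: no scaling action
    wanted = math.ceil(current * ratio)
    # Never below one replica, never beyond what the hardware can run.
    return max(1, min(wanted, max_replicas_cluster))
```

For example, 3 replicas at 90% CPU against a 60% target would want 5, but a cluster that can only fit 4 stops there. On a public cloud that cap becomes a budget decision rather than a hardware limit.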

Public Containers

This is very similar to containers deployed on a private cluster; in fact the exact same container images can be used, but they would be deployed on a cluster managed by a public cloud provider, for example GKE (Google), ECS/EKS (AWS) or AKS (Azure).

As such, it shares many of the same pros and cons, but some change because the public cloud provider handles a lot of the cluster complexity for you and does not have power and network single points of failure. The downside is increased ongoing costs, as the cloud providers charge a premium for the value-added services they provide.

Pros

  • No overheads from setting-up and managing the cluster, this is done for you by the cloud provider
  • Scaling out is supported easily on non-stateful components e.g., UI web-servers
  • Assuming no cloud specific services are used, the workloads are truly portable across different clouds
  • Capacity is only constrained by what you can pay for
  • The cloud provider will not have power/network single points of failure (although there's typically work involved in making sure your application runs across the 'availability zones' which is left up to you as the user)

Cons

  • There's a lot of work involved in getting all the components running well on the cluster, including all the automation and associated scripting
  • Increased development time/cost as you have to build some services you might have used from a cloud provider e.g., SQS
  • True portability can be hard to achieve especially when you rely on a cloud provider's managed cluster offering as the tooling which supports builds and deployments is typically specific to the provider's implementation
  • You are usually paying for the infrastructure based on the compute resources you are consuming, which you can never truly reduce to zero - therefore there is always some background run cost, even at very low usage levels for your application
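The availability-zone point in the pros above is easy to underestimate. As a rough illustration, assuming replicas are spread as evenly as possible across zones (the kind of placement a topology spread constraint asks the scheduler for), this sketch computes the placement and the worst-case number of replicas left if one zone fails. Zone names, counts and function names are hypothetical.

```python
def spread(replicas, zones):
    """Place replicas across zones as evenly as possible (round-robin)."""
    placement = {zone: 0 for zone in zones}
    for i in range(replicas):
        placement[zones[i % len(zones)]] += 1
    return placement


def survivors_after_zone_loss(placement):
    """Worst case: the zone holding the most replicas is the one that fails."""
    total = sum(placement.values())
    return total - max(placement.values())
```

With 5 replicas over three zones the placement is 2/2/1 and at least 3 survive a zone outage; put all of them in a single zone and the answer drops to 0, however resilient the provider's own power and network are.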

Cloud Native

In this approach we marry native (usually serverless) components provided by a cloud provider with our own application components, so the application runs in a specific cloud provider's environment. For Rock Your CV we currently use AWS, and that means using a lot of AWS-specific services like CloudFront for hosting, API Gateway and Lambda for application logic, and DynamoDB for data persistence.

The huge pro of such an approach is that it typically reduces your application development time, because your app becomes 'glue' between a set of pre-defined cloud services. This is also typically the only way to build applications whose cost scales down to $0 when they are not being used. The big con is being 'beholden' to a single cloud provider, who might go bust or (more realistically) knows how much effort it would take you to move away, and so is able to dictate terms and prices.
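To illustrate the 'glue' point, here is a minimal sketch of a Lambda handler that stores a record in DynamoDB and queues a rendering job on SQS. The boto3 calls it relies on (`Table.put_item` and the SQS client's `send_message`) are real, but the table name, queue URL, item attributes and `make_handler` itself are hypothetical; the AWS objects are injected so the logic can be exercised without an AWS account.

```python
import json
import uuid


def make_handler(table, sqs_client, queue_url):
    """Build a Lambda-style handler around injected AWS clients.

    In Lambda this might be wired up as (names are hypothetical):
        import boto3
        table = boto3.resource("dynamodb").Table(os.environ["CV_TABLE"])
        handler = make_handler(table, boto3.client("sqs"),
                               os.environ["RENDER_QUEUE_URL"])
    """

    def handler(event, context):
        cv = json.loads(event.get("body") or "{}")
        cv_id = str(uuid.uuid4())
        # Persistence tier: one DynamoDB item per CV.
        table.put_item(Item={"cv_id": cv_id, "name": cv.get("name", "")})
        # Backend tier: ask the PDF renderer (on ECS) to do its work asynchronously.
        sqs_client.send_message(QueueUrl=queue_url,
                                MessageBody=json.dumps({"cv_id": cv_id}))
        return {"statusCode": 202, "body": json.dumps({"cv_id": cv_id})}

    return handler
```

Almost every line is a call into a managed service, which is why development is quick and also why moving to another provider means rewriting these calls.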

Pros

  • Can build truly scalable apps which can cost $0 (or very little) to run when they have no usage
  • You can get to market quickly by leveraging the pre-made building blocks the cloud provider has
  • Capacity is constrained only by what you can pay for
  • Many times, the cloud provider takes care of multi-zonal high availability inside the cloud services you are consuming, taking this away as a primary concern for the application developers
  • The cloud providers spend billions of dollars on research and on building new services, and you get the benefit of all of this

Cons

  • It is unlikely the application is portable to other providers if a lot of cloud provider specific services have been used. E.g., if you use DynamoDB as your data layer, there is a lot of work involved to re-write to use another document database
  • Costs can grow linearly with usage and you may reach a point where it makes more sense to consider different funding approaches (e.g., capital expenditure)
  • If you are trying to make an experience that is differentiating for your users, there may be downsides to using the same building blocks as 'everybody else'
  • Cloud provider outages could take out the whole application (for example, AWS's well-publicised S3 outages)
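One common way to soften the portability problem, at the cost of some extra code, is to keep provider-specific calls behind a small interface of your own (a 'repository' or 'port'), so that only one adapter needs rewriting when you move. This is a sketch of the general idea, not how Rock Your CV is built; `CVRepository`, the adapters and `rename_cv` are hypothetical, and only the in-memory adapter is implemented here.

```python
from typing import Optional, Protocol


class CVRepository(Protocol):
    """The only persistence API the application logic is allowed to use."""

    def save(self, cv_id: str, data: dict) -> None: ...
    def load(self, cv_id: str) -> Optional[dict]: ...


class InMemoryCVRepository:
    """Adapter for tests or a single-box deployment."""

    def __init__(self) -> None:
        self._store: dict = {}

    def save(self, cv_id: str, data: dict) -> None:
        self._store[cv_id] = dict(data)  # copy so callers can't mutate stored state

    def load(self, cv_id: str) -> Optional[dict]:
        item = self._store.get(cv_id)
        return dict(item) if item is not None else None


# A DynamoDB adapter would implement the same two methods using
# Table.put_item / Table.get_item (a MongoDB or Cosmos DB adapter likewise);
# application code such as rename_cv never changes.
def rename_cv(repo: CVRepository, cv_id: str, new_name: str) -> bool:
    """Application logic written only against the interface."""
    cv = repo.load(cv_id)
    if cv is None:
        return False
    cv["name"] = new_name
    repo.save(cv_id, cv)
    return True
```

The trade-off is that the interface tends toward the lowest common denominator, so provider-specific features (DynamoDB streams, conditional writes and so on) are harder to use without leaking through it.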

So, what does it all mean?

Looking at the options and their pros and cons, I believe the decision about how to build and deploy your application comes down to four fundamental questions:

  • Is time to market important?
  • Is independence from a cloud provider important?
  • What is our funding/cost model?
  • What existing expertise do we have?

Time to market

If you need to get to market quickly then you may be better suited to cloud native approaches.

Independence

If you need true portability between clouds, or between cloud and on-prem, then this would favour container-based approaches with little or no use of cloud native services. Sometimes this independence is required by a regulator or by your procurement department. That being said, I'm not persuaded by cost-related arguments for multi-cloud approaches.

Funding/cost

This is a tricky one to have a simple heuristic for - on-prem/private approaches typically have high build costs and lower run costs (which don't scale with usage), whereas cloud approaches (especially cloud-native) have lower build costs and higher run costs (which do scale with usage). There are of course exceptions to both views.
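A back-of-the-envelope model makes the shape of this trade-off clearer. The sketch below compares an always-on option (a fixed monthly run cost, like a container cluster or a server under the desk) with a pay-per-use option (cost proportional to requests, like Lambda plus API Gateway) and finds the monthly volume at which they cost the same. All prices are made-up placeholders, not real AWS or Azure rates, and build costs are ignored.

```python
import math


def monthly_cost_fixed(fixed_per_month):
    """Always-on: you pay the same whether or not anyone uses the app."""
    return fixed_per_month


def monthly_cost_per_use(requests, cost_per_million):
    """Pay-per-use: cost scales linearly with traffic and is 0 at zero usage."""
    return requests / 1_000_000 * cost_per_million


def break_even_requests(fixed_per_month, cost_per_million):
    """Monthly request volume at which both options cost the same (rounded up)."""
    return math.ceil(fixed_per_month / cost_per_million * 1_000_000)


# Placeholder figures (assumptions, not real prices).
FIXED = 70.0        # e.g. a small always-on cluster, per month
PER_MILLION = 5.0   # e.g. serverless compute + gateway, per million requests

if __name__ == "__main__":
    for requests in (0, 1_000_000, 10_000_000, 20_000_000):
        print(f"{requests:>11,} req/month: fixed ${monthly_cost_fixed(FIXED):.2f}, "
              f"per-use ${monthly_cost_per_use(requests, PER_MILLION):.2f}")
    print("break-even:", break_even_requests(FIXED, PER_MILLION), "requests/month")
```

Below the break-even point the pay-per-use model wins (and falls to $0 when idle); above it, the fixed model starts to look attractive, which is when the 'different funding approaches' mentioned in the cloud native cons come into play.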

Existing expertise

This is a significant one, especially if you choose an implementation option which is heavier on initial configuration like hosting your own container cluster. If you already have teams who are proficient in certain application technologies and deployment approaches, then it can make sense to continue along that path.

-- Richard, Jan 2021