Incidents are everybody’s responsibility

Bad things happen: it is how you respond that matters. This is true of anything that provides a service to customers: restaurants, ticketing, services.

I recently took a course on incident response and wanted to summarise it before I forget. The text below is drawn largely from the PagerDuty courses, which you can find on YouTube. Key takeaways:

  • Define your response strategy in advance
  • Make sure everyone has agreed to, and is aware of, the response strategy
  • The response commander is in full control during an incident
  • Train your people
  • Stay calm
  • Make a decision: even a wrong one gives you data

Having worked at…


My favourite tech and non-tech podcast episodes of 2020 in no particular order.

13 Minutes to the Moon, episode 2, “Kids in control”: the remarkable story of the moon landings, brought to life. The average age of the team was 27. Months spent on failure scenarios, runbooks, calm voices with no fear, decisions put in the hands of the doers, complete trust in your team. Sound familiar?

David Tennant Does a Podcast with Olivia Colman: recorded in 2019. Olivia Colman comes across as a lovely person, and so do all the other guests in this series.

Training…


Manage JFrog through Terraform; secure your deployments with promotion pipelines.

Repositories are your source of truth for images and binaries. This article explains how artefact promotion pipelines let you deploy images to production with confidence. It covers why promotion pipelines are good practice, gives an example approach for setting one up, and links to a complete Terraform project on GitHub that implements the pattern. Although I refer to images, the pattern is relevant for anything deployed to production.
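To make the promotion idea concrete, here is a minimal Python sketch. The stage names (`dev`, `staging`, `prod`), the `Registry` class and the `checks_pass` callback are hypothetical; they are not taken from the linked Terraform project or from any JFrog API. The only point is that an image is built once and the same digest moves forward one stage at a time, never skipping a gate.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

# Hypothetical stage order; a real setup would map each stage to a repository.
STAGES: List[str] = ["dev", "staging", "prod"]


@dataclass
class Registry:
    """In-memory stand-in for a set of per-stage image repositories."""
    repos: Dict[str, Set[str]] = field(
        default_factory=lambda: {stage: set() for stage in STAGES}
    )

    def push(self, digest: str) -> None:
        """New builds may only land in the first stage."""
        self.repos[STAGES[0]].add(digest)

    def promote(self, digest: str, target: str,
                checks_pass: Callable[[str], bool]) -> None:
        """Copy an image into `target` if it is in the previous stage and passes checks."""
        index = STAGES.index(target)
        if index == 0:
            raise ValueError("images are pushed to the first stage, not promoted")
        source = STAGES[index - 1]
        if digest not in self.repos[source]:
            raise ValueError(f"{digest} is not in {source}; stages cannot be skipped")
        if not checks_pass(digest):
            raise ValueError(f"{digest} failed the checks required for {target}")
        self.repos[target].add(digest)


registry = Registry()
registry.push("sha256:abc123")
# The lambdas stand in for real gates such as tests or security scans.
registry.promote("sha256:abc123", "staging", checks_pass=lambda d: True)
registry.promote("sha256:abc123", "prod", checks_pass=lambda d: True)
```

Because production only ever receives digests that already exist in the previous stage, what you deploy is exactly what you tested.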

Basic Ground Rules for Images

These basic rules will help make your images more stable and deployments repeatable, and are good practices in general:

  • Build the image…


This page covers GitHub template repositories: what they are, why to use them and how to set them up. It also makes some suggestions for what a template could include. As with everything, get the basics right early on to avoid technical debt in the future.

What are template repos?

Template repos in GitHub allow users to generate a new repo with the same structure, including files, folders and branches, as another repo. This means that when repo A is marked as a template repo, it can be used to generate files and folders when creating repo B. It’s the cookie-cutter concept and so is a once-only operation…


Terraform Cloud is a hosted service that helps manage the coordination of Terraform builds when working with remote teams. Runs can execute locally or within the service. Runs are queued in order, thereby reducing the chance of conflicts or overwrites across a team. Other useful features are the ability to roll back, team-based permissions, and a change audit trail.
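The queueing behaviour can be pictured with a small Python sketch. This is an illustration of the idea only, not how Terraform Cloud is implemented; `WorkspaceQueue` and its methods are hypothetical names. Each workspace applies one run at a time, in the order the runs were submitted, so two teammates cannot apply against the same state simultaneously.

```python
from collections import deque
from typing import Deque, List, Optional


class WorkspaceQueue:
    """Hypothetical model of a workspace that applies one run at a time, first in, first out."""

    def __init__(self) -> None:
        self._pending: Deque[str] = deque()
        self.current: Optional[str] = None
        self.applied: List[str] = []

    def submit(self, run_id: str) -> None:
        """Queue a run; it starts immediately only if nothing else is running."""
        self._pending.append(run_id)
        self._start_next()

    def finish_current(self) -> None:
        """Mark the active run as applied and start the next queued run, if any."""
        if self.current is None:
            raise RuntimeError("no run is in progress")
        self.applied.append(self.current)
        self.current = None
        self._start_next()

    def _start_next(self) -> None:
        # Only promote a pending run when the workspace is idle.
        if self.current is None and self._pending:
            self.current = self._pending.popleft()


queue = WorkspaceQueue()
queue.submit("alice-run")
queue.submit("bob-run")      # waits: alice-run holds the workspace
queue.finish_current()       # alice-run applied, bob-run starts
queue.finish_current()       # bob-run applied
```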

What this Page Covers

This page covers setting up a Terraform Cloud workspace to manage your GitHub repositories.

Prerequisites

  • a GitHub organisation
  • GitHub organisation admin rights

Set Up Terraform Cloud

The setup involves some flipping between GitHub and Terraform Cloud. Head over to TC and sign up for…


I recently spent a fun Sunday configuring Vault using the Terraform Vault Provider, with custom mount paths. In the process I used only Docker images and so decided to share, as I struggled to find similar tutorials. I’m assuming knowledge of the HashiCorp Vault and Terraform products, and that Docker and Docker Compose are installed and working. You can download the associated project, or just read on.

What does this Article Cover?

I am a strong believer in segregation of responsibilities, data and configuration for services/microservices. By using custom paths in Vault, the pattern in this project could easily be extended to multiple services, each…


Recently, I went to the Hong Kong edition of the Microsoft Ignite tour and attended a great talk by David Blank-Edelman, Monitoring Infrastructure and Apps in Production, and Diagnosing Failure in the Cloud. Thanks to him for a very entertaining talk. The talk was conceptual and covered things relevant to any system. Below are my takeaways, supplemented with my own experience. Note: this is only a very small area of SRE, and I recommend reading around the subject.

What is Site Reliability Engineering?

Ben Sloss wrote:

My explanation is simple: SRE is what happens when you ask a software engineer to design an operations…


What did I want to do?

I wanted to set up ManageIQ.

The instructions on the Get Started ManageIQ page cover Google Cloud, Docker and Vagrant. The Docker and Google instructions worked well. However, when running Docker, configuring the database for persistence between shutdowns was very difficult due to initialisation issues (see appliance-initialize.sh). I wanted to run on AWS. There are a number of appliance downloads, but I could not find install instructions for the AWS vhd. This page covers what I did to get it working for the hammer 3 release.

How to install?

After some searching around in the forums, I came across this article by Laurent…


“Spinnaker is an open source, multi-cloud continuous delivery platform for releasing software changes with high velocity and confidence.”

I guess if you are reading this, you know what Spinnaker is and where it comes from. The instructions below will give you:

  • VM deployment
  • Persistent storage
  • a docker registry provider
  • a Kubernetes v1 provider to deploy containers into

I provide links to the official documentation, but the commands alone should work. I’m assuming you have a jump box/bastion to access internal management components. The instructions cover a mostly vanilla install, without RBAC or proper security. …


I was recently looking for a diagram that covered all the elements in a CI/CD pipeline. Most of the ones I came across only covered sections of the deployment journey, and so I decided to create my own. There are three main overlapping cycles to the journey and at the bottom of the article is a gist that summarises the purpose of each white box.

CI/CD Flow

The underlying assumption is that automation drives the flow of code to an integration server environment for testing.

Code Change Cycle

The code change cycle is the review process before merging to master. It involves self-review and at…

Fergus MacDermot
