<?xml version="1.0" encoding="utf-8"?><rss version="2.0"
    xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:wfw="http://wellformedweb.org/CommentAPI/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:atom="http://www.w3.org/2005/Atom"
    xmlns:media="http://search.yahoo.com/mrss/"
    xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
    xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
>
<channel>
  <title>Srijan Choudhary, all posts tagged: devops</title>
  <link>https://srijan.ch/feed/all/tag:devops</link>
  <lastBuildDate>Mon, 29 Apr 2024 03:10:00 +0000</lastBuildDate>
  <image>
    <url>https://srijan.ch/assets/favicon/favicon-32x32.png</url>
    <title>Srijan Choudhary, all posts tagged: devops</title>
    <link>https://srijan.ch/feed/all/tag:devops</link>
  </image>
  <sy:updatePeriod>daily</sy:updatePeriod>
  <sy:updateFrequency>1</sy:updateFrequency>
  <generator>Kirby</generator>
  <atom:link href="https://srijan.ch/feed/all.xml/tag:devops" rel="self" type="application/rss+xml" />
  <description>Srijan Choudhary&#039;s Articles and Notes Feed for tag: devops</description>
  <item>
    <title>2024-04-29-001</title>
    <description><![CDATA[Using sysrq on my laptop - documenting mostly for myself. My laptop has started freezing sometimes, not sure why. Usually, I can just force power off using the power button and start it again, but it has happened twice that I had to recover the system by booting via a USB drive, chrooting, and recovering the damaged files using fsck or pacman magic. The linux kernel has: a ‘magical’ key combo you …]]></description>
    <link>https://srijan.ch/notes/2024-04-29-001</link>
    <guid isPermaLink="false">tag:srijan.ch:/notes/2024-04-29-001</guid>
    <category><![CDATA[linux]]></category>
    <category><![CDATA[devops]]></category>
    <dc:creator>Srijan Choudhary</dc:creator>
    <pubDate>Mon, 29 Apr 2024 03:10:00 +0000</pubDate>
    <content:encoded><![CDATA[<p>Using sysrq on my laptop - documenting mostly for myself.</p>
<p>My laptop has started freezing sometimes, not sure why. Usually, I can just force power off using the power button and start it again, but it has happened twice that I had to recover the system by booting via a USB drive, chrooting, and recovering the damaged files using fsck or pacman magic.</p>
<p>The Linux kernel has:</p>
<blockquote>
<p>a ‘magical’ key combo you can hit which the kernel will respond to regardless of whatever else it is doing, unless it is completely locked up.</p>
</blockquote>
<p>(More details on <a href="https://wiki.archlinux.org/title/keyboard_shortcuts#Kernel_(SysRq)">archwiki</a> and <a href="https://docs.kernel.org/admin-guide/sysrq.html">kernel doc</a>)</p>
<p>To enable it, I ran:</p>
<pre><code>echo "kernel.sysrq = 244" | sudo tee /etc/sysctl.d/sysreq.conf
sudo sysctl --system</code></pre>
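<p>The value <code>244</code> is not arbitrary - it is a bitmask enabling only the SysRq functions needed here (breakdown per the kernel docs linked above):</p>
<pre><code># Why 244? It is a bitmask of allowed SysRq functions:
#   4   - keyboard control (unraw, SAK)   -> the 'r' key
#   16  - sync command                    -> the 's' key
#   32  - remount read-only               -> the 'u' key
#   64  - signalling of processes         -> the 'e' and 'i' keys
#   128 - reboot/poweroff                 -> the 'b' key
echo $((4 + 16 + 32 + 64 + 128))
# 244</code></pre>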
<p>However, I was not able to find the right key combination to trigger SysRq on my laptop. I was able to make it work using an external keyboard that has a PrintScreen binding on a layer, as follows:</p>
<p>Press Alt and keep it pressed for the whole sequence: PrintScreen - R - E - I - S - U - B</p>
<p>Currently, PrintScreen on my external keyboard is bound to Caps lock long press + Up arrow.</p>]]></content:encoded>
    <comments>https://srijan.ch/notes/2024-04-29-001#comments</comments>
    <slash:comments>3</slash:comments>
  </item><item>
    <title>Testing ansible playbooks against multiple targets using vagrant</title>
    <description><![CDATA[How to test your ansible playbooks against multiple target OSes and versions using Vagrant]]></description>
    <link>https://srijan.ch/testing-ansible-playbooks-using-vagrant</link>
    <guid isPermaLink="false">tag:srijan.ch:/testing-ansible-playbooks-using-vagrant</guid>
    <category><![CDATA[ansible]]></category>
    <category><![CDATA[vagrant]]></category>
    <category><![CDATA[devops]]></category>
    <dc:creator>Srijan Choudhary</dc:creator>
    <pubDate>Tue, 21 Nov 2023 06:55:00 +0000</pubDate>
    <media:content url="https://srijan.ch/media/pages/blog/testing-ansible-playbooks-using-vagrant/9f989c7a78-1700550017/kvistholt-photography-ozpwn40zck4-unsplash.jpg" medium="image" />
    <content:encoded><![CDATA[<figure data-ratio="auto">
    <img src="https://srijan.ch/media/pages/blog/testing-ansible-playbooks-using-vagrant/9f989c7a78-1700550017/kvistholt-photography-ozpwn40zck4-unsplash.jpg" alt="">
  
  </figure>
<p>I recently updated my <a href="https://srijan.ch/install-docker-and-docker-compose-using-ansible">Install docker and docker-compose using ansible</a> post and wanted to test it against multiple target OSes and OS versions. Here's a way I found to do it easily using Vagrant.</p>
<p>Here's the Vagrantfile:</p>
<pre><code class="language-ruby"># -*- mode: ruby -*-
# vi: set ft=ruby :

targets = [
  "debian/bookworm64",
  "debian/bullseye64",
  "debian/buster64",
  "ubuntu/jammy64",
  "ubuntu/bionic64",
  "ubuntu/focal64"
]

Vagrant.configure("2") do |config|
  targets.each_with_index do |target, index|
    config.vm.define "machine#{index}" do |machine|
      machine.vm.hostname = "machine#{index}"
      machine.vm.box = target
      machine.vm.synced_folder ".", "/vagrant", disabled: true

      if index == targets.count - 1
        machine.vm.provision "ansible" do |ansible|
          ansible.playbook = "playbook.yml"
          ansible.limit = "all"
          ansible.compatibility_mode = "2.0"
          # ansible.verbose = "v"
        end
      end
    end
  end
end</code></pre>
<p>The <code>targets</code> variable defines which Vagrant boxes to target. The list of available boxes can be found here: <a href="https://app.vagrantup.com/boxes/search">https://app.vagrantup.com/boxes/search</a></p>
<p>In the <code>Vagrant.configure</code> section, I've defined a machine with an auto-generated name (<code>machine0</code>, <code>machine1</code>, and so on) for each target.</p>
<p>The <code>machine.vm.synced_folder</code> line disables the default vagrant share to keep things fast.</p>
<p>Then, I've run the ansible provisioning once at the end instead of for each box separately (from: <a href="https://developer.hashicorp.com/vagrant/docs/provisioning/ansible#tips-and-tricks">https://developer.hashicorp.com/vagrant/docs/provisioning/ansible#tips-and-tricks</a>).</p>
<p>The test can be run using:</p>
<pre><code class="language-shell-session">$ vagrant up</code></pre>
<p>If the boxes are already up, to re-run provisioning, run:</p>
<pre><code class="language-shell-session">$ vagrant provision</code></pre>
<p>This code can also be found on GitHub: <a href="https://github.com/srijan/ansible-install-docker">https://github.com/srijan/ansible-install-docker</a></p>]]></content:encoded>
    <comments>https://srijan.ch/testing-ansible-playbooks-using-vagrant#comments</comments>
    <slash:comments>2</slash:comments>
  </item><item>
    <title>Exploring conflicting oneshot services in systemd</title>
    <description><![CDATA[Exploring ways to make two systemd services using a shared resource work with each other]]></description>
    <link>https://srijan.ch/exploring-conflicting-oneshot-services-in-systemd</link>
    <guid isPermaLink="false">64807b30f6b0810001fa0d01</guid>
    <category><![CDATA[linux]]></category>
    <category><![CDATA[devops]]></category>
    <category><![CDATA[emacs]]></category>
    <category><![CDATA[systemd]]></category>
    <dc:creator>Srijan Choudhary</dc:creator>
    <pubDate>Thu, 08 Jun 2023 19:20:00 +0000</pubDate>
    <media:content url="https://srijan.ch/media/pages/blog/exploring-conflicting-oneshot-services-in-systemd/0c15993753-1699621096/systemd-conflicts-01.png" medium="image" />
    <content:encoded><![CDATA[<figure data-ratio="auto">
    <img src="https://srijan.ch/media/pages/blog/exploring-conflicting-oneshot-services-in-systemd/0c15993753-1699621096/systemd-conflicts-01.png" alt="Exploring conflicting oneshot services in systemd">
  
    <figcaption class="text-center">
    Midjourney: two systemd services fighting over who will start first  </figcaption>
  </figure>
<h2>Background</h2>
<p>I use <a href="https://isync.sourceforge.io/mbsync.html" rel="noreferrer">mbsync</a> to sync my mailbox from my online provider (<a href="https://ref.fm/u12054901" rel="noreferrer">FastMail</a> - referral link) to my local system to eventually use with <a href="https://djcbsoftware.nl/code/mu/mu4e.html" rel="noreferrer">mu4e</a> (on Emacs).</p> <p>For periodic sync, I have a systemd service file called <code>mbsync.service</code> defining a oneshot service and a timer file called <code>mbsync.timer</code> that runs this service periodically. I can also activate the same service using a keybinding from inside mu4e.</p><figure>
  <pre><code class="language-ini">[Unit]
Description=Mailbox synchronization service
After=network-online.target

[Service]
Type=oneshot
ExecStart=/usr/bin/mbsync fastmail-all
ExecStartPost=bash -c &quot;emacsclient -s srijan -n -e &#039;(mu4e-update-index)&#039; || mu index&quot;

[Install]
WantedBy=default.target</code></pre>
    <figcaption class="text-center">mbsync.service</figcaption>
  </figure>
<figure>
  <pre><code class="language-ini">[Unit]
Description=Mailbox synchronization timer
BindsTo=graphical-session.target
After=graphical-session.target

[Timer]
OnBootSec=2m
OnUnitActiveSec=5m
Unit=mbsync.service

[Install]
WantedBy=graphical-session.target</code></pre>
    <figcaption class="text-center">mbsync.timer</figcaption>
  </figure>
<p>Also, for instant download of new mail, I have another service called <a href="https://gitlab.com/shackra/goimapnotify" rel="noreferrer">goimapnotify</a> configured that listens for new/updated/deleted messages on the remote mailbox using IMAP IDLE, and calls the above <code>mbsync.service</code> when there are changes.</p><p>This has worked well for me for several years.</p><h2>The Problem</h2>
<p>I
 recently split my (huge) archive folder into yearly archives so that I 
can keep/sync only the recent years on my phone. [ Aside: <a href="https://fedi.srijan.dev/notice/AVGV5TuD1cOEWQ8iQa" rel="noreferrer">yearly refile in mu4e snippet</a>
 ]. This led to an increase in the number of folders that mbsync has to sync, which increased the total sync time because mbsync syncs the folders one by one.</p> <p>mbsync does support syncing only a subset of folders, so I created a second systemd service called <code>mbsync-quick.service</code>
 and only synced my Inbox from this service. Then I updated the 
goimapnotify config to trigger this quick service instead of the full 
service when it detects changes.</p> <p>But, this caused a problem: these
 two services can run at the same time, and hence can cause corruption 
or sync conflicts in the mail files. So, I wanted a way to make sure 
that these two services don't run at the same time.</p> <p>Ideally, whenever either of these services is triggered while the other is already running, it should wait for the other service to stop before starting, essentially forming a queue.</p><h2>Solution 1: Using systemd features</h2>
<p>Systemd has a <a href="https://www.freedesktop.org/software/systemd/man/systemd.unit.html#Conflicts=" rel="noreferrer">way to specify conflicts</a> in the unit section. From the docs:</p><blockquote>
  If a unit has a <code>Conflicts=</code> setting on another unit, starting the former will stop the latter and vice versa.<br>[...] to ensure that the conflicting unit is stopped before the other unit is started, an <code>After=</code> or <code>Before=</code> dependency must be declared.  </blockquote>
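<p>For illustration, one side of such a mutual conflict might look like this in <code>mbsync-quick.service</code> (a hypothetical fragment; the other service would mirror it):</p><figure>
  <pre><code class="language-ini">[Unit]
# Hypothetical: starting the quick sync stops the full sync, and After=
# ensures the full sync is stopped before the quick sync starts
Conflicts=mbsync.service
After=mbsync.service</code></pre>
    <figcaption class="text-center">mbsync-quick.service (fragment)</figcaption>
  </figure>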
<p>This is different from our requirement that the conflicting service should be allowed to finish before the triggered service starts, but it may be a good enough way to at least prevent both from running at the same time.</p> <p>To test this, I added <code>Conflicts=</code> in both services, each naming the other as the conflicting unit, and it works. The only problem is that when a service is triggered, the
 other service is <code>SIGTERM</code>ed. This by itself might not cause corruption, but if it happens to the mbsync-quick service, there might be a delay in getting the mail.</p> <p>This is the best way I found that uses built-in systemd features without any workarounds or hacks. The other solutions below all involve some workarounds.</p><h2>Solution 2: Conflict + stop after sync complete</h2>
<p>This is a variation on solution 1 - add a wrapper script that traps the SIGTERM and exits only when the sync is complete. This also worked.</p> <p>But the drawback of this method is that anything calling stop on these services (like the system shutting down) will have to wait for the sync to finish (or until the 90s timeout). This can cause slowdowns in system shutdown that are hard to debug. So, I don't prefer this solution.</p><h2>Solution 3: Delay start until the other service is finished</h2>
<p>This is also a hacky solution - use <code>ExecStartPre</code> to check if the other service is running, and busywait for it to stop before starting ourselves.</p><figure>
  <pre><code class="language-ini">[Unit]
Description=Mailbox synchronization service (quick)
After=network-online.target

[Service]
Type=oneshot
ExecStartPre=/bin/sh -c &#039;while systemctl --user is-active mbsync.service | grep -q activating; do sleep 0.5; done&#039;
ExecStart=/usr/bin/mbsync fastmail-inbox
ExecStartPost=bash -c &quot;emacsclient -s srijan -n -e &#039;(mu4e-update-index)&#039; || mu index&quot;</code></pre>
    <figcaption class="text-center">mbsync-quick.service</figcaption>
  </figure>
<p>Here, we use <code>systemctl is-active</code> to query the status of the other service, and wait until the other service is no longer in the <code>activating</code> state. The state is called <code>activating</code> instead of <code>active</code> because these are oneshot services that go from <code>inactive</code> to <code>activating</code> to <code>inactive</code> without ever reaching <code>active</code>.</p><p>To avoid an actual busywait on the CPU, I added a sleep of 0.5s.</p><p>This worked the best for my use case. When one of the services is triggered, it checks whether the other service is running and waits for it to stop before running itself. It also does not have solution 2's drawback of trapping exits and delaying a stop command.</p><p>But, after using it for a day, I found that there is a race condition (!) that can cause a deadlock between these two services, leaving neither able to start.</p><p>The reason for the race condition was:</p><ul><li>A service is marked as <code>activating</code> when its <code>ExecStartPre</code> command starts</li><li>I added a sleep of 0.5 seconds</li></ul><p>So, if the other service is triggered again within those 0.5 seconds, both services will be marked as <code>activating</code> and they will wait for each other indefinitely. This is what I get for using workarounds.</p><h2>Solution 4: One-way conflict, other way delay</h2>
<p>So, the final good-enough solution I came up with was to break this cyclic dependency with a hybrid of Solution 1 and Solution 3. I was okay with <code>mbsync.service</code> being stopped in favour of the (higher-priority) <code>mbsync-quick.service</code>.</p> <p>I therefore added <code>mbsync.service</code> to the <code>Conflicts=</code> section of <code>mbsync-quick.service</code>, and used the <code>ExecStartPre</code> method in <code>mbsync.service</code>.</p> <p>💡Let me know if you know a better way to achieve this.</p><h2>References</h2>
<ul><li><a href="https://unix.stackexchange.com/questions/503719/how-to-set-a-conflict-in-systemd-in-one-direction-only" rel="noreferrer">https://unix.stackexchange.com/questions/503719/how-to-set-a-conflict-in-systemd-in-one-direction-only</a></li><li><a href="https://unix.stackexchange.com/questions/465794/is-it-possible-to-make-a-systemd-unit-wait-until-all-its-conflicts-are-stopped/562959" rel="noreferrer">https://unix.stackexchange.com/questions/465794/is-it-possible-to-make-a-systemd-unit-wait-until-all-its-conflicts-are-stopped/562959</a></li></ul>]]></content:encoded>
    <comments>https://srijan.ch/exploring-conflicting-oneshot-services-in-systemd#comments</comments>
    <slash:comments>0</slash:comments>
  </item><item>
    <title>Download a file securely from GCS on an untrusted system</title>
    <description><![CDATA[Download files from google cloud storage using temporary credentials or time-limited access URLs]]></description>
    <link>https://srijan.ch/secure-gcs-download</link>
    <guid isPermaLink="false">632920ea8948d20001269e4e</guid>
    <category><![CDATA[cloud]]></category>
    <category><![CDATA[security]]></category>
    <category><![CDATA[devops]]></category>
    <dc:creator>Srijan Choudhary</dc:creator>
    <pubDate>Sun, 27 Nov 2022 19:30:00 +0000</pubDate>
    <content:encoded><![CDATA[<h2>The Problem</h2>
<p>We publish some of our build artifacts to <a href="https://cloud.google.com/storage" rel="noreferrer">Google Cloud Storage</a>,
 and users need to download these to the target installation system. 
But, this target system is not always trusted and can have shared local 
users, so we don't want to store long-lived credentials.</p> <p>As a 
user, I can download the artifact on my (secure) laptop and transfer it 
to the target system. But the artifact can be large (several GBs), so downloading it and then uploading it again is cumbersome and slow.</p><h2>Option 1: use <a href="https://cloud.google.com/sdk/docs/install" rel="noreferrer">gcloud CLI</a> on the target system</h2>
<p>Log in to the target system, install gcloud CLI, authenticate, and then download the file:</p><figure>
  <pre><code class="language-shellsession">$ gcloud storage cp gs://$BUCKET/$FILE ./</code></pre>
  </figure>
<p>This has two problems:</p><ol><li>The user must install (and maybe update) gcloud CLI on the target system.</li><li>The
 user needs to store their credentials on the target system. These 
credentials have full access to whatever resources the user has. So, 
it's a huge security risk, especially if we don't trust the target 
system.</li></ol><p>To mitigate (2), the user can log out of gcloud CLI after downloading. But, this is a manual step they might miss.</p><h2>Option 2: use gcloud CLI with a service account</h2>
<p>This
 is a variation of the above solution - we log in using a service 
account instead of the user account. This service account can have 
restricted access to only the resources needed.</p><figure>
  <pre><code class="language-shellsession">$ gcloud iam service-accounts create $SA_NAME \
    --description=&quot;Service Account for downloading artifacts&quot;
$ gsutil iam ch \
    serviceAccount:$SA_NAME@$PROJECT_ID.iam.gserviceaccount.com:roles/storage.objectViewer \
    gs://$BUCKET</code></pre>
  </figure>
<p>This partially mitigates problem (2) above. If 
the user forgets to log out of gcloud CLI, the damage will be restricted
 to the resources accessible by the service account.</p><h2>Option 3: Short-lived access token</h2>
<p>The gcloud CLI supports creating short-lived credentials for the end-user account or <a href="https://cloud.google.com/iam/docs/create-short-lived-credentials-direct" rel="noreferrer">any service account</a>.</p> <p>This credential can be used to download the artifact using wget with an authorization header - no need to install the gcloud CLI on the target system.</p> <p>Here's
 a small script that asks for the auth token as input, parses various 
GCS bucket URL formats, and downloads the requested artifact directly 
using wget:</p><figure>
  <pre><code class="language-bash">#!/bin/bash
# Download artifact from GCS bucket

set -e

echo -e &quot;====&gt; Run \`gcloud auth print-access-token\` on a system where you&#039;ve setup gcloud to get access token\n&quot;
read -r -p &quot;Enter access token: &quot; StorageAccessToken
read -r -p &quot;Enter GCS artifact URL: &quot; ArtifactURL

if [[ &quot;${ArtifactURL:0:33}&quot; == &quot;https://console.cloud.google.com/&quot; ]]; then
    BucketAndFile=&quot;${ArtifactURL#*https://console.cloud.google.com/storage/browser/_details/}&quot;
elif [[ &quot;${ArtifactURL:0:33}&quot; == &quot;https://storage.cloud.google.com/&quot; ]]; then
    BucketAndFile=&quot;${ArtifactURL#*https://storage.cloud.google.com/}&quot;
elif [[ &quot;${ArtifactURL:0:5}&quot; == &quot;gs://&quot; ]]; then
    BucketAndFile=&quot;${ArtifactURL#*gs://}&quot;
else
    echo &quot;Invalid GCS artifact URL&quot;
    exit 1
fi

StorageBucket=&quot;${BucketAndFile%%/*}&quot;
StorageFile=&quot;${BucketAndFile#*/}&quot;
StorageFileEscaped=$(echo &quot;${StorageFile}&quot; | sed &#039;s/\//%2F/g&#039;)
OutputFileName=&quot;${StorageFile##*/}&quot;

echo -e &quot;\n====&gt; Downloading gs://${StorageBucket}/${StorageFile} to ${OutputFileName}\n&quot;

wget -O &quot;${OutputFileName}&quot; --header=&quot;Authorization: Bearer ${StorageAccessToken}&quot; \
    &quot;https://storage.googleapis.com/storage/v1/b/${StorageBucket}/o/${StorageFileEscaped}?alt=media&quot;</code></pre>
  </figure>
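<p>The access token itself is generated on a trusted machine. As the script's prompt suggests, <code>gcloud auth print-access-token</code> with the user's own credentials works; to tie in the restricted service account from Option 2, impersonation should also work (I'm assuming the standard impersonation flag here - worth verifying against the gcloud docs):</p><figure>
  <pre><code class="language-shellsession">$ gcloud auth print-access-token \
    --impersonate-service-account=$SA_NAME@$PROJECT_ID.iam.gserviceaccount.com</code></pre>
  </figure>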
<h2>Option 4: Signed URLs</h2>
<p>Google Cloud Storage also supports <a href="https://cloud.google.com/storage/docs/access-control/signed-urls" rel="noreferrer">signed URLs</a>
 - which give time-limited access to a specific Cloud Storage resource. 
Anyone possessing the signed URL can use it while it's active without 
any further credentials. This fits our use case brilliantly.</p> <p>To do this, first we need to give ourselves the <code>iam.serviceAccountTokenCreator</code> role so that we can impersonate a service account.</p><figure>
  <pre><code class="language-shellsession">$ gcloud iam service-accounts add-iam-policy-binding \
	$SA_NAME@$PROJECT_ID.iam.gserviceaccount.com \
    --member=user:$MY_EMAIL \
    --role=roles/iam.serviceAccountTokenCreator</code></pre>
  </figure>
<p>Then, we can generate a signed URL:</p><figure>
  <pre><code class="language-shellsession">$ gcloud config set auth/impersonate_service_account \
    $SA_NAME@$PROJECT_ID.iam.gserviceaccount.com

$ gsutil signurl -u -r $REGION -d 10m gs://$BUCKET/$FILE

$ gcloud config unset auth/impersonate_service_account</code></pre>
  </figure>
<p>And we can use wget to download the artifact from this URL without any further authentication.</p>]]></content:encoded>
    <comments>https://srijan.ch/secure-gcs-download#comments</comments>
    <slash:comments>0</slash:comments>
  </item><item>
    <title>Advanced PostgreSQL monitoring using Telegraf, InfluxDB, Grafana</title>
    <description><![CDATA[My experience with advanced monitoring for PostgreSQL database using Telegraf, InfluxDB, and Grafana, using a custom postgresql plugin for Telegraf.]]></description>
    <link>https://srijan.ch/advanced-postgresql-monitoring-using-telegraf</link>
    <guid isPermaLink="false">603cefe38527ef00014f776d</guid>
    <category><![CDATA[devops]]></category>
    <category><![CDATA[postgresql]]></category>
    <category><![CDATA[monitoring]]></category>
    <category><![CDATA[telegraf]]></category>
    <category><![CDATA[influxdb]]></category>
    <category><![CDATA[ansible]]></category>
    <dc:creator>Srijan Choudhary</dc:creator>
    <pubDate>Thu, 11 Mar 2021 15:30:00 +0000</pubDate>
    <media:content url="https://srijan.ch/media/pages/blog/advanced-postgresql-monitoring-using-telegraf/d28e269c6f-1699621096/grafana-postgresql-monitoring.png" medium="image" />
    <content:encoded><![CDATA[<figure data-ratio="auto">
    <img src="https://srijan.ch/media/pages/blog/advanced-postgresql-monitoring-using-telegraf/54e29f97da-1699621096/photo-1564760055775-d63b17a55c44.jpeg" alt="Advanced PostgreSQL monitoring using Telegraf, InfluxDB, Grafana">
  
  </figure>
<h2>Introduction</h2>
<p>This post will go through my experience with setting up some advanced monitoring for a PostgreSQL database using Telegraf, InfluxDB, and Grafana (also known as the TIG stack), the problems I faced, and what I ended up doing.</p> <p>What do I mean by advanced? I liked <a href="https://www.datadoghq.com/blog/postgresql-monitoring/#key-metrics-for-postgresql-monitoring" rel="noreferrer">this Datadog article</a> about some key metrics for PostgreSQL monitoring. Also, this <a href="https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/templates/db/postgresql" rel="noreferrer">PostgreSQL monitoring template for Zabbix</a> has some good pointers. I didn’t need everything mentioned in these links, but they acted as a good reference. I also prioritized monitoring for issues I’ve faced myself in the past.</p> <p>Some key things that I planned to monitor:</p><ul><li>Active (and idle) connections vs. max connections configured</li><li>Size of databases and tables</li><li><a href="https://www.datadoghq.com/blog/postgresql-monitoring/#read-query-throughput-and-performance" rel="noreferrer">Read query throughput and performance</a> (sequential vs. index scans, rows fetched vs. returned, temporary data written to disk)</li><li><a href="https://www.datadoghq.com/blog/postgresql-monitoring/#write-query-throughput-and-performance" rel="noreferrer">Write query throughput and performance</a> (rows inserted/updated/deleted, locks, deadlocks, dead rows)</li></ul><p>There
 are a lot of resources online about setting up the data collection 
pipeline from Telegraf to InfluxDB, and creating dashboards on Grafana. 
So, I’m not going into too much detail on this part. This is what the 
pipeline looks like:</p><figure data-ratio="auto">
    <img src="https://srijan.ch/media/pages/blog/advanced-postgresql-monitoring-using-telegraf/7c74d1bdcd-1699621096/pg_telegraf_influx_grafana.png" alt="PostgreSQL to Telegraf to InfluxDB to Grafana">
  
    <figcaption class="text-center">
    PostgreSQL to Telegraf to InfluxDB to Grafana. <a href="https://www.planttext.com/?text=TP9RRu8m5CVV-oawdp2PCfCzBTkY8d4cA0OmcqzD1nqsmPRqacc6ttr5A7Etyz2UzlpE_vnUnb9XeVI-05UKfONEY1O5t2bLoZlN5VXzc5ErqwzQ4f5ofWXJmvJltOYcM6HyHKb92jUx7QmBpDHc6RY250HBueu6DsOVUIO9KqR4iAoh19Djk4dGyo9vGe4_zrSpfm_0b6kMON5qkBo6lJ3kzU47WCRYerHaZ_o3SfJHpGL-Cq3IkXtsXJgKbLePPb7FS5tedB9U_oT53YJD3ENNCrmBdX8fkVYNvrerik7P-SrrJaGADBDTs3BmWco0DjBfMk84EhMBiwVbo32UbehlRRTjGYqNMRc6go2KAgCCmke22XeLsr9b45FT4k04WBbKmZ8eQBvJe7g0tyoiasD9O0Mg-tWR9_uIJUV82uCmUgp3q3vAUpTdq7z9_6Wr2T0V6UUaCBR7CRmfthG0ncOml-KJ" target="_blank" rel="noreferrer">View Source</a>  </figcaption>
  </figure>
<p>And here’s what my final Grafana dashboard looks like</p><figure data-ratio="auto">
    <img src="https://srijan.ch/media/pages/blog/advanced-postgresql-monitoring-using-telegraf/d28e269c6f-1699621096/grafana-postgresql-monitoring.png" alt="Grafana dashboard sample for postgresql monitoring">
  
    <figcaption class="text-center">
    Grafana dashboard sample for PostgreSQL monitoring  </figcaption>
  </figure>
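<p>As an illustration of the first metric in the list above, active and idle connections vs. the configured maximum can be gathered with a query along these lines (a sketch only; the exact queries I ended up using are in the repo linked later in this post):</p><figure>
  <pre><code class="language-sql">-- Illustrative sketch: connection counts vs. max_connections
SELECT count(*) FILTER (WHERE state = 'active') AS active,
       count(*) FILTER (WHERE state = 'idle')   AS idle,
       current_setting('max_connections')::int  AS max_connections
FROM pg_stat_activity;</code></pre>
  </figure>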
<h2>Research on existing solutions</h2>
<p>I found several solutions and articles online about monitoring PostgreSQL using Telegraf:</p><h3>1. Telegraf PostgreSQL input plugin</h3>
<p>Telegraf has a <a href="https://github.com/influxdata/telegraf/tree/master/plugins/inputs/postgresql" rel="noreferrer">PostgreSQL input plugin</a> which provides some built-in metrics from the <code>pg_stat_database</code> and <code>pg_stat_bgwriter</code>
 views. But this plugin cannot be configured to run any custom SQL 
script to gather the data that we want. And the built-in metrics are a 
good starting point, but not enough. So, I rejected it.</p><h3>2. Telegraf postgresql_extensible input plugin</h3>
<p>Telegraf has another PostgreSQL input plugin called <a href="https://github.com/influxdata/telegraf/tree/master/plugins/inputs/postgresql_extensible" rel="noreferrer">postgresql_extensible</a>.
 At first glance, this looks promising: it can run any custom query, and
 multiple queries can be defined in its configuration file.</p> <p>However, there is an <a href="https://github.com/influxdata/telegraf/issues/5009" rel="noreferrer">open issue</a>
 due to which this plugin does not run the specified query against all 
databases, but only against the database name specified in the 
connection string.</p> <p>One way this can still work is to specify multiple input blocks in the Telegraf config file, one for each database.</p><figure>
  <pre><code class="language-toml">[[inputs.postgresql_extensible]]
  address = &quot;host=localhost user=postgres dbname=database1&quot;
  [[inputs.postgresql_extensible.query]]
    script=&quot;db_stats.sql&quot;

[[inputs.postgresql_extensible]]
  address = &quot;host=localhost user=postgres dbname=database2&quot;
  [[inputs.postgresql_extensible.query]]
    script=&quot;db_stats.sql&quot;</code></pre>
  </figure>
<p>But <strong>configuring this does not scale</strong>, especially if the database names are dynamic or we don’t want to hardcode them in the config.</p> <p>That said, I really liked the configuration method of this plugin, and I think it will work very well for my use case once the <a href="https://github.com/influxdata/telegraf/issues/5009" rel="noreferrer">associated Telegraf issue</a> gets resolved.</p><h3>3. Using a monitoring package like pgwatch2</h3>
<p>Another method I found was to use a package like <a href="https://github.com/cybertec-postgresql/pgwatch2" rel="noreferrer">pgwatch2</a>. This is a self-contained solution for PostgreSQL monitoring and includes dashboards as well.</p> <p>Its main components are</p><ol><li><u>A metrics collector service</u>.
 This can either be run centrally and “pull” metrics from one or more 
PostgreSQL instances, or alongside each PostgreSQL instance (like a 
sidecar) and “push” metrics to a metrics storage backend.</li><li><u>Metrics storage backend</u>. pgwatch2 supports multiple metrics storage backends like bare PostgreSQL, TimescaleDB, InfluxDB, Prometheus, and Graphite.</li><li><u>Grafana dashboards</u></li><li><u>A configuration layer</u> and associated UI to configure all of the above.</li></ol><p>I
 really liked this tool as well, but felt like this might be too complex
 for my needs. For example, it monitors a lot more than what I want to 
monitor, and it has some complexity to handle multiple PostgreSQL 
versions and multiple deployment configurations.</p> <p>But I will definitely keep this in mind for a more “batteries included” approach to PostgreSQL monitoring for future projects.</p><h2>My solution: custom Telegraf plugin</h2>
<p>Telegraf supports writing an external custom plugin, and running it via the <a href="https://github.com/influxdata/telegraf/tree/master/plugins/inputs/execd" rel="noreferrer">execd plugin</a>. The <code>execd</code> plugin runs an external program as a long-running daemon.</p> <p>This
 approach enabled me to build the exact features I wanted, while also 
keeping things simple enough to someday revert to using the Telegraf 
 built-in plugin for PostgreSQL.</p> <p>The custom plugin code can be found at <a href="https://github.com/srijan/telegraf-execd-pg-custom" rel="noreferrer">this GitHub repo</a>. Note that I’ve also included the <code>line_protocol.py</code> file from the InfluxDB Python SDK so that I would not have to install the whole SDK just for line-protocol encoding.</p> <p>What this plugin (and included configuration) does:</p><ol><li>Runs as a daemon using the Telegraf execd plugin.</li><li>When
 Telegraf asks for data (by sending a newline on STDIN), it runs the 
queries defined in the plugin’s config file (against the configured 
databases), converts the results into Influx line format, and sends it 
to Telegraf.</li><li>Queries can be defined to run either on a single database, or on all databases that the configured pg user has access to.</li></ol><p>This
 plugin solves the issue with Telegraf’s postgresql_extensible plugin 
for me—I don’t need to manually define the list of databases to be able 
to run queries against all of them.</p> <p>This is what the custom plugin configuration looks like</p><figure>
  <pre><code class="language-toml">[postgresql_custom]
address=&quot;&quot;

[[postgresql_custom.query]]
sqlquery=&quot;select pg_database_size(current_database()) as size_b;&quot;
per_db=true
measurement=&quot;pg_db_size&quot;

[[postgresql_custom.query]]
script=&quot;queries/backends.sql&quot;
per_db=true
measurement=&quot;pg_backends&quot;

[[postgresql_custom.query]]
script=&quot;queries/db_stats.sql&quot;
per_db=true
measurement=&quot;pg_db_stats&quot;

[[postgresql_custom.query]]
script=&quot;queries/table_stats.sql&quot;
per_db=true
tagvalue=&quot;table_name,schema&quot;
measurement=&quot;pg_table_stats&quot;</code></pre>
  </figure>
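<p>For illustration, the request/response loop the plugin implements on top of <code>execd</code> can be sketched like this (a minimal, hypothetical version with names of my choosing - the real plugin also handles configuration parsing, database connections, and proper line-protocol escaping via <code>line_protocol.py</code>):</p><figure>
  <pre><code class="language-python">import sys

def encode_row(measurement, tags, fields):
    # Naive Influx line-protocol encoding: measurement,tag=val field=val
    # (ignores escaping rules, which is why the real plugin reuses
    # line_protocol.py from the influx python sdk).
    tag_part = "".join("," + k + "=" + str(v) for k, v in sorted(tags.items()))
    field_part = ",".join(k + "=" + str(v) for k, v in sorted(fields.items()))
    return measurement + tag_part + " " + field_part

def run(collect):
    # execd contract: Telegraf writes a newline on STDIN once per interval,
    # and expects line-protocol rows back on STDOUT.
    for _ in sys.stdin:
        for measurement, tags, fields in collect():
            print(encode_row(measurement, tags, fields))
        sys.stdout.flush()</code></pre>
  </figure>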
<p>Any queries defined with <code>per_db=true</code> will be run against all databases. Queries can be specified either inline, or using a separate file.</p> <p>The <a href="https://github.com/srijan/telegraf-execd-pg-custom" rel="noreferrer">repository for this plugin</a>
 has the exact queries configured above. It also has the Grafana 
dashboard JSON which can be imported to get the same dashboard as above.</p><h2>Future optimizations</h2>
<ul><li>Monitoring related to replication is not added yet, but can be added easily</li><li>No need to use a superuser account in PostgreSQL 10+</li><li>This does not support running different queries depending on the version of the target PostgreSQL system.</li></ul><hr />
<p>Let me know in the comments below if you have any doubts or suggestions to make this better.</p>]]></content:encoded>
    <comments>https://srijan.ch/advanced-postgresql-monitoring-using-telegraf#comments</comments>
    <slash:comments>3</slash:comments>
  </item><item>
    <title>Running docker jobs inside Jenkins running on docker</title>
    <description><![CDATA[Run Jenkins inside docker, but also use docker containers to run jobs on that Jenkins]]></description>
    <link>https://srijan.ch/docker-jobs-inside-jenkins-on-docker</link>
    <guid isPermaLink="false">60362aece749840001df438e</guid>
    <category><![CDATA[devops]]></category>
    <category><![CDATA[jenkins]]></category>
    <category><![CDATA[docker]]></category>
    <dc:creator>Srijan Choudhary</dc:creator>
    <pubDate>Wed, 24 Feb 2021 10:30:00 +0000</pubDate>
    <media:content url="https://srijan.ch/media/pages/blog/docker-jobs-inside-jenkins-on-docker/ebd7e48a64-1699621096/photo-1595546440771-84f0b521a533.jpeg" medium="image" />
    <content:encoded><![CDATA[<figure data-ratio="auto">
    <img src="https://srijan.ch/media/pages/blog/docker-jobs-inside-jenkins-on-docker/ebd7e48a64-1699621096/photo-1595546440771-84f0b521a533.jpeg" alt="Running docker jobs inside Jenkins running on docker">
  
  </figure>
<p><a href="https://www.jenkins.io/" rel="noreferrer">Jenkins</a> is a free and open source automation server, which is used to automate software building, testing, deployment, etc.</p> <p>I
 wanted to have a quick and easy way to run Jenkins inside docker, but 
also use docker containers to run jobs on the dockerized Jenkins. Using 
docker for jobs makes it easy to encode job runtime dependencies in the 
source code repo itself.</p> <p>The official document on <a href="https://www.jenkins.io/doc/book/installing/docker/" rel="noreferrer">running Jenkins in docker</a> is pretty comprehensive. But, I wanted a version using docker-compose (on Linux).</p> <p>So, I started with a basic compose file:</p><figure>
  <pre><code class="language-yaml">version: &#039;3.7&#039;
services:
  jenkins:
    image: jenkins/jenkins:alpine
    ports:
      - 8081:8080
    container_name: jenkins
    volumes:
      - ./home:/var/jenkins_home</code></pre>
    <figcaption class="text-center">docker-compose.yml</figcaption>
  </figure>
<p>When using this (<code>docker-compose up -d</code>), things came up properly, but Jenkins did not have access to the docker daemon running on the host. Also, the docker CLI binary was not present inside the container.</p><p>The way to fix this is to mount the docker socket and the CLI binary into the container so that they can be accessed. So, we come to the following compose file:</p><figure>
  <pre><code class="language-yaml">version: &#039;3.7&#039;
services:
  jenkins:
    image: jenkins/jenkins:alpine
    ports:
      - 8081:8080
    container_name: jenkins
    volumes:
      - ./home:/var/jenkins_home
      - /var/run/docker.sock:/var/run/docker.sock
      - /usr/bin/docker:/usr/local/bin/docker</code></pre>
    <figcaption class="text-center">docker-compose.yml</figcaption>
  </figure>
<p>But, when trying to run <code>docker ps</code> inside the container with the above compose file, I was still getting the error: <code>Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock</code>. This is because the Jenkins container runs as the <code>jenkins</code> user, which does not have permission to use that socket.</p><p>From my research, the commonly recommended ways to solve this problem were:</p><ul><li>Run the container as the root user</li><li><code>chmod</code> the socket file to <code>777</code></li><li>Install <code>sudo</code> inside the container and give the <code>jenkins</code> user access to sudo without needing to enter a password.</li></ul><p>A more secure way is to create the <code>docker</code> group inside the container, and add the <code>jenkins</code> user to that group. But, this requires us to build a custom image.</p> <p>Also, the group id of the <code>docker</code> group inside and outside the container has to be the same, so I had to add an extra check which deletes any existing group inside the container that uses the same group id, then creates the new <code>docker</code> group with the passed group id, and finally adds the <code>jenkins</code> user to the <code>docker</code> group.</p> <p>So, the final <code>Dockerfile</code> is:</p><figure>
  <pre><code class="language-docker">FROM jenkins/jenkins:alpine
ARG docker_group_id=999

USER root
RUN old_group=$(getent group $docker_group_id | cut -d: -f1) &amp;&amp; \
    ([ -z &quot;$old_group&quot; ] || delgroup &quot;$old_group&quot;) &amp;&amp; \
    addgroup -g $docker_group_id docker &amp;&amp; \
    addgroup jenkins docker

USER jenkins</code></pre>
    <figcaption class="text-center">Dockerfile</figcaption>
  </figure>
<p>And the final <code>docker-compose.yml</code> file is:</p><figure>
  <pre><code class="language-yaml">version: &#039;3.7&#039;
services:
  jenkins:
    build:
      context: .
      args:
        docker_group_id: 999
    ports:
      - 8081:8080
    container_name: jenkins
    volumes:
      - ./home:/var/jenkins_home
      - /var/run/docker.sock:/var/run/docker.sock
      - /usr/bin/docker:/usr/local/bin/docker</code></pre>
    <figcaption class="text-center">docker-compose.yml</figcaption>
  </figure>
<p>The <code>docker_group_id</code> argument can be edited in the compose file. To get the group id of the <code>docker</code> group on the host:</p><figure>
  <pre><code class="language-shellsession">$ getent group docker | cut -d: -f3</code></pre>
  </figure>
<p>With the above, everything works:</p><figure>
  <pre><code class="language-shellsession">$ docker-compose up -d
Creating network &quot;jenkins_test_default&quot; with the default driver
Building jenkins
Step 1/6 : FROM jenkins/jenkins:alpine
alpine: Pulling from jenkins/jenkins
801bfaa63ef2: Pull complete
2b72e22c6786: Pull complete
8d16efe80b55: Pull complete
682cd8857a9a: Pull complete
29c6010e8988: Pull complete
fa466f5d199d: Pull complete
e047245de0ff: Pull complete
0cfb53380af7: Pull complete
c29612b1a095: Pull complete
cd7d4bd47719: Pull complete
21cd3d960a1f: Pull complete
f3962370d584: Pull complete
bd6f35a1ea17: Pull complete
bd0c271b250f: Pull complete
Digest: sha256:1c3d9a1ed55911f9b165dd122118bff5da57520effb180d36b5c19d2a0cfe645
Status: Downloaded newer image for jenkins/jenkins:alpine
 ---&gt; e14be04b79e8
Step 2/6 : ARG docker_group_id=999
 ---&gt; Running in f1922fa97177
Removing intermediate container f1922fa97177
 ---&gt; 79460069fb98
Step 3/6 : RUN echo &quot;Assuming docker group id: $docker_group_id&quot;
 ---&gt; Running in 11809f4ae767
Assuming docker group id: 999
Removing intermediate container 11809f4ae767
 ---&gt; e89b345f6c74
Step 4/6 : USER root
 ---&gt; Running in b2e311372bc9
Removing intermediate container b2e311372bc9
 ---&gt; 9d4d8c3ad5b2
Step 5/6 : RUN old_group=$(getent group $docker_group_id | cut -d: -f1) &amp;&amp;     ([ -z &quot;$old_group&quot; ] || delgroup &quot;$old_group&quot;) &amp;&amp;     addgroup -g $docker_group_id docker &amp;&amp;     addgroup jenkins docker
 ---&gt; Running in 357046a8ac49
Removing intermediate container 357046a8ac49
 ---&gt; 865b942324eb
Step 6/6 : USER jenkins
 ---&gt; Running in dbc2976f62c0
Removing intermediate container dbc2976f62c0
 ---&gt; c7e6fac0187c

Successfully built c7e6fac0187c
Successfully tagged jenkins_test_jenkins:latest
WARNING: Image for service jenkins was built because it did not already exist. To rebuild this image you must use `docker-compose build` or `docker-compose up --build`.
Creating jenkins ... done

$ docker-compose exec jenkins docker ps
CONTAINER ID   IMAGE                  COMMAND                  CREATED          STATUS          PORTS                               NAMES
6c05ee1315e4   jenkins_test_jenkins   &quot;/sbin/tini -- /usr/&hellip;&quot;   47 seconds ago   Up 47 seconds   50000/tcp, 0.0.0.0:8081-&gt;8080/tcp   jenkins</code></pre>
  </figure>
<h2>Next Steps</h2>
<p><a href="https://www.digitalocean.com/community/tutorials/how-to-automate-jenkins-setup-with-docker-and-jenkins-configuration-as-code" rel="noreferrer">Here is an excellent guide</a> on how to set up Jenkins configuration as code. This will make this setup even better because nothing will need to be configured inside Jenkins manually - it will all be driven by code and files.</p>]]></content:encoded>
    <comments>https://srijan.ch/docker-jobs-inside-jenkins-on-docker#comments</comments>
    <slash:comments>0</slash:comments>
  </item><item>
    <title>Telegraf: dynamically adding custom tags</title>
    <description><![CDATA[Adding a custom tag to data coming in from an input plugin for telegraf]]></description>
    <link>https://srijan.ch/telegraf-dynamic-tags</link>
    <guid isPermaLink="false">6030d3dab5e0920001f557d7</guid>
    <category><![CDATA[devops]]></category>
    <category><![CDATA[telegraf]]></category>
    <category><![CDATA[influxdb]]></category>
    <dc:creator>Srijan Choudhary</dc:creator>
    <pubDate>Wed, 14 Oct 2020 00:00:00 +0000</pubDate>
    <media:content url="https://srijan.ch/media/pages/blog/telegraf-dynamic-tags/4aa8784b8f-1699621096/telegraf-plugin-interactions.png" medium="image" />
    <content:encoded><![CDATA[<h3>Background</h3>
<p>For a recent project, I wanted to add a custom tag to data coming in from a built-in input plugin for <a href="https://www.influxdata.com/time-series-platform/telegraf/" rel="noreferrer">telegraf</a>.</p> <p>The input plugin was the <a href="https://github.com/influxdata/telegraf/tree/master/plugins/inputs/procstat" rel="noreferrer">procstat plugin</a>, and the custom data was information from <a href="https://clusterlabs.org/pacemaker/doc/" rel="noreferrer">pacemaker</a>
 (a clustering solution for linux). I wanted to add a tag indicating if 
the current host was the "active" host in my active/passive setup.</p> <p>For this, the best solution I came up with was to use a <a href="https://www.influxdata.com/blog/telegraf-1-15-starlark-nginx-go-redfish-new-relic-mongodb/" rel="noreferrer">recently released</a> <a href="https://github.com/influxdata/telegraf/tree/master/plugins/processors/execd" rel="noreferrer">execd processor</a> plugin for telegraf.</p><h3>How it works</h3>
<p>The execd processor plugin runs an external program as a separate 
process and pipes metrics in to the process's STDIN and reads processed 
metrics from its STDOUT.</p><figure data-ratio="auto">
    <img src="https://srijan.ch/media/pages/blog/telegraf-dynamic-tags/4aa8784b8f-1699621096/telegraf-plugin-interactions.png" alt="Telegraf plugins interaction diagram">
  
    <figcaption class="text-center">
    Telegraf plugins interaction. <a href="https://www.planttext.com/?text=TP9RRu8m5CVV-oawdp2PCfCzBTkY8d4cA0OmcqzD1nqsmPRqacc6ttr5A7Etyz2UzlpE_vnUnb9XeVI-05UKfONEY1O5t2bLoZlN5VXzc5ErqwzQ4f5ofWXJmvJltOYcM6HyHKb92jUx7QmBpDHc6RY250HBueu6DsOVUIO9KqR4iAoh19Djk4dGyo9vGe4_zrSpfm_0b6kMON5qkBo6lJ3kzU47WCRYerHaZ_o3SfJHpGL-Cq3IkXtsXJgKbLePPb7FS5tedB9U_oT53YJD3ENNCrmBdX8fkVYNvrerik7P-SrrJaGADBDTs3BmWco0DjBfMk84EhMBiwVbo32UbehlRRTjGYqNMRc6go2KAgCCmke22XeLsr9b45FT4k04WBbKmZ8eQBvJe7g0tyoiasD9O0Mg-tWR9_uIJUV82uCmUgp3q3vAUpTdq7z9_6Wr2T0V6UUaCBR7CRmfthG0ncOml-KJ" target="_blank" rel="noreferrer">View Source</a>  </figcaption>
  </figure>
<p>Telegraf's <a href="https://github.com/influxdata/telegraf/blob/master/docs/CONFIGURATION.md#metric-filtering" rel="noreferrer">filtering parameters</a> can be used to select or limit data from which input plugins will go to this processor.</p><h3>The external program</h3>
<p>The external program I wrote does the following:</p><ol><li>Get pacemaker status and cache it for 10 seconds</li><li>Read a line from stdin</li><li>Append the required information as a tag to the data</li><li>Write it to stdout</li></ol><p>The caching is just an optimization - it was more about reducing log noise than about actual speed improvements.</p> <p>Also, I've done the InfluxDB line protocol parsing in my code directly (because my use case is simple), but this can be substituted with an actual library meant for handling line protocol.</p><figure>
  <pre><code class="language-python">#!/usr/bin/python

from __future__ import print_function
from sys import stderr
import fileinput
import subprocess
import time

cache_value = None
cache_time = 0
resource_name = &quot;VIP&quot;

def get_crm_status():
    global cache_value, cache_time, resource_name
    ctime = time.time()
    if ctime - cache_time &gt; 10:
        # print(&quot;Cache busted&quot;, file=stderr)
        try:
            crm_node = subprocess.check_output([&quot;sudo&quot;, &quot;/usr/sbin/crm_node&quot;, &quot;-n&quot;]).rstrip()
            crm_resource = subprocess.check_output([&quot;sudo&quot;, &quot;/usr/sbin/crm_resource&quot;, &quot;-r&quot;, resource_name, &quot;-W&quot;]).rstrip()
            active_node = crm_resource.split(&quot; &quot;)[-1]
            if active_node == crm_node:
                cache_value = &quot;active&quot;
            else:
                cache_value = &quot;inactive&quot;
        except (OSError, IOError) as e:
            print(&quot;Exception: %s&quot; % e, file=stderr)
            # Don&#039;t report active/inactive if crm commands are not found
            cache_value = None
        except Exception as e:
            print(&quot;Exception: %s&quot; % e, file=stderr)
            # Report as inactive in other cases by default
            cache_value = &quot;inactive&quot;
        cache_time = ctime
    return cache_value

def lineprotocol_add_tag(line, key, value):
    first_comma = line.find(&quot;,&quot;)
    first_space = line.find(&quot; &quot;)
    if first_comma &gt;= 0 and first_comma &lt;= first_space:
        split_str = &quot;,&quot;
    else:
        split_str = &quot; &quot;
    parts = line.split(split_str)
    first, rest = parts[0], parts[1:]
    first_new = first + &quot;,&quot; + key + &quot;=&quot; + value
    return split_str.join([first_new] + rest)

for line in fileinput.input():
    line = line.rstrip()
    crm_status = get_crm_status()
    if crm_status:
        try:
            new_line = lineprotocol_add_tag(line, &quot;crm_status&quot;, crm_status)
        except Exception as e:
            print(&quot;Exception: %s, Input: %s&quot; % (e, line), file=stderr)
            new_line = line
    else:
        new_line = line

    print(new_line)</code></pre>
    <figcaption class="text-center">pacemaker_status.py</figcaption>
  </figure>
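<p>For example, the tag-insertion logic above turns <code>system,host=h1 load1=0.5</code> into <code>system,crm_status=active,host=h1 load1=0.5</code>. Here is the same logic in a condensed, self-contained form (rewritten for illustration, not the exact code above):</p><figure>
  <pre><code class="language-python">def add_tag(line, key, value):
    # Insert key=value into the tag section of an Influx line-protocol line.
    first_comma = line.find(",")
    first_space = line.find(" ")
    # Existing tags are present when a comma appears before the first space;
    # otherwise the measurement is directly followed by the fields.
    has_tags = first_comma != -1 and first_comma == min(first_comma, first_space)
    sep = "," if has_tags else " "
    head, _, rest = line.partition(sep)
    return head + "," + key + "=" + value + sep + rest

print(add_tag("system,host=h1 load1=0.5", "crm_status", "active"))
# system,crm_status=active,host=h1 load1=0.5
print(add_tag("uptime load1=0.5", "crm_status", "active"))
# uptime,crm_status=active load1=0.5</code></pre>
  </figure>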
<h3>Telegraf configuration</h3>
<p>Here's a sample telegraf configuration that routes data from the "system" plugin to the execd processor plugin, and finally outputs to influxdb.</p><figure>
  <pre><code class="language-toml">[agent]
  interval = &quot;30s&quot;

[[inputs.cpu]]

[[inputs.system]]

[[processors.execd]]
  command = [&quot;/usr/bin/python&quot;, &quot;/etc/telegraf/scripts/pacemaker_status.py&quot;]
  namepass = [&quot;system&quot;]

[[outputs.influxdb]]
  urls = [&quot;http://127.0.0.1:8086&quot;]
  database = &quot;telegraf&quot;</code></pre>
    <figcaption class="text-center">telegraf.conf</figcaption>
  </figure>
<h3>Other types of dynamic tags</h3>
<p>In this example, we wanted to get the value of the tag from an 
external program. If the tag can be calculated from the incoming data 
itself, then things are much simpler. There are <a href="https://github.com/influxdata/telegraf/tree/release-1.15/plugins/processors" rel="noreferrer">a lot of processor plugins</a>, and many things can be achieved using just those.</p>]]></content:encoded>
    <comments>https://srijan.ch/telegraf-dynamic-tags#comments</comments>
    <slash:comments>0</slash:comments>
  </item><item>
    <title>Install docker and docker-compose using Ansible</title>
    <description><![CDATA[Optimized way to install docker and docker-compose using Ansible]]></description>
    <link>https://srijan.ch/install-docker-and-docker-compose-using-ansible</link>
    <guid isPermaLink="false">6030d3dab5e0920001f557cd</guid>
    <category><![CDATA[devops]]></category>
    <category><![CDATA[docker]]></category>
    <category><![CDATA[ansible]]></category>
    <dc:creator>Srijan Choudhary</dc:creator>
    <pubDate>Thu, 11 Jun 2020 14:30:00 +0000</pubDate>
    <media:content url="https://srijan.ch/media/pages/blog/install-docker-and-docker-compose-using-ansible/b62b609bf9-1699621096/photo-1584444707186-b7831c11014f.jpg" medium="image" />
    <content:encoded><![CDATA[<figure data-ratio="auto">
    <img src="https://srijan.ch/media/pages/blog/install-docker-and-docker-compose-using-ansible/b62b609bf9-1699621096/photo-1584444707186-b7831c11014f.jpg" alt="">
  
  </figure>
<p>Updated for 2023: I've updated this post with the following changes:</p><p>1. Added a top-level sample playbook<br>2. Used the ansible apt module's <a href="https://docs.ansible.com/ansible/latest/collections/ansible/builtin/apt_module.html#parameter-cache_valid_time" title="cache_valid_time" rel="noreferrer">cache_valid_time</a> parameter to prevent repeated apt-get updates<br>3. Install <code>docker-compose-plugin</code> using apt (provides docker compose v2)<br>4. Make installing docker compose v1 optional<br>5. Various fixes as suggested in comments<br>6. Tested against Debian 10, 11, and 12 and Ubuntu 18.04 (bionic), 20.04 (focal), 22.04 (jammy) using Vagrant.</p><p>I've published a <a href="https://srijan.ch/testing-ansible-playbooks-using-vagrant" rel="noreferrer">new post on how I've done this testing</a>.</p><hr />
<p>I wanted a simple, but optimal (and fast) way to install 
docker and docker-compose using Ansible. I found a few ways online, but I
 was not satisfied.</p> <p>My requirements were:</p><ul><li>Support Debian and Ubuntu</li><li>Install docker and docker compose v2 using apt repositories</li><li>Prevent unnecessary <code>apt-get update</code> if it has been run recently (to make it fast)</li><li>Optionally install docker compose v1 by downloading from github releases<ul><li>But, don’t download if current version &gt;= the minimum version required</li></ul></li></ul><p>I feel trying to achieve these requirements gave me a very good idea of how powerful ansible can be.</p> <p>The final role and vars files can be seen in <a href="https://gist.github.com/srijan/2028af568459195cb9a3dae8d111e754">this gist</a>. But, I’ll go through each section below to explain what makes this better / faster.</p><h2>File structure</h2>
<figure>
  <pre><code class="language-treeview">playbook.yml
roles/
├── docker/
│    ├── defaults/
│    │   ├── main.yml
│    ├── tasks/
│    │   ├── main.yml
│    │   ├── docker_setup.yml</code></pre>
    <figcaption class="text-center">File structure</figcaption>
  </figure>
<h2>Playbook</h2>
<p>This is the top-level playbook. Any default vars mentioned below can be overridden here.</p><figure>
  <pre><code class="language-yaml">---
- hosts: all
  vars:
    - docker_compose_install_v1: true
    - docker_compose_version_v1: &quot;1.29.2&quot;
  tasks:
    - name: Docker setup
      block:
        - import_role: name=docker</code></pre>
    <figcaption class="text-center">playbook.yml</figcaption>
  </figure>
<h2>Variables</h2>
<p>First, we’ve defined some variables in <code>defaults/main.yml</code>. These will control which release channel of docker will be used and whether to install docker compose v1.</p><figure>
  <pre><code class="language-yaml">---
docker_apt_release_channel: stable
docker_apt_arch: amd64
docker_apt_repository: &quot;deb [arch={{ docker_apt_arch }}] https://download.docker.com/linux/{{ ansible_distribution | lower }} {{ ansible_distribution_release }} {{ docker_apt_release_channel }}&quot;
docker_apt_gpg_key: https://download.docker.com/linux/{{ ansible_distribution | lower }}/gpg
docker_compose_install_v1: false
docker_compose_version_v1: &quot;1.29.2&quot;</code></pre>
    <figcaption class="text-center">roles/docker/defaults/main.yml</figcaption>
  </figure>
<h2>Role main.yml</h2>
<p>The <code>tasks/main.yml</code> file imports tasks from <code>tasks/docker_setup.yml</code> and turns on <a href="https://docs.ansible.com/ansible/latest/playbook_guide/playbooks_privilege_escalation.html#using-become" rel="noreferrer">become</a> for the whole task.</p><figure>
  <pre><code class="language-yaml">---
- import_tasks: docker_setup.yml
  become: true</code></pre>
    <figcaption class="text-center">roles/docker/tasks/main.yml</figcaption>
  </figure>
<h2>Docker Setup</h2>
<p>This task is divided into the following sections:</p><h3>Install dependencies</h3>
<figure>
  <pre><code class="language-yaml">- name: Install packages using apt
  apt:
    name: 
        - apt-transport-https
        - ca-certificates
        - curl
        - gnupg2
        - software-properties-common
    state: present
    cache_valid_time: 86400</code></pre>
  </figure>
<p>Here the <code>state: present</code> makes sure that these packages are only installed if not already installed. I've set <code>cache_valid_time</code> to 1 day so that <code>apt-get update</code> is not run if it has already run recently.</p><h3>Add docker repository</h3>
<figure>
  <pre><code class="language-yaml">- name: Add Docker GPG apt Key
  apt_key:
    url: &quot;{{ docker_apt_gpg_key }}&quot;
    state: present

- name: Add Docker Repository
  apt_repository:
    repo: &quot;{{ docker_apt_repository }}&quot;
    state: present
    update_cache: true</code></pre>
  </figure>
<p>Here, the <code>state: present</code> and <code>update_cache: true</code> make sure that the cache is only updated if this state was changed. So, <code>apt-get update</code> is not run if the docker repo is already present.</p><h3>Install and enable docker and docker compose v2</h3>
<figure>
  <pre><code class="language-yaml">- name: Install docker-ce
  apt:
    name: docker-ce
    state: present
    cache_valid_time: 86400

- name: Run and enable docker
  service:
    name: docker
    state: started
    enabled: true

- name: Install docker compose
  apt:
    name: docker-compose-plugin
    state: present
    cache_valid_time: 86400</code></pre>
  </figure>
<p>Again, due to <code>state: present</code> and <code>cache_valid_time: 86400</code>, there are no extra cache fetches if docker and docker-compose-plugin are already installed.</p><h2>Docker Compose V1 Setup</h2>
<p>WARNING: docker-compose v1 is end-of-life, please keep that in mind and only install/use it if absolutely required.</p><p>This task is wrapped in an <a href="https://docs.ansible.com/ansible/latest/playbook_guide/playbooks_blocks.html" rel="noreferrer">ansible block</a> that checks if <code>docker_compose_install_v1</code> is true.</p><figure>
  <pre><code class="language-text">- name: Install docker-compose v1
  when:
    - docker_compose_install_v1 is defined
    - docker_compose_install_v1
  block:</code></pre>
  </figure>
<p>Inside the block, there are two sections:</p><h3>Check whether docker-compose is installed, and its version</h3>
<figure>
  <pre><code class="language-yaml">- name: Check current docker-compose version
  command: docker-compose --version
  register: docker_compose_vsn
  changed_when: false
  failed_when: false
  check_mode: no

- set_fact:
    docker_compose_current_version: &quot;{{ docker_compose_vsn.stdout | regex_search(&#039;(\\d+(\\.\\d+)+)&#039;) }}&quot;
  when:
    - docker_compose_vsn.stdout is defined</code></pre>
  </figure>
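<p>As an aside, here is roughly what the <code>regex_search</code> filter above and ansible's version test (used in the next task) are doing, sketched in Python for illustration:</p><figure>
  <pre><code class="language-python">import re

# The set_fact regex pulls the version number out of the command output:
out = "docker-compose version 1.26.0, build d4451659"
current = re.search(r"(\d+(\.\d+)+)", out).group(1)

# Ansible's version test compares dotted versions numerically, component
# by component, rather than as plain strings.
def version_lt(a, b):
    pa = tuple(int(x) for x in a.split("."))
    pb = tuple(int(x) for x in b.split("."))
    return pa != pb and min(pa, pb) == pa

print(current)                        # 1.26.0
print(version_lt(current, "1.29.2"))  # True
print(version_lt("1.9.0", "1.10.0"))  # True (a plain string compare gets this wrong)</code></pre>
  </figure>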
<p>The first task saves the output of <code>docker-compose --version</code> into a variable <code>docker_compose_vsn</code>. The <code>failed_when: false</code> ensures that this is not counted as a failure even if the command fails to execute. (See <a href="https://docs.ansible.com/ansible/latest/user_guide/playbooks_error_handling.html">error handling in ansible</a>).</p> <p>Sample output when docker-compose is installed: <code>docker-compose version 1.26.0, build d4451659</code></p> <p>The second task parses this output and extracts the version number using a regex (see <a href="https://docs.ansible.com/ansible/latest/user_guide/playbooks_filters.html">ansible filters</a>). There is a <code>when</code> condition which causes the second task to be skipped if the first one produced no output (See <a href="https://docs.ansible.com/ansible/latest/user_guide/playbooks_conditionals.html">playbook conditionals</a>).</p><h3>Install or upgrade docker-compose if required</h3>
<figure>
  <pre><code class="language-yaml">- name: Install or upgrade docker-compose
  get_url: 
    url: &quot;https://github.com/docker/compose/releases/download/{{ docker_compose_version_v1 }}/docker-compose-Linux-x86_64&quot;
    dest: /usr/local/bin/docker-compose
    mode: &#039;a+x&#039;
    force: yes
  when: &gt;
    docker_compose_current_version == &quot;&quot;
    or docker_compose_current_version is version(docker_compose_version_v1, &#039;&lt;&#039;)</code></pre>
  </figure>
<p>This downloads the required docker-compose binary and saves it to <code>/usr/local/bin/docker-compose</code>, but only if docker-compose is not already installed, or if the installed version is lower than the required one. For the version comparison, it uses ansible's built-in <a href="https://docs.ansible.com/ansible/latest/user_guide/playbooks_tests.html#version-comparison">version comparison function</a>.</p> <p>So,
 we used a few ansible features to achieve what we wanted. I’m sure 
there are a lot of other things we can do to make this even better and 
more fool-proof. Maybe a post for another day.</p>]]></content:encoded>
    <comments>https://srijan.ch/install-docker-and-docker-compose-using-ansible#comments</comments>
    <slash:comments>10</slash:comments>
  </item><item>
    <title>Riemann and Zabbix: Sending data from riemann to zabbix</title>
    <description><![CDATA[Tutorial for sending data from riemann to zabbix]]></description>
    <link>https://srijan.ch/sending-data-from-riemann-to-zabbix</link>
    <guid isPermaLink="false">6030d3dab5e0920001f557d3</guid>
    <category><![CDATA[devops]]></category>
    <category><![CDATA[monitoring]]></category>
    <dc:creator>Srijan Choudhary</dc:creator>
    <pubDate>Fri, 08 Jun 2018 18:55:00 +0000</pubDate>
    <content:encoded><![CDATA[<h3>Background</h3>
<p>At <a href="https://www.greyorange.com/" rel="noreferrer">my work</a>, we use <a href="http://riemann.io/" rel="noreferrer">Riemann</a> and <a href="https://www.zabbix.com/" rel="noreferrer">Zabbix</a> as part of our monitoring stack.</p><p>Riemann is a stream processing engine (written in Clojure) which can be used to monitor distributed systems. Although it can be used for defining alerts and sending notifications for those alerts, we currently use it like this:</p><ol><li>As a receiving point for metrics / data from a group of systems in an installation</li><li>Applying some filtering and aggregation at the installation level.</li><li>Sending the filtered / aggregated data to a central Zabbix system.</li></ol><p>The actual alerting mechanism is handled by Zabbix. Things like trigger definitions, sending notifications, handling acks and escalations, etc.</p><p>This might seem like Riemann is redundant (and there is definitely some overlap in functionality), but keeping Riemann in the data pipeline allows us to be more flexible operationally. This is specially in cases when the metrics data we need is coming from application code, and we need to apply some transformations to the data but cannot update the code.</p><h3>The Problem</h3>
<p>The first problem we faced when trying to do this is: sending data from Riemann to Zabbix is not that straightforward.</p><p>Surprisingly, the <a href="https://www.zabbix.com/documentation/3.4/manual/api" rel="noreferrer">Zabbix API</a> is not actually meant for sending data points to Zabbix - only for managing its configuration and accessing historical data.</p><h3>Solutions</h3>
<p>The recommended way to send data to Zabbix is to use a command line application called <a href="https://www.zabbix.com/documentation/3.4/manpages/zabbix_sender" rel="noreferrer">zabbix_sender</a>.</p><p>Another way would be to write a custom zabbix client in Clojure which follows the <a href="https://www.zabbix.com/documentation/3.4/manual/appendix/items/activepassive" rel="noreferrer">Zabbix Agent protocol</a>, which uses JSON over TCP sockets.</p><p>The current solution we have taken for this is using <code>zabbix_sender</code> itself.</p><p>For this, we write filtered values to a predefined text file from Riemann in a format that <code>zabbix_sender</code> can understand.</p><figure>
  <pre><code class="language-clojure">;; Modified version of:
;; https://github.com/riemann/riemann/blob/68f126ff39819afc3296bb645243f888dab0943e/src/riemann/logging.clj
(defn zabbix-logger-init
  [log_key log_file]
  (let [logger (org.slf4j.LoggerFactory/getLogger log_key)]
    (.detachAndStopAllAppenders logger)
    (riemann.logging/configure-from-opts
     logger
     (org.slf4j.LoggerFactory/getILoggerFactory)
     {:file log_file})
    logger))

(def zabbix-logger
  (io (zabbix-logger-init
       &quot;zabbix&quot; &quot;/var/log/riemann/to_zabbix.txt&quot;)))

(defn zabbix-log-to-file
  &quot;Log to file using `logger`&quot;
  [logger string]
  (.info logger string))

(defn zabbix-sender
  &quot;Sends events to zabbix via log file.
  Assumes that three keys are present in the incoming data:
    :zhost   -&gt; hostname for sending to zabbix
    :zkey    -&gt; item key for zabbix
    :zvalue  -&gt; value to send for the item key
  Requires zabbix_sender service running and tailing the log file&quot;
  [data]
  (io (zabbix-log-to-file
       zabbix-logger (str (:zhost data) &quot; &quot; (:zkey data) &quot; &quot; (:zvalue data)))))

(streams
  (where (tagged &quot;zabbix&quot;)
    (smap
     (fn [event]
       {:zhost  (:host event)
        :zkey   (:service event)
        :zvalue (:value event)})
     zabbix-sender)))</code></pre>
  </figure>
<p>The above code writes data into the file <code>/var/log/riemann/to_zabbix.txt</code> in the following format:</p><figure>
  <pre><code class="language-log">INFO [2018-06-09 05:02:03,600] defaultEventExecutorGroup-2-7 - zabbix - host123 api.req-rate 200</code></pre>
  </figure>
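<p>Everything after the <code>zabbix - </code> marker is the payload that <code>zabbix_sender</code> expects: host, item key, and value, separated by spaces. Extracting it can be sketched as:</p><figure>
  <pre><code class="language-python">line = ("INFO [2018-06-09 05:02:03,600] defaultEventExecutorGroup-2-7"
        " - zabbix - host123 api.req-rate 200")

# Same idea as the grep lookbehind used below: keep only what
# follows the "zabbix - " marker.
payload = line.split("zabbix - ", 1)[1]
host, key, value = payload.split(" ")
print(host, key, value)  # host123 api.req-rate 200</code></pre>
  </figure>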
<p>Then, the following command can be run to send data from this file to Zabbix via <code>zabbix_sender</code>:</p><figure>
  <pre><code class="language-shellsession">$ tail -F /var/log/riemann/to_zabbix.txt | grep --line-buffered -oP &quot;(?&lt;=zabbix - ).*&quot; | zabbix_sender -z $ZABBIX_IP --real-time -i - -vv</code></pre>
  </figure>
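<p>For illustration, the lookbehind in that pipeline can be checked against a sample log line (the host, key, and value here are hypothetical, and GNU grep is assumed for <code>-P</code>):</p>

```shell
# A line as written by the Riemann logger above
line='INFO [2018-06-09 05:02:03,600] defaultEventExecutorGroup-2-7 - zabbix - host123 api.req-rate 200'

# Keep only what follows "zabbix - ": the "host key value" triple
# that zabbix_sender expects on stdin
echo "$line" | grep -oP '(?<=zabbix - ).*'
```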
<h3>Further Thoughts</h3>
<ul><li>Riemann should probably check whether data is actually being delivered to Zabbix, and send out alerts if it isn't.</li><li>The current solution is a little fragile because it first writes the data to a file and depends on an external service to ship the data to Zabbix. A better solution would be to integrate directly as a Zabbix agent.</li></ul>]]></content:encoded>
    <comments>https://srijan.ch/sending-data-from-riemann-to-zabbix#comments</comments>
    <slash:comments>0</slash:comments>
  </item><item>
    <title>My backup strategy to USB disk using duply</title>
    <description><![CDATA[Local system backup using duply]]></description>
    <link>https://srijan.ch/my-backup-strategy-part-1</link>
    <guid isPermaLink="false">6030d3dab5e0920001f557ce</guid>
    <category><![CDATA[devops]]></category>
    <category><![CDATA[linux]]></category>
    <dc:creator>Srijan Choudhary</dc:creator>
    <pubDate>Thu, 04 Aug 2016 17:55:00 +0000</pubDate>
<content:encoded><![CDATA[<p>I don't have a lot of data to back up - just my home folder (on my 
Arch Linux laptop), which mostly contains configuration for the tools I'm 
using and my programming work.</p> <p>For photos or videos taken from my phone, I use Google Photos for 
backup - which works pretty well. Even if I delete the original files 
from my phone, the photos app still keeps them online.</p> <p>Coming back to my laptop, I'm currently using <a href="http://duplicity.nongnu.org/">duplicity</a> (with the <a href="http://duply.net/">duply</a>
wrapper) to back up to multiple destinations. Why multiple destinations? I 
wanted one local copy so that I can restore fast, and at least one at a 
remote location so that I can still restore if the local disk fails.</p> <p>For off-site, I'm using the fantastic <a href="http://www.rsync.net/">rsync.net</a> service. For local backups, I'm using two destinations: a USB HDD at my home, and an NFS server at my work. <strong>Depending on where I am, the backup will be done to the correct destination</strong>.</p> <p>This post will deal with the backups to my local USB disk.</p> <p>Here's what I've been able to achieve: the backups will run every 
hour as long as the USB disk is connected. If it is not connected, the 
backup script will not even be triggered. I did not want to see backup 
failures in my logs if the HDD is not connected.</p> <p>I've done this using a systemd timer and service. I've defined these units in <a href="https://wiki.archlinux.org/index.php/Systemd/User">the user-level part for systemd</a> so that root privileges are not required.</p><h3>Mounting the USB Disk</h3>
<p>To automatically mount the USB disk, I've added the following line to my <code>/etc/fstab</code>:</p><figure>
  <pre><code class="language-ini">UUID=27DFA4B43C8C0635 /mnt/Ext01 ntfs-3g nosuid,nodev,nofail,auto,x-gvfs-show,permissions 0 0</code></pre>
  </figure>
<h3>Duply config for running the backup</h3>
<p>Here's my <strong>duply</strong> config file (kept at <code>~/.duply/ext01/conf</code>) (mostly self-explanatory):</p><figure>
  <pre><code class="language-ini">TARGET=&#039;file:///mnt/Ext01/Backups/&#039;
SOURCE=&#039;/home/srijan&#039;
MAX_AGE=1Y
MAX_FULL_BACKUPS=15
MAX_FULLS_WITH_INCRS=2
MAX_FULLBKP_AGE=1M
DUPL_PARAMS=&quot;$DUPL_PARAMS --full-if-older-than $MAX_FULLBKP_AGE &quot;
VOLSIZE=4
DUPL_PARAMS=&quot;$DUPL_PARAMS --volsize $VOLSIZE &quot;
DUPL_PARAMS=&quot;$DUPL_PARAMS --exclude-other-filesystems &quot;</code></pre>
  </figure>
<p>This can be run manually using:</p><figure>
  <pre><code class="language-shellsession">$ duply ext01 backup</code></pre>
  </figure>
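<p>Since a duply profile is just sourced shell, the way <code>DUPL_PARAMS</code> accumulates flags in the conf above can be sanity-checked on its own (a sketch; the variable starts out empty, as it does in a fresh profile):</p>

```shell
# Simulate how the profile builds up duplicity's extra flags
MAX_FULLBKP_AGE=1M
DUPL_PARAMS="$DUPL_PARAMS --full-if-older-than $MAX_FULLBKP_AGE "
VOLSIZE=4
DUPL_PARAMS="$DUPL_PARAMS --volsize $VOLSIZE "
DUPL_PARAMS="$DUPL_PARAMS --exclude-other-filesystems "

# These are the flags duply will pass through to duplicity
echo "$DUPL_PARAMS"
```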
<p>Exclusions can be specified in the file <code>~/.duply/ext01/exclude</code> in a glob-like format.</p><h3>Systemd Service for running the backup</h3>
<p>Next, here's the <strong>service file</strong> (kept at <code>~/.config/systemd/user/duply_ext01.service</code>):</p><figure>
  <pre><code class="language-ini">[Unit]
Description=Run backup using duply: ext01 profile
Requires=mnt-Ext01.mount
After=mnt-Ext01.mount

[Service]
Type=oneshot
ExecStart=/usr/bin/duply ext01 backup</code></pre>
  </figure>
<p>The <code>Requires</code> option says that this unit has a dependency on the mounting of Ext01. The <code>After</code> option specifies the order in which these two should be started (run this service <em>after</em> mounting).</p> <p>After this step, the service can be run manually (via systemd) using:</p><figure>
  <pre><code class="language-shellsession">$ systemctl --user start duply_ext01.service</code></pre>
  </figure>
<h3>Systemd timer for triggering the backup service</h3>
<p>Next step is triggering it automatically every hour. Here's the <strong>timer file</strong> (kept at <code>~/.config/systemd/user/duply_ext01.timer</code>):</p><figure>
  <pre><code class="language-ini">[Unit]
Description=Run backup using duply ext01 profile every hour
BindsTo=mnt-Ext01.mount
After=mnt-Ext01.mount

[Timer]
OnCalendar=hourly
AccuracySec=10m
Persistent=true

[Install]
WantedBy=mnt-Ext01.mount</code></pre>
  </figure>
<p>Here, the <code>BindsTo</code> option defines a dependency similar to the <code>Requires</code>
 option above, but also declares that this unit is stopped when the 
mount point goes away for any reason. This is because I don't want 
the trigger to fire if the HDD is not connected.</p> <p>The <code>Persistent=true</code> option ensures that when the timer 
is activated, the service unit is triggered immediately if it would have
 been triggered at least once during the time when the timer was 
inactive. This is because I want to catch up on missed runs of the 
service when the disk was disconnected.</p> <p>After creating this file, I ran the following to actually link this timer to mount / unmount events for the Ext01 disk:</p><figure>
  <pre><code class="language-shellsession">$ systemctl --user enable duply_ext01.timer</code></pre>
  </figure>
<p>That's it. Now, whenever I connect the USB disk to my laptop, the 
timer is started. This timer triggers the backup service to run every 
hour. Also, it takes care that if some run was missed when the disk was 
disconnected, then it would be triggered as soon as the disk is 
connected without waiting for the next hour mark. Pretty cool!</p><h4>NOTES:</h4>
<ul><li>Changing any systemd unit file requires a <code>systemctl --user daemon-reload</code> before systemd can recognize the changes.</li><li>The <a href="https://www.freedesktop.org/software/systemd/man/index.html">systemd documentation</a> was very helpful.</li></ul><h3>Coming Soon</h3>
<p>Although it will be similar, I'll also document how to do the 
above with NFS or SSHFS filesystems (instead of local disks). The major 
difference would be handling loss of internet connectivity, timeouts, 
etc.</p>]]></content:encoded>
    <comments>https://srijan.ch/my-backup-strategy-part-1#comments</comments>
    <slash:comments>0</slash:comments>
  </item><item>
    <title>PostgreSQL replication using Bucardo</title>
    <description><![CDATA[Keeping a live replica of selected PostgreSQL tables using Bucardo]]></description>
    <link>https://srijan.ch/postgresql-replication-using-bucardo</link>
    <guid isPermaLink="false">6030d3dab5e0920001f557cf</guid>
    <category><![CDATA[devops]]></category>
    <category><![CDATA[postgresql]]></category>
    <dc:creator>Srijan Choudhary</dc:creator>
    <pubDate>Tue, 15 Sep 2015 18:05:00 +0000</pubDate>
    <media:content url="https://srijan.ch/media/pages/blog/postgresql-replication-using-bucardo/71791f08a7-1699621096/photo-1551356277-dbb545a2d493.jpg" medium="image" />
    <content:encoded><![CDATA[<figure data-ratio="auto">
    <img src="https://srijan.ch/media/pages/blog/postgresql-replication-using-bucardo/71791f08a7-1699621096/photo-1551356277-dbb545a2d493.jpg" alt="PostgreSQL Replication using Bucardo">
  
  </figure>
<p>There are many different ways to use replication in PostgreSQL, whether for high<br />
availability (using a failover), or load balancing (for scaling), or just for<br />
keeping a backup. Among the various tools I found online, I thought Bucardo was<br />
the best for my use case - keeping a live backup of a few important tables.</p>
<p>I've assumed the following databases:</p>
<ul>
<li>Primary: Hostname = <code>host_a</code>, Database = <code>btest</code></li>
<li>Backup: Hostname = <code>host_b</code>, Database = <code>btest</code></li>
</ul>
<p>We will install Bucardo on the primary database host (it requires its own database<br />
to keep track of things).</p>
<ol>
<li>
<p>Install postgresql</p>
<pre><code class="language-shell-session"> sudo apt-get install postgresql-9.4</code></pre>
</li>
<li>
<p>Install dependencies on <code>host_a</code></p>
<pre><code class="language-shell-session"> sudo apt-get install libdbix-safe-perl libdbd-pg-perl libboolean-perl build-essential postgresql-plperl-9.4</code></pre>
</li>
<li>
<p>On <code>host_a</code>, Download and extract bucardo source</p>
<pre><code class="language-shell-session"> wget https://github.com/bucardo/bucardo/archive/5.4.0.tar.gz
 tar xvfz 5.4.0.tar.gz</code></pre>
</li>
<li>
<p>On <code>host_a</code>, Build and Install</p>
<pre><code class="language-shell-session"> perl Makefile.PL
 make
 sudo make install
 sudo mkdir /var/run/bucardo
 sudo mkdir /var/log/bucardo</code></pre>
</li>
<li>
<p>Create bucardo user on all hosts</p>
<pre><code class="language-sql"> CREATE USER bucardo SUPERUSER PASSWORD 'random_password';
 CREATE DATABASE bucardo;
 GRANT ALL ON DATABASE bucardo TO bucardo;</code></pre>
<p>Note: All commands from now on are to be run on <code>host_a</code> only.</p>
</li>
<li>
<p>On <code>host_a</code>, set a password for the <code>postgres</code> user:</p>
<pre><code class="language-sql"> ALTER USER postgres PASSWORD 'random_password';</code></pre>
</li>
<li>
<p>On <code>host_a</code>, add this to the installation user's <code>~/.pgpass</code> file:</p>
<pre><code class="language-ini"> host_a:5432:*:postgres:random_password
 host_a:5432:*:bucardo:random_password</code></pre>
<p>Also add entries for the other hosts for which users were created in step 5.</p>
<p>Note: It is also a good idea to chmod the <code>~/.pgpass</code> file to <code>0600</code>.</p>
</li>
<li>
<p>Run the bucardo install command:</p>
<pre><code class="language-shell-session"> bucardo -h host_a install</code></pre>
</li>
<li>
<p>Copy schema from A to B:</p>
<pre><code class="language-shell-session"> psql -h host_b -U bucardo template1 -c "drop database if exists btest;"
 psql -h host_b -U bucardo template1 -c "create database btest;"
 pg_dump -U bucardo --schema-only -h host_a btest | psql -U bucardo -h host_b btest</code></pre>
</li>
<li>
<p>Add databases to bucardo config</p>
<pre><code class="language-shell-session"> bucardo -h host_a -U bucardo add db main db=btest user=bucardo pass=host_a_pass host=host_a
 bucardo -h host_a -U bucardo add db bak1 db=btest user=bucardo pass=host_b_pass host=host_b</code></pre>
<p>This will save database details (host, port, user, password) to bucardo<br />
database.</p>
</li>
<li>
<p>Add tables to be synced</p>
<p>To add all tables:</p>
<pre><code class="language-shell-session"> bucardo -h host_a -U bucardo add all tables db=main relgroup=btest_relgroup</code></pre>
<p>To add one table:</p>
<pre><code class="language-shell-session"> bucardo -h host_a -U bucardo add table table_name db=main relgroup=btest_relgroup</code></pre>
<p>Note: Only tables which have a primary key can be added here. This is a<br />
limitation of bucardo.</p>
</li>
<li>
<p>Add db group</p>
<pre><code class="language-shell-session"> bucardo -h host_a -U bucardo add dbgroup btest_dbgroup main:source bak1:target</code></pre>
</li>
<li>
<p>Create sync</p>
<pre><code class="language-shell-session"> bucardo -h host_a -U bucardo add sync btest_sync dbgroup=btest_dbgroup relgroup=btest_relgroup conflict_strategy=bucardo_source onetimecopy=2 autokick=0</code></pre>
</li>
<li>
<p>Start the bucardo service</p>
<pre><code class="language-shell-session"> sudo bucardo -h host_a -U bucardo -P random_password start</code></pre>
<p>Note that this command requires passing the password because it uses sudo,<br />
and the root user's <code>.pgpass</code> file does not have credentials saved for the bucardo<br />
user.</p>
</li>
<li>
<p>Run sync once</p>
<pre><code class="language-shell-session"> bucardo -h host_a -U bucardo kick btest_sync 0</code></pre>
</li>
<li>
<p>Set auto-kick on any changes</p>
<pre><code class="language-shell-session"> bucardo -h host_a -U bucardo update sync btest_sync autokick=1
 bucardo -h host_a -U bucardo reload config</code></pre>
</li>
</ol>
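<p>The <code>chmod 0600</code> note in step 7 matters because libpq ignores a <code>.pgpass</code> file that is group- or world-readable. A quick sketch of the idea, using a throwaway file and GNU <code>stat</code> (hostnames and passwords are placeholders):</p>

```shell
# Create a throwaway .pgpass-style file
pgpass=$(mktemp)
cat > "$pgpass" <<'EOF'
host_a:5432:*:postgres:random_password
host_a:5432:*:bucardo:random_password
EOF

# Lock it down to owner read/write only, as libpq requires
chmod 0600 "$pgpass"
stat -c '%a' "$pgpass"
```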
<p>That's it. Now, the tables specified in step 11 will be replicated from <code>host_a</code><br />
to <code>host_b</code>.</p>
<p>I also plan to write soon about the other alternatives I've tried.</p>]]></content:encoded>
    <comments>https://srijan.ch/postgresql-replication-using-bucardo#comments</comments>
    <slash:comments>6</slash:comments>
  </item><item>
<title>Django, uWSGI, Nginx on FreeBSD</title>
    <description><![CDATA[Setting up Django on FreeBSD using uWSGI and Nginx]]></description>
    <link>https://srijan.ch/django-uwsgi-nginx-on-freebsd</link>
    <guid isPermaLink="false">6030d3dab5e0920001f557cb</guid>
    <category><![CDATA[devops]]></category>
    <dc:creator>Srijan Choudhary</dc:creator>
    <pubDate>Thu, 05 Mar 2015 00:00:00 +0000</pubDate>
<content:encoded><![CDATA[<p>Here are the steps I took for configuring Django on FreeBSD using uWSGI and Nginx.</p> <p>The data flow is like this:</p> <p>Web Request ---&gt; Nginx ---&gt; uWSGI ---&gt; Django</p> <p>I was undecided for a while on whether to choose uWSGI or gunicorn. There are <a href="http://cramer.io/2013/06/27/serving-python-web-applications/">some</a> <a href="http://mattseymour.net/blog/2014/07/uwsgi-or-gunicorn/">blog</a> <a href="http://blog.kgriffs.com/2012/12/18/uwsgi-vs-gunicorn-vs-node-benchmarks.html">posts</a> discussing the pros and cons of each. I chose uWSGI in the end.</p> <p>Also, to start uWSGI on FreeBSD, I found two methods: using <a href="http://amix.dk/blog/post/19689">supervisord</a>, or using a <a href="http://lists.freebsd.org/pipermail/freebsd-questions/2014-February/256073.html">custom FreeBSD init script</a> which could use uWSGI ini files. I'm currently using supervisord.</p><h2>Install Packages Required</h2>
<figure>
  <pre><code class="language-shellsession">$ sudo pkg install python py27-virtualenv nginx uwsgi py27-supervisor</code></pre>
  </figure>
<p>Also install any database package(s) required.</p><h2>Setup your Django project</h2>
<p>Choose a folder for setting up your Django project sources. <code>/usr/local/www/myapp</code> is suggested. Clone the sources to this folder, then setup the python virtual environment.</p><figure>
  <pre><code class="language-shellsession">$ virtualenv venv
$ source venv/bin/activate
$ pip install -r requirements.txt</code></pre>
  </figure>
<p>If required, also setup the database and run the migrations.</p><h2>Setup uWSGI using supervisord</h2>
<p>Setup the supervisord file at <code>/usr/local/etc/supervisord.conf</code>.</p> <p>Sample supervisord.conf:</p><figure>
  <pre><code class="language-ini">[unix_http_server]
file=/var/run/supervisor/supervisor.sock   

[supervisord]
logfile=/var/log/supervisord.log ; (main log file;default $CWD/supervisord.log)
logfile_maxbytes=50MB       ; (max main logfile bytes b4 rotation;default 50MB)
logfile_backups=10          ; (num of main logfile rotation backups;default 10)
loglevel=info               ; (log level;default info; others: debug,warn,trace)
pidfile=/var/run/supervisor/supervisord.pid ; (supervisord pidfile;default supervisord.pid)
nodaemon=false              ; (start in foreground if true;default false)
minfds=1024                 ; (min. avail startup file descriptors;default 1024)
minprocs=200                ; (min. avail process descriptors;default 200)

[rpcinterface:supervisor]
supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface

[supervisorctl]
serverurl=unix:///var/run/supervisor/supervisor.sock
history_file=~/.sc_history  ; use readline history if available

[program:uwsgi_myapp]
directory=/usr/local/www/myapp/
command=/usr/local/bin/uwsgi -s /var/run/%(program_name)s%(process_num)d.sock
        --chmod-socket=666 --need-app --disable-logging --home=venv
        --wsgi-file wsgi.py --processes 1 --threads 10
stdout_logfile=syslog
stderr_logfile=syslog
startsecs=10
stopsignal=QUIT
stopasgroup=true
killasgroup=true
process_name=%(program_name)s%(process_num)d
numprocs=5</code></pre>
  </figure>
<p>supervisord.conf</p> <p>And start it:</p><figure>
  <pre><code class="language-shellsession">$ echo supervisord_enable=&quot;YES&quot; &gt;&gt; /etc/rc.conf
$ sudo service supervisord start
$ sudo supervisorctl tail -f uwsgi_myapp:uwsgi_myapp0</code></pre>
  </figure>
<h2>Setup Nginx</h2>
<p>Use the following line in <code>nginx.conf</code>'s http section to include all config files from <code>conf.d</code> folder.</p><figure>
  <pre><code class="language-nginx">include /usr/local/etc/nginx/conf.d/*.conf;</code></pre>
  </figure>
<p>Create a <code>myapp.conf</code> in <code>conf.d</code>.</p> <p>Sample myapp.conf:</p><figure>
  <pre><code class="language-nginx">upstream myapp {
    least_conn;
    server unix:///var/run/uwsgi_myapp0.sock;
    server unix:///var/run/uwsgi_myapp1.sock;
    server unix:///var/run/uwsgi_myapp2.sock;
    server unix:///var/run/uwsgi_myapp3.sock;
    server unix:///var/run/uwsgi_myapp4.sock;
}

server {
    listen       80;
    server_name  myapp.example.com;
 
    location /static {
        alias /usr/local/www/myapp/static;
    }

    location / {
        uwsgi_pass  myapp;
        include uwsgi_params;
    }
}</code></pre>
  </figure>
<p>myapp.conf</p> <p>And start Nginx:</p><figure>
  <pre><code class="language-shellsession">$ echo nginx_enable=&quot;YES&quot; &gt;&gt; /etc/rc.conf
$ sudo service nginx start
$ sudo tail -f /var/log/nginx-error.log</code></pre>
  </figure>
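<p>The five <code>upstream</code> sockets in <code>myapp.conf</code> must line up with the names supervisord generates from <code>%(program_name)s%(process_num)d</code> with <code>numprocs=5</code>. A quick sketch of the expected names:</p>

```shell
# Socket paths produced by the supervisord config above
# (program_name=uwsgi_myapp, process_num 0..4)
for i in 0 1 2 3 4; do
  echo "/var/run/uwsgi_myapp${i}.sock"
done
```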
<p>Accessing <a href="http://myapp.example.com/">http://myapp.example.com/</a> should work correctly after this. If not, check the supervisord and Nginx logs opened above and correct any errors.</p>]]></content:encoded>
    <comments>https://srijan.ch/django-uwsgi-nginx-on-freebsd#comments</comments>
    <slash:comments>5</slash:comments>
  </item><item>
    <title>Read only root on Linux</title>
    <description><![CDATA[Setting up a read-only root filesystem on Linux]]></description>
    <link>https://srijan.ch/read-only-root-on-linux</link>
    <guid isPermaLink="false">6030d3dab5e0920001f557d2</guid>
    <category><![CDATA[devops]]></category>
    <category><![CDATA[linux]]></category>
    <dc:creator>Srijan Choudhary</dc:creator>
    <pubDate>Sat, 28 Feb 2015 00:00:00 +0000</pubDate>
    <content:encoded><![CDATA[<p>In many cases, it is required to run a system in such a way that it 
is tolerant of uncontrolled power losses, resets, etc. After such an 
event occurs, it should at least be able to boot up and connect to the 
network so that some action can be taken remotely.</p> <p>There are a few different ways in which this could be accomplished.</p><h3>Mounting the root filesystem with read-only flags</h3>
<p>Most parts of the linux root filesystem can be mounted read-only 
without many problems, but some parts don't play well. <a href="https://wiki.debian.org/ReadonlyRoot">This Debian wiki page</a> has some information about this approach. I thought this approach would not be very stable, so I did not try it out completely.</p><h3>Using aufs/overlayfs</h3>
<p>aufs is a union file system for linux systems, which enables us to 
mount separate filesystems as layers to form a single merged filesystem.
 Using aufs, we can mount the root file system as read-only, create a 
writable tmpfs ramdisk, and combine these so that the system thinks that
 the root filesystem is writable, but changes are not actually saved, 
and don't survive a reboot.</p> <p>I found this method to be most suitable and stable for my task, and 
have been using this for the last 6 months. This system mounts the real 
filesystem at mountpoint <code>/ro</code> with the read-only flag, creates a writable ramdisk at mountpoint <code>/rw</code>, and makes a union filesystem using these two at mountpoint <code>/</code>.</p> <p>The steps I followed for my implementation are detailed below. These are just a modified version of the steps in <a href="https://help.ubuntu.com/community/aufsRootFileSystemOnUsbFlash">this Ubuntu wiki page</a>. I am using Debian in my implementation.</p><ol><li><p>Install Debian using a live CD or your preferred method.</p></li><li><p>After first boot, upgrade and configure the system as needed.</p></li><li><p>Install <code>aufs-tools</code>.</p></li><li><p>Add aufs to initramfs and set up <a href="https://gist.github.com/srijan/383a8d7af6860de6f9de">this script</a> to run at init.</p></li></ol><figure>
  <pre><code class="language-shellsession"># echo aufs &gt;&gt; /etc/initramfs-tools/modules
# wget https://cdn.rawgit.com/srijan/383a8d7af6860de6f9de/raw/ -O /etc/initramfs-tools/scripts/init-bottom/__rootaufs
# chmod 0755 /etc/initramfs-tools/scripts/init-bottom/__rootaufs</code></pre>
  </figure>
<ol start="5"><li>Remake the initramfs.</li></ol><figure>
  <pre><code class="language-shellsession"># update-initramfs -u</code></pre>
  </figure>
<ol start="6"><li>Edit grub settings in <code>/etc/default/grub</code> and add <code>aufs=tmpfs</code> to <code>GRUB_CMDLINE_LINUX_DEFAULT</code>, and regenerate grub.</li></ol><figure>
  <pre><code class="language-shellsession"># update-grub</code></pre>
  </figure>
<ol start="7"><li>Reboot.</li></ol><h4>Making changes</h4>
<p>To change something trivial (like a file edit), just remount the <code>/ro</code> mountpoint as read-write, edit the file, and reboot.</p><figure>
  <pre><code class="language-shellsession"># mount -o remount,rw /ro</code></pre>
  </figure>
<p>To do something more complicated (like install os packages), press <code>e</code> in grub menu during bootup, remove <code>aufs=tmpfs</code> from the kernel line, and boot using <code>F10</code>. The system will boot up normally once.</p> <p>Another method could be to use a configuration management tool 
(puppet, chef, ansible, etc.) to make the required changes whenever the 
system comes online. The changes would be lost on reboot, but it would 
become much easier to manage multiple such systems.</p> <p>Also, if some part of the system is required to be writable (like <code>/var/log</code>), that directory could be mounted separately as a read-write mountpoint.</p>]]></content:encoded>
    <comments>https://srijan.ch/read-only-root-on-linux#comments</comments>
    <slash:comments>1</slash:comments>
  </item></channel>
</rss>
