<?xml version="1.0" encoding="utf-8"?><rss version="2.0"
    xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:wfw="http://wellformedweb.org/CommentAPI/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:atom="http://www.w3.org/2005/Atom"
    xmlns:media="http://search.yahoo.com/mrss/"
    xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
    xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
>
<channel>
  <title>Srijan Choudhary, all posts tagged: devops</title>
  <link>https://srijan.ch/feed/all/tag:devops</link>
  <lastBuildDate>Mon, 29 Apr 2024 03:10:00 +0000</lastBuildDate>
  <image>
    <url>https://srijan.ch/assets/favicon/favicon-32x32.png</url>
    <title>Srijan Choudhary, all posts tagged: devops</title>
    <link>https://srijan.ch/feed/all/tag:devops</link>
  </image>
  <sy:updatePeriod>daily</sy:updatePeriod>
  <sy:updateFrequency>1</sy:updateFrequency>
  <generator>Kirby</generator>
  <atom:link href="https://srijan.ch/feed/all.xml/tag:devops" rel="self" type="application/rss+xml" />
  <description>Srijan Choudhary&#039;s Articles and Notes Feed for tag: devops</description>
  <item>
    <title>2024-04-29-001</title>
    <description><![CDATA[Using sysrq on my laptop - documenting mostly for myself. My laptop has started freezing sometimes, not sure why. Usually, I can just force power off using the power button and start it again, but it has happened twice that I had to recover the system by booting via a USB drive, chrooting, and recovering the damaged files using fsck or pacman magic. The linux kernel has: a ‘magical’ key combo you …]]></description>
    <link>https://srijan.ch/notes/2024-04-29-001</link>
    <guid isPermaLink="false">tag:srijan.ch:/notes/2024-04-29-001</guid>
    <category><![CDATA[linux]]></category>
    <category><![CDATA[devops]]></category>
    <dc:creator>Srijan Choudhary</dc:creator>
    <pubDate>Mon, 29 Apr 2024 03:10:00 +0000</pubDate>
    <content:encoded><![CDATA[<p>Using sysrq on my laptop - documenting mostly for myself.</p>
<p>My laptop has started freezing sometimes, not sure why. Usually, I can just force power off using the power button and start it again, but it has happened twice that I had to recover the system by booting via a USB drive, chrooting, and recovering the damaged files using fsck or pacman magic.</p>
<p>The Linux kernel has:</p>
<blockquote>
<p>a ‘magical’ key combo you can hit which the kernel will respond to regardless of whatever else it is doing, unless it is completely locked up.</p>
</blockquote>
<p>(More details on <a href="https://wiki.archlinux.org/title/keyboard_shortcuts#Kernel_(SysRq)">archwiki</a> and <a href="https://docs.kernel.org/admin-guide/sysrq.html">kernel doc</a>)</p>
<p>To enable it, I ran:</p>
<pre><code>echo "kernel.sysrq = 244" | sudo tee /etc/sysctl.d/sysreq.conf
sudo sysctl --system</code></pre>
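<p>The value <code>244</code> is not arbitrary - it is a bitmask enabling only the SysRq functions needed here (breakdown per the kernel docs linked above):</p>
<pre><code># Why 244? It is a bitmask of allowed SysRq functions:
#   4   - keyboard control (unraw, SAK)   -> the 'r' key
#   16  - sync command                    -> the 's' key
#   32  - remount read-only               -> the 'u' key
#   64  - signalling of processes         -> the 'e' and 'i' keys
#   128 - reboot/poweroff                 -> the 'b' key
echo $((4 + 16 + 32 + 64 + 128))
# 244</code></pre>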
<p>However, I was not able to find the right key combination to trigger SysRq on my laptop. I was able to make it work using an external keyboard that has a PrintScreen binding on a layer, as follows:</p>
<p>Press Alt and keep it pressed for the whole sequence: PrintScreen - R - E - I - S - U - B</p>
<p>Currently, PrintScreen on my external keyboard is bound to Caps lock long press + Up arrow.</p>]]></content:encoded>
    <comments>https://srijan.ch/notes/2024-04-29-001#comments</comments>
    <slash:comments>3</slash:comments>
  </item><item>
    <title>Testing ansible playbooks against multiple targets using vagrant</title>
    <description><![CDATA[How to test your ansible playbooks against multiple target OSes and versions using Vagrant]]></description>
    <link>https://srijan.ch/testing-ansible-playbooks-using-vagrant</link>
    <guid isPermaLink="false">tag:srijan.ch:/testing-ansible-playbooks-using-vagrant</guid>
    <category><![CDATA[ansible]]></category>
    <category><![CDATA[vagrant]]></category>
    <category><![CDATA[devops]]></category>
    <dc:creator>Srijan Choudhary</dc:creator>
    <pubDate>Tue, 21 Nov 2023 06:55:00 +0000</pubDate>
    <media:content url="https://srijan.ch/media/pages/blog/testing-ansible-playbooks-using-vagrant/9f989c7a78-1700550017/kvistholt-photography-ozpwn40zck4-unsplash.jpg" medium="image" />
    <content:encoded><![CDATA[<figure data-ratio="auto">
    <img src="https://srijan.ch/media/pages/blog/testing-ansible-playbooks-using-vagrant/9f989c7a78-1700550017/kvistholt-photography-ozpwn40zck4-unsplash.jpg" alt="">
  
  </figure>
<p>I recently updated my <a href="https://srijan.ch/install-docker-and-docker-compose-using-ansible">Install docker and docker-compose using ansible</a> post and wanted to test it against multiple target OSes and OS versions. Here's a way I found to do it easily using Vagrant.</p>
<p>Here's the Vagrantfile:</p>
<pre><code class="language-ruby"># -*- mode: ruby -*-
# vi: set ft=ruby :

targets = [
  "debian/bookworm64",
  "debian/bullseye64",
  "debian/buster64",
  "ubuntu/jammy64",
  "ubuntu/bionic64",
  "ubuntu/focal64"
]

Vagrant.configure("2") do |config|
  targets.each_with_index do |target, index|
    config.vm.define "machine#{index}" do |machine|
      machine.vm.hostname = "machine#{index}"
      machine.vm.box = target
      machine.vm.synced_folder ".", "/vagrant", disabled: true

      if index == targets.count - 1
        machine.vm.provision "ansible" do |ansible|
          ansible.playbook = "playbook.yml"
          ansible.limit = "all"
          ansible.compatibility_mode = "2.0"
          # ansible.verbose = "v"
        end
      end
    end
  end
end</code></pre>
<p>The <code>targets</code> variable defines which Vagrant boxes to target. The list of available boxes can be found here: <a href="https://app.vagrantup.com/boxes/search">https://app.vagrantup.com/boxes/search</a></p>
<p>In the <code>Vagrant.configure</code> section, I've defined a machine with an auto-generated name (<code>machine0</code>, <code>machine1</code>, and so on) for each target.</p>
<p>The <code>machine.vm.synced_folder</code> line disables the default vagrant share to keep things fast.</p>
<p>Then, I've run the ansible provisioning once at the end instead of for each box separately (from: <a href="https://developer.hashicorp.com/vagrant/docs/provisioning/ansible#tips-and-tricks">https://developer.hashicorp.com/vagrant/docs/provisioning/ansible#tips-and-tricks</a>).</p>
<p>The test can be run using:</p>
<pre><code class="language-shell-session">$ vagrant up</code></pre>
<p>If the boxes are already up, to re-run provisioning, run:</p>
<pre><code class="language-shell-session">$ vagrant provision</code></pre>
<p>This code can also be found on GitHub: <a href="https://github.com/srijan/ansible-install-docker">https://github.com/srijan/ansible-install-docker</a></p>]]></content:encoded>
    <comments>https://srijan.ch/testing-ansible-playbooks-using-vagrant#comments</comments>
    <slash:comments>2</slash:comments>
  </item><item>
    <title>Exploring conflicting oneshot services in systemd</title>
    <description><![CDATA[Exploring ways to make two systemd services using a shared resource work with each other]]></description>
    <link>https://srijan.ch/exploring-conflicting-oneshot-services-in-systemd</link>
    <guid isPermaLink="false">64807b30f6b0810001fa0d01</guid>
    <category><![CDATA[linux]]></category>
    <category><![CDATA[devops]]></category>
    <category><![CDATA[emacs]]></category>
    <category><![CDATA[systemd]]></category>
    <dc:creator>Srijan Choudhary</dc:creator>
    <pubDate>Thu, 08 Jun 2023 19:20:00 +0000</pubDate>
    <media:content url="https://srijan.ch/media/pages/blog/exploring-conflicting-oneshot-services-in-systemd/0c15993753-1699621096/systemd-conflicts-01.png" medium="image" />
    <content:encoded><![CDATA[<figure data-ratio="auto">
    <img src="https://srijan.ch/media/pages/blog/exploring-conflicting-oneshot-services-in-systemd/0c15993753-1699621096/systemd-conflicts-01.png" alt="Exploring conflicting oneshot services in systemd">
  
    <figcaption class="text-center">
    Midjourney: two systemd services fighting over who will start first  </figcaption>
  </figure>
<h2>Background</h2>
<p>I use <a href="https://isync.sourceforge.io/mbsync.html" rel="noreferrer">mbsync</a> to sync my mailbox from my online provider (<a href="https://ref.fm/u12054901" rel="noreferrer">FastMail</a> - referral link) to my local system to eventually use with <a href="https://djcbsoftware.nl/code/mu/mu4e.html" rel="noreferrer">mu4e</a> (on Emacs).</p> <p>For periodic sync, I have a systemd service file called <code>mbsync.service</code> defining a oneshot service and a timer file called <code>mbsync.timer</code> that runs this service periodically. I can also activate the same service using a keybinding from inside mu4e.</p><figure>
  <pre><code class="language-ini">[Unit]
Description=Mailbox synchronization service
After=network-online.target

[Service]
Type=oneshot
ExecStart=/usr/bin/mbsync fastmail-all
ExecStartPost=bash -c &quot;emacsclient -s srijan -n -e &#039;(mu4e-update-index)&#039; || mu index&quot;

[Install]
WantedBy=default.target</code></pre>
    <figcaption class="text-center">mbsync.service</figcaption>
  </figure>
<figure>
  <pre><code class="language-ini">[Unit]
Description=Mailbox synchronization timer
BindsTo=graphical-session.target
After=graphical-session.target

[Timer]
OnBootSec=2m
OnUnitActiveSec=5m
Unit=mbsync.service

[Install]
WantedBy=graphical-session.target</code></pre>
    <figcaption class="text-center">mbsync.timer</figcaption>
  </figure>
<p>Also, for instant download of new mail, I have another service called <a href="https://gitlab.com/shackra/goimapnotify" rel="noreferrer">goimapnotify</a> configured that listens for new/updated/deleted messages on the remote mailbox using IMAP IDLE, and calls the above <code>mbsync.service</code> when there are changes.</p><p>This has worked well for me for several years.</p><h2>The Problem</h2>
<p>I
 recently split my (huge) archive folder into yearly archives so that I 
can keep/sync only the recent years on my phone. [ Aside: <a href="https://fedi.srijan.dev/notice/AVGV5TuD1cOEWQ8iQa" rel="noreferrer">yearly refile in mu4e snippet</a>
 ]. This led to an increase in the number of folders that mbsync has to sync, which increased the total sync time because mbsync syncs the folders one by one.</p> <p>mbsync does support syncing only a subset of folders, so I created a second systemd service called <code>mbsync-quick.service</code>
 and only synced my Inbox from this service. Then I updated the 
goimapnotify config to trigger this quick service instead of the full 
service when it detects changes.</p> <p>But, this caused a problem: these
 two services can run at the same time, and hence can cause corruption 
or sync conflicts in the mail files. So, I wanted a way to make sure 
that these two services don't run at the same time.</p> <p>Ideally, whenever either of these services is triggered while the other is already running, it should wait for the other service to stop before starting, essentially forming a queue.</p><h2>Solution 1: Using systemd features</h2>
<p>Systemd has a <a href="https://www.freedesktop.org/software/systemd/man/systemd.unit.html#Conflicts=" rel="noreferrer">way to specify conflicts</a> in the unit section. From the docs:</p><blockquote>
  If a unit has a <code>Conflicts=</code> setting on another unit, starting the former will stop the latter and vice versa.<br>[...] to ensure that the conflicting unit is stopped before the other unit is started, an <code>After=</code> or <code>Before=</code> dependency must be declared.  </blockquote>
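<p>For illustration, one side of such a mutual conflict might look like this in <code>mbsync-quick.service</code> (a hypothetical fragment; the other service would mirror it):</p><figure>
  <pre><code class="language-ini">[Unit]
# Hypothetical: starting the quick sync stops the full sync, and After=
# ensures the full sync is stopped before the quick sync starts
Conflicts=mbsync.service
After=mbsync.service</code></pre>
    <figcaption class="text-center">mbsync-quick.service (fragment)</figcaption>
  </figure>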
<p>This is different from our requirement that the conflicting service should be allowed to finish before the triggered service starts, but it may be a good enough way to at least prevent both from running at the same time.</p> <p>To test this, I added <code>Conflicts=</code> in both services, each naming the other as the conflicting unit, and it works. The only problem is that when a service is triggered, the
 other service is <code>SIGTERM</code>ed. This by itself might not cause corruption, but if it happens to the mbsync-quick service, there might be a delay in getting the mail.</p> <p>This is the best way I found that uses built-in systemd features without any workarounds or hacks. The other solutions below all involve some workarounds.</p><h2>Solution 2: Conflict + stop after sync complete</h2>
<p>This is a variation on solution 1 - add a wrapper script that traps the SIGTERM and exits only when the sync is complete. This also worked.</p> <p>But the drawback of this method is that anything calling stop on these services (like the system shutting down) will have to wait for the sync to finish (or until the 90s timeout). This can cause slowdowns in system shutdown that are hard to debug. So, I don't prefer this solution.</p><h2>Solution 3: Delay start until the other service is finished</h2>
<p>This is also a hacky solution - use <code>ExecStartPre</code> to check if the other service is running, and busywait for it to stop before starting ourselves.</p><figure>
  <pre><code class="language-ini">[Unit]
Description=Mailbox synchronization service (quick)
After=network-online.target

[Service]
Type=oneshot
ExecStartPre=/bin/sh -c &#039;while systemctl --user is-active mbsync.service | grep -q activating; do sleep 0.5; done&#039;
ExecStart=/usr/bin/mbsync fastmail-inbox
ExecStartPost=bash -c &quot;emacsclient -s srijan -n -e &#039;(mu4e-update-index)&#039; || mu index&quot;</code></pre>
    <figcaption class="text-center">mbsync-quick.service</figcaption>
  </figure>
<p>Here, we use <code>systemctl is-active</code> to query the status of the other service, and wait until the other service is no longer in the <code>activating</code> state. The state is called <code>activating</code> instead of <code>active</code> because these are oneshot services that go from <code>inactive</code> to <code>activating</code> to <code>inactive</code> without ever reaching <code>active</code>.</p><p>To avoid an actual busywait on the CPU, I added a sleep of 0.5s.</p><p>This worked the best for my use case. When one of the services is triggered, it checks whether the other service is running and waits for it to stop before running itself. It also does not have solution 2's drawback of trapping exits and delaying a stop command.</p><p>But, after using it for a day, I found that there is a race condition (!) that can cause a deadlock between these two services, leaving neither able to start.</p><p>The reason for the race condition was:</p><ul><li>A service is marked as <code>activating</code> when its <code>ExecStartPre</code> command starts</li><li>I added a sleep of 0.5 seconds</li></ul><p>So, if the other service is triggered again within those 0.5 seconds, both services will be marked as <code>activating</code> and they will wait for each other indefinitely. This is what I get for using workarounds.</p><h2>Solution 4: One-way conflict, other way delay</h2>
<p>So, the final good-enough solution I came up with was to break this cyclic dependency with a hybrid of Solution 1 and Solution 3. I was okay with <code>mbsync.service</code> being stopped in favour of the (higher-priority) <code>mbsync-quick.service</code>.</p> <p>I therefore added <code>mbsync.service</code> to the <code>Conflicts=</code> section of <code>mbsync-quick.service</code>, and used the <code>ExecStartPre</code> method in <code>mbsync.service</code>.</p> <p>💡Let me know if you know a better way to achieve this.</p><h2>References</h2>
<ul><li><a href="https://unix.stackexchange.com/questions/503719/how-to-set-a-conflict-in-systemd-in-one-direction-only" rel="noreferrer">https://unix.stackexchange.com/questions/503719/how-to-set-a-conflict-in-systemd-in-one-direction-only</a></li><li><a href="https://unix.stackexchange.com/questions/465794/is-it-possible-to-make-a-systemd-unit-wait-until-all-its-conflicts-are-stopped/562959" rel="noreferrer">https://unix.stackexchange.com/questions/465794/is-it-possible-to-make-a-systemd-unit-wait-until-all-its-conflicts-are-stopped/562959</a></li></ul>]]></content:encoded>
    <comments>https://srijan.ch/exploring-conflicting-oneshot-services-in-systemd#comments</comments>
    <slash:comments>0</slash:comments>
  </item><item>
    <title>Download a file securely from GCS on an untrusted system</title>
    <description><![CDATA[Download files from google cloud storage using temporary credentials or time-limited access URLs]]></description>
    <link>https://srijan.ch/secure-gcs-download</link>
    <guid isPermaLink="false">632920ea8948d20001269e4e</guid>
    <category><![CDATA[cloud]]></category>
    <category><![CDATA[security]]></category>
    <category><![CDATA[devops]]></category>
    <dc:creator>Srijan Choudhary</dc:creator>
    <pubDate>Sun, 27 Nov 2022 19:30:00 +0000</pubDate>
    <content:encoded><![CDATA[<h2>The Problem</h2>
<p>We publish some of our build artifacts to <a href="https://cloud.google.com/storage" rel="noreferrer">Google Cloud Storage</a>,
 and users need to download these to the target installation system. 
But, this target system is not always trusted and can have shared local 
users, so we don't want to store long-lived credentials.</p> <p>As a 
user, I can download the artifact on my (secure) laptop and transfer it 
to the target system. But the artifact can be large (several GBs), so downloading it and then uploading it again is cumbersome and slow.</p><h2>Option 1: use <a href="https://cloud.google.com/sdk/docs/install" rel="noreferrer">gcloud CLI</a> on the target system</h2>
<p>Log in to the target system, install gcloud CLI, authenticate, and then download the file:</p><figure>
  <pre><code class="language-shellsession">$ gcloud storage cp gs://$BUCKET/$FILE ./</code></pre>
  </figure>
<p>This has two problems:</p><ol><li>The user must install (and maybe update) gcloud CLI on the target system.</li><li>The
 user needs to store their credentials on the target system. These 
credentials have full access to whatever resources the user has. So, 
it's a huge security risk, especially if we don't trust the target 
system.</li></ol><p>To mitigate (2), the user can log out of gcloud CLI after downloading. But, this is a manual step they might miss.</p><h2>Option 2: use gcloud CLI with a service account</h2>
<p>This
 is a variation of the above solution - we log in using a service 
account instead of the user account. This service account can have 
restricted access to only the resources needed.</p><figure>
  <pre><code class="language-shellsession">$ gcloud iam service-accounts create $SA_NAME \
    --description=&quot;Service Account for downloading artifacts&quot;
$ gsutil iam ch \
    serviceAccount:$SA_NAME@$PROJECT_ID.iam.gserviceaccount.com:roles/storage.objectViewer \
    gs://$BUCKET</code></pre>
  </figure>
<p>This partially mitigates problem (2) above. If 
the user forgets to log out of gcloud CLI, the damage will be restricted
 to the resources accessible by the service account.</p><h2>Option 3: Short-lived access token</h2>
<p>The gcloud CLI supports creating short-lived credentials for the end-user account or <a href="https://cloud.google.com/iam/docs/create-short-lived-credentials-direct" rel="noreferrer">any service account</a>.</p> <p>This credential can be used to download the artifact using wget with an authorization header - no need to install the gcloud CLI on the target system.</p> <p>Here's
 a small script that asks for the auth token as input, parses various 
GCS bucket URL formats, and downloads the requested artifact directly 
using wget:</p><figure>
  <pre><code class="language-bash">#!/bin/bash
# Download artifact from GCS bucket

set -e

echo -e &quot;====&gt; Run \`gcloud auth print-access-token\` on a system where you&#039;ve setup gcloud to get access token\n&quot;
read -r -p &quot;Enter access token: &quot; StorageAccessToken
read -r -p &quot;Enter GCS artifact URL: &quot; ArtifactURL

if [[ &quot;${ArtifactURL:0:33}&quot; == &quot;https://console.cloud.google.com/&quot; ]]; then
    BucketAndFile=&quot;${ArtifactURL#*https://console.cloud.google.com/storage/browser/_details/}&quot;
elif [[ &quot;${ArtifactURL:0:33}&quot; == &quot;https://storage.cloud.google.com/&quot; ]]; then
    BucketAndFile=&quot;${ArtifactURL#*https://storage.cloud.google.com/}&quot;
elif [[ &quot;${ArtifactURL:0:5}&quot; == &quot;gs://&quot; ]]; then
    BucketAndFile=&quot;${ArtifactURL#*gs://}&quot;
else
    echo &quot;Invalid GCS artifact URL&quot;
    exit 1
fi

StorageBucket=&quot;${BucketAndFile%%/*}&quot;
StorageFile=&quot;${BucketAndFile#*/}&quot;
StorageFileEscaped=$(echo &quot;${StorageFile}&quot; | sed &#039;s/\//%2F/g&#039;)
OutputFileName=&quot;${StorageFile##*/}&quot;

echo -e &quot;\n====&gt; Downloading gs://${StorageBucket}/${StorageFile} to ${OutputFileName}\n&quot;

wget -O &quot;${OutputFileName}&quot; --header=&quot;Authorization: Bearer ${StorageAccessToken}&quot; \
    &quot;https://storage.googleapis.com/storage/v1/b/${StorageBucket}/o/${StorageFileEscaped}?alt=media&quot;</code></pre>
  </figure>
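<p>The access token itself is generated on a trusted machine. As the script's prompt suggests, <code>gcloud auth print-access-token</code> with the user's own credentials works; to tie in the restricted service account from Option 2, impersonation should also work (I'm assuming the standard impersonation flag here - worth verifying against the gcloud docs):</p><figure>
  <pre><code class="language-shellsession">$ gcloud auth print-access-token \
    --impersonate-service-account=$SA_NAME@$PROJECT_ID.iam.gserviceaccount.com</code></pre>
  </figure>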
<h2>Option 4: Signed URLs</h2>
<p>Google Cloud Storage also supports <a href="https://cloud.google.com/storage/docs/access-control/signed-urls" rel="noreferrer">signed URLs</a>
 - which give time-limited access to a specific Cloud Storage resource. 
Anyone possessing the signed URL can use it while it's active without 
any further credentials. This fits our use case brilliantly.</p> <p>To do this, first we need to give ourselves the <code>iam.serviceAccountTokenCreator</code> role so that we can impersonate a service account.</p><figure>
  <pre><code class="language-shellsession">$ gcloud iam service-accounts add-iam-policy-binding \
	$SA_NAME@$PROJECT_ID.iam.gserviceaccount.com \
    --member=user:$MY_EMAIL \
    --role=roles/iam.serviceAccountTokenCreator</code></pre>
  </figure>
<p>Then, we can generate a signed URL:</p><figure>
  <pre><code class="language-shellsession">$ gcloud config set auth/impersonate_service_account \
    $SA_NAME@$PROJECT_ID.iam.gserviceaccount.com

$ gsutil signurl -u -r $REGION -d 10m gs://$BUCKET/$FILE

$ gcloud config unset auth/impersonate_service_account</code></pre>
  </figure>
<p>And we can use wget to download the artifact from this URL without any further authentication.</p>]]></content:encoded>
    <comments>https://srijan.ch/secure-gcs-download#comments</comments>
    <slash:comments>0</slash:comments>
  </item><item>
    <title>Advanced PostgreSQL monitoring using Telegraf, InfluxDB, Grafana</title>
    <description><![CDATA[My experience with advanced monitoring for PostgreSQL database using Telegraf, InfluxDB, and Grafana, using a custom postgresql plugin for Telegraf.]]></description>
    <link>https://srijan.ch/advanced-postgresql-monitoring-using-telegraf</link>
    <guid isPermaLink="false">603cefe38527ef00014f776d</guid>
    <category><![CDATA[devops]]></category>
    <category><![CDATA[postgresql]]></category>
    <category><![CDATA[monitoring]]></category>
    <category><![CDATA[telegraf]]></category>
    <category><![CDATA[influxdb]]></category>
    <category><![CDATA[ansible]]></category>
    <dc:creator>Srijan Choudhary</dc:creator>
    <pubDate>Thu, 11 Mar 2021 15:30:00 +0000</pubDate>
    <media:content url="https://srijan.ch/media/pages/blog/advanced-postgresql-monitoring-using-telegraf/d28e269c6f-1699621096/grafana-postgresql-monitoring.png" medium="image" />
    <content:encoded><![CDATA[<figure data-ratio="auto">
    <img src="https://srijan.ch/media/pages/blog/advanced-postgresql-monitoring-using-telegraf/54e29f97da-1699621096/photo-1564760055775-d63b17a55c44.jpeg" alt="Advanced PostgreSQL monitoring using Telegraf, InfluxDB, Grafana">
  
  </figure>
<h2>Introduction</h2>
<p>This post will go through my experience with setting up some advanced monitoring for a PostgreSQL database using Telegraf, InfluxDB, and Grafana (also known as the TIG stack), the problems I faced, and what I ended up doing.</p> <p>What do I mean by advanced? I liked <a href="https://www.datadoghq.com/blog/postgresql-monitoring/#key-metrics-for-postgresql-monitoring" rel="noreferrer">this Datadog article</a> about some key metrics for PostgreSQL monitoring. Also, this <a href="https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/templates/db/postgresql" rel="noreferrer">PostgreSQL monitoring template for Zabbix</a> has some good pointers. I didn’t need everything mentioned in these links, but they acted as a good reference. I also prioritized monitoring for issues I’ve faced myself in the past.</p> <p>Some key things that I planned to monitor:</p><ul><li>Active (and idle) connections vs. max connections configured</li><li>Size of databases and tables</li><li><a href="https://www.datadoghq.com/blog/postgresql-monitoring/#read-query-throughput-and-performance" rel="noreferrer">Read query throughput and performance</a> (sequential vs. index scans, rows fetched vs. returned, temporary data written to disk)</li><li><a href="https://www.datadoghq.com/blog/postgresql-monitoring/#write-query-throughput-and-performance" rel="noreferrer">Write query throughput and performance</a> (rows inserted/updated/deleted, locks, deadlocks, dead rows)</li></ul><p>There
 are a lot of resources online about setting up the data collection 
pipeline from Telegraf to InfluxDB, and creating dashboards on Grafana. 
So, I’m not going into too much detail on this part. This is what the 
pipeline looks like:</p><figure data-ratio="auto">
    <img src="https://srijan.ch/media/pages/blog/advanced-postgresql-monitoring-using-telegraf/7c74d1bdcd-1699621096/pg_telegraf_influx_grafana.png" alt="PostgreSQL to Telegraf to InfluxDB to Grafana">
  
    <figcaption class="text-center">
    PostgreSQL to Telegraf to InfluxDB to Grafana. <a href="https://www.planttext.com/?text=TP9RRu8m5CVV-oawdp2PCfCzBTkY8d4cA0OmcqzD1nqsmPRqacc6ttr5A7Etyz2UzlpE_vnUnb9XeVI-05UKfONEY1O5t2bLoZlN5VXzc5ErqwzQ4f5ofWXJmvJltOYcM6HyHKb92jUx7QmBpDHc6RY250HBueu6DsOVUIO9KqR4iAoh19Djk4dGyo9vGe4_zrSpfm_0b6kMON5qkBo6lJ3kzU47WCRYerHaZ_o3SfJHpGL-Cq3IkXtsXJgKbLePPb7FS5tedB9U_oT53YJD3ENNCrmBdX8fkVYNvrerik7P-SrrJaGADBDTs3BmWco0DjBfMk84EhMBiwVbo32UbehlRRTjGYqNMRc6go2KAgCCmke22XeLsr9b45FT4k04WBbKmZ8eQBvJe7g0tyoiasD9O0Mg-tWR9_uIJUV82uCmUgp3q3vAUpTdq7z9_6Wr2T0V6UUaCBR7CRmfthG0ncOml-KJ" target="_blank" rel="noreferrer">View Source</a>  </figcaption>
  </figure>
<p>And here’s what my final Grafana dashboard looks like</p><figure data-ratio="auto">
    <img src="https://srijan.ch/media/pages/blog/advanced-postgresql-monitoring-using-telegraf/d28e269c6f-1699621096/grafana-postgresql-monitoring.png" alt="Grafana dashboard sample for postgresql monitoring">
  
    <figcaption class="text-center">
    Grafana dashboard sample for PostgreSQL monitoring  </figcaption>
  </figure>
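<p>As an illustration of the first metric in the list above, active and idle connections vs. the configured maximum can be gathered with a query along these lines (a sketch only; the exact queries I ended up using are in the repo linked later in this post):</p><figure>
  <pre><code class="language-sql">-- Illustrative sketch: connection counts vs. max_connections
SELECT count(*) FILTER (WHERE state = 'active') AS active,
       count(*) FILTER (WHERE state = 'idle')   AS idle,
       current_setting('max_connections')::int  AS max_connections
FROM pg_stat_activity;</code></pre>
  </figure>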
<h2>Research on existing solutions</h2>
<p>I found several solutions and articles online about monitoring PostgreSQL using Telegraf:</p><h3>1. Telegraf PostgreSQL input plugin</h3>
<p>Telegraf has a <a href="https://github.com/influxdata/telegraf/tree/master/plugins/inputs/postgresql" rel="noreferrer">PostgreSQL input plugin</a> which provides some built-in metrics from the <code>pg_stat_database</code> and <code>pg_stat_bgwriter</code>
 views. But this plugin cannot be configured to run any custom SQL 
script to gather the data that we want. And the built-in metrics are a 
good starting point, but not enough. So, I rejected it.</p><h3>2. Telegraf postgresql_extensible input plugin</h3>
<p>Telegraf has another PostgreSQL input plugin called <a href="https://github.com/influxdata/telegraf/tree/master/plugins/inputs/postgresql_extensible" rel="noreferrer">postgresql_extensible</a>.
 At first glance, this looks promising: it can run any custom query, and
 multiple queries can be defined in its configuration file.</p> <p>However, there is an <a href="https://github.com/influxdata/telegraf/issues/5009" rel="noreferrer">open issue</a>
 due to which this plugin does not run the specified query against all 
databases, but only against the database name specified in the 
connection string.</p> <p>One way this can still work is to specify multiple input blocks in the Telegraf config file, one for each database.</p><figure>
  <pre><code class="language-toml">[[inputs.postgresql_extensible]]
  address = &quot;host=localhost user=postgres dbname=database1&quot;
  [[inputs.postgresql_extensible.query]]
    script=&quot;db_stats.sql&quot;

[[inputs.postgresql_extensible]]
  address = &quot;host=localhost user=postgres dbname=database2&quot;
  [[inputs.postgresql_extensible.query]]
    script=&quot;db_stats.sql&quot;</code></pre>
  </figure>
<p>But <strong>configuring this does not scale</strong>, especially if the database names are dynamic or we don’t want to hardcode them in the config.</p> <p>That said, I really liked the configuration method of this plugin, and I think it will work very well for my use case once the <a href="https://github.com/influxdata/telegraf/issues/5009" rel="noreferrer">associated Telegraf issue</a> gets resolved.</p><h3>3. Using a monitoring package like pgwatch2</h3>
<p>Another method I found was to use a package like <a href="https://github.com/cybertec-postgresql/pgwatch2" rel="noreferrer">pgwatch2</a>. This is a self-contained solution for PostgreSQL monitoring and includes dashboards as well.</p> <p>Its main components are</p><ol><li><u>A metrics collector service</u>.
 This can either be run centrally and “pull” metrics from one or more 
PostgreSQL instances, or alongside each PostgreSQL instance (like a 
sidecar) and “push” metrics to a metrics storage backend.</li><li><u>Metrics storage backend</u>. pgwatch2 supports multiple metrics storage backends like bare PostgreSQL, TimescaleDB, InfluxDB, Prometheus, and Graphite.</li><li><u>Grafana dashboards</u></li><li><u>A configuration layer</u> and associated UI to configure all of the above.</li></ol><p>I
 really liked this tool as well, but felt like this might be too complex
 for my needs. For example, it monitors a lot more than what I want to 
monitor, and it has some complexity to handle multiple PostgreSQL 
versions and multiple deployment configurations.</p> <p>But I will definitely keep this in mind for a more “batteries included” approach to PostgreSQL monitoring for future projects.</p><h2>My solution: custom Telegraf plugin</h2>
<p>Telegraf supports writing an external custom plugin, and running it via the <a href="https://github.com/influxdata/telegraf/tree/master/plugins/inputs/execd" rel="noreferrer">execd plugin</a>. The <code>execd</code> plugin runs an external program as a long-running daemon.</p> <p>This
 approach enabled me to build the exact features I wanted, while also 
keeping things simple enough to someday revert to using the Telegraf 
 built-in plugin for PostgreSQL.</p> <p>The custom plugin code can be found at <a href="https://github.com/srijan/telegraf-execd-pg-custom" rel="noreferrer">this GitHub repo</a>. Note that I’ve also included the <code>line_protocol.py</code> file from the InfluxDB Python SDK so that I would not have to install the whole SDK just for line-protocol encoding.</p> <p>What this plugin (and included configuration) does:</p><ol><li>Runs as a daemon using the Telegraf execd plugin.</li><li>When
 Telegraf asks for data (by sending a newline on STDIN), it runs the 
queries defined in the plugin’s config file (against the configured 
databases), converts the results into Influx line format, and sends it 
to Telegraf.</li><li>Queries can be defined to run either on a single database, or on all databases that the configured pg user has access to.</li></ol><p>This
 plugin solves the issue with Telegraf’s postgresql_extensible plugin 
for me—I don’t need to manually define the list of databases to be able 
to run queries against all of them.</p> <p>This is what the custom plugin configuration looks like</p><figure>
  <pre><code class="language-toml">[postgresql_custom]
address=&quot;&quot;

[[postgresql_custom.query]]
sqlquery=&quot;select pg_database_size(current_database()) as size_b;&quot;
per_db=true
measurement=&quot;pg_db_size&quot;

[[postgresql_custom.query]]
script=&quot;queries/backends.sql&quot;
per_db=true
measurement=&quot;pg_backends&quot;

[[postgresql_custom.query]]
script=&quot;queries/db_stats.sql&quot;
per_db=true
measurement=&quot;pg_db_stats&quot;

[[postgresql_custom.query]]
script=&quot;queries/table_stats.sql&quot;
per_db=true
tagvalue=&quot;table_name,schema&quot;
measurement=&quot;pg_table_stats&quot;</code></pre>
  </figure>
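<p>For illustration, the request/response loop the plugin implements on top of <code>execd</code> can be sketched like this (a minimal, hypothetical version with names of my choosing - the real plugin also handles configuration parsing, database connections, and proper line-protocol escaping via <code>line_protocol.py</code>):</p><figure>
  <pre><code class="language-python">import sys

def encode_row(measurement, tags, fields):
    # Naive Influx line-protocol encoding: measurement,tag=val field=val
    # (ignores escaping rules, which is why the real plugin reuses
    # line_protocol.py from the influx python sdk).
    tag_part = "".join("," + k + "=" + str(v) for k, v in sorted(tags.items()))
    field_part = ",".join(k + "=" + str(v) for k, v in sorted(fields.items()))
    return measurement + tag_part + " " + field_part

def run(collect):
    # execd contract: Telegraf writes a newline on STDIN once per interval,
    # and expects line-protocol rows back on STDOUT.
    for _ in sys.stdin:
        for measurement, tags, fields in collect():
            print(encode_row(measurement, tags, fields))
        sys.stdout.flush()</code></pre>
  </figure>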
<p>Any queries defined with <code>per_db=true</code> will be run against all databases. Queries can be specified either inline, or using a separate file.</p> <p>The <a href="https://github.com/srijan/telegraf-execd-pg-custom" rel="noreferrer">repository for this plugin</a>
 has the exact queries configured above. It also has the Grafana 
dashboard JSON which can be imported to get the same dashboard as above.</p><h2>Future optimizations</h2>
<ul><li>Monitoring related to replication is not added yet, but can be added easily</li><li>No need to use a superuser account in PostgreSQL 10+</li><li>This does not support running different queries depending on the version of the target PostgreSQL system.</li></ul><hr />
<p>Let me know in the comments below if you have any doubts or suggestions to make this better.</p>]]></content:encoded>
    <comments>https://srijan.ch/advanced-postgresql-monitoring-using-telegraf#comments</comments>
    <slash:comments>3</slash:comments>
  </item><item>
    <title>Running docker jobs inside Jenkins running on docker</title>
    <description><![CDATA[Run Jenkins inside docker, but also use docker containers to run jobs on that Jenkins]]></description>
    <link>https://srijan.ch/docker-jobs-inside-jenkins-on-docker</link>
    <guid isPermaLink="false">60362aece749840001df438e</guid>
    <category><![CDATA[devops]]></category>
    <category><![CDATA[jenkins]]></category>
    <category><![CDATA[docker]]></category>
    <dc:creator>Srijan Choudhary</dc:creator>
    <pubDate>Wed, 24 Feb 2021 10:30:00 +0000</pubDate>
    <media:content url="https://srijan.ch/media/pages/blog/docker-jobs-inside-jenkins-on-docker/ebd7e48a64-1699621096/photo-1595546440771-84f0b521a533.jpeg" medium="image" />
    <content:encoded><![CDATA[<figure data-ratio="auto">
    <img src="https://srijan.ch/media/pages/blog/docker-jobs-inside-jenkins-on-docker/ebd7e48a64-1699621096/photo-1595546440771-84f0b521a533.jpeg" alt="Running docker jobs inside Jenkins running on docker">
  
  </figure>
<p><a href="https://www.jenkins.io/" rel="noreferrer">Jenkins</a> is a free and open source automation server, which is used to automate software building, testing, deployment, etc.</p> <p>I
 wanted to have a quick and easy way to run Jenkins inside docker, but 
also use docker containers to run jobs on the dockerized Jenkins. Using 
docker for jobs makes it easy to encode job runtime dependencies in the 
source code repo itself.</p> <p>The official document on <a href="https://www.jenkins.io/doc/book/installing/docker/" rel="noreferrer">running Jenkins in docker</a> is pretty comprehensive. But, I wanted a version using docker-compose (on Linux).</p> <p>So, I started with a basic compose file:</p><figure>
  <pre><code class="language-yaml">version: &#039;3.7&#039;
services:
  jenkins:
    image: jenkins/jenkins:alpine
    ports:
      - 8081:8080
    container_name: jenkins
    volumes:
      - ./home:/var/jenkins_home</code></pre>
    <figcaption class="text-center">docker-compose.yml</figcaption>
  </figure>
<p>When using this (<code>docker-compose up -d</code>), things came up properly, but Jenkins did not have access to the docker daemon running on the host. Also, the docker CLI binary was not present inside the container.</p><p>The way to fix this is to mount the docker socket and the CLI binary into the container so that they can be accessed. So, we come to the following compose file:</p><figure>
  <pre><code class="language-yaml">version: &#039;3.7&#039;
services:
  jenkins:
    image: jenkins/jenkins:alpine
    ports:
      - 8081:8080
    container_name: jenkins
    volumes:
      - ./home:/var/jenkins_home
      - /var/run/docker.sock:/var/run/docker.sock
      - /usr/bin/docker:/usr/local/bin/docker</code></pre>
    <figcaption class="text-center">docker-compose.yml</figcaption>
  </figure>
<p>But, when trying to run <code>docker ps</code> inside the container with the above compose file, I was still getting the error: <code>Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock</code>. This is because the Jenkins container runs as the <code>jenkins</code> user, which does not have permission to use that socket.</p><p>From my research, the commonly recommended ways to solve this problem were:</p><ul><li>Run the container as the root user</li><li><code>chmod</code> the socket file to <code>777</code></li><li>Install <code>sudo</code> inside the container and give the <code>jenkins</code> user access to sudo without needing to enter a password.</li></ul><p>A more secure way is to create the <code>docker</code> group inside the container, and add the <code>jenkins</code> user to that group. But, this requires us to build a custom image.</p> <p>Also, the group id of the <code>docker</code> group inside and outside the container has to be the same, so I had to add an extra check which deletes any existing group inside the container that uses the same group id, then creates the new <code>docker</code> group with the passed group id, and finally adds the <code>jenkins</code> user to the <code>docker</code> group.</p> <p>So, the final <code>Dockerfile</code> is:</p><figure>
  <pre><code class="language-docker">FROM jenkins/jenkins:alpine
ARG docker_group_id=999

USER root
RUN old_group=$(getent group $docker_group_id | cut -d: -f1) &amp;&amp; \
    ([ -z &quot;$old_group&quot; ] || delgroup &quot;$old_group&quot;) &amp;&amp; \
    addgroup -g $docker_group_id docker &amp;&amp; \
    addgroup jenkins docker

USER jenkins</code></pre>
    <figcaption class="text-center">Dockerfile</figcaption>
  </figure>
<p>And the final <code>docker-compose.yml</code> file is:</p><figure>
  <pre><code class="language-yaml">version: &#039;3.7&#039;
services:
  jenkins:
    build:
      context: .
      args:
        docker_group_id: 999
    ports:
      - 8081:8080
    container_name: jenkins
    volumes:
      - ./home:/var/jenkins_home
      - /var/run/docker.sock:/var/run/docker.sock
      - /usr/bin/docker:/usr/local/bin/docker</code></pre>
    <figcaption class="text-center">docker-compose.yml</figcaption>
  </figure>
<p>The <code>docker_group_id</code> argument can be edited in the compose file. To get the group id of the <code>docker</code> group on the host:</p><figure>
  <pre><code class="language-shellsession">$ getent group docker | cut -d: -f3</code></pre>
  </figure>
<p>With the above, everything works:</p><figure>
  <pre><code class="language-shellsession">$ docker-compose up -d
Creating network &quot;jenkins_test_default&quot; with the default driver
Building jenkins
Step 1/6 : FROM jenkins/jenkins:alpine
alpine: Pulling from jenkins/jenkins
801bfaa63ef2: Pull complete
2b72e22c6786: Pull complete
8d16efe80b55: Pull complete
682cd8857a9a: Pull complete
29c6010e8988: Pull complete
fa466f5d199d: Pull complete
e047245de0ff: Pull complete
0cfb53380af7: Pull complete
c29612b1a095: Pull complete
cd7d4bd47719: Pull complete
21cd3d960a1f: Pull complete
f3962370d584: Pull complete
bd6f35a1ea17: Pull complete
bd0c271b250f: Pull complete
Digest: sha256:1c3d9a1ed55911f9b165dd122118bff5da57520effb180d36b5c19d2a0cfe645
Status: Downloaded newer image for jenkins/jenkins:alpine
 ---&gt; e14be04b79e8
Step 2/6 : ARG docker_group_id=999
 ---&gt; Running in f1922fa97177
Removing intermediate container f1922fa97177
 ---&gt; 79460069fb98
Step 3/6 : RUN echo &quot;Assuming docker group id: $docker_group_id&quot;
 ---&gt; Running in 11809f4ae767
Assuming docker group id: 999
Removing intermediate container 11809f4ae767
 ---&gt; e89b345f6c74
Step 4/6 : USER root
 ---&gt; Running in b2e311372bc9
Removing intermediate container b2e311372bc9
 ---&gt; 9d4d8c3ad5b2
Step 5/6 : RUN old_group=$(getent group $docker_group_id | cut -d: -f1) &amp;&amp;     ([ -z &quot;$old_group&quot; ] || delgroup &quot;$old_group&quot;) &amp;&amp;     addgroup -g $docker_group_id docker &amp;&amp;     addgroup jenkins docker
 ---&gt; Running in 357046a8ac49
Removing intermediate container 357046a8ac49
 ---&gt; 865b942324eb
Step 6/6 : USER jenkins
 ---&gt; Running in dbc2976f62c0
Removing intermediate container dbc2976f62c0
 ---&gt; c7e6fac0187c

Successfully built c7e6fac0187c
Successfully tagged jenkins_test_jenkins:latest
WARNING: Image for service jenkins was built because it did not already exist. To rebuild this image you must use `docker-compose build` or `docker-compose up --build`.
Creating jenkins ... done

$ docker-compose exec jenkins docker ps
CONTAINER ID   IMAGE                  COMMAND                  CREATED          STATUS          PORTS                               NAMES
6c05ee1315e4   jenkins_test_jenkins   &quot;/sbin/tini -- /usr/&hellip;&quot;   47 seconds ago   Up 47 seconds   50000/tcp, 0.0.0.0:8081-&gt;8080/tcp   jenkins</code></pre>
  </figure>
<h2>Next Steps</h2>
<p><a href="https://www.digitalocean.com/community/tutorials/how-to-automate-jenkins-setup-with-docker-and-jenkins-configuration-as-code" rel="noreferrer">Here is an excellent guide</a> on how to set up Jenkins configuration as code. This will make this setup even better because nothing will need to be configured inside Jenkins manually - it will all be driven by code and files.</p>]]></content:encoded>
    <comments>https://srijan.ch/docker-jobs-inside-jenkins-on-docker#comments</comments>
    <slash:comments>0</slash:comments>
  </item><item>
    <title>Telegraf: dynamically adding custom tags</title>
    <description><![CDATA[Adding a custom tag to data coming in from an input plugin for telegraf]]></description>
    <link>https://srijan.ch/telegraf-dynamic-tags</link>
    <guid isPermaLink="false">6030d3dab5e0920001f557d7</guid>
    <category><![CDATA[devops]]></category>
    <category><![CDATA[telegraf]]></category>
    <category><![CDATA[influxdb]]></category>
    <dc:creator>Srijan Choudhary</dc:creator>
    <pubDate>Wed, 14 Oct 2020 00:00:00 +0000</pubDate>
    <media:content url="https://srijan.ch/media/pages/blog/telegraf-dynamic-tags/4aa8784b8f-1699621096/telegraf-plugin-interactions.png" medium="image" />
    <content:encoded><![CDATA[<h3>Background</h3>
<p>For a recent project, I wanted to add a custom tag to data coming in from a built-in input plugin for <a href="https://www.influxdata.com/time-series-platform/telegraf/" rel="noreferrer">telegraf</a>.</p> <p>The input plugin was the <a href="https://github.com/influxdata/telegraf/tree/master/plugins/inputs/procstat" rel="noreferrer">procstat plugin</a>, and the custom data was information from <a href="https://clusterlabs.org/pacemaker/doc/" rel="noreferrer">pacemaker</a>
 (a clustering solution for linux). I wanted to add a tag indicating if 
the current host was the "active" host in my active/passive setup.</p> <p>For this, the best solution I came up with was to use a <a href="https://www.influxdata.com/blog/telegraf-1-15-starlark-nginx-go-redfish-new-relic-mongodb/" rel="noreferrer">recently released</a> <a href="https://github.com/influxdata/telegraf/tree/master/plugins/processors/execd" rel="noreferrer">execd processor</a> plugin for telegraf.</p><h3>How it works</h3>
<p>The execd processor plugin runs an external program as a separate 
process and pipes metrics in to the process's STDIN and reads processed 
metrics from its STDOUT.</p><figure data-ratio="auto">
    <img src="https://srijan.ch/media/pages/blog/telegraf-dynamic-tags/4aa8784b8f-1699621096/telegraf-plugin-interactions.png" alt="Telegraf plugins interaction diagram">
  
    <figcaption class="text-center">
    Telegraf plugins interaction. <a href="https://www.planttext.com/?text=TP9RRu8m5CVV-oawdp2PCfCzBTkY8d4cA0OmcqzD1nqsmPRqacc6ttr5A7Etyz2UzlpE_vnUnb9XeVI-05UKfONEY1O5t2bLoZlN5VXzc5ErqwzQ4f5ofWXJmvJltOYcM6HyHKb92jUx7QmBpDHc6RY250HBueu6DsOVUIO9KqR4iAoh19Djk4dGyo9vGe4_zrSpfm_0b6kMON5qkBo6lJ3kzU47WCRYerHaZ_o3SfJHpGL-Cq3IkXtsXJgKbLePPb7FS5tedB9U_oT53YJD3ENNCrmBdX8fkVYNvrerik7P-SrrJaGADBDTs3BmWco0DjBfMk84EhMBiwVbo32UbehlRRTjGYqNMRc6go2KAgCCmke22XeLsr9b45FT4k04WBbKmZ8eQBvJe7g0tyoiasD9O0Mg-tWR9_uIJUV82uCmUgp3q3vAUpTdq7z9_6Wr2T0V6UUaCBR7CRmfthG0ncOml-KJ" target="_blank" rel="noreferrer">View Source</a>  </figcaption>
  </figure>
<p>Telegraf's <a href="https://github.com/influxdata/telegraf/blob/master/docs/CONFIGURATION.md#metric-filtering" rel="noreferrer">filtering parameters</a> can be used to select or limit data from which input plugins will go to this processor.</p><h3>The external program</h3>
<p>The external program I wrote does the following:</p><ol><li>Get pacemaker status and cache it for 10 seconds</li><li>Read a line from stdin</li><li>Append the required information as a tag to the data</li><li>Write it to stdout</li></ol><p>The caching is just an optimization - it was more about reducing log noise than about actual speed improvements.</p> <p>Also, I've done the InfluxDB line protocol parsing in my code directly (because my use case is simple), but this can be substituted with an actual library meant for handling line protocol.</p><figure>
  <pre><code class="language-python">#!/usr/bin/python

from __future__ import print_function
from sys import stderr
import fileinput
import subprocess
import time

cache_value = None
cache_time = 0
resource_name = &quot;VIP&quot;

def get_crm_status():
    global cache_value, cache_time, resource_name
    ctime = time.time()
    if ctime - cache_time &gt; 10:
        # print(&quot;Cache busted&quot;, file=stderr)
        try:
            crm_node = subprocess.check_output([&quot;sudo&quot;, &quot;/usr/sbin/crm_node&quot;, &quot;-n&quot;]).rstrip()
            crm_resource = subprocess.check_output([&quot;sudo&quot;, &quot;/usr/sbin/crm_resource&quot;, &quot;-r&quot;, resource_name, &quot;-W&quot;]).rstrip()
            active_node = crm_resource.split(&quot; &quot;)[-1]
            if active_node == crm_node:
                cache_value = &quot;active&quot;
            else:
                cache_value = &quot;inactive&quot;
        except (OSError, IOError) as e:
            print(&quot;Exception: %s&quot; % e, file=stderr)
            # Don&#039;t report active/inactive if crm commands are not found
            cache_value = None
        except Exception as e:
            print(&quot;Exception: %s&quot; % e, file=stderr)
            # Report as inactive in other cases by default
            cache_value = &quot;inactive&quot;
        cache_time = ctime
    return cache_value

def lineprotocol_add_tag(line, key, value):
    first_comma = line.find(&quot;,&quot;)
    first_space = line.find(&quot; &quot;)
    if first_comma &gt;= 0 and first_comma &lt;= first_space:
        split_str = &quot;,&quot;
    else:
        split_str = &quot; &quot;
    parts = line.split(split_str)
    first, rest = parts[0], parts[1:]
    first_new = first + &quot;,&quot; + key + &quot;=&quot; + value
    return split_str.join([first_new] + rest)

for line in fileinput.input():
    line = line.rstrip()
    crm_status = get_crm_status()
    if crm_status:
        try:
            new_line = lineprotocol_add_tag(line, &quot;crm_status&quot;, crm_status)
        except Exception as e:
            print(&quot;Exception: %s, Input: %s&quot; % (e, line), file=stderr)
            new_line = line
    else:
        new_line = line

    print(new_line)</code></pre>
    <figcaption class="text-center">pacemaker_status.py</figcaption>
  </figure>
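<p>For example, the tag-insertion logic above turns <code>system,host=h1 load1=0.5</code> into <code>system,crm_status=active,host=h1 load1=0.5</code>. Here is the same logic in a condensed, self-contained form (rewritten for illustration, not the exact code above):</p><figure>
  <pre><code class="language-python">def add_tag(line, key, value):
    # Insert key=value into the tag section of an Influx line-protocol line.
    first_comma = line.find(",")
    first_space = line.find(" ")
    # Existing tags are present when a comma appears before the first space;
    # otherwise the measurement is directly followed by the fields.
    has_tags = first_comma != -1 and first_comma == min(first_comma, first_space)
    sep = "," if has_tags else " "
    head, _, rest = line.partition(sep)
    return head + "," + key + "=" + value + sep + rest

print(add_tag("system,host=h1 load1=0.5", "crm_status", "active"))
# system,crm_status=active,host=h1 load1=0.5
print(add_tag("uptime load1=0.5", "crm_status", "active"))
# uptime,crm_status=active load1=0.5</code></pre>
  </figure>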
<h3>Telegraf configuration</h3>
<p>Here's a sample telegraf configuration that routes data from the "system" plugin to the execd processor plugin, and finally outputs to influxdb.</p><figure>
  <pre><code class="language-toml">[agent]
  interval = &quot;30s&quot;

[[inputs.cpu]]

[[inputs.system]]

[[processors.execd]]
  command = [&quot;/usr/bin/python&quot;, &quot;/etc/telegraf/scripts/pacemaker_status.py&quot;]
  namepass = [&quot;system&quot;]

[[outputs.influxdb]]
  urls = [&quot;http://127.0.0.1:8086&quot;]
  database = &quot;telegraf&quot;</code></pre>
    <figcaption class="text-center">telegraf.conf</figcaption>
  </figure>
<h3>Other types of dynamic tags</h3>
<p>In this example, we wanted to get the value of the tag from an 
external program. If the tag can be calculated from the incoming data 
itself, then things are much simpler. There are <a href="https://github.com/influxdata/telegraf/tree/release-1.15/plugins/processors" rel="noreferrer">a lot of processor plugins</a>, and many things can be achieved using just those.</p>]]></content:encoded>
    <comments>https://srijan.ch/telegraf-dynamic-tags#comments</comments>
    <slash:comments>0</slash:comments>
  </item><item>
    <title>Install docker and docker-compose using Ansible</title>
    <description><![CDATA[Optimized way to install docker and docker-compose using Ansible]]></description>
    <link>https://srijan.ch/install-docker-and-docker-compose-using-ansible</link>
    <guid isPermaLink="false">6030d3dab5e0920001f557cd</guid>
    <category><![CDATA[devops]]></category>
    <category><![CDATA[docker]]></category>
    <category><![CDATA[ansible]]></category>
    <dc:creator>Srijan Choudhary</dc:creator>
    <pubDate>Thu, 11 Jun 2020 14:30:00 +0000</pubDate>
    <media:content url="https://srijan.ch/media/pages/blog/install-docker-and-docker-compose-using-ansible/b62b609bf9-1699621096/photo-1584444707186-b7831c11014f.jpg" medium="image" />
    <content:encoded><![CDATA[<figure data-ratio="auto">
    <img src="https://srijan.ch/media/pages/blog/install-docker-and-docker-compose-using-ansible/b62b609bf9-1699621096/photo-1584444707186-b7831c11014f.jpg" alt="">
  
  </figure>
<p>Updated for 2023: I've updated this post with the following changes:</p><p>1. Added a top-level sample playbook<br>2. Used the ansible apt module's <a href="https://docs.ansible.com/ansible/latest/collections/ansible/builtin/apt_module.html#parameter-cache_valid_time" title="cache_valid_time" rel="noreferrer">cache_valid_time</a> parameter to prevent repeated apt-get updates<br>3. Install <code>docker-compose-plugin</code> using apt (provides docker compose v2)<br>4. Make installing docker compose v1 optional<br>5. Various fixes as suggested in comments<br>6. Tested against Debian 10, 11, and 12 and Ubuntu 18.04 (bionic), 20.04 (focal), 22.04 (jammy) using Vagrant.</p><p>I've published a <a href="https://srijan.ch/testing-ansible-playbooks-using-vagrant" rel="noreferrer">new post on how I've done this testing</a>.</p><hr />
<p>I wanted a simple, but optimal (and fast) way to install 
docker and docker-compose using Ansible. I found a few ways online, but I
 was not satisfied.</p> <p>My requirements were:</p><ul><li>Support Debian and Ubuntu</li><li>Install docker and docker compose v2 using apt repositories</li><li>Prevent unnecessary <code>apt-get update</code> if it has been run recently (to make it fast)</li><li>Optionally install docker compose v1 by downloading from github releases<ul><li>But, don’t download if current version &gt;= the minimum version required</li></ul></li></ul><p>I feel trying to achieve these requirements gave me a very good idea of how powerful ansible can be.</p> <p>The final role and vars files can be seen in <a href="https://gist.github.com/srijan/2028af568459195cb9a3dae8d111e754">this gist</a>. But, I’ll go through each section below to explain what makes this better / faster.</p><h2>File structure</h2>
<figure>
  <pre><code class="language-treeview">playbook.yml
roles/
├── docker/
│    ├── defaults/
│    │   ├── main.yml
│    ├── tasks/
│    │   ├── main.yml
│    │   ├── docker_setup.yml</code></pre>
    <figcaption class="text-center">File structure</figcaption>
  </figure>
<h2>Playbook</h2>
<p>This is the top-level playbook. Any default vars mentioned below can be overridden here.</p><figure>
  <pre><code class="language-yaml">---
- hosts: all
  vars:
    - docker_compose_install_v1: true
    - docker_compose_version_v1: &quot;1.29.2&quot;
  tasks:
    - name: Docker setup
      block:
        - import_role: name=docker</code></pre>
    <figcaption class="text-center">playbook.yml</figcaption>
  </figure>
<h2>Variables</h2>
<p>First, we’ve defined some variables in <code>defaults/main.yml</code>. These will control which release channel of docker will be used and whether to install docker compose v1.</p><figure>
  <pre><code class="language-yaml">---
docker_apt_release_channel: stable
docker_apt_arch: amd64
docker_apt_repository: &quot;deb [arch={{ docker_apt_arch }}] https://download.docker.com/linux/{{ ansible_distribution | lower }} {{ ansible_distribution_release }} {{ docker_apt_release_channel }}&quot;
docker_apt_gpg_key: https://download.docker.com/linux/{{ ansible_distribution | lower }}/gpg
docker_compose_install_v1: false
docker_compose_version_v1: &quot;1.29.2&quot;</code></pre>
    <figcaption class="text-center">roles/docker/defaults/main.yml</figcaption>
  </figure>
<h2>Role main.yml</h2>
<p>The <code>tasks/main.yml</code> file imports tasks from <code>tasks/docker_setup.yml</code> and turns on <a href="https://docs.ansible.com/ansible/latest/playbook_guide/playbooks_privilege_escalation.html#using-become" rel="noreferrer">become</a> for the whole task.</p><figure>
  <pre><code class="language-yaml">---
- import_tasks: docker_setup.yml
  become: true</code></pre>
    <figcaption class="text-center">roles/docker/tasks/main.yml</figcaption>
  </figure>
<h2>Docker Setup</h2>
<p>This task is divided into the following sections:</p><h3>Install dependencies</h3>
<figure>
  <pre><code class="language-yaml">- name: Install packages using apt
  apt:
    name: 
        - apt-transport-https
        - ca-certificates
        - curl
        - gnupg2
        - software-properties-common
    state: present
    cache_valid_time: 86400</code></pre>
  </figure>
<p>Here the <code>state: present</code> makes sure that these packages are only installed if not already installed. I've set <code>cache_valid_time</code> to 1 day so that <code>apt-get update</code> is not run if it has already run recently.</p><h3>Add docker repository</h3>
<figure>
  <pre><code class="language-yaml">- name: Add Docker GPG apt Key
  apt_key:
    url: &quot;{{ docker_apt_gpg_key }}&quot;
    state: present

- name: Add Docker Repository
  apt_repository:
    repo: &quot;{{ docker_apt_repository }}&quot;
    state: present
    update_cache: true</code></pre>
  </figure>
<p>Here, the <code>state: present</code> and <code>update_cache: true</code> make sure that the cache is only updated if this state was changed. So, <code>apt-get update</code> is not run if the docker repo is already present.</p><h3>Install and enable docker and docker compose v2</h3>
<figure>
  <pre><code class="language-yaml">- name: Install docker-ce
  apt:
    name: docker-ce
    state: present
    cache_valid_time: 86400

- name: Run and enable docker
  service:
    name: docker
    state: started
    enabled: true

- name: Install docker compose
  apt:
    name: docker-compose-plugin
    state: present
    cache_valid_time: 86400</code></pre>
  </figure>
<p>Again, due to <code>state: present</code> and <code>cache_valid_time: 86400</code>, there are no extra cache fetches if docker and docker-compose-plugin are already installed.</p><h2>Docker Compose V1 Setup</h2>
<p>WARNING: docker-compose v1 is end-of-life, please keep that in mind and only install/use it if absolutely required.</p><p>This task is wrapped in an <a href="https://docs.ansible.com/ansible/latest/playbook_guide/playbooks_blocks.html" rel="noreferrer">ansible block</a> that checks if <code>docker_compose_install_v1</code> is true.</p><figure>
  <pre><code class="language-text">- name: Install docker-compose v1
  when:
    - docker_compose_install_v1 is defined
    - docker_compose_install_v1
  block:</code></pre>
  </figure>
<p>Inside the block, there are two sections:</p><h3>Check whether docker-compose is installed, and its version</h3>
<figure>
  <pre><code class="language-yaml">- name: Check current docker-compose version
  command: docker-compose --version
  register: docker_compose_vsn
  changed_when: false
  failed_when: false
  check_mode: no

- set_fact:
    docker_compose_current_version: &quot;{{ docker_compose_vsn.stdout | regex_search(&#039;(\\d+(\\.\\d+)+)&#039;) }}&quot;
  when:
    - docker_compose_vsn.stdout is defined</code></pre>
  </figure>
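<p>As an aside, here is roughly what the <code>regex_search</code> filter above and ansible's version test (used in the next task) are doing, sketched in Python for illustration:</p><figure>
  <pre><code class="language-python">import re

# The set_fact regex pulls the version number out of the command output:
out = "docker-compose version 1.26.0, build d4451659"
current = re.search(r"(\d+(\.\d+)+)", out).group(1)

# Ansible's version test compares dotted versions numerically, component
# by component, rather than as plain strings.
def version_lt(a, b):
    pa = tuple(int(x) for x in a.split("."))
    pb = tuple(int(x) for x in b.split("."))
    return pa != pb and min(pa, pb) == pa

print(current)                        # 1.26.0
print(version_lt(current, "1.29.2"))  # True
print(version_lt("1.9.0", "1.10.0"))  # True (a plain string compare gets this wrong)</code></pre>
  </figure>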
<p>The first task saves the output of <code>docker-compose --version</code> into a variable <code>docker_compose_vsn</code>. The <code>failed_when: false</code> ensures that this is not counted as a failure even if the command fails to execute. (See <a href="https://docs.ansible.com/ansible/latest/user_guide/playbooks_error_handling.html">error handling in ansible</a>).</p> <p>Sample output when docker-compose is installed: <code>docker-compose version 1.26.0, build d4451659</code></p> <p>The second task parses this output and extracts the version number using a regex (see <a href="https://docs.ansible.com/ansible/latest/user_guide/playbooks_filters.html">ansible filters</a>). There is a <code>when</code> condition which causes the second task to be skipped if the first one produced no output (See <a href="https://docs.ansible.com/ansible/latest/user_guide/playbooks_conditionals.html">playbook conditionals</a>).</p><h3>Install or upgrade docker-compose if required</h3>
<figure>
  <pre><code class="language-yaml">- name: Install or upgrade docker-compose
  get_url: 
    url: &quot;https://github.com/docker/compose/releases/download/{{ docker_compose_version_v1 }}/docker-compose-Linux-x86_64&quot;
    dest: /usr/local/bin/docker-compose
    mode: &#039;a+x&#039;
    force: yes
  when: &gt;
    docker_compose_current_version == &quot;&quot;
    or docker_compose_current_version is version(docker_compose_version_v1, &#039;&lt;&#039;)</code></pre>
  </figure>
<p>This downloads the required docker-compose binary and saves it to <code>/usr/local/bin/docker-compose</code>, but only if docker-compose is not already installed, or if the installed version is lower than the required one. For the version comparison, it uses ansible's built-in <a href="https://docs.ansible.com/ansible/latest/user_guide/playbooks_tests.html#version-comparison">version comparison function</a>.</p> <p>So,
 we used a few ansible features to achieve what we wanted. I’m sure 
there are a lot of other things we can do to make this even better and 
more fool-proof. Maybe a post for another day.</p>]]></content:encoded>
    <comments>https://srijan.ch/install-docker-and-docker-compose-using-ansible#comments</comments>
    <slash:comments>10</slash:comments>
  </item><item>
    <title>Riemann and Zabbix: Sending data from riemann to zabbix</title>
    <description><![CDATA[Tutorial for sending data from riemann to zabbix]]></description>
    <link>https://srijan.ch/sending-data-from-riemann-to-zabbix</link>
    <guid isPermaLink="false">6030d3dab5e0920001f557d3</guid>
    <category><![CDATA[devops]]></category>
    <category><![CDATA[monitoring]]></category>
    <dc:creator>Srijan Choudhary</dc:creator>
    <pubDate>Fri, 08 Jun 2018 18:55:00 +0000</pubDate>
    <content:encoded><![CDATA[<h3>Background</h3>
<p>At <a href="https://www.greyorange.com/" rel="noreferrer">my work</a>, we use <a href="http://riemann.io/" rel="noreferrer">Riemann</a> and <a href="https://www.zabbix.com/" rel="noreferrer">Zabbix</a> as part of our monitoring stack.</p><p>Riemann is a stream processing engine (written in Clojure) which can be used to monitor distributed systems. Although it can be used for defining alerts and sending notifications for those alerts, we currently use it like this:</p><ol><li>As a receiving point for metrics / data from a group of systems in an installation</li><li>Applying some filtering and aggregation at the installation level.</li><li>Sending the filtered / aggregated data to a central Zabbix system.</li></ol><p>The actual alerting mechanism is handled by Zabbix. Things like trigger definitions, sending notifications, handling acks and escalations, etc.</p><p>This might seem like Riemann is redundant (and there is definitely some overlap in functionality), but keeping Riemann in the data pipeline allows us to be more flexible operationally. This is specially in cases when the metrics data we need is coming from application code, and we need to apply some transformations to the data but cannot update the code.</p><h3>The Problem</h3>
<p>The first problem we faced when trying to do this is: sending data from Riemann to Zabbix is not that straightforward.</p><p>Surprisingly, the <a href="https://www.zabbix.com/documentation/3.4/manual/api" rel="noreferrer">Zabbix API</a> is not actually meant for sending data points to Zabbix - only for managing its configuration and accessing historical data.</p><h3>Solutions</h3>
<p>The recommended way to send data to Zabbix is to use a command line application called <a href="https://www.zabbix.com/documentation/3.4/manpages/zabbix_sender" rel="noreferrer">zabbix_sender</a>.</p><p>Another way would be to write a custom zabbix client in Clojure which follows the <a href="https://www.zabbix.com/documentation/3.4/manual/appendix/items/activepassive" rel="noreferrer">Zabbix Agent protocol</a>, which uses JSON over TCP sockets.</p><p>The current solution we have taken for this is using <code>zabbix_sender</code> itself.</p><p>For this, we write filtered values to a predefined text file from Riemann in a format that <code>zabbix_sender</code> can understand.</p><figure>
  <pre><code class="language-clojure">;; Modified version of:
;; https://github.com/riemann/riemann/blob/68f126ff39819afc3296bb645243f888dab0943e/src/riemann/logging.clj
(defn zabbix-logger-init
  [log_key log_file]
  (let [logger (org.slf4j.LoggerFactory/getLogger log_key)]
    (.detachAndStopAllAppenders logger)
    (riemann.logging/configure-from-opts
     logger
     (org.slf4j.LoggerFactory/getILoggerFactory)
     {:file log_file})
    logger))

(def zabbix-logger
  (io (zabbix-logger-init
       &quot;zabbix&quot; &quot;/var/log/riemann/to_zabbix.txt&quot;)))

(defn zabbix-log-to-file
  &quot;Log to file using `logger`&quot;
  [logger string]
  (.info logger string))

(defn zabbix-sender
  &quot;Sends events to zabbix via log file.
  Assumes that three keys are present in the incoming data:
    :zhost   -&gt; hostname for sending to zabbix
    :zkey    -&gt; item key for zabbix
    :zvalue  -&gt; value to send for the item key
  Requires zabbix_sender service running and tailing the log file&quot;
  [data]
  (io (zabbix-log-to-file
       zabbix-logger (str (:zhost data) &quot; &quot; (:zkey data) &quot; &quot; (:zvalue data)))))

(streams
  (where (tagged &quot;zabbix&quot;)
    (smap
     (fn [event]
       {:zhost  (:host event)
        :zkey   (:service event)
        :zvalue (:value event)})
     zabbix-sender)))</code></pre>
  </figure>
<p>The above code writes data into the file <code>/var/log/riemann/to_zabbix.txt</code> in the following format:</p><figure>
  <pre><code class="language-log">INFO [2018-06-09 05:02:03,600] defaultEventExecutorGroup-2-7 - zabbix - host123 api.req-rate 200</code></pre>
  </figure>
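<p>Everything after the <code>zabbix - </code> marker is the payload that <code>zabbix_sender</code> expects: host, item key, and value, separated by spaces. Extracting it can be sketched as:</p><figure>
  <pre><code class="language-python">line = ("INFO [2018-06-09 05:02:03,600] defaultEventExecutorGroup-2-7"
        " - zabbix - host123 api.req-rate 200")

# Same idea as the grep lookbehind used below: keep only what
# follows the "zabbix - " marker.
payload = line.split("zabbix - ", 1)[1]
host, key, value = payload.split(" ")
print(host, key, value)  # host123 api.req-rate 200</code></pre>
  </figure>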
<p>Then, the following command can be run to send data from this file to Zabbix via <code>zabbix_sender</code>:</p><figure>
  <pre><code class="language-shellsession">$ tail -F /var/log/riemann/to_zabbix.txt | grep --line-buffered -oP &quot;(?&lt;=zabbix - ).*&quot; | zabbix_sender -z $ZABBIX_IP --real-time -i - -vv</code></pre>
  </figure>
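<p>For illustration, the lookbehind in that pipeline can be checked against a sample log line (the host, key, and value here are hypothetical, and GNU grep is assumed for <code>-P</code>):</p>

```shell
# A line as written by the Riemann logger above
line='INFO [2018-06-09 05:02:03,600] defaultEventExecutorGroup-2-7 - zabbix - host123 api.req-rate 200'

# Keep only what follows "zabbix - ": the "host key value" triple
# that zabbix_sender expects on stdin
echo "$line" | grep -oP '(?<=zabbix - ).*'
```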
<h3>Further Thoughts</h3>
<ul><li>Riemann should probably check whether data is actually being delivered to Zabbix, and send out alerts if it isn't.</li><li>The current solution is a little fragile because it first writes the data to a file and depends on an external service to ship the data to Zabbix. A better solution would be to integrate directly as a Zabbix agent.</li></ul>]]></content:encoded>
    <comments>https://srijan.ch/sending-data-from-riemann-to-zabbix#comments</comments>
    <slash:comments>0</slash:comments>
  </item><item>
    <title>My backup strategy to USB disk using duply</title>
    <description><![CDATA[Local system backup using duply]]></description>
    <link>https://srijan.ch/my-backup-strategy-part-1</link>
    <guid isPermaLink="false">6030d3dab5e0920001f557ce</guid>
    <category><![CDATA[devops]]></category>
    <category><![CDATA[linux]]></category>
    <dc:creator>Srijan Choudhary</dc:creator>
    <pubDate>Thu, 04 Aug 2016 17:55:00 +0000</pubDate>
<content:encoded><![CDATA[<p>I don't have a lot of data to back up - just my home folder (on my 
Arch Linux laptop), which mostly contains configuration for the tools I'm 
using and my programming work.</p> <p>For photos or videos taken from my phone, I use Google Photos for 
backup - which works pretty well. Even if I delete the original files 
from my phone, the photos app still keeps them online.</p> <p>Coming back to my laptop, I'm currently using <a href="http://duplicity.nongnu.org/">duplicity</a> (with the <a href="http://duply.net/">duply</a>
wrapper) to back up to multiple destinations. Why multiple destinations? I 
wanted one local copy so that I can restore fast, and at least one at a 
remote location so that I can still restore if the local disk fails.</p> <p>For off-site, I'm using the fantastic <a href="http://www.rsync.net/">rsync.net</a> service. For local backups, I'm using two destinations: a USB HDD at my home, and an NFS server at my work. <strong>Depending on where I am, the backup will be done to the correct destination</strong>.</p> <p>This post will deal with the backups to my local USB disk.</p> <p>Here's what I've been able to achieve: the backups will run every 
hour as long as the USB disk is connected. If it is not connected, the 
backup script will not even be triggered. I did not want to see backup 
failures in my logs if the HDD is not connected.</p> <p>I've done this using a systemd timer and service. I've defined these units in <a href="https://wiki.archlinux.org/index.php/Systemd/User">the user-level part for systemd</a> so that root privileges are not required.</p><h3>Mounting the USB Disk</h3>
<p>To automatically mount the USB disk, I've added the following line to my <code>/etc/fstab</code>:</p><figure>
  <pre><code class="language-ini">UUID=27DFA4B43C8C0635 /mnt/Ext01 ntfs-3g nosuid,nodev,nofail,auto,x-gvfs-show,permissions 0 0</code></pre>
  </figure>
<h3>Duply config for running the backup</h3>
<p>Here's my <strong>duply</strong> config file (kept at <code>~/.duply/ext01/conf</code>) (mostly self-explanatory):</p><figure>
  <pre><code class="language-ini">TARGET=&#039;file:///mnt/Ext01/Backups/&#039;
SOURCE=&#039;/home/srijan&#039;
MAX_AGE=1Y
MAX_FULL_BACKUPS=15
MAX_FULLS_WITH_INCRS=2
MAX_FULLBKP_AGE=1M
DUPL_PARAMS=&quot;$DUPL_PARAMS --full-if-older-than $MAX_FULLBKP_AGE &quot;
VOLSIZE=4
DUPL_PARAMS=&quot;$DUPL_PARAMS --volsize $VOLSIZE &quot;
DUPL_PARAMS=&quot;$DUPL_PARAMS --exclude-other-filesystems &quot;</code></pre>
  </figure>
<p>This can be run manually using:</p><figure>
  <pre><code class="language-shellsession">$ duply ext01 backup</code></pre>
  </figure>
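<p>Since a duply profile is just sourced shell, the way <code>DUPL_PARAMS</code> accumulates flags in the conf above can be sanity-checked on its own (a sketch; the variable starts out empty, as it does in a fresh profile):</p>

```shell
# Simulate how the profile builds up duplicity's extra flags
MAX_FULLBKP_AGE=1M
DUPL_PARAMS="$DUPL_PARAMS --full-if-older-than $MAX_FULLBKP_AGE "
VOLSIZE=4
DUPL_PARAMS="$DUPL_PARAMS --volsize $VOLSIZE "
DUPL_PARAMS="$DUPL_PARAMS --exclude-other-filesystems "

# These are the flags duply will pass through to duplicity
echo "$DUPL_PARAMS"
```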
<p>Exclusions can be specified in the file <code>~/.duply/ext01/exclude</code> in a glob-like format.</p><h3>Systemd Service for running the backup</h3>
<p>Next, here's the <strong>service file</strong> (kept at <code>~/.config/systemd/user/duply_ext01.service</code>):</p><figure>
  <pre><code class="language-ini">[Unit]
Description=Run backup using duply: ext01 profile
Requires=mnt-Ext01.mount
After=mnt-Ext01.mount

[Service]
Type=oneshot
ExecStart=/usr/bin/duply ext01 backup</code></pre>
  </figure>
<p>The <code>Requires</code> option says that this unit has a dependency on the mounting of Ext01. The <code>After</code> option specifies the order in which these two should be started (run this service <em>after</em> mounting).</p> <p>After this step, the service can be run manually (via systemd) using:</p><figure>
  <pre><code class="language-shellsession">$ systemctl --user start duply_ext01.service</code></pre>
  </figure>
<h3>Systemd timer for triggering the backup service</h3>
<p>Next step is triggering it automatically every hour. Here's the <strong>timer file</strong> (kept at <code>~/.config/systemd/user/duply_ext01.timer</code>):</p><figure>
  <pre><code class="language-ini">[Unit]
Description=Run backup using duply ext01 profile every hour
BindsTo=mnt-Ext01.mount
After=mnt-Ext01.mount

[Timer]
OnCalendar=hourly
AccuracySec=10m
Persistent=true

[Install]
WantedBy=mnt-Ext01.mount</code></pre>
  </figure>
<p>Here, the <code>BindsTo</code> option defines a dependency similar to the <code>Requires</code>
 option above, but also declares that this unit is stopped when the 
mount point goes away for any reason. This is because I don't want 
the trigger to fire if the HDD is not connected.</p> <p>The <code>Persistent=true</code> option ensures that when the timer 
is activated, the service unit is triggered immediately if it would have
 been triggered at least once during the time when the timer was 
inactive. This is because I want to catch up on missed runs of the 
service when the disk was disconnected.</p> <p>After creating this file, I ran the following to actually link this timer to mount / unmount events for the Ext01 disk:</p><figure>
  <pre><code class="language-shellsession">$ systemctl --user enable duply_ext01.timer</code></pre>
  </figure>
<p>That's it. Now, whenever I connect the USB disk to my laptop, the 
timer is started. This timer triggers the backup service to run every 
hour. Also, it takes care that if some run was missed when the disk was 
disconnected, then it would be triggered as soon as the disk is 
connected without waiting for the next hour mark. Pretty cool!</p><h4>NOTES:</h4>
<ul><li>Changing any systemd unit file requires a <code>systemctl --user daemon-reload</code> before systemd can recognize the changes.</li><li>The <a href="https://www.freedesktop.org/software/systemd/man/index.html">systemd documentation</a> was very helpful.</li></ul><h3>Coming Soon</h3>
<p>Although it will be similar, I'll also document how to do the 
above with NFS or SSHFS filesystems (instead of local disks). The major 
difference would be handling loss of internet connectivity, timeouts, 
etc.</p>]]></content:encoded>
    <comments>https://srijan.ch/my-backup-strategy-part-1#comments</comments>
    <slash:comments>0</slash:comments>
  </item><item>
    <title>PostgreSQL replication using Bucardo</title>
    <description><![CDATA[Keeping a live replica of selected PostgreSQL tables using Bucardo]]></description>
    <link>https://srijan.ch/postgresql-replication-using-bucardo</link>
    <guid isPermaLink="false">6030d3dab5e0920001f557cf</guid>
    <category><![CDATA[devops]]></category>
    <category><![CDATA[postgresql]]></category>
    <dc:creator>Srijan Choudhary</dc:creator>
    <pubDate>Tue, 15 Sep 2015 18:05:00 +0000</pubDate>
    <media:content url="https://srijan.ch/media/pages/blog/postgresql-replication-using-bucardo/71791f08a7-1699621096/photo-1551356277-dbb545a2d493.jpg" medium="image" />
    <content:encoded><![CDATA[<figure data-ratio="auto">
    <img src="https://srijan.ch/media/pages/blog/postgresql-replication-using-bucardo/71791f08a7-1699621096/photo-1551356277-dbb545a2d493.jpg" alt="PostgreSQL Replication using Bucardo">
  
  </figure>
<p>There are many different ways to use replication in PostgreSQL, whether for high<br />
availability (using a failover), or load balancing (for scaling), or just for<br />
keeping a backup. Among the various tools I found online, I thought Bucardo was<br />
the best for my use case - keeping a live backup of a few important tables.</p>
<p>I've assumed the following databases:</p>
<ul>
<li>Primary: Hostname = <code>host_a</code>, Database = <code>btest</code></li>
<li>Backup: Hostname = <code>host_b</code>, Database = <code>btest</code></li>
</ul>
<p>We will install Bucardo on the primary database host (it requires its own database<br />
to keep track of things).</p>
<ol>
<li>
<p>Install postgresql</p>
<pre><code class="language-shell-session"> sudo apt-get install postgresql-9.4</code></pre>
</li>
<li>
<p>Install dependencies on <code>host_a</code></p>
<pre><code class="language-shell-session"> sudo apt-get install libdbix-safe-perl libdbd-pg-perl libboolean-perl build-essential postgresql-plperl-9.4</code></pre>
</li>
<li>
<p>On <code>host_a</code>, Download and extract bucardo source</p>
<pre><code class="language-shell-session"> wget https://github.com/bucardo/bucardo/archive/5.4.0.tar.gz
 tar xvfz 5.4.0.tar.gz</code></pre>
</li>
<li>
<p>On <code>host_a</code>, Build and Install</p>
<pre><code class="language-shell-session"> perl Makefile.PL
 make
 sudo make install
 sudo mkdir /var/run/bucardo
 sudo mkdir /var/log/bucardo</code></pre>
</li>
<li>
<p>Create bucardo user on all hosts</p>
<pre><code class="language-sql"> CREATE USER bucardo SUPERUSER PASSWORD 'random_password';
 CREATE DATABASE bucardo;
 GRANT ALL ON DATABASE bucardo TO bucardo;</code></pre>
<p>Note: All commands from now on are to be run on <code>host_a</code> only.</p>
</li>
<li>
<p>On <code>host_a</code>, set a password for the <code>postgres</code> user:</p>
<pre><code class="language-sql"> ALTER USER postgres PASSWORD 'random_password';</code></pre>
</li>
<li>
<p>On <code>host_a</code>, add this to the installation user's <code>~/.pgpass</code> file:</p>
<pre><code class="language-ini"> host_a:5432:*:postgres:random_password
 host_a:5432:*:bucardo:random_password</code></pre>
<p>Also add entries for the other hosts for which users were created in step 5.</p>
<p>Note: It is also a good idea to chmod the <code>~/.pgpass</code> file to <code>0600</code>.</p>
</li>
<li>
<p>Run the bucardo install command:</p>
<pre><code class="language-shell-session"> bucardo -h host_a install</code></pre>
</li>
<li>
<p>Copy schema from A to B:</p>
<pre><code class="language-shell-session"> psql -h host_b -U bucardo template1 -c "drop database if exists btest;"
 psql -h host_b -U bucardo template1 -c "create database btest;"
 pg_dump -U bucardo --schema-only -h host_a btest | psql -U bucardo -h host_b btest</code></pre>
</li>
<li>
<p>Add databases to bucardo config</p>
<pre><code class="language-shell-session"> bucardo -h host_a -U bucardo add db main db=btest user=bucardo pass=host_a_pass host=host_a
 bucardo -h host_a -U bucardo add db bak1 db=btest user=bucardo pass=host_b_pass host=host_b</code></pre>
<p>This will save database details (host, port, user, password) to bucardo<br />
database.</p>
</li>
<li>
<p>Add tables to be synced</p>
<p>To add all tables:</p>
<pre><code class="language-shell-session"> bucardo -h host_a -U bucardo add all tables db=main relgroup=btest_relgroup</code></pre>
<p>To add one table:</p>
<pre><code class="language-shell-session"> bucardo -h host_a -U bucardo add table table_name db=main relgroup=btest_relgroup</code></pre>
<p>Note: Only tables which have a primary key can be added here. This is a<br />
limitation of bucardo.</p>
</li>
<li>
<p>Add db group</p>
<pre><code class="language-shell-session"> bucardo -h host_a -U bucardo add dbgroup btest_dbgroup main:source bak1:target</code></pre>
</li>
<li>
<p>Create sync</p>
<pre><code class="language-shell-session"> bucardo -h host_a -U bucardo add sync btest_sync dbgroup=btest_dbgroup relgroup=btest_relgroup conflict_strategy=bucardo_source onetimecopy=2 autokick=0</code></pre>
</li>
<li>
<p>Start the bucardo service</p>
<pre><code class="language-shell-session"> sudo bucardo -h host_a -U bucardo -P random_password start</code></pre>
<p>Note that this command requires passing the password because it uses sudo,<br />
and the root user's <code>.pgpass</code> file does not have credentials saved for the bucardo<br />
user.</p>
</li>
<li>
<p>Run sync once</p>
<pre><code class="language-shell-session"> bucardo -h host_a -U bucardo kick btest_sync 0</code></pre>
</li>
<li>
<p>Set auto-kick on any changes</p>
<pre><code class="language-shell-session"> bucardo -h host_a -U bucardo update sync btest_sync autokick=1
 bucardo -h host_a -U bucardo reload config</code></pre>
</li>
</ol>
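<p>The <code>chmod 0600</code> note in step 7 matters because libpq ignores a <code>.pgpass</code> file that is group- or world-readable. A quick sketch of the idea, using a throwaway file and GNU <code>stat</code> (hostnames and passwords are placeholders):</p>

```shell
# Create a throwaway .pgpass-style file
pgpass=$(mktemp)
cat > "$pgpass" <<'EOF'
host_a:5432:*:postgres:random_password
host_a:5432:*:bucardo:random_password
EOF

# Lock it down to owner read/write only, as libpq requires
chmod 0600 "$pgpass"
stat -c '%a' "$pgpass"
```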
<p>That's it. Now, the tables specified in step 11 will be replicated from <code>host_a</code><br />
to <code>host_b</code>.</p>
<p>I also plan to write soon about the other alternatives I've tried.</p>]]></content:encoded>
    <comments>https://srijan.ch/postgresql-replication-using-bucardo#comments</comments>
    <slash:comments>6</slash:comments>
  </item><item>
<title>Django, uWSGI, Nginx on FreeBSD</title>
    <description><![CDATA[Setting up Django on FreeBSD using uWSGI and Nginx]]></description>
    <link>https://srijan.ch/django-uwsgi-nginx-on-freebsd</link>
    <guid isPermaLink="false">6030d3dab5e0920001f557cb</guid>
    <category><![CDATA[devops]]></category>
    <dc:creator>Srijan Choudhary</dc:creator>
    <pubDate>Thu, 05 Mar 2015 00:00:00 +0000</pubDate>
<content:encoded><![CDATA[<p>Here are the steps I took for configuring Django on FreeBSD using uWSGI and Nginx.</p> <p>The data flow is like this:</p> <p>Web Request ---&gt; Nginx ---&gt; uWSGI ---&gt; Django</p> <p>I was undecided for a while on whether to choose uWSGI or gunicorn. There are <a href="http://cramer.io/2013/06/27/serving-python-web-applications/">some</a> <a href="http://mattseymour.net/blog/2014/07/uwsgi-or-gunicorn/">blog</a> <a href="http://blog.kgriffs.com/2012/12/18/uwsgi-vs-gunicorn-vs-node-benchmarks.html">posts</a> discussing the pros and cons of each. I chose uWSGI in the end.</p> <p>Also, to start uWSGI on FreeBSD, I found two methods: using <a href="http://amix.dk/blog/post/19689">supervisord</a>, or using a <a href="http://lists.freebsd.org/pipermail/freebsd-questions/2014-February/256073.html">custom FreeBSD init script</a> which could use uWSGI ini files. I'm currently using supervisord.</p><h2>Install Packages Required</h2>
<figure>
  <pre><code class="language-shellsession">$ sudo pkg install python py27-virtualenv nginx uwsgi py27-supervisor</code></pre>
  </figure>
<p>Also install any database package(s) required.</p><h2>Setup your Django project</h2>
<p>Choose a folder for setting up your Django project sources. <code>/usr/local/www/myapp</code> is suggested. Clone the sources to this folder, then setup the python virtual environment.</p><figure>
  <pre><code class="language-shellsession">$ virtualenv venv
$ source venv/bin/activate
$ pip install -r requirements.txt</code></pre>
  </figure>
<p>If required, also setup the database and run the migrations.</p><h2>Setup uWSGI using supervisord</h2>
<p>Setup the supervisord file at <code>/usr/local/etc/supervisord.conf</code>.</p> <p>Sample supervisord.conf:</p><figure>
  <pre><code class="language-ini">[unix_http_server]
file=/var/run/supervisor/supervisor.sock   

[supervisord]
logfile=/var/log/supervisord.log ; (main log file;default $CWD/supervisord.log)
logfile_maxbytes=50MB       ; (max main logfile bytes b4 rotation;default 50MB)
logfile_backups=10          ; (num of main logfile rotation backups;default 10)
loglevel=info               ; (log level;default info; others: debug,warn,trace)
pidfile=/var/run/supervisor/supervisord.pid ; (supervisord pidfile;default supervisord.pid)
nodaemon=false              ; (start in foreground if true;default false)
minfds=1024                 ; (min. avail startup file descriptors;default 1024)
minprocs=200                ; (min. avail process descriptors;default 200)

[rpcinterface:supervisor]
supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface

[supervisorctl]
serverurl=unix:///var/run/supervisor/supervisor.sock
history_file=~/.sc_history  ; use readline history if available

[program:uwsgi_myapp]
directory=/usr/local/www/myapp/
command=/usr/local/bin/uwsgi -s /var/run/%(program_name)s%(process_num)d.sock
        --chmod-socket=666 --need-app --disable-logging --home=venv
        --wsgi-file wsgi.py --processes 1 --threads 10
stdout_logfile=syslog
stderr_logfile=syslog
startsecs=10
stopsignal=QUIT
stopasgroup=true
killasgroup=true
process_name=%(program_name)s%(process_num)d
numprocs=5</code></pre>
  </figure>
<p>supervisord.conf</p> <p>And start it:</p><figure>
  <pre><code class="language-shellsession">$ echo supervisord_enable=&quot;YES&quot; &gt;&gt; /etc/rc.conf
$ sudo service supervisord start
$ sudo supervisorctl tail -f uwsgi_myapp:uwsgi_myapp0</code></pre>
  </figure>
<h2>Setup Nginx</h2>
<p>Use the following line in <code>nginx.conf</code>'s http section to include all config files from <code>conf.d</code> folder.</p><figure>
  <pre><code class="language-nginx">include /usr/local/etc/nginx/conf.d/*.conf;</code></pre>
  </figure>
<p>Create a <code>myapp.conf</code> in <code>conf.d</code>.</p> <p>Sample myapp.conf:</p><figure>
  <pre><code class="language-nginx">upstream myapp {
    least_conn;
    server unix:///var/run/uwsgi_myapp0.sock;
    server unix:///var/run/uwsgi_myapp1.sock;
    server unix:///var/run/uwsgi_myapp2.sock;
    server unix:///var/run/uwsgi_myapp3.sock;
    server unix:///var/run/uwsgi_myapp4.sock;
}

server {
    listen       80;
    server_name  myapp.example.com;
 
    location /static {
        alias /usr/local/www/myapp/static;
    }

    location / {
        uwsgi_pass  myapp;
        include uwsgi_params;
    }
}</code></pre>
  </figure>
<p>myapp.conf</p> <p>And start Nginx:</p><figure>
  <pre><code class="language-shellsession">$ echo nginx_enable=&quot;YES&quot; &gt;&gt; /etc/rc.conf
$ sudo service nginx start
$ sudo tail -f /var/log/nginx-error.log</code></pre>
  </figure>
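<p>The five <code>upstream</code> sockets in <code>myapp.conf</code> must line up with the names supervisord generates from <code>%(program_name)s%(process_num)d</code> with <code>numprocs=5</code>. A quick sketch of the expected names:</p>

```shell
# Socket paths produced by the supervisord config above
# (program_name=uwsgi_myapp, process_num 0..4)
for i in 0 1 2 3 4; do
  echo "/var/run/uwsgi_myapp${i}.sock"
done
```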
<p>Accessing <a href="http://myapp.example.com/">http://myapp.example.com/</a> should work correctly after this. If not, check the supervisord and Nginx logs opened above and correct any errors.</p>]]></content:encoded>
    <comments>https://srijan.ch/django-uwsgi-nginx-on-freebsd#comments</comments>
    <slash:comments>5</slash:comments>
  </item><item>
    <title>Read only root on Linux</title>
    <description><![CDATA[Setting up a read-only root filesystem on Linux]]></description>
    <link>https://srijan.ch/read-only-root-on-linux</link>
    <guid isPermaLink="false">6030d3dab5e0920001f557d2</guid>
    <category><![CDATA[devops]]></category>
    <category><![CDATA[linux]]></category>
    <dc:creator>Srijan Choudhary</dc:creator>
    <pubDate>Sat, 28 Feb 2015 00:00:00 +0000</pubDate>
    <content:encoded><![CDATA[<p>In many cases, it is required to run a system in such a way that it 
is tolerant of uncontrolled power losses, resets, etc. After such an 
event occurs, it should at least be able to boot up and connect to the 
network so that some action can be taken remotely.</p> <p>There are a few different ways in which this could be accomplished.</p><h3>Mounting the root filesystem with read-only flags</h3>
<p>Most parts of the linux root filesystem can be mounted read-only 
without many problems, but some parts don't play well. <a href="https://wiki.debian.org/ReadonlyRoot">This Debian wiki page</a> has some information about this approach. I thought this approach would not be very stable, so I did not try it out completely.</p><h3>Using aufs/overlayfs</h3>
<p>aufs is a union file system for linux systems, which enables us to 
mount separate filesystems as layers to form a single merged filesystem.
 Using aufs, we can mount the root file system as read-only, create a 
writable tmpfs ramdisk, and combine these so that the system thinks that
 the root filesystem is writable, but changes are not actually saved, 
and don't survive a reboot.</p> <p>I found this method to be most suitable and stable for my task, and 
have been using this for the last 6 months. This system mounts the real 
filesystem at mountpoint <code>/ro</code> with the read-only flag, creates a writable ramdisk at mountpoint <code>/rw</code>, and makes a union filesystem using these two at mountpoint <code>/</code>.</p> <p>The steps I followed for my implementation are detailed below. These are just a modified version of the steps in <a href="https://help.ubuntu.com/community/aufsRootFileSystemOnUsbFlash">this Ubuntu wiki page</a>. I am using Debian in my implementation.</p><ol><li><p>Install Debian using a live CD or your preferred method.</p></li><li><p>After first boot, upgrade and configure the system as needed.</p></li><li><p>Install <code>aufs-tools</code>.</p></li><li><p>Add aufs to initramfs and set up <a href="https://gist.github.com/srijan/383a8d7af6860de6f9de">this script</a> to run at init.</p></li></ol><figure>
  <pre><code class="language-shellsession"># echo aufs &gt;&gt; /etc/initramfs-tools/modules
# wget https://cdn.rawgit.com/srijan/383a8d7af6860de6f9de/raw/ -O /etc/initramfs-tools/scripts/init-bottom/__rootaufs
# chmod 0755 /etc/initramfs-tools/scripts/init-bottom/__rootaufs</code></pre>
  </figure>
<ol start="5"><li>Remake the initramfs.</li></ol><figure>
  <pre><code class="language-shellsession"># update-initramfs -u</code></pre>
  </figure>
<ol start="6"><li>Edit grub settings in <code>/etc/default/grub</code> and add <code>aufs=tmpfs</code> to <code>GRUB_CMDLINE_LINUX_DEFAULT</code>, and regenerate grub.</li></ol><figure>
  <pre><code class="language-shellsession"># update-grub</code></pre>
  </figure>
<ol start="7"><li>Reboot.</li></ol><h4>Making changes</h4>
<p>To change something trivial (like a file edit), just remount the <code>/ro</code> mountpoint as read-write, edit the file, and reboot.</p><figure>
  <pre><code class="language-shellsession"># mount -o remount,rw /ro</code></pre>
  </figure>
<p>To do something more complicated (like install os packages), press <code>e</code> in grub menu during bootup, remove <code>aufs=tmpfs</code> from the kernel line, and boot using <code>F10</code>. The system will boot up normally once.</p> <p>Another method could be to use a configuration management tool 
(puppet, chef, ansible, etc.) to make the required changes whenever the 
system comes online. The changes would be lost on reboot, but it would 
become much easier to manage multiple such systems.</p> <p>Also, if some part of the system is required to be writable (like <code>/var/log</code>), that directory could be mounted separately as a read-write mountpoint.</p>]]></content:encoded>
    <comments>https://srijan.ch/read-only-root-on-linux#comments</comments>
    <slash:comments>1</slash:comments>
  </item></channel>
</rss>
