Exploring conflicting oneshot services in systemd

Srijan Choudhary
Midjourney: two systemd services fighting over who will start first

Background

I use mbsync to sync my mailbox from my online provider (FastMail, referral link) to my local system, where I eventually use it with mu4e (on Emacs).

For periodic sync, I have a systemd service file called mbsync.service defining a oneshot service and a timer file called mbsync.timer that runs this service periodically. I can also activate the same service using a keybinding from inside mu4e.

[Unit]
Description=Mailbox synchronization service
After=network-online.target

[Service]
Type=oneshot
ExecStart=/usr/bin/mbsync fastmail-all
ExecStartPost=bash -c "emacsclient -s srijan -n -e '(mu4e-update-index)' || mu index"

[Install]
WantedBy=default.target
mbsync.service

[Unit]
Description=Mailbox synchronization timer
BindsTo=graphical-session.target
After=graphical-session.target

[Timer]
OnBootSec=2m
OnUnitActiveSec=5m
Unit=mbsync.service

[Install]
WantedBy=graphical-session.target
mbsync.timer

Also, for instant download of new mail, I have another service called goimapnotify configured that listens for new/updated/deleted messages on the remote mailbox using IMAP IDLE, and calls the above mbsync.service when there are changes.

This has worked well for me for several years.

The Problem

I recently split my (huge) archive folder into yearly archives so that I can keep/sync only the recent years on my phone. [ Aside: yearly refile in mu4e snippet ]. This led to an increase in the number of folders that mbsync has to sync, which increased the total sync time because it syncs the folders one by one.

mbsync can sync a subset of folders, so I created a second systemd service called mbsync-quick.service that syncs only my Inbox. Then I updated the goimapnotify config to trigger this quick service instead of the full service when it detects changes.

But, this caused a problem: these two services can run at the same time, and hence can cause corruption or sync conflicts in the mail files. So, I wanted a way to make sure that these two services don't run at the same time.

Ideally, whenever either of these services is triggered while the other one is already running, it should wait for the other service to stop before starting, essentially forming a queue.

Solution 1: Using systemd features

Systemd has a way to specify conflicts in the [Unit] section. From the docs:

If a unit has a Conflicts= setting on another unit, starting the former will stop the latter and vice versa.
[...] to ensure that the conflicting unit is stopped before the other unit is started, an After= or Before= dependency must be declared.

This is different from our requirement that the conflicting service should be allowed to finish before the triggered service starts, but maybe a good enough way to at least prevent both running at the same time.

To test this, I added Conflicts= in both the services with the other service as the conflicting service, and it works. The only problem is that when a service is triggered, the other service is SIGTERMed. This itself might not cause a corruption issue, but if this happens with the mbsync-quick service, then there might be a delay getting the mail.
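
A minimal sketch of what this could look like, assuming the unit names used in this post. Only the [Unit] sections are shown; the rest of each file stays as it is. No After= or Before= ordering between the two units is declared here, since the test described above does not mention one.

```ini
# mbsync.service (sketch, [Unit] section only)
[Unit]
Description=Mailbox synchronization service
After=network-online.target
# Starting this unit stops mbsync-quick.service if it is running
Conflicts=mbsync-quick.service

# mbsync-quick.service (sketch, [Unit] section only)
[Unit]
Description=Mailbox synchronization service (quick)
After=network-online.target
# Starting this unit stops mbsync.service if it is running
Conflicts=mbsync.service
```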

This is the best way I found that uses built-in systemd features without any workarounds or hacks. Other solutions below involve some workarounds.

Solution 2: Conflict + stop after sync complete

This is a variation on solution 1 - add a wrapper script to trap the SIGTERM and only exit when the sync is complete. This also worked.
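
A minimal sketch of such a wrapper, under some assumptions: the function name run_to_completion and the script name wrapper.sh are hypothetical, and this is not necessarily the script that was actually used. It relies on the POSIX shell rule that a trapped signal is only acted on after the current foreground command has finished. Also note that systemd's default KillMode=control-group sends SIGTERM to every process in the unit, mbsync included, so a setup like this may additionally need something like KillMode=mixed so that only the wrapper is signalled first.

```shell
#!/bin/sh
# Hypothetical wrapper (not the script from this post): run a command to
# completion even if this shell receives SIGTERM while it is running.

run_to_completion() {
    term_seen=0
    # With a handler installed, SIGTERM no longer kills this shell. The shell
    # runs the handler only after the foreground command below has finished.
    trap 'term_seen=1' TERM
    "$@"
    status=$?
    trap - TERM
    if [ "$term_seen" -eq 1 ]; then
        echo "SIGTERM arrived during the sync; exiting now that it is done" >&2
    fi
    return "$status"
}

# Example: wrapper.sh /usr/bin/mbsync fastmail-all
if [ "$#" -gt 0 ]; then
    run_to_completion "$@"
fi
```

The unit's ExecStart= would then point at the wrapper instead of calling mbsync directly.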

But the drawback of this method is that anyone calling stop on these services (like the system shutting down) has to wait for the sync to finish (or until the stop timeout of 90s). This can cause slowdowns in system shutdown that are hard to debug, so I don't prefer this solution.

Solution 3: Delay start until the other service is finished

This is also a hacky solution - use ExecStartPre to check if the other service is running, and busywait for it to stop before starting ourselves.

[Unit]
Description=Mailbox synchronization service (quick)
After=network-online.target

[Service]
Type=oneshot
ExecStartPre=/bin/sh -c 'while systemctl --user is-active mbsync.service | grep -q activating; do sleep 0.5; done'
ExecStart=/usr/bin/mbsync fastmail-inbox
ExecStartPost=bash -c "emacsclient -s srijan -n -e '(mu4e-update-index)' || mu index"
mbsync-quick.service

Here, we use systemctl is-active to query the status of the other service, and wait until the other service is not in activating state anymore. The state is called activating instead of active because these are oneshot services that go from inactive to activating to inactive without ever reaching active.

To not make this an actual busywait on the CPU, I added a sleep of 0.5s.
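
The same wait loop can also be written as a small standalone script, which is easier to read than the one-liner. This is only a sketch: unit_state and wait_while_activating are hypothetical helper names, wait-for-unit.sh is a made-up file name, and the loop compares the state string directly instead of piping through grep.

```shell
#!/bin/sh
# Sketch of the ExecStartPre wait loop as a standalone script.
# unit_state and wait_while_activating are hypothetical helper names.

# Print the state systemd reports for a user unit (e.g. inactive, activating).
unit_state() {
    systemctl --user is-active "$1"
}

# Block until the given unit is no longer in the "activating" state.
wait_while_activating() {
    while [ "$(unit_state "$1")" = "activating" ]; do
        # Sleep between checks so the loop does not spin on the CPU
        sleep 0.5
    done
}

# Example: wait-for-unit.sh mbsync.service
if [ "$#" -gt 0 ]; then
    wait_while_activating "$1"
fi
```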

This worked the best for my use case. When one of the services is triggered, it checks if the other service is running and waits for it to stop before running itself. It also does not have the drawback of solution 2 of trapping exits and delaying a stop command.

But, after using it for a day, I found there is a race condition (!) that can cause a deadlock between these two services, so that neither of them is able to start.

The reason for the race condition was:

  • A service is marked as activating when its ExecStartPre command starts
  • I added a sleep of 0.5 seconds

So, if the other service is triggered again in between those 0.5 seconds, both services will be marked as activating and they will indefinitely wait for each other to complete. This is what I get for using workarounds.

Solution 4: One-way conflict, other way delay

So, the final good-enough solution I came up with was to break this cyclic dependency by doing a hybrid of Solution 1 and Solution 3. I was okay with the mbsync.service being stopped for the (higher priority) mbsync-quick.service.

So, I added mbsync.service to Conflicts= in mbsync-quick.service, and used the ExecStartPre method in mbsync.service.
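
Put together, the two units might look roughly like the sketch below. It is assembled from the unit files shown earlier in this post rather than copied from a working setup, so treat it as an assumption. Going by the documentation quoted in Solution 1, Conflicts= works in both directions, so it is worth checking whether starting mbsync.service also stops a running mbsync-quick.service in this arrangement.

```ini
# mbsync-quick.service (sketch): starting it stops a running mbsync.service
[Unit]
Description=Mailbox synchronization service (quick)
After=network-online.target
Conflicts=mbsync.service

[Service]
Type=oneshot
ExecStart=/usr/bin/mbsync fastmail-inbox
ExecStartPost=bash -c "emacsclient -s srijan -n -e '(mu4e-update-index)' || mu index"

# mbsync.service (sketch): waits while mbsync-quick.service is still running
[Unit]
Description=Mailbox synchronization service
After=network-online.target

[Service]
Type=oneshot
ExecStartPre=/bin/sh -c 'while systemctl --user is-active mbsync-quick.service | grep -q activating; do sleep 0.5; done'
ExecStart=/usr/bin/mbsync fastmail-all
ExecStartPost=bash -c "emacsclient -s srijan -n -e '(mu4e-update-index)' || mu index"

[Install]
WantedBy=default.target
```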

💡 Let me know if you know a better way to achieve this.
