Scheduling background tasks using cron in a Docker container

David Herron

Date: Thu Mar 25 2021

Tags: Docker, Docker MAMP

The cron service has, since time immemorial, helped us schedule the occasional background process that keeps Unix-like systems ticking. Typically it is used to gather up or summarize log files, or to collect and process data from external systems. In the old days, when e-mail exchange and the all-too-important Usenet news exchange happened using UUCP over modem lines, a cron job scheduled regular UUCP calls to neighboring servers. Having cron running in the background is part of normal Unix/Linux system administration practice. Even though the crontab format is kind of hokey, we all learn it and set up automated background tasks to keep the world functioning. Let's see how to set this up in a Docker container.

In a normal Unix/Linux/etc machine the cron service is automatically present. It's there next to the couple dozen other background services like talkd (a primitive chatting system), lpd (printer support), or dbus-daemon (DBUS). But, the Linux image running in a Docker container is purposely minimal. Instead of running the full gamut of background tasks, a Docker container runs only the processes required for the application, in order to minimize overhead and container size. That means even though Docker containers run a Linux flavor, cron is not there.

Sometimes your application requires occasional running of background tasks. If nothing else, log files need to be summarized and collected.

The first time I thought about running cron in Docker came while writing the 4th Edition of Node.js Web Development. The book covers the full gamut of Node.js application development from soup to nuts, that is from initial concept to delivery on real cloud hosting servers, and even consideration of security setup. The 4th Edition added a section on setting up HTTPS for Node.js applications, using the Let's Encrypt service to procure SSL certificates. Since Let's Encrypt requires running command-line tools every 30-60 days to renew the certificates, it became necessary to work out how to install cron in a Docker container to manage SSL certificate renewal.

How do we set up cron in a Docker container?

The simplest model for running cron in a Docker container is to create a container purposed solely for cron. You simply create a Dockerfile deriving from your favorite Linux distro, and install the cron package for that distro. You then configure cron jobs in the manner required by the specific cron daemon being used.

But it is a slippery slope from there to wanting cron running next to another service like NGINX. In the case I mentioned, using NGINX as a reverse proxy in front of an application, with the Let's Encrypt command-line tools managing SSL certificates, led me to create a container running both NGINX and cron.

The simplest model to run cron alongside another service in Docker is to start with a Dockerfile that installs and configures the packages for all required services. Running both cron and another service is then easily accomplished with a suitable CMD instruction in the Dockerfile. But you might find it more productive to use a background process manager, like /etc/init, especially if your container requires more than two background processes.

The code shown in this tutorial is available in a Github repo: https://github.com/robogeek/docker-cron

There is a Docker container derived from the discussion in this tutorial: https://hub.docker.com/r/robogeek/crond

Creating a Docker container to solely run Cron

Let's start with the simplest Dockerfiles that can run cron. When creating Docker images we have a choice between using Alpine Linux, because it produces ultra small images, or using a normal Linux distro, like Debian, with the familiarity that comes from having used Debian and Ubuntu for over 15 years. In other words, let's create two Docker images to compare the differences.

In the Github repo associated with this tutorial you'll find a directory, alpine, containing this Dockerfile:

FROM alpine

CMD [ "/usr/sbin/crond", "-f", "-d8" ]

Notice that it's just Alpine Linux with no additional packages installed. That's because Alpine Linux is based on BusyBox, and the BusyBox binary directly supports crond. To explain what that means, well, the BusyBox project explains it better than I can:

BusyBox combines tiny versions of many common UNIX utilities into a single small executable. It provides replacements for most of the utilities you usually find in GNU fileutils, shellutils, etc. The utilities in BusyBox generally have fewer options than their full-featured GNU cousins; however, the options that are included provide the expected functionality and behave very much like their GNU counterparts. BusyBox provides a fairly complete environment for any small or embedded system. -- https://www.busybox.net/about.html

In BusyBox, there is a single executable file that contains several applications all compiled together. Through the magic of symlinks, that executable has several names.

The arguments we can use with the BusyBox crond are:

/etc/periodic # crond --help
BusyBox v1.32.1 () multi-call binary.

Usage: crond -fbS -l N -d N -L LOGFILE -c DIR

        -f      Foreground
        -b      Background (default)
        -S      Log to syslog (default)
        -l N    Set log level. Most verbose 0, default 8
        -d N    Set log level, log to stderr
        -L FILE Log to FILE
        -c DIR  Cron dir. Default:/var/spool/cron/crontabs

Hence, this Dockerfile says to run crond in the foreground, logging to stderr at log level 8.

The result, as we'll see, is an ultra-small Docker image.

In the Github repo you'll find another directory, debian, containing this Dockerfile:

FROM debian:jessie

RUN apt-get update && apt-get install -y cron bash wget
CMD [ "cron", "-f" ]

With the Debian Docker image we are required to install additional tools, so we've done so.

In each directory you'll find a package.json in which are recorded several command scripts. This will require having Node.js (and npm) installed. If you do have that installed, simply run npm run build in each directory. If you do not, then instead execute this:

$ (cd alpine; docker build -t crond-alpine .)
$ (cd debian; docker build -t crond-debian .)

This will build each image. Notice that building crond-debian takes more time, because it has to run apt-get update and apt-get install to assemble everything.

After running both builds, run this command:

$ docker images
REPOSITORY          TAG          IMAGE ID       CREATED          SIZE
crond-debian        latest       9d4c08ada890   19 minutes ago   222MB
...
crond-alpine        latest       1110a78d59ff   4 weeks ago      5.61MB
...

Oh, hey, 5.61 megabytes versus 222 megabytes is a huge difference. Maybe we should stick with Alpine?

Next, let's launch both containers:

$ docker run --name alpine -d crond-alpine
$ docker run --name debian -d crond-debian

Neither will do anything because there are no cron jobs configured. But we can access the container innards to see the required configuration.

$ docker exec -it alpine bash
OCI runtime exec failed: exec failed: container_linux.go:370: starting container process caused: exec: "bash": executable file not found in $PATH: unknown

The normal way to get into a Docker container is with this command, but that presumes bash is installed. Alpine Linux is purposely kept as trimmed as possible, so it's not surprising that bash is missing. Indeed, sh works where bash did not:

$ docker exec -it alpine sh
/ #

We can get in by executing sh instead of bash. Learn something new every day.

Next, run this command:

/etc # cat crontabs/root 
# do daily/weekly/monthly maintenance
# min   hour    day     month   weekday command
*/15    *       *       *       *       run-parts /etc/periodic/15min
0       *       *       *       *       run-parts /etc/periodic/hourly
0       2       *       *       *       run-parts /etc/periodic/daily
0       3       *       *       6       run-parts /etc/periodic/weekly
0       5       1       *       *       run-parts /etc/periodic/monthly

This is the only preconfigured crontab on the system. Each entry uses run-parts to scan one of the subdirectories under /etc/periodic, looking for any scripts to execute.

Let's do the same investigation for Debian:

$ docker exec -it debian bash
root@7bb32b120e74:/# cd /etc
root@7bb32b120e74:/etc# cat crontab 
# /etc/crontab: system-wide crontab
#...

SHELL=/bin/sh
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin

# m h dom mon dow user  command
17 *    * * *   root    cd / && run-parts --report /etc/cron.hourly
25 6    * * *   root    test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily )
47 6    * * 7   root    test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.weekly )
52 6    1 * *   root    test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.monthly )

This configuration is roughly similar, scanning /etc/cron.hourly, /etc/cron.daily, /etc/cron.weekly, and /etc/cron.monthly for scripts to execute. The latter three will be skipped if anacron is installed.

That gives us all we need to know to configure and launch cron on either, which we'll do in the next section.

Using a Cron container to run regular backups of a Wordpress site

Let's do something using the Alpine based Cron container. To start let's inject a simple shell script:

#!/bin/sh
touch /tmp/hello.txt
echo Hello, World! >>/tmp/hello.txt

Specifically, create a directory named alpine/15min, and save that script into that directory giving it the name hello-world.sh. For good measure make that script executable by running this: chmod +x 15min/hello-world.sh

Then in the alpine directory create a file named docker-compose.yml containing:

version: '3.8'

services:

    crond:
        image: crond-alpine
        volumes:
            - ./15min:/etc/periodic/15min

This defines a container, injecting the local 15min directory into the correct place inside the container. Run docker-compose up -d to launch this container, and after a few minutes you can do this:

$ docker exec -it alpine_crond_1 sh
/ # 
/ # cd /etc/periodic/15min/
/etc/periodic/15min # ls
hello-world.sh
/etc/periodic/15min # ls -l
total 4
-rwxr-xr-x    1 root     root            28 Mar 23 02:10 hello-world.sh
/etc/periodic/15min #

The script was successfully injected into the container, as expected.

To observe behavior of the crond process:

$ docker-compose logs -f
Attaching to alpine_crond_1
crond_1  | crond: crond (busybox 1.32.1) started, log level 8
crond_1  | crond: USER root pid   7 cmd run-parts /etc/periodic/15min
crond_1  | crond: USER root pid  17 cmd run-parts /etc/periodic/15min
^CERROR: Aborting.

But there is a problem caused by naming the script hello-world.sh. The script is supposed to append a message to /tmp/hello.txt, but no text appears. After several hours of head-scratching, I learned a silly thing: files with the extension .sh are not supported by the run-parts tool. More precisely, run-parts ignores any file whose name contains a dot, so a script run this way cannot have a file extension of any kind.
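The naming rule can be checked before a script is mounted into the container. The following is a minimal sketch, assuming the rule followed by the BusyBox and Debian versions of run-parts, namely that a file name may contain only letters, digits, underscores, and hyphens. The function name is_run_parts_name is made up for this illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeed if run-parts would accept this file name.
# Assumption: only letters, digits, underscores, and hyphens are allowed.
is_run_parts_name() {
    name=$(basename "$1")
    case "$name" in
        ''|*[!A-Za-z0-9_-]*) return 1 ;;   # empty, or contains a forbidden character
        *) return 0 ;;
    esac
}

# Check the two names used in this tutorial.
for f in 15min/hello-world.sh 15min/hello-world; do
    if is_run_parts_name "$f"; then
        echo "$f: ok"
    else
        echo "$f: would be skipped by run-parts"
    fi
done
```

If the run-parts in your image supports the --test option, running run-parts --test /etc/periodic/15min inside the container should list the scripts it would run without running them, which is another way to spot a skipped file.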

Notice that the crond configuration relies on run-parts to run the scripts in each directory. It's not hard to replace the default cron configuration with your own that doesn't use run-parts. But it is a useful tool, and the default cron configuration is flexible enough for most purposes.

As soon as you change the script file name to hello-world rather than hello-world.sh, then wait until the 15 minute mark, messages show up in /tmp/hello.txt:

$ docker exec -it alpine_crond_1 sh
/ # cat /tmp/hello.txt 
Hello, World!
Hello, World!
/ #

This satisfies the basic need: we can easily define a container that supports cron jobs.

Now that we've shown an ability to run cron jobs in a Docker container, let's try to do something practical. Given a generic Wordpress site, let's create a container to handle backups. Namely, we'll have a directory containing the Wordpress files, and we'll have access particulars to the MySQL database. We don't need to consider the details of Wordpress deployment in Docker, the container simply needs access to those things. A companion article goes over Wordpress deployment: Wordpress local development environment with Docker and Docker Compose on your laptop

The setup described in that article uses the official MySQL and Wordpress containers. The important characteristics for performing backups are:

Both are on the Docker network wpnet, which is named wordpress-local_wpnet once it is launched using Docker Compose.
The Wordpress files are stored in a directory, ../../wordpress-local/roots/html, relative to the current directory.
The database is on a hostname db, and is accessed by the user ID dbuser, password dbpassw0rd, and database name wpdb.

In the Github repository you'll find a new directory, wordpress-backup. In there you'll find a new Dockerfile:

FROM alpine

RUN apk add --no-cache mysql-client
CMD [ "/usr/sbin/crond", "-f", "-d8" ]

This adds the mysql-client package to the alpine image. That package contains both the mysql and mysqldump tools. We'll use the latter in a shell script to run a backup process.

In the package.json you'll see the command to build this is:

docker build -t crond-alpine-mysql .

Hence, the image name is crond-alpine-mysql.

Next you'll find a directory, hourly, containing a shell script named backup:

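#!/bin/sh
# Back up the Wordpress database and files into timestamped archives.

SRC=/docroot        # Wordpress files, mounted into the container
DEST=/backups       # where the backup archives are written

DB_USER_NAME=dbuser
DB_PASSWORD=dbpassw0rd
DB_NAME=wpdb
DB_SERVER=db

mkdir -p "${DEST}"

TIMESTAMP=$(date '+%Y-%m-%d-%H:%M')

# Dump the database to an SQL file, then compress it
mysqldump \
        -u "${DB_USER_NAME}" \
        --password="${DB_PASSWORD}" \
        --host="${DB_SERVER}" \
        --databases "${DB_NAME}" \
        >"${DEST}/${TIMESTAMP}-${DB_NAME}.sql"
gzip "${DEST}/${TIMESTAMP}-${DB_NAME}.sql"

# Archive the Wordpress files; stop if the source directory is missing
cd "${SRC}" || exit 1
tar cfz "${DEST}/${TIMESTAMP}-${DB_NAME}-filez.tar.gz" .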

This runs mysqldump to make an SQL dump of the database, which is then compressed using gzip. Next it uses tar to make a backup of the Wordpress directory, that is also gzip compressed. At the top are environment variables to make it easier to configure this script. We could add some shell wizardry to test if these variables are already set, to avoid overwriting an externally supplied value. That way a Compose file could supply configuration via environment variables.
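The shell wizardry in question could look something like this. It is a sketch rather than the script in the repository: the ${VAR:=default} form assigns the default only when the variable is unset or empty, so a value supplied from outside survives. The final echo is here only to show the result.

```shell
#!/bin/sh
# Keep any value supplied from outside (for example by an "environment:"
# section in a Compose file); otherwise fall back to the tutorial's defaults.
: "${SRC:=/docroot}"
: "${DEST:=/backups}"
: "${DB_USER_NAME:=dbuser}"
: "${DB_PASSWORD:=dbpassw0rd}"
: "${DB_NAME:=wpdb}"
: "${DB_SERVER:=db}"

# Report the effective configuration (the password is deliberately left out).
echo "Backing up ${DB_NAME} on ${DB_SERVER} as ${DB_USER_NAME}: ${SRC} -> ${DEST}"
```

One caveat: whether a cron job actually sees the container's environment variables depends on the cron implementation, so it is worth verifying inside your own container before relying on this.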

The last thing you'll find is docker-compose.yml:

version: '3.8'

services:

    crond:
        image: crond-alpine-mysql
        networks:
            - wordpress-local_wpnet
        volumes:
            - ./hourly:/etc/periodic/hourly
            # This must be the path to where the Wordpress files are located
            - ../../wordpress-local/roots/html:/docroot
            - ./backups:/backups

networks:
    wordpress-local_wpnet:
        external: true

This uses the image built from the Dockerfile shown earlier. It attaches to the virtual network associated with the Wordpress and MySQL containers, and it mounts the necessary directories.

To launch: docker-compose up -d

After it launches you can log in and inspect that everything was set up correctly:

$ docker exec -it wordpress-backup_crond_1 sh
/ # ls /docroot/
foo.txt               wp-comments-post.php  wp-login.php
index.php             wp-config-sample.php  wp-mail.php
license.txt           wp-config.php         wp-settings.php
readme.html           wp-content            wp-signup.php
test.php              wp-cron.php           wp-trackback.php
wp-activate.php       wp-includes           xmlrpc.php
wp-admin              wp-links-opml.php
wp-blog-header.php    wp-load.php
/ # ls /backups
/ # mysql -u dbuser -h db -p
Enter password: 
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MySQL connection id is 55
...
MySQL [(none)]> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| wpdb               |
+--------------------+
2 rows in set (0.025 sec)

MySQL [(none)]>

This container is therefore set up for running Cron, plus it can run mysqldump. We've verified access to the database.

After launching the Compose file, you can view the logs:

$ docker-compose logs -f
Attaching to wordpress-backup_crond_1
crond_1  | crond: crond (busybox 1.32.1) started, log level 8
crond_1  | crond: USER root pid   7 cmd run-parts /etc/periodic/15min
crond_1  | crond: USER root pid  18 cmd run-parts /etc/periodic/15min
crond_1  | crond: USER root pid  23 cmd run-parts /etc/periodic/15min
crond_1  | crond: USER root pid  24 cmd run-parts /etc/periodic/hourly
crond_1  | run-parts: can't execute '/etc/periodic/hourly/backup': Permission denied

Whoops, we need to make the backup script executable:

$ chmod +x hourly/backup

Eventually the script will run, and the backups directory will get these files:

/etc/periodic # ls -l /backups/
total 13508
-rw-r--r--    1 root     root      12839372 Mar 23 20:00 2021-03-23-20:00-wpdb-filez.tar.gz
-rw-r--r--    1 root     root        161065 Mar 23 20:00 2021-03-23-20:00-wpdb.sql.gz

This demonstrates we can use a simple Docker container to manage one or several tasks such as backing up files related to a Wordpress website.

Using the productized version of Wordpress-backup-image just discussed

In the previous section we discussed an image that can be used for backing up a Wordpress site. A productized version of that image is available, and its usage and contents are almost identical to what was just discussed.

The image, robogeek/crond, is on Docker Hub, and there is further documentation there. This image not only has the mysql-client installed, but also less, gzip and rsync because each might be useful in a backup script.

The docker-compose.yml I have on my server is:

version: '3.8'

services:
    crond:
        image: robogeek/crond:latest
        container_name: crond
        networks:
            - servernet
            - dbnet
        volumes:
            - /opt/SITE1/roots:/var/SITE1:rw
            - /opt/SITE1/scripts/SITE1.backup.sh:/etc/periodic/daily/SITE1:ro
            - /opt/SITE2/roots:/var/SITE2:rw
            - /opt/SITE2/scripts/SITE2.backup.sh:/etc/periodic/daily/SITE2:ro
            - /opt/SITE3/roots:/var/SITE3:rw
            - /opt/SITE3/scripts/SITE3.backup.sh:/etc/periodic/daily/SITE3:ro
            - /home/ubuntu/backups:/backups:rw

networks:
    servernet:
        external:
            name: servernet
    dbnet:
        external:
            name: dbnet

This is a little different from what was shown in the previous section. First, for each site the /opt/SITE-NAME/roots directory contains a directory, /opt/SITE-NAME/roots/SITE-DOMAIN, which is the root of the files for the site. Each of the backup scripts knows this.

Second, a script named SITE-NAME.backup.sh will not execute under the run-parts program because of the dots in its file name, as noted earlier. Therefore it is mounted into the container as /etc/periodic/daily/SITE-NAME.

Creating a Docker container to run both Cron and NGINX

Now that we have demonstrated what we can do with a simple container solely running Cron, let's try something more complex, namely a container with NGINX where we've installed the Let's Encrypt tools and have a Cron job to manage SSL certificate renewal.

Since that's a large task, we have a companion article describing that setup: Manage Letsencrypt HTTPS/SSL certificates with a Docker container using Cron, Nginx, and Certbot

The Dockerfile used there is:

FROM nginx:stable

# Inspiration:
# https://hub.docker.com/r/gaafar/cron/

# Install cron, certbot, bash, plus any other dependencies

RUN apt-get update \
    && apt-get install -y cron bash wget certbot \
    && apt-get update -y \
    && mkdir -p /webroots /scripts \
    && rm -f /etc/nginx/conf.d/default.conf \
    && rm -f /etc/cron.d/certbot

COPY *.sh /scripts/
RUN chmod +x /scripts/*.sh

# /webroots/DOMAIN.TLD/.well-known/... files go here
VOLUME /webroots
# This handles book-keeping files for Letsencrypt
VOLUME /etc/letsencrypt
# This lets folks inject Nginx config files
VOLUME /etc/nginx/conf.d

WORKDIR /scripts

# This installs a Crontab entry which 
# runs "certbot renew" on several days a week at 03:22 AM

RUN echo "22 03 * * 2,7 root /scripts/renew.sh" >/etc/cron.d/certbot-renew

# Run both nginx and cron together
CMD [ "sh", "-c", "cron && nginx -g 'daemon off;'" ]

This Dockerfile does a bunch of setup which should probably be done a little differently. For example, rather than have a Cron job hard coded into the Dockerfile, the /etc/cron.daily directory should be exported to the host machine. Why that directory and not /etc/periodic/daily? The image shown here is a Debian image. It's worth checking whether the certbot package is available for Alpine, and using that instead, in which case we'd use the /etc/periodic/daily directory.

The important part for our discussion is the last line:

# Run both nginx and cron together
CMD [ "sh", "-c", "cron && nginx -g 'daemon off;'" ]

Run this way, the container ends up with two long-running processes. The first is cron, the Cron service on Debian, which puts itself into the background. Using the && construct, we then run nginx. With the daemon off argument, NGINX runs in the foreground, which is required with Docker so that Docker can manage processes in the container.

The pattern is:

command1 && command2 && command3 && nginx -g 'daemon off;'

This is a shell technique in which multiple commands are executed one after the other, each one starting only if the previous command exited with a zero status. But, Docker wants to treat the command in the CMD instruction as the primary service process for the container. The health of that process is one sign Docker reads to determine if the container is healthy.

For this to work, every command but the last must automatically spin itself into the background. The last command (in this case nginx) must stay in the foreground so that Docker will manage the container correctly. That way you'll end up with one or more background processes, plus a foreground process that Docker watches carefully.
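To see how the && chain behaves, here is a small sketch using shell functions as stand-ins for real services. The function names are invented for this illustration. Each one simply reports and returns an exit status, the way a daemon returns to the shell once it has put itself into the background.

```shell
#!/bin/sh
# Stand-ins for real services (hypothetical names, for illustration only).
start_daemon()  { echo "daemon $1 started"; return 0; }
broken_daemon() { echo "daemon $1 failed to start"; return 1; }
foreground()    { echo "$1 running in foreground"; }

# Every step succeeds, so the chain reaches the foreground command.
if start_daemon cron && foreground nginx; then
    echo "chain 1: foreground service ran"
fi

# The first step fails, so && stops there and nginx is never started.
if broken_daemon cron && foreground nginx; then
    echo "chain 2: foreground service ran"
else
    echo "chain 2: stopped before the foreground service"
fi
```

In a real container the second case means the CMD exits immediately with a non-zero status, so the container stops rather than running NGINX without its companion service.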

Why did we want Cron and NGINX and Certbot in the same container? When Certbot downloads a new SSL certificate, because it renewed an old SSL certificate, NGINX must be restarted in order to recognize and use the new certificate. That means Certbot must be running inside the NGINX container so that it can send a Unix signal to the NGINX process. Since it is not possible for one Docker container to execute a process inside another Docker container, it is necessary to have a task scheduling process running inside the NGINX container. Hence, it was felt necessary to integrate Cron with NGINX to have a process scheduler in the NGINX container.
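As a rough sketch of what such a scheduled job might do, consider the function below. It is not the renew.sh from the companion article, which may well differ. It assumes certbot and nginx are both on the PATH inside the container, and DRY_RUN is a variable invented here so the commands can be printed instead of executed.

```shell
#!/bin/sh
# Hypothetical sketch of a renewal job; the real /scripts/renew.sh may differ.
# Assumptions: certbot and nginx are on PATH, and DRY_RUN=1 (a variable
# made up for this example) prints the commands instead of running them.
renew_certs() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        run=echo
    else
        run=
    fi
    # "certbot renew" replaces only certificates that are close to expiry.
    # "nginx -s reload" signals the running NGINX master process so that it
    # re-reads its configuration, including the certificate files.
    $run certbot renew && $run nginx -s reload
}
```

Calling DRY_RUN=1 renew_certs prints the two commands, which is a safe way to try it outside the container. This version reloads NGINX after every successful certbot run, whether or not a certificate changed. Certbot also has a --deploy-hook option for running a command only after a certificate is actually renewed, which may be a better fit.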

Running this container and demonstrating its use is beyond the scope of this article. There are NGINX configuration files, and the provisioning of SSL certificates from Let's Encrypt, which must be discussed. For all that see: Manage Letsencrypt HTTPS/SSL certificates with a Docker container using Cron, Nginx, and Certbot

Docker container with multiple services, including cron

A little further down the slippery slope are those Docker containers consisting of several service processes in one container. We've gotten ourselves into territory where the default behavior of Docker doesn't work so well. In the previous section we discussed one method for starting multiple processes in a Docker container. And we discussed how Docker watches the process in the CMD instruction as part of determining container health.

Docker's goal is service reliability by detecting unhealthy containers, then killing and restarting any such containers. But consider, what if the cron process discussed in the previous section were to crash? Docker would not see that the cron process had crashed because it is only watching the nginx process.

Generally speaking what's needed is a process manager. The process manager would be launched in the CMD instruction, and it would have the configuration required to launch NGINX, Cron, or any other process desired. For example the Gitlab project distributes a Docker container which launches the whole Gitlab infrastructure inside one container.
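To make the idea concrete, here is a bare-bones sketch in plain shell. It is nowhere near a real process manager, and the function name wait_for_any_exit is invented for this illustration. The services are started with & so the shell knows their process IDs, and the script returns as soon as any one of them disappears. Two sleep commands stand in for real services such as cron -f and nginx -g 'daemon off;'.

```shell
#!/bin/sh
# Hypothetical sketch: return as soon as any of the given PIDs is gone.
wait_for_any_exit() {
    while :; do
        for pid in "$@"; do
            # kill -0 sends no signal; it only tests whether the process exists.
            kill -0 "$pid" 2>/dev/null || return 0
        done
        sleep 1
    done
}

sleep 1 &            # stand-in for a service that dies early
short=$!
sleep 3 &            # stand-in for a healthy service
long=$!

wait_for_any_exit "$short" "$long"
echo "a supervised process exited; shutting down"

# Stop the survivor so the script, and with it the container, can exit.
kill "$long" 2>/dev/null || true
wait "$long" 2>/dev/null || true
```

Used as the CMD of a container, a script like this exits when either service dies, so Docker sees the container stop and can apply its restart policy. A real process manager does much more, such as restarting individual services and forwarding signals.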

In traditional Unix-like systems the process manager was /etc/init. It relied on running shell scripts in /etc to set up the system and run background processes. More recently other tools have been developed for Linux or other systems.

One such tool is s6 (http://www.skarnet.org/software/s6/index.html), which is described thusly:

s6 is a small suite of programs for UNIX, designed to allow process supervision (a.k.a service supervision), in the line of daemontools and runit, as well as various operations on processes and daemons. It is meant to be a toolbox for low-level process and service administration, providing different sets of independent tools that can be used within or without the framework, and that can be assembled together to achieve powerful functionality with a very small amount of code.

Several large scale Docker containers use S6. For example we discussed an NGINX+Cron container earlier. There is an open source project, NGINX Proxy Manager, that is a mature implementation of the same idea, which uses S6 to manage several services inside a container. But use of S6 is beyond the scope of what we want to cover in this tutorial.

Further, we have strayed into an area where Docker is not a good fit, so let's head to the Summary where we can discuss that problem.

Summary

In this tutorial we've explored several ways to use Cron in a Docker container to manage background tasks that run occasionally. It's potentially very powerful, especially when used well.

Old school system administrators like myself built finely crafted crontab files describing a carefully orchestrated set of background tasks. For example every 15 minutes we'd fire up UUCP to exchange files with any neighboring systems, or every day several scripts would run to trim old log files. And then there's the backup job which runs a system backup in the middle of the night. Using Docker to host a Cron service can provide the same result.

For example web servers like Apache or NGINX produce log files, and it's common practice to rotate the log files using a cron job. The Wordpress example shown earlier could be easily refactored a bit to handle log file rotation.

An alternative, Jobber, is a Docker-centric tool that serves a similar purpose to Cron. It's worth exploring.

But there's an issue to discuss about the number of service processes for each container. Many claim it is a best practice that Docker containers serve a single purpose. Therefore, any complex service deployment should use multiple containers.

In other words, the NGINX+Cron example should have been two containers. The task scheduler service (cron) should have been its own container.

The rationale behind this single-purpose-for-each-container is sound. It's about reducing the complexity of each part of a system, making the system more reliable and maintainable. But, as with all theoretical stances, there are times where the theory doesn't work right.

We discussed this earlier in the case of the NGINX+Cron container. When we provision new SSL certificates using Certbot, we must cause the NGINX process to restart. That's done by sending a Unix signal to the NGINX process. That can only be done from inside the container where NGINX lives, and if Certbot is run in a different container then we cannot send the Unix signal. Therefore it was necessary to build task scheduling into the NGINX container, leading us to using Cron.

About the Author(s)

David Herron : David Herron is a writer and software engineer focusing on the wise use of technology. He is especially interested in clean energy technologies like solar power, wind power, and electric cars. David worked for nearly 30 years in Silicon Valley on software ranging from electronic mail systems, to video streaming, to the Java programming language, and has published several books on Node.js programming and electric vehicles.
