ns8-backup-monitor
NethServer 8 backup failure notification service.
Receives Alertmanager webhook alerts, correlates per-module backup status from the cluster Redis, optionally probes restic repositories, and sends a detailed HTML/text email through the NS8 mail relay.
Table of contents
- Architecture
- File layout
- Runtime paths
- Requirements
- Installation
- Configuration
- Alertmanager integration
- Outcome classification
- Redis key structure
- Service management
- Troubleshooting
- Uninstallation
- License
Architecture
Alertmanager ──POST /alert──► receiver.py
                                   │
                    (wait N seconds for all modules
                     to finish writing their status)
                                   │
                                   ▼
                             correlator.py
                      (reads Redis KEYS/HGETALL,
                       classifies outcome:
                       SUCCESS / PARTIAL / REPO_FAILURE)
                                   │
                                   ▼
                             repo_check.py   ← optional
                      (runagent → restic snapshots
                       on each module's repository)
                                   │
                                   ▼
                              notifier.py
                      (builds HTML + plain-text email,
                       dispatches via ns8-sendmail)
Key design decision: the service is a long-running HTTP server managed by systemd, not a one-shot script. This means it is always ready to receive an alert regardless of whether the backup was triggered manually or by a scheduled timer.
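To make the long-running design concrete, here is a minimal sketch of a webhook server built on the standard library's `http.server`. It is not the project's actual `receiver.py`: the helper names `handle_payload` and `process_alerts`, and the `run_server` signature, are assumptions made for illustration.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def handle_payload(body: bytes):
    """Return (http_status, alerts) for a raw webhook request body."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers malformed JSON and undecodable bytes
        return 400, []
    if not isinstance(payload, dict):
        return 400, []
    alerts = payload.get("alerts", [])
    if not isinstance(alerts, list):
        return 400, []
    return 200, alerts


def process_alerts(alerts):
    """Hypothetical stand-in for the wait, correlate, check, notify pipeline."""
    print(f"received {len(alerts)} alert(s)")


class AlertHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/alert":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, alerts = handle_payload(self.rfile.read(length))
        self.send_response(status)
        self.end_headers()
        if status == 200:
            # Answer Alertmanager right away; the slow work runs in the background.
            threading.Thread(
                target=process_alerts, args=(alerts,), daemon=True
            ).start()


def run_server(host="127.0.0.1", port=9099):
    """Block forever, serving POST /alert (signature is an assumption)."""
    HTTPServer((host, port), AlertHandler).serve_forever()
```

Replying before the grace period starts keeps Alertmanager from timing out and retrying the webhook while the service is still waiting for module statuses.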
File layout
ns8-backup-monitor/
│
├── README.md                          ← this file
│
├── config/
│   └── config.yml.example             ← annotated configuration template
│                                        (copy to /etc/ns8-backup-monitor/config.yml)
│
├── deploy/
│   ├── install.sh                     ← interactive installer / uninstaller
│   └── ns8-backup-monitor.service     ← systemd unit file
│
└── ns8_backup_monitor/                ← Python package
    ├── __init__.py                    ← package metadata, version string
    ├── __main__.py                    ← entry point: arg parsing, logging init,
    │                                    hands off to receiver.run_server()
    ├── receiver.py                    ← HTTP webhook server (POST /alert)
    ├── correlator.py                  ← reads Redis, classifies backup outcome
    ├── repo_check.py                  ← probes restic repositories via runagent
    ├── notifier.py                    ← builds and sends email notifications
    └── utils.py                       ← load_config(), setup_logging()
Runtime paths
The following paths are created by deploy/install.sh and assumed by the
default configuration.
| Purpose | Path |
|---|---|
| Python package | /opt/ns8-backup-monitor/ns8_backup_monitor/ |
| Deploy scripts | /opt/ns8-backup-monitor/deploy/ |
| Configuration | /etc/ns8-backup-monitor/config.yml |
| systemd unit | /etc/systemd/system/ns8-backup-monitor.service |
| Log file | /var/log/ns8-backup-monitor.log |
| NS8 Redis socket | /var/lib/nethserver/cluster/state/redis.sock |
Requirements
| Dependency | Provided by | Notes |
|---|---|---|
| python3 ≥ 3.8 | OS | Standard on AlmaLinux / Rocky 8+ |
| pyyaml | pip3 install pyyaml | Only non-stdlib dependency |
| redis-cli | NethServer 8 | Used via subprocess, no Python Redis client needed |
| runagent | NethServer 8 | Required for repo_check only |
| ns8-sendmail | NethServer 8 | Required for email delivery |
| systemd | OS | Service management |
This service must run on an NS8 leader node (or any node that has read access to the cluster Redis socket and has runagent in PATH).
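Because Redis is reached through redis-cli rather than a Python client, the service has to parse the tool's text output. The sketch below shows one way this could look. It relies on redis-cli printing one element per line when its output is not a terminal. The function names and the 5-second timeout are assumptions, not the project's actual code.

```python
import subprocess

SOCKET = "/var/lib/nethserver/cluster/state/redis.sock"


def parse_hgetall(raw: str) -> dict:
    """Turn non-interactive HGETALL output into a dict.

    The lines alternate between field name and value. Values that contain
    newlines would break this simple pairing.
    """
    lines = raw.splitlines()
    return dict(zip(lines[0::2], lines[1::2]))


def hgetall(key: str, socket: str = SOCKET, timeout: float = 5.0) -> dict:
    """Run redis-cli against the Unix socket and return the hash as a dict."""
    completed = subprocess.run(
        ["redis-cli", "-s", socket, "HGETALL", key],
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return parse_hgetall(completed.stdout)
```

Passing the command as a list avoids shell quoting problems if a key ever contains unusual characters.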
Installation
One-liner (recommended)
bash <(curl -fsSL https://repo.lelekaos.com/admin/ns8-backup-monitor/raw/branch/main/deploy/install.sh)
The installer will:
- Check prerequisites (python3, curl, tar, ns8-sendmail).
- Download and extract the latest source archive from the Gitea repository.
- Prompt interactively for sender address, recipient list, and subject prefix.
- Write /etc/ns8-backup-monitor/config.yml with the supplied values.
- Install and start the systemd service.
Manual installation
git clone https://repo.lelekaos.com/admin/ns8-backup-monitor.git
cd ns8-backup-monitor
# Install Python dependency
pip3 install pyyaml
# Create directories
mkdir -p /opt/ns8-backup-monitor /etc/ns8-backup-monitor
# Copy source and config template
cp -r . /opt/ns8-backup-monitor/
cp config/config.yml.example /etc/ns8-backup-monitor/config.yml
# Edit the config before starting
nano /etc/ns8-backup-monitor/config.yml
# Install systemd unit
cp deploy/ns8-backup-monitor.service /etc/systemd/system/
systemctl daemon-reload
systemctl enable --now ns8-backup-monitor
Configuration
The configuration file is a YAML document. The installer writes it to
/etc/ns8-backup-monitor/config.yml; a fully annotated template is available
at config/config.yml.example.
# ---------------------------------------------------------------------------
# Email notification settings
# ---------------------------------------------------------------------------
# Delivery is handled by ns8-sendmail, which uses the SMTP relay already
# configured in NethServer 8. No SMTP credentials are needed here.
mail:
  # Envelope / header sender address.
  from: "ns8-backup-monitor@yourdomain.com"
  # One or more recipient addresses. At least one is required.
  to:
    - "admin@yourdomain.com"
  # String prepended to every email subject line.
  subject_prefix: "[NS8 Backup]"

# ---------------------------------------------------------------------------
# Webhook receiver (HTTP server)
# ---------------------------------------------------------------------------
receiver:
  # Interface to listen on. 127.0.0.1 is recommended when Alertmanager
  # runs on the same host; use 0.0.0.0 only if it runs on a different node.
  host: "127.0.0.1"
  # TCP port. Must match the webhook URL configured in Alertmanager.
  port: 9099

# ---------------------------------------------------------------------------
# Timing
# ---------------------------------------------------------------------------
correlator:
  # Seconds to wait after receiving the alert before reading Redis.
  # This grace period allows all module agents to finish writing their
  # per-module status hashes. 30 s is sufficient for most deployments.
  wait_seconds: 30
  # Look-back window in seconds used when the alert does not include a
  # backup_id label. Any plan whose Redis status was updated within this
  # window is considered "recent" and included in the report.
  recent_window: 3600

# ---------------------------------------------------------------------------
# Redis connection
# ---------------------------------------------------------------------------
redis:
  # Path to the NS8 cluster Redis Unix socket.
  # On a standard NS8 installation this path never changes.
  socket: "/var/lib/nethserver/cluster/state/redis.sock"

# ---------------------------------------------------------------------------
# Repository check (optional, uses runagent + restic)
# ---------------------------------------------------------------------------
repo_check:
  # Maximum seconds to wait for each repository check before giving up.
  timeout: 60
  # Extra flags passed verbatim to every restic invocation.
  # Example: "--cacert /etc/pki/tls/certs/ca-bundle.crt"
  restic_flags: ""

# ---------------------------------------------------------------------------
# Logging
# ---------------------------------------------------------------------------
logging:
  # Python log level: DEBUG, INFO, WARNING, ERROR.
  level: INFO
  # Absolute path for the rotating log file (5 MB × 3 backups).
  # Leave empty to log to stdout / journald only.
  file: "/var/log/ns8-backup-monitor.log"
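The following sketch illustrates how a parsed configuration could be merged over defaults and sanity-checked. Whether the real `load_config()` applies defaults or validates anything is not stated here, so `DEFAULTS`, `merge_config`, and `validate_config` are hypothetical names. The default values simply mirror the template above.

```python
import copy

# Assumed defaults, copied from the annotated template.
DEFAULTS = {
    "mail": {"from": "", "to": [], "subject_prefix": "[NS8 Backup]"},
    "receiver": {"host": "127.0.0.1", "port": 9099},
    "correlator": {"wait_seconds": 30, "recent_window": 3600},
    "redis": {"socket": "/var/lib/nethserver/cluster/state/redis.sock"},
    "repo_check": {"timeout": 60, "restic_flags": ""},
    "logging": {"level": "INFO", "file": "/var/log/ns8-backup-monitor.log"},
}


def merge_config(user, defaults=DEFAULTS):
    """Overlay user-supplied sections on a copy of the defaults."""
    merged = copy.deepcopy(defaults)
    for section, values in (user or {}).items():
        if isinstance(values, dict) and isinstance(merged.get(section), dict):
            merged[section].update(values)
        else:
            merged[section] = values
    return merged


def validate_config(cfg):
    """Return a list of human-readable problems (empty when the config is fine)."""
    problems = []
    if not cfg["mail"]["to"]:
        problems.append("mail.to must list at least one recipient")
    if not 1 <= int(cfg["receiver"]["port"]) <= 65535:
        problems.append("receiver.port must be between 1 and 65535")
    if cfg["correlator"]["wait_seconds"] < 0:
        problems.append("correlator.wait_seconds must not be negative")
    return problems
```

With PyYAML installed, the user dictionary would typically come from `yaml.safe_load()` on the opened config file. An empty file yields `None`, which `merge_config` treats as an empty mapping.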
Alertmanager integration
Add a receiver pointing to the service in your Alertmanager configuration:
# alertmanager.yml (relevant excerpt)
route:
  receiver: ns8-backup-monitor
  # Only route backup-related alerts to this receiver.
  routes:
    - match:
        alertname: NethServerBackupFailed
      receiver: ns8-backup-monitor

receivers:
  - name: ns8-backup-monitor
    webhook_configs:
      - url: "http://127.0.0.1:9099/alert"
        # Send resolved alerts too so the service can log them.
        send_resolved: true
Reload Alertmanager after editing:
systemctl reload alertmanager
# or, for the NS8 metrics module:
runagent -m metrics1 systemctl reload alertmanager
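With this routing in place, each webhook body carries an `alerts` list whose entries have a `status` ("firing" or "resolved") and a `labels` mapping. The sketch below shows how a receiver might select the relevant alerts and pick up the optional `backup_id` label. The function name `backup_alerts` is hypothetical and the real service may handle payloads differently.

```python
def backup_alerts(payload, alertname="NethServerBackupFailed"):
    """Return [(status, backup_id), ...] for alerts matching `alertname`.

    backup_id is None when the alert carries no such label; the correlator
    then falls back to its recent_window look-back.
    """
    selected = []
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        if labels.get("alertname") != alertname:
            continue
        selected.append((alert.get("status"), labels.get("backup_id")))
    return selected
```

Because `send_resolved: true` is set, the same endpoint also receives "resolved" entries, so checking the status before sending a failure email is worthwhile.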
Outcome classification
For each backup plan the correlator reads all per-module status hashes and produces one of three outcomes:
| Outcome | Condition | Email subject |
|---|---|---|
| SUCCESS | All modules finished with result=success | ✅ Backup completed |
| PARTIAL | At least one module succeeded, at least one failed | ⚠️ Backup partially failed |
| REPO_FAILURE | All modules failed or no status found in Redis | ❌ Backup failed |
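The three rules can be sketched as a small pure function. This is an illustration of the conditions listed above, not the code in `correlator.py`. The names `classify` and `subject_for`, and the way the subject prefix is joined with a space, are assumptions.

```python
SUBJECTS = {
    "SUCCESS": "✅ Backup completed",
    "PARTIAL": "⚠️ Backup partially failed",
    "REPO_FAILURE": "❌ Backup failed",
}


def classify(results):
    """Map per-module result strings ("success" / "error") to an outcome."""
    results = list(results)
    succeeded = sum(1 for r in results if r == "success")
    if results and succeeded == len(results):
        return "SUCCESS"
    if 0 < succeeded < len(results):
        return "PARTIAL"
    # Every module failed, or no status hash was found at all.
    return "REPO_FAILURE"


def subject_for(outcome, prefix="[NS8 Backup]"):
    """Build an email subject line from the configured prefix and the outcome."""
    return f"{prefix} {SUBJECTS[outcome]}"
```

Note that an empty result list maps to REPO_FAILURE, which is why reading Redis too early can produce a failure report for a backup that later succeeds.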
Redis key structure
The correlator reads two families of keys from the NS8 cluster Redis:
| Key pattern | Description |
|---|---|
| cluster/backup/<backup_id>/status | Plan-level status hash. Fields: result, timestamp, errors (integer count). |
| module/<module_id>/backup/<backup_id>/status | Per-module status hash. Fields: result, timestamp, error (message string). |
result is either "success" or "error". timestamp is an ISO 8601
string in UTC (e.g. 2024-01-15T03:00:05Z).
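As an illustration of working with these keys, the sketch below splits a per-module key into its identifiers and applies a `recent_window` style check to the timestamp field. The helper names are hypothetical. The timestamp format is taken from the example above and assumes second precision with a trailing `Z`.

```python
import re
from datetime import datetime, timezone

MODULE_KEY = re.compile(
    r"^module/(?P<module_id>[^/]+)/backup/(?P<backup_id>[^/]+)/status$"
)


def parse_module_key(key):
    """Return (module_id, backup_id), or None if the key has another shape."""
    match = MODULE_KEY.match(key)
    return (match["module_id"], match["backup_id"]) if match else None


def parse_timestamp(value):
    """Parse e.g. '2024-01-15T03:00:05Z' into an aware UTC datetime."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def is_recent(timestamp, now, window_seconds=3600):
    """True if the status was written within the look-back window before `now`."""
    age = (now - parse_timestamp(timestamp)).total_seconds()
    # Timestamps in the future (negative age) are treated as not recent.
    return 0 <= age <= window_seconds
```

`strptime` is used instead of `datetime.fromisoformat` because the latter only accepts a trailing `Z` from Python 3.11 onward, while the stated minimum is 3.8.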
Service management
# Check service status
systemctl status ns8-backup-monitor
# Follow live logs via journald
journalctl -u ns8-backup-monitor -f
# Follow the rotating log file directly
tail -f /var/log/ns8-backup-monitor.log
# Restart after a config change
systemctl restart ns8-backup-monitor
# Test the webhook endpoint manually
curl -s -X POST http://127.0.0.1:9099/alert \
-H 'Content-Type: application/json' \
-d '{"alerts":[{"status":"firing","labels":{"alertname":"NethServerBackupFailed"}}]}'
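Where curl is unavailable, the same test request can be sent from Python with only the standard library. This is a rough equivalent of the curl command above. The function names are made up for this example.

```python
import json
import urllib.request


def build_test_request(url="http://127.0.0.1:9099/alert"):
    """Build a POST request carrying a minimal firing backup alert."""
    payload = {
        "alerts": [
            {"status": "firing", "labels": {"alertname": "NethServerBackupFailed"}}
        ]
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_test_alert(url="http://127.0.0.1:9099/alert", timeout=10):
    """Send the test alert and return the HTTP status code of the reply."""
    with urllib.request.urlopen(build_test_request(url), timeout=timeout) as resp:
        return resp.status
```

Calling `send_test_alert()` on the host running the service should return a 2xx status if the receiver is listening. A connection error raises `urllib.error.URLError`.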
Troubleshooting
Service starts but no email is received
- Verify ns8-sendmail works independently: echo 'Test' | ns8-sendmail -s 'Test' admin@yourdomain.com
- Check mail.to in /etc/ns8-backup-monitor/config.yml.
- Increase log level to DEBUG and restart the service.
REPO_FAILURE on every alert even though backups succeed
- The correlator may be reading Redis before all modules have finished. Increase correlator.wait_seconds (e.g. to 60).
- Check that the Redis socket path is correct: redis-cli -s /var/lib/nethserver/cluster/state/redis.sock PING
Alertmanager does not reach the webhook
- Confirm the service is listening: ss -tlnp | grep 9099
- If Alertmanager runs on a different host, change receiver.host to 0.0.0.0 and open the port in the firewall.
Uninstallation
bash /opt/ns8-backup-monitor/deploy/install.sh --uninstall
The script will stop and disable the service, remove the install directory, and optionally remove the configuration directory.
License
MIT — see LICENSE if present, otherwise contact the repository owner.