ns8-backup-monitor
A lightweight webhook receiver for NethServer 8 (NS8). It intercepts Alertmanager backup-failure alerts, enriches them with per-module status data from the cluster Redis, optionally checks repository health with restic, and delivers a detailed email notification through the mail relay configured in NS8.
Some solutions hook into `run-backup`, which only fires when a backup is launched manually from the UI. This service instead listens on the Alertmanager webhook channel, the same source used by the NS8 monitoring stack. It therefore captures both manual and scheduled automatic backups.
Architecture overview
```text
Alertmanager
   │  POST /alert  (NsBackupFailed | NsBackupMissing)
   ▼
[receiver.py]    HTTP webhook listener (localhost:9099)
   │  waits N seconds for modules to settle
   ▼
[correlator.py]  Reads Redis cluster state, classifies outcome
   │  SUCCESS | PARTIAL | REPO_FAILURE
   ▼
[repo_check.py]  (only on non-SUCCESS) Probes restic repos via runagent
   │
   ▼
[notifier.py]    Builds HTML/text email, sends via ns8-sendmail
```
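The diagram implies a strict ordering: wait, classify, probe only when something failed, then notify. The sketch below makes that ordering concrete. It is hypothetical: `run_pipeline` and its injected callables are illustrative names, not the package's actual API. The stages are passed in so the flow can be exercised without NS8.

```python
"""Hypothetical sketch of the receiver -> correlator -> repo_check -> notifier flow."""
import time
from typing import Callable, List, Optional

Outcome = str  # "SUCCESS" | "PARTIAL" | "REPO_FAILURE"


def run_pipeline(
    alerts: List[dict],
    wait_seconds: float,
    correlate: Callable[[List[dict]], Outcome],
    check_repos: Callable[[], str],
    notify: Callable[[Outcome, Optional[str]], None],
) -> Outcome:
    # receiver stage: give slow modules time to write their final status
    time.sleep(wait_seconds)
    # correlator stage: read Redis state and classify the outcome
    outcome = correlate(alerts)
    # repo_check stage: only probe repositories when something went wrong
    diagnostics = check_repos() if outcome != "SUCCESS" else None
    # notifier stage: send the email, attaching diagnostics when present
    notify(outcome, diagnostics)
    return outcome
```

Injecting the stages keeps the orchestration testable with fakes and `wait_seconds=0`.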
Requirements
| Dependency | Notes |
|---|---|
| NS8 leader or worker node | Must have access to the cluster Redis socket |
| `redis-cli` | Included in standard NS8 installations |
| `runagent` | NS8 binary used to invoke restic inside module containers |
| `ns8-sendmail` | NS8 mail relay script (invoked via `runagent`) |
| Python 3.8+ | Standard library only; no pip dependencies |
| Alertmanager | Must be configured to send webhooks to this service |
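Before installing, you may want to confirm these requirements on the node. The following is a hypothetical preflight sketch, not part of the shipped package.

It assumes `redis-cli` and `runagent` should be on `PATH`. `ns8-sendmail` is skipped because it is invoked through `runagent` rather than directly. It also assumes the default socket path from the runtime paths table.

```python
"""Hypothetical preflight check for the requirements listed above."""
import os
import shutil
import sys
from typing import Iterable, List

REQUIRED_TOOLS = ("redis-cli", "runagent")  # assumption: both expected on PATH
DEFAULT_SOCKET = "/var/lib/nethserver/cluster/state/redis.sock"


def missing_tools(tools: Iterable[str] = REQUIRED_TOOLS) -> List[str]:
    """Return the tools that cannot be found on PATH."""
    return [t for t in tools if shutil.which(t) is None]


def preflight(socket_path: str = DEFAULT_SOCKET) -> List[str]:
    """Collect human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info < (3, 8):
        problems.append("Python 3.8+ is required")
    for tool in missing_tools():
        problems.append(f"{tool} not found on PATH")
    if not os.path.exists(socket_path):
        problems.append(f"Redis socket not found: {socket_path}")
    return problems
```

Calling `preflight()` on the node and printing the returned list is enough for a quick manual check.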
File layout
```text
ns8-backup-monitor/
│
├── README.md                        ← This file
│
├── config/
│   └── config.yml.example           ← Annotated configuration template
│
├── deploy/
│   ├── install.sh                   ← Interactive installer / uninstaller
│   └── ns8-backup-monitor.service   ← systemd unit file
│
└── ns8_backup_monitor/              ← Python package (main application)
    ├── __init__.py                  ← Package marker, exposes version
    ├── __main__.py                  ← CLI entry point (python3 -m ns8_backup_monitor)
    ├── receiver.py                  ← HTTP webhook server (Alertmanager → pipeline)
    ├── correlator.py                ← Redis reader and outcome classifier
    ├── repo_check.py                ← restic repository health prober
    ├── notifier.py                  ← Email builder and sender
    └── utils.py                     ← Config loader and logging setup
```
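The entry point is run as `python3 -m ns8_backup_monitor`, and the Development section shows a `--config` flag. Below is a hypothetical sketch of how `__main__.py` might parse that flag with `argparse`.

Defaulting to `/etc/ns8-backup-monitor/config.yml` (the runtime path listed below) is an assumption. `--config` is the only option this README documents.

```python
"""Hypothetical sketch of argument handling in ns8_backup_monitor/__main__.py."""
import argparse
from typing import List, Optional

DEFAULT_CONFIG = "/etc/ns8-backup-monitor/config.yml"  # assumed default


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        prog="python3 -m ns8_backup_monitor",
        description="Alertmanager webhook receiver for NS8 backup alerts",
    )
    parser.add_argument(
        "--config",
        default=DEFAULT_CONFIG,
        help="path to config.yml (default: %(default)s)",
    )
    return parser.parse_args(argv)
```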
Runtime paths (after installation)
| Path | Purpose |
|---|---|
| `/opt/ns8-backup-monitor/` | Application root (Python package) |
| `/etc/ns8-backup-monitor/config.yml` | Active configuration file |
| `/etc/systemd/system/ns8-backup-monitor.service` | systemd unit |
| `/var/log/ns8-backup-monitor/` | Log directory (if file logging is enabled) |
| `/var/lib/nethserver/cluster/state/redis.sock` | NS8 cluster Redis socket (default) |
Installation
Quick install (interactive)
```bash
bash <(curl -fsSL https://repo.lelekaos.com/admin/ns8-backup-monitor/raw/branch/main/deploy/install.sh)
```
Note: Use `bash <(curl ...)` rather than `curl ... | bash`. The interactive installer reads your answers from the terminal with `read`. When the script is piped from curl, bash's stdin is the pipe, so `read` consumes script text instead of your input.
Non-interactive install (CI / automation)
```bash
curl -fsSL https://repo.lelekaos.com/admin/ns8-backup-monitor/raw/branch/main/deploy/install.sh \
  | bash -s -- \
      --from "backup@example.com" \
      --to   "admin@example.com"
```
What the installer does
- Copies the Python package to `/opt/ns8-backup-monitor/`
- Writes `/etc/ns8-backup-monitor/config.yml` from the template
- Installs and enables the systemd unit
- Prints the Alertmanager webhook receiver URL
Uninstallation
```bash
bash /opt/ns8-backup-monitor/deploy/install.sh --uninstall
```
The uninstaller stops and removes the systemd unit, then optionally removes the configuration directory.
Configuration
The active configuration file is `/etc/ns8-backup-monitor/config.yml`.
Edit it directly and restart the service to apply changes.

```bash
nano /etc/ns8-backup-monitor/config.yml
systemctl restart ns8-backup-monitor
```

See `config/config.yml.example` for a fully annotated reference with all available options.
Key sections
```yaml
# ── Mail settings ─────────────────────────────────────────────
mail:
  from: "backup@ns02.example.com"   # Envelope From address
  to:
    - "admin@example.com"           # One or more recipient addresses
  subject_prefix: "[NS8 Backup]"    # Prepended to every subject line

# ── Webhook receiver ──────────────────────────────────────────
receiver:
  host: "127.0.0.1"   # Bind address (keep localhost unless Alertmanager is remote)
  port: 9099          # Must match the Alertmanager webhook URL

# ── Correlator behaviour ─────────────────────────────────────
correlator:
  wait_seconds: 30      # Seconds to wait after an alert before reading Redis
                        # (allows slow modules to write their final status)
  recent_window: 3600   # When no backup_id label is present, scan Redis for
                        # plan status keys updated within this many seconds

# ── Redis connection ─────────────────────────────────────────
redis:
  socket: "/var/lib/nethserver/cluster/state/redis.sock"

# ── Repository health check ──────────────────────────────────
repo_check:
  enabled: true
  timeout: 60   # Seconds per restic check call

# ── Logging ──────────────────────────────────────────────────
logging:
  level: "INFO"   # DEBUG | INFO | WARNING | ERROR
  file: ""        # Leave empty to log to stdout (journald captures it)
```
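A config loader typically overlays the user's file on a set of defaults and rejects obviously invalid values. The sketch below is hypothetical and makes two assumptions.

- The YAML has already been parsed into nested dicts. The parser itself is out of scope here.
- The values in the example above serve as defaults.

The validation rules shown are illustrative, not the package's documented behaviour.

```python
"""Hypothetical sketch: overlay a parsed config.yml on defaults and validate it."""
import copy
from typing import Any, Dict

# Assumption: the example values above double as defaults.
DEFAULTS: Dict[str, Any] = {
    "mail": {"from": "", "to": [], "subject_prefix": "[NS8 Backup]"},
    "receiver": {"host": "127.0.0.1", "port": 9099},
    "correlator": {"wait_seconds": 30, "recent_window": 3600},
    "redis": {"socket": "/var/lib/nethserver/cluster/state/redis.sock"},
    "repo_check": {"enabled": True, "timeout": 60},
    "logging": {"level": "INFO", "file": ""},
}


def merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively overlay `override` on a deep copy of `base`."""
    result = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = value
    return result


def validate(cfg: Dict[str, Any]) -> None:
    """Raise ValueError on settings the service could not run with."""
    if not cfg["mail"]["from"] or not cfg["mail"]["to"]:
        raise ValueError("mail.from and mail.to are required")
    if not 1 <= int(cfg["receiver"]["port"]) <= 65535:
        raise ValueError("receiver.port must be between 1 and 65535")
    if cfg["logging"]["level"] not in ("DEBUG", "INFO", "WARNING", "ERROR"):
        raise ValueError("logging.level must be DEBUG, INFO, WARNING or ERROR")
```

Deep-copying the defaults means a merged config never mutates the shared `DEFAULTS` dict.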
Alertmanager integration
Add a receiver to your Alertmanager configuration on the NS8 leader node
(`/etc/alertmanager/alertmanager.yml` or via the NS8 `metrics1` module):

```yaml
receivers:
  - name: ns8-backup-monitor
    webhook_configs:
      - url: "http://127.0.0.1:9099/alert"
        send_resolved: false

route:
  receiver: ns8-backup-monitor
  group_by: [alertname]
  group_wait: 10s
  group_interval: 5m
  repeat_interval: 12h
  routes:
    - match_re:
        alertname: "NsBackupFailed|NsBackupMissing"
      receiver: ns8-backup-monitor
```
The service handles two alert names:
| Alert name | Meaning |
|---|---|
| `NsBackupFailed` | One or more backup modules reported an error |
| `NsBackupMissing` | Expected backup did not run within the time window |
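The receiver only needs to act on firing alerts with one of these two names. Below is a hypothetical sketch of that filter; the function name `relevant_alerts` is illustrative.

It reads the `alerts`, `status` and `labels` fields, which Alertmanager's webhook payload does carry. How `receiver.py` actually filters alerts is not documented here.

```python
"""Hypothetical sketch: select relevant alerts from an Alertmanager webhook body."""
import json
from typing import List

RELEVANT = {"NsBackupFailed", "NsBackupMissing"}


def relevant_alerts(body: bytes) -> List[dict]:
    """Keep firing alerts whose alertname label is one of the two handled names."""
    payload = json.loads(body.decode("utf-8"))
    return [
        alert
        for alert in payload.get("alerts", [])
        if alert.get("status") == "firing"
        and alert.get("labels", {}).get("alertname") in RELEVANT
    ]
```

Dropping resolved alerts here is consistent with `send_resolved: false` in the route above. It also guards against a config that sends them anyway.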
Outcome classification
After reading per-module Redis keys, the correlator assigns one of three outcomes:
| Outcome | Condition | Email subject |
|---|---|---|
| `SUCCESS` | All modules succeeded | ✅ Backup completed successfully |
| `PARTIAL` | Some modules failed, some succeeded | ⚠️ Backup partially failed |
| `REPO_FAILURE` | All modules failed, or no status found in Redis | ❌ Backup failed – possible repository error |
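The table reduces to a few comparisons over the per-module `result` values. The sketch below is hypothetical.

It assumes the correlator has already collected a `{module_id: result}` mapping. It also assumes that any value other than `success`, including a missing one, counts as a failure; the README does not state this.

```python
"""Hypothetical sketch of the outcome rules in the table above."""
from typing import Dict

SUBJECTS = {
    "SUCCESS": "✅ Backup completed successfully",
    "PARTIAL": "⚠️ Backup partially failed",
    "REPO_FAILURE": "❌ Backup failed – possible repository error",
}


def classify(results: Dict[str, str]) -> str:
    """Map {module_id: "success" | "error"} to an outcome name."""
    if not results:
        return "REPO_FAILURE"  # no status found in Redis
    # assumption: anything other than "success" counts as a failure
    failed = sum(1 for r in results.values() if r != "success")
    if failed == 0:
        return "SUCCESS"
    if failed == len(results):
        return "REPO_FAILURE"  # every module failed
    return "PARTIAL"


def subject(prefix: str, outcome: str) -> str:
    """Combine mail.subject_prefix with the outcome's subject line."""
    return f"{prefix} {SUBJECTS[outcome]}"
```

Treating "every module failed" like "no status at all" reflects the table's reasoning. If nothing succeeded, the shared repository is the most likely common cause.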
On `PARTIAL` or `REPO_FAILURE`, the repo health check runs automatically and appends diagnostic information (restic error output) to the email.
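The important properties of the health check are that it is bounded by `repo_check.timeout` and never crashes the notifier. Below is a hypothetical sketch of such a bounded probe; the name `probe` is illustrative.

The exact `runagent`/`restic` command line used by `repo_check.py` is not documented here, so the caller passes the command in. Only the tail of the output is kept so the email stays readable.

```python
"""Hypothetical sketch of a bounded, non-raising repository probe."""
import subprocess
from typing import List, Tuple


def probe(cmd: List[str], timeout: float = 60, max_chars: int = 2000) -> Tuple[bool, str]:
    """Run cmd; return (ok, tail of combined stdout/stderr) without raising."""
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge restic errors into one stream
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False, f"check timed out after {timeout} s"
    except OSError as exc:  # e.g. the binary is not installed
        return False, f"could not run {cmd[0]}: {exc}"
    output = proc.stdout.decode("utf-8", errors="replace")
    return proc.returncode == 0, output[-max_chars:]
```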
Redis key structure
The correlator reads the following NS8 Redis key patterns:
```text
cluster/backup/<backup_id>/status               → overall plan status (hash)
module/<module_id>/backup/<backup_id>/status    → per-module status (hash)
```

Hash fields:

| Field | Values | Description |
|---|---|---|
| `result` | `success` / `error` | Outcome of the backup operation |
| `timestamp` | ISO 8601 | When the status was last written |
| `error` | string | Error message, if any |
| `errors` | integer | Number of module errors (plan-level hash only) |
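The following is a hypothetical sketch of reading these hashes over the cluster socket with `redis-cli -s <socket> HGETALL <key>`, and of the `recent_window` test.

It makes three assumptions:

- The key patterns and field names are exactly as listed above.
- Values contain no newlines, since `redis-cli` prints one element per line when its output is not a terminal.
- A timestamp without a zone is UTC.

Module IDs in the usage are illustrative.

```python
"""Hypothetical sketch: read NS8 backup status hashes via redis-cli."""
import subprocess
from datetime import datetime, timezone
from typing import Dict, Optional

SOCKET = "/var/lib/nethserver/cluster/state/redis.sock"


def plan_key(backup_id: str) -> str:
    return f"cluster/backup/{backup_id}/status"


def module_key(module_id: str, backup_id: str) -> str:
    return f"module/{module_id}/backup/{backup_id}/status"


def parse_hgetall(raw: str) -> Dict[str, str]:
    """Pair up redis-cli's alternating field/value output lines."""
    lines = raw.splitlines()
    return dict(zip(lines[0::2], lines[1::2]))


def hgetall(key: str, socket: str = SOCKET) -> Dict[str, str]:
    out = subprocess.run(
        ["redis-cli", "-s", socket, "HGETALL", key],
        stdout=subprocess.PIPE, check=True, timeout=10,
    ).stdout.decode("utf-8")
    return parse_hgetall(out)


def is_recent(status: Dict[str, str], window: int, now: Optional[datetime] = None) -> bool:
    """True if the hash's ISO 8601 timestamp is within `window` seconds of now."""
    ts = status.get("timestamp")
    if not ts:
        return False
    # fromisoformat() only accepts a trailing "Z" from Python 3.11 onwards
    when = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    now = now or datetime.now(timezone.utc)
    return 0 <= (now - when).total_seconds() <= window
```

For example, `hgetall(module_key("mail1", "1"))` followed by `is_recent(..., 3600)` mirrors the fallback scan that the correlator performs when an alert carries no `backup_id`.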
Service management
```bash
# Check service status
systemctl status ns8-backup-monitor

# View live logs
journalctl -u ns8-backup-monitor -f

# Restart after config change
systemctl restart ns8-backup-monitor

# Disable on boot
systemctl disable ns8-backup-monitor
```
Troubleshooting
Service fails to start
```bash
journalctl -u ns8-backup-monitor --no-pager -n 50
```

Common causes:

- `config.yml` not found at the expected path → check `/etc/ns8-backup-monitor/config.yml`
- Port 9099 already in use → change `receiver.port` in config
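To confirm the port conflict before editing the config, one option is simply to try binding the address. The helper below is a hypothetical sketch. Run it while the service is stopped, otherwise the service's own listener makes the port look busy.

```python
"""Hypothetical helper: check whether the receiver address can be bound."""
import socket


def port_is_free(host: str = "127.0.0.1", port: int = 9099) -> bool:
    """Return False if another socket already holds host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True
```

A port still in `TIME_WAIT` can also report as busy. Retrying after a short pause avoids that false alarm.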
No email received after a backup failure
- Verify that Alertmanager is firing the webhook:

  ```bash
  journalctl -u ns8-backup-monitor -f
  ```

  You should see `Received N relevant alert(s)` within a minute of the backup failure.
- Check that `wait_seconds` has elapsed (default 30 s) and look for `Sending notification...` in the log.
- Verify that the mail relay works independently:

  ```bash
  echo "Test" | runagent ns8-sendmail -s "test" admin@example.com
  ```
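The relay test sends only a plain body, whereas the notifier sends a combined HTML/text message. The sketch below, which is hypothetical, builds such a message with the standard library's `email` package. It is useful for checking what a notification should contain.

`build_message` is an illustrative name. How `notifier.py` hands the message to `ns8-sendmail` is not covered here, so no delivery step is shown.

```python
"""Hypothetical sketch: build a text + HTML notification with the stdlib."""
import html
from email.message import EmailMessage
from typing import List, Optional


def build_message(sender: str, recipients: List[str], subject: str,
                  body: str, diagnostics: Optional[str] = None) -> EmailMessage:
    # append repository diagnostics (e.g. restic output) when present
    text = body if diagnostics is None else f"{body}\n\nRepository check:\n{diagnostics}"
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = subject
    msg.set_content(text)  # text/plain part
    msg.add_alternative(   # text/html part, escaped so restic output is safe
        "<html><body><pre>{}</pre></body></html>".format(html.escape(text)),
        subtype="html",
    )
    return msg
```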
Correlator finds no modules
If the log shows `No recent backup status keys found in Redis`, possible causes are:

- `recent_window` is too short: the backup ran longer ago than the window (1 hour with the default of 3600 s)
- The Redis socket path is wrong for your installation
- The backup plan wrote its status to a non-standard key pattern
Development
The application is pure Python 3 with no third-party dependencies.
```bash
# Run locally (requires NS8 Redis socket access)
python3 -m ns8_backup_monitor --config ./config/config.yml.example

# Send a test webhook payload
curl -s -X POST http://127.0.0.1:9099/alert \
  -H "Content-Type: application/json" \
  -d '{"alerts":[{"status":"firing","labels":{"alertname":"NsBackupFailed","backup_id":"1"}}]}'
```
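Where curl is unavailable, or tests need to be scripted, the same request can be sent with the standard library. The sketch below is hypothetical (`build_payload` and `send` are illustrative names). It reproduces the minimal payload of the curl example, not the full Alertmanager webhook schema.

```python
"""Hypothetical stdlib equivalent of the curl test request above."""
import json
import urllib.request
from typing import Optional


def build_payload(alertname: str = "NsBackupFailed", backup_id: Optional[str] = "1") -> dict:
    """Minimal firing-alert payload; omit backup_id to exercise recent_window."""
    labels = {"alertname": alertname}
    if backup_id is not None:
        labels["backup_id"] = backup_id
    return {"alerts": [{"status": "firing", "labels": labels}]}


def send(payload: dict, url: str = "http://127.0.0.1:9099/alert", timeout: float = 5) -> int:
    """POST the payload as JSON and return the HTTP status code."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

Sending `build_payload("NsBackupMissing", None)` exercises the path where the correlator has no `backup_id` and falls back to scanning Redis within `recent_window`.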
License
MIT License — contributions welcome via pull request.