# ns8-backup-monitor

> **NethServer 8 backup failure notification service.**
>
> Receives Alertmanager webhook alerts, correlates per-module backup status
> from the cluster Redis, optionally probes restic repositories, and sends a
> detailed HTML/text email through the NS8 mail relay.

---

## Table of contents

1. [Architecture](#architecture)
2. [File layout](#file-layout)
3. [Runtime paths](#runtime-paths)
4. [Requirements](#requirements)
5. [Installation](#installation)
6. [Configuration](#configuration)
7. [Alertmanager integration](#alertmanager-integration)
8. [Outcome classification](#outcome-classification)
9. [Redis key structure](#redis-key-structure)
10. [Service management](#service-management)
11. [Troubleshooting](#troubleshooting)
12. [Uninstallation](#uninstallation)
13. [License](#license)

---

## Architecture

```
Alertmanager ──POST /alert──► receiver.py
                                   │
                                   │  (wait N seconds for all modules
                                   │   to finish writing their status)
                                   ▼
                             correlator.py
                             (reads Redis KEYS/HGETALL, classifies outcome:
                              SUCCESS / PARTIAL / REPO_FAILURE)
                                   │
                                   ▼
                             repo_check.py  ← optional
                             (runagent → restic snapshots
                              on each module's repository)
                                   │
                                   ▼
                             notifier.py
                             (builds HTML + plain-text email,
                              dispatches via ns8-sendmail)
```

**Key design decision:** the service is a long-running HTTP server managed by
systemd, not a one-shot script. It is therefore always ready to receive an
alert, whether the backup was triggered manually or by a scheduled timer.

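
The receive-then-wait step at the top of the diagram can be sketched with the standard library alone. This is a minimal illustration, not the project's actual `receiver.py`: the names `firing_alerts`, `make_handler`, and `serve` are hypothetical, and deferring the work with `threading.Timer` is an assumption about one reasonable way to implement the grace period.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable, List

WAIT_SECONDS = 30  # assumed to mirror correlator.wait_seconds in config.yml


def firing_alerts(body: bytes) -> List[dict]:
    """Return the alerts whose status is "firing" from a webhook body."""
    payload = json.loads(body.decode("utf-8"))
    return [a for a in payload.get("alerts", []) if a.get("status") == "firing"]


def make_handler(on_alerts: Callable[[List[dict]], None], wait_seconds: float):
    """Build a handler class that defers processing by wait_seconds."""

    class AlertHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            if self.path != "/alert":
                self.send_error(404)
                return
            length = int(self.headers.get("Content-Length", 0))
            try:
                alerts = firing_alerts(self.rfile.read(length))
            except (ValueError, AttributeError):
                # Malformed JSON, bad encoding, or an unexpected structure.
                self.send_error(400, "invalid JSON payload")
                return
            # Reply right away, then run the callback after the grace
            # period on a background timer thread.
            self.send_response(200)
            self.end_headers()
            if alerts:
                timer = threading.Timer(wait_seconds, on_alerts, args=(alerts,))
                timer.daemon = True
                timer.start()

    return AlertHandler


def serve(host: str = "127.0.0.1", port: int = 9099) -> None:
    """Hypothetical entry point: block forever, serving POST /alert."""
    handler = make_handler(
        lambda alerts: print(f"{len(alerts)} alert(s) to correlate"),
        WAIT_SECONDS,
    )
    HTTPServer((host, port), handler).serve_forever()
```

Calling `serve()` blocks and listens on `127.0.0.1:9099`. Replying before the grace period elapses keeps the HTTP request short, so the sender is not left waiting on the delay.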

---

## File layout

```
ns8-backup-monitor/
│
├── README.md                          ← this file
│
├── config/
│   └── config.yml.example             ← annotated configuration template
│                                         (copy to /etc/ns8-backup-monitor/config.yml)
│
├── deploy/
│   ├── install.sh                     ← interactive installer / uninstaller
│   └── ns8-backup-monitor.service     ← systemd unit file
│
└── ns8_backup_monitor/                ← Python package
    ├── __init__.py                    ← package metadata, version string
    ├── __main__.py                    ← entry point: arg parsing, logging init,
    │                                     hands off to receiver.run_server()
    ├── receiver.py                    ← HTTP webhook server (POST /alert)
    ├── correlator.py                  ← reads Redis, classifies backup outcome
    ├── repo_check.py                  ← probes restic repositories via runagent
    ├── notifier.py                    ← builds and sends email notifications
    └── utils.py                       ← load_config(), setup_logging()
```

---

## Runtime paths

The following paths are created by `deploy/install.sh` and assumed by the
default configuration.

| Purpose | Path |
|---------|------|
| Python package | `/opt/ns8-backup-monitor/ns8_backup_monitor/` |
| Deploy scripts | `/opt/ns8-backup-monitor/deploy/` |
| Configuration | `/etc/ns8-backup-monitor/config.yml` |
| systemd unit | `/etc/systemd/system/ns8-backup-monitor.service` |
| Log file | `/var/log/ns8-backup-monitor.log` |
| NS8 Redis socket | `/var/lib/nethserver/cluster/state/redis.sock` |

---

## Requirements

| Dependency | Provided by | Notes |
|------------|-------------|-------|
| `python3` ≥ 3.8 | OS | Standard on AlmaLinux / Rocky 8+ |
| `pyyaml` | `pip3 install pyyaml` | Only non-stdlib dependency |
| `redis-cli` | NethServer 8 | Used via subprocess, no Python Redis client needed |
| `runagent` | NethServer 8 | Required for `repo_check` only |
| `ns8-sendmail` | NethServer 8 | Required for email delivery |
| `systemd` | OS | Service management |

> **This service must run on an NS8 leader node** (or any node that has
> read access to the cluster Redis socket and `runagent` in `PATH`).

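
Since `redis-cli` is used via subprocess instead of a Python Redis client, reading a status hash comes down to running `HGETALL` and parsing the text reply. The sketch below shows one way this could look; `parse_hgetall` and `read_hash` are hypothetical helpers, not the project's actual code. It assumes the non-interactive output of `redis-cli`, which prints fields and values on alternating lines, and it would misparse values that themselves contain newlines.

```python
import subprocess
from typing import Dict

# Default socket path taken from the Runtime paths table.
REDIS_SOCKET = "/var/lib/nethserver/cluster/state/redis.sock"


def parse_hgetall(output: str) -> Dict[str, str]:
    """Turn alternating field/value lines into a dict.

    A dangling field without a value (odd line count) is dropped.
    """
    lines = output.splitlines()
    return dict(zip(lines[0::2], lines[1::2]))


def read_hash(key: str, socket_path: str = REDIS_SOCKET,
              timeout: float = 10.0) -> Dict[str, str]:
    """Run `redis-cli -s <socket> HGETALL <key>` and parse the reply.

    Raises subprocess.CalledProcessError on a non-zero exit status and
    subprocess.TimeoutExpired if redis-cli does not answer in time.
    """
    result = subprocess.run(
        ["redis-cli", "-s", socket_path, "HGETALL", key],
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return parse_hgetall(result.stdout)
```

A missing key yields an empty dict, because `HGETALL` returns an empty reply for a hash that does not exist.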

---

## Installation

### One-liner (recommended)

```bash
bash <(curl -fsSL https://repo.lelekaos.com/admin/ns8-backup-monitor/raw/branch/main/deploy/install.sh)
```

The installer will:

1. Check prerequisites (`python3`, `curl`, `tar`, `ns8-sendmail`).
2. Download and extract the latest source archive from the Gitea repository.
3. Prompt interactively for sender address, recipient list, and subject prefix.
4. Write `/etc/ns8-backup-monitor/config.yml` with the supplied values.
5. Install and start the systemd service.

### Manual installation

```bash
git clone https://repo.lelekaos.com/admin/ns8-backup-monitor.git
cd ns8-backup-monitor

# Install Python dependency
pip3 install pyyaml

# Create directories
mkdir -p /opt/ns8-backup-monitor /etc/ns8-backup-monitor

# Copy source and config template
cp -r . /opt/ns8-backup-monitor/
cp config/config.yml.example /etc/ns8-backup-monitor/config.yml

# Edit the config before starting
nano /etc/ns8-backup-monitor/config.yml

# Install systemd unit
cp deploy/ns8-backup-monitor.service /etc/systemd/system/
systemctl daemon-reload
systemctl enable --now ns8-backup-monitor
```

---

## Configuration

The configuration file is a YAML document. The installer writes it to
`/etc/ns8-backup-monitor/config.yml`; a fully annotated template is available
at `config/config.yml.example`.

```yaml
# ---------------------------------------------------------------------------
# Email notification settings
# ---------------------------------------------------------------------------
# Delivery is handled by ns8-sendmail, which uses the SMTP relay already
# configured in NethServer 8. No SMTP credentials are needed here.
mail:
  # Envelope / header sender address.
  from: "ns8-backup-monitor@yourdomain.com"

  # One or more recipient addresses. At least one is required.
  to:
    - "admin@yourdomain.com"

  # String prepended to every email subject line.
  subject_prefix: "[NS8 Backup]"

# ---------------------------------------------------------------------------
# Webhook receiver (HTTP server)
# ---------------------------------------------------------------------------
receiver:
  # Interface to listen on. 127.0.0.1 is recommended when Alertmanager
  # runs on the same host; use 0.0.0.0 only if it runs on a different node.
  host: "127.0.0.1"

  # TCP port. Must match the webhook URL configured in Alertmanager.
  port: 9099

# ---------------------------------------------------------------------------
# Timing
# ---------------------------------------------------------------------------
correlator:
  # Seconds to wait after receiving the alert before reading Redis.
  # This grace period allows all module agents to finish writing their
  # per-module status hashes. 30 s is sufficient for most deployments.
  wait_seconds: 30

  # Look-back window in seconds used when the alert does not include a
  # backup_id label. Any plan whose Redis status was updated within this
  # window is considered "recent" and included in the report.
  recent_window: 3600

# ---------------------------------------------------------------------------
# Redis connection
# ---------------------------------------------------------------------------
redis:
  # Path to the NS8 cluster Redis Unix socket.
  # On a standard NS8 installation this path never changes.
  socket: "/var/lib/nethserver/cluster/state/redis.sock"

# ---------------------------------------------------------------------------
# Repository check (optional, uses runagent + restic)
# ---------------------------------------------------------------------------
repo_check:
  # Maximum seconds to wait for each repository check before giving up.
  timeout: 60

  # Extra flags passed verbatim to every restic invocation.
  # Example: "--cacert /etc/pki/tls/certs/ca-bundle.crt"
  restic_flags: ""

# ---------------------------------------------------------------------------
# Logging
# ---------------------------------------------------------------------------
logging:
  # Python log level: DEBUG, INFO, WARNING, ERROR.
  level: INFO

  # Absolute path for the rotating log file (5 MB × 3 backups).
  # Leave empty to log to stdout / journald only.
  file: "/var/log/ns8-backup-monitor.log"
```

---

## Alertmanager integration

Add a receiver pointing to the service in your Alertmanager configuration:

```yaml
# alertmanager.yml (relevant excerpt)
route:
  receiver: ns8-backup-monitor
  # Only route backup-related alerts to this receiver.
  routes:
    - match:
        alertname: NethServerBackupFailed
      receiver: ns8-backup-monitor

receivers:
  - name: ns8-backup-monitor
    webhook_configs:
      - url: "http://127.0.0.1:9099/alert"
        # Send resolved alerts too so the service can log them.
        send_resolved: true
```

Reload Alertmanager after editing:

```bash
systemctl reload alertmanager
# or, for the NS8 metrics module:
runagent -m metrics1 systemctl reload alertmanager
```

---

## Outcome classification

For each backup plan the correlator reads all per-module status hashes and
produces one of three outcomes:

| Outcome | Condition | Email subject |
|---------|-----------|---------------|
| `SUCCESS` | All modules finished with `result=success` | `✅ Backup completed` |
| `PARTIAL` | At least one module succeeded, at least one failed | `⚠️ Backup partially failed` |
| `REPO_FAILURE` | All modules failed **or** no status found in Redis | `❌ Backup failed` |

---

## Redis key structure

The correlator reads two families of keys from the NS8 cluster Redis:

| Key pattern | Description |
|-------------|-------------|
| `cluster/backup//status` | Plan-level status hash. Fields: `result`, `timestamp`, `errors` (integer count). |
| `module//backup//status` | Per-module status hash. Fields: `result`, `timestamp`, `error` (message string). |

`result` is either `"success"` or `"error"`.
`timestamp` is an ISO 8601 string in UTC (e.g. `2024-01-15T03:00:05Z`).

---

## Service management

```bash
# Check service status
systemctl status ns8-backup-monitor

# Follow live logs via journald
journalctl -u ns8-backup-monitor -f

# Follow the rotating log file directly
tail -f /var/log/ns8-backup-monitor.log

# Restart after a config change
systemctl restart ns8-backup-monitor

# Test the webhook endpoint manually
curl -s -X POST http://127.0.0.1:9099/alert \
  -H 'Content-Type: application/json' \
  -d '{"alerts":[{"status":"firing","labels":{"alertname":"NethServerBackupFailed"}}]}'
```

---

## Troubleshooting

### Service starts but no email is received

1. Verify `ns8-sendmail` works independently:
   ```bash
   echo 'Test' | ns8-sendmail -s 'Test' admin@yourdomain.com
   ```
2. Check `mail.to` in `/etc/ns8-backup-monitor/config.yml`.
3. Increase log level to `DEBUG` and restart the service.

### `REPO_FAILURE` on every alert even though backups succeed

- The correlator may be reading Redis before all modules have finished.
  Increase `correlator.wait_seconds` (e.g. to `60`).
- Check that the Redis socket path is correct:
  `redis-cli -s /var/lib/nethserver/cluster/state/redis.sock PING`

### Alertmanager does not reach the webhook

- Confirm the service is listening: `ss -tlnp | grep 9099`
- If Alertmanager runs on a different host, change `receiver.host` to
  `0.0.0.0` and open the port in the firewall.

---

## Uninstallation

```bash
bash /opt/ns8-backup-monitor/deploy/install.sh --uninstall
```

The script will stop and disable the service, remove the install directory,
and optionally remove the configuration directory.

---

## License

MIT. See [LICENSE](LICENSE) if present; otherwise contact the repository owner.