docs: rewrite README with full file map, architecture, configuration and public-ready comments
@@ -1,40 +1,60 @@
|
|||||||
# ns8-backup-monitor
|
# ns8-backup-monitor
|
||||||
|
|
||||||
A lightweight webhook receiver for **NethServer 8** that intercepts Alertmanager backup failure alerts, enriches them with per-module status data from the cluster Redis, optionally checks repository health via `restic`, and delivers a detailed email notification through the NS8 configured mail relay.
|
> **NethServer 8 backup failure notification service.**
|
||||||
|
>
|
||||||
Unlike solutions that hook into `run-backup` (which only fires on manual UI launches), this service listens to the Alertmanager webhook channel — the same source used by the NS8 monitoring stack — and therefore captures **both manual and scheduled automatic backups**.
|
> Receives Alertmanager webhook alerts, correlates per-module backup status
|
||||||
|
> from the cluster Redis, optionally probes restic repositories, and sends a
|
||||||
|
> detailed HTML/text email through the NS8 mail relay.
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## Architecture overview
|
## Table of contents
|
||||||
|
|
||||||
```
|
1. [Architecture](#architecture)
|
||||||
Alertmanager
|
2. [File layout](#file-layout)
|
||||||
│ POST /alert (NsBackupFailed | NsBackupMissing)
|
3. [Runtime paths](#runtime-paths)
|
||||||
▼
|
4. [Requirements](#requirements)
|
||||||
[receiver.py] HTTP webhook listener (localhost:9099)
|
5. [Installation](#installation)
|
||||||
│ waits N seconds for modules to settle
|
6. [Configuration](#configuration)
|
||||||
▼
|
7. [Alertmanager integration](#alertmanager-integration)
|
||||||
[correlator.py] Reads Redis cluster state, classifies outcome
|
8. [Outcome classification](#outcome-classification)
|
||||||
│ SUCCESS | PARTIAL | REPO_FAILURE
|
9. [Redis key structure](#redis-key-structure)
|
||||||
▼
|
10. [Service management](#service-management)
|
||||||
[repo_check.py] (only on non-SUCCESS) Probes restic repos via runagent
|
11. [Troubleshooting](#troubleshooting)
|
||||||
▼
|
12. [Uninstallation](#uninstallation)
|
||||||
[notifier.py] Builds HTML/text email, sends via ns8-sendmail
|
13. [License](#license)
|
||||||
```
|
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## Requirements
|
## Architecture
|
||||||
|
|
||||||
| Dependency | Notes |
|
```
|
||||||
|---|---|
|
Alertmanager ──POST /alert──► receiver.py
|
||||||
| NS8 leader or worker node | Must have access to the cluster Redis socket |
|
│
|
||||||
| `redis-cli` | Included in standard NS8 installations |
|
(wait N seconds for all modules
|
||||||
| `runagent` | NS8 binary used to invoke `restic` inside module containers |
|
to finish writing their status)
|
||||||
| `ns8-sendmail` | NS8 mail relay script (invoked via `runagent`) |
|
│
|
||||||
| Python 3.8+ | Standard library only — no pip dependencies |
|
▼
|
||||||
| Alertmanager | Must be configured to send webhooks to this service |
|
correlator.py
|
||||||
|
(reads Redis KEYS/HGETALL,
|
||||||
|
classifies outcome:
|
||||||
|
SUCCESS / PARTIAL / REPO_FAILURE)
|
||||||
|
│
|
||||||
|
▼
|
||||||
|
repo_check.py ← optional
|
||||||
|
(runagent → restic snapshots
|
||||||
|
on each module's repository)
|
||||||
|
│
|
||||||
|
▼
|
||||||
|
notifier.py
|
||||||
|
(builds HTML + plain-text email,
|
||||||
|
dispatches via ns8-sendmail)
|
||||||
|
```
|
||||||
|
|
||||||
|
**Key design decision:** the service is a long-running HTTP server managed by
|
||||||
|
systemd, not a one-shot script. This means it is always ready to receive an
|
||||||
|
alert regardless of whether the backup was triggered manually or by a scheduled
|
||||||
|
timer.
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
@@ -43,193 +63,236 @@ Alertmanager
|
|||||||
```
|
```
|
||||||
ns8-backup-monitor/
|
ns8-backup-monitor/
|
||||||
│
|
│
|
||||||
├── README.md ← This file
|
├── README.md ← this file
|
||||||
│
|
│
|
||||||
├── config/
|
├── config/
|
||||||
│ └── config.yml.example ← Annotated configuration template
|
│ └── config.yml.example ← annotated configuration template
|
||||||
|
│ (copy to /etc/ns8-backup-monitor/config.yml)
|
||||||
│
|
│
|
||||||
├── deploy/
|
├── deploy/
|
||||||
│ ├── install.sh ← Interactive installer / uninstaller
|
│ ├── install.sh ← interactive installer / uninstaller
|
||||||
│ └── ns8-backup-monitor.service ← systemd unit file
|
│ └── ns8-backup-monitor.service ← systemd unit file
|
||||||
│
|
│
|
||||||
└── ns8_backup_monitor/ ← Python package (main application)
|
└── ns8_backup_monitor/ ← Python package
|
||||||
├── __init__.py ← Package marker, exposes version
|
├── __init__.py ← package metadata, version string
|
||||||
├── __main__.py ← CLI entry point (`python3 -m ns8_backup_monitor`)
|
├── __main__.py ← entry point: arg parsing, logging init,
|
||||||
├── receiver.py ← HTTP webhook server (Alertmanager → pipeline)
|
│ hands off to receiver.run_server()
|
||||||
├── correlator.py ← Redis reader and outcome classifier
|
├── receiver.py ← HTTP webhook server (POST /alert)
|
||||||
├── repo_check.py ← restic repository health prober
|
├── correlator.py ← reads Redis, classifies backup outcome
|
||||||
├── notifier.py ← Email builder and sender
|
├── repo_check.py ← probes restic repositories via runagent
|
||||||
└── utils.py ← Config loader and logging setup
|
├── notifier.py ← builds and sends email notifications
|
||||||
|
└── utils.py ← load_config(), setup_logging()
|
||||||
```
|
```
|
||||||
|
|
||||||
### Runtime paths (after installation)
|
---
|
||||||
|
|
||||||
| Path | Purpose |
|
## Runtime paths
|
||||||
|---|---|
|
|
||||||
| `/opt/ns8-backup-monitor/` | Application root (Python package) |
|
The following paths are created by `deploy/install.sh` and assumed by the
|
||||||
| `/etc/ns8-backup-monitor/config.yml` | Active configuration file |
|
default configuration.
|
||||||
| `/etc/systemd/system/ns8-backup-monitor.service` | systemd unit |
|
|
||||||
| `/var/log/ns8-backup-monitor/` | Log directory (if file logging is enabled) |
|
| Purpose | Path |
|
||||||
| `/var/lib/nethserver/cluster/state/redis.sock` | NS8 cluster Redis socket (default) |
|
|---------|------|
|
||||||
|
| Python package | `/opt/ns8-backup-monitor/ns8_backup_monitor/` |
|
||||||
|
| Deploy scripts | `/opt/ns8-backup-monitor/deploy/` |
|
||||||
|
| Configuration | `/etc/ns8-backup-monitor/config.yml` |
|
||||||
|
| systemd unit | `/etc/systemd/system/ns8-backup-monitor.service` |
|
||||||
|
| Log file | `/var/log/ns8-backup-monitor.log` |
|
||||||
|
| NS8 Redis socket | `/var/lib/nethserver/cluster/state/redis.sock` |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Requirements
|
||||||
|
|
||||||
|
| Dependency | Provided by | Notes |
|
||||||
|
|------------|------------|-------|
|
||||||
|
| `python3` ≥ 3.8 | OS | Standard on AlmaLinux / Rocky 8+ |
|
||||||
|
| `pyyaml` | `pip3 install pyyaml` | Only non-stdlib dependency |
|
||||||
|
| `redis-cli` | NethServer 8 | Used via subprocess, no Python Redis client needed |
|
||||||
|
| `runagent` | NethServer 8 | Required for `repo_check` only |
|
||||||
|
| `ns8-sendmail` | NethServer 8 | Required for email delivery |
|
||||||
|
| `systemd` | OS | Service management |
|
||||||
|
|
||||||
|
> **This service must run on an NS8 leader node** (or any node that has
|
||||||
|
> read access to the cluster Redis socket and `runagent` in `PATH`).
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## Installation
|
## Installation
|
||||||
|
|
||||||
### Quick install (interactive)
|
### One-liner (recommended)
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
bash <(curl -fsSL https://repo.lelekaos.com/admin/ns8-backup-monitor/raw/branch/main/deploy/install.sh)
|
bash <(curl -fsSL https://repo.lelekaos.com/admin/ns8-backup-monitor/raw/branch/main/deploy/install.sh)
|
||||||
```
|
```
|
||||||
|
|
||||||
> **Note:** Use `bash <(curl ...)` rather than `curl ... | bash`.
|
The installer will:
|
||||||
> The interactive installer reads answers from your terminal via `read`; piping stdin
|
1. Check prerequisites (`python3`, `curl`, `tar`, `ns8-sendmail`).
|
||||||
> from curl breaks that interaction.
|
2. Download and extract the latest source archive from the Gitea repository.
|
||||||
|
3. Prompt interactively for sender address, recipient list, and subject prefix.
|
||||||
|
4. Write `/etc/ns8-backup-monitor/config.yml` with the supplied values.
|
||||||
|
5. Install and start the systemd service.
|
||||||
|
|
||||||
### Non-interactive install (CI / automation)
|
### Manual installation
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
curl -fsSL https://repo.lelekaos.com/admin/ns8-backup-monitor/raw/branch/main/deploy/install.sh \
|
git clone https://repo.lelekaos.com/admin/ns8-backup-monitor.git
|
||||||
| bash -s -- \
|
cd ns8-backup-monitor
|
||||||
--from "backup@example.com" \
|
|
||||||
--to "admin@example.com"
|
# Install Python dependency
|
||||||
|
pip3 install pyyaml
|
||||||
|
|
||||||
|
# Create directories
|
||||||
|
mkdir -p /opt/ns8-backup-monitor /etc/ns8-backup-monitor
|
||||||
|
|
||||||
|
# Copy source and config template
|
||||||
|
cp -r . /opt/ns8-backup-monitor/
|
||||||
|
cp config/config.yml.example /etc/ns8-backup-monitor/config.yml
|
||||||
|
# Edit the config before starting
|
||||||
|
nano /etc/ns8-backup-monitor/config.yml
|
||||||
|
|
||||||
|
# Install systemd unit
|
||||||
|
cp deploy/ns8-backup-monitor.service /etc/systemd/system/
|
||||||
|
systemctl daemon-reload
|
||||||
|
systemctl enable --now ns8-backup-monitor
|
||||||
```
|
```
|
||||||
|
|
||||||
### What the installer does
|
|
||||||
|
|
||||||
1. Copies the Python package to `/opt/ns8-backup-monitor/`
|
|
||||||
2. Writes `/etc/ns8-backup-monitor/config.yml` from the template
|
|
||||||
3. Installs and enables the systemd unit
|
|
||||||
4. Prints the Alertmanager webhook receiver URL
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Uninstallation
|
|
||||||
|
|
||||||
```bash
|
|
||||||
bash /opt/ns8-backup-monitor/deploy/install.sh --uninstall
|
|
||||||
```
|
|
||||||
|
|
||||||
The uninstaller stops and removes the systemd unit, then optionally removes the configuration directory.
|
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## Configuration
|
## Configuration
|
||||||
|
|
||||||
The active configuration file is `/etc/ns8-backup-monitor/config.yml`.
|
The configuration file is a YAML document. The installer writes it to
|
||||||
Edit it directly and restart the service to apply changes.
|
`/etc/ns8-backup-monitor/config.yml`; a fully annotated template is available
|
||||||
|
at `config/config.yml.example`.
|
||||||
```bash
|
|
||||||
nano /etc/ns8-backup-monitor/config.yml
|
|
||||||
systemctl restart ns8-backup-monitor
|
|
||||||
```
|
|
||||||
|
|
||||||
See `config/config.yml.example` for a fully annotated reference with all available options.
|
|
||||||
|
|
||||||
### Key sections
|
|
||||||
|
|
||||||
```yaml
|
```yaml
|
||||||
# ── Mail settings ─────────────────────────────────────────────
|
# ---------------------------------------------------------------------------
|
||||||
|
# Email notification settings
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Delivery is handled by ns8-sendmail, which uses the SMTP relay already
|
||||||
|
# configured in NethServer 8. No SMTP credentials are needed here.
|
||||||
mail:
|
mail:
|
||||||
from: "backup@ns02.example.com" # Envelope From address
|
# Envelope / header sender address.
|
||||||
|
from: "ns8-backup-monitor@yourdomain.com"
|
||||||
|
|
||||||
|
# One or more recipient addresses. At least one is required.
|
||||||
to:
|
to:
|
||||||
- "admin@example.com" # One or more recipient addresses
|
- "admin@yourdomain.com"
|
||||||
subject_prefix: "[NS8 Backup]" # Prepended to every subject line
|
|
||||||
|
|
||||||
# ── Webhook receiver ──────────────────────────────────────────
|
# String prepended to every email subject line.
|
||||||
|
subject_prefix: "[NS8 Backup]"
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Webhook receiver (HTTP server)
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
receiver:
|
receiver:
|
||||||
host: "127.0.0.1" # Bind address (keep localhost unless Alertmanager is remote)
|
# Interface to listen on. 127.0.0.1 is recommended when Alertmanager
|
||||||
port: 9099 # Must match the Alertmanager webhook URL
|
# runs on the same host; use 0.0.0.0 only if it runs on a different node.
|
||||||
|
host: "127.0.0.1"
|
||||||
|
# TCP port. Must match the webhook URL configured in Alertmanager.
|
||||||
|
port: 9099
|
||||||
|
|
||||||
# ── Correlator behaviour ─────────────────────────────────────
|
# ---------------------------------------------------------------------------
|
||||||
|
# Timing
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
correlator:
|
correlator:
|
||||||
wait_seconds: 30 # Seconds to wait after alert before reading Redis
|
# Seconds to wait after receiving the alert before reading Redis.
|
||||||
# (allows slow modules to write their final status)
|
# This grace period allows all module agents to finish writing their
|
||||||
recent_window: 3600 # When no backup_id label is present, scan Redis for
|
# per-module status hashes. 30 s is sufficient for most deployments.
|
||||||
# plan status keys updated within this many seconds
|
wait_seconds: 30
|
||||||
|
|
||||||
# ── Redis connection ─────────────────────────────────────────
|
# Look-back window in seconds used when the alert does not include a
|
||||||
|
# backup_id label. Any plan whose Redis status was updated within this
|
||||||
|
# window is considered "recent" and included in the report.
|
||||||
|
recent_window: 3600
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Redis connection
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
redis:
|
redis:
|
||||||
|
# Path to the NS8 cluster Redis Unix socket.
|
||||||
|
# On a standard NS8 installation this path never changes.
|
||||||
socket: "/var/lib/nethserver/cluster/state/redis.sock"
|
socket: "/var/lib/nethserver/cluster/state/redis.sock"
|
||||||
|
|
||||||
# ── Repository health check ──────────────────────────────────
|
# ---------------------------------------------------------------------------
|
||||||
|
# Repository check (optional, uses runagent + restic)
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
repo_check:
|
repo_check:
|
||||||
enabled: true
|
# Maximum seconds to wait for each repository check before giving up.
|
||||||
timeout: 60 # Seconds per restic check call
|
timeout: 60
|
||||||
|
# Extra flags passed verbatim to every restic invocation.
|
||||||
|
# Example: "--cacert /etc/pki/tls/certs/ca-bundle.crt"
|
||||||
|
restic_flags: ""
|
||||||
|
|
||||||
# ── Logging ──────────────────────────────────────────────────
|
# ---------------------------------------------------------------------------
|
||||||
|
# Logging
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
logging:
|
logging:
|
||||||
level: "INFO" # DEBUG | INFO | WARNING | ERROR
|
# Python log level: DEBUG, INFO, WARNING, ERROR.
|
||||||
file: "" # Leave empty to log to stdout (journald captures it)
|
level: INFO
|
||||||
|
# Absolute path for the rotating log file (5 MB × 3 backups).
|
||||||
|
# Leave empty to log to stdout / journald only.
|
||||||
|
file: "/var/log/ns8-backup-monitor.log"
|
||||||
```
|
```
|
||||||
|
|
||||||
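The `recent_window` look-back described in the configuration comments can be pictured as a simple age check. The sketch below is only an illustration: the function name `is_recent` is hypothetical, and it assumes status timestamps are UTC strings of the form `2024-01-15T03:00:05Z`. The real correlator may compare times differently.

```python
from datetime import datetime, timezone


def is_recent(timestamp, now, window_seconds=3600):
    """Return True if `timestamp` is no older than `window_seconds` before `now`.

    Assumes `timestamp` looks like 2024-01-15T03:00:05Z (UTC, second precision)
    and that `now` is a timezone-aware datetime.
    """
    written = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")
    written = written.replace(tzinfo=timezone.utc)
    age_seconds = (now - written).total_seconds()
    return age_seconds <= window_seconds


if __name__ == "__main__":
    print(is_recent("2024-01-15T03:00:05Z", datetime.now(timezone.utc)))
```

`strptime` is used rather than `datetime.fromisoformat` because the latter does not accept a trailing `Z` on Python 3.8.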
---
|
---
|
||||||
|
|
||||||
## Alertmanager integration
|
## Alertmanager integration
|
||||||
|
|
||||||
Add a receiver to your Alertmanager configuration on the NS8 leader node
|
Add a receiver pointing to the service in your Alertmanager configuration:
|
||||||
(`/etc/alertmanager/alertmanager.yml` or via the NS8 `metrics1` module):
|
|
||||||
|
|
||||||
```yaml
|
```yaml
|
||||||
|
# alertmanager.yml (relevant excerpt)
|
||||||
|
route:
|
||||||
|
receiver: ns8-backup-monitor
|
||||||
|
# Only route backup-related alerts to this receiver.
|
||||||
|
routes:
|
||||||
|
- match:
|
||||||
|
alertname: NethServerBackupFailed
|
||||||
|
receiver: ns8-backup-monitor
|
||||||
|
|
||||||
receivers:
|
receivers:
|
||||||
- name: ns8-backup-monitor
|
- name: ns8-backup-monitor
|
||||||
webhook_configs:
|
webhook_configs:
|
||||||
- url: "http://127.0.0.1:9099/alert"
|
- url: "http://127.0.0.1:9099/alert"
|
||||||
send_resolved: false
|
# Send resolved alerts too so the service can log them.
|
||||||
|
send_resolved: true
|
||||||
route:
|
|
||||||
receiver: ns8-backup-monitor
|
|
||||||
group_by: [alertname]
|
|
||||||
group_wait: 10s
|
|
||||||
group_interval: 5m
|
|
||||||
repeat_interval: 12h
|
|
||||||
routes:
|
|
||||||
- match_re:
|
|
||||||
alertname: "NsBackupFailed|NsBackupMissing"
|
|
||||||
receiver: ns8-backup-monitor
|
|
||||||
```
|
```
|
||||||
|
|
||||||
The service handles two alert names:
|
Reload Alertmanager after editing:
|
||||||
|
|
||||||
| Alert name | Meaning |
|
```bash
|
||||||
|---|---|
|
systemctl reload alertmanager
|
||||||
| `NsBackupFailed` | One or more backup modules reported an error |
|
# or, for the NS8 metrics module:
|
||||||
| `NsBackupMissing` | Expected backup did not run within the time window |
|
runagent -m metrics1 systemctl reload alertmanager
|
||||||
|
```
|
||||||
|
|
||||||
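For readers who want to see what the webhook body looks like on the receiving side, here is a minimal sketch of filtering an Alertmanager payload. It is not taken from `receiver.py`: the helper name `relevant_alerts` and the `names` parameter are hypothetical, and the default alert names are an assumption (this README uses more than one spelling), so pass whichever names your rules actually emit.

```python
import json

# Assumption: alert names of interest. Adjust to match your alert rules.
DEFAULT_ALERT_NAMES = frozenset({"NsBackupFailed", "NsBackupMissing"})


def relevant_alerts(body, names=DEFAULT_ALERT_NAMES):
    """Return the firing alerts in a webhook body whose alertname is in `names`."""
    payload = json.loads(body)
    selected = []
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        # Resolved alerts are skipped here; a real service might log them instead.
        if alert.get("status") == "firing" and labels.get("alertname") in names:
            selected.append(alert)
    return selected
```

Keeping the name set as a parameter avoids hard-coding one spelling of the alert name into the filter.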
---
|
---
|
||||||
|
|
||||||
## Outcome classification
|
## Outcome classification
|
||||||
|
|
||||||
After reading per-module Redis keys, the correlator assigns one of three outcomes:
|
For each backup plan the correlator reads all per-module status hashes and
|
||||||
|
produces one of three outcomes:
|
||||||
|
|
||||||
| Outcome | Condition | Email subject |
|
| Outcome | Condition | Email subject |
|
||||||
|---|---|---|
|
|---------|-----------|---------------|
|
||||||
| `SUCCESS` | All modules succeeded | ✅ Backup completed successfully |
|
| `SUCCESS` | All modules finished with `result=success` | `✅ Backup completed` |
|
||||||
| `PARTIAL` | Some modules failed, some succeeded | ⚠️ Backup partially failed |
|
| `PARTIAL` | At least one module succeeded, at least one failed | `⚠️ Backup partially failed` |
|
||||||
| `REPO_FAILURE` | All modules failed, or no status found in Redis | ❌ Backup failed – possible repository error |
|
| `REPO_FAILURE` | All modules failed **or** no status found in Redis | `❌ Backup failed` |
|
||||||
|
|
||||||
On `PARTIAL` or `REPO_FAILURE`, the repo health check runs automatically and appends
|
|
||||||
diagnostic information (restic error output) to the email.
|
|
||||||
|
|
||||||
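The three outcomes in the table can be expressed as a small pure function. This is a sketch under the rules stated above, not the code in `correlator.py`. The names `Outcome` and `classify` are hypothetical, and the input is assumed to be a mapping of module id to the `result` field (`"success"` or `"error"`).

```python
from enum import Enum


class Outcome(Enum):
    SUCCESS = "SUCCESS"
    PARTIAL = "PARTIAL"
    REPO_FAILURE = "REPO_FAILURE"


def classify(module_results):
    """Classify a backup run from a {module_id: result} mapping."""
    if not module_results:
        # No status found in Redis is treated like a total failure.
        return Outcome.REPO_FAILURE
    succeeded = sum(1 for r in module_results.values() if r == "success")
    if succeeded == len(module_results):
        return Outcome.SUCCESS
    if succeeded == 0:
        return Outcome.REPO_FAILURE
    return Outcome.PARTIAL
```

For example, `classify({"mail1": "success", "nextcloud1": "error"})` yields `Outcome.PARTIAL`; the module ids here are made up.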
---
|
---
|
||||||
|
|
||||||
## Redis key structure
|
## Redis key structure
|
||||||
|
|
||||||
The correlator reads the following NS8 Redis key patterns:
|
The correlator reads two families of keys from the NS8 cluster Redis:
|
||||||
|
|
||||||
```
|
| Key pattern | Description |
|
||||||
cluster/backup/<backup_id>/status → overall plan status (hash)
|
|-------------|-------------|
|
||||||
module/<module_id>/backup/<backup_id>/status → per-module status (hash)
|
| `cluster/backup/<backup_id>/status` | Plan-level status hash. Fields: `result`, `timestamp`, `errors` (integer count). |
|
||||||
```
|
| `module/<module_id>/backup/<backup_id>/status` | Per-module status hash. Fields: `result`, `timestamp`, `error` (message string). |
|
||||||
|
|
||||||
Hash fields:
|
`result` is either `"success"` or `"error"`. `timestamp` is an ISO 8601
|
||||||
|
string in UTC (e.g. `2024-01-15T03:00:05Z`).
|
||||||
| Field | Values | Description |
|
|
||||||
|---|---|---|
|
|
||||||
| `result` | `success` / `error` | Outcome of the backup operation |
|
|
||||||
| `timestamp` | ISO 8601 | When the status was last written |
|
|
||||||
| `error` | string | Error message, if any |
|
|
||||||
| `errors` | integer | Number of module errors (plan-level hash only) |
|
|
||||||
|
|
||||||
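As an illustration of how these keys could be read through `redis-cli` from Python, the sketch below builds a per-module key and parses `HGETALL` output. The helper names are hypothetical and this is not the project's actual implementation. It assumes `redis-cli` prints one field or value per line when its output is captured, and that values contain no embedded newlines.

```python
import subprocess

# Default NS8 cluster Redis socket, as listed under "Runtime paths".
SOCKET = "/var/lib/nethserver/cluster/state/redis.sock"


def module_status_key(module_id, backup_id):
    """Build the per-module status key for one backup plan."""
    return f"module/{module_id}/backup/{backup_id}/status"


def parse_hgetall(raw):
    """Turn alternating field/value lines into a dict."""
    lines = raw.splitlines()
    return dict(zip(lines[0::2], lines[1::2]))


def read_hash(key, socket_path=SOCKET):
    """Run `redis-cli -s <socket> HGETALL <key>` and parse the result."""
    result = subprocess.run(
        ["redis-cli", "-s", socket_path, "HGETALL", key],
        capture_output=True, text=True, check=True,
    )
    return parse_hgetall(result.stdout)
```

Separating `parse_hgetall` from the subprocess call keeps the parsing testable on a machine without access to the cluster socket.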
---
|
---
|
||||||
|
|
||||||
@@ -239,70 +302,61 @@ Hash fields:
|
|||||||
# Check service status
|
# Check service status
|
||||||
systemctl status ns8-backup-monitor
|
systemctl status ns8-backup-monitor
|
||||||
|
|
||||||
# View live logs
|
# Follow live logs via journald
|
||||||
journalctl -u ns8-backup-monitor -f
|
journalctl -u ns8-backup-monitor -f
|
||||||
|
|
||||||
# Restart after config change
|
# Follow the rotating log file directly
|
||||||
|
tail -f /var/log/ns8-backup-monitor.log
|
||||||
|
|
||||||
|
# Restart after a config change
|
||||||
systemctl restart ns8-backup-monitor
|
systemctl restart ns8-backup-monitor
|
||||||
|
|
||||||
# Disable on boot
|
# Test the webhook endpoint manually
|
||||||
systemctl disable ns8-backup-monitor
|
curl -s -X POST http://127.0.0.1:9099/alert \
|
||||||
|
-H 'Content-Type: application/json' \
|
||||||
|
-d '{"alerts":[{"status":"firing","labels":{"alertname":"NethServerBackupFailed"}}]}'
|
||||||
```
|
```
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## Troubleshooting
|
## Troubleshooting
|
||||||
|
|
||||||
### Service fails to start
|
### Service starts but no email is received
|
||||||
|
|
||||||
|
1. Verify `ns8-sendmail` works independently:
|
||||||
```bash
|
```bash
|
||||||
journalctl -u ns8-backup-monitor --no-pager -n 50
|
echo 'Test' | ns8-sendmail -s 'Test' admin@yourdomain.com
|
||||||
```
|
```
|
||||||
|
2. Check `mail.to` in `/etc/ns8-backup-monitor/config.yml`.
|
||||||
|
3. Increase log level to `DEBUG` and restart the service.
|
||||||
|
|
||||||
Common causes:
|
### `REPO_FAILURE` on every alert even though backups succeed
|
||||||
- `config.yml` not found at the expected path → check `/etc/ns8-backup-monitor/config.yml`
|
|
||||||
- Port 9099 already in use → change `receiver.port` in config
|
|
||||||
|
|
||||||
### No email received after a backup failure
|
- The correlator may be reading Redis before all modules have finished.
|
||||||
|
Increase `correlator.wait_seconds` (e.g. to `60`).
|
||||||
|
- Check that the Redis socket path is correct:
|
||||||
|
`redis-cli -s /var/lib/nethserver/cluster/state/redis.sock PING`
|
||||||
|
|
||||||
1. Verify Alertmanager is firing the webhook:
|
### Alertmanager does not reach the webhook
|
||||||
```bash
|
|
||||||
journalctl -u ns8-backup-monitor -f
|
|
||||||
```
|
|
||||||
You should see `Received N relevant alert(s)` within a minute of the backup failure.
|
|
||||||
|
|
||||||
2. Check that `wait_seconds` has elapsed (default 30 s) and look for `Sending notification...` in the log.
|
- Confirm the service is listening:
|
||||||
|
`ss -tlnp | grep 9099`
|
||||||
3. Verify the mail relay works independently:
|
- If Alertmanager runs on a different host, change `receiver.host` to
|
||||||
```bash
|
`0.0.0.0` and open the port in the firewall.
|
||||||
echo "Test" | runagent ns8-sendmail -s "test" admin@example.com
|
|
||||||
```
|
|
||||||
|
|
||||||
### Correlator finds no modules
|
|
||||||
|
|
||||||
If the log shows `No recent backup status keys found in Redis`, possible causes:
|
|
||||||
- `recent_window` is too short — the backup ran more than 1 hour ago
|
|
||||||
- Redis socket path is wrong for your installation
|
|
||||||
- The backup plan wrote status to a non-standard key pattern
|
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## Development
|
## Uninstallation
|
||||||
|
|
||||||
The application is pure Python 3 with no third-party dependencies.
|
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
# Run locally (requires NS8 Redis socket access)
|
bash /opt/ns8-backup-monitor/deploy/install.sh --uninstall
|
||||||
python3 -m ns8_backup_monitor --config ./config/config.yml.example
|
|
||||||
|
|
||||||
# Send a test webhook payload
|
|
||||||
curl -s -X POST http://127.0.0.1:9099/alert \
|
|
||||||
-H "Content-Type: application/json" \
|
|
||||||
-d '{"alerts":[{"status":"firing","labels":{"alertname":"NsBackupFailed","backup_id":"1"}}]}'
|
|
||||||
```
|
```
|
||||||
|
|
||||||
|
The script will stop and disable the service, remove the install directory,
|
||||||
|
and optionally remove the configuration directory.
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## License
|
## License
|
||||||
|
|
||||||
MIT License — contributions welcome via pull request.
|
MIT — see [LICENSE](LICENSE) if present, otherwise contact the repository owner.
|
||||||
|
|||||||