Monitoring & Alerts
Metrics
Section titled “Metrics”Tundra scrapes metrics from each enrolled server every 30 seconds:
| Metric | Granularity |
|---|---|
| CPU usage (%) | Per-core + aggregate |
| RAM usage (MB / %) | Used, cached, free |
| Disk usage (GB / %) | Per-mount |
| Network I/O (bytes/s) | Per-interface |
| Site request rate | Per-site, from Caddy logs |
| Site error rate | 4xx and 5xx counts |
Metrics are stored in metrics_samples, partitioned by week.
Alert rules
Section titled “Alert rules”Go to Alerts → Rules → Add to create threshold-based alerts:
| Field | Example |
|---|---|
| Metric | server.cpu_usage_pct |
| Condition | > 90 |
| Duration | 5 minutes (alert only if sustained) |
| Cooldown | 30 minutes (suppress repeated firings) |
Delivery channels
Section titled “Delivery channels”| Channel | Configuration |
|---|---|
| SMTP settings in Settings → SMTP | |
| Slack | Webhook URL |
| Discord | Webhook URL |
| PagerDuty | Integration key |
Configure channels in Settings → Notifications.
Active alerts
Section titled “Active alerts”Alerts → Active shows currently firing alerts with time, severity, and affected resource.
Click an alert to see the metric chart that triggered it.
Dashboard
Section titled “Dashboard”The main dashboard shows a fleet health summary:
- Server count with active/degraded breakdown
- Site count with active/provisioning breakdown
- Domain count
- Alert rule count and current firing count
Click any card to drill into the resource list.