Monitoring & Alerts

Tundra scrapes metrics from each enrolled server every 30 seconds:

Metric                   Granularity
CPU usage (%)            Per-core + aggregate
RAM usage (MB / %)       Used, cached, free
Disk usage (GB / %)      Per-mount
Network I/O (bytes/s)    Per-interface
Site request rate        Per-site, from Caddy logs
Site error rate          4xx and 5xx counts
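
The per-site request and error counts are derived from Caddy logs. How Tundra parses them is not documented here, so the following is only a rough sketch. It assumes Caddy's JSON access-log format, with a top-level `status` field and the host under `request.host`. `count_by_site` is a hypothetical helper, not part of Tundra.

```python
import json
from collections import defaultdict


def count_by_site(log_lines):
    """Tally total requests, 4xx and 5xx responses per host.

    Assumes each line is one JSON access-log entry with a numeric
    "status" and a "request" object containing "host".
    """
    counts = defaultdict(lambda: {"requests": 0, "4xx": 0, "5xx": 0})
    for line in log_lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON
        if not isinstance(entry, dict):
            continue
        request = entry.get("request")
        host = request.get("host") if isinstance(request, dict) else None
        status = entry.get("status")
        if host is None or not isinstance(status, int):
            continue  # skip entries missing the fields we rely on
        site = counts[host]
        site["requests"] += 1
        if 400 <= status <= 499:
            site["4xx"] += 1
        elif 500 <= status <= 599:
            site["5xx"] += 1
    return {host: dict(c) for host, c in counts.items()}
```

Dividing these counts by the length of the sampling window would give a rate. Whether Tundra reports raw counts or rates per interval is not stated above.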

Metrics are stored in metrics_samples, partitioned by week.
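
To get a feel for how large a weekly partition can grow, the 30-second scrape interval can be turned into a sample count. This is a back-of-the-envelope sketch. It assumes one stored sample per metric series per scrape, which is not confirmed here, and `series_count` is a hypothetical input you would estimate for your own servers.

```python
SCRAPE_INTERVAL_S = 30                 # scrape interval stated above
SECONDS_PER_WEEK = 7 * 24 * 60 * 60    # 604,800 seconds


def samples_per_week(series_count: int, interval_s: int = SCRAPE_INTERVAL_S) -> int:
    """Estimate samples written to one weekly partition for one server.

    Assumes one sample per series per scrape (an assumption, not a
    documented storage detail).
    """
    scrapes = SECONDS_PER_WEEK // interval_s
    return series_count * scrapes
```

A single series scraped every 30 seconds produces 20,160 samples per week, so the estimate scales linearly with the number of cores, mounts, interfaces, and sites being tracked.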

Go to Alerts → Rules → Add to create threshold-based alerts:

Field       Example
Metric      server.cpu_usage_pct
Condition   > 90
Duration    5 minutes (alert only if sustained)
Cooldown    30 minutes (suppress repeated firings)
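
The interaction between Duration and Cooldown can be sketched as a small state machine. This is an illustration of the general technique, not Tundra's implementation. `ThresholdRule` and its fields are hypothetical. The sketch assumes that a sample at or below the threshold resets the duration timer and that the cooldown is measured from the last firing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ThresholdRule:
    threshold: float = 90.0      # Condition: > 90
    duration_s: int = 5 * 60     # Duration: 5 minutes
    cooldown_s: int = 30 * 60    # Cooldown: 30 minutes
    breach_started: Optional[float] = None
    last_fired: Optional[float] = None

    def observe(self, ts: float, value: float) -> bool:
        """Feed one sample; return True if the alert should fire now."""
        if value <= self.threshold:
            # Condition no longer holds: reset the sustained-breach timer.
            self.breach_started = None
            return False
        if self.breach_started is None:
            self.breach_started = ts
        sustained = ts - self.breach_started >= self.duration_s
        cooling = (
            self.last_fired is not None
            and ts - self.last_fired < self.cooldown_s
        )
        if sustained and not cooling:
            self.last_fired = ts
            return True
        return False
```

Under these assumptions, a CPU pinned at 95% and sampled every 30 seconds would fire once after 5 minutes, then again only after the 30-minute cooldown has elapsed.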

Channel     Configuration
Email       SMTP settings in Settings → SMTP
Slack       Webhook URL
Discord     Webhook URL
PagerDuty   Integration key

Configure channels in Settings → Notifications.
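
Each channel expects a different request body, which is useful to know when testing a webhook by hand. The sketch below builds minimal payloads. Slack incoming webhooks accept a JSON object with `text`, Discord webhooks accept `content`, and the PagerDuty Events API v2 takes the integration key as `routing_key`. How Tundra formats its own notifications is not specified here. `build_payload` and its arguments are hypothetical.

```python
import json


def build_payload(channel: str, summary: str, source: str, key: str = "") -> dict:
    """Return a minimal JSON-serializable body for the given channel."""
    if channel == "slack":
        return {"text": summary}
    if channel == "discord":
        return {"content": summary}
    if channel == "pagerduty":
        return {
            "routing_key": key,           # the PagerDuty integration key
            "event_action": "trigger",
            "payload": {
                "summary": summary,
                "source": source,         # e.g. the affected server name
                "severity": "critical",   # severity chosen for illustration
            },
        }
    raise ValueError(f"unsupported channel: {channel}")


# Example: the body that would be POSTed as application/json.
body = json.dumps(build_payload("slack", "CPU > 90% for 5 minutes", "web-01"))
```

The sketch stops at building the body. Sending it is a plain HTTP POST to the webhook URL (or to the PagerDuty events endpoint) with a JSON content type.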

Alerts → Active shows currently firing alerts with time, severity, and affected resource.
Click an alert to see the metric chart that triggered it.

The main dashboard shows a fleet health summary:

  • Server count with active/degraded breakdown
  • Site count with active/provisioning breakdown
  • Domain count
  • Alert rule count and current firing count

Click any card to drill into the resource list.