Valerter exposes a /metrics endpoint for Prometheus monitoring.
```yaml
metrics:
  enabled: true   # Default: true
  port: 9090      # Default: 9090
```
| Metric | Labels | Description |
|---|---|---|
| valerter_alerts_sent_total | rule_name, notifier_name, notifier_type | Alerts sent successfully |
| valerter_alerts_throttled_total | rule_name | Alerts blocked by throttling |
| valerter_alerts_passed_total | rule_name | Alerts that passed throttling |
| valerter_alerts_dropped_total | - | Alerts dropped (queue full, global counter) |
| valerter_alerts_failed_total | rule_name, notifier_name, notifier_type | Alerts that permanently failed |
| valerter_email_recipient_errors_total | rule_name, notifier_name | Email delivery failures per recipient |
| valerter_lines_discarded_total | rule_name, reason | Log lines discarded (e.g., reason=oversized for lines > 1 MB) |
| valerter_logs_matched_total | rule_name | Logs matched by rule (before throttling) |
| valerter_notifier_config_errors_total | notifier, error_type | Notifier configuration errors (e.g., env var resolution) |
| valerter_notify_errors_total | rule_name, notifier_name, notifier_type | Notification send errors |
| valerter_parse_errors_total | rule_name, error_type | Parsing errors |
| valerter_reconnections_total | rule_name | VictoriaLogs reconnections |
| valerter_rule_panics_total | rule_name | Rule task panics (auto-restarted) |
| valerter_rule_errors_total | rule_name | Fatal rule errors |
| Metric | Labels | Description |
|---|---|---|
| valerter_queue_size | - | Current notification queue size |
| valerter_last_query_timestamp | rule_name | Unix timestamp of the last successful query |
| valerter_victorialogs_up | rule_name | VictoriaLogs connection status (1=connected, 0=disconnected or error) |
| valerter_uptime_seconds | - | Time since Valerter started |
| valerter_build_info | version | Build information (always 1) |
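These are gauges, so read them directly rather than through `rate()`. Since `valerter_build_info` is always 1, its main use is joining the `version` label onto other series; a hedged sketch with an illustrative record name:

```yaml
groups:
  - name: valerter-info-join        # illustrative group name
    rules:
      # Copy the version label from valerter_build_info onto the per-rule
      # connection-status gauge, e.g. for version-aware dashboards.
      # With several Valerter instances, match on (instance) instead of on ().
      - record: valerter:victorialogs_up:by_version
        expr: |
          valerter_victorialogs_up
            * on () group_left (version)
          valerter_build_info
```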
| Metric | Labels | Description |
|---|---|---|
| valerter_query_duration_seconds | rule_name | VictoriaLogs query latency (time to first chunk) |
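Assuming `valerter_query_duration_seconds` is exported as a Prometheus histogram (adjust if it is a summary), per-rule latency percentiles can be derived from its `_bucket` series; a sketch with an illustrative record name:

```yaml
groups:
  - name: valerter-latency          # illustrative group name
    rules:
      # 95th-percentile VictoriaLogs query latency per rule over 5 minutes.
      - record: valerter:query_duration_seconds:p95_5m
        expr: |
          histogram_quantile(
            0.95,
            sum by (rule_name, le) (rate(valerter_query_duration_seconds_bucket[5m]))
          )
```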
To scrape the endpoint, add a standard job to your Prometheus configuration:

```yaml
scrape_configs:
  - job_name: 'valerter'
    static_configs:
      - targets: ['localhost:9090']
```
Monitor Valerter itself with these Prometheus alerting rules:
```yaml
groups:
  - name: valerter
    rules:
      # Valerter not querying VictoriaLogs for 5 minutes
      - alert: ValerterNotQuerying
        expr: time() - valerter_last_query_timestamp > 300
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "Valerter rule {{ $labels.rule_name }} not querying"
          description: "No queries received from rule {{ $labels.rule_name }} for over 5 minutes"

      # VictoriaLogs connection lost
      - alert: ValerterVictoriaLogsDown
        expr: valerter_victorialogs_up == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Valerter disconnected from VictoriaLogs"
          description: "Rule {{ $labels.rule_name }} lost connection to VictoriaLogs. Check network and VictoriaLogs health."

      # Alerts failing to send
      - alert: ValerterAlertsFailing
        expr: rate(valerter_alerts_failed_total[5m]) > 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Valerter alerts failing for {{ $labels.rule_name }}"
          description: "Alerts are failing to send via notifier {{ $labels.notifier_name }}"

      # Too many alerts throttled (potential tuning needed)
      - alert: ValerterHighThrottleRate
        expr: rate(valerter_alerts_throttled_total[1h]) > 100
        for: 10m
        labels:
          severity: info
        annotations:
          summary: "High throttle rate on rule {{ $labels.rule_name }}"
          description: "Consider adjusting throttle settings if this is unexpected"

      # Queue filling up
      - alert: ValerterQueueBacklog
        expr: valerter_queue_size > 50
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Valerter notification queue backlog"
          description: "Queue size is {{ $value }}, notifications may be delayed"

      # Rule panics (indicates bugs)
      - alert: ValerterRulePanic
        expr: increase(valerter_rule_panics_total[1h]) > 0
        labels:
          severity: warning
        annotations:
          summary: "Valerter rule panicked"
          description: "Rule {{ $labels.rule_name }} panicked and was auto-restarted. Check logs for details."
```
Key metrics to watch:

- valerter_victorialogs_up - VictoriaLogs connection status (1=connected, 0=error)
- valerter_uptime_seconds - Process uptime (detect restarts)
- valerter_queue_size - Notification backlog
- valerter_query_duration_seconds - Query latency
- valerter_alerts_sent_total - Successful alerts
- valerter_alerts_throttled_total - Throttled alerts (tuning indicator)
- valerter_alerts_failed_total - Failed alerts (notifier issues)
- valerter_parse_errors_total - Log parsing issues
- valerter_notify_errors_total - Transient notification errors
- valerter_rule_panics_total - Critical: indicates bugs