Valerter

Prometheus Metrics

Valerter exposes a /metrics endpoint for Prometheus monitoring.

Configuration

metrics:
  enabled: true    # Default: true
  port: 9090       # Default: 9090

Exposed Metrics

Counters

Metric Labels Description
valerter_alerts_sent_total rule_name, notifier_name, notifier_type Alerts sent successfully
valerter_alerts_throttled_total rule_name Alerts blocked by throttling
valerter_alerts_passed_total rule_name Alerts that passed throttling
valerter_alerts_dropped_total - Alerts dropped (queue full, global counter)
valerter_alerts_failed_total rule_name, notifier_name, notifier_type Alerts that permanently failed
valerter_email_recipient_errors_total rule_name, notifier_name Email delivery failures per recipient
valerter_lines_discarded_total rule_name, reason Log lines discarded (e.g., reason=oversized for lines > 1MB)
valerter_logs_matched_total rule_name Logs matched by rule (before throttling)
valerter_notifier_config_errors_total notifier, error_type Notifier configuration errors (e.g., env var resolution)
valerter_notify_errors_total rule_name, notifier_name, notifier_type Notification send errors
valerter_parse_errors_total rule_name, error_type Parsing errors
valerter_reconnections_total rule_name VictoriaLogs reconnections
valerter_rule_panics_total rule_name Rule task panics (auto-restarted)
valerter_rule_errors_total rule_name Fatal rule errors

Gauges

Metric Labels Description
valerter_queue_size - Current notification queue size
valerter_last_query_timestamp rule_name Unix timestamp of last successful query
valerter_victorialogs_up rule_name VictoriaLogs connection status (1=connected, 0=disconnected or error)
valerter_uptime_seconds - Time since valerter started
valerter_build_info version Build information (always 1)

Histograms

Metric Labels Description
valerter_query_duration_seconds rule_name VictoriaLogs query latency (time to first chunk)

Prometheus Scrape Configuration

scrape_configs:
  - job_name: 'valerter'
    static_configs:
      - targets: ['localhost:9090']

Example Alerting Rules

Monitor Valerter itself with these Prometheus alerting rules:

groups:
  - name: valerter
    rules:
      # Valerter not querying VictoriaLogs for 5 minutes
      - alert: ValerterNotQuerying
        expr: time() - valerter_last_query_timestamp > 300
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "Valerter rule  not querying"
          description: "No queries received from rule  for over 5 minutes"

      # VictoriaLogs connection lost
      - alert: ValerterVictoriaLogsDown
        expr: valerter_victorialogs_up == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Valerter disconnected from VictoriaLogs"
          description: "Rule  lost connection to VictoriaLogs. Check network and VictoriaLogs health."

      # Alerts failing to send
      - alert: ValerterAlertsFailing
        expr: rate(valerter_alerts_failed_total[5m]) > 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Valerter alerts failing for "
          description: "Alerts are failing to send via  notifier"

      # Too many alerts throttled (potential tuning needed)
      - alert: ValerterHighThrottleRate
        expr: rate(valerter_alerts_throttled_total[1h]) > 100
        for: 10m
        labels:
          severity: info
        annotations:
          summary: "High throttle rate on rule "
          description: "Consider adjusting throttle settings if this is unexpected"

      # Queue filling up
      - alert: ValerterQueueBacklog
        expr: valerter_queue_size > 50
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Valerter notification queue backlog"
          description: "Queue size is , notifications may be delayed"

      # Rule panics (indicates bugs)
      - alert: ValerterRulePanic
        expr: increase(valerter_rule_panics_total[1h]) > 0
        labels:
          severity: warning
        annotations:
          summary: "Valerter rule  panicked"
          description: "Rule panicked and was auto-restarted. Check logs for details."

Key Metrics to Monitor

Health

Performance

Alerting Effectiveness

Errors

See Also