Monitoring¶

runqy provides multiple monitoring options, from simple Redis queries to full Prometheus/Grafana integration.

Web Dashboard¶

The built-in web dashboard at /monitoring provides real-time visibility into your queues and workers.

Dashboard Features¶

Task History: View processed/failed task counts over time (Today, 7D, 30D)
Queue Sizes: Live view of pending, active, retry, and archived tasks per queue
Worker Status: Monitor worker health, active tasks, and heartbeats
Task Management: Inspect, retry, or delete tasks directly from the UI

The dashboard works out of the box using Redis data. No additional setup is required.

Headless Mode

For API-only deployments (e.g., headless servers, CI/CD pipelines), disable the dashboard with --no-ui:

runqy serve --no-ui

The REST API, Prometheus metrics (/metrics), and Swagger docs remain available.

Authentication¶

The dashboard requires authentication to protect sensitive queue and task data.

First-Time Setup¶

On first access, you'll be prompted to create an admin account:

Navigate to /monitoring
You'll be redirected to /monitoring/setup
Enter your email and password (minimum 8 characters)
Click "Create Admin Account"

After setup, the dashboard is protected and requires login.

Login Flow¶

Navigate to /monitoring
Enter your email and password
You'll be logged in for 7 days (JWT cookie)

Session Management¶

Sessions expire after 7 days
Click "Logout" in the sidebar to end your session
If your session expires, you'll be redirected to the login page

Single Admin

The dashboard supports a single admin account. There is no password reset feature. If you forget your password, delete the admin_user row from the database and go through first-time setup again.

Worker Health¶

Workers report their health status through a heartbeat hash stored in Redis.

Check Worker Status¶

# List all workers (--scan iterates incrementally; KEYS would block Redis on a large keyspace)
redis-cli --scan --pattern "asynq:workers:*"

# Get worker details
redis-cli HGETALL asynq:workers:worker-abc123

The heartbeat hash includes:

Field	Description
`started`	Worker start timestamp
`healthy`	`true` if Python process is running
`queue`	Queue being processed
`active_task`	Currently processing task ID (if any)

Degraded State¶

When `healthy` is `false`:

The supervised Python process has crashed
Worker won't process new tasks
Manual restart is required
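
As a rough illustration of the check described above, the sketch below flags degraded workers from heartbeat hashes that have already been read (for example with HGETALL). It assumes the `healthy` field arrives as the text `true` or `false` and treats a missing field as unhealthy. The function names are hypothetical and not part of runqy.

```python
from typing import List, Mapping


def is_healthy(heartbeat: Mapping[str, str]) -> bool:
    """Return True only when the heartbeat's `healthy` field is the text "true"."""
    # Assumption: the flag is stored as text; a missing field counts as unhealthy.
    return heartbeat.get("healthy", "").strip().lower() == "true"


def find_degraded(workers: Mapping[str, Mapping[str, str]]) -> List[str]:
    """Return the sorted keys of workers whose heartbeat reports an unhealthy state."""
    return sorted(key for key, hb in workers.items() if not is_healthy(hb))
```

Passing a mapping of worker key to heartbeat fields returns the keys that need a manual restart. How the hashes are fetched is left to the caller.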

Queue Metrics¶

Pending Tasks¶

# Count pending tasks
redis-cli LLEN asynq:inference.default:pending

# List pending task IDs
redis-cli LRANGE asynq:inference.default:pending 0 -1

Active Tasks¶

# Count active tasks
redis-cli LLEN asynq:inference.default:active

# List active task IDs
redis-cli LRANGE asynq:inference.default:active 0 -1
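
The same counts can be collected from a script. The sketch below is a minimal version that assumes the key pattern shown in the commands above and a client object exposing `llen`, as redis-py's `Redis` client does. `queue_key` and `queue_depths` are illustrative names, not runqy APIs.

```python
from typing import Dict, Protocol


class ListLengthClient(Protocol):
    """Minimal interface this sketch needs; redis-py's `Redis` client provides `llen`."""

    def llen(self, name: str) -> int: ...


def queue_key(queue: str, state: str) -> str:
    """Build a key following the asynq:<queue>:<state> pattern used in the commands above."""
    return f"asynq:{queue}:{state}"


def queue_depths(client: ListLengthClient, queue: str) -> Dict[str, int]:
    """Return the pending and active list lengths for one queue."""
    return {
        state: int(client.llen(queue_key(queue, state)))
        for state in ("pending", "active")
    }
```

Taking the client as a parameter keeps the helper easy to exercise with a stub when no Redis server is available.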

Task Inspection¶

View Task Data¶

redis-cli HGETALL asynq:t:task-id-here

View Task Result¶

redis-cli GET asynq:result:task-id-here
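
For scripted inspection, the two lookups above can be combined. This sketch assumes a client with `hgetall` and `get` methods (as in redis-py) and reuses the key patterns from the commands above. Values may come back as bytes and some fields may hold binary payloads, so the helper decodes leniently. `inspect_task` is a hypothetical name.

```python
from typing import Any, Dict, Optional, Union

Raw = Union[bytes, str]


def to_text(value: Optional[Raw]) -> Optional[str]:
    """Decode a Redis reply to text; undecodable bytes become replacement characters."""
    if value is None:
        return None
    if isinstance(value, bytes):
        return value.decode("utf-8", errors="replace")
    return value


def inspect_task(client: Any, task_id: str) -> Dict[str, Any]:
    """Collect the task hash and the stored result for one task ID."""
    # Key patterns copied from the redis-cli commands above.
    fields = client.hgetall(f"asynq:t:{task_id}")
    result = client.get(f"asynq:result:{task_id}")
    return {
        "fields": {to_text(k): to_text(v) for k, v in fields.items()},
        "result": to_text(result),
    }
```

A task with no stored result yields `"result": None`, which mirrors the empty reply from GET.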

Prometheus Integration (Optional)¶

For advanced monitoring with sub-second time-series data, runqy integrates with Prometheus. This is optional: the dashboard works without it.

What Prometheus Adds¶

Feature	Without Prometheus	With Prometheus
Task history	Daily aggregates (90 days)	Sub-second time-series
Real-time throughput	Snapshot totals	Rate per second
Custom dashboards	Basic dashboard	Full Grafana support
Alerting	Manual monitoring	AlertManager integration
Retention	90 days in Redis	Configurable (years)

Architecture¶

┌─────────────────┐     scrape      ┌─────────────────┐
│  runqy server   │ ───────────────>│   Prometheus    │
│  :3000/metrics  │     /metrics    │   :9090         │
└─────────────────┘                 └─────────────────┘
                                           │
                                           │ query
                                           ▼
                                    ┌─────────────────┐
                                    │    Grafana      │
                                    │    :3001        │
                                    └─────────────────┘

Setup¶

1. Configure Prometheus to Scrape runqy¶

Add to your prometheus.yml:

scrape_configs:
  - job_name: 'runqy'
    static_configs:
      - targets: ['localhost:3000']  # runqy server address
    scrape_interval: 15s
    metrics_path: /metrics

2. Set the Prometheus Address (Optional)¶

To enable Prometheus-powered charts in the dashboard:

export PROMETHEUS_ADDRESS=http://localhost:9090

Note

This environment variable is optional. Without it, the dashboard uses Redis data which works perfectly for most use cases. Set this only if you want sub-second time-series data in the dashboard.

Available Metrics¶

runqy exposes the following Prometheus metrics at /metrics:

Queue Metrics¶

Metric	Type	Description
`asynq_queue_size`	Gauge	Number of tasks in each state (pending, active, retry, archived)
`asynq_queue_latency_seconds`	Gauge	Time since oldest pending task was enqueued
`asynq_queue_memory_usage_approx_bytes`	Gauge	Approximate memory usage per queue

Task Metrics¶

Metric	Type	Description
`asynq_tasks_processed_total`	Counter	Total tasks processed (labeled by queue)
`asynq_tasks_failed_total`	Counter	Total tasks failed (labeled by queue)
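
To confirm that the endpoint exposes these metrics, the text returned by /metrics can be parsed directly. The sketch below is a simplified reader for the Prometheus text exposition format. It skips comment lines, ignores optional timestamps, and does not unescape label values, so it is only suitable for a quick check. The `queue` and `state` label names in the usage note are inferred from the queries in this page.

```python
import re
from typing import Dict, List, Tuple

# Matches simple samples such as: name{label="value"} 12.5
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

Sample = Tuple[str, Dict[str, str], float]


def parse_samples(text: str) -> List[Sample]:
    """Parse non-comment exposition lines into (name, labels, value) tuples."""
    samples: List[Sample] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # ignore lines this simplified parser does not understand
        name, raw_labels, value = match.groups()
        try:
            number = float(value)
        except ValueError:
            continue  # not a numeric sample value
        samples.append((name, dict(LABEL_RE.findall(raw_labels or "")), number))
    return samples
```

Filtering the result for names starting with `asynq_` gives a quick view of which queues and states are being reported.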

Example Queries¶

Tasks processed per second:

rate(asynq_tasks_processed_total[5m])

Tasks failed per second:

rate(asynq_tasks_failed_total[5m])

Error rate percentage:

rate(asynq_tasks_failed_total[5m]) / rate(asynq_tasks_processed_total[5m]) * 100
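
The arithmetic behind the error-rate query can be sketched in a few lines. This is a simplification: PromQL's `rate()` also handles counter resets and extrapolates to the window boundaries, which the helper below does not. The function names are illustrative only.

```python
def per_second_rate(first: float, last: float, seconds: float) -> float:
    """Average per-second increase of a counter between two samples."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (last - first) / seconds


def error_rate_percent(failed_rate: float, processed_rate: float) -> float:
    """Failed tasks as a percentage of processed tasks."""
    if processed_rate == 0:
        # PromQL yields NaN for 0/0; this sketch returns NaN whenever nothing was processed.
        return float("nan")
    return failed_rate / processed_rate * 100
```

For example, if the processed counter grows from 1000 to 1600 and the failed counter from 50 to 80 over a 300-second window, the rates are 2 and 0.1 tasks per second, giving an error rate of about 5%.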

Queue depth (pending tasks):

asynq_queue_size{state="pending"}

Queue latency:

asynq_queue_latency_seconds
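
These queries can also be run from a script through the Prometheus HTTP API. The sketch below builds an instant-query URL for `/api/v1/query` and parses the JSON body of a vector response. Fetching the URL is left to the caller, and the base address used in the usage note is the one assumed elsewhere on this page.

```python
import json
from typing import Dict, List, Tuple
from urllib.parse import urlencode


def instant_query_url(base_url: str, promql: str) -> str:
    """Build the URL for the Prometheus instant-query endpoint, /api/v1/query."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_instant_vector(body: str) -> List[Tuple[Dict[str, str], float]]:
    """Extract (labels, value) pairs from an instant-vector JSON response."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    # Each sample looks like {"metric": {...labels}, "value": [timestamp, "value-as-string"]}.
    return [(item["metric"], float(item["value"][1])) for item in data["result"]]
```

Calling `instant_query_url("http://localhost:9090", "asynq_queue_latency_seconds")` produces a URL that can be fetched with any HTTP client, and the response body can then be passed to `parse_instant_vector`.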

Grafana Dashboard¶

Import the asynq Grafana dashboard for a pre-built visualization:

In Grafana, go to Dashboards > Import
Enter dashboard ID: 18863 (asynq dashboard)
Select your Prometheus data source
Click Import

Or create custom panels using the queries above.

Docker Compose Example¶

Here's a complete setup with Prometheus and Grafana:

version: '3.8'

services:
  runqy:
    image: ghcr.io/publikey/runqy:latest
    ports:
      - "3000:3000"
    environment:
      - REDIS_HOST=redis
      - REDIS_PASSWORD=
      - RUNQY_API_KEY=your-api-key
      - PROMETHEUS_ADDRESS=http://prometheus:9090  # Optional
    depends_on:
      - redis

  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"

  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.retention.time=30d'

  grafana:
    image: grafana/grafana:latest
    ports:
      - "3001:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin  # Change this before exposing Grafana
    volumes:
      - grafana-data:/var/lib/grafana

volumes:
  grafana-data:

Create prometheus.yml:

global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'runqy'
    static_configs:
      - targets: ['runqy:3000']

Alerting¶

Prometheus AlertManager¶

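Example alert rules (alerts.yml). Prometheus evaluates these rules once the file is listed under rule_files in prometheus.yml, and forwards firing alerts to Alertmanager: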

groups:
  - name: runqy
    rules:
      # High queue depth
      - alert: HighQueueDepth
        expr: asynq_queue_size{state="pending"} > 1000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High queue depth on {{ $labels.queue }}"
          description: "Queue {{ $labels.queue }} has {{ $value }} pending tasks"

      # High error rate
      - alert: HighErrorRate
        expr: >
          rate(asynq_tasks_failed_total[5m]) / rate(asynq_tasks_processed_total[5m]) > 0.1
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High error rate on {{ $labels.queue }}"
          description: "Queue {{ $labels.queue }} has {{ $value | humanizePercentage }} error rate"

      # Queue latency
      - alert: HighQueueLatency
        expr: asynq_queue_latency_seconds > 300
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High latency on {{ $labels.queue }}"
          description: "Queue {{ $labels.queue }} has {{ $value | humanizeDuration }} latency"

      # No tasks processed (potential worker issue)
      - alert: NoTasksProcessed
        expr: >
          increase(asynq_tasks_processed_total[10m]) == 0
          and on (queue) asynq_queue_size{state="pending"} > 0
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "No tasks processed on {{ $labels.queue }}"
          description: "Queue {{ $labels.queue }} has pending tasks but none processed in 10 minutes"

Key Metrics to Monitor¶

Metric	Alert Threshold	Description
Queue depth	> 1000 pending	Tasks accumulating faster than processing
Error rate	> 10%	High failure rate indicates issues
Queue latency	> 5 minutes	Tasks waiting too long
Worker health	`healthy: false`	Worker process crashed
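
The thresholds in the table can be expressed as a small check, which may help when testing alert logic outside Prometheus. This is a sketch under stated assumptions: `QueueSnapshot` is a hypothetical container for readings gathered elsewhere, and the limits are copied from the table above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QueueSnapshot:
    """Point-in-time readings for one queue (hypothetical container for this sketch)."""
    queue: str
    pending: int
    error_ratio: float      # failed / processed, between 0 and 1
    latency_seconds: float  # age of the oldest pending task
    workers_healthy: bool


def breached_thresholds(s: QueueSnapshot) -> List[str]:
    """Return the names of the thresholds from the table that this snapshot exceeds."""
    breaches: List[str] = []
    if s.pending > 1000:
        breaches.append("queue depth")
    if s.error_ratio > 0.10:
        breaches.append("error rate")
    if s.latency_seconds > 5 * 60:
        breaches.append("queue latency")
    if not s.workers_healthy:
        breaches.append("worker health")
    return breaches
```

An empty list means the snapshot is within every threshold. The comparisons are strict, matching the `>` operators in the alert rules.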

Best Practices¶

Start with the dashboard - The built-in dashboard covers most monitoring needs
Add Prometheus for scale - When you need historical data beyond 90 days or sub-second metrics
Set up alerts early - Don't wait for production issues to configure alerting
Monitor worker health - A crashed worker won't process tasks and won't recover automatically
Track error rates - Sudden spikes often indicate code bugs or upstream service issues
