Martin Hradil ccf9494030
Tasks - restore 4 workers, atomically update execution state, fail & retry on parallel (#167)
* dispatcherd: increase max_workers to 4

* remove redundant decorators from task functions

execute_db_task already handles django setup, logging, and error handling,
making @task_execution_wrapper and @task(queue=...) redundant on all 9
task functions.

* remove unused utilities from utils.py

task_execution_wrapper, get_task_and_execution, and the task decorator
fallback are all superseded by execute_db_task handling lifecycle directly.

* remove execute_db_task from public task registry

It's the dispatcherd entry point, not a user-callable task function.

* unify scheduling into periodic DB sync

Remove the dual APScheduler registration path (_load_task_registry,
_add_registry_tasks, _add_scheduled_task, _execute_scheduled_task, and
task_registry). All tasks now go through _periodic_database_sync →
_execute_database_task → submit_task_to_dispatcher.

Feature flag and cancelled/completed status checks move into
_execute_database_task. Non-recurring tasks with disabled flags are
removed from tracking so they can be retried after re-enabling.

* safeguard duplicate task execution with atomic claim

Use atomic UPDATE ... WHERE status='pending' so only one worker can
claim a task. Create TaskExecution inside _claim_task (not
submit_task_to_dispatcher) to prevent orphaned execution records.
Guard submit_task_to_dispatcher against duplicate submissions when a
pending/running execution exists. Restore per-task start/complete/error
logging lost with task_execution_wrapper removal.
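The claim pattern described above can be sketched with plain SQL. This in-memory SQLite demo is illustrative only — the service uses PostgreSQL and Django models, and the table and column names here are invented — but the core idea is the same: the UPDATE's matched-row count tells a worker whether it won the claim.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("INSERT INTO task (id, status) VALUES (1, 'pending')")
conn.commit()

def claim_task(conn, task_id):
    # Atomic claim: the UPDATE only matches while the row is still
    # 'pending', so exactly one concurrent worker can flip it.
    cur = conn.execute(
        "UPDATE task SET status = 'running' WHERE id = ? AND status = 'pending'",
        (task_id,),
    )
    conn.commit()
    return cur.rowcount == 1  # True only for the winning claimant

first = claim_task(conn, 1)   # wins the claim
second = claim_task(conn, 1)  # row is no longer 'pending', so this fails
```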

Issue: AAP-69212

* add retry with delay support to Task.retry()

Allow failed tasks to wait before retry (e.g. 120s for lock contention).
Task.retry(delay_seconds=N) sets scheduled_time into the future so the
periodic sync won't pick it up until the delay has elapsed. execute_db_task
reads retry_delay_seconds from task_data (default 600s).
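A minimal sketch of that delay logic. The helper names here are illustrative; only scheduled_time, retry_delay_seconds, and the 600-second default come from the description above.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_RETRY_DELAY_SECONDS = 600  # default read from task_data per the text

def schedule_retry(now, delay_seconds=DEFAULT_RETRY_DELAY_SECONDS):
    # Push scheduled_time into the future so the periodic sync
    # skips the task until the delay has elapsed.
    return now + timedelta(seconds=delay_seconds)

def is_ready_to_run(scheduled_time, now):
    # A task is runnable once scheduled_time has passed (or was never set).
    return scheduled_time is None or scheduled_time <= now

now = datetime(2026, 4, 2, 12, 0, tzinfo=timezone.utc)
retry_at = schedule_retry(now, delay_seconds=120)  # e.g. lock contention
assert not is_ready_to_run(retry_at, now)
assert is_ready_to_run(retry_at, now + timedelta(seconds=120))
```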

* add advisory locks to prevent parallel collector execution

Each collector task acquires a PostgreSQL advisory lock via run_with_lock().
Locking is applied in execute_db_task for tasks listed in TASK_LOCKS, so
direct invocations (e.g. run_task.py) run without contention. If the lock
cannot be acquired, the task fails and is retried automatically.
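The fail-and-retry shape of an advisory lock can be imitated in-process with a non-blocking threading.Lock. This is only a stand-in to show the try-lock semantics — the real run_with_lock uses a PostgreSQL advisory lock, and the error-dict shape here is assumed:

```python
import threading

_locks = {}  # one lock per task name, standing in for PG advisory locks

def run_with_lock(name, fn):
    # Try-lock semantics: if another worker holds the lock, fail fast
    # so the task is retried later instead of running in parallel.
    lock = _locks.setdefault(name, threading.Lock())
    if not lock.acquire(blocking=False):
        return {"status": "error", "message": f"{name} already running"}
    try:
        return fn()
    finally:
        lock.release()

result = run_with_lock("collect_hourly_metrics", lambda: {"status": "completed"})
```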

Issue: AAP-69213

* add upstream dependency checks to daily pipeline tasks

daily_metrics_rollup now checks that hourly collections exist before
proceeding; returns error if upstream dependency not met.
send_anonymized_to_segment returns early when no payloads are pending.
Track missing daily snapshots in missing_hours for rollup diagnostics.
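The dependency check described above can be sketched as a guard at the top of the rollup; the function signature and result shape here are hypothetical, keeping only the error-on-missing-upstream and missing_hours ideas from the text:

```python
def daily_metrics_rollup_guard(hourly_collections):
    # Refuse to roll up when no hourly collections exist for the day;
    # otherwise report which hours are missing for diagnostics.
    if not hourly_collections:
        return {"status": "error", "message": "no hourly collections to roll up"}
    missing_hours = sorted(set(range(24)) - set(hourly_collections))
    return {"status": "completed", "missing_hours": missing_hours}
```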

* clean up comments and docstrings

Update docstrings to reflect advisory lock behavior and task ordering
dependencies. Fix stale references to task_execution_wrapper. Add
comment explaining hour_timestamp fallback. Update max_attempts in
test to match current default (5).

fix stale docstring reference to _execute_scheduled_task (now _execute_database_task)

* simplify scheduler tests after task registry removal

UnifiedTaskScheduler no longer loads task registry in __init__, so
the init/stop tests don't need to mock the Task model.

* fix TaskExecution stuck running on exception in execute_db_task

The outer except block passed None for both task and execution to
handle_task_error, so TaskExecution records were never marked failed.
Initialize both to None before the try block and pass the actual
instances. Also remove the stale FIXME in submit_task_to_dispatcher
that tried to update an execution that doesn't exist there.

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>

* enable auto-retry for tasks that raise exceptions

Wrap the task function call in an inner try/except so exceptions get
converted to error dicts and flow through the same failed/retry path.
Also fix advisory lock test patch targets to use tasks_system.run_with_lock
instead of utils.run_with_lock.
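The inner try/except described above can be sketched as follows; create_task_result is reimplemented here with an assumed dict shape, and execute_function's signature is illustrative:

```python
def create_task_result(status, message=""):
    # Assumed shape of the helper mentioned in these commits.
    return {"status": status, "message": message}

def execute_function(fn, task_data):
    # Convert raised exceptions into error dicts so failures flow
    # through the same failed/retry path as explicit error returns.
    try:
        return fn(**task_data)
    except Exception as exc:
        return create_task_result("error", str(exc))
```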

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>

* use create_task_result consistently for all task error returns

Replace raise ValueError with create_task_result("error", ...) in
collectors (hourly, snapshot, daily) and generic_collect_metrics so
task functions always return a status dict rather than raising. Also
convert the raw {"status": "error"} dict in handle_task_error to use
the helper. Remove stale Raises: sections from docstrings and remove
the @task/@task_execution_wrapper decorators from collect_daily_metrics
(added in the daily collectors PR, already removed from the others).

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>

* add scheduled_time and transaction guards to _claim_task

- Add Task.ready_to_run() classmethod (queryset equivalent of
  is_ready_to_run) to filter pending non-recurring tasks whose
  scheduled_time has passed or is null
- Use ready_to_run() in _claim_task so retry delays are respected
- Wrap claim + execution creation in transaction.atomic()

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>

* extract execute_claimed and execute_function from execute_db_task

in the name of cyclomatic complexity :)

* daily metrics rollup comment update

Metrics service

A modern Django-based service built for the Ansible Automation Platform (AAP) ecosystem, featuring comprehensive task management, REST APIs, and automated background job processing.

Features

  • 🚀 Modern Django Architecture - Django 5.2+ with clean app-based structure
  • 📊 Automated Task Management - Feature-enable controlled task groups with automatic routing
  • Smart Task Routing - Automatic submission to dispatcherd with no manual intervention
  • 🔌 REST API - Versioned RESTful APIs with OpenAPI documentation
  • 🔐 Authentication & Authorization - Django-Ansible-Base integration with RBAC
  • 📈 Real-time Dashboard - Web-based task monitoring and management interface
  • 🐳 Docker Ready - Multi-container deployment with PostgreSQL
  • 🧪 Comprehensive Testing - Unit and integration tests with coverage reporting
  • 📝 API Documentation - Interactive Swagger/OpenAPI documentation
  • 🔧 Metrics Collection - Integrated metrics-utility for data collection

Quick Start

Option 1: Docker Compose

# Clone the repository
git clone <repository-url>
cd metrics-service

# Start all services
docker-compose up -d

# Create a superuser (optional)
docker-compose exec metrics-service python manage.py createsuperuser

Your service will be available at:

Option 2: Local Development

# Prerequisites: Python 3.12, PostgreSQL 13+

# Install dependencies (project uses uv)
uv sync --dev

# Configure (optional — for local overrides)
cp settings.local.py.example settings.local.py
# Edit settings.local.py to configure your local development environment.

# Set up database (configure via environment variables if needed)
# See Configuration section below for environment variable options
python manage.py migrate
python manage.py metrics_service init-default-settings
python manage.py metrics_service init-service-id
python manage.py metrics_service init-system-tasks
python manage.py createsuperuser

# Start complete service (Django + dispatcher + scheduler)
python manage.py metrics_service run

Option 3: Local development, with uv and metrics-utility from sources

Edit pyproject.toml such that:

 [tool.uv.sources]
 django-ansible-base = { git = "https://github.com/ansible/django-ansible-base", rev = "devel" }
+metrics-utility = { path = "../metrics-utility", editable = true }

Then run:

uv sync
uv run ./manage.py migrate
uv run ./manage.py createsuperuser
uv run ./manage.py metrics_service run
uv run ./scripts/run_task.py hello_world  # debugging individual tasks

Endpoints

# List all tasks
GET /api/v1/tasks/

# Create a new task
POST /api/v1/tasks/
{
  "name": "Hello World Task",
  "function_name": "hello_world",
  "task_data": {}
}

# Get running tasks
GET /api/v1/tasks/running/

# Retry a failed task
POST /api/v1/tasks/{id}/retry/

# Available task functions
GET /api/v1/tasks/available_functions/

Built-in Task Functions

System Tasks (always enabled):

  • cleanup_old_tasks - Clean up completed/failed tasks
  • hello_world - Simple test task for dispatcherd integration
  • execute_db_task - Execute database-defined tasks with lifecycle management

Metrics Collection Tasks (always enabled - run regardless of opt-out flag):

  • collect_hourly_metrics - Collect time-series metrics every hour (collector type via collector_type parameter)
  • collect_snapshot_metrics - Collect daily snapshot metrics (collector type via collector_type parameter)
  • daily_metrics_rollup - Merge hourly collections and create daily rollup summary
  • cleanup_metrics_data - Clean up old metrics data based on retention policies

Anonymization and Transmission Tasks (controlled by ANONYMIZED_DATA_COLLECTION, default: enabled, customer opt-out):

  • daily_anonymize_and_prepare - Anonymize daily rollup and prepare for transmission
  • send_anonymized_to_segment - Send anonymized metrics to Segment.com

Background Tasks

The service includes an automated background task system with intelligent routing:

Unified Service Management

# Start complete service (init*, then Django + dispatcher + scheduler)
python manage.py metrics_service run

# Start with custom configuration
python manage.py metrics_service run --workers 4

# Individual components
python manage.py runserver 0.0.0.0:8000  # web
python manage.py run_dispatcherd --workers 2  # worker
python manage.py run_task_scheduler  # scheduler

Automatic Task Routing

Tasks are automatically routed based on their properties:

  • Immediate tasks → Direct to dispatcherd
  • Scheduled tasks → APScheduler with DateTrigger
  • Recurring tasks → APScheduler with CronTrigger

No manual intervention required - create a task and it's automatically processed!
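The routing rules above can be sketched as a single decision function; the task shape and return labels here are illustrative, keeping only the three routes named in the list:

```python
def route_task(task):
    # task is a dict sketch with optional scheduling fields.
    if task.get("cron"):
        return "apscheduler:cron"  # recurring -> CronTrigger
    if task.get("scheduled_time"):
        return "apscheduler:date"  # scheduled -> DateTrigger
    return "dispatcherd"           # immediate -> straight to dispatcherd
```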

Task Groups & Feature Flags

We have these feature flags:

Flag                        Default
ANONYMIZED_DATA_COLLECTION  true

You can change the default value using the METRICS_SERVICE_FEATURE_ENABLED__ prefixed-environment variables.

# Enable/disable anonymized data collection (default: true)
METRICS_SERVICE_FEATURE_ENABLED__ANONYMIZED_DATA_COLLECTION=false

These environment variables (or their default values) are used to populate the feature-flag database tables during manage.py metrics_service init-default-settings. You can also use python manage.py metrics_service remove-default-settings to remove these settings from the database.

The feature flag value in the database determines whether the anonymization and transmission tasks run. Collection, rollup, and cleanup tasks always run regardless of this flag. If the value is missing from the database, the environment variable is used as the fallback.
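The DB-first lookup with environment-variable fallback can be sketched like this; the function name and the db_settings mapping are hypothetical, only the precedence order and the METRICS_SERVICE_FEATURE_ENABLED__ prefix come from the text:

```python
import os

def feature_enabled(name, db_settings, env_prefix="METRICS_SERVICE_FEATURE_ENABLED__"):
    # The database value wins; the environment variable is only a
    # fallback when the flag is missing from the database.
    if name in db_settings:
        return db_settings[name]
    raw = os.environ.get(env_prefix + name, "true")  # default: enabled
    return raw.strip().lower() in ("1", "true", "yes")
```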

Development

Code Quality Tools

# Format + lint + test in one step (via poe task runner)
uv run poe check

# Or individually
uv run poe format     # ruff format (includes import sorting)
uv run poe lint       # ruff check
uv run poe unit-test  # pytest

# Direct ruff commands
ruff format .
ruff check . --fix

# Type checking (optional, gradual adoption)
mypy .

Pre-commit Hooks

This project uses pre-commit hooks to ensure code quality and automatically sync requirements files:

# Install pre-commit hooks
pre-commit install

# Run hooks on all files
pre-commit run --all-files

# Run hooks manually
pre-commit run

The pre-commit configuration automatically runs:

  • ruff check --fix — lint and auto-fix
  • ruff-format — code formatting
  • Platform Service Framework validation

Testing

# Run all tests
pytest

# Run with coverage
pytest --cov=apps --cov=metrics_service --cov-report=html

# Run specific test categories
pytest -m unit          # Unit tests only
pytest -m integration   # Integration tests only

Database Operations

# Create migrations
python manage.py makemigrations

# Apply migrations
python manage.py migrate

# Initialize settings table with feature flag defaults
python manage.py metrics_service init-default-settings

# Remove feature flags from settings
python manage.py metrics_service remove-default-settings

# Initialize DAB ServiceID (required after first migration)
python manage.py metrics_service init-service-id

# Initialize system tasks
python manage.py metrics_service init-system-tasks

Configuration

Metrics Service uses Dynaconf for settings management, following the Platform Service Framework.

Quick Start

Development Mode (default):

Important

The following example assumes the values are exported as environment variables; to set them in settings.local.py instead, remove the METRICS_SERVICE_ prefix.

# Project
DJANGO_SETTINGS_MODULE=metrics_service.settings
METRICS_SERVICE_MODE=development
METRICS_SERVICE_SECRET_KEY=dev-secret-key-change-in-production
METRICS_SERVICE_DEBUG="true"
METRICS_SERVICE_ALLOWED_HOSTS='["localhost","127.0.0.1","metrics-service","0.0.0.0"]'

# Database
METRICS_SERVICE_DATABASES__default__ENGINE=django.db.backends.postgresql
METRICS_SERVICE_DATABASES__default__HOST=postgres
METRICS_SERVICE_DATABASES__default__PORT=5432
METRICS_SERVICE_DATABASES__default__USER=metrics_service
METRICS_SERVICE_DATABASES__default__PASSWORD=metrics_service
METRICS_SERVICE_DATABASES__default__NAME=metrics_service
METRICS_SERVICE_DATABASES__default__OPTIONS__sslmode=prefer

# Task App
METRICS_SERVICE_FEATURE_ENABLED__ANONYMIZED_DATA_COLLECTION="true"
DISPATCHERD_CONFIG_FILE=/app/apps/settings/dispatcherd.yaml
DISPATCHERD_ENABLED="true"
python manage.py runserver

Production Mode:

# Set environment mode and required secrets
export METRICS_SERVICE_MODE=production
export METRICS_SERVICE_SECRET_KEY="your-secure-random-key"
export METRICS_SERVICE_ALLOWED_HOSTS="yourdomain.com,api.yourdomain.com"

# Override defaults as needed
export METRICS_SERVICE_DATABASES__default__HOST=prod-db.example.com
export METRICS_SERVICE_DATABASES__default__PASSWORD=secure-password

python manage.py runserver

Configuration Methods

Settings are loaded in order of precedence (lowest to highest):

Read-only (overridable):

  • metrics_service/settings.py - Framework defaults

Editable:

  • apps/settings/defaults.py - Defaults for the whole project
  • apps/core/settings.py - Core settings, DAB related settings
  • apps/*/settings.py - Each app settings in the loading order
  • apps/settings/{mode}.py - Settings specific to the current METRICS_SERVICE_MODE
  • settings.local.py - For local settings (git ignored)
  • /etc/ansible-automation-platform/metrics_service/ - for prod environment overrides
  • METRICS_SERVICE_ prefixed environment variables

Common Environment Variables

Variable                                      Description                                Required in Production
METRICS_SERVICE_MODE                          Environment mode (development/production)  No (defaults to development)
METRICS_SERVICE_SECRET_KEY                    Django secret key                          Yes
METRICS_SERVICE_DEBUG                         Enable debug mode                          No
METRICS_SERVICE_LOG_LEVEL                     Logging level (DEBUG/INFO/WARNING/ERROR)   No (defaults to INFO)
METRICS_SERVICE_DATABASES__default__HOST      Database host                              No (has default)
METRICS_SERVICE_DATABASES__default__PASSWORD  Database password                          No (has default)
METRICS_SERVICE_ALLOWED_HOSTS                 Allowed hosts (comma-separated)            Yes (production)

Note: Use double underscores (__) for nested settings:

# Nested database configuration
export METRICS_SERVICE_DATABASES__default__HOST=localhost
export METRICS_SERVICE_DATABASES__default__PORT=5432
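A simplified sketch of how the double-underscore convention maps flat environment variables onto a nested settings dict (Dynaconf itself also handles casing and type casting, which this demo omits):

```python
def parse_nested(env, prefix="METRICS_SERVICE_"):
    # Split keys on double underscores to build the nested settings
    # dict that the __ convention above describes.
    settings = {}
    for key, value in env.items():
        if not key.startswith(prefix):
            continue
        parts = key[len(prefix):].split("__")
        node = settings
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return settings

env = {
    "METRICS_SERVICE_DATABASES__default__HOST": "localhost",
    "METRICS_SERVICE_DATABASES__default__PORT": "5432",
}
nested = parse_nested(env)
```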

Logging Configuration

Metrics Service uses a centralized logging system that integrates with Django's logging framework. All log levels are controlled by a single environment variable.

Setting Log Level:

# For development - see all debug messages
export METRICS_SERVICE_LOG_LEVEL=DEBUG

# For production - informational messages only
export METRICS_SERVICE_LOG_LEVEL=INFO

# For troubleshooting - warnings and errors
export METRICS_SERVICE_LOG_LEVEL=WARNING

# For critical issues only
export METRICS_SERVICE_LOG_LEVEL=ERROR

Quick Debug Mode:

# Run with debug logging temporarily
METRICS_SERVICE_LOG_LEVEL=DEBUG python manage.py runserver

# Or for the complete service
METRICS_SERVICE_LOG_LEVEL=DEBUG python manage.py metrics_service run

Log Output Format:

All logs use Django's configured format with timestamps, log levels, request IDs (when applicable), module names, and messages:

2025-01-18 10:15:23,456 INFO     [abc123] apps.tasks.signals New task created: Cleanup (ID: 42)
2025-01-18 10:15:24,789 WARNING  [] apps.core.utils Database connection slow: 2.3s

To inspect the full settings loading history or debug a specific variable:

export DJANGO_SETTINGS_MODULE=metrics_service.settings
uv run dynaconf inspect -m debug -f yaml   # full loading history
uv run dynaconf inspect -k VARIABLE_NAME   # single variable

Deployment

Docker Production

# Build production image
docker build -t metrics-service .

# Run with production settings
docker run -p 8000:8000 \
  -e METRICS_SERVICE_MODE=production \
  -e METRICS_SERVICE_SECRET_KEY=your-secret-key \
  -e METRICS_SERVICE_DATABASES__default__HOST=your-db-host \
  -e METRICS_SERVICE_DATABASES__default__PASSWORD=your-db-password \
  metrics-service

Contributing

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/my-feature
  3. Make your changes with tests
  4. Run the test suite: uv run pytest
  5. Run code quality checks: uv run poe check
  6. Submit a pull request

Development Standards

  • Code Style: Ruff formatting, 120-character line length
  • Type Hints: Required for all new code
  • Documentation: Docstrings for public APIs
  • Testing: Test coverage for new features
  • Commits: Conventional commit messages

License

This project is licensed under the Apache License - see the LICENSE file for details.

Support

  • Documentation: Check the CLAUDE.md file for detailed development guidance
  • Issues: Report bugs and feature requests via GitHub issues
  • API Documentation: Interactive docs available at /api/docs/ when running