Skip to content

Worker Management

Purpose

Event-driven worker management system enabling deployment and execution of workers written in multiple languages (Python, C#, .NET DLL) that subscribe to Dapr pub/sub topics, process CloudEvents, and coordinate via distributed locks.

Requirements

Requirement: Worker Lifecycle Management

The WorkManager SHALL support creating, starting, stopping, and deleting workers. Workers SHALL persist their configuration and restore automatically on service restart.

Scenario: Create and start a worker

  • WHEN a client creates a worker with a topic subscription and code payload
  • THEN the worker SHALL be persisted to the state store, subscribed to the topic, and set to Running status

Scenario: Hot-reload worker code

  • WHEN a client updates a running worker's code
  • THEN the worker SHALL switch to the new code without restarting the service

Scenario: Auto-recovery on restart

  • WHEN the WorkManager service restarts
  • THEN all previously persisted workers SHALL be restored to their saved state before the health check passes

Scenario: Worker version history

  • WHEN a worker's code is updated
  • THEN the previous code version SHALL be preserved in a versioned history

Requirement: Polyglot Worker Engines

The WorkManager SHALL support executing worker code in multiple languages via a pluggable engine architecture. Built-in engines SHALL include Python (subprocess), C# (Roslyn runtime compilation), and pre-compiled .NET DLL (isolated AssemblyLoadContext).

Scenario: Python worker execution

  • WHEN a worker with a Python engine processes a message
  • THEN the worker code SHALL execute in a dedicated Python subprocess with a configurable timeout

Scenario: C# source worker execution

  • WHEN a worker with a C# source engine processes a message
  • THEN the code SHALL be compiled at runtime via Roslyn and executed in-process

Scenario: DLL worker execution

  • WHEN a worker with a DotnetDll engine processes a message
  • THEN the assembly SHALL be loaded in an isolated AssemblyLoadContext and executed in-process

Scenario: Engine extension

  • WHEN a new engine implementation conforms to the IEngine interface
  • THEN it SHALL be registrable in the engine registry without modifying core WorkManager code

Requirement: Event-Driven Message Processing

Workers SHALL subscribe to Dapr pub/sub topics and process incoming CloudEvents. The output CloudEvent type field SHALL determine the publish topic.

Scenario: Message processing

  • WHEN a CloudEvent is published to a topic a worker subscribes to
  • THEN the worker's code SHALL be invoked with the event data and any output events SHALL be published to the topic matching their type field

Scenario: Concurrent message handling

  • WHEN multiple messages arrive for the same worker
  • THEN they SHALL be processed concurrently, subject to group locking constraints

Requirement: Distributed Coordination

Workers assigned to the same group SHALL process messages with mutually exclusive execution using distributed locks with fencing tokens. Messages that exhaust retries SHALL go to a dead-letter topic.

Scenario: Lock acquisition

  • WHEN a worker in a group receives a message
  • THEN it SHALL acquire a distributed lock with a fencing token before processing

Scenario: Lock contention

  • WHEN a lock cannot be acquired because another instance holds it
  • THEN the message SHALL be retried with exponential backoff, up to the Dapr max redelivery count

Scenario: Dead letter

  • WHEN a message exhausts all redelivery attempts due to lock contention
  • THEN the message SHALL be routed to the dead-letter topic (<topic>-dead)

Scenario: Stale lock theft

  • WHEN a lock holder's fencing token is stale
  • THEN another instance SHALL be able to steal the lock

Requirement: Worker Code Sources

Workers SHALL accept code from inline Base64-encoded content or from HTTP(S) URLs. URL-based sources SHALL be validated against SSRF and host allowlist constraints as defined in the cross-cutting security requirements.

Scenario: Inline code worker

  • WHEN a worker is created with a Base64-encoded code string
  • THEN the decoded code SHALL be passed to the engine for execution

Scenario: URL code worker

  • WHEN a worker is created with a URL pointing to code
  • THEN the code SHALL be fetched, SHA-256 verified, and passed to the engine

Scenario: Blocked URL

  • WHEN a worker's code URL targets a private IP or a host not in the allowlist
  • THEN the worker creation SHALL fail with a security rejection

Requirement: Python Sandboxing

Python worker subprocesses SHALL run with restricted interpreter flags and a package allowlist. All non-allowed imports SHALL be blocked.

Scenario: Allowed import

  • WHEN a Python worker imports a package listed in PYTHON_ALLOWED_PACKAGES
  • THEN the import SHALL succeed

Scenario: Blocked import

  • WHEN a Python worker imports a package not in the allowlist
  • THEN the import SHALL fail with an error