Worker Management
Purpose¶
Event-driven worker management system enabling deployment and execution of workers written in multiple languages (Python, C#, .NET DLL) that subscribe to Dapr pub/sub topics, process CloudEvents, and coordinate via distributed locks.
Requirements¶
Requirement: Worker Lifecycle Management¶
The WorkManager SHALL support creating, starting, stopping, and deleting workers. Workers SHALL persist their configuration and restore automatically on service restart.
Scenario: Create and start a worker¶
- WHEN a client creates a worker with a topic subscription and code payload
- THEN the worker SHALL be persisted to the state store, subscribed to the topic, and set to Running status
Scenario: Hot-reload worker code¶
- WHEN a client updates a running worker's code
- THEN the worker SHALL switch to the new code without restarting the service
Scenario: Auto-recovery on restart¶
- WHEN the WorkManager service restarts
- THEN all previously persisted workers SHALL be restored to their saved state before the health check passes
Scenario: Worker version history¶
- WHEN a worker's code is updated
- THEN the previous code version SHALL be preserved in a versioned history
Requirement: Polyglot Worker Engines¶
The WorkManager SHALL support executing worker code in multiple languages via a pluggable engine architecture. Built-in engines SHALL include Python (subprocess), C# (Roslyn runtime compilation), and pre-compiled .NET DLL (isolated AssemblyLoadContext).
Scenario: Python worker execution¶
- WHEN a worker with a Python engine processes a message
- THEN the worker code SHALL execute in a dedicated Python subprocess with a configurable timeout
Scenario: C# source worker execution¶
- WHEN a worker with a C# source engine processes a message
- THEN the code SHALL be compiled at runtime via Roslyn and executed in-process
Scenario: DLL worker execution¶
- WHEN a worker with a DotnetDll engine processes a message
- THEN the assembly SHALL be loaded in an isolated
AssemblyLoadContextand executed in-process
Scenario: Engine extension¶
- WHEN a new engine implementation conforms to the
IEngineinterface - THEN it SHALL be registrable in the engine registry without modifying core WorkManager code
Requirement: Event-Driven Message Processing¶
Workers SHALL subscribe to Dapr pub/sub topics and process incoming CloudEvents. The output CloudEvent type field SHALL determine the publish topic.
Scenario: Message processing¶
- WHEN a CloudEvent is published to a topic a worker subscribes to
- THEN the worker's code SHALL be invoked with the event data and any output events SHALL be published to the topic matching their
typefield
Scenario: Concurrent message handling¶
- WHEN multiple messages arrive for the same worker
- THEN they SHALL be processed concurrently, subject to group locking constraints
Requirement: Distributed Coordination¶
Workers assigned to the same group SHALL process messages with mutually exclusive execution using distributed locks with fencing tokens. Messages that exhaust retries SHALL go to a dead-letter topic.
Scenario: Lock acquisition¶
- WHEN a worker in a group receives a message
- THEN it SHALL acquire a distributed lock with a fencing token before processing
Scenario: Lock contention¶
- WHEN a lock cannot be acquired because another instance holds it
- THEN the message SHALL be retried with exponential backoff, up to the Dapr max redelivery count
Scenario: Dead letter¶
- WHEN a message exhausts all redelivery attempts due to lock contention
- THEN the message SHALL be routed to the dead-letter topic (
<topic>-dead)
Scenario: Stale lock theft¶
- WHEN a lock holder's fencing token is stale
- THEN another instance SHALL be able to steal the lock
Requirement: Worker Code Sources¶
Workers SHALL accept code from inline Base64-encoded content or from HTTP(S) URLs. URL-based sources SHALL be validated against SSRF and host allowlist constraints as defined in the cross-cutting security requirements.
Scenario: Inline code worker¶
- WHEN a worker is created with a Base64-encoded code string
- THEN the decoded code SHALL be passed to the engine for execution
Scenario: URL code worker¶
- WHEN a worker is created with a URL pointing to code
- THEN the code SHALL be fetched, SHA-256 verified, and passed to the engine
Scenario: Blocked URL¶
- WHEN a worker's code URL targets a private IP or a host not in the allowlist
- THEN the worker creation SHALL fail with a security rejection
Requirement: Python Sandboxing¶
Python worker subprocesses SHALL run with restricted interpreter flags and a package allowlist. All non-allowed imports SHALL be blocked.
Scenario: Allowed import¶
- WHEN a Python worker imports a package listed in
PYTHON_ALLOWED_PACKAGES - THEN the import SHALL succeed
Scenario: Blocked import¶
- WHEN a Python worker imports a package not in the allowlist
- THEN the import SHALL fail with an error