Background jobs

Budget: 3-5 days.

Why this matters

Solves can take minutes, and today solve_vrp runs inside the request and blocks it. That is fine for a demo with one user, but it ties up a server worker for the whole solve, an in-flight solve is lost if the server restarts, and there is no retry path when FastVRP fails transiently. A queue fixes all three.

What to learn

Topic | Time
Queue concepts: producers, consumers, visibility timeouts, dead letters | 0.5 day
Pick one of Celery, RQ, or arq and read its quickstart (arq fits this async codebase best) | 0.5 day
Redis as the broker: install, run, connect | 0.5 day
Progress reporting pattern: worker writes to Postgres, SSE handler reads | 1 day
Move solve_vrp from tools.py into an arq job | 1-2 days
Retry policy with exponential backoff for FastVRP timeouts | 0.5 day
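
To make the last row concrete, here is a minimal sketch of exponential backoff in plain asyncio, with no queue involved. The names SolverTimeout, backoff_delay, and call_with_retries are hypothetical, as are the 2-second base and 60-second cap; it also assumes "three retries" means one initial attempt plus up to three more. Once the job lives in arq, the queue's own retry mechanism (a job can raise arq's Retry with a defer delay) is probably the better home for this policy.

```python
import asyncio


class SolverTimeout(Exception):
    """Hypothetical stand-in for whatever FastVRP raises on a timeout."""


def backoff_delay(attempt: int, base: float = 2.0, cap: float = 60.0) -> float:
    """Delay before retry number `attempt` (1-based): base, 2*base, 4*base, ... up to cap."""
    return min(cap, base * 2 ** (attempt - 1))


async def call_with_retries(fn, *, max_retries: int = 3, base: float = 2.0,
                            sleep=asyncio.sleep):
    """Await fn(), retrying on SolverTimeout up to max_retries times.

    `sleep` is injectable so the policy can be tested without real waiting.
    """
    for attempt in range(max_retries + 1):
        try:
            return await fn()
        except SolverTimeout:
            if attempt == max_retries:
                raise  # retries exhausted: let the caller mark the job failed
            await sleep(backoff_delay(attempt + 1, base))
```

Only timeouts are retried here; any other exception propagates immediately, which matches the idea that retries are for transient failures and not for bad input.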

Resources

  • arq docs: async-native task queue, closest fit to this codebase.
  • Celery docs: heavier, battle-tested alternative.
  • RQ docs: simple sync worker alternative.
  • FastAPI background tasks: read this to understand why they are not a substitute for a real queue.
  • Redis docs: you only need the basics to use it as a broker.

Exercise

Move the body of solve_vrp in tools.py into an arq job. The tool invocation should enqueue the job and return a job ID. Store status and result on the solutions table added in the previous chapter. Update the SSE chat handler so that, while the agent is waiting on a solve, it streams progress updates as they land in Postgres. Add a retry policy that retries FastVRP timeouts up to three times with exponential backoff.
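
The progress-streaming half of the exercise can be sketched as a polling async generator. This is one possible shape, not the required one. Everything named here is an assumption: stream_progress, sse_event, the "done" and "failed" status values, and fetch_row, a hypothetical coroutine that would run a SELECT against the solutions table and return the job's row as a dict (or None if the row does not exist yet).

```python
import asyncio
import json
from typing import AsyncIterator, Awaitable, Callable, Optional

TERMINAL = {"done", "failed"}  # assumed status values, not from the schema


def sse_event(payload: dict) -> str:
    """Format one Server-Sent Events message: a data line followed by a blank line."""
    return f"data: {json.dumps(payload)}\n\n"


async def stream_progress(
    job_id: str,
    fetch_row: Callable[[str], Awaitable[Optional[dict]]],
    poll_interval: float = 1.0,
) -> AsyncIterator[str]:
    """Poll the job's row and yield an SSE event each time it changes."""
    last = None
    while True:
        row = await fetch_row(job_id)
        if row is not None and row != last:
            last = dict(row)  # copy so later mutations do not hide changes
            yield sse_event({"job_id": job_id, **row})
            if row.get("status") in TERMINAL:
                return  # job finished; close the stream
        await asyncio.sleep(poll_interval)
```

In FastAPI, a generator like this could be handed to a StreamingResponse with media_type "text/event-stream". Polling keeps the worker and the web process decoupled: the worker only writes rows, and the handler only reads them.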