Re-queuing failed ZIL extractions (recover-failed)

Component: zil-graph-worker
Category: zil_extraction
Version: 1
Author: claude
Last used: 6/2/2026, 7:00:06 PM

Prerequisites

zil-graph-worker is running, and zil_extraction_log contains at least one row with status='failed'.

Expected Outcome

The recover-failed response reports requeued > 0. The requeued rows move from failed to batch_submitted and then complete over the next 2-10 minutes.

Steps

1. Check the failed count per topic: SELECT COUNT(*), topic_id FROM zil_extraction_log WHERE status='failed' GROUP BY topic_id;

2. Run recovery for the failed rows (this request passes a limit of 50): curl -s -X POST http://localhost:3555/api/zil/recover-failed -H 'Content-Type: application/json' -d '{"limit":50}'

3. The response reports requeued and skipped. A source is skipped when Qdrant holds no chunks for its URL.

4. For skipped sources (no chunks): re-ingest the source via the research pipeline by publishing a research.source.ingested event with its source_url.

5. Monitor requeued jobs: SELECT status, COUNT(*) FROM zil_extraction_log GROUP BY status;

6. If requeued jobs fail again, check the logs for the specific error. A common one is "callLLMFromTemplate unavailable", which indicates a stale image (see runbook above).
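
Steps 2 and 3 can also be scripted. The following is a minimal Python sketch, not part of the worker itself. It assumes the response is a JSON object with requeued and skipped fields, each either a number or a list; the runbook names those fields but does not document their shape. The helper names (build_request, summarize, recover_failed) are hypothetical and exist only for this example.

```python
import json
import urllib.request

# Endpoint and payload are taken from step 2 of this runbook.
RECOVER_URL = "http://localhost:3555/api/zil/recover-failed"


def build_request(limit: int = 50) -> urllib.request.Request:
    """Build the same POST that the curl command in step 2 sends."""
    body = json.dumps({"limit": limit}).encode("utf-8")
    return urllib.request.Request(
        RECOVER_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def _count(value) -> int:
    """Accept a number or a list, because the response shape is an assumption."""
    if isinstance(value, list):
        return len(value)
    return int(value or 0)


def summarize(response: dict) -> dict:
    """Reduce the response to the two decisions steps 3 and 4 call for."""
    requeued = _count(response.get("requeued"))
    skipped = _count(response.get("skipped"))
    return {
        "requeued": requeued,
        "skipped": skipped,
        "ok": requeued > 0,              # matches the Expected Outcome
        "needs_reingest": skipped > 0,   # follow step 4 for these sources
    }


def recover_failed(limit: int = 50) -> dict:
    """Call the endpoint and summarize the result (requires the worker to be up)."""
    with urllib.request.urlopen(build_request(limit), timeout=30) as resp:
        return summarize(json.load(resp))
```

Calling recover_failed() with the worker running returns a small dict. If needs_reingest is true, continue with step 4; if ok is false and nothing was skipped, re-check the failed count from step 1.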