Re-queuing failed ZIL extractions (recover-failed)
Prerequisites: zil-graph-worker is running, and zil_extraction_log has at least one row with status='failed'.
Expected result: recover-failed returns requeued > 0. The requeued rows move from failed to batch_submitted, then complete over the next 2-10 minutes.
1. Check the failed count per topic: SELECT COUNT(*), topic_id FROM zil_extraction_log WHERE status='failed' GROUP BY topic_id;
2. Run recovery. limit caps how many failed rows one call handles, so rerun the call if step 1 showed more than 50: curl -s -X POST http://localhost:3555/api/zil/recover-failed -H 'Content-Type: application/json' -d '{"limit":50}'
3. The response reports requeued and skipped counts. A source is skipped when Qdrant has no chunks for its URL.
4. For skipped sources (no chunks in Qdrant), re-ingest each one through the research pipeline by publishing a research.source.ingested event with its source_url.
5. Monitor the requeued jobs until batch_submitted drains: SELECT status, COUNT(*) FROM zil_extraction_log GROUP BY status;
6. If requeued jobs fail again, check the logs for the specific error. A common one is "callLLMFromTemplate unavailable", which points to a stale image (see the runbook above).
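Steps 2 and 3 can be scripted with a minimal Python sketch that uses only the standard library. It assumes the response body is JSON with integer requeued and skipped fields (the runbook names them but does not show the exact body), and recover_failed and next_action are hypothetical helper names, not part of the service.

```python
import json
import urllib.request

# Assumption: same endpoint and port as the curl call in step 2.
RECOVER_URL = "http://localhost:3555/api/zil/recover-failed"


def recover_failed(limit: int = 50, url: str = RECOVER_URL) -> dict:
    """POST {"limit": N} to recover-failed and return the parsed JSON body."""
    body = json.dumps({"limit": limit}).encode("utf-8")
    req = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def next_action(result: dict) -> str:
    """Map a recover-failed response to the runbook step that applies next.

    Assumption: the body exposes integer 'requeued' and 'skipped' fields.
    """
    requeued = int(result.get("requeued", 0))
    skipped = int(result.get("skipped", 0))
    actions = []
    if requeued > 0:
        actions.append(f"monitor {requeued} requeued job(s) (step 5)")
    if skipped > 0:
        actions.append(f"re-ingest {skipped} skipped source(s) (step 4)")
    # Neither count is positive: nothing was picked up, so recheck step 1.
    return "; ".join(actions) or "nothing requeued: re-check the failed count (step 1)"
```

For example, print(next_action(recover_failed(limit=50))) triggers one recovery batch and prints which step to follow next. Keeping the response interpretation in its own function lets it be checked without a running service.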
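The monitoring step can also be scripted as a polling loop. This is a sketch under assumptions: fetch_counts is a hypothetical callable you would back with the step 5 query through your own database client, and the loop treats an empty batch_submitted bucket as done, which also counts rows that someone else requeued. The default 600 s timeout matches the upper end of the 2-10 minute window.

```python
import time
from typing import Callable, Dict

# Hypothetical: fetch_counts runs
#   SELECT status, COUNT(*) FROM zil_extraction_log GROUP BY status;
# and returns {status: count}. Wire it to your own DB client.
StatusCounts = Dict[str, int]


def wait_for_drain(
    fetch_counts: Callable[[], StatusCounts],
    timeout_s: float = 600.0,  # upper end of the 2-10 min window
    interval_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> StatusCounts:
    """Poll until no rows remain in batch_submitted, or raise TimeoutError."""
    waited = 0.0
    while True:
        counts = fetch_counts()
        if counts.get("batch_submitted", 0) == 0:
            return counts
        if waited >= timeout_s:
            raise TimeoutError(f"still pending after {waited:.0f}s: {counts}")
        sleep(interval_s)
        waited += interval_s
```

Inspect the failed entry of the returned counts afterwards: if it went up, go to step 6. The sleep parameter is injectable so the loop can be exercised without real waiting.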