document-ingester: Known Failure Modes

Component: document-ingester
Tags: document-ingester, research, qdrant, embedding, chunking
Author:
Updated: 5/20/2026, 9:28:34 PM

document-ingester processes web.fetch.completed events: it extracts the text, chunks it, embeds the chunks, stores them in Qdrant, and then publishes research.source.ingested.
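
As a rough illustration of the event flow described above, the following Python sketch wires the steps together using in-memory stand-ins. Everything in it is hypothetical: the function names, the event fields ("content", "source_url"), the fixed-size chunker, the fake embedder, and the list-backed store are placeholders for illustration, not document-ingester's actual implementation or the Qdrant client API.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryStore:
    """Stand-in for Qdrant: keeps (vector, payload) pairs in a list."""
    points: list = field(default_factory=list)

    def upsert(self, vector, payload):
        self.points.append((vector, payload))


def extract_text(raw: str) -> str:
    # Placeholder extraction: only collapses whitespace.
    return " ".join(raw.split())


def chunk(text: str, size: int = 200) -> list[str]:
    # Fixed-size character chunks; the real chunking strategy is an assumption.
    return [text[i:i + size] for i in range(0, len(text), size)]


def embed(piece: str) -> list[float]:
    # Fake two-dimensional embedding so the sketch runs without a model.
    return [float(len(piece)), float(sum(map(ord, piece)) % 997)]


def handle_fetch_completed(event: dict, store: InMemoryStore, publish) -> int:
    """Process one web.fetch.completed event; return the number of chunks stored."""
    text = extract_text(event["content"])
    pieces = chunk(text)
    for index, piece in enumerate(pieces):
        payload = {"source_url": event["source_url"], "chunk": index}
        store.upsert(embed(piece), payload)
    # Publish only after every chunk is stored, so a storage failure
    # leaves research.source.ingested unpublished.
    publish(
        "research.source.ingested",
        {"source_url": event["source_url"], "chunks": len(pieces)},
    )
    return len(pieces)
```

Publishing last mirrors the failure modes listed below: if any earlier step raises, the downstream event never appears, which is what the flow tracer would show as a stalled chain.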


COMMON FAILURES:

1. Qdrant unavailable: Chunked content cannot be stored, so research.source.ingested will not be published. Check Qdrant health at http://192.168.1.151:6333/health.

2. Empty or too-short content from web-fetcher: The extracted text is below the minimum length threshold, and the event may be silently dropped. Check the logs for "content too short" messages.

3. Missing research.source.ingested after web.fetch.completed: The ingestion step failed or was skipped. Check the flow tracer: if web.fetch.completed is present but research.source.ingested is missing, the chain is stalled (is_stalled). Then check the document-ingester logs.

4. Duplicate URL ingestion: The same URL is ingested multiple times. document-ingester should deduplicate by source_url + topic_id. Check research_sources for duplicate rows.
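
The checks in the list above can be sketched as a few small Python helpers for triage. This is a minimal sketch under stated assumptions, not the service's own code: the 200-character threshold, the (chain_id, event_type) tuple shape for flow-tracer events, and the dict shape for research_sources rows are all hypothetical. The health URL is the one given in item 1, and only the standard-library urllib is used.

```python
import urllib.request
from collections import Counter

QDRANT_HEALTH_URL = "http://192.168.1.151:6333/health"  # taken from item 1
MIN_CONTENT_LENGTH = 200  # hypothetical; the real threshold is not stated here


def qdrant_is_healthy(url: str = QDRANT_HEALTH_URL, timeout: float = 3.0) -> bool:
    """Return True if the health URL answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False


def is_too_short(text: str, minimum: int = MIN_CONTENT_LENGTH) -> bool:
    """Return True if the stripped text is below the minimum length."""
    return len(text.strip()) < minimum


def stalled_chains(events):
    """Return chain ids with web.fetch.completed but no research.source.ingested.

    `events` is an iterable of (chain_id, event_type) tuples (assumed shape).
    """
    fetched, ingested = set(), set()
    for chain_id, event_type in events:
        if event_type == "web.fetch.completed":
            fetched.add(chain_id)
        elif event_type == "research.source.ingested":
            ingested.add(chain_id)
    return sorted(fetched - ingested)


def duplicate_sources(rows):
    """Return (source_url, topic_id) keys that appear in more than one row."""
    counts = Counter((row["source_url"], row["topic_id"]) for row in rows)
    return sorted(key for key, count in counts.items() if count > 1)
```

For example, feeding duplicate_sources the rows exported from research_sources returns exactly the source_url + topic_id pairs that deduplication should have collapsed, and stalled_chains applied to flow-tracer events returns the chains to look up in the document-ingester logs.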