web-fetcher: Known Failure Modes

Component: web-fetcher Tags: web-fetcher,research,fetch,pipeline Author: Updated: 5/20/2026, 9:28:34 PM

web-fetcher fetches URLs from web.fetch.requested events and publishes web.fetch.completed.


Flow shape: web.fetch.requested -> web.fetch.completed -> research.source.ingested
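
Given this flow shape, a stalled fetch shows up as a web.fetch.requested event with no later web.fetch.completed for the same URL. The following is a minimal sketch of that check, not the ops-tracker implementation. The event shape (a dict with "type" and "url" keys) and the function name find_unfinished_fetches are assumptions made for illustration.

```python
from collections import Counter


def find_unfinished_fetches(events):
    """Return URLs that have more requested events than completed events.

    Assumed (hypothetical) event shape: {"type": <event name>, "url": <URL>}.
    Events of any other type are ignored.
    """
    pending = Counter()
    for event in events:
        if event["type"] == "web.fetch.requested":
            pending[event["url"]] += 1
        elif event["type"] == "web.fetch.completed":
            pending[event["url"]] -= 1
    # A positive count means at least one request never completed.
    return sorted(url for url, count in pending.items() if count > 0)
```

Counting per URL rather than keeping a set keeps the check correct when the same URL is requested more than once.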


COMMON FAILURES:

1. Fetch timeout / connection refused: The site blocked the request or took too long to respond. Check the ops-tracker Flow page for missing web.fetch.completed events. This is usually transient; the next poll cycle retries. If the failures are systemic, the domain may be blocking the server IP.

2. Non-2xx response (403, 429, 5xx): The site rejected the request. system-event-listener publishes web.fetch.requested.failed, and ops-tracker then creates an issue with the URL. 429 = rate limited. 403 = blocked. 5xx = server-side error.

3. Empty or unparseable body: The page returned 200 but the content was empty or JS-rendered. document-ingester will receive an empty payload. JS-heavy pages may need browser-based fetching.

4. Missing web.fetch.completed in event log: web-fetcher consumed the event but crashed before publishing. Check the web-fetcher container logs and restart the container.
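
The first three failure modes above can be told apart from the outcome of a single fetch. The sketch below shows one way to do that triage. It is an illustration under stated assumptions, not web-fetcher's actual code: the function name classify_fetch_result, its parameters, and the returned labels are all hypothetical.

```python
def classify_fetch_result(status=None, body="", error=None):
    """Map one fetch outcome to a failure label (hypothetical names throughout).

    status: HTTP status code, or None if no response was received.
    body:   response body as text.
    error:  exception raised by the fetch (timeout, connection refused), if any.
    """
    if error is not None:
        # Failure 1: no HTTP response at all.
        return "timeout_or_refused"
    if status is None:
        raise ValueError("either status or error must be provided")
    # Failure 2: the site answered, but with a non-2xx status.
    if status == 429:
        return "rate_limited"
    if status == 403:
        return "blocked"
    if not 200 <= status < 300:
        return "non_2xx"
    # Failure 3: success status but nothing usable in the body.
    if not body.strip():
        return "empty_body"
    return "ok"
```

Note that this only catches a literally empty body. A JS-rendered page often returns a non-empty HTML shell, so detecting that case would need a content-level check that this sketch does not attempt.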