Fix APOC jaroWinklerDistance threshold (distance ≠ similarity)

Component: zil-graph-worker
Category: data-quality
Version: 1
Author: claude
Last used: 6/3/2026, 7:20:44 PM

Prerequisites

Access to neo4j-kg (bolt://localhost:7688). At least one Concept node in the graph with a canonicalName that shares the first two characters with your test string.

Expected Outcome

The entity resolution test reports pass=jaro_winkler for near-identical strings (similarity >= 0.92). The JW pass fires before the embedding and LLM passes.

Steps

1. Confirm the symptom: the Jaro-Winkler (JW) entity resolution pass (pass=1) never fires, even for near-identical strings. Run the manual test:

MATCH (c:Concept) WHERE c.canonicalName = "OpenAI"
RETURN apoc.text.jaroWinklerDistance("OpenAI", "OpenAI Inc") AS dist

Expected result: ~0.079. This is a distance (lower means closer), not a similarity of ~0.921.


2. Locate the Cypher query in lib/entity-resolver.js that calls apoc.text.jaroWinklerDistance.


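3. The query must filter on distance, not similarity:

WHERE dist < $distThreshold

where distThreshold = 1 - desiredSimilarityThreshold (e.g. 1 - 0.92 = 0.08).

Do NOT use:

WHERE score > 0.92

apoc.text.jaroWinklerDistance returns a distance, so near-identical strings score close to 0 and can never pass a "> 0.92" filter.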


4. The ORDER BY clause must be "ORDER BY dist ASC" (lowest distance = closest match).


5. The returned confidence must be computed as "(1 - dist)", not as the raw dist.


6. Rebuild and redeploy zil-graph-worker, then rerun the entity resolution test:

docker exec -e NEO4J_KG_PASSWORD="<password>" zil-graph-worker node test/test-entity-resolution.js
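
Steps 3 to 5 can be sketched together as follows. This is a minimal illustration, not the actual contents of lib/entity-resolver.js: the helper names (toDistanceThreshold, confidenceFromDistance), the $name parameter, and the LIMIT 1 clause are assumptions made for the example. Only the Concept label, the canonicalName property, the $distThreshold parameter, and the apoc.text.jaroWinklerDistance call come from the steps above.

```javascript
// Hypothetical helpers; names are illustrative and not taken from lib/entity-resolver.js.

// Step 3: convert a desired similarity threshold (e.g. 0.92)
// into the distance threshold the query filters on.
function toDistanceThreshold(similarityThreshold) {
  if (!(similarityThreshold >= 0 && similarityThreshold <= 1)) {
    throw new RangeError("similarity threshold must be between 0 and 1");
  }
  return 1 - similarityThreshold;
}

// Step 5: convert the distance returned by the query back into a confidence.
function confidenceFromDistance(dist) {
  return 1 - dist;
}

// Assumed query shape. It filters on distance (step 3) and sorts
// ascending so the closest match comes first (step 4).
const JW_QUERY = `
MATCH (c:Concept)
WITH c, apoc.text.jaroWinklerDistance(c.canonicalName, $name) AS dist
WHERE dist < $distThreshold
RETURN c.canonicalName AS canonicalName, dist
ORDER BY dist ASC
LIMIT 1
`;

// Parameters that would be passed alongside JW_QUERY.
const params = {
  name: "OpenAI Inc",
  distThreshold: toDistanceThreshold(0.92),
};

console.log(JW_QUERY.trim());
console.log(params);
```

Keeping the similarity-to-distance conversion in one helper means the configured value can stay a similarity (0.92), which is easier to reason about, while the query only ever sees a distance.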