Phase 3: Search & Re-ranking
Objective
Build the complete hybrid search pipeline: accept a query → embed it → search Neo4j (vector + full-text + graph traversal) → fuse candidates → re-rank via Synesis → return ranked results with content-type context. At the end of this phase, content is discoverable through multiple search modalities, ranked by cross-attention relevance, and ready for Phase 4's RAG generation.
Heritage
The hybrid search architecture adapts patterns from Spelunker's two-stage retrieval pipeline — vector recall + cross-attention re-ranking — enhanced with knowledge graph traversal, multimodal search, and content-type-aware re-ranking instructions.
Architecture Overview
User Query (text, optional image, optional filters)
  │
  ├─→ Vector Search (Neo4j vector index: Chunk.embedding)
  │     → Top-K nearest neighbors by cosine similarity
  │
  ├─→ Full-Text Search (Neo4j fulltext index: Chunk.text_preview, Concept.name)
  │     → BM25-scored matches
  │
  ├─→ Graph Search (Cypher traversal)
  │     → Concept-linked chunks via MENTIONS/REFERENCES/DEPICTS edges
  │
  └─→ Image Search (Neo4j vector index: ImageEmbedding.embedding)
        → Multimodal similarity (text-to-image in unified vector space)
        │
        └─→ Candidate Fusion (Reciprocal Rank Fusion)
              → Deduplicated, scored candidate list
              │
              └─→ Re-ranking (Synesis /v1/rerank)
                    → Content-type-aware instruction injection
                    → Cross-attention precision scoring
                    │
                    └─→ Final ranked results with metadata
Synesis Integration
Synesis is a custom FastAPI service built around Qwen3-VL-2B, providing both embedding and re-ranking over a clean REST API. It runs on pan.helu.ca:8400.
Embedding (Phase 2, already working): Synesis's /v1/embeddings endpoint is OpenAI-compatible — the existing EmbeddingClient handles it with api_type="openai".
Re-ranking (Phase 3, new): Synesis's /v1/rerank endpoint provides:
- Native `instruction` parameter, which maps directly to `reranker_instruction` from content types
- `top_n` for server-side truncation
- Multimodal support: both query and documents can include images
- Relevance scores for each candidate
# Synesis rerank request
POST http://pan.helu.ca:8400/v1/rerank
{
"query": {"text": "How do I configure a 3-phase motor?"},
"documents": [
{"text": "The motor controller requires..."},
{"text": "3-phase power is distributed..."}
],
"instruction": "Re-rank passages from technical documentation based on procedural relevance.",
"top_n": 10
}
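The request above can be built and sent from Python roughly as follows. This is a minimal sketch using only the standard library (the project itself lists `requests` for this call); the helper names `build_rerank_payload` and `post_rerank` are hypothetical, and the response is returned as raw parsed JSON because its schema is not described in this document.

```python
import json
import urllib.request

SYNESIS_RERANK_URL = "http://pan.helu.ca:8400/v1/rerank"


def build_rerank_payload(query, documents, instruction="", top_n=None):
    """Build the JSON body for /v1/rerank (hypothetical helper)."""
    payload = {
        "query": {"text": query},
        "documents": [{"text": doc} for doc in documents],
    }
    # Assumption: optional fields are simply omitted when not set.
    if instruction:
        payload["instruction"] = instruction
    if top_n is not None:
        payload["top_n"] = top_n
    return payload


def post_rerank(payload, url=SYNESIS_RERANK_URL, timeout=30):
    """POST the payload and return the parsed JSON response unchanged."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


payload = build_rerank_payload(
    "How do I configure a 3-phase motor?",
    ["The motor controller requires...", "3-phase power is distributed..."],
    instruction="Re-rank passages from technical documentation based on procedural relevance.",
    top_n=10,
)
```

Calling `post_rerank(payload)` would perform the actual HTTP request; it is left uncalled here so the sketch runs without network access.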
Deliverables
1. Search Service (library/services/search.py)
The core search orchestrator. Accepts a SearchRequest, dispatches to individual search backends, fuses results, and optionally re-ranks.
SearchRequest
from dataclasses import dataclass, field

@dataclass
class SearchRequest:
    query: str                           # Natural language query text
    query_image: bytes | None = None     # Optional image for multimodal search
    library_uid: str | None = None       # Scope to specific library
    library_type: str | None = None      # Scope to library type
    collection_uid: str | None = None    # Scope to specific collection
    # A field without a default cannot follow defaulted fields, so a
    # default_factory is used; the default list is assumed from the comment.
    search_types: list[str] = field(
        default_factory=lambda: ["vector", "fulltext", "graph"]
    )
    limit: int = 20                      # Max results after fusion
    vector_top_k: int = 50               # Candidates from vector search
    fulltext_top_k: int = 30             # Candidates from fulltext search
    graph_max_depth: int = 2             # Graph traversal depth
    rerank: bool = True                  # Apply re-ranking
    include_images: bool = True          # Include image results
SearchResponse
from dataclasses import dataclass

@dataclass
class SearchCandidate:
    chunk_uid: str
    item_uid: str
    item_title: str
    library_type: str
    text_preview: str
    chunk_s3_key: str
    chunk_index: int
    score: float       # Final score (post-fusion or post-rerank)
    source: str        # "vector", "fulltext", "graph"
    metadata: dict     # Page, section, nearby images, etc.

@dataclass
class ImageSearchResult:
    image_uid: str
    item_uid: str
    item_title: str
    image_type: str
    description: str
    s3_key: str
    score: float
    source: str        # "vector", "graph"

@dataclass
class SearchResponse:
    query: str
    candidates: list[SearchCandidate]   # Ranked text results
    images: list[ImageSearchResult]     # Ranked image results
    total_candidates: int               # Pre-fusion candidate count
    search_time_ms: float
    reranker_used: bool
    reranker_model: str | None
    search_types_used: list[str]
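The orchestration flow described for the search service (dispatch to backends, fuse, optionally re-rank) might look roughly like the sketch below. It is not the actual `SearchService`: `run_search`, `merge_unique`, and the lambda backends are hypothetical stand-ins, candidates are plain chunk UID strings, and the response is a dict that mirrors a few `SearchResponse` fields.

```python
import time


def run_search(query, backends, fuse, reranker=None, limit=20):
    """Dispatch to each backend, fuse the ranked lists, then optionally re-rank.

    backends: dict of name -> callable(query) returning a ranked list
    fuse:     callable(list_of_lists) returning one merged ranked list
    reranker: optional callable(query, ranked) returning a re-ordered list
    """
    started = time.perf_counter()
    result_lists = [backend(query) for backend in backends.values()]
    total_candidates = sum(len(results) for results in result_lists)

    # Trim to the requested limit after fusion.
    ranked = fuse(result_lists)[:limit]

    # Degrade gracefully: with no reranker, the fused order is returned.
    reranker_used = reranker is not None
    if reranker_used:
        ranked = reranker(query, ranked)

    return {
        "query": query,
        "candidates": ranked,
        "total_candidates": total_candidates,
        "search_time_ms": (time.perf_counter() - started) * 1000.0,
        "reranker_used": reranker_used,
        "search_types_used": list(backends),
    }


def merge_unique(result_lists):
    """Naive order-preserving merge, standing in for real fusion."""
    seen, merged = set(), []
    for results in result_lists:
        for uid in results:
            if uid not in seen:
                seen.add(uid)
                merged.append(uid)
    return merged


response = run_search(
    "3-phase motor",
    {"vector": lambda q: ["c1", "c2"], "fulltext": lambda q: ["c2", "c3"]},
    fuse=merge_unique,
    reranker=lambda q, ranked: list(reversed(ranked)),
)
```

Passing the backends, fusion function, and reranker as callables keeps each stage independently testable with mocks, which fits the mocked-services testing approach this phase calls for.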
2. Vector Search
Uses Neo4j's db.index.vector.queryNodes() against chunk_embedding_index.
- Embed query text using system embedding model (via existing `EmbeddingClient`)
- Prepend library's `embedding_instruction` when scoped to a specific library
- Query Neo4j vector index for top-K Chunk nodes by cosine similarity
- Filter by library/collection via graph pattern matching
CALL db.index.vector.queryNodes('chunk_embedding_index', $top_k, $query_vector)
YIELD node AS chunk, score
MATCH (item:Item)-[:HAS_CHUNK]->(chunk)
OPTIONAL MATCH (lib:Library)-[:CONTAINS]->(col:Collection)-[:CONTAINS]->(item)
// Filter after WITH: a WHERE attached directly to OPTIONAL MATCH only nulls out
// lib/col and would still return chunks outside the requested scope.
WITH chunk, item, lib, col, score
WHERE ($library_uid IS NULL OR lib.uid = $library_uid)
  AND ($library_type IS NULL OR lib.library_type = $library_type)
  AND ($collection_uid IS NULL OR col.uid = $collection_uid)
RETURN chunk.uid AS chunk_uid, chunk.text_preview AS text_preview,
       chunk.chunk_s3_key AS chunk_s3_key, chunk.chunk_index AS chunk_index,
       item.uid AS item_uid, item.title AS item_title,
       lib.library_type AS library_type, score
ORDER BY score DESC
LIMIT $top_k
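To make the ranking step concrete, here is a brute-force Python sketch of what a top-K cosine-similarity lookup with library scoping computes. It is an illustration only: Neo4j's vector index uses approximate nearest-neighbour search rather than a full scan, and the function names and the dict layout of `chunks` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vector, chunks, k, library_uid=None):
    """Return up to k (chunk_uid, score) pairs, best first, optionally scoped."""
    scored = [
        (chunk["uid"], cosine_similarity(query_vector, chunk["embedding"]))
        for chunk in chunks
        if library_uid is None or chunk["library_uid"] == library_uid
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


chunks = [
    {"uid": "a", "library_uid": "lib1", "embedding": [1.0, 0.0]},
    {"uid": "b", "library_uid": "lib2", "embedding": [0.0, 1.0]},
    {"uid": "c", "library_uid": "lib1", "embedding": [1.0, 1.0]},
]
results = top_k_chunks([1.0, 0.0], chunks, k=2)
```

One difference is worth noting: this sketch filters before taking the top K, whereas the Cypher query filters rows after the index has already returned its top K. A scoped query can therefore return fewer than `$top_k` rows; requesting more candidates from the index than are ultimately needed is one common mitigation.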
3. Full-Text Search
Uses Neo4j fulltext indexes created by setup_neo4j_indexes.
- Query `chunk_text_fulltext` for Chunk matches (BM25)
- Query `concept_name_fulltext` for Concept matches → traverse to connected Chunks
- Query `item_title_fulltext` for Item title matches → get their Chunks
- Normalize BM25 scores to 0-1 range for fusion compatibility
// Chunk full-text search
CALL db.index.fulltext.queryNodes('chunk_text_fulltext', $query)
YIELD node AS chunk, score
MATCH (item:Item)-[:HAS_CHUNK]->(chunk)
OPTIONAL MATCH (lib:Library)-[:CONTAINS]->(col:Collection)-[:CONTAINS]->(item)
// Filter after WITH so that out-of-scope chunks are actually dropped
WITH chunk, item, lib, score
WHERE ($library_uid IS NULL OR lib.uid = $library_uid)
RETURN chunk.uid AS chunk_uid, chunk.text_preview AS text_preview,
       item.uid AS item_uid, item.title AS item_title,
       lib.library_type AS library_type, score
ORDER BY score DESC
LIMIT $top_k
// Concept-to-Chunk traversal
CALL db.index.fulltext.queryNodes('concept_name_fulltext', $query)
YIELD node AS concept, score AS concept_score
MATCH (chunk:Chunk)-[:MENTIONS]->(concept)
MATCH (item:Item)-[:HAS_CHUNK]->(chunk)
RETURN chunk.uid AS chunk_uid, chunk.text_preview AS text_preview,
item.uid AS item_uid, item.title AS item_title,
concept_score * 0.8 AS score
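The normalization step listed above does not name a method. One simple option, sketched here as an assumption, is to divide every score in a result list by the list's top score so the best match maps to 1.0. The function name `normalize_scores` is hypothetical.

```python
def normalize_scores(scores):
    """Scale non-negative relevance scores into the 0-1 range by the maximum."""
    if not scores:
        return []
    top = max(scores)
    if top <= 0:
        # Degenerate case: nothing scored above zero.
        return [0.0 for _ in scores]
    return [score / top for score in scores]


normalized = normalize_scores([4.0, 2.0, 1.0])
```

Because Reciprocal Rank Fusion works from ranks rather than raw scores, this normalization mainly matters wherever raw scores from different backends are compared or shown side by side.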
4. Graph Search
Knowledge-graph-powered discovery — the differentiator from standard RAG.
- Match query terms against Concept names via fulltext index
- Traverse `Concept ←[MENTIONS]- Chunk ←[HAS_CHUNK]- Item`
- Expand via `Concept -[RELATED_TO]- Concept` for secondary connections
- Score based on relationship weight and traversal depth
// Concept graph traversal
CALL db.index.fulltext.queryNodes('concept_name_fulltext', $query)
YIELD node AS concept, score
MATCH path = (concept)<-[:MENTIONS|REFERENCES*1..2]-(connected)
WHERE connected:Chunk OR connected:Item
WITH concept, connected, score, length(path) AS depth
MATCH (item:Item)-[:HAS_CHUNK]->(chunk)
WHERE chunk = connected OR item = connected
RETURN DISTINCT chunk.uid AS chunk_uid, chunk.text_preview AS text_preview,
item.uid AS item_uid, item.title AS item_title,
score / (depth * 0.5 + 1) AS score
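The depth penalty in the query's final line divides the concept score by 1.5 at depth 1 and by 2.0 at depth 2. Since `RETURN DISTINCT` de-duplicates whole rows, a chunk reached at two different depths comes back twice with different scores. The sketch below shows the same formula in Python plus one possible way to keep the best score per chunk; the helper names and the tuple layout of `rows` are hypothetical.

```python
def depth_decayed_score(concept_score, depth):
    """Same formula as the Cypher: score / (depth * 0.5 + 1)."""
    return concept_score / (depth * 0.5 + 1)


def best_score_per_chunk(rows):
    """Collapse (chunk_uid, concept_score, depth) rows to the best score per chunk."""
    best = {}
    for chunk_uid, concept_score, depth in rows:
        score = depth_decayed_score(concept_score, depth)
        if score > best.get(chunk_uid, float("-inf")):
            best[chunk_uid] = score
    return best


rows = [
    ("c1", 3.0, 1),  # reached directly
    ("c1", 3.0, 2),  # same chunk, reached via a longer path
    ("c2", 3.0, 2),
]
best = best_score_per_chunk(rows)
```

Here `c1` keeps the score from its shorter path, and `c2` only has the depth-2 score.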
5. Image Search
Multimodal vector search against image_embedding_index.
- Embed query text (or image) using system embedding model
- Search `ImageEmbedding` vectors in unified multimodal space
- Return with Image descriptions, OCR text, and Item associations from Phase 2B
- Also include images found via concept graph DEPICTS relationships
6. Candidate Fusion (library/services/fusion.py)
Reciprocal Rank Fusion (RRF) — parameter-light, proven in Spelunker.
def reciprocal_rank_fusion(
    result_lists: list[list[SearchCandidate]],
    k: int = 60,
) -> list[SearchCandidate]:
    """
    RRF score = Σ 1 / (k + rank_i) for each list containing the candidate.
    Candidates in multiple lists get boosted.
    """
- Deduplicates candidates by `chunk_uid`
- Candidates appearing in multiple search types get naturally boosted
- Sort by fused score descending, trim to `limit`
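A possible body for the function, following the three points above, is sketched here. It is a minimal version under stated assumptions: ranks are 1-based, `Candidate` is a cut-down stand-in for `SearchCandidate`, the first occurrence of a chunk supplies its metadata, and the `limit` argument is an addition to the signature shown earlier.

```python
from dataclasses import dataclass, replace


@dataclass
class Candidate:
    """Cut-down stand-in for SearchCandidate (hypothetical)."""
    chunk_uid: str
    score: float = 0.0
    source: str = ""


def reciprocal_rank_fusion(result_lists, k=60, limit=None):
    """Fuse ranked lists: score = sum of 1 / (k + rank) over lists with the chunk."""
    fused_scores = {}
    first_seen = {}
    for results in result_lists:
        seen_in_list = set()
        rank = 0
        for candidate in results:
            uid = candidate.chunk_uid
            if uid in seen_in_list:
                continue  # count each chunk once per list
            seen_in_list.add(uid)
            rank += 1  # 1-based rank within this list
            fused_scores[uid] = fused_scores.get(uid, 0.0) + 1.0 / (k + rank)
            first_seen.setdefault(uid, candidate)

    ordered = sorted(fused_scores, key=lambda uid: fused_scores[uid], reverse=True)
    fused = [replace(first_seen[uid], score=fused_scores[uid]) for uid in ordered]
    return fused if limit is None else fused[:limit]


vector_hits = [Candidate("a", 0.92, "vector"), Candidate("b", 0.88, "vector")]
fulltext_hits = [Candidate("b", 1.0, "fulltext"), Candidate("c", 0.4, "fulltext")]
fused = reciprocal_rank_fusion([vector_hits, fulltext_hits])
```

In this example chunk `b` appears in both lists, so its fused score (1/62 + 1/61) puts it ahead of `a` (1/61) and `c` (1/62), even though `a` ranked first in the vector list.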
7. Re-ranking Client (library/services/reranker.py)
Targets Synesis's POST /v1/rerank endpoint. Wraps the system reranker model's API configuration.
Synesis Backend
class RerankerClient:
    def rerank(
        self,
        query: str,
        candidates: list[SearchCandidate],
        instruction: str = "",
        top_n: int | None = None,
        query_image: bytes | None = None,
    ) -> list[SearchCandidate]:
        """
        Re-rank candidates via Synesis /v1/rerank.
        Injects content-type reranker_instruction as the instruction parameter.
        """
Features:
- Uses `text_preview` (500 chars) for document text, avoiding S3 round-trips
- Prepends library's `reranker_instruction` as the `instruction` parameter
- Supports multimodal queries (text + image)
- Falls back gracefully when no reranker model configured
- Tracks usage via `LLMUsage` with `purpose="reranking"`
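The fallback and preview-truncation behaviour listed above could be sketched as follows. Everything here is an assumption rather than the actual client: candidates are plain dicts, `call_reranker` is a hypothetical injected callable that returns `(index, score)` pairs (the real Synesis response schema is not described in this document), and candidates beyond `max_candidates` are simply not sent.

```python
def rerank_with_fallback(query, candidates, call_reranker=None,
                         instruction="", max_candidates=32, preview_chars=500):
    """Return (candidates, reranker_used); the input order is kept on fallback."""
    if call_reranker is None or not candidates:
        return list(candidates), False

    head = candidates[:max_candidates]
    # Send only the stored preview text, so no S3 fetch is needed.
    documents = [c["text_preview"][:preview_chars] for c in head]

    try:
        scored = call_reranker(query, documents, instruction)
    except Exception:
        # Reranker unreachable or failing: keep the fused order.
        return list(candidates), False

    reranked = [dict(head[index], score=score) for index, score in scored]
    reranked.sort(key=lambda c: c["score"], reverse=True)
    return reranked, True


candidates = [
    {"chunk_uid": "a", "text_preview": "The motor controller requires...", "score": 0.03},
    {"chunk_uid": "b", "text_preview": "3-phase power is distributed...", "score": 0.02},
]


def fake_reranker(query, documents, instruction):
    """Test double: pretends the second document is the most relevant."""
    return [(0, 0.1), (1, 0.9)]


reranked, used = rerank_with_fallback("3-phase motor", candidates, fake_reranker)
```

Injecting the HTTP call as a callable lets tests swap in a double like `fake_reranker`, which matches the mocked-HTTP approach planned for `test_reranker.py`.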
8. Search API Endpoints
New endpoints in library/api/:
| Method | Route | Purpose |
|---|---|---|
| POST | `/api/v1/library/search/` | Full hybrid search + re-rank |
| POST | `/api/v1/library/search/vector/` | Vector-only search (debugging) |
| POST | `/api/v1/library/search/fulltext/` | Full-text-only search (debugging) |
| GET | `/api/v1/library/concepts/` | List/search concepts |
| GET | `/api/v1/library/concepts/<uid>/graph/` | Concept neighborhood graph |
9. Search UI Views
| URL | View | Purpose |
|---|---|---|
| `/library/search/` | `search` | Search page with query input + filters |
| `/library/concepts/` | `concept_list` | Browse concepts with search |
| `/library/concepts/<uid>/` | `concept_detail` | Single concept with connections |
10. Prometheus Metrics
| Metric | Type | Labels | Purpose |
|---|---|---|---|
| `mnemosyne_search_requests_total` | Counter | search_type, library_type | Search throughput |
| `mnemosyne_search_duration_seconds` | Histogram | search_type | Per-search-type latency |
| `mnemosyne_search_candidates_total` | Histogram | search_type | Candidates per search type |
| `mnemosyne_fusion_duration_seconds` | Histogram | (none) | Fusion latency |
| `mnemosyne_rerank_requests_total` | Counter | model_name, status | Re-rank throughput |
| `mnemosyne_rerank_duration_seconds` | Histogram | model_name | Re-rank latency |
| `mnemosyne_rerank_candidates` | Histogram | (none) | Candidates sent to reranker |
| `mnemosyne_search_total_duration_seconds` | Histogram | (none) | End-to-end search latency |
11. Management Commands
| Command | Purpose |
|---|---|
| `search <query> [--library-uid] [--limit] [--no-rerank]` | CLI search for testing |
| `search_stats` | Search index statistics |
12. Settings
# Search configuration
SEARCH_VECTOR_TOP_K = env.int("SEARCH_VECTOR_TOP_K", default=50)
SEARCH_FULLTEXT_TOP_K = env.int("SEARCH_FULLTEXT_TOP_K", default=30)
SEARCH_GRAPH_MAX_DEPTH = env.int("SEARCH_GRAPH_MAX_DEPTH", default=2)
SEARCH_RRF_K = env.int("SEARCH_RRF_K", default=60)
SEARCH_DEFAULT_LIMIT = env.int("SEARCH_DEFAULT_LIMIT", default=20)
RERANKER_MAX_CANDIDATES = env.int("RERANKER_MAX_CANDIDATES", default=32)
RERANKER_TIMEOUT = env.int("RERANKER_TIMEOUT", default=30)
File Structure
mnemosyne/library/
├── services/
│ ├── search.py # NEW — SearchService orchestrator
│ ├── fusion.py # NEW — Reciprocal Rank Fusion
│ ├── reranker.py # NEW — Synesis re-ranking client
│ └── ... # Existing services unchanged
├── metrics.py # Modified — add search/rerank metrics
├── views.py # Modified — add search UI views
├── urls.py # Modified — add search routes
├── api/
│ ├── views.py # Modified — add search API endpoints
│ ├── serializers.py # Modified — add search serializers
│ └── urls.py # Modified — add search API routes
├── management/commands/
│ ├── search.py # NEW — CLI search command
│ └── search_stats.py # NEW — Index statistics
├── templates/library/
│ ├── search.html # NEW — Search page
│ ├── concept_list.html # NEW — Concept browser
│ └── concept_detail.html # NEW — Concept detail
└── tests/
├── test_search.py # NEW — Search service tests
├── test_fusion.py # NEW — RRF fusion tests
├── test_reranker.py # NEW — Re-ranking client tests
└── test_search_api.py # NEW — Search API endpoint tests
Dependencies
No new Python dependencies required. Phase 3 uses:
- `neomodel` + raw Cypher (Neo4j search)
- `requests` (Synesis reranker HTTP)
- `EmbeddingClient` from Phase 2 (query embedding)
- `prometheus_client` (metrics)
Testing Strategy
All tests use Django TestCase. External services are mocked.
| Test File | Scope |
|---|---|
| `test_search.py` | SearchService orchestration, individual search methods, library/collection scoping |
| `test_fusion.py` | RRF correctness, deduplication, score calculation, edge cases |
| `test_reranker.py` | Synesis backend (mocked HTTP), instruction injection, graceful fallback |
| `test_search_api.py` | API endpoints, request validation, response format |
Success Criteria
- Vector search returns Chunk nodes ranked by cosine similarity from Neo4j
- Full-text search returns matches from Neo4j fulltext indexes
- Graph search traverses Concept relationships to discover related content
- Image search returns images via multimodal vector similarity
- Reciprocal Rank Fusion correctly merges and deduplicates across search types
- Re-ranking via Synesis `/v1/rerank` re-scores candidates with cross-attention
- Content-type `reranker_instruction` injected per library type
- Search scoping works (by library, library type, collection)
- Search gracefully degrades: no reranker → skip; no embedding model → clear error
- Search API endpoints return structured results with scores and metadata
- Search UI allows querying with filters and displays ranked results
- Concept explorer allows browsing the knowledge graph
- Prometheus metrics track search throughput, latency, and candidate counts
- CLI search command works for testing
- All tests pass with mocked external services