
Robert Helewka 634845fee0 feat: add Phase 3 hybrid search with Synesis reranking

Implement hybrid search pipeline combining vector, fulltext, and graph
search across Neo4j, with cross-attention reranking via Synesis
(Qwen3-VL-Reranker-2B) `/v1/rerank` endpoint.

- Add SearchService with vector, fulltext, and graph search strategies
- Add SynesisRerankerClient for multimodal reranking via HTTP API
- Add search API endpoint (POST /search/) with filtering by library,
  collection, and library_type
- Add SearchRequest/Response serializers and image search results
- Add "nonfiction" to library_type choices
- Consolidate reranker stack from two models to single Synesis service
- Handle image analysis_status as "skipped" when analysis is unavailable
- Add comprehensive tests for search pipeline and reranker client

2026-03-29 18:09:50 +00:00

15 KiB


Phase 3: Search & Re-ranking

Objective

Build the complete hybrid search pipeline: accept a query → embed it → search Neo4j (vector + full-text + graph traversal) → fuse candidates → re-rank via Synesis → return ranked results with content-type context. At the end of this phase, content is discoverable through multiple search modalities, ranked by cross-attention relevance, and ready for Phase 4's RAG generation.

Heritage

The hybrid search architecture adapts patterns from Spelunker's two-stage retrieval pipeline — vector recall + cross-attention re-ranking — enhanced with knowledge graph traversal, multimodal search, and content-type-aware re-ranking instructions.

Architecture Overview

```text
User Query (text, optional image, optional filters)
  │
  ├─→ Vector Search (Neo4j vector index — Chunk.embedding)
  │     → Top-K nearest neighbors by cosine similarity
  │
  ├─→ Full-Text Search (Neo4j fulltext index — Chunk.text_preview, Concept.name)
  │     → BM25-scored matches
  │
  ├─→ Graph Search (Cypher traversal)
  │     → Concept-linked chunks via MENTIONS/REFERENCES/DEPICTS edges
  │
  └─→ Image Search (Neo4j vector index — ImageEmbedding.embedding)
        → Multimodal similarity (text-to-image in unified vector space)
          │
          └─→ Candidate Fusion (Reciprocal Rank Fusion)
                → Deduplicated, scored candidate list
                  │
                  └─→ Re-ranking (Synesis /v1/rerank)
                        → Content-type-aware instruction injection
                        → Cross-attention precision scoring
                          │
                          └─→ Final ranked results with metadata
```

Synesis Integration

Synesis is a custom FastAPI service built around Qwen3-VL-2B, providing both embedding and re-ranking over a clean REST API. It runs on pan.helu.ca:8400.

Embedding (Phase 2, already working): Synesis's /v1/embeddings endpoint is OpenAI-compatible — the existing EmbeddingClient handles it with api_type="openai".

Re-ranking (Phase 3, new): Synesis's /v1/rerank endpoint provides:

- Native instruction parameter: maps directly to reranker_instruction from content types
- top_n for server-side truncation
- Multimodal support: both query and documents can include images
- Relevance scores for each candidate

```http
# Synesis rerank request
POST http://pan.helu.ca:8400/v1/rerank
{
    "query": {"text": "How do I configure a 3-phase motor?"},
    "documents": [
        {"text": "The motor controller requires..."},
        {"text": "3-phase power is distributed..."}
    ],
    "instruction": "Re-rank passages from technical documentation based on procedural relevance.",
    "top_n": 10
}
```

Deliverables

1. Search Service (`library/services/search.py`)

The core search orchestrator. Accepts a SearchRequest, dispatches to individual search backends, fuses results, and optionally re-ranks.
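A minimal sketch of that dispatch, fuse, and re-rank flow, assuming each backend is a plain callable that takes the query text and returns a best-first candidate list. Candidates are plain dicts here, and `run_search`, `backends`, `fuse`, and `rerank` are hypothetical names, not the actual SearchService API.

```python
import time
from typing import Callable, Optional

# Hypothetical stand-ins: a candidate is a dict with at least a "chunk_uid" key.
Candidate = dict
Backend = Callable[[str], list[Candidate]]


def run_search(
    query: str,
    backends: dict[str, Backend],
    search_types: list[str],
    fuse: Callable[[list[list[Candidate]]], list[Candidate]],
    rerank: Optional[Callable[[str, list[Candidate]], list[Candidate]]] = None,
    limit: int = 20,
) -> dict:
    """Dispatch to each requested backend, fuse the ranked lists, optionally re-rank."""
    start = time.perf_counter()
    unknown = set(search_types) - set(backends)
    if unknown:
        raise ValueError(f"Unknown search types: {sorted(unknown)}")

    # One best-first list per search type, in the order requested
    result_lists = [backends[name](query) for name in search_types]
    total_candidates = sum(len(results) for results in result_lists)

    fused = fuse(result_lists)
    # No reranker configured: keep the fused order (graceful degradation)
    ranked = rerank(query, fused) if rerank is not None else fused

    return {
        "query": query,
        "candidates": ranked[:limit],
        "total_candidates": total_candidates,  # pre-fusion count
        "search_time_ms": (time.perf_counter() - start) * 1000.0,
        "reranker_used": rerank is not None,
        "search_types_used": list(search_types),
    }
```

In the real service, `fuse` would be the RRF function and `rerank` the Synesis client. The backends run sequentially in this sketch; they are independent, so they could also run concurrently.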

SearchRequest

```python
from dataclasses import dataclass, field


@dataclass
class SearchRequest:
    query: str                           # Natural language query text
    query_image: bytes | None = None     # Optional image for multimodal search
    library_uid: str | None = None       # Scope to specific library
    library_type: str | None = None      # Scope to library type
    collection_uid: str | None = None    # Scope to specific collection
    # Needs a default: a dataclass field without one cannot follow
    # defaulted fields (TypeError when the class is created).
    search_types: list[str] = field(
        default_factory=lambda: ["vector", "fulltext", "graph"]
    )
    limit: int = 20                      # Max results after fusion
    vector_top_k: int = 50               # Candidates from vector search
    fulltext_top_k: int = 30             # Candidates from fulltext search
    graph_max_depth: int = 2             # Graph traversal depth
    rerank: bool = True                  # Apply re-ranking
    include_images: bool = True          # Include image results
```

SearchResponse

```python
from dataclasses import dataclass


@dataclass
class SearchCandidate:
    chunk_uid: str
    item_uid: str
    item_title: str
    library_type: str
    text_preview: str
    chunk_s3_key: str
    chunk_index: int
    score: float                         # Final score (post-fusion or post-rerank)
    source: str                          # "vector", "fulltext", "graph"
    metadata: dict                       # Page, section, nearby images, etc.

@dataclass
class ImageSearchResult:
    image_uid: str
    item_uid: str
    item_title: str
    image_type: str
    description: str
    s3_key: str
    score: float
    source: str                          # "vector", "graph"

@dataclass
class SearchResponse:
    query: str
    candidates: list[SearchCandidate]    # Ranked text results
    images: list[ImageSearchResult]      # Ranked image results
    total_candidates: int                # Pre-fusion candidate count
    search_time_ms: float
    reranker_used: bool
    reranker_model: str | None
    search_types_used: list[str]
```

2. Vector Search

Uses Neo4j's db.index.vector.queryNodes() against chunk_embedding_index.

- Embed the query text using the system embedding model (via the existing EmbeddingClient)
- Prepend the library's embedding_instruction when scoped to a specific library
- Query the Neo4j vector index for the top-K Chunk nodes by cosine similarity
- Filter by library/collection via graph pattern matching

```cypher
// queryNodes returns the global top-K before any scoping, so a scoped
// search can yield fewer than $top_k rows.
CALL db.index.vector.queryNodes('chunk_embedding_index', $top_k, $query_vector)
YIELD node AS chunk, score
MATCH (item:Item)-[:HAS_CHUNK]->(chunk)
OPTIONAL MATCH (lib:Library)-[:CONTAINS]->(col:Collection)-[:CONTAINS]->(item)
// Filter in a WITH clause: a WHERE attached to OPTIONAL MATCH only nulls
// out lib/col and would not remove out-of-scope rows.
WITH chunk, item, score, lib, col
WHERE ($library_uid IS NULL OR lib.uid = $library_uid)
  AND ($library_type IS NULL OR lib.library_type = $library_type)
  AND ($collection_uid IS NULL OR col.uid = $collection_uid)
RETURN DISTINCT chunk.uid AS chunk_uid, chunk.text_preview AS text_preview,
       chunk.chunk_s3_key AS chunk_s3_key, chunk.chunk_index AS chunk_index,
       item.uid AS item_uid, item.title AS item_title,
       lib.library_type AS library_type, score
ORDER BY score DESC
LIMIT $top_k
```
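A sketch of preparing the parameters for the query above. The embedder is passed in as a callable because EmbeddingClient's method names are not shown here; `build_vector_search_params` is a hypothetical helper, and joining the instruction and query with a newline is an assumption about the prompt format.

```python
from typing import Callable, Optional, Sequence


def build_vector_search_params(
    query: str,
    embed: Callable[[str], Sequence[float]],
    *,
    top_k: int = 50,
    library_uid: Optional[str] = None,
    library_type: Optional[str] = None,
    collection_uid: Optional[str] = None,
    embedding_instruction: Optional[str] = None,
) -> dict:
    """Embed the query and build the parameter map for the vector Cypher query."""
    text = query
    # Only a library-scoped search has a single library whose instruction applies.
    # ASSUMPTION: instruction and query are joined with a newline.
    if library_uid and embedding_instruction:
        text = f"{embedding_instruction}\n{query}"
    return {
        "query_vector": [float(x) for x in embed(text)],
        "top_k": top_k,
        # None disables the matching "$param IS NULL OR ..." filter in the query
        "library_uid": library_uid,
        "library_type": library_type,
        "collection_uid": collection_uid,
    }
```

With the official neo4j Python driver, the resulting map can be passed as the parameters argument of `session.run(query, params)`.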

3. Full-Text Search

Uses Neo4j fulltext indexes created by setup_neo4j_indexes.

- Query chunk_text_fulltext for Chunk matches (BM25)
- Query concept_name_fulltext for Concept matches → traverse to connected Chunks
- Query item_title_fulltext for Item title matches → get their Chunks
- Normalize BM25 scores to the 0-1 range so scores are comparable across indexes (RRF itself uses only ranks)

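```cypher
// Chunk full-text search ($query must be escaped for Lucene syntax)
CALL db.index.fulltext.queryNodes('chunk_text_fulltext', $query)
YIELD node AS chunk, score
MATCH (item:Item)-[:HAS_CHUNK]->(chunk)
OPTIONAL MATCH (lib:Library)-[:CONTAINS]->(col:Collection)-[:CONTAINS]->(item)
WITH chunk, item, score, lib, col
WHERE ($library_uid IS NULL OR lib.uid = $library_uid)
  AND ($library_type IS NULL OR lib.library_type = $library_type)
  AND ($collection_uid IS NULL OR col.uid = $collection_uid)
RETURN DISTINCT chunk.uid AS chunk_uid, chunk.text_preview AS text_preview,
       chunk.chunk_s3_key AS chunk_s3_key, chunk.chunk_index AS chunk_index,
       item.uid AS item_uid, item.title AS item_title,
       lib.library_type AS library_type, score
ORDER BY score DESC
LIMIT $top_k

// Concept-to-Chunk traversal
CALL db.index.fulltext.queryNodes('concept_name_fulltext', $query)
YIELD node AS concept, score AS concept_score
MATCH (chunk:Chunk)-[:MENTIONS]->(concept)
MATCH (item:Item)-[:HAS_CHUNK]->(chunk)
// A chunk mentioning several matching concepts keeps its best concept score
WITH chunk, item, max(concept_score) * 0.8 AS score
RETURN chunk.uid AS chunk_uid, chunk.text_preview AS text_preview,
       chunk.chunk_s3_key AS chunk_s3_key, chunk.chunk_index AS chunk_index,
       item.uid AS item_uid, item.title AS item_title, score
ORDER BY score DESC
LIMIT $top_k
```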
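Neo4j fulltext queries are parsed with Lucene query syntax, so raw user text containing characters such as `(`, `:` or `"` can fail to parse. A sketch of escaping the query and normalizing scores, assuming Lucene's classic query-parser character set; `escape_lucene` and `normalize_scores` are hypothetical helpers, and dividing by the per-query maximum is one simple normalization choice (min-max scaling is another).

```python
import re

# Characters the Lucene classic query parser treats as syntax
_LUCENE_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/&|])')


def escape_lucene(query: str) -> str:
    """Backslash-escape Lucene syntax characters so user text is matched literally."""
    return _LUCENE_SPECIAL.sub(r"\\\1", query)


def normalize_scores(scores: list[float]) -> list[float]:
    """Scale non-negative BM25 scores into (0, 1] by dividing by the maximum."""
    top = max(scores, default=0.0)
    if top <= 0:
        return [0.0 for _ in scores]
    return [s / top for s in scores]
```

Upper-case AND, OR and NOT are still parsed as boolean operators after escaping; quoting or lower-casing them would avoid that if it matters.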

4. Graph Search

Knowledge-graph-powered discovery — the differentiator from standard RAG.

- Match query terms against Concept names via the fulltext index
- Traverse (Concept)<-[:MENTIONS]-(Chunk)<-[:HAS_CHUNK]-(Item)
- Expand via (Concept)-[:RELATED_TO]-(Concept) for secondary connections
- Score based on relationship weight and traversal depth

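```cypher
// Concept graph traversal. Cypher does not accept $parameters as
// variable-length bounds, so the 2 stands in for a validated graph_max_depth.
CALL db.index.fulltext.queryNodes('concept_name_fulltext', $query)
YIELD node AS concept, score
MATCH path = (concept)<-[:MENTIONS|REFERENCES*1..2]-(connected)
WHERE connected:Chunk OR connected:Item
WITH connected, score / (length(path) * 0.5 + 1) AS decayed
// A matched Chunk maps to its parent Item; a matched Item expands to its Chunks
OPTIONAL MATCH (parent:Item)-[:HAS_CHUNK]->(connected)
OPTIONAL MATCH (connected)-[:HAS_CHUNK]->(child:Chunk)
WITH coalesce(parent, connected) AS item, coalesce(child, connected) AS chunk, decayed
WHERE chunk:Chunk AND item:Item
WITH item, chunk, max(decayed) AS score
RETURN chunk.uid AS chunk_uid, chunk.text_preview AS text_preview,
       item.uid AS item_uid, item.title AS item_title, score
ORDER BY score DESC
```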
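Because the traversal depth ends up in the query text rather than in a parameter, it should be validated before interpolation. A sketch under that assumption; `traversal_pattern`, `depth_decay`, and the `MAX_ALLOWED_DEPTH` cap are hypothetical, while the decay formula is the one used in the query.

```python
# HYPOTHETICAL safety cap; SEARCH_GRAPH_MAX_DEPTH defaults to 2.
MAX_ALLOWED_DEPTH = 4


def traversal_pattern(max_depth: int) -> str:
    """Relationship pattern with a validated variable-length upper bound.

    The value is interpolated into the Cypher text, so reject anything
    that is not a small positive integer.
    """
    if isinstance(max_depth, bool) or not isinstance(max_depth, int):
        raise TypeError("max_depth must be an int")
    if not 1 <= max_depth <= MAX_ALLOWED_DEPTH:
        raise ValueError(f"max_depth must be between 1 and {MAX_ALLOWED_DEPTH}")
    return f"[:MENTIONS|REFERENCES*1..{max_depth}]"


def depth_decay(concept_score: float, depth: int) -> float:
    """Same decay as the query: score / (depth * 0.5 + 1)."""
    return concept_score / (depth * 0.5 + 1)
```

With this decay, a concept match one hop away keeps two thirds of its score and a two-hop match keeps half.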

5. Image Search

Multimodal vector search against image_embedding_index.

- Embed query text (or image) using the system embedding model
- Search ImageEmbedding vectors in the unified multimodal space
- Return with Image descriptions, OCR text, and Item associations from Phase 2B
- Also include images found via concept graph DEPICTS relationships
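A sketch of combining the two image sources, assuming both score sets are already on a comparable 0-1 scale. `ImageHit` and `merge_image_hits` are hypothetical; keeping each image's best score is a simple choice, and running RRF over the two lists would avoid the scale assumption.

```python
from dataclasses import dataclass


@dataclass
class ImageHit:
    """Hypothetical subset of ImageSearchResult used by this sketch."""
    image_uid: str
    score: float
    source: str  # "vector" or "graph"


def merge_image_hits(
    vector_hits: list[ImageHit],
    graph_hits: list[ImageHit],
    limit: int = 10,
) -> list[ImageHit]:
    """Deduplicate by image_uid, keep each image's best-scoring hit, sort descending."""
    best: dict[str, ImageHit] = {}
    for hit in [*vector_hits, *graph_hits]:
        current = best.get(hit.image_uid)
        if current is None or hit.score > current.score:
            best[hit.image_uid] = hit
    return sorted(best.values(), key=lambda h: h.score, reverse=True)[:limit]
```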

6. Candidate Fusion (`library/services/fusion.py`)

Reciprocal Rank Fusion (RRF) — parameter-light, proven in Spelunker.

```python
def reciprocal_rank_fusion(
    result_lists: list[list[SearchCandidate]],
    k: int = 60,
) -> list[SearchCandidate]:
    """
    RRF score = Σ 1 / (k + rank_i), summed over every list i containing the
    candidate, where rank_i is its 1-based position in list i.
    Candidates in multiple lists get boosted.
    """
```

- Deduplicates candidates by chunk_uid
- Candidates appearing in multiple search types get naturally boosted
- Sort by fused score descending, trim to limit
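A runnable sketch of the fusion step, assuming each input list is already sorted best-first and ranks are 1-based. It redefines a trimmed `SearchCandidate` so it runs standalone; counting only a candidate's first appearance within one list and recording merged sources as `"vector+fulltext"` are illustrative choices, not part of the plan.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass
class SearchCandidate:
    """Trimmed to the fields RRF needs; the full dataclass has more."""
    chunk_uid: str
    score: float = 0.0
    source: str = ""


def reciprocal_rank_fusion(
    result_lists: list[list[SearchCandidate]],
    k: int = 60,
    limit: Optional[int] = None,
) -> list[SearchCandidate]:
    """Fuse ranked lists with RRF: score = sum over lists of 1 / (k + rank)."""
    fused: dict[str, float] = {}
    first: dict[str, SearchCandidate] = {}
    sources: dict[str, list[str]] = {}
    for results in result_lists:
        seen_in_list = set()
        rank = 0
        for cand in results:
            uid = cand.chunk_uid
            if uid in seen_in_list:
                continue  # only the best (first) position within one list counts
            seen_in_list.add(uid)
            rank += 1
            fused[uid] = fused.get(uid, 0.0) + 1.0 / (k + rank)
            first.setdefault(uid, cand)
            srcs = sources.setdefault(uid, [])
            if cand.source and cand.source not in srcs:
                srcs.append(cand.source)
    merged = [
        replace(first[uid], score=score, source="+".join(sources[uid]))
        for uid, score in fused.items()
    ]
    # list.sort is stable, so ties keep first-seen order
    merged.sort(key=lambda c: c.score, reverse=True)
    return merged[:limit] if limit is not None else merged
```

For example, a chunk ranked 2nd by vector search and 1st by full-text search scores 1/62 + 1/61 with k = 60, ahead of any chunk found by only one search type.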

7. Re-ranking Client (`library/services/reranker.py`)

Targets Synesis's POST /v1/rerank endpoint. Wraps the system reranker model's API configuration.

Synesis Backend

```python
class RerankerClient:
    def rerank(
        self,
        query: str,
        candidates: list[SearchCandidate],
        instruction: str = "",
        top_n: int | None = None,
        query_image: bytes | None = None,
    ) -> list[SearchCandidate]:
        """
        Re-rank candidates via Synesis /v1/rerank.

        Injects content-type reranker_instruction as the instruction parameter.
        """
```

Features:

- Uses text_preview (500 chars) as the document text, avoiding S3 round-trips
- Passes the library's reranker_instruction as the instruction parameter
- Supports multimodal queries (text + image)
- Falls back gracefully when no reranker model is configured
- Tracks usage via LLMUsage with purpose="reranking"
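A text-only sketch of the request/response round-trip with the graceful fallback. The request body follows the Synesis rerank request example in the Synesis Integration section; the response shape (a `results` list of `index`/`relevance_score` entries) is an assumption, since this plan does not document it, as are the names `synesis_rerank` and `Candidate`. The HTTP call is injectable so tests can mock it.

```python
from dataclasses import dataclass, replace
from typing import Any, Callable, Optional


@dataclass
class Candidate:
    """Hypothetical stand-in for SearchCandidate (only the fields used here)."""
    chunk_uid: str
    text_preview: str
    score: float


def synesis_rerank(
    query: str,
    candidates: list[Candidate],
    *,
    base_url: Optional[str],
    instruction: str = "",
    top_n: Optional[int] = None,
    max_candidates: int = 32,  # mirrors RERANKER_MAX_CANDIDATES
    timeout: float = 30,  # mirrors RERANKER_TIMEOUT, assumed to be seconds
    post: Optional[Callable[..., Any]] = None,
) -> list[Candidate]:
    if not base_url or not candidates:
        # No reranker configured: keep the fused order (graceful fallback)
        return candidates[:top_n] if top_n is not None else list(candidates)
    if post is None:
        import requests  # deferred so the fallback path needs no HTTP client
        post = requests.post
    pool = candidates[:max_candidates]
    payload: dict[str, Any] = {
        "query": {"text": query},
        "documents": [{"text": c.text_preview} for c in pool],
        "top_n": top_n if top_n is not None else len(pool),
    }
    if instruction:
        payload["instruction"] = instruction  # content-type reranker_instruction
    resp = post(f"{base_url.rstrip('/')}/v1/rerank", json=payload, timeout=timeout)
    resp.raise_for_status()
    # ASSUMED response shape: {"results": [{"index": int, "relevance_score": float}]}
    results = resp.json()["results"]
    reranked = [replace(pool[r["index"]], score=float(r["relevance_score"])) for r in results]
    return sorted(reranked, key=lambda c: c.score, reverse=True)
```

Truncating to `max_candidates` before the call bounds reranker latency, which grows with the number of documents scored.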

8. Search API Endpoints

New endpoints in library/api/:

| Method | Route | Purpose |
|---|---|---|
| `POST` | `/api/v1/library/search/` | Full hybrid search + re-rank |
| `POST` | `/api/v1/library/search/vector/` | Vector-only search (debugging) |
| `POST` | `/api/v1/library/search/fulltext/` | Full-text-only search (debugging) |
| `GET` | `/api/v1/library/concepts/` | List/search concepts |
| `GET` | `/api/v1/library/concepts/<uid>/graph/` | Concept neighborhood graph |
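The endpoints themselves would validate input with the serializers in `api/serializers.py`; the framework-free sketch below only illustrates the kind of checks the search endpoints need. `parse_search_payload`, the `only` argument for the debugging endpoints, and `MAX_LIMIT` are hypothetical.

```python
from typing import Any, Optional

VALID_SEARCH_TYPES = ("vector", "fulltext", "graph")
MAX_LIMIT = 100  # HYPOTHETICAL upper bound on results per request


def parse_search_payload(data: dict[str, Any], only: Optional[str] = None) -> dict[str, Any]:
    """Validate a POST body; `only` pins the search type for the debugging endpoints."""
    query = str(data.get("query") or "").strip()
    if not query:
        raise ValueError("query is required")
    if only is not None:
        search_types = [only]
    else:
        raw = data.get("search_types") or VALID_SEARCH_TYPES
        search_types = [raw] if isinstance(raw, str) else list(raw)
    unknown = [t for t in search_types if t not in VALID_SEARCH_TYPES]
    if unknown:
        raise ValueError(f"unknown search_types: {unknown}")
    try:
        limit = int(data.get("limit", 20))
    except (TypeError, ValueError):
        raise ValueError("limit must be an integer") from None
    if not 1 <= limit <= MAX_LIMIT:
        raise ValueError(f"limit must be between 1 and {MAX_LIMIT}")
    return {
        "query": query,
        "search_types": search_types,
        "limit": limit,
        "rerank": bool(data.get("rerank", True)),
        "library_uid": data.get("library_uid"),
        "library_type": data.get("library_type"),
        "collection_uid": data.get("collection_uid"),
    }
```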

9. Search UI Views

| URL | View | Purpose |
|---|---|---|
| `/library/search/` | `search` | Search page with query input + filters |
| `/library/concepts/` | `concept_list` | Browse concepts with search |
| `/library/concepts/<uid>/` | `concept_detail` | Single concept with connections |

10. Prometheus Metrics

| Metric | Type | Labels | Purpose |
|---|---|---|---|
| `mnemosyne_search_requests_total` | Counter | search_type, library_type | Search throughput |
| `mnemosyne_search_duration_seconds` | Histogram | search_type | Per-search-type latency |
| `mnemosyne_search_candidates_total` | Histogram | search_type | Candidates per search type |
| `mnemosyne_fusion_duration_seconds` | Histogram | — | Fusion latency |
| `mnemosyne_rerank_requests_total` | Counter | model_name, status | Re-rank throughput |
| `mnemosyne_rerank_duration_seconds` | Histogram | model_name | Re-rank latency |
| `mnemosyne_rerank_candidates` | Histogram | — | Candidates sent to reranker |
| `mnemosyne_search_total_duration_seconds` | Histogram | — | End-to-end search latency |

11. Management Commands

| Command | Purpose |
|---|---|
| `search <query> [--library-uid] [--limit] [--no-rerank]` | CLI search for testing |
| `search_stats` | Search index statistics |

12. Settings

```python
# Search configuration
SEARCH_VECTOR_TOP_K = env.int("SEARCH_VECTOR_TOP_K", default=50)
SEARCH_FULLTEXT_TOP_K = env.int("SEARCH_FULLTEXT_TOP_K", default=30)
SEARCH_GRAPH_MAX_DEPTH = env.int("SEARCH_GRAPH_MAX_DEPTH", default=2)
SEARCH_RRF_K = env.int("SEARCH_RRF_K", default=60)
SEARCH_DEFAULT_LIMIT = env.int("SEARCH_DEFAULT_LIMIT", default=20)
RERANKER_MAX_CANDIDATES = env.int("RERANKER_MAX_CANDIDATES", default=32)
RERANKER_TIMEOUT = env.int("RERANKER_TIMEOUT", default=30)
```

File Structure

```text
mnemosyne/library/
├── services/
│   ├── search.py              # NEW — SearchService orchestrator
│   ├── fusion.py              # NEW — Reciprocal Rank Fusion
│   ├── reranker.py            # NEW — Synesis re-ranking client
│   └── ...                    # Existing services unchanged
├── metrics.py                 # Modified — add search/rerank metrics
├── views.py                   # Modified — add search UI views
├── urls.py                    # Modified — add search routes
├── api/
│   ├── views.py               # Modified — add search API endpoints
│   ├── serializers.py         # Modified — add search serializers
│   └── urls.py                # Modified — add search API routes
├── management/commands/
│   ├── search.py              # NEW — CLI search command
│   └── search_stats.py        # NEW — Index statistics
├── templates/library/
│   ├── search.html            # NEW — Search page
│   ├── concept_list.html      # NEW — Concept browser
│   └── concept_detail.html    # NEW — Concept detail
└── tests/
    ├── test_search.py         # NEW — Search service tests
    ├── test_fusion.py         # NEW — RRF fusion tests
    ├── test_reranker.py       # NEW — Re-ranking client tests
    └── test_search_api.py     # NEW — Search API endpoint tests
```

Dependencies

No new Python dependencies required. Phase 3 uses:

- neomodel + raw Cypher (Neo4j search)
- requests (Synesis reranker HTTP)
- EmbeddingClient from Phase 2 (query embedding)
- prometheus_client (metrics)

Testing Strategy

All tests use Django TestCase. External services are mocked.

| Test File | Scope |
|---|---|
| `test_search.py` | SearchService orchestration, individual search methods, library/collection scoping |
| `test_fusion.py` | RRF correctness, deduplication, score calculation, edge cases |
| `test_reranker.py` | Synesis backend (mocked HTTP), instruction injection, graceful fallback |
| `test_search_api.py` | API endpoints, request validation, response format |

Success Criteria

- Vector search returns Chunk nodes ranked by cosine similarity from Neo4j
- Full-text search returns matches from Neo4j fulltext indexes
- Graph search traverses Concept relationships to discover related content
- Image search returns images via multimodal vector similarity
- Reciprocal Rank Fusion correctly merges and deduplicates across search types
- Re-ranking via Synesis /v1/rerank re-scores candidates with cross-attention
- Content-type reranker_instruction is injected per library type
- Search scoping works (by library, library type, collection)
- Search degrades gracefully: no reranker → skip; no embedding model → clear error
- Search API endpoints return structured results with scores and metadata
- Search UI allows querying with filters and displays ranked results
- Concept explorer allows browsing the knowledge graph
- Prometheus metrics track search throughput, latency, and candidate counts
- CLI search command works for testing
- All tests pass with mocked external services
