mnemosyne/docs/PHASE_3_SEARCH_AND_RERANKING.md
Robert Helewka 634845fee0 feat: add Phase 3 hybrid search with Synesis reranking
Implement hybrid search pipeline combining vector, fulltext, and graph
search across Neo4j, with cross-attention reranking via Synesis
(Qwen3-VL-Reranker-2B) `/v1/rerank` endpoint.

- Add SearchService with vector, fulltext, and graph search strategies
- Add SynesisRerankerClient for multimodal reranking via HTTP API
- Add search API endpoint (POST /search/) with filtering by library,
  collection, and library_type
- Add SearchRequest/Response serializers and image search results
- Add "nonfiction" to library_type choices
- Consolidate reranker stack from two models to single Synesis service
- Handle image analysis_status as "skipped" when analysis is unavailable
- Add comprehensive tests for search pipeline and reranker client
2026-03-29 18:09:50 +00:00


Phase 3: Search & Re-ranking

Objective

Build the complete hybrid search pipeline: accept a query → embed it → search Neo4j (vector + full-text + graph traversal) → fuse candidates → re-rank via Synesis → return ranked results with content-type context. At the end of this phase, content is discoverable through multiple search modalities, ranked by cross-attention relevance, and ready for Phase 4's RAG generation.

Heritage

The hybrid search architecture adapts patterns from Spelunker's two-stage retrieval pipeline — vector recall + cross-attention re-ranking — enhanced with knowledge graph traversal, multimodal search, and content-type-aware re-ranking instructions.

Architecture Overview

User Query (text, optional image, optional filters)
  │
  ├─→ Vector Search (Neo4j vector index — Chunk.embedding)
  │     → Top-K nearest neighbors by cosine similarity
  │
  ├─→ Full-Text Search (Neo4j fulltext index — Chunk.text_preview, Concept.name)
  │     → BM25-scored matches
  │
  ├─→ Graph Search (Cypher traversal)
  │     → Concept-linked chunks via MENTIONS/REFERENCES/DEPICTS edges
  │
  └─→ Image Search (Neo4j vector index — ImageEmbedding.embedding)
        → Multimodal similarity (text-to-image in unified vector space)
          │
          └─→ Candidate Fusion (Reciprocal Rank Fusion)
                → Deduplicated, scored candidate list
                  │
                  └─→ Re-ranking (Synesis /v1/rerank)
                        → Content-type-aware instruction injection
                        → Cross-attention precision scoring
                          │
                          └─→ Final ranked results with metadata

Synesis Integration

Synesis is a custom FastAPI service built around Qwen3-VL-2B, providing both embedding and re-ranking over a clean REST API. It runs on pan.helu.ca:8400.

Embedding (Phase 2, already working): Synesis's /v1/embeddings endpoint is OpenAI-compatible — the existing EmbeddingClient handles it with api_type="openai".

Re-ranking (Phase 3, new): Synesis's /v1/rerank endpoint provides:

  • Native instruction parameter — maps directly to reranker_instruction from content types
  • top_n for server-side truncation
  • Multimodal support — both query and documents can include images
  • Relevance scores for each candidate

# Synesis rerank request
POST http://pan.helu.ca:8400/v1/rerank
{
    "query": {"text": "How do I configure a 3-phase motor?"},
    "documents": [
        {"text": "The motor controller requires..."},
        {"text": "3-phase power is distributed..."}
    ],
    "instruction": "Re-rank passages from technical documentation based on procedural relevance.",
    "top_n": 10
}
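
A minimal Python sketch of assembling the request body shown above. The helper names `build_rerank_payload` and `post_rerank` are hypothetical, and omitting `instruction` when it is empty is an assumption. The sketch uses the standard library's `urllib` so it runs without extra packages (the project itself lists `requests`), and it returns the decoded JSON unchanged because the response shape is not specified in this document.

```python
import json
import urllib.request

SYNESIS_RERANK_URL = "http://pan.helu.ca:8400/v1/rerank"


def build_rerank_payload(query, documents, instruction="", top_n=None):
    """Build a /v1/rerank body matching the request example above."""
    payload = {
        "query": {"text": query},
        "documents": [{"text": text} for text in documents],
    }
    # Assumption: optional fields are left out rather than sent empty.
    if instruction:
        payload["instruction"] = instruction
    if top_n is not None:
        payload["top_n"] = top_n
    return payload


def post_rerank(payload, url=SYNESIS_RERANK_URL, timeout=30):
    """POST the payload and return the decoded JSON response unchanged."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping payload construction separate from the HTTP call makes the body easy to check in tests without a running Synesis instance.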

Deliverables

1. Search Service (library/services/search.py)

The core search orchestrator. Accepts a SearchRequest, dispatches to individual search backends, fuses results, and optionally re-ranks.

SearchRequest

from dataclasses import dataclass, field

@dataclass
class SearchRequest:
    query: str                           # Natural language query text
    query_image: bytes | None = None     # Optional image for multimodal search
    library_uid: str | None = None       # Scope to specific library
    library_type: str | None = None      # Scope to library type
    collection_uid: str | None = None    # Scope to specific collection
    # A field without a default cannot follow defaulted fields, and a list
    # default must be built per instance, hence default_factory.
    search_types: list[str] = field(
        default_factory=lambda: ["vector", "fulltext", "graph"]
    )
    limit: int = 20                      # Max results after fusion
    vector_top_k: int = 50               # Candidates from vector search
    fulltext_top_k: int = 30             # Candidates from fulltext search
    graph_max_depth: int = 2             # Graph traversal depth
    rerank: bool = True                  # Apply re-ranking
    include_images: bool = True          # Include image results

SearchResponse

@dataclass
class SearchCandidate:
    chunk_uid: str
    item_uid: str
    item_title: str
    library_type: str
    text_preview: str
    chunk_s3_key: str
    chunk_index: int
    score: float                         # Final score (post-fusion or post-rerank)
    source: str                          # "vector", "fulltext", "graph"
    metadata: dict                       # Page, section, nearby images, etc.

@dataclass
class ImageSearchResult:
    image_uid: str
    item_uid: str
    item_title: str
    image_type: str
    description: str
    s3_key: str
    score: float
    source: str                          # "vector", "graph"

@dataclass
class SearchResponse:
    query: str
    candidates: list[SearchCandidate]    # Ranked text results
    images: list[ImageSearchResult]      # Ranked image results
    total_candidates: int                # Pre-fusion candidate count
    search_time_ms: float
    reranker_used: bool
    reranker_model: str | None
    search_types_used: list[str]

2. Vector Search

Uses Neo4j's db.index.vector.queryNodes() against chunk_embedding_index.

  • Embed query text using system embedding model (via existing EmbeddingClient)
  • Prepend library's embedding_instruction when scoped to a specific library
  • Query Neo4j vector index for top-K Chunk nodes by cosine similarity
  • Filter by library/collection via graph pattern matching

CALL db.index.vector.queryNodes('chunk_embedding_index', $top_k, $query_vector)
YIELD node AS chunk, score
MATCH (item:Item)-[:HAS_CHUNK]->(chunk)
OPTIONAL MATCH (lib:Library)-[:CONTAINS]->(col:Collection)-[:CONTAINS]->(item)
// Filter in WITH ... WHERE: a WHERE attached directly to OPTIONAL MATCH
// only nulls out lib/col and does not drop out-of-scope chunks.
WITH chunk, score, item, lib, col
WHERE ($library_uid IS NULL OR lib.uid = $library_uid)
  AND ($library_type IS NULL OR lib.library_type = $library_type)
  AND ($collection_uid IS NULL OR col.uid = $collection_uid)
RETURN chunk.uid AS chunk_uid, chunk.text_preview AS text_preview,
       chunk.chunk_s3_key AS chunk_s3_key, chunk.chunk_index AS chunk_index,
       item.uid AS item_uid, item.title AS item_title,
       lib.library_type AS library_type, score
ORDER BY score DESC
LIMIT $top_k
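
To illustrate the ranking that the vector index performs, here is a brute-force Python sketch of top-K selection by cosine similarity. The function names and the in-memory `chunk_vectors` mapping are hypothetical. The real index is approximate and reports its own similarity score, so treat this only as an illustration of the ordering, not of the values Neo4j returns.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # A zero vector has no direction; treat it as unrelated.
    return dot / (norm_a * norm_b)


def top_k_by_cosine(query_vector, chunk_vectors, top_k):
    """Return (chunk_uid, similarity) pairs, best first, at most top_k long."""
    scored = [
        (uid, cosine_similarity(query_vector, vector))
        for uid, vector in chunk_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

Because scope filters are applied after the index call, a scoped query can return fewer than `top_k` rows; requesting a larger K from the index than the final limit is one way to compensate.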

3. Full-Text Search

Uses Neo4j fulltext indexes created by setup_neo4j_indexes.

  • Query chunk_text_fulltext for Chunk matches (BM25)
  • Query concept_name_fulltext for Concept matches → traverse to connected Chunks
  • Query item_title_fulltext for Item title matches → get their Chunks
  • Normalize BM25 scores to 0-1 range for fusion compatibility

// Chunk full-text search
CALL db.index.fulltext.queryNodes('chunk_text_fulltext', $query)
YIELD node AS chunk, score
MATCH (item:Item)-[:HAS_CHUNK]->(chunk)
OPTIONAL MATCH (lib:Library)-[:CONTAINS]->(col:Collection)-[:CONTAINS]->(item)
// Filter in WITH ... WHERE so out-of-scope chunks are actually dropped.
WITH chunk, score, item, lib
WHERE ($library_uid IS NULL OR lib.uid = $library_uid)
RETURN chunk.uid AS chunk_uid, chunk.text_preview AS text_preview,
       item.uid AS item_uid, item.title AS item_title,
       lib.library_type AS library_type, score
ORDER BY score DESC
LIMIT $top_k

// Concept-to-Chunk traversal
CALL db.index.fulltext.queryNodes('concept_name_fulltext', $query)
YIELD node AS concept, score AS concept_score
MATCH (chunk:Chunk)-[:MENTIONS]->(concept)
MATCH (item:Item)-[:HAS_CHUNK]->(chunk)
RETURN chunk.uid AS chunk_uid, chunk.text_preview AS text_preview,
       item.uid AS item_uid, item.title AS item_title,
       concept_score * 0.8 AS score
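
The list above calls for normalizing BM25 scores to the 0-1 range but does not say how. One simple option, assumed here purely for illustration, is to divide each score by the highest score in the same result list; `normalize_scores` is a hypothetical helper name.

```python
def normalize_scores(scores):
    """Scale non-negative scores into [0, 1] by dividing by the maximum."""
    if not scores:
        return []
    top = max(scores)
    if top <= 0:
        # Nothing matched meaningfully; avoid dividing by zero.
        return [0.0 for _ in scores]
    return [score / top for score in scores]
```

With this choice the best full-text hit always scores 1.0, so values are comparable within one query but not across queries. Since Reciprocal Rank Fusion uses ranks only, the normalized values matter mainly when results are inspected or used outside fusion.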

4. Graph Search

Knowledge-graph-powered discovery: the differentiator from standard RAG.

  • Match query terms against Concept names via fulltext index
  • Traverse Concept ←[MENTIONS]- Chunk ←[HAS_CHUNK]- Item
  • Expand via Concept -[RELATED_TO]- Concept for secondary connections
  • Score based on relationship weight and traversal depth

// Concept graph traversal
CALL db.index.fulltext.queryNodes('concept_name_fulltext', $query)
YIELD node AS concept, score
MATCH path = (concept)<-[:MENTIONS|REFERENCES*1..2]-(connected)
WHERE connected:Chunk OR connected:Item
WITH concept, connected, score, length(path) AS depth
MATCH (item:Item)-[:HAS_CHUNK]->(chunk)
WHERE chunk = connected OR item = connected
RETURN DISTINCT chunk.uid AS chunk_uid, chunk.text_preview AS text_preview,
       item.uid AS item_uid, item.title AS item_title,
       score / (depth * 0.5 + 1) AS score
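
The scoring expression in the query above, `score / (depth * 0.5 + 1)`, discounts a concept match by how many hops separate it from the returned chunk. A small Python restatement of that arithmetic (the function name `depth_decayed_score` is hypothetical):

```python
def depth_decayed_score(score, depth):
    """Apply the query's depth discount: score / (depth * 0.5 + 1)."""
    if depth < 1:
        raise ValueError("depth counts traversed relationships and starts at 1")
    return score / (depth * 0.5 + 1)
```

A direct mention (depth 1) keeps two thirds of the concept score and a two-hop connection keeps half, so closer chunks outrank more distant ones for the same concept match.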

5. Image Search

Multimodal vector search against image_embedding_index.

  • Embed query text (or image) using system embedding model
  • Search ImageEmbedding vectors in unified multimodal space
  • Return with Image descriptions, OCR text, and Item associations from Phase 2B
  • Also include images found via concept graph DEPICTS relationships

6. Candidate Fusion (library/services/fusion.py)

Reciprocal Rank Fusion (RRF) — parameter-light, proven in Spelunker.

def reciprocal_rank_fusion(
    result_lists: list[list[SearchCandidate]],
    k: int = 60,
) -> list[SearchCandidate]:
    """
    RRF score = Σ 1 / (k + rank_i) for each list containing the candidate.
    Candidates in multiple lists get boosted.
    """

  • Deduplicates candidates by chunk_uid
  • Candidates appearing in multiple search types get naturally boosted
  • Sort by fused score descending, trim to limit
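
The signature and bullets above can be filled in roughly as follows. This is a sketch, not the project's implementation: `Candidate` is a reduced, hypothetical stand-in for SearchCandidate, the `limit` parameter is added for illustration, and the code assumes each input list is already sorted best-first, uses 1-based ranks, and contains a given chunk at most once.

```python
from dataclasses import dataclass, replace


@dataclass
class Candidate:
    """Reduced stand-in for SearchCandidate (hypothetical, for illustration)."""
    chunk_uid: str
    score: float = 0.0
    source: str = ""


def reciprocal_rank_fusion(result_lists, k=60, limit=None):
    """Fuse ranked lists: score = sum of 1 / (k + rank) over containing lists."""
    fused_scores = {}
    first_seen = {}
    for results in result_lists:
        for rank, candidate in enumerate(results, start=1):
            uid = candidate.chunk_uid
            fused_scores[uid] = fused_scores.get(uid, 0.0) + 1.0 / (k + rank)
            # Deduplicate by chunk_uid, keeping the first copy's metadata.
            first_seen.setdefault(uid, candidate)
    fused = [
        replace(first_seen[uid], score=score)
        for uid, score in fused_scores.items()
    ]
    fused.sort(key=lambda candidate: candidate.score, reverse=True)
    return fused if limit is None else fused[:limit]
```

Only ranks enter the formula, so the raw cosine, BM25, and graph scores never need to share a scale. A chunk found by two search types collects two terms and rises above chunks found by one.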

7. Re-ranking Client (library/services/reranker.py)

Targets Synesis's POST /v1/rerank endpoint. Wraps the system reranker model's API configuration.

Synesis Backend

class RerankerClient:
    def rerank(
        self,
        query: str,
        candidates: list[SearchCandidate],
        instruction: str = "",
        top_n: int | None = None,
        query_image: bytes | None = None,
    ) -> list[SearchCandidate]:
        """
        Re-rank candidates via Synesis /v1/rerank.
        
        Injects content-type reranker_instruction as the instruction parameter.
        """

Features:

  • Uses text_preview (500 chars) for document text — avoids S3 round-trips
  • Passes the library's reranker_instruction as the instruction parameter
  • Supports multimodal queries (text + image)
  • Falls back gracefully when no reranker model configured
  • Tracks usage via LLMUsage with purpose="reranking"
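
Two of the behaviours listed above, bounded document text and graceful fallback, can be sketched in plain Python. Everything named here is hypothetical: `prepare_documents`, `rerank_with_fallback`, and the `rerank_fn` callable, which is assumed to return one relevance score per candidate in input order. Falling back when the call raises, rather than only when no model is configured, is also an assumption.

```python
def prepare_documents(previews, max_candidates=32, preview_chars=500):
    """Cap the candidate count and truncate each preview for the rerank call."""
    return [{"text": text[:preview_chars]} for text in previews[:max_candidates]]


def rerank_with_fallback(candidates, rerank_fn=None):
    """Return (ordered_candidates, reranker_used).

    Without a reranker, or when the call fails, the fused order is kept.
    """
    if rerank_fn is None:
        return list(candidates), False
    try:
        scores = rerank_fn(candidates)
    except Exception:
        return list(candidates), False
    if len(scores) != len(candidates):
        # A mismatched response cannot be mapped back to candidates.
        return list(candidates), False
    order = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    return [candidates[i] for i in order], True
```

The boolean in the return value lines up with the `reranker_used` field on SearchResponse, so callers can report whether results are in fused or re-ranked order.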

8. Search API Endpoints

New endpoints in library/api/:

| Method | Route | Purpose |
|--------|-------|---------|
| POST | /api/v1/library/search/ | Full hybrid search + re-rank |
| POST | /api/v1/library/search/vector/ | Vector-only search (debugging) |
| POST | /api/v1/library/search/fulltext/ | Full-text-only search (debugging) |
| GET | /api/v1/library/concepts/ | List/search concepts |
| GET | /api/v1/library/concepts/<uid>/graph/ | Concept neighborhood graph |

9. Search UI Views

| URL | View | Purpose |
|-----|------|---------|
| /library/search/ | search | Search page with query input + filters |
| /library/concepts/ | concept_list | Browse concepts with search |
| /library/concepts/<uid>/ | concept_detail | Single concept with connections |

10. Prometheus Metrics

| Metric | Type | Labels | Purpose |
|--------|------|--------|---------|
| mnemosyne_search_requests_total | Counter | search_type, library_type | Search throughput |
| mnemosyne_search_duration_seconds | Histogram | search_type | Per-search-type latency |
| mnemosyne_search_candidates_total | Histogram | search_type | Candidates per search type |
| mnemosyne_fusion_duration_seconds | Histogram | (none) | Fusion latency |
| mnemosyne_rerank_requests_total | Counter | model_name, status | Re-rank throughput |
| mnemosyne_rerank_duration_seconds | Histogram | model_name | Re-rank latency |
| mnemosyne_rerank_candidates | Histogram | (none) | Candidates sent to reranker |
| mnemosyne_search_total_duration_seconds | Histogram | (none) | End-to-end search latency |

11. Management Commands

| Command | Purpose |
|---------|---------|
| search <query> [--library-uid] [--limit] [--no-rerank] | CLI search for testing |
| search_stats | Search index statistics |

12. Settings

# Search configuration
SEARCH_VECTOR_TOP_K = env.int("SEARCH_VECTOR_TOP_K", default=50)
SEARCH_FULLTEXT_TOP_K = env.int("SEARCH_FULLTEXT_TOP_K", default=30)
SEARCH_GRAPH_MAX_DEPTH = env.int("SEARCH_GRAPH_MAX_DEPTH", default=2)
SEARCH_RRF_K = env.int("SEARCH_RRF_K", default=60)
SEARCH_DEFAULT_LIMIT = env.int("SEARCH_DEFAULT_LIMIT", default=20)
RERANKER_MAX_CANDIDATES = env.int("RERANKER_MAX_CANDIDATES", default=32)
RERANKER_TIMEOUT = env.int("RERANKER_TIMEOUT", default=30)

File Structure

mnemosyne/library/
├── services/
│   ├── search.py              # NEW — SearchService orchestrator
│   ├── fusion.py              # NEW — Reciprocal Rank Fusion
│   ├── reranker.py            # NEW — Synesis re-ranking client
│   └── ...                    # Existing services unchanged
├── metrics.py                 # Modified — add search/rerank metrics
├── views.py                   # Modified — add search UI views
├── urls.py                    # Modified — add search routes
├── api/
│   ├── views.py               # Modified — add search API endpoints
│   ├── serializers.py         # Modified — add search serializers
│   └── urls.py                # Modified — add search API routes
├── management/commands/
│   ├── search.py              # NEW — CLI search command
│   └── search_stats.py        # NEW — Index statistics
├── templates/library/
│   ├── search.html            # NEW — Search page
│   ├── concept_list.html      # NEW — Concept browser
│   └── concept_detail.html    # NEW — Concept detail
└── tests/
    ├── test_search.py         # NEW — Search service tests
    ├── test_fusion.py         # NEW — RRF fusion tests
    ├── test_reranker.py       # NEW — Re-ranking client tests
    └── test_search_api.py     # NEW — Search API endpoint tests

Dependencies

No new Python dependencies required. Phase 3 uses:

  • neomodel + raw Cypher (Neo4j search)
  • requests (Synesis reranker HTTP)
  • EmbeddingClient from Phase 2 (query embedding)
  • prometheus_client (metrics)

Testing Strategy

All tests use Django TestCase. External services are mocked.

| Test File | Scope |
|-----------|-------|
| test_search.py | SearchService orchestration, individual search methods, library/collection scoping |
| test_fusion.py | RRF correctness, deduplication, score calculation, edge cases |
| test_reranker.py | Synesis backend (mocked HTTP), instruction injection, graceful fallback |
| test_search_api.py | API endpoints, request validation, response format |
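
As an illustration of the mocked-HTTP style described for test_reranker.py, the sketch below checks instruction injection and fallback against a stand-in client. It uses plain `unittest.TestCase` so it runs outside a Django project (Django's TestCase builds on it). `rerank_texts` and its `http_post` argument are hypothetical, not the project's RerankerClient API.

```python
import unittest
from unittest import mock


def rerank_texts(http_post, query, texts, instruction=""):
    """Hypothetical client function; http_post stands in for the HTTP call."""
    payload = {
        "query": {"text": query},
        "documents": [{"text": text} for text in texts],
        "instruction": instruction,
    }
    try:
        return http_post("/v1/rerank", payload)
    except ConnectionError:
        return None  # Caller keeps the fused order when Synesis is unreachable.


class RerankClientTests(unittest.TestCase):
    def test_instruction_is_injected(self):
        http_post = mock.Mock(return_value={"ok": True})
        rerank_texts(http_post, "q", ["doc"], instruction="Rank by relevance.")
        path, payload = http_post.call_args.args
        self.assertEqual(path, "/v1/rerank")
        self.assertEqual(payload["instruction"], "Rank by relevance.")

    def test_connection_error_falls_back(self):
        http_post = mock.Mock(side_effect=ConnectionError("down"))
        self.assertIsNone(rerank_texts(http_post, "q", ["doc"]))
```

Injecting the HTTP callable keeps the tests free of network access; patching the HTTP library at its import site with `mock.patch` would achieve the same isolation.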

Success Criteria

  • Vector search returns Chunk nodes ranked by cosine similarity from Neo4j
  • Full-text search returns matches from Neo4j fulltext indexes
  • Graph search traverses Concept relationships to discover related content
  • Image search returns images via multimodal vector similarity
  • Reciprocal Rank Fusion correctly merges and deduplicates across search types
  • Re-ranking via Synesis /v1/rerank re-scores candidates with cross-attention
  • Content-type reranker_instruction injected per library type
  • Search scoping works (by library, library type, collection)
  • Search gracefully degrades: no reranker → skip; no embedding model → clear error
  • Search API endpoints return structured results with scores and metadata
  • Search UI allows querying with filters and displays ranked results
  • Concept explorer allows browsing the knowledge graph
  • Prometheus metrics track search throughput, latency, and candidate counts
  • CLI search command works for testing
  • All tests pass with mocked external services