fix(search): require library match and preserve raw scores for RRF
Replace OPTIONAL MATCH with MATCH for Library-Collection-Item paths to ensure results are properly scoped to libraries, and remove per-query score normalization since RRF fuses results by rank rather than score magnitude.
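For context on the second half of the message, here is a minimal sketch of Reciprocal Rank Fusion. It assumes the conventional smoothing constant `k = 60` and uses hypothetical chunk IDs; `rrf_fuse` and `rank_by_score` are illustrative names, and the project's actual fusion code is not part of this commit. It shows why dividing each result list by its maximum score is unnecessary: the fused order depends only on positions.

```python
from collections import defaultdict


def rank_by_score(scored):
    """Return IDs ordered by descending score (ties broken by ID)."""
    return [uid for uid, _ in sorted(scored.items(), key=lambda kv: (-kv[1], kv[0]))]


def rrf_fuse(ranked_lists, k=60):
    """Fuse several rankings: each list contributes 1 / (k + rank) per ID."""
    fused = defaultdict(float)
    for ranking in ranked_lists:
        for rank, uid in enumerate(ranking, start=1):
            fused[uid] += 1.0 / (k + rank)
    # Highest fused score first; ID as a deterministic tie-breaker.
    return sorted(fused, key=lambda uid: (-fused[uid], uid))


# Hypothetical raw BM25 scores and their max-normalized equivalents.
raw_fulltext = {"c2": 12.4, "c3": 3.1}
normalized_fulltext = {uid: s / max(raw_fulltext.values()) for uid, s in raw_fulltext.items()}

# Normalization rescales scores but leaves the ranking untouched.
same_ranking = rank_by_score(raw_fulltext) == rank_by_score(normalized_fulltext)

vector_ranking = ["c2", "c1", "c3"]  # hypothetical vector-search order
fused_order = rrf_fuse([vector_ranking, rank_by_score(raw_fulltext)])
```

Because only ranks enter the sum, raw BM25 scores and cosine similarities can be fused without being brought onto a common scale first.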
@@ -1,306 +0,0 @@
|
||||
## 🐾 Red Panda Approval™
|
||||
|
||||
This project follows Red Panda Approval standards — our gold standard for Django application quality. Code must be elegant, reliable, and maintainable to earn the approval of our adorable red panda judges.
|
||||
|
||||
### The 5 Sacred Django Criteria
|
||||
1. **Fresh Migration Test** — Clean migrations from empty database
|
||||
2. **Elegant Simplicity** — No unnecessary complexity
|
||||
3. **Observable & Debuggable** — Proper logging and error handling
|
||||
4. **Consistent Patterns** — Follow Django conventions
|
||||
5. **Actually Works** — Passes all checks and serves real user needs
|
||||
|
||||
## Environment Standards
|
||||
- Virtual environment: ~/env/PROJECT/bin/activate
|
||||
- Use pyproject.toml for project configuration (no setup.py, no requirements.txt)
|
||||
- Python version: specified in pyproject.toml
|
||||
- Dependencies: floor-pinned with ceiling (e.g. `Django>=5.2,<6.0`)
|
||||
|
||||
### Dependency Pinning
|
||||
|
||||
```toml
|
||||
# Correct — floor pin with ceiling
|
||||
dependencies = [
|
||||
"Django>=5.2,<6.0",
|
||||
"djangorestframework>=3.14,<4.0",
|
||||
"cryptography>=41.0,<45.0",
|
||||
]
|
||||
|
||||
# Wrong — exact pins in library packages
|
||||
dependencies = [
|
||||
"Django==5.2.7", # too strict, breaks downstream
|
||||
]
|
||||
```
|
||||
|
||||
Exact pins (`==`) are only appropriate in application-level lock files, not in reusable library packages.
|
||||
|
||||
## Directory Structure
|
||||
myproject/                      # Git repository root
├── .gitignore
├── README.md
├── pyproject.toml              # Project configuration (moved to repo root)
├── docker-compose.yml
├── .env                        # Docker Compose environment (DATABASE_URL=postgres://...)
├── .env.example
│
├── project/                    # Django project root (manage.py lives here)
│   ├── manage.py
│   ├── Dockerfile
│   ├── .env                    # Local development environment (DATABASE_URL=sqlite:///...)
│   ├── .env.example
│   │
│   ├── config/                 # Django configuration module
│   │   ├── __init__.py
│   │   ├── settings.py
│   │   ├── urls.py
│   │   ├── wsgi.py
│   │   └── asgi.py
│   │
│   ├── accounts/               # Django app
│   │   ├── __init__.py
│   │   ├── models.py
│   │   ├── views.py
│   │   └── urls.py
│   │
│   ├── blog/                   # Django app
│   │   ├── __init__.py
│   │   ├── models.py
│   │   ├── views.py
│   │   └── urls.py
│   │
│   ├── static/
│   │   ├── css/
│   │   └── js/
│   │
│   └── templates/
│       └── base.html
│
├── web/                        # Nginx configuration
│   └── nginx.conf
│
├── db/                         # PostgreSQL configuration
│   └── postgresql.conf
│
└── docs/                       # Project documentation
    └── index.md
|
||||
|
||||
## Settings Structure
|
||||
- Use a single settings.py file
|
||||
- Use django-environ or python-dotenv for environment variables
|
||||
- Never commit .env files to version control
|
||||
- Provide .env.example with all required variables documented
|
||||
- Create .gitignore file
|
||||
- Create a .dockerignore file
|
||||
|
||||
## Code Organization
|
||||
- Imports: PEP 8 ordering (stdlib, third-party, local)
|
||||
- Type hints on function parameters
|
||||
- CSS: External .css files only (no inline styles, no embedded `<style>` tags)
|
||||
- JS: External .js files only (no inline handlers, no embedded `<script>` blocks)
|
||||
- Maximum file length: 1000 lines
|
||||
- If a file exceeds 500 lines, consider splitting by domain concept
|
||||
|
||||
## Database Conventions
|
||||
- Migrations run cleanly from empty database
|
||||
- Never edit deployed migrations
|
||||
- Use meaningful migration names: --name add_email_to_profile
|
||||
- One logical change per migration when possible
|
||||
- Test migrations both forward and backward
|
||||
|
||||
### Development vs Production
|
||||
- Development: SQLite
|
||||
- Production: PostgreSQL
|
||||
|
||||
## Caching
|
||||
- Expensive queries are cached
|
||||
- Cache keys follow naming convention
|
||||
- TTLs are appropriate (not infinite)
|
||||
- Invalidation is documented
|
||||
- Key Naming Pattern: {app}:{model}:{identifier}:{field}
|
||||
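A small sketch of the key naming pattern above. `make_cache_key` is a hypothetical helper, not part of Django or of this project, and the rule of rejecting colons inside a segment is an assumption added so that keys stay unambiguous.

```python
def make_cache_key(app, model, identifier, field):
    """Build a key of the form {app}:{model}:{identifier}:{field}."""
    parts = [str(app), str(model), str(identifier), str(field)]
    for part in parts:
        # Assumed rule: empty segments or embedded colons would make the
        # key ambiguous, so reject them early.
        if not part or ":" in part:
            raise ValueError(f"invalid cache key segment: {part!r}")
    return ":".join(parts)


example_key = make_cache_key("blog", "post", 42, "view_count")
```

Building keys in one place keeps the convention enforceable and makes invalidation easier to document.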
|
||||
## Model Naming
|
||||
- Model names: singular PascalCase (User, BlogPost, OrderItem)
|
||||
- Correct English pluralization on related names
|
||||
- All models have created_at and updated_at
|
||||
- All models define __str__ and get_absolute_url
|
||||
- TextChoices used for status fields
|
||||
- related_name defined on ForeignKey fields
|
||||
- Related names: plural snake_case with proper English pluralization
|
||||
|
||||
## Forms
|
||||
- Use ModelForm with explicit fields list (never __all__)
|
||||
|
||||
## Field Naming
|
||||
- Foreign keys: singular without _id suffix (author, category, parent)
|
||||
- Boolean fields: use prefixes (is_active, has_permission, can_edit)
|
||||
- Date fields: use suffixes (created_at, updated_at, published_on)
|
||||
- Avoid abbreviations (use description, not desc)
|
||||
|
||||
## Required Model Fields
|
||||
- All models should include:
  - created_at = models.DateTimeField(auto_now_add=True)
  - updated_at = models.DateTimeField(auto_now=True)
- Consider adding:
  - id = models.UUIDField(primary_key=True) for public-facing models
  - is_active = models.BooleanField(default=True) for soft deletes
|
||||
|
||||
## Indexing
|
||||
- Add db_index=True to frequently queried fields
|
||||
- Use Meta.indexes for composite indexes
|
||||
- Document why each index exists
|
||||
|
||||
## Queries
|
||||
- Use select_related() for foreign keys
|
||||
- Use prefetch_related() for reverse relations and M2M
|
||||
- Avoid queries in loops (N+1 problem)
|
||||
- Use .only() and .defer() for large models
|
||||
- Add comments explaining complex querysets
|
||||
|
||||
## Docstrings
|
||||
- Use Sphinx style docstrings
|
||||
- Document all public functions, classes, and modules
|
||||
- Skip docstrings for obvious one-liners and standard Django overrides
|
||||
|
||||
## Views
|
||||
- Use Function-Based Views (FBVs) exclusively
|
||||
- Explicit logic is preferred over implicit inheritance
|
||||
- Extract shared logic into utility functions
|
||||
|
||||
## URLs & Identifiers
|
||||
|
||||
- Public URLs use short UUIDs (12 characters) via `shortuuid`
|
||||
- Never expose sequential IDs in URLs (security/enumeration risk)
|
||||
- Internal references may use standard UUIDs or PKs
|
||||
|
||||
## URL Patterns
|
||||
- Resource-based URLs (RESTful style)
|
||||
- Namespaced URL names per app
|
||||
- Trailing slashes (Django default)
|
||||
- Flat structure preferred over deep nesting
|
||||
|
||||
## Background Tasks
|
||||
- All tasks are run synchronously unless the design specifies background tasks are needed for long operations
|
||||
- Long operations use Celery tasks
|
||||
- Track task progress in Memcached under the key pattern {app}:task:{task_id}:progress
|
||||
- Tasks are idempotent
|
||||
- Tasks include retry logic
|
||||
- Tasks live in app/tasks.py
|
||||
- RabbitMQ is the Message Broker
|
||||
- Flower Monitoring: Use for debugging failed tasks
|
||||
|
||||
## Testing
|
||||
- Framework: Django TestCase (not pytest)
|
||||
- Separate test files per module: test_models.py, test_views.py, test_forms.py
|
||||
|
||||
## Frontend Standards
|
||||
|
||||
### New Projects (DaisyUI + Tailwind)
|
||||
- DaisyUI 4 via CDN for component classes
|
||||
- Tailwind CSS via CDN for utility classes
|
||||
- Theme management via Themis (DaisyUI `data-theme` attribute)
|
||||
- All apps extend `themis/base.html` for consistent navigation
|
||||
- No inline styles or scripts
|
||||
|
||||
### Existing Projects (Bootstrap 5)
|
||||
- Bootstrap 5 via CDN
|
||||
- Bootstrap Icons via CDN
|
||||
- Bootswatch for theme variants (if applicable)
|
||||
- django-bootstrap5 and crispy-bootstrap5 for form rendering
|
||||
|
||||
## Preferred Packages
|
||||
|
||||
### Core Django
|
||||
- django>=5.2,<6.0
|
||||
- django-environ — Environment variables
|
||||
|
||||
### Authentication & Security
|
||||
- django-allauth — User management
|
||||
- django-allauth-2fa — Two-factor authentication
|
||||
|
||||
### API Development
|
||||
- djangorestframework>=3.14,<4.0 — REST APIs
|
||||
- drf-spectacular — OpenAPI/Swagger documentation
|
||||
|
||||
### Encryption
|
||||
- cryptography — Fernet encryption for secrets/API keys
|
||||
|
||||
### Background Tasks
|
||||
- celery — Async task queue
|
||||
- django-celery-progress — Progress bars
|
||||
- flower — Celery monitoring
|
||||
|
||||
### Caching
|
||||
- pymemcache — Memcached backend
|
||||
|
||||
### Database
|
||||
- dj-database-url — Database URL configuration
|
||||
- psycopg[binary] — PostgreSQL adapter
|
||||
- shortuuid — Short UUIDs for public URLs
|
||||
|
||||
### Production
|
||||
- gunicorn — WSGI server
|
||||
|
||||
### Shared Apps
|
||||
- django-heluca-themis — User preferences, themes, key management, navigation
|
||||
|
||||
### Deprecated / Removed
|
||||
- ~~pytz~~ — Use stdlib `zoneinfo` (Python 3.9+, Django 4+)
|
||||
- ~~Pillow~~ — Only add if your app needs ImageField
|
||||
- ~~django-heluca-core~~ — Replaced by Themis
|
||||
|
||||
## Anti-Patterns to Avoid
|
||||
|
||||
### Models
|
||||
- Don't use `Model.objects.get()` without handling `DoesNotExist`
|
||||
- Don't use `null=True` on `CharField` or `TextField` (use `blank=True, default=""`)
|
||||
- Don't use `related_name='+'` unless you have a specific reason
|
||||
- Don't override `save()` for business logic (use signals or service functions)
|
||||
- Don't use `auto_now=True` on fields you might need to manually set
|
||||
- Don't use `ForeignKey` without specifying `on_delete` explicitly
|
||||
- Don't use `Meta.ordering` on large tables (specify ordering in queries)
|
||||
|
||||
### Queries
|
||||
- Don't query inside loops (N+1 problem)
|
||||
- Don't use `.all()` when you need a subset
|
||||
- Don't use raw SQL unless absolutely necessary
|
||||
- Don't forget `select_related()` and `prefetch_related()`
|
||||
|
||||
### Views
|
||||
- Don't put business logic in views
|
||||
- Don't use `request.POST.get()` without validation (use forms)
|
||||
- Don't return sensitive data in error messages
|
||||
- Don't forget `login_required` decorator on protected views
|
||||
|
||||
### Forms
|
||||
- Don't use `fields = '__all__'` in ModelForm
|
||||
- Don't trust client-side validation alone
|
||||
- Don't use `exclude` in ModelForm (use explicit `fields`)
|
||||
|
||||
### Templates
|
||||
- Don't use `{{ variable }}` for URLs (use `{% url %}` tag)
|
||||
- Don't put logic in templates
|
||||
- Don't use inline CSS or JavaScript (external files only)
|
||||
- Don't forget `{% csrf_token %}` in forms
|
||||
|
||||
### Security
|
||||
- Don't store secrets in `settings.py` (use environment variables)
|
||||
- Don't commit `.env` files to version control
|
||||
- Don't use `DEBUG=True` in production
|
||||
- Don't expose sequential IDs in public URLs
|
||||
- Don't use `mark_safe()` on user-supplied content
|
||||
- Don't disable CSRF protection
|
||||
|
||||
### Imports & Code Style
|
||||
- Don't use `from module import *`
|
||||
- Don't use mutable default arguments
|
||||
- Don't use bare `except:` clauses
|
||||
- Don't ignore linter warnings without documented reason
|
||||
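The mutable-default rule is the least obvious item in this list, so here is a short illustration. Both function names are hypothetical and exist only for this example.

```python
def add_tag_buggy(tag, tags=[]):
    """Anti-pattern: the default list is created once and shared by all calls."""
    tags.append(tag)
    return tags


def add_tag(tag, tags=None):
    """Preferred form: use None as a sentinel and build a fresh list per call."""
    if tags is None:
        tags = []
    tags.append(tag)
    return tags
```

Calling `add_tag_buggy` twice without a second argument returns the same list object both times, so the second result also contains the first tag. `add_tag` returns an independent list on every call.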
|
||||
### Migrations
|
||||
- Don't edit migrations that have been deployed
|
||||
- Don't use `RunPython` without a reverse function
|
||||
- Don't add non-nullable fields without a default value
|
||||
|
||||
### Celery Tasks
|
||||
- Don't pass model instances to tasks (pass IDs and re-fetch)
|
||||
- Don't assume tasks run immediately
|
||||
- Don't forget retry logic for external service calls
|
||||
@@ -247,7 +247,7 @@ class SearchService:
|
||||
CALL db.index.vector.queryNodes('chunk_embedding_index', $top_k, $query_vector)
|
||||
YIELD node AS chunk, score
|
||||
MATCH (item:Item)-[:HAS_CHUNK]->(chunk)
|
||||
-OPTIONAL MATCH (lib:Library)-[:CONTAINS]->(col:Collection)-[:CONTAINS]->(item)
+MATCH (lib:Library)-[:CONTAINS]->(col:Collection)-[:CONTAINS]->(item)
|
||||
WHERE ($library_uid IS NULL OR lib.uid = $library_uid)
|
||||
AND ($library_type IS NULL OR lib.library_type = $library_type)
|
||||
AND ($collection_uid IS NULL OR col.uid = $collection_uid)
|
||||
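A note on why this one-word change alters results: in Cypher, a `WHERE` that follows `OPTIONAL MATCH` is evaluated as part of the optional pattern, so a chunk whose library fails the filter is still returned, only with `lib` and `col` bound to null. A plain `MATCH` drops that row. The sketch below imitates both behaviors in plain Python; the function names and the tuple-based data are illustrative assumptions, not code from `SearchService`.

```python
def optional_match(items, paths, predicate):
    """Imitate OPTIONAL MATCH ... WHERE: every item survives, lib may be None."""
    rows = []
    for item in items:
        libs = [lib for lib, member in paths if member == item and predicate(lib)]
        if libs:
            rows.extend((item, lib) for lib in libs)
        else:
            rows.append((item, None))  # row kept, library unbound
    return rows


def match(items, paths, predicate):
    """Imitate MATCH ... WHERE: items without a qualifying library are dropped."""
    return [
        (item, lib)
        for item in items
        for lib, member in paths
        if member == item and predicate(lib)
    ]


# Hypothetical data: two items, each contained in a different library.
items = ["item_a", "item_b"]
paths = [("lib_1", "item_a"), ("lib_2", "item_b")]


def only_lib_1(lib):
    return lib == "lib_1"


leaky = optional_match(items, paths, only_lib_1)
scoped = match(items, paths, only_lib_1)
```

With the optional form, `item_b` leaks into a search scoped to `lib_1`; with the required form only `item_a` remains, which is the scoping the commit message describes.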
@@ -352,7 +352,7 @@ class SearchService:
|
||||
CALL db.index.fulltext.queryNodes('chunk_text_fulltext', $query)
|
||||
YIELD node AS chunk, score
|
||||
MATCH (item:Item)-[:HAS_CHUNK]->(chunk)
|
||||
-OPTIONAL MATCH (lib:Library)-[:CONTAINS]->(col:Collection)-[:CONTAINS]->(item)
+MATCH (lib:Library)-[:CONTAINS]->(col:Collection)-[:CONTAINS]->(item)
|
||||
WHERE ($library_uid IS NULL OR lib.uid = $library_uid)
|
||||
AND ($library_type IS NULL OR lib.library_type = $library_type)
|
||||
AND ($collection_uid IS NULL OR col.uid = $collection_uid)
|
||||
@@ -374,15 +374,13 @@ class SearchService:
|
||||
|
||||
try:
|
||||
results, _ = db.cypher_query(cypher, params)
|
||||
-# Normalize BM25 scores to 0-1 range
-max_score = max((float(r[7]) for r in results if r[7]), default=1.0)
+# Keep raw BM25 scores — RRF fuses by rank, not by score magnitude.
|
||||
for row in results:
|
||||
uid = row[0]
|
||||
if not uid:
|
||||
continue
|
||||
raw_score = float(row[7]) if row[7] else 0.0
|
||||
-normalized = raw_score / max_score if max_score > 0 else 0.0
-if uid not in candidates or normalized > candidates[uid].score:
+if uid not in candidates or raw_score > candidates[uid].score:
|
||||
candidates[uid] = SearchCandidate(
|
||||
chunk_uid=uid,
|
||||
text_preview=row[1] or "",
|
||||
@@ -391,7 +389,7 @@ class SearchService:
|
||||
item_uid=row[4] or "",
|
||||
item_title=row[5] or "",
|
||||
library_type=row[6] or "",
|
||||
-score=normalized,
+score=raw_score,
|
||||
source="fulltext",
|
||||
)
|
||||
except Exception as exc:
|
||||
@@ -409,7 +407,7 @@ class SearchService:
|
||||
YIELD node AS concept, score AS concept_score
|
||||
MATCH (chunk:Chunk)-[:MENTIONS]->(concept)
|
||||
MATCH (item:Item)-[:HAS_CHUNK]->(chunk)
|
||||
-OPTIONAL MATCH (lib:Library)-[:CONTAINS]->(col:Collection)-[:CONTAINS]->(item)
+MATCH (lib:Library)-[:CONTAINS]->(:Collection)-[:CONTAINS]->(item)
|
||||
WHERE ($library_uid IS NULL OR lib.uid = $library_uid)
|
||||
AND ($library_type IS NULL OR lib.library_type = $library_type)
|
||||
RETURN chunk.uid AS chunk_uid, chunk.text_preview AS text_preview,
|
||||
@@ -430,14 +428,13 @@ class SearchService:
|
||||
|
||||
try:
|
||||
results, _ = db.cypher_query(cypher, params)
|
||||
-max_score = max((float(r[7]) for r in results if r[7]), default=1.0)
+# Raw scores already include the 0.8 concept downweight from Cypher.
|
||||
for row in results:
|
||||
uid = row[0]
|
||||
if not uid:
|
||||
continue
|
||||
raw_score = float(row[7]) if row[7] else 0.0
|
||||
-normalized = raw_score / max_score if max_score > 0 else 0.0
-if uid not in candidates or normalized > candidates[uid].score:
+if uid not in candidates or raw_score > candidates[uid].score:
|
||||
candidates[uid] = SearchCandidate(
|
||||
chunk_uid=uid,
|
||||
text_preview=row[1] or "",
|
||||
@@ -446,7 +443,7 @@ class SearchService:
|
||||
item_uid=row[4] or "",
|
||||
item_title=row[5] or "",
|
||||
library_type=row[6] or "",
|
||||
-score=normalized,
+score=raw_score,
|
||||
source="fulltext",
|
||||
)
|
||||
except Exception as exc:
|
||||
@@ -476,17 +473,17 @@ class SearchService:
|
||||
LIMIT 10
|
||||
MATCH (chunk:Chunk)-[:MENTIONS]->(concept)
|
||||
MATCH (item:Item)-[:HAS_CHUNK]->(chunk)
|
||||
-OPTIONAL MATCH (lib:Library)-[:CONTAINS]->(col:Collection)-[:CONTAINS]->(item)
+MATCH (lib:Library)-[:CONTAINS]->(:Collection)-[:CONTAINS]->(item)
|
||||
WHERE ($library_uid IS NULL OR lib.uid = $library_uid)
|
||||
AND ($library_type IS NULL OR lib.library_type = $library_type)
|
||||
-WITH chunk, item, lib, concept, concept_score,
-count(DISTINCT concept) AS concept_count
-RETURN DISTINCT chunk.uid AS chunk_uid, chunk.text_preview AS text_preview,
+WITH chunk, item, lib,
+max(concept_score) AS score,
+collect(DISTINCT concept.name)[..5] AS concept_names
+RETURN chunk.uid AS chunk_uid, chunk.text_preview AS text_preview,
|
||||
chunk.chunk_s3_key AS chunk_s3_key, chunk.chunk_index AS chunk_index,
|
||||
item.uid AS item_uid, item.title AS item_title,
|
||||
lib.library_type AS library_type,
|
||||
-concept_score AS score,
-collect(concept.name)[..5] AS concept_names
+score, concept_names
|
||||
ORDER BY score DESC
|
||||
LIMIT $limit
|
||||
"""
|
||||
@@ -504,16 +501,12 @@ class SearchService:
|
||||
logger.error("Graph search failed: %s", exc)
|
||||
return []
|
||||
|
||||
-# Normalize scores
-max_score = max((float(r[7]) for r in results if r[7]), default=1.0)
|
||||
|
||||
candidates = []
|
||||
for row in results:
|
||||
uid = row[0]
|
||||
if not uid:
|
||||
continue
|
||||
raw_score = float(row[7]) if row[7] else 0.0
|
||||
-normalized = raw_score / max_score if max_score > 0 else 0.0
|
||||
concept_names = row[8] if len(row) > 8 else []
|
||||
|
||||
candidates.append(
|
||||
@@ -525,7 +518,7 @@ class SearchService:
|
||||
item_uid=row[4] or "",
|
||||
item_title=row[5] or "",
|
||||
library_type=row[6] or "",
|
||||
-score=normalized,
+score=raw_score,
|
||||
source="graph",
|
||||
metadata={"concepts": concept_names},
|
||||
)
|
||||
@@ -562,7 +555,7 @@ class SearchService:
|
||||
YIELD node AS emb_node, score
|
||||
MATCH (img:Image)-[:HAS_EMBEDDING]->(emb_node)
|
||||
MATCH (item:Item)-[:HAS_IMAGE]->(img)
|
||||
-OPTIONAL MATCH (lib:Library)-[:CONTAINS]->(col:Collection)-[:CONTAINS]->(item)
+MATCH (lib:Library)-[:CONTAINS]->(:Collection)-[:CONTAINS]->(item)
|
||||
WHERE ($library_uid IS NULL OR lib.uid = $library_uid)
|
||||
AND ($library_type IS NULL OR lib.library_type = $library_type)
|
||||
RETURN img.uid AS image_uid, img.image_type AS image_type,
|
||||
@@ -642,11 +635,13 @@ class SearchService:
|
||||
|
||||
try:
|
||||
client = RerankerClient(reranker_model, user=self.user)
|
||||
+# Don't pass top_n — let the reranker score every candidate so
+# cross-attention can promote items the RRF stage ranked low.
+# Final trimming to request.limit happens in search().
|
||||
reranked = client.rerank(
|
||||
query=request.query,
|
||||
candidates=candidates_to_rerank,
|
||||
instruction=instruction,
|
||||
-top_n=request.limit,
|
||||
query_image=request.query_image,
|
||||
)
|
||||
return reranked, reranker_model.name
|
||||
@@ -660,22 +655,27 @@ class SearchService:
|
||||
# Helpers
|
||||
# ------------------------------------------------------------------
|
||||
|
||||
+GENERIC_RERANKER_INSTRUCTION = (
+"Re-rank these passages by relevance to the query."
+)
|
||||
|
||||
def _get_reranker_instruction(
|
||||
self, request: SearchRequest, candidates: list[SearchCandidate]
|
||||
) -> str:
|
||||
"""
|
||||
Get the content-type-aware reranker instruction.
|
||||
|
||||
-If scoped to a library or library type, use that type's instruction.
-If mixed types, use a generic instruction.
+Scoped queries (by library or library type) use that type's
+instruction. Unscoped queries — even when results happen to
+come mostly from one type — use a generic instruction so the
+reranker is not biased toward the majority type.
|
||||
|
||||
:param request: SearchRequest.
|
||||
-:param candidates: Candidates (used to detect dominant library type).
+:param candidates: Candidates (unused; kept for API stability).
|
||||
:returns: Reranker instruction string.
|
||||
"""
|
||||
from library.content_types import get_library_type_config
|
||||
|
||||
# Use explicit library type from request
|
||||
if request.library_type:
|
||||
try:
|
||||
config = get_library_type_config(request.library_type)
|
||||
@@ -683,25 +683,12 @@ class SearchService:
|
||||
except ValueError:
|
||||
pass
|
||||
|
||||
# Use library UID to look up type
|
||||
if request.library_uid:
|
||||
-return self._get_library_reranker_instruction(request.library_uid)
+instruction = self._get_library_reranker_instruction(request.library_uid)
+if instruction:
+return instruction
|
||||
|
||||
-# Detect dominant type from candidates
-type_counts: dict[str, int] = {}
-for c in candidates:
-if c.library_type:
-type_counts[c.library_type] = type_counts.get(c.library_type, 0) + 1
-
-if type_counts:
-dominant_type = max(type_counts, key=type_counts.get)
-try:
-config = get_library_type_config(dominant_type)
-return config.get("reranker_instruction", "")
-except ValueError:
-pass
-
-return ""
+return self.GENERIC_RERANKER_INSTRUCTION
|
||||
|
||||
def _get_library_reranker_instruction(self, library_uid: str) -> str:
|
||||
"""Get reranker_instruction from a Library node."""
|
||||
@@ -710,7 +697,12 @@ class SearchService:
|
||||
|
||||
lib = Library.nodes.get(uid=library_uid)
|
||||
return lib.reranker_instruction or ""
|
||||
-except Exception:
+except Exception as exc:
+logger.warning(
+"Failed to load reranker_instruction for library_uid=%s: %s",
+library_uid,
+exc,
+)
|
||||
return ""
|
||||
|
||||
def _get_embedding_instruction(self, library_uid: str) -> str:
|
||||
@@ -720,7 +712,12 @@ class SearchService:
|
||||
|
||||
lib = Library.nodes.get(uid=library_uid)
|
||||
return lib.embedding_instruction or ""
|
||||
-except Exception:
+except Exception as exc:
+logger.warning(
+"Failed to load embedding_instruction for library_uid=%s: %s",
+library_uid,
+exc,
+)
|
||||
return ""
|
||||
|
||||
def _get_type_embedding_instruction(self, library_type: str) -> str:
|
||||
|
||||
@@ -225,8 +225,12 @@ class SearchServiceHelperTest(TestCase):
|
||||
instruction = service._get_reranker_instruction(request, [])
|
||||
self.assertIn("fiction", instruction.lower())
|
||||
|
||||
-def test_get_reranker_instruction_from_candidates(self):
-"""Detects dominant library type from candidate list."""
+def test_get_reranker_instruction_generic_for_unscoped(self):
+"""
+Unscoped queries get the generic instruction even when candidates
+all share a library_type — type-specific instructions could bias
+the reranker against minority-type results.
+"""
|
||||
service = SearchService()
|
||||
request = SearchRequest(query="test")
|
||||
candidates = [
|
||||
@@ -240,10 +244,10 @@ class SearchServiceHelperTest(TestCase):
|
||||
]
|
||||
|
||||
instruction = service._get_reranker_instruction(request, candidates)
|
||||
-self.assertIn("technical", instruction.lower())
+self.assertEqual(instruction, SearchService.GENERIC_RERANKER_INSTRUCTION)
|
||||
|
||||
-def test_get_reranker_instruction_empty_when_no_context(self):
-"""Returns empty when no library type context available."""
+def test_get_reranker_instruction_generic_when_no_context(self):
+"""Returns the generic instruction when no library scope is set."""
|
||||
service = SearchService()
|
||||
request = SearchRequest(query="test")
|
||||
candidates = [
|
||||
@@ -256,4 +260,4 @@ class SearchServiceHelperTest(TestCase):
|
||||
]
|
||||
|
||||
instruction = service._get_reranker_instruction(request, candidates)
|
||||
-self.assertEqual(instruction, "")
+self.assertEqual(instruction, SearchService.GENERIC_RERANKER_INSTRUCTION)
|
||||
|
||||