feat(ingest): source-bucket registry keyed on ingest source

Generalises the Daedalus-only cross-bucket fetch into a registry
(SOURCE_S3_BUCKETS) keyed on the IngestJob `source` field, so new
upstream sources (Spelunker) can ingest from their own buckets. The
ingest task now calls fetch_from_source(job.source, job.s3_key) and
falls back to "daedalus" for blank/unknown sources (backwards compatible).
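
A minimal sketch of the lookup-with-fallback behaviour described above,
not the code in source_s3.py. BucketConfig, resolve_source, the `fetcher`
callback, the endpoint URLs and the Daedalus bucket name are all
hypothetical; only SOURCE_S3_BUCKETS, fetch_from_source and the
"daedalus" default come from this commit.

```python
from dataclasses import dataclass
from typing import Callable, Optional

DEFAULT_SOURCE = "daedalus"  # fallback named in the commit message


@dataclass(frozen=True)
class BucketConfig:
    """Hypothetical per-source settings; the real registry values may differ."""
    bucket_name: str
    endpoint_url: str


# Placeholder entries for illustration only.
SOURCE_S3_BUCKETS = {
    "daedalus": BucketConfig("daedalus-bucket", "https://s3.example.invalid"),
    "spelunker": BucketConfig("spelunker", "https://s3.example.invalid"),
}


def resolve_source(source: Optional[str]) -> str:
    """Map blank or unregistered sources to the default registry key."""
    key = (source or "").strip()
    return key if key in SOURCE_S3_BUCKETS else DEFAULT_SOURCE


def fetch_from_source(source: Optional[str], s3_key: str,
                      fetcher: Callable[[BucketConfig, str], bytes]) -> bytes:
    """Pick the bucket config for `source` and delegate the read to `fetcher`."""
    config = SOURCE_S3_BUCKETS[resolve_source(source)]
    return fetcher(config, s3_key)
```

Passing the fetcher in as a callback keeps the sketch free of any S3
client dependency; the real task presumably performs the read itself.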

Adds SPELUNKER_S3_* env vars and scopes them to the worker environment.
Replaces daedalus_s3.py with source_s3.py.
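
One way the per-source env vars could be read, as a sketch under
assumptions: load_s3_settings and _as_bool are hypothetical helpers, and
the truthy-string set is a guess at how USE_SSL/VERIFY are parsed. The
variable suffixes match the env template in this commit, and the keys
under "client_kwargs" are named after boto3's client() parameters.

```python
import os
from typing import Mapping, Optional

_TRUTHY = {"1", "true", "yes", "on"}  # assumed accepted spellings


def _as_bool(value: Optional[str], default: bool = True) -> bool:
    """Parse an env-style boolean; unset or blank values use the default."""
    if value is None or value.strip() == "":
        return default
    return value.strip().lower() in _TRUTHY


def load_s3_settings(prefix: str, env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect <PREFIX>_S3_* variables into a bucket name plus client kwargs."""
    env = os.environ if env is None else env

    def get(name: str) -> Optional[str]:
        # Treat empty strings (e.g. unfilled credentials) as missing.
        return env.get(f"{prefix}_S3_{name}") or None

    return {
        "bucket_name": get("BUCKET_NAME"),
        "client_kwargs": {
            "endpoint_url": get("ENDPOINT_URL"),
            "aws_access_key_id": get("ACCESS_KEY_ID"),
            "aws_secret_access_key": get("SECRET_ACCESS_KEY"),
            "region_name": get("REGION_NAME"),
            "use_ssl": _as_bool(env.get(f"{prefix}_S3_USE_SSL")),
            "verify": _as_bool(env.get(f"{prefix}_S3_VERIFY")),
        },
    }
```

Calling load_s3_settings("SPELUNKER") and load_s3_settings("DAEDALUS")
would yield one entry per source for a registry like SOURCE_S3_BUCKETS.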

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 22:19:25 -04:00
parent 75013ebfc3
commit ec4f12d601
7 changed files with 214 additions and 78 deletions

@@ -78,6 +78,19 @@ DAEDALUS_S3_REGION_NAME=us-east-1
DAEDALUS_S3_USE_SSL=True
DAEDALUS_S3_VERIFY=True
# --- Spelunker S3 (cross-bucket reads for ingest, source="spelunker") ---
# Consumed by: worker only
# Spelunker scrapes web/git documents into its own bucket and posts ingest
# requests with source="spelunker". These creds should be scoped read-only
# to the Spelunker bucket in your secret manager.
SPELUNKER_S3_ENDPOINT_URL=https://nyx.helu.ca:8555
SPELUNKER_S3_ACCESS_KEY_ID=
SPELUNKER_S3_SECRET_ACCESS_KEY=
SPELUNKER_S3_BUCKET_NAME=spelunker
SPELUNKER_S3_REGION_NAME=us-east-1
SPELUNKER_S3_USE_SSL=True
SPELUNKER_S3_VERIFY=True
# --- Celery / RabbitMQ (Oberon) ---------------------------------------------
# Consumed by: app (producer), worker (consumer). NOT mcp.
# Remember to percent-encode any password characters that have meaning in a