feat(ingest): source-bucket registry keyed on ingest source
Generalises the Daedalus-only cross-bucket fetch into a registry (SOURCE_S3_BUCKETS) keyed on the IngestJob `source` field, so new upstream sources (Spelunker) can ingest from their own buckets. The ingest task now calls fetch_from_source(job.source, job.s3_key) and falls back to "daedalus" for blank/unknown sources (backwards compatible).

Adds SPELUNKER_S3_* env vars and worker env scoping. Replaces daedalus_s3.py with source_s3.py.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
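For readers who have not opened source_s3.py, the registry-with-fallback behaviour described in the commit message could look roughly like the sketch below. Only the names SOURCE_S3_BUCKETS, fetch_from_source, and the "daedalus" fallback come from the commit message. The SourceBucket dataclass, the resolve_source helper, the injected `fetcher` callable, the whitespace stripping, and the "daedalus" bucket name are assumptions made for illustration and may not match the real module.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Fallback source named in the commit message.
DEFAULT_SOURCE = "daedalus"


@dataclass(frozen=True)
class SourceBucket:
    """Hypothetical registry entry; the real entries may hold more fields."""
    env_prefix: str   # e.g. "SPELUNKER_S3" for the SPELUNKER_S3_* variables
    bucket_name: str


# Registry keyed on the IngestJob `source` value.
# The "daedalus" bucket name is a placeholder, not taken from the commit.
SOURCE_S3_BUCKETS: Dict[str, SourceBucket] = {
    "daedalus": SourceBucket(env_prefix="DAEDALUS_S3", bucket_name="daedalus"),
    "spelunker": SourceBucket(env_prefix="SPELUNKER_S3", bucket_name="spelunker"),
}


def resolve_source(source: Optional[str]) -> str:
    """Return a registered source key, falling back for blank/unknown values."""
    key = (source or "").strip()  # stripping whitespace is an assumption
    return key if key in SOURCE_S3_BUCKETS else DEFAULT_SOURCE


def fetch_from_source(
    source: Optional[str],
    s3_key: str,
    fetcher: Callable[[str, str], bytes],
) -> bytes:
    """Fetch `s3_key` from the bucket registered for `source`.

    `fetcher(bucket_name, key)` stands in for the real S3 client call so the
    sketch runs without network access or credentials.
    """
    bucket = SOURCE_S3_BUCKETS[resolve_source(source)]
    return fetcher(bucket.bucket_name, s3_key)
```

Jobs created before this change carry no `source` value, so under these assumptions they resolve to "daedalus" and keep reading from the original bucket.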
13
.env.example
@@ -78,6 +78,19 @@ DAEDALUS_S3_REGION_NAME=us-east-1
DAEDALUS_S3_USE_SSL=True
DAEDALUS_S3_VERIFY=True

# --- Spelunker S3 (cross-bucket reads for ingest, source="spelunker") ---
# Consumed by: worker only
# Spelunker scrapes web/git documents into its own bucket and posts ingest
# requests with source="spelunker". These creds should be scoped read-only
# to the Spelunker bucket in your secret manager.
SPELUNKER_S3_ENDPOINT_URL=https://nyx.helu.ca:8555
SPELUNKER_S3_ACCESS_KEY_ID=
SPELUNKER_S3_SECRET_ACCESS_KEY=
SPELUNKER_S3_BUCKET_NAME=spelunker
SPELUNKER_S3_REGION_NAME=us-east-1
SPELUNKER_S3_USE_SSL=True
SPELUNKER_S3_VERIFY=True

# --- Celery / RabbitMQ (Oberon) ---------------------------------------------
# Consumed by: app (producer), worker (consumer). NOT mcp.
# Remember to percent-encode any password characters that have meaning in a
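The SPELUNKER_S3_* block mirrors the DAEDALUS_S3_* one, so a worker could plausibly read either set through a single prefix-driven helper. The sketch below is one way to do that under stated assumptions: the function name s3_settings_from_env, the returned key names, and the truthy-string parsing are all hypothetical and are not taken from this commit. Only the variable suffixes and the defaults shown in .env.example come from the diff.

```python
import os
from typing import Any, Dict, Mapping, Optional


def _as_bool(value: str) -> bool:
    """Parse env-style booleans such as "True"/"False" (assumed convention)."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


def s3_settings_from_env(
    prefix: str,
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, Any]:
    """Collect the <prefix>_* variables from .env.example into one dict.

    `prefix` is e.g. "SPELUNKER_S3" or "DAEDALUS_S3". `env` defaults to
    os.environ and can be replaced with a plain dict in tests.
    """
    source = os.environ if env is None else env

    def get(name: str, default: str = "") -> str:
        return source.get(f"{prefix}_{name}", default)

    return {
        "endpoint_url": get("ENDPOINT_URL"),
        "aws_access_key_id": get("ACCESS_KEY_ID"),
        "aws_secret_access_key": get("SECRET_ACCESS_KEY"),
        "bucket_name": get("BUCKET_NAME"),
        # Defaults below match the values shown in .env.example.
        "region_name": get("REGION_NAME", "us-east-1"),
        "use_ssl": _as_bool(get("USE_SSL", "True")),
        "verify": _as_bool(get("VERIFY", "True")),
    }
```

Calling s3_settings_from_env("SPELUNKER_S3") in the worker would then return the Spelunker settings, while processes that never receive these variables simply get empty credentials.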