# Neo4j Utility Scripts

> Documentation for the database management scripts in `utils/`

---

## Scripts Overview

| Script | Purpose | Destructive? |
|--------|---------|:------------:|
| `neo4j-schema-init.py` | Create constraints, indexes, and sample data | No (idempotent) |
| `neo4j-reset.py` | Wipe all data, constraints, and indexes | **Yes** |
| `neo4j-validate.py` | Comprehensive validation report | No (read-only) |

---

## neo4j-schema-init.py

Creates the foundational schema for the unified knowledge graph: 74 uniqueness constraints, ~94 performance indexes, and 12 sample nodes with 5 cross-domain relationships.

### Usage

```bash
# Interactive — prompts for URI, user, password
python utils/neo4j-schema-init.py

# Specify URI (will prompt for user/password)
python utils/neo4j-schema-init.py --uri bolt://ariel.incus:7687

# Skip sample data creation
python utils/neo4j-schema-init.py --uri bolt://ariel.incus:7687 --skip-samples

# Test-only mode (no schema changes)
python utils/neo4j-schema-init.py --uri bolt://ariel.incus:7687 --test-only

# Quiet mode
python utils/neo4j-schema-init.py --uri bolt://ariel.incus:7687 --quiet
```

### What It Creates

1. **74 uniqueness constraints** — one per node type, on the `id` property
2. **~94 performance indexes** — on name/title, date, type/status/category, and domain fields
3. **12 sample nodes** — spanning all three teams (Personal, Work, Engineering)
4. **5 sample relationships** — demonstrating cross-domain connections

### Idempotent

Safe to run multiple times. Uses `IF NOT EXISTS` for constraints/indexes and `MERGE` for sample data.
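The `IF NOT EXISTS` half of this pattern can be sketched as follows. This is a minimal illustration under stated assumptions, not the script's actual code: the helper names and the naming schemes (`person_id_unique`, `person_name_idx`) are hypothetical, and the statements use the Neo4j 5 constraint and index syntax.

```python
def constraint_statement(label: str) -> str:
    """Build an idempotent uniqueness constraint on `id` for one node label."""
    # Hypothetical naming scheme: <label>_id_unique
    name = f"{label.lower()}_id_unique"
    return (
        f"CREATE CONSTRAINT {name} IF NOT EXISTS "
        f"FOR (n:{label}) REQUIRE n.id IS UNIQUE"
    )


def index_statement(label: str, prop: str) -> str:
    """Build an idempotent index on one property of one node label."""
    # Hypothetical naming scheme: <label>_<property>_idx
    name = f"{label.lower()}_{prop}_idx"
    return f"CREATE INDEX {name} IF NOT EXISTS FOR (n:{label}) ON (n.{prop})"


if __name__ == "__main__":
    print(constraint_statement("Person"))
    print(index_statement("Person", "name"))
```

Labels and property names cannot be passed as Cypher parameters, so they are interpolated here; that is only reasonable when they come from a fixed, trusted list such as the schema definition.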

---

## neo4j-reset.py

Wipes the database clean. Drops all constraints and non-lookup indexes, then deletes all nodes and relationships.

### Usage

```bash
# Interactive — will prompt for confirmation
python utils/neo4j-reset.py --uri bolt://ariel.incus:7687

# Skip confirmation prompt
python utils/neo4j-reset.py --uri bolt://ariel.incus:7687 --force
```

### What It Does

1. Reports current database contents (node/relationship/constraint/index counts)
2. Drops all constraints
3. Drops all non-lookup indexes
4. Deletes all nodes and relationships (batched for large databases)
5. Verifies the database is clean
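Step 4 (batched deletion) might look roughly like the sketch below. The script's own batching logic is not reproduced here: the helper names and the default batch size of 10,000 are assumptions chosen for illustration. `session` is expected to be a Neo4j driver session.

```python
# Delete at most $batch_size nodes (and their relationships) per transaction.
DELETE_BATCH = (
    "MATCH (n) WITH n LIMIT $batch_size "
    "DETACH DELETE n RETURN count(*) AS deleted"
)


def _delete_batch_tx(tx, batch_size):
    """Transaction function: delete one batch and report how many nodes went."""
    record = tx.run(DELETE_BATCH, batch_size=batch_size).single()
    return record["deleted"]


def delete_all_nodes(session, batch_size=10_000):
    """Repeat small delete transactions until the database has no nodes left."""
    total = 0
    while True:
        deleted = session.execute_write(_delete_batch_tx, batch_size)
        if deleted == 0:
            return total
        total += deleted
```

Keeping each batch in its own transaction bounds the memory a single delete needs, which is the usual reason for batching on large databases.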

### Safety

- Requires typing `yes` to confirm (unless `--force`)
- Shows before/after counts so you know exactly what was removed

---

## neo4j-validate.py

Generates a comprehensive validation report. Share the output to verify the graph is correctly built.

### Usage

```bash
python utils/neo4j-validate.py --uri bolt://ariel.incus:7687
```

### What It Checks

| Section | What's Validated |
|---------|-----------------|
| **Connection** | Database reachable, APOC plugin available |
| **Constraints** | All 74 uniqueness constraints present, no extras |
| **Indexes** | Total count, spot-check of 11 key indexes |
| **Node Labels** | No unexpected labels (detects junk from Memory server, etc.) |
| **Sample Nodes** | All 12 sample nodes exist with correct properties |
| **Sample Relationships** | All 5 cross-domain relationships exist |
| **Relationship Summary** | Total count and breakdown by type |
| **Node Summary** | Total count and breakdown by label |
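The "all present, no extras" style of check in the table above reduces to a set comparison. A minimal sketch, assuming the expected names are available as a Python collection; both helper names are hypothetical and this is not the validator's actual code:

```python
def diff_names(expected, actual):
    """Return (missing, extra) as sorted lists of names."""
    expected_set, actual_set = set(expected), set(actual)
    missing = sorted(expected_set - actual_set)
    extra = sorted(actual_set - expected_set)
    return missing, extra


def list_constraint_names_tx(tx):
    """Transaction function: read the names of all constraints in the database."""
    result = tx.run("SHOW CONSTRAINTS YIELD name RETURN name")
    # Materialize the names before the transaction closes.
    return [record["name"] for record in result]
```

With a live session, `session.execute_read(list_constraint_names_tx)` would supply the `actual` list; an empty `missing` and an empty `extra` correspond to a passing check.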

### Expected Clean Output

```
═════════════════════════════════════════════════════════════════
  VALIDATION REPORT — Koios Unified Knowledge Graph
═════════════════════════════════════════════════════════════════
  Schema Version: 2.1.0
  ...
  RESULT: ALL 23 CHECKS PASSED ✓
═════════════════════════════════════════════════════════════════
```

---

## Standard Workflow

### Fresh Setup / Clean Slate

```bash
# 1. Wipe everything
python utils/neo4j-reset.py --uri bolt://ariel.incus:7687

# 2. Build schema and sample data
python utils/neo4j-schema-init.py --uri bolt://ariel.incus:7687

# 3. Validate
python utils/neo4j-validate.py --uri bolt://ariel.incus:7687
```

### Routine Validation

```bash
python utils/neo4j-validate.py --uri bolt://ariel.incus:7687
```

### Environment Variables

All three scripts support environment variables to avoid repeated prompts:

```bash
export NEO4J_URI="bolt://ariel.incus:7687"
export NEO4J_USER="neo4j"
export NEO4J_PASSWORD="your-password"

# Then just:
python utils/neo4j-reset.py --force
python utils/neo4j-schema-init.py --skip-samples
python utils/neo4j-validate.py
```
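One way a script could combine flags, environment variables, and prompts is sketched below. The precedence order (flag, then environment, then prompt) and the `resolve_setting` helper are assumptions for illustration; the scripts may resolve settings differently.

```python
import getpass
import os


def resolve_setting(cli_value, env_name, prompt, secret=False, env=None, ask=None):
    """Pick a connection setting: CLI flag first, then environment, then prompt."""
    env = os.environ if env is None else env
    if cli_value:
        return cli_value
    if env.get(env_name):
        return env[env_name]
    if ask is None:
        # Hide typed input for secrets such as the password.
        ask = getpass.getpass if secret else input
    return ask(prompt)
```

For example, `resolve_setting(args.uri, "NEO4J_URI", "Neo4j URI: ")` would only prompt when neither `--uri` nor `NEO4J_URI` is set (`args` being a hypothetical `argparse` namespace).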

---

## Neo4j Python Driver — Lessons Learned

These patterns were discovered during development and are critical for anyone writing Cypher through the Neo4j Python driver (v5.x / v6.x).

### 1. Use Explicit Transactions for Writes

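**Problem:** During development, writes issued through `session.run()` (auto-commit transactions) did not reliably persist with the Neo4j Python driver 5.x+. The driver fetches results lazily, so a write is only confirmed once its result has been fully consumed, and a failure can go unnoticed if the result is dropped.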

**Bad — silently fails to persist:**
```python
with driver.session() as session:
    session.run("CREATE (n:Person {id: 'test'})")
    # Transaction may not commit!
```

**Good — explicit transaction with context manager:**
```python
with driver.session() as session:
    with session.begin_transaction() as tx:
        tx.run("CREATE (n:Person {id: 'test'})")
        # Auto-commits when context exits normally
        # Auto-rolls back on exception
```

**Also good — managed write transaction:**
```python
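def create_person_tx(tx, person_id, name):
    # Set `id` at creation; RETURN a.id is null for a node created without it
    result = tx.run(
        "CREATE (a:Person {id: $id, name: $name}) RETURN a.id AS id",
        id=person_id, name=name,
    )
    record = result.single()
    return record["id"]

with driver.session() as session:
    node_id = session.execute_write(create_person_tx, "person_alice", "Alice")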
```

### 2. Cypher MERGE Clause Ordering

**Problem:** `ON CREATE SET` must come immediately after `MERGE`, before any general `SET` clause. Placing `SET` before `ON CREATE SET` causes a syntax error.

**Bad — syntax error:**
```cypher
MERGE (p:Person {id: 'user_main'})
SET p.name = 'Main User',
    p.updated_at = datetime()
ON CREATE SET p.created_at = datetime()  // ERROR: Invalid input 'ON'
```

**Good — correct clause order:**
```cypher
MERGE (p:Person {id: 'user_main'})
ON CREATE SET p.created_at = datetime()
SET p.name = 'Main User',
    p.updated_at = datetime()
```

The full MERGE clause order is:
```
MERGE (pattern)
ON CREATE SET ...   ← only runs when node is first created
ON MATCH SET ...    ← only runs when node already exists (optional)
SET ...             ← always runs
```
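From Python, the same clause order can be kept in a parameterized query and run through a managed transaction. A small sketch; `UPSERT_PERSON` and `upsert_person_tx` are hypothetical names, not part of the init script:

```python
UPSERT_PERSON = """
MERGE (p:Person {id: $id})
ON CREATE SET p.created_at = datetime()
SET p.name = $name,
    p.updated_at = datetime()
RETURN p.id AS id
"""


def upsert_person_tx(tx, person_id, name):
    """Transaction function: create or update a Person and return its id."""
    record = tx.run(UPSERT_PERSON, id=person_id, name=name).single()
    return record["id"]
```

It would be called as `session.execute_write(upsert_person_tx, "user_main", "Main User")`. Passing values as parameters rather than formatting them into the string avoids quoting problems in names.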

### 3. Consume Results in Transactions

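**Problem:** In managed transactions (`execute_write`), results must be consumed inside the transaction function. A `Result` is tied to its transaction, so reading it after the function has returned raises `ResultConsumedError` in driver 5.x+ instead of yielding records.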

**Good pattern:**
```python
def create_node_tx(tx, node_id):
    result = tx.run("MERGE (n:Person {id: $id}) RETURN n.id AS id", id=node_id)
    record = result.single()  # Consumes the result
    return record["id"]
```
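For queries that return several rows, the same rule applies: turn the records into plain Python data before the function returns. A minimal sketch, with `list_people_tx` as a hypothetical helper:

```python
def list_people_tx(tx, limit):
    """Transaction function: return plain dicts, not a live Result."""
    result = tx.run(
        "MATCH (p:Person) RETURN p.id AS id, p.name AS name "
        "ORDER BY p.id LIMIT $limit",
        limit=limit,
    )
    # Materialize every record while the transaction is still open.
    return [record.data() for record in result]
```

Called as `session.execute_read(list_people_tx, 10)`, this hands back a list that stays usable after the transaction and session are closed.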

### 4. MATCH Returns No Rows ≠ Error

**Problem:** If a `MATCH` clause finds nothing, the query succeeds with zero rows — it does **not** raise an error. This means `MERGE` on a relationship after a failed `MATCH` silently does nothing.

```cypher
// If person_xyz doesn't exist, this returns 0 rows (no error)
MATCH (p:Person {id: 'person_xyz'})
MATCH (b:Book {id: 'book_abc'})
MERGE (p)-[:COMPLETED]->(b)
// Zero rows processed, zero relationships created, zero errors
```

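**Mitigation:** Add a `RETURN` clause to the query (for example `RETURN p.id AS id`) and check `result.single()` for `None`. Without a `RETURN`, the result is always empty and the check cannot tell success from failure: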
```python
record = result.single()
if record is None:
    logger.error("Endpoints not found — no relationship created")
```
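Putting the query and the check together, a transaction function could turn the silent no-op into an explicit error. This is a sketch under assumptions: `LINK_COMPLETED`, `EndpointNotFoundError`, and `link_completed_tx` are hypothetical names introduced for illustration.

```python
LINK_COMPLETED = """
MATCH (p:Person {id: $person_id})
MATCH (b:Book {id: $book_id})
MERGE (p)-[r:COMPLETED]->(b)
RETURN p.id AS person_id, b.id AS book_id
"""


class EndpointNotFoundError(LookupError):
    """Raised when either end of the relationship is missing."""


def link_completed_tx(tx, person_id, book_id):
    """Transaction function: link a Person to a Book or fail loudly."""
    record = tx.run(LINK_COMPLETED, person_id=person_id, book_id=book_id).single()
    if record is None:
        # Zero rows means at least one MATCH found nothing.
        raise EndpointNotFoundError(f"{person_id} or {book_id} not found")
    return record["person_id"], record["book_id"]
```

Raising from inside a function passed to `session.execute_write` also rolls the transaction back, so a partially applied batch is not left behind.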

### 5. Separate Node and Relationship Transactions

**Problem:** Creating nodes and then matching them for relationships through auto-commit `session.run()` calls can fail: if the node writes have not been committed when the relationship query runs, its `MATCH` finds nothing and no relationship is created.

**Good pattern:** Create all nodes in one explicit transaction (commit), then create relationships in a separate explicit transaction:
```python
# Transaction 1: Create nodes
with session.begin_transaction() as tx:
    for query in node_queries:
        tx.run(query)
    # Auto-commits on exit

# Transaction 2: Create relationships (nodes now visible)
with session.begin_transaction() as tx:
    for query in relationship_queries:
        tx.run(query)
    # Auto-commits on exit
```

### 6. MCP Memory Server vs Neo4j Cypher Server

**Problem:** The MCP Memory server (`@modelcontextprotocol/server-memory`) and Neo4j Cypher MCP server can both connect to the same Neo4j instance, but they use completely different data models.

| | Memory Server | Cypher Server |
|---|---|---|
| **Schema** | Fixed: `name`, `type`, `observations` | Your full custom schema |
| **Node labels** | `Memory`, `reference` | Your 74 defined types |
| **Relationships** | Simple string pairs | Rich typed relationships |
| **Query language** | API calls (`search_nodes`) | Full Cypher |

**Resolution:** If you have a custom Neo4j schema, use **only** the Cypher MCP server. Remove the Memory server to prevent it from polluting your graph with its own primitive node types.
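To see whether the Memory server has already written into a graph, the labels in the database can be compared against the schema's label list. A minimal sketch; `unexpected_labels_tx` is a hypothetical helper and `expected_labels` stands in for the 74 defined types:

```python
def unexpected_labels_tx(tx, expected_labels):
    """Transaction function: labels in the database that the schema does not define."""
    result = tx.run("CALL db.labels() YIELD label RETURN label")
    present = {record["label"] for record in result}
    return sorted(present - set(expected_labels))
```

Running it with `session.execute_read(unexpected_labels_tx, expected_labels)` on a polluted graph would surface labels such as `Memory` or `reference`.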

---

## Dependencies

```bash
pip install neo4j
```

All three scripts require the `neo4j` Python package. APOC is optional but recommended (the init script's test suite checks for it).

---

## Version History

| Date | Change |
|------|--------|
| 2025-01-07 | Initial `neo4j-schema-init.py` |
| 2026-02-17 | Added `neo4j-reset.py` and `neo4j-validate.py` |
| 2026-02-17 | Fixed init script: explicit transactions, correct MERGE clause ordering |