# Neo4j Utility Scripts

> Documentation for the database management scripts in `utils/`

---

## Scripts Overview

| Script | Purpose | Destructive? |
|--------|---------|:------------:|
| `neo4j-schema-init.py` | Create constraints, indexes, and sample data | No (idempotent) |
| `neo4j-reset.py` | Wipe all data, constraints, and indexes | **Yes** |
| `neo4j-validate.py` | Comprehensive validation report | No (read-only) |

---

## neo4j-schema-init.py

Creates the foundational schema for the unified knowledge graph: 74 uniqueness constraints, ~94 performance indexes, and 12 sample nodes with 5 cross-domain relationships.

### Usage

```bash
# Interactive — prompts for URI, user, password
python utils/neo4j-schema-init.py

# Specify URI (will prompt for user/password)
python utils/neo4j-schema-init.py --uri bolt://ariel.incus:7687

# Skip sample data creation
python utils/neo4j-schema-init.py --uri bolt://ariel.incus:7687 --skip-samples

# Test-only mode (no schema changes)
python utils/neo4j-schema-init.py --uri bolt://ariel.incus:7687 --test-only

# Quiet mode
python utils/neo4j-schema-init.py --uri bolt://ariel.incus:7687 --quiet
```

### What It Creates

1. **74 uniqueness constraints** — one per node type, on the `id` property
2. **~94 performance indexes** — on name/title, date, type/status/category, and domain fields
3. **12 sample nodes** — spanning all three teams (Personal, Work, Engineering)
4. **5 sample relationships** — demonstrating cross-domain connections

### Idempotent

Safe to run multiple times. Uses `IF NOT EXISTS` for constraints/indexes and `MERGE` for sample data.

---

## neo4j-reset.py

Wipes the database clean. Drops all constraints, indexes, nodes, and relationships.

### Usage

```bash
# Interactive — will prompt for confirmation
python utils/neo4j-reset.py --uri bolt://ariel.incus:7687

# Skip confirmation prompt
python utils/neo4j-reset.py --uri bolt://ariel.incus:7687 --force
```

### What It Does

1. Reports current database contents (node/relationship/constraint/index counts)
2. Drops all constraints
3. Drops all non-lookup indexes
4. Deletes all nodes and relationships (batched for large databases)
5. Verifies the database is clean

### Safety

- Requires typing `yes` to confirm (unless `--force`)
- Shows before/after counts so you know exactly what was removed

---

## neo4j-validate.py

Generates a comprehensive validation report. Share the output to verify the graph is correctly built.

### Usage

```bash
python utils/neo4j-validate.py --uri bolt://ariel.incus:7687
```

### What It Checks

| Section | What's Validated |
|---------|-----------------|
| **Connection** | Database reachable, APOC plugin available |
| **Constraints** | All 74 uniqueness constraints present, no extras |
| **Indexes** | Total count, spot-check of 11 key indexes |
| **Node Labels** | No unexpected labels (detects stray labels from the Memory server, etc.) |
| **Sample Nodes** | All 12 sample nodes exist with correct properties |
| **Sample Relationships** | All 5 cross-domain relationships exist |
| **Relationship Summary** | Total count and breakdown by type |
| **Node Summary** | Total count and breakdown by label |

### Expected Clean Output

```
═════════════════════════════════════════════════════════════════
VALIDATION REPORT — Koios Unified Knowledge Graph
═════════════════════════════════════════════════════════════════
Schema Version: 2.1.0
...
RESULT: ALL 23 CHECKS PASSED ✓
═════════════════════════════════════════════════════════════════
```

---

## Standard Workflow

### Fresh Setup / Clean Slate

```bash
# 1. Wipe everything
python utils/neo4j-reset.py --uri bolt://ariel.incus:7687

# 2. Build schema and sample data
python utils/neo4j-schema-init.py --uri bolt://ariel.incus:7687

# 3. Validate
python utils/neo4j-validate.py --uri bolt://ariel.incus:7687
```

### Routine Validation

```bash
python utils/neo4j-validate.py --uri bolt://ariel.incus:7687
```

### Environment Variables

All three scripts read connection settings from environment variables, so you can skip the `--uri` flag and the credential prompts:

```bash
export NEO4J_URI="bolt://ariel.incus:7687"
export NEO4J_USER="neo4j"
export NEO4J_PASSWORD="your-password"

# Then just:
python utils/neo4j-reset.py --force
python utils/neo4j-schema-init.py
python utils/neo4j-validate.py
```

---

## Neo4j Python Driver — Lessons Learned

These patterns were discovered during development and are critical for anyone writing Cypher through the Neo4j Python driver (v5.x / v6.x).

### 1. Use Explicit Transactions for Writes

**Problem:** `session.run()` executes each query in its own auto-commit transaction, and with the 5.x+ driver those writes did not reliably persist in these scripts. An auto-commit result must be fully consumed, or the transaction may not commit.

**Bad — may silently fail to persist:**

```python
with driver.session() as session:
    session.run("CREATE (n:Person {id: 'test'})")  # Transaction may not commit!
```

**Good — explicit transaction with context manager:**

```python
with driver.session() as session:
    with session.begin_transaction() as tx:
        tx.run("CREATE (n:Person {id: 'test'})")
        # Auto-commits when context exits normally
        # Auto-rolls back on exception
```

**Also good — managed write transaction:**

```python
def create_person_tx(tx, person_id, name):
    # Set `id` explicitly; otherwise `a.id` is null and the function returns None
    result = tx.run(
        "CREATE (a:Person {id: $id, name: $name}) RETURN a.id AS id",
        id=person_id, name=name,
    )
    record = result.single()  # Consume the result inside the transaction function
    return record["id"]

with driver.session() as session:
    node_id = session.execute_write(create_person_tx, "person_alice", "Alice")
```

### 2. Cypher MERGE Clause Ordering

**Problem:** `ON CREATE SET` must come immediately after `MERGE`, before any general `SET` clause. `ON CREATE` and `ON MATCH` are sub-clauses of `MERGE`, and a plain `SET` ends the `MERGE` clause, so placing `SET` before `ON CREATE SET` causes a syntax error.
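If MERGE statements are assembled in code, emitting the sub-clauses from a fixed sequence rules this error out. The following is a minimal sketch. `build_merge_upsert` is a hypothetical helper and is not part of the scripts. Values are raw Cypher expressions such as `$name` or `datetime()`. The label and key are interpolated directly because Cypher cannot parameterize them, so the sketch checks them against a conservative identifier pattern:

```python
import re
from typing import Mapping, Optional

# Labels and property keys cannot be query parameters, so only allow plain identifiers.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _check(name: str) -> str:
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"not a safe Cypher identifier: {name!r}")
    return name


def _set_clause(keyword: str, assignments: Mapping[str, str]) -> str:
    # Each value is a Cypher expression, e.g. "$name" or "datetime()".
    parts = [f"n.{_check(prop)} = {expr}" for prop, expr in assignments.items()]
    return f"{keyword} " + ", ".join(parts)


def build_merge_upsert(
    label: str,
    key: str,
    on_create: Optional[Mapping[str, str]] = None,
    on_match: Optional[Mapping[str, str]] = None,
    always: Optional[Mapping[str, str]] = None,
) -> str:
    """Hypothetical helper: MERGE, ON CREATE SET, ON MATCH SET, SET, always in that order.

    The merge key is bound to a query parameter with the same name as the key.
    """
    lines = [f"MERGE (n:{_check(label)} {{{_check(key)}: ${key}}})"]
    if on_create:
        lines.append(_set_clause("ON CREATE SET", on_create))
    if on_match:
        lines.append(_set_clause("ON MATCH SET", on_match))
    if always:
        lines.append(_set_clause("SET", always))
    return "\n".join(lines)
```

You can run the generated string like any other parameterized query, for example `tx.run(query, id="user_main", name="Main User")`. For the `user_main` example in this lesson, it yields the same clause order as the correct version, with `n` in place of `p`.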
**Bad — syntax error:**

```cypher
MERGE (p:Person {id: 'user_main'})
SET p.name = 'Main User', p.updated_at = datetime()
ON CREATE SET p.created_at = datetime()  // ERROR: Invalid input 'ON'
```

**Good — correct clause order:**

```cypher
MERGE (p:Person {id: 'user_main'})
ON CREATE SET p.created_at = datetime()
SET p.name = 'Main User', p.updated_at = datetime()
```

The full MERGE clause order is:

```
MERGE (pattern)
ON CREATE SET ...   ← only runs when the node is first created (optional)
ON MATCH SET ...    ← only runs when the node already exists (optional)
SET ...             ← always runs
```

### 3. Consume Results in Transactions

**Problem:** In managed transactions (`execute_write`), results must be consumed within the transaction function. A `Result` is tied to its transaction. If you return it unconsumed and read it after `execute_write` returns, the read fails because the transaction has already closed.

**Good pattern:**

```python
def create_node_tx(tx, node_id):
    result = tx.run("MERGE (n:Person {id: $id}) RETURN n.id AS id", id=node_id)
    record = result.single()  # Consumes the result
    return record["id"]
```

### 4. MATCH Returns No Rows ≠ Error

**Problem:** If a `MATCH` clause finds nothing, the query succeeds with zero rows; it does **not** raise an error. This means a `MERGE` on a relationship after a failed `MATCH` silently does nothing.

```cypher
// If person_xyz doesn't exist, this returns 0 rows (no error)
MATCH (p:Person {id: 'person_xyz'})
MATCH (b:Book {id: 'book_abc'})
MERGE (p)-[:COMPLETED]->(b)
// Zero rows processed, zero relationships created, zero errors
```

**Mitigation:** End the query with a `RETURN` (for example `RETURN p.id AS person_id`) so that a successful match produces a record. Then check `result.single()` for `None`, which signals that no endpoints were matched:

```python
result = tx.run(query)  # query ends with RETURN ...
record = result.single()
if record is None:
    logger.error("Endpoints not found — no relationship created")
```

### 5. Separate Node and Relationship Transactions

**Problem:** Creating nodes and then matching them for relationships through auto-commit `session.run()` calls is fragile. If a node write never commits (Lesson 1), the relationship's `MATCH` finds nothing and silently creates no relationship (Lesson 4). Neo4j does let a transaction see its own earlier writes, so the split is not about visibility inside one transaction. Its purpose is to guarantee that every node is committed before any relationship query depends on it.
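Because a relationship query against a missing endpoint fails silently (Lesson 4), a preflight check between the two phases can turn that no-op into an explicit error. The following is a minimal sketch. It assumes that planned relationships are described as hypothetical `(start_id, rel_type, end_id)` tuples and that every node carries the `id` property enforced by the constraints. `endpoint_ids` and `missing_endpoints` are illustrative helpers and are not part of the scripts:

```python
from typing import Iterable, List, Tuple

# Hypothetical shape for a planned relationship: (start_id, rel_type, end_id).
RelSpec = Tuple[str, str, str]

# Returns the subset of $ids that no node carries as its `id` property.
# The aggregation always yields exactly one row, even when nothing is missing.
MISSING_IDS_QUERY = """
UNWIND $ids AS node_id
OPTIONAL MATCH (n {id: node_id})
WITH node_id, n
WHERE n IS NULL
RETURN collect(node_id) AS missing
"""


def endpoint_ids(relationships: Iterable[RelSpec]) -> List[str]:
    """Distinct start/end ids referenced by the planned relationships, in first-seen order."""
    seen: List[str] = []
    for start_id, _rel_type, end_id in relationships:
        for node_id in (start_id, end_id):
            if node_id not in seen:
                seen.append(node_id)
    return seen


def missing_endpoints(tx, relationships: Iterable[RelSpec]) -> List[str]:
    """Transaction function: returns endpoint ids that do not exist as nodes yet."""
    record = tx.run(MISSING_IDS_QUERY, ids=endpoint_ids(relationships)).single()
    return list(record["missing"])
```

After the node transaction commits, you could call `missing = session.execute_read(missing_endpoints, planned_rels)` and abort before the relationship transaction if `missing` is non-empty. The label-less `MATCH` cannot use the per-label uniqueness indexes, so it scans all nodes. That is fine for the sample data, but a large graph would want labels in the lookup.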
**Good pattern:** Create all nodes in one explicit transaction (commit), then create relationships in a separate explicit transaction:

```python
# Transaction 1: Create nodes
with session.begin_transaction() as tx:
    for query in node_queries:
        tx.run(query)
    # Auto-commits on exit

# Transaction 2: Create relationships (nodes now visible)
with session.begin_transaction() as tx:
    for query in relationship_queries:
        tx.run(query)
    # Auto-commits on exit
```

### 6. MCP Memory Server vs Neo4j Cypher Server

**Problem:** The MCP Memory server (`@modelcontextprotocol/server-memory`) and the Neo4j Cypher MCP server can both connect to the same Neo4j instance, but they use completely different data models.

| | Memory Server | Cypher Server |
|---|---|---|
| **Schema** | Fixed: `name`, `type`, `observations` | Your full custom schema |
| **Node labels** | `Memory`, `reference` | Your 74 defined types |
| **Relationships** | Simple string pairs | Rich typed relationships |
| **Query language** | API calls (`search_nodes`) | Full Cypher |

**Resolution:** If you have a custom Neo4j schema, use **only** the Cypher MCP server. Remove the Memory server to prevent it from polluting your graph with its own primitive node types.

---

## Dependencies

```bash
pip install neo4j
```

All three scripts require the `neo4j` Python package. APOC is optional but recommended (the init script's test suite checks for it).

---

## Version History

| Date | Change |
|------|--------|
| 2025-01-07 | Initial `neo4j-schema-init.py` |
| 2026-02-17 | Added `neo4j-reset.py` and `neo4j-validate.py` |
| 2026-02-17 | Fixed init script: explicit transactions, correct MERGE clause ordering |