Neo4j Utility Scripts
Documentation for the database management scripts in utils/.
Scripts Overview
| Script | Purpose | Destructive? |
|---|---|---|
| neo4j-schema-init.py | Create constraints, indexes, and sample data | No (idempotent) |
| neo4j-reset.py | Wipe all data, constraints, and indexes | Yes |
| neo4j-validate.py | Comprehensive validation report | No (read-only) |
neo4j-schema-init.py
Creates the foundational schema for the unified knowledge graph: 74 uniqueness constraints, ~94 performance indexes, and 12 sample nodes with 5 cross-domain relationships.
Usage
# Interactive — prompts for URI, user, password
python utils/neo4j-schema-init.py
# Specify URI (will prompt for user/password)
python utils/neo4j-schema-init.py --uri bolt://ariel.incus:7687
# Skip sample data creation
python utils/neo4j-schema-init.py --uri bolt://ariel.incus:7687 --skip-samples
# Test-only mode (no schema changes)
python utils/neo4j-schema-init.py --uri bolt://ariel.incus:7687 --test-only
# Quiet mode
python utils/neo4j-schema-init.py --uri bolt://ariel.incus:7687 --quiet
What It Creates
- 74 uniqueness constraints: one per node type, on the id property
- ~94 performance indexes: on name/title, date, type/status/category, and domain fields
- 12 sample nodes: spanning all three teams (Personal, Work, Engineering)
- 5 sample relationships: demonstrating cross-domain connections
Idempotent
Safe to run multiple times. Uses IF NOT EXISTS for constraints/indexes and MERGE for sample data.
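As a rough illustration of the IF NOT EXISTS pattern, the sketch below builds idempotent constraint and index statements for a couple of labels. The helper names, the naming scheme for constraints and indexes, and the use of Neo4j 5 syntax are assumptions for illustration; the actual script may organize this differently.

```python
def constraint_statement(label: str) -> str:
    """Build an idempotent uniqueness constraint on the id property."""
    name = f"{label.lower()}_id_unique"  # hypothetical naming scheme
    return (
        f"CREATE CONSTRAINT {name} IF NOT EXISTS "
        f"FOR (n:{label}) REQUIRE n.id IS UNIQUE"
    )


def index_statement(label: str, prop: str) -> str:
    """Build an idempotent index on a single property."""
    name = f"{label.lower()}_{prop}_idx"  # hypothetical naming scheme
    return f"CREATE INDEX {name} IF NOT EXISTS FOR (n:{label}) ON (n.{prop})"


# Example labels only; the real schema defines 74 node types.
for label in ["Person", "Book"]:
    print(constraint_statement(label))
    print(index_statement(label, "name"))
```

Because each statement carries IF NOT EXISTS, running the same list a second time should leave the schema unchanged.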
neo4j-reset.py
Wipes the database clean. Drops all constraints, indexes, nodes, and relationships.
Usage
# Interactive — will prompt for confirmation
python utils/neo4j-reset.py --uri bolt://ariel.incus:7687
# Skip confirmation prompt
python utils/neo4j-reset.py --uri bolt://ariel.incus:7687 --force
What It Does
- Reports current database contents (node/relationship/constraint/index counts)
- Drops all constraints
- Drops all non-lookup indexes
- Deletes all nodes and relationships (batched for large databases)
- Verifies the database is clean
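The batched delete step could look roughly like the sketch below. It is not the script's actual code: the function names and the default batch size are assumptions, and `session` is expected to behave like a Neo4j driver session that offers `execute_write`.

```python
DELETE_BATCH = (
    "MATCH (n) WITH n LIMIT $batch "
    "DETACH DELETE n RETURN count(*) AS deleted"
)


def _delete_batch_tx(tx, batch_size):
    # Consume the result inside the transaction function.
    record = tx.run(DELETE_BATCH, batch=batch_size).single()
    return record["deleted"]


def delete_all_in_batches(session, batch_size=10_000):
    """Delete all nodes and their relationships, one batch per transaction."""
    total = 0
    while True:
        deleted = session.execute_write(_delete_batch_tx, batch_size)
        if deleted == 0:
            return total  # nothing left to delete
        total += deleted
```

Keeping each batch in its own transaction bounds memory use on large databases, at the cost of the wipe not being atomic.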
Safety
- Requires typing yes to confirm (unless --force is passed)
- Shows before/after counts so you know exactly what was removed
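A minimal sketch of such a confirmation gate is shown below. The function name, the prompt text, and the exact matching rule (a trimmed, case-sensitive "yes") are assumptions; the script's real prompt may differ.

```python
def confirm_reset(force: bool, ask=input) -> bool:
    """Return True only if the wipe may proceed."""
    if force:
        return True  # --force skips the prompt entirely
    answer = ask("Type 'yes' to wipe the database: ")  # hypothetical prompt text
    return answer.strip() == "yes"
```

Passing the prompt function as a parameter keeps the check easy to exercise without a terminal.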
neo4j-validate.py
Generates a comprehensive validation report. Share the output to verify the graph is correctly built.
Usage
python utils/neo4j-validate.py --uri bolt://ariel.incus:7687
What It Checks
| Section | What's Validated |
|---|---|
| Connection | Database reachable, APOC plugin available |
| Constraints | All 74 uniqueness constraints present, no extras |
| Indexes | Total count, spot-check of 11 key indexes |
| Node Labels | No unexpected labels (detects junk from Memory server, etc.) |
| Sample Nodes | All 12 sample nodes exist with correct properties |
| Sample Relationships | All 5 cross-domain relationships exist |
| Relationship Summary | Total count and breakdown by type |
| Node Summary | Total count and breakdown by label |
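The Node Labels check can be approximated as a set difference between the labels found in the database and the labels the schema defines. The sketch below assumes hypothetical helper names and a two-label schema for brevity; `session` is expected to behave like a Neo4j driver session.

```python
LABELS_QUERY = "CALL db.labels() YIELD label RETURN label"


def fetch_labels(session):
    """Read every label currently present in the database."""
    return [record["label"] for record in session.run(LABELS_QUERY)]


def find_unexpected_labels(expected, actual):
    """Return labels present in the database but absent from the schema."""
    return sorted(set(actual) - set(expected))


# Example: a graph polluted by the Memory server's own node types.
expected = {"Person", "Book"}  # stand-in for the 74 schema labels
actual = ["Person", "Memory", "Book", "reference"]
print(find_unexpected_labels(expected, actual))
```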
Expected Clean Output
═════════════════════════════════════════════════════════════════
VALIDATION REPORT — Koios Unified Knowledge Graph
═════════════════════════════════════════════════════════════════
Schema Version: 2.1.0
...
RESULT: ALL 23 CHECKS PASSED ✓
═════════════════════════════════════════════════════════════════
Standard Workflow
Fresh Setup / Clean Slate
# 1. Wipe everything
python utils/neo4j-reset.py --uri bolt://ariel.incus:7687
# 2. Build schema and sample data
python utils/neo4j-schema-init.py --uri bolt://ariel.incus:7687
# 3. Validate
python utils/neo4j-validate.py --uri bolt://ariel.incus:7687
Routine Validation
python utils/neo4j-validate.py --uri bolt://ariel.incus:7687
Environment Variables
All three scripts support environment variables to avoid repeated prompts:
export NEO4J_URI="bolt://ariel.incus:7687"
export NEO4J_USER="neo4j"
export NEO4J_PASSWORD="your-password"
# Then just:
python utils/neo4j-reset.py --force
python utils/neo4j-schema-init.py --skip-samples
python utils/neo4j-validate.py
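One way the scripts might resolve connection settings is sketched below. The precedence (command-line flag, then environment variable, then interactive prompt), the function name, and the prompt texts are assumptions rather than the scripts' documented behavior.

```python
import argparse
import getpass
import os


def resolve_connection(argv=None, env=None, prompt=input,
                       prompt_secret=getpass.getpass):
    """Resolve URI, user, and password: flag first, then env, then prompt."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser()
    parser.add_argument("--uri")
    # Ignore flags this sketch does not model (--force, --quiet, ...).
    args, _unknown = parser.parse_known_args(argv)

    uri = args.uri or env.get("NEO4J_URI") or prompt("Neo4j URI: ")
    user = env.get("NEO4J_USER") or prompt("Username: ")
    password = env.get("NEO4J_PASSWORD") or prompt_secret("Password: ")
    return uri, user, password
```

Reading the password through getpass keeps it out of the terminal echo and shell history when the environment variable is not set.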
Neo4j Python Driver — Lessons Learned
These patterns were discovered during development and are critical for anyone writing Cypher through the Neo4j Python driver (v5.x / v6.x).
1. Use Explicit Transactions for Writes
Problem: session.run() runs the query in an auto-commit transaction and returns a lazily fetched result. During development against the Neo4j Python driver 5.x+, writes issued this way did not persist reliably; the result must be fully consumed, or the transaction may not commit.
Bad — silently fails to persist:
with driver.session() as session:
    session.run("CREATE (n:Person {id: 'test'})")
    # Transaction may not commit!
Good — explicit transaction with context manager:
with driver.session() as session:
    with session.begin_transaction() as tx:
        tx.run("CREATE (n:Person {id: 'test'})")
        # Auto-commits when context exits normally
        # Auto-rolls back on exception
Also good — managed write transaction:
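def create_person_tx(tx, person_id, name):
    # Set id explicitly; returning a.id without setting it would yield null
    result = tx.run(
        "CREATE (a:Person {id: $id, name: $name}) RETURN a.id AS id",
        id=person_id, name=name,
    )
    record = result.single()
    return record["id"]

with driver.session() as session:
    node_id = session.execute_write(create_person_tx, "person_alice", "Alice")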
2. Cypher MERGE Clause Ordering
Problem: ON CREATE SET must come immediately after MERGE, before any general SET clause. Placing SET before ON CREATE SET causes a syntax error.
Bad — syntax error:
MERGE (p:Person {id: 'user_main'})
SET p.name = 'Main User',
p.updated_at = datetime()
ON CREATE SET p.created_at = datetime() // ERROR: Invalid input 'ON'
Good — correct clause order:
MERGE (p:Person {id: 'user_main'})
ON CREATE SET p.created_at = datetime()
SET p.name = 'Main User',
p.updated_at = datetime()
The full MERGE clause order is:
MERGE (pattern)
ON CREATE SET ... ← only runs when node is first created
ON MATCH SET ... ← only runs when node already exists (optional)
SET ... ← always runs
3. Consume Results in Transactions
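Problem: In managed transactions (execute_write), results must be consumed within the transaction function. Once the function returns, the transaction is closed and the result goes out of scope; reading it afterwards raises ResultConsumedError in driver 5.x.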
Good pattern:
def create_node_tx(tx, node_id):
    result = tx.run("MERGE (n:Person {id: $id}) RETURN n.id AS id", id=node_id)
    record = result.single()  # Consumes the result
    return record["id"]
4. MATCH Returns No Rows ≠ Error
Problem: If a MATCH clause finds nothing, the query succeeds with zero rows — it does not raise an error. This means MERGE on a relationship after a failed MATCH silently does nothing.
// If person_xyz doesn't exist, this returns 0 rows (no error)
MATCH (p:Person {id: 'person_xyz'})
MATCH (b:Book {id: 'book_abc'})
MERGE (p)-[:COMPLETED]->(b)
// Zero rows processed, zero relationships created, zero errors
Mitigation: End the query with a RETURN clause (for example, RETURN p.id AS id) and always check result.single() for None to detect this case:
record = result.single()
if record is None:
    logger.error("Endpoints not found; no relationship created")
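Put together, the check might look like the following sketch. The function name, the query constant, and the returned column aliases are hypothetical; the important part is the added RETURN clause, without which result.single() would be None even on success.

```python
import logging

logger = logging.getLogger(__name__)

LINK_QUERY = """
MATCH (p:Person {id: $person_id})
MATCH (b:Book {id: $book_id})
MERGE (p)-[:COMPLETED]->(b)
RETURN p.id AS person_id, b.id AS book_id
"""


def link_completed_tx(tx, person_id, book_id):
    """Create the COMPLETED relationship; return False if an endpoint is missing."""
    result = tx.run(LINK_QUERY, person_id=person_id, book_id=book_id)
    record = result.single()  # None when either MATCH found nothing
    if record is None:
        logger.error("Endpoints not found: %s -> %s", person_id, book_id)
        return False
    return True
```

It would be called through a managed transaction, for example `session.execute_write(link_completed_tx, "person_xyz", "book_abc")`, so the caller can react to a False return instead of assuming the relationship exists.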
5. Separate Node and Relationship Transactions
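Problem: Creating nodes and then matching them for relationships through successive auto-commit session.run() calls can fail. Each call is its own transaction, so if a node write has not been committed yet (see lesson 1), the later MATCH finds nothing and, per lesson 4, no relationship is created and no error is raised.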
Good pattern: Create all nodes in one explicit transaction (commit), then create relationships in a separate explicit transaction:
# Transaction 1: Create nodes
with session.begin_transaction() as tx:
    for query in node_queries:
        tx.run(query)
    # Auto-commits on exit

# Transaction 2: Create relationships (nodes now visible)
with session.begin_transaction() as tx:
    for query in relationship_queries:
        tx.run(query)
    # Auto-commits on exit
6. MCP Memory Server vs Neo4j Cypher Server
Problem: The MCP Memory server (@modelcontextprotocol/server-memory) and Neo4j Cypher MCP server can both connect to the same Neo4j instance, but they use completely different data models.
| | Memory Server | Cypher Server |
|---|---|---|
| Schema | Fixed: name, type, observations | Your full custom schema |
| Node labels | Memory, reference | Your 74 defined types |
| Relationships | Simple string pairs | Rich typed relationships |
| Query language | API calls (search_nodes) | Full Cypher |
Resolution: If you have a custom Neo4j schema, use only the Cypher MCP server. Remove the Memory server to prevent it from polluting your graph with its own primitive node types.
Dependencies
pip install neo4j
All three scripts require the neo4j Python package. APOC is optional but recommended (the init script's test suite checks for it).
Version History
| Date | Change |
|---|---|
| 2025-01-07 | Initial neo4j-schema-init.py |
| 2026-02-17 | Added neo4j-reset.py and neo4j-validate.py |
| 2026-02-17 | Fixed init script: explicit transactions, correct MERGE clause ordering |