# Neo4j Utility Scripts

> Documentation for the database management scripts in `utils/`

---

## Scripts Overview

| Script | Purpose | Destructive? |
|--------|---------|:------------:|
| `neo4j-schema-init.py` | Create constraints, indexes, and sample data | No (idempotent) |
| `neo4j-reset.py` | Wipe all data, constraints, and indexes | **Yes** |
| `neo4j-validate.py` | Comprehensive validation report | No (read-only) |

---

## neo4j-schema-init.py

Creates the foundational schema for the unified knowledge graph: 74 uniqueness constraints, ~94 performance indexes, and 12 sample nodes with 5 cross-domain relationships.

### Usage

```bash
# Interactive — prompts for URI, user, password
python utils/neo4j-schema-init.py

# Specify URI (will prompt for user/password)
python utils/neo4j-schema-init.py --uri bolt://ariel.incus:7687

# Skip sample data creation
python utils/neo4j-schema-init.py --uri bolt://ariel.incus:7687 --skip-samples

# Test-only mode (no schema changes)
python utils/neo4j-schema-init.py --uri bolt://ariel.incus:7687 --test-only

# Quiet mode
python utils/neo4j-schema-init.py --uri bolt://ariel.incus:7687 --quiet
```
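
If you extend the script, the flags above could be declared with `argparse` roughly as follows. This is a sketch under assumptions, not the script's actual parser. `build_parser` is a hypothetical name, and only the documented flags are modeled. The `NEO4J_URI` fallback mirrors the Environment Variables section; a `None` URI would then trigger the interactive prompt.

```python
import argparse
import os


def build_parser(env=os.environ):
    """Hypothetical parser mirroring the documented flags of neo4j-schema-init.py."""
    parser = argparse.ArgumentParser(
        description="Create constraints, indexes, and sample data (idempotent)."
    )
    # Fall back to NEO4J_URI; None means "prompt interactively later".
    parser.add_argument("--uri", default=env.get("NEO4J_URI"),
                        help="Bolt URI, e.g. bolt://ariel.incus:7687")
    parser.add_argument("--skip-samples", action="store_true",
                        help="create constraints and indexes only")
    parser.add_argument("--test-only", action="store_true",
                        help="run the schema tests without making changes")
    parser.add_argument("--quiet", action="store_true",
                        help="reduce log output")
    return parser


args = build_parser(env={}).parse_args(
    ["--uri", "bolt://ariel.incus:7687", "--skip-samples"]
)
print(args.uri, args.skip_samples, args.test_only)  # bolt://ariel.incus:7687 True False
```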

### What It Creates

1. **74 uniqueness constraints** — one per node type, on the `id` property
2. **~94 performance indexes** — on name/title, date, type/status/category, and domain fields
3. **12 sample nodes** — spanning all three teams (Personal, Work, Engineering)
4. **5 sample relationships** — demonstrating cross-domain connections

### Idempotent

Safe to run multiple times. Uses `IF NOT EXISTS` for constraints/indexes and `MERGE` for sample data.
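
The sketch below illustrates the `IF NOT EXISTS` pattern. For each label it generates one uniqueness constraint on `id` and one `name` index, using Neo4j 5.x DDL syntax. The label list, the `<label>_id_unique` / `<label>_name` naming scheme, and `schema_statements` itself are hypothetical; the script's real 74 constraints and ~94 indexes are not reproduced. Labels cannot be passed as query parameters, so they are interpolated and must come from a trusted, hard-coded list. Once a matching constraint or index exists, re-running a statement is a no-op rather than an error.

```python
def schema_statements(labels):
    """Build idempotent Cypher DDL for each label (Neo4j 5.x syntax).

    The naming scheme is hypothetical and for illustration only.
    """
    statements = []
    for label in labels:
        prefix = label.lower()
        statements.append(
            f"CREATE CONSTRAINT {prefix}_id_unique IF NOT EXISTS "
            f"FOR (n:{label}) REQUIRE n.id IS UNIQUE"
        )
        statements.append(
            f"CREATE INDEX {prefix}_name IF NOT EXISTS "
            f"FOR (n:{label}) ON (n.name)"
        )
    return statements


for stmt in schema_statements(["Person", "Book"]):
    print(stmt)
```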

---

## neo4j-reset.py

Wipes the database clean: drops all constraints and all non-lookup indexes, then deletes every node and relationship.

### Usage

```bash
# Interactive — will prompt for confirmation
python utils/neo4j-reset.py --uri bolt://ariel.incus:7687

# Skip confirmation prompt
python utils/neo4j-reset.py --uri bolt://ariel.incus:7687 --force
```

### What It Does

1. Reports current database contents (node/relationship/constraint/index counts)
2. Drops all constraints
3. Drops all non-lookup indexes
4. Deletes all nodes and relationships (batched for large databases)
5. Verifies the database is clean
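
Steps 2 to 4 could be implemented with the Neo4j Python driver roughly as follows. This sketch assumes Neo4j 5.x syntax, hypothetical function names, and an arbitrary batch size of 10,000; it is not the script's actual code. Constraints are dropped before indexes for two reasons: dropping a constraint also removes its backing index, and dropping that index directly is rejected. Lookup indexes are excluded by filtering `SHOW INDEXES` on `type`. Deleting in `LIMIT`-bounded write transactions keeps each transaction small on large graphs.

```python
# Hypothetical sketch of reset steps 2-4 (Neo4j 5.x syntax assumed).
# `session` is a neo4j Session, e.g. from `driver.session()`.

BATCH_DELETE = (
    "MATCH (n) WITH n LIMIT $batch_size "
    "DETACH DELETE n RETURN count(*) AS deleted"
)


def _run_tx(tx, query, **params):
    # Fully consume the result inside the transaction function.
    return tx.run(query, **params).data()


def reset_schema(session):
    """Drop all constraints, then every remaining non-lookup index."""
    for row in session.execute_read(_run_tx, "SHOW CONSTRAINTS YIELD name"):
        # Dropping a constraint also drops the index that backs it.
        session.execute_write(_run_tx, f"DROP CONSTRAINT `{row['name']}` IF EXISTS")
    remaining = session.execute_read(
        _run_tx, "SHOW INDEXES YIELD name, type WHERE type <> 'LOOKUP' RETURN name"
    )
    for row in remaining:
        session.execute_write(_run_tx, f"DROP INDEX `{row['name']}` IF EXISTS")


def delete_all_nodes(session, batch_size=10_000):
    """Detach-delete nodes in bounded write transactions; return the total."""
    total = 0
    while True:
        rows = session.execute_write(_run_tx, BATCH_DELETE, batch_size=batch_size)
        deleted = rows[0]["deleted"]  # count(*) always yields exactly one row
        total += deleted
        if deleted < batch_size:  # a partial or empty batch means we are done
            return total
```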

### Safety

- Requires typing `yes` to confirm (unless `--force`)
- Shows before/after counts so you know exactly what was removed
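
A minimal sketch of the confirmation gate follows. It assumes the prompt accepts only the exact, case-sensitive word `yes` (surrounding whitespace ignored) and that `--force` skips the prompt entirely. `confirm_reset` is a hypothetical name, and `ask` is injectable so the gate can be tested without a terminal.

```python
def confirm_reset(force=False, ask=input):
    """Return True only if --force was given or the user typed exactly 'yes'."""
    if force:
        return True  # --force bypasses the prompt; ask() is never called
    answer = ask("This will DELETE ALL DATA. Type 'yes' to continue: ")
    return answer.strip() == "yes"


# Anything other than 'yes' (including 'y' or 'YES') aborts.
print(confirm_reset(ask=lambda prompt: "y"))  # False
print(confirm_reset(force=True))              # True
```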

---

## neo4j-validate.py

Generates a comprehensive validation report. Share the output to verify the graph is correctly built.

### Usage

```bash
python utils/neo4j-validate.py --uri bolt://ariel.incus:7687
```

### What It Checks

| Section | What's Validated |
|---------|-----------------|
| **Connection** | Database reachable, APOC plugin available |
| **Constraints** | All 74 uniqueness constraints present, no extras |
| **Indexes** | Total count, spot-check of 11 key indexes |
| **Node Labels** | No unexpected labels (detects junk from Memory server, etc.) |
| **Sample Nodes** | All 12 sample nodes exist with correct properties |
| **Sample Relationships** | All 5 cross-domain relationships exist |
| **Relationship Summary** | Total count and breakdown by type |
| **Node Summary** | Total count and breakdown by label |
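
The "no extras" constraint check and the unexpected-label check both reduce to a set comparison. Expected names are compared against what the database reports, for instance from `SHOW CONSTRAINTS YIELD name` or `CALL db.labels()`. In the minimal sketch below, `compare_names` is a hypothetical helper and the label values are illustrative.

```python
def compare_names(expected, actual):
    """Return (missing, unexpected) as sorted lists."""
    expected, actual = set(expected), set(actual)
    missing = sorted(expected - actual)
    unexpected = sorted(actual - expected)
    return missing, unexpected


expected_labels = {"Person", "Book", "Project"}
actual_labels = ["Person", "Book", "Memory"]  # e.g. junk left by the Memory server
missing, unexpected = compare_names(expected_labels, actual_labels)
print(missing, unexpected)  # ['Project'] ['Memory']
```

Reporting the two lists separately points to the fix. Missing names suggest re-running the init script, while unexpected ones (such as junk labels) need cleanup.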

### Expected Clean Output

```
═════════════════════════════════════════════════════════════════
VALIDATION REPORT — Koios Unified Knowledge Graph
═════════════════════════════════════════════════════════════════
Schema Version: 2.1.0
...
RESULT: ALL 23 CHECKS PASSED ✓
═════════════════════════════════════════════════════════════════
```

---

## Standard Workflow

### Fresh Setup / Clean Slate

```bash
# 1. Wipe everything
python utils/neo4j-reset.py --uri bolt://ariel.incus:7687

# 2. Build schema and sample data
python utils/neo4j-schema-init.py --uri bolt://ariel.incus:7687

# 3. Validate
python utils/neo4j-validate.py --uri bolt://ariel.incus:7687
```

### Routine Validation

```bash
python utils/neo4j-validate.py --uri bolt://ariel.incus:7687
```

### Environment Variables

All three scripts support environment variables to avoid repeated prompts:

```bash
export NEO4J_URI="bolt://ariel.incus:7687"
export NEO4J_USER="neo4j"
export NEO4J_PASSWORD="your-password"

# Then just:
python utils/neo4j-reset.py --force
python utils/neo4j-schema-init.py
python utils/neo4j-validate.py
```
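
The sketch below shows the lookup order this implies. It assumes an explicit CLI value wins over the environment variable, which in turn wins over an interactive prompt. It also assumes passwords are read with `getpass` so they are not echoed. `resolve_setting` is a hypothetical helper, not a function from the scripts.

```python
import getpass
import os


def resolve_setting(cli_value, env_var, prompt, secret=False,
                    env=os.environ, ask=input):
    """Return the CLI value, else the environment variable, else prompt."""
    if cli_value:
        return cli_value
    if env.get(env_var):
        return env[env_var]
    # Only reached when neither source provided a value.
    return getpass.getpass(prompt) if secret else ask(prompt)


env = {"NEO4J_URI": "bolt://ariel.incus:7687", "NEO4J_USER": "neo4j"}
print(resolve_setting(None, "NEO4J_URI", "URI: ", env=env))  # bolt://ariel.incus:7687
print(resolve_setting("bolt://localhost:7687", "NEO4J_URI", "URI: ", env=env))  # bolt://localhost:7687
```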

---

## Neo4j Python Driver — Lessons Learned

These patterns were discovered during development and are critical for anyone writing Cypher through the Neo4j Python driver (v5.x / v6.x).

### 1. Use Explicit Transactions for Writes

**Problem:** `session.run()` runs an auto-commit transaction. The driver never retries it on transient errors, and failures that occur while the query executes may only surface when the result is consumed. During development, writes issued this way through the Neo4j Python driver 5.x+ did not reliably persist, so write paths should use explicit or managed transactions instead.

**Bad — did not reliably persist:**

```python
with driver.session() as session:
    session.run("CREATE (n:Person {id: 'test'})")
    # Result never consumed or checked; the write may not persist
```

**Good — explicit transaction with context manager:**

```python
with driver.session() as session:
    with session.begin_transaction() as tx:
        tx.run("CREATE (n:Person {id: 'test'})")
        # Auto-commits when context exits normally
        # Auto-rolls back on exception
```
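
**Also good — managed write transaction:**

```python
def create_person_tx(tx, person_id, name):
    # Set `id` explicitly; otherwise `a.id` is null and the function returns None
    result = tx.run(
        "CREATE (a:Person {id: $id, name: $name}) RETURN a.id AS id",
        id=person_id, name=name,
    )
    record = result.single()  # consume inside the transaction function
    return record["id"]

with driver.session() as session:
    # execute_write retries the function on transient errors, then commits
    node_id = session.execute_write(create_person_tx, "person_alice", "Alice")
```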

### 2. Cypher MERGE Clause Ordering

**Problem:** `ON CREATE SET` must come immediately after `MERGE`, before any general `SET` clause. Placing `SET` before `ON CREATE SET` causes a syntax error.

**Bad — syntax error:**

```cypher
MERGE (p:Person {id: 'user_main'})
SET p.name = 'Main User',
    p.updated_at = datetime()
ON CREATE SET p.created_at = datetime()  // ERROR: Invalid input 'ON'
```

**Good — correct clause order:**

```cypher
MERGE (p:Person {id: 'user_main'})
ON CREATE SET p.created_at = datetime()
SET p.name = 'Main User',
    p.updated_at = datetime()
```

The full MERGE clause order is as follows. `ON CREATE SET` and `ON MATCH SET` may swap places, but both must directly follow `MERGE` and precede any plain `SET`:

```
MERGE (pattern)
ON CREATE SET ...   ← only runs when node is first created
ON MATCH SET ...    ← only runs when node already exists (optional)
SET ...             ← always runs
```
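
When MERGE statements are generated in code, emitting the sub-clauses from a single helper keeps their order correct by construction. In the minimal sketch below, `build_merge` is a hypothetical helper. Values are passed as Cypher expressions, either parameters such as `$name` or functions such as `datetime()`. The label is interpolated, so it must come from a trusted list.

```python
def build_merge(label, on_create=(), on_match=(), always=()):
    """Build a MERGE on the `id` property with sub-clauses in a valid order.

    on_create, on_match and always are sequences of
    (property, cypher_expression) pairs.
    """
    def assignments(pairs):
        return ", ".join(f"n.{prop} = {expr}" for prop, expr in pairs)

    lines = [f"MERGE (n:{label} {{id: $id}})"]
    if on_create:
        lines.append("ON CREATE SET " + assignments(on_create))
    if on_match:
        lines.append("ON MATCH SET " + assignments(on_match))
    if always:
        # A plain SET always comes last, after both ON ... SET sub-clauses.
        lines.append("SET " + assignments(always))
    return "\n".join(lines)


print(build_merge(
    "Person",
    on_create=[("created_at", "datetime()")],
    always=[("name", "$name"), ("updated_at", "datetime()")],
))
```

Run the generated text with parameters, e.g. `tx.run(query, id="user_main", name="Main User")`.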

### 3. Consume Results in Transactions

**Problem:** In managed transactions (`execute_write`), a `Result` is only valid inside the transaction function. Once the function returns, the transaction closes and the result can no longer be read. Return extracted values (via `single()`, `data()`, or `consume()`), never the `Result` object itself.

**Good pattern:**

```python
def create_node_tx(tx, node_id):
    result = tx.run("MERGE (n:Person {id: $id}) RETURN n.id AS id", id=node_id)
    record = result.single()  # Consumes the result
    return record["id"]
```
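
For writes that return no rows, `result.consume()` finishes the result inside the transaction function. It returns a `ResultSummary` whose `counters` report what changed. The sketch below uses `counters.properties_set` to tell whether a `MATCH` actually hit a node; `touch_person_tx` is a hypothetical function name.

```python
def touch_person_tx(tx, person_id):
    """Set updated_at on one Person and report how many properties changed."""
    result = tx.run(
        "MATCH (p:Person {id: $id}) SET p.updated_at = datetime()",
        id=person_id,
    )
    summary = result.consume()  # exhausts the result; returns a ResultSummary
    return summary.counters.properties_set


# Usage (assumes an open driver, as in the examples above):
# with driver.session() as session:
#     changed = session.execute_write(touch_person_tx, "user_main")
#     if changed == 0:
#         print("No Person with that id; nothing updated")
```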

### 4. MATCH Returns No Rows ≠ Error

**Problem:** If a `MATCH` clause finds nothing, the query succeeds with zero rows — it does **not** raise an error. This means `MERGE` on a relationship after a failed `MATCH` silently does nothing.

```cypher
// If person_xyz doesn't exist, this returns 0 rows (no error)
MATCH (p:Person {id: 'person_xyz'})
MATCH (b:Book {id: 'book_abc'})
MERGE (p)-[:COMPLETED]->(b)
RETURN p.id AS person_id
// Zero rows processed, zero relationships created, zero errors
```

**Mitigation:** End the query with a `RETURN` clause and check `result.single()` for `None` to detect this case:

```python
# `result` comes from a MATCH ... MERGE ... RETURN query like the one above
record = result.single()
if record is None:
    logger.error("Endpoints not found — no relationship created")
```
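
A pre-check can report which endpoint is missing, rather than only that nothing happened. It uses `OPTIONAL MATCH`, which returns a single row with `null` for anything it does not find. The minimal sketch below reuses the ids above; `ENDPOINT_CHECK` and `missing_endpoints_tx` are hypothetical names.

```python
ENDPOINT_CHECK = """
OPTIONAL MATCH (p:Person {id: $person_id})
OPTIONAL MATCH (b:Book {id: $book_id})
RETURN p IS NOT NULL AS person_found, b IS NOT NULL AS book_found
"""


def missing_endpoints_tx(tx, person_id, book_id):
    """Return a description of each endpoint that does not exist (empty if both do)."""
    record = tx.run(ENDPOINT_CHECK, person_id=person_id, book_id=book_id).single()
    missing = []
    if not record["person_found"]:
        missing.append(f"Person {person_id}")
    if not record["book_found"]:
        missing.append(f"Book {book_id}")
    return missing


# Usage (assumes an open session):
# missing = session.execute_read(missing_endpoints_tx, "person_xyz", "book_abc")
# if missing:
#     logger.error("Cannot link, not found: %s", ", ".join(missing))
```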

### 5. Separate Node and Relationship Transactions

**Problem:** Creating nodes with auto-commit `session.run()` calls and then matching them to build relationships can match nothing if the node writes have not committed (see #1). Neo4j does let a transaction read its own earlier writes, so the reliable fix is to commit the nodes explicitly before any relationship `MATCH` runs.

**Good pattern:** Create all nodes in one explicit transaction (commit), then create relationships in a separate explicit transaction:

```python
# node_queries and relationship_queries are lists of Cypher strings
with driver.session() as session:
    # Transaction 1: Create nodes
    with session.begin_transaction() as tx:
        for query in node_queries:
            tx.run(query)
        # Auto-commits on exit

    # Transaction 2: Create relationships (nodes now committed)
    with session.begin_transaction() as tx:
        for query in relationship_queries:
            tx.run(query)
        # Auto-commits on exit
```

### 6. MCP Memory Server vs Neo4j Cypher Server

**Problem:** The MCP Memory server (`@modelcontextprotocol/server-memory`) and Neo4j Cypher MCP server can both connect to the same Neo4j instance, but they use completely different data models.

| | Memory Server | Cypher Server |
|---|---|---|
| **Schema** | Fixed: `name`, `type`, `observations` | Your full custom schema |
| **Node labels** | `Memory`, `reference` | Your 74 defined types |
| **Relationships** | Simple string pairs | Rich typed relationships |
| **Query language** | API calls (`search_nodes`) | Full Cypher |

**Resolution:** If you have a custom Neo4j schema, use **only** the Cypher MCP server. Remove the Memory server to prevent it from polluting your graph with its own primitive node types.

---

## Dependencies

```bash
pip install neo4j
```

All three scripts require the `neo4j` Python package. APOC is optional but recommended (the init script's test suite checks for it).

---

## Version History

| Date | Change |
|------|--------|
| 2025-01-07 | Initial `neo4j-schema-init.py` |
| 2026-02-17 | Added `neo4j-reset.py` and `neo4j-validate.py` |
| 2026-02-17 | Fixed init script: explicit transactions, correct MERGE clause ordering |