Add Neo4j schema initialization and validation scripts
- Introduced `neo4j-schema-init.py` for creating the foundational schema for the personal knowledge graph used by multiple AI assistants.
- Implemented functionality for creating constraints, indexes, and sample nodes, along with comprehensive testing of the schema.
- Added `neo4j-validate.py` to perform validation checks on the Neo4j knowledge graph, including constraints, indexes, sample nodes, relationships, and junk data detection.
- Enhanced logging for better traceability and debugging during schema initialization and validation processes.
301
docs/neo4j-utils.md
Normal file
@@ -0,0 +1,301 @@
# Neo4j Utility Scripts

> Documentation for the database management scripts in `utils/`

---

## Scripts Overview

| Script | Purpose | Destructive? |
|--------|---------|:------------:|
| `neo4j-schema-init.py` | Create constraints, indexes, and sample data | No (idempotent) |
| `neo4j-reset.py` | Wipe all data, constraints, and indexes | **Yes** |
| `neo4j-validate.py` | Comprehensive validation report | No (read-only) |

---

## neo4j-schema-init.py

Creates the foundational schema for the unified knowledge graph: 74 uniqueness constraints, ~94 performance indexes, and 12 sample nodes with 5 cross-domain relationships.

### Usage

```bash
# Interactive: prompts for URI, user, password
python utils/neo4j-schema-init.py

# Specify URI (will prompt for user/password)
python utils/neo4j-schema-init.py --uri bolt://ariel.incus:7687

# Skip sample data creation
python utils/neo4j-schema-init.py --uri bolt://ariel.incus:7687 --skip-samples

# Test-only mode (no schema changes)
python utils/neo4j-schema-init.py --uri bolt://ariel.incus:7687 --test-only

# Quiet mode
python utils/neo4j-schema-init.py --uri bolt://ariel.incus:7687 --quiet
```

### What It Creates

1. **74 uniqueness constraints**: one per node type, on the `id` property
2. **~94 performance indexes**: on name/title, date, type/status/category, and domain fields
3. **12 sample nodes**: spanning all three teams (Personal, Work, Engineering)
4. **5 sample relationships**: demonstrating cross-domain connections

### Idempotent

Safe to run multiple times. Uses `IF NOT EXISTS` for constraints/indexes and `MERGE` for sample data.
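
To make the idempotency claim concrete, here is a minimal sketch of how such statements could be generated. It is not taken from `neo4j-schema-init.py`: the helper names, the constraint and index naming scheme, and the two-label list are assumptions for illustration, and the Cypher uses Neo4j 5 syntax.

```python
def constraint_statement(label: str) -> str:
    """Build an idempotent uniqueness constraint on `id` for one node label."""
    name = f"{label.lower()}_id_unique"  # hypothetical naming scheme
    return (
        f"CREATE CONSTRAINT {name} IF NOT EXISTS "
        f"FOR (n:{label}) REQUIRE n.id IS UNIQUE"
    )


def index_statement(label: str, prop: str) -> str:
    """Build an idempotent index on one property of one node label."""
    name = f"{label.lower()}_{prop}_idx"  # hypothetical naming scheme
    return f"CREATE INDEX {name} IF NOT EXISTS FOR (n:{label}) ON (n.{prop})"


# Illustrative subset only; the real schema defines 74 node types.
NODE_LABELS = ["Person", "Book"]

statements = [constraint_statement(label) for label in NODE_LABELS]
statements.append(index_statement("Person", "name"))
```

Because every statement carries `IF NOT EXISTS`, running the same list a second time should leave the schema unchanged.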

---

## neo4j-reset.py

Wipes the database clean. Drops all constraints, indexes, nodes, and relationships.

### Usage

```bash
# Interactive: will prompt for confirmation
python utils/neo4j-reset.py --uri bolt://ariel.incus:7687

# Skip confirmation prompt
python utils/neo4j-reset.py --uri bolt://ariel.incus:7687 --force
```

### What It Does

1. Reports current database contents (node/relationship/constraint/index counts)
2. Drops all constraints
3. Drops all non-lookup indexes
4. Deletes all nodes and relationships (batched for large databases)
5. Verifies the database is clean

### Safety

- Requires typing `yes` to confirm (unless `--force`)
- Shows before/after counts so you know exactly what was removed
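
The confirmation gate and the batched delete can be sketched as follows. This is an illustration under stated assumptions, not the script's actual code: `confirmed`, `delete_all_batched`, the `run_batch` callable, and the batch size are hypothetical names and values chosen for the example.

```python
from typing import Callable

# Delete at most $batch_size nodes (with their relationships) per call.
DELETE_BATCH = (
    "MATCH (n) WITH n LIMIT $batch_size "
    "DETACH DELETE n RETURN count(*) AS deleted"
)


def confirmed(answer: str, force: bool = False) -> bool:
    """Proceed only on an explicit 'yes', unless --force was given."""
    return force or answer.strip() == "yes"


def delete_all_batched(run_batch: Callable[[str, int], int],
                       batch_size: int = 10_000) -> int:
    """Call run_batch until a batch deletes nothing; return the total deleted.

    run_batch(query, batch_size) is assumed to execute the query and
    return the value of its `deleted` column.
    """
    total = 0
    while True:
        deleted = run_batch(DELETE_BATCH, batch_size)
        if deleted == 0:
            return total
        total += deleted
```

With the real driver, `run_batch` could be a small wrapper such as `lambda q, size: session.run(q, batch_size=size).single()["deleted"]`. Deleting in batches keeps each transaction small, which is the reason step 4 above batches large databases.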

---

## neo4j-validate.py

Generates a comprehensive validation report. Share the output to verify the graph is correctly built.

### Usage

```bash
python utils/neo4j-validate.py --uri bolt://ariel.incus:7687
```

### What It Checks

| Section | What's Validated |
|---------|-----------------|
| **Connection** | Database reachable, APOC plugin available |
| **Constraints** | All 74 uniqueness constraints present, no extras |
| **Indexes** | Total count, spot-check of 11 key indexes |
| **Node Labels** | No unexpected labels (detects junk from Memory server, etc.) |
| **Sample Nodes** | All 12 sample nodes exist with correct properties |
| **Sample Relationships** | All 5 cross-domain relationships exist |
| **Relationship Summary** | Total count and breakdown by type |
| **Node Summary** | Total count and breakdown by label |
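
The "present, no extras" and "no unexpected labels" checks both reduce to comparing an expected set of names with what the database reports. A minimal sketch, assuming a hypothetical helper `compare_names` (the real script may structure this differently):

```python
def compare_names(expected, actual):
    """Return (missing, unexpected) as sorted lists of names."""
    expected, actual = set(expected), set(actual)
    return sorted(expected - actual), sorted(actual - expected)


# Illustrative data: in practice `actual` would come from the database,
# for example from `CALL db.labels()` or `SHOW CONSTRAINTS`.
missing, unexpected = compare_names(
    expected=["Person", "Book"],
    actual=["Person", "Memory"],
)
# missing == ["Book"], unexpected == ["Memory"]
```

A non-empty `missing` list means the schema is incomplete; a non-empty `unexpected` list is how junk such as a stray `Memory` label would surface.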

### Expected Clean Output

```
═════════════════════════════════════════════════════════════════
VALIDATION REPORT — Koios Unified Knowledge Graph
═════════════════════════════════════════════════════════════════
Schema Version: 2.1.0
...
RESULT: ALL 23 CHECKS PASSED ✓
═════════════════════════════════════════════════════════════════
```

---

## Standard Workflow

### Fresh Setup / Clean Slate

```bash
# 1. Wipe everything
python utils/neo4j-reset.py --uri bolt://ariel.incus:7687

# 2. Build schema and sample data
python utils/neo4j-schema-init.py --uri bolt://ariel.incus:7687

# 3. Validate
python utils/neo4j-validate.py --uri bolt://ariel.incus:7687
```

### Routine Validation

```bash
python utils/neo4j-validate.py --uri bolt://ariel.incus:7687
```

### Environment Variables

All three scripts support environment variables to avoid repeated prompts:

```bash
export NEO4J_URI="bolt://ariel.incus:7687"
export NEO4J_USER="neo4j"
export NEO4J_PASSWORD="your-password"

# Then just:
python utils/neo4j-reset.py --force
python utils/neo4j-schema-init.py --skip-docs
python utils/neo4j-validate.py
```
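
One plausible way to combine command-line flags with these variables is sketched below. The precedence order (flag first, then environment variable, then a default or prompt) and the helper name `resolve_setting` are assumptions; the scripts may resolve settings differently.

```python
import os


def resolve_setting(cli_value, env_name, default=None, environ=os.environ):
    """Return the CLI value if given, else the environment variable, else default."""
    if cli_value:
        return cli_value
    return environ.get(env_name, default)


# Example with an explicit mapping instead of the real environment.
example_env = {"NEO4J_URI": "bolt://ariel.incus:7687"}
uri = resolve_setting(None, "NEO4J_URI", environ=example_env)
user = resolve_setting(None, "NEO4J_USER", default="neo4j", environ=example_env)
```

When a value is still missing after this lookup, an interactive script would typically fall back to a prompt, using `getpass.getpass()` for the password so it is not echoed.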

---

## Neo4j Python Driver: Lessons Learned

These patterns were discovered during development and are critical for anyone writing Cypher through the Neo4j Python driver (v5.x / v6.x).

### 1. Use Explicit Transactions for Writes

**Problem:** `session.run()` executes the query in an auto-commit transaction, and the driver fetches results lazily. The write is only guaranteed to be committed once the result has been fully consumed, so a fire-and-forget `session.run()` did not reliably persist writes with the Neo4j Python driver 5.x+.

**Bad (may silently fail to persist):**

```python
with driver.session() as session:
    session.run("CREATE (n:Person {id: 'test'})")
    # Result is never consumed: the transaction may not commit!
```

**Good (explicit transaction with context manager):**

```python
with driver.session() as session:
    with session.begin_transaction() as tx:
        tx.run("CREATE (n:Person {id: 'test'})")
        # Auto-commits when context exits normally
        # Auto-rolls back on exception
```

**Also good (managed write transaction):**

```python
def create_person_tx(tx, person_id, name):
    # Set `id` explicitly; otherwise `a.id` would come back as null.
    result = tx.run(
        "CREATE (a:Person {id: $id, name: $name}) RETURN a.id AS id",
        id=person_id, name=name,
    )
    record = result.single()
    return record["id"]

with driver.session() as session:
    node_id = session.execute_write(create_person_tx, "person_alice", "Alice")
```
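
If an auto-commit query is still preferred for a one-off write, consuming the result explicitly addresses the problem described above. A minimal sketch, assuming a hypothetical wrapper named `run_autocommit_write`; `Session.run()`, `Result.consume()`, and `ResultSummary.counters` are the driver calls it relies on.

```python
def run_autocommit_write(session, query, **params):
    """Run one auto-commit write and block until the server has finished it.

    `session` is expected to be an open neo4j session.
    """
    result = session.run(query, **params)
    summary = result.consume()  # drains the stream so the write is complete
    return summary.counters     # e.g. counters.nodes_created
```

Inside a `with driver.session() as session:` block, this could be called as `run_autocommit_write(session, "CREATE (n:Person {id: $id})", id="test")`, and the returned counters give a quick confirmation that something was actually written.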

### 2. Cypher MERGE Clause Ordering

**Problem:** `ON CREATE SET` must come immediately after `MERGE`, before any general `SET` clause. Placing `SET` before `ON CREATE SET` causes a syntax error.

**Bad (syntax error):**

```cypher
MERGE (p:Person {id: 'user_main'})
SET p.name = 'Main User',
    p.updated_at = datetime()
ON CREATE SET p.created_at = datetime()  // ERROR: Invalid input 'ON'
```

**Good (correct clause order):**

```cypher
MERGE (p:Person {id: 'user_main'})
ON CREATE SET p.created_at = datetime()
SET p.name = 'Main User',
    p.updated_at = datetime()
```

The full MERGE clause order is:

```
MERGE (pattern)
ON CREATE SET ...   ← only runs when node is first created
ON MATCH SET ...    ← only runs when node already exists (optional)
SET ...             ← always runs
```
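
When upsert queries are assembled in Python, building them through one helper keeps the clause order fixed in a single place. The sketch below is illustrative only: `build_upsert` is a hypothetical helper, and because labels and property names cannot be passed as Cypher parameters, they must come from trusted code rather than user input.

```python
def build_upsert(label: str, props: list[str]) -> str:
    """Build a MERGE-by-id upsert with ON CREATE SET placed before SET."""
    assignments = ", ".join(f"n.{p} = ${p}" for p in props)
    return (
        f"MERGE (n:{label} {{id: $id}})\n"
        "ON CREATE SET n.created_at = datetime()\n"
        f"SET {assignments}, n.updated_at = datetime()"
    )


query = build_upsert("Person", ["name"])
```

The resulting query takes `$id` and one parameter per listed property, so it can be run with something like `tx.run(query, id="user_main", name="Main User")`.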

### 3. Consume Results in Transactions

**Problem:** In managed transactions (`execute_write`), results must be consumed within the transaction function. Once the function returns, the transaction is closed and the `Result` can no longer be read, so return plain values from the function rather than the `Result` object.

**Good pattern:**

```python
def create_node_tx(tx, node_id):
    result = tx.run("MERGE (n:Person {id: $id}) RETURN n.id AS id", id=node_id)
    record = result.single()  # Consumes the result
    return record["id"]
```

### 4. MATCH Returns No Rows ≠ Error

**Problem:** If a `MATCH` clause finds nothing, the query succeeds with zero rows; it does **not** raise an error. This means `MERGE` on a relationship after a failed `MATCH` silently does nothing.

```cypher
// If person_xyz doesn't exist, this returns 0 rows (no error)
MATCH (p:Person {id: 'person_xyz'})
MATCH (b:Book {id: 'book_abc'})
MERGE (p)-[:COMPLETED]->(b)
// Zero rows processed, zero relationships created, zero errors
```

**Mitigation:** Add a `RETURN` clause to the query (without one, `result.single()` is always `None`) and check `result.single()` for `None` to detect this case:

```python
record = result.single()
if record is None:
    logger.error("Endpoints not found: no relationship created")
```
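
Putting the query and the check together, a transaction function for this case might look like the sketch below. `LINK_QUERY` and `link_completed` are hypothetical names; the labels and relationship type are the ones from the example above, with a `RETURN` added so that a missing endpoint is observable.

```python
LINK_QUERY = """
MATCH (p:Person {id: $person_id})
MATCH (b:Book {id: $book_id})
MERGE (p)-[r:COMPLETED]->(b)
RETURN type(r) AS rel_type
"""


def link_completed(tx, person_id, book_id):
    """Return True if the relationship exists afterwards, False if an endpoint was missing."""
    record = tx.run(LINK_QUERY, person_id=person_id, book_id=book_id).single()
    return record is not None
```

Called as `session.execute_write(link_completed, "person_xyz", "book_abc")`, a `False` return value turns the silent no-op into something the caller can log or raise on.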

### 5. Separate Node and Relationship Transactions

**Problem:** Creating nodes and then matching them for relationships through auto-commit queries can fail: the relationship query's `MATCH` may not see nodes whose writes have not been committed yet (see lesson 1), and, as lesson 4 describes, it then does nothing without raising an error.

**Good pattern:** Create all nodes in one explicit transaction (commit), then create relationships in a separate explicit transaction:

```python
# Transaction 1: Create nodes
with session.begin_transaction() as tx:
    for query in node_queries:
        tx.run(query)
    # Auto-commits on exit

# Transaction 2: Create relationships (nodes now visible)
with session.begin_transaction() as tx:
    for query in relationship_queries:
        tx.run(query)
    # Auto-commits on exit
```

### 6. MCP Memory Server vs Neo4j Cypher Server

**Problem:** The MCP Memory server (`@modelcontextprotocol/server-memory`) and Neo4j Cypher MCP server can both connect to the same Neo4j instance, but they use completely different data models.

| | Memory Server | Cypher Server |
|---|---|---|
| **Schema** | Fixed: `name`, `type`, `observations` | Your full custom schema |
| **Node labels** | `Memory`, `reference` | Your 74 defined types |
| **Relationships** | Simple string pairs | Rich typed relationships |
| **Query language** | API calls (`search_nodes`) | Full Cypher |

**Resolution:** If you have a custom Neo4j schema, use **only** the Cypher MCP server. Remove the Memory server to prevent it from polluting your graph with its own primitive node types.

---

## Dependencies

```bash
pip install neo4j
```

All three scripts require the `neo4j` Python package. APOC is optional but recommended (the init script's test suite checks for it).

---

## Version History

| Date | Change |
|------|--------|
| 2025-01-07 | Initial `neo4j-schema-init.py` |
| 2026-02-17 | Added `neo4j-reset.py` and `neo4j-validate.py` |
| 2026-02-17 | Fixed init script: explicit transactions, correct MERGE clause ordering |