koios/docs/neo4j-utils.md
Robert Helewka 7859264359 Add Neo4j schema initialization and validation scripts
- Introduced `neo4j-schema-init.py` for creating the foundational schema for the personal knowledge graph used by multiple AI assistants.
- Implemented functionality for creating constraints, indexes, and sample nodes, along with comprehensive testing of the schema.
- Added `neo4j-validate.py` to perform validation checks on the Neo4j knowledge graph, including constraints, indexes, sample nodes, relationships, and junk data detection.
- Enhanced logging for better traceability and debugging during schema initialization and validation processes.
2026-03-06 14:11:52 +00:00


Neo4j Utility Scripts

Documentation for the database management scripts in utils/


Scripts Overview

| Script | Purpose | Destructive? |
|---|---|---|
| neo4j-schema-init.py | Create constraints, indexes, and sample data | No (idempotent) |
| neo4j-reset.py | Wipe all data, constraints, and indexes | Yes |
| neo4j-validate.py | Comprehensive validation report | No (read-only) |

neo4j-schema-init.py

Creates the foundational schema for the unified knowledge graph: 74 uniqueness constraints, ~94 performance indexes, and 12 sample nodes with 5 cross-domain relationships.

Usage

# Interactive — prompts for URI, user, password
python utils/neo4j-schema-init.py

# Specify URI (will prompt for user/password)
python utils/neo4j-schema-init.py --uri bolt://ariel.incus:7687

# Skip sample data creation
python utils/neo4j-schema-init.py --uri bolt://ariel.incus:7687 --skip-samples

# Test-only mode (no schema changes)
python utils/neo4j-schema-init.py --uri bolt://ariel.incus:7687 --test-only

# Quiet mode
python utils/neo4j-schema-init.py --uri bolt://ariel.incus:7687 --quiet

What It Creates

  1. 74 uniqueness constraints — one per node type, on the id property
  2. ~94 performance indexes — on name/title, date, type/status/category, and domain fields
  3. 12 sample nodes — spanning all three teams (Personal, Work, Engineering)
  4. 5 sample relationships — demonstrating cross-domain connections

Idempotent

Safe to run multiple times. Uses IF NOT EXISTS for constraints/indexes and MERGE for sample data.
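
The idempotent schema statements can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the helper names `constraint_statement` and `index_statement`, the naming scheme for constraints and indexes, and the example label are all assumptions. The Cypher uses the Neo4j 5 `IF NOT EXISTS` form.

```python
def constraint_statement(label: str) -> str:
    """Build an idempotent uniqueness constraint on the `id` property."""
    name = f"{label.lower()}_id_unique"  # hypothetical naming scheme
    return (
        f"CREATE CONSTRAINT {name} IF NOT EXISTS "
        f"FOR (n:{label}) REQUIRE n.id IS UNIQUE"
    )


def index_statement(label: str, prop: str) -> str:
    """Build an idempotent index on a single property."""
    name = f"{label.lower()}_{prop}_idx"  # hypothetical naming scheme
    return (
        f"CREATE INDEX {name} IF NOT EXISTS "
        f"FOR (n:{label}) ON (n.{prop})"
    )


# Example: statements for one node type (label chosen for illustration)
statements = [constraint_statement("Person"), index_statement("Person", "name")]
```

Because each statement carries `IF NOT EXISTS`, running the same list a second time leaves the schema unchanged.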


neo4j-reset.py

Wipes the database clean. Drops all constraints, indexes, nodes, and relationships.

Usage

# Interactive — will prompt for confirmation
python utils/neo4j-reset.py --uri bolt://ariel.incus:7687

# Skip confirmation prompt
python utils/neo4j-reset.py --uri bolt://ariel.incus:7687 --force

What It Does

  1. Reports current database contents (node/relationship/constraint/index counts)
  2. Drops all constraints
  3. Drops all non-lookup indexes
  4. Deletes all nodes and relationships (batched for large databases)
  5. Verifies the database is clean
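
The batched deletion in step 4 could look something like the sketch below. It is an assumption about the approach, not the script's code: `delete_all_in_batches`, the `run_batch` callable, and the default batch size are hypothetical. `run_batch` stands in for whatever executes one statement against the database and returns the number of nodes it deleted.

```python
from typing import Callable

# Deletes up to $limit nodes (with their relationships) and reports the count.
DELETE_BATCH = (
    "MATCH (n) WITH n LIMIT $limit "
    "DETACH DELETE n RETURN count(*) AS deleted"
)


def delete_all_in_batches(
    run_batch: Callable[[str, int], int], batch_size: int = 10_000
) -> int:
    """Repeat the batch delete until a batch removes nothing; return the total."""
    total = 0
    while True:
        deleted = run_batch(DELETE_BATCH, batch_size)
        if deleted == 0:
            return total
        total += deleted
```

Keeping each batch in its own transaction bounds memory use on large graphs, which is the usual reason for batching instead of a single `MATCH (n) DETACH DELETE n`.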

Safety

  • Requires typing yes to confirm (unless --force)
  • Shows before/after counts so you know exactly what was removed

neo4j-validate.py

Generates a comprehensive validation report. Share the output to verify the graph is correctly built.

Usage

python utils/neo4j-validate.py --uri bolt://ariel.incus:7687

What It Checks

| Section | What's Validated |
|---|---|
| Connection | Database reachable, APOC plugin available |
| Constraints | All 74 uniqueness constraints present, no extras |
| Indexes | Total count, spot-check of 11 key indexes |
| Node Labels | No unexpected labels (detects junk from Memory server, etc.) |
| Sample Nodes | All 12 sample nodes exist with correct properties |
| Sample Relationships | All 5 cross-domain relationships exist |
| Relationship Summary | Total count and breakdown by type |
| Node Summary | Total count and breakdown by label |
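
The Node Labels check amounts to a set difference between the labels found in the database and the labels the schema defines. A minimal sketch, assuming a hypothetical helper `find_unexpected_labels` and an abbreviated expected set (the real schema defines 74 types); in a live database the actual labels can be listed with `CALL db.labels()`.

```python
def find_unexpected_labels(actual_labels, expected_labels):
    """Return labels present in the database but absent from the schema."""
    return sorted(set(actual_labels) - set(expected_labels))


# Illustrative data only: a shortened schema and a database polluted by
# a Memory-server label.
expected = {"Person", "Book"}
actual = ["Person", "Memory", "Book"]
unexpected = find_unexpected_labels(actual, expected)
```

An empty result means no junk labels were found.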

Expected Clean Output

═════════════════════════════════════════════════════════════════
  VALIDATION REPORT — Koios Unified Knowledge Graph
═════════════════════════════════════════════════════════════════
  Schema Version: 2.1.0
  ...
  RESULT: ALL 23 CHECKS PASSED ✓
═════════════════════════════════════════════════════════════════

Standard Workflow

Fresh Setup / Clean Slate

# 1. Wipe everything
python utils/neo4j-reset.py --uri bolt://ariel.incus:7687

# 2. Build schema and sample data
python utils/neo4j-schema-init.py --uri bolt://ariel.incus:7687

# 3. Validate
python utils/neo4j-validate.py --uri bolt://ariel.incus:7687

Routine Validation

python utils/neo4j-validate.py --uri bolt://ariel.incus:7687

Environment Variables

All three scripts support environment variables to avoid repeated prompts:

export NEO4J_URI="bolt://ariel.incus:7687"
export NEO4J_USER="neo4j"
export NEO4J_PASSWORD="your-password"

# Then just:
python utils/neo4j-reset.py --force
python utils/neo4j-schema-init.py --skip-samples
python utils/neo4j-validate.py
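
One way a script might resolve these settings is sketched below. The precedence (command-line value, then environment variable, then interactive prompt) and the helper name `resolve_setting` are assumptions for illustration, not confirmed behaviour of the scripts.

```python
import os
from getpass import getpass


def resolve_setting(cli_value, env_name, prompt, secret=False, environ=os.environ):
    """Return the CLI value if given, else the env var, else prompt the user."""
    if cli_value:
        return cli_value
    env_value = environ.get(env_name)
    if env_value:
        return env_value
    # Hide typed input for secrets such as the password
    return getpass(prompt) if secret else input(prompt)
```

For example, `resolve_setting(args.uri, "NEO4J_URI", "Neo4j URI: ")` would only prompt when neither `--uri` nor `NEO4J_URI` is set (here `args` is a hypothetical argparse namespace).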

Neo4j Python Driver — Lessons Learned

These patterns were discovered during development and are critical for anyone writing Cypher through the Neo4j Python driver (v5.x / v6.x).

1. Use Explicit Transactions for Writes

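Problem: session.run() executes an auto-commit transaction, and in our experience with the Neo4j Python driver 5.x+ these did not reliably persist writes. The result is fetched lazily, so it must be fully consumed (for example with result.consume()); otherwise the write may not be committed and errors may go unnoticed.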

Bad — silently fails to persist:

with driver.session() as session:
    session.run("CREATE (n:Person {id: 'test'})")
    # Transaction may not commit!

Good — explicit transaction with context manager:

with driver.session() as session:
    with session.begin_transaction() as tx:
        tx.run("CREATE (n:Person {id: 'test'})")
        # Auto-commits when context exits normally
        # Auto-rolls back on exception

Also good — managed write transaction:

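def create_person_tx(tx, person_id, name):
    # Set id explicitly: CREATE does not generate one, so a.id would be null
    result = tx.run(
        "CREATE (a:Person {id: $id, name: $name}) RETURN a.id AS id",
        id=person_id, name=name,
    )
    record = result.single()  # Consume inside the transaction function
    return record["id"]

with driver.session() as session:
    node_id = session.execute_write(create_person_tx, "person_alice", "Alice")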

2. Cypher MERGE Clause Ordering

Problem: ON CREATE SET must come immediately after MERGE, before any general SET clause. Placing SET before ON CREATE SET causes a syntax error.

Bad — syntax error:

MERGE (p:Person {id: 'user_main'})
SET p.name = 'Main User',
    p.updated_at = datetime()
ON CREATE SET p.created_at = datetime()  // ERROR: Invalid input 'ON'

Good — correct clause order:

MERGE (p:Person {id: 'user_main'})
ON CREATE SET p.created_at = datetime()
SET p.name = 'Main User',
    p.updated_at = datetime()

The full MERGE clause order is:

MERGE (pattern)
ON CREATE SET ...   ← only runs when node is first created
ON MATCH SET ...    ← only runs when node already exists (optional)
SET ...             ← always runs

3. Consume Results in Transactions

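Problem: In managed transactions (execute_write), results must be consumed within the transaction function. Once the function returns, the transaction is closed and the result can no longer be read; the driver raises ResultConsumedError if you try. Extract the values you need (for example with single(), data(), or by iterating) and return those rather than the Result object.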

Good pattern:

def create_node_tx(tx, node_id):
    result = tx.run("MERGE (n:Person {id: $id}) RETURN n.id AS id", id=node_id)
    record = result.single()  # Consumes the result
    return record["id"]

4. MATCH Returns No Rows ≠ Error

Problem: If a MATCH clause finds nothing, the query succeeds with zero rows — it does not raise an error. This means MERGE on a relationship after a failed MATCH silently does nothing.

// If person_xyz doesn't exist, this returns 0 rows (no error)
MATCH (p:Person {id: 'person_xyz'})
MATCH (b:Book {id: 'book_abc'})
MERGE (p)-[:COMPLETED]->(b)
// Zero rows processed, zero relationships created, zero errors

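Mitigation: End the query with a RETURN clause (the example above returns nothing, so single() would always be None) and check result.single() for None to detect this case: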

record = result.single()
if record is None:
    logger.error("Endpoints not found — no relationship created")
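
Put together, a transaction function that detects missing endpoints might look like this. It is a sketch under assumptions: the function name `link_person_to_book_tx`, the `rel_type` alias, and the boolean return value are illustrative choices, and the labels and relationship type are taken from the example above.

```python
import logging

logger = logging.getLogger(__name__)

# RETURN gives single() a row to find when both endpoints exist.
LINK_QUERY = """
MATCH (p:Person {id: $person_id})
MATCH (b:Book {id: $book_id})
MERGE (p)-[r:COMPLETED]->(b)
RETURN type(r) AS rel_type
"""


def link_person_to_book_tx(tx, person_id, book_id):
    """Create the relationship; return False if either endpoint is missing."""
    result = tx.run(LINK_QUERY, person_id=person_id, book_id=book_id)
    record = result.single()  # None when a MATCH produced zero rows
    if record is None:
        logger.error("Endpoints not found: %s -> %s", person_id, book_id)
        return False
    return True
```

It would be called as `session.execute_write(link_person_to_book_tx, "person_xyz", "book_abc")`, letting the caller decide whether a missing endpoint is fatal.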

5. Separate Node and Relationship Transactions

Problem: During development, creating nodes and then matching them for relationships through auto-commit session.run() calls sometimes failed: the relationship queries found no nodes, apparently because the node writes had not yet been committed (see lesson 1).

Good pattern: Create all nodes in one explicit transaction (commit), then create relationships in a separate explicit transaction:

# Transaction 1: Create nodes
with session.begin_transaction() as tx:
    for query in node_queries:
        tx.run(query)
    # Auto-commits on exit

# Transaction 2: Create relationships (nodes now visible)
with session.begin_transaction() as tx:
    for query in relationship_queries:
        tx.run(query)
    # Auto-commits on exit

6. MCP Memory Server vs Neo4j Cypher Server

Problem: The MCP Memory server (@modelcontextprotocol/server-memory) and Neo4j Cypher MCP server can both connect to the same Neo4j instance, but they use completely different data models.

| | Memory Server | Cypher Server |
|---|---|---|
| Schema | Fixed: name, type, observations | Your full custom schema |
| Node labels | Memory, reference | Your 74 defined types |
| Relationships | Simple string pairs | Rich typed relationships |
| Query language | API calls (search_nodes) | Full Cypher |

Resolution: If you have a custom Neo4j schema, use only the Cypher MCP server. Remove the Memory server to prevent it from polluting your graph with its own primitive node types.


Dependencies

pip install neo4j

All three scripts require the neo4j Python package. APOC is optional but recommended (the init script's test suite checks for it).


Version History

| Date | Change |
|---|---|
| 2025-01-07 | Initial neo4j-schema-init.py |
| 2026-02-17 | Added neo4j-reset.py and neo4j-validate.py |
| 2026-02-17 | Fixed init script: explicit transactions, correct MERGE clause ordering |