
Robert Helewka 7859264359 Add Neo4j schema initialization and validation scripts

- Introduced `neo4j-schema-init.py` for creating the foundational schema for the personal knowledge graph used by multiple AI assistants.
- Implemented functionality for creating constraints, indexes, and sample nodes, along with comprehensive testing of the schema.
- Added `neo4j-validate.py` to perform validation checks on the Neo4j knowledge graph, including constraints, indexes, sample nodes, relationships, and junk data detection.
- Enhanced logging for better traceability and debugging during schema initialization and validation processes.

2026-03-06 14:11:52 +00:00

9.5 KiB


Neo4j Utility Scripts

Documentation for the database management scripts in utils/

Scripts Overview

Script	Purpose	Destructive?
`neo4j-schema-init.py`	Create constraints, indexes, and sample data	No (idempotent)
`neo4j-reset.py`	Wipe all data, constraints, and indexes	Yes
`neo4j-validate.py`	Comprehensive validation report	No (read-only)

neo4j-schema-init.py

Creates the foundational schema for the unified knowledge graph: 74 uniqueness constraints, ~94 performance indexes, and 12 sample nodes with 5 cross-domain relationships.

Usage

# Interactive — prompts for URI, user, password
python utils/neo4j-schema-init.py

# Specify URI (will prompt for user/password)
python utils/neo4j-schema-init.py --uri bolt://ariel.incus:7687

# Skip sample data creation
python utils/neo4j-schema-init.py --uri bolt://ariel.incus:7687 --skip-samples

# Test-only mode (no schema changes)
python utils/neo4j-schema-init.py --uri bolt://ariel.incus:7687 --test-only

# Quiet mode
python utils/neo4j-schema-init.py --uri bolt://ariel.incus:7687 --quiet

What It Creates

74 uniqueness constraints — one per node type, on the id property
~94 performance indexes — on name/title, date, type/status/category, and domain fields
12 sample nodes — spanning all three teams (Personal, Work, Engineering)
5 sample relationships — demonstrating cross-domain connections

Idempotent

Safe to run multiple times. Uses IF NOT EXISTS for constraints/indexes and MERGE for sample data.
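Below is a minimal sketch of how idempotent DDL of this kind can be generated. It assumes constraint names of the form `<label>_id_unique` and index names of the form `<label>_<property>_idx`. Both naming conventions, and all helper names, are hypothetical and are not taken from the script. The `IF NOT EXISTS` forms are Neo4j 5 syntax, so running the same statements twice is a no-op.

```python
def constraint_ddl(label: str) -> str:
    """Uniqueness constraint on the id property (one per node type)."""
    name = f"{label.lower()}_id_unique"  # hypothetical naming convention
    return (
        f"CREATE CONSTRAINT {name} IF NOT EXISTS "
        f"FOR (n:{label}) REQUIRE n.id IS UNIQUE"
    )


def index_ddl(label: str, prop: str) -> str:
    """Range index on a frequently filtered property (name, date, status, ...)."""
    name = f"{label.lower()}_{prop.lower()}_idx"  # hypothetical naming convention
    return f"CREATE INDEX {name} IF NOT EXISTS FOR (n:{label}) ON (n.{prop})"


def schema_statements(indexed: dict[str, list[str]]) -> list[str]:
    """All DDL for a {label: [indexed properties]} mapping, constraints first."""
    stmts = [constraint_ddl(label) for label in indexed]
    for label, props in indexed.items():
        stmts.extend(index_ddl(label, prop) for prop in props)
    return stmts


def _run_ddl_tx(tx, statement: str) -> None:
    tx.run(statement).consume()


def apply_schema(session, statements: list[str]) -> None:
    """Run each DDL statement in its own managed write transaction.

    Neo4j does not allow schema changes and data writes in the same
    transaction, so DDL is kept apart from the sample-data MERGEs.
    """
    for stmt in statements:
        session.execute_write(_run_ddl_tx, stmt)
```

Here, `schema_statements({"Person": ["name"], "Book": ["title"]})` yields two constraints followed by two indexes, ready for `apply_schema`.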

neo4j-reset.py

Wipes the database clean. Drops all constraints, indexes, nodes, and relationships.

Usage

# Interactive — will prompt for confirmation
python utils/neo4j-reset.py --uri bolt://ariel.incus:7687

# Skip confirmation prompt
python utils/neo4j-reset.py --uri bolt://ariel.incus:7687 --force

What It Does

Reports current database contents (node/relationship/constraint/index counts)
Drops all constraints
Drops all non-lookup indexes
Deletes all nodes and relationships (batched for large databases)
Verifies the database is clean
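The drop and delete steps above can be sketched with the driver as follows. This is an illustrative outline rather than the script's actual code. The helper names and the 10,000-node batch size are assumptions, and the reporting and verification steps are left out. Constraints are dropped first because dropping a constraint also removes its backing index, and `IF EXISTS` turns any later `DROP INDEX` on such an index into a no-op.

```python
def quote_name(name: str) -> str:
    """Backtick-quote a schema object name, escaping embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def drop_statements(constraint_names, index_rows) -> list[str]:
    """DROP statements for every constraint, then every non-LOOKUP index.

    index_rows are (name, type) pairs as yielded by SHOW INDEXES.
    """
    stmts = [f"DROP CONSTRAINT {quote_name(n)} IF EXISTS" for n in constraint_names]
    stmts += [
        f"DROP INDEX {quote_name(name)} IF EXISTS"
        for name, index_type in index_rows
        if index_type != "LOOKUP"
    ]
    return stmts


# Deletes at most $limit nodes (and their relationships) per transaction.
# count(*) still returns one row (with 0) when nothing matched.
DELETE_BATCH = "MATCH (n) WITH n LIMIT $limit DETACH DELETE n RETURN count(*) AS deleted"


def _delete_batch_tx(tx, limit: int) -> int:
    return tx.run(DELETE_BATCH, limit=limit).single()["deleted"]


def delete_all(session, batch_size: int = 10_000) -> int:
    """Delete every node in batches so no single transaction grows too large."""
    total = 0
    while True:
        deleted = session.execute_write(_delete_batch_tx, batch_size)
        total += deleted
        if deleted == 0:
            return total


def reset(session) -> int:
    constraints = [r["name"] for r in session.run("SHOW CONSTRAINTS YIELD name")]
    indexes = [(r["name"], r["type"]) for r in session.run("SHOW INDEXES YIELD name, type")]
    for stmt in drop_statements(constraints, indexes):
        session.run(stmt).consume()
    return delete_all(session)
```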

Safety

Requires typing yes to confirm (unless --force)
Shows before/after counts so you know exactly what was removed
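Below is a minimal sketch of the confirmation gate. It assumes `argparse` handles the `--uri` and `--force` flags shown above. `parse_args` and `confirm_reset` are hypothetical names, and the `ask` parameter is injectable so the prompt can be exercised without a terminal.

```python
import argparse


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Wipe the Neo4j database.")
    parser.add_argument("--uri", help="Bolt URI, e.g. bolt://ariel.incus:7687")
    parser.add_argument("--force", action="store_true", help="skip the confirmation prompt")
    return parser.parse_args(argv)


def confirm_reset(force: bool, ask=input) -> bool:
    """Return True only if --force was given or the user typed exactly 'yes'."""
    if force:
        return True
    answer = ask("This deletes ALL data, constraints and indexes. Type 'yes' to continue: ")
    return answer.strip() == "yes"
```

A caller would typically run `args = parse_args()` and abort unless `confirm_reset(args.force)` returns True. Because anything other than `yes` (including `y`) aborts, a stray keypress cannot wipe the database.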

neo4j-validate.py

Generates a comprehensive validation report. Share the output to verify the graph is correctly built.

Usage

python utils/neo4j-validate.py --uri bolt://ariel.incus:7687

What It Checks

Section	What's Validated
Connection	Database reachable, APOC plugin available
Constraints	All 74 uniqueness constraints present, no extras
Indexes	Total count, spot-check of 11 key indexes
Node Labels	No unexpected labels (e.g. junk written by the MCP Memory server)
Sample Nodes	All 12 sample nodes exist with correct properties
Sample Relationships	All 5 cross-domain relationships exist
Relationship Summary	Total count and breakdown by type
Node Summary	Total count and breakdown by label
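The "all 74 uniqueness constraints present, no extras" check amounts to a set comparison. In the sketch below, each expected constraint is assumed to be a `(label, 'id')` pair, and the rows are assumed to come from `SHOW CONSTRAINTS YIELD labelsOrTypes, properties` (Neo4j 5 syntax). `diff_constraints` and `check_constraints` are hypothetical helpers, not the validator's own functions.

```python
def diff_constraints(expected_labels, rows):
    """Compare expected id-uniqueness constraints with what the database reports.

    rows: dicts with 'labelsOrTypes' and 'properties' lists. Composite
    constraints are joined with commas, so they surface as extras.
    Returns (missing, extra) as sorted lists of (label, property) pairs.
    """
    expected = {(label, "id") for label in expected_labels}
    actual = {
        (",".join(row["labelsOrTypes"]), ",".join(row["properties"]))
        for row in rows
    }
    return sorted(expected - actual), sorted(actual - expected)


def check_constraints(session, expected_labels) -> bool:
    rows = [r.data() for r in session.run("SHOW CONSTRAINTS YIELD labelsOrTypes, properties")]
    missing, extra = diff_constraints(expected_labels, rows)
    for label, prop in missing:
        print(f"  MISSING constraint: {label}.{prop}")
    for label, prop in extra:
        print(f"  EXTRA constraint:   {label}.{prop}")
    return not missing and not extra
```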

Expected Clean Output

═════════════════════════════════════════════════════════════════
  VALIDATION REPORT — Koios Unified Knowledge Graph
═════════════════════════════════════════════════════════════════
  Schema Version: 2.1.0
  ...
  RESULT: ALL 23 CHECKS PASSED ✓
═════════════════════════════════════════════════════════════════

Standard Workflow

Fresh Setup / Clean Slate

# 1. Wipe everything
python utils/neo4j-reset.py --uri bolt://ariel.incus:7687

# 2. Build schema and sample data
python utils/neo4j-schema-init.py --uri bolt://ariel.incus:7687

# 3. Validate
python utils/neo4j-validate.py --uri bolt://ariel.incus:7687

Routine Validation

python utils/neo4j-validate.py --uri bolt://ariel.incus:7687

Environment Variables

All three scripts support environment variables to avoid repeated prompts:

export NEO4J_URI="bolt://ariel.incus:7687"
export NEO4J_USER="neo4j"
export NEO4J_PASSWORD="your-password"

# Then just:
python utils/neo4j-reset.py --force
python utils/neo4j-schema-init.py --skip-samples
python utils/neo4j-validate.py
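One way a script could combine these variables with command-line flags and interactive prompts is sketched below. The precedence shown (flag, then environment variable, then prompt) is an assumption, not documented behaviour of the three scripts. `resolve_setting` and `connection_settings` are hypothetical helpers.

```python
import getpass
import os


def resolve_setting(cli_value, env_var, prompt, secret=False, environ=None, ask=None):
    """Return the first available value: CLI flag, environment variable, prompt."""
    if cli_value:
        return cli_value
    environ = os.environ if environ is None else environ
    if environ.get(env_var):
        return environ[env_var]
    if ask is None:
        ask = getpass.getpass if secret else input  # never echo the password
    return ask(f"{prompt}: ")


def connection_settings(uri_flag=None, environ=None, ask=None):
    """Resolve (uri, user, password) from NEO4J_URI / NEO4J_USER / NEO4J_PASSWORD."""
    uri = resolve_setting(uri_flag, "NEO4J_URI", "Neo4j URI", environ=environ, ask=ask)
    user = resolve_setting(None, "NEO4J_USER", "Username", environ=environ, ask=ask)
    password = resolve_setting(
        None, "NEO4J_PASSWORD", "Password", secret=True, environ=environ, ask=ask
    )
    return uri, user, password
```

The resulting tuple maps directly onto `GraphDatabase.driver(uri, auth=(user, password))`.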

Neo4j Python Driver — Lessons Learned

These patterns were discovered during development and are critical for anyone writing Cypher through the Neo4j Python driver (v5.x / v6.x).

1. Use Explicit Transactions for Writes

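Problem: session.run() executes an auto-commit transaction whose results are fetched lazily. The write is only guaranteed to be complete once the result has been consumed (for example with result.consume()) or the session has closed, and auto-commit transactions are not retried on transient errors. With the Neo4j Python driver 5.x+ this proved unreliable for writes during development.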

Bad — silently fails to persist:

with driver.session() as session:
    session.run("CREATE (n:Person {id: 'test'})")
    # Transaction may not commit!

Good — explicit transaction with context manager:

with driver.session() as session:
    with session.begin_transaction() as tx:
        tx.run("CREATE (n:Person {id: 'test'})")
        # Auto-commits when context exits normally
        # Auto-rolls back on exception

Also good — managed write transaction:

def create_person_tx(tx, name):
    # randomUUID() gives the node an id, so RETURN a.id is not null
    result = tx.run(
        "CREATE (a:Person {id: randomUUID(), name: $name}) RETURN a.id AS id",
        name=name,
    )
    record = result.single()
    return record["id"]

with driver.session() as session:
    node_id = session.execute_write(create_person_tx, "Alice")
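For a one-off write where an auto-commit `session.run()` is still used, consuming the result makes the code wait until the query has run to completion, and the returned summary shows what was actually written. The sketch below shows this. `run_and_count` and `merge_person_tx` are hypothetical helpers; `Result.consume()` and `summary.counters.nodes_created` are part of the driver's API.

```python
def run_and_count(session, query: str, **params) -> int:
    """Run an auto-commit query, wait for it to finish, return nodes created."""
    summary = session.run(query, **params).consume()  # blocks until the server is done
    return summary.counters.nodes_created


def merge_person_tx(tx, person_id: str, name: str) -> int:
    """Managed-transaction variant for use with session.execute_write."""
    summary = tx.run(
        "MERGE (p:Person {id: $id}) SET p.name = $name", id=person_id, name=name
    ).consume()
    return summary.counters.nodes_created
```

A return value of 0 from either helper means the node already existed or nothing matched. It does not indicate a silent failure.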

2. Cypher MERGE Clause Ordering

Problem: ON CREATE SET must come immediately after MERGE, before any general SET clause. Placing SET before ON CREATE SET causes a syntax error.

Bad — syntax error:

MERGE (p:Person {id: 'user_main'})
SET p.name = 'Main User',
    p.updated_at = datetime()
ON CREATE SET p.created_at = datetime()  // ERROR: Invalid input 'ON'

Good — correct clause order:

MERGE (p:Person {id: 'user_main'})
ON CREATE SET p.created_at = datetime()
SET p.name = 'Main User',
    p.updated_at = datetime()

The full MERGE clause order is:

MERGE (pattern)
ON CREATE SET ...   ← only runs when node is first created
ON MATCH SET ...    ← only runs when node already exists (optional)
SET ...             ← always runs
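When MERGE statements are generated from Python (for example, for the sample nodes), assembling them from parts makes it hard to get the clause order wrong. Below is a minimal sketch. `merge_upsert` is a hypothetical helper. It assumes label, property names, and value expressions come from trusted code, because they are interpolated into the query text; user-supplied values should go in as `$parameters`.

```python
def merge_upsert(label, key, on_create=None, on_match=None, always=None) -> str:
    """Build MERGE / ON CREATE SET / ON MATCH SET / SET in a valid order.

    Each mapping is {property: Cypher expression}, e.g. {"created_at": "datetime()"}.
    The merge key is always bound to the parameter $<key>.
    """
    def assignments(props):
        return ", ".join(f"n.{prop} = {expr}" for prop, expr in props.items())

    lines = [f"MERGE (n:{label} {{{key}: ${key}}})"]
    if on_create:
        lines.append("ON CREATE SET " + assignments(on_create))
    if on_match:
        lines.append("ON MATCH SET " + assignments(on_match))
    if always:
        lines.append("SET " + assignments(always))  # general SET always comes last
    return "\n".join(lines)
```

For example, `merge_upsert("Person", "id", on_create={"created_at": "datetime()"}, always={"name": "$name", "updated_at": "datetime()"})` reproduces the "Good" query above (with `n` as the variable), to be run with `id="user_main", name="Main User"`.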

3. Consume Results in Transactions

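Problem: In managed transactions (execute_write), results must be consumed within the transaction function. A Result is bound to its transaction, so return plain values (a record, a list, a count) rather than the Result object itself: once the function returns and the transaction closes, unread records are no longer available.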

Good pattern:

def create_node_tx(tx, node_id):
    result = tx.run("MERGE (n:Person {id: $id}) RETURN n.id AS id", id=node_id)
    record = result.single()  # Consumes the result
    return record["id"]
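The same rule applies to queries that return many rows: convert the records into plain Python values inside the transaction function. Below is a minimal sketch. `list_people_tx` and the `Person` properties it reads are illustrative.

```python
def list_people_tx(tx, limit: int = 100) -> list[dict]:
    """Read Person nodes and return plain dicts, never the Result itself."""
    result = tx.run(
        "MATCH (p:Person) RETURN p.id AS id, p.name AS name ORDER BY p.id LIMIT $limit",
        limit=limit,
    )
    # Iterating consumes the result while the transaction is still open.
    return [{"id": record["id"], "name": record["name"]} for record in result]
```

It would be called as `people = session.execute_read(list_people_tx, 10)`. `result.data()` is a shorter alternative that returns every record as a dict.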

4. MATCH Returns No Rows ≠ Error

Problem: If a MATCH clause finds nothing, the query succeeds with zero rows — it does not raise an error. This means MERGE on a relationship after a failed MATCH silently does nothing.

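// If person_xyz doesn't exist, this returns 0 rows (no error)
MATCH (p:Person {id: 'person_xyz'})
MATCH (b:Book {id: 'book_abc'})
MERGE (p)-[:COMPLETED]->(b)
RETURN p.id AS person_id, b.id AS book_id
// Zero rows processed, zero relationships created, zero errors

Mitigation: end the query with a RETURN clause (as above) so that a successful run yields exactly one row. Without RETURN, result.single() is None even on success. Then check result.single() for None to detect the zero-row case:

import logging

logger = logging.getLogger(__name__)
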
record = result.single()
if record is None:
    logger.error("Endpoints not found — no relationship created")
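A relationship helper can report whether anything was actually linked. Below is a minimal sketch. `link_completed_tx` is hypothetical, and the `Person`/`Book` labels and `COMPLETED` type come from the example above. The `RETURN` clause is what makes the `None` check meaningful.

```python
import logging

logger = logging.getLogger(__name__)

LINK_QUERY = """
MATCH (p:Person {id: $person_id})
MATCH (b:Book {id: $book_id})
MERGE (p)-[:COMPLETED]->(b)
RETURN p.id AS person_id, b.id AS book_id
"""


def link_completed_tx(tx, person_id: str, book_id: str) -> bool:
    """Return True if both endpoints existed and the relationship is in place."""
    record = tx.run(LINK_QUERY, person_id=person_id, book_id=book_id).single()
    if record is None:
        # Zero rows: a MATCH found nothing, so the MERGE never ran.
        logger.error("Endpoints not found: %s -> %s, no relationship created", person_id, book_id)
        return False
    return True
```

It would be called as `ok = session.execute_write(link_completed_tx, "person_xyz", "book_abc")`.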

5. Separate Node and Relationship Transactions

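Problem: Creating nodes and then matching them for relationships via auto-commit session.run() calls proved unreliable. If the node writes have not been committed (see lesson 1), the relationship MATCH clauses find nothing and, per lesson 4, silently create nothing. Within a single explicit transaction, later queries do see that transaction's earlier writes. Committing the nodes in their own transaction first guarantees they exist, and a failed relationship batch then cannot roll them back.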

Good pattern: Create all nodes in one explicit transaction (commit), then create relationships in a separate explicit transaction:

# Transaction 1: Create nodes
with session.begin_transaction() as tx:
    for query in node_queries:
        tx.run(query)
    # Auto-commits on exit

# Transaction 2: Create relationships (nodes now visible)
with session.begin_transaction() as tx:
    for query in relationship_queries:
        tx.run(query)
    # Auto-commits on exit
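The same two-phase approach can use managed transactions, which the driver retries on transient errors. Below is a minimal sketch. It assumes `node_queries` and `relationship_queries` are lists of parameter-free Cypher strings, and that each relationship query ends in a `RETURN`, so an empty result means its endpoints were missing (lesson 4). The function names are hypothetical.

```python
def _create_nodes_tx(tx, queries) -> None:
    for query in queries:
        tx.run(query).consume()


def _create_relationships_tx(tx, queries) -> list[str]:
    """Return the relationship queries whose MATCH clauses found nothing."""
    missing = []
    for query in queries:
        if tx.run(query).single() is None:
            missing.append(query)
    return missing


def build_graph(session, node_queries, relationship_queries) -> list[str]:
    # Phase 1: nodes, committed before any relationship query runs.
    session.execute_write(_create_nodes_tx, node_queries)
    # Phase 2: relationships; a failure here cannot roll back the nodes.
    return session.execute_write(_create_relationships_tx, relationship_queries)
```

Because managed transaction functions may be retried, each query should be idempotent. Using MERGE rather than CREATE provides that.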

6. MCP Memory Server vs Neo4j Cypher Server

Problem: The MCP Memory server (@modelcontextprotocol/server-memory) and Neo4j Cypher MCP server can both connect to the same Neo4j instance, but they use completely different data models.

	Memory Server	Cypher Server
Schema	Fixed: `name`, `type`, `observations`	Your full custom schema
Node labels	`Memory`, `reference`	Your 74 defined types
Relationships	Simple string pairs	Rich typed relationships
Query language	API calls (`search_nodes`)	Full Cypher

Resolution: If you have a custom Neo4j schema, use only the Cypher MCP server. Remove the Memory server to prevent it from polluting your graph with its own primitive node types.
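To find nodes a Memory server may already have written, a check similar to the validator's "no unexpected labels" test can be run. Below is a minimal sketch. It assumes the allowed labels are known as a set. `db.labels()` is a built-in Neo4j procedure; `unexpected_labels`, `count_by_label` and `report_junk` are hypothetical helpers.

```python
def unexpected_labels(actual_labels, allowed_labels) -> list[str]:
    """Labels present in the database but not part of the schema."""
    return sorted(set(actual_labels) - set(allowed_labels))


def count_by_label(session, label: str) -> int:
    # Labels cannot be passed as parameters, so the name is backtick-quoted.
    quoted = "`" + label.replace("`", "``") + "`"
    return session.run(f"MATCH (n:{quoted}) RETURN count(n) AS c").single()["c"]


def report_junk(session, allowed_labels) -> dict[str, int]:
    """Map each unexpected label (e.g. Memory, reference) to its node count."""
    labels = [r["label"] for r in session.run("CALL db.labels() YIELD label")]
    return {
        label: count_by_label(session, label)
        for label in unexpected_labels(labels, allowed_labels)
    }
```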

Dependencies

pip install neo4j

All three scripts require the neo4j Python package. APOC is optional but recommended (the init script's test suite checks for it).

Version History

Date	Change
2025-01-07	Initial `neo4j-schema-init.py`
2026-02-17	Added `neo4j-reset.py` and `neo4j-validate.py`
2026-02-17	Fixed init script: explicit transactions, correct MERGE clause ordering
