name: clickhouse-best-practices description: MUST USE when reviewing ClickHouse schemas, queries, or configurations. Contains 28 rules that MUST be checked before providing recommendations. Always read relevant rule files and cite specific rules in responses. license: Apache-2.0 metadata: author: ClickHouse Inc version: "0.3.0"

ClickHouse Best Practices

Name: clickhouse-best-practices
Author: clickhouse

Comprehensive guidance for ClickHouse covering schema design, query optimization, and data ingestion. Contains 28 rules across 3 main categories (schema, query, insert), prioritized by impact.

Official docs: ClickHouse Best Practices

IMPORTANT: How to Apply This Skill

Before answering ClickHouse questions, follow this priority order:

Check for applicable rules in the rules/ directory
If rules exist: Apply them and cite them in your response using "Per rule-name..."
If no rule exists: Use the LLM's ClickHouse knowledge or search documentation
If uncertain: Use web search for current best practices
Always cite your source: rule name, "general ClickHouse guidance", or URL

Why rules take priority: ClickHouse has specific behaviors (columnar storage, sparse indexes, merge tree mechanics) where general database intuition can be misleading. The rules encode validated, ClickHouse-specific guidance.

For Formal Reviews

When performing a formal review of schemas, queries, or data ingestion:

Review Procedures

For Schema Reviews (CREATE TABLE, ALTER TABLE)

Read these rule files in order:

rules/schema-pk-plan-before-creation.md - ORDER BY is immutable
rules/schema-pk-cardinality-order.md - Column ordering in keys
rules/schema-pk-prioritize-filters.md - Filter column inclusion
rules/schema-types-native-types.md - Proper type selection
rules/schema-types-minimize-bitwidth.md - Numeric type sizing
rules/schema-types-lowcardinality.md - LowCardinality usage
rules/schema-types-avoid-nullable.md - Nullable vs DEFAULT
rules/schema-partition-low-cardinality.md - Partition count limits
rules/schema-partition-lifecycle.md - Partitioning purpose

Check for:

[ ] PRIMARY KEY / ORDER BY column order (low-to-high cardinality)
[ ] Data types match actual data ranges
[ ] LowCardinality applied to appropriate string columns
[ ] Partition key cardinality bounded (100-1,000 values)
[ ] ReplacingMergeTree has version column if used

For Query Reviews (SELECT, JOIN, aggregations)

Read these rule files:

rules/query-join-choose-algorithm.md - Algorithm selection
rules/query-join-filter-before.md - Pre-join filtering
rules/query-join-use-any.md - ANY vs regular JOIN
rules/query-index-skipping-indices.md - Secondary index usage
rules/schema-pk-filter-on-orderby.md - Filter alignment with ORDER BY

Check for:

[ ] Filters use ORDER BY prefix columns
[ ] JOINs filter tables before joining (not after)
[ ] Correct JOIN algorithm for table sizes
[ ] Skipping indices for non-ORDER BY filter columns

For Insert Strategy Reviews (data ingestion, updates, deletes)

Read these rule files:

rules/insert-batch-size.md - Batch sizing requirements
rules/insert-mutation-avoid-update.md - UPDATE alternatives
rules/insert-mutation-avoid-delete.md - DELETE alternatives
rules/insert-async-small-batches.md - Async insert usage
rules/insert-optimize-avoid-final.md - OPTIMIZE TABLE risks

Check for:

[ ] Batch size 10K-100K rows per INSERT
[ ] No ALTER TABLE UPDATE for frequent changes
[ ] ReplacingMergeTree or CollapsingMergeTree for update patterns
[ ] Async inserts enabled for high-frequency small batches

Output Format

Structure your response as follows:

## Rules Checked
- `rule-name-1` - Compliant / Violation found
- `rule-name-2` - Compliant / Violation found
...

## Findings

### Violations
- **`rule-name`**: Description of the issue
  - Current: [what the code does]
  - Required: [what it should do]
  - Fix: [specific correction]

### Compliant
- `rule-name`: Brief note on why it's correct

## Recommendations
[Prioritized list of changes, citing rules]

Rule Categories by Priority

| Priority | Category | Impact | Prefix | Rule Count | |----------|----------|--------|--------|------------| | 1 | Primary Key Selection | CRITICAL | schema-pk- | 4 | | 2 | Data Type Selection | CRITICAL | schema-types- | 5 | | 3 | JOIN Optimization | CRITICAL | query-join- | 5 | | 4 | Insert Batching | CRITICAL | insert-batch- | 1 | | 5 | Mutation Avoidance | CRITICAL | insert-mutation- | 2 | | 6 | Partitioning Strategy | HIGH | schema-partition- | 4 | | 7 | Skipping Indices | HIGH | query-index- | 1 | | 8 | Materialized Views | HIGH | query-mv- | 2 | | 9 | Async Inserts | HIGH | insert-async- | 2 | | 10 | OPTIMIZE Avoidance | HIGH | insert-optimize- | 1 | | 11 | JSON Usage | MEDIUM | schema-json- | 1 |

Quick Reference

Schema Design - Primary Key (CRITICAL)

schema-pk-plan-before-creation - Plan ORDER BY before table creation (immutable)
schema-pk-cardinality-order - Order columns low-to-high cardinality
schema-pk-prioritize-filters - Include frequently filtered columns
schema-pk-filter-on-orderby - Query filters must use ORDER BY prefix

Schema Design - Data Types (CRITICAL)

schema-types-native-types - Use native types, not String for everything
schema-types-minimize-bitwidth - Use smallest numeric type that fits
schema-types-lowcardinality - LowCardinality for <10K unique strings
schema-types-enum - Enum for finite value sets with validation
schema-types-avoid-nullable - Avoid Nullable; use DEFAULT instead

Schema Design - Partitioning (HIGH)

schema-partition-low-cardinality - Keep partition count 100-1,000
schema-partition-lifecycle - Use partitioning for data lifecycle, not queries
schema-partition-query-tradeoffs - Understand partition pruning trade-offs
schema-partition-start-without - Consider starting without partitioning

Schema Design - JSON (MEDIUM)

schema-json-when-to-use - JSON for dynamic schemas; typed columns for known

Query Optimization - JOINs (CRITICAL)

query-join-choose-algorithm - Select algorithm based on table sizes
query-join-use-any - ANY JOIN when only one match needed
query-join-filter-before - Filter tables before joining
query-join-consider-alternatives - Dictionaries/denormalization vs JOIN
query-join-null-handling - join_use_nulls=0 for default values

Query Optimization - Indices (HIGH)

query-index-skipping-indices - Skipping indices for non-ORDER BY filters

Query Optimization - Materialized Views (HIGH)

query-mv-incremental - Incremental MVs for real-time aggregations
query-mv-refreshable - Refreshable MVs for complex joins

Insert Strategy - Batching (CRITICAL)

insert-batch-size - Batch 10K-100K rows per INSERT

Insert Strategy - Async (HIGH)

insert-async-small-batches - Async inserts for high-frequency small batches
insert-format-native - Native format for best performance

Insert Strategy - Mutations (CRITICAL)

insert-mutation-avoid-update - ReplacingMergeTree instead of ALTER UPDATE
insert-mutation-avoid-delete - Lightweight DELETE or DROP PARTITION

Insert Strategy - Optimization (HIGH)

insert-optimize-avoid-final - Let background merges work

When to Apply

This skill activates when you encounter:

CREATE TABLE statements
ALTER TABLE modifications
ORDER BY or PRIMARY KEY discussions
Data type selection questions
Slow query troubleshooting
JOIN optimization requests
Data ingestion pipeline design
Update/delete strategy questions
ReplacingMergeTree or other specialized engine usage
Partitioning strategy decisions

Rule File Structure

Each rule file in rules/ contains:

YAML frontmatter: title, impact level, tags
Brief explanation: Why this rule matters
Incorrect example: Anti-pattern with explanation
Correct example: Best practice with explanation
Additional context: Trade-offs, when to apply, references

Full Compiled Document

For the complete guide with all rules expanded inline: AGENTS.md

Use AGENTS.md when you need to check multiple rules quickly without reading individual files.

SkillForge

clickhouse-best-practices

Before / After 效果对比

description 文档

ClickHouse Best Practices

IMPORTANT: How to Apply This Skill

For Formal Reviews

Review Procedures

For Schema Reviews (CREATE TABLE, ALTER TABLE)

For Query Reviews (SELECT, JOIN, aggregations)

For Insert Strategy Reviews (data ingestion, updates, deletes)

Output Format

Rule Categories by Priority

Quick Reference

Schema Design - Primary Key (CRITICAL)

Schema Design - Data Types (CRITICAL)

Schema Design - Partitioning (HIGH)

Schema Design - JSON (MEDIUM)

Query Optimization - JOINs (CRITICAL)

Query Optimization - Indices (HIGH)

Query Optimization - Materialized Views (HIGH)

Insert Strategy - Batching (CRITICAL)

Insert Strategy - Async (HIGH)

Insert Strategy - Mutations (CRITICAL)

Insert Strategy - Optimization (HIGH)

When to Apply

Rule File Structure

Full Compiled Document

forum用户评价 (0)

统计数据

用户评分

兼容平台

时间线

SkillForge