๋ฐ์ดํ„ฐ ์—”์ง€๋‹ˆ์–ด๋ง

๋ฐ์ดํ„ฐ ๋ถ„์„ ํ™˜๊ฒฝ ๊ตฌ์ถ• - 07. Clickhouse ํด๋Ÿฌ์Šคํ„ฐ ํ™˜๊ฒฝ์—์„œ ๋ฐ์ดํ„ฐ ์‚ฝ์ž… ๋ฐ ์—…๋ฐ์ดํŠธ ์ฒ˜๋ฆฌ ๋ฐฉ์‹ ๐Ÿš€(part. 1)

Tempo 2025. 4. 1. 14:50

Clickhouse ํด๋Ÿฌ์Šคํ„ฐ ํ™˜๊ฒฝ์—์„œ ๋ฐ์ดํ„ฐ ์‚ฝ์ž… ๋ฐ ์—…๋ฐ์ดํŠธ ์ฒ˜๋ฆฌ ๋ฐฉ์‹ ๐Ÿš€(part. 1)

์ด์ „ ๊ธ€ ๋ฐ์ดํ„ฐ ๋ถ„์„ ํ™˜๊ฒฝ ๊ตฌ์ถ• - 06. Clickhouse ํ…Œ์ด๋ธ” ์ข…๋ฅ˜์™€ ์šฉ๋„์— ์ด์–ด Kubernetes ๊ธฐ๋ฐ˜ ClickHouse ํด๋Ÿฌ์Šคํ„ฐ ํ™˜๊ฒฝ์—์„œ ์ฃผ์š” ํ…Œ์ด๋ธ” ์—”์ง„๋ณ„ ๋ฐ์ดํ„ฐ ์‚ฝ์ž… ๋ฐ ์—…๋ฐ์ดํŠธ ์ฒ˜๋ฆฌ ๋ฐฉ์‹์„ ์‚ดํŽด๋ณด๊ณ , ์ตœ์‹  ๊ธฐ๋Šฅ๊ณผ ์ตœ์ ํ™” ๊ธฐ๋ฒ•์„ ์†Œ๊ฐœํ•ฉ๋‹ˆ๋‹ค.

 

1. ClickHouse Kubernetes ์•„ํ‚คํ…์ฒ˜ ๊ฐœ์š” ๐Ÿ› ๏ธ

Kubernetes ํ™˜๊ฒฝ์—์„œ ClickHouse๋Š” ์ƒค๋“œ(Shard)์™€ ๋ ˆํ”Œ๋ฆฌ์นด(Replica) ๊ตฌ์กฐ๋ฅผ ํ†ตํ•ด ๋ฐ์ดํ„ฐ๋ฅผ ๋ถ„์‚ฐ ์ €์žฅํ•˜๊ณ  ๊ณ ๊ฐ€์šฉ์„ฑ์„ ๋ณด์žฅํ•ฉ๋‹ˆ๋‹ค.

1.1 Kubernetes ํด๋Ÿฌ์Šคํ„ฐ ๊ตฌ์„ฑ

  • ์ƒค๋“œ(Shard): ๋ฐ์ดํ„ฐ๋ฅผ ๋ถ„ํ• ํ•˜์—ฌ ์ €์žฅํ•˜๋Š” ๋…ธ๋“œ
  • ๋ ˆํ”Œ๋ฆฌ์นด(Replica): ๋™์ผ ๋ฐ์ดํ„ฐ๋ฅผ ๋ณต์ œํ•˜์—ฌ ์ƒค๋“œ์˜ ์žฅ์• ๋ฅผ ๋Œ€๋น„ํ•˜๋Š” ๋ฐฑ์—… ๋…ธ๋“œ
  • ZooKeeper: ๋ฉ”ํƒ€๋ฐ์ดํ„ฐ ๊ด€๋ฆฌ ๋ฐ ๋ ˆํ”Œ๋ฆฌ์นด ๋™๊ธฐํ™”

์˜ˆ์‹œ ์•„ํ‚คํ…์ฒ˜

ClickHouse Cluster (2 Shards, 2 Replicas per Shard)
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”   โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ Shard 1      โ”‚   โ”‚ Shard 2      โ”‚
โ”‚ Replica 1, 2 โ”‚   โ”‚ Replica 1, 2 โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜   โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

 

๋ฐ˜์‘ํ˜•

2. ๋ฐ์ดํ„ฐ ์‚ฝ์ž…(Insert) ์ฒ˜๋ฆฌ ๋ฐฉ์‹ ๐Ÿ”„

2.1 Distributed + MergeTree ์กฐํ•ฉ

ClickHouse ํด๋Ÿฌ์Šคํ„ฐ์—์„œ ๋ฐ์ดํ„ฐ๋ฅผ ์‚ฝ์ž…ํ•  ๋•Œ, ์ผ๋ฐ˜์ ์œผ๋กœ Distributed ์—”์ง„๊ณผ MergeTree ๊ณ„์—ด ์—”์ง„์„ ์กฐํ•ฉํ•ฉ๋‹ˆ๋‹ค.

โœ… ๋ฐ์ดํ„ฐ ์‚ฝ์ž… ํ๋ฆ„

  1. ํด๋ผ์ด์–ธํŠธ ์š”์ฒญ: ์‚ฌ์šฉ์ž๊ฐ€ Distributed ํ…Œ์ด๋ธ”์— ๋ฐ์ดํ„ฐ๋ฅผ ์‚ฝ์ž…ํ•ฉ๋‹ˆ๋‹ค.
  2. ์ƒค๋”ฉ ๋ผ์šฐํŒ…: Distributed ์—”์ง„์ด ์ง€์ •๋œ ์ƒค๋”ฉ ํ‚ค๋ฅผ ๊ธฐ์ค€์œผ๋กœ ๋ฐ์ดํ„ฐ๋ฅผ ๊ฐ ์ƒค๋“œ๋กœ ๋ถ„๋ฐฐํ•ฉ๋‹ˆ๋‹ค.
  3. ๋ฐ์ดํ„ฐ ์ €์žฅ: ๊ฐ ์ƒค๋“œ์˜ MergeTree ํ…Œ์ด๋ธ”์ด ๋ฐ์ดํ„ฐ๋ฅผ ์ €์žฅํ•˜๊ณ  ๋ณ‘ํ•ฉ(merge) ์ž‘์—…์„ ์ˆ˜ํ–‰ํ•ฉ๋‹ˆ๋‹ค.

โœ… ํ…Œ์ด๋ธ” ์ƒ์„ฑ ์˜ˆ์ œ

-- ์ƒค๋“œ๋ณ„ ๋กœ์ปฌ ํ…Œ์ด๋ธ”
CREATE TABLE shard_table ON CLUSTER my_cluster
(
    id UInt32,
    event_time DateTime,
    event_type String
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(event_time)
ORDER BY (id, event_time);

-- ๋ถ„์‚ฐ ํ…Œ์ด๋ธ”
CREATE TABLE distributed_table ON CLUSTER my_cluster
(
    id UInt32,
    event_time DateTime,
    event_type String
)
ENGINE = Distributed(my_cluster, default, shard_table, cityHash64(id));

โœ… ๋ฐ์ดํ„ฐ ์‚ฝ์ž… ์˜ˆ์ œ

INSERT INTO distributed_table VALUES 
(1, '2025-04-01 10:00:00', 'login'),
(2, '2025-04-01 10:05:00', 'purchase'),
(3, '2025-04-01 10:10:00', 'logout');

 

 

โญ•๏ธ ์‹ค์ „ ์˜ˆ์‹œ - ์ด๋ฒคํŠธ ์ฒ˜๋ฆฌ

๐ŸŽฏ ์‹ค์ „ ์˜ˆ์ œ์—์„œ๋Š” shard 2๊ฐœ๋ฅผ ์ƒ์„ฑํ•˜์—ฌ ๋ฐ์ดํ„ฐ๋ฅผ ์ฒ˜๋ฆฌํ•ฉ๋‹ˆ๋‹ค.
NAME                     READY   STATUS    RESTARTS
clickhouse-shard0-0      1/1     Running   0       
clickhouse-shard1-0      1/1     Running   0       
clickhouse-zookeeper-0   1/1     Running   0

 

1๏ธโƒฃ MergeTree ์—”์ง„์„ ์‚ฌ์šฉํ•œ ํ…Œ์ด๋ธ”์—๋งŒ ๋ฐ์ดํ„ฐ๋ฅผ ๋„ฃ์—ˆ์„ ๋•Œ

-- ์ผ๋ฐ˜ ๋ฐ์ดํ„ฐ๋ฒ ์ด์Šค ์ƒ์„ฑ(Replicated X)
CREATE DATABASE IF NOT EXISTS regular_db;

-- ํ…Œ์ด๋ธ” ์ƒ์„ฑ
CREATE TABLE IF NOT EXISTS regular_db.test_merge_tree (
    id UInt64,
    event_date Date,
    event_time DateTime,
    user_id String,
    event_type String,
    payload String
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(event_date)
ORDER BY (event_date, id);

-- ์ž„์˜ ๋ฐ์ดํ„ฐ Insert
INSERT INTO regular_db.test_merge_tree
SELECT
    number as id,
    toDate('2024-03-01') + (number % 30) as event_date,
    toDateTime(toDate('2024-03-01') + (number % 30) + (number % 24) / 24) as event_time,
    concat('user_', toString(1 + number % 1000)) as user_id,
    multiIf(
        number % 5 = 0, 'click',
        number % 5 = 1, 'view',
        number % 5 = 2, 'purchase',
        number % 5 = 3, 'login',
        'logout'
    ) as event_type,
    concat('{"data":"', repeat('X', 10 + number % 50), '","value":', toString(number % 100), '}') as payload
FROM numbers(1000000);

 

์ƒค๋“œ๋ณ„ ๋ฐ์ดํ„ฐ ์ €์žฅ ์œ„์น˜๋ณ„ ํฌ๊ธฐ(byte)์™€ row์ˆ˜ ํ™•์ธ

SELECT hostName(), sum(rows) AS total_rows, sum(bytes) AS total_bytes
FROM cluster('default', 'system', 'parts')
WHERE database = 'regular_db' AND table = 'test_merge_tree'
GROUP BY hostName();

  • ๋ณ„๋„ ์ง€์ • ์—†์ด mergetree์— ์ง์ ‘ ๋ฐ์ดํ„ฐ๋ฅผ ์ถ”๊ฐ€ํ–ˆ์„ ๋•Œ shard0-0์— ์ถ”๊ฐ€๋˜๋Š” ๊ฒƒ์„ ์•Œ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

2๏ธโƒฃ MergeTree ์—”์ง„์„ ์‚ฌ์šฉํ•œ ํ…Œ์ด๋ธ”์— Distributed ์—”์ง„์„ ์‚ฌ์šฉํ•œ ํ…Œ์ด๋ธ”์„ ์ด์šฉํ•˜์—ฌ ๋ฐ์ดํ„ฐ๋ฅผ ๋„ฃ์—ˆ์„ ๋•Œ

  • ๋ฐ์ดํ„ฐ๋ฒ ์ด์Šค, ํ…Œ์ด๋ธ”์„ ์žฌ ์ƒ์„ฑํ•˜๊ณ  ๋ฐ์ดํ„ฐ ์ถ”๊ฐ€ ํ…Œ์ŠคํŠธ๋ฅผ ์ง„ํ–‰ํ•ฉ๋‹ˆ๋‹ค.
-- ๋ฐ์ดํ„ฐ๋ฒ ์ด์Šค ์ œ๊ฑฐ ๋ฐ ์žฌ ์ƒ์„ฑ
DROP DATABASE regular_db;
CREATE DATABASE IF NOT EXISTS regular_db on cluster default;

-- ํ…Œ์ด๋ธ” ์žฌ ์ƒ์„ฑ
CREATE TABLE IF NOT EXISTS regular_db.test_merge_tree on cluster default (
    id UInt64,
    event_date Date,
    event_time DateTime,
    user_id String,
    event_type String,
    payload String
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(event_date)
ORDER BY (event_date, id);

-- ๋ถ„์‚ฐ ํ…Œ์ด๋ธ” ์ƒ์„ฑ
CREATE TABLE regular_db.distributed_table ON CLUSTER default
(
    id UInt64,
    event_date Date,
    event_time DateTime,
    user_id String,
    event_type String,
    payload String
)
ENGINE = Distributed(default, regular_db, test_merge_tree, cityHash64(id));

-- ๋ถ„์‚ฐ ํ…Œ์ด๋ธ”์— ๋ฐ์ดํ„ฐ ์ถ”๊ฐ€
INSERT INTO regular_db.distributed_table
SELECT
    number as id,
    toDate('2024-03-01') + (number % 30) as event_date,
    toDateTime(toDate('2024-03-01') + (number % 30) + (number % 24) / 24) as event_time,
    concat('user_', toString(1 + number % 1000)) as user_id,
    multiIf(
        number % 5 = 0, 'click',
        number % 5 = 1, 'view',
        number % 5 = 2, 'purchase',
        number % 5 = 3, 'login',
        'logout'
    ) as event_type,
    concat('{"data":"', repeat('X', 10 + number % 50), '","value":', toString(number % 100), '}') as payload
FROM numbers(1000000);

 

์ƒค๋“œ๋ณ„ row์ˆ˜์™€ byte๋ฅผ ํ™•์ธํ•˜๋ฉด ๋‹ค์Œ๊ณผ ๊ฐ™์ด ๋ถ„์‚ฐ๋˜์–ด ์ €์žฅ๋œ ๊ฒƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

3. ๊ฒฐ๋ก  ๐Ÿ’ก

Kubernetes ํ™˜๊ฒฝ์—์„œ ClickHouse๋ฅผ ์šด์˜ํ•˜๋ฉด ํ™•์žฅ์„ฑ๊ณผ ๊ณ ๊ฐ€์šฉ์„ฑ์„ ์–ป์„ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด๋ฅผ ์œ„ํ•ด ์ ์ ˆํ•œ ํ…Œ์ด๋ธ” ์—”์ง„, ๋ฐ์ดํ„ฐ๋ฒ ์ด์Šค ์—”์ง„ ์„ ์‚ฌ์šฉํ•˜๋Š” ๊ฒƒ์ด ์ค‘์š”ํ•ฉ๋‹ˆ๋‹ค.

 

โœ… ๋‹ค์Œ ์‹œ๊ฐ„์—๋Š” ReplicatedMergetree ์—”์ง„์„ ์„ค๋ช…ํ•˜๊ณ  Distributed + Mergetree ๊ตฌ์„ฑ๊ณผ ๋น„๊ตํ•˜์—ฌ ์„ค๋ช…ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค. ๋˜ํ•œ ์–ด๋–ค ์ƒํ™ฉ์—์„œ ๊ฐ ์กฐํ•ฉ์ด ์‚ฌ์šฉ๋˜๋Š”์ง€ ์„ค๋ช…ํ•ฉ๋‹ˆ๋‹ค.

๋ฐ˜์‘ํ˜•