๋ฐ์ดํ„ฐ ์—”์ง€๋‹ˆ์–ด๋ง

๋ฐ์ดํ„ฐ ๋ถ„์„ ํ™˜๊ฒฝ ๊ตฌ์ถ• - 08. Clickhouse ํด๋Ÿฌ์Šคํ„ฐ ํ™˜๊ฒฝ์—์„œ ๋ฐ์ดํ„ฐ ์‚ฝ์ž… ๋ฐ ์—…๋ฐ์ดํŠธ ์ฒ˜๋ฆฌ ๋ฐฉ์‹ ๐Ÿš€(part. 2)

Tempo 2025. 4. 3. 08:00

์ด์ „ clickhouse ํด๋Ÿฌ์Šคํ„ฐ ํ™˜๊ฒฝ์—์„œ ๋ฐ์ดํ„ฐ ์ฒ˜๋ฆฌ ๋ฐฉ์‹(Clickhouse ํด๋Ÿฌ์Šคํ„ฐ ํ™˜๊ฒฝ์—์„œ ๋ฐ์ดํ„ฐ ์‚ฝ์ž… ๋ฐ ์—…๋ฐ์ดํŠธ ์ฒ˜๋ฆฌ ๋ฐฉ์‹ ๐Ÿš€(part. 1)) ์„ค๋ช…์— ์ด์–ด ReplicatedMergetree ํ…Œ์ด๋ธ”์—์„œ ๋ฐ์ดํ„ฐ๊ฐ€ ์ฒ˜๋ฆฌ๋˜๋Š” ๋ฐฉ์‹์— ๋Œ€ํ•ด ์„ค๋ช…ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค.

1. ClickHouse Kubernetes ์•„ํ‚คํ…์ฒ˜ ๊ฐœ์š” ๐Ÿ› ๏ธ

์ด๋ฒˆ ์‹ค์Šต์—์„œ๋Š” replica๋ฅผ ์ถ”๊ฐ€ํ•˜์—ฌ ๋ฐ์ดํ„ฐ๊ฐ€ ๋ณต์ œ๋˜๋„๋ก ์„ค์ •ํ–ˆ์Šต๋‹ˆ๋‹ค. ๊ธฐ์กด clickhouse helm ๋ฐฐํฌ์šฉ yaml ํŒŒ์ผ์—์„œ replicaCount๋ฅผ 2๋กœ ์„ค์ •ํ–ˆ์Šต๋‹ˆ๋‹ค.(Clickhouse ์„ค์น˜ ๋ฐฉ๋ฒ•)

NAME                     READY   STATUS    RESTARTS
clickhouse-shard0-0      1/1     Running   0       
clickhouse-shard0-1      1/1     Running   0       
clickhouse-shard1-0      1/1     Running   0       
clickhouse-shard1-1      1/1     Running   0       
clickhouse-zookeeper-0   1/1     Running   0

2. ๋ฐ์ดํ„ฐ ์‚ฝ์ž…(Insert) ์ฒ˜๋ฆฌ ๋ฐฉ์‹ ๐Ÿ”„

2.1 ReplicatedMergeTree

์ด์ „ ์‹ค์Šต์—์„œ ์‚ฌ์šฉํ–ˆ๋˜ regular_db์— replicatedmergetree ์—”์ง„์„ ์‚ฌ์šฉํ•˜๋Š” ํ…Œ์ด๋ธ”์„ ์ƒ์„ฑํ•˜๊ณ  ๋ฐ์ดํ„ฐ๋ฅผ ์ถ”๊ฐ€ํ•ฉ๋‹ˆ๋‹ค.

-- ReplicatedMergetree ์—”์ง„ ํ…Œ์ด๋ธ” ์ƒ์„ฑ
CREATE TABLE IF NOT EXISTS regular_db.test_replicated_merge_tree (
    id UInt64,
    event_date Date,
    event_time DateTime,
    user_id String,
    event_type String,
    payload String
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/regular/test_replicated_merge_tree', '{replica}')
PARTITION BY toYYYYMM(event_date)
ORDER BY (event_date, id);

-- ๋ถ„์‚ฐ ํ…Œ์ด๋ธ”
CREATE TABLE regular_db.distributed_replicated_table ON CLUSTER default
(
    id UInt64,
    event_date Date,
    event_time DateTime,
    user_id String,
    event_type String,
    payload String
)
ENGINE = Distributed(default, regular_db, test_replicated_merge_tree, cityHash64(id));

-- ๋ฐ์ดํ„ฐ ์ถ”๊ฐ€
INSERT INTO regular_db.distributed_replicated_table
SELECT
    number as id,
    toDate('2024-03-01') + (number % 30) as event_date,
    toDateTime(toDate('2024-03-01') + (number % 30) + (number % 24) / 24) as event_time,
    concat('user_', toString(1 + number % 1000)) as user_id,
    multiIf(
        number % 5 = 0, 'click',
        number % 5 = 1, 'view',
        number % 5 = 2, 'purchase',
        number % 5 = 3, 'login',
        'logout'
    ) as event_type,
    concat('{"data":"', repeat('X', 10 + number % 50), '","value":', toString(number % 100), '}') as payload
FROM numbers(1000000);

์œ„์™€ ๊ฐ™์ด ์•ฝ 1,000,000 row์˜ ๋ฐ์ดํ„ฐ๋ฅผ ๋„ฃ์–ด๋„ select count(*) from regular_db.test_replicated_merge_tree ์‹œ์— ์•„๋ฌด ๋ฐ์ดํ„ฐ๊ฐ€ ์กฐํšŒ๋˜์ง€ ์•Š๋Š” ํ˜„์ƒ์ด ๋ฐœ์ƒํ•ฉ๋‹ˆ๋‹ค. ์ด๋Š” dbeaver์—์„œ clickhouse cluster ์—ฐ๊ฒฐ ์‹œ ๊ฐ๊ฐ์˜ shard์— ์—ฐ๊ฒฐ๋˜๊ธฐ์— ์•„์ง ๋ณต์ œ๋˜์ง€ ์•Š์€ shard์— ๋ฐ์ดํ„ฐ๋ฅผ ์กฐํšŒํ•˜์—ฌ ๋ฌธ์ œ๊ฐ€ ๋ฐœ์ƒํ•œ ๊ฒƒ์ž…๋‹ˆ๋‹ค.

select *
from cluster('default', regular_db.test_replicated_merge_tree)

๋งŒ์•ฝ ์ „์ฒด ํด๋Ÿฌ์Šคํ„ฐ์—์„œ ๋ฐ์ดํ„ฐ ์กฐํšŒ ์ฟผ๋ฆฌ๋ฅผ ์‹คํ–‰ํ•˜๋ฉด table not found ์—๋Ÿฌ๊ฐ€ ๋ฐœ์ƒํ•ฉ๋‹ˆ๋‹ค.

์ด๋Š” ์•„์ง clickhouse๊ฐ€ ๊ฐ shard์— ๋ณต์ œํ•˜์ง€ ๋ชปํ•˜์—ฌ ๋ฐœ์ƒํ•˜๋Š” ๋ฌธ์ œ๋กœ ์‹œ๊ฐ„์ด ์ง€๋‚˜๋ฉด(clickhouse๊ฐ€ background copy๋ฅผ ์™„๋ฃŒํ•˜๋ฉด) ์ž์—ฐ์Šค๋Ÿฝ๊ฒŒ ๋ฐ์ดํ„ฐ๋ฅผ ๋ณต์ œํ•˜๋ฉฐ ๋ชจ๋“  ๋ฐ์ดํ„ฐ ์กฐํšŒ๊ฐ€ ๊ฐ€๋Šฅํ•ฉ๋‹ˆ๋‹ค.

MergeTree Vs ReplicatedMergetree์˜ ์ฐจ์ด์ ?

ReplicatedMergetree ์—”์ง„์„ ์‚ฌ์šฉํ•  ๊ฒฝ์šฐ shard ๋‚ด replica์— ํ•ด๋‹น ๋ฐ์ดํ„ฐ๊ฐ€ ๋ชจ๋‘ ๋ณต์ œ๋ฉ๋‹ˆ๋‹ค. ๋ฐ์ดํ„ฐ ์กฐํšŒ ์˜ˆ์‹œ๋ฅผ ํ†ตํ•ด ์•Œ์•„๋ณด๋ฉด

 

Clickhouse-shard1-0๊ณผ Clickhouse-shard1-1์— ๋™์ผํ•œ ์ˆซ์ž์˜ ๋ฐ์ดํ„ฐ๊ฐ€ ๋ณต์ œ๋œ ๊ฒƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ํ•˜์ง€๋งŒ mergetree์˜ ๊ฒฝ์šฐ ๊ณ ์ •๋œ shard ๋‚ด์—๋งŒ ์กด์žฌํ•˜์—ฌ replicated ๋˜์ง€ ์•Š์Šต๋‹ˆ๋‹ค.

 

 

3. ํ…Œ์ด๋ธ” ์—”์ง„ ๋ณ„ ๋ฐ์ดํ„ฐ ์‚ฝ์ž…(Insert) ์ฒ˜๋ฆฌ ๋ฐฉ์‹์— ๋”ฐ๋ฅธ ์žฅ,๋‹จ์  ๋น„๊ต

ํ…Œ์ด๋ธ” ์—”์ง„ ์žฅ์  ๋‹จ์ 
Distributed + MergeTree 1. ์ˆ˜ํ‰ ํ™•์žฅ์„ฑ: ์ƒค๋“œ๋ฅผ ์ถ”๊ฐ€ํ•˜๋ฉด ์ž๋™์œผ๋กœ ํ™•์žฅ ๊ฐ€๋Šฅ

2. ์ค‘์•™ ์ง‘์ค‘์‹ ์กฐํšŒ: ํด๋ผ์ด์–ธํŠธ๋Š” Distributed ํ…Œ์ด๋ธ”์„ ํ†ตํ•ด ๋ชจ๋“  ๋ฐ์ดํ„ฐ๋ฅผ ์กฐํšŒํ•  ์ˆ˜ ์žˆ์Œ

3. ํšจ์œจ์ ์ธ ์ƒค๋”ฉ: ์ƒค๋”ฉ ํ‚ค๋ฅผ ๊ธฐ์ค€์œผ๋กœ ๋ฐ์ดํ„ฐ๋ฅผ ๊ท ๋“ฑํ•˜๊ฒŒ ๋ถ„๋ฐฐํ•˜์—ฌ ๋ถ€ํ•˜๋ฅผ ์ตœ์†Œํ™”
1. ๋ฐ์ดํ„ฐ ๋ณต์ œ ์—†์Œ: ์ƒค๋“œ ๊ฐ„ ๋ฐ์ดํ„ฐ๊ฐ€ ์ค‘๋ณต๋˜์ง€ ์•Š์•„ ํŠน์ • ์ƒค๋“œ ์žฅ์•  ์‹œ ๋ฐ์ดํ„ฐ ์†์‹ค ๊ฐ€๋Šฅ.

2.๋ณต๊ตฌ ์‹œ๊ฐ„ ์ฆ๊ฐ€: ์žฅ์•  ๋ฐœ์ƒ ์‹œ ์ˆ˜๋™ ๋ณต๊ตฌ๊ฐ€ ํ•„์š”ํ•จ.
ReplicatedMergeTree 1. ๊ณ ๊ฐ€์šฉ์„ฑ: ์žฅ์•  ๋ฐœ์ƒ ์‹œ ๋‹ค๋ฅธ ๋ ˆํ”Œ๋ฆฌ์นด์—์„œ ๋ฐ์ดํ„ฐ๋ฅผ ์ œ๊ณตํ•˜์—ฌ ์„œ๋น„์Šค ์ค‘๋‹จ ๋ฐฉ์ง€.

2. ๋ฐ์ดํ„ฐ ๋ฌด๊ฒฐ์„ฑ ๋ณด์žฅ: ZooKeeper๋ฅผ ํ†ตํ•ด ๋ ˆํ”Œ๋ฆฌ์นด ๊ฐ„ ๋™๊ธฐํ™” ๊ด€๋ฆฌ.

3. ๋ณต๊ตฌ ์ž๋™ํ™”: ์žฅ์•  ๋…ธ๋“œ ๋ณต๊ตฌ ์‹œ ZooKeeper๊ฐ€ ์ž๋™์œผ๋กœ ์ƒํƒœ๋ฅผ ์กฐ์ •ํ•จ.
1. ์šด์˜ ๋น„์šฉ ์ฆ๊ฐ€: ๋ ˆํ”Œ๋ฆฌ์นด ๊ฐ„ ๋™๊ธฐํ™”๋กœ ์ธํ•ด ์ถ”๊ฐ€์ ์ธ ๋ฆฌ์†Œ์Šค ์†Œ๋ชจ ๋ฐœ์ƒ.

2. ๋ณต์žกํ•œ ์„ค์ • ์š”๊ตฌ: ZooKeeper ๋ฐ ๋ ˆํ”Œ๋ฆฌ์นด ๊ด€๋ฆฌ๊ฐ€ ํ•„์š”ํ•จ.

 

๊ฐ ์ฃผ์š” ์—”์ง„๋ณ„ ์‚ฌ์šฉ ์‚ฌ๋ก€๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.

 

Distributed + MergeTree: ์‹ค์‹œ๊ฐ„ ๋ถ„์„ ์ค‘์‹ฌ ํ™˜๊ฒฝ (๋Œ€๊ทœ๋ชจ ์ด๋ฒคํŠธ ์ฒ˜๋ฆฌ)

  • ๐Ÿš€ ๋†’์€ ์“ฐ๊ธฐ ์„ฑ๋Šฅ๊ณผ ํ™•์žฅ์„ฑ์ด ํ•„์š”ํ•˜๋ฉฐ, ์žฅ์•  ๋ฐœ์ƒ ์‹œ ์ผ๋ถ€ ๋ฐ์ดํ„ฐ ์†์‹ค์ด ํ—ˆ์šฉ๋  ์ˆ˜ ์žˆ๋Š” ๊ฒฝ์šฐ์— ์ ํ•ฉํ•ฉ๋‹ˆ๋‹ค

ReplicatedMergeTree: ๋ฐ์ดํ„ฐ ๋ฌด๊ฒฐ์„ฑ์ด ์ค‘์š”ํ•œ ํ™˜๊ฒฝ (๊ธˆ์œต/์˜๋ฃŒ)

  • ๐Ÿš€ ์žฅ์•  ๋ฐœ์ƒ ์‹œ์—๋„ ๋ฐ์ดํ„ฐ ์†์‹ค ์—†์ด ์„œ๋น„์Šค๋ฅผ ์ง€์†ํ•ด์•ผ ํ•˜๋Š” ๊ฒฝ์šฐ์— ์ ํ•ฉํ•ฉ๋‹ˆ๋‹ค

 

 

๋ฐ˜์‘ํ˜•