๋ฐ์ดํ„ฐ ์—”์ง€๋‹ˆ์–ด๋ง

๋ฐ์ดํ„ฐ ๋ถ„์„ ํ™˜๊ฒฝ ๊ตฌ์ถ• - 05. Clickhouse ์‹ค์‹œ๊ฐ„ ๋ฐ์ดํ„ฐ ๋ถ„์„ ํ…Œ์ด๋ธ” ์ƒ์„ฑํ•˜๊ธฐ

Tempo 2025. 3. 30. 08:00

๐Ÿš€ ์ด ๊ธ€์—์„œ ๋‹ค๋ฃฐ ๋‚ด์šฉ

1๏ธโƒฃ Kafka์™€ ์—ฐ๊ฒฐ๋œ ClickHouse ํ…Œ์ด๋ธ” ์ƒ์„ฑ (ํ™˜๊ฒฝ์„ค์ • ํฌํ•จ)
2๏ธโƒฃ ์‹ค์Šต์šฉ ๋ฐ์ดํ„ฐ์…‹ kafka produce
3๏ธโƒฃ Kafka ํ…Œ์ด๋ธ”์—์„œ ๋ฐ์ดํ„ฐ ์…‹ ํ™•์ธ ๋ฐฉ๋ฒ•

1. Kafka์™€ ์—ฐ๊ฒฐ๋œ ClickHouse ํ…Œ์ด๋ธ” ์ƒ์„ฑ

๐Ÿ”น Kafka ๋ฐ์ดํ„ฐ๋ฅผ ClickHouse์—์„œ ์ฒ˜๋ฆฌํ•˜๋Š” ๋ฐฉ์‹

ClickHouse๋Š” Kafka์™€ ์ง์ ‘ ์—ฐ๊ฒฐํ•˜์—ฌ ์‹ค์‹œ๊ฐ„์œผ๋กœ ๋ฐ์ดํ„ฐ๋ฅผ ๊ฐ€์ ธ์˜ฌ ์ˆ˜ ์žˆ๋Š” Kafka ์—”์ง„์„ ์ œ๊ณตํ•ฉ๋‹ˆ๋‹ค.
๊ธฐ๋ณธ์ ์œผ๋กœ Kafka → Buffer ํ…Œ์ด๋ธ” → MergeTree ํ…Œ์ด๋ธ”์˜ ๊ตฌ์กฐ๋กœ ์šด์˜๋ฉ๋‹ˆ๋‹ค.

๐Ÿ”น ClickHouse ์„ค์ • ๋ณ€๊ฒฝ (Kafka ์‚ฌ์šฉ์„ ์œ„ํ•œ ์„ค์ •)

๋จผ์ €, clickhouse-server์˜ ํ™˜๊ฒฝ์„ค์ •์„ ์ˆ˜์ •ํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.
ClickHouse ์„ค์ • ํŒŒ์ผ (config.xml)์—์„œ Kafka๋ฅผ ์‚ฌ์šฉํ•˜๋„๋ก ์„ค์ •ํ•ฉ๋‹ˆ๋‹ค. ์ด์ „์— ๊ฒŒ์‹œํ–ˆ๋˜ ํฌ์ŠคํŠธ(https://jongwho.tistory.com/21 )์—์„œ ์ƒ์„ฑํ•œ values.yaml์„ ์ˆ˜์ •ํ•ฉ๋‹ˆ๋‹ค.

[๋ฐ์ดํ„ฐ ๋ถ„์„ ํ™˜๊ฒฝ ๊ตฌ์ถ• - 02. Clickhouse ์„ค์น˜ ๋ฐ ํด๋Ÿฌ์Šคํ„ฐ ๊ตฌ์„ฑ

์ด๋ฒˆ ๊ธ€์—์„œ๋Š” Kubernetes ํ™˜๊ฒฝ์—์„œ ClickHouse๋ฅผ ๋ฐฐํฌ ๋ฐ ์‹คํ–‰ํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ์„ค๋ช…ํ•ฉ๋‹ˆ๋‹ค.Docker Desktop์˜ ๋‚ด์žฅ Kubernetes๋ฅผ ํ™œ์šฉํ•˜๊ณ , Helm ๋ฐ Bitnami Helm ์ฐจํŠธ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๊ฐ„ํŽธํ•˜๊ฒŒ ClickHouse๋ฅผ ๋ฐฐํฌํ•ฉ๋‹ˆ๋‹ค.1. ์‚ฌ

jongwho.tistory.com](https://jongwho.tistory.com/21)

๐Ÿ“Œ 1) values.yaml ์ˆ˜์ •

  • line 273: defaultConfigurationOverrides ๋‚ด xml ํ˜•์‹์˜ ์ฝ”๋“œ ์ถ”๊ฐ€
<!-- config.xml -->
<clickhouse>
    <!-- Kafka -->
    <kafka>
      <debug>all</debug>
      <security_protocol>SASL_SSL</security_protocol>
      <sasl_mechanism>SCRAM-SHA-512</sasl_mechanism>
      <sasl_username>(์œ ์ € ์ด๋ฆ„ ์ž…๋ ฅ)</sasl_username>
      <sasl_password>(password ์ž…๋ ฅ)</sasl_password>

      <kafka_topic>
          <name>(kafka ํ† ํ”ฝ ์ž…๋ ฅ)</name>
          <statistics_interval_ms>4000</statistics_interval_ms>
      </kafka_topic>

      <!-- Settings for consumer -->
      <consumer>
          <auto_offset_reset>smallest</auto_offset_reset>
          <kafka_topic>
              <name>(kafka ํ† ํ”ฝ ์ž…๋ ฅ)</name>
              <fetch_min_bytes>100000</fetch_min_bytes>
          </kafka_topic>
      </consumer>
    </kafka>
    ...(์ดํ•˜ ์ƒ๋žต)
</clickhouse>

๐Ÿ“Œ 2) ClickHouse ์žฌ ๋ฐฐํฌ

helm upgrade clickhouse oci://registry-1.docker.io/bitnamicharts/clickhouse -f ./clickhouse/values.yaml -n clickhouse
# or
helm upgrade clickhouse bitnami/clickhouse -f ./clickhouse/values.yaml -n clickhouse

์ดํ›„ k8s ๋‚ด clickhouse ์„œ๋น„์Šค๋ฅผ port-forwardํ•˜์—ฌ ์ ‘์†ํ•  ์ˆ˜ ์žˆ๋„๋ก ํ•ฉ๋‹ˆ๋‹ค.

  • ์•„๋ž˜ ๋ช…๋ น์–ด๋กœ localhost:8123 ์œผ๋กœ clickhouse์— ์ ‘๊ทผ ๊ฐ€๋Šฅํ•ฉ๋‹ˆ๋‹ค.
kubectl port-forward svc/clickhouse 8123:8123 -n clickhouse
๋ฐ˜์‘ํ˜•

 

2. ClickHouse์—์„œ Kafka ํ…Œ์ด๋ธ” ์ƒ์„ฑํ•˜๊ธฐ

Kafka์—์„œ ๋ฐ์ดํ„ฐ๋ฅผ ๊ฐ€์ ธ์™€ ClickHouse์— ์ €์žฅํ•˜๋Š” ํ…Œ์ด๋ธ”์„ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค.

๐Ÿ”น 1) Kafka ์—”์ง„ ํ…Œ์ด๋ธ” ์ƒ์„ฑ

๋จผ์ €, Kafka ํ† ํ”ฝ์„ ๊ตฌ๋…ํ•˜๋Š” ๋ฐ์ดํ„ฐ๋ฒ ์ด์Šค ๋ฐ ํ…Œ์ด๋ธ”์„ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค.

CREATE DATABASE IF NOT EXISTS streaming_db;

CREATE TABLE streaming_db.kafka_events (
    event_time DateTime,
    user_id UInt32,
    event_type String
) ENGINE = Kafka
-- kafka_broker_list์— ์ฝค๋งˆ(,)๋กœ broker ๋ชฉ๋ก์„ ์ถ”๊ฐ€ํ•ฉ๋‹ˆ๋‹ค.
SETTINGS kafka_broker_list = 'kafka-broker-headless.kafka.svc.cluster.local:9092',
         kafka_topic_list = 'events',
         kafka_group_name = 'clickhouse_consumer',
         kafka_format = 'JSONEachRow';

๐Ÿ‘‰ ์„ค๋ช…

  • ENGINE = Kafka: Kafka์—์„œ ์ง์ ‘ ๋ฐ์ดํ„ฐ๋ฅผ ๊ฐ€์ ธ์˜ต๋‹ˆ๋‹ค.
  • kafka_broker_list: Kafka ๋ธŒ๋กœ์ปค ์ฃผ์†Œ
  • kafka_topic_list: ๊ตฌ๋…ํ•  Kafka ํ† ํ”ฝ
  • kafka_group_name: Kafka Consumer Group
  • kafka_format = 'JSONEachRow': JSON ๋ฐ์ดํ„ฐ๋ฅผ Row ๋‹จ์œ„๋กœ ๋ณ€ํ™˜

๐Ÿ”น 2) Materialized View ๋ฐ ํ…Œ์ด๋ธ” ์ƒ์„ฑ

Buffer ํ…Œ์ด๋ธ”์„ ์ƒ์„ฑํ•˜์—ฌ clickhouse ์„œ๋ฒ„ ์„ฑ๋Šฅ์— ์ €ํ•˜ ์—†์ด ์‚ฌ์šฉํ•˜๋„๋ก ์„ค์ •ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ํ•˜์ง€๋งŒ ์‹ค์Šต ์ค‘ ๋น ๋ฅธ ๋ฐ์ดํ„ฐ ํ™•์ธ์„ ์œ„ํ•ด Materialized View๋ฅผ ์ƒ์„ฑํ•˜์—ฌ kafka - clickhouse ์—ฐ๊ฒฐ์„ ํ™•์ธํ•ฉ๋‹ˆ๋‹ค.

๐Ÿ“Œ Materiazlied View ๋ฐ ํ…Œ์ด๋ธ” ์ƒ์„ฑ

-- events ํ…Œ์ด๋ธ” ์ƒ์„ฑ
CREATE TABLE streaming_db.events (
    event_time DateTime,
    user_id UInt32,
    event_type String
) ENGINE = MergeTree()
ORDER BY (event_time, user_id);

-- Materialized View ์ƒ์„ฑ
CREATE MATERIALIZED VIEW streaming_db.events_mv TO streaming_db.events AS
SELECT * 
FROM streaming_db.kafka_events
;

ํ•ด๋‹น ํ…Œ์ด๋ธ” ์ƒ์„ฑ์œผ๋กœ Redpanda Console(Kafka ui)์—์„œ Consumer group์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

๐Ÿ”น ๊ธฐํƒ€) Buffer ํ…Œ์ด๋ธ”

๐Ÿ“Œ Buffer ํ…Œ์ด๋ธ”์ด ํ•„์š”ํ•œ ์ด์œ ?

Kafka์—์„œ ClickHouse๋กœ ๋ฐ์ดํ„ฐ๋ฅผ ๋ฐ”๋กœ ์ €์žฅํ•˜๋ฉด MergeTree์— ๋Œ€ํ•œ ๋นˆ๋ฒˆํ•œ INSERT ์ž‘์—…์ด ๋ฐœ์ƒํ•˜์—ฌ ์„ฑ๋Šฅ ์ €ํ•˜๊ฐ€ ๋ฐœ์ƒํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
์ด๋ฅผ ๋ฐฉ์ง€ํ•˜๋ ค๋ฉด Buffer ํ…Œ์ด๋ธ”์„ ์‚ฌ์šฉํ•˜์—ฌ RAM์— ๋ฐ์ดํ„ฐ๋ฅผ ์ €์žฅํ•œ ํ›„, ์ผ์ • ์กฐ๊ฑด์ด ์ถฉ์กฑ๋˜๋ฉด MergeTree๋กœ ๋ฐฐ์น˜ ์ €์žฅํ•˜๋Š” ๋ฐฉ์‹์ด ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค.

๐Ÿ“Œ Buffer ํ…Œ์ด๋ธ” ๊ธฐ๋ณธ ๊ฐœ๋…

  • ๋ฐ์ดํ„ฐ๋ฅผ RAM์— ์ž„์‹œ ์ €์žฅ ํ›„ ์กฐ๊ฑด์ด ์ถฉ์กฑ๋˜๋ฉด ์ž๋™ ํ”Œ๋Ÿฌ์‹œ (Flush) ์ˆ˜ํ–‰
  • INSERT ์†๋„ ํ–ฅ์ƒ ๋ฐ MergeTree ํ…Œ์ด๋ธ”์˜ ๋””์Šคํฌ ์“ฐ๊ธฐ ๋ถ€ํ•˜ ๊ฐ์†Œ
  • num_layers ๊ฐ’์„ ์‚ฌ์šฉํ•˜์—ฌ ๋ณ‘๋ ฌ ๋ฒ„ํผ๋ง ๊ฐ€๋Šฅ
  • ํŠน์ • ์‹œ๊ฐ„(min_time), ํ–‰ ์ˆ˜(min_rows), ํฌ๊ธฐ(min_bytes) ์กฐ๊ฑด์„ ๋งŒ์กฑํ•˜๋ฉด ์ž๋™ ์ €์žฅ

๐Ÿ“Œ Buffer ํ…Œ์ด๋ธ” ์ƒ์„ฑ (RAM ์บ์‹ฑ ํ™œ์šฉ)

CREATE TABLE streaming_db.buffer_events (
    event_time DateTime,
    user_id UInt32,
    event_type String
) ENGINE = Buffer(streaming_db, events, 1, 10, 100, 10000, 100000, 1000000, 10000000);

โœ… ์„ค๋ช…

  • Buffer(database, ๋Œ€์ƒ(flush) ํ…Œ์ด๋ธ”, 1, 10, 100, 10000, 100000, 1000000, 10000000)
    • 1: ๋‹จ์ผ ๋ฒ„ํผ ์‚ฌ์šฉ (num_layers)
    • 10: ์ตœ์†Œ 10์ดˆ ํ›„ ์ž๋™ ์ €์žฅ (min_time)
    • 100: ์ตœ์†Œ 100๊ฐœ ํ–‰์ด ์Œ“์ด๋ฉด ์ €์žฅ (min_rows)
    • 10000: ์ตœ์†Œ 10KB ๋ฐ์ดํ„ฐ๊ฐ€ ์Œ“์ด๋ฉด ์ €์žฅ (min_bytes)
    • 10000000: ์ตœ๋Œ€ 10MB ๋ฐ์ดํ„ฐ๊นŒ์ง€ ์ €์žฅ ๊ฐ€๋Šฅ (max_bytes)

โœ… ๋ฐ์ดํ„ฐ ํ๋ฆ„

Kafka → Buffer ํ…Œ์ด๋ธ” (RAM) → MergeTree ํ…Œ์ด๋ธ” (๋””์Šคํฌ ์ €์žฅ)

๐Ÿš€ ๊ฒฐ๊ณผ:

  • Kafka์—์„œ ๋Œ€๋Ÿ‰ ๋ฐ์ดํ„ฐ๋ฅผ ์ˆ˜์ง‘ํ•  ๋•Œ ๋น ๋ฅธ INSERT ์†๋„ ๋ณด์žฅ
  • MergeTree ํ…Œ์ด๋ธ”๋กœ ์ €์žฅํ•  ๋•Œ ์ผ๊ด„ ์ฒ˜๋ฆฌ(Batch Insert)๋กœ ์„ฑ๋Šฅ ์ตœ์ ํ™”

3. ์‹ค์Šต์šฉ ๋ฐ์ดํ„ฐ์…‹ ์ƒ์„ฑ(Kafka)

kafka ๋ฐ์ดํ„ฐ ์ƒ์„ฑ(in redpanda console)

๐Ÿ“Œ ์ „์ฒด ์กฐํšŒ SQL

select event_time, user_id, event_type
from cluster('default', streaming_db.events_mv)
limit 10;

 

  • ๋ฐ์ดํ„ฐ ์˜ˆ์‹œ
{"event_time": "2025-03-20 10:00:00", "user_id": 101, "event_type": "page_view"}
{"event_time": "2025-03-20 10:01:00", "user_id": 102, "event_type": "click"}
{"event_time": "2025-03-20 10:02:00", "user_id": 103, "event_type": "purchase"}
{"event_time": "2025-03-20 10:03:00", "user_id": 104, "event_type": "page_view"}
{"event_time": "2025-03-20 10:04:00", "user_id": 105, "event_type": "click"}
{"event_time": "2025-03-20 10:05:00", "user_id": 106, "event_type": "purchase"}
{"event_time": "2025-03-20 10:06:00", "user_id": 107, "event_type": "page_view"}
{"event_time": "2025-03-20 10:07:00", "user_id": 108, "event_type": "click"}
{"event_time": "2025-03-20 10:08:00", "user_id": 109, "event_type": "purchase"}
{"event_time": "2025-03-20 10:09:00", "user_id": 110, "event_type": "page_view"}
{"event_time": "2025-03-20 10:10:00", "user_id": 111, "event_type": "click"}
{"event_time": "2025-03-20 10:11:00", "user_id": 112, "event_type": "purchase"}
{"event_time": "2025-03-20 10:12:00", "user_id": 113, "event_type": "page_view"}
{"event_time": "2025-03-20 10:13:00", "user_id": 114, "event_type": "click"}
{"event_time": "2025-03-20 10:14:00", "user_id": 115, "event_type": "purchase"}
{"event_time": "2025-03-20 10:15:00", "user_id": 116, "event_type": "page_view"}
{"event_time": "2025-03-20 10:16:00", "user_id": 117, "event_type": "click"}
{"event_time": "2025-03-20 10:17:00", "user_id": 118, "event_type": "purchase"}
{"event_time": "2025-03-20 10:18:00", "user_id": 119, "event_type": "page_view"}
{"event_time": "2025-03-20 10:19:00", "user_id": 120, "event_type": "click"}
  • ์กฐํšŒ ์˜ˆ์‹œ

 

๋ฐ˜์‘ํ˜•