Building the Foundation of a Real-Time Pipeline with Redpanda and Iceberg on Docker Compose
In this post, we set up Redpanda, Iceberg, and MinIO to build a real-time data lake environment.
Goals
- Install Redpanda
- Install Apache Iceberg
- Install MinIO
- Do all of the above with Docker Compose
What is Iceberg?
Apache Iceberg is an open-source table format for large tables in a data lake. It was designed to overcome the slow, inefficient queries of tables managed through the Hive metastore, and it integrates easily with analytics engines such as Spark, Trino, and Flink.
Key features of Iceberg
- ACID transaction support
- Schema evolution (changing the schema of an existing table)
- A strong partitioning strategy that improves query performance
- Easy integration with S3-compatible storage such as MinIO
Setting up the environment with Docker Compose
An example Docker Compose file provided by the Iceberg project is available in the official documentation (https://iceberg.apache.org/spark-quickstart/#docker-compose). As of May 2025, however, the Iceberg Docker image used there does not include the Spark jar packages for Kafka. We therefore install the additional packages so that Spark can consume Kafka topics.
Modifying the Docker image
First, clone the Git repository that provides the Docker Compose setup.
- Git repository: https://github.com/databricks/docker-spark-iceberg.git
git clone https://github.com/databricks/docker-spark-iceberg.git
After cloning the repository, edit the file at the path below as follows.
- Add the lines below at line 92 of docker-spark-iceberg > spark > Dockerfile
- Based on Spark 3.5.5
- If an error occurs while running Iceberg, it may be caused by a version conflict between packages such as kafka-clients and spark-sql-kafka-0-10. In that case, align the package versions.
# Download Kafka
RUN curl -s https://repo1.maven.org/maven2/org/apache/spark/spark-sql-kafka-0-10_2.12/${SPARK_VERSION}/spark-sql-kafka-0-10_2.12-${SPARK_VERSION}.jar -Lo /opt/spark/jars/spark-sql-kafka-0-10_2.12-${SPARK_VERSION}.jar
RUN curl -s https://repo1.maven.org/maven2/org/apache/kafka/kafka-clients/3.4.1/kafka-clients-3.4.1.jar -Lo /opt/spark/jars/kafka-clients-3.4.1.jar
RUN curl -s https://repo1.maven.org/maven2/org/apache/commons/commons-pool2/2.11.1/commons-pool2-2.11.1.jar -Lo /opt/spark/jars/commons-pool2-2.11.1.jar
RUN curl -s https://repo1.maven.org/maven2/org/apache/spark/spark-token-provider-kafka-0-10_2.12/${SPARK_VERSION}/spark-token-provider-kafka-0-10_2.12-${SPARK_VERSION}.jar -Lo /opt/spark/jars/spark-token-provider-kafka-0-10_2.12-${SPARK_VERSION}.jar
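The four download lines above follow Maven Central's standard path layout (group path / artifact / version / artifact-version.jar), so the URLs can be generated for another Spark version instead of being edited by hand. The following is a minimal sketch in Python. The function names are hypothetical, and the default versions simply mirror the Dockerfile lines above. It only builds URLs; whether a given combination of versions is compatible still has to be checked on the Maven page.

```python
MAVEN_CENTRAL = "https://repo1.maven.org/maven2"


def maven_jar_url(group_id: str, artifact_id: str, version: str) -> str:
    """Build the Maven Central download URL for a single jar."""
    group_path = group_id.replace(".", "/")
    return (
        f"{MAVEN_CENTRAL}/{group_path}/{artifact_id}/{version}/"
        f"{artifact_id}-{version}.jar"
    )


def kafka_jar_urls(
    spark_version: str,
    scala_version: str = "2.12",          # same Scala version as in this post
    kafka_clients_version: str = "3.4.1",  # value taken from the Dockerfile lines
    commons_pool2_version: str = "2.11.1", # value taken from the Dockerfile lines
) -> list:
    """Return the four jar URLs in the same order as the Dockerfile lines."""
    return [
        maven_jar_url(
            "org.apache.spark",
            f"spark-sql-kafka-0-10_{scala_version}",
            spark_version,
        ),
        maven_jar_url("org.apache.kafka", "kafka-clients", kafka_clients_version),
        maven_jar_url("org.apache.commons", "commons-pool2", commons_pool2_version),
        maven_jar_url(
            "org.apache.spark",
            f"spark-token-provider-kafka-0-10_{scala_version}",
            spark_version,
        ),
    ]


if __name__ == "__main__":
    for url in kafka_jar_urls("3.5.5"):
        print(url)
```

Running the script prints one URL per line, which makes it easy to compare against the `RUN curl` lines after changing `SPARK_VERSION`.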
Finding matching Spark and Kafka versions on Maven
1. Open the spark-sql-kafka page (https://mvnrepository.com/artifact/org.apache.spark/spark-sql-kafka-0-10).
2. Select the Spark version you are using, and check the Scala version when you do. I used Scala 2.12.
3. Once a Spark version is selected, the page shows the packages that are related to or compatible with that version.
4. Build the custom Docker image with the command below, run from the directory that contains the modified Dockerfile (docker-spark-iceberg/spark).
docker buildx build -t rt/iceberg --platform=linux/amd64,linux/arm64 .
Docker Compose configuration
For convenience, Delta Lake is included in advance as well.
version: "3.8"
services:
  spark-iceberg:
    image: rt/iceberg
    container_name: spark-iceberg
    build: spark/
    networks:
      iceberg_net:
    depends_on:
      - rest
      - minio
    volumes:
      - ./warehouse:/home/iceberg/warehouse
      - ./notebooks:/home/iceberg/notebooks/notebooks
    environment:
      - AWS_ACCESS_KEY_ID=admin
      - AWS_SECRET_ACCESS_KEY=password
      - AWS_REGION=us-east-1
    ports:
      - 8888:8888
      - 8080:8080
      - 10000:10000
      - 10001:10001
  rest:
    image: apache/iceberg-rest-fixture
    container_name: iceberg-rest
    networks:
      iceberg_net:
    ports:
      - 8181:8181
    environment:
      - AWS_ACCESS_KEY_ID=admin
      - AWS_SECRET_ACCESS_KEY=password
      - AWS_REGION=us-east-1
      - CATALOG_WAREHOUSE=s3://warehouse/
      - CATALOG_IO__IMPL=org.apache.iceberg.aws.s3.S3FileIO
      - CATALOG_S3_ENDPOINT=http://minio:9000
  minio:
    image: minio/minio
    container_name: minio
    environment:
      - MINIO_ROOT_USER=admin
      - MINIO_ROOT_PASSWORD=password
      - MINIO_DOMAIN=minio
    networks:
      iceberg_net:
        aliases:
          - warehouse.minio
    ports:
      - 9001:9001
      - 9000:9000
    command: ["server", "/data", "--console-address", ":9001"]
  mc:
    depends_on:
      - minio
    image: minio/mc
    container_name: mc
    networks:
      iceberg_net:
    environment:
      - AWS_ACCESS_KEY_ID=admin
      - AWS_SECRET_ACCESS_KEY=password
      - AWS_REGION=us-east-1
    entrypoint: |
      /bin/sh -c "
      until (/usr/bin/mc config host add minio http://minio:9000 admin password) do echo '...waiting...' && sleep 1; done;
      /usr/bin/mc rm -r --force minio/warehouse;
      /usr/bin/mc mb minio/warehouse;
      /usr/bin/mc policy set public minio/warehouse;
      tail -f /dev/null
      "
  delta-lake:
    # Change the image tag to match your operating system / CPU architecture before deploying
    image: deltaio/delta-docker:latest_arm64
    container_name: delta_quickstart
    volumes:
      - rustbuild:/tmp
    ports:
      - "8088:8888"
    # entrypoint: ["bash", "deltaio/delta-docker:latest_arm64"]
    networks:
      - iceberg_net
    depends_on:
      - minio
    environment:
      - AWS_ACCESS_KEY_ID=admin
      - AWS_SECRET_ACCESS_KEY=password
      - AWS_REGION=us-east-1
      - HADOOP_CONF_DIR=/tmp/hadoop-conf
      - HADOOP_OPTIONAL_TOOLS=hadoop-aws
      - SPARK_CONF__spark.hadoop.fs.s3a.endpoint=http://minio:9000
      - SPARK_CONF__spark.hadoop.fs.s3a.access.key=admin
      - SPARK_CONF__spark.hadoop.fs.s3a.secret.key=password
      - SPARK_CONF__spark.hadoop.fs.s3a.path.style.access=true
      - SPARK_CONF__spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem
  redpanda-0:
    command:
      - redpanda
      - start
      - --kafka-addr internal://0.0.0.0:9092,external://0.0.0.0:19092
      # Address the broker advertises to clients that connect to the Kafka API.
      # Use the internal addresses to connect to the Redpanda brokers
      # from inside the same Docker network.
      # Use the external addresses to connect to the Redpanda brokers
      # from outside the Docker network.
      - --advertise-kafka-addr internal://redpanda-0:9092,external://localhost:19092
      - --pandaproxy-addr internal://0.0.0.0:8082,external://0.0.0.0:18082
      # Address the broker advertises to clients that connect to the HTTP Proxy.
      - --advertise-pandaproxy-addr internal://redpanda-0:8082,external://localhost:18082
      - --schema-registry-addr internal://0.0.0.0:8081,external://0.0.0.0:18081
      # Redpanda brokers use the RPC API to communicate with each other internally.
      - --rpc-addr redpanda-0:33145
      - --advertise-rpc-addr redpanda-0:33145
      # Mode dev-container uses well-known configuration properties for development in containers.
      - --mode dev-container
      # Tells Seastar (the framework Redpanda uses under the hood) to use 1 core on the system.
      - --smp 1
      - --default-log-level=info
    image: docker.redpanda.com/redpandadata/redpanda:v25.1.4
    container_name: redpanda-0
    volumes:
      - redpanda-0:/var/lib/redpanda/data
    networks:
      - iceberg_net
    ports:
      - 18081:18081
      - 18082:18082
      - 19092:19092
      - 19644:9644
  console:
    container_name: redpanda-console
    image: docker.redpanda.com/redpandadata/console:v3.1.0
    networks:
      - iceberg_net
    entrypoint: /bin/sh
    command: -c 'echo "$$CONSOLE_CONFIG_FILE" > /tmp/config.yml; /app/console'
    environment:
      CONFIG_FILEPATH: /tmp/config.yml
      CONSOLE_CONFIG_FILE: |
        kafka:
          brokers: ["redpanda-0:9092"]
        schemaRegistry:
          enabled: true
          urls: ["http://redpanda-0:8081"]
        redpanda:
          adminApi:
            enabled: true
            urls: ["http://redpanda-0:9644"]
    ports:
      - 18080:8080
    depends_on:
      - redpanda-0
volumes:
  rustbuild:
  redpanda-0: null
networks:
  iceberg_net:
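The comments on the Redpanda service describe two sets of addresses: internal ones for clients on the same Docker network (such as the spark-iceberg container) and external ones for clients on the host. The sketch below, in Python, captures that choice so the right bootstrap address is picked in one place. The class and function names are hypothetical; the host names and ports are copied from the Compose file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RedpandaEndpoints:
    kafka: str            # Kafka API bootstrap address
    schema_registry: str  # Schema Registry URL
    http_proxy: str       # HTTP Proxy (pandaproxy) URL


# Addresses reachable from containers attached to iceberg_net.
INTERNAL = RedpandaEndpoints(
    kafka="redpanda-0:9092",
    schema_registry="http://redpanda-0:8081",
    http_proxy="http://redpanda-0:8082",
)

# Addresses reachable from the host through the published ports.
EXTERNAL = RedpandaEndpoints(
    kafka="localhost:19092",
    schema_registry="http://localhost:18081",
    http_proxy="http://localhost:18082",
)


def redpanda_endpoints(inside_docker_network: bool) -> RedpandaEndpoints:
    """Pick the listener set that matches where the client runs."""
    return INTERNAL if inside_docker_network else EXTERNAL


if __name__ == "__main__":
    # A notebook in the spark-iceberg container would use the internal set.
    print(redpanda_endpoints(inside_docker_network=True).kafka)
    # A script on the host machine would use the external set.
    print(redpanda_endpoints(inside_docker_network=False).kafka)
```

Using the wrong set is a common source of connection errors: a client that bootstraps against one listener is redirected to the address advertised for that listener, so a container that connects to `localhost:19092` will not reach the broker.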
Once this Docker Compose setup is running, each service is available at the following URLs.

- Iceberg Jupyter notebook: http://localhost:8888/
- Redpanda Console(Kafka): http://localhost:18080/
- Minio: http://localhost:9001/ (ID: admin, PW: password)
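Before opening the URLs in a browser, it can help to confirm that the published ports accept connections. The following Python sketch does only that: it opens a TCP connection to each port listed above and reports the result. The names `SERVICES`, `port_open`, and `report` are hypothetical, and an open port does not by itself prove that the service behind it has finished starting.

```python
import socket

# Host ports taken from the service URLs listed above.
SERVICES = {
    "Iceberg Jupyter notebook": ("localhost", 8888),
    "Redpanda Console": ("localhost", 18080),
    "MinIO console": ("localhost", 9001),
}


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def report(services=None) -> dict:
    """Map each service name to whether its port accepted a connection."""
    services = SERVICES if services is None else services
    return {name: port_open(host, port) for name, (host, port) in services.items()}


if __name__ == "__main__":
    for name, ok in report().items():
        print(f"{name}: {'reachable' if ok else 'not reachable'}")
```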
Next post: Loading data in real time by connecting Kafka to Iceberg
In the next post, we will create a table in Iceberg, create a Kafka topic, produce messages, and load them into the table in real time.