trino(presto), minio docker 테스트 환경
업데이트:
trino(presto), minio docker 활용 테스트 환경
trino (hive connector) to minio
- trino 의 hive connector 를 활용해서 minio 데이터 분석 환경
- Hive connector : https://trino.io/docs/current/connector/hive.html
- 참고 : https://github.com/cdecl/trino-minio-docker
docker-compose (trino, minio)
version: '3'
services:
trino:
image: trinodb/trino
container_name: trino
restart: always
ports:
- "8080:8080"
volumes:
- ./etc:/etc/trino
minio:
image: minio/minio
restart: always
command: server /data --console-address ":9001"
container_name: minio
environment:
MINIO_ROOT_USER: minio
MINIO_ROOT_PASSWORD: minio1234
restart: always
ports:
- "9000:9000"
- "9001:9001"
volumes:
- ./minio-data:/data
minio (hive) properties
etc/catalog/hive.properties
- metastore 는 minio catalog 버킷에 저장
s3a://catalog/trino/
- minio 세팅 정보 :
access-key
,secret-key
,endpoint
connector.name=hive-hadoop2
hive.metastore=file
hive.metastore.catalog.dir=s3a://catalog/trino/
hive.recursive-directories=true
hive.non-managed-table-writes-enabled=true
hive.allow-drop-table=true
hive.s3.path-style-access=true
hive.s3.ssl.enabled=false
hive.s3select-pushdown.enabled=true
hive.s3.aws-access-key=minio
hive.s3.aws-secret-key=minio1234
hive.s3.endpoint=http://minio:9000
docker-compose 실행 및 초기값 세팅
## 실행
$ docker-compose up -d
reating network "trino_default" with the default driver
Creating minio ... done
Creating trino ... done
rclone 으로 minio 버킷 생성
# rclone 설정
$ cat local.conf
[local]
type = s3
provider = Minio
env_auth = false
access_key_id = minio
secret_access_key = minio1234
endpoint = http://localhost:9000
# create catalog, data directory
$ rclone --config local.conf mkdir local:catalog
$ rclone --config local.conf mkdir local:data
# sample json (ndjson) data)
$ cat 1.json
{"id":1,"name":"Alice"}
{"id":2,"name":"Bob"}
{"id":3,"name":"Carol"}
# test data copy
$ rclone --config local.conf copy 1.json local:data/sample/
schema, table 생성
default
schema (db) 생성 및 external_location table 생성
-- hive 의 schema(db) 생성
create schema hive.default;
-- table 생성
create table hive.default.sample (
id varchar,
name varchar
)
with (
format = 'json',
external_location = 's3a://data/sample/'
);
-- select
select * from sample;
댓글남기기