ElasticSearch 7

[Elastic Search] MBTI Search Project - 2. Emoji Search and Aggregation (Part 3)

Re-Index. Having actually tried emoji search, let's now find out how many emoji each original text contains and which emoji appears most often. Before that, as preparation, I configure the index so that keywords (called Terms inside ES) can be extracted from data stored in a text field, and look for a way to pull out data based on keyword frequency across all documents. Index configuration: PUT /mbti_term { "settings": { "analysis": { "analyzer": { "nori_mixed": { "tokenizer": "nori_t_mixed", "filter": "shingle" }, "nori_pos_noun": { "type": "custom", "tokenizer": "nori_t_mixed", "..

Elastic Search 2022.04.24

[Elastic Search] MBTI Search Project - 2. Emoji Search and Aggregation (Part 2)

I parsed only the emoji out of the existing content and collected the data into an RDB. (Analyzing them with a regex filter applied to the ES analyzer is something I'll try next time!) I set up the schema as follows and inserted the data (RDB). Table: t_emoji_dashboard. Columns: emoji, mbti_type (the MBTI type), emoji_count (the number of occurrences in each document). SELECT emoji, mbti_type, sum(emoji_count) FROM t_emoji_dashboard WHERE emoji = '😘' GROUP BY emoji, mbti_type ORDER BY mbti_type, sum DESC This is the result of running that query. (I queried only a specific emoji..

Elastic Search 2022.04.21
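
To make the aggregation in the Part 2 excerpt above easier to follow, here is a minimal sketch that runs the same kind of query against an in-memory SQLite database. The table and column names come from the excerpt; SQLite itself, the explicit `total` alias, and every row of data are assumptions made up purely for illustration (the post does not say which RDB was used).

```python
import sqlite3

# In-memory database so the sketch is self-contained (assumption: the post's RDB is unknown).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t_emoji_dashboard (emoji TEXT, mbti_type TEXT, emoji_count INTEGER)"
)

# Made-up rows for illustration only; these are not results from the post.
rows = [
    ("😘", "ENFP", 3),
    ("😘", "ENFP", 2),
    ("😘", "INTJ", 1),
    ("😂", "ENFP", 4),
]
conn.executemany("INSERT INTO t_emoji_dashboard VALUES (?, ?, ?)", rows)

# Sum the per-document counts of one emoji for each MBTI type.
# An explicit alias is used so ORDER BY does not depend on a default column name.
query = """
    SELECT emoji, mbti_type, SUM(emoji_count) AS total
    FROM t_emoji_dashboard
    WHERE emoji = ?
    GROUP BY emoji, mbti_type
    ORDER BY mbti_type, total DESC
"""
result = conn.execute(query, ("😘",)).fetchall()
print(result)
```

Passing the emoji as a bound parameter rather than pasting it into the SQL string keeps the same query reusable for any emoji.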

[Elastic Search] MBTI Search Project - 2. Emoji Search and Aggregation

I'm working on a project to identify the characteristics of each MBTI type. Looking through the collected texts, I noticed that emoji are used a lot. ###Q: Can emoji be searched in ElasticSearch? ###A: Yes, emoji are treated as text too, so searching for them works well! However, searching for every emoji to find out how many documents contain each one is not easy. So let's parse just the emoji out of the data already collected. I extracted the emoji using Python's regex. import pandas as pd import re # ... the part that fetches data from the DB is omitted # df has the columns contents (the collected text) and doc_url (the URL of the text) emoji_pattern = re..

Elastic Search 2022.04.13
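
The regex in the excerpt above is cut off, so the following is only a rough sketch of the general idea: pull emoji out of a text with Python's `re` and count them. The pattern here covers a handful of common emoji code point ranges and is an assumption, not the pattern used in the post; the helper name `count_emojis` and the sample sentence are also made up for illustration.

```python
import re
from collections import Counter

# Assumed pattern: a few common emoji ranges, NOT the post's actual (truncated) regex.
EMOJI_PATTERN = re.compile(
    "["
    "\U0001F300-\U0001F5FF"  # miscellaneous symbols and pictographs
    "\U0001F600-\U0001F64F"  # emoticons
    "\U0001F680-\U0001F6FF"  # transport and map symbols
    "\U0001F900-\U0001F9FF"  # supplemental symbols and pictographs
    "]"
)


def count_emojis(text: str) -> Counter:
    """Return how many times each matched emoji character appears in text."""
    return Counter(EMOJI_PATTERN.findall(text))


# Hypothetical sample text standing in for one row of the contents column.
sample = "love it 😘😘 so funny 😂"
counts = count_emojis(sample)
print(counts.most_common())
```

Matching one character at a time keeps the counting simple, but emoji built from several code points (skin tone modifiers, joined sequences) would be split apart, so a real pipeline may need a more complete pattern.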

[Elastic Search] MBTI Search Project - 1. Search Score Tuning

I'm currently working on a project that uses Elasticsearch to query collected data (text data for each MBTI type). As part of it, I'm tuning ES queries to analyze the correlation between MBTI type and smartphone (iPhone or Galaxy). Index configuration: To extract and analyze only the nouns in the content, I created a separate analyzer called nori_noun and set it up as a field. { "mbti" : { "aliases" : { }, "mappings" : { "properties" : { "comment_cnt" : { "type" : "integer" }, "contents" : { "type" : "text", "fields" : { "full" : { "type" : "keyword" }, "nori_mixed" : { "t..

Elastic Search 2022.04.12

[Elastic Search] Implementing Search (with Fast API)

I built a search engine with ES. To use it like a real service, let's implement a REST API. Seen simply, the REST API logic has two steps. 1. The user enters a search keyword. 2. Find the documents that match the search keyword. Implementing user input: The point to consider here is that accepting just a single keyword does not allow documents to be searched in detail. If searches can use a variety of conditions, such as synonyms, excluded words, multiple keywords, AND conditions, and OR conditions, the search system becomes better for the user. So I put together a set of special commands that can be used in the search keyword. Single search: search= e.g. sweatshirt Synonym search: search= e.g. sweatshirt +Adidas => among documents containing sweatshirt..

Elastic Search 2022.02.15

[Elastic Search] Notes on Applying the Nori Tokenizer & Filter

While studying Elastic Search queries in the previous post, I wanted to query the data in a bit more detail. So I looked for a way to search more precisely by applying a Korean morphological analyzer to the stored texts. Elastic Search Korean morphological analyzer: From Elastic Search 7.0 onward, a Korean morphological analyzer called Nori can be used. (Officially it has been provided since version 6.6.) To install Nori, follow the link below. https://www.elastic.co/guide/en/elasticsearch/plugins/current/analysis-nori.html Current situation: Until now, the index and analyzer were configured as shown below. { "settings": { "inde..

Elastic Search 2022.02.09

[Elastic Search] Query Study - Part 2: Term Query, Multi-match Query

Term query: A term query returns words or phrases that match exactly. Unlike the match query, it compares the search value against the whole text as-is, so you need to know the entire sentence. In the existing review index, the review data was of type text, so the term query did not work. So I rebuilt the index. "mappings": { "properties": { "prd_id": { "type": "text" }, "review_id": { "type": "text" }, "review": { "type": "text", "fields": { "full": { "type": "keyword" }, // I tried the nori filter but it failed.. will try again next time! "nori_..

Elastic Search 2022.01.29
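
As a small illustration of the difference described in the Query Study excerpt above, the sketch below builds the request bodies for a term query and a match query as plain Python dictionaries. The `review.full` keyword sub-field comes from the mapping in the excerpt; the helper names `term_query` and `match_query` and the sample review text are assumptions for illustration, and nothing here talks to a real cluster.

```python
import json


def term_query(field: str, value: str) -> dict:
    """Build a term query body: the value is compared as-is, without analysis."""
    return {"query": {"term": {field: {"value": value}}}}


def match_query(field: str, text: str) -> dict:
    """Build a match query body: the text is analyzed before matching."""
    return {"query": {"match": {field: {"query": text}}}}


# Hypothetical review text; a term query on the keyword sub-field needs the full sentence.
exact = term_query("review.full", "Fast delivery and great quality")

# A match query on the analyzed text field can work with just part of the sentence.
loose = match_query("review", "fast delivery")

print(json.dumps(exact, indent=2))
print(json.dumps(loose, indent=2))
```

Pointing the term query at the keyword sub-field rather than the analyzed text field is what lets the exact comparison succeed, which fits the excerpt's reason for rebuilding the index.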