์—˜๋ผ์Šคํ‹ฑ์„œ์น˜ 5

[Elastic Search] MBTI Search Project - 3. Building the API

This post serves the MBTI search engine data through an API. API list: (1) return the top 100 keywords for each MBTI type across all documents, ranked by document count; (2) return the top 100 keywords for the E or I types; (3) return the top 100 keywords matched by a search term, for the E or I types. The implemented APIs are as follows. Return the top 100 keywords for each MBTI type across all documents: @app.get('/top/keywords/{mbti_type}') def get_top_keywords(mbti_type: str, q:Optional[str]=None): es_query = { "size": 0, "query": {"match": {"keyword": mbti_type}}, "aggs..
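The excerpt above is cut off before the aggregation part, so the following is only a minimal sketch of how such a request body might be assembled, not the post's actual code. It assumes a `terms` aggregation (which orders buckets by document count by default) and a hypothetical text field named `contents`; the function name `build_top_keywords_query` and the aggregation name `top_keywords` are likewise made up for illustration. Aggregating on an analyzed text field would additionally require fielddata to be enabled in the mapping.

```python
from typing import Any, Dict, Optional


def build_top_keywords_query(
    mbti_type: str,
    q: Optional[str] = None,
    size: int = 100,
    type_field: str = "keyword",   # field name as it appears in the excerpt
    text_field: str = "contents",  # hypothetical field name (assumption)
) -> Dict[str, Any]:
    """Build a search body that returns no hits, only keyword buckets."""
    # Always filter by MBTI type; add the free-text match only when q is given.
    must = [{"match": {type_field: mbti_type}}]
    if q:
        must.append({"match": {text_field: q}})
    return {
        "size": 0,  # skip the hits, we only want the aggregation
        "query": {"bool": {"must": must}},
        "aggs": {
            # "top_keywords" is an illustrative aggregation name.
            "top_keywords": {"terms": {"field": text_field, "size": size}}
        },
    }
```

The returned dictionary could then be passed as the request body of a search call from inside the route handler.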

Elastic Search 2022.04.29

[Elastic Search] MBTI Search Project - 2. Emoji Search and Aggregation (Part 3)

Re-Index: Now that emoji search works, let's find out how many emojis each original text contains and which emojis appear most often. As a preparatory step, we first configure the index so that keywords (called terms inside ES) can be extracted from data stored in a text field, and then look for a way to pull data ranked by keyword frequency across all documents. Index configuration: PUT /mbti_term { "settings": { "analysis": { "analyzer": { "nori_mixed": { "tokenizer": "nori_t_mixed", "filter": "shingle" }, "nori_pos_noun": { "type": "custom", "tokenizer": "nori_t_mixed", "..

Elastic Search 2022.04.24

[Elastic Search] MBTI Search Project - 2. Emoji Search and Aggregation (Part 2)

I parsed only the emojis out of the existing content and collected the data into an RDB. (Analyzing them by applying a regex filter to an ES analyzer is left for a later post!) I set up the schema as follows and inserted the data (RDB). Table: t_emoji_dashboard. Columns: emoji, mbti_type (the MBTI type), emoji_count (the number of occurrences in each document). SELECT emoji, mbti_type, sum(emoji_count) FROM t_emoji_dashboard WHERE emoji = '😘' GROUP BY emoji, mbti_type ORDER BY mbti_type, sum DESC These are the results of the query. (Only a specific emoji was queried..
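The grouped query above can be tried without a database server. The sketch below uses Python's built-in `sqlite3` with an in-memory table; the post does not say which RDB it uses, so SQLite is an assumption, and the rows are invented sample data, not the post's results. The sum is given an explicit alias (`total`) because an unaliased `sum` column name in `ORDER BY` is not portable across databases.

```python
import sqlite3

KISS = "\U0001F618"  # the emoji used in the query above
JOY = "\U0001F602"   # a second emoji, only to show the WHERE filter working


def emoji_totals(conn: sqlite3.Connection, emoji: str):
    """Sum per-document emoji counts for one emoji, grouped by MBTI type."""
    sql = """
        SELECT emoji, mbti_type, SUM(emoji_count) AS total
        FROM t_emoji_dashboard
        WHERE emoji = ?
        GROUP BY emoji, mbti_type
        ORDER BY mbti_type, total DESC
    """
    return conn.execute(sql, (emoji,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t_emoji_dashboard ("
    "emoji TEXT, mbti_type TEXT, emoji_count INTEGER)"
)
# Invented sample rows: one row per (document, emoji) occurrence count.
sample_rows = [
    (KISS, "ENFP", 2),
    (KISS, "ENFP", 3),
    (KISS, "INTJ", 1),
    (JOY, "ENFP", 5),
]
conn.executemany("INSERT INTO t_emoji_dashboard VALUES (?, ?, ?)", sample_rows)

result = emoji_totals(conn, KISS)
for row in result:
    print(row)
```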

Elastic Search 2022.04.21

[Elastic Search] MBTI Search Project - 2. Emoji Search and Aggregation

I am working on a project to identify the characteristics of each MBTI type. Looking through the collected texts, I found that emojis are used a lot. ###Q: Can emojis be searched in ElasticSearch? ###A: Yes, emojis are treated as text, so they are searchable! However, it is not easy to search for every emoji and work out how many documents contain each one. So let's parse just the emojis out of the data already collected. I extracted the emojis with Python's regex module. import pandas as pd import re # ... the part that loads data from the DB is omitted # df has the columns contents (collected text) and doc_url (URL of the text) emoji_pattern = re..
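The pattern in the excerpt is truncated, so the exact regex the post uses is unknown. As a rough sketch of the same idea, the following counts emojis in a string with a character class covering a few common Unicode ranges. The ranges, the name `EMOJI_PATTERN`, and the helper `count_emojis` are assumptions chosen for illustration; the class is approximate, and it counts skin-tone modifiers and the parts of joined (ZWJ) sequences as separate characters.

```python
import re
from collections import Counter

# Approximate emoji ranges (an assumption, not the post's pattern):
#   U+1F300-U+1FAFF  pictographs, emoticons, transport, supplemental symbols
#   U+2600-U+27BF    miscellaneous symbols and dingbats
EMOJI_PATTERN = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def count_emojis(text: str) -> Counter:
    """Return how many times each matched emoji character occurs in text."""
    return Counter(EMOJI_PATTERN.findall(text))


sample = "so good \U0001F618\U0001F618 really \U0001F602"
print(count_emojis(sample))
```

With a DataFrame like the one described above, the same helper could be applied to the `contents` column row by row.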

Elastic Search 2022.04.13

[Elastic Search] Applying the Nori Tokenizer & Filter

While studying Elastic Search queries in the previous post, I wanted to query the data in more detail. So I looked for a way to make search more fine-grained by applying a Korean morphological analyzer to the stored texts. Elastic Search Korean morphological analyzer: From Elastic Search 7.0 onward, a Korean morphological analyzer called Nori can be used. (Officially provided since version 6.6.) To install Nori, follow the link below. https://www.elastic.co/guide/en/elasticsearch/plugins/current/analysis-nori.html Current state: Previously, I configured the index & analyzer as follows. { "settings": { "inde..

Elastic Search 2022.02.09