We have implemented a search engine with ES (Elasticsearch). To use it like a real service, let's build a REST API for it.
Viewed simply, the REST API logic has two steps.
1. The user enters a search keyword.
2. The documents that match the keyword are found.
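Implementing user input
One thing to consider when implementing user input is that accepting only a single keyword does not allow a detailed document search.
If searches can use various conditions, such as synonyms, excluded words, multiple keywords, AND conditions, and OR conditions, the search system becomes more useful to its users.
So I defined a set of special commands that can be used inside the search keyword.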
Single search: search=<keyword> e.g. 맨투맨 (sweatshirt)
Synonym search: search=<keyword-A + keyword-B> e.g. 맨투맨 +아디다스
=> among documents containing 맨투맨, those that contain 아디다스 (Adidas)
Exclusion search: search=<keyword-A - keyword-B> e.g. 맨투맨 -아디다스
=> among documents containing 맨투맨, exclude any that contain the word 아디다스
Multiple-keyword search: search=<keyword-A, keyword-B> e.g. 맨투맨, 아디다스
=> among documents containing 맨투맨, those that contain the word 아디다스 (preferred, but not required)
AND search: search=<keyword-A & keyword-B> e.g. 맨투맨 & 아디다스
=> documents that must contain both 맨투맨 and 아디다스
OR search: search=<keyword-A | keyword-B> e.g. 나이키 | 아디다스
=> documents containing 나이키 (Nike) or 아디다스
Gender boosting of search results: search=<keyword-A> followed by ^^ (female) or ^~ (male)
=> appending the symbol to the keyword boosts results that appear to have been written by a woman or by a man
Noun-only search: search=<keyword-A>^& e.g. 예쁨^&
=> the input keyword is treated as a noun, and only documents containing that noun are fetched
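The command list above can be condensed into a small lookup table. The following is only a sketch: the names OPERATOR_CLAUSES and clause_for are hypothetical, and the clause assigned to each symbol follows the query-building code shown later in this post rather than any fixed Elasticsearch rule.

```python
# Symbol -> bool clause it feeds. Multi-character symbols come first so that
# '^&' is checked before '&' (dicts keep insertion order in Python 3.7+).
OPERATOR_CLAUSES = {
    "^^": "should",    # boost reviews that look female-written
    "^~": "should",    # boost reviews that look male-written
    "^&": "should",    # noun-only match
    "-": "must_not",   # excluded word
    "+": "should",     # synonym
    ",": "should",     # additional keyword, not required
    "&": "must",       # AND condition
    "|": "should",     # OR condition
}


def clause_for(token: str) -> str:
    """Return the bool clause a single token would feed.

    A token without any symbol is treated as the main keyword ('must').
    """
    for symbol, clause in OPERATOR_CLAUSES.items():
        if symbol in token:
            return clause
    return "must"


print(clause_for("-아디다스"))
print(clause_for("맨투맨"))
```

Listing the longer symbols first matters because a token such as 예쁨^& also contains the single character &.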
To accept these search keywords, the API can be implemented as in the code below. For a simple API implementation I used Python's FastAPI.
from typing import Optional
from fastapi import FastAPI
from create_es_query import CreateEsIndex
import uvicorn
import json
import elasticsearch

app = FastAPI()

# Elasticsearch config
with open('config.json') as f:
    config = json.load(f)
es = elasticsearch.Elasticsearch([f"http://{config['host_url']}:9200"], http_auth=(config['username'], config['password']))


# API for basic access to the root page
@app.get("/")
def read_root():
    return {"Hello": "World"}


# API that receives the search keyword
@app.get("/search/keyword")
def search_result(search: Optional[str] = None):
    if not search:
        # No keyword was given, so there is nothing to search for
        return {"result": []}
    qry = CreateEsIndex()
    query = qry.create_complex_query(search)
    t = es.search(index="review_pos", body=query)
    # Indexing the response directly works for a plain dict and for a response object
    doc_list = t['hits']['hits']
    hit_docs = [d['_source'] for d in doc_list]
    return {"result": hit_docs}


if __name__ == '__main__':
    uvicorn.run(app, host="localhost", port=8000)
In the code above, the search_result function implements the API that receives the search term. To call the API, send a request to http://localhost:8000/search/keyword?search=<keyword>.
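Several of the command symbols clash with URL syntax: in a query string, & separates parameters and + is decoded as a space, so the keyword generally needs to be percent-encoded before it is sent. A minimal sketch using only the standard library follows; build_search_url is a hypothetical helper name, and the base URL is the local address used above.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

BASE_URL = "http://localhost:8000/search/keyword"


def build_search_url(keyword: str) -> str:
    """Percent-encode the keyword so symbols like + and & survive the query string."""
    return f"{BASE_URL}?{urlencode({'search': keyword})}"


url = build_search_url("맨투맨 & 아디다스")
print(url)

# Decoding the query string gives back the original keyword.
decoded = parse_qs(urlsplit(url).query)["search"][0]
print(decoded)
```

HTTP client libraries that accept a parameter dictionary normally perform this encoding themselves, so the helper is mainly useful when the URL is assembled by hand.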
Implementing the input keyword preprocessing logic
The computer has no way of knowing whether the keyword received from the user contains a synonym or an excluded word. To handle this, I wrote a separate module that classifies the parts of the entered keyword.
class CreateEsIndex:
    # Operator symbols, multi-character ones first so that '^&' is not read as '&'
    OPERATORS = ['^^', '^~', '^&', '-', '+', '|', '&', ',']

    def __init__(self):
        self.main_keyword = []
        # One list per search condition
        self.synonym, self.exclude, self.multi_key, self.and_cond, \
            self.or_cond, self.woman, self.man, self.noun = [], [], [], [], [], [], [], []
        self.condition_map = {
            '-': self.exclude, '+': self.synonym, '|': self.or_cond, '&': self.and_cond,
            ',': self.multi_key, '^^': self.woman, '^~': self.man, '^&': self.noun
        }

    @staticmethod
    def text_clear(keyword: str):
        # Strip every operator symbol from a token
        for r in CreateEsIndex.OPERATORS:
            keyword = keyword.replace(r, '')
        return keyword

    def process_word(self, search_word: str):
        pending = None  # operator carried over to the next word, e.g. '&' in "A & B"
        for i, w in enumerate(search_word.split()):
            op = next((s for s in self.OPERATORS if s in w), None)
            word = self.text_clear(w)
            if not word:
                # A bare symbol such as '&' or '|' applies to the word after it
                pending = op
                continue
            # A trailing comma ("A, B") marks the following word, not this one
            target = pending if op in (None, ',') else op
            pending = ',' if op == ',' else None
            if i == 0:
                # The first word is always the main keyword
                self.main_keyword.append({"match": {"review": word}})
            if target:
                self.condition_map[target].append(word)

    def create_complex_query(self, search_word: str):
        self.process_word(search_word)
        q = {
            "query": {
                "bool": {
                    "must": self.main_keyword,
                    "must_not": [],
                    "should": []
                }
            }
        }
        bol = q['query']['bool']
        for k, v in self.condition_map.items():
            if not v:
                continue
            words = [{"match": {"review": w}} for w in v]
            if k in ("+", ",", "|"):
                # Synonym, multi-keyword and OR words are optional matches
                bol['should'].extend(words)
            elif k == "-":
                bol['must_not'].extend(words)
            elif k == "&":
                bol['must'].extend(words)
            elif k == "^^":
                # Boost documents scored as likely written by a woman
                bol['should'].append({"rank_feature": {"field": "genders.female", "boost": 4}})
            elif k == "^~":
                # Boost documents scored as likely written by a man
                bol['should'].append({"rank_feature": {"field": "genders.male", "boost": 4}})
            elif k == "^&":
                # Match against the review.nori_noun field instead of review
                bol['should'].extend({"match": {"review.nori_noun": w}} for w in v)
        return q
In the code above, the CreateEsIndex class creates, as its initial attributes, one list for each search condition.
To fill these lists with the right words, the process_word function classifies the keyword received from the user. From this, a query that can be sent to ES is generated.
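One detail of the bool query is worth keeping in mind when reading the generated queries: when a must clause is present, should clauses are optional by default and only raise the score of documents that match them. If a strict OR between two keywords is wanted, one possible shape is to place both keywords under should and set minimum_should_match to 1. This is shown only as a sketch and is not part of the module above; the function name strict_or_query is hypothetical.

```python
def strict_or_query(keyword_a: str, keyword_b: str, field: str = "review") -> dict:
    """Build a bool query that matches documents containing either keyword."""
    return {
        "query": {
            "bool": {
                "should": [
                    {"match": {field: keyword_a}},
                    {"match": {field: keyword_b}},
                ],
                # At least one of the should clauses has to match
                "minimum_should_match": 1,
            }
        }
    }


query = strict_or_query("나이키", "아디다스")
print(query)
```

The resulting dictionary has the same outer layout as the queries built by create_complex_query, so it could be passed to es.search in the same way.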
Examples
Keyword: 맨투맨
Input query
{'query': {'bool': {'must': [{'match': {'review': '맨투맨'}}], 'must_not': [], 'should': []}}}
Result
{
  "result": [
    {
      "gender": {
        "female": 0.385272741317749,
        "male": 0.614727258682251
      },
      "prd_id": 957878,
      "review": "아디다스 맨투맨 후기진짜 오버핏 제대로인 맨투맨 인생 맨투맨!!!!!",
      "review_id": 7325803
    },
    {
      "gender": {
        "female": 0.5234223008155823,
        "male": 0.4765776991844177
      },
      "prd_id": 897632,
      "review": "맨투맨 뭐살지 고민하신다면 이 맨투맨 추천드려요",
      "review_id": 4934238
    },
    ...
  ]
}
Keyword: 맨투맨 -아디다스
Input query
{'query': {'bool': {'must': [{'match': {'review': '맨투맨'}}], 'must_not': [{'match': {'review': '아디다스'}}], 'should': []}}}
Result
{
  "result": [
    {
      "gender": {
        "female": 0.5234223008155823,
        "male": 0.4765776991844177
      },
      "prd_id": 897632,
      "review": "맨투맨 뭐살지 고민하신다면 이 맨투맨 추천드려요",
      "review_id": 4934238
    },
    {
      "gender": {
        "female": 0.6838095188140869,
        "male": 0.3161904811859131
      },
      "prd_id": 1163605,
      "review": "λΈλ¬μνΈ 클럽 맨투맨 블랙입니다λΈλ¬μνΈ 클럽 맨투맨 블랙입니다",
      "review_id": 7883291
    },
    ...
  ]
}
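To see why a review that mentions 아디다스 drops out of the second result, the must and must_not parts of the query can be imitated in plain Python. This is only a rough sketch: it uses substring checks, whereas ES matches analyzed tokens and also ranks documents by score. The two reviews are made-up sample strings, and match_words and passes are hypothetical helper names.

```python
def match_words(clauses):
    """Collect the keywords from match clauses on the review field."""
    return [c["match"]["review"] for c in clauses if "review" in c.get("match", {})]


def passes(review: str, query: dict) -> bool:
    """Return True if the review satisfies must and avoids must_not."""
    bool_q = query["query"]["bool"]
    has_all = all(w in review for w in match_words(bool_q["must"]))
    has_excluded = any(w in review for w in match_words(bool_q["must_not"]))
    return has_all and not has_excluded


query = {'query': {'bool': {'must': [{'match': {'review': '맨투맨'}}],
                            'must_not': [{'match': {'review': '아디다스'}}],
                            'should': []}}}

reviews = ["아디다스 맨투맨 오버핏", "맨투맨 추천드려요"]  # made-up sample reviews
kept = [r for r in reviews if passes(r, query)]
print(kept)
```

Only the review without 아디다스 is kept, which mirrors how the first review from the 맨투맨 search is missing from the 맨투맨 -아디다스 search.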