Ssul's Blog

[ChatGPT] Implementing RAG with OpenAI Embeddings (Coding from Scratch, CSV File)

Ssul 2024. 2. 14. 17:00

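0. What is RAG?

RAG is short for Retrieval-Augmented Generation, that is, generation augmented by retrieval.

It is one of those terms you feel you almost understand, but not quite.

Put simply, before the AI responds, we look up material (text passages or sentences) similar to the question,

and send the retrieved documents to the AI together with the question.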


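If you ask GPT how old Mr. Choi Cheol-su is, it cannot answer, because it knows nothing about him.

But if you first tell it about Choi Cheol-su and then ask his age, it answers correctly.

"Choi Cheol-su is 30 years old and lives in Seoul" is exactly the kind of information RAG retrieves, and sending that information along with the question is, in a nutshell, what RAG does.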

 

 

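1. How do we fetch information (material) similar to the question? (Embeddings)

The usual way of using GPT relies on prompt tuning alone: you give it a suitable persona or code of conduct in the system message and use it as a chatbot.

A simple setup: set a system message telling it to talk like a counselor, then use it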

 

 

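RAG extends this setup: instead of responding right away, the model consults relevant material and then responds.

A setup that answers by consulting references rather than answering the question directly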

 

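Instead of answering question 1 directly, it searches the existing material, retrieves "Choi Cheol-su is 30 years old and lives in Seoul", and answers question 1 based on that.

The figure above is the simple way to understand RAG.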

 

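However, strictly speaking, the figure above is wrong.

Let's go one level deeper.

The figure below represents RAG a little more precisely (I drew it myself).

It has become a bit more complex. Let's go through it piece by piece.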

 

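① Embed the key material in units of pages (or paragraphs, or sentences)

There is a lot to say about embeddings in depth, but simply put, embedding places each piece of material at a point in a virtual N-dimensional space.

Take a telecom call center as an example: the call center manual is embedded topic by topic and placed in that virtual space.

PDF files about landlines land in the region with the blue dots,

mobile phone material in the region with the black dots,

internet material in the region with the green dots,

and fault reports in the purple region.

In this way you split your material into suitable units (e.g., one PDF page or one paragraph each), embed each unit, and place it in the virtual space.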
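Splitting material into units before embedding can be sketched as follows. This is only a minimal illustration that assumes paragraphs are separated by blank lines; the function name `split_into_paragraphs` and the sample manual text are hypothetical and do not come from the original post.

```python
def split_into_paragraphs(text):
    """Split raw text into paragraph chunks, assuming blank lines separate paragraphs."""
    chunks = []
    for block in text.split("\n\n"):
        # Collapse internal line breaks and surrounding whitespace
        cleaned = " ".join(block.split())
        if cleaned:  # skip empty blocks caused by extra blank lines
            chunks.append(cleaned)
    return chunks


# Hypothetical call-center manual excerpt
manual = """Landline: new subscriptions are accepted by phone.

Mobile: plan changes take effect
from the next billing cycle.


Internet: installation requires a technician visit."""

chunks = split_into_paragraphs(manual)
for i, chunk in enumerate(chunks):
    print(i, chunk)
```

Each resulting chunk would then be embedded on its own. Whether a page, a paragraph, or a sentence is the right unit depends on the material.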

 

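② Retrieve the material most similar to the user's question

When the user enters a question, the question sentence itself is embedded.

In the example above, an inquiry about signing up for a new landline would be embedded near the blue dots.

The user's question thus becomes a single point in the virtual space.

We then fetch the 5 points closest to that point. Those 5 points are 5 pages from the documents embedded in ①.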
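The idea of "fetch the closest points" can be tried out on toy data. The sketch below uses made-up 2-D coordinates and plain Euclidean distance purely for illustration; real embeddings have far more dimensions, and the document names, coordinates, and the helper `nearest_documents` are all hypothetical.

```python
import math

# Hypothetical 2-D "embeddings"; real embeddings have far more dimensions
documents = {
    "landline: new subscription": (1.0, 1.2),
    "landline: moving house": (1.3, 0.9),
    "mobile: plan change": (5.0, 5.1),
    "internet: installation": (9.0, 1.0),
    "fault report: no dial tone": (1.8, 1.6),
}


def nearest_documents(query_point, docs, k):
    """Return the k document names whose points lie closest to query_point."""
    ranked = sorted(docs, key=lambda name: math.dist(query_point, docs[name]))
    return ranked[:k]


# Pretend the question "How do I sign up for a landline?" was embedded here
query = (1.1, 1.1)
top = nearest_documents(query, documents, k=2)
print(top)
```

With `k=5` and real page embeddings, the same pattern yields the 5 pages described above.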

 

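③ Send the retrieved material and the user's question to the AI to get a good answer

The 5 retrieved pages and the user's question are sent together to GPT/CLOVA.

The GPT example from the beginning can be read as follows:

"Choi Cheol-su is 30 years old and lives in Seoul" (the material that the RAG step found to be most similar to the user's question)

How old is Choi Cheol-su? (the user's question)

These two pieces are assembled and sent to the AI together.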
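Assembling the two pieces can be sketched like this. The helper `build_messages` and the prompt wording are assumptions made for illustration, not the post's actual prompt; the only thing taken as given is the role/content message format used by chat models.

```python
def build_messages(retrieved_texts, question):
    """Combine retrieved reference texts and the user's question into chat messages."""
    context = "\n".join(f"- {text}" for text in retrieved_texts)
    # Hypothetical instruction wording; adjust to taste
    system_prompt = (
        "Answer the question using only the reference material below.\n"
        f"Reference material:\n{context}"
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": question},
    ]


messages = build_messages(
    ["Choi Cheol-su is 30 years old and lives in Seoul."],
    "How old is Choi Cheol-su?",
)
for m in messages:
    print(m["role"], ":", m["content"])
```

The resulting list is what would be handed to the chat model in place of the bare question.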

 

 

2. Hands-on with code

- Seoul youth policies were saved as txt files and placed in the ./data folder

 

pip install openai
pip install streamlit
pip install streamlit-chat
pip install matplotlib scipy plotly scikit-learn
pip install pandas numpy python-dotenv

- Install the required libraries

- openai is updated frequently. If code that used to work stops working, check your openai version

(pip show openai, pip install --upgrade openai)

- I used the latest openai version as of 2024-02-14
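The version check can also be done from Python. The sketch below assumes the code in this post relies on the 1.x-style client (from openai import OpenAI), which is not available in releases older than 1.0; the helper name `installed_major_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_major_version(package):
    """Return the installed major version of a package, or None if it is missing."""
    try:
        return int(version(package).split(".")[0])
    except PackageNotFoundError:
        return None


major = installed_major_version("openai")
if major is None:
    print("openai is not installed: pip install openai")
elif major < 1:
    print("openai is older than 1.0: pip install --upgrade openai")
else:
    print("openai major version:", major)
```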

 

import os
import pandas as pd
import numpy as np
from numpy import dot
from numpy.linalg import norm
import ast
import openai
from openai import OpenAI
import streamlit as st
from streamlit_chat import message
from dotenv import load_dotenv

load_dotenv()

# Set OpenAI API key
openai.api_key = os.getenv("OPENAI_API_KEY")
client = OpenAI()

- Initial setup

- Loads the OPENAI_API_KEY stored in .env so that the OpenAI API can be used

 

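def get_embedding(text, engine):
    # Flatten newlines to spaces before sending the text to the API
    text = text.replace("\n", " ")
    # Embed the text with the given model and return the vector as a list of floats
    return client.embeddings.create(input=[text], model=engine).data[0].embedding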

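- This used to be available via from openai.embeddings_utils import get_embedding, but it was removed in an openai update, so we write the function ourselves

- The function takes a text and a model name, and embeds the text with that model

- It is what places the 57 txt files of youth policies in the N-dimensional virtual space

- It also embeds the user's question, placing it in the same N-dimensional space

- Computing the cosine similarity between the question and each of the 57 txt-file embeddings lets us rank the documents from most similar to least related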
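Ranking by cosine similarity can be sketched with tiny stand-in vectors. Everything below is illustrative: the 3-D vectors and policy names are made up, and the function is written in plain Python so it runs without dependencies (numpy's dot and norm, imported earlier, could compute the same quantity).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1.0 means same direction)."""
    dot_product = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot_product / (norm_a * norm_b)


# Hypothetical 3-D stand-ins for real embedding vectors
doc_embeddings = {
    "housing support policy": [0.9, 0.1, 0.0],
    "job training policy": [0.1, 0.9, 0.1],
    "culture pass policy": [0.0, 0.2, 0.9],
}
question_embedding = [0.8, 0.3, 0.0]

# Sort document names from most similar to least similar
ranked = sorted(
    doc_embeddings,
    key=lambda name: cosine_similarity(question_embedding, doc_embeddings[name]),
    reverse=True,
)
print(ranked)
```

Taking the first few entries of `ranked` gives the documents to pass along with the question.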

 

# Load data
folder_path = './data'
file_name = 'embedding.csv'
file_path = os.path.join(folder_path, file_name)

# Load the file if it already exists
if os.path.isfile(file_path):
    print(f"{file_name} already exists")
    df = pd.read_csv(file_path)
    # The 'embedding' column is read back as strings; convert each one into an actual Python list
    df['embedding'] = df['embedding'].apply(ast.literal_eval)
# Otherwise, create it from scratch
else:
    # List only the txt files in the folder, then loop over them
    txt_file_name = [file for file in os.listdir(folder_path) if file.endswith('.txt')]

    data = []

    for file in txt_file_name:
        txt_file_path = os.path.join(folder_path, file)
        with open(txt_file_path, 'r', encoding='utf-8') as f:
            text = f.read()
            data.append(text)

    df = pd.DataFrame(data, columns=['text'])

    # Run the OpenAI embedding
    # axis=1 applies the lambda row by row; each row's text is passed to get_embedding
    df['embedding'] = df.apply(lambda row: get_embedding(row.text, engine="text-embedding-ada-002"), axis=1)

    # Save to file
    df.to_csv(file_path, index=False, encoding='utf-8')

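- This creates the embedding.csv file

- The first column is named 'text' and holds the policy text from each file

- The second column is named 'embedding' and holds the result of passing the text in the column to its left through the get_embedding function defined above. In other words, it records each text's coordinates in the N-dimensional space.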
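Why the loading branch needs ast.literal_eval can be seen in a small round trip. This sketch uses the standard csv module and an in-memory buffer instead of pandas and a real file, purely to stay dependency-free; the 3-number "embedding" is a made-up stand-in. The point it illustrates is that a list written to CSV comes back as a string.

```python
import ast
import csv
import io

# Hypothetical tiny "embedding" standing in for a real high-dimensional vector
rows = [{"text": "sample policy text", "embedding": [0.02, -0.003, 0.014]}]

# Write to an in-memory CSV; the list is stored as its string representation
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["text", "embedding"])
writer.writeheader()
writer.writerows(rows)

# Read it back: every cell comes out as a plain string
buffer.seek(0)
loaded = list(csv.DictReader(buffer))
raw = loaded[0]["embedding"]
print(type(raw).__name__, raw)

# ast.literal_eval safely parses the string back into a list of floats
restored = ast.literal_eval(raw)
print(type(restored).__name__, restored)
```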

 

"[0.02063588984310627, -0.003654821775853634, 0.014302623458206654, -0.04061206430196762, -0.021942125633358955, 0.010726967826485634, -0.00544924708083272, 0.0018686430994421244, -0.043356478214263916, 0.003000054508447647, -0.015661636367440224, -0.010377318598330021, -0.010575233027338982, -0.010595024563372135, -0.023538636043667793, -0.0007953690364956856, 0.016717180609703064, -0.02679762803018093, 0.00449266005307436, 0.004182593896985054, 0.006013303529471159, -0.005363483913242817, -0.00782422162592411, -0.013082151301205158, 0.004746650345623493, 0.015569277107715607, 0.02241712063550949, -0.0194220133125782, 0.019276876002550125, -0.0011140939313918352, 0.020926164463162422, -0.005531711503863335, -0.01935604214668274, -0.013379022479057312, -0.00032676514820195735, 0.007481169421225786, 0.01239604689180851, -0.0007496012840420008, 0.029766347259283066, 0.016215797513723373, 0.00810130126774311, -0.016070660203695297, 0.02035880833864212, -0.004420091398060322, 0.015661636367440224, 0.02101852372288704, -0.013695686124265194, -0.023446274921298027, -0.012653335928916931, 0.028578858822584152, -0.004683977458626032, 0.025939999148249626, -0.013854017481207848, -0.006455312483012676, -0.0046213045716285706, 0.008853376843035221, 0.0003465565969236195, 0.02429071068763733, -0.017667170614004135, -0.02285253256559372, 0.007817624136805534, -0.01500192191451788, -0.033645469695329666, -0.0005694166175089777, -0.01120196282863617, 0.030874667689204216, -0.02712748572230339, -0.007039160467684269, -0.01447414979338646, -0.013985960744321346, 0.044068969786167145, 0.030874667689204216, 0.00284172291867435, -0.018498411402106285, 0.033803801983594894, -0.007692278362810612, 0.005653758533298969, -0.005086403805762529, -0.02157268486917019, 0.0026668983045965433, 0.009268997237086296, -0.018445635214447975, -0.01902618445456028, 0.007890192791819572, 0.009071082808077335, 0.01868313178420067, -0.011742928996682167, 0.042168989777565, -0.020437974482774734, 
-0.0030792204197496176, 0.007421795278787613, 0.02440945990383625, 0.017904669046401978, 0.008041927590966225, 0.004208982456475496, 0.020754637196660042, -0.005887957289814949, 0.014408178627490997, -0.004644394386559725, 0.013722074218094349, -0.013827629387378693, 0.010918285697698593, -0.018735909834504128, -0.010377318598330021, -0.031692713499069214, -0.027998309582471848, 0.0009367954917252064, -0.03670654818415642, 0.012996387667953968, 3.502159597701393e-05, -0.0011982076102867723, 0.027919143438339233, 0.0035921488888561726, -0.03179826959967613, 0.037075988948345184, 0.009559271857142448, -0.004340925719588995, -0.0025646425783634186, -0.015674831345677376, 0.00862907338887453, 0.005591085646301508, 0.013187705539166927, 0.030742725357413292, 0.002973665948957205, -0.017561616376042366, -0.011182171292603016, -0.02435668185353279, -0.02650735341012478, -0.0052645266987383366, -0.01459289900958538, 0.025821249932050705, 0.015463722869753838, 0.031534381210803986, 0.03459545969963074, -0.029528848826885223, -0.006376146804541349, -0.010964465327560902, -0.013801240362226963, -0.03140243887901306, -0.0404801182448864, 0.0023700266610831022, 0.0018702923553064466, -0.01935604214668274, -0.03288020193576813, 0.010634607635438442, 0.02934412844479084, 0.0202400591224432, 0.0044266884215176105, 0.00465758889913559, 0.009064486250281334, -0.0068610371090471745, -0.008035330101847649, -0.0006098241428844631, -0.006788468454033136, 0.024937231093645096, 0.005290915258228779, -0.0014959040563553572, 0.02318239025771618, -0.029264962300658226, 0.0024063109885901213, 0.0016080556670203805, -0.01698106713593006, 0.02873719111084938, 0.011168977245688438, -0.02373654954135418, 0.014751230366528034, -0.0056438627652823925, 0.03237881883978844, 0.00826623011380434, -0.01403873786330223, -0.01203320361673832, 0.02086019143462181, -0.02884274534881115, -0.010858911089599133, 0.006847843062132597, 0.016228992491960526, 0.0015115722781047225, 0.0015264158137142658, 
-0.0052249436266720295, -0.01178910955786705, -0.012653335928916931, -0.014751230366528034, 0.002973665948957205, 0.02345946989953518, -0.007646098267287016, -0.0011536767706274986, 0.009268997237086296, 0.02768164686858654, -0.010878702625632286, 0.006250801030546427, 0.02235114760696888, 0.007349226623773575, 0.010957867838442326, -0.010251972824335098, -0.609260082244873, -0.016888707876205444, 0.018881047144532204, -0.009176637046039104, 0.0005657057045027614, 0.0059935119934380054, 0.02789275534451008, 0.032035768032073975, -0.013141524977982044, 0.00823324453085661, 0.001744946464896202, 0.0252407006919384, -0.004895086400210857, -0.004908280447125435, -0.019540762528777122, -0.010746759362518787, 0.005706535652279854, 0.017495645210146904, 0.03261631727218628, -0.007144714705646038, -0.016294963657855988, -0.011525223031640053, -0.019553955644369125, 0.007791235577315092, 0.019105350598692894, 0.021928930655121803, 0.014263041317462921, -0.028262196108698845, -0.010990854352712631, 0.025042785331606865, -0.015490110963582993, 0.025662917643785477, 0.005831881891936064, 0.015463722869753838, 0.057210493832826614, -0.002686689840629697, -0.013458188623189926, 0.014935950748622417, 0.019369235262274742, 0.03317047655582428, -0.013985960744321346, -0.015622054226696491, 0.037075988948345184, 0.0024211544077843428, 0.02595319226384163, -0.004687275737524033, 0.01292381901293993, -0.012956804595887661, 0.008906153962016106, -0.02396085299551487, -0.02929135225713253, 0.015239419415593147, 0.0085960878059268, -0.0029687180649489164, 0.024277515709400177, -0.008134287782013416, 0.0029456280171871185, -0.018379664048552513, 0.0018208137480542064, 0.009730798192322254, 0.018564384430646896, 0.0012377904495224357, -0.0006997103337198496, -0.03417324274778366, -0.03285381197929382, 0.0310593880712986, -0.009941906668245792, 0.01020579319447279, -0.012343269772827625, -0.022760171443223953, -0.020398391410708427, -0.0007825870416127145, -0.01256097573786974, 
0.01295020803809166, 0.018881047144532204, 0.004030859563499689, 0.03528156504034996, 0.005465739872306585, -0.02240392565727234, 0.01675676368176937, -0.029581626877188683, 0.0069006201811134815, 0.001207278692163527, 0.0009706058772280812, -0.010199195705354214, -0.02778720110654831, -0.029080241918563843, -0.026916377246379852, 0.005657057277858257, -0.011531820520758629, 0.011947440914809704, 0.018326885998249054, 0.008774210698902607, -0.018828269094228745, -0.017891474068164825, 0.03876486048102379, 0.0026751449331641197, 0.013867211528122425, 0.002473931759595871, -0.0019412117544561625, -0.0032210589852184057, -0.022034484893083572, 0.014249846339225769, -0.0021919035352766514, 0.03090105578303337, 0.006940203253179789, -0.045018959790468216, -0.026137912645936012, 0.01603107713162899, -0.010364124551415443, 0.0012056294362992048, 0.010548844933509827, 0.004248565528541803, -0.0032919785007834435, -0.011135991662740707, -0.015714414417743683, 0.004100129473954439, 0.01364290900528431, 0.014117904007434845, 0.00013452004350256175, 0.0011883118422701955, -0.010317944921553135, -7.231096242321655e-05, 0.024646956473588943, 0.011109602637588978, 0.010726967826485634, 0.006296980660408735, -0.002816983498632908, -0.009203026071190834, -0.0041265180334448814, -0.02101852372288704, -0.035466283559799194, 0.015239419415593147, -0.019501179456710815, 0.06217155233025551, 0.0010233831126242876, 0.02096574753522873, -0.028156641870737076, 0.02440945990383625, -0.025979582220315933, -0.02169143408536911, 0.006079274695366621, 0.013827629387378693, -0.02435668185353279, 0.0017515436047688127, 0.00179277581628412, -0.019118543714284897, -0.01264673937112093, -0.01530539058148861, 0.009275594726204872, -0.020767832174897194, 0.022311566397547722, -0.004565228708088398, -0.003420623019337654, -0.024383071810007095, 0.011221754364669323, 0.0185775775462389, -0.010271764360368252, -0.02112407796084881, 0.008510325103998184, 0.016598433256149292, 0.007863804697990417, 
-0.012382852844893932, -0.01680954173207283, 0.003120452631264925, -0.011472445912659168, -0.0038395419251173735, -0.015015115961432457, 0.0042023854330182076, -0.028473304584622383, -0.011010645888745785, 0.00809470470994711, -0.015186642296612263, 0.02834136225283146, 0.03512323275208473, -0.0068676345981657505, -0.010977659374475479, 0.012079384177923203, -0.019369235262274742, 0.0007747529307380319, 0.0038329449016600847, 0.011465848423540592, -0.020754637196660042, -0.0016204252606257796, 0.010832522064447403, 0.013774852268397808, 0.000990397296845913, 0.007725263945758343, -0.00868185143917799, 0.0001247273903572932, -0.0006160089978948236, 0.03528156504034996, -0.0005405540578067303, 0.007078743074089289, 0.009334969334304333, -0.00032470354926772416, 0.012976596131920815, 0.0327218696475029, 0.019382430240511894, 0.020332420244812965, 0.023565024137496948, 0.0005805492983199656, 0.025333059951663017, -0.02136157639324665, -0.003931902348995209, -0.010608219541609287, 0.007256866432726383, -0.006666421424597502, 0.012514796108007431, -0.0021919035352766514, 0.011940843425691128, -0.053832754492759705, 0.0068280515260994434, -0.011545014567673206, -0.0022149935830384493, 0.0183004979044199, -0.026824016124010086, 0.016294963657855988, -0.025504587218165398, 0.010944673791527748, 0.009809964336454868, 0.008648864924907684, 0.02046436257660389, -0.011769318021833897, -0.008510325103998184, -0.0017498943489044905, -0.011129394173622131, 0.01807619445025921, 0.01178910955786705, 0.008714837022125721, -0.028869133442640305, -0.002078102668747306, 0.0058021945878863335, 0.05306748300790787, 0.006682914216071367, 0.021044911816716194, 0.020108116790652275, -0.01347798015922308, -0.023538636043667793, -0.010997450910508633, 0.015727609395980835, 0.03251076117157936, 0.005861568730324507, 0.0046147070825099945, 0.006247502285987139, -0.0013177809305489063, 0.018603965640068054, 0.015714414417743683, -0.023604607209563255, 0.020108116790652275, -0.025319866836071014, 
0.01824771985411644, 0.015516499988734722, 0.005330498330295086, -0.005472336895763874, -0.026111524552106857, 0.0045784227550029755, 0.00472685880959034, 0.024224739521741867, 0.023156000301241875, 0.018102582544088364, 0.015833163633942604, -0.002208396326750517, -0.01736370287835598, 0.011782512068748474, 0.01680954173207283, -0.010984256863594055, -0.01753522828221321, -0.015516499988734722, -0.02186295948922634, 0.0013870510738343, -0.015767190605401993, -0.010660996660590172, -0.023327527567744255, -0.01159119512885809, -0.0064817010425031185, 0.006132052280008793, -0.039081525057554245, 0.013695686124265194, 0.00019904841610696167, -0.022219205275177956, -0.018221331760287285, 0.014540120959281921, 0.025755278766155243, -0.005765910260379314, -0.006468506995588541, -0.0101200295612216, -0.013827629387378693, -0.026692073792219162, 0.02018728293478489, 0.0012946908827871084, 0.007566932588815689, 0.0002935732190962881, -0.0056438627652823925, -0.036310721188783646, -0.026164302602410316, 0.026085136458277702, 0.004994043614715338, -0.013999154791235924, 0.018485218286514282, -0.002304055029526353, -0.010225584730505943, -0.02190254256129265, -0.015846356749534607, 0.01444776076823473, 0.012244313023984432, -0.018722714856266975, -0.02944968268275261, -0.011333906091749668, -0.02380252256989479, -0.017785919830203056, -0.007395406719297171, -0.01881507597863674, -0.004489361308515072, -0.025042785331606865, 0.0014604444149881601, -0.027655258774757385, -0.025082368403673172, 0.030874667689204216, -0.021480323746800423, -0.0014975533122196794, -0.0016352689126506448, -0.0210713017731905, -0.003648224752396345, 0.07911303639411926, -0.033698249608278275, 0.017495645210146904, 0.015160253271460533, 0.0028070879634469748, -0.014394983649253845, -0.006986383348703384, -0.028156641870737076, -0.015015115961432457, 0.007896790280938148, 0.00432773120701313, 0.009922115132212639, 0.02113727293908596, -0.0163741298019886, 0.028473304584622383, 0.016743570566177368, 
0.009038097225129604, -0.036363497376441956, 0.011696749366819859, 0.01054224744439125, 0.007223880384117365, -0.03084827959537506, 0.006191426422446966, 0.041905105113983154, 0.02002895064651966, -0.001243563019670546, 0.04227454587817192, 0.023261554539203644, 0.007039160467684269, -0.023710161447525024, -0.018947018310427666, -0.01587274670600891, 0.005294214002788067, 0.011129394173622131, -0.017680365592241287, 0.03876486048102379, 0.007982552982866764, -0.004459674470126629, -0.007223880384117365, 0.014012348838150501, 0.04559950903058052, 0.04333008825778961, 0.017838697880506516, -0.006438819691538811, 0.0060297963209450245, 0.0035030872095376253, 0.0027262726798653603, 0.04000512510538101, -0.012165146879851818, -0.0016525863902643323, 0.02224559336900711, -0.017178982496261597, -0.012310284189879894, -0.0031155047472566366, 0.013748463243246078, -0.026916377246379852, 0.003034689463675022, 0.0021325291600078344, -0.009948504157364368, -0.02489764802157879, -0.002780699171125889, -0.043963417410850525, -0.006438819691538811, -0.002092946320772171, -0.018287302926182747, -0.030373284593224525, -0.00442339014261961, -0.005571294110268354, -0.0038263476453721523, 0.009229414165019989, 0.003823049133643508, -0.010192598216235638, -0.02617749571800232, 0.0032111634500324726, 0.009882532991468906, 0.02728581801056862, 0.04533562436699867, 0.017178982496261597, -0.017152592539787292, 0.04087594896554947, -0.004495958797633648, -0.009097471833229065, -0.005112792365252972, -0.007844013161957264, 0.005132583901286125, 0.031877435743808746, 0.010792938992381096, -0.010436693206429482, -0.017997028306126595, 0.018669938668608665, 0.011155783198773861, -0.009460315108299255, 0.02013450488448143, -0.001637742854654789, -0.03557183966040611, -0.0017416479531675577, 0.011940843425691128, 0.009170040488243103, -0.021058106794953346, -0.017706753686070442, -0.0011074967915192246, -0.003067675279453397, -0.009143651463091373, 0.01919770985841751, -0.00432773120701313, 
0.023644190281629562, -0.009414134547114372, -0.007916581816971302, -0.010911688208580017, 0.01885465905070305, 0.022047679871320724, -0.02546500414609909, -0.005765910260379314, -0.02268100529909134, 0.004733455833047628, -0.007065549027174711, -0.008160675875842571, 0.009790172800421715, 0.026837211102247238, -0.004100129473954439, -0.005475635640323162, -0.03340797498822212, 0.0180366113781929, 0.028156641870737076, -0.017099816352128983, -0.0018603967037051916, -0.017165787518024445, -0.009539480321109295, 0.0013449941761791706, -0.00904469471424818, -0.008180467411875725, 0.01162418071180582, 0.006656525656580925, -0.02029283717274666, 0.00014256031136028469, -0.0032820827327668667, -0.01325367670506239, 0.002737817820161581, -0.00279389345087111, -0.035914890468120575, -0.012580767273902893, -0.03285381197929382, -0.012732502073049545, -0.018181748688220978, 0.005970421712845564, -0.026665685698390007, 0.009176637046039104, -0.005139180924743414, -0.0041990866884589195, 0.021493518725037575, 0.005772507283836603, 0.018603965640068054, 0.02513514645397663, 0.00884678028523922, 0.004941266495734453, -0.022760171443223953, -0.010588428005576134, -0.009268997237086296, 0.027708034962415695, 0.014025543816387653, 0.023050446063280106, 0.003133646911010146, 0.0056834458373487, 0.011300920508801937, 0.017376895993947983, 0.0037372861988842487, -0.01519983634352684, 0.017442867159843445, -0.0038395419251173735, 0.025992775335907936, 0.012673127464950085, 0.03789403662085533, 0.01785189099609852, -0.005907748825848103, -0.010073849931359291, 0.03322325274348259, 0.015595665201544762, -0.006277189590036869, 0.005330498330295086, -0.023419886827468872, 0.007019368931651115, -0.01902618445456028, -0.008741225115954876, -0.011584597639739513, -0.02129560336470604, -0.003684509079903364, -0.0014398283092305064, 0.0016765010077506304, 0.031587161123752594, 0.010713773779571056, -0.0020005861297249794, -0.03947735205292702, 0.041693996638059616, 0.03377741575241089, 
-0.022271983325481415, 0.02051714062690735, -0.0126203503459692, -0.021058106794953346, -0.012165146879851818, 0.0066367341205477715, 0.01156480610370636, -0.009579063393175602, -0.020015757530927658, -0.01781230792403221, 0.0045454371720552444, -0.0013293259544298053, -0.03831625357270241, -0.033645469695329666, -0.015094282105565071, -0.01381443440914154, -0.012158549390733242, -0.022311566397547722, 0.01394637767225504, -0.02103171870112419, 0.008035330101847649, -0.005099597852677107, -0.008912751451134682, 0.024778900668025017, 0.0004576773790176958, -0.009249205701053143, -0.004575124476104975, 0.0415884405374527, 0.022443508729338646, 0.0038890207652002573, 0.0049247732385993, 0.01758800446987152, -0.006389340851455927, -0.015450527891516685, -5.714524377253838e-06, -0.004241968039423227, -0.021150466054677963, 0.03111216612160206, 0.005894554778933525, -0.019039377570152283, -0.021757405251264572, 0.013167914003133774, 0.013200899586081505, -0.012132161296904087, -0.028288584202528, 0.021282410249114037, 0.02035880833864212, 0.006359654013067484, 0.001266653067432344, -0.005122688133269548, -0.0048423088155686855, 0.03177187964320183, 0.016281768679618835, 0.001039876020513475, -0.01802341639995575, -0.0010332787642255425, -0.004716963041573763, 0.029819123446941376, 0.0018488516798242927, 0.01592552289366722, 0.02489764802157879, -0.029212186112999916, -0.019039377570152283, -0.022826142609119415, -0.016506072133779526, 0.025253895670175552, 0.012158549390733242, 0.04480785131454468, -0.016334546729922295, 0.001647638506256044, -0.0066400328651070595, 0.003552566049620509, -0.003249096916988492, -0.013576936908066273, -0.031138554215431213, 0.032537151128053665, -0.001028330996632576, -0.016004689037799835, 0.004895086400210857, 0.022311566397547722, 0.014685258269309998, -0.013550548814237118, 0.014236652292311192, -0.01361651998013258, -0.02396085299551487, -0.014500538818538189, 0.0050006406381726265, 0.03456907346844673, 0.003131997538730502, 
-0.016281768679618835, -0.02279975451529026, -0.003323314944282174, -0.01381443440914154, -0.019210904836654663, 0.04755226522684097, 0.070985347032547, 0.0005339569179341197, 0.01145925186574459, -0.032985758036375046, 0.02950246073305607, -0.0169150959700346, 0.0007669188198633492, -0.006755482871085405, 0.014197069220244884, -0.04414813593029976, 0.018735909834504128, 0.004736754577606916, 0.009994683787226677, 0.017376895993947983, 0.015569277107715607, 0.00843115895986557, -0.029370516538619995, 0.021506713703274727, -0.009130457416176796, -0.03084827959537506, -0.010759953409433365, -0.013464786112308502, 0.007573529612272978, -0.02557055838406086, 0.0032771348487585783, -0.01153841707855463, 0.001993988873437047, 0.008200258947908878, 0.004420091398060322, -0.023881686851382256, 0.007731861434876919, -0.019659509882330894, 0.001277373405173421, 0.023881686851382256, -0.018155360594391823, -0.003684509079903364, -0.01054224744439125, 0.029001077637076378, 0.008140884339809418, -0.017733141779899597, 0.00224633002653718, 0.005736222956329584, 0.005525114014744759, -0.014856784604489803, -0.02567611262202263, -0.010700579732656479, -0.006501492578536272, -0.022760171443223953, 0.011320711113512516, -0.0017416479531675577, -0.011439460329711437, 0.013346036896109581, 0.024594180285930634, -0.013801240362226963, 0.020068533718585968, 0.008780808188021183, 0.010938076302409172, -0.020490752533078194, -0.004664185922592878, -0.026045553386211395, -0.011775914579629898, -0.010852313600480556, 0.0643354207277298, -0.013200899586081505, -0.0404801182448864, -0.0023749745450913906, 0.0024195052683353424, -0.027048319578170776, 0.020952552556991577, 0.022377537563443184, 0.010911688208580017, 0.00203192257322371, 0.00531400553882122, -0.005739521700888872, 0.01592552289366722, -0.004182593896985054, 0.03227326273918152, -0.014764424413442612, -0.011696749366819859, 0.009862741455435753, -0.009493300691246986, -0.0027411163318902254, 0.009163442999124527, 
-0.03185104578733444, -0.0046509914100170135, 0.01791786216199398, 0.03609961271286011, -0.0036020446568727493, 0.04596894979476929, 0.008754420094192028, 0.01791786216199398, 0.014460955746471882, 0.009460315108299255, -0.029476072639226913, -0.019606733694672585, 0.017113009467720985, -0.019936591386795044, -0.0008782457443885505, 0.0266261026263237, -0.022839337587356567, -0.023393498733639717, 0.0019197709625586867, 0.007837415672838688, -0.011109602637588978, 0.008437756448984146, 0.008220050483942032, -0.019553955644369125, 0.015688026323914528, -0.012250909581780434, 0.023538636043667793, -0.001265828381292522, -0.0006939378217794001, 0.007896790280938148, -0.006564165465533733, -0.03317047655582428, -0.0014909561723470688, 0.013082151301205158, -0.027417760342359543, -0.0017531929770484567, 0.011103005148470402, -0.02307683415710926, 0.01070717629045248, -0.02163865603506565, -0.0018472023075446486, -0.014408178627490997, 0.01203320361673832, -0.007580126635730267, -0.023696966469287872, -0.005376678425818682, 0.024317098781466484, 0.013972766697406769, -0.01625538058578968, -0.006745587103068829, -0.002257874934002757, -0.020715054124593735, -0.016268575564026833, -0.01868313178420067, 0.02463376335799694, -0.01746925711631775, 0.009057888761162758, 0.03942457586526871, -0.015780385583639145, -0.005330498330295086, -0.01774633675813675, -0.015081088058650494, 0.00585167296230793, -0.0011643972247838974, 0.24171961843967438, -0.004848906304687262, 0.01886785216629505, 0.01023877877742052, -0.006894023157656193, 0.003024793928489089, 0.013999154791235924, 0.010601622052490711, -0.019012989476323128, 0.0344107411801815, -0.05852992460131645, 0.004994043614715338, 0.00790338683873415, -0.006762079894542694, -0.025887221097946167, -0.025610141456127167, -0.013200899586081505, -0.01278527919203043, -0.008523519150912762, 0.013438397087156773, -0.016994262114167213, 0.02652054838836193, -0.025821249932050705, -0.004733455833047628, 0.0027411163318902254, 
0.010496067814528942, -0.0171921756118536, 0.000268833915470168, -0.004182593896985054, 0.013788046315312386, 0.0036878075916320086, -0.0113668916746974, 0.013108539395034313, 0.010245376266539097, -0.0077384584583342075, -0.010786342434585094, 0.0255177803337574, -0.011670360341668129, 0.03982040658593178, 0.032642703503370285, -0.0095724668353796, -0.010614816099405289, 0.012237715534865856, -0.013431799598038197, 0.0008592789527028799, 0.01669079251587391, -0.01406512688845396, 0.0027064813766628504, -0.02768164686858654, -0.016492879018187523, -0.026203885674476624, -0.03206215426325798, 0.007072146050632, 0.034701015800237656, 0.018656743690371513, -0.004113323986530304, 0.0002286325179738924, 0.010502664372324944, -0.013082151301205158, 0.027417760342359543, 0.006362952291965485, 0.036917656660079956, -0.03311770036816597, 0.03726071119308472, -0.01206618919968605, -0.015833163633942604, -0.011808901093900204, 0.011274531483650208, 0.01686231791973114, -0.021044911816716194, -0.003654821775853634, 0.004766441881656647, -0.01592552289366722, -0.0004477816401049495, -0.021889347583055496, -0.019672704860568047, 0.04652310907840729, 0.018049806356430054, 0.02297127991914749, 0.014183875173330307, -0.016163021326065063, 0.01841924712061882, 0.018762297928333282, 0.012264104560017586, -0.004733455833047628, -0.04813281446695328, -0.01351096574217081, 0.00948670320212841, -0.03554544970393181, -0.020886581391096115, 0.011907857842743397, 0.006616942584514618, -0.006151843350380659, -0.023314332589507103, -0.0069336057640612125, 0.0034338172990828753, 0.00965163204818964, 0.021176856011152267, -0.034806568175554276, 0.029370516538619995, -0.004588318523019552, 0.03390935808420181, 0.015885939821600914, 0.0021902541629970074, -1.4044716408534441e-05, 0.006006706040352583, 0.0022776664700359106, -0.0015429087216034532, -0.006369549315422773, -0.011723137460649014, 0.012521392665803432, -0.013656103052198887, 0.012534587644040585, 0.0003249096916988492, 
0.008325604721903801, 0.006102364975959063, 0.01308874785900116, -0.04998001828789711, 0.010773148387670517, 0.01040370762348175, -0.03390935808420181, -0.020596306771039963, -0.019092155620455742, 0.013385619968175888, 0.011228351853787899, -0.006527881138026714, -0.025333059951663017, 0.014091514982283115, 0.00034861822496168315, -0.05013835057616234, 0.006663122680038214, -0.02101852372288704, 0.03293297812342644, -0.00011503782297950238, 0.0026734955608844757, 0.0183004979044199, 0.004057248122990131, -0.0017828801646828651, -0.005739521700888872, 0.007487766444683075, 0.008206856437027454, -0.021559489890933037, 0.03583572432398796, 0.0050699105486273766, -0.020451169461011887, -0.022549062967300415, 0.005845075938850641, -0.0015800177352502942, -0.01308874785900116, 0.007270060479640961, -0.0021836571395397186, -0.02378932759165764, 0.00551851699128747, -0.04414813593029976, 0.020992135629057884, -0.018445635214447975, -0.030821891501545906, -0.039081525057554245, 0.009447120130062103, 0.03306492045521736, -0.04090233892202377, 0.010073849931359291, 0.0360732227563858, -0.019276876002550125, -0.024831676855683327, 0.006616942584514618, -0.16614265739917755, 0.012349867261946201, 0.012178340926766396, -0.007138117682188749, -0.02020047791302204, 0.01958034560084343, 0.007758249994367361, -0.010759953409433365, -0.01791786216199398, -0.013880406506359577, 0.030030231922864914, -0.02839413844048977, -0.004486063029617071, 0.011294323019683361, 0.014315818436443806, 0.012534587644040585, -0.005762611515820026, 0.017113009467720985, 0.0035294760018587112, 0.002236434258520603, 0.01824771985411644, -0.0057988958433270454, 0.020332420244812965, 0.01973867602646351, 0.0047763376496732235, 0.0009838001569733024, -0.027048319578170776, 0.008345396257936954, 0.004179295152425766, -0.017825502902269363, -0.022707395255565643, 0.02918579801917076, 0.024778900668025017, -0.0017977237002924085, 0.007573529612272978, -0.012059592641890049, -0.018551189452409744, 
0.01892063021659851, -0.006303578149527311, 0.02458098530769348, -0.0016426906222477555, 0.004670782946050167, 0.003549267305061221, 0.011155783198773861, -0.03720793128013611, 0.002617419697344303, 0.017997028306126595, 0.0027889457996934652, 0.008642268367111683, 0.008081510663032532, -0.012613752856850624, -0.021176856011152267, -0.003011599648743868, 0.006432222668081522, 0.0004226299934089184, 0.02834136225283146, 0.017931057140231133, 0.0011561507126316428, -0.00432773120701313, 0.011696749366819859, -0.02562333457171917, -0.00033727934351190925, 0.05370081216096878, -0.002589381765574217, -0.007949567399919033, -0.0006828051409684122, -0.021889347583055496, -0.0008077386883087456, -0.02013450488448143, -0.008226647973060608, 0.005690042860805988, -0.01023877877742052, 0.015133865177631378, -0.00466088717803359, 0.017931057140231133, -0.020728249102830887, -0.012429033406078815, 0.018445635214447975, 0.0018653444712981582, 0.008173870854079723, 0.010720370337367058, 0.01591232791543007, -0.04670783132314682, 0.0037966605741530657, -0.03404130041599274, 0.009176637046039104, 0.012402644380927086, -0.02035880833864212, 0.006247502285987139, 0.0210844948887825, 0.027444148436188698, -0.005396469496190548, -0.003346404992043972, -0.010324541479349136, 0.038421809673309326, 0.019883813336491585, 0.009156845510005951, 0.004733455833047628, -0.014302623458206654, -0.021955318748950958, -0.002661950420588255, -0.03269548341631889, -0.026375411078333855, 0.0008436107309535146, 0.0327218696475029, -0.0013788045616820455, -0.013999154791235924, 0.013999154791235924, 0.028209418058395386, -0.010291555896401405, -0.013029373250901699, -0.00406384514644742, 0.01436859555542469, -0.016163021326065063, 0.016017884016036987, 0.024607373401522636, -0.016387322917580605, -0.00449266005307436, 0.01497553288936615, -0.004789531696587801, 0.03760376200079918, -0.013827629387378693, -0.007204089313745499, 0.016835929825901985, -0.0084575479850173, 0.01785189099609852, 
-0.05451885610818863, 0.01159119512885809, 0.025504587218165398, 0.0023947658482939005, -0.020820610225200653, 0.01215195283293724, 0.0028977987822145224, 0.0180366113781929, -0.02679762803018093, -0.015569277107715607, -0.04615366831421852, -0.01946159638464451, 0.0064784022979438305, 0.012857847847044468, 0.014104709029197693, -0.013919989578425884, -0.0013103592209517956, -0.037075988948345184, 0.014513732865452766, -0.000873297918587923, -0.0004708716587629169, -0.009414134547114372, 0.002833476522937417, -0.006719198543578386, -0.024475431069731712, -0.019448401406407356, -0.01940881833434105, 0.025728890672326088, 0.007243671920150518, 0.020939357578754425, 0.008728031069040298, -0.013774852268397808, 0.00016822735778987408, -0.013418605551123619, 0.012640141882002354, -0.05351608991622925, -0.018274109810590744, -0.026982348412275314, 0.0038263476453721523, -0.012824862264096737, 0.01434220653027296, 0.033751025795936584, 0.017733141779899597, -0.012204729951918125, 0.016440100967884064, 0.0022413821425288916, -0.0017399986973032355, 0.0038560349494218826, 0.0077054728753864765, -0.018669938668608665, -0.013042568229138851, -0.01670398749411106, -0.02918579801917076, -0.023538636043667793, 0.054941076785326004, 0.00482581602409482, 0.026969153434038162, -0.005676848813891411, -0.03346075117588043, 0.023446274921298027, -0.013933183625340462, 0.024224739521741867, -0.02058311179280281, -0.02290530875325203, 0.02174421027302742, -0.011782512068748474, -0.013985960744321346, -0.0030066517647355795, 0.0005059190443716943, -0.011973829939961433, -0.0036284332163631916, 0.021599072962999344, -0.011419668793678284, 0.025702500715851784, -0.025702500715851784, 0.0030330403242260218, -0.03322325274348259, 0.004749949090182781, -0.00697318883612752, -0.0041298167780041695, -0.010971062816679478, -0.016070660203695297, 0.011254739947617054, -0.010311347432434559, 0.010905090719461441, 0.004552034195512533, 0.013669297099113464, 0.0005747767863795161, 
-0.0006024023750796914, -0.02768164686858654, 0.02512195147573948, 0.020437974482774734, -0.01364290900528431, -0.008226647973060608, 0.005013835150748491, 0.002736168447881937, -0.005310706794261932, 0.01067419070750475, 0.006966591812670231, 0.016334546729922295, -0.04681338369846344, -0.021269215270876884, -0.07789916545152664, 0.0013078852789476514, 0.018432440236210823, -0.022760171443223953, -0.010819328017532825, 0.0183004979044199, 0.03805236890912056, -0.0194220133125782, -0.025715695694088936, -0.027813589200377464, -0.014685258269309998, 0.025438616052269936, -0.010304749943315983, 0.00029460404766723514, -0.03562461584806442, -0.04377869516611099, 0.016440100967884064, 0.004858802072703838, 0.03433157503604889, -0.00012575818982440978, 0.015160253271460533, 0.0015445580938830972, 0.01292381901293993, 0.006633435375988483, -0.03517600893974304, 0.01625538058578968, -0.02695596031844616, 0.014289429411292076, -0.00034593811142258346, 0.0004956109914928675, 0.0008914400823414326, -0.03211493045091629, 0.010331138968467712, 0.009097471833229065, 0.00047252094373106956, -0.02563652954995632, 0.018669938668608665, 0.0009969944367185235, 0.017099816352128983, 0.0127588901668787, -0.024607373401522636, -0.016901900991797447, 0.022311566397547722, -0.002561344066634774, -0.017165787518024445, 0.019672704860568047, 0.013827629387378693, 0.026388604193925858, 0.022984474897384644, -0.02839413844048977, 0.03206215426325798, 0.006626838352531195, 0.011947440914809704, -0.014289429411292076, 0.0007632079068571329, -0.012673127464950085, -0.00033398077357560396, 0.003046234603971243, -0.005818687379360199, -0.02773442305624485, 0.016849124804139137, -0.0070919375866651535, -0.0027625570073723793, -0.00717110326513648, 0.01808938942849636, -0.041403722018003464, -0.016097048297524452, -0.0052249436266720295, 0.0055482042953372, -0.029264962300658226, -0.024871259927749634, -0.01731092482805252, 0.01952756755053997, 0.022113651037216187, -0.018485218286514282, 
0.022654617205262184, 0.013880406506359577, -0.009110665880143642, -0.014210264198482037, 0.014856784604489803, 0.005281019490212202, -0.004119921009987593, -0.010317944921553135, -0.006946800276637077, 0.008549908176064491, 0.015569277107715607, -0.028763579204678535, -0.00255969469435513, 0.016070660203695297, 0.004436584189534187, -0.05348970368504524, 0.030214952304959297, 0.013075553812086582, -0.018049806356430054, 0.001182539388537407, 0.024990009143948555, 0.012006815522909164, 0.00358555163256824, -7.228519280033652e-06, 0.010726967826485634, 0.023208778351545334, 0.01963312178850174, 0.005356886889785528, -0.014394983649253845, -0.019897008314728737, 0.007850609719753265, -0.006762079894542694, -0.02617749571800232, 0.0043871053494513035, 0.02390807680785656, 0.016954679042100906, 0.011736332438886166, -0.013082151301205158, 0.015120670199394226, -0.01951437257230282, 0.017271341755986214, -0.00815407931804657, -0.04757865518331528, -0.02584763802587986, 0.014513732865452766, 0.0015824916772544384, 0.01278527919203043, 0.0029373816214501858, -0.006946800276637077, 0.0454675666987896, 0.004241968039423227, 0.010430095717310905, 0.006762079894542694, -0.0012452122755348682, -0.009691215120255947, -0.015331779606640339, -0.007646098267287016, -0.03356630727648735, -0.029027465730905533, -0.008121092803776264, 0.03090105578303337, 0.022456703707575798, 0.00041046651313081384, -0.01459289900958538, 0.05309387296438217, 0.022944891825318336, -0.006560866720974445, -0.008767614141106606, 0.013075553812086582, 0.011195365339517593, 0.010575233027338982, 0.012336673215031624, -0.008470742031931877, -0.01913173869252205, 0.010278361849486828, -0.03478018194437027, 0.00640913238748908, -0.05446607992053032, 0.0015973352128639817, -0.013181108050048351, -0.016506072133779526, 0.017271341755986214, -0.009328371845185757, 0.03543989732861519, 0.026929572224617004, -0.014183875173330307, -0.02252267487347126, 0.013319648802280426, -0.01736370287835598, 
-0.001325202756561339, 0.003644926007837057, -0.018828269094228745, 0.00045231718104332685, -0.004241968039423227, -0.00640913238748908, -0.010047460906207561, -0.01646648906171322, -0.018551189452409744, -0.011749526485800743, -0.004608110059052706, -0.03000384382903576, 0.0098495464771986, 0.005584488622844219, -0.008615879341959953, 0.009902323596179485, 0.020675472915172577, -0.024040019139647484, -0.00048612759565003216, 0.017152592539787292, -0.0066730184480547905, -0.013669297099113464, 0.005762611515820026, -0.00505011947825551]

- The 1,536-dimensional coordinates obtained by embedding the text with OpenAI's "text-embedding-ada-002" model

 

# Compute the cosine similarity between two embeddings
def cos_sim(A, B):
    return dot(A, B)/(norm(A)*norm(B))

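- A function that measures the similarity between two coordinates

- It is used to measure the similarity between the user's question and each of the 57 policy documents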
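As a quick sanity check of the similarity function, here is a minimal self-contained sketch. It uses made-up 3-dimensional vectors instead of real 1,536-dimensional embeddings, and imports `dot` and `norm` through NumPy, which is an assumption about how the original file imports them. Cosine similarity is 1 for vectors pointing the same way and 0 for orthogonal vectors, regardless of vector length.

```python
import numpy as np
from numpy.linalg import norm


def cos_sim(A, B):
    # Cosine of the angle between A and B:
    # dot product divided by the product of the two vector lengths
    return np.dot(A, B) / (norm(A) * norm(B))


# Toy vectors, made up for illustration (not real embeddings)
a = np.array([1.0, 2.0, 3.0])
same_direction = 2 * a                   # longer, but points the same way
orthogonal = np.array([3.0, 0.0, -1.0])  # dot(a, orthogonal) = 3 + 0 - 3 = 0

sim_same = cos_sim(a, same_direction)
sim_orth = cos_sim(a, orthogonal)
print(sim_same, sim_orth)
```

Because length is ignored, a short question and a long policy document can still score as highly similar if their embeddings point in a similar direction.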

 

# Embed the question and return the top 3 most similar documents
def return_answer_candidate(df, query):
    query_embedding = get_embedding(
        query,
        engine="text-embedding-ada-002"
    )
    # Similarity between the input question and each document
    df['similarity'] = df['embedding'].apply(lambda x: cos_sim(np.array(query_embedding), np.array(x)))
    # Sort by similarity, highest first
    top3 = df.sort_values("similarity", ascending=False).head(3)
    return top3

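- A function that measures the similarity between the user's question and the 57 documents, then returns the 3 most similar documents

- The user's question is embedded to obtain a 1,536-dimensional coordinate

- The similarity between each of the 57 documents and the user's question is measured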

df['similarity'] = df['embedding'].apply(lambda x: cos_sim(np.array(query_embedding), np.array(x)))

- The 3 documents with the highest similarity are selected

 top3 = df.sort_values("similarity", ascending=False).head(3)
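The same ranking step can also be written without a DataFrame. The sketch below is an alternative, not the code used in this post: `top_k_similar` is a hypothetical helper name, and the 2-dimensional "embeddings" are made up so that it runs without calling the OpenAI API. It computes all cosine similarities in one matrix operation and keeps the k best.

```python
import numpy as np


def top_k_similar(doc_embeddings, query_embedding, k=3):
    """Return (indices, similarities) of the k rows most similar to the query."""
    docs = np.asarray(doc_embeddings, dtype=float)
    query = np.asarray(query_embedding, dtype=float)
    # Cosine similarity of every document row against the query at once
    sims = docs @ query / (np.linalg.norm(docs, axis=1) * np.linalg.norm(query))
    # argsort is ascending, so reverse it and keep the first k
    order = np.argsort(sims)[::-1][:k]
    return order, sims[order]


# Toy 2-dimensional "embeddings", made up for illustration
docs = [
    [1.0, 0.0],   # index 0
    [0.0, 1.0],   # index 1
    [1.0, 1.0],   # index 2
    [-1.0, 0.0],  # index 3
    [2.0, 0.1],   # index 4
]
query = [1.0, 0.2]

indices, scores = top_k_similar(docs, query, k=3)
print(indices, scores)
```

With 57 documents either approach is fast enough; the vectorized form mainly matters when the number of documents grows. Assuming the `embedding` column holds plain lists of floats, something like `np.array(df['embedding'].tolist())` could be passed as `doc_embeddings`.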

 

 

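# Fetch the 3 documents most similar to the question and return them as a messages list
def create_prompt(df, query):
    # Get the 3 documents most similar to the question
    result = return_answer_candidate(df, query)
    # The Korean system prompt below says: "Refer to the given documents and answer
    # in detail. Document contents: Document 1..3. Answer in Korean, and give an
    # accurate answer based on the documents."
    system_message = f"""
    너는 주어진 문서를 참고해서, 자세하게 대답해줘.
    문서내용:
    문서1: """ + str(result.iloc[0]['text']) + """
    문서2: """ + str(result.iloc[1]['text']) + """
    문서3: """ + str(result.iloc[2]['text']) + """
    한국어로 답변해주고, 문서에 기반해서 정확한 답을 해줘
    """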

    user_message = f"""User question: "{str(query)}". """

    messages =[
        {"role": "system", "content": system_message},
        {"role": "user", "content": user_message}
    ]
    print(result)
    return messages

- This builds the familiar messages list that gets sent to gpt

- The 3 most similar documents are fetched

result = return_answer_candidate(df, query)

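- The text of the 3 retrieved documents is written into the system message

- The question entered by the user is added as the user message

- This completes the final messages list that is sent to gpt/clova.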
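The prompt-building step can be sketched without a DataFrame or the API. This is a simplified variant under stated assumptions, not the post's exact function: `build_messages` is a hypothetical name, the prompt wording is an English paraphrase, and the document text reuses the Choi Cheol-su example from the introduction.

```python
def build_messages(docs, query):
    # Number each retrieved document: "Document 1: ...", "Document 2: ...", ...
    doc_block = "\n".join(
        f"Document {i}: {text}" for i, text in enumerate(docs, start=1)
    )
    # Retrieved documents go into the system message as reference material
    system_message = (
        "Refer to the documents below and answer in detail.\n"
        f"{doc_block}\n"
        "Base your answer on the documents."
    )
    # The user's question is passed separately as the user message
    user_message = f'User question: "{query}".'
    return [
        {"role": "system", "content": system_message},
        {"role": "user", "content": user_message},
    ]


messages = build_messages(
    ["Choi Cheol-su is 30 years old and lives in Seoul."],
    "How old is Choi Cheol-su?",
)
for m in messages:
    print(m["role"], "->", m["content"])
```

Looping over `docs` means the function also works when fewer than 3 documents are retrieved, whereas indexing `result.iloc[2]` directly raises an `IndexError` on a result with only two rows.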

 

# Generate an answer for the completed messages
def generate_response(messages):
    result = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=messages,
        temperature=0.4,
        max_tokens=500)
    print(result.choices[0].message.content)
    return result.choices[0].message.content

- With the completed messages and the chosen model,

- the question is sent to gpt and the response is returned
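To see which part of the response object the function reads, it can help to run it against a stand-in client. The sketch below assumes a variant of `generate_response` that takes the client as a parameter; `make_stub_client` is a hypothetical helper that imitates only the `chat.completions.create(...)` call and the `choices[0].message.content` field, so the example runs offline and never contacts OpenAI.

```python
from types import SimpleNamespace


def generate_response(client, messages, model="gpt-3.5-turbo"):
    # Same call shape as in the post, but the client is injected
    result = client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=0.4,
        max_tokens=500,
    )
    # The answer text lives in the first choice's message
    return result.choices[0].message.content


def make_stub_client(reply):
    # Hypothetical stand-in: mimics only the attributes used above
    def create(**kwargs):
        message = SimpleNamespace(role="assistant", content=reply)
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])

    completions = SimpleNamespace(create=create)
    return SimpleNamespace(chat=SimpleNamespace(completions=completions))


stub = make_stub_client("Choi Cheol-su is 30 years old.")
answer = generate_response(
    stub, [{"role": "user", "content": "How old is Choi Cheol-su?"}]
)
print(answer)
```

Swapping the stub for a real client object leaves the function body unchanged, which makes the retrieval and prompt steps easy to test without spending API credits.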

 

if 'generated' not in st.session_state:
    st.session_state['generated'] = []

if 'past' not in st.session_state:
    st.session_state['past'] = []

with st.form('form', clear_on_submit=True):
    user_input = st.text_input('물어보세요!', '', key='input')
    submitted = st.form_submit_button('Send')

if submitted and user_input:
    # Build the prompt, then get the chatbot's answer based on it
    prompt = create_prompt(df, user_input)
    chatbot_response = generate_response(prompt)
    st.session_state['past'].append(user_input)
    st.session_state["generated"].append(chatbot_response)

if st.session_state['generated']:
    for i in reversed(range(len(st.session_state['generated']))):
        message(st.session_state['past'][i], is_user=True, key=str(i) + '_user')
        message(st.session_state["generated"][i], key=str(i))

- The Streamlit chat UI code (explanation omitted)

- Run it with streamlit run filename.py!

 

3. Summary

The answer from gpt-3.5, which has no up-to-date information

It gives a very general response.

 

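The answer when RAG is used to consult the 57 Seoul youth policy documents

Because the model answers by referring to documents that actually exist, the response contains concrete details.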

 

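As this shows, RAG lets the AI give more accurate and more specific answers.

But if you look closely at what is actually happening, the AI is not learning anything.

It finds the material most similar to the user's question,

and then sends that material to the model together with the question.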

 

Fine-tuning looks cooler, but in practice a good LLM + RAG still seems to work better!