Chat Bot Evaluation as Multi-agent Simulation is an approach that uses LangGraph to simulate user interactions with a chatbot in order to evaluate the bot automatically. Manual testing is time-consuming and hard to reproduce; by driving a multi-agent interaction between a simulated user and the chat bot, you can test the bot's behavior across different scenarios far more efficiently. The approach is especially useful for applications that must handle complex dialogues, such as customer-support bots.

Key Components

Simulated User Node: typically backed by an LLM (e.g. ChatOpenAI) that generates user input, combined with a prompt template to mimic realistic user behavior. For example, the simulated user might be an airline customer making a refund or booking request.
Chat Bot Node: implements the chatbot logic; it can call an external API or generate responses with an LLM. For example, the bot might be configured as an airline customer-support agent that answers requests according to specific policies.

Stop condition (should_continue): checks whether a termination criterion has been met (e.g. the message count exceeds a threshold, or the user sends "FINISHED").
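The stop condition boils down to two checks. Here is a minimal, dependency-free sketch, with the assumed simplification that messages are plain dicts rather than LangChain message objects (the function and constant names are illustrative, not from the original code):

```python
# Illustrative sketch of the termination check.
# Assumption: messages are plain dicts with a "content" key.
MAX_MESSAGES = 6

def check_stop(messages):
    # Stop once the conversation grows too long ...
    if len(messages) > MAX_MESSAGES:
        return "end"
    # ... or once the simulated user signals it is done.
    if messages and messages[-1]["content"] == "FINISHED":
        return "end"
    return "continue"
```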
A key challenge is telling the simulated user's messages apart from the chatbot's, since both are generated by an LLM (and therefore both start out as AI messages). LangGraph's solution:
- Assume HumanMessages represent the simulated user's messages and AIMessages represent the chatbot's messages, swapping roles before each side is invoked.
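The role swap can be illustrated with a tiny sketch, again under the assumed simplification of dict messages (the actual code swaps AIMessage and HumanMessage objects instead):

```python
# Sketch of the role swap: before invoking the simulated user, flip every
# message so the bot's "ai" turns look like "human" turns and vice versa.
def swap_roles(messages):
    flipped = {"ai": "human", "human": "ai"}
    return [{"role": flipped[m["role"]], "content": m["content"]}
            for m in messages]
```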
The complete source code follows:
from typing import List
from langchain_community.llms import Tongyi
###############################################################################################
# Define parameters
###############################################################################################
import f_common
import os
###############################################################################################
# Build the request messages and return the LLM's reply
###############################################################################################
# This is flexible, but you can define your agent here, or call your agent API here.
def my_chat_bot(messages: List[dict]) -> dict:
    system_message = {
        "role": "system",
        "content": "You are a customer support agent for an airline.",
    }
    messages = [system_message] + messages
    completion = f_common.myQwen_LLM.chat.completions.create(
        messages=messages, model="qwen-plus"
    )
    return completion.choices[0].message.model_dump()
# Smoke-test the function with a single call
my_chat_bot([{"role": "user", "content": "hi!"}])
###############################################################################################
# Simulate a user's request
###############################################################################################
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_openai import ChatOpenAI
system_prompt_template = """You are a customer of an airline company. \
You are interacting with a user who is a customer support person. \
{instructions}
When you are finished with the conversation, respond with a single word 'FINISHED'"""
prompt = ChatPromptTemplate.from_messages(
[
("system", system_prompt_template),
MessagesPlaceholder(variable_name="messages"),
]
)
instructions = """Your name is Harrison. You are trying to get a refund for the trip you took to Alaska. \
You want them to give you ALL the money back. \
This trip happened 5 years ago."""
prompt = prompt.partial(name="Harrison", instructions=instructions)
# The lines below chain the prompt template into the model
model = Tongyi(temperature=1)
#model = ChatOpenAI()
simulated_user = prompt | model
from langchain_core.messages import HumanMessage
messages = [HumanMessage(content="Hi! How can I help you?")]
simulated_user.invoke({"messages": messages})
###############################################################################################
# Define the chat bot node, which simulates the bot's side of the dialogue
###############################################################################################
from langchain_community.adapters.openai import convert_message_to_dict
from langchain_core.messages import AIMessage
def chat_bot_node(state):
    messages = state["messages"]
    # Convert from LangChain format to the OpenAI format, which our chatbot function expects.
    messages = [convert_message_to_dict(m) for m in messages]
    # Call the chat bot
    chat_bot_response = my_chat_bot(messages)
    # Respond with an AI Message
    return {"messages": [AIMessage(content=chat_bot_response["content"])]}
def _swap_roles(messages):
    new_messages = []
    for m in messages:
        if isinstance(m, AIMessage):
            new_messages.append(HumanMessage(content=m.content))
        else:
            new_messages.append(AIMessage(content=m.content))
    return new_messages
# Define the simulated-user node, which produces the user's side of the dialogue
def simulated_user_node(state):
    messages = state["messages"]
    # Swap roles of messages
    new_messages = _swap_roles(messages)
    # Call the simulated user
    response = simulated_user.invoke({"messages": new_messages})
    # This response is an AI message - we need to flip this to be a human message
    return {"messages": [HumanMessage(content=response)]}
# End the simulation once the conversation exceeds six messages, or when the
# simulated user replies "FINISHED".
def should_continue(state):
    messages = state["messages"]
    if len(messages) > 6:
        return "end"
    elif messages[-1].content == "FINISHED":
        return "end"
    else:
        return "continue"
from langgraph.graph import END, StateGraph, START
from langgraph.graph.message import add_messages
from typing import Annotated
from typing_extensions import TypedDict
class State(TypedDict):
    messages: Annotated[list, add_messages]
graph_builder = StateGraph(State)
graph_builder.add_node("user", simulated_user_node)
graph_builder.add_node("chat_bot", chat_bot_node)
# Every response from your chat bot will automatically go to the
# simulated user
graph_builder.add_edge("chat_bot", "user")
graph_builder.add_conditional_edges(
    "user",
    should_continue,
    # If the finish criteria are met, we will stop the simulation,
    # otherwise, the virtual user's message will be sent to your chat bot
    {
        "end": END,
        "continue": "chat_bot",
    },
)
# The input will first go to your chat bot
graph_builder.add_edge(START, "chat_bot")
simulation = graph_builder.compile()
# The main test: stream the simulation and print each turn of the dialogue
for chunk in simulation.stream({"messages": []}):
    # Print out all events aside from the final end chunk
    if END not in chunk:
        print(chunk)
        print("----")
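For intuition, the compiled graph amounts to an alternating loop between the two agents until the stop condition fires. The following dependency-free sketch shows that control flow; the lambda "agents" are illustrative stubs standing in for my_chat_bot and the simulated user, not the real LLM-backed nodes:

```python
# Sketch of the control flow the compiled graph implements:
# chat bot speaks first (START -> chat_bot), then the simulated user,
# repeating until the stop condition fires.
def run_simulation(chat_bot, simulated_user, max_messages=6):
    messages = []
    while True:
        messages.append({"role": "ai", "content": chat_bot(messages)})
        messages.append({"role": "human", "content": simulated_user(messages)})
        if len(messages) > max_messages or messages[-1]["content"] == "FINISHED":
            return messages

# Stub agents for illustration: a canned bot and a user who finishes
# after a couple of turns.
transcript = run_simulation(
    lambda ms: "How can I help you?",
    lambda ms: "FINISHED" if len(ms) >= 3 else "I want a refund",
)
```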
Illustration of the final output:

RA/SD Derivative AI Training Camp. Published by: 稻草人. Please credit the source when reposting: https://www.shxcj.com/archives/9610