Learning Agent - 01
The LLM decides which tool to call: it returns an AIMessage. The tool's output comes back as a ToolMessage.
What the LLM ultimately sees: messages holds every Message (SystemMessage, HumanMessage, AIMessage, ToolMessage). What it really represents is a timeline, and that timeline is what gives the LLM its memory.
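The timeline idea can be sketched without any framework: the history is just a growing list, and every turn appends its new messages. A minimal sketch using plain dicts as stand-ins for LangChain's message classes:

```python
# Minimal sketch of "messages as a timeline", with plain dicts standing in
# for SystemMessage / HumanMessage / AIMessage / ToolMessage.
history = [
    {"role": "system", "content": "You are a CTF Reverse Bot."},
    {"role": "human", "content": "Please analyze the file opened in ida"},
]

# Turn 1: the model answers with a tool call, then the tool result arrives.
history.append({"role": "ai", "content": "", "tool_calls": [{"name": "entrypoints"}]})
history.append({"role": "tool", "content": "(entrypoints json)"})

# Turn 2: the user asks again; the model "remembers" only because the whole
# timeline is sent back to it on every call.
history.append({"role": "human", "content": "Now decompile main"})

roles = [m["role"] for m in history]
print(roles)  # ['system', 'human', 'ai', 'tool', 'human']
```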
Tracing the events in the streaming output:
====================
[SystemMessage(content="You are a CTF Reverse Bot running with the ReAct pattern. Your general task is ananlyze the binary and find the flag. If needed, write the solve script in python. Always produce a visible trace for debugging: use 'Thought:' for reasoning, 'Observation:' after tools, and finish with 'Final Answer:'. instead of guessing. Tools are invoked externally, so only name the tool when you call it.", additional_kwargs={}, response_metadata={}), HumanMessage(content='Please analyze the file opened in ida', additional_kwargs={}, response_metadata={})]
============ AI ===========
content='' additional_kwargs={} response_metadata={'model_provider': 'openai'} id='lc_run--019b35c0-fdb3-7562-9aff-d84a77e23dd6' tool_calls=[{'name': 'ida-pro-mcp_entrypoints', 'args': {}, 'id': 'call_K7Hlk4HsbLeVmG6gfT5Uz8iM', 'type': 'tool_call'}] tool_call_chunks=[{'name': 'ida-pro-mcp_entrypoints', 'args': '', 'id': 'call_K7Hlk4HsbLeVmG6gfT5Uz8iM', 'index': 0, 'type': 'tool_call_chunk'}]
============ AI ===========
content='' additional_kwargs={} response_metadata={'model_provider': 'openai'} id='lc_run--019b35c0-fdb3-7562-9aff-d84a77e23dd6' tool_calls=[{'name': '', 'args': {}, 'id': None, 'type': 'tool_call'}] tool_call_chunks=[{'name': None, 'args': '{}', 'id': None, 'index': 0, 'type': 'tool_call_chunk'}]
============ AI ===========
content='' additional_kwargs={} response_metadata={'finish_reason': 'tool_calls', 'model_name': 'gpt-5.1-2025-11-13', 'service_tier': 'default', 'model_provider': 'openai'} id='lc_run--019b35c0-fdb3-7562-9aff-d84a77e23dd6' chunk_position='last'
============ AI ===========
content='' additional_kwargs={} response_metadata={} id='lc_run--019b35c0-fdb3-7562-9aff-d84a77e23dd6' usage_metadata={'input_tokens': 2763, 'output_tokens': 24, 'total_tokens': 2787, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}}
============ AI ===========
content='' additional_kwargs={} response_metadata={} id='lc_run--019b35c0-fdb3-7562-9aff-d84a77e23dd6' chunk_position='last'
============ Tool ===========
content='[\n {\n "addr": "0x1234",\n "name": ".term_proc",\n "size": "0xd"\n },\n {\n "addr": "0x1159",\n "name": "valid",\n "size": "0x57"\n },\n {\n "addr": "0x1060",\n "name": "_start",\n "size": "0x26"\n },\n {\n "addr": "0x11b0",\n "name": "main",\n "size": "0x84"\n },\n {\n "addr": "0x1000",\n "name": ".init_proc",\n "size": "0x1b"\n }\n]\n{\n "result": [\n {\n "addr": "0x1234",\n "name": ".term_proc",\n "size": "0xd"\n },\n {\n "addr": "0x1159",\n "name": "valid",\n "size": "0x57"\n },\n {\n "addr": "0x1060",\n "name": "_start",\n "size": "0x26"\n },\n {\n "addr": "0x11b0",\n "name": "main",\n "size": "0x84"\n },\n {\n "addr": "0x1000",\n "name": ".init_proc",\n "size": "0x1b"\n }\n ]\n}' name='ida-pro-mcp_entrypoints' id='a94ecd5d-c1c0-47f6-bf6d-f67ff4ee502e' tool_call_id='call_K7Hlk4HsbLeVmG6gfT5Uz8iM'
TODO:
- New question: with one-shot output, the tool_node could decide from keywords whether a tool needs to be invoked; after switching to streaming, the output becomes many fragments, so how is that decision made now? Answer: between nodes, invoke is used internally and runs synchronously, guaranteeing each message is complete before it is passed on (perhaps signaled by an END marker?). The chunk fragments on the outside are only there for humans to watch. But since we control the history message list explicitly, outside of LangGraph, we don't get that automatic synchronization for free and have to merge the chunks manually ourselves.
- Is there a simpler way to merge chunks?
Question: how do we merge chunks, and how do we know which ones belong to the same group? Grouping by metadata feels fragile. But on reflection, a feature this important must be implemented upstream. Looking closely at the chunks, chunks in the same group share the same id. There is also a chunk_position field, which becomes "last" on the final chunk (note: apparently only AIMessage chunks carry chunk_position), so ToolMessages never need to be stitched together at all.
Understand the relationship between BaseMessageChunk, AIMessageChunk, and ToolMessageChunk: the first is the base class of the other two, so the natural way to check the type of the current message is isinstance (important!).
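The merge strategy above can be simulated with toy stand-in classes (not the real LangChain ones): buffer AIMessageChunks with + until chunk_position == "last", take tool messages whole, and dispatch on isinstance. The real AIMessageChunk also overloads + to concatenate, though its merge logic covers more fields than this sketch:

```python
from dataclasses import dataclass
from typing import Optional

# Toy stand-ins for LangChain's chunk classes, just to show the pattern.
@dataclass
class BaseMessageChunk:
    content: str = ""

@dataclass
class AIMessageChunk(BaseMessageChunk):
    chunk_position: Optional[str] = None

    def __add__(self, other: "AIMessageChunk") -> "AIMessageChunk":
        # Concatenate content; the later chunk's position wins.
        return AIMessageChunk(
            content=self.content + other.content,
            chunk_position=other.chunk_position,
        )

@dataclass
class ToolMessageChunk(BaseMessageChunk):
    pass

stream = [
    AIMessageChunk("Thought: "),
    AIMessageChunk("call entrypoints"),
    AIMessageChunk("", chunk_position="last"),
    ToolMessageChunk('[{"name": "main"}]'),
]

buffer = None
completed = []
for chunk in stream:
    if isinstance(chunk, ToolMessageChunk):
        completed.append(chunk)  # tool output arrives whole, no merging
    elif isinstance(chunk, AIMessageChunk):
        buffer = chunk if buffer is None else buffer + chunk
        if chunk.chunk_position == "last":
            completed.append(buffer)
            buffer = None

print(completed[0].content)  # Thought: call entrypoints
```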
If, from the following loop:
async for message, _metadata in graph.astream(
{"messages": history},
stream_mode="messages",
):
if message is None:
continue
if isinstance(message, ToolMessage):
# print('============ Tool ===========')
# print(message)
print_tool_message(message)
            # NOTE: tool output still arrives in one shot; convert it to a message and append directly
next_history.append(message_chunk_to_message(message))
# history.append(message) BUG: DON'T append chunk to message list
elif isinstance(message, AIMessageChunk):
# NOTE: AI may stream multiple chunks; buffer until the final chunk.
# Ignore pure usage/metadata chunks that arrive without any prior content.
has_substance = bool(message.content) or bool(getattr(message, "tool_call_chunks", None))
if msg_buffer is None and not has_substance:
continue
msg_buffer = message if msg_buffer is None else (msg_buffer + message)
if message.chunk_position == "last":
# Convert the buffered chunks into a full message for history.
next_history.append(message_chunk_to_message(msg_buffer))
msg_buffer = None
# TODO: Neat
if message.content:
chunk_text = format_chunk_content(message.content)
assembled_reply += chunk_text
print(chunk_text, end="", flush=True)
else:
print('=========== XX ===========')
if message.content:
chunk_text = format_chunk_content(message.content)
assembled_reply += chunk_text
print(chunk_text, end="", flush=True)
# Append only complete messages (e.g., AIMessage with tool_calls)
next_history.append(message_chunk_to_message(message))
the snippet
has_substance = bool(message.content) or bool(getattr(message, "tool_call_chunks", None))
if msg_buffer is None and not has_substance:
continue
is deleted, it causes a crash: after the LLM finishes one tool call, the next round of questioning fails (GPT's explanation: the empty chunks break the pairing between the assistant's tool-call message and the tool-result message, so they must be discarded).
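The pairing constraint GPT refers to can be sketched as a check: in an OpenAI-style history, an assistant message that carries tool calls must be immediately followed by the matching tool results. A hypothetical validator (the function name and dict layout are mine, not a real API) shows how an empty assistant message wedged into the history breaks the pairing:

```python
# Hypothetical checker for the assistant-tool-call / tool-result pairing.
# Assistant messages: {"role": "assistant", "tool_calls": [{"id": ...}]}
# Tool results:       {"role": "tool", "tool_call_id": ...}
def pairing_ok(history):
    for i, msg in enumerate(history):
        if msg.get("role") == "assistant" and msg.get("tool_calls"):
            ids = {c["id"] for c in msg["tool_calls"]}
            follow = history[i + 1 : i + 1 + len(ids)]
            got = {m.get("tool_call_id") for m in follow if m.get("role") == "tool"}
            if got != ids:
                return False
    return True

good = [
    {"role": "assistant", "tool_calls": [{"id": "call_1"}]},
    {"role": "tool", "tool_call_id": "call_1"},
]
# Appending an empty chunk as its own assistant message wedges it between
# the tool call and its result, which is exactly the broken structure:
bad = [
    {"role": "assistant", "tool_calls": [{"id": "call_1"}]},
    {"role": "assistant", "content": ""},
    {"role": "tool", "tool_call_id": "call_1"},
]
print(pairing_ok(good), pairing_ok(bad))  # True False
```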
TODO
This post is licensed under CC BY 4.0 by the author.