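An In-Depth Analysis of nanobot's Session and Memory System

Overview

nanobot's session and memory system is a lightweight dual-store architecture that gives the AI assistant persistent, context-aware state. It consists of two core modules:

Session manager - manages conversation history and isolates sessions across channels
Memory system - persists important information, split into long-term memory and diary-style daily notes

Design Goals

Lightweight - plain-text formats (JSONL, Markdown); no database required
Channel isolation - sessions from different chat channels are fully independent
Persistence - all data is written to disk and restored after a restart
Readability - human-readable formats that are easy to debug and review by hand
In-memory cache - hot data is cached in memory to reduce disk reads

Session Manager

Class Structure

Session Data Structure

The Session class is defined in /home/sujie/dev/github/nanobot/nanobot/session/manager.py:14-59: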

@dataclass
class Session:
    key: str  # channel:chat_id
    messages: list[dict[str, Any]] = field(default_factory=list)
    created_at: datetime = field(default_factory=datetime.now)
    updated_at: datetime = field(default_factory=datetime.now)
    metadata: dict[str, Any] = field(default_factory=dict)

Message Storage Format

Each message contains the following fields (session/manager.py:28-37):

{
    "role": "user" | "assistant",
    "content": "message content",
    "timestamp": "2024-01-15T10:30:00.123456",
    # optional extra fields are injected via **kwargs
}

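Key Methods

add_message (line 28-37): appends a message to the session and updates updated_at
get_history (line 39-53): returns the message history in LLM format, with truncation
clear (line 55-58): removes all messages from the session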
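The methods above can be sketched as follows. This is a minimal illustration, not the code in manager.py: the class name SessionSketch is hypothetical, and the bodies are assumptions inferred from the field list and the method descriptions.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any


@dataclass
class SessionSketch:
    """Hypothetical stand-in for Session, reduced to the parts discussed here."""
    key: str
    messages: list = field(default_factory=list)
    updated_at: datetime = field(default_factory=datetime.now)

    def add_message(self, role: str, content: str, **kwargs: Any) -> None:
        # Assumed shape: role, content, an ISO timestamp, plus any extra fields.
        msg = {
            "role": role,
            "content": content,
            "timestamp": datetime.now().isoformat(),
            **kwargs,
        }
        self.messages.append(msg)
        self.updated_at = datetime.now()

    def clear(self) -> None:
        # Assumed behavior: drop all stored messages.
        self.messages = []


session = SessionSketch(key="cli:direct")
session.add_message("user", "Hello", source="demo")  # "source" is an arbitrary extra field
first = session.messages[0]
```

Extra keyword arguments end up as additional keys on the stored message, which is how optional fields can travel alongside role, content and timestamp.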

SessionManager Responsibilities

Initialization (line 68-71)

def __init__(self, workspace: Path):
    self.workspace = workspace
    self.sessions_dir = ensure_dir(Path.home() / ".nanobot" / "sessions")
    self._cache: dict[str, Session] = {}

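Key points:

Sessions are stored under ~/.nanobot/sessions/, independent of the workspace argument
The _cache dictionary acts as an in-memory cache and reduces disk I/O

Session Retrieval Flow

Core Method Implementations

1. get_or_create (line 78-98) - get or create a session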

def get_or_create(self, key: str) -> Session:
    # Check the cache first
    if key in self._cache:
        return self._cache[key]

    # Otherwise try to load from disk
    session = self._load(key)
    if session is None:
        session = Session(key=key)

    self._cache[key] = session
    return session

2. _load (line 100-134) - load a session from disk

def _load(self, key: str) -> Session | None:
    path = self._get_session_path(key)

    if not path.exists():
        return None

    try:
        messages = []
        metadata = {}
        created_at = None

        with open(path) as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue

                data = json.loads(line)

                if data.get("_type") == "metadata":
                    metadata = data.get("metadata", {})
                    created_at = datetime.fromisoformat(data["created_at"])
                else:
                    messages.append(data)

        return Session(
            key=key,
            messages=messages,
            created_at=created_at or datetime.now(),
            metadata=metadata
        )
    except Exception as e:
        logger.warning(f"Failed to load session {key}: {e}")
        return None

3. save (line 136-154) - save a session to disk

def save(self, session: Session) -> None:
    path = self._get_session_path(session.key)

    with open(path, "w") as f:
        # Write the metadata line first
        metadata_line = {
            "_type": "metadata",
            "created_at": session.created_at.isoformat(),
            "updated_at": session.updated_at.isoformat(),
            "metadata": session.metadata
        }
        f.write(json.dumps(metadata_line) + "\n")

        # Then write every message
        for msg in session.messages:
            f.write(json.dumps(msg) + "\n")

    self._cache[session.key] = session

Session History Truncation Strategy

The get_history method (line 39-53) implements the truncation logic:

def get_history(self, max_messages: int = 50) -> list[dict[str, Any]]:
    # Take the most recent max_messages entries
    recent = self.messages[-max_messages:] if len(self.messages) > max_messages else self.messages

    # Convert to LLM format (keep only role and content)
    return [{"role": m["role"], "content": m["content"]} for m in recent]

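Design highlights:

The default cap of 50 messages bounds how much history, and therefore roughly how many tokens, is sent to the model
The conversion keeps only the role and content fields, which is what LLM chat APIs expect
The stored messages are left untouched, so timestamp and any other metadata remain available for persistence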
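To make the truncation concrete, the sketch below applies the same slicing and projection to 60 stored messages. The standalone function and the sample data are illustrative assumptions; only the two statements inside the function mirror the method quoted above.

```python
from typing import Any


def get_history(messages: list, max_messages: int = 50) -> list:
    # Same logic as the quoted method, written as a free function for the demo.
    recent = messages[-max_messages:] if len(messages) > max_messages else messages
    return [{"role": m["role"], "content": m["content"]} for m in recent]


# 60 stored messages, each carrying a timestamp that the LLM never sees.
stored: list = [
    {
        "role": "user" if i % 2 == 0 else "assistant",
        "content": f"msg {i}",
        "timestamp": "2024-01-15T10:30:00",
    }
    for i in range(60)
]

history = get_history(stored)
```

The returned list holds the last 50 entries (starting at "msg 10") without timestamps, while stored still contains all 60 original messages.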

Memory System Architecture

Class Structure

MemoryStore is defined in /home/sujie/dev/github/nanobot/nanobot/agent/memory.py:9-110.

Storage Layout

Long-term Memory (MEMORY.md)

A sample file is at /home/sujie/dev/github/nanobot/workspace/memory/MEMORY.md:1-24:

# Long-term Memory

This file stores important information that should persist across sessions.

## User Information
(Important facts about the user)

## Preferences
(User preferences learned over time)

## Project Context
(Information about ongoing projects)

## Important Notes
(Things to remember)

---

*This file is automatically updated by nanobot when important information should be remembered.*

Diary-style Memory (YYYY-MM-DD.md)

Daily note files, one per date:

File path: workspace/memory/2024-01-15.md
A date heading is added automatically: # 2024-01-15

Memory Write Strategy

1. append_today (line 32-44) - append to today's notes

def append_today(self, content: str) -> None:
    today_file = self.get_today_file()

    if today_file.exists():
        existing = today_file.read_text(encoding="utf-8")
        content = existing + "\n" + content
    else:
        # Add a date heading when creating a new file
        header = f"# {today_date()}\n\n"
        content = header + content

    today_file.write_text(content, encoding="utf-8")

2. write_long_term (line 52-54) - overwrite long-term memory

def write_long_term(self, content: str) -> None:
    self.memory_file.write_text(content, encoding="utf-8")

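Design difference:

append_today appends, which suits recording scattered notes as they occur
write_long_term overwrites, which suits maintaining structured knowledge as a whole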
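The difference between the two write modes can be seen in a small experiment. The helper names append_note and overwrite are hypothetical; they imitate the two methods above against a temporary directory rather than the real workspace.

```python
import tempfile
from pathlib import Path


def append_note(path: Path, content: str, title: str) -> None:
    # Append mode: keep existing text, or start a new file with a date heading.
    if path.exists():
        content = path.read_text(encoding="utf-8") + "\n" + content
    else:
        content = f"# {title}\n\n" + content
    path.write_text(content, encoding="utf-8")


def overwrite(path: Path, content: str) -> None:
    # Overwrite mode: the new content replaces whatever was there.
    path.write_text(content, encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    daily = Path(tmp) / "2024-01-15.md"
    append_note(daily, "- first note", "2024-01-15")
    append_note(daily, "- second note", "2024-01-15")
    daily_text = daily.read_text(encoding="utf-8")

    long_term = Path(tmp) / "MEMORY.md"
    overwrite(long_term, "## Preferences\nv1")
    overwrite(long_term, "## Preferences\nv2")
    long_term_text = long_term.read_text(encoding="utf-8")
```

After both calls the daily file contains the heading and both notes, whereas MEMORY.md contains only the second version. A caller that wants to extend long-term memory therefore has to read it first and write back the merged text.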

Persistence Design

JSONL Format in Detail

Session files use the JSONL (JSON Lines) format, one JSON object per line:

~/.nanobot/sessions/telegram_123456789.jsonl

File structure:

{"_type":"metadata","created_at":"2024-01-15T10:00:00","updated_at":"2024-01-15T11:30:00","metadata":{}}
{"role":"user","content":"Hello","timestamp":"2024-01-15T10:00:00"}
{"role":"assistant","content":"Hi there!","timestamp":"2024-01-15T10:00:01"}
{"role":"user","content":"How are you?","timestamp":"2024-01-15T11:30:00"}

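Advantages of JSONL

Stream processing - the file can be read line by line instead of being parsed as one document
Easy to append - the format allows a new message to be added as one line at the end of the file (the save method shown above currently rewrites the whole file instead)
Readable - people can inspect and edit it easily
Version-control friendly - git diff shows changes line by line
Fault tolerant - a corrupted line does not invalidate the other lines of the file (the _load method shown above does not exploit this: any parse error makes it return None for the whole session)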
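The fault-tolerance property depends on the reader handling errors per line. The sketch below shows one way to do that; parse_jsonl_lenient is a hypothetical helper, not part of nanobot, and the sample data is made up.

```python
import json


def parse_jsonl_lenient(text: str) -> tuple:
    """Parse JSONL text, skipping lines that are not valid JSON.

    Returns (records, skipped_count).
    """
    records = []
    skipped = 0
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            skipped += 1  # one bad line does not discard the rest
    return records, skipped


sample = "\n".join([
    '{"_type": "metadata", "created_at": "2024-01-15T10:00:00", "metadata": {}}',
    '{"role": "user", "content": "Hello"}',
    '{"role": "assistant", "content": "Hi th',  # truncated, e.g. by a crash mid-write
    '{"role": "user", "content": "How are you?"}',
])

records, skipped = parse_jsonl_lenient(sample)
```

With per-line error handling the three intact records survive and only the truncated line is dropped. Whether silently skipping lines is acceptable is a design decision; logging each skipped line would be a reasonable addition.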

Metadata Line Design

The first line is the metadata line, marked with _type: "metadata":

{
    "_type": "metadata",
    "created_at": "2024-01-15T10:00:00",
    "updated_at": "2024-01-15T11:30:00",
    "metadata": {}
}

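Advantages:

Basic session information can be read quickly, without parsing every message
Room for future extension (more metadata fields can be added)
Convenient for listing sessions (the list_sessions method only needs to read the first line)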
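Reading only the first line might look like the sketch below. The function read_session_metadata is a hypothetical illustration of the idea; it is not the body of list_sessions, which is not shown in this document.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def read_session_metadata(path: Path) -> Optional[dict]:
    """Return the metadata record from the first line, or None if absent."""
    with open(path, encoding="utf-8") as f:
        first = f.readline().strip()  # the message lines are never read
    if not first:
        return None
    try:
        data = json.loads(first)
    except json.JSONDecodeError:
        return None
    if isinstance(data, dict) and data.get("_type") == "metadata":
        return data
    return None


with tempfile.TemporaryDirectory() as tmp:
    session_file = Path(tmp) / "telegram_123456789.jsonl"
    session_file.write_text(
        '{"_type":"metadata","created_at":"2024-01-15T10:00:00",'
        '"updated_at":"2024-01-15T11:30:00","metadata":{}}\n'
        '{"role":"user","content":"Hello","timestamp":"2024-01-15T10:00:00"}\n',
        encoding="utf-8",
    )
    meta = read_session_metadata(session_file)
```

The cost of the call is independent of how many messages the session contains, which matters when listing many sessions.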

File Path Generation and Sanitization

_get_session_path (line 73-76):

def _get_session_path(self, key: str) -> Path:
    safe_key = safe_filename(key.replace(":", "_"))
    return self.sessions_dir / f"{safe_key}.jsonl"

safe_filename (utils/helpers.py:69-75):

def safe_filename(name: str) -> str:
    """Convert a string to a safe filename."""
    # Replace unsafe characters
    unsafe = '<>:"/\\|?*'
    for char in unsafe:
        name = name.replace(char, "_")
    return name.strip()

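Sanitization steps:

Replace : with _ (avoids path problems, since a colon is not a valid filename character on some file systems)
Replace the other characters that file systems do not allow
Strip leading and trailing whitespace

Example conversions: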

telegram:123456789 → telegram_123456789.jsonl
cli:direct → cli_direct.jsonl
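The conversions above can be reproduced with the two functions quoted earlier. In the sketch, safe_filename is copied from the listing, while session_path is a free-function variant of _get_session_path that takes the directory as a parameter for the sake of the demo.

```python
from pathlib import Path


def safe_filename(name: str) -> str:
    """Convert a string to a safe filename (copied from the listing above)."""
    unsafe = '<>:"/\\|?*'
    for char in unsafe:
        name = name.replace(char, "_")
    return name.strip()


def session_path(sessions_dir: Path, key: str) -> Path:
    # Free-function variant of _get_session_path, for illustration only.
    safe_key = safe_filename(key.replace(":", "_"))
    return sessions_dir / f"{safe_key}.jsonl"


base = Path("sessions")
telegram_file = session_path(base, "telegram:123456789").name
group_file = session_path(base, "telegram:-1001234567890").name
cli_file = session_path(base, "cli:direct").name

# Two different keys that sanitize to the same filename.
collision_a = session_path(base, "a:b_c").name
collision_b = session_path(base, "a_b:c").name
```

Because every unsafe character maps to the same underscore, the mapping is not injective: the last two keys both yield a_b_c.jsonl. With channel names that contain no underscores, as in the examples here, this does not arise in practice.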

Using the Session Context

session_key Generation Rule

Defined in bus/events.py:20-23:

@property
def session_key(self) -> str:
    """Unique key for session identification."""
    return f"{self.channel}:{self.chat_id}"

Examples:

Telegram private chat: telegram:123456789
Telegram group: telegram:-1001234567890
CLI: cli:direct

Session Flow in the Agent Loop

Implementation

AgentLoop._process_message (agent/loop.py:123-216):

async def _process_message(self, msg: InboundMessage) -> OutboundMessage | None:
    # Get or create the session
    session = self.sessions.get_or_create(msg.session_key)

    # Update the tool context
    message_tool = self.tools.get("message")
    if isinstance(message_tool, MessageTool):
        message_tool.set_context(msg.channel, msg.chat_id)

    # Build the initial messages
    messages = self.context.build_messages(
        history=session.get_history(),  # history comes from the session
        current_message=msg.content,
        media=msg.media if msg.media else None,
    )

    # Agent loop processing...
    # ...

    # Save the exchange to the session
    session.add_message("user", msg.content)
    session.add_message("assistant", final_content)
    self.sessions.save(session)

    return OutboundMessage(
        channel=msg.channel,
        chat_id=msg.chat_id,
        content=final_content
    )

Cross-channel Session Isolation

Isolation is achieved through session_key = "{channel}:{chat_id}":

Channel	chat_id	session_key	Session file
Telegram	123456789	`telegram:123456789`	`telegram_123456789.jsonl`
WhatsApp	5511999888777	`whatsapp:5511999888777`	`whatsapp_5511999888777.jsonl`
CLI	direct	`cli:direct`	`cli_direct.jsonl`

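Effects of isolation:

Conversations by the same user on different channels are independent of each other
Each channel keeps its own session history
Activity on one channel does not affect the context of another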
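A small in-memory model shows why keying by channel and chat_id is enough for isolation. ChatSession and InMemorySessions are hypothetical stand-ins that keep only the cache lookup and skip disk persistence entirely.

```python
from dataclasses import dataclass, field


@dataclass
class ChatSession:
    """Hypothetical stand-in for Session; holds only what the demo needs."""
    key: str
    messages: list = field(default_factory=list)


class InMemorySessions:
    """Hypothetical manager that keeps sessions in a dict, with no disk I/O."""

    def __init__(self) -> None:
        self._cache: dict = {}

    def get_or_create(self, channel: str, chat_id: str) -> ChatSession:
        key = f"{channel}:{chat_id}"  # same rule as session_key
        if key not in self._cache:
            self._cache[key] = ChatSession(key=key)
        return self._cache[key]


sessions = InMemorySessions()
telegram = sessions.get_or_create("telegram", "123456789")
whatsapp = sessions.get_or_create("whatsapp", "5511999888777")

telegram.messages.append({"role": "user", "content": "Hello from Telegram"})

# Asking again for the same channel and chat returns the same object.
same_telegram = sessions.get_or_create("telegram", "123456789")
```

The message added to the Telegram session is visible through same_telegram but not through whatsapp, since the two keys never coincide.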

Role of Session History in LLM Calls

ContextBuilder.build_messages (agent/context.py:115-147):

def build_messages(
    self,
    history: list[dict[str, Any]],  # from session.get_history()
    current_message: str,
    skill_names: list[str] | None = None,
    media: list[str] | None = None,
) -> list[dict[str, Any]]:
    messages = []

    # 1. System prompt (includes memory)
    system_prompt = self.build_system_prompt(skill_names)
    messages.append({"role": "system", "content": system_prompt})

    # 2. Conversation history
    messages.extend(history)

    # 3. Current user message
    user_content = self._build_user_content(current_message, media)
    messages.append({"role": "user", "content": user_content})

    return messages

Resulting message structure:

[
  {
    "role": "system",
    "content": "# nanobot 🐈\n\n# Memory\n\n## Long-term Memory\n..."
  },
  {
    "role": "user",
    "content": "What's my name?"
  },
  {
    "role": "assistant",
    "content": "I don't have your name recorded."
  },
  {
    "role": "user",
    "content": "My name is Alice."
  }
]
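The ordering in this structure (system prompt, then history, then the current message) can be reproduced with a simplified function. assemble_messages is a hypothetical reduction of build_messages: it takes the system prompt as a plain string and ignores skills and media.

```python
from typing import Any


def assemble_messages(
    system_prompt: str,
    history: list,
    current_message: str,
) -> list:
    # 1. system prompt, 2. prior turns, 3. the new user message
    messages: list = [{"role": "system", "content": system_prompt}]
    messages.extend(history)
    messages.append({"role": "user", "content": current_message})
    return messages


history: list = [
    {"role": "user", "content": "What's my name?"},
    {"role": "assistant", "content": "I don't have your name recorded."},
]

result = assemble_messages(
    "# nanobot\n\n# Memory\n\n## Long-term Memory\n...",
    history,
    "My name is Alice.",
)
roles = [m["role"] for m in result]
```

The history list passed in is not modified, so the current message only becomes part of the session once the agent loop calls add_message and save.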

Memory Integration

Where Memory Sits in the System Prompt

ContextBuilder.build_system_prompt (agent/context.py:27-70):

def build_system_prompt(self, skill_names: list[str] | None = None) -> str:
    parts = []

    # 1. Core identity
    parts.append(self._get_identity())

    # 2. Bootstrap files (AGENTS.md, SOUL.md, USER.md, etc.)
    bootstrap = self._load_bootstrap_files()
    if bootstrap:
        parts.append(bootstrap)

    # 3. Memory context
    memory = self.memory.get_memory_context()  # ← memory is integrated here
    if memory:
        parts.append(f"# Memory\n\n{memory}")

    # 4. Skills (always-loaded and available)
    # ...

    return "\n\n---\n\n".join(parts)

How Long-term Memory and Today's Notes Are Assembled

MemoryStore.get_memory_context (agent/memory.py:90-110):

def get_memory_context(self) -> str:
    """
    Get memory context for the agent.

    Returns:
        Formatted memory context including long-term and recent memories.
    """
    parts = []

    # Long-term memory
    long_term = self.read_long_term()
    if long_term:
        parts.append("## Long-term Memory\n" + long_term)

    # Today's notes
    today = self.read_today()
    if today:
        parts.append("## Today's Notes\n" + today)

    return "\n\n".join(parts) if parts else ""

The memory section of the system prompt:

# Memory

## Long-term Memory
# Long-term Memory

This file stores important information that should persist across sessions.

## User Information
(Alice prefers Python over JavaScript)

## Preferences
(Always include code examples in responses)

---

## Today's Notes
# 2024-01-15

- Discussed Python vs JavaScript with Alice
- She is working on a web scraper project

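Memory Retrieval Strategy

The current implementation loads memory content in full:

Long-term memory - MEMORY.md is loaded in full on every call
Today's notes - today's YYYY-MM-DD.md is loaded on every call

Characteristics:

Simple and direct; no index is needed
Suited to small memory files (<10KB)
May need optimization as memory grows (for example, vector retrieval)

Possible optimizations:

Use get_recent_memories(days=7) in place of full loading
Add semantic search (embedding plus similarity retrieval)
Grade entries by importance (must remember vs. optional)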
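As an illustration of the first option, a recent-days loader could look like the sketch below. The body of get_recent_memories is not shown in this document, so recent_memories here is a hypothetical reconstruction: the newest-first order, the separator, and the explicit today parameter (added to keep the demo deterministic) are all assumptions.

```python
import tempfile
from datetime import date, timedelta
from pathlib import Path


def recent_memories(memory_dir: Path, days: int, today: date) -> str:
    """Concatenate the daily note files of the last `days` days, newest first."""
    parts = []
    for offset in range(days):
        day = today - timedelta(days=offset)
        note_file = memory_dir / f"{day.isoformat()}.md"  # YYYY-MM-DD.md
        if note_file.exists():
            parts.append(note_file.read_text(encoding="utf-8"))
    return "\n\n---\n\n".join(parts)


with tempfile.TemporaryDirectory() as tmp:
    memory_dir = Path(tmp)
    (memory_dir / "2024-01-15.md").write_text("# 2024-01-15\n\n- note A", encoding="utf-8")
    (memory_dir / "2024-01-13.md").write_text("# 2024-01-13\n\n- note B", encoding="utf-8")
    (memory_dir / "2024-01-01.md").write_text("# 2024-01-01\n\n- old note", encoding="utf-8")

    recent = recent_memories(memory_dir, days=7, today=date(2024, 1, 15))
```

With a seven-day window ending on 2024-01-15, the notes from the 15th and the 13th are included and the one from the 1st is not. The amount of text injected into the system prompt then grows with the window size rather than with the total history.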

Key Code Location Index

Module	File path	Key class/method	Lines
Session management
Session class	`nanobot/session/manager.py`	`Session`	14-59
SessionManager class	`nanobot/session/manager.py`	`SessionManager`	61-203
Get or create session	`nanobot/session/manager.py`	`get_or_create`	78-98
Load session	`nanobot/session/manager.py`	`_load`	100-134
Save session	`nanobot/session/manager.py`	`save`	136-154
Get history	`nanobot/session/manager.py`	`get_history`	39-53
Memory system
MemoryStore class	`nanobot/agent/memory.py`	`MemoryStore`	9-110
Read today's notes	`nanobot/agent/memory.py`	`read_today`	25-30
Append to today's notes	`nanobot/agent/memory.py`	`append_today`	32-44
Read long-term memory	`nanobot/agent/memory.py`	`read_long_term`	46-50
Write long-term memory	`nanobot/agent/memory.py`	`write_long_term`	52-54
Get recent memories	`nanobot/agent/memory.py`	`get_recent_memories`	56-80
Get memory context	`nanobot/agent/memory.py`	`get_memory_context`	90-110
Context building
ContextBuilder class	`nanobot/agent/context.py`	`ContextBuilder`	12-218
Build system prompt	`nanobot/agent/context.py`	`build_system_prompt`	27-70
Build message list	`nanobot/agent/context.py`	`build_messages`	115-147
Agent Loop
AgentLoop class	`nanobot/agent/loop.py`	`AgentLoop`	24-330
Process message	`nanobot/agent/loop.py`	`_process_message`	123-216
Event system
InboundMessage	`nanobot/bus/events.py`	`session_key` property	20-23
Utility functions
Safe filename	`nanobot/utils/helpers.py`	`safe_filename`	69-75
Today's date	`nanobot/utils/helpers.py`	`today_date`	52-54