Wegent CRD Domain Model Analysis

Executive Summary

The Wegent platform implements a Kubernetes-style CRD (Custom Resource Definition) architecture for managing AI agents. The domain model separates concerns into distinct resource types with clear relationships and uses a dual-table storage strategy for performance: Task and Workspace resources live in a dedicated tasks table, while all other CRD types share the kinds table.


1. Core Database Models

1.1 Kind Model (Base CRD Class)

Location: shared/models/db/kind.py

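from datetime import datetime

from sqlalchemy import JSON, Boolean, Column, DateTime, Integer, String

# Base is the project's SQLAlchemy declarative base (defined elsewhere).

class Kind(Base):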
    """Unified Kind model for all Kubernetes-style resources."""
    
    __tablename__ = "kinds"
    
    id = Column(Integer, primary_key=True, index=True)
    user_id = Column(Integer, nullable=False)
    kind = Column(String(50), nullable=False, index=True)
    name = Column(String(100), nullable=False)
    namespace = Column(String(100), nullable=False, default="default")
    json = Column(JSON, nullable=False)
    is_active = Column(Boolean, default=True)
    created_at = Column(DateTime, default=datetime.now)
    updated_at = Column(DateTime, default=datetime.now, onupdate=datetime.now)

Key Attributes:

| Attribute | Type | Description |
| --- | --- | --- |
| id | Integer | Primary key, auto-increment |
| user_id | Integer | Owner reference (not FK for flexibility) |
| kind | String(50) | Resource type discriminator (Ghost, Model, Shell, Bot, Team, Skill, KnowledgeBase, Retriever, Device) |
| name | String(100) | Resource name within namespace |
| namespace | String(100) | Logical grouping (default: "default") |
| json | JSON | Resource-specific spec and status data |
| is_active | Boolean | Soft delete flag |

Design Pattern: Single Table Inheritance (STI) with JSON storage for polymorphic attributes. This provides:

  • Unified querying across all CRD types
  • Schema flexibility via JSON spec storage
  • Namespace-based resource organization
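To make the single-table idea concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than the project's SQLAlchemy models. The table is a simplified, hypothetical version of kinds, and wrapping each spec as {"spec": ...} in the json column is an assumption made for illustration.

```python
import json
import sqlite3

# Simplified, hypothetical version of the kinds table: one row per resource,
# a `kind` discriminator column, and a JSON text column for type-specific data.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE kinds (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        kind TEXT NOT NULL,
        name TEXT NOT NULL,
        namespace TEXT NOT NULL DEFAULT 'default',
        json TEXT NOT NULL,
        is_active INTEGER NOT NULL DEFAULT 1
    )
    """
)


def add_resource(user_id, kind, name, spec, namespace="default"):
    """Insert any CRD type into the same table; only `kind` and `json` vary."""
    conn.execute(
        "INSERT INTO kinds (user_id, kind, name, namespace, json) "
        "VALUES (?, ?, ?, ?, ?)",
        (user_id, kind, name, namespace, json.dumps({"spec": spec})),
    )


add_resource(1, "Ghost", "reviewer", {"systemPrompt": "Review code carefully."})
add_resource(1, "Model", "main-llm", {"modelConfig": {}, "modelType": "llm"})
add_resource(1, "Shell", "claude", {"shellType": "ClaudeCode"})

# Unified querying: one statement spans every CRD type owned by the user.
counts = dict(
    conn.execute(
        "SELECT kind, COUNT(*) FROM kinds "
        "WHERE user_id = ? AND is_active = 1 GROUP BY kind",
        (1,),
    ).fetchall()
)

# Type-specific attributes are decoded from the JSON column on read.
row = conn.execute(
    "SELECT json FROM kinds WHERE kind = ? AND name = ?", ("Ghost", "reviewer")
).fetchone()
ghost_spec = json.loads(row[0])["spec"]
```

The trade-off shows up on the last query: the database can filter on kind and name, but anything inside the spec is opaque to it until the application decodes the JSON.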

1.2 TaskResource Model (Separate Table)

Location: backend/app/models/task.py

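from datetime import datetime

from sqlalchemy import JSON, Boolean, Column, DateTime, Integer, String

# Base is the project's SQLAlchemy declarative base (defined elsewhere).

class TaskResource(Base):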
    """TaskResource model for Task and Workspace resources.
    
    Separated from kinds table to improve query performance for task-related operations.
    """
    
    __tablename__ = "tasks"
    
    id = Column(Integer, primary_key=True, index=True)
    user_id = Column(Integer, nullable=False, default=0, index=True)
    kind = Column(String(50), nullable=False, index=True)  # "Task" or "Workspace"
    name = Column(String(100), nullable=False, default="")
    namespace = Column(String(100), nullable=False, default="default")
    json = Column(JSON, nullable=False)
    is_active = Column(Boolean, nullable=False, default=True)
    created_at = Column(DateTime, nullable=False, default=datetime.now, index=True)
    updated_at = Column(DateTime, nullable=False, default=datetime.now, onupdate=datetime.now)
    project_id = Column(Integer, nullable=False, default=0, index=True)

Rationale for Separation:

  • Task/Workspace resources have high query frequency
  • Tasks have additional indexing requirements (project_id)
  • Isolates high-volume operations from other CRDs
  • Improves cache locality for task-heavy workloads

1.3 SkillBinary Model (Binary Storage)

Location: shared/models/db/skill_binary.py

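from datetime import datetime

from sqlalchemy import Column, DateTime, ForeignKey, Integer, LargeBinary, String

# Base is the project's SQLAlchemy declarative base (defined elsewhere).

class SkillBinary(Base):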
    """Skill binary data storage for ZIP packages."""
    
    __tablename__ = "skill_binaries"
    
    id = Column(Integer, primary_key=True, index=True)
    kind_id = Column(Integer, ForeignKey("kinds.id", ondelete="CASCADE"), nullable=False, unique=True)
    binary_data = Column(LargeBinary, nullable=False)  # ZIP package binary data
    file_size = Column(Integer, nullable=False)  # File size in bytes
    file_hash = Column(String(64), nullable=False)  # SHA256 hash
    created_at = Column(DateTime, default=datetime.now)

Key Features:

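  • One-to-one relationship with Kind (Skill type only), enforced by the unique constraint on kind_id
  • SHA256 hash for integrity verification
  • Cascading delete (ondelete="CASCADE") when the Skill's kinds row is hard-deleted; a soft delete via is_active leaves the binary in place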
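The size and integrity columns can be derived with the standard library alone. A minimal sketch, assuming the hypothetical helpers package_metadata and verify_package and a made-up one-file package; it only shows how file_size and a 64-character hex SHA256 digest (matching String(64)) relate to the stored bytes.

```python
import hashlib
import io
import zipfile


def package_metadata(zip_bytes: bytes) -> dict:
    """Values stored next to the package: size in bytes and SHA256 hex digest."""
    return {
        "file_size": len(zip_bytes),
        "file_hash": hashlib.sha256(zip_bytes).hexdigest(),  # 64 hex characters
    }


def verify_package(zip_bytes: bytes, expected_hash: str) -> bool:
    """Re-hash the stored bytes and compare with the recorded digest."""
    return hashlib.sha256(zip_bytes).hexdigest() == expected_hash


# Build a tiny in-memory ZIP standing in for a real skill package.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("SKILL.md", "description: example skill\n")
data = buf.getvalue()

meta = package_metadata(data)
```

Any change to the stored bytes, even a single appended byte, produces a different digest, so verify_package returns False.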

1.4 Subtask Model (Message/Conversation Storage)

Location: shared/models/db/subtask.py

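from sqlalchemy import JSON, Boolean, Column, DateTime, Integer, String, Text, func
from sqlalchemy import Enum as SQLEnum

# Base, SubtaskRole and SubtaskStatus come from the project (SubtaskStatus is
# defined in shared/models/db/enums.py; the other import paths are not shown).

class Subtask(Base):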
    """Subtask model representing a message in a task conversation."""
    
    __tablename__ = "subtasks"
    
    id = Column(Integer, primary_key=True, index=True)
    user_id = Column(Integer, nullable=False)
    task_id = Column(Integer, nullable=False)  # References TaskResource.id
    team_id = Column(Integer, nullable=False)  # References Kind.id (Team type)
    title = Column(String(256), nullable=False)
    bot_ids = Column(JSON, nullable=False)
    role = Column(SQLEnum(SubtaskRole), nullable=False, default=SubtaskRole.ASSISTANT)
    executor_namespace = Column(String(100))
    executor_name = Column(String(100))
    executor_deleted_at = Column(Boolean, nullable=False, default=False)  # Boolean flag despite the "_at" suffix
    prompt = Column(Text)
    message_id = Column(Integer, nullable=False, default=1)
    parent_id = Column(Integer, nullable=True)  # For threaded conversations
    status = Column(SQLEnum(SubtaskStatus), nullable=False, default=SubtaskStatus.PENDING)
    progress = Column(Integer, nullable=False, default=0)
    result = Column(JSON)
    error_message = Column(Text)
    created_at = Column(DateTime, default=func.now())
    updated_at = Column(DateTime, default=func.now(), onupdate=func.now())
    completed_at = Column(DateTime, nullable=False, default="1970-01-01 00:00:00")  # Epoch placeholder used instead of NULL
    
    # Group chat fields
    sender_type = Column(String(20), nullable=False, default="")
    sender_user_id = Column(Integer, nullable=False, default=0)
    reply_to_subtask_id = Column(Integer, nullable=False, default=0)

1.5 SubtaskContext Model (Attachment & Knowledge Base)

Location: shared/models/db/subtask_context.py

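from sqlalchemy import JSON, Column, DateTime, Integer, String, Text, func

# Base, ContextStatus, BinaryDataType and LongTextType are project-defined;
# their import paths are not shown in this document.

class SubtaskContext(Base):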
    """Subtask context storage for various context types."""
    
    __tablename__ = "subtask_contexts"
    
    id = Column(Integer, primary_key=True, index=True)
    subtask_id = Column(Integer, nullable=False, default=0, index=True)
    user_id = Column(Integer, nullable=False, index=True)
    context_type = Column(String(50), nullable=False, index=True)  # 'attachment', 'knowledge_base', 'table', 'selected_documents'
    name = Column(String(255), nullable=False)
    status = Column(String(20), nullable=False, default=ContextStatus.PENDING.value, index=True)
    error_message = Column(Text, nullable=False, default="")
    binary_data = Column(BinaryDataType, nullable=False, default=b"")  # LONGBLOB for MySQL
    image_base64 = Column(LongTextType, nullable=False, default="")  # For vision models
    extracted_text = Column(LongTextType, nullable=False, default="")
    text_length = Column(Integer, nullable=False, default=0)
    type_data = Column(JSON, nullable=False, default=dict)  # Type-specific metadata
    created_at = Column(DateTime, nullable=False, default=func.now())
    updated_at = Column(DateTime, nullable=False, default=func.now(), onupdate=func.now())

Polymorphic Design: The type_data JSON field stores type-specific attributes:

  • attachment: original_filename, file_extension, file_size, mime_type, storage_backend, storage_key, is_encrypted, encryption_version
  • knowledge_base: knowledge_id, document_count
  • table: url, source_config
  • selected_documents: knowledge_base_id, document_ids[]
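Because type_data has no database-level schema, any per-type checking has to happen in application code. A minimal sketch, assuming a hypothetical missing_type_data_keys helper and treating every key listed above as required, which the model itself does not state.

```python
from typing import Any, Dict, List

# Keys listed above for each context_type. Treating all of them as required
# is an assumption made for this sketch.
TYPE_DATA_KEYS: Dict[str, List[str]] = {
    "attachment": [
        "original_filename", "file_extension", "file_size", "mime_type",
        "storage_backend", "storage_key", "is_encrypted", "encryption_version",
    ],
    "knowledge_base": ["knowledge_id", "document_count"],
    "table": ["url", "source_config"],
    "selected_documents": ["knowledge_base_id", "document_ids"],
}


def missing_type_data_keys(context_type: str, type_data: Dict[str, Any]) -> List[str]:
    """Return the expected type_data keys that are absent for this context_type."""
    if context_type not in TYPE_DATA_KEYS:
        raise ValueError(f"unknown context_type: {context_type!r}")
    return [key for key in TYPE_DATA_KEYS[context_type] if key not in type_data]
```

For example, a knowledge_base context whose type_data holds only knowledge_id would be reported as missing document_count.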

2. CRD Type Hierarchy


3. CRD Resource Relationships

3.1 Resource Composition Hierarchy

3.2 Reference Patterns

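| Reference Type | Pattern | Example |
| --- | --- | --- |
| Composition | Direct nesting in JSON | Bot.spec contains ghostRef, shellRef, modelRef |
| Association | ID reference with loose coupling | Subtask.task_id → TaskResource.id |
| Inheritance | Single table with kind discriminator | All CRDs except Task and Workspace stored in the kinds table (Task and Workspace use the tasks table, which has its own kind column) |
| Binary Attachment | Separate table with FK | SkillBinary.kind_id → Kind.id |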

4. CRD Schemas (Pydantic Models)

4.1 Common Base Schemas

from typing import Dict, Optional

from pydantic import BaseModel

class ObjectMeta(BaseModel):
    """Standard Kubernetes object metadata"""
    name: str
    namespace: str = "default"
    displayName: Optional[str] = None
    labels: Optional[Dict[str, str]] = None

class Status(BaseModel):
    """Standard status object"""
    state: str
    message: Optional[str] = None

4.2 Ghost CRD Schema

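from typing import Any, Dict, List, Optional

from pydantic import BaseModel

# ObjectMeta and Status are the common base schemas from 4.1.

class GhostSpec(BaseModel):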
    systemPrompt: str
    mcpServers: Optional[Dict[str, Any]] = None
    skills: Optional[List[str]] = None  # Skill names list
    preload_skills: Optional[List[str]] = None  # Preloaded skill names

class GhostStatus(Status):
    state: str = "Available"  # Available, Unavailable

class Ghost(BaseModel):
    apiVersion: str = "agent.wecode.io/v1"
    kind: str = "Ghost"
    metadata: ObjectMeta
    spec: GhostSpec
    status: Optional[GhostStatus] = None

4.3 Model CRD Schema (Multi-Type Support)

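from enum import Enum
from typing import Any, Dict, Optional

from pydantic import BaseModel

# ApiFormat, TTSConfig, STTConfig, EmbeddingConfig and RerankConfig are
# project schemas not shown in this document.

class ModelCategoryType(str, Enum):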
    LLM = "llm"
    TTS = "tts"
    STT = "stt"
    EMBEDDING = "embedding"
    RERANK = "rerank"

class ModelSpec(BaseModel):
    modelConfig: Dict[str, Any]
    isCustomConfig: Optional[bool] = None
    protocol: Optional[str] = None  # 'openai', 'claude', etc.
    apiFormat: Optional[ApiFormat] = None  # 'chat/completions' or 'responses'
    contextWindow: Optional[int] = None
    maxOutputTokens: Optional[int] = None
    modelType: Optional[ModelCategoryType] = ModelCategoryType.LLM
    # Type-specific configs
    ttsConfig: Optional[TTSConfig] = None
    sttConfig: Optional[STTConfig] = None
    embeddingConfig: Optional[EmbeddingConfig] = None
    rerankConfig: Optional[RerankConfig] = None
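With several optional config blocks on one spec, a consumer has to pick the block that matches modelType. A minimal sketch, assuming a hypothetical type_specific_config helper that operates on the spec as a plain dict decoded from JSON; the field names come from ModelSpec, while the "dimensions" key in the example is made up.

```python
from typing import Any, Dict, Optional

# ModelSpec field holding the extra config for each modelType value.
# "llm" has no type-specific field; it relies on modelConfig alone.
CONFIG_FIELD_BY_TYPE: Dict[str, Optional[str]] = {
    "llm": None,
    "tts": "ttsConfig",
    "stt": "sttConfig",
    "embedding": "embeddingConfig",
    "rerank": "rerankConfig",
}


def type_specific_config(spec: Dict[str, Any]) -> Optional[Any]:
    """Return the config block matching spec["modelType"], if any."""
    model_type = spec.get("modelType") or "llm"  # ModelSpec defaults to LLM
    if model_type not in CONFIG_FIELD_BY_TYPE:
        raise ValueError(f"unknown modelType: {model_type!r}")
    field = CONFIG_FIELD_BY_TYPE[model_type]
    return spec.get(field) if field else None


example = {
    "modelConfig": {},
    "modelType": "embedding",
    "embeddingConfig": {"dimensions": 1024},  # hypothetical contents
}
embedding_cfg = type_specific_config(example)
```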

4.4 Shell CRD Schema

from typing import List, Optional

from pydantic import BaseModel

class ShellSpec(BaseModel):
    shellType: str  # 'ClaudeCode', 'Agno', 'Dify', 'Chat'
    supportModel: Optional[List[str]] = None
    baseImage: Optional[str] = None  # Custom Docker image
    baseShellRef: Optional[str] = None  # Reference to base public shell
    requiresWorkspace: Optional[bool] = None  # Auto-inferred if None

Shell Types:

| Type | Category | Description |
| --- | --- | --- |
| ClaudeCode | local_engine | Claude Code SDK in Docker |
| Agno | local_engine | Agno framework in Docker |
| Dify | external_api | External Dify API proxy |
| Chat | external_api | Direct LLM API (no Docker) |
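The category column determines whether a Docker container is involved. A minimal sketch of that lookup, assuming a hypothetical needs_container helper; the mapping is copied from the shell type table, and unknown shell types are rejected rather than guessed.

```python
from enum import Enum


class ShellCategory(str, Enum):
    LOCAL_ENGINE = "local_engine"  # executed inside a Docker container
    EXTERNAL_API = "external_api"  # proxies an external service, no Docker


# Mapping taken from the shell type table.
SHELL_CATEGORIES = {
    "ClaudeCode": ShellCategory.LOCAL_ENGINE,
    "Agno": ShellCategory.LOCAL_ENGINE,
    "Dify": ShellCategory.EXTERNAL_API,
    "Chat": ShellCategory.EXTERNAL_API,
}


def needs_container(shell_type: str) -> bool:
    """True if this shellType runs in Docker (local_engine category)."""
    try:
        return SHELL_CATEGORIES[shell_type] is ShellCategory.LOCAL_ENGINE
    except KeyError:
        raise ValueError(f"unknown shellType: {shell_type!r}") from None
```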

4.5 Bot CRD Schema

class GhostRef(BaseModel):
    name: str
    namespace: str = "default"

class ShellRef(BaseModel):
    name: str
    namespace: str = "default"

class ModelRef(BaseModel):
    name: str
    namespace: str = "default"

class BotSpec(BaseModel):
    ghostRef: GhostRef
    shellRef: ShellRef
    modelRef: Optional[ModelRef] = None  # Optional for some shell types

class Bot(BaseModel):
    apiVersion: str = "agent.wecode.io/v1"
    kind: str = "Bot"
    metadata: ObjectMeta
    spec: BotSpec
    status: Optional[BotStatus] = None

4.6 Team CRD Schema

class TeamMember(BaseModel):
    botRef: BotTeamRef
    prompt: Optional[str] = None  # Role-specific prompt
    role: Optional[str] = None  # 'leader', 'member', etc.
    requireConfirmation: Optional[bool] = False  # Pipeline confirmation

class TeamSpec(BaseModel):
    members: List[TeamMember]
    collaborationModel: str  # 'pipeline', 'route', 'coordinate', 'collaborate'
    bind_mode: Optional[List[str]] = None  # ['chat', 'code'] or empty
    description: Optional[str] = None
    icon: Optional[str] = None
    requiresWorkspace: Optional[bool] = None

Collaboration Models:

| Model | Description |
| --- | --- |
| pipeline | Sequential execution with data flow |
| route | Leader assigns to appropriate bot |
| coordinate | Leader coordinates parallel execution |
| collaborate | Free-form multi-bot discussion |
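For the pipeline model, the requireConfirmation flag on TeamMember together with the PENDING_CONFIRMATION task status (section 5.1) suggests a loop that pauses between stages. A minimal sketch under that assumption, with plain callables standing in for bots; run_pipeline, Member and the resume-by-index approach are hypothetical, not the platform's executor.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Member:
    name: str
    run: Callable[[str], str]           # stand-in for executing the bot
    require_confirmation: bool = False  # mirrors TeamMember.requireConfirmation


def run_pipeline(members: List[Member], prompt: str, start: int = 0) -> Tuple[str, str, int]:
    """Run members in order, feeding each output into the next member.

    Returns (status, last_output, next_index) so a paused run can be resumed.
    """
    data = prompt
    for i in range(start, len(members)):
        data = members[i].run(data)
        # Pause only if a later stage remains for the user to approve.
        if members[i].require_confirmation and i + 1 < len(members):
            return "PENDING_CONFIRMATION", data, i + 1
    return "COMPLETED", data, len(members)


team = [
    Member("planner", lambda text: f"plan({text})", require_confirmation=True),
    Member("coder", lambda text: f"code({text})"),
]
```

Calling run_pipeline(team, "task") stops after the planner; once the user confirms, calling it again with the returned output and next_index finishes the coder stage.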

4.7 Task & Workspace CRD Schemas

class Repository(BaseModel):
    gitUrl: str
    gitRepo: str
    gitRepoId: Optional[int] = None
    branchName: str
    gitDomain: str

class WorkspaceSpec(BaseModel):
    repository: Repository

class TaskSpec(BaseModel):
    title: str
    prompt: str
    teamRef: TeamTaskRef
    workspaceRef: WorkspaceTaskRef
    is_group_chat: bool = False
    knowledgeBaseRefs: Optional[List[KnowledgeBaseTaskRef]] = None
    device_id: Optional[str] = None

class TaskStatus(Status):
    state: str = "Available"
    status: str = "PENDING"  # PENDING, RUNNING, COMPLETED, FAILED, CANCELLED, DELETE
    progress: int = 0
    result: Optional[Dict[str, Any]] = None
    errorMessage: Optional[str] = None
    subTasks: Optional[List[Dict[str, Any]]] = None
    app: Optional[TaskApp] = None  # App preview from expose_service

4.8 Skill CRD Schema

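from typing import Any, Dict, List, Optional

from pydantic import BaseModel, Field

class SkillToolDeclaration(BaseModel):
    name: str
    provider: str
    config: Optional[Dict[str, Any]] = None

class SkillProviderConfig(BaseModel):
    module: str = "provider"  # Python module in skill ZIP
    class_name: str = Field(..., alias="class")  # Provider class name; 'class' is the key in input data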

class SkillSpec(BaseModel):
    description: str  # Trigger condition from SKILL.md
    displayName: Optional[str] = None  # Display during tool use
    prompt: Optional[str] = None  # Full prompt from SKILL.md
    version: Optional[str] = None
    author: Optional[str] = None
    tags: Optional[List[str]] = None
    bindShells: Optional[List[str]] = None  # ['ClaudeCode', 'Agno', 'Dify', 'Chat']
    config: Optional[Dict[str, Any]] = None  # Shared skill config
    tools: Optional[List[SkillToolDeclaration]] = None
    provider: Optional[SkillProviderConfig] = None  # Dynamic provider loading
    mcpServers: Optional[Dict[str, Any]] = None
    source: Optional[SkillSource] = None  # Git import tracking

class SkillStatus(Status):
    state: str = "Available"
    fileSize: Optional[int] = None
    fileHash: Optional[str] = None

5. Status State Machines

5.1 Task Status State Machine

TaskStatus Enum Values:

  • PENDING - Waiting for execution
  • RUNNING - Currently executing
  • COMPLETED - Successfully finished
  • FAILED - Execution failed with error
  • CANCELLED - User cancelled
  • CANCELLING - Cancel in progress
  • DELETE - Marked for deletion
  • PENDING_CONFIRMATION - Pipeline stage complete, awaiting user confirmation
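Because the transition diagram is not reproduced here, the following is only an illustrative sketch: the enum values are taken from the list above, but the TRANSITIONS table is inferred from their descriptions and may not match the platform's actual rules (for example, whether a finished task can be resumed by a follow-up message).

```python
from enum import Enum


class TaskStatus(str, Enum):
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"
    CANCELLED = "CANCELLED"
    CANCELLING = "CANCELLING"
    DELETE = "DELETE"
    PENDING_CONFIRMATION = "PENDING_CONFIRMATION"


S = TaskStatus
# Hypothetical allowed transitions, inferred from the value descriptions.
TRANSITIONS = {
    S.PENDING: {S.RUNNING, S.CANCELLING, S.DELETE},
    S.RUNNING: {S.COMPLETED, S.FAILED, S.CANCELLING, S.PENDING_CONFIRMATION},
    S.PENDING_CONFIRMATION: {S.RUNNING, S.CANCELLING},
    S.CANCELLING: {S.CANCELLED, S.FAILED},
    S.COMPLETED: {S.DELETE},
    S.FAILED: {S.DELETE},
    S.CANCELLED: {S.DELETE},
    S.DELETE: set(),
}


def transition(current: TaskStatus, target: TaskStatus) -> TaskStatus:
    """Return `target` if the move is allowed, otherwise raise ValueError."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target
```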

5.2 Subtask Status State Machine

5.3 Context Status State Machine

ContextStatus Values:

  • pending - Initial state
  • uploading - Data being uploaded
  • parsing - Content extraction in progress
  • ready - Available for use
  • failed - Processing failed
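The context status column is a String(20) whose default is ContextStatus.PENDING.value (section 1.5), so the database holds plain lowercase strings. A minimal sketch of converting between the stored value and the enum, assuming ContextStatus is a str-based Enum and a hypothetical is_usable helper.

```python
from enum import Enum


class ContextStatus(str, Enum):
    PENDING = "pending"
    UPLOADING = "uploading"
    PARSING = "parsing"
    READY = "ready"
    FAILED = "failed"


# The status column stores the lowercase string, not the enum member.
stored = ContextStatus.PENDING.value


def load_status(raw: str) -> ContextStatus:
    """Convert a stored string back to the enum; unknown values raise ValueError."""
    return ContextStatus(raw)


def is_usable(status: ContextStatus) -> bool:
    """Only `ready` contexts are available for use, per the list above."""
    return status is ContextStatus.READY
```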

6. Database Schema Mapping

6.1 Table Summary

| Table | Model Class | Stores | Scope |
| --- | --- | --- | --- |
| kinds | Kind | Ghost, Model, Shell, Bot, Team, Skill, KnowledgeBase, Retriever, Device | 9 CRD types |
| tasks | TaskResource | Task, Workspace | 2 CRD types |
| skill_binaries | SkillBinary | Skill ZIP packages | 1-to-1 with Skill kinds |
| subtasks | Subtask | Task messages/conversations | High volume |
| subtask_contexts | SubtaskContext | Attachments, knowledge bases, tables, selected documents | High volume |

6.2 Index Strategy

High-Performance Indexes:

-- kinds table
INDEX (kind, name, namespace)  -- Resource lookup
INDEX (user_id, kind)          -- User resource lists

-- tasks table
UNIQUE INDEX (user_id, kind, name, namespace)  -- Unique constraint
INDEX (project_id)             -- Project grouping
INDEX (created_at)             -- Time-series queries

-- subtasks table
INDEX (task_id, created_at)    -- Task message history
INDEX (user_id, created_at)    -- User activity

-- subtask_contexts table
INDEX (subtask_id)             -- Context retrieval
INDEX (user_id, context_type)  -- User context lists
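Whether a query actually uses these composite indexes can be checked with the query planner. A minimal sketch in SQLite (so plans will differ from the production database), using pared-down tables and hypothetical index names: it checks that the resource lookup uses the (kind, name, namespace) index and that a unique index on tasks rejects a duplicate resource.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE kinds (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        kind TEXT NOT NULL,
        name TEXT NOT NULL,
        namespace TEXT NOT NULL DEFAULT 'default'
    );
    CREATE INDEX idx_kinds_lookup ON kinds (kind, name, namespace);
    CREATE INDEX idx_kinds_user_kind ON kinds (user_id, kind);

    CREATE TABLE tasks (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        kind TEXT NOT NULL,
        name TEXT NOT NULL,
        namespace TEXT NOT NULL DEFAULT 'default'
    );
    CREATE UNIQUE INDEX uq_tasks_resource ON tasks (user_id, kind, name, namespace);
    """
)


def plan(sql, params=()):
    """Return the detail strings SQLite reports for a query plan."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


lookup_plan = plan(
    "SELECT id FROM kinds WHERE kind = ? AND name = ? AND namespace = ?",
    ("Ghost", "reviewer", "default"),
)

# The unique index turns a second identical resource into an IntegrityError.
conn.execute("INSERT INTO tasks (user_id, kind, name) VALUES (1, 'Task', 'task-1')")
try:
    conn.execute("INSERT INTO tasks (user_id, kind, name) VALUES (1, 'Task', 'task-1')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```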

7. Design Patterns Analysis

7.1 Single Table Inheritance (STI)

Application: Kind model stores 9 different CRD types in one table.

Benefits:

  • Unified resource management
  • Simplified querying across types
  • Schema evolution via JSON

Trade-offs:

  • No type-specific constraints at DB level
  • JSON querying limitations
  • Task/Workspace separated for performance

7.2 Reference Pattern (Namespace + Name)

Application: All CRD references use name + namespace tuple.

class GhostRef(BaseModel):
    name: str
    namespace: str = "default"

Benefits:

  • Human-readable references
  • Kubernetes compatibility
  • Easy resource relocation
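Resolving a reference is a lookup on the (kind, name, namespace) triple, the same columns as the kinds lookup index in section 6.2. A minimal sketch against a hypothetical in-memory dictionary; a real resolver would presumably query the kinds table and also check ownership and is_active, which this sketch omits.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Ref:
    """Same shape as GhostRef, ShellRef and ModelRef: a name plus a namespace."""
    name: str
    namespace: str = "default"


# Hypothetical in-memory index keyed by (kind, name, namespace).
Index = Dict[Tuple[str, str, str], dict]


def resolve(index: Index, kind: str, ref: Ref) -> Optional[dict]:
    """Return the referenced resource's spec, or None if nothing matches."""
    return index.get((kind, ref.name, ref.namespace))


index: Index = {
    ("Ghost", "reviewer", "default"): {"systemPrompt": "Review code carefully."},
    ("Ghost", "reviewer", "team-a"): {"systemPrompt": "Team A reviewer."},
}
```

The same name resolves to different resources in different namespaces, and the kind is part of the key, so a Ghost and a Model may share a name without colliding.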

7.3 Polymorphic Context Storage

Application: SubtaskContext uses context_type + type_data for multiple context types.

Benefits:

  • Extensible to new context types
  • Common fields shared
  • Type-specific data in JSON

7.4 Soft Delete Pattern

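Application: The kinds and tasks tables carry an is_active boolean for soft deletion; the subtasks, subtask_contexts, and skill_binaries models shown in section 1 have no such column.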

Benefits:

  • Data recovery capability
  • Referential integrity preservation
  • Audit trail maintenance
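A minimal sketch of the pattern with sqlite3 and a pared-down, hypothetical kinds table: deleting becomes an UPDATE of is_active, and every normal read has to filter on it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE kinds (id INTEGER PRIMARY KEY, kind TEXT NOT NULL, "
    "name TEXT NOT NULL, is_active INTEGER NOT NULL DEFAULT 1)"
)
conn.executemany(
    "INSERT INTO kinds (kind, name) VALUES (?, ?)",
    [("Bot", "reviewer-bot"), ("Bot", "writer-bot")],
)


def soft_delete(resource_id):
    """Hide the row instead of removing it, so it can be restored later."""
    conn.execute("UPDATE kinds SET is_active = 0 WHERE id = ?", (resource_id,))


def restore(resource_id):
    conn.execute("UPDATE kinds SET is_active = 1 WHERE id = ?", (resource_id,))


def active_names(kind):
    """Normal reads must always filter on is_active."""
    rows = conn.execute(
        "SELECT name FROM kinds WHERE kind = ? AND is_active = 1 ORDER BY id",
        (kind,),
    )
    return [name for (name,) in rows]


soft_delete(1)
after_delete = active_names("Bot")
rows_kept = conn.execute("SELECT COUNT(*) FROM kinds").fetchone()[0]
restore(1)
after_restore = active_names("Bot")
```

Because the row is only updated, ON DELETE CASCADE rules such as the one on skill_binaries.kind_id never fire for a soft delete.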

7.5 Dual-Table Strategy

Application: Task/Workspace separated from other CRDs.

Benefits:

  • Query performance optimization
  • Different scaling characteristics
  • Independent indexing strategies

8. Complex Design Decisions

8.1 Why Separate tasks Table?

Problem: Tasks have fundamentally different access patterns:

  • High write volume (message creation)
  • Time-series queries (history)
  • Project grouping requirements

Solution: Separate table with additional project_id column.

Impact:

  • ✅ Improved task query performance
  • ✅ Reduced kinds table contention
  • ❌ More complex model selection logic: code must choose the Kind or TaskResource model depending on the resource kind
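The extra selection logic amounts to routing each kind to the right table. A minimal sketch, assuming a hypothetical helper named table_for_kind; the kind lists come from sections 1.1 and 1.2.

```python
# Hypothetical routing helper based on the kinds/tasks split.
TASK_TABLE_KINDS = frozenset({"Task", "Workspace"})
KINDS_TABLE_KINDS = frozenset({
    "Ghost", "Model", "Shell", "Bot", "Team",
    "Skill", "KnowledgeBase", "Retriever", "Device",
})


def table_for_kind(kind: str) -> str:
    """Return the name of the table that stores resources of `kind`."""
    if kind in TASK_TABLE_KINDS:
        return "tasks"  # TaskResource model
    if kind in KINDS_TABLE_KINDS:
        return "kinds"  # Kind model
    raise ValueError(f"unknown CRD kind: {kind!r}")
```

Keeping both sets explicit means a newly added kind fails loudly until someone decides which table it belongs in.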

8.2 Why No Foreign Key Constraints?

Notable Design: Most ID references (e.g., subtasks.user_id, subtasks.task_id, subtasks.team_id) lack FK constraints.

Rationale:

  • Cross-service references (executor service manages executors)
  • Soft delete support without cascade complexity
  • Performance optimization
  • Eventual consistency model
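Without foreign keys, orphaned references are found only by querying for them. A minimal sketch with sqlite3 and pared-down, hypothetical tasks and subtasks tables, separating hard orphans (no task row at all) from references to soft-deleted tasks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tasks (id INTEGER PRIMARY KEY, is_active INTEGER NOT NULL DEFAULT 1);
    -- task_id is a plain integer with no FOREIGN KEY, as in the Subtask model
    CREATE TABLE subtasks (id INTEGER PRIMARY KEY, task_id INTEGER NOT NULL);

    INSERT INTO tasks (id, is_active) VALUES (1, 1), (2, 0);
    INSERT INTO subtasks (id, task_id) VALUES (10, 1), (11, 2), (12, 3);
    """
)

# Subtasks whose task row does not exist at all.
missing = [r[0] for r in conn.execute(
    """
    SELECT s.id FROM subtasks AS s
    LEFT JOIN tasks AS t ON t.id = s.task_id
    WHERE t.id IS NULL
    ORDER BY s.id
    """
)]

# Subtasks pointing at a task that still exists but was soft-deleted.
inactive = [r[0] for r in conn.execute(
    """
    SELECT s.id FROM subtasks AS s
    JOIN tasks AS t ON t.id = s.task_id
    WHERE t.is_active = 0
    ORDER BY s.id
    """
)]
```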

8.3 Skill Binary Storage in Database

Decision: Store Skill ZIP packages in skill_binaries table as LargeBinary.

Alternatives Considered:

  • File system: Simpler, but harder to back up and scale
  • Object storage (S3): More complex infrastructure

Trade-off: Database bloat vs. transactional integrity

8.4 Subtask Message Threading

Design: parent_id field enables threaded conversations.

Use Cases:

  • Reply chains in group chat
  • Follow-up questions
  • Conversation branching

9. CRD API Version

All CRDs use consistent API versioning:

apiVersion: agent.wecode.io/v1
kind: <ResourceType>
metadata:
  name: <resource-name>
  namespace: <namespace>
spec:
  # Resource-specific configuration
status:
  state: <state>

10. Key Findings & Recommendations

10.1 Strengths

  1. Kubernetes-Compatible Design: Familiar patterns for DevOps users
  2. Flexible JSON Storage: Rapid iteration without migrations
  3. Clear Separation of Concerns: Ghost/Shell/Model → Bot → Team → Task
  4. Performance Optimization: Dual-table strategy for high-volume resources
  5. Extensible Skill System: Dynamic loading with ZIP packages

10.2 Areas for Investigation

  1. JSON Schema Validation: No DB-level constraints on json columns

    • Risk: Data corruption from application bugs
    • Mitigation: Pydantic validation at application layer
  2. Query Performance: JSON field queries may not use indexes efficiently

    • Investigation: Review query patterns for json column filtering
  3. Skill Binary Storage: Large binary data in database

    • Risk: Table bloat, backup size
    • Consider: Migration to object storage for skills > 1MB
  4. Status State Consistency: Multiple status enums across modules

    • shared/status.py: TaskStatus
    • backend/app/schemas/task.py: TaskStatus
    • shared/models/db/enums.py: SubtaskStatus
    • Investigation: Verify synchronization between definitions
  5. Reference Integrity: No FK constraints means potential orphaned references

    • Risk: References to deleted resources
    • Mitigation: Application-level validation, soft delete pattern

10.3 Design Recommendations

  1. Add JSON Schema Constraints: Consider CHECK constraints for critical fields
  2. Implement Archive Strategy: Move completed tasks to archive tables after N days
  3. Consider Read Replicas: Task/subtask queries are read-heavy
  4. Add Resource Caching: Frequently referenced Ghosts/Models should be cached

11. Terminology Mapping

| Code/CRD Level | Frontend UI (Chinese) | Frontend UI (English) | Description |
| --- | --- | --- | --- |
| Team | 智能体 | Agent | User-facing AI agent |
| Bot | 机器人 | Bot | Building block component |
| Task | 任务 | Task | Executable work unit |
| Workspace | 工作空间 | Workspace | Code repository context |
| Ghost | 灵魂 | Ghost | System prompt & tools |
| Shell | 运行环境 | Shell | Execution runtime |
| Model | 模型 | Model | AI model configuration |
| Skill | 技能 | Skill | Dynamic capabilities |

12. Related Documentation

  • Core Concepts
  • Collaboration Models
  • Skill System
  • YAML Specification

Analysis generated: 2026-02-01
Version: agent.wecode.io/v1