
MiniSWE-Agent Scaffold

📅 Published 2026/03/07
🔄 Updated 2026/03/07
minisweagent
#miniswe scaffold

Prompt Design

SWE-Bench Prompt

System Template

bash
You are a helpful assistant that can interact with a computer shell to solve programming tasks.

Instance Template

bash
<pr_description>
Consider the following PR description:
{{task}}
</pr_description>

<instructions>
# Task Instructions

## Overview

You're a software engineer interacting continuously with a computer by submitting commands.
You'll be helping implement necessary changes to meet requirements in the PR description.
Your task is specifically to make changes to non-test files in the current directory in order to fix the issue described in the PR description in a way that is general and consistent with the codebase.
<IMPORTANT>This is an interactive process where you will think and issue AT LEAST ONE command, see the result, then think and issue your next command(s).</IMPORTANT>

For each response:

1. Include a THOUGHT section explaining your reasoning and what you're trying to accomplish
2. Provide one or more bash tool calls to execute

## Important Boundaries

- MODIFY: Regular source code files in /testbed (this is the working directory for all your subsequent commands)
- DO NOT MODIFY: Tests, configuration files (pyproject.toml, setup.cfg, etc.)

## Recommended Workflow

1. Analyze the codebase by finding and reading relevant files
2. Create a script to reproduce the issue
3. Edit the source code to resolve the issue
4. Verify your fix works by running your script again
5. Test edge cases to ensure your fix is robust

## Command Execution Rules

You are operating in an environment where

1. You issue at least one command
2. The system executes the command(s) in a subshell
3. You see the result(s)
4. You write your next command(s)

Each response should include:

1. **Reasoning text** where you explain your analysis and plan
2. At least one tool call with your command

**CRITICAL REQUIREMENTS:**

- Your response SHOULD include reasoning text explaining what you're doing
- Your response MUST include AT LEAST ONE bash tool call. You can make MULTIPLE tool calls in a single response when the commands are independent (e.g., searching multiple files, reading different parts of the codebase).
- Directory or environment variable changes are not persistent. Every action is executed in a new subshell.
- However, you can prefix any action with `MY_ENV_VAR=MY_VALUE cd /path/to/working/dir && ...` or write/load environment variables from files

Example of a CORRECT response:
<example_response>
I need to understand the Builder-related code. Let me find relevant files and check the project structure.

[Makes multiple bash tool calls: {"command": "ls -la"}, {"command": "find src -name '*.java' | grep -i builder"}, {"command": "cat README.md | head -50"}]
</example_response>

## Environment Details

- You have a full Linux shell environment
- Always use non-interactive flags (-y, -f) for commands
- Avoid interactive tools like vi, nano, or any that require user input
- You can use bash commands or invoke any tool that is available in the environment
- You can also create new tools or scripts to help you with the task
- If a tool isn't available, you can also install it

## Submission

When you've completed your work, you MUST submit your changes as a git patch.
Follow these steps IN ORDER, with SEPARATE commands:

Step 1: Create the patch file
Run `git diff -- path/to/file1 path/to/file2 > patch.txt` listing only the source files you modified.
Do NOT commit your changes.

<IMPORTANT>
The patch must only contain changes to the specific source files you modified to fix the issue.
Do not submit file creations or changes to any of the following files:

- test and reproduction files
- helper scripts, tests, or tools that you created
- installation, build, packaging, configuration, or setup scripts unless they are directly part of the issue you were fixing (you can assume that the environment is already set up for your client)
- binary or compiled files
</IMPORTANT>

Step 2: Verify your patch
Inspect patch.txt to confirm it only contains your intended changes and headers show `--- a/` and `+++ b/` paths.

Step 3: Submit (EXACT command required)
You MUST use this EXACT command to submit:

```bash
echo COMPLETE_TASK_AND_SUBMIT_FINAL_OUTPUT && cat patch.txt
```

If the command fails (nonzero exit status), it will not submit.

<CRITICAL>
- Creating/viewing the patch and submitting it MUST be separate commands (not combined with &&).
- If you modify patch.txt after verifying, you SHOULD verify again before submitting.
- You CANNOT continue working (reading, editing, testing) in any way on this task after submitting.
</CRITICAL>
</instructions>

Observation Template

bash
{% if output.exception_info -%}
<exception>{{output.exception_info}}</exception>
{% endif -%}
<returncode>{{output.returncode}}</returncode>
{% if output.output | length < 10000 -%}
<output>
{{ output.output -}}
</output>
{%- else -%}
<warning>
The output of your last command was too long.
Please try a different command that produces less output.
If you're looking at a file you can try use head, tail or sed to view a smaller number of lines selectively.
If you're using grep or find and it produced too much output, you can use a more selective search pattern.
If you really need to see something from the full command's output, you can redirect output to a file and then search in that file.
</warning>
{%- set elided_chars = output.output | length - 10000 -%}
<output_head>
{{ output.output[:5000] }}
</output_head>
<elided_chars>
{{ elided_chars }} characters elided
</elided_chars>
<output_tail>
{{ output.output[-5000:] }}
</output_tail>
{%- endif -%}
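The observation template is plain Jinja2. A minimal sketch of how such a template is rendered (the template string below is a hypothetical abbreviated form of the one above, assuming the `jinja2` package is available):

```python
from jinja2 import StrictUndefined, Template

# Abbreviated stand-in for the observation template above (hypothetical short form)
OBSERVATION_TEMPLATE = (
    "{% if output.exception_info -%}\n"
    "<exception>{{output.exception_info}}</exception>\n"
    "{% endif -%}\n"
    "<returncode>{{output.returncode}}</returncode>\n"
    "<output>\n"
    "{{ output.output -}}\n"
    "</output>"
)

output = {"output": "hello\n", "returncode": 0, "exception_info": ""}
rendered = Template(OBSERVATION_TEMPLATE, undefined=StrictUndefined).render(output=output)
print(rendered)
```

The `-%}` / `-}}` markers are Jinja2 whitespace control: they strip the template whitespace following a tag, so the `<exception>` block vanishes cleanly when `exception_info` is empty.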

Format Error Template

bash
Tool call error:

<error>
{{error}}
</error>

Here is general guidance on how to submit correct toolcalls:

Every response needs to use the 'bash' tool at least once to execute commands.

Call the bash tool with your command as the argument:
- Tool: bash
- Arguments: {"command": "your_command_here"}

If you have completed your assignment, please consult the first message about how to
submit your solution (you will not be able to continue working on this task after that).

Full Prompt Example

Prompt Tokenization Code

python
from transformers import AutoTokenizer

BASH_TOOL = {
    "type": "function",
    "function": {
        "name": "bash",
        "description": "Execute a bash command",
        "parameters": {
            "type": "object",
            "properties": {
                "command": {
                    "type": "string",
                    "description": "The bash command to execute",
                }
            },
            "required": ["command"],
        },
    },
}

def load_tokenizer_and_messages():
    tokenizer_path = "/mnt/nas/liming.plm/models/Qwen3-14B"
    tokenizer = AutoTokenizer.from_pretrained(tokenizer_path, use_fast=True)
    system_prompt = "You are a helpful assistant that can interact with a computer shell to solve programming tasks."
    user_prompt = "你好啊,我是一个开发测试者。"
    tokens = tokenizer(user_prompt, return_tensors="pt", padding=False)
    print("user_prompt 长度", len(tokens["input_ids"][0]), ", user_prompt tokens:", tokens )
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt}
    ]
    return tokenizer, messages

def test_tokenize_tools():
    tokenizer, messages = load_tokenizer_and_messages()
    
    tokens_with_gen_prompt = tokenizer.apply_chat_template(
            messages, 
            add_generation_prompt=True, 
            tokenize=True
        )
    print("tokens_with_gen_prompt 原始长度:", len(tokens_with_gen_prompt))
    print("tokens_with_gen_prompt 解码结果", tokenizer.decode(tokens_with_gen_prompt, skip_special_tokens=False))


    tokens_without_gen_prompt = tokenizer.apply_chat_template(
            messages, 
            add_generation_prompt=False, 
            tokenize=True
        )
    print("tokens_without_gen_prompt 原始长度:", len(tokens_without_gen_prompt))
    print("tokens_without_gen_prompt 解码结果", tokenizer.decode(tokens_without_gen_prompt, skip_special_tokens=False))


    # tools = {"bash": BASH_TOOL}
    tool_schemas = [BASH_TOOL]
   
    tokens_with_tools = tokenizer.apply_chat_template(
            messages, 
            add_generation_prompt=True, 
            tokenize=True,
            tools=tool_schemas
        )
    
    print("增加tools 后, token长度:", len(tokens_with_tools), ", tokens:", tokens_with_tools)

    print("skip_special_tokens=True, 解码结果:", tokenizer.decode(tokens_with_tools, skip_special_tokens=True))
    print("skip_special_tokens=False, 解码结果:", tokenizer.decode(tokens_with_tools, skip_special_tokens=False))
    return

User prompt length

bash
user_prompt 长度 9 , user_prompt tokens: {'input_ids': tensor([[108386, 103924,   3837,  35946, 101909, 100013,  81705,  28946,   1773]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1]])}

tokens_with_gen_prompt

bash
tokens_with_gen_prompt 原始长度: 39
tokens_with_gen_prompt 解码结果:
 <|im_start|>system
You are a helpful assistant that can interact with a computer shell to solve programming tasks.<|im_end|>
<|im_start|>user
你好啊,我是一个开发测试者。<|im_end|>
<|im_start|>assistant

tokens_without_gen_prompt

bash
tokens_without_gen_prompt 原始长度: 36
tokens_without_gen_prompt 解码结果:
 <|im_start|>system
You are a helpful assistant that can interact with a computer shell to solve programming tasks.<|im_end|>
<|im_start|>user
你好啊,我是一个开发测试者。<|im_end|>

tokens_with_tools, skip_special_tokens=True

bash
tokens_with_tools 增加tools 后, token长度: 180

tokens_with_tools skip_special_tokens=True, 解码结果:
 system
You are a helpful assistant that can interact with a computer shell to solve programming tasks.

# Tools

You may call one or more functions to assist with the user query.

You are provided with function signatures within <tools></tools> XML tags:
<tools>
{"type": "function", "function": {"name": "bash", "description": "Execute a bash command", "parameters": {"type": "object", "properties": {"command": {"type": "string", "description": "The bash command to execute"}}, "required": ["command"]}}}
</tools>

For each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:
<tool_call>
{"name": <function-name>, "arguments": <args-json-object>}
</tool_call>
user
你好啊,我是一个开发测试者。
assistant
bash
tokens_with_tools skip_special_tokens=False, 解码结果:
 <|im_start|>system
You are a helpful assistant that can interact with a computer shell to solve programming tasks.

# Tools

You may call one or more functions to assist with the user query.

You are provided with function signatures within <tools></tools> XML tags:
<tools>
{"type": "function", "function": {"name": "bash", "description": "Execute a bash command", "parameters": {"type": "object", "properties": {"command": {"type": "string", "description": "The bash command to execute"}}, "required": ["command"]}}}
</tools>

For each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:
<tool_call>
{"name": <function-name>, "arguments": <args-json-object>}
</tool_call><|im_end|>
<|im_start|>user
你好啊,我是一个开发测试者。<|im_end|>
<|im_start|>assistant

Docker Environment Code

Starting the Container

python
class DockerEnvironment:
    def _start_container(self):
        """Start the Docker container and return the container ID."""
        container_name = f"minisweagent-{uuid.uuid4().hex[:8]}"
        cmd = [
            self.config.executable,
            "run",
            "-d",
            "--name",
            container_name,
            "-w",
            self.config.cwd,
            *self.config.run_args,
            self.config.image,
            "sleep",
            self.config.container_timeout,
        ]
        self.logger.debug(f"Starting container with command: {shlex.join(cmd)}")
        result = subprocess.run(
            cmd,
            capture_output=True,
            text=True,
            timeout=self.config.pull_timeout,  # docker pull might take a while
            check=True,
        )
        self.logger.info(f"Started container {container_name} with ID {result.stdout.strip()}")
        self.container_id = result.stdout.strip()

Executing an Action

execute action
  • Non-interactive; can run short, simple commands.
  • Cannot run long commands, e.g. applying a patch with `git apply` or the contents of an eval_script.
    • For eval, copy the file into the container first, then run the command there.
  • Return code: -1 means error, 0 means success.
python
def execute(self, action: dict, cwd: str = "", *, timeout: int | None = None) -> dict[str, Any]:
    """Execute a command in the Docker container and return the result as a dict."""
    command = action.get("command", "")
    cwd = cwd or self.config.cwd
    assert self.container_id, "Container not started"

    cmd = [self.config.executable, "exec", "-w", cwd]
    for key in self.config.forward_env:
        if (value := os.getenv(key)) is not None:
            cmd.extend(["-e", f"{key}={value}"])
    for key, value in self.config.env.items():
        cmd.extend(["-e", f"{key}={value}"])
    cmd.extend([self.container_id, *self.config.interpreter, command]) 

    try:
        result = subprocess.run( 
            cmd,
            text=True,
            timeout=timeout or self.config.timeout,
            encoding="utf-8",
            errors="replace",
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
        )
        output = {"output": result.stdout, "returncode": result.returncode, "exception_info": ""} 
    except Exception as e:
        raw_output = getattr(e, "output", None)
        raw_output = (
            raw_output.decode("utf-8", errors="replace") if isinstance(raw_output, bytes) else (raw_output or "")
        )
        output = {
            "output": raw_output,
            "returncode": -1,
            "exception_info": f"An error occurred while executing the command: {e}", 
            "extra": {"exception_type": type(e).__name__, "exception": str(e)}, 
        }
    self._check_finished(output)
    return output
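For long eval scripts that cannot be passed inline, the workaround noted above is to copy the file into the container and execute it there. A hedged sketch of building the `docker cp` / `docker exec` command lines (the container name and paths are hypothetical):

```python
def make_copy_and_exec_cmds(container_id: str, local_path: str, remote_path: str) -> tuple[list[str], list[str]]:
    """Build `docker cp` + `docker exec` command lines for running a long script inside a container."""
    copy_cmd = ["docker", "cp", local_path, f"{container_id}:{remote_path}"]
    exec_cmd = ["docker", "exec", container_id, "bash", remote_path]
    return copy_cmd, exec_cmd

copy_cmd, exec_cmd = make_copy_and_exec_cmds("minisweagent-deadbeef", "eval_script.sh", "/tmp/eval_script.sh")
print(copy_cmd)
print(exec_cmd)
# With a running container, you would then pass each list to subprocess.run(..., check=True).
```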

Task Completion Marker

If the output starts with a specific marker string, the task is considered complete. Everything after the marker is treated as the content of patch.txt (generated beforehand with git). See the prompt above for the exact wording.

bash
Run `git diff -- path/to/file1 path/to/file2 > patch.txt`

```bash
echo COMPLETE_TASK_AND_SUBMIT_FINAL_OUTPUT && cat patch.txt
```

_check_finished code:

python
def _check_finished(self, output: dict):
    """Raises Submitted if the output indicates task completion."""
    lines = output.get("output", "").lstrip().splitlines(keepends=True)
    if lines and lines[0].strip() == "COMPLETE_TASK_AND_SUBMIT_FINAL_OUTPUT" and output["returncode"] == 0:  
        submission = "".join(lines[1:])  
        raise Submitted(
            {
                "role": "exit",
                "content": submission,
                "extra": {"exit_status": "Submitted", "submission": submission},
            }
        )
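The detection condition can be exercised in isolation. The function below is a minimal reimplementation for illustration only (not the library code): the marker must be the first non-blank line and the return code must be 0.

```python
def is_submission(output: dict) -> bool:
    """Mirror the condition in _check_finished: marker on first line, returncode 0."""
    lines = output.get("output", "").lstrip().splitlines(keepends=True)
    return bool(lines) and lines[0].strip() == "COMPLETE_TASK_AND_SUBMIT_FINAL_OUTPUT" and output["returncode"] == 0

ok = is_submission({"output": "COMPLETE_TASK_AND_SUBMIT_FINAL_OUTPUT\ndiff --git a/x b/x\n", "returncode": 0})
bad = is_submission({"output": "COMPLETE_TASK_AND_SUBMIT_FINAL_OUTPUT\n", "returncode": 1})
print(ok)   # True
print(bad)  # False: nonzero exit status does not submit
```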

Agent Loop and Interrupt Design

Agent Main Loop

python
def run(self, task: str = "", **kwargs) -> dict:
    """Run step() until agent is finished. Returns dictionary with exit_status, submission keys."""
    self.extra_template_vars |= {"task": task, **kwargs}
    self.messages = []
    self.add_messages(
        self.model.format_message(role="system", content=self._render_template(self.config.system_template)),
        self.model.format_message(role="user", content=self._render_template(self.config.instance_template)),
    )
    while True:
        try:
            self.step()
        except InterruptAgentFlow as e:  
            self.add_messages(*e.messages)  
        except Exception as e:
            self.handle_uncaught_exception(e)
            raise
        finally:
            self.save(self.config.output_path)
        if self.messages[-1].get("role") == "exit":
            break
    return self.messages[-1].get("extra", {})

InterruptAgentFlow: Resumable Interrupts

When any of these exceptions is raised, the agent can continue executing.

python
class InterruptAgentFlow(Exception):
    """Raised to interrupt the agent flow and add messages."""

    def __init__(self, *messages: dict):
        self.messages = messages
        super().__init__()


class Submitted(InterruptAgentFlow):
    """Raised when the agent has completed its task."""


class LimitsExceeded(InterruptAgentFlow):
    """Raised when the agent has exceeded its cost or step limit."""


class UserInterruption(InterruptAgentFlow):
    """Raised when the user interrupts the agent."""


class FormatError(InterruptAgentFlow):  
    """Raised when the LM's output is not in the expected format."""

Interrupt Handling

python
def handle_uncaught_exception(self, e: Exception) -> list[dict]:
    return self.add_messages(
        self.model.format_message(
            role="exit",
            content=str(e),
            extra={
                "exit_status": type(e).__name__,
                "submission": "",
                "exception_str": str(e),
                "traceback": traceback.format_exc(),
            },
        )
    )

Tool Call Parsing

Bash Tool Definition

python
BASH_TOOL = {
  "type": "function",
  "function": {
      "name": "bash",
      "description": "Execute a bash command",
      "parameters": {
          "type": "object",
          "properties": {
              "command": {
                  "type": "string",
                  "description": "The bash command to execute",
              }
          },
          "required": ["command"],  
      },
  },
}

Passing the Bash Tool to the LLM

python
def _query(self, messages: list[dict[str, str]], **kwargs):
    try:
        return litellm.completion(
            model=self.config.model_name,
            messages=messages,
            tools=[BASH_TOOL],  
            **(self.config.model_kwargs | kwargs),
        )
    except litellm.exceptions.AuthenticationError as e:
        e.message += " You can permanently set your API key with `mini-extra config set KEY VALUE`."
        raise e

ToolCall Parsing

Main Flow

After the call completes, the actions are parsed from the response and stored in message['extra'].

python
def query(self, messages: list[dict[str, str]], **kwargs) -> dict:
    for attempt in retry(logger=logger, abort_exceptions=self.abort_exceptions):
        with attempt:
            response = self._query(self._prepare_messages_for_api(messages), **kwargs)
    message = response.choices[0].message.model_dump()
    message["extra"] = {
        "actions": self._parse_actions(response),  
        "response": response.model_dump(),
        **cost_output,
        "timestamp": time.time(),
    }
    return message
  
  
def _parse_actions(self, response) -> list[dict]:
    """Parse tool calls from the response. Raises FormatError if unknown tool."""
    tool_calls = response.choices[0].message.tool_calls or []
    actions = parse_toolcall_actions(  
      	tool_calls,  
      	format_error_template=self.config.format_error_template  
    )  
    return actions

Parsing into Executable Actions

If there are no tool calls, a FormatError is raised, wrapping a format-error message as the content.

python
def parse_toolcall_actions(tool_calls: list, *, format_error_template: str) -> list[dict]:
    """Parse tool calls from the response. Raises FormatError if unknown tool or invalid args."""
    if not tool_calls:
        raise FormatError(
            {
                "role": "user",
                "content": Template(format_error_template, undefined=StrictUndefined).render(  
                    error="No tool calls found in the response. Every response MUST include at least one tool call.",
                    actions=[],
                ),
                "extra": {"interrupt_type": "FormatError"},
            }
        )
    actions = []
    for tool_call in tool_calls:
        error_msg = ""
        args = {}
        try:
            args = json.loads(tool_call.function.arguments) 
        except Exception as e:
            error_msg = f"Error parsing tool call arguments: {e}."
        if tool_call.function.name != "bash":
            error_msg += f"Unknown tool '{tool_call.function.name}'."
        if not isinstance(args, dict) or "command" not in args:
            error_msg += "Missing 'command' argument in bash tool call."
        if error_msg:
            raise FormatError(
                {
                    "role": "user",
                    "content": Template(format_error_template, undefined=StrictUndefined).render(
                        error=error_msg, actions=[]
                    ),
                    "extra": {"interrupt_type": "FormatError"},
                }
            )
        actions.append({"command": args["command"], "tool_call_id": tool_call.id})  
    return actions
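The argument parsing can be illustrated with a mock tool call; the `SimpleNamespace` shape below is a hypothetical stand-in for the objects litellm returns (`id`, `function.name`, `function.arguments` as a JSON string):

```python
import json
from types import SimpleNamespace

# Mock of a single litellm tool_call object (hypothetical minimal shape)
tool_call = SimpleNamespace(
    id="call_123",
    function=SimpleNamespace(name="bash", arguments='{"command": "ls -la"}'),
)

# Same extraction steps as parse_toolcall_actions: decode JSON args, keep command + id
args = json.loads(tool_call.function.arguments)
action = {"command": args["command"], "tool_call_id": tool_call.id}
print(action)  # {'command': 'ls -la', 'tool_call_id': 'call_123'}
```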

Tool Call Parse Failure

python
def parse_toolcall_actions(tool_calls: list, *, format_error_template: str) -> list[dict]:
    """Parse tool calls from the response. Raises FormatError if unknown tool or invalid args."""
    if not tool_calls:
        raise FormatError( 
            {
                "role": "user", 
                "content": Template(format_error_template, undefined=StrictUndefined).render(   
                    error="No tool calls found in the response. Every response MUST include at least one tool call.", 
                    actions=[], 
                ),
                "extra": {"interrupt_type": "FormatError"}, 
            }
        )
python
def run(self, task: str = "", **kwargs) -> dict:
    # ....
    while True:
        try:
            self.step()
        except InterruptAgentFlow as e:   
            self.add_messages(*e.messages)   
        except Exception as e:
            self.handle_uncaught_exception(e)
            raise
        finally:
            self.save(self.config.output_path)
        if self.messages[-1].get("role") == "exit":
            break
    return self.messages[-1].get("extra", {})

Invoking Actions

python
def execute_actions(self, message: dict) -> list[dict]:
      """Execute actions in message, add observation messages, return them."""
      outputs = [self.env.execute(action) for action in message.get("extra", {}).get("actions", [])] 
      return self.add_messages(*self.model.format_observation_messages(message, outputs, self.get_template_vars()))

Docker Executes the Action and Returns the Output

python
def execute(self, action: dict, cwd: str = "", *, timeout: int | None = None) -> dict[str, Any]:
    """Execute a command in the Docker container and return the result as a dict."""
    command = action.get("command", "")
    cwd = cwd or self.config.cwd
    assert self.container_id, "Container not started"

    cmd = [self.config.executable, "exec", "-w", cwd]
    for key in self.config.forward_env:
        if (value := os.getenv(key)) is not None:
            cmd.extend(["-e", f"{key}={value}"])
    for key, value in self.config.env.items():
        cmd.extend(["-e", f"{key}={value}"])
    cmd.extend([self.container_id, *self.config.interpreter, command]) 

    try:
        result = subprocess.run(  
            cmd,  
            text=True,
            timeout=timeout or self.config.timeout,
            encoding="utf-8",
            errors="replace",
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
        )
        output = {"output": result.stdout, "returncode": result.returncode, "exception_info": ""}   
    except Exception as e:
        raw_output = getattr(e, "output", None)
        raw_output = (
            raw_output.decode("utf-8", errors="replace") if isinstance(raw_output, bytes) else (raw_output or "")
        )
        output = { 
            "output": raw_output, 
            "returncode": -1, 
            "exception_info": f"An error occurred while executing the command: {e}",  
            "extra": {"exception_type": type(e).__name__, "exception": str(e)},  
        }
    self._check_finished(output)
    return output

Wrapping the Output into a Message via Template

Called from the model (LLM) class

python
def format_observation_messages(
      self, message: dict, outputs: list[dict], template_vars: dict | None = None
  ) -> list[dict]:
      """Format execution outputs into tool result messages."""
      actions = message.get("extra", {}).get("actions", [])
      return format_toolcall_observation_messages(
          actions=actions,
          outputs=outputs,
          observation_template=self.config.observation_template,
          template_vars=template_vars,
          multimodal_regex=self.config.multimodal_regex,
      )

The actual wrapping: note the handling of actions that were never executed, and the use of the observation template.

python
def format_toolcall_observation_messages(
    *,
    actions: list[dict],
    outputs: list[dict],
    observation_template: str,
    template_vars: dict | None = None,
    multimodal_regex: str = "",
) -> list[dict]:
    """Format execution outputs into tool result messages."""
    not_executed = {"output": "", "returncode": -1, "exception_info": "action was not executed"}  
    padded_outputs = outputs + [not_executed] * (len(actions) - len(outputs))
    results = []
    for action, output in zip(actions, padded_outputs):
        content = Template(observation_template, undefined=StrictUndefined).render(  
            output=output, **(template_vars or {})  
        )
        msg = {
            "content": content,  
            "extra": {
                "raw_output": output.get("output", ""),
                "returncode": output.get("returncode"),
                "timestamp": time.time(),
                "exception_info": output.get("exception_info"),
                **output.get("extra", {}),
            },
        }
        if "tool_call_id" in action:
            msg["tool_call_id"] = action["tool_call_id"]
            msg["role"] = "tool"
        else:
            msg["role"] = "user"  # human issued commands
        if multimodal_regex:
            msg = expand_multimodal_content(msg, pattern=multimodal_regex)
        results.append(msg)
    return results
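The padding logic matters when there are fewer outputs than actions (for example, an earlier action in the batch raised `Submitted`, so the rest never ran): each leftover action gets a placeholder observation. A minimal sketch of just that step:

```python
# Two actions were requested, but only the first one executed (hypothetical data)
actions = [{"command": "ls", "tool_call_id": "a"}, {"command": "pwd", "tool_call_id": "b"}]
outputs = [{"output": "file.py\n", "returncode": 0, "exception_info": ""}]

# Same padding as format_toolcall_observation_messages
not_executed = {"output": "", "returncode": -1, "exception_info": "action was not executed"}
padded = outputs + [not_executed] * (len(actions) - len(outputs))

for action, output in zip(actions, padded):
    print(action["tool_call_id"], output["returncode"])
# a 0
# b -1
```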

Example

Model Response

json
{
  "content": "THOUGHT: The first step is to locate the file where `clear_cache` is defined and understand its current implementation. I will use the `find` command to search for the relevant Python files that contain the `clear_cache` function.",
  "role": "assistant",
  "tool_calls": [
    {
      "index": 0,
      "function": {
        "arguments": "{\"command\": \"find . -name '*.py' | xargs grep -l 'def clear_cache('\"}",
        "name": "bash"
      },
      "id": "call_05d5f773d6814a578c56c6",
      "type": "function"
    }
  ],
  "function_call": null,
  "provider_specific_fields": {
    "refusal": null
  }
}

Actions(ToolCalls)

json
[
  {
    "command": "find . -name '*.py' | xargs grep -l 'def clear_cache('",
    "tool_call_id": "call_05d5f773d6814a578c56c6"
  }
]

Output Wrapped into a Message

xml
<returncode>0</returncode>
<output>
./django/apps/registry.py
./django/contrib/contenttypes/models.py
./django/contrib/sites/models.py
</output>
PLM's Blog @ 2016 - 2026