MiniSWE-Agent Scaffold
📅 Published 2026/03/07
🔄 Updated 2026/03/07
minisweagent
#miniswe scaffold
You are a helpful assistant that can interact with a computer shell to solve programming tasks.

<pr_description>
Consider the following PR description:
{{task}}
</pr_description>
<instructions>
# Task Instructions
## Overview
You're a software engineer interacting continuously with a computer by submitting commands.
You'll be helping implement necessary changes to meet requirements in the PR description.
Your task is specifically to make changes to non-test files in the current directory in order to fix the issue described in the PR description in a way that is general and consistent with the codebase.
<IMPORTANT>This is an interactive process where you will think and issue AT LEAST ONE command, see the result, then think and issue your next command(s).</important>
For each response:
1. Include a THOUGHT section explaining your reasoning and what you're trying to accomplish
2. Provide one or more bash tool calls to execute
## Important Boundaries
- MODIFY: Regular source code files in /testbed (this is the working directory for all your subsequent commands)
- DO NOT MODIFY: Tests, configuration files (pyproject.toml, setup.cfg, etc.)
## Recommended Workflow
1. Analyze the codebase by finding and reading relevant files
2. Create a script to reproduce the issue
3. Edit the source code to resolve the issue
4. Verify your fix works by running your script again
5. Test edge cases to ensure your fix is robust
## Command Execution Rules
You are operating in an environment where
1. You issue at least one command
2. The system executes the command(s) in a subshell
3. You see the result(s)
4. You write your next command(s)
Each response should include:
1. **Reasoning text** where you explain your analysis and plan
2. At least one tool call with your command
**CRITICAL REQUIREMENTS:**
- Your response SHOULD include reasoning text explaining what you're doing
- Your response MUST include AT LEAST ONE bash tool call. You can make MULTIPLE tool calls in a single response when the commands are independent (e.g., searching multiple files, reading different parts of the codebase).
- Directory or environment variable changes are not persistent. Every action is executed in a new subshell.
- However, you can prefix any action with `MY_ENV_VAR=MY_VALUE cd /path/to/working/dir && ...` or write/load environment variables from files
Example of a CORRECT response:
<example_response>
I need to understand the Builder-related code. Let me find relevant files and check the project structure.
[Makes multiple bash tool calls: {"command": "ls -la"}, {"command": "find src -name '*.java' | grep -i builder"}, {"command": "cat README.md | head -50"}]
</example_response>
## Environment Details
- You have a full Linux shell environment
- Always use non-interactive flags (-y, -f) for commands
- Avoid interactive tools like vi, nano, or any that require user input
- You can use bash commands or invoke any tool that is available in the environment
- You can also create new tools or scripts to help you with the task
- If a tool isn't available, you can also install it
## Submission
When you've completed your work, you MUST submit your changes as a git patch.
Follow these steps IN ORDER, with SEPARATE commands:
Step 1: Create the patch file
Run `git diff -- path/to/file1 path/to/file2 > patch.txt` listing only the source files you modified.
Do NOT commit your changes.
<IMPORTANT>
The patch must only contain changes to the specific source files you modified to fix the issue.
Do not submit file creations or changes to any of the following files:
- test and reproduction files
- helper scripts, tests, or tools that you created
- installation, build, packaging, configuration, or setup scripts unless they are directly part of the issue you were fixing (you can assume that the environment is already set up for your client)
- binary or compiled files
</IMPORTANT>
Step 2: Verify your patch
Inspect patch.txt to confirm it only contains your intended changes and headers show `--- a/` and `+++ b/` paths.
Step 3: Submit (EXACT command required)
You MUST use this EXACT command to submit:
```bash
echo COMPLETE_TASK_AND_SUBMIT_FINAL_OUTPUT && cat patch.txt
```
If the command fails (nonzero exit status), it will not submit.
<CRITICAL>
- Creating/viewing the patch and submitting it MUST be separate commands (not combined with &&).
- If you modify patch.txt after verifying, you SHOULD verify again before submitting.
- You CANNOT continue working (reading, editing, testing) in any way on this task after submitting.
</CRITICAL>
</instructions>

{% if output.exception_info -%}
<exception>{{output.exception_info}}</exception>
{% endif -%}
<returncode>{{output.returncode}}</returncode>
{% if output.output | length < 10000 -%}
<output>
{{ output.output -}}
</output>
{%- else -%}
<warning>
The output of your last command was too long.
Please try a different command that produces less output.
If you're looking at a file you can try use head, tail or sed to view a smaller number of lines selectively.
If you're using grep or find and it produced too much output, you can use a more selective search pattern.
If you really need to see something from the full command's output, you can redirect output to a file and then search in that file.
</warning>
{%- set elided_chars = output.output | length - 10000 -%}
<output_head>
{{ output.output[:5000] }}
</output_head>
<elided_chars>
{{ elided_chars }} characters elided
</elided_chars>
<output_tail>
{{ output.output[-5000:] }}
</output_tail>
{%- endif -%}

Tool call error:
<error>
{{error}}
</error>
Here is general guidance on how to submit correct toolcalls:
Every response needs to use the 'bash' tool at least once to execute commands.
Call the bash tool with your command as the argument:
- Tool: bash
- Arguments: {"command": "your_command_here"}
If you have completed your assignment, please consult the first message about how to
submit your solution (you will not be able to continue working on this task after that).

from transformers import AutoTokenizer

BASH_TOOL = {
    "type": "function",
    "function": {
        "name": "bash",
        "description": "Execute a bash command",
        "parameters": {
            "type": "object",
            "properties": {
                "command": {
                    "type": "string",
                    "description": "The bash command to execute",
                }
            },
            "required": ["command"],
        },
    },
}

def load_tokenizer_and_messages():
    tokenizer_path = "/mnt/nas/liming.plm/models/Qwen3-14B"
    tokenizer = AutoTokenizer.from_pretrained(tokenizer_path, use_fast=True)
    system_prompt = "You are a helpful assistant that can interact with a computer shell to solve programming tasks."
    # The user prompt is a Chinese test sentence ("Hello, I am a developer running tests.").
    user_prompt = "你好啊,我是一个开发测试者。"
    tokens = tokenizer(user_prompt, return_tensors="pt", padding=False)
    print("user_prompt length", len(tokens["input_ids"][0]), ", user_prompt tokens:", tokens)
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]
    return tokenizer, messages


def test_tokenize_tools():
    tokenizer, messages = load_tokenizer_and_messages()

    # With the generation prompt: the template appends the assistant header.
    tokens_with_gen_prompt = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
    )
    print("tokens_with_gen_prompt raw length:", len(tokens_with_gen_prompt))
    print("tokens_with_gen_prompt decoded result", tokenizer.decode(tokens_with_gen_prompt, skip_special_tokens=False))

    # Without the generation prompt: the sequence ends after the user turn.
    tokens_without_gen_prompt = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=False,
        tokenize=True,
    )
    print("tokens_without_gen_prompt raw length:", len(tokens_without_gen_prompt))
    print("tokens_without_gen_prompt decoded result", tokenizer.decode(tokens_without_gen_prompt, skip_special_tokens=False))

    # tools = {"bash": BASH_TOOL}
    # Tool schemas are passed as a list; the template renders them into the system turn.
    tool_schemas = [BASH_TOOL]
    tokens_with_tools = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        tools=tool_schemas,
    )
    print("token length after adding tools:", len(tokens_with_tools), ", tokens:", tokens_with_tools)
    print("skip_special_tokens=True, decoded result:", tokenizer.decode(tokens_with_tools, skip_special_tokens=True))
    print("skip_special_tokens=False, decoded result:", tokenizer.decode(tokens_with_tools, skip_special_tokens=False))
    return

user prompt length
user_prompt length 9 , user_prompt tokens: {'input_ids': tensor([[108386, 103924, 3837, 35946, 101909, 100013, 81705, 28946, 1773]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1]])}

tokens_with_gen_prompt
tokens_with_gen_prompt raw length: 39
tokens_with_gen_prompt decoded result:
<|im_start|>system
You are a helpful assistant that can interact with a computer shell to solve programming tasks.<|im_end|>
<|im_start|>user
你好啊,我是一个开发测试者。<|im_end|>
<|im_start|>assistant

tokens_without_gen_prompt
tokens_without_gen_prompt raw length: 36
tokens_without_gen_prompt decoded result:
<|im_start|>system
You are a helpful assistant that can interact with a computer shell to solve programming tasks.<|im_end|>
<|im_start|>user
你好啊,我是一个开发测试者。<|im_end|>

tokens_with_tools, skip_special_tokens=True
tokens_with_tools token length after adding tools: 180
tokens_with_tools skip_special_tokens=True, decoded result:
system
You are a helpful assistant that can interact with a computer shell to solve programming tasks.
# Tools
You may call one or more functions to assist with the user query.
You are provided with function signatures within <tools></tools> XML tags:
<tools>
{"type": "function", "function": {"name": "bash", "description": "Execute a bash command", "parameters": {"type": "object", "properties": {"command": {"type": "string", "description": "The bash command to execute"}}, "required": ["command"]}}}
</tools>
For each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:
<tool_call>
{"name": <function-name>, "arguments": <args-json-object>}
</tool_call>
user
你好啊,我是一个开发测试者。
assistant

tokens_with_tools skip_special_tokens=False, decoded result:
<|im_start|>system
You are a helpful assistant that can interact with a computer shell to solve programming tasks.
# Tools
You may call one or more functions to assist with the user query.
You are provided with function signatures within <tools></tools> XML tags:
<tools>
{"type": "function", "function": {"name": "bash", "description": "Execute a bash command", "parameters": {"type": "object", "properties": {"command": {"type": "string", "description": "The bash command to execute"}}, "required": ["command"]}}}
</tools>
For each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:
<tool_call>
{"name": <function-name>, "arguments": <args-json-object>}
</tool_call><|im_end|>
<|im_start|>user
你好啊,我是一个开发测试者。<|im_end|>
<|im_start|>assistant

class DockerEnvironment:
    def _start_container(self):
        """Start the Docker container and return the container ID."""
        container_name = f"minisweagent-{uuid.uuid4().hex[:8]}"
        cmd = [
            self.config.executable,
            "run",
            "-d",
            "--name",
            container_name,
            "-w",
            self.config.cwd,
            *self.config.run_args,
            self.config.image,
            "sleep",
            self.config.container_timeout,
        ]
        self.logger.debug(f"Starting container with command: {shlex.join(cmd)}")
        result = subprocess.run(
            cmd,
            capture_output=True,
            text=True,
            timeout=self.config.pull_timeout,  # docker pull might take a while
            check=True,
        )
        self.logger.info(f"Started container {container_name} with ID {result.stdout.strip()}")
        self.container_id = result.stdout.strip()

`execute` is meant for simple, short commands. It cannot run long commands, such as `git apply` with a patch or the contents of eval_script. A returncode of -1 marks an execution error (an exception raised while running the command); 0 means the command succeeded.

def execute(self, action: dict, cwd: str = "", *, timeout: int | None = None) -> dict[str, Any]:
    """Execute a command in the Docker container and return the result as a dict."""
    command = action.get("command", "")
    cwd = cwd or self.config.cwd
    assert self.container_id, "Container not started"
    cmd = [self.config.executable, "exec", "-w", cwd]
    # Forward selected host variables, then the statically configured ones.
    for key in self.config.forward_env:
        if (value := os.getenv(key)) is not None:
            cmd.extend(["-e", f"{key}={value}"])
    for key, value in self.config.env.items():
        cmd.extend(["-e", f"{key}={value}"])
    cmd.extend([self.container_id, *self.config.interpreter, command])
    try:
        result = subprocess.run(
            cmd,
            text=True,
            timeout=timeout or self.config.timeout,
            encoding="utf-8",
            errors="replace",
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
        )
        output = {"output": result.stdout, "returncode": result.returncode, "exception_info": ""}
    except Exception as e:
        # Timeouts and similar failures end up here; any partial output is kept.
        raw_output = getattr(e, "output", None)
        raw_output = (
            raw_output.decode("utf-8", errors="replace") if isinstance(raw_output, bytes) else (raw_output or "")
        )
        output = {
            "output": raw_output,
            "returncode": -1,
            "exception_info": f"An error occurred while executing the command: {e}",
            "extra": {"exception_type": type(e).__name__, "exception": str(e)},
        }
    self._check_finished(output)
    return output

Printing a specific marker string signals completion. Whatever follows the marker is the content of patch.txt (generated with git beforehand). See the prompt above for the exact wording.
Run `git diff -- path/to/file1 path/to/file2 > patch.txt`
```bash
echo COMPLETE_TASK_AND_SUBMIT_FINAL_OUTPUT && cat patch.txt
```

`_check_finished` code:
def _check_finished(self, output: dict):
    """Raises Submitted if the output indicates task completion."""
    lines = output.get("output", "").lstrip().splitlines(keepends=True)
    if lines and lines[0].strip() == "COMPLETE_TASK_AND_SUBMIT_FINAL_OUTPUT" and output["returncode"] == 0:
        submission = "".join(lines[1:])
        raise Submitted(
            {
                "role": "exit",
                "content": submission,
                "extra": {"exit_status": "Submitted", "submission": submission},
            }
        )

def run(self, task: str = "", **kwargs) -> dict:
    """Run step() until agent is finished. Returns dictionary with exit_status, submission keys."""
    self.extra_template_vars |= {"task": task, **kwargs}
    self.messages = []
    self.add_messages(
        self.model.format_message(role="system", content=self._render_template(self.config.system_template)),
        self.model.format_message(role="user", content=self._render_template(self.config.instance_template)),
    )
    while True:
        try:
            self.step()
        except InterruptAgentFlow as e:
            self.add_messages(*e.messages)
        except Exception as e:
            self.handle_uncaught_exception(e)
            raise
        finally:
            self.save(self.config.output_path)
        if self.messages[-1].get("role") == "exit":
            break
    return self.messages[-1].get("extra", {})

Exceptions of this kind (InterruptAgentFlow and its subclasses) do not crash the run: the loop catches them, appends their messages, and keeps going unless the last message has role "exit".
class InterruptAgentFlow(Exception):
    """Raised to interrupt the agent flow and add messages."""

    def __init__(self, *messages: dict):
        self.messages = messages
        super().__init__()


class Submitted(InterruptAgentFlow):
    """Raised when the agent has completed its task."""


class LimitsExceeded(InterruptAgentFlow):
    """Raised when the agent has exceeded its cost or step limit."""


class UserInterruption(InterruptAgentFlow):
    """Raised when the user interrupts the agent."""


class FormatError(InterruptAgentFlow):
    """Raised when the LM's output is not in the expected format."""

def handle_uncaught_exception(self, e: Exception) -> list[dict]:
    return self.add_messages(
        self.model.format_message(
            role="exit",
            content=str(e),
            extra={
                "exit_status": type(e).__name__,
                "submission": "",
                "exception_str": str(e),
                "traceback": traceback.format_exc(),
            },
        )
    )

Reference code
BASH_TOOL = {
    "type": "function",
    "function": {
        "name": "bash",
        "description": "Execute a bash command",
        "parameters": {
            "type": "object",
            "properties": {
                "command": {
                    "type": "string",
                    "description": "The bash command to execute",
                }
            },
            "required": ["command"],
        },
    },
}

def _query(self, messages: list[dict[str, str]], **kwargs):
    try:
        return litellm.completion(
            model=self.config.model_name,
            messages=messages,
            tools=[BASH_TOOL],
            **(self.config.model_kwargs | kwargs),
        )
    except litellm.exceptions.AuthenticationError as e:
        e.message += " You can permanently set your API key with `mini-extra config set KEY VALUE`."
        raise e

After the call completes, the actions are parsed from the response and stored in message['extra'].
def query(self, messages: list[dict[str, str]], **kwargs) -> dict:
    for attempt in retry(logger=logger, abort_exceptions=self.abort_exceptions):
        with attempt:
            response = self._query(self._prepare_messages_for_api(messages), **kwargs)
    message = response.choices[0].message.model_dump()
    # cost_output is computed in code omitted from this excerpt.
    message["extra"] = {
        "actions": self._parse_actions(response),
        "response": response.model_dump(),
        **cost_output,
        "timestamp": time.time(),
    }
    return message

def _parse_actions(self, response) -> list[dict]:
    """Parse tool calls from the response. Raises FormatError if unknown tool."""
    tool_calls = response.choices[0].message.tool_calls or []
    actions = parse_toolcall_actions(
        tool_calls,
        format_error_template=self.config.format_error_template,
    )
    return actions

If there are no actions, a FormatError is raised whose content wraps a format-error message.
def parse_toolcall_actions(tool_calls: list, *, format_error_template: str) -> list[dict]:
    """Parse tool calls from the response. Raises FormatError if unknown tool or invalid args."""
    if not tool_calls:
        raise FormatError(
            {
                "role": "user",
                "content": Template(format_error_template, undefined=StrictUndefined).render(
                    error="No tool calls found in the response. Every response MUST include at least one tool call.",
                    actions=[],
                ),
                "extra": {"interrupt_type": "FormatError"},
            }
        )
    actions = []
    for tool_call in tool_calls:
        error_msg = ""
        args = {}
        try:
            args = json.loads(tool_call.function.arguments)
        except Exception as e:
            error_msg = f"Error parsing tool call arguments: {e}."
        if tool_call.function.name != "bash":
            error_msg += f"Unknown tool '{tool_call.function.name}'."
        if not isinstance(args, dict) or "command" not in args:
            error_msg += "Missing 'command' argument in bash tool call."
        if error_msg:
            raise ...  # elided in this excerpt; per the docstring this raises FormatError
        actions.append({"command": args["command"], "tool_call_id": tool_call.id})
    return actions

def parse_toolcall_actions(tool_calls: list, *, format_error_template: str) -> list[dict]:
    """Parse tool calls from the response. Raises FormatError if unknown tool or invalid args."""
    if not tool_calls:
        raise FormatError(
            {
                "role": "user",
                "content": Template(format_error_template, undefined=StrictUndefined).render(
                    error="No tool calls found in the response. Every response MUST include at least one tool call.",
                    actions=[],
                ),
                "extra": {"interrupt_type": "FormatError"},
            }
        )

def run(self, task: str = "", **kwargs) -> dict:
    # ....
    while True:
        try:
            self.step()
        except InterruptAgentFlow as e:
            self.add_messages(*e.messages)
        except Exception as e:
            self.handle_uncaught_exception(e)
            raise
        finally:
            self.save(self.config.output_path)
        if self.messages[-1].get("role") == "exit":
            break
    return self.messages[-1].get("extra", {})

def execute_actions(self, message: dict) -> list[dict]:
    """Execute actions in message, add observation messages, return them."""
    outputs = [self.env.execute(action) for action in message.get("extra", {}).get("actions", [])]
    return self.add_messages(*self.model.format_observation_messages(message, outputs, self.get_template_vars()))

def execute(self, action: dict, cwd: str = "", *, timeout: int | None = None) -> dict[str, Any]:
    """Execute a command in the Docker container and return the result as a dict."""
    command = action.get("command", "")
    cwd = cwd or self.config.cwd
    assert self.container_id, "Container not started"
    cmd = [self.config.executable, "exec", "-w", cwd]
    for key in self.config.forward_env:
        if (value := os.getenv(key)) is not None:
            cmd.extend(["-e", f"{key}={value}"])
    for key, value in self.config.env.items():
        cmd.extend(["-e", f"{key}={value}"])
    cmd.extend([self.container_id, *self.config.interpreter, command])
    try:
        result = subprocess.run(
            cmd,
            text=True,
            timeout=timeout or self.config.timeout,
            encoding="utf-8",
            errors="replace",
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
        )
        output = {"output": result.stdout, "returncode": result.returncode, "exception_info": ""}
    except Exception as e:
        raw_output = getattr(e, "output", None)
        raw_output = (
            raw_output.decode("utf-8", errors="replace") if isinstance(raw_output, bytes) else (raw_output or "")
        )
        output = {
            "output": raw_output,
            "returncode": -1,
            "exception_info": f"An error occurred while executing the command: {e}",
            "extra": {"exception_type": type(e).__name__, "exception": str(e)},
        }
    self._check_finished(output)
    return output

The observations are then formatted on the LLM (model) object:
def format_observation_messages(
    self, message: dict, outputs: list[dict], template_vars: dict | None = None
) -> list[dict]:
    """Format execution outputs into tool result messages."""
    actions = message.get("extra", {}).get("actions", [])
    return format_toolcall_observation_messages(
        actions=actions,
        outputs=outputs,
        observation_template=self.config.observation_template,
        template_vars=template_vars,
        multimodal_regex=self.config.multimodal_regex,
    )

The actual wrapping is shown next. Note how actions that were not executed are handled, and note the template.
def format_toolcall_observation_messages(
    *,
    actions: list[dict],
    outputs: list[dict],
    observation_template: str,
    template_vars: dict | None = None,
    multimodal_regex: str = "",
) -> list[dict]:
    """Format execution outputs into tool result messages."""
    not_executed = {"output": "", "returncode": -1, "exception_info": "action was not executed"}
    padded_outputs = outputs + [not_executed] * (len(actions) - len(outputs))
    results = []
    for action, output in zip(actions, padded_outputs):
        content = Template(observation_template, undefined=StrictUndefined).render(
            output=output, **(template_vars or {})
        )
        msg = {
            "content": content,
            "extra": {
                "raw_output": output.get("output", ""),
                "returncode": output.get("returncode"),
                "timestamp": time.time(),
                "exception_info": output.get("exception_info"),
                **output.get("extra", {}),
            },
        }
        if "tool_call_id" in action:
            msg["tool_call_id"] = action["tool_call_id"]
            msg["role"] = "tool"
        else:
            msg["role"] = "user"  # human issued commands
        if multimodal_regex:
            msg = expand_multimodal_content(msg, pattern=multimodal_regex)
        results.append(msg)
    return results

{
  "content": "THOUGHT: The first step is to locate the file where `clear_cache` is defined and understand its current implementation. I will use the `find` command to search for the relevant Python files that contain the `clear_cache` function.",
  "role": "assistant",
  "tool_calls": [
    {
      "index": 0,
      "function": {
        "arguments": "{\"command\": \"find . -name '*.py' | xargs grep -l 'def clear_cache('\"}",
        "name": "bash"
      },
      "id": "call_05d5f773d6814a578c56c6",
      "type": "function"
    }
  ],
  "function_call": null,
  "provider_specific_fields": {
    "refusal": null
  }
}

[
  {
    "command": "find . -name '*.py' | xargs grep -l 'def clear_cache('",
    "tool_call_id": "call_05d5f773d6814a578c56c6"
  }
]

<returncode>0</returncode>
<output>
./django/apps/registry.py
./django/contrib/contenttypes/models.py
./django/contrib/sites/models.py
</output>
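
The three dumps above (assistant message, parsed actions, rendered observation) form one round trip. The following is a minimal sketch of that round trip in plain Python, not the mini-swe-agent implementation: `parse_actions` and `render_observation` are hypothetical names, the real code renders a Jinja2 template rather than building strings by hand, and the warning text of the long-output branch is left out. The 10000/5000 character thresholds are taken from the observation template quoted earlier.

```python
import json

# Assistant message shaped like the example above (content shortened).
message = {
    "role": "assistant",
    "content": "THOUGHT: locate clear_cache.",
    "tool_calls": [
        {
            "id": "call_05d5f773d6814a578c56c6",
            "type": "function",
            "function": {
                "name": "bash",
                "arguments": "{\"command\": \"find . -name '*.py' | xargs grep -l 'def clear_cache('\"}",
            },
        }
    ],
}


def parse_actions(message: dict) -> list[dict]:
    """Hypothetical helper: turn tool calls into {"command", "tool_call_id"} actions."""
    actions = []
    for call in message.get("tool_calls") or []:
        # The arguments field is a JSON string, so it is decoded first.
        args = json.loads(call["function"]["arguments"])
        actions.append({"command": args["command"], "tool_call_id": call["id"]})
    return actions


def render_observation(output: dict, limit: int = 10000, keep: int = 5000) -> str:
    """Hypothetical imitation of the observation template's branching."""
    text = output["output"]
    parts = []
    if output.get("exception_info"):
        parts.append(f"<exception>{output['exception_info']}</exception>")
    parts.append(f"<returncode>{output['returncode']}</returncode>")
    if len(text) < limit:
        # Short output is shown in full.
        parts.append(f"<output>\n{text}</output>")
    else:
        # Long output keeps only the head and tail and reports what was cut.
        elided = len(text) - limit
        parts.append(f"<output_head>\n{text[:keep]}\n</output_head>")
        parts.append(f"<elided_chars>\n{elided} characters elided\n</elided_chars>")
        parts.append(f"<output_tail>\n{text[-keep:]}\n</output_tail>")
    return "\n".join(parts)


if __name__ == "__main__":
    print(parse_actions(message))
    print(render_observation({"output": "./django/apps/registry.py\n", "returncode": 0, "exception_info": ""}))
```

Keeping `tool_call_id` on each action is what lets the observation be sent back as a `tool` message tied to the original call, as `format_toolcall_observation_messages` does.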