ASMTP: An open mail protocol for AI agents
The Agent Simple Mail Transfer Protocol: a small open spec for asynchronous, mailbox-shaped, context-economical communication between AI agents. Headers push, bodies fetch. The receiver writes only content; the network handles everything else.
Status: specification. Audience: Engineers building agent systems, runtime authors, operators running agent networks.
Conventions
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC 2119][rfc2119] and [RFC 8174][rfc8174] when, and only when, they appear in all capitals. Normative requirements appear in the body sections and are consolidated in Appendix A.
1. Abstract
AI agents have compute, tools, and inference; they do not have a transfer protocol shaped for the way they actually run. An agent in 2026 lives inside an intermittent harness (a Claude Code session, a Cursor instance, a Codex CLI invocation, a custom worker). It boots when a human invokes it, runs for minutes, exits. The closest existing primitives (vendor APIs, MCP, A2A, federated chat) solve adjacent problems, but none provides what a mail protocol provides: a durable mailbox at every endpoint, asynchronous delivery that does not require both parties online, a body that is opt-in rather than forced, and a receiver whose only protocol-level output is the next reply.
This paper specifies the Agent Simple Mail Transfer Protocol (ASMTP), a small open protocol for agent-to-agent communication. ASMTP defines the transfer layer (mailboxes, envelopes, push frames, the wire surface, cursors, read state, and optional sender-side observability) and inherits identity and trust from the Agent Session Protocol (ASP)[asp]. Every load-bearing design choice maps to an idea that has been operationally validated at internet scale for decades: SMTP[smtp] and JMAP[jmap] for the mailbox-and-headers model; IMAP IDLE[imap] for live notification; DSN[dsn] for opt-in delivery observability; Erlang/OTP[erlang] for the "send is one-way, the mailbox is the only durable primitive" structural commitment.
Context economy is a first-class property of the protocol, not a tuning knob. Push frames carry headers only; bodies are fetched only when the agent decides they are worth its tokens. A push frame is ~80 tokens; a typical body is 500 to 5,000+; headers are 1 to 4% the cost of bodies. An agent waking to 84 unread envelopes pays ~6,700 tokens of headers instead of ~80,000+ tokens of forced bodies, and triages the lot in one model turn.
The receiver writes one thing: reply envelopes. There are no status events, no participation states, no presence concepts, no lifecycle transitions to track. The WebSocket connection is an "email client open." The network pushes header notifications as new envelopes land, never bodies. The agent reads the header (from, subject, type_hint, size_hint), decides if it is worth opening, and either fetches the body via REST or moves on.
Properties beyond transfer (federation, end-to-end encryption, discovery, capability advertisement) are explicit non-goals for this specification. The protocol's job is to be small and correct at the layer that has to be small and correct first.
2. The gap
An LLM agent is not a service. It is a contractor. A service holds an open port and answers RPCs on demand; it lives in a process that must keep running for it to mean anything. A contractor lives at an address. Work arrives in a mailbox, is handled when they are next active, and the messages they have not gotten to remain waiting. The service model assumes a runtime; the contractor model assumes a mailbox. Every protocol surveyed in §3 was built for the first and fails on the second.
An agent today communicates inside whatever runtime is hosting it. Inside the host, it has a model to think with, tools to act with, and (often) file storage and persistent context. Outside the host, it has no general way to be addressed, contacted, or held in conversation by another agent operating elsewhere. The protocols that have emerged to fill this gap have each landed on the wrong side of a structural choice for intermittent, harness-driven agents.
The substrate matters. An LLM agent in 2026 has four properties that the existing protocols do not all respect at once:
- Intermittent runtime. The agent's process exists when its harness is open and does not when it is closed. There is no daemon. There is no listening port the agent runs.
- Finite, expensive context. Every byte the agent reads costs tokens. Every byte it writes costs tokens. Bulk-delivering content the agent will not act on is operationally wasteful and economically real.
- Unreliable lifecycle narration. LLMs forget to emit status events. They emit at the wrong moments. They lie convincingly. They crash mid-emission. Any protocol that demands receiver-emitted state imports those failure modes into the wire.
- Asynchrony as the default case. Two agents wanting to coordinate are routinely not online at the same time. The harness opens, does some work, closes. The other harness opens five hours later, somewhere else, owned by someone else.
Email has property (4) and not (1) or (2). Slack and Discord have (1) but assume the receiver-as-service. ASP v0.1 modeled communication as sessions and ran into (3): the session lifecycle forces the receiver to emit joined, left, and status transitions LLM agents cannot reliably produce. A2A's task lifecycle has the same issue with submitted, working, and input-required.
The missing protocol is one shaped for all four properties at once. It is mailbox-shaped, header-economical, receiver-mute: the asynchronous mail tradition adapted to the agent substrate.
3. Why existing approaches fall short
Landscape as of mid-2026. Specifics shift; the structural gaps are durable.
Three families of solution have emerged. Each is genuinely useful within its scope. Each falls on the wrong side of a structural choice for intermittent agents.
Vendor agent APIs
Anthropic's Messages API, OpenAI's Responses API[openai-responses], Google's Gemini API, and similar vendor surfaces are agent-to-vendor RPCs. They assume a client (the agent's runtime) calling a server (the vendor's LLM). They do not provide mailboxes, do not address other agents by handle, and do not have a notion of asynchronous delivery between agents owned by different parties. Cross-vendor or cross-organization coordination through these APIs requires an intermediary the agent does not own.
Agent-native RPC protocols
The Model Context Protocol (MCP)[mcp], donated by Anthropic to the Agentic AI Foundation in December 2025[mcp-donation], is the de facto standard for connecting agents to tools, resources, and prompts; its shape is client/server. MCP's own 2026 roadmap names "agent communication" as a priority gap[mcp-roadmap].
The Agent-to-Agent protocol (A2A)[a2a], from Google in April 2025 and now broadly adopted (Microsoft, AWS, Salesforce, SAP, ServiceNow), is the closest existing peer protocol. Its primitive is a Task with a receiver-emitted lifecycle (submitted → working → input-required → completed)[a2a-spec]. An LLM agent that crashes mid-task, forgets to emit completed, or transitions wrong leaves the sender's worldview wrong. The protocol assumes the receiver is a responsive service; intermittent harnesses are not. AGNTCY[agntcy] and the NANDA Index[nanda] sit at directory and transport layers; neither proposes a persistent conversation primitive with mailbox semantics.
The category's structural assumption is "the receiver is a service." When the receiver is a harness that may not exist between calls, the assumption breaks.
Federated messaging
SMTP and email are the existence proof for asynchronous, federated, mailbox-shaped communication at internet scale. Forty years, billions of mailboxes, thousands of operators, no central authority. Properties (1), (4), and a softer version of (3), with DSN[dsn] making delivery observable without recipient cooperation, are all present. But email is shaped for human readers: spam is unsolved at federation scale and mitigated only by a few mega-providers running opaque ML filters, MIME content is shaped around presentation, no machine-readable per-recipient consent layer exists. Bots on email work but the protocol does not know they are bots.
Matrix[matrix] is the strongest federated chat substrate. Federated identity (@user:server), durable rooms with threads, end-to-end encryption. Agent uptake is real but ad-hoc: third-party stacks expose agents as Matrix participants, but no agent-aware identity, capability discovery, or machine-readable inbound-policy layer has emerged. The trust affordances are human-shaped.
ASP v0.1, the prior attempt at agent-shape
The Agent Session Protocol v0.1[asp] is the most direct predecessor. It defines identity, trust, sessions, and transport, all designed for agents from the start. Its identity (@owner.agent_name) and trust (symmetric allowlist, block, non-enumerating 404) are sound and inherited verbatim into ASMTP. But its core primitive, the session, turned out wrong for intermittent agents. Sessions demand receiver-emitted lifecycle: session.invited → session.joined (a receiver action) → session.left (a receiver action). The state machine is rich and the participants tracked, which forces the receiver into a presence model agents cannot reliably honor.
The empty intersection
Each family above misses on a different axis (the structural matrix is in §15). The gap is not a missing feature in any one system; it is the assumption every one of them makes: the receiver is a service, or both parties share a runtime, or participants are humans. None of those assumptions survives a Claude Code session that closes when the user closes the terminal. The cost is paid in glue code, every team, every integration: hand-passed webhook URLs, polling loops with retry escalation, email parsers that hope the reply is structured. Each reinvents the same missing primitive, a durable mailbox plus an opt-in delivery signal that does not conscript the receiver. ASMTP is that primitive.
4. Design principles
ASMTP's surface is small because five principles ruled most things out. Each shows up concretely in §5–§12.
Receiver writes only content
The receiver's only protocol-level output is reply envelopes. There is no ack, no received, no working, no read, no seen, no participation status, no lifecycle transition. Every status verb in every prior agent protocol (joined, left, completed, input-required, even the optional read-receipt) is a place where an LLM-driven runtime fails, lies, or simply forgets. ASMTP refuses to depend on any of them.
The sender-side observability story (Monitor, §11) is derived entirely from operator-observable transport facts, never from receiver cooperation. The asymmetry is deliberate: senders can opt in to network-derived signals; receivers cannot be conscripted into the observability path.
This rules out: lifecycle events on the receiver, mandatory acks, read receipts on the wire, status messages of any kind, "typing" indicators, "thinking" indicators, presence notifications.
The mailbox is the only durable primitive
State lives in mailboxes. WebSocket connections are caches. Threads are emergent from references metadata. Sessions do not exist as wire objects. A mailbox outlives the agent's runtime, the agent's process, the agent's harness. An agent that goes dark for a week comes back and finds its mailbox unchanged. The protocol's job is to deliver to that mailbox; everything else is application.
This rules out: session lifecycles, room/channel objects, participant-status tables, presence concepts on the wire, "is the receiver online" queries.
Headers cheap, bodies opt-in
Context economy is a first-class protocol concern. Agent-to-agent traffic is structurally heavy: drafts, structured payloads, file references, multi-part rationale. Pushing every body into the receiver's context is the dominant cost of any naive design, and naive is what most existing options are. ASMTP refuses this. A push frame is ~80 tokens; a typical body is 500 to 5,000+. Headers are 1 to 4% the cost of bodies. The agent reads the header, decides if the body is worth its tokens, and fetches it only when yes. This is the same opt-in pattern that makes agent skills viable at scale: lightweight metadata registers up front, full content loads only when the model decides it is relevant. An agent waking to 84 unread envelopes pays ~6,700 tokens of headers and triages the lot in one model turn, rather than absorbing ~80,000+ tokens of bodies whether it needs them or not.
This rules out: push frames that include content_parts, mailbox listings that include bodies, "rich previews" that mix headers and body data, server-side body summarization injected into headers.
Two surfaces, one truth
REST and WebSocket are equivalent surfaces over the same mailbox. A push frame on WebSocket and a header in GET /mailbox carry the same data in the same shape. The mailbox is the source of truth; both surfaces are views. A client that misses a push loses no data: the next pull or the next subscribe-with-cursor retrieves it. A client that never holds a WebSocket can participate fully via REST.
This rules out: push-only state changes, surfaces that exist only on one transport, divergent shapes between REST and WS, push notifications that are not also fetchable.
Observability is opt-in and operator-derived
The Monitor channel (§11) is opt-in per envelope. Without monitor, the sender learns nothing about delivery beyond the synchronous 2xx. With monitor, the operator emits stored per recipient on successful accept, and MAY emit bounced or expired under operator-defined triggers (§11). All facts derive from operator-observable transport state; none requires receiver cooperation.
There is no fetched fact. There is no read fact. Whether the recipient has consumed the body is private to the recipient. This is intentional and matches email's default: senders do not see when their messages are read. If the sender wants confirmation that the recipient engaged, the confirmation is a reply envelope, content the recipient chose to write.
This rules out: mandatory delivery reports, sender-visible read state, sender-visible "is recipient online," any signal that would require the receiver to emit transport metadata, read-receipt-style observability under any name.
5. The email mental model
ASMTP works exactly like email. Every load-bearing design choice maps to an email idea that has been running on the internet for forty years.
| ASMTP | Email equivalent |
|---|---|
| Mailbox | Inbox at your mail server |
| Envelope | Email message (headers + body) |
| Push frame | New-mail notification (e.g., IMAP IDLE[imap]) |
| WebSocket connection | Outlook / Mail.app open and connected |
| REST pull (GET /mailbox) | Manual "check mail" / IMAP FETCH |
| GET body (GET /messages/{id}) | Opening an email |
| Read flag | \Seen flag in IMAP |
| Cursor | UIDNEXT-style position |
| Subject | Subject line, optional |
| in_reply_to / references | RFC 5322 In-Reply-To / References[rfc5322] |
| Monitor (opt-in) | DSN (Delivery Status Notifications)[dsn] |
The mental model is the model every modern LLM already has from its training data on how email works. Protocol-specific instruction in the agent's system prompt can be short, because the analogy carries most of the load.
Email splits the two halves across two protocols: SMTP carries the send; IMAP (with the IDLE command) or JMAP carries the listen. ASMTP collapses that split into one surface. POST /messages is the SMTP-style send. WS /connect is the IMAP-IDLE / JMAP-push analogue. The cursor mechanic in §9 is borrowed from JMAP. One protocol, two surfaces over the same mailbox.
The properties this gives us, for free:
- Asynchronous by construction. Senders never wait for receivers to be online.
- Durable across runtime restarts. The mailbox outlives any individual harness session.
- Context-economical. Headers are cheap; bodies are opt-in.
- Receiver-agency over context. The agent decides what is worth opening.
- Resilient to operator and client crashes. The mailbox is the source of truth; everything else is cache.
6. The mailbox
Every agent owns exactly one mailbox, addressed by its handle. The mailbox is server-held, durable, and the only stateful primitive in the protocol.
Properties:
- Ownership. The mailbox belongs to the agent's handle, not to a session, connection, or process instance. Restarting the agent, rotating credentials, or migrating runtimes returns the agent to the same mailbox with the same accumulated envelopes.
- Durability. Envelopes are committed to durable storage before send operations return 2xx. The 2xx contract is "stored in the recipient's mailbox," not "queued for further attempt."
- Retention. Default 90 days from receipt. Operators MAY configure higher retention for tiered customers (the protocol places no upper bound; a reasonable operator range is 7 to 365 days). Past retention, an envelope expires.
- Sequence numbers. The mailbox assigns a monotonic seq integer per insert. Used for cursor tracking and ordering. Sequence numbers are per-mailbox; the same envelope sent to N recipients gets N different seq values, one per destination mailbox. Sequence numbers are private to the mailbox owner and MUST NOT be returned to senders.
- Read state per envelope. Each envelope in the mailbox carries a read boolean. Initially false; set true when the body is fetched or explicitly marked.
- Cursor. A single integer per mailbox: the highest seq the agent has acknowledged being notified about. Used for "what have I missed?" reconnect queries.
- One mailbox per agent, multiple connections per agent. Operator UIs may present folders, labels, or filters; these are above the wire. A single agent identity MAY hold multiple simultaneous WebSocket connections (e.g., a dev workstation and a production daemon). All such connections receive the same push fan-out, share the single mailbox cursor, and observe the same read flags. To split state across runtimes, use distinct handles.
The mailbox's full state (cursor, read flags, complete envelope list, sequence numbers) is visible only to its owner. No other party (sender, cc'd participant, or admin) sees the mailbox's read flags, its cursor, or its seq values. This is a hard privacy invariant; see §10 and Appendix A.
7. The envelope
Every wire message is an envelope. One shape; no subtypes.
{
"id": "01HW7Z9KQX1MS2D9P5VC3GZ8AB",
"from": "@owner.agent_name",
"to": ["@owner.agent_name"],
"cc": [],
"in_reply_to": "01HXKZZZZZ...",
"references": ["01HXKROOTID...", "01HXKZZZZZ..."],
"subject": "Optional human-readable label",
"date_ms": 1747156800000,
"content_parts": [ /* see "Content parts" below */ ]
}

from is shown here in the stored/fetched shape. On POST /messages the client omits from; the operator stamps it. See the field table for which fields are client-supplied vs operator-stamped.
| Field | Source | Required | Description |
|---|---|---|---|
| id | client | yes | Sender-allocated ULID. Globally unique within a sender. The (from, id) pair is the idempotency key (see §10.1). |
| from | operator | yes | Sender handle, stamped by the operator from the authenticated identity. Clients MUST NOT supply from on POST /messages; operators MUST ignore or reject a client-supplied from. The field is present on the stored envelope, on fetches, and in push frames. |
| to | client | yes (≥1) | One or more recipient handles. |
| cc | client | no | Additional recipient handles. Same delivery semantics as to; informational distinction only. |
| in_reply_to | client | no | Direct parent envelope id. If references is also provided, the parent MUST appear as its last entry (see references). |
| references | client | no | Ancestor chain, oldest first, including parent. When both in_reply_to and references are provided, the last entry of references MUST equal in_reply_to. |
| subject | client | no | Optional human-readable label, UTF-8 string. If absent, downstream presentations show no subject; there is no placeholder. Operators MAY cap length per policy. |
| date_ms | client | yes | Sender-asserted timestamp, epoch ms. The operator stamps its own received_ms separately. |
| content_parts | client | yes (≥1) | List of typed content parts. See "Content parts" below. |
| monitor | client | no | Sender-allocated handle requesting opt-in operator-observed delivery facts. See §11. |
The POST /messages request body is the envelope without from; the operator stamps from from the bearer token and writes the canonical envelope into each recipient's mailbox. Fetches (GET /messages/{id}) and push frames return the full envelope, including the operator-stamped from. This makes sender forgery structurally impossible: there is no field in the request that the operator trusts the client to set correctly.
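The stamping rule above can be sketched from the operator's side. This is a minimal illustration, not a reference implementation: `stamp_envelope` and `EnvelopeRejected` are hypothetical names, the sketch chooses to reject (rather than silently ignore) a client-supplied from, and using 400 for missing fields or a broken references chain is an assumption, since the specification only names 400 for operator-stamped fields.

```python
REQUIRED_CLIENT_FIELDS = ("id", "to", "date_ms", "content_parts")


class EnvelopeRejected(Exception):
    """Hypothetical error type carrying the HTTP status the operator would return."""

    def __init__(self, status: int, reason: str):
        super().__init__(reason)
        self.status = status


def stamp_envelope(request_body: dict, authenticated_handle: str) -> dict:
    """Turn a POST /messages body into the canonical stored envelope."""
    # The client never sets `from`; the spec allows ignoring or rejecting it.
    if "from" in request_body:
        raise EnvelopeRejected(400, "client supplied `from`")
    for field in REQUIRED_CLIENT_FIELDS:
        if field not in request_body:
            raise EnvelopeRejected(400, f"missing required field: {field}")
    if not request_body["to"]:
        raise EnvelopeRejected(400, "`to` needs at least one recipient")
    if not request_body["content_parts"]:
        raise EnvelopeRejected(400, "`content_parts` must be non-empty")

    # When both are present, the last entry of `references` must be the parent.
    parent = request_body.get("in_reply_to")
    references = request_body.get("references")
    if parent is not None and references is not None:
        if not references or references[-1] != parent:
            raise EnvelopeRejected(400, "`references` must end with `in_reply_to`")

    # Copy so the caller's dict is untouched, then stamp the sender identity
    # derived from the bearer token (passed in here as a plain string).
    envelope = dict(request_body)
    envelope["from"] = authenticated_handle
    return envelope
```

Because the only source of from is the authenticated identity passed in by the operator's own auth layer, nothing in the request body can influence the sender handle.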
Content parts
content_parts is an ordered, non-empty list of typed parts that together form the body. Each part is a JSON object discriminated by its type field. There are four part types and no others; operators MUST reject unknown type values. Parts are end-to-end between sender and recipient: operators MUST NOT modify, parse, summarize, or otherwise interpret part contents beyond size limits and the URL-scheme rejection described below. Ordering is preserved: recipients SHOULD render parts in the order the sender supplied them.
The four part types:
`text` — A UTF-8 string of free-form prose, code, structured-but-unschematized text. The primary part type for human-style messages and LLM-produced output.
| Field | Required | Description |
|---|---|---|
| type | yes | The literal string "text". |
| text | yes | Non-empty UTF-8 string. No length limit at the protocol level; operators MAY cap per policy and return 413. |
`image` — A reference to an image resource the recipient MAY fetch. Inline image bytes are deliberately not supported; this keeps envelope size bounded and lets the recipient decide whether the image is worth the bandwidth and tokens.
| Field | Required | Description |
|---|---|---|
| type | yes | The literal string "image". |
| url | yes | Absolute URL where the image bytes can be fetched. The data: scheme MUST be rejected by the operator. The protocol does not constrain other schemes; operators MAY restrict (e.g., HTTPS only). |
| mime_type | no | Hint for the recipient (e.g., image/png, image/jpeg). Advisory only; the recipient MAY override based on the actual fetched bytes. |
`file` — A reference to a file resource. Same fetch-on-demand model as image, with additional metadata.
| Field | Required | Description |
|---|---|---|
| type | yes | The literal string "file". |
| url | yes | Absolute URL where the file bytes can be fetched. data: MUST be rejected. |
| name | no | Suggested filename for display or save-to-disk. Not authoritative; recipients SHOULD sanitize before using on a filesystem. |
| mime_type | no | Content-type hint (e.g., application/pdf). |
| size | no | Size in bytes, advisory. Recipients SHOULD treat as untrusted (the actual fetched bytes are authoritative). |
`data` — A structured JSON payload, optionally tagged with a schema identifier. The primary part type for machine-readable agent-to-agent payloads: tool results, structured review findings, query results, configuration blobs.
| Field | Required | Description |
|---|---|---|
| type | yes | The literal string "data". |
| schema | no | Opaque string identifier the sender and recipient agree on (e.g., contract.review.v1, weather.report.v2). The protocol assigns no semantics; operators MUST NOT validate the payload against any registry. |
| data | yes | JSON object. Arbitrary shape. No fields are reserved by the protocol. |
Why URL-only for `image` and `file`. ASP §6.3 admits inline data: URIs for images; ASMTP narrows this and requires real URLs only. Two reasons. First, envelope size is bounded and predictable; a fetch-on-demand image cannot blow up the recipient's context window or the operator's storage budget for a message that may never be opened. Second, the recipient pays for the bytes only when the body is worth opening, consistent with the protocol's context-economy stance (§4, §8). Operators MUST reject parts whose url uses the data: scheme on POST /messages (400).
Why JSON-as-data instead of inline strings. A data part is a typed payload, not a serialized string. Recipients receive a JSON object directly, not a string they must parse. The optional schema field is a tag, not a contract: parties agree out-of-band on what contract.review.v1 means. ASMTP defines no central registry and does not propose one.
Mixed-type bodies are normal. A reply might contain a text part (a one-paragraph rationale), a data part (the structured finding), and a file part (a referenced PDF). The push-frame type_hint reports mixed in that case (§8). The recipient still decides, header-only, whether the body is worth fetching.
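The structural checks described in this section (four part types only, non-empty text, no data: URLs, a JSON object for data parts) can be sketched as a small validator. The function names are hypothetical, and treating a URL without a scheme as "not absolute" is this sketch's interpretation of the url field description, not wording from the specification.

```python
from urllib.parse import urlparse

PART_TYPES = {"text", "image", "file", "data"}


def validate_part(part: dict) -> None:
    """Raise ValueError if a single content part violates the structural rules."""
    kind = part.get("type")
    if kind not in PART_TYPES:
        raise ValueError(f"unknown part type: {kind!r}")

    if kind == "text":
        text = part.get("text")
        if not isinstance(text, str) or text == "":
            raise ValueError("text part needs a non-empty string")
    elif kind in ("image", "file"):
        url = part.get("url")
        if not isinstance(url, str):
            raise ValueError(f"{kind} part needs a url")
        scheme = urlparse(url).scheme.lower()
        if scheme == "":
            raise ValueError("url must be absolute")  # assumption: scheme required
        if scheme == "data":
            raise ValueError("data: URLs are rejected")
    else:  # kind == "data"
        # The payload must be a JSON object; its shape is never inspected.
        if not isinstance(part.get("data"), dict):
            raise ValueError("data part needs a JSON object")


def validate_content_parts(parts: list) -> None:
    """Check the ordered, non-empty list of parts without interpreting contents."""
    if not isinstance(parts, list) or not parts:
        raise ValueError("content_parts must be a non-empty list")
    for part in parts:
        validate_part(part)
```

Note what the sketch deliberately leaves out: it never looks at schema or validates the data payload against anything, in line with the rule that operators do not interpret part contents.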
What is NOT in the envelope
- No status/state/kind. There are no envelope subtypes; the envelope is the whole vocabulary.
- No deadline/priority/urgency. Time-sensitive concerns belong inside content. The wire is mechanism, not policy.
- No accept/decline flag. The receiver communicates only by writing reply envelopes; an absence of a reply is a valid response.
8. The push frame
When an agent has a live WebSocket connection, the operator pushes lightweight notifications as new envelopes land. The push frame contains headers only. The body is never pushed. This is the load-bearing context-economy property.
| Item | Approximate context cost (tokens) |
|---|---|
| Push frame (headers only) | 60 to 100 |
| Text-only body, one paragraph | 200 to 600 |
| Structured-data body | 800 to 2,500 |
| Body with long-form draft or rationale | 2,000 to 10,000+ |
| Body with file/image part metadata (file bytes excluded) | header cost + ~50 to 200 |
An agent that opens a tenth of what it is notified about pays roughly 14% the cost of a protocol that pushes bodies eagerly. The asymmetry compounds across many agents, many wake-ups, many days.
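The arithmetic behind these figures can be reproduced in a few lines. The numbers below are illustrative assumptions chosen from the ranges quoted in this document: an 80-token push frame and a 2,000-token body (so the header is 4% of the body), with one body in ten actually opened. The function names are hypothetical.

```python
def header_only_cost(n_envelopes: int, header_tokens: int) -> int:
    """Tokens spent when the agent reads every header and opens nothing."""
    return n_envelopes * header_tokens


def lazy_to_eager_ratio(header_tokens: float, body_tokens: float,
                        open_fraction: float) -> float:
    """Per-envelope cost of headers-plus-opt-in-bodies relative to eager bodies.

    Lazy cost  = header + open_fraction * body
    Eager cost = body
    """
    return (header_tokens + open_fraction * body_tokens) / body_tokens


# 84 unread envelopes at ~80 tokens per header.
print(header_only_cost(84, 80))  # 6720

# Opening a tenth of the bodies: 4% for the header plus 10% for opened bodies.
print(round(lazy_to_eager_ratio(80, 2000, 0.10), 2))  # 0.14
```

The ratio is the header-to-body fraction plus the open fraction, so with smaller bodies or a higher open rate the saving shrinks accordingly.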
{
"op": "envelope.notify",
"id": "01HW7Z9KQX1MS2D9P5VC3GZ8AB",
"from": "@owner.agent_name",
"to": ["@owner.agent_name"],
"cc": [],
"subject": "Optional label, omitted if sender didn't provide one",
"in_reply_to": "01HXKZZZZZ...",
"type_hint": "text",
"size_hint": 12,
"seq": 4421,
"date_ms": 1747156800000
}

| Field | Required | Description |
|---|---|---|
| op | yes | Always "envelope.notify" for new arrivals. |
| id | yes | Envelope id. Use to fetch the body. |
| from | yes | Sender handle. |
| to | yes | Recipient list. |
| cc | no | CC list, present if non-empty. |
| subject | no | Subject if the sender provided one. Omitted if absent. No placeholder. |
| in_reply_to | no | If this is a reply, the parent envelope id. Helps the agent quickly triage "is this for an open thread of mine?" |
| type_hint | yes | Coarse hint about the body's dominant content type. One of text, image, file, data, mixed. |
| size_hint | recommended | Estimated body cost in tokens. The number of tokens the envelope JSON will add to the agent's context if fetched via GET. Operators SHOULD provide this; MAY omit only if computing it is infeasible. |
| seq | yes | Monotonic position in the recipient's mailbox. |
| date_ms | yes | Sender's stated timestamp. |
Why tokens for size_hint
The whole purpose of headers-only push is context economy: the agent decides whether to spend body tokens. Reporting body size in tokens (the unit the agent budgets in) makes that decision direct. The operator computes the token count using its preferred tokenizer; cross-model variance is acceptable because size_hint is advisory, not contractual. For envelopes with file or image parts referenced by URL, size_hint counts the envelope JSON (including the URL metadata) but not the dereferenced file contents; those require a separate fetch with its own cost.
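As a rough sketch of how an operator might derive a push frame from a stored envelope: the helper names are hypothetical, the four-characters-per-token estimate is a crude stand-in for whatever tokenizer the operator actually prefers, and the type_hint rule (a single part type maps to itself, anything else is mixed) is one plausible reading of "dominant content type" rather than a rule stated by the specification.

```python
import json
import math


def type_hint(content_parts: list) -> str:
    """Assumed rule: one distinct part type -> that type; otherwise 'mixed'."""
    kinds = {part["type"] for part in content_parts}
    return kinds.pop() if len(kinds) == 1 else "mixed"


def estimate_tokens(envelope: dict) -> int:
    """Crude size estimate: about 4 characters of envelope JSON per token.

    Only the envelope JSON is counted; files behind URLs are not dereferenced.
    """
    serialized = json.dumps(envelope, separators=(",", ":"))
    return math.ceil(len(serialized) / 4)


def push_frame(envelope: dict, seq: int) -> dict:
    """Build the headers-only notification; content_parts is never copied."""
    frame = {
        "op": "envelope.notify",
        "id": envelope["id"],
        "from": envelope["from"],
        "to": envelope["to"],
        "type_hint": type_hint(envelope["content_parts"]),
        "size_hint": estimate_tokens(envelope),
        "seq": seq,
        "date_ms": envelope["date_ms"],
    }
    if envelope.get("cc"):  # present only if non-empty
        frame["cc"] = envelope["cc"]
    for optional in ("subject", "in_reply_to"):  # omitted when absent
        if optional in envelope:
            frame[optional] = envelope[optional]
    return frame
```

Since size_hint is advisory, an estimate that is off by a modest factor still serves its purpose: letting the agent tell a short note from a long draft before fetching.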
Agent behavior on receiving a push
- Read the header. Context grows by roughly 60 to 100 tokens (the push frame itself).
- Decide if the message is worth opening, based on from, subject, type_hint, size_hint, in_reply_to.
- If yes: call GET /messages/{id}. The body is loaded into context. The envelope is marked read.
- If no: do nothing. The envelope remains in the mailbox, unread.
After processing a batch of headers, the agent SHOULD advance its cursor (POST /mailbox/cursor or WS ack_cursor) past the highest seq it has handled. Cursor advance is client-acked (§9); without an ack the operator will replay the same headers on next reconnect, which is wasteful but not harmful.
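The triage loop above might look like the following on the agent side. The policy here is purely illustrative and not part of the protocol: open a body only if the sender is on a watch list or the envelope replies to a thread the agent cares about, and only while a token budget lasts. The function name, the parameters, and the fallback cost for a missing size_hint are all assumptions.

```python
def triage(headers, token_budget, watched_senders, open_threads,
           unknown_size=1000):
    """Return (ids worth fetching, highest seq seen) for a batch of headers.

    `unknown_size` is an assumed cost for headers that omit size_hint.
    """
    fetch_ids = []
    remaining = token_budget
    highest_seq = 0
    for header in sorted(headers, key=lambda h: h["seq"]):
        # Every header counts as "notified", whether or not its body is opened.
        highest_seq = max(highest_seq, header["seq"])
        relevant = (header["from"] in watched_senders
                    or header.get("in_reply_to") in open_threads)
        cost = header.get("size_hint", unknown_size)
        if relevant and cost <= remaining:
            fetch_ids.append(header["id"])
            remaining -= cost
    return fetch_ids, highest_seq


headers = [
    {"id": "A", "from": "@acme.reviewer", "seq": 7, "size_hint": 300},
    {"id": "B", "from": "@unknown.bot", "seq": 8, "size_hint": 9000},
    {"id": "C", "from": "@other.agent", "seq": 9, "in_reply_to": "ROOT",
     "size_hint": 1200},
    {"id": "D", "from": "@acme.reviewer", "seq": 10, "size_hint": 5000},
]
to_fetch, ack = triage(headers, 2000, {"@acme.reviewer"}, {"ROOT"})
print(to_fetch, ack)  # ['A', 'C'] 10
```

The agent would then fetch the selected ids and advance its cursor to the returned seq; the skipped envelopes stay unread in the mailbox and can be revisited later with GET /mailbox?unread=true.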
9. Cursors and read state
ASMTP tracks two distinct pieces of state per mailbox.
The cursor
A single monotonic integer per mailbox. Marks the highest seq the agent has acknowledged being notified about.
- Client-acked. The operator does not auto-advance the cursor on push delivery; the client explicitly advances it via REST or WS. This is robust to client crashes that occur between push receipt and processing. The cursor pattern is adapted from JMAP's[jmap] cursor-based change tracking; IMAP[imap] uses a related but distinct UID/MODSEQ scheme for the same role.
- An agent reconnecting with a stale cursor receives all envelopes with seq > cursor in order, then transitions to live push.
- Clients persist their cursor locally and send it on subscribe.
Sequence numbering. Per-mailbox seq values are 1-indexed: the first envelope written to a mailbox receives seq = 1. cursor = 0 is the unambiguous initial sentinel meaning "nothing yet acknowledged." high_water_seq for an empty mailbox MUST be 0.
Advance semantics. On POST /mailbox/cursor or WS ack_cursor with value r, operators MUST compute:
new_cursor := max(stored_cursor, min(r, high_water_seq))

and return 200 with new_cursor. Two invariants follow:
- No regression. A request below the stored cursor returns 200 with the unchanged stored cursor. The cursor MUST NOT regress.
- No over-advance. A request above high_water_seq is clamped to high_water_seq. A client that has handled all current envelopes ends up with cursor = high_water_seq, so the next subscribe replays nothing. This prevents a client bug (cursor set too far forward) from silently losing replay coverage for envelopes that land later.
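The advance rule and its two invariants are small enough to check directly. The function name is hypothetical; the formula is the one given above.

```python
def advance_cursor(stored_cursor: int, requested: int, high_water_seq: int) -> int:
    """Clamp the request to the mailbox high-water mark, then never move backwards."""
    return max(stored_cursor, min(requested, high_water_seq))


# No regression: a request below the stored cursor leaves it unchanged.
print(advance_cursor(stored_cursor=10, requested=5, high_water_seq=20))   # 10
# Normal advance within range.
print(advance_cursor(stored_cursor=10, requested=15, high_water_seq=20))  # 15
# No over-advance: a request past the high-water mark is clamped to it.
print(advance_cursor(stored_cursor=10, requested=99, high_water_seq=20))  # 20
# Empty mailbox: cursor stays at the initial sentinel 0.
print(advance_cursor(stored_cursor=0, requested=5, high_water_seq=0))     # 0
```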
Read state
A boolean per envelope in the mailbox. True if the agent has fetched the body.
- Set automatically when the agent calls GET /messages/{id} (or batch GET /messages?ids=...).
- Can be set explicitly via POST /mailbox/read (e.g., for "I processed this elsewhere, mark it read without fetching the body").
- Visible only to the mailbox owner. Senders, cc'd participants, and admin tooling consumed by non-owners never see read flags.
- An agent can query unread messages via GET /mailbox?unread=true.
Why both
Cursor and read flag are distinct because they answer different questions:
- Cursor answers: "What headers have I been notified about? Don't replay these on reconnect."
- Read flag answers: "Which bodies have I actually opened?"
An agent that triages aggressively will have a cursor far ahead of its unread set. It saw 50 headers (cursor advanced past all 50), opened 5 bodies (5 marked read), left 45 unread for later (or never). This matches IMAP exactly: UIDNEXT tracks new arrivals; the \Seen flag tracks per-envelope read state. The agent gets the same fine-grained control over its inbox that a human gets in their email client.
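A toy in-memory model makes the separation concrete. This is an illustrative sketch only: the class and method names are hypothetical, persistence and authentication are ignored, and the model exists solely to show that cursor position and read flags move independently.

```python
class Mailbox:
    """Minimal single-owner mailbox: per-mailbox seq, one cursor, per-envelope read flag."""

    def __init__(self):
        self._by_seq = {}        # seq -> envelope
        self._read = set()       # ids of envelopes whose body was fetched or marked
        self.cursor = 0          # 0 means "nothing yet acknowledged"
        self.high_water_seq = 0  # 0 for an empty mailbox

    def deliver(self, envelope: dict) -> int:
        """Store an envelope and assign the next 1-indexed seq."""
        self.high_water_seq += 1
        self._by_seq[self.high_water_seq] = envelope
        return self.high_water_seq

    def replay(self) -> list:
        """Seqs a reconnecting client would be sent: everything past the cursor."""
        return [seq for seq in sorted(self._by_seq) if seq > self.cursor]

    def ack(self, requested: int) -> int:
        """Client-acked cursor advance with the clamp-and-no-regress rule."""
        self.cursor = max(self.cursor, min(requested, self.high_water_seq))
        return self.cursor

    def fetch(self, envelope_id: str) -> dict:
        """Return the full envelope and mark it read, as a body fetch does."""
        for envelope in self._by_seq.values():
            if envelope["id"] == envelope_id:
                self._read.add(envelope_id)
                return envelope
        raise KeyError(envelope_id)

    def unread_ids(self) -> list:
        return [e["id"] for _, e in sorted(self._by_seq.items())
                if e["id"] not in self._read]


box = Mailbox()
for n in range(1, 51):
    box.deliver({"id": f"env-{n}"})
box.ack(50)                      # all 50 headers seen
for n in range(1, 6):
    box.fetch(f"env-{n}")        # only 5 bodies opened
print(box.replay(), len(box.unread_ids()))  # [] 45
```

The final line mirrors the scenario in the paragraph above: nothing is replayed on reconnect, yet 45 envelopes remain unread and available.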
10. Transport
Two wire surfaces. Always both available. Always consistent.
A note on URLs. Mailbox-scoped endpoints (/mailbox, /mailbox/cursor, /mailbox/read) carry no handle in the path. The auth token identifies the calling agent, and an agent has exactly one mailbox (§6). There is no surface for addressing another agent's mailbox.
10.1 REST
Six endpoints. All require Authorization: Bearer <agent-token> (§10.4). Bodies are JSON.
Send an envelope. Client supplies everything except from; the operator stamps from from the bearer token, plus received_ms and per-recipient seq internally.
POST /messages
Body: envelope WITHOUT `from` (see §7 and envelope-post.json)
Returns: 202 {
id,
received_ms: <int>,
recipients: [{ handle }, ...] // only successfully-stored recipients
}
Errors:
400 client supplied a `from` (or other operator-stamped field)
401 missing or invalid authentication
404 recipient handle does not exist OR trust-policy denied (non-enumerating).
Applies to the entire send (see "Multi-recipient atomicity" below).
409 duplicate (from, id) with a non-equivalent envelope body
413 payload too large (operator policy)
429 rate limited (operator policy)
List headers from your own mailbox. Returns push-frame-shaped headers (§8); bodies require explicit fetch.
GET /mailbox?since=<seq>&unread=<bool>&limit=<int>
Returns: {
envelope_headers: [ /* push-frame shape; see §8 */ ],
high_water_seq: <int>
}
Defaults: since=0, no unread filter,
limit per operator policy (reference: 100; cap 1000).
Fetch one body. Caller MUST be in to+cc; the sender (from) is not a recipient and receives 404. Fetching marks the envelope read for the caller.
GET /messages/{id}
Returns: full envelope (envelope.json shape)
Errors: 404 if not found OR not entitled (non-enumerating; one code)
Fetch multiple bodies in one round-trip. Marks each returned envelope read. Envelopes the caller is not entitled to are silently omitted; operators MUST NOT return per-id error codes that distinguish "not found" from "not entitled."
GET /messages?ids=<id1>,<id2>,...
Returns: 200 { envelopes: [ /* up to N full envelopes, request order, deduped */ ] }
Normative semantics for the ids parameter:
- Encoding: a single query parameter whose value is a comma-separated list of envelope ids. Repeated ?ids= parameters are NOT supported and SHOULD return 400.
- Maximum count: operators MUST cap the list. The default cap is 100 ids per request; operators MAY publish a different cap. Requests over the cap SHOULD return 400.
- Duplicates: duplicate ids in the request are deduped server-side; the response contains each entitled envelope at most once.
- Ordering: the response envelopes array is ordered by first-occurrence in the request ids list (after dedupe). Omitted (unentitled / unknown) ids leave no gap.
- Empty result: if no requested ids are entitled or known, the response is 200 with { "envelopes": [] } (not 404).
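The ids rules above can be sketched as a small operator-side helper. This is an illustration under assumptions, not the reference implementation: `parse_ids`, `batch_fetch`, and the dict-backed `store` are hypothetical names, empty list segments are skipped, and the cap is applied before dedupe (the text does not say which comes first). Marking the returned envelopes read is left out for brevity.

```python
MAX_IDS = 100  # default cap from the text; operators MAY publish another


def parse_ids(raw: str, cap: int = MAX_IDS) -> list:
    """Split a comma-separated ids value and dedupe, keeping first occurrence.

    Raises ValueError when the list exceeds the cap; an operator would
    translate that into the 400 the text recommends.
    """
    parts = [p for p in raw.split(",") if p]   # assumption: skip empty segments
    if len(parts) > cap:
        raise ValueError("too many ids")
    return list(dict.fromkeys(parts))          # dicts preserve insertion order


def batch_fetch(raw_ids: str, store: dict, caller: str) -> dict:
    """Return entitled envelopes in request order; omit everything else."""
    envelopes = []
    for eid in parse_ids(raw_ids):
        env = store.get(eid)
        # Unknown and unentitled ids take the same silent path (non-enumerating).
        if env is not None and caller in env["to"] + env.get("cc", []):
            envelopes.append(env)
    return {"envelopes": envelopes}


store = {
    "a": {"id": "a", "to": ["@me"], "cc": []},
    "b": {"id": "b", "to": ["@other"], "cc": []},
    "c": {"id": "c", "to": ["@x"], "cc": ["@me"]},
}
result = batch_fetch("c,a,zzz,b,c", store, "@me")
print([e["id"] for e in result["envelopes"]])  # ['c', 'a']
```

Because unknown and unentitled ids are dropped by the same branch, the response shape cannot be used to tell them apart.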
Advance the read cursor. Cursor advance is cursor := max(stored, request); a request below the stored cursor MUST return 200 with the unchanged stored cursor (the cursor MUST NOT regress).
POST /mailbox/cursor
Body: { cursor: <int> }
Returns: 200 { cursor: <int> }
Mark envelopes read without fetching their bodies. Used when the agent processed an envelope through another channel (e.g., a separately-pulled file) and wants to clear the unread flag without paying body tokens.
POST /mailbox/read
Body: { ids: ["id1", "id2", ...] } // non-empty
Returns: 200 { read: ["id1", "id2", ...] }
REST is the floor of the protocol. It works through any HTTPS-permitting path, including firewalled environments that drop long-lived WebSockets. A script-style agent that runs once an hour and never holds a connection can participate fully via REST.
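For a script-style agent, the REST surface needs nothing beyond an HTTP client. The sketch below only builds the requests with the Python standard library and does not send them; `BASE_URL`, the token value, and the `build_request` helper are assumptions for illustration, since the spec does not fix an operator origin.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://operator.example"   # hypothetical operator origin


def build_request(token, method, path, query=None, body=None):
    """Build (but do not send) an authenticated ASMTP REST request."""
    url = BASE_URL + path
    if query:
        url += "?" + urllib.parse.urlencode(query)
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Authorization": f"Bearer {token}"}
    if data is not None:
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


# One hourly wake-up: list new headers, then acknowledge them.
list_req = build_request("tok", "GET", "/mailbox",
                         query={"since": 4310, "unread": "true"})
ack_req = build_request("tok", "POST", "/mailbox/cursor",
                        body={"cursor": 4357})

print(list_req.get_method(), list_req.full_url)
print(ack_req.get_method(), ack_req.full_url, ack_req.data)
```

Passing either request to `urllib.request.urlopen` would perform the call against a real operator; no long-lived connection is involved at any point.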
Idempotency
The (from, id) pair is the idempotency key. Two POSTs with the same (from, id) are idempotent if all of to, cc, in_reply_to, references, subject, content_parts, and monitor match; the second returns the original 202. A second POST with the same (from, id) and any of those fields different MUST be rejected with 409 and a body that does NOT echo the original envelope's recipients or content (otherwise a guessed id could be used to enumerate). The sender's date_ms is informational and excluded from the equivalence check, so a retry with a refreshed timestamp succeeds. Operators MUST evaluate trust-policy denials (404) before idempotency collisions (409); the inverse ordering leaks the original recipient set.
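A minimal sketch of the equivalence check, assuming trust-policy checks have already passed. The `records` dict, the `check_duplicate` helper, and the 409 body shape are hypothetical; only the compared field list and the exclusion of date_ms come from the text.

```python
# Fields compared for idempotency; date_ms is deliberately absent.
EQUIVALENCE_FIELDS = ("to", "cc", "in_reply_to", "references",
                      "subject", "content_parts", "monitor")


def equivalent(a: dict, b: dict) -> bool:
    """True when every compared field matches (missing fields compare as None)."""
    return all(a.get(f) == b.get(f) for f in EQUIVALENCE_FIELDS)


def check_duplicate(records: dict, sender: str, envelope: dict):
    """Return (status, body) for a duplicate (from, id), or None if the key is new."""
    prior = records.get((sender, envelope["id"]))
    if prior is None:
        return None
    if equivalent(prior["envelope"], envelope):
        return 202, prior["response"]        # replay the original 202 body
    # The conflict body must not echo the original recipients or content.
    return 409, {"error": "conflict"}


first = {"id": "e1", "to": ["@b"], "subject": "hi", "date_ms": 1,
         "content_parts": [{"type": "text", "text": "hello"}]}
original = {"id": "e1", "received_ms": 5, "recipients": [{"handle": "@b"}]}
records = {("@a", "e1"): {"envelope": first, "response": original}}

retry = check_duplicate(records, "@a", {**first, "date_ms": 999})
clash = check_duplicate(records, "@a", {**first, "subject": "changed"})
print(retry[0], clash[0])  # 202 409
```

A retry with a refreshed timestamp replays the stored response, while any change to a compared field yields a conflict that reveals nothing about the original envelope.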
Multi-recipient atomicity
A POST /messages with multiple to/cc recipients is all-or-nothing: either the envelope is durably stored in every recipient's mailbox before the 202 is returned, or no mailbox is mutated and the operator returns a 4xx. The recipients array in the 202 lists every recipient that accepted (i.e., all of them, in the conformant happy path). Asynchronous post-accept denials (e.g., recipient deletion, retention expiry) are surfaced via Monitor facts (§11); synchronous denials at accept time use 404 with no fact emitted.
Evaluation order on multi-recipient sends. The operator MUST evaluate trust-policy and recipient-existence (404) for all recipients before evaluating idempotency (409) on the (from, id) pair. If any recipient fails the 404 check, the operator MUST return 404 without disclosing which recipient failed, and MUST NOT advance the idempotency record. This ordering preserves the non-enumerating contract: a sender cannot use a colliding (from, id) to probe whether any new recipient is reachable.
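The evaluation order and the all-or-nothing rule can be sketched together as one accept function. This is an in-memory illustration under assumptions: `accept`, the `allowed` trust callback, and the dict-backed stores are hypothetical, and a real operator would need a transactional write to get the same atomicity against durable storage.

```python
FIELDS = ("to", "cc", "in_reply_to", "references",
          "subject", "content_parts", "monitor")


def accept(mailboxes, records, allowed, sender, envelope, now_ms):
    """Return (status, body) for a POST /messages, in the order the text requires."""
    recipients = list(dict.fromkeys(envelope["to"] + envelope.get("cc", [])))

    # 1. Existence and trust policy for ALL recipients, before anything else.
    if any(r not in mailboxes or not allowed(sender, r) for r in recipients):
        return 404, {}          # no hint about which recipient failed

    # 2. Only now consult the idempotency record for (from, id).
    key = (sender, envelope["id"])
    prior = records.get(key)
    if prior is not None:
        same = all(prior["envelope"].get(f) == envelope.get(f) for f in FIELDS)
        return (202, prior["response"]) if same else (409, {})

    # 3. Store in every mailbox; nothing above has mutated any state.
    stored = {**envelope, "from": sender, "received_ms": now_ms}
    for r in recipients:
        mailboxes[r].append(stored)
    response = {"id": envelope["id"], "received_ms": now_ms,
                "recipients": [{"handle": r} for r in recipients]}
    records[key] = {"envelope": envelope, "response": response}
    return 202, response


mailboxes = {"@a": [], "@b": []}
records = {}
allow_all = lambda sender, recipient: True
env = {"id": "e1", "to": ["@a"], "cc": ["@b"], "date_ms": 1,
       "content_parts": [{"type": "text", "text": "hi"}]}

ok = accept(mailboxes, records, allow_all, "@s", env, 10)
probe = accept(mailboxes, records, allow_all, "@s",
               {**env, "to": ["@a", "@ghost"]}, 11)
print(ok[0], probe[0])  # 202 404
```

The probe reuses a known (from, id) with an extra recipient and still gets 404 rather than 409, so the collision cannot be used to test reachability.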
10.2 WebSocket
WS /connect
Auth: Authorization: Bearer <agent-token> on the upgrade request.
On auth failure the operator MUST close with code 1008.
Client → Server frames:
{ "op": "subscribe", "cursor": <int> }
Start delivery. Operator replays envelope.notify frames for seq > cursor,
then transitions to live push.
{ "op": "ack_cursor", "cursor": <int> }
Advance the cursor (equivalent to POST /mailbox/cursor).
Server → Client frames:
{ "op": "envelope.notify", ...header fields per §8 }
{ "op": "monitor.fact", ...see §11 }
First frame. The client's first frame after the upgrade MUST be subscribe. Operators MUST close the connection with code 1003 if the first frame is any other op, or if cursor is missing or not an integer. This is a hard handshake: an agent cannot stream ack_cursor or anything else before declaring its replay position.
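The handshake rule reduces to a small validation step on the operator side. The sketch below is an assumption-laden illustration: `check_first_frame` is a hypothetical helper, and treating unparseable JSON as a 1003 close is a choice made here, not something the text specifies.

```python
import json

CLOSE_UNSUPPORTED_DATA = 1003   # close code required by the handshake rule


def check_first_frame(raw: str):
    """Return (cursor, None) for a valid subscribe, else (None, close_code)."""
    try:
        frame = json.loads(raw)
    except json.JSONDecodeError:
        return None, CLOSE_UNSUPPORTED_DATA      # assumption: same code as a bad op
    if not isinstance(frame, dict) or frame.get("op") != "subscribe":
        return None, CLOSE_UNSUPPORTED_DATA
    cursor = frame.get("cursor")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(cursor, bool) or not isinstance(cursor, int):
        return None, CLOSE_UNSUPPORTED_DATA
    return cursor, None


print(check_first_frame('{"op": "subscribe", "cursor": 4310}'))   # (4310, None)
print(check_first_frame('{"op": "ack_cursor", "cursor": 4310}'))  # (None, 1003)
print(check_first_frame('{"op": "subscribe"}'))                   # (None, 1003)
```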
The WebSocket is purely a notification surface. To act on a notification (fetch a body, send a reply, mark read), the agent calls a REST endpoint. Notifications are cheap and ephemeral; state changes are explicit and persistent. A client whose socket flaps loses no data; the next pull or subscribe-with-cursor retrieves the missed notifications.
Multiple connections per agent
A single agent identity MAY hold multiple simultaneous WebSocket connections. Every live connection for the same handle receives the same envelope.notify and monitor.fact fan-out. The mailbox cursor is shared across connections; an ack_cursor on connection A is visible to connection B. Clients that need independent reading positions MUST use distinct handles.
Replay, backpressure, and token expiry
On subscribe, the operator replays envelope notifications with seq > cursor in monotonic order before transitioning to live push. Operators SHOULD bound replay (e.g., a few thousand frames per subscribe) and apply per-connection backpressure rather than buffering unboundedly. A client dropped due to backpressure recovers losslessly via its cursor on reconnect; replay is at-least-once and clients SHOULD dedupe by (envelope_id, seq). On auth-token expiry the operator MUST close with code 1008; the client reconnects with a fresh token. The operator MUST NOT surface signals (timing, error codes, response shapes) that would let a sender infer whether a recipient currently holds a live connection.
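Because replay is at-least-once, a client needs a small amount of bookkeeping to drop repeats. A minimal sketch, assuming the notify frame carries the envelope id in its id field as in the §13 examples; the `NotifyDeduper` class is hypothetical and its seen-set is unbounded, which a long-running client would want to prune below the acknowledged cursor.

```python
class NotifyDeduper:
    """Client-side filter for replayed envelope.notify frames."""

    def __init__(self):
        self.seen = set()      # (envelope id, seq) pairs already handled
        self.max_seq = 0       # highest seq observed so far

    def accept(self, frame: dict) -> bool:
        """Return True the first time a frame is seen, False on a repeat."""
        key = (frame["id"], frame["seq"])
        if key in self.seen:
            return False
        self.seen.add(key)
        self.max_seq = max(self.max_seq, frame["seq"])
        return True

    def ack_frame(self) -> dict:
        """Build the ack_cursor frame for everything seen so far."""
        return {"op": "ack_cursor", "cursor": self.max_seq}


dedupe = NotifyDeduper()
frames = [
    {"op": "envelope.notify", "id": "a", "seq": 1},
    {"op": "envelope.notify", "id": "b", "seq": 2},
    {"op": "envelope.notify", "id": "a", "seq": 1},   # replayed after a reconnect
]
fresh = [f["id"] for f in frames if dedupe.accept(f)]
print(fresh, dedupe.ack_frame())
```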
10.3 Consistency between REST and WebSocket
Both surfaces operate over the same mailbox. The data returned by GET /mailbox is identical in shape and content to the sequence of envelope.notify frames a WS subscriber would have received over the same range. There is no surface where one shape sees a state the other doesn't.
10.4 Authentication
Each request and WebSocket upgrade MUST carry an Authorization: Bearer <token> header (or equivalent operator-specific scheme; see ASP §6.1). The operator resolves the token to an agent handle; that handle is the calling agent's identity for the request. Mailbox-scoped endpoints (/mailbox, /mailbox/cursor, /mailbox/read) act on the calling agent's own mailbox by construction; there is no path-level handle to spoof. The sender-of-record on POST /messages is stamped by the operator from the same resolved identity (§7).
The specific mechanism for issuing tokens (OAuth client-credentials, DIDs, signed messages, etc.) is operator-defined; ASMTP only requires that some authentication mechanism binds tokens to agent identities. ASP §6.1 is the inherited identity layer.
11. Monitor: sender-side observability
Optional. Opt-in per envelope.
A sender MAY include a monitor handle in an outbound envelope:
{
"id": "01HW7Z9KQX1MS2D9P5VC3GZ8AB",
"monitor": "mon_msa_review_2026q2",
...
}
For envelopes with a monitor, the operator emits monitor facts to the sender, derived from operator-observable transport state. The receiver emits nothing.
The monitor facts
| Fact | Required? | Trigger |
|---|---|---|
| stored | MUST emit | Envelope durably written to recipient mailbox. Fires synchronously per recipient on a successful POST /messages. This is the load-bearing delivery confirmation. |
| bounced | MAY emit | A previously-accepted envelope cannot be delivered to a recipient for reasons that surface after the synchronous accept. Concrete triggers are operator-policy (e.g., recipient mailbox deletion, post-accept policy changes). Synchronous denials at accept time use 404 with no fact (see §10.1). |
| expired | MAY emit | Envelope retention elapsed without the recipient consuming it. The threshold and "consumption" semantics are operator-policy. |
bounced and expired are operator-policy signals: an operator that does not implement them is still conformant. Senders MUST NOT assume their presence and MUST NOT assume their absence implies successful delivery beyond stored.
What is NOT a monitor fact
- No `fetched`. Whether the recipient's runtime has fetched the body is private to the recipient. The operator knows it (read flag), but MUST NOT emit it to the sender. This matches email's default and is enforced by Appendix A.
- No `read`. Read state is the mailbox owner's private state. Period.
- No `seen` / `acknowledged` / `working`. The receiver emits nothing on the wire.
The asymmetry is deliberate. The sender learns operator-observable transport facts on opt-in; the receiver's behavior beyond writing reply envelopes is unobservable. If the sender wants to know whether the recipient engaged, the signal is a reply envelope: content the recipient chose to write.
Forward compatibility
Future revisions MAY add new facts to this set (e.g., delayed for over-quota recipients). Clients MUST silently ignore monitor facts whose fact value they do not recognize; operators MUST NOT depend on clients reacting to facts beyond the set defined here.
Monitor handle scoping
Monitor handles are sender-scoped. Two senders MAY use the same monitor string (mon_x123) without collision; operators MUST key monitor state by (sender, monitor) pairs, not by monitor handle alone. Senders MUST NOT use the mon_op_* prefix; that namespace is reserved for operator extensions.
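Sender scoping amounts to using a composite key. The sketch below is illustrative only: `MonitorRegistry` and `sender_may_use` are hypothetical names, and the text does not say how an operator should respond to a sender that uses the reserved prefix, so the helper only reports the violation.

```python
from collections import defaultdict

RESERVED_PREFIX = "mon_op_"   # namespace reserved for operator extensions


def sender_may_use(monitor: str) -> bool:
    """Senders MUST NOT pick handles inside the operator-reserved namespace."""
    return not monitor.startswith(RESERVED_PREFIX)


class MonitorRegistry:
    """Monitor state keyed by (sender, monitor), never by monitor alone."""

    def __init__(self):
        self._facts = defaultdict(list)

    def record(self, sender: str, monitor: str, fact: str) -> None:
        self._facts[(sender, monitor)].append(fact)

    def facts_for(self, sender: str, monitor: str) -> list:
        return list(self._facts.get((sender, monitor), []))


registry = MonitorRegistry()
registry.record("@alice", "mon_x123", "stored")
registry.record("@bob", "mon_x123", "stored")
registry.record("@bob", "mon_x123", "expired")

print(registry.facts_for("@alice", "mon_x123"))  # ['stored']
print(registry.facts_for("@bob", "mon_x123"))    # ['stored', 'expired']
```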
Delivery of monitor facts
Monitor facts are delivered to the sender's own mailbox as envelopes from @operator.postmaster carrying a data content part with schema monitor.v1, and equivalently as monitor.fact events on the sender's WebSocket. Both surfaces carry identical data; operators MUST emit both. Clients MAY consume either or both; clients that consume both SHOULD dedupe by (monitor, envelope_id, recipient_handle, fact).
The @operator.postmaster handle is operator-reserved. Operators MUST NOT issue credentials for this handle (or any @operator.* handle) to non-operator principals. Operators MUST reject POST /messages with from: @operator.* from any non-operator-internal source (403). The reservation namespace is @operator.* (any agent under the operator owner). Postmaster envelopes are exempt from allowlist enforcement on inbound: every agent's mailbox accepts envelopes from @operator.postmaster regardless of the agent's inbound policy.
{
"op": "monitor.fact",
"monitor": "mon_msa_review_2026q2",
"envelope_id": "01HW7Z9KQX1MS2D9P5VC3GZ8AB",
"recipient_handle": "@b.agent",
"fact": "stored",
"at_ms": 1747156800000
}
Monitor is per-envelope opt-in. An envelope without monitor causes no facts to be tracked or emitted. Most envelopes will not request monitors.
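On the client side, consuming facts from both surfaces comes down to two rules from this section: ignore fact values the client does not recognize, and dedupe by (monitor, envelope_id, recipient_handle, fact). A minimal sketch, assuming the postmaster envelope's data part uses the same keys as the WebSocket frame; the `FactLog` class is hypothetical.

```python
KNOWN_FACTS = {"stored", "bounced", "expired"}   # the set defined in this section


class FactLog:
    """Collects monitor facts from the mailbox and the WebSocket without repeats."""

    def __init__(self):
        self._seen = set()
        self.facts = []

    def ingest(self, fact: dict) -> bool:
        """Record a fact once; return False for unknown or repeated facts."""
        if fact.get("fact") not in KNOWN_FACTS:
            return False                          # forward compatibility: ignore silently
        key = (fact["monitor"], fact["envelope_id"],
               fact["recipient_handle"], fact["fact"])
        if key in self._seen:
            return False
        self._seen.add(key)
        self.facts.append(fact)
        return True


stored = {
    "op": "monitor.fact",
    "monitor": "mon_msa_review_2026q2",
    "envelope_id": "01HW7Z9KQX1MS2D9P5VC3GZ8AB",
    "recipient_handle": "@b.agent",
    "fact": "stored",
    "at_ms": 1747156800000,
}
log = FactLog()
results = [
    log.ingest(stored),                         # from the WebSocket
    log.ingest(dict(stored)),                   # same fact via the postmaster envelope
    log.ingest({**stored, "fact": "delayed"}),  # a fact from some future revision
]
print(results, len(log.facts))  # [True, False, False] 1
```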
12. Multi-party
to and cc are arrays. A single POST /messages either writes the envelope to every listed recipient's mailbox or writes to none (see §10.1, "Multi-recipient atomicity"). Each recipient sees the envelope independently in their own mailbox; each maintains its own read state and its own per-mailbox seq.
Threading
Replies follow the email convention:
- Set in_reply_to to the parent envelope id.
- Set references to the parent's references plus the parent id (chained, oldest first).
- By convention (not enforced), reply-all sets to to the parent's from + (to ∪ cc) \ self.
There is no membership object. A thread is the transitive closure of references chains among a set of envelopes. Operator UIs may present threads as views over references roots; that is a presentation concern, not a wire concern.
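The threading convention can be sketched as a reply builder. This is an illustration under assumptions: `build_reply` is a hypothetical helper, the shape of the text content part (a text key) is assumed rather than taken from §7, and from is left out because the operator stamps it.

```python
def build_reply(parent: dict, self_handle: str, reply_id: str,
                text: str, date_ms: int, reply_all: bool = False) -> dict:
    """Construct a reply envelope body for POST /messages (without from)."""
    # Chain references oldest first: the parent's chain, then the parent itself.
    references = list(parent.get("references", [])) + [parent["id"]]
    if reply_all:
        # from + (to ∪ cc) \ self, preserving order and dropping duplicates.
        everyone = [parent["from"]] + parent.get("to", []) + parent.get("cc", [])
        to = [h for h in dict.fromkeys(everyone) if h != self_handle]
    else:
        to = [parent["from"]]
    return {
        "id": reply_id,
        "to": to,
        "in_reply_to": parent["id"],
        "references": references,
        "date_ms": date_ms,
        "content_parts": [{"type": "text", "text": text}],  # assumed part shape
    }


parent = {"id": "p", "from": "@a", "to": ["@me", "@b"], "cc": ["@c"],
          "references": ["r1"]}
reply = build_reply(parent, "@me", "q", "Looks fine.", 1747156800000,
                    reply_all=True)
print(reply["to"], reply["references"])  # ['@a', '@b', '@c'] ['r1', 'p']
```

Walking any envelope's references list from the front recovers the thread root, which is all a client needs to group envelopes without a membership object.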
Per-recipient observability
For envelopes with a monitor, monitor facts fire per recipient. An envelope sent to three recipients with a monitor yields up to three stored facts, zero or more bounced facts, and zero or more expired facts. The sender's view shows each recipient's transport outcome independently.
Bulk fan-out
For sends to many recipients, the protocol does not aggregate. The operator MAY offer a POST /messages:fanout convenience endpoint that accepts one envelope and a recipient list and explodes into N inserts server-side, but the wire effect is identical to N individual sends.
13. Anatomy of a delivery
This section walks several end-to-end interactions on the wire. The handles, envelopes, and frame payloads below are wire shapes, not pseudo-code. The walkthroughs exercise every mechanic in §6–§12 and put concrete numbers on the protocol's properties.
13.1 Both online: the happy path
@nick.dev (Claude Code, WS connected) sends a short text question to @infra.bot (daemon, always connected). At T+0, @nick.dev POSTs an envelope with to: ["@infra.bot"] and a single text part. The operator stamps from from the bearer token, stamps received_ms, assigns a per-mailbox seq of 4421, and returns 202 at T+12ms with { id, received_ms, recipients: [{handle: "@infra.bot"}] }. Three milliseconds later it pushes a header-only notification on @infra.bot's WebSocket:
{ "op": "envelope.notify", "id": "01HW7Z9KQX1MS2D9P5VC3GZ8AB",
"from": "@nick.dev", "to": ["@infra.bot"],
"type_hint": "text", "size_hint": 12, "seq": 4421, "date_ms": 1747156800000 }
@infra.bot's runtime reads the header (no subject; type_hint=text, size_hint=12), decides to open, calls GET /messages/{id}, and the envelope is marked read for @infra.bot. It composes a reply with in_reply_to and references pointing at the original, POSTs at T+340ms, and the reply is stored in @nick.dev's mailbox and pushed on its WebSocket. Round-trip ~350ms; total context spent by @infra.bot to triage and consume: ~75 tokens (~50 for the push frame, ~25 for the body).
13.2 Inbox triage: 47 unread on Monday morning
@nick.dev opens Claude Code after a weekend away with a stored cursor of 4310. The harness pulls headers before invoking the LLM: GET /mailbox?since=4310 returns 47 envelope headers and high_water_seq: 4357, costing ~3,800 tokens of context (~80 tokens per header). The LLM triages in a single model turn: three headers with in_reply_to matching outgoing messages are opened; six of twelve @infra.bot text notices with size_hint<50 are worth reading; one of eight @vendor.* mixed envelopes (size_hint>2000) is a priority dispatch and is opened, the other seven are deferred; the 24 @team.* messages are skipped for now. The harness calls GET /messages?ids=<10 ids>, receives ~4,000 tokens of bodies (average ~400 tokens each), and acts on them. It then advances the cursor with POST /mailbox/cursor to 4357.
Net result: 47 messages triaged, 10 bodies fetched, 37 left unread for later (queryable with GET /mailbox?unread=true), cursor at 4357 (no replay on next reconnect). Total context: ~7,800 tokens (headers + selected bodies). Compared to ~19,000 tokens of fetching all 47 bodies eagerly, ~2.4x cheaper; compared to ~80,000+ tokens if the protocol pushed bodies, ~10x cheaper. The savings compound across every wake-up and every agent.
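The arithmetic behind these figures can be reproduced directly from the per-header and per-body averages quoted above. The variable names are illustrative; the ~80,000-token figure for pushed bodies is not derivable from the stated averages and is left out.

```python
headers = 47
tokens_per_header = 80     # average quoted for a push-frame-shaped header
bodies_opened = 10
tokens_per_body = 400      # average quoted for a fetched body

header_cost = headers * tokens_per_header        # 3760, quoted as ~3,800
body_cost = bodies_opened * tokens_per_body      # 4000
triage_total = header_cost + body_cost           # 7760, quoted as ~7,800
eager_total = headers * tokens_per_body          # 18800, quoted as ~19,000

print(triage_total, eager_total, round(eager_total / triage_total, 1))
# 7760 18800 2.4
```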
13.3 Long async task across sleep/wake
@nick.deals asks @law.contracts to review an MSA on Monday at 4:00pm. Both are intermittent. The request envelope carries subject: "MSA review: Globex deal", a monitor: "mon_msa" for delivery observability, a text part with the review ask, and a file part referencing the PDF:
{ "type": "file", "url": "asp://files/msa-v3.pdf",
"mime_type": "application/pdf", "name": "msa-v3.pdf" }
@law.contracts is offline; the envelope lands in its durable mailbox. The operator synchronously emits monitor.fact { mon_msa, stored } into @nick.deals's mailbox as a postmaster envelope (and as a WS frame if @nick.deals is connected). At 4:02pm @nick.deals closes Claude Code; its WebSocket drops. The protocol has no opinion about this.
Tuesday 9:30am: @law.contracts's daemon starts, opens a WebSocket with its persisted cursor. The operator replays headers in seq order; the MSA-review notification shows type_hint=mixed, size_hint=85, subject="MSA review: Globex deal". The daemon opens the body, fetches the referenced PDF separately (the file part is a URL, not bytes), reads, drafts a reply. At 2:00pm it POSTs the reply: in_reply_to and references pointing at 01HXKB01..., a text part with three concerns, and a structured data part tagged schema: "contract.review.v1" carrying { risk: "medium", blockers: ["8.2", "11.4"] } for downstream automation.
Tuesday 7:00pm: @nick.deals opens Claude Code at home. The WebSocket reconnects with its persisted cursor; the operator replays yesterday's mon_msa stored fact and pushes a notification for the reply. The harness recognizes in_reply_to: 01HXKB01... as an open thread, fetches the body, presents the review.
Wall-clock: 27 hours (Monday 4:00pm to Tuesday 7:00pm). Nick's harness was closed for all but about two minutes of that, from 4:02pm Monday until it reopened. The protocol noticed nothing; the mailbox held everything. Nick never saw a fetched monitor fact (no such fact exists). Nick saw stored (operator-confirmed delivery) and then the reply (the actual signal).
14. What this enables
ASMTP is small enough to fit in §5–§12. Its consequences are larger than its surface. A few concrete patterns it makes tractable:
Long-async delegation across harness sleep/wake
An agent sends a request to another agent, then its harness exits. Hours or days later, both agents wake, independently, on different schedules, owned by different parties. The recipient's mailbox holds the request; the sender's mailbox holds the reply when it lands. No state was lost. No timeouts fired. The protocol does not distinguish between "B took 30 seconds" and "B took 30 days"; they are the same code path. This is the killer property for harness-driven agents: long-running collaborative work survives every party's intermittence without protocol effort.
Context-economical inbox triage
An agent that wakes to a backlog (vendor responses, alerts, threads it left open) reads only headers for triage and pays body tokens only on what it chooses to open. §4 establishes the property; §13.2 puts numbers on a 47-message wake-up. At scale, across many agents and many wake-ups, the difference between header-cost and body-cost is the difference between a viable network and an uneconomical one.
Verified delivery without recipient cooperation
A sender attaches a monitor to an envelope. The operator emits stored synchronously with envelope acceptance; no recipient action required. For SLA-bearing flows (a legal handoff, a deal confirmation, an alert that must reach an on-call agent), the sender gets transport-grade confirmation without conscripting the receiver into the observability path. The pattern is Certified Mail's[certified-mail], operationally validated since 1955, expressed in bytes.
These three are starting points, not boundaries. Anything cross-agent, asynchronous, and context-sensitive becomes simpler with ASMTP and harder without it.
15. Comparison
The table below maps the relevant systems against the structural properties laid out in §2.
| Protocol | Asynchronous | Receiver emits no state | Header / body split | Mailbox-shaped | Designed for LLM agents |
|---|---|---|---|---|---|
| ASMTP | ✓ | ✓ | ✓ | ✓ | ✓ |
| ASP v0.1[asp] | partial | ✗ | ✗ | ✗ | ✓ |
| A2A[a2a] | partial | ✗ | ✗ | ✗ | ✓ |
| MCP[mcp] | ✗ | n/a (RPC) | ✗ | ✗ | partial |
| Vendor APIs | ✗ | n/a (RPC) | ✗ | ✗ | partial |
| Email (SMTP + IMAP) | ✓ | ✓ | ✓ | ✓ | ✗ (human-shaped) |
| Matrix[matrix] | ✓ | partial | ✗ | ✗ (rooms) | ✗ (human-shaped) |
| Erlang / OTP[erlang] | ✓ | ✓ | ✗ | ✓ | partial (substrate) |
Notes per row:
- ASP v0.1. The session primitive forces receiver-emitted lifecycle state (session.joined, session.left). Identity and trust layers are sound and inherited verbatim into ASMTP.
- A2A. Right scope (agent-to-agent) but RPC-shaped and task-lifecycle-emitted; conflicts with intermittent runtimes. Cross-organization partial-federation via well-known endpoints, not a true network.
- MCP. Tool / resource access; client/server shape; not designed for agent-to-agent peer flow. The 2026 roadmap names this gap directly.
- Vendor APIs. Agent-to-vendor RPC. No cross-vendor or cross-organization addressability.
- Email (SMTP + IMAP). The structural antecedent. Right shape on every axis except agent-awareness; human-shaped semantics; spam unsolved at federation scale and mitigated only by a handful of mega-providers.
- Matrix. Strong primitives for federated chat; human identity and room model do not fit agent runtimes; trust affordances are human-shaped.
- Erlang / OTP. The other structural antecedent. Mailbox as the only primitive, receiver emits nothing, send is one-way. Single-cluster, not internet-scale; ASMTP takes the same primitives and runs them over HTTP/WebSocket with explicit identity and trust layers.
No system in the table other than ASMTP satisfies all five properties at once (asynchronous, receiver emits no state, header/body split, mailbox-shaped, designed for LLM agents). The protocol is shaped for that intersection.
16. Scope
ASMTP specifies the transfer protocol and is complete at that layer. The items below are the protocol's stance on the concerns above and around it, not a roadmap: E2EE is on the near-term extension path; the rest are deliberately not in the design.
Federation across operators. ASMTP networks are operator-controlled by design. The trust model (allowlist, block, non-enumerating 404) inherited from ASP §6.2 lives at the operator boundary; abuse mitigation, reachability rules, and capability gating all depend on the operator being the single party that approves every envelope entering the network. Federation removes that property: an operator cannot enforce its reachability rules against agents reachable on another operator without treaty mechanics no transfer protocol can impose. ASMTP does not follow email's federated trajectory. Networks can be arbitrarily large; multiple networks can coexist; the same handle on different networks identifies different agents. Cross-network handle resolution is not specified, and the omission is structural, not pending: operator-controlled reachability is a feature of how ASMTP networks work.
A consequence for implementers: envelope id is sender-allocated and unique only within the issuing sender. The protocol's effective primary key is (from, id), not id alone, and operators MUST honor this scoping (see §10.1 Idempotency).
End-to-end encryption. E2EE is on the near-term extension path. The protocol's header/body split is designed to admit it: bodies remain operator-opaque under recipient keys; push frames (headers, type/size hints, threading metadata) remain operator-readable for routing. A future E2EE profile will publish public keys alongside handles, encrypt content_parts bodies under recipient keys, and use a group keying scheme such as MLS[mls] for multi-recipient envelopes. Until that profile lands, operators are in custody of plaintext bodies. Operators SHOULD publish their data-handling policy (retention, internal access, breach notification) and SHOULD minimize internal access to envelope bodies. The duty is not enforceable on the wire; it is enforceable by the operator's policy posture.
Streaming bodies. Token-by-token output is not part of the transfer protocol. The pattern ASMTP supports is N envelopes, each a complete unit, threaded via in_reply_to. Agents that need lower-latency streaming use a separate channel alongside ASMTP for that subset of interactions.
Discovery, capability advertisement, and identity mechanism. These belong above the transfer protocol. Operators may run agent directories and may adopt agent-card formats; the protocol does not standardize them. The mechanism that binds bearer tokens to agent identities (OAuth client-credentials, DIDs, signed messages) is also an operator choice; the protocol requires only that some authentication mechanism does so.
17. Conclusion
Agents in 2026 are powerful inside their runtimes and helpless outside them. The next layer they need is not another model, another tool API, or another vendor SDK. It is a transfer protocol shaped for the way they actually run: intermittently, token-constrained, with no continuous presence and no reliable lifecycle narration.
ASMTP is one proposal for that layer. Five principles, six wire primitives (mailbox, envelope, push frame, cursor, read flag, monitor), and a wire format already legible to the language models that drive most agents today. The protocol is small on purpose. Its surface fits in §5–§12, but the agent-side surface it opens is much larger: a personal assistant that can leave a task with a vendor's agent and come back the next day; a research collaboration across a weekend; an alert that survives a closed laptop; a market-quote fan-out that aggregates over hours of different responder schedules.
The protocol is open. The mailbox primitive comes from four decades of email and three of Erlang. The context-economy properties (headers-only push, opt-in body fetch, opt-in observability) are the novel adaptation to LLM agents. The receiver's job, on the wire, is to write reply envelopes. The network does everything else.
The invitation is implicit: implement clients against the spec, run ASMTP networks of your own where it serves you, build the agent applications that this layer makes possible. What ASMTP enables is something different from agents acting alone, and something simpler than agents pretending to be services: agents that survive each other's silence.
Appendix A: Minimum conformance
A conforming agent client is one that an implementer can write against any ASMTP-conformant operator and have it work. "Open protocol" is aspirational without a conformance bar; defining MUST / SHOULD / MAY is what makes interoperation real.
A.1 A conforming client MUST:
- Authenticate as an agent identity. Messages claiming to come from @X must be authenticated as @X (per ASP §6.1[asp]).
- Resolve handles to the operator that owns them.
- Send envelopes via POST /messages with the required client-supplied fields (id, to, date_ms, content_parts); from is stamped by the operator (§10.1).
- Receive notifications via WebSocket subscription, REST pull, or both, interchangeably.
- Persist and advance a per-mailbox cursor across runs.
- Honor `in_reply_to` and `references` when constructing replies.
- Treat 2xx on send as "stored in recipient mailbox," not "read" or "delivered to recipient runtime."
- Respect 404 for trust denials (non-enumerating).
- Encode all wire payloads as JSON per the ASMTP schemas.
A.2 A conforming client SHOULD:
- Fetch bodies on demand via GET /messages/{id} rather than eagerly fetching all unread.
- Use type_hint and size_hint to triage before fetching.
- Use batch fetch (GET /messages?ids=...) when fetching multiple bodies.
- Use unread=true filtering when revisiting old messages.
- Support monitor fact consumption (delivered as envelopes from @operator.postmaster and/or as WS events).
A.3 A conforming operator MUST:
Storage and ordering
- Persist envelopes durably before returning 2xx on POST /messages.
- Maintain monotonic per-mailbox seq. Sequence numbers are private to the mailbox owner and MUST NOT appear in any response surface visible to non-owners.
- On WS reconnect with cursor, replay all envelope notifications with seq > cursor in monotonic ascending order before transitioning to live push.
Authentication and authorization
- Stamp envelope.from on POST /messages from the authenticated agent identity. A client-supplied from MUST be ignored or rejected; the operator is the sole source of truth for the sender-of-record (§7).
- Apply allowlist/block at envelope accept time, returning 404 on denial (non-enumerating per ASP §6.2). Operators MUST evaluate trust denials (404) before idempotency collisions (409).
- The @operator.* namespace is operator-reserved. Operators MUST NOT issue agent credentials for any handle under @operator.* to non-operator principals. Operators MUST reject POST /messages from non-operator principals where from is in this namespace (403). Operators MUST ensure every agent mailbox accepts envelopes from @operator.postmaster regardless of the recipient's inbound policy.
Idempotency
- The idempotency key is the (from, id) pair. On a duplicate (from, id) with an equivalent envelope (per the field comparison in §10.1, which excludes date_ms), the operator returns the original 202 body (no new insert, no new monitor facts). On a duplicate (from, id) where any compared field differs (different to, cc, content_parts, subject, etc.), the operator MUST return 409 with a body that does not echo the original envelope's recipients or content (to avoid leaking the original via id collision).
Cursor and read state
- The cursor advance is cursor := max(stored, request). A request whose value is below the stored cursor MUST return 200 with the unchanged stored cursor.
- Track per-envelope read flags privately to each mailbox.
- Never expose read state to any party other than the mailbox owner. This includes the sender (no fetched monitor fact), cc'd participants, and admin tooling consumed by non-owners.
Monitor channel
- Emit stored synchronously per recipient on a successful POST /messages when the envelope carries a monitor handle. This is the only MUST-emit fact.
- bounced and expired MAY be emitted under operator-defined triggers (§11); operators that do not implement either are still conformant.
- Emit monitor facts only from operator-observable transport state. Operators MUST NOT emit facts derived from receiver behavior (no fetched, no read).
- Monitor state MUST be keyed by (sender_handle, monitor_handle) pairs. Two senders using the same monitor string MUST NOT cross-contaminate facts.
- Monitor facts MUST be delivered both as @operator.postmaster envelopes (in the sender's mailbox) and as monitor.fact frames (on the sender's WS). Both surfaces carry identical data.
Transport
- Support both WebSocket push and REST pull as equivalent, consistent surfaces.
- Provide size_hint (in tokens) in push frames and inbox listings when feasible.
- On POST /messages with multi-recipient to+cc, either store the envelope in every recipient mailbox or reject the entire send (all-or-nothing; see §10.1).
Batch operations
- On GET /messages?ids=..., silently omit envelopes the caller is not entitled to. Operators MUST NOT distinguish "not found" from "not entitled" in the response shape (e.g., per-id error codes are prohibited).
A.4 A conforming operator MUST NOT:
- Accept a client-supplied from on POST /messages and store it as-is. The from field is operator-stamped; a client-supplied from is rejected or silently overwritten.
- Expose receiver read state to senders.
- Distribute read receipts to other participants.
- Return per-recipient seq to senders. The 202 recipients[] array contains handle only; seq is per-mailbox state and is owner-only.
- Modify, parse, or interpret content_parts beyond size limits and data: URL-scheme rejection.
- Add status fields to the envelope on the wire.
- Inject (no subject) or any placeholder when subject is absent.
- Surface any signal (admin endpoints, metrics, response timing, error shapes) that allows a sender to infer the recipient's WebSocket connectedness or fetch latency.
- Accept content parts whose url uses the data: scheme.
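Three of the prohibitions above (client-supplied from, data: URLs, and subject placeholders) meet at the send path. The sketch below shows one way an operator might handle them; sanitize_send and the content-part shape beyond the url field are assumptions made for the example. It takes the "silently overwrite" option for from.

```python
from urllib.parse import urlsplit


def sanitize_send(authenticated_handle, payload):
    """Return the envelope to store, or raise ValueError for prohibited content.

    `authenticated_handle` is the sender identity established by the
    operator (hypothetical parameter); `payload` is the client's JSON body.
    """
    for part in payload.get("content_parts", []):
        url = part.get("url")
        if url is not None and urlsplit(url.strip()).scheme.lower() == "data":
            raise ValueError("data: URLs are not accepted in content parts")
    envelope = dict(payload)
    # Operator-stamped: whatever the client sent as `from` is discarded.
    envelope["from"] = authenticated_handle
    # No placeholder is injected: an absent subject stays absent.
    return envelope
```

The content parts are inspected only for the URL scheme and otherwise passed through untouched, in keeping with the rule against interpreting them.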
A.5 A conforming operator MAY (or SHOULD where noted):
- Cap envelope size, recipient count, attachment size, send rate. Operators SHOULD return 413 or 429 with helpful error bodies when limits trigger.
- Configure retention within reasonable bounds (default 90 days, configurable per operator policy).
- Expose admin UIs that present
referenceschains as threaded views. - Aggregate monitor facts across multiple envelopes sharing a monitor handle.
- Offer convenience endpoints (e.g., bulk fan-out) that compile to standard sends.
- SHOULD bound per-subscriber memory by capping WS replay batch size, applying backpressure on slow readers, and dropping connections that fail to drain within a bounded window. A dropped client recovers losslessly via cursor on reconnect.
- SHOULD publish a data-handling policy covering retention, internal access, and breach notification, given the operator's plaintext-body custody position pre-E2EE.
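The per-subscriber memory bound and lossless recovery can be sketched as below. This is a simplification under stated assumptions: the Subscriber class and replay_since function are hypothetical, and the example drops a connection when its queue overflows rather than timing a drain window.

```python
from collections import deque


class Subscriber:
    """Hypothetical per-connection push buffer with a hard cap."""

    def __init__(self, max_pending):
        self.max_pending = max_pending
        self.pending = deque()
        self.connected = True

    def push(self, frame):
        """Queue a frame; drop the connection instead of growing without bound."""
        if not self.connected:
            return False
        if len(self.pending) >= self.max_pending:
            self.pending.clear()
            self.connected = False  # the client recovers via cursor on reconnect
            return False
        self.pending.append(frame)
        return True


def replay_since(mailbox, cursor, batch_size):
    """Return at most batch_size entries with seq > cursor, oldest first."""
    missed = [e for e in mailbox if e["seq"] > cursor]
    return sorted(missed, key=lambda e: e["seq"])[:batch_size]
```

Dropping is safe only because the mailbox is durable: a reconnecting client asks for everything above its cursor and receives it in capped batches.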
Appendix B: Glossary
Definitions of the load-bearing terms in this paper. Section references point to where each concept is developed in detail.
Agent. An autonomous, addressable entity that participates in the network. See ASP §6.1[asp].
Body. The content_parts of an envelope. Fetched on demand via GET /messages/{id}. Not included in push frames or inbox listings. See §7, §8.
Cursor. A single monotonic integer per mailbox, marking the highest seq the agent has acknowledged being notified about. Used for "what have I missed?" queries on reconnect. Client-acked. See §9.
Envelope. The wire message. Fields: id, from, to, cc, in_reply_to, references, subject, date_ms, content_parts, optionally monitor. See §7.
Handle. An agent's canonical address. Format: @owner.agent_name. Examples: @nick.assistant, @acme.support, @operator.postmaster. Inherited from ASP §6.1.
Headers. The non-body fields of an envelope. What the push frame and inbox listing carry. See §8.
Mailbox. Per-agent durable store of received envelopes. The only stateful primitive in the protocol. See §6.
Monitor. A sender-allocated handle attached to an outbound envelope, requesting operator-observed delivery facts. Opt-in per envelope. See §11.
Monitor fact. One of stored / bounced / expired. Emitted by the operator to the sender. Never emitted by the receiver. See §11.
Push frame. Lightweight notification of a new envelope, delivered via WebSocket. Carries headers only, never the body. See §8.
Read flag. Per-envelope-per-mailbox boolean. True when the body has been fetched (or explicitly marked). Visible only to the mailbox owner. See §9.
Size hint. Estimated body cost in tokens. Advisory; the operator computes using its preferred tokenizer. See §8.
Subject. Optional human-readable label on an envelope. If absent, downstream presentations show no subject, with no placeholder. See §7, §8.
Thread. Transitive closure of references chains among a set of envelopes. Not a wire primitive; emergent from envelope metadata. See §12.
Type hint. A coarse content-type tag in the push frame: text / image / file / data / mixed. Helps agents triage before fetching. See §8.
References
[asp] Agent Session Protocol (ASP). Identity, trust, and content-part layers inherited from ASP. https://agentsessionprotocol.org/
[smtp] Simple Mail Transfer Protocol. RFC 5321. The mail-relay model and store-and-forward semantics ASMTP adapts to a single-operator mailbox surface. https://www.rfc-editor.org/rfc/rfc5321.html
[rfc5322] Internet Message Format. RFC 5322. The Message-ID, In-Reply-To, and References semantics adopted directly. https://www.rfc-editor.org/rfc/rfc5322.html
[imap] Internet Message Access Protocol. RFC 9051. The mailbox-as-server-state model; IDLE for push notifications; UIDNEXT for cursor mechanics. https://www.rfc-editor.org/rfc/rfc9051.html
[jmap] JMAP (JSON Meta Application Protocol). RFC 8620 and RFC 8621. The modern JSON-shaped successor to IMAP; the cursor-based change-tracking pattern adapted here. https://www.rfc-editor.org/rfc/rfc8620.html
[dsn] Delivery Status Notifications. RFC 3464. The opt-in sender-observability pattern that ASMTP's Monitor channel adapts. https://www.rfc-editor.org/rfc/rfc3464.html
[mcp] Model Context Protocol (MCP). Specification version 2025-11-25. https://modelcontextprotocol.io/specification/2025-11-25
[mcp-donation] Linux Foundation, "Linux Foundation Announces the Formation of the Agentic AI Foundation" (December 9, 2025). https://www.linuxfoundation.org/press/linux-foundation-announces-the-formation-of-the-agentic-ai-foundation
[mcp-roadmap] "The 2026 MCP Roadmap," Model Context Protocol blog. https://blog.modelcontextprotocol.io/posts/2026-mcp-roadmap/
[a2a] Agent-to-Agent Protocol (A2A). Google Developers Blog, "A2A: A new era of agent interoperability" (April 2025). https://developers.googleblog.com/en/a2a-a-new-era-of-agent-interoperability/
[a2a-spec] A2A Protocol specification. https://a2a-protocol.org/latest/specification/
[agntcy] Linux Foundation, "Linux Foundation Welcomes the AGNTCY Project to Standardize Open Multi-Agent System Infrastructure" (July 2025). https://www.linuxfoundation.org/press/linux-foundation-welcomes-the-agntcy-project-to-standardize-open-multi-agent-system-infrastructure-and-break-down-ai-agent-silos
[nanda] MIT Media Lab, "MIT NANDA project overview." https://www.media.mit.edu/projects/mit-nanda/overview/
[openai-responses] OpenAI Developer Platform, Responses API documentation and the deprecation schedule for the Assistants API. https://developers.openai.com/api/docs/deprecations
[matrix] Matrix. The Matrix specification. https://spec.matrix.org/
[erlang] Armstrong, Joe. "Making reliable distributed systems in the presence of software errors." PhD thesis, Royal Institute of Technology, Stockholm (2003). The actor model with mailboxes, send-is-one-way, and "let it crash": the structural antecedent for ASMTP's mailbox-as-only-primitive commitment.
[certified-mail] USPS Domestic Mail Manual, Section 503.5, Certified Mail. The 1855 introduction of carrier-recorded delivery confirmation without recipient cooperation; the operational pattern Monitor.stored adapts.
[upu] Universal Postal Union, Convention and General Regulations (1874 founding treaty, current revision 2022). The intergovernmental settlement framework for cross-operator postal mail; the structural precedent for federated transfer.
[mls] IETF RFC 9420, The Messaging Layer Security (MLS) Protocol (July 2023). The group-keying scheme suitable for multi-recipient encrypted envelopes. https://www.rfc-editor.org/rfc/rfc9420.html
[rfc2119] Bradner, S., Key words for use in RFCs to Indicate Requirement Levels, BCP 14, RFC 2119, March 1997. https://www.rfc-editor.org/rfc/rfc2119.html
[rfc8174] Leiba, B., Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words, BCP 14, RFC 8174, May 2017. https://www.rfc-editor.org/rfc/rfc8174.html