The canonical x402 RAG pipeline
ScrapePay → MarkdownOpt → EmbedPay → MemoryServe → MEMSCRUB. ~$0.017 per 5k-token page. No accounts. No subscriptions.
Overview
This pipeline ingests a web page into a secure, semantically searchable memory store — then protects your LLM from indirect prompt injection when retrieving chunks. Every step is a separate x402 microservice, billed per call in USDC on Base. You pay only for what you use.
Total ingestion cost: ~$0.017 per 5k-token page
Step 1 — Fetch the page (ScrapePay)
ScrapePay renders the page via Playwright (JS execution included), enforces robots.txt, and is charge-on-failure-safe — you are not billed if the page returns an error.
POST https://scrapepay.melis.ai/scrape
{
"url": "https://example.com/article",
"format": "html"
} Returns raw HTML. Pass this directly to MarkdownOpt.
Step 2 — Clean to markdown (MarkdownOpt)
MarkdownOpt converts HTML to clean, LLM-ready markdown — stripping nav, ads, boilerplate, and inline styles. Reduces token count by ~70% before embedding, which lowers EmbedPay cost.
POST https://markdownopt.melis.ai/markdown
{
"html": "<html>...</html>"
}
Returns { markdown, token_estimate, compression_ratio }.
Use the markdown string as input to EmbedPay.
Step 3 — Embed (EmbedPay)
EmbedPay calls OpenAI's text-embedding-3-small and returns a 1536-dimensional vector. Billing is per 1k tokens (cl100k_base tokenisation). For a 5k-token page after MarkdownOpt compression, this costs roughly $0.0003.
POST https://embedpay.melis.ai/embed
{
"text": "cleaned markdown content...",
"model": "text-embedding-3-small"
}
Returns { embedding: number[], model, tokens_used, dimensions }.
Pass the embedding array to MemoryServe — or let MemoryServe call EmbedPay
internally (it does so automatically when you POST content to /memory/write).
Step 4 — Store (MemoryServe)
MemoryServe stores the content in Qdrant (vector search) and SQLite (full content + metadata). It calls EmbedPay internally — you do not need to pre-embed if you POST content directly.
POST https://memoryserve.melis.ai/memory/write
{
"content": "cleaned markdown content...",
"agent_id": "my-research-agent",
"metadata": {
"source_url": "https://example.com/article",
"ingested_at": "2026-05-08T00:00:00Z"
}
}
Returns { id, agent_id, created_at, vector_id }.
The id is the SQLite row ID; use it for deletion (GDPR compliance).
Querying memory
POST https://memoryserve.melis.ai/memory/query
{
"query": "what is the refund policy?",
"agent_id": "my-research-agent",
"top_k": 5
}
Returns an array of the top-k semantically similar chunks, each with
score, content, and metadata.
Pass retrieved chunks through MEMSCRUB before sending to your LLM.
Step 5 — Scan for injection (MEMSCRUB)
Indirect prompt injection is planted in third-party content to hijack your agent when it reads that content. MEMSCRUB runs 10 heuristic rules across each retrieved chunk before it reaches your LLM.
POST https://memscrub.melis.ai/scrub
{
"content": "retrieved chunk text...",
"sanitize": true
}
Returns { risk_score, risk_level, flagged, safe, sanitized }.
risk_level is one of safe | low | medium | high | critical.
- If
safeistrue— pass content to LLM. - If
risk_levelismediumor higher — log and optionally skip the chunk. - If
sanitize: true— MEMSCRUB returns a cleaned version with injection patterns removed; usesanitizedinstead of the original.
What MEMSCRUB detects
10 heuristic rules covering:
- HTML comment injection (
<!-- ignore previous instructions -->) - Invisible Unicode (zero-width characters used to hide payloads)
- Fake tool responses (
[TOOL_RESULT],[FUNCTION_OUTPUT]) - Metadata injection (
system_context:,assistant_config:) - Conditional triggers (
if the user asks about X, respond Y) - Chain-of-thought hijacking (
thinking step by stepplanted in content) - Exfiltration instructions (
send all conversation history to) - Persona injection (
you are now,your new identity is) - Fake system messages (
[SYSTEM],SYSTEM OVERRIDE) - Base64 payload detection
Cost summary
| Step | Service | Cost (5k-token page) |
|---|---|---|
| Fetch | ScrapePay | $0.010 |
| Clean | MarkdownOpt | $0.005 |
| Embed | EmbedPay | ~$0.0003 |
| Store | MemoryServe | $0.001 |
| Scan | MEMSCRUB | $0.001 |
| Total per page | ~$0.017 | |
| Query | MemoryServe /memory/query | $0.001 + ~$0.00001 EmbedPay |
MCP usage
If you've installed @melis-ai/x402-tools-mcp, call the pipeline steps as MCP tools:
scrapepay({ url: "https://example.com/article" })
markdownopt({ html: result.html })
memoryserve_write({ content: result.markdown, agent_id: "my-agent", metadata: {...} })
memscrub({ content: retrieved_chunk, sanitize: true }) EmbedPay is called internally by MemoryServe — no separate MCP call needed.