Paperclip: the command-line interface for scientific literature
Last month, we introduced Sy, the literature search agent that navigates biomedical preprints as a filesystem. Scientists have loved using it, and the natural question has been: can my own agent use this filesystem too?
Today we're releasing Paperclip, the agent-native counterpart to Sy. Whereas humans use a chat-based UI, agents work best within the rich text environment of a command-line interface. Paperclip gives your agent direct CLI access to 8M+ papers—standard search and retrieval functions, plus several powerful tools that, when used together, let agents actually explore, deep-dive, and synthesize.
What's in Paperclip?
We've thought a lot about what the “principal components” of good literature search are. These components shouldn't just be individually useful for specific queries, but should be synergistic and composable. We're excited to introduce six commands: search, grep, map, ask-image, sql, and from.
search
As a starting point, we've implemented hybrid search, combining BM25 and embedding-based retrieval. The agent can also select a specific ranking mechanism more suited for its queries. For token efficiency, rather than return the entire abstract of each search result, we return a 1–2 sentence TL;DR summary.
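Paperclip's exact fusion scheme isn't documented here; as a sketch, one common way to combine BM25 and embedding rankings is reciprocal rank fusion (RRF), where each ranked list contributes 1/(k + rank) per document:

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion: merge several ranked lists of doc ids.
    A document's score is the sum of 1/(k + rank) over every list it appears in."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

bm25_top = ["paper_12", "paper_7", "paper_33"]   # lexical ranking
embed_top = ["paper_7", "paper_45", "paper_12"]  # embedding ranking
print(rrf_fuse([bm25_top, embed_top]))
# → ['paper_7', 'paper_12', 'paper_45', 'paper_33']
```

Documents ranked well by both retrievers (paper_7, paper_12) float to the top, which is the behavior hybrid search is after.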
grep
While grep is a preferred search tool for coding agents, it's not available in literature search APIs—it's hard to do across millions of papers. We've spent a lot of time optimizing our indices to make this happen in milliseconds. We're very excited for this feature, and we think your agents will agree.
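We haven't described the index internals, but one standard way to make grep fast over a large corpus is a trigram index: the query's trigrams prune the candidate set before an exact scan. A minimal in-memory sketch (illustrative only, not Paperclip's actual index):

```python
import re
from collections import defaultdict

def trigrams(text):
    # All 3-character windows of the text.
    return {text[i:i + 3] for i in range(len(text) - 2)}

class TrigramIndex:
    """Map each trigram to the set of doc ids containing it."""
    def __init__(self, docs):
        self.docs = docs
        self.index = defaultdict(set)
        for doc_id, text in docs.items():
            for tri in trigrams(text.lower()):
                self.index[tri].add(doc_id)

    def grep(self, literal):
        # A match must contain every trigram of the pattern, so intersect
        # posting sets to get a small candidate set.
        cands = None
        for tri in trigrams(literal.lower()):
            cands = self.index[tri] if cands is None else cands & self.index[tri]
        if cands is None:  # pattern shorter than 3 chars: scan everything
            cands = set(self.docs)
        # Exact (case-insensitive) scan only over the candidates.
        return sorted(d for d in cands
                      if re.search(re.escape(literal), self.docs[d], re.I))

docs = {"p1": "KRAS G12C resistance via CIC loss",
        "p2": "AlphaFold fold switching in XCL1",
        "p3": "CIC (Capicua) and NF-kB reactivation"}
idx = TrigramIndex(docs)
print(idx.grep("CIC"))  # → ['p1', 'p3']
```

At millions of papers the posting sets live in an optimized on-disk structure rather than a Python dict, but the prune-then-scan shape is the same.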
map
A common motif in literature search involves asking the same question across many papers. Rather than have the agent do this sequentially, we provide map, which performs this in parallel, yielding a structured result for the agent to read.
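Conceptually, map is a parallel fan-out of one question over many papers. A minimal sketch using Python threads, where ask_paper is a stand-in for the real cloud-side per-paper QA call:

```python
from concurrent.futures import ThreadPoolExecutor

def ask_paper(paper_id, question):
    # Stand-in for the per-paper QA backend; returns a structured row.
    return {"paper": paper_id, "question": question,
            "answer": f"(answer from {paper_id})"}

def map_papers(paper_ids, question, max_workers=8):
    """Ask the same question over many papers concurrently.
    pool.map preserves input order, so rows line up with paper_ids."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda p: ask_paper(p, question), paper_ids))

rows = map_papers(["p1", "p2", "p3"], "What resistance mechanism is reported?")
print([r["paper"] for r in rows])  # → ['p1', 'p2', 'p3']
```

Returning structured rows rather than free text is the point: the agent can scan the whole table at once instead of reading N sequential answers.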
ask-image
Papers are inherently multimodal, and we don't want you to have to download every image just to figure out what's in each paper. We provide ask-image, which allows your agent to ask arbitrary questions over figures via a VLM, all in the cloud—no heavy lifting agent-side. Everything stays in the CLI.

Panel B — Dose-response: Sotorasib sensitivity in KRASG12C(red, n=6) vs KRASWT (black, n=12) organoids. G12C lines show significantly lower IC50 values.
Panel D — IC50 violin plots: Distribution of IC50 for AZD4625 vs Sotorasib. Wider spread for Sotorasib; XDO344 shows high IC50 indicating resistance.
Panel E — Tumor growth: PHLC207 in vivo. Vehicle arms show exponential growth. Both Sotorasib and AZD4625 (100 mg/kg) achieve sustained tumor regression.
ask-image on a multi-panel figure from a KRAS G12C resistance paper. The model parses each panel—oncoprint, dose-response curves, violin plots, tumor growth—and extracts specific data points and findings.
sql
Another major motif in literature search is aggressive filtering. This can be hard to do over a filesystem, but it's something SQL was designed for. Common APIs often wrap these functions into pre-set queries. We figured it would be best for agents to just have direct access to the underlying metadata table.
SELECT *
FROM documents
WHERE authors ILIKE '%Doudna%'
ORDER BY pub_date DESC
LIMIT 5
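To make the shape of that query concrete, here's a local sketch using Python's sqlite3, with an assumed minimal documents schema and SQLite's LIKE (case-insensitive for ASCII) standing in for Postgres-style ILIKE:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed minimal schema for illustration; the real metadata table is richer.
conn.execute("CREATE TABLE documents (title TEXT, authors TEXT, pub_date TEXT)")
conn.executemany("INSERT INTO documents VALUES (?, ?, ?)", [
    ("CRISPR base editing", "Jennifer Doudna; et al.", "2024-03-01"),
    ("KRAS G12C resistance", "A. Smith", "2025-01-15"),
    ("Guide RNA design", "J. Doudna", "2023-07-20"),
])
rows = conn.execute("""
    SELECT title FROM documents
    WHERE authors LIKE '%doudna%'   -- SQLite LIKE is case-insensitive for ASCII
    ORDER BY pub_date DESC
    LIMIT 5
""").fetchall()
print(rows)  # → [('CRISPR base editing',), ('Guide RNA design',)]
```

Author filters, date sorts, and limits compose freely this way, which is exactly what pre-set API query endpoints tend to take away.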
from
We've read thousands of agent rollouts for literature search APIs. Probably the biggest limitation is that most search APIs are stateless—each function exists independent of the agent's context. This stunts agentic exploration and tunneling.
To address this, we store every intermediate search result in our cloud-hosted database. You can reference them easily using --from, which performs the next action only on the subset of previously retrieved papers.
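The mechanics can be sketched as a small result store: each search result set is saved under an id, and later operations intersect their candidates with that set. The class below is illustrative, not Paperclip's implementation:

```python
import uuid

class ResultStore:
    """Sketch of the statefulness behind --from: every result set gets an
    id, and later operations can scope themselves to that set."""
    def __init__(self):
        self._sets = {}

    def save(self, paper_ids):
        rid = str(uuid.uuid4())[:8]
        self._sets[rid] = list(paper_ids)
        return rid

    def scoped(self, rid, candidate_ids):
        # Keep only candidates that were in the earlier result set.
        prior = set(self._sets[rid])
        return [p for p in candidate_ids if p in prior]

store = ResultStore()
rid = store.save(["p1", "p2", "p3"])          # broad search returns a handle
hits = store.scoped(rid, ["p2", "p9", "p3"])  # e.g. grep ... --from <rid>
print(hits)  # → ['p2', 'p3']
```

Because the store lives in the cloud, the handle stays valid across calls without the agent carrying the paper list in its context window.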
--from operates on the same 30 papers.
We've been using Paperclip internally for weeks now, and agents take to it immediately. We think the best way to experience it is to try it yourself, but here are a couple of fun examples:
Example 1: KRAS G12C Inhibitors
KRAS G12C inhibitors (sotorasib, adagrasib) were approved for non-small cell lung cancer, but most patients develop resistance. Map out the specific mutations and pathway alterations driving resistance, how frequently each appears, and whether there's a dominant mechanism.
The canonical mechanisms—secondary KRAS mutations (Y96D, H95D, Q99L), RTK bypass via MET/EGFR, PI3K/mTOR reactivation—show up in every review. These tend to be well-covered in most agent outputs. The harder question is whether an agent can surface findings that are published but not prominent. One such finding: CIC (Capicua), a tumor suppressor whose loss-of-function drives resistance through NFκB reactivation. It's described in a 2025 CRISPR screen paper, present in 24 of 50 corpus papers, but absent from most review-level summaries.
We gave the same agent the same question with two different tool configurations: a standard 3-tool search API (paper_search, read_paper, full_text_search) and Paperclip. Both used the same model, backend, and corpus. Here's the shape of each exploration:
--from keeps the result set.
The key difference is statefulness. The MCP agent's two searches return independent result sets with no connection between them. There's no way to say "search for CIC within papers I already found about KRAS resistance." The Paperclip agent holds a single results_id throughout the session. Every grep operates on the same 27 papers. When it notices CIC while reading, it tunnels back into that same set—grep "CIC" --from <previous search>—and confirms the entity appears in 15 of those 27 papers. That's the loop that produces discovery: broad search, scan, read, notice, narrow within the same context, confirm, read the source.
Example 2: AlphaFold Failure Modes
Map out where AlphaFold systematically fails. Which specific proteins or structural features cause failures? What are the root causes—training data, architecture, or something else? I want concrete examples with protein names, not just broad categories.
The canonical failures are well-known: intrinsically disordered regions, misleading pLDDT confidence scores, MSA depth dependence. These are well-covered in the literature. The deeper findings are harder: XCL1 (lymphotactin), a fold-switching protein that adopts two completely different stable structures and AlphaFold confidently predicts only one; β-solenoid hallucination, where AF2 produces confident but unrealistic repeat structures; and adversarial invariance, where AF3 predictions remain unchanged despite destabilizing mutations. All are published findings buried in full-text papers, absent from review abstracts.
Same setup: the same agent with a standard API vs Paperclip.
--from keeps the result set.
Same pattern as KRAS, but with three deep findings instead of one. The API agent ran 5 independent searches and read 2 papers, extracting canonical findings (KaiB, RfaH, disordered regions). It searched for "fold switching p53 KaiB RfaH" and "BCCIP RfaH KaiB p53 GA95"—protein names from training data, not from anything it read. The Paperclip agent anchored 50 papers to a single handle and ran 7 greps against it. Grepping "fold switching" narrowed to 8 papers; reading one revealed XCL1. Grepping "intrinsically disordered" surfaced a paper on β-solenoid hallucination. Grepping "training data|MSA" led to the adversarial invariance finding. Both agents used comparable resources (14 vs 8 calls), but the exploration graphs tell different stories—and led to different discoveries.
Use Paperclip today
Add it as an MCP server directly—no local install needed:
