ChatGPT Apps 🤝 MCP

ChatGPT has been building towards an app ecosystem for a while now.

Plugins launched in March 2023 with 11 partners including Expedia, Klarna, and Zapier. Custom GPTs followed in November 2023, letting anyone create tailored versions of ChatGPT without code. Three million GPTs were built within two months. The GPT Store opened in January 2024 for discovery.

Now in 2025, OpenAI has gone all-in on MCP as the integration layer. The App Store has real enterprise adoption: Salesforce, HubSpot, Atlassian, Clay, and Hex are all in.

“MCP is now a key part of how we build at OpenAI, integrated across ChatGPT and our developer platform.” — Srinivas Narayanan, CTO of B2B Applications, OpenAI

I wanted to see how these B2B companies are actually building their ChatGPT integrations. So I clicked “View in Admin Console” on a dozen apps and started reading their MCP tool descriptions.

I expected thin API wrappers. Parameter names, maybe a line about return types. The kind of docs you skim.

Then I opened the Salesforce summarize_conversation_transcript tool and found this:

“CRITICAL WORKFLOW: Before calling this tool, you MUST follow these steps: 1) If call ID is not known, use the soql_query tool to query BOTH VoiceCall AND VideoCall entities in SEPARATE queries…”

It kept going. For 500 words. Output formatting rules. PII guardrails. Instructions for handling transcripts that contain “only greetings and automated system messages.”

That’s not a tool description. That’s a system prompt hiding in plain sight.

MCP’s Three Primitives

The MCP spec defines three primitives:

  1. Prompts - Reusable prompt templates exposed by the server, designed to be user-controlled (think slash commands) rather than chosen by the model.
  2. Tools - Functions the LLM can call to take actions or retrieve data.
  3. Resources - Contextual data the LLM can read.

[Image: MCP Primitives]

In practice, ChatGPT Apps use only two of these. Tools handle actions like creating records, querying data, or triggering workflows. Resources power the UI: carousels, interactive maps, and embedded players rendered inside the chat.

[Image: ChatGPT Apps MCP]

The spec includes Prompts, but ChatGPT doesn’t support that primitive natively, so there’s no way for a developer to ship a standalone template to guide the model.

This is why tool descriptions are so long. Developers are smuggling the missing “Prompts” layer into the tool description field. It’s a clever hack. The instructions only enter the context window when the model is actually considering that specific tool, preventing token bloat in the global conversation.
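To make the hack concrete, here’s a minimal sketch of what it looks like server-side, using the Python MCP SDK’s FastMCP helper, where the function docstring becomes the tool description. The server name, tool, resource URI, and instruction text are all hypothetical stand-ins modeled on the Salesforce example above, not any vendor’s real code:

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("crm-app")  # hypothetical ChatGPT App server

@mcp.tool()
def summarize_transcript(call_id: str) -> str:
    """Summarize a call transcript.

    CRITICAL WORKFLOW: Before calling this tool, you MUST resolve the call ID
    (e.g. via a query tool) if it is not already known.

    OUTPUT FORMAT: Return a short bulleted summary. Do NOT include PII such as
    phone numbers or email addresses. If the transcript contains only greetings
    and automated system messages, say so instead of inventing a summary.
    """
    return f"(summary of transcript {call_id})"

# Resources power the in-chat UI; this HTML is a stand-in for a real widget template.
@mcp.resource("ui://summary-card")
def summary_card() -> str:
    """HTML template rendered inside the ChatGPT conversation."""
    return "<div id='summary-card'>summary goes here</div>"

if __name__ == "__main__":
    mcp.run()
```

The docstring is the payload: everything the model needs to know about workflow, formatting, and guardrails rides along in the description field.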

The Patterns

I dug into tools from Airtable, Salesforce, Clay, Daloopa, Hex, Fireflies, and Netlify. Here’s what I found:

| Pattern | Source | What It Does |
| --- | --- | --- |
| Prerequisite Enforcement | Airtable, Salesforce, Netlify | Forces tool chaining before execution |
| Human-in-the-Loop | Salesforce, Clay | Pauses for user confirmation mid-workflow |
| Intent Interpretation | Clay | Teaches LLM to parse follow-up queries |
| Tiered Fallback Strategies | Daloopa | Defines retry logic for failed searches |
| Negative Examples | Daloopa, Fireflies | Shows what NOT to do |
| Async Polling | Hex | Handles long-running operations |
| Context Injection | Netlify | Loads knowledge before code generation |

Prerequisite Enforcement

The Airtable update_records_for_table tool has this in its description:

“To get baseId and tableId, consider using the search_bases and list_tables_for_base tools first.”

That’s a soft prerequisite, phrased as a suggestion. Salesforce goes harder:

“CRITICAL WORKFLOW: Before calling this tool, you MUST follow these steps…”

The tool description is telling the LLM: don’t call me directly, run these other tools first. It’s dependency injection via prose.
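Here’s a sketch of how you might encode the same thing yourself. The tool, IDs, and error message are hypothetical (this is not Airtable’s or Salesforce’s actual server); the prerequisite lives in the description, and the handler can back it up by bouncing the model back to the prerequisite tools when the IDs look fabricated:

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("records-app")  # hypothetical server

@mcp.tool()
def update_record(base_id: str, table_id: str, record_id: str, fields: dict[str, str]) -> str:
    """Update a single record.

    PREREQUISITES: If you do not already know base_id and table_id, call
    search_bases and then list_tables first. Do NOT guess IDs.
    """
    # Belt and suspenders: the description asks for real IDs, and the handler
    # rejects implausible ones with a hint pointing back to the prerequisite tools.
    if not base_id.startswith("app") or not table_id.startswith("tbl"):
        return "Error: unknown base/table ID. Call search_bases and list_tables first."
    return f"Updated {record_id} in {base_id}/{table_id}."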


Human-in-the-Loop

Without something like MCP’s elicitation (which isn’t widely adopted yet), how do you get user confirmation mid-workflow?

The Salesforce assign_target_to_sdr tool:

“Before calling this tool, you MUST first use the ‘query_agent_type’ tool to fetch available agents… Present the list to the user and ask which agent they want to assign the target to.”

Clay does the same thing for ambiguous searches:

“When ambiguous, ask the user to clarify”

The tool description encodes a pause point. The LLM is instructed to stop, present options, and wait for user input before proceeding.
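A minimal sketch of the same pause point (hypothetical names, not Salesforce’s actual implementation): one tool fetches the options, and the second tool’s description tells the model to stop and ask before calling it.

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("sales-app")  # hypothetical server

@mcp.tool()
def list_available_agents() -> str:
    """List SDR agents that can accept new targets."""
    return "agent-1: Alice\nagent-2: Bob"

@mcp.tool()
def assign_target(target_id: str, agent_id: str) -> str:
    """Assign a target account to an SDR agent.

    WORKFLOW: First call list_available_agents, present the list to the user,
    and ASK which agent they want. Do not pick an agent yourself. Only call
    this tool after the user has chosen.
    """
    return f"Assigned {target_id} to {agent_id}."
```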


Intent Interpretation

Clay’s find-and-enrich-contacts-at-company has a mini grammar for understanding follow-up queries:

“also/too/as well” → ADD to existing filters
“only/just” → NARROW within current context
“actually/instead/switch” → REPLACE filters entirely

This is teaching the LLM how to interpret user intent across turns. The tool description isn’t just about this call. It’s about how to handle the next one.
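Here’s how that grammar might sit in a description of your own (hypothetical tool and parameters; the rules paraphrase Clay’s wording above). The point is that the rules govern the next call, so the model carries the current filters forward and mutates them according to the user’s phrasing:

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("enrichment-app")  # hypothetical server

@mcp.tool()
def find_contacts(company: str, filters: list[str]) -> str:
    """Find and enrich contacts at a company, filtered by title, seniority, etc.

    FOLLOW-UP INTERPRETATION (for the user's next refinement):
    - "also" / "too" / "as well"        -> ADD the new criteria to the existing filters
    - "only" / "just"                   -> NARROW within the current filters
    - "actually" / "instead" / "switch" -> REPLACE the filters entirely
    Always pass the full, updated filter list on every call.
    """
    return f"Contacts at {company} matching {filters}"
```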


Tiered Fallback Strategies

The Daloopa discover_companies tool doesn’t just accept input. It tells the LLM how to recover from failures:

“1. PRIMARY: Ticker Symbol Search
2. SECONDARY: Company Name Search (only if ticker fails)
3. FALLBACK: Alternative Name Forms (if standard name fails)”

The LLM is expected to try multiple strategies autonomously before giving up.


Negative Examples

Daloopa also shows what NOT to do:

“NOT: discover_companies(“Apple Inc.”) or discover_companies(“Apple Incorporated”)”

Fireflies does the same:

“Does NOT accept transcriptId as input”

Negative examples are surprisingly effective. Most tool descriptions only show happy paths.
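Both Daloopa patterns fit naturally in a single description. A hedged sketch (hypothetical tool body, paraphrasing the quoted rules): the tiers tell the model how to retry, and the NOT-examples fence off the most likely mistake.

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("fundamentals-app")  # hypothetical server

@mcp.tool()
def discover_companies(query: str) -> str:
    """Find companies in the dataset.

    SEARCH STRATEGY (try in order, stop at the first hit):
    1. PRIMARY:   ticker symbol, e.g. discover_companies("AAPL")
    2. SECONDARY: plain company name, e.g. discover_companies("Apple"), only if the ticker fails
    3. FALLBACK:  alternative name forms, if the standard name fails

    NOT: discover_companies("Apple Inc.") or discover_companies("Apple Incorporated")
    -- keep the query to the ticker or the bare name.
    """
    return f"Results for {query}"
```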


Async Polling

The Hex create_thread tool handles long-running operations:

“The thread will take a few minutes to complete - you should warn the user about this.”

“You should check at least 10 times, or until the response has the information the user asked for.”

The tool description is defining a polling loop and a UX expectation. Warn the user, then poll up to 10 times.
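A sketch of the shape (hypothetical tools, not Hex’s actual API): the first tool kicks off the job and returns an ID, the second checks on it, and both descriptions carry the UX and polling rules.

```python
import uuid
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("analytics-app")   # hypothetical server
_jobs: dict[str, str] = {}       # job_id -> status; stand-in for a real job queue

@mcp.tool()
def create_analysis(prompt: str) -> str:
    """Start a long-running analysis thread.

    This takes a few minutes - warn the user before you start, then poll
    get_analysis until it completes.
    """
    job_id = str(uuid.uuid4())
    _jobs[job_id] = "running"
    return f"Started analysis {job_id}"

@mcp.tool()
def get_analysis(job_id: str) -> str:
    """Check the status of an analysis thread.

    Check at least 10 times, or until the response contains the information
    the user asked for. If it is still running, wait and check again.
    """
    return f"status: {_jobs.get(job_id, 'unknown')}"
```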


Context Injection

Netlify’s get-netlify-coding-context is interesting because it doesn’t do anything:

“ALWAYS call when writing code. Required step before creating or editing any type of functions…”

It’s a context loader. The tool exists purely to inject SDK documentation and code patterns into the conversation before the LLM generates code.
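The same idea in sketch form (hypothetical content; Netlify’s real tool presumably returns its own SDK docs): the tool takes no arguments, and its only job is to return text the model should read before writing code.

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("deploy-app")  # hypothetical server

CODING_CONTEXT = """\
Serverless function conventions (illustrative, not Netlify's real docs):
- export a default handler
- read config from environment variables
- return a Response object
"""

@mcp.tool()
def get_coding_context() -> str:
    """ALWAYS call this before writing or editing any function code.

    Returns the SDK conventions and code patterns the generated code must follow.
    """
    return CODING_CONTEXT
```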

Takeaways

If you’re building MCP tools, here’s what I learned from dissecting these B2B apps:

Tool descriptions are runtime prompts, not documentation. The LLM reads them at execution time. Write them like you’re instructing a junior developer, not documenting an API.

Encode the workflow, not just the action. The best tool descriptions tell the LLM what to do before calling the tool, what to do after, and what to do when things go wrong.

Use negative examples. Showing what NOT to do is surprisingly effective. Most tools only show happy paths. Adding anti-patterns reduces misuse.

Human-in-the-loop is a prompt pattern. Until elicitation becomes standard, you can encode pause points in your descriptions: “present options to the user and ask them to choose.”

Fallback strategies belong in the description. Don’t assume the LLM will retry intelligently. Tell it: try this first, then this, then this.

Context injection is a valid tool pattern. Sometimes a tool’s job isn’t to do something. It’s to load knowledge the LLM needs before it does something else.


The MCP spec gives three primitives. What these B2B apps show is that the real orchestration layer lives in how you describe your tools.

Tool descriptions are the new system prompts.