tempaccount420 7 days ago

I don't see how a round trip of <500ms, equivalent to maybe 50 tokens, is worse than including thousands of extra tokens in the prompt just in case they might be useful. Not to mention the context fatigue.

If designed well - by suspending generation in memory and inserting a <function_result>, rather than restarting generation and fetching the cache from disk - the round trip/tool call is cheaper: it costs the equivalent of ~50 tokens of waiting plus the <function_result> tokens.
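A toy sketch of that flow, using a Python generator as a stand-in for the decode loop (everything here is hypothetical illustration, not any real runtime's API): generation suspends in place at the tool call, the caller sends the result back in, and the in-memory context is never rebuilt.

```python
# Toy sketch: suspend generation at a tool call, inject the result,
# and resume without restarting or reloading any cache from disk.
# `fake_decode` stands in for a real decode loop; all names are hypothetical.

def fake_decode(prompt_tokens):
    """Generator that 'decodes' tokens and suspends when it emits a tool call."""
    context = list(prompt_tokens)          # stands in for the in-memory KV cache
    for step in ["SELECT", "*", "<tool_call:query_db>"]:
        context.append(step)
        if step.startswith("<tool_call"):
            # Suspend here; the caller .send()s the tool result back in.
            result = yield ("tool_call", step)
            # Insert <function_result> tokens; the context stays warm in memory.
            context.append(f"<function_result>{result}</function_result>")
        else:
            yield ("token", step)
    yield ("token", "done")                # generation resumes seamlessly

def run():
    gen = fake_decode(["system:", "schema..."])
    out = []
    event = next(gen)
    while True:
        kind, payload = event
        if kind == "tool_call":
            event = gen.send("3 rows")     # the round trip happens here
        else:
            out.append(payload)
            try:
                event = next(gen)
            except StopIteration:
                break
    return out
```

The point of the sketch: the generator's local state (the `context` list) survives the suspension, the same way a well-designed server keeps the KV cache resident across the tool round trip.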

BoorishBears 6 days ago

You're paying the full TTFT twice, plus the tokens for all the prompts of all your MCPs, before you even get to that round trip to the DB.

And you don't have to wonder about "if designed well": the reference implementation, which gets 20k downloads a week and is being embedded in downstream editors, is not designed well. It makes the round trip every time and still doesn't give the LLM the full information about the table.

Most MCP implementations are similarly crappy and half-assed, because everyone was rushing to post how they added <insert DB/API/data source> to MCP.

And if you're worried about "context fatigue" (you mean LLMs getting distracted by relevant information...), you should 100% prefer a well-known schema format to N MCP prompt definitions with tool-usage instructions that weren't even necessarily tuned for the LLM in question.

LLMs are much more easily derailed by the addition of extra tools, and by having to reason about when to call them and about the results of calling them, than they are by a prompt-caching-friendly block of tokens with easy-to-follow meaning.
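The tradeoff the two comments are arguing over can be put in rough numbers. A back-of-envelope sketch (all figures are hypothetical except the thread's "500ms ~ 50 tokens" equivalence, which implies roughly 100 tokens/s):

```python
# Back-of-envelope comparison (all numbers hypothetical):
# Option A: include the schema up front as a prompt-cache-friendly block.
# Option B: expose tool definitions and pay a round trip per lookup.

def upfront_cost(schema_tokens, cache_discount=0.1):
    # Schema tokens sit in a cacheable prefix; amortized per-request cost
    # is a fraction of full price once the prompt cache is warm.
    return schema_tokens * cache_discount

def roundtrip_cost(tool_def_tokens, latency_s, tokens_per_s, result_tokens, n_calls):
    # Tool definitions are always in context; each call adds waiting time
    # (expressed as token-equivalents) plus the result tokens.
    latency_tokens = latency_s * tokens_per_s
    return tool_def_tokens + n_calls * (latency_tokens + result_tokens)

# e.g. a 5,000-token schema amortizes to ~500 token-equivalents per request,
# while 2,000 tokens of tool prompts plus two 500ms calls returning
# 200 tokens each cost 2,000 + 2 * (50 + 200) = 2,500.
```

Under these (made-up) numbers the up-front schema wins, but the point is that the comparison depends on cache hit rates, tool-prompt size, and call count, not on round-trip latency alone.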