Tagged: llm-delegation

1 entry

🪴 weblinks

LLMs corrupt your documents when you delegate

Microsoft researchers introduce DELEGATE-52, a benchmark of long document-editing workflows across 52 professional domains. Even frontier models (Gemini 3.1 Pro, Claude 4.6 Opus, GPT-5.4) corrupt about 25% of document content by the end of these workflows, and agentic tool use does not improve performance. The paper argues current LLMs are unreliable delegates: they introduce sparse but severe errors that compound silently over long interactions.