{"exhaustive":{"nbHits":false,"typo":false},"exhaustiveNbHits":false,"exhaustiveTypo":false,"hits":[{"_highlightResult":{"author":{"matchLevel":"none","matchedWords":[],"value":"0o_MrPatrick_o0"},"title":{"fullyHighlighted":false,"matchLevel":"full","matchedWords":["claude","code","extended","thinking"],"value":"The text in <em>Claude</em> <em>Code</em>\u2019s \u201c<em>Extended</em> <em>Thinking</em>\u201d output"},"url":{"fullyHighlighted":false,"matchLevel":"partial","matchedWords":["claude","extended","thinking"],"value":"https://patrickmccanna.net/the-text-in-<em>claude</em>-codes-<em>extended</em>-<em>thinking</em>-output-is-not-authentic/"}},"_tags":["story","author_0o_MrPatrick_o0","story_48630535"],"author":"0o_MrPatrick_o0","children":[48631002,48631011,48631028,48631034,48631036,48631050,48631075,48631089,48631090,48631232,48631234,48631252,48631273,48631282,48631295,48631413,48631417,48631503,48631634,48631660,48631761,48631826,48631890,48631920,48631958,48631976,48631979,48632022,48632030,48632213,48632253,48632316,48632352,48632410,48632420,48632605,48632773,48632978,48633020,48633097,48633249,48633254,48633264,48633392,48633681,48633838,48633952,48634503,48634964,48635064,48635284,48635377,48636198,48636463,48636565,48636883,48637300,48637936,48639159,48639233,48640187,48640514],"created_at":"2026-06-22T14:22:46Z","created_at_i":1782138166,"num_comments":205,"objectID":"48630535","points":294,"story_id":48630535,"title":"The text in Claude Code\u2019s \u201cExtended Thinking\u201d output","updated_at":"2026-06-23T06:08:58Z","url":"https://patrickmccanna.net/the-text-in-claude-codes-extended-thinking-output-is-not-authentic/"},{"_highlightResult":{"author":{"matchLevel":"none","matchedWords":[],"value":"aray07"},"title":{"fullyHighlighted":false,"matchLevel":"partial","matchedWords":["claude","code"],"value":"You shouldn't use ultrathink in <em>Claude</em> <em>Code</em>"},"url":{"fullyHighlighted":false,"matchLevel":"full","matchedWords":["claude","code","extended","thinking"],"value":"https://www.claudecodecamp.com/p/<em>claude</em>-<em>code</em>-<em>extended</em>-<em>thinking</em>"}},"_tags":["story","author_aray07","story_47533091"],"author":"aray07","created_at":"2026-03-26T17:16:22Z","created_at_i":1774545382,"num_comments":0,"objectID":"47533091","points":3,"story_id":47533091,"title":"You shouldn't use ultrathink in Claude Code","updated_at":"2026-03-26T17:21:55Z","url":"https://www.claudecodecamp.com/p/claude-code-extended-thinking"},{"_highlightResult":{"author":{"matchLevel":"none","matchedWords":[],"value":"prmph"},"story_text":{"fullyHighlighted":false,"matchLevel":"full","matchedWords":["claude","code","extended","thinking"],"value":"One simple prompt (on a fresh session using Sonnet 4.6): read three <em>code</em> files (less than 100 lines each) and merge them into one, depletes 20% of my 4 hour usage, and 3% of my weekly usage in a few minutes. I'm not using <em>extended</em> <em>thinking</em>, nor 1M tokens context option, nor sub-agents, nor MCP, nothing; just the prompt.<p>How the hell is anyone getting anything done with this? I probably just beed to get a refund and be done with <em>Claude</em> <em>Code</em>."},"title":{"fullyHighlighted":false,"matchLevel":"partial","matchedWords":["claude","code"],"value":"Tell HN: <em>Claude</em> <em>Code</em> usage depletion makes it basically unusable now"}},"_tags":["story","author_prmph","story_47715954","ask_hn"],"author":"prmph","children":[47770568],"created_at":"2026-04-10T10:23:43Z","created_at_i":1775816623,"num_comments":1,"objectID":"47715954","points":5,"story_id":47715954,"story_text":"One simple prompt (on a fresh session using Sonnet 4.6): read three code files (less than 100 lines each) and merge them into one, depletes 20% of my 4 hour usage, and 3% of my weekly usage in a few minutes. I&#x27;m not using extended thinking, nor 1M tokens context option, nor sub-agents, nor MCP, nothing; just the prompt.<p>How the hell is anyone getting anything done with this? I probably just beed to get a refund and be done with Claude Code.","title":"Tell HN: Claude Code usage depletion makes it basically unusable now","updated_at":"2026-04-14T19:50:39Z"},{"_highlightResult":{"author":{"matchLevel":"none","matchedWords":[],"value":"leocardz_"},"story_text":{"fullyHighlighted":false,"matchLevel":"full","matchedWords":["claude","code","extended","thinking"],"value":"Poirot is a SwiftUI app that reads your local <em>Claude</em> <em>Code</em> sessions and gives you a proper UI for browsing conversations, <em>code</em> diffs, <em>extended</em> <em>thinking</em>, and configuration (commands, skills, MCP servers, etc.). Runs offline, no login, under 6 MB.<p>Built in a weekend using <em>Claude</em> <em>Code</em>. Open source (MIT).<p>Demo: <a href=\"https://youtu.be/JLvNSRZrxdo\" rel=\"nofollow\">https://youtu.be/JLvNSRZrxdo</a>"},"title":{"fullyHighlighted":false,"matchLevel":"partial","matchedWords":["claude","code"],"value":"Show HN: Poirot \u2013 A native macOS companion app for <em>Claude</em> <em>Code</em>"},"url":{"matchLevel":"none","matchedWords":[],"value":"https://github.com/leonardocardoso/poirot"}},"_tags":["story","author_leocardz_","story_47165573","show_hn"],"author":"leocardz_","created_at":"2026-02-26T13:09:45Z","created_at_i":1772111385,"num_comments":0,"objectID":"47165573","points":2,"story_id":47165573,"story_text":"Poirot is a SwiftUI app that reads your local Claude Code sessions and gives you a proper UI for browsing conversations, code diffs, extended thinking, and configuration (commands, skills, MCP servers, etc.). Runs offline, no login, under 6 MB.<p>Built in a weekend using Claude Code. Open source (MIT).<p>Demo: <a href=\"https:&#x2F;&#x2F;youtu.be&#x2F;JLvNSRZrxdo\" rel=\"nofollow\">https:&#x2F;&#x2F;youtu.be&#x2F;JLvNSRZrxdo</a>","title":"Show HN: Poirot \u2013 A native macOS companion app for Claude Code","updated_at":"2026-03-05T23:37:46Z","url":"https://github.com/leonardocardoso/poirot"},{"_highlightResult":{"author":{"matchLevel":"none","matchedWords":[],"value":"slogansand"},"story_text":{"fullyHighlighted":false,"matchLevel":"full","matchedWords":["claude","code","extended","thinking"],"value":"Hey HN, author here.\nLoki Mode orchestrates specialized AI agents to take a PRD to deployed product with zero human intervention. But what I'm most proud of is the research foundation - we implemented virtually every scientifically proven pattern from the 2025-2026 AI agent literature.\nFrom Anthropic:<p>Constitutional AI self-critique against principles\nBuilding Effective Agents evaluator-optimizer pattern\n<em>Claude</em> <em>Code</em> Best Practices explore-plan-<em>code</em> workflow\nVisible <em>Extended</em> <em>Thinking</em> (think, think hard, ultrathink levels)\nEffective Harnesses one-feature-at-a-time pattern<p>From DeepMind:<p>SIMA 2 self-improvement loops\nGemini Robotics hierarchical reasoning (planner + executor)\nScalable AI Safety debate-based verification<p>From OpenAI:<p>Agents SDK tracing, guardrails, tripwires\nDeep Research adaptive planning with backtracking\nAGENTS.md standardized instructions<p>From Academic Research:<p>CONSENSAGENT (ACL 2025): Blind review + Devil's Advocate when unanimous. 30% false positive reduction.\nGoalAct: Global planning \u2192 skill decomposition \u2192 local execution. 12%+ success rate improvement.\nA-Mem: Zettelkasten-style memory linking for episodic\u2192semantic consolidation.\nMulti-Agent Reflexion: Structured debate (Implementer \u2192 Skeptic \u2192 Advocate \u2192 Synthesizer).\nIter-VF: Verify answer only, not reasoning chain. Prevents context overflow.<p>From Industry:<p>NVIDIA ToolOrchestra: Three-reward signal (outcome/efficiency/preference), dynamic agent selection\nAWS Bedrock: Routing mode for simple tasks, supervisor mode for complex\nBoris Cherny's self-verification loop (2-3x quality improvement)\nSimon Willison's sub-agents for context isolation<p>From HN discussions:<p>&quot;Zero companies without human in the loop&quot; \u2192 confidence-based escalation\nContext curation beats automatic RAG\nFresh contexts yield better results\nLLM-as-judge has shared blind spots \u2192 deterministic validation<p>The full acknowledgements with links to every paper/resource: <a href=\"https://github.com/asklokesh/claudeskill-loki-mode/blob/main/ACKNOWLEDGEMENTS.md\" rel=\"nofollow\">https://github.com/asklokesh/claudeskill-loki-mode/blob/main...</a>\nRun: <em>claude</em> --dangerously-skip-permissions then &quot;Loki Mode with PRD at path/to/prd&quot;\nHappy to discuss any of the research or architecture decisions."},"title":{"matchLevel":"none","matchedWords":[],"value":"Show HN: Research-Backed Multi-Agent System for Autonomous Development"},"url":{"matchLevel":"none","matchedWords":[],"value":"https://github.com/asklokesh/claudeskill-loki-mode"}},"_tags":["story","author_slogansand","story_46547971","show_hn"],"author":"slogansand","children":[46548390],"created_at":"2026-01-08T23:25:22Z","created_at_i":1767914722,"num_comments":1,"objectID":"46547971","points":3,"story_id":46547971,"story_text":"Hey HN, author here.\nLoki Mode orchestrates specialized AI agents to take a PRD to deployed product with zero human intervention. But what I&#x27;m most proud of is the research foundation - we implemented virtually every scientifically proven pattern from the 2025-2026 AI agent literature.\nFrom Anthropic:<p>Constitutional AI self-critique against principles\nBuilding Effective Agents evaluator-optimizer pattern\nClaude Code Best Practices explore-plan-code workflow\nVisible Extended Thinking (think, think hard, ultrathink levels)\nEffective Harnesses one-feature-at-a-time pattern<p>From DeepMind:<p>SIMA 2 self-improvement loops\nGemini Robotics hierarchical reasoning (planner + executor)\nScalable AI Safety debate-based verification<p>From OpenAI:<p>Agents SDK tracing, guardrails, tripwires\nDeep Research adaptive planning with backtracking\nAGENTS.md standardized instructions<p>From Academic Research:<p>CONSENSAGENT (ACL 2025): Blind review + Devil&#x27;s Advocate when unanimous. 30% false positive reduction.\nGoalAct: Global planning \u2192 skill decomposition \u2192 local execution. 12%+ success rate improvement.\nA-Mem: Zettelkasten-style memory linking for episodic\u2192semantic consolidation.\nMulti-Agent Reflexion: Structured debate (Implementer \u2192 Skeptic \u2192 Advocate \u2192 Synthesizer).\nIter-VF: Verify answer only, not reasoning chain. Prevents context overflow.<p>From Industry:<p>NVIDIA ToolOrchestra: Three-reward signal (outcome&#x2F;efficiency&#x2F;preference), dynamic agent selection\nAWS Bedrock: Routing mode for simple tasks, supervisor mode for complex\nBoris Cherny&#x27;s self-verification loop (2-3x quality improvement)\nSimon Willison&#x27;s sub-agents for context isolation<p>From HN discussions:<p>&quot;Zero companies without human in the loop&quot; \u2192 confidence-based escalation\nContext curation beats automatic RAG\nFresh contexts yield better results\nLLM-as-judge has shared blind spots \u2192 deterministic validation<p>The full acknowledgements with links to every paper&#x2F;resource: <a href=\"https:&#x2F;&#x2F;github.com&#x2F;asklokesh&#x2F;claudeskill-loki-mode&#x2F;blob&#x2F;main&#x2F;ACKNOWLEDGEMENTS.md\" rel=\"nofollow\">https:&#x2F;&#x2F;github.com&#x2F;asklokesh&#x2F;claudeskill-loki-mode&#x2F;blob&#x2F;main...</a>\nRun: claude --dangerously-skip-permissions then &quot;Loki Mode with PRD at path&#x2F;to&#x2F;prd&quot;\nHappy to discuss any of the research or architecture decisions.","title":"Show HN: Research-Backed Multi-Agent System for Autonomous Development","updated_at":"2026-03-05T23:19:36Z","url":"https://github.com/asklokesh/claudeskill-loki-mode"},{"_highlightResult":{"author":{"matchLevel":"none","matchedWords":[],"value":"StanAngeloff"},"story_text":{"fullyHighlighted":false,"matchLevel":"full","matchedWords":["claude","code","extended","thinking"],"value":"I started Flemma because I wanted to bring my AI workload into Neovim.<p>My workflow started with me living in developer portals - <em>Claude</em> Workbench, OpenAI Platform, Vertex (the horrors of GCP's Console UI!) I would spend hours in these web UIs crafting prompts, iterating on system instructions, maintaining a carefully curated library of sessions. But the browser really wasn't optimised for this kind of interaction. Editing was clunky (muscle memory &lt;C-W&gt; would close the tab instead and wipe my work), going back and forth between the LLM and myself felt off. So I had an idea: what if I could turn a Markdown document into an LLM chat interface?<p>At first I thought that would be enough - just do what I was already doing in the browser... but in Neovim. And sure enough having been accustomed to my own setup for the last decade I immediately felt a productivity boost. Writing functional requirement documents, statements of work, deep research across different systems - all of it felt better in &quot;my&quot; editor.<p>I then had a taste of Aider... and <em>Claude</em> <em>Code</em>... and all the other tools that were coming out. And Flemma felt lacking. So I started building: tool support, better conversation organisation, a proper UI to tame the noise that tool calls introduce.<p>Today, Flemma is a fully evolved AI workspace. It runs autonomous agent loops, interacts with multiple LLMs (Anthropic, OpenAI, Vertex, Moonshot) and lets you switch providers mid-conversation - something I do occasionally during research, asking two or three different models for their take on a problem, then combining findings into a final document.<p>Under the hood, .chat files are just Markdown with role markers (@You:, @Assistant:), but Flemma treats them as a proper filetype with its own parser, AST, LSP server, template engine and sandboxed tool execution. The buffer *is* the state - no hidden database, no JSON history, no server process. Your conversations are portable, greppable and version-controllable (I backup mine in Git). You can close Neovim, reopen the file a week later and pick up exactly where you left off.<p>It's got prompt caching, <em>extended</em> <em>thinking</em>, 7 built-in tools, a layered config system that gets out of your way and a UI that keeps getting refined to bring the noise down and make long agent sessions pleasant to work in (not quite there yet).<p>Flemma is for anyone who'd rather stay in Neovim. If that's you, I'd love to hear what you think.<p>Repo: <a href=\"https://github.com/Flemma-Dev/flemma.nvim\" rel=\"nofollow\">https://github.com/Flemma-Dev/flemma.nvim</a><p>Demo: <a href=\"http://flemma.dev/flemma.nvim/blob/develop/README.md#-flemma\" rel=\"nofollow\">http://flemma.dev/flemma.nvim/blob/develop/README.md#-flemma</a>"},"title":{"matchLevel":"none","matchedWords":[],"value":"Show HN: I built a full LLM chat client as a Neovim filetype"}},"_tags":["story","author_StanAngeloff","story_47599302","show_hn"],"author":"StanAngeloff","created_at":"2026-04-01T11:08:59Z","created_at_i":1775041739,"num_comments":0,"objectID":"47599302","points":3,"story_id":47599302,"story_text":"I started Flemma because I wanted to bring my AI workload into Neovim.<p>My workflow started with me living in developer portals - Claude Workbench, OpenAI Platform, Vertex (the horrors of GCP&#x27;s Console UI!) I would spend hours in these web UIs crafting prompts, iterating on system instructions, maintaining a carefully curated library of sessions. But the browser really wasn&#x27;t optimised for this kind of interaction. Editing was clunky (muscle memory &lt;C-W&gt; would close the tab instead and wipe my work), going back and forth between the LLM and myself felt off. So I had an idea: what if I could turn a Markdown document into an LLM chat interface?<p>At first I thought that would be enough - just do what I was already doing in the browser... but in Neovim. And sure enough having been accustomed to my own setup for the last decade I immediately felt a productivity boost. Writing functional requirement documents, statements of work, deep research across different systems - all of it felt better in &quot;my&quot; editor.<p>I then had a taste of Aider... and Claude Code... and all the other tools that were coming out. And Flemma felt lacking. So I started building: tool support, better conversation organisation, a proper UI to tame the noise that tool calls introduce.<p>Today, Flemma is a fully evolved AI workspace. It runs autonomous agent loops, interacts with multiple LLMs (Anthropic, OpenAI, Vertex, Moonshot) and lets you switch providers mid-conversation - something I do occasionally during research, asking two or three different models for their take on a problem, then combining findings into a final document.<p>Under the hood, .chat files are just Markdown with role markers (@You:, @Assistant:), but Flemma treats them as a proper filetype with its own parser, AST, LSP server, template engine and sandboxed tool execution. The buffer *is* the state - no hidden database, no JSON history, no server process. Your conversations are portable, greppable and version-controllable (I backup mine in Git). You can close Neovim, reopen the file a week later and pick up exactly where you left off.<p>It&#x27;s got prompt caching, extended thinking, 7 built-in tools, a layered config system that gets out of your way and a UI that keeps getting refined to bring the noise down and make long agent sessions pleasant to work in (not quite there yet).<p>Flemma is for anyone who&#x27;d rather stay in Neovim. If that&#x27;s you, I&#x27;d love to hear what you think.<p>Repo: <a href=\"https:&#x2F;&#x2F;github.com&#x2F;Flemma-Dev&#x2F;flemma.nvim\" rel=\"nofollow\">https:&#x2F;&#x2F;github.com&#x2F;Flemma-Dev&#x2F;flemma.nvim</a><p>Demo: <a href=\"http:&#x2F;&#x2F;flemma.dev&#x2F;flemma.nvim&#x2F;blob&#x2F;develop&#x2F;README.md#-flemma\" rel=\"nofollow\">http:&#x2F;&#x2F;flemma.dev&#x2F;flemma.nvim&#x2F;blob&#x2F;develop&#x2F;README.md#-flemma</a>","title":"Show HN: I built a full LLM chat client as a Neovim filetype","updated_at":"2026-04-01T12:13:58Z"},{"_highlightResult":{"author":{"matchLevel":"none","matchedWords":[],"value":"lepuzfcoder"},"story_text":{"fullyHighlighted":false,"matchLevel":"full","matchedWords":["claude","code","extended","thinking"],"value":"Hey HN. I built this because I've been in therapy for years and noticed that a big part of what therapists do is ask the right questions at the right time. I wanted to see if an AI could serve as a daily self-reflection tool \u2014 not replacing therapy, but as a complement to it.\nSome design decisions and why:<p>Desktop-only, intentionally. I think therapy should feel like sitting down with your thoughts, not scrolling on your phone. The desktop constraint is a feature.\nBYOK (Bring Your Own Key). You use your own <em>Claude</em> API key. No backend, no data collection, no accounts. Your conversations never leave your machine. This felt non-negotiable for something dealing with mental health.\nBuilt with <em>Claude</em> <em>Code</em>. I work full-time as a team lead at an edtech company, so this was built in evenings and weekends, mostly through vibe coding sessions.\nI use it myself daily. 15\u201320 min sessions with Opus + <em>extended</em> <em>thinking</em>. After weeks of use, it picks up on patterns in how you think \u2014 recurring avoidance behaviors, cognitive distortions, etc.<p>The name comes from &quot;Gnothi Seauton&quot; (Know Thyself) \u2014 the inscription at the Temple of Delphi.\nThere's no comparable open-source tool in this space. Every mental health AI app I found is closed-source and collects user data. I wanted to build the alternative I wish existed.\nFeedback welcome \u2014 especially on the approach, architecture, or if this is fundamentally a bad idea. Happy to discuss.\nGitHub: <a href=\"https://github.com/Lepuz-coder/opengnothia\" rel=\"nofollow\">https://github.com/Lepuz-coder/opengnothia</a>"},"title":{"matchLevel":"none","matchedWords":[],"value":"Show HN: OpenGnothia \u2013 Open-source AI therapy companion (BYOK)"},"url":{"matchLevel":"none","matchedWords":[],"value":"https://www.opengnothia.com/tr"}},"_tags":["story","author_lepuzfcoder","story_47074375","show_hn"],"author":"lepuzfcoder","created_at":"2026-02-19T14:50:46Z","created_at_i":1771512646,"num_comments":0,"objectID":"47074375","points":2,"story_id":47074375,"story_text":"Hey HN. I built this because I&#x27;ve been in therapy for years and noticed that a big part of what therapists do is ask the right questions at the right time. I wanted to see if an AI could serve as a daily self-reflection tool \u2014 not replacing therapy, but as a complement to it.\nSome design decisions and why:<p>Desktop-only, intentionally. I think therapy should feel like sitting down with your thoughts, not scrolling on your phone. The desktop constraint is a feature.\nBYOK (Bring Your Own Key). You use your own Claude API key. No backend, no data collection, no accounts. Your conversations never leave your machine. This felt non-negotiable for something dealing with mental health.\nBuilt with Claude Code. I work full-time as a team lead at an edtech company, so this was built in evenings and weekends, mostly through vibe coding sessions.\nI use it myself daily. 15\u201320 min sessions with Opus + extended thinking. After weeks of use, it picks up on patterns in how you think \u2014 recurring avoidance behaviors, cognitive distortions, etc.<p>The name comes from &quot;Gnothi Seauton&quot; (Know Thyself) \u2014 the inscription at the Temple of Delphi.\nThere&#x27;s no comparable open-source tool in this space. Every mental health AI app I found is closed-source and collects user data. I wanted to build the alternative I wish existed.\nFeedback welcome \u2014 especially on the approach, architecture, or if this is fundamentally a bad idea. Happy to discuss.\nGitHub: <a href=\"https:&#x2F;&#x2F;github.com&#x2F;Lepuz-coder&#x2F;opengnothia\" rel=\"nofollow\">https:&#x2F;&#x2F;github.com&#x2F;Lepuz-coder&#x2F;opengnothia</a>","title":"Show HN: OpenGnothia \u2013 Open-source AI therapy companion (BYOK)","updated_at":"2026-03-05T23:35:45Z","url":"https://www.opengnothia.com/tr"},{"_highlightResult":{"author":{"matchLevel":"none","matchedWords":[],"value":"StanAngeloff"},"story_text":{"fullyHighlighted":false,"matchLevel":"full","matchedWords":["claude","code","extended","thinking"],"value":"Hey HN, I posted Flemma back in October 2025 with no context. Since then I've shipped &gt;100 commits and used it daily as my primary AI workspace so I figured a proper update was due.<p>The core idea: a .chat file IS the conversation. No SQLite, no JSON logs, no shadow state. What you see in the buffer is exactly what the model receives. Edit an assistant reply to fix a hallucination, delete a tangent, fork by duplicating the file - it all works because there's nothing to fall out of sync.<p>What's new since October:<p>- Tool calling. Models can run shell commands, read/edit/write files (same as Pi, just 4 tools). Results go straight into the buffer. There's an approval flow (Ctrl-] cycles: preview -&gt; execute -&gt; send) so nothing runs without your say-so. Parallel tool use also works.<p>- Prompt caching for Anthropic, OpenAI and Vertex AI. Flemma places cache breakpoints automatically. Long conversations are now significantly cheaper (this was a major pain point for me).<p>- <em>Extended</em> <em>thinking</em> / reasoning support for all 3 providers.<p>- Per-buffer overrides via frontmatter. `flemma.opt` lets you pick which tools a buffer can use, set provider parameters, switch models - all scoped to that one file.<p>- Open registration APIs for both providers and tools. Custom tools can resolve definitions asynchronously from CLI subprocesses or remote APIs. I plan on adding mcporter support at some point.<p>Flemma works with Anthropic, OpenAI and Vertex AI. You get cost tracking, presets, Lua template expressions, file attachments and a lualine.nvim component.<p>One thing I want to be upfront about: nearly every line of <em>code</em> in Flemma was written by AI (<em>Claude</em> <em>Code</em> as of late, Amp and Aider in the past). It says so in the README. Every change was personally architected, reviewed and tested by me. I decide what gets built and I vet every diff. I think this is where a lot of software development is heading and I'd rather be honest about it than pretend otherwise.<p>I'm @StanAngeloff on GitHub - long-time Neovim user and open source enthusiast. Happy to answer questions.<p><a href=\"https://github.com/Flemma-Dev/flemma.nvim\" rel=\"nofollow\">https://github.com/Flemma-Dev/flemma.nvim</a>"},"title":{"matchLevel":"none","matchedWords":[],"value":"Show HN: Flemma \u2013 a Neovim plugin where the .chat buffer is the conversation"}},"_tags":["story","author_StanAngeloff","story_47004647","show_hn"],"author":"StanAngeloff","created_at":"2026-02-13T16:38:25Z","created_at_i":1771000705,"num_comments":0,"objectID":"47004647","points":2,"story_id":47004647,"story_text":"Hey HN, I posted Flemma back in October 2025 with no context. Since then I&#x27;ve shipped &gt;100 commits and used it daily as my primary AI workspace so I figured a proper update was due.<p>The core idea: a .chat file IS the conversation. No SQLite, no JSON logs, no shadow state. What you see in the buffer is exactly what the model receives. Edit an assistant reply to fix a hallucination, delete a tangent, fork by duplicating the file - it all works because there&#x27;s nothing to fall out of sync.<p>What&#x27;s new since October:<p>- Tool calling. Models can run shell commands, read&#x2F;edit&#x2F;write files (same as Pi, just 4 tools). Results go straight into the buffer. There&#x27;s an approval flow (Ctrl-] cycles: preview -&gt; execute -&gt; send) so nothing runs without your say-so. Parallel tool use also works.<p>- Prompt caching for Anthropic, OpenAI and Vertex AI. Flemma places cache breakpoints automatically. Long conversations are now significantly cheaper (this was a major pain point for me).<p>- Extended thinking &#x2F; reasoning support for all 3 providers.<p>- Per-buffer overrides via frontmatter. `flemma.opt` lets you pick which tools a buffer can use, set provider parameters, switch models - all scoped to that one file.<p>- Open registration APIs for both providers and tools. Custom tools can resolve definitions asynchronously from CLI subprocesses or remote APIs. I plan on adding mcporter support at some point.<p>Flemma works with Anthropic, OpenAI and Vertex AI. You get cost tracking, presets, Lua template expressions, file attachments and a lualine.nvim component.<p>One thing I want to be upfront about: nearly every line of code in Flemma was written by AI (Claude Code as of late, Amp and Aider in the past). It says so in the README. Every change was personally architected, reviewed and tested by me. I decide what gets built and I vet every diff. I think this is where a lot of software development is heading and I&#x27;d rather be honest about it than pretend otherwise.<p>I&#x27;m @StanAngeloff on GitHub - long-time Neovim user and open source enthusiast. Happy to answer questions.<p><a href=\"https:&#x2F;&#x2F;github.com&#x2F;Flemma-Dev&#x2F;flemma.nvim\" rel=\"nofollow\">https:&#x2F;&#x2F;github.com&#x2F;Flemma-Dev&#x2F;flemma.nvim</a>","title":"Show HN: Flemma \u2013 a Neovim plugin where the .chat buffer is the conversation","updated_at":"2026-03-05T23:31:54Z"},{"_highlightResult":{"author":{"matchLevel":"none","matchedWords":[],"value":"shikkra"},"story_text":{"fullyHighlighted":false,"matchLevel":"full","matchedWords":["claude","code","extended","thinking"],"value":"Today I tried to use <em>claude</em>.ai ($100 Max plan) with Opus 4.5 and <em>extended</em> <em>thinking</em> enabled. I was met with a weird retry message. It tried to generate a response 10 times and then automatically switched to a different model without any indication or confirmation.<p>I've been noticing different issues crop up frequently, both on the web and in <em>Claude</em> <em>Code</em>. So I decided to look into how often this has been happening.<p>Here's the number of incidents per month based on their own status page https://status.<em>claude</em>.com/history as of today:<p>February 2026    :  10 incidents (we\u2019re only 4 days in)\nJanuary 2026     :  26 incidents\nDecember 2025    :  21 incidents<p>At least 16 of these directly affected their most capable model <em>Claude</em> Opus 4.5:<p>3 incidents (Dec 21-23)\n9 incidents (Jan 7, 12, 13, 14, 20, 25-26, 28 x2)\n4 incidents (Feb 1, 2, 3, 4)<p>Ten more are related to the <em>claude</em>.ai platform itself. And that's not even counting how buggy it is day to day. I don't think I'm the only one who's had it generate a nearly complete response, only for something to go wrong and wipe the entire thing from the conversation. No way to recover it, just wasted tokens.<p>How is Anthropic not addressing this? They are one of the highest valued AI companies out there. Clearly they have the resources and engineers to fix these issues. Why isn\u2019t reliability a priority?"},"title":{"fullyHighlighted":false,"matchLevel":"partial","matchedWords":["claude"],"value":"Tell HN: <em>Claude</em> Has Had 57 Incidents in the Past 3 Months"}},"_tags":["story","author_shikkra","story_46885666","ask_hn"],"author":"shikkra","created_at":"2026-02-04T13:38:16Z","created_at_i":1770212296,"num_comments":0,"objectID":"46885666","points":2,"story_id":46885666,"story_text":"Today I tried to use claude.ai ($100 Max plan) with Opus 4.5 and extended thinking enabled. I was met with a weird retry message. It tried to generate a response 10 times and then automatically switched to a different model without any indication or confirmation.<p>I&#x27;ve been noticing different issues crop up frequently, both on the web and in Claude Code. So I decided to look into how often this has been happening.<p>Here&#x27;s the number of incidents per month based on their own status page https:&#x2F;&#x2F;status.claude.com&#x2F;history as of today:<p>February 2026    :  10 incidents (we\u2019re only 4 days in)\nJanuary 2026     :  26 incidents\nDecember 2025    :  21 incidents<p>At least 16 of these directly affected their most capable model Claude Opus 4.5:<p>3 incidents (Dec 21-23)\n9 incidents (Jan 7, 12, 13, 14, 20, 25-26, 28 x2)\n4 incidents (Feb 1, 2, 3, 4)<p>Ten more are related to the claude.ai platform itself. And that&#x27;s not even counting how buggy it is day to day. I don&#x27;t think I&#x27;m the only one who&#x27;s had it generate a nearly complete response, only for something to go wrong and wipe the entire thing from the conversation. No way to recover it, just wasted tokens.<p>How is Anthropic not addressing this? They are one of the highest valued AI companies out there. Clearly they have the resources and engineers to fix these issues. Why isn\u2019t reliability a priority?","title":"Tell HN: Claude Has Had 57 Incidents in the Past 3 Months","updated_at":"2026-03-05T23:31:18Z"},{"_highlightResult":{"author":{"matchLevel":"none","matchedWords":[],"value":"pllu"},"story_text":{"fullyHighlighted":false,"matchLevel":"full","matchedWords":["claude","code","extended","thinking"],"value":"Hi HN! I've been working on a tool to auto-generate a comprehensive developer guide for AI agents working with your codebase.<p>It analyses your codebase against structured architecture patterns and development capabilities, producing a markdown document that maps out:<p>- Your architecture (<em>code</em> layout, application layers, frameworks, standards)\n- Development capabilities (how to interact with servers, read logs, run scripts, access test data, debug)\n- Guidance for continuous improvement as your app evolves<p>It takes minutes to try: run the generator prompt in your workspace using <em>Claude</em> <em>Code</em>, Cursor, or any AI coding tool (works best with <em>extended</em> <em>thinking</em> models). You can use the output in rules files like <em>CLAUDE</em>.md or .cursor/rules.<p>The tool is technology-agnostic, so should work for a wide range of apps, tools, services and libraries. There's a plugin system that gives the generator hints about technology-specific documentation (SST v3 plugin included as an example).<p>GitHub: <a href=\"https://github.com/martinpllu/agent-dev-guide\" rel=\"nofollow\">https://github.com/martinpllu/agent-dev-guide</a>\nExample output: <a href=\"https://github.com/martinpllu/agent-dev-example/blob/main/agent-dev-guide.md\" rel=\"nofollow\">https://github.com/martinpllu/agent-dev-example/blob/main/ag...</a><p>I'd love feedback from anyone trying this out."},"title":{"matchLevel":"none","matchedWords":[],"value":"Show HN: Agent Dev Guide \u2013 Generate structured context docs for AI coding agents"},"url":{"matchLevel":"none","matchedWords":[],"value":"https://github.com/martinpllu/agent-dev-guide"}},"_tags":["story","author_pllu","story_44912991","show_hn"],"author":"pllu","created_at":"2025-08-15T14:37:26Z","created_at_i":1755268646,"num_comments":0,"objectID":"44912991","points":1,"story_id":44912991,"story_text":"Hi HN! I&#x27;ve been working on a tool to auto-generate a comprehensive developer guide for AI agents working with your codebase.<p>It analyses your codebase against structured architecture patterns and development capabilities, producing a markdown document that maps out:<p>- Your architecture (code layout, application layers, frameworks, standards)\n- Development capabilities (how to interact with servers, read logs, run scripts, access test data, debug)\n- Guidance for continuous improvement as your app evolves<p>It takes minutes to try: run the generator prompt in your workspace using Claude Code, Cursor, or any AI coding tool (works best with extended thinking models). You can use the output in rules files like CLAUDE.md or .cursor&#x2F;rules.<p>The tool is technology-agnostic, so should work for a wide range of apps, tools, services and libraries. There&#x27;s a plugin system that gives the generator hints about technology-specific documentation (SST v3 plugin included as an example).<p>GitHub: <a href=\"https:&#x2F;&#x2F;github.com&#x2F;martinpllu&#x2F;agent-dev-guide\" rel=\"nofollow\">https:&#x2F;&#x2F;github.com&#x2F;martinpllu&#x2F;agent-dev-guide</a>\nExample output: <a href=\"https:&#x2F;&#x2F;github.com&#x2F;martinpllu&#x2F;agent-dev-example&#x2F;blob&#x2F;main&#x2F;agent-dev-guide.md\" rel=\"nofollow\">https:&#x2F;&#x2F;github.com&#x2F;martinpllu&#x2F;agent-dev-example&#x2F;blob&#x2F;main&#x2F;ag...</a><p>I&#x27;d love feedback from anyone trying this out.","title":"Show HN: Agent Dev Guide \u2013 Generate structured context docs for AI coding agents","updated_at":"2026-05-12T01:08:15Z","url":"https://github.com/martinpllu/agent-dev-guide"},{"_highlightResult":{"author":{"matchLevel":"none","matchedWords":[],"value":"_fat_santa"},"comment_text":{"fullyHighlighted":false,"matchLevel":"full","matchedWords":["claude","code","extended","thinking"],"value":"At my company we're using <em>Claude</em> <em>Code</em> w/ API Billing and I found that unless you're running ralph loops on Opus with <em>extended</em> <em>thinking</em>, it's very hard to blow through more than $200/mo.<p>I made this argument earlier and I'll make it again, I think a major contributing factor to AI budgets exploding is the token leaderboards, culture of &quot;tokenmaxxing&quot; and the the constant narrative that if you're not burning X tokens a month, you're not a good engineer."},"story_title":{"matchLevel":"none","matchedWords":[],"value":"Uber caps employee AI spending after blowing through budget in four months"},"story_url":{"matchLevel":"none","matchedWords":[],"value":"https://techcrunch.com/2026/06/02/uber-caps-employee-ai-spending-after-blowing-through-budget-in-four-months/"}},"_tags":["comment","author__fat_santa","story_48375544"],"author":"_fat_santa","children":[48377576],"comment_text":"At my company we&#x27;re using Claude Code w&#x2F; API Billing and I found that unless you&#x27;re running ralph loops on Opus with extended thinking, it&#x27;s very hard to blow through more than $200&#x2F;mo.<p>I made this argument earlier and I&#x27;ll make it again, I think a major contributing factor to AI budgets exploding is the token leaderboards, culture of &quot;tokenmaxxing&quot; and the the constant narrative that if you&#x27;re not burning X tokens a month, you&#x27;re not a good engineer.","created_at":"2026-06-02T21:41:22Z","created_at_i":1780436482,"objectID":"48376726","parent_id":48375544,"story_id":48375544,"story_title":"Uber caps employee AI spending after blowing through budget in four months","story_url":"https://techcrunch.com/2026/06/02/uber-caps-employee-ai-spending-after-blowing-through-budget-in-four-months/","updated_at":"2026-06-03T00:36:24Z"},{"_highlightResult":{"author":{"matchLevel":"none","matchedWords":[],"value":"piotrwittchen"},"comment_text":{"fullyHighlighted":false,"matchLevel":"full","matchedWords":["claude","code","extended","thinking"],"value":"Author here. AICTL is an open-source AI agent I've been building in Rust as an alternative to the various closed CLI assistants. It's a single workspace with three frontends sharing one engine:<p>- CLI (aictl): REPL with slash commands, agents, skills, plugins, hooks<p>- Desktop (macOS, work-in-progress): Tauri-based, same engine<p>- HTTP server (aictl-server): OpenAI/Anthropic-compatible proxy so you can put Aider/Continue/Cline in front of it and get redaction, audit, and prompt-injection guards for free<p>What I tried to get right:<p>- 11 providers in one binary: OpenAI, Anthropic, Gemini, Grok, Mistral, DeepSeek, Kimi, Z.ai, Ollama, plus native GGUF (llama.cpp) and MLX (Apple Silicon) for fully local inference.<p>- Security as a first-class layer, not a checkbox: CWD jail for tool calls, keyring-backed API keys, three-layer outbound redaction (regex + entropy + optional NER via gline-rs), prompt-injection detection, full JSONL audit log. Local providers skip redaction by default; cloud calls don't.<p>- A working MCP client with stdio + Streamable HTTP + legacy SSE transports (hand-rolled JSON-RPC, no extra deps).<p>- User-extensible: drop a manifest + executable in ~/.aictl/plugins/ for custom tools; lifecycle hooks can block, rewrite, or augment any turn.<p>What's honestly rough:<p>- Native GGUF and MLX inference are experimental. Tool-call formatting on small local models is hit-or-miss; chat templates are mostly ChatML. Cloud providers are the recommended daily driver.<p>- Desktop is macOS-only for now and still WIP.<p>- Not aimed at coding specifically \u2014 it's general-purpose. For dedicated coding agents I'd still point people at <em>Claude</em> <em>Code</em>, Codex, or opencode.<p>Install: curl -sSf <a href=\"https://aictl.app/install.sh\" rel=\"nofollow\">https://aictl.app/install.sh</a> | sh<p>Source: <a href=\"https://github.com/pwittchen/aictl\" rel=\"nofollow\">https://github.com/pwittchen/aictl</a><p>Happy to answer questions about the architecture, the security model, or the server's Anthropic-passthrough mode (the trickiest part \u2014 keeping tool_use blocks, prompt caching, and <em>extended</em> <em>thinking</em> intact across the proxy)."},"story_title":{"matchLevel":"none","matchedWords":[],"value":"Show HN: AICTL \u2013 A native AI agent for terminal and macOS, in Rust"},"story_url":{"matchLevel":"none","matchedWords":[],"value":"https://aictl.app"}},"_tags":["comment","author_piotrwittchen","story_48127081"],"author":"piotrwittchen","comment_text":"Author here. AICTL is an open-source AI agent I&#x27;ve been building in Rust as an alternative to the various closed CLI assistants. It&#x27;s a single workspace with three frontends sharing one engine:<p>- CLI (aictl): REPL with slash commands, agents, skills, plugins, hooks<p>- Desktop (macOS, work-in-progress): Tauri-based, same engine<p>- HTTP server (aictl-server): OpenAI&#x2F;Anthropic-compatible proxy so you can put Aider&#x2F;Continue&#x2F;Cline in front of it and get redaction, audit, and prompt-injection guards for free<p>What I tried to get right:<p>- 11 providers in one binary: OpenAI, Anthropic, Gemini, Grok, Mistral, DeepSeek, Kimi, Z.ai, Ollama, plus native GGUF (llama.cpp) and MLX (Apple Silicon) for fully local inference.<p>- Security as a first-class layer, not a checkbox: CWD jail for tool calls, keyring-backed API keys, three-layer outbound redaction (regex + entropy + optional NER via gline-rs), prompt-injection detection, full JSONL audit log. Local providers skip redaction by default; cloud calls don&#x27;t.<p>- A working MCP client with stdio + Streamable HTTP + legacy SSE transports (hand-rolled JSON-RPC, no extra deps).<p>- User-extensible: drop a manifest + executable in ~&#x2F;.aictl&#x2F;plugins&#x2F; for custom tools; lifecycle hooks can block, rewrite, or augment any turn.<p>What&#x27;s honestly rough:<p>- Native GGUF and MLX inference are experimental. Tool-call formatting on small local models is hit-or-miss; chat templates are mostly ChatML. Cloud providers are the recommended daily driver.<p>- Desktop is macOS-only for now and still WIP.<p>- Not aimed at coding specifically \u2014 it&#x27;s general-purpose. For dedicated coding agents I&#x27;d still point people at Claude Code, Codex, or opencode.<p>Install: curl -sSf <a href=\"https:&#x2F;&#x2F;aictl.app&#x2F;install.sh\" rel=\"nofollow\">https:&#x2F;&#x2F;aictl.app&#x2F;install.sh</a> | sh<p>Source: <a href=\"https:&#x2F;&#x2F;github.com&#x2F;pwittchen&#x2F;aictl\" rel=\"nofollow\">https:&#x2F;&#x2F;github.com&#x2F;pwittchen&#x2F;aictl</a><p>Happy to answer questions about the architecture, the security model, or the server&#x27;s Anthropic-passthrough mode (the trickiest part \u2014 keeping tool_use blocks, prompt caching, and extended thinking intact across the proxy).","created_at":"2026-05-13T20:31:03Z","created_at_i":1778704263,"objectID":"48127085","parent_id":48127081,"story_id":48127081,"story_title":"Show HN: AICTL \u2013 A native AI agent for terminal and macOS, in Rust","story_url":"https://aictl.app","updated_at":"2026-05-13T20:35:08Z"},{"_highlightResult":{"author":{"matchLevel":"none","matchedWords":[],"value":"ianbicking"},"comment_text":{"fullyHighlighted":false,"matchLevel":"full","matchedWords":["claude","code","extended","thinking"],"value":"I've been doing something similar to this in a personal <em>claude</em> <em>code</em> frontend, though not particularly &quot;magical&quot;.<p>I'm mostly using my system to make comments on long AI-generated documents (especially design documents). I find it works well to have the AI generate something, and then I read through it, making comments along the way.<p>You can get pretty far just repeating the things you see... &quot;I'm reading [heading] and [comments]&quot;. But I do find some use in selecting content and saying &quot;I don't agree with this&quot; or whatever else.<p>The result is just an augmented message. It looks like:<p><pre><code>    &lt;transcript&gt;\n      Let's see what we've got here.\n      &lt;selection doc=&quot;proposal.md&quot; location=&quot;paragraph 3&quot;&gt;\n        The system already...\n      &lt;/selection&gt;\n      No, I don't like how this is approaching the problem, ...\n    &lt;/transcript&gt;\n</code></pre>\nThen I just send this as a user message. <em>Claude</em> <em>Code</em> (and I'm guessing any of the agentic systems) picks up on the markup very easily. It also helps to label it as a transcript, as it can understand there may be errors, and things like spelling and punctuation are inferred not deliberate. (Some additional instruction is necessary to help it understand, for example, that it should look for homophones that might make more sense in context.)<p>It makes reviewing feel pretty relaxed and natural. I've played around with similar note taking systems, which I think could be great for studying in school, but haven't had the focus on that particular problem to take it very far.<p>But I think the best thing really is giving the agent a richer understanding of what the user is experiencing and doing and just creating a rich representation of that. The keywords can be useful, but almost only as checkpoints: a keyword can identify the moment to take the transcript and package it up and deliver it.<p>One difference perhaps in design motivation: I have really embraced long latency interactions. I use ChatGPT with <em>extended</em> <em>thinking</em> by default, and just suck it up when the answer didn't really require <em>thinking</em>. I deliver 10 points of feedback at once instead of little by little. (Often halfway through I explicitly contradict myself, because I'm <em>thinking</em> out loud and my ideas are developing.) I just don't stress out about latency or feedback, and so low-latency but lower-intelligence interactions don't do it for me (such as ChatGPT's advanced voice mode, or probably <em>Thinking</em> Machine's work). I think this focus is in part a value statement: I'm trying to do higher quality work, not faster work."},"story_title":{"matchLevel":"none","matchedWords":[],"value":"Reimagining the mouse pointer for the AI era"},"story_url":{"matchLevel":"none","matchedWords":[],"value":"https://deepmind.google/blog/ai-pointer/"}},"_tags":["comment","author_ianbicking","story_48111581"],"author":"ianbicking","children":[48115870],"comment_text":"I&#x27;ve been doing something similar to this in a personal claude code frontend, though not particularly &quot;magical&quot;.<p>I&#x27;m mostly using my system to make comments on long AI-generated documents (especially design documents). I find it works well to have the AI generate something, and then I read through it, making comments along the way.<p>You can get pretty far just repeating the things you see... &quot;I&#x27;m reading [heading] and [comments]&quot;. But I do find some use in selecting content and saying &quot;I don&#x27;t agree with this&quot; or whatever else.<p>The result is just an augmented message. It looks like:<p><pre><code>    &lt;transcript&gt;\n      Let&#x27;s see what we&#x27;ve got here.\n      &lt;selection doc=&quot;proposal.md&quot; location=&quot;paragraph 3&quot;&gt;\n        The system already...\n      &lt;&#x2F;selection&gt;\n      No, I don&#x27;t like how this is approaching the problem, ...\n    &lt;&#x2F;transcript&gt;\n</code></pre>\nThen I just send this as a user message. Claude Code (and I&#x27;m guessing any of the agentic systems) picks up on the markup very easily. It also helps to label it as a transcript, as it can understand there may be errors, and things like spelling and punctuation are inferred not deliberate. (Some additional instruction is necessary to help it understand, for example, that it should look for homophones that might make more sense in context.)<p>It makes reviewing feel pretty relaxed and natural. I&#x27;ve played around with similar note taking systems, which I think could be great for studying in school, but haven&#x27;t had the focus on that particular problem to take it very far.<p>But I think the best thing really is giving the agent a richer understanding of what the user is experiencing and doing and just creating a rich representation of that. The keywords can be useful, but almost only as checkpoints: a keyword can identify the moment to take the transcript and package it up and deliver it.<p>One difference perhaps in design motivation: I have really embraced long latency interactions. I use ChatGPT with extended thinking by default, and just suck it up when the answer didn&#x27;t really require thinking. I deliver 10 points of feedback at once instead of little by little. (Often halfway through I explicitly contradict myself, because I&#x27;m thinking out loud and my ideas are developing.) I just don&#x27;t stress out about latency or feedback, and so low-latency but lower-intelligence interactions don&#x27;t do it for me (such as ChatGPT&#x27;s advanced voice mode, or probably Thinking Machine&#x27;s work). I think this focus is in part a value statement: I&#x27;m trying to do higher quality work, not faster work.","created_at":"2026-05-12T20:50:39Z","created_at_i":1778619039,"objectID":"48114362","parent_id":48111581,"story_id":48111581,"story_title":"Reimagining the mouse pointer for the AI era","story_url":"https://deepmind.google/blog/ai-pointer/","updated_at":"2026-05-12T23:20:02Z"},{"_highlightResult":{"author":{"matchLevel":"none","matchedWords":[],"value":"rkuska"},"comment_text":{"fullyHighlighted":false,"matchLevel":"full","matchedWords":["claude","code","extended","thinking"],"value":"For 4.7 it is no longer possible to disable adaptive <em>thinking</em>. Which is weird given the comment from Boris followed with silence (and closed github issue). So much for the transparency.<p>&gt; <em>Claude</em> Opus 4.7 (<em>claude</em>-opus-4-7), adaptive <em>thinking</em> is the only supported <em>thinking</em> mode. <em>Thinking</em> is off unless you explicitly set <em>thinking</em>: {type: &quot;adaptive&quot;} in your request; manual <em>thinking</em>: {type: &quot;enabled&quot;} is rejected with a 400 error.<p><a href=\"https://platform.claude.com/docs/en/build-with-claude/adaptive-thinking\" rel=\"nofollow\">https://platform.<em>claude</em>.com/docs/en/build-with-<em>claude</em>/adapti...</a><p>For my <em>claude</em> <em>code</em> I went with following config:<p>* /effort xhigh (in the terminal cli) - To avoid lazying<p>*   &quot;env&quot;: {&quot;<em>CLAUDE</em>_<em>CODE</em>_DISABLE_1M_CONTEXT&quot;: &quot;1&quot;} (settings.json) - It seems like opus is just worse with larger context<p>*  &quot;display&quot;: &quot;summarized&quot; (settings.json) - To bring back summaries.<p>*  &quot;showThinkingSummaries&quot;: true (settings.json) - Should show <em>extended</em> <em>thinking</em> summaries in interactive sessions<p>Freaking wizardry."},"story_title":{"fullyHighlighted":false,"matchLevel":"partial","matchedWords":["claude"],"value":"<em>Claude</em> Opus 4.7"},"story_url":{"fullyHighlighted":false,"matchLevel":"partial","matchedWords":["claude"],"value":"https://www.anthropic.com/news/<em>claude</em>-opus-4-7"}},"_tags":["comment","author_rkuska","story_47793411"],"author":"rkuska","children":[47804179,47806614],"comment_text":"For 4.7 it is no longer possible to disable adaptive thinking. Which is weird given the comment from Boris followed with silence (and closed github issue). So much for the transparency.<p>&gt; Claude Opus 4.7 (claude-opus-4-7), adaptive thinking is the only supported thinking mode. Thinking is off unless you explicitly set thinking: {type: &quot;adaptive&quot;} in your request; manual thinking: {type: &quot;enabled&quot;} is rejected with a 400 error.<p><a href=\"https:&#x2F;&#x2F;platform.claude.com&#x2F;docs&#x2F;en&#x2F;build-with-claude&#x2F;adaptive-thinking\" rel=\"nofollow\">https:&#x2F;&#x2F;platform.claude.com&#x2F;docs&#x2F;en&#x2F;build-with-claude&#x2F;adapti...</a><p>For my claude code I went with following config:<p>* &#x2F;effort xhigh (in the terminal cli) - To avoid lazying<p>*   &quot;env&quot;: {&quot;CLAUDE_CODE_DISABLE_1M_CONTEXT&quot;: &quot;1&quot;} (settings.json) - It seems like opus is just worse with larger context<p>*  &quot;display&quot;: &quot;summarized&quot; (settings.json) - To bring back summaries.<p>*  &quot;showThinkingSummaries&quot;: true (settings.json) - Should show extended thinking summaries in interactive sessions<p>Freaking wizardry.","created_at":"2026-04-17T08:19:11Z","created_at_i":1776413951,"objectID":"47803650","parent_id":47796722,"story_id":47793411,"story_title":"Claude Opus 4.7","story_url":"https://www.anthropic.com/news/claude-opus-4-7","updated_at":"2026-04-20T10:22:59Z"},{"_highlightResult":{"author":{"matchLevel":"none","matchedWords":[],"value":"troupo"},"comment_text":{"fullyHighlighted":false,"matchLevel":"full","matchedWords":["claude","code","extended","thinking"],"value":"They are now literally blaming users for using their product as advertised:<p><a href=\"https://x.com/lydiahallie/status/2039800718371307603\" rel=\"nofollow\">https://x.com/lydiahallie/status/2039800718371307603</a><p>--- start quote ---<p>Digging into reports, most of the fastest burn came down to a few token-heavy patterns. Some tips:<p>\u2022 Sonnet 4.6 is the better default on Pro. Opus burns roughly twice as fast. Switch at session start.<p>\u2022 Lower the effort level or turn off <em>extended</em> <em>thinking</em> when you don't need deep reasoning. Switch at session start.<p>\u2022 Start fresh instead of resuming large sessions that have been idle ~1h<p>\u2022 Cap your context window, long sessions cost more <em>CLAUDE</em>_<em>CODE</em>_AUTO_COMPACT_WINDOW=200000<p>--- end quote ---<p><a href=\"https://x.com/bcherny/status/2043163965648515234\" rel=\"nofollow\">https://x.com/bcherny/status/2043163965648515234</a><p>--- start quote ---<p>We defaulted to medium [reasoning] as a result of user feedback about <em>Claude</em> using too many tokens. When we made the change, we (1) included it in the changelog and (2) showed a dialog when you opened <em>Claude</em> <em>Code</em> so you could choose to opt out. Literally nothing sneaky about it \u2014 this was us addressing user feedback in an obvious and explicit way.<p>--- end quote ---"},"story_title":{"fullyHighlighted":false,"matchLevel":"partial","matchedWords":["claude","code"],"value":"<em>Claude</em> <em>Code</em> Routines"},"story_url":{"fullyHighlighted":false,"matchLevel":"partial","matchedWords":["claude","code"],"value":"https://<em>code</em>.<em>claude</em>.com/docs/en/routines"}},"_tags":["comment","author_troupo","story_47768133"],"author":"troupo","children":[47772599,47801413],"comment_text":"They are now literally blaming users for using their product as advertised:<p><a href=\"https:&#x2F;&#x2F;x.com&#x2F;lydiahallie&#x2F;status&#x2F;2039800718371307603\" rel=\"nofollow\">https:&#x2F;&#x2F;x.com&#x2F;lydiahallie&#x2F;status&#x2F;2039800718371307603</a><p>--- start quote ---<p>Digging into reports, most of the fastest burn came down to a few token-heavy patterns. Some tips:<p>\u2022 Sonnet 4.6 is the better default on Pro. Opus burns roughly twice as fast. Switch at session start.<p>\u2022 Lower the effort level or turn off extended thinking when you don&#x27;t need deep reasoning. Switch at session start.<p>\u2022 Start fresh instead of resuming large sessions that have been idle ~1h<p>\u2022 Cap your context window, long sessions cost more CLAUDE_CODE_AUTO_COMPACT_WINDOW=200000<p>--- end quote ---<p><a href=\"https:&#x2F;&#x2F;x.com&#x2F;bcherny&#x2F;status&#x2F;2043163965648515234\" rel=\"nofollow\">https:&#x2F;&#x2F;x.com&#x2F;bcherny&#x2F;status&#x2F;2043163965648515234</a><p>--- start quote ---<p>We defaulted to medium [reasoning] as a result of user feedback about Claude using too many tokens. When we made the change, we (1) included it in the changelog and (2) showed a dialog when you opened Claude Code so you could choose to opt out. Literally nothing sneaky about it \u2014 this was us addressing user feedback in an obvious and explicit way.<p>--- end quote ---","created_at":"2026-04-14T21:17:28Z","created_at_i":1776201448,"objectID":"47771648","parent_id":47770981,"story_id":47768133,"story_title":"Claude Code Routines","story_url":"https://code.claude.com/docs/en/routines","updated_at":"2026-04-17T01:01:17Z"},{"_highlightResult":{"author":{"matchLevel":"none","matchedWords":[],"value":"ai_slop_hater"},"comment_text":{"fullyHighlighted":false,"matchLevel":"full","matchedWords":["claude","code","extended","thinking"],"value":"I also have similar experience with their API, i.e. some requests get stalled for minutes with zero events coming in from Anthropic. Presumably the model does this &quot;<em>extended</em> <em>thinking</em>&quot; but no way to see that. I treat these requests as stuck and retry. Same experience in <em>Claude</em> <em>Code</em> Opus 4.6 when effort is set to &quot;high&quot;\u2014the model gets stuck for ten minutes (at which point I cancel) and token count indicator doesn't increase.<p>I am not buying what this guy says. He is either lying or not telling us everything."},"story_title":{"fullyHighlighted":false,"matchLevel":"partial","matchedWords":["claude","code"],"value":"Issue: <em>Claude</em> <em>Code</em> is unusable for complex engineering tasks with Feb updates"},"story_url":{"fullyHighlighted":false,"matchLevel":"partial","matchedWords":["claude","code"],"value":"https://github.com/anthropics/<em>claude</em>-<em>code</em>/issues/42796"}},"_tags":["comment","author_ai_slop_hater","story_47660925"],"author":"ai_slop_hater","comment_text":"I also have similar experience with their API, i.e. some requests get stalled for minutes with zero events coming in from Anthropic. Presumably the model does this &quot;extended thinking&quot; but no way to see that. I treat these requests as stuck and retry. Same experience in Claude Code Opus 4.6 when effort is set to &quot;high&quot;\u2014the model gets stuck for ten minutes (at which point I cancel) and token count indicator doesn&#x27;t increase.<p>I am not buying what this guy says. He is either lying or not telling us everything.","created_at":"2026-04-06T18:23:32Z","created_at_i":1775499812,"objectID":"47664823","parent_id":47664761,"story_id":47660925,"story_title":"Issue: Claude Code is unusable for complex engineering tasks with Feb updates","story_url":"https://github.com/anthropics/claude-code/issues/42796","updated_at":"2026-04-07T09:36:08Z"},{"_highlightResult":{"author":{"matchLevel":"none","matchedWords":[],"value":"bcherny"},"comment_text":{"fullyHighlighted":false,"matchLevel":"full","matchedWords":["claude","code","extended","thinking"],"value":"Hey all, Boris from the <em>Claude</em> <em>Code</em> team here. I just responded on the issue, and cross-posting here for input.<p>---<p>Hi, thanks for the detailed analysis. Before I keep going, I wanted to say I appreciate the depth of <em>thinking</em> &amp; care that went into this.<p>There's a lot here, I will try to break it down a bit. These are the two core things happening:<p>&gt; `redact-<em>thinking</em>-2026-02-12`<p>This beta header hides <em>thinking</em> from the UI, since most people don't look at it. It *does not* impact <em>thinking</em> itself, nor does it impact <em>thinking</em> budgets or the way <em>extended</em> reasoning works under the hood. It is a UI-only change.<p>Under the hood, by setting this header we avoid needing <em>thinking</em> summaries, which reduces latency. You can opt out of it with `showThinkingSummaries: true` in your settings.json (see [docs](<a href=\"https://code.claude.com/docs/en/settings#available-settings\" rel=\"nofollow\">https://<em>code</em>.<em>claude</em>.com/docs/en/settings#available-settings</a>)).<p>If you are analyzing locally stored transcripts, you wouldn't see raw <em>thinking</em> stored when this header is set, which is likely influencing the analysis. When <em>Claude</em> sees lack of <em>thinking</em> in transcripts for this analysis, it may not realize that the <em>thinking</em> is still there, and is simply not user-facing.<p>&gt; <em>Thinking</em> depth had already dropped ~67% by late February<p>We landed two changes in Feb that would have impacted this. We evaluated both carefully:<p>1/ Opus 4.6 launch \u2192 adaptive <em>thinking</em> default (Feb 9)<p>Opus 4.6 supports adaptive <em>thinking</em>, which is different from <em>thinking</em> budgets that we used to support. In this mode, the model decides how long to think for, which tends to work better than fixed <em>thinking</em> budgets across the board. `<em>CLAUDE</em>_<em>CODE</em>_DISABLE_ADAPTIVE_<em>THINKING</em>` to opt out.<p>2/ Medium effort (85) default on Opus 4.6 (Mar 3)<p>We found that effort=85 was a sweet spot on the intelligence-latency/cost curve for most users, improving token efficiency while reducing latency. On of our product principles is to avoid changing settings on users' behalf, and ideally we would have set effort=85 from the start. We felt this was an important setting to change, so our approach was to:<p>1. Roll it out with a dialog so users are aware of the change and have a chance to opt out<p>2. Show the effort the first few times you opened <em>Claude</em> <em>Code</em>, so it wasn't surprising.<p>Some people want the model to think for longer, even if it takes more time and tokens. To improve intelligence more, set effort=high via `/effort` or in your settings.json. This setting is sticky across sessions, and can be shared among users. You can also use the ULTRATHINK keyword to use high effort for a single turn, or set `/effort max` to use even higher effort for the rest of the conversation.<p>Going forward, we will test defaulting Teams and Enterprise users to high effort, to benefit from <em>extended</em> <em>thinking</em> even if it comes at the cost of additional tokens &amp; latency. This default is configurable in exactly the same way, via `/effort` and settings.json."},"story_title":{"fullyHighlighted":false,"matchLevel":"partial","matchedWords":["claude","code"],"value":"Issue: <em>Claude</em> <em>Code</em> is unusable for complex engineering tasks with Feb updates"},"story_url":{"fullyHighlighted":false,"matchLevel":"partial","matchedWords":["claude","code"],"value":"https://github.com/anthropics/<em>claude</em>-<em>code</em>/issues/42796"}},"_tags":["comment","author_bcherny","story_47660925"],"author":"bcherny","children":[47664476,47664511,47664563,47664570,47664695,47664711,47664733,47664742,47664747,47664793,47664799,47664872,47664881,47665013,47665084,47665133,47665289,47665374,47665499,47665562,47665993,47666151,47666673,47668126,47668172,47668209,47668225,47668780,47669265,47669467,47669476,47669843,47669979,47670040,47670192,47670277,47670710,47670748,47670962,47670970,47671723,47671777,47671992,47672036,47673129,47673217,47675460,47676015,47676293,47676381,47676710,47677301,47678065,47682834,47690850,47700322],"comment_text":"Hey all, Boris from the Claude Code team here. I just responded on the issue, and cross-posting here for input.<p>---<p>Hi, thanks for the detailed analysis. Before I keep going, I wanted to say I appreciate the depth of thinking &amp; care that went into this.<p>There&#x27;s a lot here, I will try to break it down a bit. These are the two core things happening:<p>&gt; `redact-thinking-2026-02-12`<p>This beta header hides thinking from the UI, since most people don&#x27;t look at it. It *does not* impact thinking itself, nor does it impact thinking budgets or the way extended reasoning works under the hood. It is a UI-only change.<p>Under the hood, by setting this header we avoid needing thinking summaries, which reduces latency. You can opt out of it with `showThinkingSummaries: true` in your settings.json (see [docs](<a href=\"https:&#x2F;&#x2F;code.claude.com&#x2F;docs&#x2F;en&#x2F;settings#available-settings\" rel=\"nofollow\">https:&#x2F;&#x2F;code.claude.com&#x2F;docs&#x2F;en&#x2F;settings#available-settings</a>)).<p>If you are analyzing locally stored transcripts, you wouldn&#x27;t see raw thinking stored when this header is set, which is likely influencing the analysis. When Claude sees lack of thinking in transcripts for this analysis, it may not realize that the thinking is still there, and is simply not user-facing.<p>&gt; Thinking depth had already dropped ~67% by late February<p>We landed two changes in Feb that would have impacted this. We evaluated both carefully:<p>1&#x2F; Opus 4.6 launch \u2192 adaptive thinking default (Feb 9)<p>Opus 4.6 supports adaptive thinking, which is different from thinking budgets that we used to support. In this mode, the model decides how long to think for, which tends to work better than fixed thinking budgets across the board. `CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING` to opt out.<p>2&#x2F; Medium effort (85) default on Opus 4.6 (Mar 3)<p>We found that effort=85 was a sweet spot on the intelligence-latency&#x2F;cost curve for most users, improving token efficiency while reducing latency. On of our product principles is to avoid changing settings on users&#x27; behalf, and ideally we would have set effort=85 from the start. We felt this was an important setting to change, so our approach was to:<p>1. Roll it out with a dialog so users are aware of the change and have a chance to opt out<p>2. Show the effort the first few times you opened Claude Code, so it wasn&#x27;t surprising.<p>Some people want the model to think for longer, even if it takes more time and tokens. To improve intelligence more, set effort=high via `&#x2F;effort` or in your settings.json. This setting is sticky across sessions, and can be shared among users. You can also use the ULTRATHINK keyword to use high effort for a single turn, or set `&#x2F;effort max` to use even higher effort for the rest of the conversation.<p>Going forward, we will test defaulting Teams and Enterprise users to high effort, to benefit from extended thinking even if it comes at the cost of additional tokens &amp; latency. This default is configurable in exactly the same way, via `&#x2F;effort` and settings.json.","created_at":"2026-04-06T17:56:20Z","created_at_i":1775498180,"objectID":"47664442","parent_id":47660925,"story_id":47660925,"story_title":"Issue: Claude Code is unusable for complex engineering tasks with Feb updates","story_url":"https://github.com/anthropics/claude-code/issues/42796","updated_at":"2026-06-17T14:06:08Z"},{"_highlightResult":{"author":{"matchLevel":"none","matchedWords":[],"value":"CharlesW"},"comment_text":{"fullyHighlighted":false,"matchLevel":"full","matchedWords":["claude","code","extended","thinking"],"value":"&gt; <i>Somewhere between Haiku 4.5 and Sonnet 4.5</i><p>That's like saying &quot;somewhere between Eliza and Haiku 4.5&quot;. Haiku is not even a so-called 'reasoning model'.\u00b9<p>\u00b9 <i>To preempt the easily-offended, this is what the latest Opus 4.6 in today's <em>Claude</em> <em>Code</em> update says: &quot;<em>Claude</em> Haiku 4.5 is not a reasoning model \u2014 it's optimized for speed and cost efficiency. It's the fastest model in the <em>Claude</em> family, good for quick, straightforward tasks, but it doesn't have <em>extended</em> <em>thinking</em>/reasoning capabilities.&quot;</i>"},"story_title":{"matchLevel":"none","matchedWords":[],"value":"Qwen3.5 122B and 35B models offer Sonnet 4.5 performance on local computers"},"story_url":{"matchLevel":"none","matchedWords":[],"value":"https://venturebeat.com/technology/alibabas-new-open-source-qwen3-5-medium-models-offer-sonnet-4-5-performance"}},"_tags":["comment","author_CharlesW","story_47199781"],"author":"CharlesW","children":[47200905],"comment_text":"&gt; <i>Somewhere between Haiku 4.5 and Sonnet 4.5</i><p>That&#x27;s like saying &quot;somewhere between Eliza and Haiku 4.5&quot;. Haiku is not even a so-called &#x27;reasoning model&#x27;.\u00b9<p>\u00b9 <i>To preempt the easily-offended, this is what the latest Opus 4.6 in today&#x27;s Claude Code update says: &quot;Claude Haiku 4.5 is not a reasoning model \u2014 it&#x27;s optimized for speed and cost efficiency. It&#x27;s the fastest model in the Claude family, good for quick, straightforward tasks, but it doesn&#x27;t have extended thinking&#x2F;reasoning capabilities.&quot;</i>","created_at":"2026-02-28T22:00:07Z","created_at_i":1772316007,"objectID":"47200744","parent_id":47200621,"story_id":47199781,"story_title":"Qwen3.5 122B and 35B models offer Sonnet 4.5 performance on local computers","story_url":"https://venturebeat.com/technology/alibabas-new-open-source-qwen3-5-medium-models-offer-sonnet-4-5-performance","updated_at":"2026-03-05T23:38:24Z"},{"_highlightResult":{"author":{"matchLevel":"none","matchedWords":[],"value":"pi-netizen"},"comment_text":{"fullyHighlighted":false,"matchLevel":"full","matchedWords":["claude","code","extended","thinking"],"value":"<em>Claude</em> <em>Code</em> stores every conversation as JSONL files under ~/.<em>claude</em>/projects/. I kept wanting to find old sessions, &quot;where did I debug that Redis issue?&quot; - so I built a CLI that searches across all of them.<p>It supports filtering by date (--since &quot;2 weeks ago&quot;), extracting only <em>code</em> blocks (--<em>code</em>-only), scoping to a specific project (--project myapp), and jumping directly back into a session (--open). It also surfaces <em>extended</em> <em>thinking</em> blocks if you want to see the reasoning behind an answer.<p>No server, no API calls, no sync. It reads the files <em>Claude</em> <em>Code</em> already writes locally."},"story_title":{"fullyHighlighted":false,"matchLevel":"partial","matchedWords":["claude","code"],"value":"<em>Claude</em>-search \u2013 grep, resume your <em>Claude</em> <em>Code</em> session history from the CLI"},"story_url":{"fullyHighlighted":false,"matchLevel":"partial","matchedWords":["claude"],"value":"https://github.com/pi-netizen/<em>claude</em>-search"}},"_tags":["comment","author_pi-netizen","story_47176556"],"author":"pi-netizen","comment_text":"Claude Code stores every conversation as JSONL files under ~&#x2F;.claude&#x2F;projects&#x2F;. I kept wanting to find old sessions, &quot;where did I debug that Redis issue?&quot; - so I built a CLI that searches across all of them.<p>It supports filtering by date (--since &quot;2 weeks ago&quot;), extracting only code blocks (--code-only), scoping to a specific project (--project myapp), and jumping directly back into a session (--open). It also surfaces extended thinking blocks if you want to see the reasoning behind an answer.<p>No server, no API calls, no sync. It reads the files Claude Code already writes locally.","created_at":"2026-02-27T04:34:27Z","created_at_i":1772166867,"objectID":"47176557","parent_id":47176556,"story_id":47176556,"story_title":"Claude-search \u2013 grep, resume your Claude Code session history from the CLI","story_url":"https://github.com/pi-netizen/claude-search","updated_at":"2026-03-05T23:38:17Z"},{"_highlightResult":{"author":{"matchLevel":"none","matchedWords":[],"value":"rob"},"comment_text":{"fullyHighlighted":false,"matchLevel":"full","matchedWords":["claude","code","extended","thinking"],"value":"Subagents, plugins, skills, hooks, mcp servers, output styles, memory, <em>extended</em> <em>thinking</em>... seems like a bunch of stuff you can configure in <em>Claude</em> <em>Code</em> that overlap in a lot of areas. Wish they could figure out a way to simplify things."},"story_title":{"fullyHighlighted":false,"matchLevel":"partial","matchedWords":["claude"],"value":"<em>Claude</em> Skills"},"story_url":{"matchLevel":"none","matchedWords":[],"value":"https://www.anthropic.com/news/skills"}},"_tags":["comment","author_rob","story_45607117"],"author":"rob","children":[45608249,45608455],"comment_text":"Subagents, plugins, skills, hooks, mcp servers, output styles, memory, extended thinking... seems like a bunch of stuff you can configure in Claude Code that overlap in a lot of areas. Wish they could figure out a way to simplify things.","created_at":"2025-10-16T17:18:33Z","created_at_i":1760635113,"objectID":"45608042","parent_id":45607117,"story_id":45607117,"story_title":"Claude Skills","story_url":"https://www.anthropic.com/news/skills","updated_at":"2026-03-05T22:48:34Z"}],"hitsPerPage":20,"nbHits":62,"nbPages":4,"page":0,"params":"query=claude+code+extended+thinking&advancedSyntax=true&analyticsTags=backend","processingTimeMS":21,"processingTimingsMS":{"_request":{"queue":5,"roundTrip":22},"afterFetch":{"format":{"highlighting":1,"total":2},"merge":{"mergeLoop":{"prepareNextHit":1,"total":1},"total":1},"total":1},"fetch":{"query":10,"scanning":8,"total":19},"total":21},"query":"claude code extended thinking","serverTimeMS":28}
