{"exhaustive":{"nbHits":false,"typo":false},"exhaustiveNbHits":false,"exhaustiveTypo":false,"hits":[{"_highlightResult":{"author":{"matchLevel":"none","matchedWords":[],"value":"defrost"},"comment_text":{"fullyHighlighted":false,"matchLevel":"full","matchedWords":["unsloth"],"value":"In the US, sure.<p>In Australia we established a Royal Commission into Institutional Responses to Child Sexual Abuse, looked at all the schools and institutions regardless of creed (and, it turned out, the Christian Brothers were the clear worst of the worst - although few came away <em>unscath</em>ed) and then put a senior Vatican Cardinal on trial.<p>TBH it's been a <i>lot</i> harder to get the worst carbon offenders under close scrutiny in a very public eye."},"story_title":{"matchLevel":"none","matchedWords":[],"value":"U.S. Science Is in Chaos"},"story_url":{"matchLevel":"none","matchedWords":[],"value":"https://www.scientificamerican.com/article/americas-compact-between-science-and-politics-is-broken/"}},"_tags":["comment","author_defrost","story_48568058"],"author":"defrost","comment_text":"In the US, sure.<p>In Australia we established a Royal Commission into Institutional Responses to Child Sexual Abuse, looked at all the schools and institutions regardless of creed (and, it turned out, the Christian Brothers were the clear worst of the worst - although few came away unscathed) and then put a senior Vatican Cardinal on trial.<p>TBH it&#x27;s been a <i>lot</i> harder to get the worst carbon offenders under close scrutiny in a very public eye.","created_at":"2026-06-17T11:48:07Z","created_at_i":1781696887,"objectID":"48569027","parent_id":48568896,"story_id":48568058,"story_title":"U.S. Science Is in Chaos","story_url":"https://www.scientificamerican.com/article/americas-compact-between-science-and-politics-is-broken/","updated_at":"2026-06-17T11:57:22Z"},{"_highlightResult":{"author":{"fullyHighlighted":true,"matchLevel":"full","matchedWords":["unsloth"],"value":"<em>fsloth</em>"},"comment_text":{"matchLevel":"none","matchedWords":[],"value":"&quot;Founding cannot be a commodity. If it is, you have no moat or point, meaning you instantly collapse again, because you are an interchangeable commodity.&quot;<p>IMHO you still need to find the product and PMF<p>There are bunch of books startup world recommends which sort of all start from the principle of product, users, traction.<p>This is sort of scaffolding around that. It's not entirely insane to try to formalize this process - there already are books that do this (Bill Aulet, Disciplined entrepreneurship).<p>&quot;nor does it make sense for society to have people founding businesses at a scale&quot;<p>Maybe not at scale of moving lawns but I'm pretty sure the world is full of nichces that still lack specific software offering or where options of software offerings are limited.<p>This is like &quot;Uber for logging&quot; or &quot;time reservation system for cat dentists&quot; level of &quot;take existing product category and apply to a domain you know&quot;.<p>So not every cat dentist needs to found a cat dentist time reservation app but I'm sure there are niches withing niches with business opportunities awaiting."},"story_title":{"matchLevel":"none","matchedWords":[],"value":"The founder's playbook: Building an AI-native startup"},"story_url":{"matchLevel":"none","matchedWords":[],"value":"https://claude.com/blog/the-founders-playbook"}},"_tags":["comment","author_fsloth","story_48566832"],"author":"fsloth","comment_text":"&quot;Founding cannot be a commodity. If it is, you have no moat or point, meaning you instantly collapse again, because you are an interchangeable commodity.&quot;<p>IMHO you still need to find the product and PMF<p>There are bunch of books startup world recommends which sort of all start from the principle of product, users, traction.<p>This is sort of scaffolding around that. It&#x27;s not entirely insane to try to formalize this process - there already are books that do this (Bill Aulet, Disciplined entrepreneurship).<p>&quot;nor does it make sense for society to have people founding businesses at a scale&quot;<p>Maybe not at scale of moving lawns but I&#x27;m pretty sure the world is full of nichces that still lack specific software offering or where options of software offerings are limited.<p>This is like &quot;Uber for logging&quot; or &quot;time reservation system for cat dentists&quot; level of &quot;take existing product category and apply to a domain you know&quot;.<p>So not every cat dentist needs to found a cat dentist time reservation app but I&#x27;m sure there are niches withing niches with business opportunities awaiting.","created_at":"2026-06-17T09:30:08Z","created_at_i":1781688608,"objectID":"48567887","parent_id":48567012,"story_id":48566832,"story_title":"The founder's playbook: Building an AI-native startup","story_url":"https://claude.com/blog/the-founders-playbook","updated_at":"2026-06-17T11:02:08Z"},{"_highlightResult":{"author":{"matchLevel":"none","matchedWords":[],"value":"Stagnant"},"comment_text":{"fullyHighlighted":false,"matchLevel":"full","matchedWords":["unsloth"],"value":"I've been using <em>unsloth</em>/gemma-4-31B-it-qat-GGUF daily for various small parsing and programming tasks using opencode and llama-server's front end. The past couple of weeks have made a big difference after google released the QAT variant and llama.cpp got support for MTP which means it is possible to now get 60-80 Tok/s with RTX 4090. The model fits in VRAM comfortably enough to keep it loaded even while browsing and having multiple programs."},"story_title":{"matchLevel":"none","matchedWords":[],"value":"Running local models is good now"},"story_url":{"matchLevel":"none","matchedWords":[],"value":"https://vickiboykis.com/2026/06/15/running-local-models-is-good-now/"}},"_tags":["comment","author_Stagnant","story_48555993"],"author":"Stagnant","children":[48561319,48562103],"comment_text":"I&#x27;ve been using unsloth&#x2F;gemma-4-31B-it-qat-GGUF daily for various small parsing and programming tasks using opencode and llama-server&#x27;s front end. The past couple of weeks have made a big difference after google released the QAT variant and llama.cpp got support for MTP which means it is possible to now get 60-80 Tok&#x2F;s with RTX 4090. The model fits in VRAM comfortably enough to keep it loaded even while browsing and having multiple programs.","created_at":"2026-06-16T19:50:15Z","created_at_i":1781639415,"objectID":"48560960","parent_id":48557467,"story_id":48555993,"story_title":"Running local models is good now","story_url":"https://vickiboykis.com/2026/06/15/running-local-models-is-good-now/","updated_at":"2026-06-16T23:17:05Z"},{"_highlightResult":{"author":{"matchLevel":"none","matchedWords":[],"value":"c0rruptbytes"},"comment_text":{"fullyHighlighted":false,"matchLevel":"full","matchedWords":["unsloth"],"value":"I would try a 6-bit MoE and maybe with <em>unsloth</em>'s studio, they claim to have auto tool fixing which is where i see a lot of issues with MoEs<p>I'm on a 48gb M5 Pro right now and it's been okay, a lot of my rough experiences have been with MLX and I'm finding that GGUFs are okay now"},"story_title":{"matchLevel":"none","matchedWords":[],"value":"Running local models is good now"},"story_url":{"matchLevel":"none","matchedWords":[],"value":"https://vickiboykis.com/2026/06/15/running-local-models-is-good-now/"}},"_tags":["comment","author_c0rruptbytes","story_48555993"],"author":"c0rruptbytes","comment_text":"I would try a 6-bit MoE and maybe with unsloth&#x27;s studio, they claim to have auto tool fixing which is where i see a lot of issues with MoEs<p>I&#x27;m on a 48gb M5 Pro right now and it&#x27;s been okay, a lot of my rough experiences have been with MLX and I&#x27;m finding that GGUFs are okay now","created_at":"2026-06-16T19:27:29Z","created_at_i":1781638049,"objectID":"48560622","parent_id":48560554,"story_id":48555993,"story_title":"Running local models is good now","story_url":"https://vickiboykis.com/2026/06/15/running-local-models-is-good-now/","updated_at":"2026-06-16T19:31:49Z"},{"_highlightResult":{"author":{"matchLevel":"none","matchedWords":[],"value":"upboundspiral"},"comment_text":{"fullyHighlighted":false,"matchLevel":"full","matchedWords":["unsloth"],"value":"with llama-cpp and offloading non-active experts (from MOE architecture) to cpu RAM, you can easily run 50 tok / s QWEN-3.6 35B on 8-12 GB of VRAM. \nKV cache is a few GB, experts are ~3-5 GB (assuming q8 quant from <em>Unsloth</em> for example).<p>You can scroll through r/localllama and find tons of people getting useable speeds out of Qwen 35B.<p>24 tok / second on an ancient 1080ti<p><a href=\"https://old.reddit.com/r/LocalLLaMA/comments/1tcc7h5/24_toks_from_30b_moe_models_on_an_old_gtx_1080_8/\" rel=\"nofollow\">https://old.reddit.com/r/LocalLLaMA/comments/1tcc7h5/24_toks...</a><p>100 tok / second on a 4070<p><a href=\"https://old.reddit.com/r/LocalLLaMA/comments/1tjh7az/110_toks_with_12gb_vram_on_qwen36_35b_a3b_and_ik/\" rel=\"nofollow\">https://old.reddit.com/r/LocalLLaMA/comments/1tjh7az/110_tok...</a>"},"story_title":{"matchLevel":"none","matchedWords":[],"value":"GateGPT: 56k tokens per second Transformer (KV cache) on FPGA at 80 MHz"},"story_url":{"matchLevel":"none","matchedWords":[],"value":"https://twitter.com/fguzmanai/status/2065832668172845209"}},"_tags":["comment","author_upboundspiral","story_48557535"],"author":"upboundspiral","comment_text":"with llama-cpp and offloading non-active experts (from MOE architecture) to cpu RAM, you can easily run 50 tok &#x2F; s QWEN-3.6 35B on 8-12 GB of VRAM. \nKV cache is a few GB, experts are ~3-5 GB (assuming q8 quant from Unsloth for example).<p>You can scroll through r&#x2F;localllama and find tons of people getting useable speeds out of Qwen 35B.<p>24 tok &#x2F; second on an ancient 1080ti<p><a href=\"https:&#x2F;&#x2F;old.reddit.com&#x2F;r&#x2F;LocalLLaMA&#x2F;comments&#x2F;1tcc7h5&#x2F;24_toks_from_30b_moe_models_on_an_old_gtx_1080_8&#x2F;\" rel=\"nofollow\">https:&#x2F;&#x2F;old.reddit.com&#x2F;r&#x2F;LocalLLaMA&#x2F;comments&#x2F;1tcc7h5&#x2F;24_toks...</a><p>100 tok &#x2F; second on a 4070<p><a href=\"https:&#x2F;&#x2F;old.reddit.com&#x2F;r&#x2F;LocalLLaMA&#x2F;comments&#x2F;1tjh7az&#x2F;110_toks_with_12gb_vram_on_qwen36_35b_a3b_and_ik&#x2F;\" rel=\"nofollow\">https:&#x2F;&#x2F;old.reddit.com&#x2F;r&#x2F;LocalLLaMA&#x2F;comments&#x2F;1tjh7az&#x2F;110_tok...</a>","created_at":"2026-06-16T19:21:36Z","created_at_i":1781637696,"objectID":"48560545","parent_id":48558290,"story_id":48557535,"story_title":"GateGPT: 56k tokens per second Transformer (KV cache) on FPGA at 80 MHz","story_url":"https://twitter.com/fguzmanai/status/2065832668172845209","updated_at":"2026-06-16T19:26:22Z"},{"_highlightResult":{"author":{"matchLevel":"none","matchedWords":[],"value":"nxtfari"},"comment_text":{"fullyHighlighted":false,"matchLevel":"full","matchedWords":["unsloth"],"value":"My honest read is that, having everything \u2014 the data centers, the compute, the models (however misaligned they might be), the only thing xAI is missing is users. They don\u2019t have any users because the only people who use Grok are essentially Elon\u2019s fanboy club, and all they pretty much do with it is ask it to generate arguments to win their Twitter threads or nonconsensually <em>uncloth</em>e people. Cursor gives xAI a captive audience of users; most sophisticated users don\u2019t use it anymore, so anyone left is unlikely to be opinionated when models are shifted to Grok. Marriage made in heaven."},"story_title":{"matchLevel":"none","matchedWords":[],"value":"SpaceX to buy Cursor for $60B"},"story_url":{"matchLevel":"none","matchedWords":[],"value":"https://www.reuters.com/legal/transactional/spacex-buy-anysphere-60-billion-2026-06-16/"}},"_tags":["comment","author_nxtfari","story_48553224"],"author":"nxtfari","children":[48561845],"comment_text":"My honest read is that, having everything \u2014 the data centers, the compute, the models (however misaligned they might be), the only thing xAI is missing is users. They don\u2019t have any users because the only people who use Grok are essentially Elon\u2019s fanboy club, and all they pretty much do with it is ask it to generate arguments to win their Twitter threads or nonconsensually unclothe people. Cursor gives xAI a captive audience of users; most sophisticated users don\u2019t use it anymore, so anyone left is unlikely to be opinionated when models are shifted to Grok. Marriage made in heaven.","created_at":"2026-06-16T19:11:50Z","created_at_i":1781637110,"objectID":"48560422","parent_id":48553224,"story_id":48553224,"story_title":"SpaceX to buy Cursor for $60B","story_url":"https://www.reuters.com/legal/transactional/spacex-buy-anysphere-60-billion-2026-06-16/","updated_at":"2026-06-16T20:49:06Z"},{"_highlightResult":{"author":{"matchLevel":"none","matchedWords":[],"value":"robertkarl"},"comment_text":{"fullyHighlighted":false,"matchLevel":"full","matchedWords":["unsloth"],"value":"You can trade off latency / accuracy / cost for any ML task. And with the local models.... the cost is free.<p>Having a local Qwen check another Qwen's work increases the accuracy quite a bit at the cost of more latency. You can't have your cake and eat it too.<p>In benchmarking local models, I'm having success increasing even a 9B qwen's score on terminal-bench adjacent problems, just by asking it to plan and handing the plan back to qwen with a fresh context. Try it with Qwen3.5, <em>unsloth</em> Q4+, and a thinking budget of around 1024 tokens."},"story_title":{"matchLevel":"none","matchedWords":[],"value":"Running local models is good now"},"story_url":{"matchLevel":"none","matchedWords":[],"value":"https://vickiboykis.com/2026/06/15/running-local-models-is-good-now/"}},"_tags":["comment","author_robertkarl","story_48555993"],"author":"robertkarl","comment_text":"You can trade off latency &#x2F; accuracy &#x2F; cost for any ML task. And with the local models.... the cost is free.<p>Having a local Qwen check another Qwen&#x27;s work increases the accuracy quite a bit at the cost of more latency. You can&#x27;t have your cake and eat it too.<p>In benchmarking local models, I&#x27;m having success increasing even a 9B qwen&#x27;s score on terminal-bench adjacent problems, just by asking it to plan and handing the plan back to qwen with a fresh context. Try it with Qwen3.5, unsloth Q4+, and a thinking budget of around 1024 tokens.","created_at":"2026-06-16T18:23:51Z","created_at_i":1781634231,"objectID":"48559712","parent_id":48555993,"story_id":48555993,"story_title":"Running local models is good now","story_url":"https://vickiboykis.com/2026/06/15/running-local-models-is-good-now/","updated_at":"2026-06-16T18:24:20Z"},{"_highlightResult":{"author":{"matchLevel":"none","matchedWords":[],"value":"greenavocado"},"comment_text":{"fullyHighlighted":false,"matchLevel":"full","matchedWords":["unsloth"],"value":"4 bit <em>unsloth</em> quants are good if you never ask for more than 20k context, use it as autocomplete on steroids, and never delegate serious questions to it"},"story_title":{"matchLevel":"none","matchedWords":[],"value":"Running local models is good now"},"story_url":{"matchLevel":"none","matchedWords":[],"value":"https://vickiboykis.com/2026/06/15/running-local-models-is-good-now/"}},"_tags":["comment","author_greenavocado","story_48555993"],"author":"greenavocado","comment_text":"4 bit unsloth quants are good if you never ask for more than 20k context, use it as autocomplete on steroids, and never delegate serious questions to it","created_at":"2026-06-16T16:14:51Z","created_at_i":1781626491,"objectID":"48557579","parent_id":48557467,"story_id":48555993,"story_title":"Running local models is good now","story_url":"https://vickiboykis.com/2026/06/15/running-local-models-is-good-now/","updated_at":"2026-06-16T18:19:34Z"},{"_highlightResult":{"author":{"matchLevel":"none","matchedWords":[],"value":"c0rruptbytes"},"comment_text":{"fullyHighlighted":false,"matchLevel":"full","matchedWords":["unsloth"],"value":"I don't know about good, I use a lot of local models and they're still pretty painful to run locally<p>You have dense models (qwen 27b, gemma 31b) who are pretty smart, but pretty slow<p>You have MoE models (gemma 26b, qwen 35b, north mini code 30b) who are pretty fast, but make a lot of mistakes<p>You need a lot of memory to run these well, quantization makes tool calling weaker, so most run at 4 bit quants and are wondering why it kinda sucks and that's because you've essentially lobotomized the model (I recommend <em>unsloth</em> quants, i recommend 6bit for MoEs and 5bit for dense)<p>So you need a lot of compute to make the pre-fill fast, you need bandwidth to make the decode fast, you need a lot of memory to hold everything - lot of ifs<p>On top of that, your laptop becomes a loud hot churning machine, it's uncomfortable to work with.<p>So are they good? not really. Do they work? yes<p>edit: just wanna clarify - i think open models are the future, i think they're super important, i'm contributing constantly to the ecosystem - i think people should play around with these models, i think people should use `pi` and learn how it all works - but don't download a model expecting it to be good out of the box, you will have to tune and configure a lot of stuff to replace a &quot;coding agent&quot; that most people are using models for"},"story_title":{"matchLevel":"none","matchedWords":[],"value":"Running local models is good now"},"story_url":{"matchLevel":"none","matchedWords":[],"value":"https://vickiboykis.com/2026/06/15/running-local-models-is-good-now/"}},"_tags":["comment","author_c0rruptbytes","story_48555993"],"author":"c0rruptbytes","children":[48569169,48557648,48558007,48557572,48557645,48567220,48560554,48568310,48560059,48560960,48563187,48563169,48563879,48557563,48559055,48562829,48558947,48562040,48560974,48559410,48558166,48566682,48566668,48566589,48557579,48560988,48557729,48558220,48560393,48565876,48567173],"comment_text":"I don&#x27;t know about good, I use a lot of local models and they&#x27;re still pretty painful to run locally<p>You have dense models (qwen 27b, gemma 31b) who are pretty smart, but pretty slow<p>You have MoE models (gemma 26b, qwen 35b, north mini code 30b) who are pretty fast, but make a lot of mistakes<p>You need a lot of memory to run these well, quantization makes tool calling weaker, so most run at 4 bit quants and are wondering why it kinda sucks and that&#x27;s because you&#x27;ve essentially lobotomized the model (I recommend unsloth quants, i recommend 6bit for MoEs and 5bit for dense)<p>So you need a lot of compute to make the pre-fill fast, you need bandwidth to make the decode fast, you need a lot of memory to hold everything - lot of ifs<p>On top of that, your laptop becomes a loud hot churning machine, it&#x27;s uncomfortable to work with.<p>So are they good? not really. Do they work? yes<p>edit: just wanna clarify - i think open models are the future, i think they&#x27;re super important, i&#x27;m contributing constantly to the ecosystem - i think people should play around with these models, i think people should use `pi` and learn how it all works - but don&#x27;t download a model expecting it to be good out of the box, you will have to tune and configure a lot of stuff to replace a &quot;coding agent&quot; that most people are using models for","created_at":"2026-06-16T16:08:10Z","created_at_i":1781626090,"objectID":"48557467","parent_id":48555993,"story_id":48555993,"story_title":"Running local models is good now","story_url":"https://vickiboykis.com/2026/06/15/running-local-models-is-good-now/","updated_at":"2026-06-17T12:05:07Z"},{"_highlightResult":{"author":{"matchLevel":"none","matchedWords":[],"value":"clickety_clack"},"comment_text":{"fullyHighlighted":false,"matchLevel":"full","matchedWords":["unsloth"],"value":"I think it would be very hard to convince someone to pay $100/mo to go back to Claude if they have a local model up and running, particularly now that model improvement has basically been stalled for the last 6 months. It\u2019s so easy to set it up for yourself now too with things like LM studio. That said, there will always be <em>unsoph</em>isticated users who can\u2019t figure it out, so there will always be someone there to pay."},"story_title":{"matchLevel":"none","matchedWords":[],"value":"Running local models is good now"},"story_url":{"matchLevel":"none","matchedWords":[],"value":"https://vickiboykis.com/2026/06/15/running-local-models-is-good-now/"}},"_tags":["comment","author_clickety_clack","story_48555993"],"author":"clickety_clack","children":[48557447,48557185,48557103],"comment_text":"I think it would be very hard to convince someone to pay $100&#x2F;mo to go back to Claude if they have a local model up and running, particularly now that model improvement has basically been stalled for the last 6 months. It\u2019s so easy to set it up for yourself now too with things like LM studio. That said, there will always be unsophisticated users who can\u2019t figure it out, so there will always be someone there to pay.","created_at":"2026-06-16T15:36:45Z","created_at_i":1781624205,"objectID":"48556930","parent_id":48556855,"story_id":48555993,"story_title":"Running local models is good now","story_url":"https://vickiboykis.com/2026/06/15/running-local-models-is-good-now/","updated_at":"2026-06-16T16:07:48Z"},{"_highlightResult":{"author":{"matchLevel":"none","matchedWords":[],"value":"heipei"},"comment_text":{"fullyHighlighted":false,"matchLevel":"full","matchedWords":["unsloth"],"value":"No KV cache quant, context length 50% of original, MTP absolutely. These are the relevant cmdline attributes. Getting around 100t/s with this setup, even when watt-limited to 450W.<p><pre><code>  --temp 0.6 --top-p 0.95 --top-k 20 --min-p 0.00 --presence-penalty 0 --metrics --jinja --chat-template-file chat_template.jinja --chat-template-kwargs '{&quot;preserve_thinking&quot;: true}' --spec-type draft-mtp --spec-draft-n-max 2 --spec-draft-p-min 0.75 -ngl 99 -c 131072 -fa on -np 1 -hf <em>unsloth</em>/Qwen3.6-27B-MTP-GGUF:Q6_K</code></pre>"},"story_title":{"matchLevel":"none","matchedWords":[],"value":"Ask HN: Has anyone replaced Claude/GPT with a local model for daily coding?"}},"_tags":["comment","author_heipei","story_48542100"],"author":"heipei","comment_text":"No KV cache quant, context length 50% of original, MTP absolutely. These are the relevant cmdline attributes. Getting around 100t&#x2F;s with this setup, even when watt-limited to 450W.<p><pre><code>  --temp 0.6 --top-p 0.95 --top-k 20 --min-p 0.00 --presence-penalty 0 --metrics --jinja --chat-template-file chat_template.jinja --chat-template-kwargs &#x27;{&quot;preserve_thinking&quot;: true}&#x27; --spec-type draft-mtp --spec-draft-n-max 2 --spec-draft-p-min 0.75 -ngl 99 -c 131072 -fa on -np 1 -hf unsloth&#x2F;Qwen3.6-27B-MTP-GGUF:Q6_K</code></pre>","created_at":"2026-06-16T07:58:26Z","created_at_i":1781596706,"objectID":"48552050","parent_id":48547519,"story_id":48542100,"story_title":"Ask HN: Has anyone replaced Claude/GPT with a local model for daily coding?","updated_at":"2026-06-16T14:39:33Z"},{"_highlightResult":{"author":{"matchLevel":"none","matchedWords":[],"value":"klardotsh"},"comment_text":{"fullyHighlighted":false,"matchLevel":"full","matchedWords":["unsloth"],"value":"4-5 bit quants would probably fit pretty well on your rig. Check HuggingFace for Qwen3.6-35B-A3B-MTP-GGUF [1]. They've also got a cool UI thing these days to help indicate which quants of a model will run on your hardware.<p>Full octane isn't gonna fit on much of anything south of a 128GB machine once adding KV cache.<p>[1]: <a href=\"https://huggingface.co/unsloth/Qwen3.6-35B-A3B-MTP-GGUF\" rel=\"nofollow\">https://huggingface.co/<em>unsloth</em>/Qwen3.6-35B-A3B-MTP-GGUF</a>"},"story_title":{"matchLevel":"none","matchedWords":[],"value":"Ask HN: Has anyone replaced Claude/GPT with a local model for daily coding?"}},"_tags":["comment","author_klardotsh","story_48542100"],"author":"klardotsh","comment_text":"4-5 bit quants would probably fit pretty well on your rig. Check HuggingFace for Qwen3.6-35B-A3B-MTP-GGUF [1]. They&#x27;ve also got a cool UI thing these days to help indicate which quants of a model will run on your hardware.<p>Full octane isn&#x27;t gonna fit on much of anything south of a 128GB machine once adding KV cache.<p>[1]: <a href=\"https:&#x2F;&#x2F;huggingface.co&#x2F;unsloth&#x2F;Qwen3.6-35B-A3B-MTP-GGUF\" rel=\"nofollow\">https:&#x2F;&#x2F;huggingface.co&#x2F;unsloth&#x2F;Qwen3.6-35B-A3B-MTP-GGUF</a>","created_at":"2026-06-16T00:02:27Z","created_at_i":1781568147,"objectID":"48548810","parent_id":48548429,"story_id":48542100,"story_title":"Ask HN: Has anyone replaced Claude/GPT with a local model for daily coding?","updated_at":"2026-06-16T11:27:19Z"},{"_highlightResult":{"author":{"matchLevel":"none","matchedWords":[],"value":"wsintra2022"},"comment_text":{"fullyHighlighted":false,"matchLevel":"full","matchedWords":["unsloth"],"value":"Not my experience at all. Mac Studio 64g, running Qwen2.7b 8K. Took ten minutes to get up and running, just read some documentation, <em>Unsloth</em> literally walks you through it. For Opencode just edit one file and its good to go. Have not had any issues (besides the occasional LLM related one). Not extremely manual and clunky at all."},"story_title":{"matchLevel":"none","matchedWords":[],"value":"Ask HN: Has anyone replaced Claude/GPT with a local model for daily coding?"}},"_tags":["comment","author_wsintra2022","story_48542100"],"author":"wsintra2022","comment_text":"Not my experience at all. Mac Studio 64g, running Qwen2.7b 8K. Took ten minutes to get up and running, just read some documentation, Unsloth literally walks you through it. For Opencode just edit one file and its good to go. Have not had any issues (besides the occasional LLM related one). Not extremely manual and clunky at all.","created_at":"2026-06-15T22:40:35Z","created_at_i":1781563235,"objectID":"48548016","parent_id":48545676,"story_id":48542100,"story_title":"Ask HN: Has anyone replaced Claude/GPT with a local model for daily coding?","updated_at":"2026-06-15T22:43:32Z"},{"_highlightResult":{"author":{"matchLevel":"none","matchedWords":[],"value":"asdfasgasdgasdg"},"comment_text":{"fullyHighlighted":false,"matchLevel":"full","matchedWords":["unsloth"],"value":"I don\u2019t think there is any action Pichai can take on this that wouldn\u2019t hurt vastly more than a few students walking out of his commencement speech. He is a man well used to people complaining about his policies. Googlers do so all the time internally, including on this same topic, and have been for years. If gestures like this were going to move him, he\u2019d have been moved already.<p>That being said, I don\u2019t have a problem with people standing up for what they believe, even when it has no practical impact. It\u2019s good character building. I would expect that Sundar is similarly <em>unboth</em>ered."},"story_title":{"matchLevel":"none","matchedWords":[],"value":"Around 200 Stanford students walk out as Google CEO takes stage"},"story_url":{"matchLevel":"none","matchedWords":[],"value":"https://www.sfgate.com/tech/article/sundar-pichai-stanford-commencement-22304888.php"}},"_tags":["comment","author_asdfasgasdgasdg","story_48545260"],"author":"asdfasgasdgasdg","comment_text":"I don\u2019t think there is any action Pichai can take on this that wouldn\u2019t hurt vastly more than a few students walking out of his commencement speech. He is a man well used to people complaining about his policies. Googlers do so all the time internally, including on this same topic, and have been for years. If gestures like this were going to move him, he\u2019d have been moved already.<p>That being said, I don\u2019t have a problem with people standing up for what they believe, even when it has no practical impact. It\u2019s good character building. I would expect that Sundar is similarly unbothered.","created_at":"2026-06-15T20:56:28Z","created_at_i":1781556988,"objectID":"48546934","parent_id":48546900,"story_id":48545260,"story_title":"Around 200 Stanford students walk out as Google CEO takes stage","story_url":"https://www.sfgate.com/tech/article/sundar-pichai-stanford-commencement-22304888.php","updated_at":"2026-06-16T20:09:35Z"},{"_highlightResult":{"author":{"matchLevel":"none","matchedWords":[],"value":"monirmamoun"},"comment_text":{"fullyHighlighted":false,"matchLevel":"full","matchedWords":["unsloth"],"value":"download LM Studio to play with, and it will let you search for models... try Qwen3.6-35B-A3B at 4,5 or 6 bits (6 bit XL is near perfect) and use pi coder or another harness to access it... you can also try <em>Unsloth</em> studio and try same model to start. LM Studio slighter easier to use, <em>Unsloth</em> probably better quality. Neither one is super great quality by the way (meaning: they crash or act weirdly too often to be full production solutions, but can work for local coding). ONCE YOU DOWNLOAD EITHER APP... it will let you search huggingface for the models. Just type qwen to start looking and ... start messing around. And you connect the pi coder harness using the http interface that LM Studio and <em>Unsloth</em> offer to the engine API, so make sure you figure out that url and turn it on... something like 127.0.0.1:1234/api would be a typical IP (localhost) and port (1234 is used by LM Studio)"},"story_title":{"matchLevel":"none","matchedWords":[],"value":"Ask HN: Has anyone replaced Claude/GPT with a local model for daily coding?"}},"_tags":["comment","author_monirmamoun","story_48542100"],"author":"monirmamoun","comment_text":"download LM Studio to play with, and it will let you search for models... try Qwen3.6-35B-A3B at 4,5 or 6 bits (6 bit XL is near perfect) and use pi coder or another harness to access it... you can also try Unsloth studio and try same model to start. LM Studio slighter easier to use, Unsloth probably better quality. Neither one is super great quality by the way (meaning: they crash or act weirdly too often to be full production solutions, but can work for local coding). ONCE YOU DOWNLOAD EITHER APP... it will let you search huggingface for the models. Just type qwen to start looking and ... start messing around. And you connect the pi coder harness using the http interface that LM Studio and Unsloth offer to the engine API, so make sure you figure out that url and turn it on... something like 127.0.0.1:1234&#x2F;api would be a typical IP (localhost) and port (1234 is used by LM Studio)","created_at":"2026-06-15T20:25:32Z","created_at_i":1781555132,"objectID":"48546573","parent_id":48545615,"story_id":48542100,"story_title":"Ask HN: Has anyone replaced Claude/GPT with a local model for daily coding?","updated_at":"2026-06-16T01:20:46Z"},{"_highlightResult":{"author":{"matchLevel":"none","matchedWords":[],"value":"Terr_"},"comment_text":{"fullyHighlighted":false,"matchLevel":"full","matchedWords":["unsloth"],"value":"I think it's a mistake to assume that just because the initial burglar is technically <em>unsoph</em>isticated, that's the end of the story. Crime can become surprisingly complicated, with its own supply chains, service providers and tool vendors, specializations, middlemen, etc. (Credit card fraud is a good example.)<p>Imagine how your threat-model can change if the thief\u2014still incurious and <em>unsoph</em>isticated\u2014just happens to &quot;know a guy&quot;:<p>1. A thief steals your computer, with no thought to who you are or what you might have on it.<p>2. The computer is passed to a fence for a predictable immediate cut.<p>3. The fence sees a lot of these computers (or phones), and knows that there are ways to extract more profit.<p>4. The fence has a relationship with a data extractor, and runs a provided program that gleans as much exploitable data as possible before reselling the hardware.<p>5. The data-extractor sees those tax files pop up, and sells those details to another criminal group that specializes in tax fraud.<p>If a system exists to &quot;use every part of the buffalo&quot;, then pretty much anything can cause you damage. I'm sure somebody is already developing tools to scan a drive trying to determine likely names of your first-pet for those stupid account recovery questions."},"story_title":{"matchLevel":"none","matchedWords":[],"value":"Windows 11 users are tired of MS account requirements creeping into everything"},"story_url":{"matchLevel":"none","matchedWords":[],"value":"https://www.windowscentral.com/microsoft/windows-11/windows-11-users-are-tired-of-microsoft-account-requirements-and-workarounds"}},"_tags":["comment","author_Terr_","story_48533101"],"author":"Terr_","comment_text":"I think it&#x27;s a mistake to assume that just because the initial burglar is technically unsophisticated, that&#x27;s the end of the story. Crime can become surprisingly complicated, with its own supply chains, service providers and tool vendors, specializations, middlemen, etc. (Credit card fraud is a good example.)<p>Imagine how your threat-model can change if the thief\u2014still incurious and unsophisticated\u2014just happens to &quot;know a guy&quot;:<p>1. A thief steals your computer, with no thought to who you are or what you might have on it.<p>2. The computer is passed to a fence for a predictable immediate cut.<p>3. The fence sees a lot of these computers (or phones), and knows that there are ways to extract more profit.<p>4. The fence has a relationship with a data extractor, and runs a provided program that gleans as much exploitable data as possible before reselling the hardware.<p>5. The data-extractor sees those tax files pop up, and sells those details to another criminal group that specializes in tax fraud.<p>If a system exists to &quot;use every part of the buffalo&quot;, then pretty much anything can cause you damage. I&#x27;m sure somebody is already developing tools to scan a drive trying to determine likely names of your first-pet for those stupid account recovery questions.","created_at":"2026-06-15T19:26:00Z","created_at_i":1781551560,"objectID":"48545876","parent_id":48542714,"story_id":48533101,"story_title":"Windows 11 users are tired of MS account requirements creeping into everything","story_url":"https://www.windowscentral.com/microsoft/windows-11/windows-11-users-are-tired-of-microsoft-account-requirements-and-workarounds","updated_at":"2026-06-16T14:49:34Z"},{"_highlightResult":{"author":{"matchLevel":"none","matchedWords":[],"value":"kpw94"},"comment_text":{"fullyHighlighted":false,"matchLevel":"full","matchedWords":["unsloth"],"value":"&gt;  gemma (<em>unsloth</em>/gemma-4-26B-A4B-it-GGUF) models<p>Since you're running quantized (at UD-Q4_K_XL) , check out the &quot;qat&quot; models (<em>unsloth</em>/gemma-4-26B-A4B-it-qat-GGUF) !<p>- <a href=\"https://huggingface.co/unsloth/gemma-4-26B-A4B-it-qat-GGUF\" rel=\"nofollow\">https://huggingface.co/<em>unsloth</em>/gemma-4-26B-A4B-it-qat-GGUF</a>\n(With &quot;Jun 9 Update: Added MTP support.&quot;)<p>- <a href=\"https://blog.google/innovation-and-ai/technology/developers-tools/quantization-aware-training-gemma-4/\" rel=\"nofollow\">https://blog.google/innovation-and-ai/technology/developers-...</a>"},"story_title":{"matchLevel":"none","matchedWords":[],"value":"Ask HN: Has anyone replaced Claude/GPT with a local model for daily coding?"}},"_tags":["comment","author_kpw94","story_48542100"],"author":"kpw94","children":[48546419,48547631],"comment_text":"&gt;  gemma (unsloth&#x2F;gemma-4-26B-A4B-it-GGUF) models<p>Since you&#x27;re running quantized (at UD-Q4_K_XL) , check out the &quot;qat&quot; models (unsloth&#x2F;gemma-4-26B-A4B-it-qat-GGUF) !<p>- <a href=\"https:&#x2F;&#x2F;huggingface.co&#x2F;unsloth&#x2F;gemma-4-26B-A4B-it-qat-GGUF\" rel=\"nofollow\">https:&#x2F;&#x2F;huggingface.co&#x2F;unsloth&#x2F;gemma-4-26B-A4B-it-qat-GGUF</a>\n(With &quot;Jun 9 Update: Added MTP support.&quot;)<p>- <a href=\"https:&#x2F;&#x2F;blog.google&#x2F;innovation-and-ai&#x2F;technology&#x2F;developers-tools&#x2F;quantization-aware-training-gemma-4&#x2F;\" rel=\"nofollow\">https:&#x2F;&#x2F;blog.google&#x2F;innovation-and-ai&#x2F;technology&#x2F;developers-...</a>","created_at":"2026-06-15T18:53:11Z","created_at_i":1781549591,"objectID":"48545489","parent_id":48544680,"story_id":48542100,"story_title":"Ask HN: Has anyone replaced Claude/GPT with a local model for daily coding?","updated_at":"2026-06-16T14:06:22Z"},{"_highlightResult":{"author":{"matchLevel":"none","matchedWords":[],"value":"jodoherty"},"comment_text":{"fullyHighlighted":false,"matchLevel":"full","matchedWords":["unsloth"],"value":"I use pi with an RTX Pro 6000 Blackwell to run Gemma 4 31b to do all my agentic coding.<p>I find it useful.<p>This side project highlights a similar approach to how I scope and tackle projects at work now:<p><a href=\"https://git.theodohertyfamily.com/wg-wrap.git/tree/README.md\" rel=\"nofollow\">https://git.theodohertyfamily.com/wg-wrap.git/tree/README.md</a><p><a href=\"https://git.theodohertyfamily.com/wg-wrap.git/tree/CASE_STUDY.md\" rel=\"nofollow\">https://git.theodohertyfamily.com/wg-wrap.git/tree/CASE_STUD...</a><p>You have to apply a lot of careful architecture and TDD to your approach. Eliminate technical risk by tackling hard things early and wrapping them up in a simple, easy to use interface.<p>I find I can get some projects done 2-3 times faster than if I wrote them by hand. It can also save about 5-10x time on mundane or broadly scoped projects by helping me consolidate and try out ideas very quickly.<p>Setup-wise, I switch between vLLM using nvidia/Gemma-4-31B-IT-NVFP4 and llama.cpp using <em>unsloth</em>/gemma-4-31B-it-qat-GGUF with MTP. I throttle the GPU power usage to 400W.<p>My current llama.cpp setup gets token generation rates between 60-150 t/s depending on MTP draft acceptance rates. Prefill is between 1500-4000 t/s depending on context length/depth."},"story_title":{"matchLevel":"none","matchedWords":[],"value":"Ask HN: Has anyone replaced Claude/GPT with a local model for daily coding?"}},"_tags":["comment","author_jodoherty","story_48542100"],"author":"jodoherty","children":[48562704],"comment_text":"I use pi with an RTX Pro 6000 Blackwell to run Gemma 4 31b to do all my agentic coding.<p>I find it useful.<p>This side project highlights a similar approach to how I scope and tackle projects at work now:<p><a href=\"https:&#x2F;&#x2F;git.theodohertyfamily.com&#x2F;wg-wrap.git&#x2F;tree&#x2F;README.md\" rel=\"nofollow\">https:&#x2F;&#x2F;git.theodohertyfamily.com&#x2F;wg-wrap.git&#x2F;tree&#x2F;README.md</a><p><a href=\"https:&#x2F;&#x2F;git.theodohertyfamily.com&#x2F;wg-wrap.git&#x2F;tree&#x2F;CASE_STUDY.md\" rel=\"nofollow\">https:&#x2F;&#x2F;git.theodohertyfamily.com&#x2F;wg-wrap.git&#x2F;tree&#x2F;CASE_STUD...</a><p>You have to apply a lot of careful architecture and TDD to your approach. Eliminate technical risk by tackling hard things early and wrapping them up in a simple, easy to use interface.<p>I find I can get some projects done 2-3 times faster than if I wrote them by hand. It can also save about 5-10x time on mundane or broadly scoped projects by helping me consolidate and try out ideas very quickly.<p>Setup-wise, I switch between vLLM using nvidia&#x2F;Gemma-4-31B-IT-NVFP4 and llama.cpp using unsloth&#x2F;gemma-4-31B-it-qat-GGUF with MTP. I throttle the GPU power usage to 400W.<p>My current llama.cpp setup gets token generation rates between 60-150 t&#x2F;s depending on MTP draft acceptance rates. Prefill is between 1500-4000 t&#x2F;s depending on context length&#x2F;depth.","created_at":"2026-06-15T18:47:10Z","created_at_i":1781549230,"objectID":"48545412","parent_id":48542100,"story_id":48542100,"story_title":"Ask HN: Has anyone replaced Claude/GPT with a local model for daily coding?","updated_at":"2026-06-16T21:57:05Z"},{"_highlightResult":{"author":{"matchLevel":"none","matchedWords":[],"value":"twothreeone"},"comment_text":{"fullyHighlighted":false,"matchLevel":"full","matchedWords":["unsloth"],"value":"&gt; <em>unsloth</em>/Qwen3.6-35B-A3B-MTP-GGUF<p>I've actually tried this exact same model locally as well.. albeit on just a single 3090 at 128k context and I got around 40-60tok/s with Q4_K quantization.<p>The thing that bugged me the most was really the quality of the output on moderately complex real-world coding tasks. Having to switch between &quot;prompt/vibe&quot; and &quot;manually implement&quot; is such a big context switch burden, because you really have to ask yourself every few minutes if you're &quot;holding it wrong&quot; or the model is just too stupid.<p>It also doesn't really seem to handle transitions from &quot;low-level implementation detail&quot; to &quot;high-level design&quot; well, e.g., it wouldn't easily render tables and such. With Claude I don't have this issue.. so I think for now my verdict would be that it's not really a viable replacement. I really hope it will be in a few months time.<p>Oh and I used &quot;aider&quot; to replace claude CLI, which maybe that's also sub-optimal.. I'm not sure. The MCP marketplaces are useful of course, though arguably you could just manually replace them over time."},"story_title":{"matchLevel":"none","matchedWords":[],"value":"Ask HN: Has anyone replaced Claude/GPT with a local model for daily coding?"}},"_tags":["comment","author_twothreeone","story_48542100"],"author":"twothreeone","children":[48545929,48546262,48552832,48553822],"comment_text":"&gt; unsloth&#x2F;Qwen3.6-35B-A3B-MTP-GGUF<p>I&#x27;ve actually tried this exact same model locally as well.. albeit on just a single 3090 at 128k context and I got around 40-60tok&#x2F;s with Q4_K quantization.<p>The thing that bugged me the most was really the quality of the output on moderately complex real-world coding tasks. Having to switch between &quot;prompt&#x2F;vibe&quot; and &quot;manually implement&quot; is such a big context switch burden, because you really have to ask yourself every few minutes if you&#x27;re &quot;holding it wrong&quot; or the model is just too stupid.<p>It also doesn&#x27;t really seem to handle transitions from &quot;low-level implementation detail&quot; to &quot;high-level design&quot; well, e.g., it wouldn&#x27;t easily render tables and such. With Claude I don&#x27;t have this issue.. so I think for now my verdict would be that it&#x27;s not really a viable replacement. I really hope it will be in a few months time.<p>Oh and I used &quot;aider&quot; to replace claude CLI, which maybe that&#x27;s also sub-optimal.. I&#x27;m not sure. The MCP marketplaces are useful of course, though arguably you could just manually replace them over time.","created_at":"2026-06-15T18:46:09Z","created_at_i":1781549169,"objectID":"48545399","parent_id":48544680,"story_id":48542100,"story_title":"Ask HN: Has anyone replaced Claude/GPT with a local model for daily coding?","updated_at":"2026-06-16T14:31:19Z"},{"_highlightResult":{"author":{"matchLevel":"none","matchedWords":[],"value":"nyrikki"},"comment_text":{"fullyHighlighted":false,"matchLevel":"full","matchedWords":["unsloth"],"value":"My workflow is too different right now (gradually constrained to network less builds for reasons) but I am really enjoying how zeds agents have worked out in the past few weeks.<p>I have 27b, 35B-A3B and a cpu backed gpt-oss configured and use them in parallel, checking if one is getting ratholed and adding context or manual fixes.<p>I had various other systems setup and commercial models but really don\u2019t use them.<p>It may be too interactive for some people, but it is a good mix of fail fast and often the places qwen3.6 was failing was eventually problems with the frontier models.<p>And this is with the <em>unsloth</em> defaults and hardened llama.cpp podman containers.<p>I do sometimes load other models or honestly just feed things into google\u2019s free agent.  But that is rare and to be honest manually fixing is typically faster and less error prone"},"story_title":{"matchLevel":"none","matchedWords":[],"value":"My Homelab AI Dev Platform"},"story_url":{"matchLevel":"none","matchedWords":[],"value":"https://rsgm.dev/post/ai-dev-platform/"}},"_tags":["comment","author_nyrikki","story_48542433"],"author":"nyrikki","children":[48565257],"comment_text":"My workflow is too different right now (gradually constrained to network less builds for reasons) but I am really enjoying how zeds agents have worked out in the past few weeks.<p>I have 27b, 35B-A3B and a cpu backed gpt-oss configured and use them in parallel, checking if one is getting ratholed and adding context or manual fixes.<p>I had various other systems setup and commercial models but really don\u2019t use them.<p>It may be too interactive for some people, but it is a good mix of fail fast and often the places qwen3.6 was failing was eventually problems with the frontier models.<p>And this is with the unsloth defaults and hardened llama.cpp podman containers.<p>I do sometimes load other models or honestly just feed things into google\u2019s free agent.  But that is rare and to be honest manually fixing is typically faster and less error prone","created_at":"2026-06-15T18:26:32Z","created_at_i":1781547992,"objectID":"48545183","parent_id":48543932,"story_id":48542433,"story_title":"My Homelab AI Dev Platform","story_url":"https://rsgm.dev/post/ai-dev-platform/","updated_at":"2026-06-17T03:13:50Z"}],"hitsPerPage":20,"nbHits":7071,"nbPages":50,"page":0,"params":"query=unsloth&advancedSyntax=true&analyticsTags=backend","processingTimeMS":8,"processingTimingsMS":{"_request":{"roundTrip":25},"afterFetch":{"format":{"highlighting":1,"total":1}},"fetch":{"query":6,"total":7},"total":8},"query":"unsloth","serverTimeMS":10}