HN Search powered by Algolia


LLM Visualization (https://bbycroft.net/llm)

1592 points|plibither8|1 year ago|131 comments

Llm.c – LLM training in simple, pure C/CUDA (https://github.com/karpathy/llm.c)

1050 points|tosh|1 year ago|168 comments

LLM Visualization (https://bbycroft.net/llm)

972 points|jonbaer|1 year ago|1 comments

Replit's new Code LLM: Open Source, 77% smaller than Codex, trained in 1 week (https://www.latent.space/p/reza-shabani#details)

891 points|swyx|2 years ago|220 comments

DBRX: A new open LLM (https://www.databricks.com/blog/introducing-dbrx-new-state-art-open-llm)

866 points|jasondavies|1 year ago|343 comments

Ask HN: How do I train a custom LLM/ChatGPT on my own documents in Dec 2023?

800 points|divan|1 year ago|237 comments

There is a 5-month-old thread [1] on this, but it might already be outdated.

What is the best approach for feeding a custom set of documents to an LLM and getting decent, non-hallucinating results in Dec 2023?

Update: The question is generally about how to "teach" an LLM to answer questions using your own set of documents (not necessarily by training your own model, so approaches like RAG count).
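For readers unfamiliar with RAG (retrieval-augmented generation), the core idea is to retrieve the document passages most relevant to a question and paste them into the prompt, rather than retraining the model. The minimal sketch below assumes a bag-of-words cosine similarity as a stand-in for a real embedding model; `retrieve` and `build_prompt` are hypothetical names, and the resulting prompt would then be sent to whatever LLM you use.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep alphanumeric runs; a crude stand-in for a real tokenizer.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two bag-of-words count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(question: str, docs: list[str], k: int = 2) -> list[str]:
    # Rank documents by similarity to the question and keep the top k.
    q = Counter(tokenize(question))
    ranked = sorted(docs, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)
    return ranked[:k]


def build_prompt(question: str, docs: list[str], k: int = 2) -> str:
    # Ground the model in retrieved context and tell it not to guess.
    context = "\n\n".join(retrieve(question, docs, k))
    return (
        "Answer using only the context below. If the answer is not in the "
        "context, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


docs = [
    "The warranty lasts two years.",
    "Our office is in Berlin.",
    "Returns are accepted within 30 days.",
]
print(build_prompt("How long does the warranty last?", docs, k=1))
```

In practice the document set would be split into chunks and scored with an embedding model and a vector index; the instruction to answer only from the context is one common way to reduce, though not eliminate, hallucinated answers.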

[1] https://news.ycombinator.com/item?id=36832572

Replacing my best friends with an LLM trained on 500k group chat messages (https://www.izzy.co/blogs/robo-boys.html)

751 points|izzymiller|2 years ago|355 comments

Implementing a ChatGPT-like LLM from scratch, step by step (https://github.com/rasbt/LLMs-from-scratch)

739 points|rasbt|1 year ago|98 comments

Open Assistant – project meant to give everyone access to a great chat based LLM (https://github.com/LAION-AI/Open-Assistant)

713 points|pps|2 years ago|2 comments

Building a fully local LLM voice assistant to control my smart home (https://johnthenerd.com/blog/local-llm-assistant/)

699 points|JohnTheNerd|1 year ago|186 comments

Show HN: Alpaca.cpp – Run an Instruction-Tuned Chat-Style LLM on a MacBook (https://github.com/antimatter15/alpaca.cpp)

673 points|antimatter15|2 years ago|283 comments

Reproducing GPT-2 in llm.c (https://github.com/karpathy/llm.c/discussions/481)

618 points|tosh|10 months ago|117 comments

Llama.ttf: A font which is also an LLM (https://fuglede.github.io/llama.ttf/)

608 points|fuglede_|9 months ago|154 comments

Uncensor any LLM with abliteration (https://huggingface.co/blog/mlabonne/abliteration)

586 points|mizzao|10 months ago|287 comments

An example of LLM prompting for programming (https://martinfowler.com/articles/2023-chatgpt-xu-hao.html)

546 points|mpweiher|2 years ago|261 comments

Llama.vim – Local LLM-assisted text completion (https://github.com/ggml-org/llama.vim)

530 points|kgwgk|2 months ago|108 comments

OK, I can partly explain the LLM chess weirdness now (https://dynomight.net/more-chess/)

524 points|dmazin|4 months ago|460 comments

My LLM codegen workflow (https://harper.blog/2025/02/16/my-llm-codegen-workflow-atm/)

522 points|lolptdr|1 month ago|160 comments

Show HN: LLM-aided OCR – Correcting Tesseract OCR errors with LLMs (https://github.com/Dicklesworthstone/llm_aided_ocr)

479 points|eigenvalue|8 months ago|172 comments

Almost exactly 1 year ago, I submitted something to HN about using Llama2 (which had just come out) to improve the output of Tesseract OCR by correcting obvious OCR errors [0]. That was exciting at the time because OpenAI's API calls were still quite expensive for GPT4, and the cost of running it on a book-length PDF was prohibitive. In contrast, you could run Llama2 locally on a machine with just a CPU; it would be extremely slow, but "free" if you had a spare machine lying around.

Well, it's amazing how things have changed since then. Not only have models gotten a lot better, but the latest "low tier" offerings from OpenAI (GPT4o-mini) and Anthropic (Claude3-Haiku) are incredibly cheap and incredibly fast. So cheap and fast, in fact, that you can now break the document into small chunks, submit them to the API concurrently (each chunk goes through a multi-stage process in which the output of one stage is passed into the prompt for the next), and assemble the results in a shockingly short amount of time, at a cost that is basically a rounding error.

My original project had all sorts of complex logic for detecting hallucinations and incorrect, spurious additions to the text (like "Here is the corrected text" preambles). But the newer models are already good enough to make most of that unnecessary, and you can get very impressive results with the multi-stage approach. In this case, the first pass asks the model to correct OCR errors, remove line breaks in the middle of words, and so on. The next stage takes that output as its input and asks the model to reformat the text as markdown, suppress page numbers and repeated page headers, etc. Anyway, I think the samples (each of which took only 1-2 minutes to generate) show the power of the approach:
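The chunk-then-pipeline pattern described above can be sketched as follows. This is not the project's actual code: `call_llm` is a hypothetical stand-in for a real chat-completion API call (here it applies trivial string fixes so the sketch runs offline), and the prompt constants, `chunk_text`, and the 2000-character default are assumptions for illustration.

```python
import asyncio

# Hypothetical prompts; a real pipeline would send these to the model.
FIX_OCR_PROMPT = "Correct OCR errors and rejoin words split across line breaks."
FORMAT_PROMPT = "Reformat as markdown; drop page numbers and repeated headers."


def chunk_text(text: str, max_chars: int = 2000) -> list[str]:
    # Pack paragraphs (split on blank lines) into chunks of at most max_chars.
    # A single paragraph longer than max_chars becomes its own oversized chunk.
    chunks, current = [], ""
    for para in text.split("\n\n"):
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks


async def call_llm(prompt: str, text: str) -> str:
    # Hypothetical stand-in for an async API call; simulates each stage locally.
    await asyncio.sleep(0)
    if prompt == FIX_OCR_PROMPT:
        return text.replace("-\n", "")  # rejoin hyphenated words
    # "Formatting" stage: reflow each paragraph onto a single line.
    return "\n\n".join(" ".join(p.split()) for p in text.split("\n\n"))


async def process_chunk(chunk: str) -> str:
    # Stage 1 output becomes stage 2 input, keeping each prompt's job narrow.
    fixed = await call_llm(FIX_OCR_PROMPT, chunk)
    return await call_llm(FORMAT_PROMPT, fixed)


async def process_document(text: str, max_chars: int = 2000) -> str:
    # All chunks run concurrently; gather() returns results in input order.
    results = await asyncio.gather(*(process_chunk(c) for c in chunk_text(text, max_chars)))
    return "\n\n".join(results)


raw = "The quick bro-\nwn fox\njumps.\n\nSecond para-\ngraph here."
print(asyncio.run(process_document(raw, max_chars=10)))
```

Because `asyncio.gather` preserves input order, the chunks can be processed in any order and still be reassembled correctly; splitting the work into two narrow stages mirrors the post's observation that asking the model to do too much in one pass hurts quality.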

Original PDF:
https://github.com/Dicklesworthstone/llm_aided_ocr/blob/main...

Raw OCR Output:
https://github.com/Dicklesworthstone/llm_aided_ocr/blob/main...

LLM-Corrected Markdown Output:
https://github.com/Dicklesworthstone/llm_aided_ocr/blob/main...

One interesting thing I found was that almost all my attempts to fix/improve things using "classical" methods like regex and other rule based things made everything worse and more brittle, and the real improvements came from adjusting the prompts to make things clearer for the model, and not asking the model to do too much in a single pass (like fixing OCR mistakes AND converting to markdown format).

Anyway, this project is very handy if you have some old scanned books from Archive.org or Google Books that you want to read on a Kindle or other e-reader and want the text to be reflowable and clear. It's still not perfect, but I bet that within the next year the models will improve enough to get it closer to 100%. Hope you like it!

[0] https://news.ycombinator.com/item?id=36976333

Consistency LLM: converting LLMs to parallel decoders accelerates inference 3.5x (https://hao-ai-lab.github.io/blogs/cllm/)

461 points|zhisbug|11 months ago|98 comments

New LLM optimization technique slashes memory costs (https://venturebeat.com/ai/new-llm-optimization-technique-slashes-memory-costs-up-to-75/)

445 points|hochmartinez|4 months ago|214 comments

QwQ: Alibaba's O1-like reasoning LLM (https://qwenlm.github.io/blog/qwq-32b-preview/)

438 points|amrrs|4 months ago|421 comments

Patterns for building LLM-based systems and products (https://eugeneyan.com/writing/llm-patterns/)

436 points|7d7n|2 years ago|55 comments

Falcon 40B LLM (which beats Llama) now Apache 2.0 (https://twitter.com/Thom_Wolf/status/1663986216771936263)

435 points|convexstrictly|2 years ago|141 comments

SmolGPT: A minimal PyTorch implementation for training a small LLM from scratch (https://github.com/Om-Alve/smolGPT)

434 points|amrrs|2 months ago|55 comments

Numbers every LLM developer should know (https://github.com/ray-project/llm-numbers)

428 points|richardliaw|2 years ago|103 comments

Show HN: Speeding up LLM inference 2x times (possibly) (https://asciinema.org/a/piP22yYwcaohu5cA2gyuv1W61)

419 points|kolinko|1 year ago|114 comments

Here's a project I've been working on for the last few months.

It's a new (I think) algorithm that lets you smoothly adjust, in real time, how many calculations are done during LLM inference.

It seems that it's possible to do just 20-25% of weight multiplications instead of all of them, and still get good inference results.
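To make the idea of skipping most weight multiplications concrete, here is a simplified NumPy sketch of an approximate matrix-vector product. It is loosely inspired by the description above and is not the author's algorithm (see the linked write-up for that): it assumes each weight column is presorted by magnitude once, and at inference time multiplies only the weights where |W[i, j] * x[j]| is at least a hypothetical `cutoff`, which is a prefix of each sorted column. `precompute` and `approx_matvec` are illustrative names.

```python
import numpy as np


def precompute(W: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    # One-time preprocessing: for each column j, row indices sorted by
    # descending |W[i, j]|, plus the matching sorted magnitudes.
    order = np.argsort(-np.abs(W), axis=0)
    sorted_abs = np.take_along_axis(np.abs(W), order, axis=0)
    return order, sorted_abs


def approx_matvec(W, order, sorted_abs, x, cutoff):
    # Approximates W @ x, skipping products with |W[i, j] * x[j]| < cutoff.
    # Returns the result and the number of multiplications actually done.
    y = np.zeros(W.shape[0])
    done = 0
    for j, xj in enumerate(x):
        if xj == 0:
            continue
        # Rows with |W[i, j]| >= cutoff / |x[j]| form a prefix of the sorted
        # column; -sorted_abs is ascending, so searchsorted finds its length.
        n = int(np.searchsorted(-sorted_abs[:, j], -cutoff / abs(xj), side="right"))
        rows = order[:n, j]
        y[rows] += W[rows, j] * xj
        done += n
    return y, done


rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32))
x = rng.standard_normal(32)
order, sorted_abs = precompute(W)
for cutoff in (0.0, 0.5, 1.0):
    y, done = approx_matvec(W, order, sorted_abs, x, cutoff)
    err = np.linalg.norm(y - W @ x) / np.linalg.norm(W @ x)
    print(f"cutoff={cutoff}: {done / W.size:.0%} of multiplications, rel. error {err:.3f}")
```

Raising `cutoff` trades accuracy for fewer multiplications, which is the kind of real-time speed/accuracy dial the post describes. This Python loop will not be fast; a practical implementation would need a GPU kernel and a different memory layout.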

I implemented it to run on M1/M2/M3 GPUs. The mmul approximation itself can be pushed to run 2x faster before the quality of the output collapses.

Overall inference is only a bit faster than Llama.cpp's, because the rest of the implementation could be better, but with further development I think it can become a new method for speeding up inference, in addition to quantization.

You could call it ad-hoc model distillation :)

You can change the speed / accuracy of a model at will, in real time.

Oh, and as a side effect, the data format also lets you choose how much of the model to load into memory. You can decide to skip, say, 10%, 20%, or 40% of the least important weights.

It's implemented for Mistral and was also tested briefly on Mixtral and Llama. It supports only FP16 for now, but Q8 is in the works.

The algorithm is described here, and the implementation is open source.

https://kolinko.github.io/effort/

I know these are bold claims, but I hope they survive the scrutiny :)

LLM4Decompile: Decompiling Binary Code with LLM (https://github.com/albertan017/LLM4Decompile)

412 points|Davidbrcz|1 year ago|129 comments

In the LLM space, "open source" is being used to mean "downloadable weights" (https://www.alessiofanelli.com/blog/llama2-isnt-open-source)

400 points|FanaHOVA|2 years ago|228 comments

Hello OLMo: A truly open LLM (https://blog.allenai.org/hello-olmo-a-truly-open-llm-43f7e7359222?gi=760105621962)

398 points|tosh|1 year ago|67 comments