I've been working on a little side project that combines Duolingo-like listening comprehension exercises with real content .
Every video is transcribed to get much better transcripts than the closed captions. I filter on high quality transcripts, and afterwards a LLM selects only plausible segments for the exercises. This seems to work well for quality control and seems to be reliable enough for these short exercises.
And a couple weeks ago we got the new mistral-ocr model. We updated our OCR benchmark to include the new models.
We evaluated 1,000 documents for JSON extraction accuracy. Major takeaways:
- Qwen 2.5 VL (72b and 32b) are by far the most impressive. Both landed right around 75% accuracy (equivalent to GPT-4o’s performance). Qwen 72b was only 0.4% above 32b. Within the margin of error.
- Both Qwen models passed mistral-ocr (72.2%), which is specifically trained for OCR.
- Gemma-3 (27B) only scored 42.9%. Particularly surprising given that it's architecture is based on Gemini 2.0 which still tops the accuracy chart.
The data set and benchmark runner is fully open source. You can check out the code and reproduction steps here:
I appreciate developing ROCm into something competitive with CUDA would require a lot of work, both internally within AMD and with external contributions to the relevant open source libraries.
However the amount of resources at stake is incredible. The delta between NVIDIA's value and AMD's is bigger than the annual GDP of Spain. Even if they needed to hire a few thousand engineers at a few million in comp each, it'd still be a good investment.
I believe the best way to learn a language is by doing an in-depth project. This is my first Zig project intended for learning the ropes on publishing a Zig package. It turns out to be quite solid and performant. It might be a bit over-engineered.
This little library is packed with the following features:
- Building dependency graph from dependency data. - Performing topological sort on the dependency graph. - Generating dependence-free subsets for parallel processing. - Cycle detection and cycle reporting.
WattWise is a CLI tool that monitors my workstation’s power draw using a smart plug and automatically throttles the CPU & GPUs during expensive Time-of-Use electricity periods. Built with Python, uses PID controllers for smooth transitions between power states. Works with TP-Link Kasa plugs and Home Assistant.
Every video is transcribed to get much better transcripts than the closed captions. I filter on high quality transcripts, and afterwards a LLM selects only plausible segments for the exercises. This seems to work well for quality control and seems to be reliable enough for these short exercises.
Would love your thoughts!