AI Recreation: One Dev’s 7-Day Challenge | Frandroid

by drbyos

A single developer, working with Anthropic's AI assistant Claude, managed to implement Google's revolutionary method for ending the RAM crisis. The consequence: it is now possible to run very powerful AI models on an ordinary personal computer such as a MacBook Air.

Anthropic Claude

The story begins with a scientific publication that went almost unnoticed by the general public but shook the financial markets. At the end of March 2026, at the ICLR conference, Google presented a new algorithm called TurboQuant. Its objective: to reduce the RAM requirements of memory-hungry AI models and potentially put an end to the RAM crisis affecting consumers.

The company published the mathematics behind the advance, but made a singular choice: it did not release a single line of usable code.

This is where Tom Turney comes in, an independent developer who, armed with his terminal and the AI assistant Claude, decided to recreate the technology from scratch, as he recounts on Medium. In just seven days, he managed to reproduce, and even improve on, Google's unreleased algorithm.

The problem of artificial intelligence memory

To understand the feat, we first need to look at how current language models work. When you chat with an artificial intelligence, it does not just read your last sentence: it must retain the full history of the conversation to stay coherent. That history is stored in what is called the KV cache, for "Key-Value".

The problem with this cache is that it grows linearly with each new token generated. Over a long conversation, this temporary memory ends up consuming more space than the AI model itself.
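Some back-of-the-envelope arithmetic makes this growth concrete. The sketch below estimates the cache size for a hypothetical model; the layer count, head dimensions and 16-bit values are illustrative assumptions, not figures from the article:

```python
# Rough KV cache size for a hypothetical transformer model.
# All dimensions below are illustrative assumptions, not real model specs.

def kv_cache_bytes(n_tokens, n_layers=32, n_kv_heads=8, head_dim=128, bytes_per_value=2):
    """Two tensors (K and V) are cached per layer, per token."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value * n_tokens

# The cache grows linearly with conversation length:
for n in (1_000, 10_000, 100_000):
    print(f"{n:>7} tokens -> {kv_cache_bytes(n) / 1e9:.2f} GB")
```

With these made-up dimensions the cache costs about 0.13 MB per token, so a 100,000-token conversation alone would occupy more than 13 GB, which is why long contexts overwhelm consumer machines.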

To go further
Can your computer or smartphone run an AI? This site gives you the answer in one click

This is the main reason why it is so hard to run high-performance models on a personal computer. Google's algorithm offers a mathematical answer to this bottleneck. If you want to dig into the underlying mechanics, we have already detailed how this solution massively reduces the memory consumption of AI models.

A seven-day sprint to overtake Google

Faced with Google’s research document, Tom Turney did not wait. In the space of seven days, he transformed complex equations into a working program.

The first three days were dedicated to prototyping in Python to validate the basic mathematics. He then ported the code to faster languages to exploit the graphics chips in Apple computers.

The most interesting part is the optimization. The first version of his code was relatively slow: according to data shared by the developer, initial throughput capped at 739 tokens per second (the standard unit for measuring AI model speed).

Thanks to careful work on memory management and graphics calculations, he managed to push this speed to 2,747 tokens per second. The end result is not only functional, but faster than existing standard compression methods.
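Google published only the mathematics of TurboQuant, not its code, so the actual algorithm cannot be reproduced here. As a hedged illustration of the general family of techniques it belongs to, compressed cache storage, here is a minimal 8-bit quantization sketch; the function names and tensor shapes are invented for the example:

```python
import numpy as np

def quantize_int8(x):
    """Store float values as int8 plus one float scale factor."""
    scale = float(np.abs(x).max()) / 127.0
    q = np.round(x / scale).astype(np.int8)
    return q, scale

def dequantize_int8(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
v = rng.standard_normal(4096).astype(np.float32)  # stand-in for one cached V tensor
q, s = quantize_int8(v)

print(f"memory: {v.nbytes} B -> {q.nbytes} B")  # 4x smaller than float32
print(f"max round-trip error: {np.abs(v - dequantize_int8(q, s)).max():.4f}")
```

Real cache-compression schemes are far more sophisticated than this (per-channel scales, sub-8-bit formats, error compensation), but the storage-versus-precision trade-off they navigate is the same.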

But the developer did not stop there. On top of Google's algorithm he added his own layer, a function called Sparse V, after noticing that during long conversations the model gives real weight to only a tiny fraction of the stored tokens.

By deciding not to process that unneeded data, he explains that he can skip 90% of the value decompressions. The speed gain is notable, and the impact on the quality of the AI's responses is, according to his own tests, "0.0000". No measurable loss.
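The developer's actual Sparse V code is not public, so the snippet below is only a sketch of the idea described above: when a handful of tokens receives almost all of the attention weight, the value vectors of the rest never need to be touched. Every name and threshold here is an illustrative assumption:

```python
import numpy as np

def sparse_attention_output(attn_weights, values, keep_fraction=0.10):
    """Combine only the top `keep_fraction` of tokens by attention weight."""
    k = max(1, int(len(attn_weights) * keep_fraction))
    top = np.argsort(attn_weights)[-k:]     # the only V rows worth decompressing
    return attn_weights[top] @ values[top]  # the other ~90% are skipped entirely

rng = np.random.default_rng(1)
n_tokens, dim = 1_000, 64
scores = rng.standard_normal(n_tokens)
scores[:10] += 8.0                            # a few tokens dominate the attention
attn = np.exp(scores) / np.exp(scores).sum()  # softmax weights, sum to 1
V = rng.standard_normal((n_tokens, dim))      # stand-in for the cached value vectors

full = attn @ V                               # exact attention output
approx = sparse_attention_output(attn, V)     # output using only 10% of V
err = np.linalg.norm(full - approx) / np.linalg.norm(full)
print(f"relative error after skipping 90% of V: {err:.6f}")
```

Whether the loss is truly negligible depends on how concentrated the attention actually is; the developer reports a measured quality impact of 0.0000 on his own benchmarks.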

Wall Street’s panic faced with an equation

Google's announcement had an unexpected side effect. Financial markets, fearing that this software optimization would destroy demand for hardware components (including RAM), sold off the sector massively.

Companies such as Samsung, Micron and NVIDIA saw their share prices fall sharply within 48 hours. Cloudflare CEO Matthew Prince described the publication as "Google's DeepSeek moment".

However, this market reaction lacks nuance. Making a technology more resource-efficient does not necessarily reduce its overall consumption; quite the contrary. This is known as the Jevons paradox: when efficiency makes a resource cheaper to use, total demand for that resource often rises.

To go further
This tool already integrates Google TurboQuant: here are the expected gains for your PC or Mac

By lowering the hardware cost of running these models, the optimization opens up new uses for the general public. Its rapid adoption already gives a very concrete first glimpse of the power arriving on our personal computers, with applications that use the Google TurboQuant algorithm ready to download.

What happened this week marks a turning point: the gap between theoretical research and its practical application has never been thinner. Thanks to one independent developer's initiative, it is now possible to run a 35-billion-parameter AI model, with an immense context window, on an ordinary MacBook.

All this, without the company behind the algorithm even having to publish its own code.

