Commentary articles exclusively reflect the individual opinion of the author listed.
Hardly anyone knows exactly what happens to your data when you send a request to an AI service. One thing is clear, though: whatever happens to it, the data is no longer really yours.
As with local image and video generation, hosting your own LLM is surprisingly easy and has a number of advantages over the offerings from the major providers, especially if you want to experiment with Large Language Models without passing your data on to Big Tech.
The most important point: No matter what the model is used for, all data remains under your own control. That alone is a clear advantage if you don’t want to hand over your data to third parties. In addition, practically any model can be used – whether Deepseek, Gemma2 or GPT. Another advantage is being able to use versions that do not restrict certain types of requests.
KoboldCPP is an easy-to-use AI text generation tool consisting of a single executable file and designed for GGUF and GGML models. It supports both GPU and CPU and can serve as a specialized backend for AI storytelling and chats. KoboldCPP can be downloaded from GitHub and is available for Windows, Linux, Mac and Docker.
If KoboldCPP is hosted in a container, the LLM can be made available to every device in your own network without much effort. Ready-made templates already exist for the most important platforms, including Unraid and TrueNAS. The same is possible with other installations, as long as the firewall allows incoming connections to the KoboldCPP port.
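As a rough illustration of what network access looks like in practice, the following sketch sends a prompt from another machine to a running KoboldCPP instance over HTTP. It assumes the server is reachable at the placeholder address 192.168.1.50 on port 5001 (KoboldCPP's usual default) and that its KoboldAI-style `/api/v1/generate` endpoint is used. The address, prompt and parameter values are illustrative assumptions, not settings from this article.

```python
import json
import urllib.request

# Assumption: placeholder LAN address; 5001 is KoboldCPP's usual default port.
BASE_URL = "http://192.168.1.50:5001"


def build_generate_request(prompt, max_length=120, base_url=BASE_URL):
    """Build a POST request for the KoboldAI-style text generation endpoint."""
    payload = {"prompt": prompt, "max_length": max_length}
    return urllib.request.Request(
        url=f"{base_url}/api/v1/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_text(response_body):
    """Pull the generated text out of a JSON response body (bytes or str)."""
    parsed = json.loads(response_body)
    return parsed["results"][0]["text"]


def generate(prompt, **kwargs):
    """Send the prompt to the server and return the generated continuation."""
    request = build_generate_request(prompt, **kwargs)
    # CPU-only generation can be slow, so the timeout is deliberately generous.
    with urllib.request.urlopen(request, timeout=300) as response:
        return extract_text(response.read())
```

Calling `generate("The party enters the tavern and")` from a laptop on the same network would then return the model's continuation, provided the firewall allows the connection.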
Once the desired platform has been determined, the first step is to decide which model should be used. The best place to go is Hugging Face. The models must be in GGUF format.
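If you are unsure whether a downloaded file really is in GGUF format, a quick check of the file header can help. The sketch below relies on the fact that GGUF files begin with the four bytes `GGUF`, followed by a 32-bit version number. It assumes the common little-endian layout, and the function name is purely illustrative.

```python
import struct

GGUF_MAGIC = b"GGUF"  # the four bytes every GGUF file starts with


def read_gguf_version(path):
    """Return the GGUF format version, or None if the file is not GGUF."""
    with open(path, "rb") as handle:
        header = handle.read(8)
    if len(header) < 8 or header[:4] != GGUF_MAGIC:
        return None
    # Assumption: little-endian file, which is the common case.
    (version,) = struct.unpack("<I", header[4:8])
    return version
```

A return value of `None` suggests the file is in an older or different format and would need to be replaced with a GGUF build of the model.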
If you want to host D&D scenarios, you should definitely choose an uncensored model. Otherwise, sooner or later the LLM will refuse to deal damage to a character, which can lead to undesirable results.
Some models, such as Deepseek and Claude, tend to “think”, i.e. they output their entire thought process for a query. That may be fine when a GPU does most of the work, but on a CPU alone it slows responses down significantly. In the end, the only way to find a suitable model is to try several. Gemma2 is a good starting point.
The URL of the GGUF file must then be copied from the respective file page. Many models come in several sizes, so you should choose a variant that fits within the available RAM.
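Choosing a variant that fits into RAM comes down to simple arithmetic, sketched below. The file sizes in the example are hypothetical and only illustrate the idea. The 2 GB of headroom for the operating system and context is likewise an assumption, not a figure from KoboldCPP.

```python
def pick_variant(variants, available_ram_gb, headroom_gb=2.0):
    """Return the name of the largest variant that fits, or None.

    variants: mapping of variant name -> file size in GB.
    headroom_gb: memory kept free for context and the OS (an assumption).
    """
    budget = available_ram_gb - headroom_gb
    fitting = {name: size for name, size in variants.items() if size <= budget}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)


# Hypothetical file sizes for illustration only; read the real ones off the
# model's file page.
example_variants = {"Q2_K": 3.8, "Q4_K_M": 5.8, "Q6_K": 7.6, "Q8_0": 9.8}
```

With these made-up numbers, a machine with 8 GB of RAM would end up with the `Q4_K_M` file, while 16 GB leaves room for the largest variant.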
Installation under Windows is largely the same as with a container. However, if the model is to run without a GPU, the NoCUDA version must be downloaded. The first start may take some time because KoboldCPP downloads the model before displaying the user interface. The progress is easy to see under Windows, but with Unraid or TrueNAS the container log has to be opened to follow the download. Under Unraid, it may also be necessary to increase the storage space available to Docker containers, depending on the size of the chosen model.
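Instead of watching the log, a small script can wait until the server responds. The sketch below polls the KoboldAI-style `/api/v1/model` endpoint until it answers. The polling interval and maximum wait are arbitrary assumptions, and the probe function can be swapped out for testing.

```python
import time
import urllib.error
import urllib.request


def server_is_up(base_url, timeout=5):
    """Return True if the model-info endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/v1/model", timeout=timeout) as r:
            return r.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused or timed out: the server is not ready yet.
        return False


def wait_until_ready(base_url, probe=server_is_up, interval=10, max_wait=3600,
                     sleep=time.sleep):
    """Poll until the probe succeeds; return seconds waited, or None on timeout."""
    waited = 0
    while waited <= max_wait:
        if probe(base_url):
            return waited
        sleep(interval)
        waited += interval
    return None
```

Called as `wait_until_ready("http://192.168.1.50:5001")`, with the address again being a placeholder, it returns once the model has been downloaded and loaded.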
KoboldCPP offers four different interface modes: Instruct, Story, Chat and Adventure.

Running on the CPU alone is not fast by any stretch of the imagination, with text generation slightly below average reading speed. It is nevertheless absolutely usable for D&D scenarios on a 16-core AMD 5950X (currently around 300 euros on Amazon) and will probably run faster on more modern CPUs. The more cores available, the better. More RAM also allows larger models to be used, although 16 GB should usually be sufficient. The size and type of the selected model also have a significant influence on generation speed: a slimmer model is noticeably faster.
For the best possible experience, running Large Language Models on a GPU is of course the way to go. But if you just want to try out your own LLM, get around the restrictions of ChatGPT, Claude or Gemini, or don’t want to entrust your data to these services, you don’t need any special hardware to get started, and you’ll still get a decently usable experience.
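To put "slightly below reading speed" into numbers, the following sketch converts a token rate into words per minute. The article gives no measurements, so every figure here is an assumption: roughly 0.75 English words per token is a common rule of thumb, and 240 words per minute stands in for an average reader.

```python
def tokens_per_second_to_wpm(tokens_per_second, words_per_token=0.75):
    """Convert a generation rate into approximate words per minute."""
    return tokens_per_second * words_per_token * 60


def relative_to_reading(tokens_per_second, reading_wpm=240, words_per_token=0.75):
    """Ratio of generation speed to reading speed (1.0 means equal)."""
    return tokens_per_second_to_wpm(tokens_per_second, words_per_token) / reading_wpm
```

Under these assumptions, an illustrative rate of 4 tokens per second corresponds to about 180 words per minute, or three quarters of the assumed reading speed.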

