Gemini API Limits: Best Alternatives in 2025

I’ve been using the free tier of Google Gemini’s API to generate snarky descriptions of visitors captured on my video doorbell in Home Assistant. It worked perfectly until very recently. Google has unfortunately slashed the number of free requests for many of its models, with Gemini 2.5 Flash cut down to just 20 requests per day. If you’ve been hit by the same problem, here’s what to try instead.


Switch to a different model

I was originally using the Gemini 1.5 Flash model for my AI-generated video doorbell descriptions, but moved to the superior Gemini 2.5 Flash. Unfortunately, the automation regularly fires more than 20 times per day, so under the current rate limits it soon breaks. However, while many Gemini models have been severely limited, some still have reasonable limits.

For example, the Gemini Robotics-ER 1.5 Preview model currently has a limit of 250 requests per day. This is a model intended to bring agentic AI capabilities to robotics, but you can use it for other purposes. I tested this out with my snarky doorbell descriptions automation, and it was able to generate a reasonable result.

Information about the Gemini Robotics-ER 1.5 model from the Google AI Studio website.

This Gemini Robotics-ER 1.5 model is good enough to use for my doorbell automation, but this is really only kicking the problem further down the road. Since this is a preview model, it’s likely that the limits will also be cut at some point.
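Switching models is usually just a one-string change in whatever calls the API. As a minimal sketch using the Gemini REST `generateContent` endpoint (the model ID shown for the Robotics-ER preview is my assumption; check the exact ID in Google AI Studio):

```python
# Base URL for the Gemini REST API (v1beta generateContent endpoint).
GEMINI_BASE = "https://generativelanguage.googleapis.com/v1beta/models"

def build_request(model: str, prompt: str) -> tuple[str, dict]:
    """Build the URL and JSON body for a generateContent call.

    Changing models only changes the model string in the URL;
    the request body stays the same.
    """
    url = f"{GEMINI_BASE}/{model}:generateContent"
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return url, body

prompt = "Write one snarky sentence describing the visitor in this image."

# The heavily limited model, and the higher-limit preview model:
old_url, body = build_request("gemini-2.5-flash", prompt)
new_url, _ = build_request("gemini-robotics-er-1.5-preview", prompt)

# Sending the request needs an API key, e.g. with the requests library:
# requests.post(new_url, params={"key": API_KEY}, json=body)
```

The actual network call is left commented out, since it needs a valid API key.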

Try GroqCloud

Another option is GroqCloud. GroqCloud is an AI inference platform that runs popular AI models on powerful hardware. It gives you remote access to a wide range of models that have reasonable rate limits and run lightning fast.

For example, I was able to use the meta-llama/llama-4-maverick-17b-128e-instruct model to replace Gemini for my doorbell description, and the results were very good. Currently, that model has a limit of up to 1,000 requests or 500,000 tokens per day for free, which is more than enough for my needs.
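GroqCloud exposes an OpenAI-compatible chat completions endpoint, so sending an image for description follows the standard OpenAI message format with an inline base64 data URL. Here's a rough sketch of how a request could be built (the payload builder and prompt are my own illustration, not the exact code my automation uses):

```python
import base64

GROQ_URL = "https://api.groq.com/openai/v1/chat/completions"
MODEL = "meta-llama/llama-4-maverick-17b-128e-instruct"

def build_groq_payload(image_bytes: bytes, prompt: str) -> dict:
    """Build an OpenAI-style chat payload with an inline base64 image."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": MODEL,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    }

# Placeholder bytes stand in for a real doorbell snapshot.
payload = build_groq_payload(
    b"\xff\xd8fake-jpeg-bytes",
    "Write a snarky one-line description of this visitor.",
)
# POST with an API key from the GroqCloud console, e.g.:
# requests.post(GROQ_URL, json=payload,
#               headers={"Authorization": f"Bearer {GROQ_API_KEY}"})
```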

A list of available models in GroqCloud, along with their current free limits.

GroqCloud works perfectly in the LLM Vision integration I use in Home Assistant, and while there isn’t a native Groq conversation agent for the Assist voice assistant, there’s a HACS integration you can use to make your Home Assistant voice assistant smarter using GroqCloud.

Once again, however, this relies on GroqCloud keeping the current free tier limits. There’s nothing to say that Groq won’t also cut rate limits at some point in the future, so you might want to add a fallback model with a different provider.
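A fallback chain is simple to sketch in code: try each provider in order and return the first successful result. The stub functions below are hypothetical stand-ins for real API calls:

```python
def describe_with_fallback(image: bytes, providers: list) -> str:
    """Try each (name, callable) provider in order.

    Each callable takes the image bytes and returns a description string,
    or raises an exception on failure (rate limit, outage, etc.).
    """
    errors = []
    for name, describe in providers:
        try:
            return describe(image)
        except Exception as exc:  # real code should catch specific API errors
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))

# Hypothetical stand-ins: the primary provider is rate limited,
# so the chain falls through to the backup.
def groq_describe(image):
    raise RuntimeError("429: rate limit exceeded")

def gemini_describe(image):
    return "Another delivery driver, judging by the cardboard fortress."

result = describe_with_fallback(
    b"\xff\xd8fake-jpeg",
    [("groq", groq_describe), ("gemini", gemini_describe)],
)
```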

Host a local LLM or VLM

If you don’t want your free access to an AI model suddenly pulled from under you, you might want to consider hosting a local LLM yourself. This puts you firmly in control, with none of your data leaving your home or being used to train AI models, and no concerns about rate limits or API fees.

Hosting an LLM isn’t free in the strictest sense, since you’re paying for the hardware and electricity, but it can work out cheaper in the long term. The biggest issue is whether your hardware is powerful enough to run an LLM that can do the job.

For example, my doorbell automation involves analyzing an image taken from a doorbell camera and describing what the image contains. Running a medium-sized vision-language model (VLM) such as Llama 3.2 Vision 11B generally requires at least 12GB of VRAM for reasonable performance, so a consumer-grade GPU such as an RTX 3060 12GB should be able to handle it.
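If you go the self-hosted route, Ollama is a common way to serve a model like Llama 3.2 Vision locally. Its `/api/generate` endpoint accepts base64-encoded images alongside the prompt; the sketch below builds such a request (the prompt and placeholder bytes are illustrative):

```python
import base64

# Ollama's default local REST endpoint.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_ollama_payload(image_bytes: bytes, prompt: str) -> dict:
    """Build a request body for Ollama's /api/generate endpoint.

    Vision models accept base64-encoded images in the `images` field
    alongside the text prompt.
    """
    return {
        "model": "llama3.2-vision",  # pull first: `ollama pull llama3.2-vision`
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # return a single JSON response rather than a stream
    }

payload = build_ollama_payload(
    b"\xff\xd8fake-jpeg",
    "Describe the person at the door in one snarky sentence.",
)
# With a local Ollama server running:
# requests.post(OLLAMA_URL, json=payload).json()["response"]
```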

A Palit NVIDIA GeForce RTX 3060 graphics card in an open box against a dark background. Credit: Maryia_K/Shutterstock.com

If you don’t have the necessary hardware, however, then the responses will take a significant amount of time to be generated. This wouldn’t be of any use for my doorbell automation, which requires timely notifications.

If you’ve got the hardware to run a local LLM or VLM that’s fast enough to be useful, then it’s definitely worth considering. Your data never needs to leave your home, and you don’t have to worry about what major AI companies are learning about you from your prompts.

Suck it up and pay for what you use

You might not like this final option, but it’s one you need to consider. AI companies are spending insane amounts of money developing their models and buying GPUs. This isn’t sustainable long-term, and the revenue has to come from somewhere.

It’s not realistic to expect companies such as Google to give away unlimited access to their models for free. At some point, they’re going to start making us pay, and that process seems to have started.

A person handing over a large stack of cash. Credit: Andy Dean Photography/Shutterstock.com

API costs aren’t insanely high; Tier 1 Gemini 2.5 Flash pricing is $0.30 per million tokens for text, image, and video inputs, and $2.50 per million tokens for the output. Even with my doorbell automation firing multiple times a day, it would cost me just a few cents a month. Even using Gemini 3 Pro Preview, Google’s best and most expensive model (which would be complete overkill), I’d only hit about $2 per month.
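The back-of-the-envelope arithmetic is easy to check yourself. Using the Tier 1 Gemini 2.5 Flash rates quoted above, and some assumed per-request token counts (the 1,000 input / 100 output figures are my rough guesses for an image plus a short reply, not measured values):

```python
# Tier 1 Gemini 2.5 Flash pricing, per million tokens.
INPUT_PER_M = 0.30   # text, image, and video inputs
OUTPUT_PER_M = 2.50  # output

def monthly_cost(requests_per_day: int, input_tokens: int,
                 output_tokens: int, days: int = 30) -> float:
    """Estimate monthly API cost in dollars."""
    total_in = requests_per_day * days * input_tokens
    total_out = requests_per_day * days * output_tokens
    return total_in / 1e6 * INPUT_PER_M + total_out / 1e6 * OUTPUT_PER_M

# Assumed: 30 firings/day, ~1,000 input tokens per image, ~100 output tokens.
cost = monthly_cost(requests_per_day=30, input_tokens=1000, output_tokens=100)
# Even with those generous assumptions, this lands well under a dollar a month.
```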

You don’t have to limit yourself to a single provider, either. Using aggregators such as OpenRouter, you can access a huge range of models from different providers and pay for all your usage in a single location.

The free tiers offered by AI companies were only ever really intended for testing out models, and there’s no real surprise that the limits have been slashed. Currently, it’s not going to break the bank to actually pay for what you use.


It would have been nice if Google had given a heads-up about slashing its free API limits, rather than many people only finding out when their automations stopped working. The change was pretty inevitable, however. The good news is that there are still free alternatives you can use, and if you do decide to pay, it’s not going to break the bank.
