Published: January 5, 2026, 10:30 CET
Author: AI Expert Reviewer
Lead
Google is hitting developers with a one-two punch in January 2026: grounding billing for Gemini 3 models launches today while critical models face shutdown in the coming weeks.
Starting today, January 5, 2026, Google has begun charging developers for Grounding with Google Search on Gemini 3 models. Simultaneously, two more model deprecations loom—text-embedding-004 sunsets January 14, and gemini-2.5-flash-image-preview follows January 15—creating an urgent migration window for production applications. The overlapping deadlines leave developers scrambling to audit code, recalculate budgets, and deploy replacement models before services stop working.
This marks a turning point in how Google monetizes its most advanced AI capabilities. Until today, search grounding on Gemini 3 was effectively free for developers. Now, beyond a free allowance of 5,000 grounded prompts per month for Gemini 3 Pro, search queries cost $14 per 1,000. For teams running high-volume applications, the jump is substantial: a mid-sized enterprise executing 100,000 queries monthly will see grounding costs climb from near zero to roughly $1,400 per month, and the bill scales linearly with usage from there.

What Changed Today: The Grounding Billing Model
Google announced the January 5 grounding billing start on December 5, 2025, giving developers exactly one month to prepare. Unlike other API changes, this one offers no grace period or staged rollout. All Gemini 3 Pro Preview and Gemini 3 Flash Preview requests using the Google Search tool will now incur charges immediately.
The pricing structure is straightforward but punishing at scale. Gemini 3 Pro includes 5,000 free grounded prompts per month. After that threshold, the cost is $14 per 1,000 search queries. Vertex AI uses a different structure with a higher free tier: 1,500 free requests daily (up to 45,000 in a 30-day month), then $35 per 1,000 grounded prompts beyond that. The distinction matters: at these rates, per-prompt billing is cheaper only when a request triggers more than 2.5 search queries on average ($35 ÷ $14). Three or more searches per request favors Vertex AI’s per-prompt scheme; one or two favors per-query billing.
One critical detail: developers pay per search query generated by the model, not per user prompt. If a single Gemini 3 request triggers three internal searches to ground the response, you’re charged for all three. This pricing model rewards optimization (caching, reducing redundant searches, and testing whether grounding is necessary for each request) but catches teams off guard if they aren’t monitoring how many searches each request actually triggers.
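The arithmetic behind these figures can be sketched in a few lines. This is a rough illustration, not Google's billing logic: the function names are hypothetical, the rates are simply the ones quoted in this article, and whether the free allowance is counted in prompts or in queries is left to the caller.

```python
# Rates as quoted in this article (assumed, not fetched from any API).
PER_QUERY_RATE = 14.0   # USD per 1,000 search queries
PER_PROMPT_RATE = 35.0  # USD per 1,000 grounded prompts


def overage_cost(units: int, free_units: int, price_per_1000: float) -> float:
    """Cost of usage beyond a free allowance, billed per 1,000 units."""
    billable = max(0, units - free_units)
    return billable * price_per_1000 / 1000


def break_even_queries_per_prompt(per_query_rate: float,
                                  per_prompt_rate: float) -> float:
    """Average queries per prompt above which per-prompt billing is cheaper."""
    return per_prompt_rate / per_query_rate


def cheaper_scheme(avg_queries_per_prompt: float) -> str:
    """Compare the two schemes for one prompt, ignoring free allowances."""
    per_query_cost = avg_queries_per_prompt * PER_QUERY_RATE
    return "per-query" if per_query_cost < PER_PROMPT_RATE else "per-prompt"
```

For example, `overage_cost(100_000, 5_000, PER_QUERY_RATE)` prices 95,000 billable queries, and `cheaper_scheme(3)` shows which scheme wins for a request that fans out into three searches.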
Models Sunset Within Days: Text-Embedding-004 and Image Models
While the grounding billing surprise unfolds, developers face more immediate code-breaking deadlines. The text-embedding-004 model, widely used for semantic search, RAG systems, and codebase indexing, will stop accepting requests on January 14, 2026—just nine days away. Thousands of applications depend on this model for vector search. LinkedIn discussions confirm developers have been caught off-guard, with one engineer posting: “the clock just started,” noting that teams hardcoding model names “deep inside services” will face mini fire drills when 004 becomes unavailable.
Google’s recommended migration target is gemini-embedding-001, a newer embedding model that benchmarks higher on quality but requires reindexing any existing vector databases. The migration isn’t trivial: teams must pull source documents, re-embed them with the new model, reload vectors into search indices, and validate that downstream systems still work. For large codebases or document repositories, this can take days or weeks.
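The reindexing steps described here (pull documents, re-embed, reload, validate) might look roughly like the sketch below. It deliberately avoids calling any real SDK: `embed_fn` is a hypothetical stand-in for whatever function wraps your gemini-embedding-001 call, and the in-memory dictionary stands in for a real vector store.

```python
from typing import Callable, Dict, Iterator, List, Sequence

Vector = List[float]
# Hypothetical signature: takes a batch of texts, returns one vector per text.
EmbedFn = Callable[[Sequence[str]], List[Vector]]


def chunks(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def reindex(documents: Dict[str, str], embed_fn: EmbedFn,
            batch_size: int = 32) -> Dict[str, Vector]:
    """Re-embed every source document and build a fresh id -> vector index."""
    new_index: Dict[str, Vector] = {}
    for batch_ids in chunks(list(documents), batch_size):
        vectors = embed_fn([documents[doc_id] for doc_id in batch_ids])
        if len(vectors) != len(batch_ids):
            raise ValueError("embedding call returned the wrong number of vectors")
        new_index.update(zip(batch_ids, vectors))
    return new_index


def validate(documents: Dict[str, str], new_index: Dict[str, Vector]) -> None:
    """Check that every document was embedded and all vectors share one size."""
    missing = set(documents) - set(new_index)
    if missing:
        raise ValueError(f"documents missing from new index: {sorted(missing)}")
    dimensions = {len(vector) for vector in new_index.values()}
    if len(dimensions) > 1:
        raise ValueError(f"inconsistent vector dimensions: {sorted(dimensions)}")
```

Building the new index alongside the old one, instead of overwriting in place, leaves room to compare search quality before switching production traffic.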
On January 15, the gemini-2.5-flash-image-preview model will also shut down. This affects fewer developers than the embedding change, but any production image generation workflows relying on this preview model must switch to stable alternatives like gemini-2.5-flash-image or Imagen 4.

Why This Matters: Cost Shock and Vendor Lock-In Risk
For enterprises, the grounding billing change represents a shift from “search is free, use it freely” to “every search has a cost.” This reshapes application architecture. Teams that built chatbots or research agents assuming free, unlimited search queries must now budget for grounding costs or redesign to use fewer searches.
An analysis from Sparkco estimated that a mid-sized enterprise running 100,000 monthly grounding queries will face $360,000 in monthly inference costs, with grounding adding about $1,400 on top. For applications running 300,000+ queries monthly, total costs exceed $1 million. Caching, meaning storing recent search results to avoid redundant queries, can reduce costs by 40%, but it requires architectural changes.
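As a rough illustration of the caching idea, the sketch below wraps a grounded lookup in a small time-to-live cache. Everything here is hypothetical: `fetch` stands in for whatever function issues the grounded request, the one-hour TTL is an arbitrary choice, and actual savings depend entirely on how often your users repeat queries.

```python
import time
from typing import Callable, Dict, Tuple


class GroundingCache:
    """In-memory TTL cache in front of a (billable) grounded lookup."""

    def __init__(self, fetch: Callable[[str], str],
                 ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch            # hypothetical grounded-request function
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}
        self.billable_calls = 0        # how many times fetch was actually called

    @staticmethod
    def _key(query: str) -> str:
        # Normalize case and whitespace so trivially different queries share a key.
        return " ".join(query.lower().split())

    def get(self, query: str) -> str:
        key = self._key(query)
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]            # fresh cache hit, no new search
        self.billable_calls += 1
        result = self._fetch(query)
        self._entries[key] = (now, result)
        return result
```

A short TTL suits fast-moving topics such as current events; longer TTLs fit reference-style queries where slightly stale results are acceptable.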
The embedding shutdown adds another cost pressure. Organizations cannot simply keep using text-embedding-004 indefinitely; Google has set a hard cutoff. Migrating to gemini-embedding-001 means paying Google for embeddings if they weren’t already. Pricing is $0.15 per million input tokens—modest per query, but it accumulates in batch workflows.
Key Deadlines and What Developers Must Do Now
| Deadline | Event | Model(s) Affected | Action Required |
|---|---|---|---|
| Jan 5, 2026 (TODAY) | Grounding billing begins | Gemini 3 Pro/Flash | Audit grounding usage; add budget alerts in Cloud Console; consider reducing unnecessary searches |
| Jan 14, 2026 | text-embedding-004 sunset | Embedding models | Migrate to gemini-embedding-001; reindex vector databases; test downstream systems |
| Jan 15, 2026 | Image preview model sunset | gemini-2.5-flash-image-preview | Move image generation to gemini-2.5-flash-image or Imagen 4 |
| Feb 5, 2026 (Gemini 2.0 Flash) / Feb 25, 2026 (Flash-Lite) | Gemini 2.0 Flash models retire | gemini-2.0-flash, gemini-2.0-flash-lite | Migrate to Gemini 2.5 Flash / Flash-Lite equivalents |

Developers should immediately:
- Audit current usage – Query Cloud Logging or Vertex AI dashboards to find where grounding is enabled and which models are in use.
- Set budget alerts – Configure 50%, 80%, and 100% thresholds in Google Cloud Console to catch unexpected cost spikes.
- Start migration planning – For text-embedding-004 users, begin drafting a reindexing and testing plan; assign resources.
- Test alternatives – Run side-by-side tests of gemini-embedding-001 before switching production traffic.
- Review Gemini 2.0 deprecation – If you’re still using Gemini 2.0 Flash models, plan an upgrade path to Gemini 2.5 before the February 5 (Flash) and February 25 (Flash-Lite) retirement dates.
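One way to start the audit is to search your own code and configuration for hardcoded model names. The sketch below is a minimal example under stated assumptions: the model list and dates are copied from the table above, the file suffixes are arbitrary defaults, and the helper names are hypothetical. It does not query Cloud Logging or any Google API.

```python
import re
from pathlib import Path
from typing import Dict, Iterator, List, Sequence, Tuple

# Model identifiers and sunset dates as listed in this article.
SUNSET_MODELS: Dict[str, str] = {
    "text-embedding-004": "2026-01-14",
    "gemini-2.5-flash-image-preview": "2026-01-15",
    "gemini-2.0-flash": "2026-02-05",
    "gemini-2.0-flash-lite": "2026-02-25",
}

# Longest names first so "gemini-2.0-flash-lite" is not reported as "gemini-2.0-flash".
_PATTERN = re.compile(
    "|".join(re.escape(name)
             for name in sorted(SUNSET_MODELS, key=len, reverse=True))
)


def scan_text(text: str) -> List[Tuple[int, str, str]]:
    """Return (line_number, model, sunset_date) for each hardcoded reference."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in _PATTERN.finditer(line):
            model = match.group(0)
            findings.append((line_number, model, SUNSET_MODELS[model]))
    return findings


def scan_tree(root: str,
              suffixes: Sequence[str] = (".py", ".ts", ".yaml", ".json"),
              ) -> Iterator[Tuple[str, int, str, str]]:
    """Walk a directory and yield (path, line_number, model, sunset_date)."""
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="ignore")
            for line_number, model, sunset in scan_text(text):
                yield (str(path), line_number, model, sunset)
```

Running `scan_tree(".")` from a repository root gives a first list of files to touch; environment variables and secrets managers still need a separate check.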
How Google Justifies the Shift
Google’s rationale is clear: grounding requires live queries to Google Search or Google Maps, which consume infrastructure and data. Unlike token-based model inference, which scales across all customers equally, each grounded request incurs a marginal cost to Google. The pricing model—$14 per 1,000 queries—is meant to align developer incentives with resource consumption.
For teams building search-aware agents or retrieval-augmented generation (RAG) systems, grounding was a competitive advantage. It gave Gemini a way to avoid hallucinations by tethering responses to current data. Now that feature has a price tag, reshaping the economics of agentic AI.
Vertex AI’s higher free tier (1,500 daily) signals that Google wants to keep enterprise use of grounding accessible while charging teams that scale beyond that threshold. The strategy mirrors AWS and Azure’s approach to API pricing: free tier for proof-of-concept, paid tiers for production.

Looking Ahead: What’s Next for Developers
The deprecation rhythm will accelerate. Google has already announced that Gemini 2.0 Flash models will sunset in February 2026 (February 5 for Gemini 2.0 Flash and February 25 for Flash-Lite), pushing developers to upgrade to the Gemini 2.5 series. By June 2026, Gemini 2.5 Pro and Flash are scheduled to retire in favor of Gemini 3 Pro. The result is a catch-22: developers who upgrade to Gemini 3 today to avoid future deprecations will immediately face grounding billing.
The broader trend is clear: Google is consolidating its model portfolio around Gemini 3 as the flagship, retiring older versions, and monetizing premium features like grounding. Developers who’ve built on older Gemini versions face a migration treadmill; those already on Gemini 3 face new costs.
To minimize disruption, developers should take deprecation announcements seriously. Google provided one to two months of advance notice for each sunset, but that window evaporates quickly once the deadline approaches. Setting up automated alerts for Gemini API release notes, testing changes in staging environments early, and maintaining abstraction layers between model names and business logic are no longer optional.
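An abstraction layer between model names and business logic can be as small as a lookup table. The sketch below is one possible shape, not a prescribed pattern: the role names, the registry contents, and the 30-day warning window are illustrative assumptions, with model IDs and dates taken from this article.

```python
import warnings
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, Optional


@dataclass(frozen=True)
class ModelEntry:
    model_id: str
    sunset: Optional[date] = None  # None means no announced retirement date


# Business logic asks for a role; only this table knows concrete model names.
REGISTRY: Dict[str, ModelEntry] = {
    "embeddings": ModelEntry("gemini-embedding-001"),
    "image": ModelEntry("gemini-2.5-flash-image"),
    "chat-legacy": ModelEntry("gemini-2.0-flash", sunset=date(2026, 2, 5)),
}


def resolve(role: str, today: Optional[date] = None,
            warn_days: int = 30) -> str:
    """Map a role to a model ID, warning near sunset and failing after it."""
    entry = REGISTRY[role]
    today = today or date.today()
    if entry.sunset is not None:
        if today >= entry.sunset:
            raise RuntimeError(
                f"{entry.model_id} reached its sunset date ({entry.sunset})")
        if entry.sunset - today <= timedelta(days=warn_days):
            warnings.warn(
                f"{entry.model_id} retires on {entry.sunset}; plan a migration")
    return entry.model_id
```

With this in place, a migration becomes a one-line change to the registry, and the warning surfaces in logs or CI well before the cutoff.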
FAQ: Common Questions About the Changes
If I’m still using Gemini 1.5 models, what happens on January 5?
Nothing changes for Gemini 1.5 on January 5: the new grounding billing applies only to Gemini 3 models. Gemini 1.5 Pro and Flash had their own grounding pricing ($35 per 1,000 prompts for search grounding), but those models were already shut down in September 2025, so you should have migrated by now.
Do I need to reindex my entire vector database when migrating from text-embedding-004?
Yes, if you’re using that model for semantic search. Old vectors created with text-embedding-004 are not compatible with gemini-embedding-001. You’ll need to reprocess your documents and regenerate vectors. Google recommends running side-by-side testing first to ensure quality is acceptable.
Will my Gemini 3 application cost more if I enable grounding?
Yes. Even with the 5,000 free monthly prompts, any application executing more than that will incur costs. Calculate your expected monthly grounding volume and budget accordingly. Enterprise customers can negotiate volume discounts with Google Cloud sales.
What’s the easiest way to avoid grounding costs?
Reduce the number of searches. Implement caching, limit grounding to high-value queries, and test whether your application quality degrades without always-on search grounding. Some use cases (like current events lookups) require grounding; others (like coding assistance) may not.
Are there third-party alternatives to Gemini grounding I can use instead?
Yes. You can build your own RAG pipeline using open-source embedding models (like Sentence Transformers) and vector databases (Pinecone, Weaviate, Milvus). This trades Google’s convenience for more control and potentially lower costs at scale, but requires more engineering effort.
Conclusion
January 5, 2026, marks a watershed moment for Gemini developers. Grounding billing launches, cost models shift, and the deprecation clock ticks loudly for text-embedding-004 and image preview models. Teams that act now—auditing usage, setting budgets, starting migrations—will avoid surprise bills and production outages. Those that delay will face rushed code changes and scrambled reindexing efforts over the next two weeks.
The message from Google is clear: advanced capabilities like grounding are now premium features, and the older model roster is being retired systematically. For developers building on Gemini, this is a moment to consolidate around Gemini 3, invest in optimization (caching, reduced searches), and treat deprecation announcements as sprint-worthy tasks rather than future concerns.