Introducing hybrid on-device inference using Firebase AI Logic

We’re excited to announce a powerful new experimental feature in the Firebase AI Logic client SDK for Web: hybrid on-device inference!

Harness the power of on-device AI models with a seamless fallback to cloud-hosted models

This ensures your generative AI-powered features are always available to your users – online or offline.

Chrome on Desktop supports Gemini Nano, a powerful on-device model that can be downloaded and hosted in the browser on-demand. Running AI models directly on a user’s device offers some great advantages:

  • Enhanced privacy: Sensitive data can stay right on the user’s device.
  • Offline availability: Your app’s AI features can work even without an internet connection.
  • Cost savings: On-device inference is available at no cost, reducing your application’s operational expenses.
  • On-device performance: You can often get high performance thanks to hardware acceleration.

You can build AI features that take advantage of an on-device model, but fall back to a cloud-hosted model when its capabilities are needed.

On-Device and In-Cloud Inference

The Firebase AI Logic client SDK for Web has a hybrid on-device extension that uses the proposed W3C Prompt API for on-device model capabilities. It uses a built-in, on-device model (like Gemini Nano in Chrome) when one is available. If that’s not an option on a particular device or browser, the SDK falls back to using Gemini models on the server. This maximizes your reach, ensuring that your users can access your AI features regardless of their device capabilities.
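
To illustrate the decision the SDK makes for you, here’s a rough sketch of an availability check against the proposed Prompt API. The `LanguageModel.availability()` and `LanguageModel.create()` calls reflect the proposal at the time of writing and may change since the API is experimental; the actual fallback logic lives inside the SDK, so you don’t need to write this yourself.

// Rough sketch only – the SDK handles this check for you internally.
// The Prompt API is experimental, so this exact surface may change.
async function promptWithFallback(prompt, cloudModel) {
  // "LanguageModel" is the entry point proposed by the W3C Prompt API.
  if (typeof LanguageModel !== "undefined") {
    const availability = await LanguageModel.availability();
    if (availability === "available") {
      // Use the built-in, on-device model (e.g. Gemini Nano in Chrome).
      const session = await LanguageModel.create();
      return session.prompt(prompt);
    }
  }
  // Otherwise, fall back to a cloud-hosted Gemini model.
  const result = await cloudModel.generateContent(prompt);
  return result.response.text();
}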

How it works

Using hybrid on-device inference is straightforward. The Firebase AI Logic client SDK for Web automatically handles the logic of checking for on-device model availability and falling back to a cloud-hosted model if necessary.

Here’s a look at how you can implement hybrid inference in your web app:

import { initializeApp } from "firebase/app";
import { getAI, getGenerativeModel, GoogleAIBackend } from "firebase/ai";

// Initialize your Firebase app with your project's configuration
const firebaseApp = initializeApp({ /* your Firebase config */ });

// Initialize the Gemini Developer API backend service
const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

// Create a `GenerativeModel` instance
// Set the mode, for example to use the on-device model when possible
const model = getGenerativeModel(ai, { mode: "prefer_on_device" });

// Run inference and log the result
const result = await model.generateContent("Write a story about a magic backpack.");
console.log(result.response.text());

This is almost identical to a regular getGenerativeModel call, with the addition of the `mode` parameter, which specifies how you want to access a model. The SDK takes care of the rest!

You can also configure this code to always use either an on-device or a cloud-hosted model for inference by changing the `mode` parameter:

// Run inference only on device. Does not fall back to cloud-hosted models.
const model = getGenerativeModel(ai, { mode: "only_on_device" });

// Run inference only with cloud-hosted models.
const model = getGenerativeModel(ai, { mode: "only_in_cloud" });

If you’re curious, you can try out our demo and view the codebase.

Get Started Today!

We’re excited to see what you’ll build with hybrid on-device inference. To get started, check out our official documentation.

We are always looking for feedback on how we can improve our products. Please feel free to share your thoughts and experiences with us using one of our channels.

Happy building!