If you’ve been building with Firebase AI Logic, you know the core story: direct access to Gemini from your mobile or web app, with no backend to build or maintain. Thousands of apps are already running this in production today.
This blog post covers what’s new at Cloud Next ‘26. We’ve been listening to your feedback and working on features that give you more control over security, prompt management, and cost optimization, all while keeping the client-first simplicity you’re used to.
If you’re new to Firebase AI Logic, check out the getting started guide first. For a broader overview of our client-first architecture and why it matters for mobile and web apps, take a look at our I/O 2025 blog post.
Server Prompt Templates: now with function calling and chat
Back in December, we launched server prompt templates to let you move your prompts off the client and onto Firebase’s servers. Many of you adopted them to protect prompt IP and speed up iteration cycles.
At launch, templates supported text-and-image inputs with standard system instructions. The two most common requests we heard were: “Can I use function calling with these?” and “What about multi-turn chat?”
Both are now supported.
Chat experiences with templates
You can now use server prompt templates to power multi-turn conversations. The system instructions, model configuration, and any tool definitions live on the server. Your client code just references the template ID and manages the conversation turn-by-turn.
Here’s what a chat template looks like on the server side (you define this in the Firebase console):
---
model: gemini-3-flash-preview
config:
  topK: 20
  topP: 0.8
input:
  default:
    message: ""
  schema:
    message?: string
---
{{history}}
{{message}}
The {{history}} placeholder is the key ingredient. It tells the template where to inject the conversation turns managed by the client SDK. On the client, you provide the template ID and the parameter values. Here’s how it works in Kotlin:
val model = Firebase.ai(backend = GenerativeBackend.googleAI())
    .templateGenerativeModel()

// Start a chat session with your template ID
val chatSession = model.startChat(
    "weather-assistant-v2",
    mapOf("language" to "english")
)

// Send messages. The template's system instructions and
// model config apply to every turn automatically
val response = chatSession.sendMessage(
    content("user") { text("What's the weather like in Lisbon?") }
)
println("Response: ${response.text}")

The key benefit here is that you can iterate on your system instructions, swap the underlying model, or adjust safety settings, all from the Firebase console, without pushing an app update. Your chat feature keeps working with the new configuration instantly.
Function calling in templates
You can now define tool declarations directly in your server prompt template’s frontmatter. The function schemas, like your system instructions, stay on the server and are never exposed to the client binary.
Here’s what a template with function calling looks like:
---
model: gemini-3-flash-preview
tools:
  - name: takePicture
    description: Use the device's camera to take a picture
    input:
      schema:
        orientation(enum, camera orientation): [PORTRAIT, LANDSCAPE]
        useFlash?: boolean, whether or not to use the flash
        zoom?:
          type: number
          description: how zoomed in the camera should be
          minimum: 1
          maximum: 10
    output:
      schema:
        aspectRatio?: string, the aspect ratio of the photo
        mimeType: string, mime type of the image
        data: string, base64 encoded image data
input:
  schema:
    customerName?: string
---
{{role "system"}}
You are great at writing poems and always give detailed compliments about photos the user takes.
{{role "user"}}
{{#if customerName}}
Take a picture for {{customerName}}.
{{else}}
Describe and compliment the picture in the history.
{{/if}}
{{history}}
Notice that the tools block defines the full function schema (name, description, typed parameters with constraints, and expected output) entirely on the server. Your client never sees these definitions.
On the client, function calling with templates works through the chat API. You start a chat session with the template, send a message, and check whether the model returned a function call. If it did, you execute your local logic and send the result back. Here’s what the client code looks like in Dart for a Flutter app:
final model = FirebaseAI.googleAI().templateGenerativeModel();

// Start a chat session with a template that has tools defined
final chatSession = model.startChat(
  'function-calling-v1',
  inputs: {'customerName': customerName},
);

// Send a message that might trigger a function call
final response = await chatSession.sendMessage(
  Content.text(userMessage),
);

// Check if the model wants to call a function
final functionCalls = response.functionCalls.toList();
if (functionCalls.isNotEmpty) {
  final functionCall = functionCalls.first;
  if (functionCall.name == 'takePicture') {
    // Execute your local logic
    final result = {
      'orientation': 'LANDSCAPE',
      'useFlash': true,
      'zoom': 2,
    };
    // Send the function result back to the model
    final followUp = await chatSession.sendMessage(
      Content.functionResponse(functionCall.name, result),
    );
    print('Final response: ${followUp.text}');
  }
}

This is powerful because the tool schema, parameter constraints, and system instructions that guide the model's decisions are all managed server-side in the template. Your client code only handles the execution of local actions and the conversation flow. If you need to add a new tool, tweak a parameter constraint, or change the system prompt, just update the template in the Firebase console and it's live instantly. No app update required.
For full syntax and configuration details, check out the updated template docs.
Replay attack protection from App Check
Firebase AI Logic has always kept your Gemini API key on our servers, never exposed to the client. Firebase App Check adds a critical layer on top of that, validating that requests come from your legitimate, untampered app using device attestation providers like Play Integrity or DeviceCheck.
When using App Check with Firebase AI Logic, we already supported limited-use tokens with configurable lifespans as short as 5 minutes. Starting in May 2026, we’re introducing replay attack protection, which makes App Check tokens strictly single-use.
Even if an attacker intercepts a valid token over the network, they cannot replay it. Each token is consumed on first use and rejected on any subsequent attempt. This is especially valuable for AI endpoints where each call has a direct cost impact on your bill — for example, protecting a retail app's virtual try-on endpoint from abuse.
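If you haven't wired up App Check yet, enabling it on Android takes only a few lines. This is a minimal sketch using the standard Play Integrity provider (it assumes you've added the firebase-appcheck-playintegrity dependency and registered your app in the Firebase console; the `MyApp` class name is illustrative):

```kotlin
import android.app.Application
import com.google.firebase.FirebaseApp
import com.google.firebase.appcheck.FirebaseAppCheck
import com.google.firebase.appcheck.playintegrity.PlayIntegrityAppCheckProviderFactory

class MyApp : Application() {
    override fun onCreate() {
        super.onCreate()
        FirebaseApp.initializeApp(this)
        // Register Play Integrity as the attestation provider so every
        // Firebase AI Logic request carries a valid App Check token.
        FirebaseAppCheck.getInstance().installAppCheckProviderFactory(
            PlayIntegrityAppCheckProviderFactory.getInstance()
        )
    }
}
```

With this in place, token lifespan and enforcement are configured in the Firebase console, so no further client code changes are needed when replay attack protection rolls out.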
Type-safe automatic function calling
Implementing function calling — where the model can call specified external APIs and functions from your app — used to require a lot of boilerplate: manually defining JSON schemas, parsing the model’s tool call responses, serializing parameters, and handling errors at each step. It’s the kind of code that’s tedious to write and easy to get wrong.
Our SDKs now support type-safe automatic function calling. For example, define a native Kotlin data class, and the SDK does the rest. It infers the tool schema, registers it with the model, and handles serialization and deserialization automatically.
// Define your parameters as a standard Kotlin data class
@Serializable
data class AddPointsParams(
    val points: Int,
    val reason: String
)

// The SDK automatically infers the schema and handles the call
val model = Firebase.ai.generativeModel(
    modelName = "gemini-2.5-flash",
    tools = listOf(
        Tool.autoFunction("addGamePoints", "Award points to the player") { params: AddPointsParams ->
            // Your native Kotlin logic runs here
            gameViewModel.addPoints(params.points)
            "Awarded ${params.points} points: ${params.reason}"
        }
    )
)

No manual JSON parsing. No schema definition boilerplate. Just native data classes and lambdas. This means fewer bugs, less code to maintain, and faster feature delivery.
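To see the loop in action, a call like the following sketch (the prompt text is illustrative, and it assumes automatic function calling resolves the tool round trip inside `generateContent`) lets the model decide to invoke `addGamePoints`; the SDK runs your lambda and feeds its return value back to the model before returning the final text:

```kotlin
// generateContent is a suspend function, so call it from a coroutine.
suspend fun rewardPlayer(model: GenerativeModel) {
    val response = model.generateContent(
        "The player just finished the level in record time. Reward them."
    )
    // By the time this returns, the SDK has already executed the
    // addGamePoints lambda and sent its result back to the model.
    println(response.text)
}
```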
Explicit context caching
In many production scenarios, you’re sending the same large chunk of context (like a company policy document, a product catalog, a long video) to the model repeatedly for different users. Processing those same input tokens over and over is expensive and adds latency.
Explicit context caching lets you upload that heavy context once to the Gemini API and get a cache ID back. Reference that cache ID in your server prompt template, and the model only processes the user’s short query against the pre-cached context.
---
model: 'gemini-3-flash-preview'
config:
  cachedContent: 'cachedContents/your-cache-id-here'
---
{{role "system"}}
You are a customer service agent. Answer questions based on
the company policy document provided in the cached context.
{{role "user"}}
{{customerQuestion}}
If you make many requests against the same content, explicit caching can reduce your input token costs significantly and make responses feel noticeably faster. The combination with server prompt templates is particularly useful: you set the cache ID on the server, and your client code doesn't need to know or manage anything about caching.
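Server templates only reference an existing cache, so you still create it once up front by POSTing to the Gemini API's `cachedContents` endpoint. Here's a minimal Kotlin sketch that builds the request body for that call (the policy text and TTL are placeholders; in production you'd likely use an official server-side SDK rather than raw REST):

```kotlin
// Builds the JSON body for:
// POST https://generativelanguage.googleapis.com/v1beta/cachedContents
// The caller supplies the model name, the large document to cache,
// and a time-to-live after which the cache expires.
fun buildCacheRequest(model: String, policyText: String, ttlSeconds: Int): String {
    // Escape backslashes, quotes, and newlines so the document
    // text is a valid JSON string value.
    val escaped = policyText
        .replace("\\", "\\\\")
        .replace("\"", "\\\"")
        .replace("\n", "\\n")
    return """
        {
          "model": "models/$model",
          "contents": [{"role": "user", "parts": [{"text": "$escaped"}]}],
          "ttl": "${ttlSeconds}s"
        }
    """.trimIndent()
}

fun main() {
    println(buildCacheRequest("gemini-3-flash-preview", "Refunds are issued within 14 days.", 3600))
}
```

The response includes a `name` field like `cachedContents/abc123`, which is the value you paste into the template's `cachedContent` config.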
Hybrid on-device inference for Android (experimental)
Since October 2025, we’ve supported hybrid on-device inference for web apps on Chrome. Now, we’re bringing the same hybrid on-device capability to Android.
If you set the inference mode to PREFER_ON_DEVICE, the SDK checks whether the device supports Gemini Nano. If it does, inference runs locally, with lower latency, data that never leaves the device, and zero API cost to you. If the device doesn't support on-device inference, the SDK seamlessly falls back to the cloud-hosted model.
val model = Firebase.ai.generativeModel(
    modelName = "gemini-3-flash-preview",
    inferenceMode = InferenceMode.PREFER_ON_DEVICE
)

val response = model.generateContent("Summarize this note for me: $userNote")

You write one codebase. No conditional logic. No separate execution paths.
The number of Android devices that support on-device AI keeps growing. By implementing this now, you improve margins and performance for a segment of your users without adding architectural complexity.
This is currently experimental on Android. We’d love your feedback as we work toward a stable release, so let us know what you’re building with it.
Start building
We’d love to hear what you think. These features were shaped by community feedback, and your input directly influences what we work on next. Join us at the Firebase community forums, and if you attended our Cloud Next session or watched on-demand (BRK2-064), thank you!
Happy building!
