Building an AI-powered crossword puzzle with Genkit and Gemini

June 26, 2024

When you think of a crossword puzzle, you probably think of the analog pen-and-paper game, not artificial intelligence. But our newest demo might change that. For I/O this year, Flutter and Firebase teamed up with Very Good Ventures to create I/O Crossword, a classic word game with a helpful twist built with the Gemini API.

We were able to build this fun and helpful hint feature using Firebase Genkit. Launched at Google I/O this year, Genkit is Google’s AI integration framework for Node.js developers (with Go support coming soon). Its goal is to make building AI features and adding them to your app’s backend as easy and natural as possible.

We all know that feeling: you’re trying to solve a crossword puzzle, and the word is on the tip of your tongue, or the clue just isn’t cutting it. Instead of spoiling the fun with a Google search, we thought, why not add an AI-powered feature to the app to give you a nudge – just enough to keep those brain cells firing and that satisfaction flowing when you finally crack the code.

In this post, we’ll cover how we built this functionality and why we chose Genkit to build the experience. If you want to dig in further, you can access the code on GitHub or play the game yourself.

How it works

Before jumping into how we built the feature, here’s a quick overview of how it works…

Say you don’t quite know what “To convert data into a code for security” means
Click on “Ask for a hint”
Now, you can type out up to 10 “Yes/No” questions and get a response from the Gemini 1.5 Flash model, served through the Gemini API, to guide your thinking
Once you’ve got enough clues, you can go ahead and enter what you think is the correct word

Under the hood, Genkit calls the Gemini API to answer your questions.
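
As a rough sketch of the bookkeeping behind these steps, the snippet below tracks the questions asked for one word and stops at the tenth. It assumes the limit is counted per word; this post does not cover where the app actually enforces it. HintSession, MAX_QUESTIONS, and HintAnswer are hypothetical names for illustration and are not taken from the I/O Crossword source.

```typescript
// Hypothetical names throughout; not taken from the I/O Crossword source.
type HintAnswer = "yes" | "no" | "notApplicable";

interface HintExchange {
  question: string;
  answer: HintAnswer;
}

const MAX_QUESTIONS = 10; // the post says players get up to 10 questions

class HintSession {
  readonly word: string;
  private readonly exchanges: HintExchange[] = [];

  constructor(word: string) {
    this.word = word;
  }

  // How many questions the player may still ask for this word.
  get remaining(): number {
    return MAX_QUESTIONS - this.exchanges.length;
  }

  // Build the payload for the next question: the word, the new question,
  // and a copy of every earlier question/answer pair as context.
  nextRequest(question: string) {
    if (this.remaining <= 0) {
      throw new Error(`Hint limit of ${MAX_QUESTIONS} questions reached`);
    }
    return { word: this.word, question, context: [...this.exchanges] };
  }

  // Store the model's answer so later questions carry it as context.
  record(question: string, answer: HintAnswer): void {
    this.exchanges.push({ question, answer });
  }
}

// Usage: ask one question, then record the answer that came back.
const hintSession = new HintSession("encrypt");
const firstRequest = hintSession.nextRequest("Does it start with a vowel?");
hintSession.record(firstRequest.question, "yes");
```

The payload returned by nextRequest has the same three fields (word, question, context) that the prompt's input schema expects.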

Adjusting parameters and working with LLMs in Genkit

So why did we choose to use Genkit to build out this feature? Two words: power and flexibility.

Genkit includes libraries and plugins that simplify every part of the AI feature development process, from building the initial prototype to production deployment. It comes with all the primitives and tools we needed to efficiently build and iterate on the AI logic for the I/O Crossword app, so we could focus more time and attention on making it fun to play with!

Prompt engineering with Genkit

When working with LLMs, finding the right combination of model parameters and prompts for your use case is critical to getting the best output. Genkit uses dotprompt, which encapsulates model parameters, tools, input/output schemas, and your prompt template into a single prompt file to simplify design and iteration. Dotprompt files are managed, versioned, and tested alongside the rest of our code as part of our regular development workflow. Rich prompt templating is provided with Handlebars syntax and supports variables, conditional logic, multimodal content, and more.

Here is an example of a dotprompt file, showing the prompt we used to give hints to the user of the crossword puzzle app.

---
model: googleai/gemini-1.5-flash-latest
config:
  temperature: 0.1
input:
  schema:
    word: string
    question: string
    context(array):
      question: string
      answer: string, one of ('yes', 'no', 'notApplicable')
output:
  format: json
  schema:
    answer: string, one of ('yes', 'no', 'notApplicable')
---

I am solving a crossword puzzle and you are a helpful agent that 
can answer only yes or no questions to assist me in guessing the word "{{word}}".

If the answer to the question is yes say 'yes'.

If the answer to the question is no say 'no'.

If the question is inappropriate, unrelated to the word "{{word}}", 
or not a question that can be answered with a yes or no, say 'notApplicable'.

The word I am trying to guess is "{{word}}", and the question is "{{question}}". 
The questions I've asked so far with their corresponding answers are:

{{#each context}}
- {{question}}: {{answer}}
{{/each}}

In this prompt, we explicitly define the model used for the task: Gemini 1.5 Flash. We didn’t always use this model. Early in development, before Google I/O, we used the Gemini 1.0 Pro model, and we switched once the new Flash model became available to developers. The switch shows how flexible this approach is: if we later decide to move from Google AI to an alternative platform such as Vertex AI, all it takes is a simple modification to the model defined in the prompt.
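
To make the template concrete, here is a minimal sketch of what the last part of the prompt looks like once the variables and the each block are filled in. It is a hand-rolled stand-in written with template strings, not the Handlebars rendering that dotprompt performs, and it covers only the final paragraph of the prompt. ClueInput and renderCluePromptTail are hypothetical names.

```typescript
// Hand-rolled stand-in for illustration only; dotprompt renders the real
// template with Handlebars. ClueInput and renderCluePromptTail are
// hypothetical names.
interface ClueInput {
  word: string;
  question: string;
  context: { question: string; answer: string }[];
}

function renderCluePromptTail(input: ClueInput): string {
  // Mirrors the {{#each context}} block: one "- question: answer" line
  // per earlier exchange.
  const history = input.context
    .map((entry) => `- ${entry.question}: ${entry.answer}`)
    .join("\n");
  return [
    `The word I am trying to guess is "${input.word}", and the question is "${input.question}".`,
    "The questions I've asked so far with their corresponding answers are:",
    "",
    history,
  ].join("\n");
}

// Usage: one earlier exchange plus the new question.
const renderedTail = renderCluePromptTail({
  word: "encrypt",
  question: "Is it something a computer does?",
  context: [{ question: "Is it a verb?", answer: "yes" }],
});
```

Seeing the rendered text makes it easier to judge how much earlier context the model receives with each question.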

To invoke this prompt from inside the crossword app, all we need to do is add the Dotprompt library to our app and use its prompt function to load the prompt from the dotprompt file.

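import {prompt} from "@genkit-ai/dotprompt";

// Load the prompt named "clue"; the type parameter is derived from the
// getHintSchema Zod schema shown in the next section.
const cluePrompt = await prompt<z.infer<typeof getHintSchema>>("clue");
// `input` is the object holding the word, question, and context to send.
const result = await cluePrompt.generate({input});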

Firebase: Integrating AI, managing prompts, and securing the experience

Since we already used Firebase to power many of the features of the crossword puzzle app (such as hosting the app, resetting the board, and securing the game experience), it was straightforward to use Firebase to also integrate AI capabilities into the game.

We did this by creating a flow in Genkit. A flow is a function with some additional characteristics: it is strongly typed, streamable, callable both locally and remotely, and fully observable.

When a user clicks on the Hint button, this is what happens on the backend:

The Genkit flow receives the question and incorporates context from previous questions and clues.
The flow processes the information and calls the Gemini 1.5 Flash model to generate a “yes or no” output.

Genkit uses the Zod library to define type-safe schemas for specifying the input and output of flows. Here is the schema we defined for the hint feature:

import * as z from "zod";
const getHintSchema = z.object({
 word: z.string(),
 question: z.string(),
 context: z.array(
   z.object({
     question: z.string(),
     answer: z.string(),
   })
 ),
});

Using Zod allows us to send and receive parameters to and from Genkit as structured type-safe objects.
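
To illustrate what the schema guards against, the sketch below spells out roughly the same shape checks by hand, without any dependencies. It is only a stand-in to show the idea; in the app, Zod and Genkit do this work, and parseHintInput and HintInput are hypothetical names that do not exist in either library.

```typescript
// Dependency-free stand-in that mirrors the shape of getHintSchema.
// parseHintInput and HintInput are hypothetical names for illustration.
interface HintInput {
  word: string;
  question: string;
  context: { question: string; answer: string }[];
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Return a typed HintInput, or throw describing the first field that
// does not match the expected shape.
function parseHintInput(value: unknown): HintInput {
  if (!isRecord(value)) throw new Error("input must be an object");
  const { word, question, context } = value;
  if (typeof word !== "string") throw new Error("word must be a string");
  if (typeof question !== "string") throw new Error("question must be a string");
  if (!Array.isArray(context)) throw new Error("context must be an array");

  const parsedContext = context.map((entry: unknown, index: number) => {
    if (!isRecord(entry)) throw new Error(`context[${index}] must be an object`);
    const entryQuestion = entry.question;
    const entryAnswer = entry.answer;
    if (typeof entryQuestion !== "string" || typeof entryAnswer !== "string") {
      throw new Error(`context[${index}] needs string question and answer`);
    }
    return { question: entryQuestion, answer: entryAnswer };
  });

  return { word, question, context: parsedContext };
}

// Usage: untyped JSON from a request body becomes a typed object.
const parsedInput = parseHintInput(
  JSON.parse('{"word":"encrypt","question":"Is it a verb?","context":[]}')
);
```

With a schema library, the checks and the TypeScript type come from one definition, so they cannot drift apart the way a hand-written validator and interface can.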

Now, let’s take a look at the flow itself:

import {onFlow} from "@genkit-ai/firebase/functions";
import {firebaseAuth} from "@genkit-ai/firebase/auth";
export const getHintKit = onFlow(
 {
   name: "getHintKit",
   httpsOptions: {cors: "*"},
   inputSchema: getHintSchema,
   outputSchema: z.object({
     answer: z.string(),
   }),
   authPolicy: firebaseAuth((user) => {
     if (user.firebase?.sign_in_provider !== "anonymous") {
       throw new Error("We expect Firebase Anonymous Authentication");
     }
   }),
 },
 async (input) => {
   const cluePrompt = await prompt<z.infer<typeof getHintSchema>>("clue");
   const result = await cluePrompt.generate({input});
   return result.output() as any;
 }
);

First, after configuring a name for the flow and httpsOptions, we define how our input and output schemas are structured. For the input schema, we use the getHintSchema we defined earlier. Our output is an object that contains a property named answer, with a string as the value. This will become our yes or no response.

Then, we add an authentication policy that integrates our flow with Firebase Authentication. The policy rejects any user whose sign-in provider is not anonymous, so only users signed in through anonymous auth have access to the crossword features.
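
The check inside the policy can be sketched in isolation as a plain function, which also makes it easy to unit test. This is a simplified illustration: DecodedUser models only the token fields the policy reads, the handling of a missing user is an assumption, and requireAnonymousUser is a hypothetical name.

```typescript
// Simplified shape of the decoded token fields the policy inspects.
// DecodedUser and requireAnonymousUser are hypothetical names.
interface DecodedUser {
  uid: string;
  firebase?: { sign_in_provider?: string };
}

// Throw unless the caller signed in with Firebase Anonymous Authentication.
function requireAnonymousUser(user: DecodedUser | undefined): void {
  if (!user) {
    // Assumption: requests without a signed-in user are rejected too.
    throw new Error("Sign-in required");
  }
  if (user.firebase?.sign_in_provider !== "anonymous") {
    throw new Error("We expect Firebase Anonymous Authentication");
  }
}

// Usage: an anonymous user passes the check without throwing.
requireAnonymousUser({ uid: "abc123", firebase: { sign_in_provider: "anonymous" } });
```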

Lastly, inside the flow, we load the prompt using the prompt function. The generate function takes the parameters for the prompt, which we pass straight through from the flow’s input argument. Since the prompt produces structured output, we use the output helper to retrieve and validate it before returning it.
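
As a rough illustration of what retrieving and validating structured output involves, the sketch below parses a raw JSON response and checks that the answer is one of the three values the prompt allows. It is a hand-written approximation under the assumption that the model returns a JSON string; in the app, Genkit's output helper handles this. parseAnswer and Answer are hypothetical names.

```typescript
// Hand-written approximation for illustration; Genkit's output helper
// does this work in the app. parseAnswer and Answer are hypothetical names.
type Answer = "yes" | "no" | "notApplicable";

const ALLOWED_ANSWERS: readonly Answer[] = ["yes", "no", "notApplicable"];

function isAnswer(value: unknown): value is Answer {
  return typeof value === "string" && ALLOWED_ANSWERS.indexOf(value as Answer) !== -1;
}

// Parse the model's JSON text and check it against the output shape
// declared in the prompt: an object with a single allowed `answer`.
function parseAnswer(raw: string): { answer: Answer } {
  const parsed: unknown = JSON.parse(raw);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("Model output is not a JSON object");
  }
  const answer = (parsed as { answer?: unknown }).answer;
  if (!isAnswer(answer)) {
    throw new Error("Model output has no valid answer field");
  }
  return { answer };
}

// Usage: a well-formed response becomes a typed object.
const parsedAnswer = parseAnswer('{"answer": "notApplicable"}');
```

Restricting the answer to three literal values is stricter than the z.string() used in the flow's output schema; whether to tighten the schema the same way is a design choice this post does not discuss.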

To deploy, all you need to do is run firebase deploy. After a short moment, the game is available on Firebase.

Testing and debugging our AI logic

Throughout the development process, whenever we want to quickly test our AI logic, we can open the local Genkit developer UI, which gives us a nice interface for providing structured input. It connects to our Genkit code, so any models, prompts, and flows we have configured are available to run. If there are any issues, the Inspect tab lets us view traces, giving us more visibility into the model interaction. This helps us test and iterate on code faster!

See for yourself: Start finding hidden words

Genkit made building our AI-powered crossword a breeze! The framework let us build, launch, and track our AI-powered feature without any hassle. The result? A cool hint feature that makes the game more fun and gives players a truly unique experience.

Start finding all the hidden words and earn points for your team!

Try out the I/O crossword puzzle here. To learn more about Genkit, read the Get Started Guide.

We’ve also open-sourced the code so you can explore it on GitHub.

The Firebase Blog