
Overview

Generate images directly in chat using AI models. Supports both generation from text prompts and iterative editing of existing images.

Quick Start

Enable the integration in `chat.config.ts`:

```ts
integrations: {
  imageGeneration: true, // Requires BLOB_READ_WRITE_TOKEN
}
```

Image generation requires `BLOB_READ_WRITE_TOKEN` because generated images are uploaded to Vercel Blob storage.
Configure the default image model:
```ts
models: {
  defaults: {
    image: "google/gemini-3-pro-image",
  },
}
```

Modes

The tool operates in two modes based on context:

| Mode | Trigger | Behavior |
| --- | --- | --- |
| `generate` | Text prompt only | Creates a new image from scratch |
| `edit` | Prompt + attachments or a previous generation | Uses existing images as input |

The mode is determined automatically:

```ts
const mode = imageParts.length > 0 || lastGeneratedImage ? "edit" : "generate";
```

Iterative Editing

Users can iterate on generated images without re-uploading. The system automatically tracks the last generated image in the conversation.

How It Works

1. Extraction: Before each request, the chat agent checks the most recent assistant message for a generated image:
app/(chat)/api/chat/get-recent-generated-image.ts
```ts
export function getRecentGeneratedImage(
  messages: ChatMessage[]
): { imageUrl: string; name: string } | null {
  const lastAssistantMessage = messages.findLast(
    (message) => message.role === "assistant"
  );

  if (lastAssistantMessage?.parts && lastAssistantMessage.parts.length > 0) {
    for (const part of lastAssistantMessage.parts) {
      if (
        part.type === "tool-generateImage" &&
        part.state === "output-available" &&
        part.output?.imageUrl
      ) {
        return {
          imageUrl: part.output.imageUrl,
          name: `generated-image-${part.toolCallId}.png`,
        };
      }
    }
  }

  return null;
}
```
2. Injection: The extracted image is passed to the tool factory:
lib/ai/core-chat-agent.ts
```ts
const lastGeneratedImage = getRecentGeneratedImage(messages);

const tools = getTools({
  // ...other params
  lastGeneratedImage,
});
```
3. Edit mode: When `lastGeneratedImage` exists, the tool fetches it and includes it as input:
lib/ai/tools/generate-image.ts
```ts
const inputImages = await collectEditImages({ imageParts, lastGeneratedImage });
promptInput = { text: prompt, images: inputImages };
```

User Experience

- User: “Generate a sunset over mountains”
- AI: generates an image
- User: “Add a lake in the foreground”
- AI: edits the previous image (no re-upload needed)

Image Sources

Edit mode combines images from multiple sources:

| Source | Description |
| --- | --- |
| `lastGeneratedImage` | Most recent generated image in the conversation |
| `attachments` | User-uploaded images in the current message |

Both are fetched and passed to the model:
```ts
async function collectEditImages({ imageParts, lastGeneratedImage }) {
  return await Promise.all([
    ...(lastGeneratedImage ? [fetchImageBuffer(lastGeneratedImage.imageUrl)] : []),
    ...imageParts.map((p) => fetchImageBuffer(p.url)),
  ]);
}
```
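
`fetchImageBuffer` is not shown in this section. A minimal sketch, assuming it simply downloads the URL and returns raw bytes:

```ts
// Sketch only — the real helper may differ. Fetches an image URL and
// returns its raw bytes so it can be passed to the model as input.
async function fetchImageBuffer(url: string): Promise<Uint8Array> {
  const response = await fetch(url);
  if (!response.ok) {
    throw new Error(`Failed to fetch image (${response.status}): ${url}`);
  }
  return new Uint8Array(await response.arrayBuffer());
}
```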

Architecture

Follows the Tool Part pattern:
lib/ai/tools/generate-image.ts → components/part/generate-image.tsx
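
As a rough sketch of the tool half of that pattern, assuming the AI SDK v5 `tool()` helper and Zod for the input schema (the actual definition is not shown in this section):

```ts
import { tool } from "ai";
import { z } from "zod";

// Sketch only: illustrates the Tool Part pattern, not the real implementation.
export const generateImage = tool({
  description: "Generate a new image, or edit an existing one, from a text prompt",
  inputSchema: z.object({
    prompt: z.string().describe("What the image should depict"),
  }),
  execute: async ({ prompt }) => {
    // ...model call and blob upload elided...
    return { imageUrl: "https://...", prompt };
  },
});
```

The matching component in `components/part/generate-image.tsx` then renders the `tool-generateImage` message parts this tool emits.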

Tool Output

```ts
return { imageUrl: result.url, prompt };
```

The generated image is uploaded to blob storage and its URL is returned.
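
The upload step itself is not shown above. A minimal sketch of it inside the tool's `execute` handler, assuming Vercel Blob's `put()` (which is why `BLOB_READ_WRITE_TOKEN` is required; the filename and surrounding variables are illustrative):

```ts
import { put } from "@vercel/blob";

// Sketch only: upload the generated bytes and return a public URL.
// put() reads BLOB_READ_WRITE_TOKEN from the environment.
const result = await put(`generated-image-${toolCallId}.png`, imageBytes, {
  access: "public",
  contentType: "image/png",
});

return { imageUrl: result.url, prompt };
```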

UI States

| State | Shows |
| --- | --- |
| `input-available` | Skeleton + “Generating image:” label |
| `output-available` | Image + copy button + prompt |
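
A condensed sketch of how a component might branch on those states (names and markup here are illustrative, not the actual `components/part/generate-image.tsx`; the copy button is omitted):

```tsx
type GenerateImagePart =
  | { state: "input-available"; input: { prompt: string } }
  | { state: "output-available"; output: { imageUrl: string; prompt: string } };

// Sketch only: render a skeleton while generating, the image once available.
export function GenerateImagePartView({ part }: { part: GenerateImagePart }) {
  if (part.state === "input-available") {
    return <div className="skeleton">Generating image: {part.input.prompt}</div>;
  }
  return (
    <figure>
      <img src={part.output.imageUrl} alt={part.output.prompt} />
      <figcaption>{part.output.prompt}</figcaption>
    </figure>
  );
}
```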

Configuration

Image Model

chat.config.ts
```ts
models: {
  defaults: {
    image: "google/gemini-3-pro-image",
  },
}
```

Model Selection Logic

The tool supports two types of models:

| Type | Description | Example |
| --- | --- | --- |
| Image model | Standalone image generation models | `google/gemini-3-pro-image` |
| Multimodal | Language models with image generation capability | `google/gemini-2.0-flash-exp` |

Model selection follows this priority:
lib/ai/tools/tools.ts
```ts
const imageToolModelId = isMultimodalImageModel(selectedModel)
  ? selectedModel
  : undefined;
```
1. If the user’s selected chat model can generate images (is a multimodal image model), use it
2. Otherwise, fall back to `config.models.defaults.image`
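
`isMultimodalImageModel` itself is not shown in this section. One plausible sketch, assuming the capability is tracked as an allowlist of model ids (the real check may consult model metadata instead):

```ts
// Sketch only: hypothetical allowlist of chat models that can also emit images.
const MULTIMODAL_IMAGE_MODELS = new Set(["google/gemini-2.0-flash-exp"]);

function isMultimodalImageModel(modelId: string): boolean {
  return MULTIMODAL_IMAGE_MODELS.has(modelId);
}

// Priority: the selected chat model if image-capable, else the configured default.
function resolveImageModel(selectedModel: string, defaultImageModel: string): string {
  return isMultimodalImageModel(selectedModel) ? selectedModel : defaultImageModel;
}
```

For example, `resolveImageModel("google/gemini-2.0-flash-exp", "google/gemini-3-pro-image")` keeps the selected chat model, while any non-multimodal selection falls back to the configured default.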

Image Model vs Multimodal Generation

The tool uses different generation paths based on model type.

Image model (`generateImage` from the AI SDK):

- Uses standalone image models via `getImageModel()`
- Supports edit mode with image buffers as input
- Returns base64-encoded images

Multimodal (`generateText` with image output):

- Uses language models via `getMultimodalImageModel()`
- Passes images as URL references in message content
- Requires `responseModalities: ["TEXT", "IMAGE"]` for Google models
- Extracts the generated image from the response files
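
A condensed sketch of the two paths, assuming AI SDK v5 and the Google provider (`experimental_generateImage` is the SDK export behind the `generateImage` name; model ids and variables here are illustrative, not the repo's actual wiring):

```ts
import { experimental_generateImage as generateImage, generateText } from "ai";
import { google } from "@ai-sdk/google";

const prompt = "A sunset over mountains";
const inputImageUrls: string[] = []; // populated in edit mode

// Path 1 (sketch): standalone image model, returns base64-encoded output.
const { image } = await generateImage({
  model: google.image("imagen-3.0-generate-002"), // illustrative model id
  prompt,
});
// image.base64 holds the encoded result.

// Path 2 (sketch): multimodal language model with image output enabled.
const result = await generateText({
  model: google("gemini-2.0-flash-exp"),
  providerOptions: { google: { responseModalities: ["TEXT", "IMAGE"] } },
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: prompt },
        // Existing images go in as URL references when editing.
        ...inputImageUrls.map((url) => ({ type: "image" as const, image: new URL(url) })),
      ],
    },
  ],
});
// The generated image arrives in the response files.
const generated = result.files.find((f) => f.mediaType.startsWith("image/"));
```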