Add image recognition using Google Cloud Vision API called through a Cloud Function. Users capture or select an image, which is converted to base64 and sent to the Cloud Function. The function calls the Vision API with LABEL_DETECTION and TEXT_DETECTION features, parses the response, and returns labels with confidence scores. Display results in a ListView. For on-device recognition without internet, use the google_mlkit_text_recognition package in a Custom Action.
Adding Image Recognition to Your FlutterFlow App
Image recognition enables features like product identification, document scanning, receipt parsing, and accessibility descriptions. This tutorial covers both cloud-based and on-device approaches.
Prerequisites
- FlutterFlow Pro plan (required for Cloud Functions)
- Google Cloud Platform account with Vision API enabled
- Firebase Storage for image handling
- An image capture or selection mechanism in your app
Step-by-step guide
Set up the Google Cloud Vision API and Cloud Function
In Google Cloud Console, enable the Cloud Vision API for your project. Create a service account with Vision API access and download the JSON key file. In your Cloud Function, install @google-cloud/vision and create a client with the service account credentials. The function receives a base64 image string and calls client.annotateImage() with features: LABEL_DETECTION (identifies objects/scenes) and TEXT_DETECTION (reads text in images). Return the parsed results.
```javascript
const vision = require('@google-cloud/vision');
const client = new vision.ImageAnnotatorClient();

exports.analyzeImage = async (req, res) => {
  const { imageBase64 } = req.body;
  const request = {
    image: { content: imageBase64 },
    features: [
      { type: 'LABEL_DETECTION', maxResults: 10 },
      { type: 'TEXT_DETECTION', maxResults: 5 },
    ],
  };
  const [result] = await client.annotateImage(request);
  const labels = result.labelAnnotations.map(l => ({
    description: l.description,
    score: Math.round(l.score * 100),
  }));
  const text = result.fullTextAnnotation?.text || '';
  res.json({ labels, text });
};
```
Expected result: The Cloud Function accepts a base64 image and returns labels with scores and extracted text.
Capture or select an image and convert to base64 in a Custom Action
Create a Custom Action called imageToBase64 that takes a file path (from camera capture or gallery picker) and returns a base64 string. Read the file bytes and encode them with base64Encode from dart:convert. Before encoding, resize the image to a maximum of 1024px on the longest side using the image package to reduce upload size and API costs; large camera photos (5MB+) are slow to upload and cost more per Vision API call.
Expected result: The Custom Action converts a captured or selected image to a resized base64 string.
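A minimal sketch of such a Custom Action, assuming the image package has been added to Pubspec Dependencies (the function name and fallback behavior are illustrative):

```dart
import 'dart:convert';
import 'dart:io';
import 'package:image/image.dart' as img;

Future<String> imageToBase64(String imagePath) async {
  final bytes = await File(imagePath).readAsBytes();
  final decoded = img.decodeImage(bytes);
  if (decoded == null) return base64Encode(bytes); // fall back to original bytes

  // Resize so the longest side is at most 1024px; copyResize with only
  // one dimension given preserves the aspect ratio.
  final resized = decoded.width >= decoded.height
      ? (decoded.width > 1024 ? img.copyResize(decoded, width: 1024) : decoded)
      : (decoded.height > 1024 ? img.copyResize(decoded, height: 1024) : decoded);

  // Re-encode as JPEG to keep the payload small.
  return base64Encode(img.encodeJpg(resized, quality: 85));
}
```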
Call the Cloud Function and display recognition results
On your analysis page, add a FlutterFlowUploadButton or custom camera capture button. After the user selects an image, show it in a preview Image widget. Call the imageToBase64 Custom Action, then call the Cloud Function API with the base64 string. Show a CircularProgressIndicator during processing. On success, parse the response and display labels in a ListView — each Row shows the label description and a LinearPercentIndicator for the confidence score.
Expected result: Users see their image with a list of detected labels and confidence scores below.
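FlutterFlow's API Call feature can make this request declaratively; as a Custom Action, the call-and-parse step could look like the sketch below. The JSON shapes match the Cloud Function above, but the endpoint URL is a placeholder for your deployed function:

```dart
import 'dart:convert';
import 'package:http/http.dart' as http;

// Placeholder endpoint for the deployed analyzeImage function.
const functionUrl =
    'https://YOUR_REGION-YOUR_PROJECT.cloudfunctions.net/analyzeImage';

Future<List<Map<String, dynamic>>> analyzeImage(String imageBase64) async {
  final response = await http.post(
    Uri.parse(functionUrl),
    headers: {'Content-Type': 'application/json'},
    body: jsonEncode({'imageBase64': imageBase64}),
  );
  if (response.statusCode != 200) {
    throw Exception('Analysis failed: ${response.statusCode}');
  }
  final data = jsonDecode(response.body) as Map<String, dynamic>;
  // Each entry has 'description' and 'score' (0-100), ready to bind
  // to the ListView rows and LinearPercentIndicator.
  return List<Map<String, dynamic>>.from(data['labels'] as List);
}
```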
Add on-device text recognition for offline use
For text recognition without internet, add google_mlkit_text_recognition to Pubspec Dependencies. Create a Custom Action called recognizeText that takes an image file path, creates an InputImage, and calls TextRecognizer().processImage(). Extract recognized text blocks and return the concatenated text. This works offline and is free (no API costs), but only supports text, not object labels.
Expected result: Text recognition runs on-device without internet connection or API costs.
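A sketch of the recognizeText Custom Action using google_mlkit_text_recognition; verify the details against the current package version, since the API has changed between releases:

```dart
import 'package:google_mlkit_text_recognition/google_mlkit_text_recognition.dart';

Future<String> recognizeText(String imagePath) async {
  final inputImage = InputImage.fromFilePath(imagePath);
  final textRecognizer = TextRecognizer(script: TextRecognitionScript.latin);
  try {
    final recognizedText = await textRecognizer.processImage(inputImage);
    // Concatenate block texts, preserving line breaks between blocks.
    return recognizedText.blocks.map((b) => b.text).join('\n');
  } finally {
    await textRecognizer.close(); // release the on-device model
  }
}
```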
Display extracted text in a copyable text area
For text detection results, show the extracted text in a SelectableText widget (allows copy) or a TextField (allows editing). Add a Copy to Clipboard button using a Custom Action with Clipboard.setData(). This is useful for receipt scanning, document digitization, and business card reading. Format the text by preserving line breaks from the Vision API response.
Expected result: Extracted text displays in a selectable and copyable text area.
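The copy button can use a one-line Custom Action built on Flutter's Clipboard service (the action name is illustrative):

```dart
import 'package:flutter/services.dart';

// Custom Action: copy the extracted text to the system clipboard.
Future<void> copyToClipboard(String extractedText) async {
  await Clipboard.setData(ClipboardData(text: extractedText));
}
```

In the UI, bind the same text to a SelectableText widget so users can also select fragments manually.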
Complete working example
```javascript
// Cloud Function: Google Cloud Vision API
const vision = require('@google-cloud/vision');
const client = new vision.ImageAnnotatorClient();

exports.analyzeImage = async (req, res) => {
  try {
    const { imageBase64 } = req.body;

    if (!imageBase64) {
      return res.status(400).json({ error: 'No image provided' });
    }

    const request = {
      image: { content: imageBase64 },
      features: [
        { type: 'LABEL_DETECTION', maxResults: 10 },
        { type: 'TEXT_DETECTION', maxResults: 5 },
        { type: 'OBJECT_LOCALIZATION', maxResults: 5 },
      ],
    };

    const [result] = await client.annotateImage(request);

    const labels = (result.labelAnnotations || []).map(label => ({
      description: label.description,
      score: Math.round(label.score * 100),
    }));

    const objects = (result.localizedObjectAnnotations || []).map(obj => ({
      name: obj.name,
      score: Math.round(obj.score * 100),
    }));

    const text = result.fullTextAnnotation?.text || '';

    res.json({ labels, objects, text });
  } catch (error) {
    console.error('Vision API error:', error);
    res.status(500).json({ error: 'Analysis failed' });
  }
};
```
Common mistakes when developing a custom image recognition system in FlutterFlow
Sending full-resolution camera images to the Cloud Vision API
Why it's a problem: Large photos (5MB+) are slow to upload and needlessly increase cost and latency.
How to avoid: Resize images to max 1024px on the longest side before encoding to base64. This reduces size to ~200KB and processing to 1-2 seconds.
Calling the Vision API directly from client-side code
Why it's a problem: Credentials embedded in the app can be extracted and abused, running up your bill.
How to avoid: Call the Vision API from a Cloud Function that holds the credentials securely. The client only sends the image to your function.
Not showing a loading indicator during image analysis
Why it's a problem: Users see a frozen screen for 1-3 seconds and may tap again, triggering duplicate API calls.
How to avoid: Show a CircularProgressIndicator while the API call is in progress. Disable the analyze button until the result returns.
Best practices
- Resize images to max 1024px before sending to reduce costs and latency
- Call the Vision API through a Cloud Function to keep credentials secure
- Show a loading indicator during the 1-3 second API processing time
- Use on-device ML Kit for text recognition when internet is unavailable
- Display confidence scores as progress bars for visual interpretation
- Cache recognition results to avoid re-analyzing the same image
- Handle API errors gracefully with retry option and error message
Still stuck?
Copy one of these prompts to get a personalized, step-by-step explanation.
I want to add image recognition to my FlutterFlow app using Google Cloud Vision API via a Cloud Function. Show how to capture an image, resize and encode it as base64, call the function, and display labels with confidence scores. Also include on-device text recognition with ML Kit.
Create a page with an image preview area at the top, a Capture/Select button, a processing indicator area, and a ListView below for displaying results.
Frequently asked questions
How much does the Vision API cost?
First 1,000 units/month are free. Beyond that: $1.50 per 1,000 units for LABEL_DETECTION, $1.50 for TEXT_DETECTION. Each feature on each image is one unit.
What types of objects can it recognize?
LABEL_DETECTION identifies thousands of categories: animals, vehicles, food, landscapes, activities, etc. OBJECT_LOCALIZATION provides bounding boxes for specific objects in the image.
Can it read handwritten text?
Yes. The Vision API supports handwritten text recognition (DOCUMENT_TEXT_DETECTION feature). Accuracy varies by handwriting quality.
Is on-device recognition as accurate as cloud?
ML Kit text recognition is good for printed text but less accurate than the cloud API. ML Kit also offers on-device image labeling (google_mlkit_image_labeling) with a smaller general-purpose model, but the cloud API recognizes far more categories and is more accurate.
Can I train a custom model for specific objects?
Yes. Google Cloud AutoML Vision lets you train custom models on your data. Deploy the model and call it from the same Cloud Function pattern.
Can RapidDev help with AI image features?
Yes. RapidDev can implement image recognition, custom model training, document scanning, receipt parsing, and visual search.
Talk to an Expert
Our team has built 600+ apps. Get personalized help with your project.
Book a free consultation