
How to Integrate Replit with Google Cloud AI Platform

To integrate Replit with Google Cloud AI Platform (Vertex AI), create a service account with Vertex AI User permissions, download the JSON key file, store it in Replit Secrets as GOOGLE_CREDENTIALS_JSON, and use the Google Cloud client library to call Vertex AI prediction endpoints. For online prediction, send feature inputs to a deployed model endpoint and receive real-time inference results. Use Reserved VM for persistent model serving and Autoscale for batch-triggered jobs.

What you'll learn

  • How to create a Google Cloud service account with Vertex AI permissions and download JSON credentials
  • How to store service account JSON credentials securely in Replit Secrets
  • How to call Vertex AI online prediction endpoints from Python using the google-cloud-aiplatform library
  • How to call Vertex AI prediction endpoints from Node.js using the REST API with service account tokens
  • How to structure a Replit prediction serving layer that wraps Vertex AI model endpoints
Advanced · 14 min read · 45 minutes · Analytics · March 2026 · RapidDev Engineering Team

Vertex AI Prediction Endpoints from Replit

Google Cloud AI Platform — now unified under the Vertex AI brand — is Google's enterprise ML operations platform. It provides managed infrastructure for training models, evaluating experiments, and deploying trained models to prediction endpoints that scale automatically. For developers who have already trained and deployed a model on Vertex AI, the challenge is often building the application layer that calls the prediction endpoint and serves results to users. Replit is well-suited for this: a lightweight Python or Node.js server that receives user inputs, transforms them into the feature format your model expects, calls the Vertex AI prediction endpoint, and returns the results.

Vertex AI prediction endpoints expose an HTTP interface that accepts instance data (the input features your model was trained on) and returns predictions (the model's output). For tabular models, instances are JSON objects with named feature fields. For image models, instances contain base64-encoded image data. For NLP models, instances contain text strings. The exact schema depends on how your model was deployed.
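As a rough illustration of these formats, the helpers below build instance payloads for each modality. The field names ('age', 'b64', 'content') are illustrative assumptions; replace them with whatever your model's serving signature actually expects.

```python
# Sketch: building Vertex AI instance payloads for different model types.
# Field names here are assumptions. Check your model's serving signature
# in the Vertex AI Console for the real schema.
import base64

def tabular_instance(age: float, plan: str) -> dict:
    # Tabular models: one JSON object per row, keyed by feature name
    return {"age": age, "plan": plan}

def image_instance(image_bytes: bytes) -> dict:
    # Image models: base64-encode the raw bytes into a string field
    return {"b64": base64.b64encode(image_bytes).decode("utf-8")}

def text_instance(text: str) -> dict:
    # Text models: a plain string field
    return {"content": text}

instances = [tabular_instance(34.0, "pro"), text_instance("cancel my order")]
```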

Authentication to Vertex AI requires a Google Cloud service account — a non-human identity that represents your application. The service account has a JSON key file that contains a private key used to generate short-lived access tokens. This key file must be treated as a highly sensitive secret: anyone who has it can call your Vertex AI endpoints and incur billing charges on your Google Cloud project. Store it in Replit Secrets as a JSON string, never in your source code or repository.

Vertex AI also provides access to Google's pre-built foundation models — Gemini for text, Imagen for images, and others — through the same prediction endpoint interface. If you need to call Gemini or other Google AI models from Replit, the setup process described here is identical.

Integration method

Standard API Integration

Google Cloud AI Platform integrates with Replit through the Vertex AI REST API and Python client library, authenticated by a service account JSON key stored in Replit Secrets. Your Replit server reads the service account credentials, initializes the Vertex AI client, and sends prediction requests to deployed model endpoints. The response contains model predictions that your application can use for real-time inference serving.

Prerequisites

  • A Replit account with a Python or Node.js Repl ready
  • A Google Cloud account with billing enabled and a project with Vertex AI API activated
  • A trained model deployed to a Vertex AI prediction endpoint in your Google Cloud project
  • Service account creation permissions in Google Cloud IAM (typically Project Editor or Owner)
  • Python packages: google-cloud-aiplatform, flask; Node.js packages: axios, express, google-auth-library

Step-by-step guide

1

Create a Service Account and Download Credentials

Google Cloud uses service accounts for machine-to-machine authentication. You need to create a service account, grant it the minimum permissions required to call your Vertex AI endpoint, and download its JSON key file.

In the Google Cloud Console, navigate to IAM & Admin → Service Accounts. Click 'Create Service Account', give it a name like 'replit-vertex-ai' and a description, and click 'Create and continue'. On the permissions step, add the 'Vertex AI User' role (roles/aiplatform.user). This role grants permission to call prediction endpoints, list models, and read endpoint metadata. If you only need to call predictions (not train or manage models), 'Vertex AI User' is sufficient — do not use 'Vertex AI Admin' or 'Project Editor', which would grant unnecessarily broad access. Click 'Done' to create the service account.

Next, click the service account in the list, go to the 'Keys' tab, click 'Add Key' → 'Create new key', and select JSON format. Google Cloud downloads a JSON file to your computer. This file contains the private key for your service account.

Also note your Google Cloud project ID (visible in the Console top bar) and your Vertex AI endpoint ID (found under Vertex AI → Endpoints in the Console). You will need both when calling the prediction API.

service-account-structure.json
// Example structure of a Google Cloud service account JSON key file
// (Never store this in code — paste the content into Replit Secrets)
{
  "type": "service_account",
  "project_id": "your-project-id",
  "private_key_id": "key-id",
  "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
  "client_email": "replit-vertex-ai@your-project-id.iam.gserviceaccount.com",
  "client_id": "...",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token"
}

Pro tip: The JSON key file is downloaded once. If you lose it, you must create a new key — you cannot re-download existing keys. Store a copy in a secure password manager immediately after download.

Expected result: A service account exists in Google Cloud IAM with the Vertex AI User role. A JSON key file is downloaded to your local machine. You have your project ID and endpoint ID noted.

2

Store Service Account Credentials in Replit Secrets

The service account JSON key file must be stored as a Replit Secret — never committed to version control or pasted into source code. The JSON key contains a private key that grants access to your Google Cloud project.

Click the lock icon (🔒) in the Replit sidebar to open the Secrets panel and add the following secrets:

  • GOOGLE_CREDENTIALS_JSON: the entire contents of your service account JSON key file as a single string. The JSON may contain newlines — ensure the secret value is the complete JSON text.
  • GOOGLE_CLOUD_PROJECT: your Google Cloud project ID (e.g., 'my-project-12345').
  • VERTEX_AI_ENDPOINT_ID: the numeric ID of your deployed Vertex AI endpoint (found in Vertex AI → Endpoints in Google Cloud Console).
  • VERTEX_AI_LOCATION: the Google Cloud region where your endpoint is deployed (e.g., 'us-central1').

In your server code, read GOOGLE_CREDENTIALS_JSON, parse it as JSON with JSON.parse(process.env.GOOGLE_CREDENTIALS_JSON), and pass it to the Google authentication library. This avoids the need for a credentials file on disk, which would not persist reliably across Replit container restarts.

Replit's Secret Scanner monitors for private key patterns. If GOOGLE_CREDENTIALS_JSON is accidentally included in source code, Replit will detect the BEGIN PRIVATE KEY marker and alert you.

check-credentials.py
# Python: Parse credentials from Replit Secrets
import os
import json

credentials_json = os.environ.get('GOOGLE_CREDENTIALS_JSON')
if not credentials_json:
    raise ValueError('GOOGLE_CREDENTIALS_JSON not set. Add it in Replit Secrets (lock icon 🔒).')

credentials_dict = json.loads(credentials_json)
print('Service account:', credentials_dict.get('client_email'))
print('Project:', os.environ.get('GOOGLE_CLOUD_PROJECT'))
print('Endpoint:', os.environ.get('VERTEX_AI_ENDPOINT_ID'))
print('Location:', os.environ.get('VERTEX_AI_LOCATION'))

Pro tip: When pasting the JSON key into Replit Secrets, do not add extra quotes around the value. The secret value should be the raw JSON text starting with '{' — not a string containing JSON.

Expected result: GOOGLE_CREDENTIALS_JSON, GOOGLE_CLOUD_PROJECT, VERTEX_AI_ENDPOINT_ID, and VERTEX_AI_LOCATION are set in Replit Secrets. The verification script prints the service account email without errors.

3

Call Vertex AI Predictions from Python

The google-cloud-aiplatform Python library provides a high-level interface for calling Vertex AI prediction endpoints. Install it in the Replit Shell: pip install google-cloud-aiplatform flask.

The library uses the google.oauth2.service_account.Credentials class to authenticate from a credentials dictionary. You parse GOOGLE_CREDENTIALS_JSON from environment variables and create credentials in memory — no temporary file needed. For online prediction, the aiplatform.Endpoint class provides a predict() method that accepts a list of instances (the input data for your model) and returns a Prediction object with the model's output. The exact structure of instances and predictions depends on your model's serving signature.

The Flask server below creates a /predict endpoint that accepts POST requests with instance data, calls Vertex AI, and returns predictions. The Vertex AI client and endpoint object are initialized once at startup using the service account credentials.

IMPORTANT: The predict() call is synchronous and can take 100ms-2 seconds depending on model size and load. For high-traffic APIs, consider adding caching for repeated identical inputs or using Vertex AI's batch prediction for large volumes.

vertex_ai.py
# vertex_ai.py - Vertex AI prediction client for Replit (Python)
import os
import json
from flask import Flask, request, jsonify
from google.cloud import aiplatform
from google.oauth2 import service_account

# Load credentials from Replit Secrets
credentials_dict = json.loads(os.environ['GOOGLE_CREDENTIALS_JSON'])
credentials = service_account.Credentials.from_service_account_info(
    credentials_dict,
    scopes=['https://www.googleapis.com/auth/cloud-platform']
)

PROJECT_ID = os.environ['GOOGLE_CLOUD_PROJECT']
LOCATION = os.environ.get('VERTEX_AI_LOCATION', 'us-central1')
ENDPOINT_ID = os.environ['VERTEX_AI_ENDPOINT_ID']

# Initialize Vertex AI with service account credentials
aiplatform.init(
    project=PROJECT_ID,
    location=LOCATION,
    credentials=credentials
)

# Load the endpoint once at startup
endpoint = aiplatform.Endpoint(
    endpoint_name=f'projects/{PROJECT_ID}/locations/{LOCATION}/endpoints/{ENDPOINT_ID}'
)

app = Flask(__name__)

@app.route('/predict', methods=['POST'])
def predict():
    """Call Vertex AI online prediction endpoint."""
    body = request.get_json()
    instances = body.get('instances')

    if not instances:
        return jsonify({'error': 'instances field required in request body'}), 400

    if not isinstance(instances, list):
        instances = [instances]  # Wrap single instance in list

    try:
        prediction = endpoint.predict(instances=instances)
        return jsonify({
            'predictions': prediction.predictions,
            'deployed_model_id': prediction.deployed_model_id,
            'model_display_name': prediction.model_display_name
        })
    except Exception as e:
        return jsonify({'error': str(e)}), 500

@app.route('/health')
def health():
    return jsonify({'status': 'ok', 'project': PROJECT_ID, 'endpoint': ENDPOINT_ID})

if __name__ == '__main__':
    print(f'Vertex AI server ready. Endpoint: {ENDPOINT_ID}')
    app.run(host='0.0.0.0', port=3000)

Pro tip: For tabular models, instances are dictionaries with feature names as keys: [{'feature1': 1.5, 'feature2': 'category_A'}]. For text models: [{'content': 'text to classify'}]. Check your model's serving signature in Vertex AI Console for the exact instance schema.

Expected result: POST /predict with {"instances": [{"feature": "value"}]} returns model predictions from Vertex AI. The /health endpoint confirms the project and endpoint IDs are loaded correctly.

4

Call Vertex AI Predictions from Node.js

For Node.js Replit projects, call Vertex AI prediction endpoints using the REST API directly with a Google-issued access token. The google-auth-library package handles token generation from your service account credentials. Install packages in the Replit Shell: npm install google-auth-library axios express.

The google-auth-library's GoogleAuth class reads credentials from a JSON object and generates short-lived Bearer tokens, which are passed in the Authorization header of HTTP requests to the Vertex AI REST API. The library handles token generation and automatically refreshes tokens before they expire.

The Vertex AI prediction REST endpoint URL follows this pattern: https://{LOCATION}-aiplatform.googleapis.com/v1/projects/{PROJECT_ID}/locations/{LOCATION}/endpoints/{ENDPOINT_ID}:predict. The request body is a JSON object with an 'instances' array; the response contains a 'predictions' array with one prediction per instance.

The Node.js server below is equivalent to the Python version — choose based on your project's language. Both authenticate using the same service account credentials stored in GOOGLE_CREDENTIALS_JSON.

vertex_ai.js
// vertex_ai.js — Vertex AI prediction client for Replit (Node.js)
const { GoogleAuth } = require('google-auth-library');
const axios = require('axios');
const express = require('express');

const credentialsJson = process.env.GOOGLE_CREDENTIALS_JSON;
if (!credentialsJson) throw new Error('GOOGLE_CREDENTIALS_JSON not set. Add it in Replit Secrets (lock icon 🔒).');

const credentials = JSON.parse(credentialsJson);
const PROJECT_ID = process.env.GOOGLE_CLOUD_PROJECT;
const LOCATION = process.env.VERTEX_AI_LOCATION || 'us-central1';
const ENDPOINT_ID = process.env.VERTEX_AI_ENDPOINT_ID;

// Initialize Google Auth client
const auth = new GoogleAuth({
  credentials,
  scopes: 'https://www.googleapis.com/auth/cloud-platform'
});

const PREDICTION_URL = `https://${LOCATION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION}/endpoints/${ENDPOINT_ID}:predict`;

async function predictVertexAI(instances) {
  const client = await auth.getClient();
  const token = await client.getAccessToken();

  const response = await axios.post(PREDICTION_URL, { instances }, {
    headers: {
      Authorization: `Bearer ${token.token}`,
      'Content-Type': 'application/json'
    }
  });

  return response.data;
}

const app = express();
app.use(express.json());

app.post('/predict', async (req, res) => {
  let { instances } = req.body;
  if (!instances) return res.status(400).json({ error: 'instances required' });
  if (!Array.isArray(instances)) instances = [instances];

  try {
    const result = await predictVertexAI(instances);
    res.json(result);
  } catch (err) {
    const status = err.response?.status || 500;
    res.status(status).json({ error: err.response?.data?.error || err.message });
  }
});

app.get('/health', (req, res) => {
  res.json({ status: 'ok', project: PROJECT_ID, endpoint: ENDPOINT_ID, location: LOCATION });
});

app.listen(3000, '0.0.0.0', () => {
  console.log(`Vertex AI Node.js server running. Endpoint: ${ENDPOINT_ID}`);
});

Pro tip: The GoogleAuth client automatically caches and refreshes access tokens. You do not need to manage token expiry manually — each call to client.getAccessToken() returns a valid token, refreshing it transparently when needed.

Expected result: POST /predict with instances array returns Vertex AI predictions. The Node.js server authenticates with the service account credentials and calls the correct regional Vertex AI endpoint.

Common use cases

Real-Time Model Prediction API

Build an Express or Flask server on Replit that acts as a prediction API: receive feature data from your app's frontend, transform it into the format your Vertex AI model expects, call the prediction endpoint, and return the result. This pattern decouples your application from Vertex AI and allows you to add caching, feature engineering, and response transformation.
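A minimal sketch of that transformation layer, assuming a hypothetical tabular model with 'age', 'plan_code', and 'country' features (the field names, codes, and casts are illustrative, not your model's real schema):

```python
# Sketch: transform raw frontend form data into the instance schema a
# Vertex AI tabular model might expect. All field names and encodings
# here are illustrative assumptions.
def to_instance(form: dict) -> dict:
    plan_codes = {"free": 0, "pro": 1, "enterprise": 2}
    return {
        "age": float(form["age"]),                         # cast to the trained dtype
        "plan_code": plan_codes.get(form.get("plan"), 0),  # categorical -> integer code
        "country": form.get("country", "unknown").lower(), # normalize casing
    }

instance = to_instance({"age": "42", "plan": "pro", "country": "DE"})
```

The server would then pass [instance] to the prediction endpoint and reshape the response before returning it to the frontend.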

Replit Prompt

Build a Flask prediction API that accepts JSON feature data from POST requests, calls a Vertex AI endpoint with the features, and returns the model's predicted class and confidence score. Store the service account JSON in Replit Secrets.


Batch Inference Trigger

Create a Replit server that receives a list of items (product descriptions, customer records, images) and sends them to Vertex AI in batches, collecting predictions for each item. This is useful for triggering inference on new data uploads without requiring a continuously running batch job.
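The batching logic can be sketched as a small helper that splits the incoming list into fixed-size chunks before each endpoint call. BATCH_SIZE is an assumption to tune against your model's payload and quota limits; predict_fn stands in for the real endpoint call.

```python
# Sketch: split a large item list into fixed-size batches so each
# prediction request stays within payload limits. BATCH_SIZE is an
# illustrative assumption.
BATCH_SIZE = 32

def batched(items, size=BATCH_SIZE):
    """Yield successive slices of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def predict_all(items, predict_fn):
    """Collect predictions for every item, one endpoint call per batch."""
    predictions = []
    for batch in batched(items):
        # In the real server: predict_fn = lambda b: endpoint.predict(instances=b).predictions
        predictions.extend(predict_fn(batch))
    return predictions
```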

Replit Prompt

Build an API endpoint that accepts a list of text descriptions, sends them to a Vertex AI text classification endpoint, and returns a list of predictions with confidence scores for each item.


ML Model Monitoring Dashboard

Build a dashboard that queries Vertex AI model monitoring metrics and prediction logs to track model performance over time. Your Replit server calls the Vertex AI API to fetch recent prediction requests, check for data drift alerts, and surface model health metrics in a custom UI.

Replit Prompt

Create a Node.js server that calls the Vertex AI API to list recent model evaluation metrics and prediction statistics for a deployed endpoint, then returns them as a JSON summary for a monitoring dashboard.


Troubleshooting

google.auth.exceptions.DefaultCredentialsError: Could not automatically determine credentials

Cause: The google-cloud-aiplatform library is looking for Application Default Credentials (ADC), but the GOOGLE_APPLICATION_CREDENTIALS environment variable is not set to a file path. On Replit, credentials should be passed as a dictionary parsed from the Secret, not loaded from a file.

Solution: Explicitly pass credentials to aiplatform.init() and the Endpoint constructor using service_account.Credentials.from_service_account_info(). Do not set GOOGLE_APPLICATION_CREDENTIALS to a file path — instead parse GOOGLE_CREDENTIALS_JSON and pass the resulting credentials object directly.

python
# Correct pattern: parse JSON from env, create credentials object
import os
import json
from google.oauth2 import service_account
from google.cloud import aiplatform

creds = service_account.Credentials.from_service_account_info(
    json.loads(os.environ['GOOGLE_CREDENTIALS_JSON']),
    scopes=['https://www.googleapis.com/auth/cloud-platform']
)
aiplatform.init(project=PROJECT_ID, location=LOCATION, credentials=creds)

403 PermissionDenied: Permission 'aiplatform.endpoints.predict' denied

Cause: The service account does not have the Vertex AI User role, or the role was granted at the wrong level (e.g., on the service account itself rather than on the project).

Solution: In Google Cloud Console, go to IAM & Admin → IAM. Find your service account's email. Click the pencil icon to edit permissions. Verify 'Vertex AI User' (roles/aiplatform.user) is listed under Roles for the service account at the project level. If missing, add it and wait 1-2 minutes for IAM changes to propagate.

JSON parse error or unexpected token when loading GOOGLE_CREDENTIALS_JSON

Cause: The service account JSON was copied into Replit Secrets with extra quotes, truncation, or encoding issues. The JSON must be valid and complete — including the private key with its newlines preserved.

Solution: In Replit Secrets, click the GOOGLE_CREDENTIALS_JSON entry to edit. Delete the current value and re-paste the entire contents of your downloaded .json key file. Verify the value starts with '{"type":"service_account"' and ends with '}'. Test parsing in Replit Shell with: node -e "JSON.parse(process.env.GOOGLE_CREDENTIALS_JSON); console.log('OK')"

python
# Test JSON parsing in the Python shell
import os, json
try:
    creds = json.loads(os.environ['GOOGLE_CREDENTIALS_JSON'])
    print('JSON valid. Service account:', creds.get('client_email'))
except json.JSONDecodeError as e:
    print('JSON invalid:', e)
    print('First 100 chars:', os.environ.get('GOOGLE_CREDENTIALS_JSON', '')[:100])

Prediction returns 404 Not Found for the endpoint URL

Cause: The VERTEX_AI_ENDPOINT_ID or VERTEX_AI_LOCATION is incorrect. Vertex AI endpoint URLs are region-specific — an endpoint in us-central1 cannot be reached via the europe-west1 URL.

Solution: In Google Cloud Console, go to Vertex AI → Endpoints and click your endpoint. The detail page shows the full endpoint name including the correct project ID and location. Copy the location exactly (e.g., 'us-central1') and the numeric endpoint ID from the URL or the endpoint details panel.

python
# Print the constructed URL to verify it is correct
PREDICTION_URL = f'https://{LOCATION}-aiplatform.googleapis.com/v1/projects/{PROJECT_ID}/locations/{LOCATION}/endpoints/{ENDPOINT_ID}:predict'
print('Prediction URL:', PREDICTION_URL)
# Expected: https://us-central1-aiplatform.googleapis.com/v1/projects/my-project/locations/us-central1/endpoints/123456789:predict

Best practices

  • Store GOOGLE_CREDENTIALS_JSON in Replit Secrets (lock icon 🔒) as the complete JSON string — never write service account credentials to a file in your Replit project
  • Grant the service account only the Vertex AI User role, not broader Project Editor or Owner roles — follow the principle of least privilege
  • Parse credentials in memory using service_account.Credentials.from_service_account_info() rather than writing to a temp file, which may persist unexpectedly in Replit containers
  • Initialize the Vertex AI client and endpoint object once at server startup rather than on each request to avoid repeated authentication overhead
  • Add input validation before calling Vertex AI — verify instance schema, data types, and required fields match your model's serving signature to prevent prediction errors
  • Cache prediction results for identical inputs using a short TTL (30-60 seconds) to reduce Vertex AI calls for repeated queries
  • Deploy as Reserved VM for services that make frequent Vertex AI calls — Autoscale cold starts add 2-5 seconds latency on the first request after idle periods
  • Monitor Vertex AI prediction costs in Google Cloud Console — online prediction is billed per node hour for the deployed endpoint, even when idle
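The input-validation practice above can be sketched as a simple schema check run before every prediction call. The required fields and types below are illustrative assumptions; mirror your model's actual serving signature.

```python
# Sketch: validate instances before they reach Vertex AI. The schema
# (required fields and types) is an illustrative assumption.
REQUIRED_FIELDS = {"age": (int, float), "plan": str}

def validate_instance(instance: dict) -> list:
    """Return a list of validation errors; an empty list means valid."""
    errors = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in instance:
            errors.append(f"missing required field: {field}")
        elif not isinstance(instance[field], expected):
            errors.append(f"{field} has wrong type: {type(instance[field]).__name__}")
    return errors
```

In the Flask handler, rejecting invalid instances with a 400 before calling endpoint.predict() saves a billable round trip and returns a clearer error to the client.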


Frequently asked questions

How do I connect Replit to Google Cloud Vertex AI?

Create a Google Cloud service account with the Vertex AI User role, download its JSON key file, and paste the entire JSON into Replit Secrets (lock icon 🔒) as GOOGLE_CREDENTIALS_JSON. In Python, use google-cloud-aiplatform and pass credentials via service_account.Credentials.from_service_account_info(). In Node.js, use google-auth-library to generate Bearer tokens for REST API calls.

How do I securely store Google service account credentials in Replit?

Click the lock icon (🔒) in the Replit sidebar, add a secret named GOOGLE_CREDENTIALS_JSON, and paste the complete JSON key file contents as the value. Access it in Python with json.loads(os.environ['GOOGLE_CREDENTIALS_JSON']) and in Node.js with JSON.parse(process.env.GOOGLE_CREDENTIALS_JSON). Never write the JSON to a file or hardcode it in source code.

Can I call Google Gemini or other foundation models from Replit using this method?

Yes. Google Gemini, Imagen, and other Vertex AI foundation models use the same service account authentication and prediction endpoint pattern. The main difference is the endpoint URL — foundation models use publisher model endpoints rather than custom-trained model endpoints. The google-cloud-aiplatform library has GenerativeModel and ImageGenerationModel classes that wrap this.
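To make the URL difference concrete, here is a sketch of the two endpoint shapes. The publisher-model pattern shown is an assumption based on common Vertex AI usage and should be verified against the current REST reference before use.

```python
# Sketch: custom-trained endpoint URL vs. publisher (foundation) model URL.
# The publisher-model pattern is an assumption; verify it against the
# current Vertex AI REST reference.
BASE = "https://{loc}-aiplatform.googleapis.com/v1/projects/{proj}/locations/{loc}"

def custom_endpoint_url(proj: str, loc: str, endpoint_id: str) -> str:
    # Custom-trained models: numeric endpoint ID, :predict verb
    return f"{BASE.format(loc=loc, proj=proj)}/endpoints/{endpoint_id}:predict"

def publisher_model_url(proj: str, loc: str, model: str) -> str:
    # Foundation models: published under publishers/google, :generateContent verb
    return f"{BASE.format(loc=loc, proj=proj)}/publishers/google/models/{model}:generateContent"
```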

What deployment type should I use for a Vertex AI integration on Replit?

Use Reserved VM for services that handle frequent real-time predictions — Vertex AI cold starts add latency, and Autoscale's container warm-up adds additional delay on the first request after idle. If your prediction service is low-traffic and latency tolerance is high (several seconds is acceptable), Autoscale works and is more cost-effective.

Why is my Vertex AI endpoint returning 403 even though the service account has the right role?

IAM permission changes take 1-2 minutes to propagate. If you just added the Vertex AI User role, wait a moment and retry. Also verify the role was added at the project level (not just on the service account itself) — go to IAM & Admin → IAM, find the service account email, and confirm 'Vertex AI User' appears in the roles column.
