To use TensorFlow in Replit, add tensorflow or tensorflow-cpu to your Python package dependencies via the Packages panel, or configure system-level dependencies in replit.nix for native libraries. TensorFlow.js can be used in Node.js Repls without extra Nix setup. Be aware of Replit's storage and memory limits — large model files require Replit Core or external model storage.
Run TensorFlow Models in Replit for ML Inference
TensorFlow is the most widely used deep learning framework in production environments, and it can run directly inside Replit projects — without any external GPU servers or managed ML platforms. While Replit is not optimized for training large models from scratch (the free tier has only 0.5 vCPU and 512MB RAM), it is well-suited for loading pre-trained models and serving inference predictions via an API endpoint. This covers the majority of real-world ML use cases: image classification, text analysis, sentiment scoring, object detection, and recommendation scoring.
Python TensorFlow requires some additional setup compared to pure-Python packages because it depends on native C++ libraries that need system-level dependencies. Replit's Nix-based environment manager handles this through the replit.nix file, where you can specify system packages like libstdc++. For simpler use cases, tensorflow-cpu (the CPU-only variant) has fewer dependencies and installs more reliably than the full tensorflow package in constrained environments.
TensorFlow.js takes a different approach — it runs entirely in JavaScript/Node.js without native compilation, making it much easier to set up in Replit for Node.js projects. If your use case involves running standard models (MobileNet, COCO-SSD, PoseNet) for image or audio processing, TensorFlow.js with pre-converted models is often the fastest path to a working prototype.
This tutorial covers TensorFlow setup for both Python and Node.js Repls, model loading patterns, building a prediction API endpoint, and managing model storage size — a critical consideration in Replit's environment.
Integration method
TensorFlow runs locally within your Replit project — it is a library, not an external API. Python TensorFlow requires configuring the Nix environment for native dependencies. TensorFlow.js works directly in Node.js Repls. Both approaches let you load pre-trained models, run inference, and serve predictions via a Flask or Express API endpoint. Model files must be managed carefully given Replit's storage limits.
Prerequisites
- A Replit account — Replit Core ($25/mo) is strongly recommended for TensorFlow due to the 8GB RAM and 4 vCPU compared to the free tier's 512MB RAM
- Basic Python or JavaScript knowledge for writing inference code
- Understanding of TensorFlow/Keras model formats (SavedModel, .h5, .tflite, TFJS format)
- A pre-trained model file or knowledge of which TensorFlow Hub or Keras Applications model you want to use
- Familiarity with Flask (Python) or Express (Node.js) for building the API endpoint
Step-by-step guide
Choose Your TensorFlow Variant and Configure the Environment
The first decision is whether to use TensorFlow for Python or TensorFlow.js for Node.js. For production inference workloads with Keras models (.h5, SavedModel format), Python TensorFlow is the standard choice. For browser-based inference or lighter-weight JavaScript workflows, TensorFlow.js is easier to set up in Replit.

For Python TensorFlow, start a new Python Repl. Click the Packages icon (cube) in the left sidebar and search for tensorflow-cpu. Installing tensorflow (the full version with GPU support) in Replit's CPU-only environment wastes disk space — tensorflow-cpu has the same Python API for inference on CPU hardware. Click the + button to add it to your dependencies. If you encounter installation errors related to missing system libraries (libstdc++, glibc version mismatches), you need to edit the replit.nix file. This file controls system-level package installation via the Nix package manager. Common additions for TensorFlow are shown in the code snippet below.

For TensorFlow.js, create a Node.js Repl and install @tensorflow/tfjs-node via the Packages panel. This version uses native Node.js bindings for faster inference than the pure-JS version. If native bindings fail to install, fall back to @tensorflow/tfjs, the pure-JavaScript version that runs on any Node.js version.
# replit.nix — add system dependencies for Python TensorFlow
# Edit this file in your Replit project if tensorflow-cpu fails to install
{ pkgs }: {
  deps = [
    pkgs.python311
    pkgs.python311Packages.pip
    pkgs.stdenv.cc.cc.lib  # provides libstdc++.so.6
    pkgs.zlib
    pkgs.glib
  ];
}

Pro tip: Use tensorflow-cpu instead of tensorflow for Replit projects. The GPU-enabled tensorflow package is 500MB+ larger and provides no benefit on Replit's CPU-only machines. tensorflow-cpu has the identical Python API for inference.
Expected result: The tensorflow-cpu (Python) or @tensorflow/tfjs-node (Node.js) package installs successfully without errors. The console output shows the TensorFlow version number confirming successful installation.
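If you want to confirm the environment before loading any model, a minimal Python check (assuming tensorflow-cpu is already installed) is to print the version and confirm the build is CPU-only:

# Minimal install check for the Python Repl
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'  # keep the console output readable

import tensorflow as tf

print(f'TensorFlow version: {tf.__version__}')
print(f'Built with CUDA support: {tf.test.is_built_with_cuda()}')  # expected: False for tensorflow-cpu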
Load a Pre-Trained Model
Replit's free tier limits storage to a few hundred MB and RAM to 512MB, which significantly restricts which models you can load. Large models (BERT, ResNet-50, GPT-2) will crash the Repl with out-of-memory errors. For Replit, choose lightweight models: MobileNetV2 (14MB), EfficientNetB0 (29MB), or DistilBERT (67MB) are good options.

For Python TensorFlow, you can download models at runtime using tf.keras.applications (for Keras built-in models) or tensorflow_hub for TensorFlow Hub models. On the first run, the model downloads from the internet and is cached in the Replit filesystem; subsequent runs load from the local cache. Be aware that the cache counts toward your Replit project's storage limit.

For very large models, store the model files externally (AWS S3, Backblaze B2, Hugging Face Hub) and download them to /tmp at startup. The /tmp directory is not counted toward your Replit project storage quota, but its contents are wiped when the Repl restarts. For production deployments on Replit Autoscale, download the model to /tmp on each cold start.

The code below loads MobileNetV2 from Keras Applications — a well-tested approach that works reliably in Replit's environment.
# Python: load MobileNetV2 for image classification
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'  # suppress TensorFlow startup messages

import tensorflow as tf
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input, decode_predictions
from tensorflow.keras.preprocessing import image
import numpy as np

print(f'TensorFlow version: {tf.__version__}')
print('Loading MobileNetV2 model...')

# Load pre-trained model (downloads ~14MB on first run, then cached)
model = MobileNetV2(weights='imagenet', include_top=True)
print('Model loaded successfully.')
print(f'Input shape: {model.input_shape}')

# Test with a dummy image
test_input = np.random.random((1, 224, 224, 3)).astype('float32')
test_input = preprocess_input(test_input)
predictions = model.predict(test_input, verbose=0)
top_preds = decode_predictions(predictions, top=3)[0]

print('Test predictions (random input):')
for class_id, name, score in top_preds:
    print(f'  {name}: {score:.3f}')
print('Model inference working correctly.')

Pro tip: Set TF_CPP_MIN_LOG_LEVEL=3 to suppress TensorFlow's verbose C++ startup logs, which clutter your Replit console output and make debugging harder.
Expected result: The script prints the TensorFlow version, loads MobileNetV2, and runs a test prediction on a random input — confirming the model loads and inference runs without memory errors.
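If you prefer the TensorFlow Hub route mentioned above, a minimal sketch looks like the following. It assumes tensorflow_hub has been added to your packages and uses the MobileNetV2 classifier hosted on tfhub.dev; exact layer compatibility can vary with your TensorFlow version, so treat this as a starting point rather than a drop-in replacement.

# Alternative: load a small classifier from TensorFlow Hub
import os
os.environ['TFHUB_CACHE_DIR'] = '/tmp/tfhub_cache'  # keep the download out of project storage

import numpy as np
import tensorflow as tf
import tensorflow_hub as hub

MODEL_URL = 'https://tfhub.dev/google/imagenet/mobilenet_v2_100_224/classification/5'
model = tf.keras.Sequential([hub.KerasLayer(MODEL_URL, input_shape=(224, 224, 3))])

# Dummy inference to confirm the model downloads and runs
logits = model.predict(np.random.random((1, 224, 224, 3)).astype('float32'), verbose=0)
print('Output shape:', logits.shape)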
Build a Prediction API with Flask
With the model loading correctly, wrap it in a Flask API so other applications can request predictions via HTTP. The server loads the model once at startup (not on every request) to avoid the 10-30 second load time on each inference call. Requests send an image as a URL or base64-encoded string, and the server returns predictions as JSON.

The Flask server below exposes a POST endpoint at /predict that accepts JSON with an image_url field. The server downloads the image, preprocesses it to the 224x224 input size required by MobileNetV2, runs inference, and returns the top 5 predicted class labels with confidence scores.

For production use on Replit Autoscale, use a WSGI server like gunicorn instead of Flask's built-in development server. Add gunicorn to your Packages and set the run command to gunicorn --bind 0.0.0.0:3000 app:app. The built-in Flask development server is single-threaded and not suitable for handling concurrent prediction requests.
# Flask prediction API for TensorFlow MobileNetV2
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

from flask import Flask, request, jsonify
import tensorflow as tf
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input, decode_predictions
from tensorflow.keras.preprocessing import image
import numpy as np
import requests
from PIL import Image
from io import BytesIO

app = Flask(__name__)

# Load model once at startup
print('Loading TensorFlow model...')
model = MobileNetV2(weights='imagenet', include_top=True)
# Warm-up inference so the first real request doesn't pay the graph-building cost
model.predict(np.zeros((1, 224, 224, 3), dtype='float32'), verbose=0)
print('Model ready.')

def load_image_from_url(url):
    """Download and preprocess image for MobileNetV2."""
    response = requests.get(url, timeout=10)
    img = Image.open(BytesIO(response.content)).convert('RGB')
    img = img.resize((224, 224))
    img_array = image.img_to_array(img)
    img_array = np.expand_dims(img_array, axis=0)
    return preprocess_input(img_array)

@app.route('/predict', methods=['POST'])
def predict():
    data = request.get_json()
    if not data or 'image_url' not in data:
        return jsonify({'error': 'image_url required in request body'}), 400

    try:
        img_array = load_image_from_url(data['image_url'])
        predictions = model.predict(img_array, verbose=0)
        top_preds = decode_predictions(predictions, top=5)[0]

        results = [
            {'class_id': class_id, 'label': name, 'confidence': float(score)}
            for class_id, name, score in top_preds
        ]
        return jsonify({'predictions': results})

    except Exception as e:
        return jsonify({'error': str(e)}), 500

@app.route('/health')
def health():
    return jsonify({'status': 'ok', 'model': 'MobileNetV2'})

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=3000)

Pro tip: Load the TensorFlow model once at application startup (outside the request handler), not inside the /predict handler. Loading inside the handler adds 10-30 seconds to every prediction request.
Expected result: The Flask server starts and prints 'Model ready.' Sending a POST request to /predict with an image_url returns a JSON response with 5 predicted class labels and confidence scores.
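A quick way to exercise the endpoint is a short client script run from the Replit shell (or another Repl); the URLs below are placeholders you would swap for your own Repl address and any publicly reachable image.

# Smoke test for the prediction API (URL values are placeholders)
import requests

BASE_URL = 'http://localhost:3000'          # or your Repl's public URL once deployed
IMAGE_URL = 'https://example.com/cat.jpg'   # replace with a real, publicly accessible image

resp = requests.post(f'{BASE_URL}/predict', json={'image_url': IMAGE_URL}, timeout=60)
print(resp.status_code)
print(resp.json())  # expected: {'predictions': [{'class_id': ..., 'label': ..., 'confidence': ...}, ...]}

print(requests.get(f'{BASE_URL}/health', timeout=10).json())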
TensorFlow.js for Node.js Repls
If you are working in a Node.js Replit project, TensorFlow.js provides a JavaScript-native TensorFlow experience. It supports loading pre-trained models in the TFJS format (converted from Python TensorFlow SavedModel or Keras .h5 format), as well as models from TensorFlow Hub in TFJS format.

Install @tensorflow/tfjs-node via the Packages panel. This uses native Node.js bindings for better performance than the pure-JavaScript version. If @tensorflow/tfjs-node fails to install due to native compilation errors, use @tensorflow/tfjs instead — it is slower but always installs cleanly.

The Node.js example below uses @tensorflow-models/mobilenet, which is pre-converted to TFJS format and downloads automatically from the TensorFlow CDN at runtime. This is the simplest starting point — no model conversion needed.
// Node.js TensorFlow.js image classification server
// Install: npm install @tensorflow/tfjs-node @tensorflow-models/mobilenet @tensorflow/tfjs-backend-cpu
const express = require('express');
const tf = require('@tensorflow/tfjs-node');
const mobilenet = require('@tensorflow-models/mobilenet');
const app = express();
app.use(express.json());

let model;

async function loadModel() {
  console.log('Loading MobileNet model...');
  model = await mobilenet.load({ version: 2, alpha: 0.5 });
  console.log('MobileNet model loaded and ready.');
}

app.post('/predict', async (req, res) => {
  if (!model) return res.status(503).json({ error: 'Model not loaded yet' });
  const { image_url } = req.body;
  if (!image_url) return res.status(400).json({ error: 'image_url required' });

  try {
    // Download image and convert to tensor
    const response = await fetch(image_url);
    const buffer = Buffer.from(await response.arrayBuffer());
    const tfImage = tf.node.decodeImage(buffer, 3);
    const predictions = await model.classify(tfImage);
    tfImage.dispose(); // free the tensor's memory
    res.json({ predictions });
  } catch (err) {
    res.status(500).json({ error: err.message });
  }
});

app.get('/health', (req, res) => res.json({ status: 'ok', model: 'MobileNet TFJS' }));

loadModel().then(() => {
  app.listen(3000, '0.0.0.0', () => console.log('TensorFlow.js server running on port 3000'));
});

Pro tip: Always call tensor.dispose() after using a TensorFlow.js tensor to free memory. TensorFlow.js manages tensor memory manually in Node.js — not disposing tensors causes memory leaks that will eventually crash your Repl.
Expected result: The Node.js server starts, downloads the MobileNet TFJS model, and begins accepting /predict requests. The model downloads once (~9MB) and is cached for subsequent runs.
Handle Storage and Memory Limits for Production
TensorFlow models range from a few MB (MobileNetV2: 14MB) to several GB (ResNet-152: 232MB, BERT-large: 1.3GB). Replit's free tier project storage is limited and its 512MB RAM cannot accommodate most production-grade models. Replit Core provides 8GB RAM and more generous storage, making it suitable for models up to ~500MB.

For large models that exceed Replit's storage or memory, the recommended pattern is to store the model externally (on AWS S3, Hugging Face Hub, or Backblaze B2) and download it to /tmp at server startup. The /tmp directory is not counted toward your project's storage quota. The tradeoff is that /tmp is wiped on restart, so the model re-downloads on each cold start (which takes 30-120 seconds depending on model size).

For production deployments, use Replit Reserved VM (not Autoscale) when you have large models. Reserved VM keeps the server running continuously, avoiding repeated cold starts and model re-downloads. Autoscale spins down to zero when idle, which means every restart requires a model re-download.

The environment variable pattern below lets your code adapt to both Replit constraints and local development, where the model path might be a local directory.
# Python: model management with external storage fallback
import os
import boto3  # install: boto3
import tensorflow as tf

MODEL_DIR = '/tmp/model'
MODEL_PATH = f'{MODEL_DIR}/model.keras'
S3_BUCKET = os.environ.get('MODEL_S3_BUCKET', '')
S3_KEY = os.environ.get('MODEL_S3_KEY', 'mobilenet_v2.keras')

def ensure_model_downloaded():
    """Download the model from S3 to /tmp if not already present."""
    if os.path.exists(MODEL_PATH):
        print(f'Model already present at {MODEL_PATH}')
        return

    if S3_BUCKET:
        print(f'Downloading model from S3: s3://{S3_BUCKET}/{S3_KEY}')
        os.makedirs(MODEL_DIR, exist_ok=True)
        s3 = boto3.client(
            's3',
            aws_access_key_id=os.environ['AWS_ACCESS_KEY_ID'],
            aws_secret_access_key=os.environ['AWS_SECRET_ACCESS_KEY']
        )
        s3.download_file(S3_BUCKET, S3_KEY, MODEL_PATH)
        print('Model downloaded successfully.')
    else:
        # Fallback: use Keras pre-trained model
        print('No S3 bucket configured — using Keras Applications model')

def load_model():
    ensure_model_downloaded()
    if os.path.exists(MODEL_PATH):
        return tf.keras.models.load_model(MODEL_PATH)
    else:
        from tensorflow.keras.applications import MobileNetV2
        return MobileNetV2(weights='imagenet')

model = load_model()

Pro tip: Use Replit Reserved VM deployment (not Autoscale) for TensorFlow inference services with large models. Reserved VM keeps the process alive so the model stays loaded in memory, while Autoscale scales to zero and requires re-loading the model on every cold start.
Expected result: The model loader correctly downloads the model from S3 to /tmp if not present, or loads it from the Keras cache — adapting to Replit's storage constraints automatically.
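If your model lives on Hugging Face Hub rather than S3, the same /tmp pattern applies; the sketch below uses the huggingface_hub package with placeholder repo and file names, so adjust both to match wherever you actually host the model.

# Alternative to S3: fetch a model file from Hugging Face Hub into /tmp
import os
from huggingface_hub import hf_hub_download
import tensorflow as tf

model_path = hf_hub_download(
    repo_id=os.environ.get('HF_MODEL_REPO', 'your-username/your-model'),  # placeholder repo
    filename=os.environ.get('HF_MODEL_FILE', 'model.keras'),              # placeholder filename
    cache_dir='/tmp/hf_cache',  # keeps the download outside project storage
)
model = tf.keras.models.load_model(model_path)
print(f'Loaded model from {model_path}')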
Common use cases
Image Classification API
Load a pre-trained MobileNet or EfficientNet model in Replit and expose it as an HTTP endpoint. Users POST an image URL or base64-encoded image, and the server returns the predicted class labels and confidence scores. This is a common pattern for content moderation, product categorization, or automated image tagging.
Build a Flask API endpoint that accepts a base64-encoded image in a POST request body, runs it through a pre-trained MobileNet TensorFlow model, and returns the top 5 predicted class labels with confidence scores as JSON.
Text Sentiment Scoring Service
Load a fine-tuned BERT or LSTM sentiment model in Replit and serve it as an API. Input text strings are preprocessed, tokenized, and passed through the model to produce sentiment scores (positive/negative/neutral with confidence). This can power real-time feedback analysis, review moderation, or customer support triage.
Create a Python Flask server that loads a TensorFlow text classification model, accepts POST requests with text strings, and returns sentiment scores (positive/negative/neutral) as JSON predictions.
TensorFlow.js In-Browser Model Demo
Use a Replit Node.js + Express server to serve a web page that loads a TensorFlow.js model in the browser (not on the server). The browser does the inference locally using the user's device GPU/CPU. The Replit server provides the HTML, model files, and any server-side data processing, while TensorFlow.js handles the actual ML inference on the client side.
Build a Node.js Express server that serves a web page with TensorFlow.js loaded via CDN. The page runs a pre-trained pose detection model using the device webcam and displays keypoints overlaid on the video feed.
Troubleshooting
MemoryError or Repl crashes with 'Killed' when loading a TensorFlow model
Cause: The TensorFlow model is too large for Replit's free tier 512MB RAM. Most non-trivial models require at least 1-2GB RAM to load. The OS kills the process when it exceeds memory limits.
Solution: Upgrade to Replit Core for 8GB RAM. Alternatively, switch to a smaller model variant (MobileNetV2 instead of ResNet-50, DistilBERT instead of BERT-large). For TensorFlow Lite (.tflite) models, use the TFLite interpreter which has much lower memory overhead than full TensorFlow.
# Use TFLite interpreter for lower memory usage
import tensorflow as tf
interpreter = tf.lite.Interpreter(model_path='model.tflite')
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

ImportError: libstdc++.so.6: version 'GLIBCXX_3.4.29' not found
Cause: TensorFlow's native C++ libraries require a newer version of libstdc++ than what Nix provides by default in Replit's environment.
Solution: Edit replit.nix to add pkgs.stdenv.cc.cc.lib as a dependency. This provides the newer libstdc++ version required by TensorFlow. After saving replit.nix, Replit automatically rebuilds the Nix environment.
# replit.nix
{ pkgs }: {
  deps = [
    pkgs.python311
    pkgs.stdenv.cc.cc.lib
    pkgs.zlib
    pkgs.glib
  ];
}

TensorFlow takes 60+ seconds to load on every request
Cause: The TensorFlow model is being loaded inside the request handler function, which re-loads it on every HTTP request. Model loading (downloading weights, building the computation graph) takes 10-60 seconds for most models.
Solution: Load the model at server startup — before the app.run() or app.listen() call — and store it in a module-level variable. The request handler then reads from this pre-loaded model variable for fast inference.
# Wrong: loads model on every request
@app.route('/predict', methods=['POST'])
def predict():
    model = MobileNetV2(weights='imagenet')  # DON'T DO THIS
    ...

# Correct: load once at startup
model = MobileNetV2(weights='imagenet')

@app.route('/predict', methods=['POST'])
def predict():
    # use pre-loaded model variable
    predictions = model.predict(img_array)

Replit project storage full after installing TensorFlow or downloading models
Cause: TensorFlow and its dependencies (NumPy, SciPy, h5py, etc.) use 500MB-1GB of disk space. Combined with model weight files, this can fill up free tier project storage (which is limited to a few hundred MB).
Solution: Use Replit Core for more project storage. Download model files to /tmp instead of the project directory — /tmp is not counted toward project storage quota but is cleared on restart. Consider using tensorflow-cpu instead of tensorflow to reduce dependency size.
import os
# Store model in /tmp to avoid project storage limits
MODEL_CACHE_DIR = '/tmp/tf_models'
os.makedirs(MODEL_CACHE_DIR, exist_ok=True)
os.environ['TFHUB_CACHE_DIR'] = MODEL_CACHE_DIR
os.environ['KERAS_HOME'] = MODEL_CACHE_DIR

Best practices
- Use tensorflow-cpu instead of tensorflow in Replit — CPU-only TensorFlow is 60% smaller and has fewer native dependencies while providing identical inference performance on Replit's CPU machines.
- Load models once at application startup, never inside request handlers — model loading takes 10-60 seconds and doing it per-request makes your API unusably slow.
- Store TensorFlow model files in /tmp to avoid filling your Replit project storage quota — /tmp is unmetered but clears on restart, so re-download on startup from S3 or Hugging Face Hub.
- Upgrade to Replit Core for production TensorFlow workloads — 8GB RAM vs 512MB on free tier is the difference between most models loading successfully or crashing.
- Use TFLite (.tflite) models when possible — they have 4-8x lower memory usage than full TensorFlow SavedModels and are optimized for CPU inference.
- Dispose TensorFlow.js tensors explicitly after use (tensor.dispose()) in Node.js to prevent memory leaks that will crash long-running Repls.
- Use Replit Reserved VM for TensorFlow inference servers, not Autoscale — Reserved VM keeps the model loaded in memory continuously, while Autoscale cold starts require re-loading the model each time.
- Set TF_CPP_MIN_LOG_LEVEL=3 to suppress TensorFlow's verbose C++ initialization logs, which clutter your console output during development.
Alternatives
The OpenAI API calls hosted models via HTTP without any local model management or memory concerns, making it far simpler to integrate in Replit for most NLP tasks compared to running TensorFlow locally.
IBM Watson provides hosted NLP and image recognition APIs that call external services rather than running models locally, eliminating TensorFlow's memory and storage requirements in Replit.
Vertex AI hosts and serves TensorFlow models as managed prediction endpoints, so your Replit server just makes HTTP calls to get predictions without needing to manage TensorFlow locally.
Frequently asked questions
Can TensorFlow run in Replit on the free tier?
Lightweight TensorFlow models (under 100MB, like MobileNetV2) can run on Replit's free tier during development, but the 512MB RAM limit frequently causes crashes when loading anything larger. For reliable TensorFlow inference, upgrade to Replit Core which provides 8GB RAM. The free tier is useful for testing small models and prototyping.
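To see how close a given model gets to the limit, one rough check is the process's peak memory from the standard library; this is a sketch, and on Linux ru_maxrss is reported in kilobytes.

# Rough peak-memory check after loading a model
import resource

import tensorflow as tf
from tensorflow.keras.applications import MobileNetV2

model = MobileNetV2(weights='imagenet')
peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print(f'Peak memory so far: {peak_kb / 1024:.0f} MB')  # stay well under 512 MB on the free tier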
How do I install TensorFlow in Replit?
Click the Packages (cube) icon in the Replit sidebar and search for tensorflow-cpu (recommended) or tensorflow. Click the + button to add it. If you encounter missing native library errors, add system packages to your replit.nix file — specifically pkgs.stdenv.cc.cc.lib for libstdc++. After editing replit.nix, Replit rebuilds the environment automatically.
Does Replit support TensorFlow GPU?
No. Replit's machines do not have GPUs — they are CPU-only. This means TensorFlow runs on CPU inference only. GPU-enabled TensorFlow training is not possible on Replit. For CPU inference workloads (loading a pre-trained model and making predictions), TensorFlow on Replit works well despite the CPU limitation.
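You can confirm this from code: on Replit the GPU device list comes back empty.

import tensorflow as tf

print(tf.config.list_physical_devices('GPU'))  # [] on Replit (no GPUs available)
print(tf.config.list_physical_devices('CPU'))  # the CPU device TensorFlow will use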
How do I use TensorFlow.js instead of Python TensorFlow in Replit?
Create a Node.js Repl and install @tensorflow/tfjs-node via the Packages panel. TensorFlow.js has no native system library dependencies, making it much simpler to install in Replit than Python TensorFlow. You can load pre-trained TFJS format models from TensorFlow Hub or convert Python Keras models to TFJS format using the tensorflowjs Python package.
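The Keras-to-TFJS conversion mentioned above can be done with the tensorflowjs Python package; a minimal sketch follows (the output directory name is arbitrary, and converter compatibility depends on your TensorFlow version).

# Convert a Keras model to TFJS format (requires the tensorflowjs package)
import tensorflowjs as tfjs
from tensorflow.keras.applications import MobileNetV2

model = MobileNetV2(weights='imagenet')  # or tf.keras.models.load_model('your_model.h5')
tfjs.converters.save_keras_model(model, 'tfjs_model')  # writes model.json plus weight shards

In a Node.js Repl, the converted model can then be loaded with tf.loadLayersModel('file://tfjs_model/model.json').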
Where should I store large TensorFlow model files in Replit?
Store large model files in /tmp rather than your project directory. The /tmp directory is not counted toward your Replit project's storage quota. The tradeoff is that /tmp clears on every Repl restart, so your startup code must re-download the model. For production, store models on S3, Backblaze B2, or Hugging Face Hub and download to /tmp at startup.