Learn how to prevent hitting usage quotas with LLM calls in n8n by using caching, batching, rate limiting, token optimization, monitoring, and fallback strategies to optimize API usage and maintain workflow efficiency.
To prevent hitting usage quotas with LLM calls in n8n, implement strategies like caching responses, batching requests, implementing rate limiting, and monitoring your usage. These approaches help optimize API usage and ensure you stay within limits while still leveraging AI capabilities in your workflows.
Comprehensive Guide to Preventing LLM Usage Quota Issues in n8n
Step 1: Understand Your LLM Service Quotas
Before implementing any prevention strategies, it's essential to understand the quota limitations of the LLM service you're using with n8n: most providers enforce separate requests-per-minute, requests-per-day, and token limits, and these vary by account tier.
For example, OpenAI has different rate limits based on your tier:
// OpenAI rate limits (illustrative only; check your account's current limits)
Free tier: 3 RPM (requests per minute), 200 RPD (requests per day)
Tier 1 paid: 60 RPM, 10,000 RPD
Tier 2 paid: 3,500 RPM, customizable daily limits
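Many providers also report your live limits in response headers. As a minimal sketch, assuming the OpenAI API (which returns x-ratelimit-* headers) and an HTTP Request node configured to return the full response including headers, a Code node placed after the request could read them like this:
// Sketch: read provider rate-limit headers after an HTTP Request node
// Assumes the HTTP Request node returns the full response (including headers)
// and that the provider exposes x-ratelimit-* headers, as OpenAI does
const headers = $input.item.json.headers || {};
const remainingRequests = parseInt(headers['x-ratelimit-remaining-requests'] || '0', 10);
const remainingTokens = parseInt(headers['x-ratelimit-remaining-tokens'] || '0', 10);
return {
json: {
remainingRequests,
remainingTokens,
// Flag when we are close to the limit so downstream nodes can slow down
nearLimit: remainingRequests < 5 || remainingTokens < 2000
}
};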
Step 2: Implement Caching for Repeated Queries
One of the most effective ways to reduce LLM API calls is to cache responses for identical or similar requests:
Here's how to implement basic caching in n8n:
// Example using Function node for caching
// This checks if we have a cached response before making an LLM API call
const cacheKey = `llm-response-${$input.item.json.query.trim()}`;
// Try to get from cache first
// (this assumes a cache helper on $workflow.variables; see the built-in static-data variant after this example)
const cachedResponse = $workflow.variables.getCache(cacheKey);
if (cachedResponse) {
// Use cached response if available
return {
json: {
response: cachedResponse,
source: 'cache'
}
};
}
// If not in cache, prepare to make the actual API call
return {
json: {
query: $input.item.json.query,
needsApiCall: true
}
};
Then, in a subsequent node after your LLM API call:
// After getting the response from the LLM API
// Store the result in cache for future use
const responseText = $input.item.json.response;
const query = $input.item.json.query;
const cacheKey = `llm-response-${query.trim()}`;
// Cache for 24 hours (86400000 ms)
$workflow.variables.setCache(cacheKey, responseText, 86400000);
return $input.item;
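Note that the getCache/setCache helpers above assume a custom caching utility on $workflow.variables, which stock n8n does not ship. A minimal sketch of the same pattern using n8n's built-in workflow static data (which persists between production executions started by a trigger, but not between manual test runs) could look like this:
// Sketch: caching with n8n's built-in workflow static data
const staticData = $getWorkflowStaticData('global');
staticData.llmCache = staticData.llmCache || {};
const query = $input.item.json.query.trim();
const cacheKey = `llm-response-${query}`;
const TTL_MS = 24 * 60 * 60 * 1000; // 24 hours
const entry = staticData.llmCache[cacheKey];
if (entry && Date.now() - entry.storedAt < TTL_MS) {
// Cache hit: skip the API call entirely
return { json: { response: entry.response, source: 'cache' } };
}
// Cache miss: signal downstream nodes to call the LLM; after the call,
// store the result back as staticData.llmCache[cacheKey] = { response, storedAt: Date.now() }
return { json: { query, needsApiCall: true } };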
Step 3: Batch Process Requests
Instead of making individual API calls for each item, batch them together:
Implementation in n8n:
// Example function to batch requests
// This collects items and only processes them when a certain batch size is reached
// In a Function node
let batchSize = 10; // Adjust based on your needs
let currentBatch = $workflow.variables.currentBatch || [];
// Add current item to batch
currentBatch.push($input.item.json.query);
$workflow.variables.currentBatch = currentBatch;
// Check if we've reached batch size
if (currentBatch.length >= batchSize) {
// Reset the stored batch so the next run starts fresh, then process this one
$workflow.variables.currentBatch = [];
// Process the batch
return {
json: {
queries: currentBatch,
processBatch: true
}
};
} else {
// Skip processing until we have enough items
return {
json: {
message: `Added to batch (${currentBatch.length}/${batchSize})`,
processBatch: false
}
};
}
Then, in your LLM integration node, send the whole batch in one call instead of one request per item, for example by combining the queries into a single prompt as sketched below.
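As a minimal sketch, assuming the previous node emitted { queries, processBatch } as above, a Code node can fold the batch into one numbered prompt so a single completion answers every query:
// Sketch: combine a collected batch into a single prompt for one LLM call
const { queries, processBatch } = $input.item.json;
if (!processBatch) {
// Not enough items yet, pass through without calling the LLM
return $input.item;
}
// Number the queries so the answers can be split apart again afterwards
const combinedPrompt = [
'Answer each of the following questions separately, numbering your answers:',
...queries.map((q, i) => `${i + 1}. ${q}`)
].join('\n');
return {
json: {
prompt: combinedPrompt,
batchSize: queries.length
}
};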
Step 4: Implement Rate Limiting
Create your own rate limiting system to stay within provider limits:
Here's how to implement basic rate limiting:
// Example rate limiting implementation in a Function node
// Configuration
const MAX_REQUESTS_PER_MINUTE = 50; // Adjust based on your LLM provider limits
const MINUTE_IN_MS = 60 * 1000;
// Get or initialize rate limiting data
let rateLimitData = $workflow.variables.rateLimitData || {
requestTimes: [],
lastResetTime: Date.now()
};
// Clean up old request times (older than 1 minute)
const now = Date.now();
rateLimitData.requestTimes = rateLimitData.requestTimes.filter(time => now - time < MINUTE_IN_MS);
// Check if we need to wait
if (rateLimitData.requestTimes.length >= MAX_REQUESTS_PER_MINUTE) {
// Calculate time to wait until oldest request is more than 1 minute old
const oldestRequest = rateLimitData.requestTimes[0];
const timeToWait = MINUTE_IN_MS - (now - oldestRequest);
if (timeToWait > 0) {
// We need to wait - implement delay or reschedule
return {
json: {
status: 'rate_limited',
waitTime: timeToWait,
message: `Rate limit reached. Need to wait ${timeToWait/1000} seconds.`
}
};
}
}
// Record this request time
rateLimitData.requestTimes.push(now);
$workflow.variables.rateLimitData = rateLimitData;
// Proceed with the request
return {
json: {
status: 'proceed',
requestCount: rateLimitData.requestTimes.length,
canProceed: true
}
};
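When this node reports rate_limited, something still has to pause. You can route rate-limited items through an IF node into a Wait node, or sleep in a follow-up Code node, assuming top-level await is available (it is in the current Code node). A minimal sketch of the second option, reading the waitTime produced above:
// Sketch: pause inside a Code node before letting the item continue
const waitTime = $input.item.json.waitTime || 0;
if (waitTime > 0) {
// Sleep until the oldest request falls outside the one-minute window
await new Promise(resolve => setTimeout(resolve, waitTime));
}
return {
json: {
...$input.item.json,
status: 'proceed',
waitedMs: waitTime
}
};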
Step 5: Optimize Token Usage
Reduce token consumption to stay within quotas:
Implementation example:
// Example of optimizing prompts before sending to LLM
// In a Function node before your LLM API call
// Original query from workflow
let query = $input.item.json.query;
// Optimize the query to use fewer tokens
function optimizePrompt(text) {
// Remove redundant phrases
text = text.replace(/please provide me with|can you tell me about|i would like to know|please explain/gi, '');
// Trim excess whitespace
text = text.replace(/\s+/g, ' ').trim();
// Limit length if needed
const MAX_CHARS = 500;
if (text.length > MAX_CHARS) {
text = text.substring(0, MAX_CHARS);
}
return text;
}
const optimizedQuery = optimizePrompt(query);
// Calculate approximate token savings
const originalTokenCount = Math.ceil(query.length / 4); // Rough estimate
const optimizedTokenCount = Math.ceil(optimizedQuery.length / 4); // Rough estimate
const tokenSavings = originalTokenCount - optimizedTokenCount;
return {
json: {
original: query,
optimized: optimizedQuery,
tokenSavings: tokenSavings,
proceed: true
}
};
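The length/4 estimate above is only a rough approximation. If your n8n instance permits external modules in Code nodes (via the NODE_FUNCTION_ALLOW_EXTERNAL environment variable), a sketch using the gpt-tokenizer npm package gives a much closer count; the package choice and its availability in your instance are assumptions here:
// Sketch: count tokens with a real tokenizer instead of the length/4 heuristic
// Assumes gpt-tokenizer is installed and allowed via NODE_FUNCTION_ALLOW_EXTERNAL
const { encode } = require('gpt-tokenizer');
const query = $input.item.json.original || $input.item.json.query;
const optimizedQuery = $input.item.json.optimized || query;
const originalTokens = encode(query).length;
const optimizedTokens = encode(optimizedQuery).length;
return {
json: {
originalTokens,
optimizedTokens,
tokenSavings: originalTokens - optimizedTokens
}
};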
Step 6: Set Up a Quota Management System
Create a system to track and manage your LLM API usage:
Implementation in n8n:
// Example of a quota management system
// This can be implemented as a separate workflow or reusable function
// Configuration
const DAILY_QUOTA = 100000; // Tokens per day
const WARNING_THRESHOLD = 0.8; // 80% of quota
// Get or initialize quota tracking
let quotaData = $workflow.variables.quotaData || {
dailyUsage: 0,
lastResetDay: new Date().toISOString().split('T')[0],
alerts: []
};
// Check if we need to reset the daily counter
const today = new Date().toISOString().split('T')[0];
if (today !== quotaData.lastResetDay) {
quotaData.dailyUsage = 0;
quotaData.lastResetDay = today;
quotaData.alerts = []; // Clear yesterday's alerts so new ones can fire today
}
// Get the token usage from the current request
const promptTokens = $input.item.json.usage?.prompt_tokens || 0;
const completionTokens = $input.item.json.usage?.completion_tokens || 0;
const totalTokens = promptTokens + completionTokens;
// Update usage
quotaData.dailyUsage += totalTokens;
// Check if we're approaching the quota
const usagePercentage = quotaData.dailyUsage / DAILY_QUOTA;
let status = 'normal';
let message = '';
if (usagePercentage >= 1) {
// Quota exceeded
status = 'quota_exceeded';
message = `Daily quota exceeded: ${quotaData.dailyUsage}/${DAILY_QUOTA} tokens used`;
// Add to alerts if this is a new alert
if (!quotaData.alerts.includes('quota_exceeded')) {
quotaData.alerts.push('quota_exceeded');
}
} else if (usagePercentage >= WARNING_THRESHOLD) {
// Approaching quota
status = 'warning';
message = `Approaching daily quota: ${Math.round(usagePercentage * 100)}% used`;
// Add to alerts if this is a new alert
if (!quotaData.alerts.includes('warning')) {
quotaData.alerts.push('warning');
}
}
// Save updated quota data
$workflow.variables.quotaData = quotaData;
return {
json: {
...($input.item.json),
quotaStatus: status,
quotaMessage: message,
dailyUsage: quotaData.dailyUsage,
remainingQuota: DAILY_QUOTA - quotaData.dailyUsage
}
};
Step 7: Implement Fallback Mechanisms
Create fallback systems for when quotas are reached:
Here's a fallback implementation:
// Example fallback system when quotas are reached
// This can be implemented in an IF node's condition
// In Function node (preparing fallback logic)
let quotaStatus = $input.item.json.quotaStatus || 'normal';
let needsFallback = quotaStatus === 'quota_exceeded';
// Determine what fallback to use
if (needsFallback) {
// Options for fallback
const fallbackOptions = [
{
type: 'cached_response',
priority: 1
},
{
type: 'simpler_model',
priority: 2
},
{
type: 'queue_for_later',
priority: 3
},
{
type: 'static_response',
priority: 4
}
];
// Choose the highest-priority fallback (the options above are already ordered by priority, so the first entry wins)
const fallback = fallbackOptions[0];
return {
json: {
original: $input.item.json,
needsFallback: true,
fallbackType: fallback.type
}
};
} else {
// No fallback needed
return {
json: {
...$input.item.json,
needsFallback: false
}
};
}
Then implement the fallback handler:
// Example of handling different fallback types
// This would be in a Function node after determining fallback is needed
const fallbackType = $input.item.json.fallbackType;
switch(fallbackType) {
case 'cached_response':
// Try to find a similar cached response
const query = $input.item.json.original.query;
const similarityThreshold = 0.8;
// This would require implementing a similarity search
// through your cache, which could be complex
return {
json: {
response: "I'm sorry, but we've reached our usage limit. Here's a similar response from our cache...",
fallbackUsed: fallbackType
}
};
case 'simpler_model':
// Use a less token-intensive model
// This would require implementing a call to a different model
return {
json: {
useModel: 'text-ada-001', // Example of a simpler model
fallbackUsed: fallbackType
}
};
case 'queue_for_later':
// Queue the request for processing later
// Store in database or queue system
return {
json: {
response: "We've reached our limit. Your request has been queued and will be processed later.",
queued: true,
fallbackUsed: fallbackType
}
};
case 'static_response':
default:
// Provide a static fallback response
return {
json: {
response: "I'm sorry, but we've reached our usage limit for AI responses. Please try again later.",
fallbackUsed: fallbackType
}
};
}
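The cached_response branch above leaves the "similar response" lookup as an exercise. A minimal sketch of one approach, assuming the cache lives in workflow static data as in the Step 2 sketch and using simple word-overlap (Jaccard) similarity rather than a real semantic search:
// Sketch: find a "close enough" cached answer by word overlap
const staticData = $getWorkflowStaticData('global');
const cache = staticData.llmCache || {}; // assumed shape: { 'llm-response-<query>': { response, storedAt } }
const query = $input.item.json.original.query.toLowerCase();
const queryWords = new Set(query.split(/\s+/));
const SIMILARITY_THRESHOLD = 0.8;
let best = { score: 0, response: null };
for (const [key, entry] of Object.entries(cache)) {
const cachedQuery = key.replace('llm-response-', '').toLowerCase();
const cachedWords = new Set(cachedQuery.split(/\s+/));
const intersection = [...queryWords].filter(w => cachedWords.has(w)).length;
const union = new Set([...queryWords, ...cachedWords]).size;
const score = union > 0 ? intersection / union : 0;
if (score > best.score) {
best = { score, response: entry.response };
}
}
if (best.score >= SIMILARITY_THRESHOLD) {
return { json: { response: best.response, fallbackUsed: 'cached_response', similarity: best.score } };
}
// Nothing similar enough, so move on to the next fallback in the priority list
return { json: { needsFallback: true, fallbackType: 'simpler_model' } };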
Step 8: Monitor and Analyze Usage Patterns
Set up a monitoring system to track and analyze your LLM usage:
Implementation:
// Example logging system for monitoring LLM usage
// This can be added to any workflow that uses LLM services
// In a Function node after LLM API call
const timestamp = new Date().toISOString();
const model = $input.item.json.model || 'unknown';
const promptTokens = $input.item.json.usage?.prompt_tokens || 0;
const completionTokens = $input.item.json.usage?.completion_tokens || 0;
const totalTokens = promptTokens + completionTokens;
const workflowName = $workflow.name;
const workflowId = $workflow.id;
// Create log entry
const logEntry = {
timestamp,
model,
promptTokens,
completionTokens,
totalTokens,
workflowName,
workflowId,
success: $input.item.json.error ? false : true,
errorMessage: $input.item.json.error || null
};
// Get existing logs or initialize
let usageLogs = $workflow.variables.usageLogs || [];
// Add new log entry
usageLogs.push(logEntry);
// Keep only the last 1000 entries to avoid memory issues
if (usageLogs.length > 1000) {
usageLogs = usageLogs.slice(usageLogs.length - 1000);
}
// Save updated logs
$workflow.variables.usageLogs = usageLogs;
// You could send this data to an external analytics system here
// For example, to a database, monitoring tool, or custom dashboard
return {
json: {
...$input.item.json,
logged: true,
logEntry
}
};
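To make the "send this data to an external analytics system" comment concrete, a follow-up Code node can POST each log entry to a collector. The endpoint URL below is purely hypothetical, and the sketch assumes the Code node's this.helpers.httpRequest helper is available in your n8n version:
// Sketch: forward each log entry to an external logging endpoint (hypothetical URL)
const logEntry = $input.item.json.logEntry;
await this.helpers.httpRequest({
method: 'POST',
url: 'https://your-logging-endpoint.example.com/llm-usage', // replace with your collector
body: logEntry,
json: true
});
return $input.item;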
Step 9: Distribute Load Across Multiple Providers
Use multiple LLM providers to distribute your usage:
Implementation example:
// Example of load balancing across multiple LLM providers
// This would be implemented in a Function node before making API calls
// Define available providers with their quotas and current usage
const providers = [
{
name: 'openai',
models: ['gpt-3.5-turbo', 'gpt-4'],
dailyQuota: 100000,
usedToday: $workflow.variables.openaiUsed || 0,
costPerToken: 0.002,
available: true
},
{
name: 'anthropic',
models: ['claude-instant', 'claude-2'],
dailyQuota: 80000,
usedToday: $workflow.variables.anthropicUsed || 0,
costPerToken: 0.0025,
available: true
},
{
name: 'google',
models: ['palm', 'gemini'],
dailyQuota: 50000,
usedToday: $workflow.variables.googleUsed || 0,
costPerToken: 0.001,
available: true
}
];
// Filter only available providers with remaining quota
const availableProviders = providers.filter(p =>
p.available && p.usedToday < p.dailyQuota
);
if (availableProviders.length === 0) {
// No providers available, implement fallback
return {
json: {
error: true,
message: "All providers have reached their quotas",
needsFallback: true
}
};
}
// Different strategies for selecting a provider
const selectionStrategy = 'optimal'; // Options: 'round-robin', 'cost', 'optimal', 'quota-based'
let selectedProvider;
switch(selectionStrategy) {
case 'round-robin':
// Simple round-robin
const lastIndex = $workflow.variables.lastProviderIndex || 0;
const nextIndex = (lastIndex + 1) % availableProviders.length;
selectedProvider = availableProviders[nextIndex];
$workflow.variables.lastProviderIndex = nextIndex;
break;
case 'cost':
// Select the cheapest provider
selectedProvider = availableProviders.reduce((min, p) =>
p.costPerToken < min.costPerToken ? p : min, availableProviders[0]);
break;
case 'quota-based':
// Select provider with most remaining quota
selectedProvider = availableProviders.reduce((max, p) =>
(p.dailyQuota - p.usedToday) > (max.dailyQuota - max.usedToday) ? p : max,
availableProviders[0]);
break;
case 'optimal':
default:
// Weighted selection based on multiple factors
selectedProvider = availableProviders.reduce((best, p) => {
// Calculate a score based on remaining quota percentage and cost
const remainingQuotaPercent = (p.dailyQuota - p.usedToday) / p.dailyQuota;
const score = remainingQuotaPercent * (1 / p.costPerToken);
if (!best.score || score > best.score) {
return {...p, score};
}
return best;
}, {score: 0});
break;
}
// Return the selected provider
return {
json: {
...$input.item.json,
provider: selectedProvider.name,
model: selectedProvider.models[0], // Select appropriate model
useProvider: true
}
};
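The balancer reads $workflow.variables.openaiUsed and its siblings, but nothing above increments them. A minimal sketch of the bookkeeping node to run after each LLM response, reusing the same variable-naming convention (adjust the usage fields to whatever your provider actually returns):
// Sketch: record tokens used against the provider that served this request
const provider = $input.item.json.provider; // e.g. 'openai', 'anthropic', 'google'
const usage = $input.item.json.usage || {};
const totalTokens = (usage.prompt_tokens || 0) + (usage.completion_tokens || 0);
// Matches the counters the load balancer reads (openaiUsed, anthropicUsed, googleUsed)
const counterKey = `${provider}Used`;
$workflow.variables[counterKey] = ($workflow.variables[counterKey] || 0) + totalTokens;
return {
json: {
...$input.item.json,
providerUsageRecorded: true,
[counterKey]: $workflow.variables[counterKey]
}
};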
Step 10: Implement Workflow Throttling and Scheduling
Control when and how often your workflows run:
Implementation:
// Example of workflow throttling logic
// This can be used at the beginning of workflows that use LLM services
// Configuration
const MAX_DAILY_EXECUTIONS = 50; // Maximum times this workflow can run per day
const PRIORITY_LEVEL = 2; // 1=high, 2=medium, 3=low
// Get workflow execution data
let executionData = $workflow.variables.executionData || {
dailyCount: 0,
lastResetDate: new Date().toISOString().split('T')[0],
lastExecutionTime: null
};
// Reset counter if it's a new day
const today = new Date().toISOString().split('T')[0];
if (today !== executionData.lastResetDate) {
executionData.dailyCount = 0;
executionData.lastResetDate = today;
}
// Check if we've hit the daily execution limit
if (executionData.dailyCount >= MAX_DAILY_EXECUTIONS) {
return {
json: {
error: true,
message: `Daily workflow execution limit reached (${executionData.dailyCount}/${MAX_DAILY_EXECUTIONS})`,
canProceed: false
}
};
}
// Implement dynamic delay based on priority and previous execution
let delayNeeded = 0;
const now = Date.now();
if (executionData.lastExecutionTime) {
const timeSinceLastExecution = now - new Date(executionData.lastExecutionTime).getTime();
// Define minimum delays based on priority (in milliseconds)
const minimumDelays = {
1: 1000, // High priority: 1 second
2: 5000, // Medium priority: 5 seconds
3: 30000 // Low priority: 30 seconds
};
const minimumDelay = minimumDelays[PRIORITY_LEVEL] || 5000;
if (timeSinceLastExecution < minimumDelay) {
delayNeeded = minimumDelay - timeSinceLastExecution;
}
}
if (delayNeeded > 0) {
return {
json: {
needsDelay: true,
delayMs: delayNeeded,
message: `Throttling workflow execution. Please wait ${delayNeeded/1000} seconds.`,
canProceed: false
}
};
}
// Update execution data
executionData.dailyCount++;
executionData.lastExecutionTime = new Date().toISOString();
$workflow.variables.executionData = executionData;
// Proceed with workflow
return {
json: {
executionCount: executionData.dailyCount,
priority: PRIORITY_LEVEL,
canProceed: true
}
};
Step 11: Optimize Context Window Usage
Manage the context window efficiently to reduce token usage:
Implementation:
// Example of context window management
// This optimizes context for multi-turn conversations
// Get conversation history
let conversationHistory = $workflow.variables.conversationHistory || [];
const MAX_HISTORY_TURNS = 5;
const MAX_TOKENS_PER_TURN = 500;
// User's current message
const userMessage = $input.item.json.message;
// Add user message to history
conversationHistory.push({
role: "user",
content: userMessage
});
// Optimize context if we have more than MAX_HISTORY_TURNS
if (conversationHistory.length > MAX_HISTORY_TURNS * 2) { // *2 because each turn has user+assistant
// Method 1: Simple truncation - keep only recent turns
conversationHistory = conversationHistory.slice(-MAX_HISTORY_TURNS * 2);
// Method 2: Summarize older turns (pseudocode)
// This would require an actual LLM call to summarize
/*
const oldTurns = conversationHistory.slice(0, -MAX_HISTORY_TURNS * 2);
const summarizationPrompt = `Summarize this conversation concisely: ${JSON.stringify(oldTurns)}`;
// Make LLM call to summarize old turns
// Replace old turns with summary
conversationHistory = [
{
role: "system",
content: "Previous conversation summary: " + summary
},
...conversationHistory.slice(-MAX_HISTORY_TURNS * 2)
];
*/
}
// Truncate long messages to reduce token count
conversationHistory = conversationHistory.map(msg => {
if (msg.content.length > MAX_TOKENS_PER_TURN * 4) { // Rough char to token conversion
return {
role: msg.role,
content: msg.content.substring(0, MAX_TOKENS_PER_TURN * 4) + "... [truncated]"
};
}
return msg;
});
// Save updated history
$workflow.variables.conversationHistory = conversationHistory;
// Prepare optimized context for LLM call
return {
json: {
messages: conversationHistory,
optimizedContext: true
}
};
Step 12: Use Code-Based Self-Hosted Alternatives
Consider using local models to avoid API quotas altogether:
Implementation:
// Example of integrating with a self-hosted LLM via HTTP Request
// This would be set up in an HTTP Request node
// Configuration for local LLM server
const LOCAL_LLM_URL = "http://your-server:8080/v1/completions";
const LOCAL_LLM_MODEL = "llama2-7b"; // Example model name
// Prepare request to local LLM
const requestOptions = {
url: LOCAL_LLM_URL,
method: "POST",
body: {
model: LOCAL_LLM_MODEL,
prompt: $input.item.json.query,
max_tokens: 500,
temperature: 0.7
},
headers: {
"Content-Type": "application/json"
}
};
// Decision logic to choose between API and local model
function shouldUseLocalModel() {
// Check if we're approaching API quota
const quotaData = $workflow.variables.quotaData || { dailyUsage: 0 };
const QUOTA_THRESHOLD = 0.8; // 80% of quota
const DAILY_QUOTA = 100000;
if (quotaData.dailyUsage / DAILY_QUOTA > QUOTA_THRESHOLD) {
return true;
}
// Check if the query is simple enough for local model
const query = $input.item.json.query;
const COMPLEXITY_THRESHOLD = 100; // Simple measure of complexity
if (query.length < COMPLEXITY_THRESHOLD) {
return true;
}
return false;
}
// Make the decision
if (shouldUseLocalModel()) {
// Return configuration for local model request
return {
json: {
useLocalModel: true,
requestOptions
}
};
} else {
// Use cloud API instead
return {
json: {
useLocalModel: false,
useCloudApi: true
}
};
}
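The node above only prepares requestOptions; something still has to send them. You can map the fields into an HTTP Request node with expressions, or make the call in a Code node as sketched here. This assumes the Code node's this.helpers.httpRequest helper and an OpenAI-compatible /v1/completions response shape from the local server, both of which depend on your setup:
// Sketch: call the self-hosted model using the prepared requestOptions
const { useLocalModel, requestOptions } = $input.item.json;
if (!useLocalModel) {
// Hand the item to the cloud-API branch unchanged
return $input.item;
}
const response = await this.helpers.httpRequest({
method: requestOptions.method,
url: requestOptions.url,
body: requestOptions.body,
headers: requestOptions.headers,
json: true
});
return {
json: {
// OpenAI-compatible completion servers usually return choices[0].text
response: response.choices?.[0]?.text || '',
source: 'local-model'
}
};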
Step 13: Implement Adaptive Retry Mechanisms
Create smart retry logic for failed API calls:
Implementation:
// Example of adaptive retry logic
// This handles different types of errors with appropriate strategies
// In a Function node after a failed API call
const error = $input.item.json.error || {};
const errorCode = error.code || 'unknown';
const errorMessage = error.message || 'Unknown error';
// Get retry data or initialize
let retryData = $workflow.variables.retryData || {
attemptCount: 0,
lastAttemptTime: null,
errors: []
};
// Add error to history
retryData.errors.push({
timestamp: new Date().toISOString(),
code: errorCode,
message: errorMessage
});
// Keep error history manageable
if (retryData.errors.length > 20) {
retryData.errors = retryData.errors.slice(-20);
}
// Check if we should retry
const MAX_RETRIES = 5;
let shouldRetry = retryData.attemptCount < MAX_RETRIES;
let retryDelay = 0;
// Different retry strategies based on error type
switch(errorCode) {
case 'rate_limit_exceeded':
// Rate limit errors - use exponential backoff
retryDelay = Math.pow(2, retryData.attemptCount) * 1000;
break;
case 'server_error':
case 'internal_server_error':
// Server errors - linear backoff
retryDelay = 5000 + (retryData.attemptCount * 2000);
break;
case 'invalid_request_error':
// Invalid request errors - don't retry, fix the request
shouldRetry = false;
break;
case 'quota_exceeded':
// Quota errors - wait longer or don't retry
if (retryData.attemptCount > 2) {
shouldRetry = false; // Don't keep retrying quota errors
} else {
retryDelay = 60000; // 1 minute
}
break;
default:
// Default strategy - moderate backoff
retryDelay = 3000 + (retryData.attemptCount * 1000);
break;
}
// Update retry data
retryData.attemptCount++;
retryData.lastAttemptTime = new Date().toISOString();
$workflow.variables.retryData = retryData;
// Determine if we need circuit breaking
const CIRCUIT_BREAKER_THRESHOLD = 3;
const recentErrors = retryData.errors.slice(-CIRCUIT_BREAKER_THRESHOLD);
const uniqueErrorCodes = new Set(recentErrors.map(e => e.code)).size;
// If we're seeing the same error repeatedly, consider circuit breaking
const needsCircuitBreak = recentErrors.length >= CIRCUIT_BREAKER_THRESHOLD && uniqueErrorCodes === 1;
if (needsCircuitBreak) {
shouldRetry = false;
}
// Return retry decision
return {
json: {
originalError: error,
shouldRetry,
retryDelay,
retryCount: retryData.attemptCount,
circuitBroken: needsCircuitBreak,
nextAction: shouldRetry ? 'retry' : 'fallback'
}
};
Step 14: Use Multi-Stage Processing Pipelines
Break complex tasks into smaller steps to optimize token usage:
Implementation:
// Example of a multi-stage processing pipeline
// This breaks a complex task into smaller steps
// Stage 1: Analyze and preprocess the query
function preprocessQuery(query) {
// Determine query type and required processing
const queryTypes = {
summarization: /summarize|summary|summarise|summarization/i,
translation: /translate|translation|convert to/i,
analysis: /analyze|analyse|analysis/i,
creative: /create|write|generate|creative/i,
factual: /what is|who is|when did|where is|how does|fact|information/i
};
// Identify query type
let queryType = 'general';
for (const [type, pattern] of Object.entries(queryTypes)) {
if (pattern.test(query)) {
queryType = type;
break;
}
}
// Extract key elements
const keyTerms = query.match(/\b\w{5,}\b/g) || [];
return {
originalQuery: query,
queryType,
keyTerms,
preprocessed: true
};
}
// Stage 2: Select appropriate model and approach
function selectProcessingStrategy(preprocessedData) {
const queryType = preprocessedData.queryType;
// Define strategies for different query types
const strategies = {
summarization: {
model: 'gpt-3.5-turbo',
systemPrompt: 'Summarize the following text concisely:',
tokenLimit: 500
},
translation: {
model: 'gpt-3.5-turbo',
systemPrompt: 'Translate the following text:',
tokenLimit: 400
},
analysis: {
model: 'gpt-4',
systemPrompt: 'Analyze the following information in detail:',
tokenLimit: 800
},
creative: {
model: 'gpt-4',
systemPrompt: 'Be creative and generate content based on this request:',
tokenLimit: 1000
},
factual: {
model: 'gpt-3.5-turbo',
systemPrompt: 'Provide factual information about:',
tokenLimit: 300
},
general: {
model: 'gpt-3.5-turbo',
systemPrompt: 'Respond to the following query:',
tokenLimit: 400
}
};
return {
...preprocessedData,
strategy: strategies[queryType] || strategies.general,
strategySelected: true
};
}
// Process the input query through the pipeline
const query = $input.item.json.query;
const preprocessed = preprocessQuery(query);
const strategyData = selectProcessingStrategy(preprocessed);
// Return the complete pipeline data
return {
json: {
pipelineData: strategyData,
readyForLLM: true
}
};
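A follow-up node can then turn the selected strategy into the actual request payload. A minimal sketch using the OpenAI-style chat message format (adapt the shape to whichever provider or n8n LLM node you use):
// Sketch: build a chat-completion request body from the selected strategy
const { strategy, originalQuery } = $input.item.json.pipelineData;
const requestBody = {
model: strategy.model,
max_tokens: strategy.tokenLimit,
messages: [
{ role: 'system', content: strategy.systemPrompt },
{ role: 'user', content: originalQuery }
]
};
// Pass this body to an HTTP Request node (or your LLM node) to make the call
return { json: { requestBody } };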
Step 15: Implement a Comprehensive Quota Management Dashboard
Create a complete monitoring and management system:
Implementation (conceptual setup for n8n):
// This is a conceptual implementation for a quota dashboard
// You would need to set up a separate workflow to collect and display this data
// 1. Collect data from all workflows
// Create a dedicated workflow that runs periodically and collects quota data
// In the collection workflow:
const allWorkflows = []; // You would fetch this from n8n API
// Initialize central quota storage
let quotaDatabase = $workflow.variables.quotaDatabase || {
providers: {
openai: { dailyUsage: 0, monthlyUsage: 0, quota: 100000, lastReset: null },
anthropic: { dailyUsage: 0, monthlyUsage: 0, quota: 80000, lastReset: null },
google: { dailyUsage: 0, monthlyUsage: 0, quota: 50000, lastReset: null }
},
workflows: {},
history: [],
alerts: []
};
// Reset counters if needed
const today = new Date().toISOString().split('T')[0];
for (const provider in quotaDatabase.providers) {
if (quotaDatabase.providers[provider].lastReset !== today) {
quotaDatabase.providers[provider].dailyUsage = 0;
quotaDatabase.providers[provider].lastReset = today;
// Log the reset
quotaDatabase.history.push({
timestamp: new Date().toISOString(),
event: 'daily_reset',
provider,
message: `Daily counter reset for ${provider}`
});
}
}
// For each workflow, collect usage data
// This would iterate through all workflows and retrieve their usage data
// Example of how you might process workflow data:
function processWorkflowData(workflowId, workflowName, usageData) {
// Initialize workflow in database if not exists
if (!quotaDatabase.workflows[workflowId]) {
quotaDatabase.workflows[workflowId] = {
name: workflowName,
dailyUsage: 0,
monthlyUsage: 0,
history: []
};
}
// Update workflow usage
const workflow = quotaDatabase.workflows[workflowId];
workflow.dailyUsage += usageData.tokens || 0;
workflow.monthlyUsage += usageData.tokens || 0;
// Add to history
workflow.history.push({
timestamp: new Date().toISOString(),
tokens: usageData.tokens,
provider: usageData.provider,
model: usageData.model
});
// Keep history manageable
if (workflow.history.length > 100) {
workflow.history = workflow.history.slice(-100);
}
// Update provider totals
if (usageData.provider && quotaDatabase.providers[usageData.provider]) {
quotaDatabase.providers[usageData.provider].dailyUsage += usageData.tokens || 0;
quotaDatabase.providers[usageData.provider].monthlyUsage += usageData.tokens || 0;
}
// Check for alerts
checkAlerts(workflowId, usageData.provider);
}
// Alert checking function
function checkAlerts(workflowId, provider) {
const workflow = quotaDatabase.workflows[workflowId];
const providerData = quotaDatabase.providers[provider];
// Alert thresholds
const WORKFLOW_ALERT_THRESHOLD = 10000; // Tokens per day per workflow
const PROVIDER_ALERT_THRESHOLD = 0.8; // 80% of quota
// Check workflow usage
if (workflow.dailyUsage > WORKFLOW_ALERT_THRESHOLD) {
createAlert('workflow_high_usage', `Workflow ${workflow.name} has high usage: ${workflow.dailyUsage} tokens today`);
}
// Check provider usage
if (providerData && providerData.quota > 0) {
const usageRatio = providerData.dailyUsage / providerData.quota;
if (usageRatio > PROVIDER_ALERT_THRESHOLD) {
createAlert('provider_quota_warning', `Provider ${provider} is at ${Math.round(usageRatio * 100)}% of daily quota`);
}
}
}
// Alert creation
function createAlert(type, message) {
// Check if similar alert already exists
const recentAlerts = quotaDatabase.alerts.filter(a =>
a.type === type &&
new Date(a.timestamp).getTime() > Date.now() - 3600000 // Last hour
);
if (recentAlerts.length === 0) {
quotaDatabase.alerts.push({
timestamp: new Date().toISOString(),
type,
message,
status: 'new'
});
// You could send notifications here (email, Slack, etc.)
}
}
// Save updated quota database
$workflow.variables.quotaDatabase = quotaDatabase;
// 2. Create dashboard endpoints
// You would create HTTP endpoints to serve this data to a frontend
// Example of an endpoint that returns the current quota status
function getDashboardData() {
const quotaDatabase = $workflow.variables.quotaDatabase || {};
// Process data for display
const providers = Object.entries(quotaDatabase.providers || {}).map(([name, data]) => {
return {
name,
dailyUsage: data.dailyUsage,
quota: data.quota,
percentUsed: data.quota > 0 ? Math.round((data.dailyUsage / data.quota) * 100) : 0,
status: data.quota > 0 && data.dailyUsage / data.quota > 0.9 ? 'critical' :
data.quota > 0 && data.dailyUsage / data.quota > 0.7 ? 'warning' : 'normal'
};
});
const workflows = Object.entries(quotaDatabase.workflows || {}).map(([id, data]) => {
return {
id,
name: data.name,
dailyUsage: data.dailyUsage,
monthlyUsage: data.monthlyUsage
};
}).sort((a, b) => b.dailyUsage - a.dailyUsage); // Sort by highest usage
const alerts = (quotaDatabase.alerts || [])
.filter(a => a.status === 'new')
.slice(-10); // Get most recent alerts
return {
providers,
workflows,
alerts,
totals: {
dailyTokens: providers.reduce((sum, p) => sum + p.dailyUsage, 0),
activeWorkflows: workflows.length
},
lastUpdated: new Date().toISOString()
};
}
Conclusion
Managing LLM usage quotas in n8n requires a multi-faceted approach. By implementing the strategies outlined in this guide, you can significantly optimize your LLM API consumption while maintaining the functionality your workflows need.
Key takeaways:
Understand your provider's quotas and track live usage instead of guessing.
Cache and batch requests so you only pay for calls that are truly necessary.
Rate-limit, throttle, and schedule workflows to stay under per-minute and per-day limits.
Trim prompts and manage context windows to cut token consumption.
Plan for quota exhaustion with fallbacks, adaptive retries, multiple providers, and self-hosted models.
By combining these approaches, you can build robust n8n workflows that make efficient use of LLM capabilities while staying within your quota limits and budget constraints. Remember to continuously monitor and refine your implementation as your usage patterns evolve.