Learn how to prevent duplicate LLM calls in n8n workflows by implementing caching with Function nodes, workflow data, or external databases to save time and reduce API costs.
To prevent duplicate LLM calls in n8n workflows, you can implement a caching layer using Function nodes, n8n's workflow static data, or an external database. The cache stores previously processed inputs and their corresponding outputs, so the workflow can check for an existing result before making a new API call to an LLM like GPT or Claude, saving both time and API costs.
Comprehensive Guide: Preventing Duplicate LLM Calls in n8n Workflows
Step 1: Understanding the Problem
When working with Large Language Models (LLMs) like OpenAI's GPT or Anthropic's Claude in n8n workflows, the same input can end up being sent to the LLM multiple times, resulting in unnecessary API costs, slower workflow executions, and a higher risk of hitting rate limits.
This guide presents multiple approaches to implementing caching mechanisms that prevent duplicate LLM calls by storing previous inputs and their corresponding outputs.
Step 2: Setting Up a Basic Caching System with Function Nodes
The simplest approach is to use a Function node with a plain key-value cache stored in the workflow's static data, keyed by the raw input text:
// Load the workflow's static data to use as a simple cache
const workflowData = await $getWorkflowStaticData("global");
if (!workflowData.llmCache) {
  workflowData.llmCache = {};
}
// Get the input text that would be sent to the LLM
const inputText = items[0].json.inputText;
// Check if we already have a cached result for this exact input
if (workflowData.llmCache[inputText]) {
  // Return the cached result
  items[0].json.llmResponse = workflowData.llmCache[inputText];
  return items;
}
// If not in cache, let the workflow continue to the LLM
return items;
After the LLM node, add another Function node to save the result:
// Load the same cache from the workflow's static data
const workflowData = await $getWorkflowStaticData("global");
if (!workflowData.llmCache) {
  workflowData.llmCache = {};
}
// Save the result to the cache
const inputText = items[0].json.inputText;
const llmResponse = items[0].json.llmResponse;
workflowData.llmCache[inputText] = llmResponse;
return items;
Step 3: Implementing a More Robust Cache with Workflow Data
For a more robust cache that persists between workflow runs, build on n8n's workflow static data (accessed with $getWorkflowStaticData), adding hashed keys, an explicit cache-hit flag, and a size limit:
Step 3.1: Make sure the workflow is saved and activated. Static data is only persisted for production executions; manual test runs do not save it.
Step 3.2: Create a Function node to check and retrieve from cache:
// Get input that would be sent to LLM
const inputText = items[0].json.inputText;
// Create a hash of the input to use as a key
// This handles large inputs better than using the raw text
function createHash(text) {
let hash = 0;
for (let i = 0; i < text.length; i++) {
const char = text.charCodeAt(i);
hash = ((hash << 5) - hash) + char;
hash = hash & hash; // Convert to 32bit integer
}
return hash.toString();
}
const inputHash = createHash(inputText);
// Load the cache from workflow data
const workflowData = await $getWorkflowStaticData("global");
if (!workflowData.llmCache) {
workflowData.llmCache = {};
}
// Check if we have a cached result
if (workflowData.llmCache[inputHash]) {
items[0].json.llmResponse = workflowData.llmCache[inputHash].response;
items[0].json.fromCache = true;
return items;
}
// If not cached, add a flag to process it
items[0].json.fromCache = false;
items[0].json.inputHash = inputHash;
return items;
Step 3.3: Add an IF node that checks the fromCache flag (a Boolean condition on {{ $json.fromCache }}); route the true branch around the LLM node and the false branch into it.
Step 3.4: After the LLM node, add a Function node to save to cache:
// Load the current cache
const workflowData = await $getWorkflowStaticData("global");
if (!workflowData.llmCache) {
workflowData.llmCache = {};
}
// Get the input hash and LLM response
const inputHash = items[0].json.inputHash;
const llmResponse = items[0].json.llmResponse;
// Save to cache with timestamp
workflowData.llmCache[inputHash] = {
response: llmResponse,
timestamp: Date.now()
};
// Optional: Limit cache size (keep only the last 100 entries)
const cacheKeys = Object.keys(workflowData.llmCache);
if (cacheKeys.length > 100) {
// Sort by timestamp (oldest first)
cacheKeys.sort((a, b) => workflowData.llmCache[a].timestamp - workflowData.llmCache[b].timestamp);
// Remove oldest entries
for (let i = 0; i < cacheKeys.length - 100; i++) {
delete workflowData.llmCache[cacheKeys[i]];
}
}
return items;
Step 4: Advanced Caching with External Database
For more persistent and scalable caching, use an external database:
Step 4.1: Add a MongoDB node (or your preferred database) to your workflow.
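For reference, a cached document in the collection could look like the object below. This is a suggested layout rather than a required schema, and the field names are assumptions reused through the rest of this step.
// Example shape of a cached document in the MongoDB collection
const exampleCacheDocument = {
  inputHash: "1837462912",                     // hash of the LLM input, used as the lookup key
  inputText: "Summarize the quarterly report", // optional: keep the raw input for debugging
  response: "The report shows steady growth.", // the cached LLM output
  timestamp: Date.now()                        // when the entry was cached
};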
Step 4.2: Create a Function node to check the cache:
// Get the input for the LLM
const inputText = items[0].json.inputText;
// Hash the input to use as the cache key (same simple hash as in Step 3)
function createHash(text) {
let hash = 0;
for (let i = 0; i < text.length; i++) {
const char = text.charCodeAt(i);
hash = ((hash << 5) - hash) + char;
hash = hash & hash;
}
return hash.toString();
}
const inputHash = createHash(inputText);
// Pass along the original input and the hash
items[0].json.inputHash = inputHash;
items[0].json.dbQuery = { inputHash: inputHash };
return items;
Step 4.3: Add a MongoDB node configured with the Find operation, querying your cache collection by the inputHash produced above:
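As a rough guide, the Find step could be configured along these lines; the field labels and collection name are assumptions, so adapt them to your MongoDB node version and setup.
// MongoDB node (Find operation), assumed configuration:
//   Collection:   llm_cache
//   Query (JSON): { "inputHash": "{{ $json.inputHash }}" }
// The expression pulls in the hash produced by the previous Function node.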
Step 4.4: Add an IF node to check whether the Find operation returned a cached document.
Step 4.5: If a result was found, add a Function node to extract it:
// Extract the cached response
// The Find operation returns the matching document as the item's json,
// so copy the stored response into the field the rest of the workflow expects
items[0].json.llmResponse = items[0].json.response;
items[0].json.fromCache = true;
return items;
Step 4.6: If no result was found, proceed to the LLM node, then add a MongoDB node (Insert operation) to save the new result to the cache collection.
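A small Function node placed just before the insert can flatten the fields to store. This is a sketch that assumes the document layout shown in Step 4.1 and that the MongoDB node is configured to insert the listed fields.
// Flatten the fields the MongoDB Insert operation should write
items[0].json.response = items[0].json.llmResponse;
items[0].json.timestamp = Date.now();
// inputHash (and optionally inputText) are already on the item from Step 4.2
return items;
In the MongoDB node, list inputHash, inputText, response, and timestamp as the fields to insert.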
Step 5: Implementing Cache Expiration
For time-sensitive applications, add cache expiration:
// In your cache checking function
const workflowData = await $getWorkflowStaticData("global");
if (!workflowData.llmCache) {
workflowData.llmCache = {};
}
const inputHash = createHash(items[0].json.inputText);
// Check if we have a cached result AND it's not expired
const CACHE_TTL = 24 * 60 * 60 * 1000; // 24 hours in milliseconds
const now = Date.now();
if (
workflowData.llmCache[inputHash] &&
(now - workflowData.llmCache[inputHash].timestamp) < CACHE_TTL
) {
// Use cached result
items[0].json.llmResponse = workflowData.llmCache[inputHash].response;
items[0].json.fromCache = true;
} else {
// If expired or not in cache, mark for processing
items[0].json.fromCache = false;
// Optionally clean up expired cache entries
if (workflowData.llmCache[inputHash]) {
delete workflowData.llmCache[inputHash];
}
}
return items;
Step 6: Implementing Semantic Caching
For more intelligent caching that can match semantically similar inputs:
Step 6.1: Create an embedding for each input using an embedding model (e.g., OpenAI's embeddings API).
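One way to fetch the embedding directly from a Code node is sketched below. It assumes the Code node's this.helpers.httpRequest helper, OpenAI's /v1/embeddings endpoint, and an API key available as the OPENAI_API_KEY environment variable; in practice you may prefer a dedicated OpenAI or HTTP Request node with stored credentials.
// Request an embedding for the input text (sketch; see assumptions above)
const inputText = items[0].json.inputText;
const response = await this.helpers.httpRequest({
  method: "POST",
  url: "https://api.openai.com/v1/embeddings",
  headers: {
    Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    "Content-Type": "application/json"
  },
  body: {
    model: "text-embedding-3-small",
    input: inputText
  },
  json: true
});
// Attach the embedding vector so the following caching nodes can use it
items[0].json.embedding = response.data[0].embedding;
return items;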
Step 6.2: Store both the input text and its embedding in your cache:
// After getting an embedding from OpenAI or another service
const workflowData = await $getWorkflowStaticData("global");
if (!workflowData.semanticCache) {
workflowData.semanticCache = [];
}
// Store the input, embedding, and response
workflowData.semanticCache.push({
inputText: items[0].json.inputText,
embedding: items[0].json.embedding, // This comes from an embedding API
response: items[0].json.llmResponse,
timestamp: Date.now()
});
return items;
Step 6.3: To check for similar inputs, calculate cosine similarity:
// Function to calculate cosine similarity between embeddings
function cosineSimilarity(a, b) {
let dotProduct = 0;
let normA = 0;
let normB = 0;
for (let i = 0; i < a.length; i++) {
dotProduct += a[i] * b[i];
normA += a[i] * a[i];
normB += b[i] * b[i];
}
normA = Math.sqrt(normA);
normB = Math.sqrt(normB);
return dotProduct / (normA * normB);
}
// Current input embedding (from an embedding API)
const currentEmbedding = items[0].json.embedding;
// Load the cache
const workflowData = await $getWorkflowStaticData("global");
if (!workflowData.semanticCache) {
workflowData.semanticCache = [];
items[0].json.fromCache = false;
return items;
}
// Find the most similar cached entry
let bestMatch = null;
let highestSimilarity = 0;
const SIMILARITY_THRESHOLD = 0.92; // Adjust as needed
for (const entry of workflowData.semanticCache) {
const similarity = cosineSimilarity(currentEmbedding, entry.embedding);
if (similarity > highestSimilarity) {
highestSimilarity = similarity;
bestMatch = entry;
}
}
// If we found a sufficiently similar entry, use it
if (bestMatch && highestSimilarity >= SIMILARITY_THRESHOLD) {
items[0].json.llmResponse = bestMatch.response;
items[0].json.similarity = highestSimilarity;
items[0].json.fromCache = true;
} else {
items[0].json.fromCache = false;
}
return items;
Step 7: Implementing a Cache Pruning Strategy
For long-running workflows, implement a strategy to prevent the cache from growing too large:
// Load the cache
const workflowData = await $getWorkflowStaticData("global");
if (!workflowData.llmCache) {
workflowData.llmCache = {};
}
// Maximum cache size
const MAX_CACHE_SIZE = 500;
// If the cache is too large, prune it
const cacheKeys = Object.keys(workflowData.llmCache);
if (cacheKeys.length > MAX_CACHE_SIZE) {
console.log(`Cache too large (${cacheKeys.length}), pruning...`);
// Strategy 1: Remove oldest entries
const sortedEntries = cacheKeys
.map(key => ({
key,
timestamp: workflowData.llmCache[key].timestamp || 0
}))
.sort((a, b) => a.timestamp - b.timestamp);
// Keep only the newest MAX_CACHE_SIZE entries
const keysToRemove = sortedEntries
.slice(0, sortedEntries.length - MAX_CACHE_SIZE)
.map(entry => entry.key);
// Remove the oldest entries
for (const key of keysToRemove) {
delete workflowData.llmCache[key];
}
console.log(`Removed ${keysToRemove.length} old cache entries`);
}
// Continue with normal cache operations...
Step 8: Creating a Reusable Caching Subworkflow
To make your caching system reusable across multiple workflows:
Step 8.1: Create a new workflow dedicated to cache management.
Step 8.2: Add an "Execute Workflow" trigger to make it callable from other workflows.
Step 8.3: Add input parameters to the "Execute Workflow" trigger, for example operation (get or set), cacheKey, and response.
Step 8.4: Implement the cache logic in this workflow with a Switch node based on the operation (a sketch of the underlying logic follows this list).
Step 8.5: In your main workflow, use the "Execute Workflow" node to call this caching workflow before and after your LLM node.
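Here is a minimal sketch of the cache logic behind the Switch branches, collapsed into a single Function node for brevity; it assumes the calling workflow passes operation ("get" or "set"), cacheKey, and, for writes, response:
// Cache-manager subworkflow: core logic (sketch)
const workflowData = await $getWorkflowStaticData("global");
if (!workflowData.llmCache) {
  workflowData.llmCache = {};
}
const { operation, cacheKey, response } = items[0].json;
if (operation === "get") {
  // Look up an existing entry and report whether it was found
  const entry = workflowData.llmCache[cacheKey];
  items[0].json.found = !!entry;
  items[0].json.response = entry ? entry.response : null;
} else if (operation === "set") {
  // Store the response under the provided key
  workflowData.llmCache[cacheKey] = {
    response: response,
    timestamp: Date.now()
  };
  items[0].json.saved = true;
}
return items;
Because the static data belongs to the caching workflow itself, every workflow that calls it shares a single cache.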
Step 9: Monitoring Cache Performance
To track how effective your cache is:
// At the end of your workflow, add this to a Function node
const workflowData = await $getWorkflowStaticData("global");
// Initialize stats if needed
if (!workflowData.cacheStats) {
workflowData.cacheStats = {
totalRequests: 0,
cacheHits: 0,
cacheMisses: 0,
lastReset: Date.now()
};
}
// Update stats
workflowData.cacheStats.totalRequests++;
if (items[0].json.fromCache) {
workflowData.cacheStats.cacheHits++;
} else {
workflowData.cacheStats.cacheMisses++;
}
// Calculate hit rate
const hitRate = (workflowData.cacheStats.cacheHits / workflowData.cacheStats.totalRequests) * 100;
// Add stats to the output
items[0].json.cacheStats = {
totalRequests: workflowData.cacheStats.totalRequests,
cacheHits: workflowData.cacheStats.cacheHits,
cacheMisses: workflowData.cacheStats.cacheMisses,
hitRate: `${hitRate.toFixed(2)}%`,
runningFor: `${Math.floor((Date.now() - workflowData.cacheStats.lastReset) / (1000 * 60 * 60))} hours`
};
return items;
Step 10: Handling Multiple Input Parameters
If your LLM calls include multiple parameters beyond just the input text:
// Create a composite key that includes all relevant parameters
function createCacheKey(params) {
// Sort keys to ensure consistent order
const sortedKeys = Object.keys(params).sort();
// Create a string representation of all parameters
const paramsString = sortedKeys
.map(key => `${key}:${JSON.stringify(params[key])}`)
.join('|');
// Create a hash of the parameters string
function hashString(str) {
let hash = 0;
for (let i = 0; i < str.length; i++) {
const char = str.charCodeAt(i);
hash = ((hash << 5) - hash) + char;
hash = hash & hash;
}
return hash.toString();
}
return hashString(paramsString);
}
// Example usage
const cacheKey = createCacheKey({
inputText: items[0].json.inputText,
temperature: items[0].json.temperature || 0.7,
maxTokens: items[0].json.maxTokens || 1000,
model: items[0].json.model || "gpt-3.5-turbo"
});
// Use this key in your cache lookups
items[0].json.cacheKey = cacheKey;
return items;
Step 11: Troubleshooting Common Issues
If you encounter issues with your caching implementation:
Issue 1: Cache not persisting between workflow runs. Workflow static data is only saved for production executions of a saved, active workflow; manual test runs do not persist it.
Issue 2: Cache always misses despite identical inputs. Make sure the cache key is built the same way on both the check and save sides (same fields, same order, same hashing) and that the prompt itself is not changing between runs, for example through embedded timestamps.
Issue 3: Cache grows too large. Apply the expiration (Step 5) or pruning (Step 7) strategies, or move the cache to an external database (Step 4). A quick way to reset the cache while debugging is shown below.
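When you need to start over while troubleshooting, a throwaway Function node can wipe the cached data; this sketch clears only the keys used in this guide.
// One-off Function node to reset the cache and its statistics
const workflowData = await $getWorkflowStaticData("global");
delete workflowData.llmCache;
delete workflowData.semanticCache;
delete workflowData.cacheStats;
items[0].json.cacheCleared = true;
return items;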
Step 12: Optimizing for Production Use
For production workflows:
// Add error handling to your cache operations
try {
const workflowData = await $getWorkflowStaticData("global");
if (!workflowData.llmCache) {
workflowData.llmCache = {};
}
// Cache operations...
} catch (error) {
console.error("Cache operation failed:", error.message);
// Provide a fallback behavior
items[0].json.fromCache = false;
items[0].json.cacheError = error.message;
}
return items;
Add logging for debugging:
// Add verbose logging for cache operations
console.log(`Cache operation: ${operation}`);
console.log(`Input hash: ${inputHash}`);
console.log(`Cache hit: ${!!workflowData.llmCache[inputHash]}`);
if (workflowData.llmCache[inputHash]) {
console.log(`Cached at: ${new Date(workflowData.llmCache[inputHash].timestamp).toISOString()}`);
}
// Continue with cache operations...
Conclusion
By implementing a caching mechanism for your LLM calls in n8n, you can significantly reduce API costs, improve workflow execution times, and avoid rate limiting issues. Choose the approach that best fits your specific needs, considering factors like persistence requirements, cache size, and the complexity of your inputs.
Remember that caching works best for deterministic queries where the same input should always produce the same output. For applications requiring fresh, non-deterministic, or context-dependent responses, you may need to modify these caching strategies or disable them entirely.
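If some executions must always get a fresh answer, one lightweight option is to honor a bypass flag in the cache-check Function node; the flag name used here (bypassCache) is just an example, not an n8n feature.
// In the cache-check Function node: skip the lookup when a bypass flag is set
const workflowData = await $getWorkflowStaticData("global");
const inputHash = items[0].json.inputHash;
if (items[0].json.bypassCache === true) {
  // Force a fresh LLM call regardless of what is cached
  items[0].json.fromCache = false;
  return items;
}
if (workflowData.llmCache && workflowData.llmCache[inputHash]) {
  items[0].json.llmResponse = workflowData.llmCache[inputHash].response;
  items[0].json.fromCache = true;
} else {
  items[0].json.fromCache = false;
}
return items;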