Learn how to monitor success rates of language model calls in n8n with error handling, logging, alerts, dashboards, and detailed analysis for optimal performance.
To monitor the success rates of language model calls in n8n, you can implement a tracking system using n8n's built-in functionality along with external monitoring tools. This involves tracking API responses, logging failures, calculating success metrics, and visualizing the data for analysis. By following this guide, you'll be able to set up a comprehensive monitoring system that helps you understand your language model's performance and address any issues promptly.
Step 1: Set Up Basic Error Handling in n8n Workflows
The first step is to implement proper error handling in your n8n workflows that make language model API calls. This allows you to capture when calls fail and why.
// Example of implementing error handling in Function node
const errorHandler = (error, itemIndex) => {
  // Log the error
  console.error(`Error in LLM call for item ${itemIndex}:`, error.message);
  // Return a standardized error object
  return {
    success: false,
    error: error.message,
    timestamp: new Date().toISOString(),
    itemIndex: itemIndex
  };
};
// In a Function node placed after the HTTP Request node that calls the language model
// (enable "Continue On Fail" on the HTTP Request node so failed calls still reach this node)
const itemIndex = 0; // index of the current item; loop over items if you process several
try {
  // Read the response returned for this item
  const response = $items("HTTP Request")[itemIndex].json;
  if (response.error) {
    throw new Error(typeof response.error === "string" ? response.error : JSON.stringify(response.error));
  }
  return {
    success: true,
    data: response,
    timestamp: new Date().toISOString(),
    itemIndex: itemIndex
  };
} catch (error) {
  return errorHandler(error, itemIndex);
}
Step 2: Create a Database for Storing Call Results
Set up a database to store the results of your language model calls. This will allow you to analyze the success rates over time.
You can use n8n's PostgreSQL, MySQL, or MongoDB nodes to connect to your database.
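If you go with PostgreSQL, a minimal sketch of a table covering the fields logged in the next step could look like this. The table and column names are suggestions, not a required schema; you can run the statement once through the Postgres node's Execute Query operation:
// In a Function node feeding a Postgres node set to "Execute Query"
// (llm_call_log and its columns are suggested names - adjust them to your setup)
const createTableSql = `
  CREATE TABLE IF NOT EXISTS llm_call_log (
    id               SERIAL PRIMARY KEY,
    timestamp        TIMESTAMPTZ NOT NULL,
    success          BOOLEAN NOT NULL,
    model_name       TEXT,
    error_message    TEXT,
    response_time_ms INTEGER,
    workflow_id      TEXT,
    execution_id     TEXT
  );
`;
return [{ json: { query: createTableSql } }];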
Step 3: Record Each Language Model Call
After each language model call, record the results in your database:
// In a Function node after your LLM API call
const startTime = new Date();
// Read the result of the LLM call (here taken from an HTTP Request node;
// if you need a precise duration, measure it around the node that makes the call)
const llmResponse = $node["HTTP Request"].json;
const endTime = new Date();
const responseTime = endTime - startTime;
// Prepare data for the database
const logData = {
  timestamp: new Date().toISOString(),
  success: !llmResponse.error,
  model_name: $json.modelName,
  error_message: llmResponse.error || null,
  response_time_ms: responseTime,
  workflow_id: $workflow.id,
  execution_id: $execution.id
};
return { logData };
// Then use a database node to insert this data
Step 4: Create Success Rate Calculation Workflow
Create a separate workflow that calculates success rates at regular intervals:
// In a Function node after retrieving data from your database
function calculateSuccessRate(records, timeframe) {
  const totalCalls = records.length;
  const successfulCalls = records.filter(record => record.success === true).length;
  return {
    timeframe,
    total_calls: totalCalls,
    successful_calls: successfulCalls,
    success_rate: totalCalls > 0 ? (successfulCalls / totalCalls) * 100 : 0,
    average_response_time: totalCalls > 0
      ? records.reduce((sum, record) => sum + record.response_time_ms, 0) / totalCalls
      : 0
  };
}
// Each database row arrives as one input item
const records = items.map(item => item.json);
// Calculate for different time periods
const last24Hours = records.filter(r =>
  new Date(r.timestamp) > new Date(Date.now() - 24 * 60 * 60 * 1000)
);
const last7Days = records.filter(r =>
  new Date(r.timestamp) > new Date(Date.now() - 7 * 24 * 60 * 60 * 1000)
);
return [
  calculateSuccessRate(last24Hours, "24 hours"),
  calculateSuccessRate(last7Days, "7 days")
];
Step 5: Set Up Alerting for Low Success Rates
Implement an alerting system that notifies you when success rates drop below a certain threshold:
// In a Function node after calculating success rates
const threshold = 95; // 95% success rate threshold
if ($json.success_rate < threshold) {
  return {
    alert: true,
    message: `Language model success rate has dropped to ${$json.success_rate.toFixed(2)}% in the last ${$json.timeframe}`,
    severity: $json.success_rate < 90 ? "high" : "medium",
    timestamp: new Date().toISOString()
  };
} else {
  return {
    alert: false
  };
}
Then use n8n's Slack, Email, or other notification nodes to send alerts when needed.
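For example, a Function node in front of the Slack node could turn the alert object into a message text. This is a minimal sketch; the field names match the alert object produced in the snippet above:
// In a Function node before a Slack (or Email) node
// Only forward items that actually raised an alert
const alerts = items.filter(item => item.json.alert === true);
return alerts.map(item => ({
  json: {
    // The Slack node can reference this field as the message text
    text: `:warning: [${item.json.severity.toUpperCase()}] ${item.json.message} (at ${item.json.timestamp})`
  }
}));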
Step 6: Create a Visual Dashboard
Set up a visual dashboard to monitor your success rates. You can use an external tool such as Grafana, or a simple web page fed by an n8n webhook (see Step 12).
Example of sending data to Grafana:
// In a Function node to format data for Grafana
// (the target/datapoints shape below follows the series format used by
// Grafana's SimpleJSON-style datasources; adapt it to how your dashboard ingests data)
const grafanaData = items.map(item => ({
  targets: [
    {
      target: "llm.success_rate",
      datapoints: [
        [item.json.success_rate, new Date().getTime()]
      ]
    },
    {
      target: "llm.response_time",
      datapoints: [
        [item.json.average_response_time, new Date().getTime()]
      ]
    }
  ]
}));
return { grafanaData };
Step 7: Implement Detailed Error Analysis
Create a workflow that analyzes errors to identify patterns:
// In a Function node after retrieving error data
function analyzeErrors(errorRecords) {
  // Group errors by type
  const errorTypes = {};
  errorRecords.forEach(record => {
    const errorMsg = record.error_message;
    errorTypes[errorMsg] = (errorTypes[errorMsg] || 0) + 1;
  });
  // Sort by frequency
  const sortedErrors = Object.entries(errorTypes)
    .sort((a, b) => b[1] - a[1])
    .map(([error, count]) => ({ error, count }));
  return {
    total_errors: errorRecords.length,
    error_breakdown: sortedErrors,
    most_common_error: sortedErrors.length > 0 ? sortedErrors[0] : null
  };
}
const failedCalls = items.filter(item => item.json.success === false).map(item => item.json);
return analyzeErrors(failedCalls);
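Raw error messages often differ only in details such as request IDs, so grouping on the exact string can fragment the counts. A minimal sketch of normalizing messages before grouping, intended to run in the same Function node as analyzeErrors above; the patterns are assumptions, so extend them for your provider's error formats:
// Collapse variable parts of error messages so similar errors group together
function normalizeError(message) {
  if (!message) return "unknown error";
  return message
    .replace(/[0-9a-f]{8}-[0-9a-f-]{27,}/gi, "<id>") // UUID-style request IDs
    .replace(/\d+(\.\d+)? ?(ms|s|seconds?)/gi, "<duration>") // timings
    .replace(/\d{3,}/g, "<num>"); // long numbers (status payloads, token counts)
}
// Pre-process the failed records before passing them to analyzeErrors()
const normalizedFailures = items
  .filter(item => item.json.success === false)
  .map(item => ({ ...item.json, error_message: normalizeError(item.json.error_message) }));
return analyzeErrors(normalizedFailures);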
Step 8: Track Success Rates by Model and Prompt Type
To get more granular insights, track success rates separately for each model and prompt type. The example below segments by model; a prompt-type variant follows it:
// In a Function node for segmented analysis
function calculateSegmentedSuccessRates(records) {
  // Group by model
  const modelGroups = {};
  records.forEach(record => {
    if (!modelGroups[record.model_name]) {
      modelGroups[record.model_name] = [];
    }
    modelGroups[record.model_name].push(record);
  });
  // Calculate success rates for each model
  const modelStats = {};
  for (const [model, modelRecords] of Object.entries(modelGroups)) {
    const total = modelRecords.length;
    const successful = modelRecords.filter(r => r.success).length;
    modelStats[model] = {
      total_calls: total,
      success_rate: (successful / total) * 100,
      avg_response_time: modelRecords.reduce((sum, r) => sum + r.response_time_ms, 0) / total
    };
  }
  return modelStats;
}
return calculateSegmentedSuccessRates(items.map(item => item.json));
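The same pattern works for prompt types, assuming you also log a prompt_type field with each call. That field is not part of the log data shown earlier, so treat this as a sketch:
// Group by prompt type instead of model (assumes a logged prompt_type field)
function calculateSuccessRatesByPromptType(records) {
  const stats = {};
  records.forEach(record => {
    const key = record.prompt_type || "unknown";
    if (!stats[key]) {
      stats[key] = { total_calls: 0, successful_calls: 0 };
    }
    stats[key].total_calls += 1;
    if (record.success) stats[key].successful_calls += 1;
  });
  // Derive success rates once counting is done
  for (const key of Object.keys(stats)) {
    stats[key].success_rate = (stats[key].successful_calls / stats[key].total_calls) * 100;
  }
  return stats;
}
return calculateSuccessRatesByPromptType(items.map(item => item.json));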
Step 9: Monitor Cost and Usage Together with Success Rates
Track the cost implications of your language model usage alongside success rates:
// In a Function node to calculate costs
function calculateCosts(records, costPerSuccessfulCall, costPerFailedCall) {
  const successfulCalls = records.filter(r => r.success).length;
  const failedCalls = records.length - successfulCalls;
  const successCost = successfulCalls * costPerSuccessfulCall;
  const failureCost = failedCalls * costPerFailedCall;
  return {
    total_cost: successCost + failureCost,
    success_cost: successCost,
    failure_cost: failureCost,
    cost_per_call: (successCost + failureCost) / records.length
  };
}
// Adjust these values based on your LLM provider's pricing
const costPerSuccessfulCall = 0.02; // $0.02 per successful call
const costPerFailedCall = 0.005; // $0.005 per failed call (some providers charge less for failures)
return calculateCosts(items.map(item => item.json), costPerSuccessfulCall, costPerFailedCall);
Step 10: Implement Scheduled Monitoring Reports
Set up a scheduled workflow to generate and send regular monitoring reports:
// In a Function node to generate a report
function generateReport(successRates, errorAnalysis, costData) {
  const now = new Date();
  const last24h = successRates.find(r => r.timeframe === "24 hours");
  return {
    report_title: `LLM Performance Report - ${now.toISOString().split('T')[0]}`,
    report_timestamp: now.toISOString(),
    summary: {
      overall_success_rate: last24h.success_rate,
      total_calls_24h: last24h.total_calls,
      avg_response_time_ms: last24h.average_response_time,
      total_cost_24h: costData.total_cost
    },
    success_rates: successRates,
    error_analysis: errorAnalysis,
    cost_analysis: costData,
    recommendations: generateRecommendations(successRates, errorAnalysis)
  };
}
function generateRecommendations(successRates, errorAnalysis) {
  const recommendations = [];
  if (successRates.find(r => r.timeframe === "24 hours").success_rate < 95) {
    recommendations.push("Investigate recent failures - success rate is below 95%");
  }
  if (errorAnalysis.most_common_error) {
    recommendations.push(`Address the most common error: "${errorAnalysis.most_common_error.error}"`);
  }
  return recommendations;
}
return generateReport($node["Success Rates"].json, $node["Error Analysis"].json, $node["Cost Analysis"].json);
Use an Email node to send this report to stakeholders on a daily or weekly basis.
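A small formatting step before the Email node keeps the message readable. Here is a minimal sketch that turns the report object into plain text; the field names follow the report generated above:
// In a Function node between the report generation and the Email node
const report = $json; // the report object produced in the previous node
const lines = [
  report.report_title,
  `Success rate (24h): ${report.summary.overall_success_rate.toFixed(2)}%`,
  `Total calls (24h): ${report.summary.total_calls_24h}`,
  `Avg response time: ${Math.round(report.summary.avg_response_time_ms)} ms`,
  `Cost (24h): $${report.summary.total_cost_24h.toFixed(2)}`,
  "",
  "Recommendations:",
  ...report.recommendations.map(r => `- ${r}`)
];
return [{ json: { subject: report.report_title, text: lines.join("\n") } }];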
Step 11: Implement A/B Testing for Different Prompts
Set up A/B testing to compare success rates between different prompt formats:
// In a Function node for A/B testing
function setupABTest(item, testId) {
  // Randomly assign variant A or B
  const variant = Math.random() < 0.5 ? 'A' : 'B';
  // Define different prompt templates for each variant
  const promptTemplates = {
    'A': `Standard prompt: ${item.json.promptBase}`,
    'B': `Enhanced prompt with examples: ${item.json.promptBase}\n\nFor example:\n${item.json.examples}`
  };
  return {
    test_id: testId,
    variant,
    prompt: promptTemplates[variant],
    original_item: item.json
  };
}
// Assign each item to a test group
return items.map(item => setupABTest(item, 'prompt-formatting-test'));
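Once the variant is stored alongside each logged call, comparing the two groups reuses the success-rate logic from Step 4. A minimal sketch, assuming a variant field was saved with each record:
// Compare success rates between A/B variants (assumes records carry a "variant" field)
function compareVariants(records) {
  const byVariant = { A: [], B: [] };
  records.forEach(record => {
    if (byVariant[record.variant]) byVariant[record.variant].push(record);
  });
  const rate = group =>
    group.length > 0 ? (group.filter(r => r.success).length / group.length) * 100 : 0;
  return {
    variant_a: { calls: byVariant.A.length, success_rate: rate(byVariant.A) },
    variant_b: { calls: byVariant.B.length, success_rate: rate(byVariant.B) },
    difference: rate(byVariant.A) - rate(byVariant.B)
  };
}
return [{ json: compareVariants(items.map(item => item.json)) }];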
Step 12: Create a Real-time Monitoring Dashboard with n8n
Build a real-time dashboard using n8n's webhook functionality:
// In a Function node to prepare dashboard data
function prepareDashboardData(successRates, currentStatus) {
  return {
    dashboard: {
      current_status: {
        status: currentStatus.success_rate > 98 ? "healthy" : "degraded",
        success_rate_current: currentStatus.success_rate,
        response_time_current: currentStatus.average_response_time
      },
      historical: {
        daily: successRates.filter(r => r.timeframe === "24 hours"),
        weekly: successRates.filter(r => r.timeframe === "7 days")
      },
      last_updated: new Date().toISOString()
    }
  };
}
// Use this with a Set node to update a static value
// Then create a webhook endpoint to expose this data
Then create a simple HTML dashboard that fetches this data via the webhook.
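On the page side, a small script that polls the webhook is enough to get started. This is a sketch; the URL and element IDs are placeholders, and the field names follow the dashboard object prepared above:
// Client-side script for the dashboard page (URL and element IDs are placeholders)
const WEBHOOK_URL = "https://your-n8n-instance/webhook/llm-dashboard";
async function refreshDashboard() {
  const response = await fetch(WEBHOOK_URL);
  const { dashboard } = await response.json();
  document.getElementById("status").textContent = dashboard.current_status.status;
  document.getElementById("success-rate").textContent =
    `${dashboard.current_status.success_rate_current.toFixed(2)}%`;
}
// Refresh every 60 seconds
refreshDashboard();
setInterval(refreshDashboard, 60 * 1000);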
Step 13: Implement Automatic Retries for Failed Calls
Set up automatic retries when language model calls fail:
// In a Function node
const maxRetries = 3;
const retryDelay = 1000; // ms
async function callLLMWithRetry(prompt, modelName, retryCount = 0) {
  try {
    // Your LLM API call
    const response = await makeAPICall(prompt, modelName);
    return {
      success: true,
      data: response,
      retries_needed: retryCount
    };
  } catch (error) {
    if (retryCount < maxRetries) {
      // Wait before retrying (simple linear backoff)
      await new Promise(resolve => setTimeout(resolve, retryDelay * (retryCount + 1)));
      return callLLMWithRetry(prompt, modelName, retryCount + 1);
    } else {
      return {
        success: false,
        error: error.message,
        retries_attempted: retryCount
      };
    }
  }
}
// Helper function for making the actual API call
// Implementation depends on your LLM provider; example for OpenAI using n8n's
// built-in HTTP request helper (available in the Code node - in older Function
// nodes you may need to route the call through an HTTP Request node instead).
// How you supply the API key also depends on your n8n setup, e.g. an environment variable.
const makeAPICall = async (prompt, modelName) => {
  const response = await this.helpers.httpRequest({
    method: "POST",
    url: "https://api.openai.com/v1/chat/completions",
    headers: {
      "Authorization": `Bearer ${$credentials.openAiApi.apiKey}`,
      "Content-Type": "application/json"
    },
    body: {
      model: modelName,
      messages: [{ role: "user", content: prompt }]
    },
    json: true
  });
  // The request helper returns the parsed response body
  return response;
};
return await callLLMWithRetry($json.prompt, $json.modelName);
Step 14: Implement Performance Benchmarking
Create a workflow that periodically runs benchmark tests against your language models:
// In a Function node to set up benchmark tests
function createBenchmarkTests() {
  const standardPrompts = [
    { id: "simple_qa", prompt: "What is the capital of France?", expected_type: "factual" },
    { id: "code_generation", prompt: "Write a function to calculate fibonacci numbers", expected_type: "code" },
    { id: "creative", prompt: "Write a short poem about artificial intelligence", expected_type: "creative" }
  ];
  const models = [
    "gpt-3.5-turbo",
    "gpt-4",
    "claude-2"
    // Add other models you use
  ];
  const benchmarkTests = [];
  // Create all combinations of prompts and models
  for (const prompt of standardPrompts) {
    for (const model of models) {
      benchmarkTests.push({
        test_id: `${prompt.id}_${model}`,
        prompt: prompt.prompt,
        model: model,
        expected_type: prompt.expected_type,
        timestamp: new Date().toISOString()
      });
    }
  }
  return benchmarkTests;
}
return createBenchmarkTests();
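After running each benchmark prompt through your model, a lightweight check can decide whether the response looks usable. This is a sketch with simple heuristics; the checks and the response_text field are assumptions, so adjust them to what you consider a pass:
// Score benchmark responses with simple per-type heuristics
function checkBenchmarkResponse(test, responseText) {
  const text = (responseText || "").trim();
  let passed = false;
  if (test.expected_type === "factual") {
    passed = text.length > 0 && text.length < 500; // short, non-empty answer
  } else if (test.expected_type === "code") {
    passed = /function|def |=>/.test(text); // contains something code-like
  } else if (test.expected_type === "creative") {
    passed = text.split(/\s+/).length >= 10; // at least a handful of words
  }
  return {
    test_id: test.test_id,
    model: test.model,
    passed,
    response_length: text.length,
    checked_at: new Date().toISOString()
  };
}
// Assumes each input item carries the benchmark test plus the model's response text
return items.map(item => ({
  json: checkBenchmarkResponse(item.json, item.json.response_text)
}));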
Step 15: Create a Comprehensive Monitoring System
Finally, tie all the previous components together into a comprehensive monitoring system:
Master workflow example:
// In a Function node at the start of your master monitoring workflow
const monitoringTasks = [
  {
    task: "collect_recent_data",
    description: "Retrieve recent LLM call data from database",
    next: "calculate_success_rates"
  },
  {
    task: "calculate_success_rates",
    description: "Calculate success rates for different time periods",
    next: "analyze_errors"
  },
  {
    task: "analyze_errors",
    description: "Analyze error patterns",
    next: "generate_cost_analysis"
  },
  {
    task: "generate_cost_analysis",
    description: "Calculate cost metrics",
    next: "run_benchmarks"
  },
  {
    task: "run_benchmarks",
    description: "Run standard benchmark tests",
    next: "generate_report"
  },
  {
    task: "generate_report",
    description: "Generate comprehensive monitoring report",
    next: "update_dashboard"
  },
  {
    task: "update_dashboard",
    description: "Update real-time monitoring dashboard",
    next: "send_notifications"
  },
  {
    task: "send_notifications",
    description: "Send alerts for any issues detected",
    next: null
  }
];
return { monitoringTasks };
By implementing this comprehensive monitoring system, you'll gain full visibility into the success rates of your language model calls in n8n, enabling you to quickly identify and address any issues that arise.