Changelog

New features, integrations, and research from Neurometric AI.

Product

Released Clawbake, an open-source Kubernetes system that lets team members spin up isolated OpenClaw AI agent instances, managed via a web dashboard or Slack.

  • Isolated environments with independent namespaces, API keys, and storage

  • Slack integration with /clawbake commands for instance management

  • Kubernetes operator pattern with automatic drift correction

  • Network, credential, and workload isolation across users
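
As a rough illustration of what "automatic drift correction" means in an operator pattern, the sketch below compares a desired state with an observed state and lists the actions needed to bring them back in line. It is not Clawbake's code: the `reconcile` function, the instance names, and the spec fields are all hypothetical, and a real operator would talk to the Kubernetes API rather than to plain dictionaries.

```python
def reconcile(desired: dict, observed: dict) -> list[tuple[str, str]]:
    """Return the (action, instance_name) pairs needed to remove drift.

    `desired` and `observed` map a hypothetical instance name to its spec.
    """
    actions = []
    for name, spec in desired.items():
        if name not in observed:
            actions.append(("create", name))   # declared but missing
        elif observed[name] != spec:
            actions.append(("update", name))   # exists but has drifted
    for name in observed:
        if name not in desired:
            actions.append(("delete", name))   # running but no longer declared
    return actions


# Hypothetical example data, not real Clawbake resources.
desired_state = {"alice": {"replicas": 1}, "bob": {"replicas": 1}}
observed_state = {"alice": {"replicas": 2}, "carol": {"replicas": 1}}
drift_actions = reconcile(desired_state, observed_state)
```

An operator would typically run a loop like this repeatedly, so that manual changes to a running instance are reverted on the next pass.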

Product

Evaluate model outputs with cosine similarity when your organization has no established eval dataset. Compare models against each other to optimize for speed and cost without needing ground-truth labels.

  • Cosine similarity scoring for text-based responses

  • Relative model comparison without eval datasets

  • Speed and cost optimization decisions from real outputs
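
To make the scoring idea concrete, here is a minimal sketch of cosine similarity between text responses. It assumes a simple word-count vector as a stand-in for whatever embedding Neurometric actually uses, which the changelog does not specify. The function names and the baseline-versus-candidate setup are illustrative assumptions.

```python
import math
import re
from collections import Counter


def term_vector(text: str) -> Counter:
    """Lowercased word counts; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty response has no direction to compare
    return dot / (norm_a * norm_b)


def score_against_baseline(baseline: str, candidates: dict[str, str]) -> dict[str, float]:
    """Score each candidate model's response against a baseline response."""
    base_vec = term_vector(baseline)
    return {
        name: cosine_similarity(base_vec, term_vector(text))
        for name, text in candidates.items()
    }
```

A cheaper or faster model whose responses score close to the baseline model's is a candidate for substitution. The score measures similarity to the baseline, not correctness, so it supports relative comparison only.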

Integration

Turn Claude Code sessions into a testing lab. Automatically evaluate prompts across different models and reasoning configurations with zero code changes.

  • Automatic prompt capture from Claude Code sessions

  • Gateway routing through api.neurometric.ai

  • Built-in /neurometric-status and /neurometric-replay commands

Product

Identify the fastest models for your agent workloads. Radar graphs compare models across cost, speed, and token efficiency, with latency improvements averaging 4-5x.

  • Radar graph model comparisons

  • Latency-focused optimization (avg 4-5x speed gains)

  • Free access at studio.neurometric.ai

Research

Published a comprehensive checklist for evaluating AI infrastructure and deploying Small Language Models for speed and cost reduction.

  • Step-by-step infrastructure evaluation framework

  • SLM deployment guide for production workloads

  • Cost-performance optimization strategies

Research

Explores the future of self-optimizing AI infrastructure where systems autonomously manage model selection, routing, and configuration without human intervention.

  • Intelligent gateway for real-time request routing

  • Automated model evaluation pipelines

  • Cost-quality optimizers with dynamic budget constraints
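
One simple way to picture a cost-quality optimizer with a budget constraint is: among the models that fit the current per-request budget, pick the one with the highest quality score. The sketch below assumes exactly that rule. The `ModelOption` fields, the scores, and the prices are hypothetical and are not taken from the research piece.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelOption:
    name: str
    quality: float            # hypothetical eval score in [0, 1]
    cost_per_request: float   # hypothetical cost in USD


def pick_model(options: list[ModelOption], budget_per_request: float) -> Optional[ModelOption]:
    """Return the highest-quality option within budget, or None if none fits."""
    affordable = [o for o in options if o.cost_per_request <= budget_per_request]
    if not affordable:
        return None
    # Prefer higher quality; break ties by choosing the cheaper option.
    return max(affordable, key=lambda o: (o.quality, -o.cost_per_request))


# Hypothetical catalogue for illustration only.
catalogue = [
    ModelOption("small", quality=0.70, cost_per_request=0.001),
    ModelOption("mid", quality=0.85, cost_per_request=0.010),
    ModelOption("frontier", quality=0.95, cost_per_request=0.050),
]
```

Because the budget is an argument rather than a constant, the same function can be called again whenever the budget changes, which is the "dynamic" part of the constraint.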

Integration

Import your existing Langfuse traces directly into Neurometric for instant analysis. One-click connection with automatic trace sync.

  • One-click Langfuse connection

  • Automatic trace import and sync

  • Supports EU and US Langfuse regions

Product

Explore 4-stage Chain of Thought reasoning across different models. Compare how models break down complex problems step by step.

  • 4-stage CoT visualization

  • Side-by-side model comparison

  • Step-by-step reasoning analysis

Announced a post-trained GPT-4o-style model based on Arcee 400B as a free and open-source alternative for production inference.

Research

Fine-tuned a 4B parameter Small Language Model to achieve 95% accuracy on CRM-Arena, demonstrating that smaller models can outperform frontier models on domain-specific tasks.

  • 4B parameter model fine-tuned for CRM tasks

  • 95% accuracy, outperforming larger frontier models

  • Fraction of the inference cost

Research

Guide to transitioning from tactical model selection to systematic portfolio management across 10+ models with dedicated AI Model Ops infrastructure.

  • Portfolio management across frontier, mid-tier, SLM, and custom models

  • Multi-dimensional measurement at task-model-algorithm level

  • Continuous automated testing infrastructure

Product

Full experiment pipeline for model evaluation: configure evaluations across multiple models, run automated comparisons, and review results with LLM-as-Judge scoring.

  • Multi-model experiment configuration

  • Automated prompt evaluation across 6+ models

  • LLM-as-Judge pass/fail scoring

  • Cost, latency, and token analysis per model
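
The per-model review step can be pictured as a simple aggregation over individual runs. The sketch below assumes each run already carries a pass/fail verdict from the judge plus cost, latency, and token figures. The `RunResult` record and the `summarize` function are hypothetical names, not part of the Neurometric API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class RunResult:
    model: str
    passed: bool        # pass/fail verdict from the judge model
    cost_usd: float
    latency_ms: float
    tokens: int


def summarize(results: list[RunResult]) -> dict[str, dict[str, float]]:
    """Aggregate pass rate, average latency, total cost, and total tokens per model."""
    grouped = defaultdict(list)
    for result in results:
        grouped[result.model].append(result)

    summary = {}
    for model, runs in grouped.items():
        n = len(runs)
        summary[model] = {
            "pass_rate": sum(r.passed for r in runs) / n,
            "avg_latency_ms": sum(r.latency_ms for r in runs) / n,
            "total_cost_usd": sum(r.cost_usd for r in runs),
            "total_tokens": sum(r.tokens for r in runs),
        }
    return summary
```

Keeping the raw runs separate from the summary makes it easy to recompute the table when the judge prompt or the model list changes.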

Product

Added NVIDIA Nemotron models to the CRM-Arena leaderboard for testing and benchmarking against other production models.

Integration

Connect your Langfuse account to Neurometric and import existing LLM traces. Auto-sync prompts, costs, and latency data for immediate analysis.

  • API key-based connection

  • Automatic prompt and cost sync

  • Real-time monitoring support

Product

Launched the CRM-Arena leaderboard for benchmarking language models on real-world CRM tasks. Test models across customer service, sales, and support scenarios.

  • Real-world CRM task benchmarks

  • Multi-model comparison

  • Production-relevant evaluation metrics

Research

Published research on how inference-time compute optimizations and task-specific algorithms improve model performance on production CRM workloads.
