# Loop v2026.01.29 - Beta Release
Released: January 29, 2026
This beta introduces datasets and evaluation workflows: organize LLM spans into datasets, run evaluations with custom evaluators, compare alternative responses, and track results over time.
## Features & Enhancements
- Datasets — Create and manage datasets of LLM spans with full CRUD operations, version history, and multi-row selection. Organizations can use datasets to arrange spans for evaluation and comparison tasks.
- Dataset Remix — Generate alternative LLM responses for dataset spans using different models or providers. Examine outputs side by side in an inline, expandable comparison view and track results in a leaderboard.
- Evaluators & Evaluations — Define custom evaluators with prompt templates, including built-in defaults. Run evaluations against individual spans or entire datasets with live streaming progress updates and stop/restart controls.
- Manual Scoring — Manually score dataset spans with custom score titles for human-in-the-loop evaluation workflows.
- Evaluation Results — View evaluation results with delta comparisons against previous runs, per-evaluator variance statistics, and visual stat bars for quick insight.
- Improved Onboarding — Redesigned welcome page with an interactive demo project containing pre-seeded data and automatic navigator expansion for first-time users.
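As an illustration of the kind of run-over-run comparison the results view surfaces, here is a small sketch computing per-evaluator deltas and variance. The evaluator names and scores are hypothetical, and this is not Loop's API:

```python
from statistics import mean, pvariance

# Hypothetical per-span scores from two evaluation runs of the same
# dataset; evaluator names and numbers are made up for illustration.
previous = {"helpfulness": [0.70, 0.80, 0.60], "accuracy": [0.90, 0.85, 0.95]}
current = {"helpfulness": [0.75, 0.90, 0.70], "accuracy": [0.90, 0.80, 1.00]}

for evaluator, scores in current.items():
    delta = mean(scores) - mean(previous[evaluator])  # change vs. previous run
    variance = pvariance(scores)                      # spread within this run
    print(f"{evaluator}: mean delta {delta:+.3f}, variance {variance:.4f}")
```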
## Fixes & Improvements
- Column Persistence — Fixed column order and visibility not persisting correctly in spans table.
- Timestamp Handling — Improved nanosecond timestamp handling across the codebase to prevent precision issues.
- Cost Tracking — Fixed floating-point precision artifacts in cost calculations and improved cost chart accuracy.
- UI Improvements — Various fixes for table styling, tooltip behavior, context menu focus, and layout stability.
- Model Prices — Updated model pricing and context window data.
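The timestamp fix above reflects a general pitfall: nanosecond Unix timestamps are large enough that round-tripping them through a 64-bit float silently drops the low-order digits. A minimal sketch of the failure mode (illustrative only, not Loop's actual code):

```python
# A nanosecond Unix timestamp (~1.7e18) exceeds the largest integer a
# float64 can represent exactly (2**53, about 9.0e15), so converting
# through float rounds it to a multiple of 256 ns at this magnitude.
ns = 1_738_108_800_123_456_789   # hypothetical span start, ns since epoch
lossy = int(float(ns))
print(ns == lossy)               # False: low-order nanoseconds were lost

# Keeping the value as an int end to end preserves full precision.
print(ns == int(str(ns)))        # True
```

This is why timestamp-heavy code paths typically carry nanoseconds as integers (or decimal strings) rather than floats.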
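Similarly, the cost-tracking fix targets a classic floating-point issue: binary floats cannot represent most decimal prices exactly, so summed costs pick up visible artifacts. One standard remedy is exact decimal arithmetic, sketched here with Python's `decimal` module as an assumed approach, not necessarily Loop's:

```python
from decimal import Decimal

# Binary float arithmetic on decimal dollar amounts leaves artifacts:
print(0.1 + 0.2)   # 0.30000000000000004

# Doing the math in Decimal (constructed from strings, never from
# floats) keeps exact decimal values; round once at display time.
total = Decimal("0.1") + Decimal("0.2")
print(total)       # 0.3
```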