# Loop v2026.01.29 - Beta Release
Released: January 29, 2026
This beta introduces datasets and evaluation workflows: organize LLM spans into datasets, run evaluations with custom evaluators, compare alternative responses, and track results over time.
## Features & Enhancements
- Datasets — Create and manage datasets of LLM spans with full CRUD operations, version history, and multi-row selection. Organizations can use datasets to arrange spans for evaluation and comparison tasks.
- Dataset Remix — Generate alternative LLM responses for dataset spans using different models or providers. Examine outputs side by side in an inline, expandable comparison view and track results in a leaderboard.
- Evaluators & Evaluations — Define custom evaluators with prompt templates, including built-in defaults. Run evaluations against individual spans or entire datasets with live streaming progress updates and stop/restart controls.
- Manual Scoring — Manually score dataset spans with custom score titles for human-in-the-loop evaluation workflows.
- Evaluation Results — View evaluation results with delta comparisons against previous runs, per-evaluator variance statistics, and visual stat bars for quick insight.
- Improved Onboarding — Redesigned welcome page with an interactive demo project containing pre-seeded data and automatic navigator expansion for first-time users.
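As an illustration of the kind of run-over-run comparison the results view surfaces, here is a small sketch computing per-evaluator deltas and variance. The evaluator names and scores are hypothetical, and this is not Loop's API:

```python
from statistics import mean, pvariance

# Hypothetical per-span scores from two evaluation runs of the same
# dataset; evaluator names and numbers are made up for illustration.
previous = {"helpfulness": [0.70, 0.80, 0.60], "accuracy": [0.90, 0.85, 0.95]}
current = {"helpfulness": [0.75, 0.90, 0.70], "accuracy": [0.90, 0.80, 1.00]}

for evaluator, scores in current.items():
    delta = mean(scores) - mean(previous[evaluator])  # change vs. previous run
    variance = pvariance(scores)                      # spread within this run
    print(f"{evaluator}: mean delta {delta:+.3f}, variance {variance:.4f}")
```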
## Fixes & Improvements
- Column Persistence — Fixed column order and visibility not persisting correctly in spans table.
- Timestamp Handling — Improved nanosecond timestamp handling across the codebase to prevent precision issues.
- Cost Tracking — Fixed floating-point precision artifacts in cost calculations and improved cost chart accuracy.
- UI Improvements — Various fixes for table styling, tooltip behavior, context menu focus, and layout stability.
- Model Prices — Updated model pricing and context window data.
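The timestamp fix above reflects a general pitfall: nanosecond Unix timestamps are large enough that round-tripping them through a 64-bit float silently drops the low-order digits. A minimal sketch of the failure mode (illustrative only, not Loop's actual code):

```python
# A nanosecond Unix timestamp (~1.7e18) exceeds the largest integer a
# float64 can represent exactly (2**53, about 9.0e15), so converting
# through float rounds it to a multiple of 256 ns at this magnitude.
ns = 1_738_108_800_123_456_789   # hypothetical span start, ns since epoch
lossy = int(float(ns))
print(ns == lossy)               # False: low-order nanoseconds were lost

# Keeping the value as an int end to end preserves full precision.
print(ns == int(str(ns)))        # True
```

This is why timestamp-heavy code paths typically carry nanoseconds as integers (or decimal strings) rather than floats.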
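Similarly, the cost-tracking fix targets a classic floating-point issue: binary floats cannot represent most decimal prices exactly, so summed costs pick up visible artifacts. One standard remedy is exact decimal arithmetic, sketched here with Python's `decimal` module as an assumed approach, not necessarily Loop's:

```python
from decimal import Decimal

# Binary float arithmetic on decimal dollar amounts leaves artifacts:
print(0.1 + 0.2)   # 0.30000000000000004

# Doing the math in Decimal (constructed from strings, never from
# floats) keeps exact decimal values; round once at display time.
total = Decimal("0.1") + Decimal("0.2")
print(total)       # 0.3
```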