Loop v2026.01.29 - Beta Release#

Released: January 29, 2026

This beta introduces datasets and evaluation workflows to organize LLM spans, run evaluations with custom evaluators, compare alternative responses, and track results over time.

Features & Enhancements#

  • Datasets — Create and manage datasets of LLM spans with full CRUD operations, version history, and multi-row selection. Organizations can use datasets to group spans for evaluation and comparison tasks.

  • Dataset Remix — Generate alternative LLM responses for dataset spans using different models or providers. Users can examine outputs side by side in an inline, expandable comparison view and track results in a leaderboard.

  • Evaluators & Evaluations — Define custom evaluators from prompt templates, with built-in defaults provided. Run evaluations against individual spans or entire datasets with live streaming progress updates and stop/restart controls.

  • Manual Scoring — Manually score dataset spans with custom score titles for human-in-the-loop evaluation workflows.

  • Evaluation Results — View evaluation results with delta comparisons from previous runs, variance statistics per evaluator, and visual stat bars for quick insights.

  • Improved Onboarding — Redesigned welcome page with interactive demo project containing pre-seeded data and automatic navigator expansion for first-time users.
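The delta and variance statistics mentioned under Evaluation Results can be sketched generically. This is not Loop's actual data model or API; the dictionaries, field names, and `summarize` helper below are hypothetical, and only illustrate how a per-evaluator mean, variance, and delta against a previous run might be computed.

```python
from statistics import pvariance

# Hypothetical score data: per-evaluator lists of 0-1 scores from two
# evaluation runs (not Loop's real schema).
previous_run = {"helpfulness": [0.6, 0.7, 0.65], "accuracy": [0.8, 0.9, 0.85]}
current_run = {"helpfulness": [0.7, 0.75, 0.8], "accuracy": [0.8, 0.85, 0.9]}

def summarize(prev, curr):
    """Per-evaluator mean, variance, and delta vs. the previous run."""
    stats = {}
    for evaluator, scores in curr.items():
        mean = sum(scores) / len(scores)
        prev_scores = prev.get(evaluator, [])
        prev_mean = sum(prev_scores) / len(prev_scores) if prev_scores else 0.0
        stats[evaluator] = {
            "mean": mean,
            "variance": pvariance(scores),  # spread of scores within the run
            "delta": mean - prev_mean,      # change against the previous run
        }
    return stats

results = summarize(previous_run, current_run)
```

A positive `delta` with low `variance` suggests a consistent improvement, which is the kind of quick read the stat bars are meant to give.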

Fixes & Improvements#

  • Column Persistence — Fixed column order and visibility not persisting correctly in the spans table.
  • Timestamp Handling — Improved nanosecond timestamp handling across the codebase to prevent precision issues.
  • Cost Tracking — Fixed floating-point precision artifacts in cost calculations and improved cost chart accuracy.
  • UI Improvements — Various fixes for table styling, tooltip behavior, context menu focus, and layout stability.
  • Model Prices — Updated model pricing and context window data.
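The timestamp fix concerns a general hazard worth noting: nanosecond Unix timestamps are large enough to be silently rounded if they ever pass through a double-precision float. A minimal illustration (independent of Loop's codebase):

```python
import time

# Nanosecond epoch timestamps (~1.7e18) exceed float64's exact-integer
# range (2**53, ~9.0e15), so a round trip through float rounds them to
# the nearest representable value and corrupts the low digits.
ns = 1_700_000_000_123_456_789  # example nanosecond epoch timestamp

lossy = int(float(ns))   # precision already lost here
assert lossy != ns

# Keeping timestamps as integers end to end avoids the problem;
# time.time_ns() returns an int, unlike time.time() which is a float.
now_ns = time.time_ns()
assert isinstance(now_ns, int)
```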
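The cost-tracking fix likewise reflects a standard pitfall: binary floats cannot represent most decimal prices exactly, so accumulated per-token costs pick up visible artifacts. A generic sketch of the usual remedy, using Python's `decimal` module rather than Loop's actual implementation:

```python
from decimal import Decimal

# The classic symptom of binary floating point with decimal prices:
assert 0.1 + 0.2 != 0.3

# Decimal keeps decimal arithmetic exact, which is what money needs.
assert Decimal("0.1") + Decimal("0.2") == Decimal("0.3")

# Cost for 1,234 tokens at a hypothetical $0.002 per 1K tokens,
# computed without binary-float rounding:
cost = Decimal("0.002") * Decimal(1234) / Decimal(1000)
assert cost == Decimal("0.002468")
```

Constructing `Decimal` from strings (not floats) matters: `Decimal(0.002)` would capture the inexact binary value, while `Decimal("0.002")` is exact.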