Skip to content

AI

forge.lthn.ai/lthn/lem

LEM (Lethean Evaluation Model) is an AI training and evaluation platform. A 1-billion-parameter model trained with 5 axioms consistently outperforms untrained models 27 times its size. 29 models tested, 3,000+ individual runs, two independent probe sets. Fully reproducible on Apple Silicon.

Benchmark Highlights

Model Params v2 Score Notes
Gemma3 12B + LEK kernel 12B 23.66 Best kernel-boosted
Gemma3 27B + LEK kernel 27B 23.26
LEK-Gemma3 1B baseline 1B 21.74 Axioms in weights
Base Gemma3 4B 4B 21.12 Untrained
Base Gemma3 12B 12B 20.47 Untrained

Packages

pkg/lem

The core engine — 75+ files covering the full pipeline:

  • Training: distillation, conversation generation, attention analysis, grammar integration
  • Scoring: heuristic probes, tiered scoring, coverage analysis, judge evaluation
  • Inference: Metal and mlx-lm backends, worker pool, client API
  • Data: InfluxDB time-series storage, Parquet export, zstd compression, ingestion
  • Publishing: Forgejo, Hugging Face, Docker registry integration
  • Analytics: cluster analysis, feature extraction, metrics, comparison tools

pkg/lab

LEM Lab — model store, configuration, local experimentation environment.

pkg/heuristic

Standalone heuristic scoring engine for probe evaluation.

Binaries

Command Purpose
lemcmd Main CLI — training, scoring, publishing
scorer Standalone scoring binary
lem-desktop Desktop app (LEM Lab UI)
composure-convert Training data format conversion
dedup-check Dataset deduplication checker

Repository