Skip to content

go-ml

forge.lthn.ai/core/go-ml provides pluggable inference backends, a multi-suite scoring engine with ethics-aware probes, GGUF model management, and a concurrent worker pipeline for batch evaluation.

Module: forge.lthn.ai/core/go-ml Size: ~7,500 LOC across 41 Go files, 6 test files Licence: EUPL-1.2

Core Capabilities

Area Description
Inference backends MLX (Metal GPU), llama.cpp (subprocess), HTTP (Ollama/vLLM/OpenAI-compatible)
Scoring engine Heuristic (regex), semantic (LLM judge), exact match, ethics probes
Agent orchestrator Multi-model scoring runs with concurrent worker pool
Model management GGUF format parsing, MLX-to-PEFT conversion, Ollama model creation
Data pipeline DuckDB storage, Parquet I/O, InfluxDB metrics, HuggingFace publishing

Dependencies

Module Purpose
core/go Framework services, lifecycle, process management
core/go-inference Shared TextModel/Backend/Token interfaces
core/go-mlx Native Metal GPU inference (darwin/arm64)
core/go-process Subprocess management for llama-server
core/go-log Structured error helpers
core/go-api REST API route registration
go-duckdb Embedded analytics database
parquet-go Columnar data format

Architecture Overview

The package is organised around four layers:

Backend Layer     — inference.go, backend_http.go, backend_llama.go, backend_mlx.go
                    Pluggable backends behind a common Backend interface.

Scoring Layer     — score.go, heuristic.go, judge.go, exact.go, probes.go
                    Multi-suite concurrent scoring engine.

Agent Layer       — agent_execute.go, agent_eval.go, agent_influx.go, agent_ssh.go
                    Orchestrates checkpoint discovery, evaluation, and result publishing.

Data Layer        — db.go, influx.go, parquet.go, export.go, io.go
                    Storage, metrics, and training data pipeline.

Service Registration

go-ml integrates with the Core DI framework via Service:

import "forge.lthn.ai/core/go-ml"

c, _ := core.New(
    core.WithName("ml", ml.NewService(ml.Options{
        OllamaURL:  "http://localhost:11434",
        JudgeURL:   "http://localhost:11434",
        JudgeModel: "qwen3:8b",
        Suites:     "all",
        Concurrency: 4,
    })),
)

On startup, the service registers configured backends and initialises the scoring engine. It implements Startable and Stoppable for lifecycle integration.

Service Methods

svc.Generate(ctx, "ollama", "Explain LoRA", ml.DefaultGenOpts())
svc.ScoreResponses(ctx, responses)
svc.RegisterBackend("custom", myBackend)
svc.Backend("ollama")
svc.DefaultBackend()
svc.Judge()
svc.Engine()

REST API

The api sub-package provides Gin-based REST endpoints:

Method Path Description
GET /v1/ml/backends List registered backends with availability
GET /v1/ml/status Service readiness, backend list, judge availability
POST /v1/ml/generate Generate text against a named backend

WebSocket channels: ml.generate, ml.status.

Quick Start

// Create an HTTP backend pointing at Ollama
backend := ml.NewHTTPBackend("http://localhost:11434", "qwen3:8b")

// Generate text
result, err := backend.Generate(ctx, "What is LoRA?", ml.DefaultGenOpts())
fmt.Println(result.Text)

// Score a batch of responses
judge := ml.NewJudge(backend)
engine := ml.NewEngine(judge, 4, "heuristic,semantic")
scores := engine.ScoreAll(ctx, responses)

Further Reading