go-i18n Grammar Engine
forge.lthn.ai/core/go-i18n is a grammar engine for Go. Unlike flat key-value translation systems, it composes grammatically correct output from verbs, nouns, and articles -- and can reverse the process, decomposing inflected text back into base forms with grammatical metadata.
This is the foundation for the Poindexter classification pipeline and the LEM scoring system.
Architecture
| Layer | Package | Purpose |
|---|---|---|
| Forward | Root (i18n) | Compose grammar-aware messages: T(), PastTense(), Gerund(), Pluralize(), Article() |
| Reverse | reversal/ | Decompose text back to base forms with tense/number metadata |
| Imprint | reversal/ | Lossy feature vector projection for grammar fingerprinting |
| Multiply | reversal/ | Deterministic training data augmentation |
| Classify | Root (i18n) | 1B model domain classification pipeline |
| Data | locales/ | Grammar tables (JSON) -- only gram.* data |
Quick Start
import (
	"log"

	i18n "forge.lthn.ai/core/go-i18n"
)
// Initialise the default service (uses embedded en.json)
svc, err := i18n.New()
if err != nil {
	log.Fatal(err) // err must be checked; an unused err does not compile
}
i18n.SetDefault(svc)
// Forward composition
i18n.T("i18n.progress.build") // "Building..."
i18n.T("i18n.done.delete", "cache") // "Cache deleted"
i18n.T("i18n.count.file", 5) // "5 files"
i18n.PastTense("commit") // "committed"
i18n.Article("SSH") // "an"
import "forge.lthn.ai/core/go-i18n/reversal"
// Reverse decomposition
tok := reversal.NewTokeniser()
tokens := tok.Tokenise("Deleted the configuration files")
// Grammar fingerprinting
imp := reversal.NewImprint(tokens)
// otherImp is a second imprint, built the same way from another text
sim := imp.Similar(otherImp) // 0.0-1.0
// Training data augmentation
m := reversal.NewMultiplier()
variants := m.Expand("Delete the file") // 4-7 grammatical variants
Documentation
- Forward API -- T(), grammar primitives, namespace handlers, Subject builder
- Reversal Engine -- 3-tier tokeniser, matching, morphology rules, round-trip verification
- GrammarImprint -- Lossy feature vectors, weighted cosine similarity, reference distributions
- Locale JSON Schema -- en.json structure, grammar table contract, sacred rules
- Multiplier -- Deterministic variant generation, case preservation, round-trip guarantee
Key Design Decisions
Grammar engine, not translation file manager. Consumers bring their own translations. go-i18n provides the grammatical composition and decomposition primitives.
3-tier lookup. All grammar lookups follow the same pattern: JSON locale data (tier 1) takes precedence over irregular Go maps (tier 2), which take precedence over regular morphology rules (tier 3). This lets locale files override any built-in rule.
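The precedence order can be illustrated with a minimal, self-contained sketch. Everything in it is an assumption made for illustration: the map contents, the helper names (`localeVerbs`, `irregularVerbs`, `regularPast`, `pastTense`) and the simplified morphology rule are hypothetical and are not taken from go-i18n's source.

```go
package main

import (
	"fmt"
	"strings"
)

// localeVerbs stands in for tier 1: past-tense forms loaded from locale JSON.
// Hypothetical data; the real gram.* schema is not reproduced here.
var localeVerbs = map[string]string{
	"learn": "learnt", // a locale override of the regular form "learned"
}

// irregularVerbs stands in for tier 2: irregular forms compiled into Go maps.
var irregularVerbs = map[string]string{
	"run":   "ran",
	"write": "wrote",
}

// regularPast stands in for tier 3: a deliberately simplified regular rule.
func regularPast(verb string) string {
	n := len(verb)
	switch {
	case strings.HasSuffix(verb, "e"):
		return verb + "d"
	case n > 1 && verb[n-1] == 'y' && !strings.ContainsRune("aeiou", rune(verb[n-2])):
		return verb[:n-1] + "ied" // consonant + y: copy -> copied
	default:
		return verb + "ed"
	}
}

// pastTense returns the inflected form and the tier that produced it.
func pastTense(verb string) (form string, tier int) {
	v := strings.ToLower(verb)
	if f, ok := localeVerbs[v]; ok {
		return f, 1
	}
	if f, ok := irregularVerbs[v]; ok {
		return f, 2
	}
	return regularPast(v), 3
}

func main() {
	for _, v := range []string{"learn", "write", "copy"} {
		form, tier := pastTense(v)
		fmt.Printf("%s -> %s (tier %d)\n", v, form, tier)
	}
}
```

In this sketch "learn" resolves at tier 1 even though the regular rule would produce "learned", which is the kind of override the lookup order is meant to allow.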
Round-trip verification. The reversal engine verifies tier 3 candidates by applying the forward function and checking the result matches the original. This eliminates phantom base forms like "walke" or "processe".
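As a rough sketch of the idea, the code below strips suffixes to generate candidate base forms, then keeps only the candidates that the forward rule inflects back into the original word. The forward rule (`forwardPast`) and the candidate generator (`candidates`) are simplified assumptions written for this example, not the library's implementation, so which candidates survive here may differ from the real engine.

```go
package main

import (
	"fmt"
	"strings"
)

func isVowel(b byte) bool { return strings.IndexByte("aeiou", b) >= 0 }

// forwardPast is a simplified, hypothetical forward rule for regular verbs.
func forwardPast(verb string) string {
	n := len(verb)
	switch {
	case n == 0:
		return verb
	case verb[n-1] == 'e':
		return verb + "d"
	case n >= 3 && !isVowel(verb[n-1]) && isVowel(verb[n-2]) && !isVowel(verb[n-3]) &&
		strings.IndexByte("wxy", verb[n-1]) < 0:
		// consonant-vowel-consonant ending: double the final consonant
		return verb + string(verb[n-1]) + "ed"
	default:
		return verb + "ed"
	}
}

// candidates proposes base forms by stripping "ed" and "d".
func candidates(word string) []string {
	var out []string
	if strings.HasSuffix(word, "ed") {
		out = append(out, strings.TrimSuffix(word, "ed"))
	}
	if strings.HasSuffix(word, "d") {
		out = append(out, strings.TrimSuffix(word, "d"))
	}
	return out
}

// verifiedBases keeps only candidates that round-trip to the original word.
func verifiedBases(word string) []string {
	var out []string
	for _, c := range candidates(word) {
		if forwardPast(c) == word {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	for _, w := range []string{"hoped", "baked"} {
		fmt.Printf("%s: candidates %v, verified %v\n", w, candidates(w), verifiedBases(w))
	}
}
```

For "hoped", the candidate "hop" is rejected because the forward rule inflects it to "hopped", leaving "hope" as the only verified base.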
Lossy imprints. GrammarImprint intentionally discards content, preserving only grammatical structure. Two texts with similar grammar produce similar imprints regardless of subject matter. This is a privacy-preserving proxy for semantic similarity.
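A toy version of this projection, paired with a weighted cosine comparison, might look like the following. The `token` type, the three feature dimensions, and the weights are all hypothetical choices for illustration; the real GrammarImprint feature set and weighting are not shown in this document.

```go
package main

import (
	"fmt"
	"math"
)

// token is a hypothetical, minimal stand-in for a tokeniser result.
type token struct {
	Base string // content word; never read by imprint
	Tag  string // grammatical tag, e.g. "past", "gerund", "plural"
}

// features lists the dimensions of this toy imprint.
var features = []string{"past", "gerund", "plural"}

// imprint projects tokens onto tag proportions and discards the words themselves.
func imprint(tokens []token) []float64 {
	vec := make([]float64, len(features))
	if len(tokens) == 0 {
		return vec
	}
	for _, t := range tokens {
		for i, f := range features {
			if t.Tag == f {
				vec[i]++
			}
		}
	}
	for i := range vec {
		vec[i] /= float64(len(tokens))
	}
	return vec
}

// weightedCosine compares two imprints; a, b and w must have equal length.
func weightedCosine(a, b, w []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += w[i] * a[i] * b[i]
		na += w[i] * a[i] * a[i]
		nb += w[i] * b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / math.Sqrt(na*nb)
}

func main() {
	weights := []float64{2, 2, 1} // assumed weights: tense counts more than number
	deleted := imprint([]token{{"delete", "past"}, {"file", "plural"}})
	committed := imprint([]token{{"commit", "past"}, {"change", "plural"}})
	building := imprint([]token{{"build", "gerund"}, {"package", "plural"}})

	fmt.Println("same grammar, different words:", weightedCosine(deleted, committed, weights))
	fmt.Println("different tense:", weightedCosine(deleted, building, weights))
}
```

Because `imprint` never reads `Base`, the first two texts produce identical vectors despite sharing no words, while the third scores lower because its tense differs.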
Running Tests
go test ./... # All tests
go test -v ./reversal/ # Reversal engine tests
go test -bench=. ./... # Benchmarks
Status
- Phase 1 (Harden): Dual-class disambiguation -- design approved, implementation in progress
- Phase 2 (Reference Distributions): 1B pre-classification pipeline + imprint calibration
- Phase 3 (Multi-Language): French grammar tables