Overfit is a from-scratch deep learning engine for .NET 10. Train in PyTorch or in-process, then run zero-allocation inference with no native dependencies and no Python runtime. AOT-compatible single-binary deployment.
Pre-allocated buffers, span-based math, no per-call allocations on the hot path. Predictable tail latency under concurrent load.
```csharp
using var engine = InferenceEngine.FromSequential(model, 784, 10);

Span<float> input = stackalloc float[784];
Span<float> output = stackalloc float[10];
engine.Run(input, output); // zero-allocation
```
Load weights from HuggingFace or directly from GGUF files (F32, F16, BF16, Q8_0, Q4_K, Q6_K). KV-cache decode at 0 bytes per token. Top-10 logit overlap 10/10 vs PyTorch reference (maxAbsDiff = 0.000107).
```csharp
using var gpt2 = Gpt2.LoadSmall("./models/gpt2");
using var session = gpt2.CreateSession();

session.Reset(gpt2.Tokenizer.Encode("Hello, world."));
for (int i = 0; i < 32; i++)
{
    var token = session.GenerateNextToken(in sampling);
    Console.Write(gpt2.Tokenizer.DecodeToken(token));
}
```
Direct ONNX import with 14 supported operators (Conv, Gemm, ReLU, Tanh, BatchNorm, MaxPool, Add for skip connections, etc.). Branching DAG topologies are supported (ResNet, DenseNet, EfficientNet).
```csharp
var model = OnnxImporter.Load("classifier.onnx");
model.Eval();

using var engine = InferenceEngine.FromSequential(model, 784, 10);
var prediction = engine.Predict(input); // zero-alloc
```
AMD Ryzen 9 9950X3D · Windows 11 · .NET 10 · BenchmarkDotNet 0.15.8. Reproducible from the repo.
| Engine | Mean | Allocated | vs ONNX |
|---|---|---|---|
| Overfit | 250.7 ns | 0 B | 7.6× faster |
| ONNX Runtime (pre-allocated) | 1,899 ns | 224 B | baseline |
| ONNX Runtime (standard) | 3,388 ns | 952 B | 0.56× |
| Path | Mean | Allocated |
|---|---|---|
| Legacy (full forward per token) | 6,318 ms | 62.0 MB · grows |
| KV-cache | 973 ms | 0 B / token * |
| Engine | Mean | Allocated | vs ONNX |
|---|---|---|---|
| Overfit | 522 ms | 0 B | 3.6× faster |
| ONNX Runtime | 1,894 ms | 117 MB | baseline |
Self-hosted pod-level anomaly detection on AKS / EKS. Replace Datadog or Dynatrace anomaly modules without sending metrics out of cluster. Sub-minute detection latency, no SaaS bill.
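As a rough sketch, an in-cluster scoring loop could look like the following. Only `InferenceEngine.FromSequential` and `Run` come from the snippets above; the model shape (64 inputs, 1 score), `metricsStream`, `ShiftIn`, the alerting hook, and the 0.9 threshold are all illustrative assumptions.

```csharp
// Hypothetical sketch: score a sliding window of pod metrics in-cluster.
using var engine = InferenceEngine.FromSequential(model, 64, 1);

Span<float> window = stackalloc float[64]; // last 64 samples of one metric
Span<float> score = stackalloc float[1];

foreach (var sample in metricsStream)      // e.g. CPU or memory readings
{
    ShiftIn(window, sample);               // hypothetical ring-buffer helper
    engine.Run(window, score);             // zero-allocation scoring
    if (score[0] > 0.9f)                   // illustrative threshold
        alerts.Raise(podName, score[0]);   // hypothetical alerting hook
}
```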
Add ML inference to existing .NET services without shipping a Python sidecar. One process, one runtime, one deploy pipeline, one security audit.
Sub-microsecond P99.9 (0.80 µs vs 5.70 µs for ONNX). Zero GC pauses under sustained concurrent load. Deterministic tail latency matters more than peak throughput.
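A minimal sketch of that concurrent pattern: one shared engine, per-thread stack buffers, so the hot path allocates nothing. This assumes `engine.Run` is safe for concurrent callers, as the benchmark implies; the shutdown flag is illustrative.

```csharp
// Sketch: N workers sharing one engine, each with its own stack buffers,
// so the hot path never allocates and the GC stays idle.
using var engine = InferenceEngine.FromSequential(model, 784, 10);

var workers = new Thread[Environment.ProcessorCount];
for (int w = 0; w < workers.Length; w++)
{
    workers[w] = new Thread(() =>
    {
        // Per-thread buffers: no sharing, no contention, no heap traffic.
        Span<float> input = stackalloc float[784];
        Span<float> output = stackalloc float[10];
        while (running)                  // hypothetical shutdown flag
            engine.Run(input, output);   // no per-call allocation
    });
    workers[w].Start();
}
```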
Native AOT single-binary deployment. No CUDA libraries to ship, no Python runtime to install. Industrial automation, gaming anti-cheat, IoT.
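Publishing that single binary uses standard .NET tooling; the runtime identifier and output path below are the usual defaults, not anything Overfit-specific.

```shell
# Produce a self-contained Native AOT executable (no .NET runtime,
# no Python, no native ML dependencies on the target machine).
dotnet publish -c Release -r linux-x64 -p:PublishAot=true
# Output lands under bin/Release/<tfm>/linux-x64/publish/ as one native binary.
```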
Where this is the wrong tool. Read this before deciding.
Dual-licensed: free under AGPLv3 for open-source projects and research. Commercial licenses for closed-source production use:
Tell us about your workload. We respond within one business day. If we are not a fit, we will say so.