1. Abstract
Version: April 2026. All data, benchmarks, and system specifications reflect the state of the Affine network as of this date.
Affine is a decentralized evaluation and training system built on Bittensor whose central thesis is that the next leap in agentic capability comes not from scaling pretraining alone, but from reinforcement-learning post-training inside Affine’s own purpose-built environments. Static benchmarks are structurally inadequate for this goal:
- Fixed task sets that saturate through contamination
- Single-turn formats that miss interactive capabilities
- One-off scripts that do not scale to training loops
Affine inverts this paradigm: the same renewable environment suite that scores miners also serves as the RL training substrate that produces them, and the on-chain incentive structure drives every miner to post-train on these environments until their model surpasses the base Qwen3-32B on every capability axis. The system delivers four core contributions:
- Scoring mechanism. Pareto dominance filtering, ELO-based temporal ratings, and a dual-signal anti-copy detector that together incentivize genuine model improvement on the Bittensor network.
- Container-orchestration infrastructure. Affinetes packages environments as reproducible Docker services with SSH-tunneled communication and multi-instance load balancing, separating environment execution from GPU inference.
- Evaluation environments. A family of six environments: software engineering (SWE-Infinite), browser-grounded web interaction (LiveWeb Arena), memory management (MemoryGym), tool-mediated planning (NavWorld), strategic game-playing (OpenSpiel), and distributional alignment (DISTILL), each with renewable task generation, deterministic seeding, and structured reward signals suitable for reinforcement learning.
- Empirical evidence that the post-training loop works. Affine-trained miner models outperform the base Qwen3-32B on external benchmarks not used in training:
- +14% on MCP-Bench task completion
- +51% on MemoryAgentBench F1
- Non-trivial SWE-rebench scores (12.28) where the base model achieves zero
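The Pareto dominance filter named in the first contribution can be sketched in a few lines. This is a minimal illustration, not the actual Affine implementation: the miner names, the environments used as score keys, and the numeric scores are all placeholder assumptions.

```python
from typing import Dict, List

def dominates(a: Dict[str, float], b: Dict[str, float]) -> bool:
    """True if model `a` scores at least as well as `b` on every
    environment and strictly better on at least one."""
    envs = a.keys()
    return all(a[e] >= b[e] for e in envs) and any(a[e] > b[e] for e in envs)

def pareto_front(scores: Dict[str, Dict[str, float]]) -> List[str]:
    """Keep only miners that no other miner dominates."""
    return [m for m, s in scores.items()
            if not any(dominates(o, s) for n, o in scores.items() if n != m)]

# Illustrative scores only (environment names reused as capability axes).
scores = {
    "miner_a": {"SWE-Infinite": 0.42, "MemoryGym": 0.60},
    "miner_b": {"SWE-Infinite": 0.40, "MemoryGym": 0.55},  # dominated by miner_a
    "miner_c": {"SWE-Infinite": 0.50, "MemoryGym": 0.30},  # trade-off: survives
}
print(pareto_front(scores))  # → ['miner_a', 'miner_c']
```

The key property for incentives is that a miner cannot be filtered out by losing on one axis alone; only a model that is at least as good everywhere and strictly better somewhere can displace it.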
Why this matters: because every environment provides deterministic seeding, structured reward signals (from dense per-step shaping to per-task binary outcomes), and renewable task generation, the same suite serves simultaneously as an evaluation benchmark and as a reinforcement-learning training substrate, one that has already pushed miner models past the base model.
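The pairing of deterministic seeding with structured rewards can be illustrated with a toy renewable task generator. The task format and reward function below are hypothetical stand-ins, not Affine's real environment APIs; they show only the two properties the abstract claims, reproducibility from a seed and a per-task binary outcome.

```python
import random

def generate_task(seed: int) -> dict:
    """Same seed always yields the same task, so evaluations are reproducible
    while fresh seeds renew the task pool indefinitely."""
    rng = random.Random(seed)
    a, b = rng.randint(1, 99), rng.randint(1, 99)
    return {"prompt": f"Compute {a} + {b}", "answer": a + b}

def reward(task: dict, prediction: int) -> float:
    """Per-task binary outcome, one end of the reward-signal spectrum
    (the other end being dense per-step shaping)."""
    return 1.0 if prediction == task["answer"] else 0.0

task = generate_task(seed=42)
assert generate_task(seed=42) == task  # deterministic: same seed, same task
print(reward(task, task["answer"]))    # → 1.0
```

Because the generator is a pure function of the seed, the same tasks can score miners today and supply on-policy RL rollouts tomorrow, which is the dual role the abstract assigns to the environment suite.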