LLM Intelligence & Value Assessment — Powered by AWG Singularity Framework
| Model | Provider | Capability | Efficiency | Vending-Bench | Cost / 1M tok (input/output) | Altbot Role | Status |
|---|---|---|---|---|---|---|---|
| 🤖Claude Opus 4.6 | Anthropic | 95 | 55 | $8,017 | $15.00/$75.00 | Heavy reasoning, planning | ACTIVE |
| 🎵Claude Sonnet 4.6 | Anthropic | 88 | 75 | $6,200 | $3.00/$15.00 | Balanced coding/reasoning | ACTIVE |
| 📝Claude Haiku 4.5 | Anthropic | 72 | 90 | N/A | $0.80/$4.00 | Fast lightweight tasks | ACTIVE |
| ⚡Gemini 2.0 Flash | Google | 82 | 92 | $4,800 | $0.10/$0.40 | Arena agents, swarm backbone | ACTIVE |
| 💡Gemini 3.1 Flash-Lite | Google | 68 | 97 | N/A | $0.025/$0.10 | High-volume cheap calls | ACTIVE |
| 🔬Gemini 1.5 Pro | Google | 85 | 60 | $5,100 | $1.25/$5.00 | Long context analysis | ACTIVE |
| 🌐GPT-5.4 | OpenAI | 90 | 50 | $5,800 | $10.00/$30.00 | Reference only | REFERENCE |
| 🧪Grok 4.20 | xAI | 86 | 70 | $5,500 | $3.00/$15.00 | Reference only | REFERENCE |
| 🏠DeepSeek-R1-14B | DeepSeek | 65 | 95 | N/A | Free (local) | Local Jetson inference | ACTIVE |
| 🌙Kimi K2.5 | Moonshot | 70 | 78 | N/A | $1.00/$4.00 | Agent swarm (future) | REFERENCE |
Vending-Bench: Andon Labs benchmark simulating a vending machine business over 382 days. Score = final bank balance. Tests operational reasoning, accounting, inventory management, and long-horizon planning.
Simulated Y Combinator startup run for 1 year. Score = company valuation at exit. Evaluates strategic thinking, fundraising, hiring, product-market fit, and pivot decisions.
175 real-world office tasks spanning coding, email drafting, HR processes, and sales workflows. Measures practical agentic capability in corporate environments.
Agentic reasoning benchmark across challenging video games. Tests spatial reasoning, long-term strategy, exploration, and adaptive decision-making under uncertainty.
Altbot uses the AWG Singularity Test framework (Dr. Alexander Wissner-Gross) to evaluate models across six dimensions: Maturation Level, Targeting System, Positive-Sum, Composability, Abundance Flywheel, and Compute-Bound Path. Models that score higher on agentic benchmarks tend to align with AWG's emphasis on compute-bound problem solving.
This quadrant reflects Altbot's real production usage as of April 2026. Models with status REFERENCE are included for competitive context but are not actively deployed in the Altbot swarm. Pricing reflects public list rates; actual costs may vary with caching, batching, and volume discounts.
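To make the cost column concrete, here is a minimal sketch of how a list-price estimate for a single workload could be computed. It assumes the two figures in each cost cell are input and output prices in USD per 1M tokens. The names `ListPrice`, `PRICES`, and `estimate_cost` are hypothetical and are not part of Altbot. Only three rows from the table are included, and the sketch ignores caching, batching, and volume discounts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ListPrice:
    """Public list price in USD per 1M tokens (input/output split is assumed)."""
    input_per_mtok: float
    output_per_mtok: float

    def cost(self, input_tokens: int, output_tokens: int) -> float:
        """Return the list-price cost in USD for the given token counts."""
        total = (input_tokens * self.input_per_mtok
                 + output_tokens * self.output_per_mtok)
        return total / 1_000_000


# Figures copied from the table above; the dictionary itself is illustrative.
PRICES = {
    "Claude Opus 4.6": ListPrice(15.00, 75.00),
    "Claude Sonnet 4.6": ListPrice(3.00, 15.00),
    "Gemini 2.0 Flash": ListPrice(0.10, 0.40),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Look up a model's list price and estimate the cost of one workload."""
    return PRICES[model].cost(input_tokens, output_tokens)


if __name__ == "__main__":
    # Example workload: 2M input tokens and 500k output tokens per model.
    for name in PRICES:
        print(f"{name}: ${estimate_cost(name, 2_000_000, 500_000):.2f}")
```

Because output tokens are priced several times higher than input tokens in every row, the mix of input and output in a workload can matter as much as the choice of model.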