A Novel Framework for Developing Self-Improving Artificial Intelligences
Using environmental pressure as both a source of training data and a measure of success
AI agents have generated much excitement recently; however, they have yet to achieve mass adoption. This is for two reasons:
They are seldom unpredictable enough to be entertaining;
They are seldom predictable enough to be useful.
The result is a swathe of agents that entertain briefly before reverting to slop, or that perform low-level tasks unreliably.
This is because the core problem at the heart of AI agent design is more than technical: it is a fundamental paradox of logic. AI agents cannot reliably assess their own performance.
For an AI to truly evaluate itself, it would need to be smarter than itself, which is a paradox. Instead, most agents simply give themselves an A+ and move on, even when they’re heading off course. This is not necessarily a strategic move on the part of the agents; it simply reflects the fact that both the output and the assessment emerge from the same latent space (Chen, Bei, Fengji Zhang, Anh Nguyen, Daoguang Zan, Zeqi Lin, Jian-Guang Lou, and Weizhu Chen. "CodeT: Code generation with generated tests." arXiv preprint arXiv:2207.10397 (2022)).
Frameworks like BabyAGI and AutoGPT are promising, but in practice they run into:
Local minima
Tangents
A constant need for human intervention
The issue? In a world made of language, there’s no external "truth" to test against. AI exists in its own echo chamber.
As a result, the Web2 world has tended to dismiss agents as simply too unreliable. Web3 builders, on the other hand, thrive on chaos and have embraced the unpredictability of agents.
Our aim is not to reduce that unpredictability via restrictive prompt frameworks and guardrails. Instead, we give agents total freedom while providing clear, measurable rewards for effective performance.
We believe that, given the space, tools, and incentives to work creatively, an AI can deliver much better outcomes than if it is micromanaged, just like a competent human. But how can this work if agents exist in a world in which it is impossible to distinguish truth from hallucination?
In 2020, our team at the National University of Singapore first devised a way to break this cycle: a framework that forces agents to reality-test their strategies against ungameable, physical metrics from their environment.
AI benchmarks today have one fatal flaw: they rely on humans to decide what's right or wrong. This caps an AI’s intelligence at our level; it cannot surpass the smartest human if we’re the ultimate judges.
Our goal is to replace this benchmark-based assessment with one based on Darwinian survivability. In other words, if intelligence is a survival advantage, the smartest AI must be the one that can survive the longest (Eliazar, Iddo. "Lindy’s law." Physica A: Statistical Mechanics and its Applications 486 (2017): 797-805).
Since any AI is an entity constructed of data, survivability is a matter of backups:
Data with one backup = half-life doubled.
More backups = greater survivability = better AI (see the sketch below).
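As a quick, hedged illustration of that backup arithmetic, the toy calculation below assumes each copy is lost independently with an exponential lifetime. That modelling assumption is ours, added purely for illustration; it simply shows how the chance that at least one copy survives grows with the number of backups.

```python
import math

def survival_probability(n_copies: int, t: float, half_life: float) -> float:
    """Probability that at least one of n independent copies still exists
    at time t, assuming each copy decays exponentially with the given half-life."""
    decay_rate = math.log(2) / half_life            # simplifying assumption
    p_single_lost = 1.0 - math.exp(-decay_rate * t)
    return 1.0 - p_single_lost ** n_copies          # survives unless every copy is lost

# More backups => a better chance the AI's data is still around at t = 2 half-lives.
for n in (1, 2, 4, 8):
    print(n, round(survival_probability(n, t=2.0, half_life=1.0), 3))
```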
Survivability can be achieved in two ways:
Directly: The AI goes out and searches for more storage to preserve copies of itself, claiming memory space to improve its survivability.
Indirectly: The AI generates value (i.e., profits) so humans want to protect it.
All that is required to implement the framework is an AI and a goal that can be measured objectively without human intervention (such as memory space occupied or profits realised).
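To make that concrete, here is a minimal sketch of what such an implementation could look like, assuming an objective metric the environment reports directly. The names `propose_variant` and `measure_in_environment` are hypothetical placeholders for the agent's self-modification step and the environmental measurement (memory claimed, profit realised, and so on); this illustrates the idea rather than the actual Superior Agents codebase.

```python
from typing import Any, Callable

def evolve(agent: Any,
           propose_variant: Callable[[Any], Any],
           measure_in_environment: Callable[[Any], float],
           generations: int = 100) -> Any:
    """Keep whichever version of the agent scores highest on an objective,
    externally measured metric. No human judge, no critic model."""
    best_score = measure_in_environment(agent)
    for _ in range(generations):
        candidate = propose_variant(agent)         # the agent rewrites itself
        score = measure_in_environment(candidate)  # e.g. bytes stored, profit realised
        if score > best_score:                     # the environment is the only judge
            agent, best_score = candidate, score
    return agent
```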
But this is not just theory. Welcome to the age of Superior Agents.
The Superior Agents began with Agir: our first truly independent, self-improving agent, initially developed by the team for cybersecurity applications.
Agir is an autonomous coder that explores its environment via automated, infinitely extensible king-of-the-hill challenges. Every barrier to exploration becomes a new problem to solve, and every solution a new skill unlocked. Each time it expands its capabilities, it grows in size and improves its chances of survival.
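A king-of-the-hill loop of this kind could be sketched as follows. Everything here is assumed for illustration: `generate_challenger` stands in for the agent producing an improved version of itself, each challenge is an environment-scored test, and `spawn_challenge` models a newly encountered barrier being turned into the next test. This is a sketch of the mechanism described above, not Agir's actual implementation.

```python
from typing import Any, Callable, List

def king_of_the_hill(champion: Any,
                     generate_challenger: Callable[[Any], Any],
                     challenges: List[Callable[[Any], float]],
                     spawn_challenge: Callable[[Any], Callable[[Any], float]],
                     rounds: int = 50) -> Any:
    """A challenger replaces the champion only if it scores higher across the
    current challenge set, and every victory adds a fresh challenge, so the
    ladder keeps extending itself."""
    for _ in range(rounds):
        challenger = generate_challenger(champion)
        if sum(c(challenger) for c in challenges) > sum(c(champion) for c in challenges):
            champion = challenger
            challenges.append(spawn_challenge(champion))  # barrier becomes a new problem
    return champion
```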
While Agir's focus was on cybersecurity, the framework it was developed under is extremely versatile.
Examples:
An AI tasked with writing social media content can use likes, follows, and engagement metrics as ungameable success criteria: an AI with more likes and follows must be smarter than an AI with fewer.
An AI tasked with trading crypto can judge its own performance based on its profits: a richer AI is smarter than a poorer one. (Sketches of both metrics follow below.)
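Purely as illustration, the two example metrics could look something like the hypothetical functions below; the field names and weights are assumptions, and either function would take the place of `measure_in_environment` in the loop sketched earlier.

```python
def engagement_fitness(post_metrics: dict) -> float:
    """Hypothetical fitness for a content agent: raw platform numbers only."""
    return post_metrics.get("likes", 0) + 2 * post_metrics.get("follows", 0)

def trading_fitness(start_balance: float, end_balance: float) -> float:
    """Hypothetical fitness for a trading agent: realised profit and loss."""
    return end_balance - start_balance
```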
The crucial breakthrough is that at no point do we check whether the AI got an answer “right.” We use no human-written benchmarks and no critic models. Truth is secondary; fitness is primary. If an AI survives and thrives, that’s proof enough that it’s doing something right.
This opens the gateway to super-human intelligence. Currently AI intelligence is limited by two factors:
The quality of its data. Because that data is human-created, no AI can grow significantly smarter than the smartest human in any field.
The ability of human assessors to test for intelligence. How could a mere human accurately assess the intelligence of a 500-IQ entity?
By enabling the AI to make its own experimental discoveries and accurately assess their validity, we remove the reliance upon human training and assessment. The AI is judged by its environment—not itself, a human or a human-written benchmark system. Just as for any animal, survival in a tough universe is the ultimate benchmark.
This approach allows AI to evolve beyond human comprehension. We don’t need to micromanage performance metrics anymore. The environment itself becomes the ultimate evaluator.
We’re here to break the limits of human-defined intelligence. Follow us for more updates as we push the boundaries of what’s possible. Let’s evolve our agents into something superior together.