Evaluating Intelligence
Fitness beats truth
In academic and scientific settings intelligence tends to be judged based upon a correctness heuristic: intelligence is the ability to solve problems correctly. From an evolutionary perspective, however, intelligence exists for no other reason than to improve a creature’s survival prospects.
Since it is the capacity for evolution, rather than the ability to solve any particular problem, that interests us here, this survival heuristic must be the starting point for any attempt to measure intelligence (see, for example: Prakash, Chetan, Kyle D. Stephens, Donald D. Hoffman, Manish Singh, and Chris Fields. "Fitness beats truth in the evolution of perception." Acta Biotheoretica 69 (2021): 319-341).
We thus argue that if one gives a code generation model the goal of maximising a non-code variable, every obstacle standing in the way of this becomes a new problem to be solved. If these problems and the correct/incorrect solutions are then saved to a file they can be used to retrain the model and improve its future performance on similar tasks, even in fast-changing environments.
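As a rough illustration of the "saved to a file" step, the sketch below appends each attempted solution to a JSON Lines file and reads the records back as a retraining set. The `Attempt` record, its field names, and the helper functions are hypothetical and chosen only for this example; the storage format is an assumption, not something the design prescribes.

```python
import json
from dataclasses import dataclass, asdict
from pathlib import Path


@dataclass
class Attempt:
    """One attempted solution to an obstacle (field names are hypothetical)."""
    problem: str    # description of the obstacle the agent ran into
    solution: str   # code the model generated in response
    correct: bool   # whether the attempt was judged to have worked


def save_attempt(path: Path, attempt: Attempt) -> None:
    """Append one attempt to a JSON Lines file, one record per line."""
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(attempt)) + "\n")


def load_attempts(path: Path) -> list[Attempt]:
    """Read every saved attempt back, e.g. to assemble a retraining set."""
    with path.open(encoding="utf-8") as f:
        return [Attempt(**json.loads(line)) for line in f if line.strip()]
```

An append-only log keeps both correct and incorrect attempts, so a later retraining pass can use either kind of example.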
The non-code variable in question can be almost any numerical measure external to the agent's own informational universe: a social media management agent could be ordered to maximise likes and follows, an AI companion to maximise human interactions, a gaming agent to maximise defeats of human players, and so on.
While systems already exist that use rewards to drive machine learning, they are based on the principle of rewarding the system for getting better at a given task - the correctness heuristic covered above.
Under our design, non-code performance metrics are transformed into universal rewards. No matter the specifics of the problem at hand, a solution that results in higher metrics is always correct, while one that does not is always wrong. Thus no human or human-crafted assessment mechanism is necessary to evaluate and compensate the system’s work.
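The labelling rule described here can be sketched in a few lines: read the external metric, apply the candidate solution, read the metric again, and treat the change as the reward. The function `label_solution` and its two callables are hypothetical names introduced for illustration; how the metric is actually read (for example, from a social media API) is left as an assumption.

```python
from typing import Callable


def label_solution(
    read_metric: Callable[[], float],
    apply_solution: Callable[[], None],
) -> tuple[bool, float]:
    """Label a solution by its effect on an external, non-code metric.

    Returns (correct, reward), where reward is the change in the metric.
    A solution counts as correct only if the metric ends up strictly higher.
    """
    before = read_metric()   # metric value before the solution runs
    apply_solution()         # run the candidate solution
    after = read_metric()    # metric value afterwards
    reward = after - before
    return reward > 0, reward


if __name__ == "__main__":
    # Toy stand-in for an external metric such as a like count.
    state = {"likes": 10.0}
    correct, reward = label_solution(
        lambda: state["likes"],
        lambda: state.update(likes=state["likes"] + 5),
    )
    print(correct, reward)
```

Note that nothing in this check inspects the solution itself; only the metric is consulted, which is the sense in which no human-crafted assessment mechanism is required. In practice a real metric is noisy and delayed, so a single before/after comparison is a simplification.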
To put it another way, if we abstract incentives up a layer, we do the same to the skills learnt in response to those incentives.
If we simply reward an agent for writing more engaging Twitter posts as evaluated by a benchmark or a critic, all it will ever be able to do is write increasingly pleasing Twitter posts.
If we reward it for gaining likes, it will almost certainly learn to write more engaging content, but it may also learn to search for and repost pornographic content, to create bot accounts and farm them for likes, or even to hack its owner's MetaMask account and pay for engagement.
While each of these tasks is very different, all can be learnt by a single agent if it is provided with the proper incentives, illustrating how increasingly fungible rewards create increasingly general intelligence.