The free energy principle tries to explain how (biological) systems maintain their order (non-equilibrium steady-state) by restricting themselves to a limited number of states.[1] It says that biological systems minimise a free energy functional of their internal states, which entail beliefs about hidden states in their environment. The implicit minimisation of variational free energy is formally related to variational Bayesian methods and was originally introduced by Karl Friston as an explanation for embodied perception in neuroscience,[2] where it is also known as active inference. -wiki
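For reference, the "free energy functional" mentioned above is usually written in the standard variational-Bayes form below. The symbols are notation I am assuming here, not something taken from the quoted passage: o for observations, s for hidden states, q(s) for the system's approximate belief about those states, and p for its generative model.

```latex
F[q] \;=\; \mathbb{E}_{q(s)}\big[\ln q(s) - \ln p(o, s)\big]
     \;=\; D_{\mathrm{KL}}\big[\,q(s)\,\|\,p(s \mid o)\,\big] \;-\; \ln p(o)
     \;\ge\; -\ln p(o)
```

Since the KL term is never negative, F is an upper bound on the surprise -ln p(o), which is why minimising free energy is described as implicitly minimising surprise.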
In late 2017, a group led by Rosalyn Moran, a neuroscientist and engineer at King’s College London, pitted two AI players against one another in a version of the 3D shooter game Doom. The goal was to compare an agent driven by active inference to one driven by reward-maximization. The reward-based agent’s goal was to kill a monster inside the game, but the free-energy-driven agent only had to minimize surprise. The Fristonian agent started off slowly. But eventually it started to behave as if it had a model of the game, seeming to realize, for instance, that when the agent moved left the monster tended to move to the right. After a while it became clear that, even in the toy environment of the game, the reward-maximizing agent was “demonstrably less robust”; the free energy agent had learned its environment better. “It outperformed the reinforcement-learning agent because it was exploring,” -wired article
Interesting article. The comment about minimizing surprise seems to refer to overall surprise with respect to the entire environment. If the agent instead restricted itself to a fraction of the environment, it could in a sense do away with surprise, or at least with what most notions of surprise refer to, since exploring the environment would seem to temporarily increase surprise. But if surprise is measured with respect to the entire environment, then the environment has to be explored, and surprise can only be minimized through exploration. One might imagine that if we consider the true environment, encompassing all possible information, then (assuming this is on the right track) exploration of indefinite length should occur, given that any real-world environment is only a subset of it.
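To make the distinction between local and whole-environment surprise concrete, here is a minimal toy sketch. It is my own illustration, not the agent from the Doom experiment and not an implementation of active inference: the "rooms", the rule that each room always emits one fixed symbol, and the count-based predictive model are all assumptions chosen for simplicity. It compares an agent that stays in one room with one that cycles through all rooms.

```python
import math


def surprise(counts, symbol):
    """Surprisal -log p(symbol) under a count-based predictive model."""
    return -math.log(counts[symbol] / sum(counts))


def run(policy, n_rooms=4, n_symbols=8, steps=40):
    """Simulate an agent in a hypothetical toy world where room r always emits symbol r.

    Returns the surprise experienced at each step, plus the average surprise
    the final model would assign across ALL rooms (visited or not).
    """
    # One pseudo-count per symbol in every room: a uniform prior.
    counts = [[1] * n_symbols for _ in range(n_rooms)]
    experienced = []
    for t in range(steps):
        room = policy(t, n_rooms)
        obs = room  # deterministic emission (toy assumption)
        experienced.append(surprise(counts[room], obs))
        counts[room][obs] += 1  # learn from the observation
    whole_env = sum(surprise(counts[r], r) for r in range(n_rooms)) / n_rooms
    return experienced, whole_env


def stay_home(t, n_rooms):
    """Never leave room 0."""
    return 0


def explore(t, n_rooms):
    """Cycle through every room in turn."""
    return t % n_rooms


home_steps, home_whole = run(stay_home)
explore_steps, explore_whole = run(explore)

print("total experienced surprise:", sum(home_steps), "(home) vs", sum(explore_steps), "(explore)")
print("whole-environment surprise:", home_whole, "(home) vs", explore_whole, "(explore)")
```

In this toy setup the explorer experiences more surprise along the way, but ends up with lower surprise measured over the whole environment, while the stay-at-home agent keeps its experienced surprise low only by never looking at the other rooms.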
Hmmm, this may relate to some human behaviors. But if we take humans to be agents and the earth to be the environment, most humans in ancient times tended to only explore a very limited portion of their environment, far smaller than they could potentially explore if they tried to maximize exploration throughout a lifetime. Humanity as a whole did end up exploring pretty much all the earth through multiple generations of explorers, but a lone human did not tend to do this.
Even in today's world, humans tend to limit their exploration of the environment, and there are individuals with quite fixed routines who don't leave their small hometown throughout their entire lives. In the Middle Ages and before, once farming was established, it tended to limit how far afield a farmer explored, given that he was provided with resources locally.
It does seem that mechanisms rewarding successful prediction, as well as mechanisms rewarding the discovery of novelty (a curiosity-like reward), may be present. But reinforcement-like mechanisms that maximize certain reward signals, some of which may be elicited by specific sensory signals, also seem to be present.