Summary
This chapter covered classifier systems, which can be seen as a hybrid of several fields. The representation itself combines rule-based systems with binary genetic encodings:
Each classifier has a condition that can use "don't care" symbols to generalize over the sensory input. The actions map directly onto the effectors, without abstraction. Every classifier stores statistics about its recent application: predicted return, error (used to compute the accuracy), fitness, average size of the match set, and so forth.
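As a minimal sketch of this representation, the following Python fragment matches binary sensor strings against conditions that may contain "don't care" symbols. The `#` symbol, the `Classifier` fields, and the `match_set` helper are assumptions made for illustration; they are not taken from the chapter's implementation.

```python
from dataclasses import dataclass

DONT_CARE = "#"  # assumed wildcard symbol; matches either bit


@dataclass
class Classifier:
    condition: str            # e.g. "1#0", one symbol per sensor bit
    action: int               # index of an effector (hypothetical encoding)
    prediction: float = 0.0   # predicted return
    error: float = 0.0        # prediction error, used to derive accuracy
    fitness: float = 0.0

    def matches(self, sensors: str) -> bool:
        """True if every condition symbol is a wildcard or equals the sensor bit."""
        return len(self.condition) == len(sensors) and all(
            c == DONT_CARE or c == s for c, s in zip(self.condition, sensors)
        )


def match_set(population, sensors):
    """Collect the classifiers whose condition matches the current input."""
    return [cl for cl in population if cl.matches(sensors)]


population = [Classifier("1#0", 0), Classifier("011", 1), Classifier("###", 2)]
matched_actions = [cl.action for cl in match_set(population, "110")]
```

With these three classifiers, the input `110` is matched by the partially general condition `1#0` and by the fully general `###`, but not by the specific condition `011`.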
The internal organization of classifier systems is surprisingly close to that of rule-based systems, notably including matching and conflict resolution phases. However, some key improvements allow the system to learn:
An evolutionary component is applied to the match set. If the set is empty, a random classifier is created. Otherwise, with a low probability, two classifiers are crossed over, mutated, and reinserted into the population. The most redundant and least accurate classifiers are occasionally removed.
A reinforcement component updates the values of the classifiers (for instance, prediction and error), using the reward signal from the environment together with existing estimates of the classifier's benefit.
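The two learning components described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the chapter's implementation: the learning rate `BETA`, the discount `GAMMA`, the delta-rule form of the update, one-point crossover, and all function names are hypothetical choices.

```python
import random

BETA = 0.2   # learning rate (assumed value)
GAMMA = 0.7  # discount applied to the existing estimate (assumed value)


def target_return(reward, best_next_prediction, gamma=GAMMA):
    """Combine the environment's reward with an existing estimate of future benefit."""
    return reward + gamma * best_next_prediction


def update(prediction, error, target, beta=BETA):
    """Move the error and prediction a fraction beta toward the observed target."""
    error += beta * (abs(target - prediction) - error)
    prediction += beta * (target - prediction)
    return prediction, error


def crossover(a, b, rng):
    """One-point crossover of two equal-length condition strings."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:], b[:point] + a[point:]


def mutate(condition, rate, rng):
    """Replace each symbol with a random one from {0, 1, #} with probability rate."""
    return "".join(
        rng.choice("01#") if rng.random() < rate else c for c in condition
    )


rng = random.Random(0)
child_a, child_b = crossover("000", "111", rng)
child_a = mutate(child_a, 0.05, rng)
new_prediction, new_error = update(10.0, 0.0, target_return(6.0, 20.0))
```

Crossover only rearranges existing symbols, so mutation is what introduces new "don't care" positions and lets the population explore more general or more specific conditions.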
Over time, this process tends to find a good set of classifiers. The technology can be applied directly to control problems where a solution needs to be learned online. The ideas can also be adapted, to various degrees, to handle different representations (any categorical variables) and different problems (an evolutionary approach driven by fitness rather than reward).
Cascador uses a classifier system to learn to play deathmatch games according to a high-level performance indicator. Cascador benefits from hints provided by the designer as feedback. The classifier system still performs worse than a heterogeneous AI architecture, but it manages to achieve (mostly) satisfactory behaviors autonomously.