What's the Problem?
Before discussing ways to prevent or cure problems with adaptive AI, we must first identify them. Diagnosing a problem combines external observation of the behavior with analysis of the internal workings of the system.
No Learning
Symptom: Learning does not occur, or happens inconsistently.
Example: The decision tree for weapon selection usually selects the weakest weapon, despite better alternatives being available.
Diagnostic: The model or the implementation is (partly) broken.
Remedy:
Debug both the code and the design. Verify the source code against the theory it implements. Validate the model by stepping through its decisions by hand.
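To make the step-through idea concrete, here is a minimal sketch in Python: a toy stand-in for the learned weapon-selection tree is replayed against hand-built situations, and each decision is compared with the expected choice. The feature names and weapon identifiers are illustrative assumptions, not part of any particular engine.

# Minimal sketch of step-through validation for a weapon-selection
# decision tree. The situations and expected answers are built by hand,
# so any mismatch points at a flaw in the model or the implementation.

def select_weapon(situation):
    """Toy stand-in for the learned decision tree (hypothetical logic)."""
    if situation["distance"] > 20.0:
        return "sniper_rifle" if situation["ammo_sniper"] > 0 else "pistol"
    if situation["distance"] > 5.0:
        return "shotgun" if situation["ammo_shotgun"] > 0 else "pistol"
    return "knife"

# Hand-crafted test cases with the decision we expect the model to make.
TEST_CASES = [
    ({"distance": 30.0, "ammo_sniper": 5, "ammo_shotgun": 0}, "sniper_rifle"),
    ({"distance": 30.0, "ammo_sniper": 0, "ammo_shotgun": 2}, "pistol"),
    ({"distance": 10.0, "ammo_sniper": 0, "ammo_shotgun": 2}, "shotgun"),
    ({"distance": 2.0,  "ammo_sniper": 0, "ammo_shotgun": 0}, "knife"),
]

def step_through(model, cases):
    """Run each case through the model and report any mismatches."""
    for situation, expected in cases:
        actual = model(situation)
        status = "ok" if actual == expected else "MISMATCH"
        print(f"{status}: {situation} -> {actual} (expected {expected})")

if __name__ == "__main__":
    step_through(select_weapon, TEST_CASES)

If the tree usually picks the weakest weapon, the mismatching cases narrow down whether the fault lies in the learned model itself or in the code that queries it.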
Uncontrollable
Symptom: The learning cannot be made to match specific desired results, or it degenerates over time.
Example: The reinforcement learning animat does not retreat when it has low health, but instead attempts heroic attacks.
Diagnostic: The system is not equipped to reliably provide the desired control.
Remedy:
Use explicit supervision to control the learning. Design the architecture so the control problem is handled without learning. Restrict learning to subsets of behaviors or actions that do not need tight control. Decrease the learning rate over time as performance reaches a satisfactory level.
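As a concrete illustration of the last remedy, the sketch below decays the learning rate on every update and freezes adaptation once a performance score is judged satisfactory. The class name, thresholds, and the scalar performance measure are assumptions made for the example, not a prescribed interface.

# Minimal sketch of keeping learning under control by annealing the
# learning rate and freezing adaptation when performance is good enough.

class ControlledLearner:
    def __init__(self, initial_rate=0.1, decay=0.999, freeze_threshold=0.9):
        self.learning_rate = initial_rate
        self.decay = decay                        # multiplicative decay per update
        self.freeze_threshold = freeze_threshold  # stop learning above this score
        self.frozen = False

    def update(self, gradient, performance_score):
        """Return the step to apply, or 0.0 once adaptation is frozen."""
        if self.frozen:
            return 0.0
        if performance_score >= self.freeze_threshold:
            self.frozen = True                    # behavior is good enough; lock it in
            return 0.0
        step = self.learning_rate * gradient
        self.learning_rate *= self.decay          # anneal so behavior stabilizes
        return step

# Usage: learner = ControlledLearner(); learner.update(gradient, score)
# The decay prevents late surprises, and the freeze guarantees the animat
# never unlearns behavior the designer has already signed off on.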
Suboptimal
Symptom: Learning does not reach the perfect result.
Example: The average error of a neural network used for target selection is high.
Diagnostic: The design does not assist the adaptation; the system relies on the learning reaching an optimal solution.
Remedy:
Design the system so that suboptimality is not a problem. Provide hints to the learning, either by example (supervision) or by guidance (feedback). Model the problem so the best solution is easier to find (for instance, with expert features).
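The expert-feature idea can be sketched as follows: instead of raw coordinates, the learner receives a few hand-crafted features that encode what an expert would look at when selecting a target. The feature names, state dictionaries, and scaling constants here are illustrative assumptions.

# Minimal sketch of "expert features" for target selection. Informative,
# normalized inputs make the mapping easier to learn, so even a small,
# imperfect network can find a good (if not optimal) policy.

import math

def expert_features(animat, target):
    """Convert raw game state into a small vector of informative features."""
    dx = target["x"] - animat["x"]
    dy = target["y"] - animat["y"]
    distance = math.hypot(dx, dy)
    return [
        min(distance / 50.0, 1.0),                      # normalized distance
        target["health"] / 100.0,                       # how healthy the target is
        1.0 if target["visible"] else 0.0,              # line-of-sight flag
        (animat["health"] - target["health"]) / 100.0,  # relative advantage
    ]

Feeding the network these features instead of raw positions reduces the error the learning has to absorb, which makes a moderately suboptimal result much less noticeable in play.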
Unrealistic
Symptom: The behaviors are not realistic enough during adaptation or after learning has finished.
Example: Learning to aim causes the animat to spin around in circles for a few seconds.
Diagnostic: There is too much to learn; the policy is not designed with realism in mind; the actions are defined at an inappropriate level.
Remedy:
Learn as much as possible offline. Choose a policy that rewards safe exploration and exploitation. Design the actions at a higher level to rule out unrealistic combinations.
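As an example of designing the actions at a higher level, the sketch below replaces raw turn rates with a small set of hand-designed aiming actions whose internals are bounded to look plausible. The action names, turn limits, and smoothing factor are assumptions for the illustration.

# Minimal sketch of higher-level actions for aiming. The learner only
# chooses among a few sensible options per decision, so exploration can
# never produce multi-second spinning, and there is far less to learn
# than with continuous low-level control.

HIGH_LEVEL_ACTIONS = ["track_target", "snap_to_target", "hold_aim"]

def execute_action(action, aim_error_deg, max_turn_deg=15.0):
    """Translate a high-level choice into a bounded, natural-looking turn."""
    if action == "track_target":
        # Smoothly close a fraction of the error, clamped to a plausible turn speed.
        turn = max(-max_turn_deg, min(max_turn_deg, 0.3 * aim_error_deg))
    elif action == "snap_to_target":
        turn = max(-max_turn_deg, min(max_turn_deg, aim_error_deg))
    else:  # hold_aim
        turn = 0.0
    return turn

Because the unrealistic combinations are ruled out by construction, the remaining learning can safely happen online, and anything that still needs heavy exploration can be trained offline first.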