3. Utility-based agents
Goals alone are not sufficient to generate high-quality behavior in
most environments. For example, many action sequences will get the taxi
to its destination, but some are faster, safer, more reliable, or cheaper
than others. Goals provide only a crude binary distinction between "happy"
and "unhappy" states, whereas a more general performance measure should
allow different world states to be compared according to exactly how happy
the agent would be on reaching each of them. Because "happy" does not sound
very scientific, the customary terminology is to say that a world state
that is preferred to another has higher utility for the agent.
A utility function maps a state (or a sequence of states) onto a real
number that represents the associated degree of happiness. A complete
specification of the utility function makes rational decisions possible in
two kinds of cases where goals are inadequate. First, when there are
conflicting goals, only some of which can be achieved (for example, speed
and safety), the utility function determines the appropriate trade-off.
Second, when there are several goals, none of which can be achieved with
certainty, utility provides a way to weigh the probability of success
against the importance of the goals.
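The two cases above can be illustrated with a small sketch. It is only an illustration under stated assumptions: the state attributes (`minutes`, `accident_risk`, `cost`), the weights inside `utility`, and the two routes with their probabilities are all invented for this example and do not come from the text.

```python
from typing import Dict, List, Tuple

State = Dict[str, float]


def utility(state: State) -> float:
    """Map a state to a real number. The weights are illustrative assumptions
    that trade speed against safety and cost."""
    return (-1.0 * state["minutes"]
            - 50.0 * state["accident_risk"]
            - 0.5 * state["cost"])


def expected_utility(outcomes: List[Tuple[float, State]]) -> float:
    """Weigh each possible outcome's utility by its probability."""
    return sum(p * utility(s) for p, s in outcomes)


# Hypothetical routes: each is a list of (probability, resulting state) pairs.
routes = {
    "highway": [
        (0.9, {"minutes": 20, "accident_risk": 0.02, "cost": 12}),
        (0.1, {"minutes": 45, "accident_risk": 0.02, "cost": 12}),  # traffic jam
    ],
    "side_streets": [
        (1.0, {"minutes": 30, "accident_risk": 0.01, "cost": 9}),
    ],
}

# A rational agent picks the action with the highest expected utility.
best_route = max(routes, key=lambda r: expected_utility(routes[r]))
```

With these made-up numbers the highway wins despite its uncertain travel time; changing the weights in `utility` changes the trade-off, which is exactly the role the utility function plays.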
4. Learning agents
A learning agent can be divided into four conceptual components, as shown
in the following figure:
General structure of learning agents
The most important distinction is between the learning element and the
performance element: the former is responsible for making improvements,
and the latter is responsible for selecting external actions. The
performance element is what we previously considered to be the entire
agent: it takes in percepts and decides which actions to carry out. The
learning element uses feedback from the critic on how the agent is doing
and determines how the performance element should be modified to give
better results in the future.
The design of the learning element depends very much on the design of the
performance element. When we try to design an agent that is able to learn
a certain capability, the first question to answer is not "How do I get it
to learn this?" but "What kind of performance element will the agent need
to achieve its goal once it has learned how?" Given an agent design, we can
build the learning mechanisms needed to improve each part of the agent.
The critic tells the learning element how well the agent is doing with
respect to a fixed performance standard. The critic is necessary because
the percepts themselves give no indication of the agent's success. It is
therefore important that the performance standard be fixed.
The last component of the learning agent is the problem generator. It
suggests actions that will lead the agent to new and informative
experiences. The point is that if the performance element had its way, it
would keep doing the actions that are best given what it already knows.
But if the agent is willing to explore a little and take some actions that
are not fully optimal in the short run, it may discover better actions for
the long run. The job of the problem generator is to suggest these
exploratory actions. This is what scientists do when they carry out
experiments.
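One common way to realize this trade-off between doing what is known to be best and exploring is an epsilon-greedy rule. This is a named substitute technique, not something the text prescribes; the function name `choose_action`, the `explore_prob` parameter, and the action values below are assumptions made for illustration.

```python
import random
from typing import Dict


def choose_action(estimated_value: Dict[str, float],
                  explore_prob: float,
                  rng: random.Random) -> str:
    """Epsilon-greedy choice: with probability explore_prob, try a random
    action (the exploratory suggestion); otherwise take the action that
    looks best given current knowledge."""
    if rng.random() < explore_prob:
        # Sort the keys so the random pick is reproducible for a given seed.
        return rng.choice(sorted(estimated_value))
    return max(estimated_value, key=estimated_value.get)


# Hypothetical value estimates the taxi currently holds for three routes.
values = {"usual_route": 0.8, "new_bridge": 0.5, "back_alley": 0.1}
rng = random.Random(0)

greedy_pick = choose_action(values, explore_prob=0.0, rng=rng)
exploratory_pick = choose_action(values, explore_prob=1.0, rng=rng)
```

Setting `explore_prob` to 0 reproduces a performance element that always goes its own way; a small positive value lets the agent occasionally try actions whose true value it has not yet learned.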
To make the overall design more concrete, let us return to the example of
the automated taxi. The performance element consists of whatever
collection of knowledge and procedures the taxi has for selecting its
driving actions. The taxi goes out on the road and drives, using this
performance element. The critic observes the world and passes information
along to the learning element. For example, after the taxi moves quickly
into the opposite lane (i.e., to its left), the critic observes the
abusive language used by other drivers. From this experience, the learning
element is able to formulate a rule saying that "moving quickly into the
opposite lane" is a bad action, and the performance element is modified by
incorporating the new rule. The problem generator identifies areas of
behavior that need improvement and suggests experiments.
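The feedback loop in the taxi example can be sketched roughly as follows. All names here (`PerformanceElement`, `critic`, `learning_element`, `environment`, and the action and percept strings) are hypothetical, and the toy environment is an assumption made only so the loop can run.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class PerformanceElement:
    """Chooses external actions; here it simply avoids actions ruled out so far."""
    bad_actions: Set[str] = field(default_factory=set)

    def choose(self, candidates: List[str]) -> str:
        allowed = [a for a in candidates if a not in self.bad_actions]
        # Fall back to the first candidate if every option has been ruled out.
        return allowed[0] if allowed else candidates[0]


def critic(percept: str) -> float:
    """Score a percept against a fixed standard (assumed: angry drivers are bad)."""
    return -1.0 if percept == "other_drivers_shouting" else 0.0


def learning_element(performer: PerformanceElement, action: str,
                     feedback: float) -> None:
    """Install a new rule in the performance element when feedback is negative."""
    if feedback < 0:
        performer.bad_actions.add(action)


def environment(action: str) -> str:
    """Toy world: cutting across lanes provokes the other drivers."""
    return "other_drivers_shouting" if action == "cut_across_lanes" else "quiet_road"


taxi = PerformanceElement()
options = ["cut_across_lanes", "signal_then_merge"]

first = taxi.choose(options)                      # the naive choice
learning_element(taxi, first, critic(environment(first)))
second = taxi.choose(options)                     # the choice after learning
```

After one bad experience the taxi stops cutting across lanes. The critic's standard stays fixed throughout; only the performance element's rules change.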
The learning element can make changes to any of the "knowledge" components
shown in the agent diagrams. The simplest cases involve learning directly
from the percept sequence. Observing a number of successive states of the
environment allows the agent to learn "how the world evolves", and
observing the results of its actions allows the agent to learn "what its
actions do". For example, if the taxi applies a certain pressure to the
brakes while driving on a wet road, it soon finds out how much the vehicle
actually decelerates. Clearly, these two learning tasks are more difficult
if the environment is only partially observable.
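As a rough illustration of learning "what its actions do", the braking example can be treated as fitting a simple model to observations. The linear model (deceleration proportional to brake pressure), the function name `fit_brake_gain`, and the sample values are assumptions for this sketch; a real vehicle would need a richer model.

```python
from typing import List, Tuple


def fit_brake_gain(samples: List[Tuple[float, float]]) -> float:
    """Least-squares slope through the origin for the assumed model
    deceleration = gain * pressure."""
    numerator = sum(pressure * decel for pressure, decel in samples)
    denominator = sum(pressure * pressure for pressure, _ in samples)
    return numerator / denominator


# Hypothetical observations on a wet road: (brake pressure, measured deceleration).
wet_road_samples = [(0.2, 1.0), (0.4, 2.0), (0.6, 3.0)]

gain = fit_brake_gain(wet_road_samples)
predicted_decel = gain * 0.5  # prediction for a pressure not yet tried
```

The only feedback used here is whether the prediction agrees with what the taxi later observes, so no external judgment of good or bad behavior is required.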
The forms of learning described in the preceding paragraphs do not need
access to an external performance standard; in a sense, the standard is the
universal one of making predictions that agree with experiment. The
situation is slightly more complex for a utility-based agent that wishes to
acquire the information needed to build its utility function.