Agent architecture

Abstract Architecture for Agents

  • Assume the environment may be in any of a finite set E of discrete, instantaneous states:
  • Agents are assumed to have a repertoire of possible actions available to them, which transform the state of the environment:
  • A run , r , of an agent in an environment is a sequence of interleaved environment states and actions:

Abstract Architecture for Agents

  • Let:
    • R be the set of all such possible finite sequences (over E and Ac )
    • R Ac be the subset of these that end with an action
    • R E be the subset of these that end with an environment state

State Transformer Functions

  • A state transformer function represents behavior of the environment:
  • Note that environments are…
    • history dependent
    • non-deterministic
  • If τ( r )=∅, then there are no possible successor states to r . In this case, we say that the system has ended its run
  • Formally, we say an environment Env is a triple Env = ⟨ E , e0 , τ ⟩ where: E is a set of environment states, e0 E is the initial state, and τ is a state transformer function


  • Agent is a function which maps runs to actions:
  • An agent makes a decision about what action to perform based on the history of the system that it has witnessed to date. Let AG be the set of all agents


  • A system is a pair containing an agent and an environment
  • Any system will have associated with it a set of possible runs; we denote the set of runs of agent Ag in environment Env by R (Ag, Env)
  • (We assume R (Ag, Env) contains only terminated runs)


  • Formally, a sequence
    represents a run of an agent Ag in environment Env =⟨ E , e0 ,τ ⟩ if:
    1. e0 is the initial state of Env
    2. α0 = Ag ( e0 ); and
    3. For u > 0,

Purely Reactive Agents

  • Some agents decide what to do without reference to their history — they base their decision making entirely on the present, with no reference at all to the past
  • We call such agents purely reactive :
  • A thermostat is a purely reactive agent


  • Now introduce perception system:


  • The see function is the agent’s ability to observe its environment, whereas the action function represents the agent’s decision making process
  • Output of the see function is a percept :
        • see : E Per
    which maps environment states to percepts, and action is now a function
          action : Per* A
    which maps sequences of percepts to actions

Agents with State

  • We now consider agents that maintain state:

Agents with State

  • These agents have some internal data structure, which is typically used to record information about the environment state and history.Let I be the set of all internal states of the agent.
  • The perception function see for a state-based agent is unchanged:
        • see : E Per
    The action-selection function action is now defined as a mapping
          action : I Ac
    from internal states to actions. An additional function next is introduced, which maps an internal state and percept to an internal state:
          next : I × Per I

Agent Control Loop

  1. Agent starts in some initial internal state i0
  2. Observes its environment state e , and generates a percept see ( e )
  3. Internal state of the agent is then updated via next function, becoming next ( i0 , see ( e ))
  4. The action selected by the agent is action ( next ( i0 , see ( e )))
  5. Goto 2

Tasks for Agents

  • We build agents in order to carry out tasks for us
  • The task must be specified by us…
  • But we want to tell agents what to do without telling them how to do it

Utility Functions over States

  • One possibility: associate utilities with individual states — the task of the agent is then to bring about states that maximize utility
  • A task specification is a function
        • u : E → ◊
    which associates a real number with every environment state

Utility Functions over States

  • But what is the value of a run…
    • minimum utility of state on run?
    • maximum utility of state on run?
    • sum of utilities of states on run?
    • average?
  • Disadvantage: difficult to specify a long term view when assigning utilities to individual states (One possibility: a discount for states later on.)

Utilities over Runs

  • Another possibility: assigns a utility not to individual states, but to runs themselves:
        • u : R → ◊
  • Such an approach takes an inherently long term view
  • Other variations: incorporate probabilities of different states emerging
  • Difficulties with utility-based approaches:
    • where do the numbers come from?
    • we don’t think in terms of utilities!
    • hard to formulate tasks in these terms

Expected Utility & Optimal Agents

  • Write P ( r | Ag , Env ) to denote probability that run r occurs when agent Ag is placed in environment Env

  • Then optimal agent Ag opt in an environment Env is the one that maximizes expected utility :

Bounded Optimal Agents

  • Some agents cannot be implemented on some computers
    (A function Ag : R E Ac may need more than available memory to implement)
  • Write AG m to denote the agents that can be implemented on machine (computer) m :
  • We can replace equation (1) with the following, which defines the bounded optimal agent Ag opt :

Predicate Task Specifications

  • A special case of assigning utilities to histories is to assign 0 (false) or 1 (true) to a run
  • If a run is assigned 1, then the agent succeeds on that run, otherwise it fails
  • Call these predicate task specifications
  • Denote predicate task specification by Ψ.
    Thus Ψ : R → {0, 1}.

Task Environments

  • A task environment is a pair ⟨ Env , Ψ ⟩ where Env is an environment,
        • Ψ : R → {0, 1}
    is a predicate over runs.Let TE be the set of all task environments.
  • A task environment specifies:
    • the properties of the system the agent will inhabit
    • the criteria by which an agent will be judged to have either failed or succeeded

Task Environments

  • Write R Ψ ( Ag , Env ) to denote set of all runs of the agent Ag in environment Env that satisfy Ψ:
  • We then say that an agent Ag succeeds in task environment ⟨ Env , Ψ ⟩ if

Achievement & Maintenance Tasks

  • Two most common types of tasks are achievement tasks and maintenance tasks:
    1. Achievement tasks are those of the form “achieve state of affairs Φ”
    2. Maintenance tasks are those of the form “maintain state of affairs Ψ”

Achievement & Maintenance Tasks

  • An achievement task is specified by a set G of “good” or “goal” states: G E

    The agent succeeds if it is guaranteed to bring about at least one of these states (we do not care which one — they are all considered equally good).

  • A maintenance goal is specified by a set B of “bad” states: B E

    The agent succeeds in a particular environment if it manages to avoid all states in B — if it never performs actions which result in any state in B occurring

Agent Synthesis

  • Agent synthesis is automatic programming: goal is to have a program that will take a task environment, and from this task environment automatically generate an agent that succeeds in this environment:
    (Think of ⊥ as being like null in Java.)
  • Synthesis algorithm is:
    • sound if, whenever it returns an agent, then this agent succeeds in the task environment that is passed as input
    • complete if it is guaranteed to return an agent whenever there exists an agent that will succeed in the task environment given as input

Agent Synthesis

  • Synthesis algorithm syn is sound if it satisfies the following condition:
    and complete if: