The AutONA Agent
Byde3 developed AutONA, an automated negotiation agent. The problem domain involves multiple negotiations between buyers and sellers over the price and quantity of a given product. The negotiation protocol follows the alternating-offers model. Each offer is directed at only one player on the other side of the market and is private information between that pair of buyer and seller. In each round, a player can make a new offer, accept an offer, or terminate negotiations. In addition, a time cost is used to provide an incentive for timely negotiations. Although the model can be viewed as a series of one-shot negotiations, AutONA was provided, for each experiment, with data from previous experiments.
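As a rough illustration of this protocol, the sketch below represents the messages a player can send and the per-round time cost; the names Action, Offer, and discounted_utility are hypothetical and not taken from Byde's implementation.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Action(Enum):
    OFFER = auto()      # propose a new (price, quantity) to a single partner
    ACCEPT = auto()     # accept the partner's standing offer
    TERMINATE = auto()  # break off this bilateral negotiation


@dataclass
class Offer:
    sender: str        # for example, a buyer identifier
    recipient: str     # the one seller (or buyer) this offer is directed at
    price: float
    quantity: int
    round_number: int  # round in which the offer was made


def discounted_utility(raw_utility: float, round_number: int,
                       time_cost: float = 0.01) -> float:
    """Apply a per-round time cost, giving players an incentive to settle early."""
    return raw_utility - time_cost * round_number
```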
In order to model the opponent, AutONA attaches a belief function to each player that estimates the probability of a given price for a given seller and quantity. This belief function is updated based on prices observed in prior negotiations. Several tactics and heuristics are implemented to form the negotiator's strategy during the negotiation process (for example, for selecting the opponents with which it will negotiate and for determining the first offer it will propose to an opponent). Byde also allowed cheap talk during negotiations, that is, offers could be proposed without commitment. The results obtained from the experiments with human negotiators revealed that the negotiators did not detect which negotiator was the software agent. In addition, Byde found that AutONA was not sufficiently aggressive during negotiations, and thus many negotiations remained incomplete. Their experiments showed that AutONA at first performed worse than the human players. A modified version, in which several configuration parameters of the AutONA agent were fine-tuned, improved the results to be more in line with those of human negotiators, though not better. They concluded that different environments would most likely require changing the configurations of the AutONA agent.
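To make the opponent-modeling idea above concrete, the following sketch keeps, for each seller and quantity, an empirical distribution over prices observed in prior negotiations; the PriceBelief class and its binning scheme are assumptions made for illustration, not Byde's actual belief-update rule.

```python
from collections import defaultdict


class PriceBelief:
    """Empirical opponent model: for each (seller, quantity) pair, keep a
    histogram of prices observed in prior negotiations and read off an
    estimated probability for a candidate price."""

    def __init__(self, bin_width: float = 1.0):
        self.bin_width = bin_width
        # (seller, quantity) -> price bin -> observation count
        self.counts = defaultdict(lambda: defaultdict(int))

    def _bin(self, price: float) -> int:
        return int(price // self.bin_width)

    def observe(self, seller: str, quantity: int, price: float) -> None:
        """Record a price seen in a previous negotiation with this seller."""
        self.counts[(seller, quantity)][self._bin(price)] += 1

    def probability(self, seller: str, quantity: int, price: float) -> float:
        """Estimated probability that this seller trades near this price."""
        histogram = self.counts[(seller, quantity)]
        total = sum(histogram.values())
        if total == 0:
            return 0.0  # no prior data for this seller and quantity
        return histogram[self._bin(price)] / total
```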
We now proceed with agents that are applicable to a larger family of domains: the Cliff-Edge and Colored-Trails agents.
The Cliff-Edge Agent
Katz and Kraus16 proposed an innovative model for human learning and decision making. Their agent competes repeatedly in one-shot interactions (for example, sealed-bid first-price auctions and the ultimatum game), each time against a different human opponent. Katz and Kraus utilized a learning algorithm that integrates virtual learning with reinforcement learning. That is, offers higher than an accepted offer are treated as successful (virtual) offers, even though they were not actually proposed. Similarly, offers lower than a rejected offer are treated as having been (virtually) proposed and rejected. A threshold is also employed to allow for some deviations from this strict categorization.
The results of previous interactions are stored in a database used for later interactions. The decision-making mechanism of Katz and Kraus's Ultimatum Game agent follows a heuristic based on the qualitative theory of Learning Direction.35 Simply put, if an offer is rejected in a given interaction, then in the next interaction the proposer will make a higher offer to the opponent. In contrast, if an offer is accepted, then in the following interaction the offer will be decreased. Katz and Kraus show that their algorithm performs better than other automated agents. When compared to human behavior, their automated agent achieves a higher average payoff than the human players.
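The virtual-learning update can be sketched as follows for a proposer in the ultimatum game; the VirtualLearner class, its optimistic prior counts, and the margin parameter standing in for the threshold are illustrative assumptions, not Katz and Kraus's exact algorithm.

```python
class VirtualLearner:
    """Proposer-side learner for the ultimatum game over a pie of fixed size.
    Acceptance statistics are kept per offer level, and accepted or rejected
    offers are generalized to more or less generous levels as virtual outcomes."""

    def __init__(self, pie: int = 100, margin: int = 5):
        self.pie = pie
        self.margin = margin              # tolerance before generalizing an outcome
        self.accepts = [1] * (pie + 1)    # optimistic prior: 1 success ...
        self.trials = [2] * (pie + 1)     # ... out of 2 trials per offer level

    def update(self, offer: int, accepted: bool) -> None:
        self.trials[offer] += 1
        if accepted:
            self.accepts[offer] += 1
            # offers more generous than an accepted offer count as virtual successes
            for o in range(offer + self.margin, self.pie + 1):
                self.trials[o] += 1
                self.accepts[o] += 1
        else:
            # offers less generous than a rejected offer count as virtual failures
            for o in range(0, max(offer - self.margin, 0)):
                self.trials[o] += 1

    def best_offer(self) -> int:
        """Offer level maximizing the estimated expected proposer payoff."""
        def expected_payoff(o: int) -> float:
            return (self.accepts[o] / self.trials[o]) * (self.pie - o)
        return max(range(self.pie + 1), key=expected_payoff)
```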
Later, Katz and Kraus17 improved the learning of their agent by introducing gender-sensitive learning. In this case, the information obtained from previous negotiations is stored in three databases: one general and the other two each associated with a specific gender. During the interaction, the agent's algorithm tries to determine when to use each database. Katz and Kraus show that their gender-sensitive agent yields higher payoffs than the generic approach, which lacks gender sensitivity.
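One possible way to organize the three databases, reusing the VirtualLearner sketch above, is shown below; the rule for deciding which database to consult (fall back to the general one until the gender-specific one has enough observations) is a hypothetical stand-in for Katz and Kraus's selection mechanism.

```python
class GenderSensitiveLearner:
    """Wrapper around three learners: one general and one per gender group,
    reusing the VirtualLearner sketch above."""

    def __init__(self, pie: int = 100, min_samples: int = 30):
        self.general = VirtualLearner(pie)
        self.by_gender = {"female": VirtualLearner(pie), "male": VirtualLearner(pie)}
        self.samples = {"female": 0, "male": 0}
        self.min_samples = min_samples  # hypothetical switch-over point

    def update(self, gender: str, offer: int, accepted: bool) -> None:
        # every observation goes to the general database and to the matching one
        self.general.update(offer, accepted)
        self.by_gender[gender].update(offer, accepted)
        self.samples[gender] += 1

    def best_offer(self, gender: str) -> int:
        # consult the gender-specific database only once it has enough data
        if self.samples[gender] >= self.min_samples:
            return self.by_gender[gender].best_offer()
        return self.general.best_offer()
```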
However, Katz and Kraus's agent was tested in a single-issue domain, with repeated interactions used to improve the learning and decision-making mechanism. It is not clear whether their approach would be applicable to negotiation domains in which several rounds are conducted with the same opponent and offers involve multiple issues. In addition, the success of their gender-sensitive approach depends on the existence of different behavioral patterns among different gender groups.
The following agents are tailored to a rich environment of multi-issue negotiations. As with the agent proposed by Katz and Kraus, the history of past interactions is used to fine-tune the agents' behavior and modeling.
The Colored-Trails Agents
Ficici and Pfeffer8 were concerned with understanding human reasoning and with using this understanding to build their automated agents. They did so by collecting negotiation data and then constructing a proficient automated agent. Both Byde's AutONA agent3 and the Colored-Trails agent collect historical data and use it to model the opponent. Byde used the data to update the belief regarding the price for each player, while Ficici and Pfeffer used it to construct different models of how humans reason in the game.
The negotiation was conducted in
the Colored Trails game environment12
played on an n×m board of colored
squares. Players are issued colored
chips and are required to move from
their initial square to a designated goal
square. To move to an adjacent square,
a player must turn in a chip of the same
color as the square. Players must negotiate with each other to obtain chips
needed to reach the goal square (see
Figure 5). Their learning mechanism
involved constructing different possible models for the players and using
gradient descent to learn the appropriate model.
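The movement rule of the game can be illustrated with a short sketch; the function names and board representation below are illustrative, not taken from the Colored Trails implementation.

```python
from collections import Counter


def chips_needed(board, path):
    """Count, by color, the chips required to walk `path` (a list of (row, col)
    squares, excluding the starting square) on `board`, a 2D list of colors."""
    return Counter(board[r][c] for r, c in path)


def can_reach(board, path, chips):
    """True if a player holding `chips` (a Counter of colors) can pay for every
    step of `path`; any shortfall is what negotiation would have to obtain."""
    needed = chips_needed(board, path)
    return all(chips[color] >= n for color, n in needed.items())


board = [["red", "blue"],
         ["blue", "red"]]
# Moving right and then down from (0, 0) requires one blue chip and one red chip.
print(can_reach(board, [(0, 1), (1, 1)], Counter(blue=1, red=1)))  # True
```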
Ficici and Pfeffer trained their agents with results obtained from human-human simulations and then incorporated these models into their automated agents, which were later matched against human players. They show that this method allows them to generate more successful agents in terms of the expected number of accepted offers and the expected total benefit for the agent. They also illustrate how their agent contributes to the social good by providing high utility scores for the other players. In addition, Ficici and Pfeffer show that their agent performs similarly to human players.