This week’s BEACON Researchers at Work blog post is by University of Texas graduate student Erkin Bahceci.
In this blog post, I will describe my research (with Risto Miikkulainen) on competitive multi-agent search, and in particular how I used evolutionary computation to optimize agent strategies. I have always been interested in how people come up with new ideas and novel solutions to problems, and similarly how companies create new products. Real-world product design is a complicated process subject to various constraints, which is difficult to model in full detail. However, a higher-level view can provide useful insights as well. Such an abstract approach is applicable to a broader set of problems: not just the creation of new products, but any type of innovation search, such as art, design, and scientific discovery.
One abstract way to model innovation search is to look at it as combining existing features or ideas to come up with a new product or idea. Let’s say we have a feature space in the form of a 3D cube, where each of the three dimensions corresponds to a feature, and each of the eight cube corners represents a potential product, that is, a combination of features. In this space, points can be encoded as bit strings, where 1 means the corresponding feature is included, and 0 means that feature is missing. For instance, the point 110 may represent a product with wireless communication and touchscreen, but without a camera, whereas 101 may represent a wireless device with a camera, but no touchscreen. Also, let’s say each point has a fitness value, denoting the potential success of the product that has those features, as shown in Figure 1. The goal is to find successful products, that is, points with high fitness values.
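The encoding above is easy to make concrete. Here is a minimal sketch in Python; the feature names (`wireless`, `touchscreen`, `camera`) match the example in the text, and the `decode` helper is an illustrative name, not part of the original model:

```python
# Bit-string encoding of products: bit i is 1 iff feature i is present.
FEATURES = ["wireless", "touchscreen", "camera"]

def decode(bits: str) -> list[str]:
    """Return the list of features present in a product bit string."""
    return [name for bit, name in zip(bits, FEATURES) if bit == "1"]

print(decode("110"))  # → ['wireless', 'touchscreen']
print(decode("101"))  # → ['wireless', 'camera']
```

Each corner of the cube is one such bit string, and with N features the same encoding covers all 2^N corners of the hypercube.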
Of course, we are interested in having more than three features in our search space, which can be achieved by replacing the cube with an N-dimensional hypercube, which has 2^N corner points instead of just eight. Also, to assign fitness values to points in a systematic way, we use the NK fitness landscape formalization, which is outside the scope of this post. On this fitness landscape, the goal of a company (or an agent) is to visit points with fitness values as high as possible. The agent can start with an initial point and try improving on it by adding or removing a small number of features, called exploiting, or it can jump to a drastically different part of the feature space by making a larger number of changes, called exploring.
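Both kinds of moves are just bit flips on the hypercube, differing only in how many bits change. A sketch of the idea, where the specific flip counts (one bit for exploiting, three for exploring) are illustrative choices rather than the parameters used in the actual experiments:

```python
import random

def mutate(point: str, n_flips: int) -> str:
    """Flip n_flips distinct, randomly chosen bits of a hypercube point."""
    bits = list(point)
    for i in random.sample(range(len(bits)), n_flips):
        bits[i] = "1" if bits[i] == "0" else "0"
    return "".join(bits)

def exploit(point: str) -> str:
    return mutate(point, 1)   # small local step: tweak one feature

def explore(point: str) -> str:
    return mutate(point, 3)   # larger jump to a distant region
```

An exploiting move lands at Hamming distance 1 from the starting point, an exploring move at distance 3.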
Furthermore, innovation search is usually not done in isolation, but in a competitive environment, which means agents influence each other with their discoveries and inventions. Agents sometimes imitate other agents closely, and they sometimes merely get some inspiration from others’ work. This sort of agent interaction can be added to the abstract model above by letting agents exploit and explore around another agent’s shared points, using those points to start their search instead of one of their own private points.
Another aspect of the fitness landscape we use is that it is dynamic. That is, whenever an agent visits a point, the fitness of that point and of nearby points changes, which we call flocking. Two types of flocking are used: initial agent visits to a point cause boosting, where the region rises in fitness, whereas subsequent visits lead to crowding, where the region sinks. These changes make it possible to model the dynamics of fitness landscapes in innovation search, where boosting corresponds to creating new markets (such as tablet computers) and crowding to the saturation of existing markets (such as desktops).
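The boosting/crowding dynamic can be sketched as a simple update rule on a fitness table. The magnitudes (`boost`, `crowd`) and the one-bit-flip neighborhood used here are hypothetical parameters for illustration; the actual model's update rule may differ:

```python
def neighbors(point: str) -> list[str]:
    """All points one bit flip away on the hypercube."""
    return [point[:i] + ("1" if point[i] == "0" else "0") + point[i + 1:]
            for i in range(len(point))]

def visit(point: str, fitness: dict, visits: dict,
          boost: float = 0.1, crowd: float = 0.05) -> None:
    """Flocking sketch: a first visit boosts the region's fitness,
    repeat visits make it sink (crowding)."""
    region = [point] + neighbors(point)
    delta = boost if visits.get(point, 0) == 0 else -crowd
    for p in region:
        fitness[p] = fitness.get(p, 0.0) + delta
    visits[point] = visits.get(point, 0) + 1
```

Under this rule, a region a single agent has just discovered rises, while a region many agents keep revisiting sinks back down.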
It would be useful to see such changes in the fitness landscape. However, visualizing the whole fitness landscape is a challenge, since it is not practical to have a 3D visualization of an N-dimensional hypercube with all its 2^N points, where each point has N neighbors (since flipping each bit of an N-dimensional point produces a neighbor). To address this challenge, we came up with an alternative way to visualize the fitness landscape. Instead of trying to display all points and their neighbors, we only show the neighborhood around a focus point in full detail, for example, around one of the agents, with the resolution of displayed points diminishing in proportion to their distance to the focus point.
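The quantity that drives this distance-based resolution is the Hamming distance from the focus point. A small sketch of how the hypercube's points partition into distance shells around a focus (the `by_distance` helper is an illustrative name, not from the actual visualization code):

```python
from itertools import product

def by_distance(focus: str) -> dict[int, list[str]]:
    """Group all 2^N hypercube points by Hamming distance from a focus point.
    Points at distance 0 or 1 would be drawn in full detail; farther shells
    would be drawn at progressively coarser resolution."""
    groups: dict[int, list[str]] = {}
    for bits in product("01", repeat=len(focus)):
        p = "".join(bits)
        d = sum(a != b for a, b in zip(p, focus))
        groups.setdefault(d, []).append(p)
    return groups
```

For N = 3, the shells around any focus have sizes 1, 3, 3, 1 (the binomial coefficients), so only a handful of points ever need full-detail rendering.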
To identify agent strategies that are good at finding high-fitness points, we perform strategy optimization, by evolving artificial neural networks (in particular Compositional Pattern Producing Networks) using the NEAT method. When the agent’s current state is given as input, these evolved networks output a set of values (one for each action), which are then used to probabilistically pick an action for the agent to perform. The possible actions are exploiting or exploring using shared points or private points.
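The final step, turning network outputs into a probabilistic action choice, can be sketched as follows. The softmax-with-temperature rule and the four action names are assumptions for illustration; the text only says the evolved network's per-action values are used to pick an action probabilistically:

```python
import math
import random

ACTIONS = ["exploit_private", "explore_private",
           "exploit_shared", "explore_shared"]

def pick_action(values: list[float], temperature: float = 1.0) -> str:
    """Sample an action with probability proportional to the softmax of
    the network's output values (one value per action)."""
    exps = [math.exp(v / temperature) for v in values]
    total = sum(exps)
    r, cum = random.random(), 0.0
    for action, e in zip(ACTIONS, exps):
        cum += e / total
        if r < cum:
            return action
    return ACTIONS[-1]  # guard against floating-point rounding
```

Higher temperatures make the choice more exploratory across actions; lower temperatures make the agent commit more deterministically to the highest-valued action.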
We evolved agent strategies in three setups: (1) one where the agent was evaluated in a single homogeneous environment that had opponent agents with identical strategies, (2) one where the agent was evaluated in multiple homogeneous environments (which takes much more time than the first setup), and (3) one where the agent was evaluated in a single heterogeneous environment, that is, against opponent agents that each had a different strategy (which is similar to the first setup in terms of time required).
The second and third setups had the goal of evolving general strategies that can perform well in multiple environments, whereas the first setup did not. The performance of the two general strategies produced by the second and third setups on a given environment was lower than the strategy evolved particularly for that environment in the first setup, which is expected. On the other hand, the strategies that the heterogeneous setup produced performed close to those of the multiple homogeneous setup in the same homogeneous environments that were part of that setup, even though the required time for the heterogeneous setup evolution was much shorter. This result is noteworthy and might be useful in other domains as well.
One of the observed behaviors was a wave riding strategy in certain environments. Short exploitation jumps allow an agent to ride a boosting wave, staying at the forefront of the area that is being boosted as it moves through the landscape, leaving a trail of past visited points that have sunk in fitness (Figure 2). A sample video of this behavior is available.
Another observed phenomenon in this domain was a Twitter effect: when agents share all of their knowledge openly, they are inclined to imitate each other more and follow the same ideas, which reduces diversity and also overall fitness due to crowding (Figure 3), whereas moderate restrictions to openness improve diversity and may increase creativity in the long run.
In order to test these ideas in a real-world setting, I am currently working with a dataset of human behavior in a competitive multi-agent search task under laboratory conditions. The two main goals are modeling the human subjects in this dataset as agent strategies, and obtaining strategies that perform better than the human subjects, through optimization. In the future, these methods and results may be applied in various industries by utilizing archival data from companies.
For more information about Erkin’s work, you can contact him at erkin at cs dot texas dot edu.