This post is by MSU graduate student Matthew Andres Moreno
Hi! My name is Matthew Andres Moreno. I’m a graduate student finishing up my first year studying digital evolution with my advisor Dr. Charles Ofria.
Today, I’m going to talk to you about police detective work. Eventually, we’ll talk about evolvability and genotype-phenotype maps, but first let’s talk CSI.
Police Composite Sketches
Specifically, let’s think about how a police composite sketch works. First, someone sees a criminal and describes the face’s physical features with words. This description is the compact representation. Then, the police artist reconstructs the criminal’s face from the description.
Schematic of hypothetical police composite process. Mug shot and composite reconstruction were taken from the Crime Scene Training Blog.
Why does this work? It works because the witness has seen lots of faces and knows what the important bits to describe are. It works because the police artist understands the witness’ words and has also seen lots of faces — from experience, she knows that the mouth goes under the nose, the nose goes between the eyes, etc. and doesn’t need the witness to tell her absolutely everything about the face in order to draw it.
Well, autoencoders can also be used to reconstruct a corrupted input. This works something like a police sketch, too. Suppose that the criminal was wearing pantyhose that partially obscured his face. The witness can still describe the suspect’s face and the police artist can still draw it. Under the right conditions the missing part of the face can be reconstructed reasonably well.
Schematic of hypothetical police composite process with suspect in disguise (incomplete input). Mug shot and composite reconstruction were taken from the Crime Scene Training Blog.
Why does this work? It works because the witness can still see and describe part of the face. It works because the police artist understands the witness’ words and has also seen lots of faces — from experience, she can make a pretty good guess by cluing off the fact, for example, that faces have left-right symmetry or maybe that the criminal probably had a cheekbone and ear on the part of the face that was obscured. Again, because she’s seen lots of faces the police artist doesn’t need the witness to tell her absolutely everything about the face in order to draw it.
Deep-Learning and Autoencoders
The jig’s up... it was all a set-up! A set-up, that is, to help you understand what autoencoders do. Unless you’re technically inclined, understanding exactly what autoencoders are isn’t particularly important for our discussion. Suffice it to say that autoencoders are a clever type of deep learning algorithm.
What autoencoders do is directly analogous to what the witness and police artist do. By looking at lots of examples of complex objects like faces, autoencoders learn to
- compactly describe the important features of a complex object (“encoding”, just like the witness) and
- reconstruct a complex object from that description (“decoding”, just like the police artist).
I’ll refer to these as the two powers of autoencoders.
The following graphic, a “latent space interpolation” between three faces, gives a neat glimpse of how autoencoders work and how powerful they are. The latent space refers to the set of all compact descriptions an autoencoder can read. To understand what’s going on here, let’s just look at the top row of images.
Autoencoder latent space interpolation with faces! Graphic from [White, 2016].
At the top-left, we see an image of a woman with curly hair. To get to the image immediately adjacent on the right, we use power 1 of autoencoders to generate a compact description and then use power 2 of autoencoders to reconstitute a face image.
Then, going left to right across the top row, things start to get interesting. We gradually change the compact description of the curly-haired woman until it matches the compact description of the red-haired woman on the far right. Each image shows an intermediate compact description that was reconstituted using power 2 of autoencoders. This visualization shows a very natural-looking transition between the two faces!
I won’t walk you through it, but the rest of the grid of images shown above was generated analogously.
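For the technically inclined, here is a minimal sketch of the "gradually change the compact description" step. It assumes a compact description is just a short list of numbers and that the gradual change is a simple linear blend; the function names and the three-number descriptions are made up for illustration, and other blending schemes are possible. A trained decoder (not shown) would then turn each intermediate description back into an image.

```python
def lerp(z_start, z_end, t):
    """Linearly blend two latent vectors; t=0 gives z_start, t=1 gives z_end."""
    return [(1.0 - t) * a + t * b for a, b in zip(z_start, z_end)]


def interpolation_path(z_start, z_end, n_steps):
    """Return n_steps latent vectors evenly spaced from z_start to z_end."""
    if n_steps < 2:
        raise ValueError("need at least two steps (the two endpoints)")
    return [lerp(z_start, z_end, i / (n_steps - 1)) for i in range(n_steps)]


# Hypothetical 3-number compact descriptions of two faces.
z_curly = [0.0, 2.0, -1.0]
z_red = [4.0, 0.0, 1.0]

# Five descriptions: the two endpoints plus three in-between blends.
path = interpolation_path(z_curly, z_red, 5)
for z in path:
    print(z)
```

Each printed list is one intermediate compact description. Feeding every one of them through the decoder is what would produce a row of images like the one in the graphic.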
What does any of this have to do with evolution? This year, I’ve been investigating how autoencoders can be useful as genotype-phenotype maps in digital evolution. One idea of how this can work: use power 2 of autoencoders (the “decoder”) as the genotype-phenotype map. In this scheme, the genotype lives in the latent space.
In order to drive home the implications of autoencoder genotype-phenotype maps for evolvability, let’s talk through a little thought experiment. Think back to the problem of police face reconstruction we’ve been thinking about. Suppose we’re trying to evolve a face that, as judged by the witness of a crime, maximally resembles the perpetrator. (Yes, this is a real thing people do [Frowd et al., 2004].) To accomplish this, we start out with a set of random genotypes that map to different phenotypes (images). The witness selects the images that most closely resemble the suspect’s face. Then, we mutate and recombine the best matches to make a new batch of images for the witness to consider. As we iterate through this process, hopefully we generate images that more and more closely resemble the suspect’s face.
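The select, recombine, and mutate loop from this thought experiment can be sketched in a few lines. This is a toy version under loud assumptions: the human witness is replaced by a made-up scoring function (`witness_score`) that measures closeness to a hidden `TARGET`, genotypes are short lists of numbers, and the population size, mutation strength, and other settings are arbitrary. It is not the procedure used by EvoFIT or AutoMap.

```python
import random


def evolve(fitness, genome_length, pop_size=20, n_keep=5, generations=30,
           mutation_sd=0.3, seed=0):
    """Toy select/recombine/mutate loop; returns the best genome and the
    best score seen at the start of each generation."""
    rng = random.Random(seed)
    population = [[rng.uniform(-1, 1) for _ in range(genome_length)]
                  for _ in range(pop_size)]
    best_scores = []
    for _ in range(generations):
        # "Witness" step: rank candidates and keep the closest matches.
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:n_keep]
        best_scores.append(fitness(parents[0]))
        # Carry the parents over unchanged, then fill up with offspring.
        population = [list(p) for p in parents]
        while len(population) < pop_size:
            mom, dad = rng.sample(parents, 2)
            # Recombine: each gene comes from one parent at random.
            child = [rng.choice(pair) for pair in zip(mom, dad)]
            # Mutate: nudge every gene with a little Gaussian noise.
            child = [g + rng.gauss(0, mutation_sd) for g in child]
            population.append(child)
    best = max(population, key=fitness)
    return best, best_scores


# Stand-in for the witness: the score is higher the closer a genome is
# to a hidden target (hypothetical values).
TARGET = [0.5, -0.2, 0.8, 0.1]


def witness_score(genome):
    return -sum((g - t) ** 2 for g, t in zip(genome, TARGET))


best, history = evolve(witness_score, genome_length=len(TARGET))
print(history[0], history[-1])
```

Because the best matches are carried over unchanged each round, the best score can never get worse from one generation to the next. How quickly it improves depends heavily on how genotypes map to phenotypes, which is the point of the thought experiment.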
Consider trying to evolve a facial composite using the direct genotype-phenotype map. Under this map, the intensity of each pixel of the image is directly encoded in the genotype. First of all, the randomly generated images wouldn’t look very much like faces at all; they’d look more like static. Supposing that we were eventually able to get to an image that vaguely resembles a face, then what? Is there a path of pixel-by-pixel changes leading to the suspect’s face along which every change makes the image more closely resemble the perpetrator? I’d argue that sooner or later we’d be likely to get stuck at a dead end, where the image doesn’t resemble the perpetrator’s face but any pixel-by-pixel change makes it look less like the perpetrator’s face (or less like a face at all).
Evolving the composite using the direct genotype-phenotype map probably won’t work well.
What if, instead of having the genotype directly represent the image at the pixel level, we encode genotypes analogously to a verbal description and then use a police artist who can draw a suspect from verbal descriptions to generate phenotypes? This is analogous to what our “decoding” genotype-phenotype map accomplishes.
(For those who are curious, software to evolve police composites uses an indirect genotype-phenotype map based on eigenfaces [Frowd et al., 2004].)
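To make the contrast between the two kinds of map concrete, here is a small sketch. Everything in it is a stand-in: the "image" is a single row of eight pixels, and `toy_decoder_map` is a fixed, hand-written rule rather than a trained autoencoder decoder or an eigenface model. It only illustrates how one mutation plays out under each map.

```python
N_PIXELS = 8


def direct_map(genotype):
    """Direct map: the genotype *is* the image, one gene per pixel."""
    return list(genotype)


def toy_decoder_map(genotype):
    """Indirect map: two genes (overall brightness, left-to-right gradient)
    are expanded into a full row of pixels by a fixed, hand-written rule."""
    brightness, gradient = genotype
    center = (N_PIXELS - 1) / 2
    return [brightness + gradient * (i - center) for i in range(N_PIXELS)]


def pixels_changed(before, after):
    """Count how many pixels differ between two images."""
    return sum(1 for a, b in zip(before, after) if a != b)


# Mutate a single gene under each map and count how many pixels move.
direct_parent = [0.5] * N_PIXELS
direct_child = list(direct_parent)
direct_child[3] += 0.1

latent_parent = [0.5, 0.0]
latent_child = [0.6, 0.0]  # same-sized nudge, applied to "brightness"

print(pixels_changed(direct_map(direct_parent), direct_map(direct_child)))  # 1
print(pixels_changed(toy_decoder_map(latent_parent), toy_decoder_map(latent_child)))  # 8
```

Under the direct map, one mutation touches one pixel. Under the indirect map, one mutation shifts the whole image in a coordinated way, much as changing one word in a verbal description changes what the police artist draws everywhere that word applies.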
This work, which we call AutoMap, was inspired in part by recent efforts to understand evolvability in terms of learning theory [Kouvaris et al., 2017]. We hope that this work helps to strengthen an explicit connection between applied learning theory (i.e., machine learning) and evolvability. We’re also looking forward to expanding on the exploratory AutoMap experimental work that we’re taking to GECCO this summer.
If you’re interested in more detail, this blog post is based on a more in-depth (but still non-technical and fun!) introduction to our work with AutoMap, which you can find here. If you want to check out our technical write-up on AutoMap, you can find the PDF here and the paper’s supporting materials here.
Finally, thanks also to my AutoMap coauthors Charles Ofria and Wolfgang Banzhaf.
Frowd, Charlie D., et al. “EvoFIT: A holistic, evolutionary facial imaging technique for creating composites.” ACM Transactions on Applied Perception (TAP) 1.1 (2004): 19-39.
Kouvaris, Kostas, et al. “How evolution learns to generalise: Using the principles of learning theory to understand the evolution of developmental organisation.” PLoS Computational Biology 13.4 (2017): e1005358.
White, Tom. “Sampling generative networks: Notes on a few effective techniques.” arXiv preprint arXiv:1609.04468 (2016).