CONSTAX: a tool to simplify and improve taxonomic classification of community sequences

By: Natalie Vande Pol (PhD Candidate, Michigan State University)

I am a 5th year PhD student in the Microbiology and Molecular Genetics program at Michigan State University. This is the story of a side project that has been one of the most enjoyable and rewarding undertakings in my PhD career. CONSTAX was the first project on which I was a key contributor. The co-first authors both worked in community ecology and they wanted to develop a tool, but they needed some help writing Python scripts. That’s where I came in.

Community ecologists use a technique called amplicon sequencing, in which they extract DNA from a substrate (e.g., soil, plants, water) and sequence specific genes that they then use as a “barcode” to identify the organism from which the DNA originated (Figure 1). In bacteria, this barcode is the 16S ribosomal RNA gene. In fungi, we generally use one of two ribosomal regions: ITS1 or ITS2. Ecologists use these barcode sequences to study pooled communities of organisms, allowing comparison of community structure between different conditions (e.g., healthy vs. diseased gut/plant). Think of it like a census for soil fungi. These comparisons can sometimes indicate organisms that are important to causing, preventing, detecting, or recovering from a given characteristic or disturbance.

Figure 1: Community barcoding. Barcode genes amplified from different organisms have small differences in sequence. So long as a sequence for that organism is included in the reference database, that sequence can be “translated” back into an organism name.

One of the most important steps in a community analysis pipeline is to “translate” the barcode DNA sequences from the sample into the names of the organisms from which they originated. This is done by comparing sample sequences to reference sequences from known organisms, just as a barcode in a grocery store needs a computer reference to tell the cashier whether you are buying cilantro or parsley. With DNA sequences, the identification algorithm used to match up the sequences is called a classifier. Using different reference databases or different classifiers can yield different identifications.

To illustrate what happens with different classifiers, imagine you and two of your friends are all taking the same test. All three of you get 80/100 questions correct on the exam. However, when you compare your exams, you realize that while you all had 75 questions in common, the other 5 correctly answered questions were unique to each of you. So, on the surface your performances seem identical, but are in fact a bit different. Similarly, using a single classifier and different reference databases is analogous to each of you three taking the same exam having studied from three different textbooks (assuming otherwise identical performance). Your scores on the exam would probably vary.

Fortunately, for fungal research, UNITE is a well-curated reference sequence database, so the largest source of variation is between classifiers. Just as described in the first analogy above, different classifiers use different algorithms to assign taxonomies and estimate confidence/error rates, making it difficult to select a single classifier as the “best”. Therefore, our two community ecologists and I set out to develop a tool that eliminated the need to choose just one! If you and your two friends could collectively take that exam, you could have gotten 90/100, instead of just 80/100 on your own.
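To make the arithmetic of the exam analogy concrete, here is a small sketch. The question numbers are made up purely for illustration; only the totals (80 correct each, 75 in common) come from the analogy above.

```python
# Questions 1-75 are the ones all three test-takers answered correctly.
shared = set(range(1, 76))

# Each person also got 5 questions right that nobody else did
# (the specific question numbers are arbitrary placeholders).
you = shared | set(range(76, 81))
friend_a = shared | set(range(81, 86))
friend_b = shared | set(range(86, 91))

individual_scores = [len(you), len(friend_a), len(friend_b)]
pooled_score = len(you | friend_a | friend_b)

print(individual_scores)  # [80, 80, 80]
print(pooled_score)       # 90
```

Pooling the answers recovers 75 + 3 × 5 = 90 correct questions, which is the intuition behind combining classifiers.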

First, we chose the most commonly used and most recently developed classifiers: Ribosomal Database Project (RDP), UTAX, and SINTAX. We wrote a series of custom scripts to format the UNITE reference database to be compatible with each of the classifiers and ran our sequence datasets through each of the classifiers. Finally, we used Python scripts to standardize the output formats. This was all packaged and is automated by a single shell script (Figure 2). Users simply place their input files in the specified folders and provide the names and desired parameters in a configuration file.

Figure 2: The CONSTAX workflow. The portion highlighted in the gray box is automated through a single master script to ensure ease of use.

For each sequence, we compared the three assignments given for each taxonomic rank (Kingdom, Phylum, Class, Order, Family, Genus, and Species). If the confidence score for a given assignment was below a threshold value, that and all further taxonomic ranks were considered “Unidentified” for that sequence. In most cases, the three classifiers agreed on the taxonomy assigned. However, there were cases in which they disagreed, whether because one (or two) of the classifiers yielded an Unidentified, or because there were multiple different, confident assignments (Table 1). With three classifiers, we decided to implement a simple majority rule. Since classifiers provide an estimated confidence in taxonomic classifications, we used confidence scores to break ties.
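The thresholding and voting described above can be sketched roughly as follows. This is not the CONSTAX source code: the function names, the 0.8 cutoff, and the choice to vote only among identified names are assumptions made for illustration, and the actual rules are the ones given in Table 1.

```python
from collections import Counter

UNIDENTIFIED = "Unidentified"

def apply_threshold(ranks, threshold=0.8):
    """Mask the first low-confidence rank and every rank below it.

    ranks: list of (name, confidence) ordered Kingdom -> Species.
    threshold: hypothetical cutoff; the real value is user-configured.
    """
    masked, cut = [], False
    for name, conf in ranks:
        if cut or conf < threshold:
            cut = True
            masked.append((UNIDENTIFIED, 0.0))
        else:
            masked.append((name, conf))
    return masked

def consensus_at_rank(calls):
    """Pick one name at a single rank from several classifier calls.

    calls: list of (name, confidence), one per classifier.
    Assumption: "Unidentified" calls do not vote.
    """
    identified = [(n, c) for n, c in calls if n != UNIDENTIFIED]
    if not identified:
        return UNIDENTIFIED
    counts = Counter(n for n, _ in identified)
    top = max(counts.values())
    tied = [n for n, k in counts.items() if k == top]
    if len(tied) == 1:
        return tied[0]  # clear majority
    # Tie: fall back to the single most confident call among the tied names.
    return max((c, n) for n, c in identified if n in tied)[1]

# Two classifiers agree, so the majority wins despite the third's higher score.
print(consensus_at_rank([("Mortierella", 0.90), ("Mortierella", 0.85), ("Mucor", 0.99)]))
```

Note that the tie-break compares scores that each classifier computes differently, so it is a pragmatic rule rather than a statistically rigorous one.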

Table 1: The CONSTAX Voting Rules.

We tested our tool on four different datasets from three different studies: barcode gene ITS1 or ITS2 of fungi from Soil or Plants (Figure 3). And it worked! Cross-referencing three classifiers corrected misassignments and improved overall performance. At the Kingdom level, the consensus taxonomy was only ~1% improved as compared to any individual classifier. However, higher levels had much stronger improvement, on average 7-35%, depending on the taxonomic level and the individual classifier. The mean improvement in performance by CONSTAX over individual classifiers is slightly over-estimated due to particularly poor classification by UTAX, which had the most Unidentified levels.

Figure 3: CONSTAX Performance. a) Soil fungi barcoded with the ITS1 gene, from Smith & Peay. b) Soil fungi barcoded with the ITS2 gene, from Oliver et al. c-d) Plant fungi with the c) ITS1 and d) ITS2 genes, from Agler et al.

What’s next for CONSTAX?

First, we would love to develop our tool to be compatible with bacterial community sequences. Fortunately, the classifiers were all written for bacterial community analysis in the first place! Unfortunately, the reference databases are either out of date or so poorly curated as to have misidentified reference sequences and some convoluted taxonomies. Bacteria seem to be renamed rather frequently and it’s difficult to know whether the assignment given is still correct. We focused our preliminary efforts on the SILVA database, as it is the most up-to-date, but it has some serious formatting issues, among other things. In theory, there should be 7 taxonomic ranks. A significant proportion of the SILVA taxa have 4-13 levels, requiring manual correction to determine the appropriate classification for each of the 7 expected levels. At least in fungi, the different taxonomic ranks have consistent suffixes that can be used to identify gaps/insertions and correctly place the ranks. In bacteria, suffixes only seem to be consistent within particular lineages, so I would only be able to fix one group at a time, and quite often the canonical seven taxonomic levels simply don’t exist for some bacterial lineages.
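As a rough illustration of the rank-count problem, a first pass could simply flag every lineage that does not have exactly seven levels so that it can be reviewed by hand. The semicolon-separated lineage format and the example strings below are assumptions for the sketch, not actual SILVA records.

```python
EXPECTED_RANKS = ["Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species"]

def split_lineage(lineage, sep=";"):
    """Split a delimited lineage string into its non-empty levels."""
    return [part.strip() for part in lineage.split(sep) if part.strip()]

def flag_irregular(lineages):
    """Return {lineage: depth} for lineages without exactly 7 levels."""
    irregular = {}
    for lineage in lineages:
        depth = len(split_lineage(lineage))
        if depth != len(EXPECTED_RANKS):
            irregular[lineage] = depth
    return irregular

# Made-up example lineages: one with the expected 7 levels, one too short.
examples = [
    "Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;"
    "Enterobacteriaceae;Escherichia;Escherichia coli",
    "Bacteria;Cyanobacteria;Chloroplast",
]
print(flag_irregular(examples))
```

Flagging is the easy part; deciding which of the seven ranks each remaining level belongs to is the manual step described above.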

Secondly, we are very interested in incorporating new classifiers into our tool. UTAX, in particular, is becoming obsolete and had the highest rate of “Unidentified” taxonomic assignments. While this may make our tool look good, it’s not really representative of the best we can do. However, an even number of classifiers makes “voting” on a consensus assignment more complicated and we would prefer to have a more elegant and sound basis for breaking ties than just comparing confidence scores, since those metrics are each calculated slightly differently and don’t mean quite the same thing. It’s an excellent starting point, but future work in this area would be served by a more thorough evaluation of disagreements between classifiers.

If you’re interested in more detail or in using CONSTAX for your own research, this blog post is based on our publication (Gdanetz et al. 2017, listed below), and our code repository is on GitHub.


Agler MT, Ruhe J, Kroll S, Morhenn C, Kim S-T, Weigel D, et al. (2016) Microbial hub taxa link host and abiotic factors to plant microbiome variation. PLoS Biol. 14(1):e1002352–31.

Gdanetz K, Benucci GMN, Vande Pol N, Bonito G. (2017) CONSTAX: a tool for improved taxonomic resolution of environmental fungal ITS sequences. BMC Bioinformatics. 18(1):538.

Oliver AK, Callaham MA Jr, Jumpponen A. (2015) Soil fungal communities respond compositionally to recurring frequent prescribed burning in a managed southeastern US forest ecosystem. For Ecol Manag. 345:1–9.

Smith DP, Peay KG. (2014) Sequence depth, not PCR replication, improves ecological inference from next generation DNA sequencing. PLoS One. 9(2):e90234–12.

Posted in BEACON Researchers at Work

Engineering the Tools of Genetic Engineers

By: Melody Keith, Bibiana Toro, Kim Ly, Alyssa Braddom, and Eleanor Young, University of Texas at Austin undergraduate researchers.

The University of Texas at Austin’s 2018 International Genetically Engineered Machine (iGEM) team is a group of students whose aim is to use synthetic biology to solve real world problems. We are a diverse group of upper and underclassmen who come from fields such as biochemistry, biology, and neuroscience. For some of us, this was our first in-depth research experience. For others, it was a chance to apply years of experience to an exciting project (Figure 1). Not only did we utilize microbiology techniques, but we were able to collaborate with researchers from Rice University and Texas Tech University, create a visually compelling poster and oral presentation, and design a website.

Figure 1: Students at the annual iGEM conference.  Eleanor Young, Melody Keith, Alyssa Braddom, Bibiana Toro, and Kim Ly with their poster.

This summer we came together to engineer a solution to a problem many scientists face in their lab every day: how to genetically manipulate non-model organisms. Bacteria are able to accomplish feats that human technology has yet to realize. They are better at producing certain materials, they can manufacture medicines, and some survive in extreme environments. However, these organisms that have incredible potential are usually non-model, which is to say, not the ones we work with in standard research labs. Biologists generally use only a handful of organisms, such as E. coli, because they are better understood and well characterized.  Therefore, we know how to genetically engineer them. However, these organisms do not always have the internal molecular machinery necessary to produce the molecule(s) that scientists desire. Scientists must instead engineer a bacterium that is unfamiliar to them, for which good protocols may not yet be established. Opportunities for failure pervade this process.

Our solution to this common problem is the Broad Host Range Kit, a combination of plasmid parts and fully assembled plasmids that allows a researcher to test many plasmids at once and then build their own plasmid with their own coding sequence of interest. It relies on a molecular cloning method known as Golden Gate Assembly, which allows genetic parts to be easily assembled and interchanged. These plasmid parts are classified according to their function, or type, and many hours were spent cloning desired parts from template sequences. We built assembly plasmids out of these plasmid parts, varying some parts while conserving others. The varied regions were the reporter gene, encoding a fluorescent protein or chromoprotein; the origin of replication, which allows the bacteria to make copies of the plasmid; the barcode region, a short DNA sequence identifying each plasmid; and the antibiotic resistance gene. We chose origins of replication that are known to be broad host range, meaning they function in a wide variety of different bacteria.

Each assembly plasmid, which we call a “Pioneer Plasmid”, has the origin of replication coupled to a specific reporter and barcode. Therefore, when the plasmid is inserted into the bacteria of interest, the origin of replication can be determined just by looking at the color of colonies on the plate. If for some reason, the bacteria can’t express the reporter, the barcode can be sequenced, which adds a layer of redundancy to the system.

Our kit relies on the “One Tube Method”, which puts all the Pioneer Plasmids into a single tube, so the mixture needs to be transformed into the non-model organism of interest only once (Figure 2). The transformation is then plated onto various antibiotic plates, and the origin of replication that functions can be determined by visual inspection of the plate alone (Figure 3). Our kit therefore speeds up the process of finding out which broad host range origin functions in a non-model organism. It contains 8 fully assembled Pioneer Plasmids with 3 different origins and over 40 part plasmids. We’ve also tested the One Tube Method in Vibrio natriegens and Serratia marcescens.

Figure 2: The Kit in Action.  Schematic of how the One Tube Method works.  After transformation, screening for color reveals the plasmid each colony contains. The DNA in non-colored colonies can be extracted and sequenced to identify the plasmid within.  The identity of the plasmid reveals which genetic parts are functional in your bacteria.

Figure 3: Screening Colonies.  Seven plasmids containing different reporter genes transformed into E. coli.  The plate on the left is in natural light, while the plate on the right is under blue or UV light.  The different colonies are highlighted along with the identified reporter in each.

Members of our team had the privilege to attend the iGEM “Giant Jamboree” in Boston, where over 250 teams from all around the world met to present their research, but also to network, collaborate, and share ideas. It was a stimulating and rewarding conference. For example, the team from the National University of Singapore expressed luteolin in E. coli as an eco-friendly alternative to toxic yellow textile dyes. Cornell University produced a genetic circuit that would respond to frequency variable input signals to specifically regulate expression. Both presentations inspired one of our members to investigate optogenetic regulation and engineering of photoswitches for a project currently in progress.

The scientists and undergraduate researchers we interacted with commented that engineering non-model organisms was a problem that they too encountered in their own labs daily. They provided excellent feedback and suggestions, such as showing that our kit functions in a particular culture collection or using software to design a combinatorial library of Pioneer Plasmids. We also spoke to teams interested in acquiring and using our kit. For instance, the team from the Indian Institute of Technology Madras wants to use the BHR kit to engineer Acinetobacter baylyi in order to produce biofuels from the degradation of aromatic compounds.

We also had the opportunity to hear from illustrious keynote speakers Dr. Ingrid Pultz, Jason Kelly, and George Church. In particular, the way Jason Kelly talked about the field of synthetic biology, and its enormous potential to revolutionize industries from agriculture to materials to medicine, was inspirational. They made the career path many of us are on feel tangible, achievable and bursting with opportunity. To hear them speak, changing the world seemed not only possible, but just within reach.

Presenting our own team’s research was a dynamic experience that highlighted the grit and focus it takes to practice, refine, and effectively communicate a project which took months to produce (Figures 1 and 4). We left feeling empowered, knowing that being a member of the iGEM team had given us the skills to successfully achieve every step of the research process, from initiation to generating results to finally synthesizing a message about the meaning and impact of that work. As we continue to improve the Broad Host Range Kit, we know that the small steps we take in the lab every day can translate into larger benefits for both the scientific community and the world.

Figure 4: Melody Keith presenting the students’ work. Bibiana Toro (right) and Eleanor Young (not pictured) also participated in the oral presentation.

Posted in BEACON Researchers at Work, BEACONites

Hyenas & Microbes

By: Connie Rojas, PhD Candidate at Michigan State University

It has been a year of traveling! Earlier this year, I traveled to the Masai Mara National Reserve, Kenya (MMNR) to conduct my field work, and currently, I am in Mexico City doing a visiting scholarship at the Universidad Nacional Autónoma de México (UNAM)! In between, I visited UC Berkeley for the 2018 Science and Technology Centers (STC) Director’s Meeting and San Antonio, TX for the Society for the Advancement of Chicanos/Hispanics and Native Americans in Science (SACNAS) annual conference, where I shared recent research findings as part of a symposium with other BEACONites. I am extremely grateful that my dissertation research on host-microbe interactions and the spotted hyena (Crocuta crocuta) microbiome has strong field, laboratory, and computational components that allow me to travel and work with different collaborators!


I am a 4th year PhD Candidate in Dr. Kay Holekamp’s behavioral ecology laboratory at Michigan State University, in the Department of Integrative Biology and the Ecology, Evolutionary Biology, and Behavior program (EEBB). For my dissertation research, I am using next-generation sequencing technologies to assess how microbes and host-associated microbial communities (‘microbiomes’) affect their host’s physiology, fitness, and behavior, and how hosts themselves influence their microbiomes. I study these questions in a wild population of spotted hyenas. Hyenas are highly social carnivores and apex predators inhabiting much of Sub-Saharan Africa. Their societies are structured by linear dominance hierarchies, wherein an individual’s position in the hierarchy determines its priority of access to resources. Their social groups are also characterized by female dominance and male-biased dispersal.


For my field work, I traveled to my laboratory’s field camp at the MMNR and lived there for 4 months! I conducted 3 projects, all of which investigated the role of microbes and microbiomes in shaping their host’s phenotype. One project involved swabbing decomposing beef daily in order to survey microbial community succession across various stages of decomposition in the savanna environment. I wanted to emulate the environment and decomposition process of the carcasses hyenas eat and determine the types of beneficial and harmful microbes hyenas are acquiring this way. My second project was tons of fun; I conducted scent discrimination trials to ascertain whether hyena scent gland secretions and their odors, which are hypothesized to be produced by microbes, contain information about the sender’s age, sex, and residency. I presented juvenile and adult female hyenas with the paste of two strangers (i.e. an immigrant male vs. adult female) and recorded the amount of time they spent sniffing each specimen. If my analyses show that hyenas spend different amounts of time sniffing the paste samples, then this would indicate that the samples encode different information, and more importantly, that microbes are indeed contributing to their host’s chemical signaling! My last project was also very enjoyable and allowed me to interact with other animals in the reserve. I collected fecal samples from various species of antelope, elephants, and baboons to determine the role of host phylogeny in structuring the gut microbiomes of mammals in the savanna.

Right now, I am working with Dr. Valeria Souza (who gave an EEBB seminar in 2017; that is where I met her!) and Dr. Luis Eguiarte from the Institute of Ecology at the Universidad Nacional Autónoma de México (UNAM) on the bioinformatics portion of my BEACON-funded gut microbiome project. This project investigates the socio-ecological drivers of gut microbiome structure and function in spotted hyenas, as well as its stability, its transmission across generations, and its potential to act as a reservoir for antibiotic resistance. We are using shotgun metagenomics (i.e. whole genome sequencing) to profile gut microbial community function and determine the metabolic pathways being provided by the community as a collective. Specifically, in their lab, I am being trained on the assembly, binning, annotation, and phylogenetic profiling of shotgun metagenomic data. From these data, we will be able to profile the taxonomic composition of the hyena’s gut bacterial communities, reconstruct the hyena’s diet, and survey the diversity of antibiotic resistance genes harbored by the community. We will also determine the relative importance of viruses in driving the evolution of these gut microbiomes and assay their heritability across generations and within an individual’s lifetime. The bioinformatic analyses are challenging and time-intensive, but I am making progress and it has all been very fun! Apart from work, I have been spending lots of time getting to know the city, eating as many tacos as I can, and making friends.

Although I am not looking forward to the bitter cold when I return to Michigan in January, I am looking forward to teaching my first class, taking a course on teaching college science, and co-organizing the 2019 EEBB Research Symposium, among other things. Until then, I am going to make the most of my time here in this great city!

Posted in Uncategorized

Avida-ED in Action

BEACON scientists and educators were featured in MSU Today, which reported on a recent publication highlighting the use of Avida-ED in a newly developed undergraduate biology course. The course, IBIO150 Integrative Biology: From DNA to Populations, was developed with non-Biology STEM majors in mind: students who need a rigorous majors-level course that covers core concepts emphasized in undergraduate biology reform. The unique component of the course is the incorporation of a digital evolution lab that uses Avida-ED, featuring a series of exercises designed to address important concepts in evolutionary biology. In addition, students complete independent research projects. The incorporation of Avida-ED into the course is supported by current grants to the College of Natural Sciences through the Howard Hughes Medical Institute, in addition to an Improving Undergraduate Science Education (IUSE) grant, Active LENS: Learning about Evolution and the Nature of Science. Avida-ED and other resources developed by BEACON have recently been featured in a video produced for the National Science Teachers Association (NSTA) TV.

Posted in BEACON in the News, Education, Evolution 101

Ecology/Evolution Scientific Symposium at the SACNAS National Conference

The Society of Systematic Biologists (SSB) and BEACON have collaborated to organize a scientific symposium at the Society for the Advancement of Chicanos/Hispanics and Native Americans in Science (SACNAS) national conference in San Antonio, Texas, on October 11th, 2018 (10:30AM-12:00PM, Room 225B). The symposium title is “It’s Complicated: The Ecology and Evolution of Microbes and Their Hosts.”

Host-microbe interactions are ubiquitous and often drive evolution. Microbial parasites or pathogens harm hosts, whereas other host-associated microbes are beneficial or even necessary for host health. Diverse scientists at the forefront of the ecology and evolution of host-microbe interactions will provide a synthetic perspective of research on this important topic.


  • Connie Rojas, PhD Candidate – Michigan State University
  • Luis Zaman, PhD – LSA Collegiate Postdoctoral Fellow, University of Michigan
  • Lisa Barrow, PhD – NSF Postdoctoral Fellow, University of New Mexico
  • Kat Milligan-Myhre, PhD – Assistant Professor, University of Alaska Anchorage

SACNAS is a national organization focused on increasing the proportions of underrepresented minorities in science, technology, engineering, and math (STEM) fields. It is the largest multicultural and multidisciplinary STEM diversity organization in the country. In 2017, the National Conference was attended by 3,845 scientists from diverse backgrounds, with 77% of the participants members of ethnic/racial groups that are significantly underrepresented in STEM fields.

Many people may not realize the complicated and often important relationships occurring all around (and inside!) us between microbes and their hosts. As a result of attending our scientific symposium session, attendees will learn about: (1) exciting, recent advances in science research that illuminate how microbes can drive the ecology and evolution of their hosts; (2) the questions and approaches for studying host-microbe and host-parasite interactions; (3) career options within ecology and evolution from presenters at diverse career stages; and (4) the central role of ecology and evolution to the life sciences. Read the abstracts below for more details about each talk.

In addition to the scientific symposium, BEACON and SSB are also sponsoring a day-long Ecology/Evolution field trip on October 13th to Mitchell Lake Audubon Center and the San Antonio Zoo. The symposium and field trip were co-organized by Eve Humphrey, Maurine Neiman, and Alexa Warwick, with support from the Society for the Study of Evolution Diversity Committee and Education and Outreach Committee. The speakers, organizers, and other attendees will also participate in an Ecology/Evolution session of “Conversations with Scientists” to share scientific career options on October 11th (5:45-7:15PM, Room 221C). If you’ll be at SACNAS 2018, we hope to see you at one or all of these events!

Connie Rojas – Host and Ecological Traits Shape the Structure, Function, and Diversity of the Gut Microbiome in Wild Spotted Hyenas

Animal bodies harbor complex microbial communities, hereafter termed microbiota, that exert profound effects on their physiology, behavior, and evolution. In the mammalian gastrointestinal tract, resident microbes are known to synthesize essential vitamins, supply their host with energy released from the fermentation of indigestible carbohydrates, competitively exclude pathogens, and promote immune system and tissue development. In spotted hyenas (Crocuta crocuta), meerkats (Suricata suricatta), and ring-tailed lemurs (Lemur catta), microbiota inhabiting scent-gland secretions co-vary with the gland’s odorous metabolite profiles and contain well-documented odor producers, indicating they likely contribute to their host’s chemical signaling behaviors. Furthermore, in four species of insectivorous bats, bacteria isolated from the skin exhibit anti-fungal properties against the causal agent (Pseudogymnoascus destructans) of white-nose syndrome, suggesting a beneficial role of these microbes in pathogen defense. However, despite the importance of the microbiota, we know little about the forces shaping its structure and function, especially in wild animal populations. Here, I use 16S rRNA gene sequencing technologies to a) survey the gut microbiota of wild spotted hyenas and b) investigate the host social and ecological factors affecting the gut microbiota. Specifically, I assay whether gut microbiota diversity and structure vary with hyena age, reproductive state, group size, and temperature and precipitation. Overall, this research will contribute to our understanding of the ways a host shapes its microbial communities, and how microbial communities, in turn, influence their host’s behavioral phenotype. In this talk I will also share a bit about my journey as a scientist from the perspective of a Latina first-generation college student and daughter of immigrant parents.

Dr. Luis Zaman –  Experimenting with Digital and Microbial Evolution

My path to evolutionary biology was unusual. I started as a computer scientist and ended up working in a wet lab with microbes and viruses. I’ll talk about how I ended up where I am, what I’m doing now, and why disciplined and undisciplined science are important in research.





Dr. Lisa Barrow – Variable Host Susceptibility and Enigmatic Parasite Distributions: Insights from Museum Collections and Genomics of Avian Haemosporidians

There are several outstanding questions in ecology and evolution of host-parasite interactions. Why do host species vary so drastically in their susceptibility to parasites? How localized or widespread are different parasites? What are the environmental and host range limits to parasite distributions? These questions are particularly important given the predicted influence of climate change on species distributions and the potential for emerging infectious diseases. Avian haemosporidians are intracellular parasites that infect birds across the globe, sometimes with devastating consequences. Together with multi-institution, student-driven teams, we have been tackling two complex avian haemosporidian systems in Peru and New Mexico, USA. Using molecular and microscopic screening of extensive museum collections, we found that ~35% of birds are infected. In Peru, we screened nearly 4,000 birds representing 40 families and 523 species. After accounting for several environmental, life history, and ecological predictors of infection, we found that host phylogeny explains substantial variation in infection rate. In other words, susceptibility is deeply conserved across the avian tree, and is likely related to conserved aspects of the immune system. In New Mexico, we sampled avian haemosporidian communities in three mountain ranges to better understand the limits to parasite distributions. Haemosporidian communities exhibit structure on fine spatial scales, with most lineages occurring in a single mountain range, but a few widespread generalists infecting multiple host species. Ongoing work incorporating new genomic methods is improving estimates of the host and environmental range limits of haemosporidians, providing important baselines for identifying potential host switches or range expansions.

Dr. Kat Milligan-Myhre – Use of an Evolutionary Model to Determine the Role of Host Genetic Background on Microbiota

Microbiota are the microbes that live in and on a host. Disruption of the microbiota can lead to painful inflammation in the host, which can become chronic, as in the case of inflammatory bowel disease. Our lab focuses on the role host genes play in the relationship between the microbiota and their host. Thus, we adapted the evolution and biomedical model organism, threespine stickleback fish (Gasterosteus aculeatus), for host-microbe studies. Stickleback are ideal for these studies due to their large family sizes, genetic variation within and between populations that is similar to human genetic variation within and between populations, and the tools available to study these interactions. We compared development and behavior in fish raised germ free, with conventional microbiota, with mock communities of up to eight microbiota members, or with microbiota disrupted by antibiotics or environmental contaminants. We found that the populations varied in their response to these manipulations, indicating that the genetic variation between the populations contributed more to the relationship between microbes and the host than the variation within the populations. We will use these results as a basis for future studies to identify the critical windows in development in which disruptions to gut microbiota result in short- and long-term consequences to host health, and to determine the extent to which the host genetic background contributes to the ability of healthy gut microbial communities to influence fitness.


Evolutionary Computation Experts Video Collection

This blog post is by Risto Miikkulainen [1,2], Paul Jarratt [2], and Andrew Turner [2] from (1) The University of Texas at Austin and (2) Sentient Technologies

Given recent advances in evolutionary computation technology, available computational power, and opportunities for AI in the real world, we believe evolution is on the verge of a breakthrough, i.e. becoming the next Deep Learning. In order to chart the possibilities as well as challenges, we sat down at Sentient and at GECCO 2018 with a number of EC experts in both academia and industry to share their ideas about where AI is heading, and the role evolutionary computation can play in its future. The result is a collection of video interviews; they are organized around a number of specific questions so that you can explore those of interest to you. You can check out the collection at

Sentient plans to add more experts to this page in the future, so let us know if you’d like to contribute your point of view to this collection!


A tale of scales and feathers… (ice and volcanoes)

This blog post is by MSU postdoc Murielle Ålund.

It all started with a real-life Tetris game: trying to fit six gigantic coolers full of field gear, luggage, two kids and three adults into what was going to be our field car. And yet Greg Byford (Boughman Lab Research Technologist) had asked the rental agency for the biggest car they had!

Once we were finally on the road, as the amazing Icelandic landscapes unfolded at each turn, I remember thinking that the trip (and the postdoc position, and the big move overseas with my family) was already worth it, no matter how field work would go. And Iceland did not disappoint! It was such an amazing experience to spend almost six weeks in this beautiful country and to learn about Icelandic culture and fish.

Yes, the fish! I should probably have started with them; this is not a travel blog after all! The reason I got to visit such amazing places is that we are studying threespine sticklebacks (Gasterosteus aculeatus) and how they adapt to their local environments in the Arctic, a project led by Dr Janette Boughman. These tiny fish are impressive in their ability to colonize new habitats and adapt to very different environmental conditions all over the Northern hemisphere. They are very well known for their ability to repeatedly invade freshwater bodies from an original marine morph, which comes with drastic changes associated with the differences in salinity, food and predators. As Icelandic glaciers have been retreating for thousands of years and continue to do so at an accelerated rate (with a current estimated yearly surface loss of 0.2%)1, new lakes are continuously formed that can eventually be colonized by different populations of fish.

These glacial lakes are fed by constantly melting ice, and are thus extremely turbid, so much so that you can see the difference in color by looking at satellite images of the area! Once you have lost a trap in one of these lakes and realize that you cannot even see your own fingers in the water (making for an interesting time trying to find said trap by feeling it with your feet), you start wondering how sticklebacks can find food, avoid predators and choose a mating partner in these conditions!!!

This brings me to the main goals of our project: studying how sticklebacks adapt to widely different environments in these harsh and rapidly changing climates. We are specifically interested in the evolution of sensory systems, the idea being that other senses might compensate for the extremely low visibility of the turbid waters. To test that, we are collecting fish from glacial lakes, spring-fed lakes (these are very clear lakes) and from the sea, and comparing their responses to visual and olfactory cues in a controlled behavioral experiment. We are then sampling and measuring their eyes, noses and lateral lines and comparing the different parts of their brains, to get a complete overview of their sensory systems and how developed their different senses are. In addition to being able to compare fish from lakes of different turbidity, we are also hoping to reconstruct a timeline of adaptation to these different habitats, as these lakes were formed anywhere between ten thousand and just a hundred years ago, and thus have variable and known (maximum) colonization times. This will allow us to get an idea of the rate of evolution of sensory systems in sticklebacks.

This is a highly collaborative project. As I write, our amazing Icelandic team is still in the lab and in the field collecting fish and running trials. This includes our talented local technician Sven Wargenau from Hólar University, Julian Ohl (studying in Reykjavík for a Master’s degree in Environment and Natural Resources), the Boughman Lab’s brand new PhD student Brielle Dominguez and a visiting PhD student from Uppsala University: Javier Vargas Calle, a specialist in gut microbiomes. Back here at MSU, Greg and I are starting to process some of the 1200 fish we already brought back and sending brain samples to the Hofmann Lab in Texas and eye samples to the Stenkamp Lab in Idaho, while the behavioral experiments are overseen by Dr. Jason Keagy at the University of Illinois. To hear more about this project, come and listen to my talk at the BEACON congress next week!

That was the tale of scales, ice and volcanoes. But what about the feathers, you will ask? That would be enough for another full blog post… First let me say that feathers were everywhere in Iceland, and bird watching there was amazingly easy: you can see rare species from your hot tub! But more seriously, my background is in studying speciation in birds, specifically two species of Eurasian passerines, collared and pied flycatchers (Ficedula albicollis and F. hypoleuca), that hybridize on a (much smaller) island in Sweden: Öland. During my PhD, I studied the consequences of hybridization for these two species and was particularly interested in their reproduction, and how sexual selection and fertility are affected by mating between different species.

My work involved quite a bit of fieldwork, catching, measuring and ringmarking thousands of young and adult birds, and collecting sperm samples and analysing them under the microscope directly in the forest. I first looked at what makes a male successful at siring as many chicks as possible in his nest and found that in collared flycatchers, sperm size matters, but differently depending on how “sexy” the males are: males with relatively small ornaments (white forehead patches) benefit from having long sperm, and vice-versa (coming out soon in Behavioral Ecology!2). I also found that hybridizing is really bad for these birds, as hybrids seem to be mostly sterile: the females lay empty eggs and the males do not manage to produce any functional sperm3. Since one of the species is pushed away from good territories by the other one, the females do not always really have a choice, and are sometimes constrained to mate with a male of the “wrong” species to secure a territory and food for their offspring. In a collaboration with a team at the Natural History Museum of Oslo, I found that females can still mostly avoid producing costly hybrids by seeking extra-pair copulations with males of their own species, and biasing against the sperm of the “unwanted” male inside their reproductive tract, so that it has fewer chances to fertilize their eggs4.

I find it fascinating that interactions between eggs, sperm, ovarian and seminal fluids can all influence the outcome of competition for fertilization and want to study this less understood phase of sexual selection in the future. Check my website for updates on this and more stories about birds, fish, cool behaviors and fascinating evolution!


Work cited:


2: Ålund, M., Persson Schmiterlöw, S., McFarlane, S.E. and Qvarnström, A., 2018 Optimal sperm length for high siring success depends on forehead patch size in collared flycatchers, Behavioral Ecology, accepted

3: Ålund, M., Immler, S., Rice, A. M. and Qvarnström, A., 2013 Low fertility of wild hybrid male flycatchers despite recent divergence, Biol. Lett. 9:3, 20130169. (doi: 10.1098/rsbl.2013.0169)

4: Cramer E.R.A. and Ålund†, M., McFarlane, S. E., Johnsen, A. and Qvarnström, A., 2016 Females discriminate against heterospecific sperm in a natural hybrid zone, Evolution 70 (8), 1844-1855. (doi: 10.1111/evo.12986)


Mapping Antibiotic Resistance in Pseudomonas aeruginosa Biofilms to Develop Better Therapies for Cystic Fibrosis

This blog post is by MSU graduate student Michael Maiden.

MSU researchers, Chris Waters, Michael Maiden and Alessandra Hunt, BPS, 04.30.18

Currently, I am a 7th year DO-PhD student in the physician scientist training program, working in Dr. Christopher Waters’ drug development and biofilm laboratory in the Department of Microbiology & Molecular Genetics. I was attracted to Michigan State University and the College of Osteopathic Medicine, and specifically to the DO-PhD program, because it offered the opportunity to work on clinically relevant projects that may lead to better therapies for patients in the future.

In the Waters lab, my research is focused on developing new therapies for chronic infections caused by bacteria in the form of biofilms. A biofilm is a community of cells enmeshed in a self-made gel that renders the community up to 1,000x more resistant to antimicrobial therapies. For this reason, bacteria growing in biofilm communities are a major contributor to chronic infections and death.

One bacterial pathogen that often infects and forms biofilms in patients is Pseudomonas aeruginosa. In fact, P. aeruginosa is the leading cause of death in patients with cystic fibrosis (CF). CF is a debilitating genetic disease that results in dry and clogged airways, which trap bacteria and lead to life-long chronic infections, resulting in premature death between the ages of 30 and 40.

A biofilm colony formed by P. aeruginosa surrounded by a secreted self-made mucus that makes the bacteria very difficult to treat.

By early adulthood, nearly 50% of CF patients are chronically infected with P. aeruginosa. To extend the lives of CF patients, it is essential to develop therapeutic interventions that eradicate P. aeruginosa before it is able to form a chronic infection in these patients.

We found that by treating with two specific antimicrobials, tobramycin and triclosan, we could kill up to 99% of P. aeruginosa cells growing in biofilm communities. Further, this combination was effective in as little as 2 hours. These exciting results raised one very difficult question: how?

One way to determine how antimicrobials work is to go after well-known targets and pathways. By turning them on or off using various molecular techniques, you can test whether your particular drug is working through that pathway. We tried this approach with little success. So, we turned to an unbiased evolutionary approach.

Using this method, we took advantage of the natural tendency of bacteria to evolve resistance to any antimicrobial, given enough time and doses small enough that some bacteria survive and accumulate mutations in their genomes. We evolved P. aeruginosa cells growing in a biofilm until they were resistant to the combination by slowly raising the dose over time. Next, we performed whole-genome sequencing to identify the genetic mutation(s) that could help explain how they became resistant.

We found a novel mutation in P. aeruginosa that renders the bacteria resistant to the combination. This mutation is located within an enzyme essential for protein synthesis. This gave us a valuable clue for how triclosan may be enhancing tobramycin activity, allowing us to formulate a model for how the two work synergistically. Subsequent experiments have supported this model.

Further, the mutation we identified in our evolution mutants has been identified independently in clinical CF isolates of P. aeruginosa, which renders them resistant to tobramycin. Thus, our artificial evolution work in the lab has been validated by the natural evolution taking place in the clinics, specifically in the lungs of CF patients.

We now have a possible lead for how our combination may be working synergistically against P. aeruginosa cells growing in biofilm communities. This new resistance mechanism could be targeted in the future to develop compounds that inhibit it. Further, knowledge of this mechanism could pave the way for the future development of compounds that work in a similar fashion to our combination, thus yielding much-needed new antimicrobial therapies. Currently, we are exploring how this mechanism renders bacterial cells in a biofilm resistant to our combination.

As antimicrobial resistance continues to be a major threat to human health, it is important to develop better strategies that more effectively use our current antimicrobial arsenal. This combination may be an example of one such strategy. As a future clinician, I am grateful to be a part of a project with strong clinical implications. And as a scientist, I have always been interested in how organisms evolve. The opportunity to perform evolution studies in the lab is both exciting and rewarding, providing little hints into what sustains life and what great trials and tribulations all living organisms have gone through to maintain it.

Relevant MSU Today Articles:


Kalyanmoy Deb honored with IEEE Computational Intelligence Pioneer Award

BEACON’s own Professor Kalyanmoy Deb, the Koenig Endowed Chair in Electrical and Computer Engineering at Michigan State University, was honored today by the IEEE Computational Intelligence Society. At the World Congress of Computational Intelligence meeting in Rio de Janeiro, Brazil, he was given the IEEE Computational Intelligence Pioneer Award, which is given to at most one person each year who has made major contributions to the field. It recognizes contributions across one’s entire career. Prof. Deb was honored for his pioneering contributions to the field of evolutionary multi-objective optimization. Among those contributions was the algorithm NSGA-II, which has been more widely used than any other evolutionary multi-objective optimization tool. He has led the community in development of this new field that has spurred both widespread academic research and worldwide industrial application.


Learning an Evolvable Genotype-Phenotype Map


This post is by MSU graduate student Matthew Andres Moreno

Hi! My name is Matthew Andres Moreno. I’m a graduate student finishing up my first year studying digital evolution with my advisor Dr. Charles Ofria.

Today, I’m going to talk to you about police detective work. Eventually, we’ll talk about evolvability and genotype-phenotype maps, but first let’s talk CSI.

Police Composite Sketches

Specifically, let’s think about how a police composite sketch works. First, someone sees a criminal and describes the face’s physical features with words. This description is the compact representation. Then, the police artist reconstructs the criminal’s face from the description.

Schematic of hypothetical police composite process. Mug shot and composite reconstruction were taken from the Crime Scene Training Blog.

Why does this work? It works because the witness has seen lots of faces and knows what the important bits to describe are. It works because the police artist understands the witness’ words and has also seen lots of faces — from experience, she knows that the mouth goes under the nose, the nose goes between the eyes, etc. and doesn’t need the witness to tell her absolutely everything about the face in order to draw it.

Well, autoencoders can also be used to reconstruct a corrupted input. This works something like a police sketch, too. Suppose that the criminal was wearing pantyhose that partially obscured his face. The witness can still describe the suspect’s face and the police artist can still draw it. Under the right conditions the missing part of the face can be reconstructed reasonably well.

Schematic of hypothetical police composite process with suspect in disguise (incomplete input). Mug shot and composite reconstruction were taken from the Crime Scene Training Blog.

Why does this work? It works because the witness can still see and describe part of the face. It works because the police artist understands the witness’ words and has also seen lots of faces — from experience, she can make a pretty good guess by cluing off the fact, for example, that faces have left-right symmetry or maybe that the criminal probably had a cheekbone and ear on the part of the face that was obscured. Again, because she’s seen lots of faces the police artist doesn’t need the witness to tell her absolutely everything about the face in order to draw it.

Deep-Learning and Autoencoders

The jig’s up... it was all a set-up! A set-up, that is, to help you understand what autoencoders do. Unless you’re technically inclined, understanding exactly what autoencoders are isn’t particularly important for our discussion. Suffice it to say that autoencoders are a type of clever deep learning algorithm.

What autoencoders do is directly analogous to what the witness and police artist do. By looking at lots of examples of complex objects like faces, autoencoders learn to

  1. compactly describe the important features of a complex object (“encoding”, just like the witness) and
  2. reconstruct a complex object from that description (“decoding”, just like the police artist).

I’ll refer to these as the two powers of autoencoders.
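For the technically inclined, the two powers can be sketched in a few lines of plain Python. This is only a toy under loud assumptions: it is a linear autoencoder with a one-number description, not a deep network, and the names `encode`, `decode`, `train`, the data, and the learning-rate settings are all made up for illustration.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def encode(w, x):
    """Power 1: compress a 4-number object into one number (the description)."""
    return dot(w, x)

def decode(w, z):
    """Power 2: rebuild a 4-number object from the one-number description."""
    return [z * wi for wi in w]

def reconstruction_error(w, data):
    """Mean squared difference between each object and its reconstruction."""
    total = 0.0
    for x in data:
        x_hat = decode(w, encode(w, x))
        total += sum((xi - xh) ** 2 for xi, xh in zip(x, x_hat))
    return total / len(data)

def train(data, w, lr=0.01, epochs=200):
    """Gradient descent on the squared reconstruction error, one example at a time."""
    for _ in range(epochs):
        for x in data:
            z = encode(w, x)
            r = [xi - xh for xi, xh in zip(x, decode(w, z))]  # residual
            rw = dot(r, w)
            # The gradient of ||x - decode(encode(x))||^2 w.r.t. w is
            # -2 * (rw * x + z * r), so we step in the opposite direction.
            w = [wi + lr * 2 * (rw * xi + z * ri) for wi, xi, ri in zip(w, x, r)]
    return w

# Toy "objects": 4-number vectors that all lie along one hidden direction,
# so a single number really is enough to describe each of them.
hidden_direction = [0.5, 0.5, 0.5, 0.5]
data = [[(-2 + 0.2 * i) * u for u in hidden_direction] for i in range(21)]

w = train(data, [0.3, 0.1, -0.2, 0.05])  # arbitrary starting weights
print(reconstruction_error(w, data))     # should be very close to zero
```

The point of the toy is the division of labor: `encode` plays the witness, `decode` plays the police artist, and training is how both "see lots of faces" before they are any good at their jobs.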

The following graphic, a “latent space interpolation” between three faces, gives a neat glimpse of how autoencoders work and how powerful they are. The latent space refers to the set of all compact descriptions an autoencoder can read. To understand what’s going on here, let’s just look at the top row of images.

Autoencoder latent space interpolation with faces! Graphic from [White, 2016].

At the top-left, we see an image of a woman with curly hair. To get to the image immediately adjacent on the right, we use power 1 of autoencoders to generate a compact description and then use power 2 of autoencoders to reconstitute a face image.

Then, going left to right across the top row, things start to get interesting. We gradually change the compact description of the curly-haired woman until it matches the compact description of the red-haired woman on the far right. Each image shows an intermediate compact description that was reconstituted using power 2 of autoencoders. This visualization shows a very natural-looking transition between the two faces!
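The "gradually change the compact description" step can be sketched as follows. This is an illustration only: it assumes straight-line (linear) interpolation, which may not be the scheme used to make the graphic, and `toy_decode` is a hypothetical stand-in for a trained decoder.

```python
def interpolate(z_start, z_end, steps):
    """Return `steps` (at least 2) descriptions evenly spaced from z_start to z_end."""
    path = []
    for i in range(steps):
        t = i / (steps - 1)  # runs from 0.0 to 1.0
        path.append([(1 - t) * a + t * b for a, b in zip(z_start, z_end)])
    return path

def toy_decode(z):
    """Hypothetical decoder: turns a 2-number description into 3 'pixels'."""
    return [z[0], z[1], 0.5 * (z[0] + z[1])]

# Made-up compact descriptions of the two faces at either end of the row.
curly_haired = [0.0, 1.0]
red_haired = [1.0, 0.0]

# One "image" per intermediate description, left to right across the row.
row = [toy_decode(z) for z in interpolate(curly_haired, red_haired, steps=5)]
for image in row:
    print(image)
```

Note that the blending happens in the description, not in the pixels: each intermediate image is produced by the decoder, which is why the in-between faces still look like faces.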

I won’t walk you through it, but the rest of the grid of images shown above was generated analogously.

Genotype-Phenotype Maps

What does any of this have to do with evolution? This year, I’ve been investigating how autoencoders can be useful as genotype-phenotype maps in digital evolution. One idea of how this can work: use power 2 of autoencoders (the “decoder”) as the genotype-phenotype map. In this scheme, the genotype lives in the latent space.

In order to drive home the implications of autoencoder genotype-phenotype maps for evolvability, let’s talk through a little thought experiment. Think back to the problem of police face reconstruction we’ve been thinking about. Suppose we’re trying to evolve a face that, as judged by the witness of a crime, maximally resembles the perpetrator. (Yes, this is a real thing people do [Frowd et al., 2004].) To accomplish this, we start out with a set of random genotypes that map to different phenotypes (images). The witness selects the images that most closely resemble the suspect’s face. Then, we mutate and recombine the best matches to make a new batch of images for the witness to consider. As we iterate through this process, hopefully we generate images that more and more closely resemble the suspect’s face.
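Here is a rough sketch of that select, mutate and recombine loop with a decoder acting as the genotype-phenotype map. Everything in it is an assumption made for illustration: `decode` is a fixed toy map rather than a trained autoencoder, `witness_score` replaces the human witness with a simple distance measure, and the population settings are arbitrary. It is not the AutoMap or EvoFIT implementation.

```python
import random

# Hypothetical stand-in for a trained decoder: maps a 2-number genotype
# (a point in latent space) to a 4-"pixel" phenotype.
BASIS = [[1.0, 0.5, 0.0, -0.5], [0.0, 0.5, 1.0, 0.5]]

def decode(genotype):
    return [sum(g * row[i] for g, row in zip(genotype, BASIS)) for i in range(4)]

def witness_score(phenotype, suspect):
    """Stand-in for the witness: higher means 'looks more like the suspect'."""
    return -sum((p - s) ** 2 for p, s in zip(phenotype, suspect))

def evolve(suspect, rng, pop_size=20, keep=5, generations=40, sigma=0.1):
    def score(genotype):
        return witness_score(decode(genotype), suspect)

    # Start from random genotypes.
    population = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(pop_size)]
    best_per_generation = []
    for _ in range(generations):
        # The "witness" picks the best matches.
        population.sort(key=score, reverse=True)
        best_per_generation.append(score(population[0]))
        parents = population[:keep]
        # Recombine two parents gene by gene, then mutate with Gaussian noise.
        children = []
        while len(parents) + len(children) < pop_size:
            mom, dad = rng.sample(parents, 2)
            children.append([rng.choice(pair) + rng.gauss(0, sigma)
                             for pair in zip(mom, dad)])
        # Parents are carried over, so the best match is never lost.
        population = parents + children
    return max(population, key=score), best_per_generation

rng = random.Random(0)
suspect = decode([0.3, -0.7])  # the face the witness remembers
best, history = evolve(suspect, rng)
print(history[0], history[-1])  # score of the best match, first vs. last generation
```

Because mutations are applied to the genotype in latent space and only then decoded, every candidate shown to the witness is something the decoder knows how to draw.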

Consider trying to evolve a facial composite using the direct genotype-phenotype map. Under this map, the intensity of each pixel of the image is directly encoded in the genotype. First of all, the randomly generated images wouldn’t look very much like faces at all — they’d look more like static. Supposing that we were actually able to eventually get to an image that vaguely resembles a face at all, then what? Is there a path of pixel-by-pixel changes that leads to the suspect’s face where every pixel-by-pixel change more closely resembles the perpetrator’s face? I’d argue we’d be likely to sooner or later get stuck at a dead end where the image doesn’t resemble the perpetrator’s face but pixel-by-pixel changes to the image make it look less like the perpetrator’s face (or a face at all).

Evolving the composite using the direct genotype-phenotype map probably won’t work well.

What if, instead of having the genotype directly represent the image at the pixel level, we encode genotypes analogously to a verbal description and then use a police artist who can draw a suspect from verbal descriptions to generate phenotypes? This is analogous to what our “decoding” genotype-phenotype map accomplishes.

(For those who are curious, software to evolve police composites use an indirect genotype-phenotype map based on eigenfaces [Frowd et al., 2004].)


This work, which we call AutoMap, was in part inspired by recent efforts to understand evolvability in terms of learning theory [Kouvaris et al., 2017]. We hope that this work helps to strengthen an explicit connection between applied learning theory (i.e., machine learning) and evolvability. We’re also looking forward to expanding on the exploratory AutoMap experimental work that we’re taking to GECCO this summer.

If you’re interested in more detail, this blog post is based on a more in-depth (but still non-technical and fun!) introduction to our work with AutoMap, which you can find here. If you want to check out our technical write-up on AutoMap, you can find the PDF here and the paper’s supporting materials here.

Finally, thanks also to my AutoMap coauthors Charles Ofria and Wolfgang Banzhaf.


Frowd, Charlie D., et al. “EvoFIT: A holistic, evolutionary facial imaging technique for creating composites.” ACM Transactions on Applied Perception (TAP) 1.1 (2004): 19-39.

Kouvaris, Kostas, et al. “How evolution learns to generalise: Using the principles of learning theory to understand the evolution of developmental organisation.” PLoS Computational Biology 13.4 (2017): e1005358.

White, Tom. “Sampling generative networks: Notes on a few effective techniques.” arXiv preprint arXiv:1609.04468 (2016).
