Fish, You are the Father!

By: Isaac Miller-Crews, PhD Candidate, University of Texas at Austin

My job would be much easier if CVS sold paternity testing kits for fish instead of humans! I am interested in the evolution of the neural regulation of reproduction, which requires knowing whether an animal reproduced. Genetic testing, such as parentage analysis, allows us to figure out relationships among individuals without direct historical knowledge. This testing has generally relied on looking in the DNA for microsatellites, but we’re discovering new, more powerful, and cheaper ways to conduct these tests in the ‘Age of Big Data’ (Flanagan, 2018; Hodel, 2016). This is especially true if your fish population stubbornly refuses to have variable microsatellites!

Yet common standards or guidelines for dealing with next-generation sequencing data still need to be figured out (Flanagan, 2018). Importantly, few bioinformatic tools exist that can differentiate well between closely related individuals or deal with DNA mixtures. Looking at single nucleotide polymorphisms (SNPs) across thousands of genomic sites gives researchers significantly more information on variability among samples than standard microsatellite approaches (Hodel, 2016). A new technique called restriction site-associated DNA sequencing (RAD-seq) helps us narrow down which places to look at on the DNA, because it only sequences certain fragments, and which fragments you get depends on which endonucleases you use to cut up the DNA. 2bRAD sequencing uses an endonuclease (type-2b) that gives you consistent fragments across your samples, and it is also very cost-effective (Wang, 2012).

The simplest form of paternity testing is exclusion, in which paternity is ruled out if a single site disagrees between the alleged father and the offspring-mother pair (Marshall, 1998), but this approach is prone to errors (Wang, 2010). Parental and sibship reconstruction can generate full sets of possible parental genotype profiles but cannot be used with pooled offspring samples (Wang, 2004). The most common paternity testing technique uses a likelihood model to categorically assign paternity between individuals (Meagher, 1986). Not only does this approach require setting a threshold to call genotypes, but it also limits paternity to the comparison of only two alleged fathers (Marshall, 1998). Furthermore, this type of technique cannot deal with cases of mixed or pooled samples, since it can only categorically assign paternity to one putative father.
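To make the exclusion idea concrete, here is a minimal sketch in Python. It assumes diploid genotypes stored as a pair of alleles per locus; the function names, the toy genotypes, and the `max_mismatches` parameter are all hypothetical and are not taken from any published parentage tool.

```python
def compatible(father, mother, offspring):
    """True if the offspring genotype at one locus can be built from
    one maternal allele plus one paternal allele."""
    a, b = offspring
    return (a in mother and b in father) or (b in mother and a in father)


def excluded(father_loci, mother_loci, offspring_loci, max_mismatches=0):
    """Rule a candidate father out if more than max_mismatches loci disagree."""
    mismatches = sum(
        not compatible(f, m, o)
        for f, m, o in zip(father_loci, mother_loci, offspring_loci)
    )
    return mismatches > max_mismatches


# Toy data: two loci, alleles written as letters (illustrative only).
mother = [("A", "A"), ("C", "T")]
offspring = [("A", "G"), ("C", "C")]
male_1 = [("G", "G"), ("C", "T")]  # can supply G at locus 1 and C at locus 2
male_2 = [("A", "A"), ("C", "T")]  # cannot supply G at locus 1

print(excluded(male_1, mother, offspring))  # False
print(excluded(male_2, mother, offspring))  # True
```

With strict exclusion (`max_mismatches=0`), a single genotyping error at one locus is enough to rule out the true father, which is one way to see why the method is error-prone. Allowing one mismatch (`max_mismatches=1`) would keep `male_2` in the candidate pool.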

Luckily, there is always a Bayesian approach! Partial paternity testing assigns fractions of the offspring to candidate parents based on the highest Bayesian posterior probability (Hadfield, 2006) and outperforms categorical likelihood models, especially in being able to circumvent systematic biases, such as over-assigning paternity to males with a relatively higher number of homozygous loci (Devlin, 1988). Assigning partial paternity is thus perfect if you want to assess an entire brood or clutch or litter at once!
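The idea of splitting a brood among candidate fathers can be sketched as follows. This is a deliberately simplified illustration, not the model of Hadfield (2006) or Devlin (1988): it assumes a known mother, a flat prior over candidate males, error-free genotypes, and simple Mendelian transmission. All names and genotypes are hypothetical.

```python
from itertools import product


def transmission_prob(mother, father, offspring):
    """P(offspring genotype | parents) at one locus under Mendelian inheritance."""
    target = sorted(offspring)
    hits = sum(sorted((m, f)) == target for m, f in product(mother, father))
    return hits / 4.0


def likelihood(mother_loci, father_loci, offspring_loci):
    """Multiply per-locus transmission probabilities (loci assumed independent)."""
    p = 1.0
    for m, f, o in zip(mother_loci, father_loci, offspring_loci):
        p *= transmission_prob(m, f, o)
    return p


def fractional_paternity(mother_loci, fathers, brood):
    """Average each male's posterior probability of paternity across the brood."""
    shares = {name: 0.0 for name in fathers}
    for offspring_loci in brood:
        liks = {n: likelihood(mother_loci, g, offspring_loci) for n, g in fathers.items()}
        total = sum(liks.values())
        if total == 0:
            continue  # no candidate male is compatible with this offspring
        for name in fathers:
            shares[name] += liks[name] / total  # flat prior: posterior = normalized likelihood
    return {name: s / len(brood) for name, s in shares.items()}


# Toy data: one locus, two candidate males, two fry (illustrative only).
mother = [("A", "A")]
fathers = {"male_1": [("G", "G")], "male_2": [("A", "G")]}
brood = [[("A", "G")], [("A", "A")]]

result = fractional_paternity(mother, fathers, brood)
print(result)  # male_1 gets about 1/3 of the brood, male_2 about 2/3
```

For the first fry, the homozygous `male_1` has the higher likelihood, so a categorical method would hand him that offspring outright. The fractional version gives him two thirds of it and leaves one third with `male_2`, which is one way to see how it softens the bias toward homozygous males.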

Most parentage testing techniques assume that parents are unrelated and that the pool of putative parents contains no close relatives, which can lead to troubling situations where full-siblings are assigned parentage over actual parents (Thompson, 1976). Populations with a lot of closely related individuals pose a problem to both microsatellite and SNP assays due to the lower variation amongst samples. In these cases, only 100 SNPs are required to outperform microsatellites (Flanagan, 2018). If close relatives are suspected to be in the sample, broader pedigree analysis is often required, such as that done with identity-by-state (IBS) matrix clustering. Yet, to date, only one study has attempted to combine IBS clustering with any paternity testing method, categorical assignment, or to a genotyping-by-sequencing with RAD-seq data (Gutierrez, 2017). If only someone could combine the awesome power of IBS matrix clustering with the staggering potential of partial paternity testing!
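For readers unfamiliar with IBS, the sketch below shows one common way to score it, assuming biallelic SNPs coded as the number of copies of the alternate allele (0, 1, or 2) and no missing data. The sample names and genotypes are made up, and a real pipeline would go on to cluster the resulting matrix.

```python
def ibs_similarity(g1, g2):
    """Mean proportion of alleles shared identical-by-state across loci.

    At each locus two individuals share 2 alleles (same genotype),
    1 allele (genotypes differ by one copy), or 0 (opposite homozygotes).
    """
    shared = [2 - abs(a - b) for a, b in zip(g1, g2)]
    return sum(shared) / (2 * len(shared))


def ibs_matrix(genotypes):
    """Pairwise IBS similarity for every pair of named individuals."""
    names = list(genotypes)
    return {(a, b): ibs_similarity(genotypes[a], genotypes[b]) for a in names for b in names}


# Toy genotypes at four SNPs (illustrative only).
genotypes = {
    "dad": [0, 1, 2, 1],
    "son": [0, 1, 1, 1],
    "stranger": [2, 0, 0, 2],
}

matrix = ibs_matrix(genotypes)
print(matrix[("dad", "son")])       # 0.875
print(matrix[("dad", "stranger")])  # 0.25
```

In an inbred population, many pairs of individuals would score close to the parent-offspring value, which is why IBS on its own cannot settle paternity and is more useful as a check on relatedness among candidate parents.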

The African cichlid fish Burton’s mouthbrooder, Astatotilapia burtoni, which forms highly complex and dynamic social communities, is a model system in social neuroscience. Adult male A. burtoni are considered either territorial or non-territorial (Fernald, 1977). A male’s position within the social dominance hierarchy is dynamic, as possession of territories is transient (Hofmann, 1999). A. burtoni reproduce within territorial bowers prior to female mouth-brooding for around two weeks, during which fry can be directly removed from the mother’s buccal cavity. Current estimates of male reproductive success usually integrate some combination of female behavior (proximity, duration/frequency in shelter, or number of eggs laid in a territory), with variation in female preference assumed from this proxy of male reproductive success (Kidd, 2006). Although a female may associate with a male, this does not directly equate to mating outcomes, meaning behavioral scoring is not enough to assign paternity (Theis, 2012).

My research aims to do just that by developing an NGS-based parentage analysis bioinformatics pipeline that integrates partial paternity assignment and IBS matrix clustering. The powerful pairing of these two parentage assignment methods allows detection of biases that might arise from closely related individuals in the alleged parent population and will handle pooled samples of multiple offspring. This is great, since our laboratory population of A. burtoni is quite inbred and produces fairly large broods (imagine mouth-brooding anywhere from 10 to 60 fry). Implementation of paternity testing to measure reproduction outcomes can help us understand the interaction between dynamic systems such as female reproductive cycle and male social dynamics (Fig. 1).

Figure 1. Research overview of how female internal reproductive state (blue) and male external social structure (red) interact and integrate to produce reproduction (purple). Measuring reproductive output requires the development of paternity testing methods.

The integration of a bioinformatics pipeline and the unique advantages of 2bRAD sequencing will allow for relatively easy expansion both into alternative DNA sequencing approaches and any species, regardless of available genomic resources. I plan to integrate paternity testing, as a measure of Darwinian fitness, into analyses of mate preferences and reproductive success in naturalistic communities of A. burtoni. While we use a lot of behavioral proxies of reproduction, such as social interactions or association time, nothing lets you know that the deed was done like genetically testing everyone. Layered on top of these models of reproductive success within a social hierarchy, I want to integrate neuromolecular techniques, from the spatial resolution of single genes up to transcriptomic networks. This means I will know information about an individual’s behavior, reproductive success, and neural profile all within the context of an actual social community. Talk about truly integrative!

Isaac Miller-Crews is a PhD candidate in the Hofmann Lab (Department of Integrative Biology) at the University of Texas at Austin.


Devlin, B., Roeder, K., & Ellstrand, N. C. (1988). Fractional paternity assignment: theoretical development and comparison to other methods. Theoretical and Applied Genetics, 76(3), 369–380.
Fernald, R. D., & Hirata, N. R. (1977). Field study of Haplochromis burtoni : Quantitative behavioral observations. Animal Behaviour, 25, 964–975.
Flanagan, S. P., & Jones, A. G. (2018). The future of parentage analysis: From microsatellites to SNPs and beyond. Molecular Ecology, mec.14988.
Gutierrez, A. P., Turner, F., Gharbi, K., Talbot, R., Lowe, N. R., Peñaloza, C., … Houston, R. D. (2017). Development of a Medium Density Combined-Species SNP Array for Pacific and European Oysters (Crassostrea gigas and Ostrea edulis). G3 (Bethesda, Md.), 7(7), 2209–2218.
Hadfield, J. D., Richardson, D. S., & Burke, T. (2006). Towards unbiased parentage assignment: Combining genetic, behavioural and spatial data in a Bayesian framework. Molecular Ecology, 15(12), 3715–3730.
Hodel, R. G. J., Segovia-Salcedo, M. C., Landis, J. B., Crowl, A. A., Sun, M., Liu, X., … Soltis, P. S. (2016). The Report of My Death was an Exaggeration: A Review for Researchers Using Microsatellites in the 21st Century. Applications in Plant Sciences, 4(6), 1600025.
Hofmann, H. A., Benson, M. E., & Fernald, R. D. (1999). Social status regulates growth rate: consequences for life-history strategies. Proceedings of the National Academy of Sciences of the United States of America, 96(24), 14171–14176.
Kidd, M. R., Danley, P. D., & Kocher, T. D. (2006). A direct assay of female choice in cichlids: all the eggs in one basket. Journal of Fish Biology, 68(2), 373–384.
Marshall, T. C., Slate, J., Kruuk, L. E. B., & Pemberton, J. M. (1998). Statistical confidence for likelihood-based paternity inference in natural populations. Molecular Ecology, 7(5), 639–655.
Meagher, T. R., & Thompson, E. (1986). The relationship between single parent and parent pair genetic likelihoods in genealogy reconstruction. Theoretical Population Biology, 29(1), 87–106.
Thompson, E. A. (1976). A paradox of genealogical inference. Advances in Applied Probability, 8(04), 648–650.
Wang, J. (2010). Effects of genotyping errors on parentage exclusion analysis. Molecular Ecology, 19(22), 5061–5078.
Wang, J. (2004). Sibship Reconstruction from Genetic Data with Typing Errors. Genetics, 166(4), 1963–1979.
Wang, S., Meyer, E., McKay, J. K., & Matz, M. V. (2012). 2b-RAD: a simple and flexible method for genome-wide genotyping.
Posted in BEACON Researchers at Work | Comments Off on Fish, You are the Father!

200 Years of Developmental Hourglass: Using Big Data to Increase Our Understanding of Vertebrate Embryogenesis from a Trickle to a Flood

By: Megan Chan, Undergraduate Student, University of Texas – Austin

When I started college at The University of Texas at Austin a couple of years ago, I enrolled as a biochemistry/pre-pharmacy major. I didn’t know anything about computational biology back then but have since had the opportunity to participate in computational biology research under the guidance of Dr. Rebecca Young and Dr. Hans Hofmann in the Department of Integrative Biology at UT Austin. Over the last couple of years, I have grown more and more interested in the realm of data analytics, and my experience in hands-on research has completely changed my goals for the future. Because of this, I finally transferred majors last year to computational biology.

Megan Chan

At the University of Texas, we have a program called the Freshman Research Initiative (FRI) that helps new students get experience in research labs. Although I originally applied just to get something interesting on my resume, I ended up gaining much more. As part of FRI, I joined a research stream called Big Data in Biology, led by Dhivya Arasappan. The goal of this stream was to introduce freshmen to concepts in genetics and how statistics and computer science are being used to study biological systems. I chose this stream over others I was interested in (like streams working on genetically engineering bacteria or chemical analysis of wine tannins) because I had really enjoyed a year of programming when I was in high school. I had never considered myself very knowledgeable about computers and often felt overwhelmed when around guys who had been writing code since middle school, but I found the challenge of solving problems and discovering something new exciting. In my sophomore year I realized that I wanted to continue exploring this field and completely changed my career focus from pharmacy to computational biology.

As part of FRI, I had the opportunity to join Dr. Young and Dr. Hofmann in an independent project adding evidence to a long-standing debate over the validity of what is commonly known as the hourglass model of vertebrate development. The hourglass model hypothesizes that the vertebrate body plan imposes a constraint on diversification of mid-embryonic development across vertebrate species. Early evidence for this theory was based on qualitative analysis of anatomical developmental variation, but in recent years gene expression data has been used as evidence for and against the hourglass model. The part of this overall project that I have been working on focuses on describing patterns of similarity in developmental gene expression through embryogenesis among several vertebrate species. This has involved processing and analyzing over 150 open-source gene expression datasets representing developmental stages for six species. By comparing the similarity of gene expression between each combination of species at each time point in development, I can ask whether mid-embryonic stages are most similar in gene expression across species.

A major challenge in achieving this goal has been the lack of consistency in staging for different species. There is not a common quantitative way to equate a particular stage of development in one species with that in another. To add to this problem, of the species we have data for, most only have data for a select set of stages, and the number of stages sequenced for each species is also different. For example, there are 8 out of 46 stages represented for chicken embryos and 24 out of a possible 44 stages for a species of frog (not including free-swimming tadpoles). To overcome this essential problem, I’ve turned to machine learning and comparing qualitative descriptions of stages to group developmental time points within each species into comparable sets.

Of the various methods I integrated into my approach, the first I employed was K-means clustering. K-means is an unsupervised machine learning algorithm that iteratively computes the distance between each data point and a set of k centroids to determine which points cluster together around a mean, with k being the number of clusters to find. This was the first method I tried because it is a fairly common way of classifying data without pre-determining classes. To find the appropriate k, I generated an elbow plot visualizing the amount of variation that would be accounted for by several possible numbers of clusters and chose a k that represented a reasonable amount of variation without dividing the data into clusters that were too small. A known feature of K-means, however, is that it randomizes the initial centroids, which can result in some variation in cluster membership when the clusters are not robust. To strengthen this method, I used partitioned hierarchical clustering, another form of unsupervised machine learning. Like K-means, this algorithm aims to group the data points into a predetermined number of clusters with similar values, but it starts by treating the entire dataset as one cluster and then partitions it into smaller pieces until it has reached the appropriate number of clusters. Hierarchical clustering, unlike K-means, tends to be consistent, and our results showed that, at an appropriate number of clusters found with the elbow method described earlier, it also conserved the order of the developmental stages. Further analysis showed that these clusters could be defined by at least some biological significance. We are now confronted with the challenge of aligning these clusters across species.
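For readers who have not met K-means before, the sketch below shows the basic loop and the numbers behind an elbow plot in plain Python. It is an illustration only, not the analysis code from this project: the six two-dimensional points stand in for expression profiles, and the function names, the `init` and `seed` parameters, and the data are all hypothetical.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(points):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def kmeans(points, k, init=None, seed=0, max_iter=100):
    """Return (labels, within-cluster sum of squares) for k clusters."""
    rng = random.Random(seed)
    # Random starting centroids unless the caller fixes them explicitly.
    centroids = list(init) if init is not None else rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its old centroid
                centroids[j] = mean(members)
    wcss = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, wcss


# Toy data: three obvious groups of two points each.
points = [(0, 0), (0, 1), (10, 0), (10, 1), (20, 0), (20, 1)]

# Values for an elbow plot: within-cluster sum of squares for k = 1..4.
elbow = {k: kmeans(points, k)[1] for k in range(1, 5)}

# Fixing the starting centroids makes the k = 3 result reproducible.
labels, wcss = kmeans(points, 3, init=[(0, 0), (10, 0), (20, 0)])
print(labels, wcss)  # [0, 0, 1, 1, 2, 2] 1.5
```

Because the starting centroids are drawn at random, changing `seed` can change which points end up together when the groups are less clear-cut than in this toy example. That is the instability that motivated cross-checking the result with hierarchical clustering.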

Now, my work has turned from heavy computation to intense reading. I’ve made it this far without having to know too much about the details of what all these stages mean, but I’ve come to face the fact that I will need some biological knowledge of vertebrate development in order to compare these stages in any reasonable way. The beauty of being in an interdisciplinary field.

The knowledge that I’ve gained while working on this project is invaluable to me as I start to pursue my own projects and begin exploring my future options as graduation slowly approaches. I’ve enjoyed the work I’ve done in this lab so much that last year I started analyzing data for fun; in one instance looking for patterns in word choice in a dataset of Russian disinformation tweets, and in another instance predicting the length of time a dog will stay in the local shelter based on its age. This research experience has also opened many doors for me, allowing me the opportunity to pursue positions analyzing data for other labs on campus and jobs mentoring new students in research, and giving me the tools I needed to land a software internship in biotech this summer. In my last year, I hope to publish results for this project and leave an impact on future research.


Team yEvo goes to National Association of Biology Teachers

By: Bryce Taylor, Alexa Warwick, and Ryan Skophammer

Hi BEACONites! We are Ryan Skophammer of the Westridge School for Girls, Bryce Taylor of University of Washington, and Alexa Warwick of Michigan State University. We’ve been collaborating on a BEACON-funded grant to expand options for introductory Biology teachers who want to use labs that teach concepts in evolution. Specifically, we have developed a standards-based, hands-on, long-term yeast evolution project (‘yEvo’). Ryan has been developing lesson plans and teaching the lab in his AP biology class, Bryce is providing experimental support and data analysis, and Alexa is evaluating the impact of participation on student learning.

A subset of colorful yeast used in yEvo. The pigments allow us to monitor for contamination during student evolution experiments and serve as a strain-specific marker in competitions.

The yeast evolution project begins by having students choose a favorite color of yeast from a living ‘palette’ of S. cerevisiae strains that have been engineered to express vibrant pigments (courtesy of the Boeke lab at NYU). Over several weeks students grow their yeast in the presence of an over-the-counter antifungal agent to select for mutants with higher tolerance. Classes at Westridge run for 80 minutes on an alternating block schedule. This means students attend AP Biology every other school day. At the beginning of each block, students inspect their experiments and transfer from a saturated culture to fresh media using a disposable sterile swab.

After a few weeks, students purify a single clone from the culture and use it in a class-wide competition, in which they use the color of their yeast as a marker to determine which is “winning” in a mixed culture. Some of these clones are then sequenced by the Dunham lab at the University of Washington to determine mutations, which students analyze and research to form hypotheses about whether a given mutation is likely to be adaptive. Early results have yielded an exciting mix of mutations in genes with known roles in resistance to the active ingredient in our antifungal, which demonstrate the experiment worked, and genes that haven’t been implicated previously but seem worthy of further investigation.

Ryan’s students hard at work with their yeast.

In Ryan’s class, forty-five students completed the first pilot of the yeast evolution project in the 2017-18 school year. To iteratively improve the lessons and to evaluate impacts on student learning of evolution and motivation/attitudes toward science, we gave a post-survey to Ryan’s students in May 2018 (17 responses). When asked what they liked about the process of growing yeast in the presence of the fungicure, most of them mentioned watching their yeast survive or evolve over time (64.7%) and determining whether to increase the concentration of the antifungal (29.4%). When analyzing sequence data from their evolved strains, the students liked seeing the actual mutations (52.9%), but also found it confusing to figure out how to analyze the data (47%), suggesting more scaffolding is needed in the design of this activity to assist students with this difficulty next time. Most of the students also liked the competition aspect (82.3%), but some disliked losing (17.6%), felt rushed (11.7%), or didn’t like counting (11.7%). Most students (94.1%) reported they were willing to do the activity again because it was fun; one person was uncertain. All students agreed or strongly agreed that they enjoyed participating. We also asked students to report on their interest in becoming a biologist as a result of their participation (41.2% agreed or strongly agreed) and their interest in STEM (47% agreed or strongly agreed).

In November of 2018, we traveled to the National Association of Biology Teachers conference in San Diego. It was the first time we’d all met in person and provided a great opportunity to catch up and plan out our next steps. Alexa and Ryan had been to the conference previously. Bryce attended for the first time, and was supported by travel funds from our BEACON grant. In addition to discussing yEvo, Alexa and Bryce presented posters on ConnectedBio and UW Genomics Salon, respectively. ConnectedBio is an NSF-funded grant project to develop curricular materials that are designed for the Next-Generation Science Standards and foster integrated learning of high school genetics and evolution. The materials use the Evo-Ed cases as the phenomena that students explore through a series of technology-enhanced lessons as part of the collaboration between Michigan State University researchers and the Concord Consortium. Genomics Salon is an interdisciplinary discussion group at University of Washington that brings together academics and members of the broader UW community to talk about issues in science and society. Bryce shared a repository of discussion questions and resources from 2 years of meetings, which could be a good starting point for teachers interested in building lesson plans on topics we’ve covered, but who aren’t sure where to start.

Ryan introducing yEvo during his workshop at NABT.

Ryan led a workshop where he shared his experience designing and teaching yEvo. The teachers present had great ideas and feedback on the project that helped us to think through where to take the project next. After the workshop several teachers hung around with additional questions and feedback. Chatting with them helped us to recognize aspects that may or may not work in every school setting, which we aim to address as we further refine protocols and materials. One particularly enthusiastic participant had some fantastic ideas about future conditions or experimental setups we could try out. He’s stayed in contact since and is running yEvo in his classroom this semester!

The most important conference tradition: dinner with new friends.

One of Alexa’s highlights from the meeting was attending science writer Ed Yong’s talk and then going out to dinner with him. If you haven’t seen Ed’s articles in the Atlantic yet, we recommend them. Bryce particularly enjoyed the exhibit hall. The vendors present brought a very cool mix of biology apps, games, and toys, which are a growing and fascinating component of education that play a big role in the early stages of science training, but that you don’t get to interact with often in higher-ed settings.


A Fly’s View of Retinoblastoma-Family Protein Conservation

By: Dhruva Kadiyala (Undergraduate Student at Michigan State University)

How do evolutionary perspectives illuminate cancer-related biochemistry? As a high school student, I was involved in a project to find targets to attack cancer cells. That project really inspired me to work on the retinoblastoma-family protein project in the Arnosti lab. I came into Dr. Arnosti’s lab in my freshman year at Michigan State University as a Professorial Assistant from the Honors College and immediately began to learn about retinoblastoma (Rb) tumor suppressor proteins in Drosophila species.

In humans, the retinoblastoma protein is a tumor suppressor and plays an active role in cell cycle regulation. Mutations in the Rb gene or its regulatory pathway are associated with many human cancers. Rb is ancient; the gene is evolutionarily conserved in most multicellular organisms and present as a single copy gene. In mammals, however, there are three Rb paralogs: Rb, p107 and p130. Independently, in Drosophila the Rb gene duplicated about 60 million years ago, and both paralogs, Rbf1 and Rbf2, have been retained in all modern Drosophila species. This situation provides a great model system to study Rb paralog evolution and function.

To understand the evolution of Rbf1 and Rbf2, I aligned Rbf1 protein sequences from 12 Drosophila species using the Clustal Omega multiple sequence alignment tool from the European Bioinformatics Institute. I split the proteins into three different domains (N-terminus, C-terminus, and Pocket Domain) to see which part of the protein is most conserved. I found that the Rbf1 gene, which most resembles the ancestral gene based on similarity with other organisms’ Rb genes, shows a higher degree of conservation, especially in the Pocket domain important for binding to transcription factors. The derived Rbf2 gene has a higher degree of variation within Drosophila, especially in the N- and C-termini.
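A simple way to compare conservation between domains is to count, within each stretch of the alignment, the columns where every sequence carries the same residue. The sketch below does that in Python. The three short sequences and the domain coordinates are invented for illustration and are not real Rbf1 data, and the scoring is a bare-bones identity measure rather than whatever Clustal Omega or Jalview report.

```python
def column_identity(alignment, start, end):
    """Fraction of alignment columns in [start, end) that are fully conserved.

    A column counts as conserved only if it has no gaps ('-') and every
    sequence carries the same residue.
    """
    columns = list(zip(*(seq[start:end] for seq in alignment)))
    if not columns:
        return 0.0
    conserved = sum(1 for col in columns if "-" not in col and len(set(col)) == 1)
    return conserved / len(columns)


# Toy aligned sequences of equal length (illustrative only).
alignment = [
    "MKTLLA-QRS",
    "MKSLLAGQRT",
    "MRTLLA-QRS",
]

# Hypothetical domain boundaries as 0-based, end-exclusive column ranges.
domains = {"N-terminus": (0, 3), "Pocket": (3, 7), "C-terminus": (7, 10)}

scores = {name: column_identity(alignment, s, e) for name, (s, e) in domains.items()}
for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

In this toy alignment the Pocket stretch scores 0.75 while the two termini score lower, which mirrors the kind of pattern described above. A real analysis would typically also credit similar (not just identical) residues.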

I also aligned both Rbf1 and Rbf2 sequences from D. melanogaster with the human Retinoblastoma-family proteins (Rb, p107, and p130). What was striking is that the more evolutionarily variable human Rb and fly Rbf2 proteins have changes, especially in the C-terminus, that impact a functional domain (the Instability Element, IE) important for protein turnover and transcriptional regulation, an apparent case of parallel evolution.

Why do most animals outside of vertebrates make do with a single Rb gene, while Drosophila have expanded their count? To assess the structural variation in Rb genes in arthropods in general, I compared Drosophila Rbf1 sequences with those of the red flour beetle (Tribolium castaneum), eastern honey bee (Apis cerana), monarch butterfly (Danaus plexippus), western flower thrips (Frankliniella occidentalis), green peach aphid (Myzus persicae), a drywood termite (Cryptotermes secundus), a springtail (Folsomia candida), the common house spider (Parasteatoda tepidariorum), and white-legged shrimp (Penaeus vannamei). Overall, conservation is greatest in the transcription factor binding Pocket Domain, although the internal “spacer” region within the domain is quite variable, something that may influence activity of the proteins. The C-terminus was least conserved, but IE sequences are conserved. Thus, evolutionary changes in this portion of the protein seem to be restricted to cases where there are paralogous genes.

To visualize these levels of conservation, I turned to Jalview software, which uses the multiple sequence alignment tools Clustal Omega and MUSCLE to generate visuals for analysis. Figure 1 shows the residue-by-residue conservation of Rb genes from arthropod species.

Figure 1: Multiple sequence alignment of Rbf1 of D. melanogaster and Rb genes from other arthropod species. The height and color of the bars represent percent identity and similarity. Higher, yellow bars are more conserved than lower, brown bars. The protein skeleton is based on the D. melanogaster Rbf1 protein with the following annotations: Blue: cyclin fold domain, Pink: A pocket, Green: B pocket, Purple: Instability element.

Overall, my work in Dr. Arnosti’s lab has been the most meaningful work for my development as a researcher. I experienced firsthand how proteins that play a major role in survival and development in cancer are evolutionarily conserved and yet evolve over time among species, and thereby I have deepened my knowledge of biology and the mechanisms of evolution. I hope to continue working in the lab for the rest of my undergraduate career, discover a more disciplined researcher in myself, and contribute to science as I prepare to advance to medical studies.

Dhruva Kadiyala is a sophomore studying Neuroscience in Lyman Briggs College at Michigan State University. He is a pre-medical student also interested in biological research, and has worked with Cell and Molecular Biology Ph.D. student Rima Mouawad in the lab of David Arnosti.


CONSTAX: a tool to simplify and improve taxonomic classification of community sequences

By: Natalie Vande Pol (PhD Candidate, Michigan State University)

I am a 5th year PhD student in the Microbiology and Molecular Genetics program at Michigan State University. This is the story of a side project that has been one of the most enjoyable and rewarding undertakings in my PhD career. CONSTAX was the first project on which I was a key contributor. The co-first authors both worked in community ecology and they wanted to develop a tool, but they needed some help writing Python scripts. That’s where I came in.

Community ecologists use a technique called amplicon sequencing, in which they extract DNA from a substrate (e.g., soil, plants, water) and sequence specific genes that they then use as a “barcode” to identify the organism from which the DNA originated (Figure 1). In bacteria, this barcode is the 16S ribosomal RNA gene. In fungi, we generally use one of two ribosomal regions: ITS1 or ITS2. Ecologists use these barcode sequences to study pooled communities of organisms, allowing comparison of community structure between different conditions (e.g., healthy v. diseased gut/plant). Think of it like a census for soil fungi. These comparisons can sometimes indicate organisms that are important to causing, preventing, detecting, or recovering from a given characteristic or disturbance.

Figure 1: Community barcoding. Barcode genes amplified from different organisms have small differences in sequence. So long as a sequence for that organism is included in the reference database, that sequence can be “translated” back into an organism name.

One of the most important steps in a community analysis pipeline is to “translate” the barcode DNA sequences from the sample into the names of the organisms from which they originated. This is done by comparing sample sequences to reference sequences from known organisms, just as a barcode in a grocery store needs a computer reference to tell the cashier whether you are buying cilantro or parsley. With DNA sequences, the identification algorithm used to match up the sequences is called a classifier. Using different reference databases or different classifiers can yield different identifications.

To illustrate what happens with different classifiers, imagine you and two of your friends are all taking the same test. All three of you get 80/100 questions correct on the exam. However, when you compare your exams, you realize that while you all had 75 questions in common, the other 5 correctly answered questions were unique to each of you. So, on the surface your performances seem identical, but are in fact a bit different. Similarly, using a single classifier and different reference databases is analogous to each of you three taking the same exam having studied from three different textbooks (assuming otherwise identical performance). Your scores on the exam would probably vary.

Fortunately, for fungal research, UNITE is a well-curated reference sequence database, so the largest source of variation is between classifiers. Just as described in the first analogy above, different classifiers use different algorithms to assign taxonomies and estimate confidence/error rates, making it difficult to select a single classifier as the “best”. Therefore, our two community ecologists and I set out to develop a tool that eliminated the need to choose just one! If you and your two friends could collectively take that exam, you could have gotten 90/100, instead of just 80/100 on your own.

First, we chose the most commonly used and most recently developed classifiers: Ribosomal Database Project (RDP), UTAX, and SINTAX. We wrote a series of custom scripts to format the UNITE reference database to be compatible with each of the classifiers and ran our sequence datasets through each of the classifiers. Finally, we used Python scripts to standardize the output formats. All of this is packaged and automated by a single shell script (Figure 2). Users simply place their input files in the specified folders and provide the names and desired parameters in a configuration file.

Figure 2: The CONSTAX workflow. The portion highlighted in the gray box is automated through a single master script to ensure ease of use.

For each sequence, we compared the three assignments given for each taxonomic rank (Kingdom, Phylum, Class, Order, Family, Genus, and Species). If the confidence score for a given assignment was below a threshold value, that and all further taxonomic ranks were considered “Unidentified” for that sequence. In most cases, the three classifiers agreed on the taxonomy assigned. However, there were cases in which they disagreed, whether because one (or two) of the classifiers yielded an Unidentified, or because there were multiple different, confident assignments (Table 1). With three classifiers, we decided to implement a simple majority rule. Since classifiers provide an estimated confidence in taxonomic classifications, we used confidence scores to break ties.

Table 1: The CONSTAX Voting Rules.
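As a rough illustration of the thresholding and voting described above, here is a minimal Python sketch. It is not the CONSTAX code itself: the function names, the 0.8 threshold, and the choice to ignore “Unidentified” votes when counting the majority are all assumptions made for the example, and the real rules in Table 1 may differ in detail.

```python
from collections import Counter

UNIDENTIFIED = "Unidentified"


def apply_threshold(assignment, threshold):
    """Mask the first rank whose score is below the threshold, and every rank after it.

    `assignment` is a list of (taxon, confidence) pairs ordered from Kingdom to Species.
    """
    masked = []
    cut = False
    for taxon, score in assignment:
        if cut or score < threshold:
            cut = True
            masked.append((UNIDENTIFIED, 0.0))
        else:
            masked.append((taxon, score))
    return masked


def vote(candidates):
    """Choose one taxon for a single rank from one (taxon, confidence) pair per classifier."""
    # Assumption: classifiers that returned "Unidentified" simply do not vote.
    named = [(taxon, score) for taxon, score in candidates if taxon != UNIDENTIFIED]
    if not named:
        return UNIDENTIFIED
    counts = Counter(taxon for taxon, _ in named)
    most_votes = max(counts.values())
    tied = {taxon for taxon, n in counts.items() if n == most_votes}
    # Majority wins; if several taxa share the top vote count,
    # the one with the highest confidence score breaks the tie.
    best_taxon, _ = max((pair for pair in named if pair[0] in tied), key=lambda pair: pair[1])
    return best_taxon


def consensus(assignments, threshold=0.8):
    """Combine the per-rank assignments of several classifiers into one taxonomy."""
    masked = [apply_threshold(assignment, threshold) for assignment in assignments]
    return [vote(per_rank) for per_rank in zip(*masked)]


# Hypothetical output of three classifiers for one sequence (Kingdom, Phylum, Class only).
rdp = [("Fungi", 1.0), ("Ascomycota", 0.95), ("Sordariomycetes", 0.60)]
utax = [("Fungi", 0.99), ("Ascomycota", 0.90), ("Sordariomycetes", 0.85)]
sintax = [("Fungi", 1.0), ("Ascomycota", 0.85), ("Dothideomycetes", 0.90)]
print(consensus([rdp, utax, sintax]))
```

Note that the sketch votes on each rank independently, and that it compares raw confidence scores across classifiers even though those scores are not calculated in the same way.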

We tested our tool on four different datasets from three different studies: barcode gene ITS1 or ITS2 of fungi from soil or plants (Figure 3). And it worked! Cross-referencing three classifiers corrected misassignments and improved overall performance. At the Kingdom level, the consensus taxonomy was only ~1% improved as compared to any individual classifier. However, ranks below Kingdom showed much stronger improvement, on average 7-35%, depending on the taxonomic level and the individual classifier. The mean improvement in performance by CONSTAX over individual classifiers is slightly over-estimated due to particularly poor classification by UTAX, which had the most Unidentified levels.

Figure 3: CONSTAX Performance. a) Soil fungi barcoded with the ITS1 gene, from Smith & Peay. b) Soil fungi barcoded with the ITS2 gene, from Oliver et al. c-d) Plant fungi with the c) ITS1 and d) ITS2 genes, from Agler et al.

What’s next for CONSTAX?

First, we would love to develop our tool to be compatible with bacterial community sequences. Fortunately, the classifiers were all written for bacterial community analysis in the first place! Unfortunately, the reference databases are either out of date or so poorly curated as to have misidentified reference sequences and some convoluted taxonomies. Bacteria seem to be renamed rather frequently and it’s difficult to know whether the assignment given is still correct. We focused our preliminary efforts on the SILVA database, as it is the most up-to-date, but it has some serious formatting issues, among other things. In theory, there should be 7 taxonomic ranks. A significant proportion of the SILVA taxa have 4-13 levels, requiring manual correction to determine the appropriate classification for each of the 7 expected levels. At least in fungi, the different taxonomic ranks have consistent suffixes that can be used to identify gaps/insertions and correctly place the ranks. In bacteria, suffixes only seem to be consistent within particular lineages, so I would only be able to fix one group at a time, and quite often the canonical seven taxonomic levels simply don’t exist for some bacterial lineages.
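To make the suffix idea concrete for the fungal case, here is a minimal Python sketch of placing a lineage of variable length into the 7 expected ranks. It is only an illustration, not part of CONSTAX: the function name and the example lineage are made up for this post, and it assumes that the first entry is the Kingdom and that a final entry containing a space is a species binomial. Intermediate ranks such as subphyla are dropped, and a genus listed on its own is left unplaced because genus names have no standard suffix.

```python
RANKS = ["Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species"]

# Standard fungal suffixes for the main ranks. Intermediate ranks
# (for example -mycotina or -mycetidae) are not listed, so they are skipped.
SUFFIX_TO_RANK = (
    ("mycota", "Phylum"),
    ("mycetes", "Class"),
    ("ales", "Order"),
    ("aceae", "Family"),
)


def place_ranks(lineage):
    """Map a lineage with any number of levels onto the 7 expected ranks."""
    placed = {rank: "Unidentified" for rank in RANKS}
    names = list(lineage)
    if not names:
        return placed
    # Assumption: the first entry is always the Kingdom.
    placed["Kingdom"] = names.pop(0)
    # Assumption: a final entry with a space is a "Genus species" binomial.
    if names and " " in names[-1]:
        species = names.pop()
        placed["Species"] = species
        placed["Genus"] = species.split()[0]
    for name in names:
        for suffix, rank in SUFFIX_TO_RANK:
            if name.endswith(suffix):
                placed[rank] = name
                break
    return placed


# An illustrative 9-level lineage with extra intermediate ranks.
long_lineage = [
    "Fungi", "Dikarya", "Ascomycota", "Pezizomycotina", "Sordariomycetes",
    "Hypocreomycetidae", "Hypocreales", "Nectriaceae", "Fusarium oxysporum",
]
print(place_ranks(long_lineage))
```

A lookup like this only works where the suffixes are consistent, which is exactly what breaks down for many bacterial lineages.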

Secondly, we are very interested in incorporating new classifiers into our tool. UTAX, in particular, is becoming obsolete and had the highest rate of “Unidentified” taxonomic assignments. While this may make our tool look good, it’s not really representative of the best we can do. However, an even number of classifiers makes “voting” on a consensus assignment more complicated and we would prefer to have a more elegant and sound basis for breaking ties than just comparing confidence scores, since those metrics are each calculated slightly differently and don’t mean quite the same thing. It’s an excellent starting point, but future work in this area would be served by a more thorough evaluation of disagreements between classifiers.

If you’re interested in more detail or in using CONSTAX for your own research, this blog post is based on our publication (Gdanetz et al., 2017), and our code repository is on GitHub.


Agler MT, Ruhe J, Kroll S, Morhenn C, Kim S-T, Weigel D, et al. (2016) Microbial hub taxa link host and abiotic factors to plant microbiome variation. PLoS Biol. 14(1):e1002352–31.

Gdanetz K, Benucci GMN, Vande Pol N, Bonito G. (2017) CONSTAX: a tool for improved taxonomic resolution of environmental fungal ITS sequences. BMC Bioinformatics. 18(1):538.

Oliver AK, Callaham MA Jr, Jumpponen A. (2015) Soil fungal communities respond compositionally to recurring frequent prescribed burning in a managed southeastern US forest ecosystem. For Ecol Manag. 345:1–9.

Smith DP, Peay KG. (2014) Sequence depth, not PCR replication, improves ecological inference from next generation DNA sequencing. PLoS One. 9(2):e90234–12.


Engineering the Tools of Genetic Engineers

By: Melody Keith, Bibiana Toro, Kim Ly, Alyssa Braddom, and Eleanor Young, University of Texas at Austin undergraduate researchers.

The University of Texas at Austin’s 2018 International Genetically Engineered Machine (iGEM) team is a group of students whose aim is to use synthetic biology to solve real world problems. We are a diverse group of upper and underclassmen who come from fields such as biochemistry, biology, and neuroscience. For some of us, this was our first in-depth research experience. For others, it was a chance to apply years of experience to an exciting project (Figure 1). Not only did we utilize microbiology techniques, but we were able to collaborate with researchers from Rice University and Texas Tech University, create a visually compelling poster and oral presentation, and design a website.

Figure 1: Students at the annual iGEM conference.  Eleanor Young, Melody Keith, Alyssa Braddom, Bibiana Toro, and Kim Ly with their poster.

This summer we came together to engineer a solution to a problem many scientists face in their lab every day: how to genetically manipulate non-model organisms. Bacteria are able to accomplish feats that human technology has yet to realize. They are better at producing certain materials, they can manufacture medicines, and some survive in extreme environments. However, these organisms that have incredible potential are usually non-model, which is to say, not the ones we work with in standard research labs. Biologists generally use only a handful of organisms, such as E. coli, because they are better understood and well characterized.  Therefore, we know how to genetically engineer them. However, these organisms do not always have the internal molecular machinery necessary to produce the molecule(s) that scientists desire. Scientists must instead engineer a bacterium that is unfamiliar to them, for which good protocols may not yet be established. Opportunities for failure pervade this process.

Our solution to this common problem is the Broad Host Range Kit, a combination of plasmid parts and fully assembled plasmids that allows a researcher to test many plasmids at once and then build their own plasmid with their own coding sequence of interest. It relies on a molecular cloning method known as Golden Gate Assembly, which allows genetic parts to be easily assembled and interchanged. These plasmid parts are classified according to their function, or type, and many hours were spent cloning desired parts from template sequences. We built assembly plasmids out of these plasmid parts, varying some parts while conserving others. The varied regions were the reporter gene, encoding a fluorescent protein or chromoprotein; the origin of replication, which allows the bacteria to make copies of the plasmid; the barcode region, a short DNA sequence identifying each plasmid; and the antibiotic resistance gene. We chose origins of replication that are known to be broad host range, meaning that they function in a wide variety of different bacteria.

Each assembly plasmid, which we call a “Pioneer Plasmid”, has the origin of replication coupled to a specific reporter and barcode. Therefore, when the plasmid is inserted into the bacteria of interest, the origin of replication can be determined just by looking at the color of colonies on the plate. If for some reason, the bacteria can’t express the reporter, the barcode can be sequenced, which adds a layer of redundancy to the system.
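The coupling between origin, reporter, and barcode can be thought of as a simple lookup table. The Python sketch below is only meant to illustrate that logic: the plasmid names, origin labels, colors, and barcode sequences are hypothetical placeholders, not the actual contents of the kit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PioneerPlasmid:
    name: str
    origin: str          # origin of replication carried by this plasmid
    reporter_color: str  # colony color produced by the reporter gene
    barcode: str         # short DNA sequence unique to this plasmid


# Hypothetical placeholder entries, not the real kit.
KIT = (
    PioneerPlasmid("pioneer_1", "origin_A", "red", "ACGTAC"),
    PioneerPlasmid("pioneer_2", "origin_B", "green", "TTGCAA"),
    PioneerPlasmid("pioneer_3", "origin_C", "blue", "GGATCC"),
)


def identify_origin(colony_color=None, barcode_read=None, kit=KIT):
    """Return the origin of replication in a colony, or None if it cannot be identified.

    The colony color is checked first; if the reporter is not expressed
    (no matching color), the sequenced barcode is used as a fallback.
    """
    if colony_color is not None:
        for plasmid in kit:
            if plasmid.reporter_color == colony_color:
                return plasmid.origin
    if barcode_read is not None:
        for plasmid in kit:
            if plasmid.barcode in barcode_read:
                return plasmid.origin
    return None


print(identify_origin(colony_color="green"))
print(identify_origin(colony_color="white", barcode_read="NNACGTACNN"))
```

In this toy table each origin has exactly one color, whereas a real kit could pair one origin with several reporters; the lookup itself would stay the same.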

Our kit relies on the “One Tube Method”, which puts all the Pioneer Plasmids into a single tube, so the mixture needs to be transformed into the non-model organism of interest only once (Figure 2). The transformation is then plated onto various antibiotic plates and the origin of replication that functions can be determined by visual inspection of the plate alone (Figure 3). Our kit, therefore, speeds up the process of finding out which broad host range origin functions in non-model organisms. It contains 8 fully assembled Pioneer Plasmids with 3 different origins and over 40 part plasmids. We’ve also tested the One Tube Method in Vibrio natriegens and Serratia marcescens.

Figure 2: The Kit in Action.  Schematic of how the One Tube Method works.  After transformation, screening for color reveals the plasmid each colony contains. The DNA in non-colored colonies can be extracted and sequenced to identify the plasmid within.  The identity of the plasmid reveals which genetic parts are functional in your bacteria.

Figure 3: Screening Colonies.  Seven plasmids containing different reporter genes transformed into E. coli.  The plate on the left is in natural light, while the plate on the right is under blue or UV light.  The different colonies are highlighted along with the identified reporter in each.

Members of our team had the privilege to attend the iGEM “Giant Jamboree” in Boston, where over 250 teams from all around the world met to present their research, but also to network, collaborate, and share ideas. It was a stimulating and rewarding conference. For example, the team from the National University of Singapore expressed luteolin in E. coli as an eco-friendly alternative to toxic yellow textile dyes. Cornell University produced a genetic circuit that would respond to frequency variable input signals to specifically regulate expression. Both presentations inspired one of our members to investigate optogenetic regulation and engineering of photoswitches for a project currently in progress.

The scientists and undergraduate researchers we interacted with commented that engineering non-model organisms was a problem that they too encountered in their own labs daily. They provided excellent feedback and suggestions, such as showing that our kit functions in a particular culture collection or using software to design a combinatorial library of Pioneer Plasmids. We also spoke to teams interested in acquiring and using our kit. For instance, the team from the Indian Institute of Technology Madras wants to use the BHR kit to engineer Acinetobacter baylyi in order to produce biofuels from the degradation of aromatic compounds.

We also had the opportunity to hear from illustrious keynote speakers Dr. Ingrid Pultz, Jason Kelly, and George Church. In particular, the way Jason Kelly talked about the field of synthetic biology, and its enormous potential to revolutionize industries from agriculture to materials to medicine, was inspirational. They made the career path many of us are on feel tangible, achievable and bursting with opportunity. To hear them speak, changing the world seemed not only possible, but just within reach.

Presenting our own team’s research was a dynamic experience that highlighted the grit and focus it takes to practice, refine, and effectively communicate a project which took months to produce (Figures 1 and 4). We left feeling empowered, knowing that being a member of the iGEM team had given us the skills to successfully achieve every step of the research process, from initiation to generating results to finally synthesizing a message about the meaning and impact of that work. As we continue to improve the Broad Host Range Kit, we know that the small steps we take in the lab every day can translate into larger benefits for both the scientific community and the world.

Figure 4: Melody Keith presenting the students’ work. Bibiana Toro (right) and Eleanor Young (not pictured) also participated in the oral presentation.


Hyenas & Microbes

By: Connie Rojas, PhD Candidate at Michigan State University

It has been a year of traveling! Earlier this year, I traveled to the Masai Mara National Reserve, Kenya (MMNR) to conduct my field work, and currently, I am in Mexico City doing a visiting scholarship at the Universidad Nacional Autónoma de México (UNAM)! In between, I visited UC Berkeley for the 2018 Science and Technology Centers (STC) Director’s Meeting and San Antonio, TX for the Society for the Advancement of Chicanos/Hispanics and Native Americans in Science (SACNAS) annual conference, where I shared recent research findings as part of a symposium with other BEACONites. I am extremely grateful that my dissertation research on host-microbe interactions and the spotted hyena (Crocuta crocuta) microbiome has strong field, laboratory, and computational components that allow me to travel and work with different collaborators!


I am a 4th year PhD Candidate in Dr. Kay Holekamp’s behavioral ecology laboratory at Michigan State University, in the Department of Integrative Biology and the Ecology, Evolutionary Biology, and Behavior program (EEBB). For my dissertation research, I am using next-generation sequencing technologies to assess how microbes and host-associated microbial communities (‘microbiomes’) affect their host’s physiology, fitness, and behavior, and how hosts themselves influence their microbiomes. I study these questions in a wild population of spotted hyenas. Hyenas are highly social carnivores and apex predators inhabiting much of Sub-Saharan Africa. Their societies are structured by linear dominance hierarchies, wherein an individual’s position in the hierarchy determines its priority of access to resources. Their social groups are also characterized by female dominance and male-biased dispersal.


For my field work, I traveled to my laboratory’s field camp at the MMNR and lived there for 4 months! I conducted 3 projects, all of which investigated the role of microbes and microbiomes in shaping their host’s phenotype. One project involved swabbing decomposing beef daily in order to survey microbial community succession across various stages of decomposition in the savanna environment. I wanted to emulate the environment and decomposition process of the carcasses hyenas eat and determine the types of beneficial and harmful microbes hyenas are acquiring this way. My second project was tons of fun; I conducted scent discrimination trials to ascertain whether hyena scent gland secretions and their odors, which are hypothesized to be produced by microbes, contain information about the sender’s age, sex, and residency. I presented juvenile and adult female hyenas with the paste of two strangers (i.e. an immigrant male vs. adult female) and recorded the amount of time they spent sniffing each specimen. If my analyses show that hyenas spend different amounts of time sniffing the paste samples, then this would indicate that the samples encode different information, and more importantly, that microbes are indeed contributing to their host’s chemical signaling! My last project was also very enjoyable and allowed me to interact with other animals in the reserve. I collected fecal samples from various species of antelope, elephants, and baboons to determine the role of host phylogeny in structuring the gut microbiomes of mammals in the savanna.

Right now, I am working with Dr. Valeria Souza (who gave an EEBB seminar in 2017; that is where I met her!) and Dr. Luis Eguiarte from the Institute of Ecology at the Universidad Nacional Autónoma de México (UNAM) on the bioinformatics portion of my BEACON-funded gut microbiome project. This project investigates the socio-ecological drivers of gut microbiome structure and function in spotted hyenas, as well as its stability, its transmission across generations, and its potential to act as a reservoir for antibiotic resistance. We are using shotgun metagenomics (i.e. whole genome sequencing) to profile gut microbial community function and determine the metabolic pathways being provided by the community as a collective. Specifically, in their lab, I am being trained on the assembly, binning, annotation, and phylogenetic profiling of shotgun metagenomic data. From these data, we will be able to profile the taxonomic composition of the hyena’s gut bacterial communities, reconstruct the hyena’s diet, and survey the diversity of antibiotic resistance genes harbored by the community. We will also determine the relative importance of viruses in driving the evolution of these gut microbiomes and assay their heritability across generations and within an individual’s lifetime. The bioinformatic analyses are challenging and time-intensive, but I am making progress and it has all been very fun! Apart from work, I have been spending lots of time getting to know the city, eating as many tacos as I can, and making friends.

Although I am not looking forward to the bitter cold when I return to Michigan in January, I am looking forward to teaching my first class, taking a course on teaching college science, and co-organizing the 2019 EEBB Research Symposium, among other things. Until then, I am going to make the most of my time here in this great city!


Avida-ED in Action

BEACON scientists and educators were featured in MSU Today in a report on a recent publication highlighting the use of Avida-ED in a newly developed undergraduate biology course. The course, IBIO150 Integrative Biology: From DNA to Populations, was developed with non-Biology STEM majors in mind: students who need a rigorous majors-level course that covers core concepts emphasized in undergraduate biology reform. The unique component of the course is the incorporation of a digital evolution lab that uses Avida-ED, featuring a series of exercises designed to address important concepts in evolutionary biology. In addition, students complete independent research projects. The incorporation of Avida-ED into the course is supported by current grants to the College of Natural Sciences through the Howard Hughes Medical Institute, in addition to an Improving Undergraduate Science Education (IUSE) grant, Active LENS: Learning about Evolution and the Nature of Science. Avida-ED and other resources developed by BEACON have recently been featured in a video produced for the National Science Teachers Association (NSTA) TV.


Ecology/Evolution Scientific Symposium at the SACNAS National Conference

The Society of Systematic Biologists (SSB) and BEACON have collaborated to organize a scientific symposium at the Society for the Advancement of Chicanos/Hispanics and Native Americans in Science (SACNAS) national conference in San Antonio, Texas, on October 11th, 2018 (10:30AM-12:00PM, Room 225B). The symposium title is “It’s Complicated: The Ecology and Evolution of Microbes and Their Hosts.”

Host-microbe interactions are ubiquitous and often drive evolution. Microbial parasites or pathogens harm hosts, whereas other host-associated microbes are beneficial or even necessary for host health. Diverse scientists at the forefront of the ecology and evolution of host-microbe interactions will provide a synthetic perspective of research on this important topic.


  • Connie Rojas, PhD Candidate – Michigan State University
  • Luis Zaman, PhD – LSA Collegiate Postdoctoral Fellow, University of Michigan
  • Lisa Barrow, PhD – NSF Postdoctoral Fellow, University of New Mexico
  • Kat Milligan-Myhre, PhD – Assistant Professor, University of Alaska Anchorage

SACNAS is a national organization focused on increasing the proportions of underrepresented minorities in science, technology, engineering, and math (STEM) fields. It is the largest multicultural and multidisciplinary STEM diversity organization in the country. In 2017, the National Conference was attended by 3,845 scientists from diverse backgrounds, with 77% of the participants members of ethnic/racial groups that are significantly underrepresented in STEM fields.

Many people may not realize the complicated and often important relationships occurring all around (and inside!) us with microbes and their hosts. As a result of attending our scientific symposium session, attendees will learn about: (1) exciting, recent advances in science research that illuminate how microbes can drive the ecology and evolution of their hosts; (2) the questions and approaches for studying host-microbe and host-parasite interactions; (3) career options within ecology and evolution from presenters at diverse career stages; and (4) the central role of ecology and evolution to the life sciences. Read the abstracts, below, for more details about each talk.

In addition to the scientific symposium, BEACON and SSB are also sponsoring a day-long Ecology/Evolution field trip on October 13th to Mitchell Lake Audubon Center and the San Antonio Zoo. The symposium and field trip were co-organized by Eve Humphrey, Maurine Neiman, and Alexa Warwick, with support from the Society for the Study of Evolution Diversity Committee and Education and Outreach Committee. The speakers, organizers, and other attendees will also participate in an Ecology/Evolution session of “Conversations with Scientists” to share scientific career options on October 11th (5:45-7:15PM, Room 221C). If you’ll be at SACNAS 2018, we hope to see you at one or all of these events!

Connie Rojas – Host and Ecological Traits Shape the Structure, Function, and Diversity of the Gut Microbiome in Wild Spotted Hyenas

Animal bodies harbor complex microbial communities, hereafter termed microbiota, that exert profound effects on their physiology, behavior, and evolution. In the mammalian gastrointestinal tract, resident microbes are known to synthesize essential vitamins, supply their host with energy released from the fermentation of indigestible carbohydrates, competitively exclude pathogens, and promote immune system and tissue development. In spotted hyenas (Crocuta crocuta), meerkats (Suricata suricatta), and ring-tailed lemurs (Lemur catta), microbiota inhabiting scent-gland secretions co-vary with the gland’s odorous metabolite profiles and contain well-documented odor producers, indicating they likely contribute to their host’s chemical signaling behaviors. Furthermore, in four species of insectivorous bats, bacteria isolated from the skin exhibit anti-fungal properties against the causal agent (Pseudogymnoascus destructans) of white-nose syndrome, suggesting a beneficial role of these microbes in pathogen defense. However, despite the importance of the microbiota, we know little about the forces shaping its structure and function, especially in wild animal populations. Here, I use 16S rRNA gene sequencing technologies to a) survey the gut microbiota of wild spotted hyenas and b) investigate the host social and ecological factors affecting the gut microbiota. Specifically, I assess whether gut microbiota diversity and structure vary with hyena age, reproductive state, group size, and temperature and precipitation. Overall, this research will contribute to our understanding of the ways a host shapes its microbial communities, and how microbial communities, in turn, influence their host’s behavioral phenotype. In this talk I will also share a bit about my journey as a scientist from the perspective of a Latina first-generation college student and daughter of immigrant parents.

Dr. Luis Zaman –  Experimenting with Digital and Microbial Evolution

My path to evolutionary biology was unusual. I started as a computer scientist and ended up working in a wet lab with microbes and viruses. I’ll talk about how I ended up where I am, what I’m doing now, and why disciplined and undisciplined science are important in research.





Dr. Lisa Barrow – Variable Host Susceptibility and Enigmatic Parasite Distributions: Insights from Museum Collections and Genomics of Avian Haemosporidians

There are several outstanding questions in ecology and evolution of host-parasite interactions. Why do host species vary so drastically in their susceptibility to parasites? How localized or widespread are different parasites? What are the environmental and host range limits to parasite distributions? These questions are particularly important given the predicted influence of climate change on species distributions and the potential for emerging infectious diseases. Avian haemosporidians are intracellular parasites that infect birds across the globe, sometimes with devastating consequences. Together with multi-institution, student-driven teams, we have been tackling two complex avian haemosporidian systems in Peru and New Mexico, USA. Using molecular and microscopic screening of extensive museum collections, we found that ~35% of birds are infected. In Peru, we screened nearly 4,000 birds representing 40 families and 523 species. After accounting for several environmental, life history, and ecological predictors of infection, we found that host phylogeny explains substantial variation in infection rate. In other words, susceptibility is deeply conserved across the avian tree, and is likely related to conserved aspects of the immune system. In New Mexico, we sampled avian haemosporidian communities in three mountain ranges to better understand the limits to parasite distributions. Haemosporidian communities exhibit structure on fine spatial scales, with most lineages occurring in a single mountain range, but a few widespread generalists infecting multiple host species. Ongoing work incorporating new genomic methods is improving estimates of the host and environmental range limits of haemosporidians, providing important baselines for identifying potential host switches or range expansions.

Dr. Kat Milligan-Myhre – Use of an Evolutionary Model to Determine the Role of Host Genetic Background on Microbiota

Microbiota are the microbes that live in and on a host. Disruption of the microbiota can lead to painful inflammation in the host, which can become chronic, as in the case of inflammatory bowel disease. Our lab focuses on the role the host genes play in the relationship between the microbiota and their host. Thus, we adapted the evolution and biomedical model organism, threespine stickleback fish (Gasterosteus aculeatus), for host-microbe studies. Stickleback are ideal for these studies due to their large family sizes, genetic variation within and between populations that is similar to human genetic variation within and between populations, and the tools available to study these interactions. We compared the development and behavior in fish raised germ free, with conventional microbiota, with mock communities of up to eight microbiota members, or with microbiota disrupted by antibiotic or environmental contaminants. We found that the populations varied in their response to these manipulations, indicating that the genetic variation between the populations contributed more to the relationship between microbes and the host than the variation within the populations. We will use these results as a basis for future studies to identify the critical windows in development in which disruptions to gut microbiota result in short- and long-term consequences to host health, and determine the extent to which the host genetic background contributes to the ability of healthy gut microbial communities to influence fitness.


Evolutionary Computation Experts Video Collection

This blog post is by Risto Miikkulainen [1,2], Paul Jarratt [2], and Andrew Turner [2] from (1) The University of Texas at Austin and (2) Sentient Technologies

Given recent advances in evolutionary computation technology, available computational power, and opportunities for AI in the real world, we believe evolution is on the verge of a breakthrough, i.e. becoming the next Deep Learning. In order to chart the possibilities as well as challenges, we sat down at Sentient and at GECCO 2018 with a number of EC experts in both academia and industry to share their ideas about where AI is heading, and the role evolutionary computation can play in its future. The result is a collection of video interviews; they are organized around a number of specific questions so that you can explore those of interest to you. You can check out the collection at

Sentient plans to add more experts to this page in the future, so let us know if you’d like to contribute your point of view to this collection!
