Finding the genetic changes in bacteria


In research published today in Nature Microbiology, MMM researchers have shown how powerful new DNA analysis techniques can detect mutations in bacteria that cause difficult-to-treat infections.

Sarah Earle, Jessie Wu and Jane Charlesworth worked with other researchers in a group led by Principal Investigator Daniel Wilson to design and test this technique, applying ‘Bacterial Genome-Wide Association Studies’ to DNA from Mycobacterium tuberculosis, Staphylococcus aureus, Escherichia coli and Klebsiella pneumoniae. They showed that using the bacterial ‘family tree’ could improve on standard methods, demonstrated that they could detect mutations already known to make these bacteria resistant to antibiotics, and also picked up new mutations not previously known to researchers.

Here Sarah describes the technique, and how it has been used to analyse common bacterial infections.


Analysing the DNA of bacteria 

Given the global burden of disease, there are lots of bacterial traits that we want to understand. Why do some strains cause disease while others don’t? What causes some strains to be resistant to antibiotics? Why are some strains mostly found in particular hosts, while others are found in many different hosts?

We can try to answer these kinds of questions using the genetic information of the bacteria. Genomes can vary in different ways. For example, there can be small variations where a single DNA position in the genome differs between individuals (being an A, C, G or T). Regions of DNA can also be deleted from or inserted into the genome, or be present multiple times. These are all types of genetic variants.

What analysis tools are already out there?

In the field of human genetics, genome-wide association studies (GWASs) have been successfully applied to find genetic variants in the human genome that are associated with susceptibility to many major diseases. The most common approach is a case-control study, which compares two groups of individuals: one that has a disease, and a healthy control group. The participants give a sample of their DNA, and the genetic variants at certain positions in the genome are determined. A GWAS looks for DNA variants that are at significantly different frequencies in the two groups. If a particular variant is more common in individuals who have the disease, then the variant is said to be associated with the disease. Because the variants tested can lie anywhere across the whole genome, these are called genome-wide association studies.
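To make the case-control comparison concrete, here is a minimal sketch of testing a single variant with a two-sided Fisher's exact test. It is only an illustration of the general idea, not the pipeline used in the paper; the function name and the counts (8 of 10 cases and 2 of 10 controls carrying the variant) are made up for the example.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    a: cases with the variant      b: cases without it
    c: controls with the variant   d: controls without it
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability that x of the variant carriers are cases
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # Add up every table that is as unlikely as, or less likely than, the observed one
    return sum(p for p in probs if p <= observed * (1 + 1e-9))


# Hypothetical counts: 8/10 cases and 2/10 controls carry the variant
p_value = fisher_exact_two_sided(8, 2, 2, 8)
print(f"p = {p_value:.4f}")  # prints p = 0.0230
```

In a real GWAS this kind of test is repeated for every variant in the genome, so the significance threshold has to be made much stricter to allow for the number of tests.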

The problem of population structure

A problem faced by all GWASs is that lots of characteristics other than the one we want to find out about (for example disease risk) could also be at significantly different frequencies in the two groups. Imagine we are trying to find a human DNA variant that is associated with an increased risk of diabetes using a case-control study. If all individuals with diabetes had blue eyes, we might wrongly conclude that the genetic variation determining blue eyes is associated with an increased diabetes risk. Such systematic differences in frequencies between cases and controls are known as population structure, and need to be accounted for in statistical analyses. Although many methods have been developed to do this for human DNA, bacterial populations are much more structured than human ones, meaning that we need some new methods.
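The blue-eyes problem can be shown with a small numerical sketch. The counts below are invented: in each of two hypothetical lineages the variant has no relationship with the trait, yet pooling the lineages makes it look strongly associated. The stratified Mantel-Haenszel odds ratio used here is a classical textbook adjustment chosen for simplicity; it is not the method described in the paper.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for cases/controls with (a, c) and without (b, d) the variant."""
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Odds ratio pooled across strata, adjusting for stratum membership."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den


# Hypothetical counts per lineage:
# (cases with variant, cases without, controls with variant, controls without)
strata = {
    "lineage_A": (72, 8, 18, 2),   # mostly cases; variant common in everyone
    "lineage_B": (2, 18, 8, 72),   # mostly controls; variant rare in everyone
}

# Ignore lineage: add the tables together cell by cell
pooled = tuple(sum(table[i] for table in strata.values()) for i in range(4))
pooled_or = odds_ratio(*pooled)
adjusted_or = mantel_haenszel_or(strata.values())

print(f"pooled odds ratio:   {pooled_or:.2f}")    # about 8.10, looks like a strong association
print(f"adjusted odds ratio: {adjusted_or:.2f}")  # 1.00, no association within lineages
```

The apparent association in the pooled table comes entirely from the two lineages differing in both trait frequency and variant frequency.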


Our method finds lineages that are associated with bacterial traits

We can visualise this structure by looking at the relatedness between different bacteria using a phylogenetic tree. This is similar to a family tree, but with the length of the vertical branches giving information about how closely related the different bacteria are (Figure 1). Bacteria can be classified into different ‘lineages’ or ‘strains’ by grouping them together according to their relatedness. By controlling for this strain structure in our statistical analyses, we would essentially be discarding its contribution to the variation in the bacterial trait. However, in this example, it is still valuable to know that strain 1 has a higher frequency of resistant samples than strain 2, so we don’t want to lose this information completely. Our method recovers this typically discarded strain information, and looks for associations between lineages and our traits of interest. We demonstrate the power of this method by looking at resistance to the antibiotic fusidic acid in Staphylococcus aureus.
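As a simplified picture of what a lineage-level association looks like, the sketch below compares the frequency of resistance in two strains and tests the difference with a chi-squared test. This is not the lineage test from the paper; the strain labels, sample counts and function names are all hypothetical.

```python
from collections import Counter
from math import erfc, sqrt

# Hypothetical samples: (strain label, is the sample resistant?)
samples = (
    [("strain_1", True)] * 30 + [("strain_1", False)] * 10
    + [("strain_2", True)] * 5 + [("strain_2", False)] * 35
)
counts = Counter(samples)


def resistant_fraction(strain):
    """Proportion of samples in a strain that are resistant."""
    resistant = counts[(strain, True)]
    susceptible = counts[(strain, False)]
    return resistant / (resistant + susceptible)


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


stat = chi_squared_2x2(
    counts[("strain_1", True)], counts[("strain_1", False)],
    counts[("strain_2", True)], counts[("strain_2", False)],
)
# Upper tail probability of a chi-squared distribution with 1 degree of freedom
p_value = erfc(sqrt(stat / 2))

for strain in ("strain_1", "strain_2"):
    print(strain, f"{resistant_fraction(strain):.3f}")
print(f"chi-squared = {stat:.2f}, p = {p_value:.1e}")
```

Here the strain itself, rather than any single DNA variant, is the thing being tested for association with resistance.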

How do the standard methods perform in bacteria?

The scientific community has recently begun to look into applying GWAS to bacteria (references 1-11). We set out to test bacterial GWASs in four major pathogens with different genome characteristics: Mycobacterium tuberculosis, Staphylococcus aureus, Escherichia coli and Klebsiella pneumoniae. We performed case-control studies, looking for associations between the DNA of 3144 bacteria and their resistance profiles to 17 different antibiotics. We looked at antibiotic resistance because we have a good understanding of the genetic variants that cause these bacteria to become resistant, so we can check whether our methods find them. Across our 26 studies, the standard methods performed very well, finding either the real resistance-causing genetic variant or a variant close to it in the genome in 25 of the 26 studies. We also identified a gene in E. coli that is associated with resistance to the drug cefazolin. As this gene was not a previously known resistance mechanism, we believe it is a strong candidate for a novel resistance-conferring genetic mechanism.

What will happen for bacterial traits other than antibiotic resistance?

Although standard methods worked very well, antibiotic resistance is an unusual characteristic, and will be less affected by throwing away the lineage information. So to test what might happen when applying these studies to different characteristics, such as whether a bacterium causes disease or not, we used simulated data. This showed that, using standard methods, the real genetic variant of interest was statistically significant in just 1% to 41% of studies. However, if we used our method and looked not just at whether the DNA change was significant, but also at whether the lineage/strain was significant, then we found significant lineages in roughly 10% to 100% of studies. This means that by learning about the lineages associated with our traits of interest, we can gain valuable understanding of bacterial populations in a greater proportion of studies.

The future

GWASs have given us new opportunities to gain a greater understanding of the genetic basis behind variation in bacterial populations. Our method of identifying lineage effects provides an alternative when standard methods do not work because of strongly structured populations. There is a lot of understanding to be gained from interpreting differences in bacterial characteristics between strains. Our methods will be applied in the CRyPTIC project to identify all genetic variants causing drug resistance in TB, in order to speed up diagnosis of the disease. These methods are applicable not only to bacteria, but to any organism that suffers from the same problem of strong population structure. We have packaged our methods into easy-to-use software, which we hope will provide a useful contribution to the scientific community.


This blog is by Sarah Earle, DPhil student at Modernising Medical Microbiology.

Original paper

Identifying lineage effects when controlling for population structure improves power in bacterial association studies.

Earle, S. G., Wu, C.-H., Charlesworth, J., Stoesser, N., Gordon, N. C., Walker, T. M., Spencer, C. C. A., Iqbal, Z., Clifton, D. A., Hopkins, K. L., Woodford, N., Smith, E. G., Ismail, N., Llewelyn, M. J., Peto, T. E., Crook, D. W., McVean, G., Walker, A. S. and D. J. Wilson (2016) Nature Microbiology

Our software

Bacterial GWAS pipeline:

R package which applies our lineage test:


  1. Falush, D. & Bowden, R. Genome-wide association mapping in bacteria? Trends Microbiol. 14, 353-355 (2006).
  2. Sheppard, S. K. et al. Genome-wide association study identifies vitamin B5 biosynthesis as a host specificity factor in Campylobacter. Proc. Natl. Acad. Sci. USA 110, 11923-11927 (2013).
  3. Alam, M. T. et al. Dissecting vancomycin-intermediate resistance in Staphylococcus aureus using genome-wide association. Genome Biol. Evol. 6, 1174-1185 (2014).
  4. Laabei, M. et al. Predicting the virulence of MRSA from its genome sequence. Genome Res. 24, 839-849 (2014).
  5. Chewapreecha, C. et al. Comprehensive identification of single nucleotide polymorphisms associated with beta-lactam resistance within pneumococcal mosaic genes. PLoS Genet. 10, e1004547 (2014).
  6. Salipante, S. J. et al. Large-scale genomic sequencing of extraintestinal pathogenic Escherichia coli strains. Genome Res. 25, 119-128 (2014).
  7. Read, T. D. & Massey, R. C. Characterizing the genetic basis of bacterial phenotypes using genome-wide association studies: a new direction for bacteriology. Genome Med. 6, 109 (2014).
  8. Farhat, M. R., Shapiro, B. J., Sheppard, S. K., Colijn, C. & Murray, M. A phylogeny-based sampling strategy and power calculator informs genome-wide associations study design for microbial pathogens. Genome Med. 6, 101 (2014).
  9. Hall, B. G. SNP-associations and phenotype predictions from hundreds of microbial genomes without genome alignments. PLoS ONE 9, e90490 (2014).
  10. Chen, P. E. & Shapiro, B. J. The advent of genome-wide association studies for bacteria. Curr. Opin. Microbiol. 25, 17-24 (2015).
  11. Holt, K. E. et al. Genomic analysis of diversity, population structure, virulence, and antimicrobial resistance in Klebsiella pneumoniae, an urgent threat to public health. Proc. Natl. Acad. Sci. USA 112, E3574-3581 (2015).


