Metagenomics

From BioMineWiki

What is metagenomics?

Metagenomics is the use of DNA sequencing techniques to study DNA extracted directly from environmental samples. It is a culture-independent tool for studying environmental microorganisms. In addition to the information about taxonomic diversity (‘who is there’), metagenomics gives insight into the physiology of the organisms present in the environment (‘what are they doing’), through studying their genes.

The two main goals of metagenomics are finding new genes with a desired biological activity (bioprospecting) and studying environmental microbes without the need to culture them.

Why is it useful?

It has been estimated that more than 99% of the microorganisms in nature cannot be cultured with currently available techniques. Hence, new cultivation-independent methods are needed to study the function and diversity of microorganisms in nature. Metagenomics is an expanding field within microbial ecology that provides access to the genomes of the total microbial community (including the non-culturable microorganisms) in any given environment. This pool of genetic material is often referred to as the metagenome.

Metagenomic project workflow

To study the diversity and function of a metagenome, a metagenomic library is constructed (Figure 1). First, the prokaryotic cells are extracted from the environmental sample (e.g. soil, sediment, seawater). Subsequently, the total DNA content (the metagenome) is extracted and purified. The DNA is sheared during this procedure, so the purified DNA consists of fragments of different sizes. DNA fragments (inserts) of the appropriate size are then cloned into a cloning vector and transformed into a host cell. The choice of cloning vector is crucial, because it determines the size of the inserts that can be cloned. The result is thousands of cells, each carrying a DNA fragment from the metagenome. Together, these cells make up the metagenomic library. Cloning can also be skipped, for example if 454 sequencing is used (see below). Metagenomics can be used for different purposes, and the project steps vary accordingly.

Image:metagenomic_library.jpg

Figure 1. Construction of a metagenomic library.

  1. Prokaryotic cells are extracted from the environmental sample (e.g. soil, sediment, seawater).
  2. The total DNA content (the metagenome) is extracted and purified. The DNA is sheared during this procedure, so the purified DNA consists of fragments of different sizes.
  3. DNA fragments (inserts) of the appropriate size are cloned into a cloning vector and transformed into a host cell.
  4. The result is thousands of cells, each carrying a DNA fragment from the metagenome. All cells together build up the metagenomic library.

Screening metagenomic libraries

For an existing metagenomic library, different screening approaches can be used depending on the intended end-point.

Functional screening aims to detect a specific function (e.g. enzymatic or antibiotic activity) by expressing the cloned genes in different screening assays (Hårdeman 2007). The advantages of this approach are that it can find completely novel enzymes with only slight similarity to known sequences, and that full-length genes or gene clusters can be identified. A disadvantage is that it depends on a compatible expression system (e.g. E. coli).

The second approach is sequence-based screening. PCR-based methods can be used to detect phylogenetic marker genes (e.g. 16S rRNA genes) or different functional genes, thus enabling phylogenetic studies and linking a specific function to a certain environment.
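The logic of a PCR-based screen can be sketched in silico. The example below is a minimal illustration, not a laboratory protocol: the clone names, insert sequences and primers are made up, and it assumes exact primer matches on one strand only, whereas real PCR tolerates some mismatches and real screens consider both strands.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def find_amplicon(insert: str, fwd_primer: str, rev_primer: str):
    """Return the region an exact-match PCR would amplify, or None.

    The forward primer must occur in the insert, and the reverse
    complement of the reverse primer must occur downstream of it.
    """
    start = insert.find(fwd_primer)
    if start == -1:
        return None
    rev_site = reverse_complement(rev_primer)
    end = insert.find(rev_site, start + len(fwd_primer))
    if end == -1:
        return None
    return insert[start:end + len(rev_site)]


def screen_library(library: dict, fwd_primer: str, rev_primer: str) -> dict:
    """Map clone name -> amplicon for every clone that yields a product."""
    hits = {}
    for clone, insert in library.items():
        amplicon = find_amplicon(insert, fwd_primer, rev_primer)
        if amplicon is not None:
            hits[clone] = amplicon
    return hits


# Hypothetical toy library; sequences and primers are invented for illustration.
library = {
    "clone_A": "TTGACCGTAAATTTGGGCCCATGAACGT",
    "clone_B": "GGGGGGGGCCCCCCCCAAAAAAAATTTTTTTT",
}
hits = screen_library(library, fwd_primer="GACCGT", rev_primer="TCATGG")
print(hits)  # only clone_A carries both primer sites
```

Clones that give a product would then be sequenced, for example to place a 16S rRNA gene fragment in a phylogenetic tree.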

Massive DNA sequencing projects

If general information about the community is the goal, then sequencing the total DNA gives the most information. The most common approaches are whole-genome shotgun sequencing, where small-insert libraries are constructed (using plasmids that usually take 2-5 kb inserts), and pyrosequencing (e.g. 454 sequencing), where cloning is omitted. Metagenomic data contain pieces of genomic information from many different organisms mixed together, which makes the data challenging to analyze. In theory, this approach can assemble complete community genomes, but the difficulty of obtaining complete genomes increases strongly with the diversity of the metagenome. In practice, the assembly of sequencing reads is only successful for low-diversity communities (for example, an acid mine drainage biofilm). For other cases, a method called fragment recruitment has been proposed: sequencing reads are compared with completed microbial genomes, and the reads that are highly similar to a given genome are collected and grouped together.
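The idea behind fragment recruitment can be illustrated with a short sketch. It is a toy under stated assumptions: the genome names, sequences and the 90% identity threshold are invented, and similarity is measured by brute-force ungapped comparison, whereas a real analysis would use a sequence alignment tool.

```python
def best_identity(read: str, genome: str) -> float:
    """Highest fraction of matching bases over all ungapped placements."""
    best = 0.0
    for offset in range(len(genome) - len(read) + 1):
        window = genome[offset:offset + len(read)]
        matches = sum(a == b for a, b in zip(read, window))
        best = max(best, matches / len(read))
    return best


def recruit_reads(reads, genomes, min_identity=0.9):
    """Assign each read to the reference genome it matches best, if any."""
    recruited = {name: [] for name in genomes}
    for read in reads:
        scores = {name: best_identity(read, seq) for name, seq in genomes.items()}
        best_name = max(scores, key=scores.get)
        if scores[best_name] >= min_identity:
            recruited[best_name].append(read)
    return recruited


# Hypothetical reference genomes and reads, invented for illustration.
genomes = {
    "genome_X": "ACGTACGGTTCAGGCTAACCGT",
    "genome_Y": "TTTTGGGGCCCCAAAATTTTGG",
}
reads = [
    "ACGGTTCAGG",  # identical to a stretch of genome_X
    "GGGGCCCCAT",  # one mismatch against genome_Y
    "GAGAGAGAGA",  # resembles neither genome, so it stays unrecruited
]
recruited = recruit_reads(reads, genomes)
print(recruited)
```

Reads that match no reference well enough are left out, which is why the method depends on which completed genomes are available.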

Image:MetagenomeAssembly.jpg

Figure 2. Schematic picture of assembly in a metagenomic project. Lines represent reads, and overlapping lines in the same or a similar color represent contigs. (a) In low-diversity communities, assembly is possible and can reveal polymorphisms. (b) In high-diversity communities, only a fraction of the reads ends up in contigs, and chimeric sequences can be produced.
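The contrast between the two cases in Figure 2 can be illustrated with a toy greedy assembler. This is a simplified sketch, not the algorithm of any particular assembly program: it merges reads on exact suffix-prefix overlaps only, and the reads and the minimum overlap of 4 bases are invented for illustration.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b, or 0."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads, min_overlap=4):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_overlap)
                    if olen > best[0]:
                        best = (olen, i, j)
        olen, i, j = best
        if olen == 0:
            return contigs  # nothing left to merge
        merged = contigs[i] + contigs[j][olen:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]


# (a) Low diversity: all reads come from the same short made-up genome.
low_diversity = ["ATGGCGTA", "GCGTACGT", "ACGTTAGC"]
print(greedy_assemble(low_diversity))   # ['ATGGCGTACGTTAGC']

# (b) High diversity: made-up reads from unrelated genomes share no overlaps.
high_diversity = ["ATGGCGTA", "TTTTCCCC", "GAGAGACA"]
print(greedy_assemble(high_diversity))  # the three reads stay unassembled
```

A chimeric contig would arise in this sketch whenever reads from two different organisms happened to share an overlap of at least the minimum length.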

Advantages

In contrast to techniques based on a single gene (usually the 16S rRNA gene, as in T-RFLP or DGGE), metagenomics gives much more information. It makes analysis of the microbes' physiology possible, and biodiversity can be studied in more detail. Metagenomics is less biased than PCR-based methods, and it gives information about the relative abundance of different organisms and about the community structure. Metagenomics also captures the polymorphism (different sequence variants) present in natural communities, which makes sequence assembly more difficult but provides additional information.
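How relative abundance and community structure might be summarised from read data is sketched below. The sketch assumes that each read has already been assigned to a taxon, uses invented read counts, and picks the Shannon index as one common diversity measure; none of these choices comes from a specific study.

```python
import math
from collections import Counter


def relative_abundance(assignments):
    """Fraction of reads assigned to each taxon."""
    counts = Counter(assignments)
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


def shannon_index(abundances):
    """Shannon diversity H = -sum(p * ln p) over all taxa with p > 0."""
    return -sum(p * math.log(p) for p in abundances.values() if p > 0)


# Hypothetical per-read taxon labels; the counts are made up.
assignments = ["Leptospirillum"] * 6 + ["Ferroplasma"] * 3 + ["Sulfobacillus"] * 1
abundances = relative_abundance(assignments)
diversity = shannon_index(abundances)
print(abundances)
print(round(diversity, 3))
```

Read fractions are only a rough proxy for organism abundance, since organisms with larger genomes contribute more reads per cell.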

Disadvantages

Massive sequencing projects are still expensive, and data analysis can be hard. Assembling reads is impossible for medium- or high-diversity communities, and chimeric sequences can easily be produced. Fragment recruitment methods depend on the availability of reference genomes. Matching interesting genes with taxonomic groups is hard and can only be done if a marker gene (like the 16S rRNA gene) is present on the same DNA fragment. Studying low-abundance members of the community is impossible.

Metagenomic projects

CAMERA database

One example is a metagenomic study of the marine bacterioplankton of surface waters. Samples collected along a transect of several thousand kilometres, from the North Atlantic through the Panama Canal to the South Pacific, yielded a dataset of 7.7 million sequencing reads (6.3 billion bp), which is available in the CAMERA database. CAMERA stands for Community Cyberinfrastructure for Advanced Marine Microbial Ecology Research and Analysis.
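As a quick check on the scale of this dataset, the two figures quoted above imply a mean read length of roughly 818 bp. The variable names below are illustrative only.

```python
# Dataset size as quoted in the text above.
reads = 7.7e6   # number of sequencing reads
bases = 6.3e9   # total sequenced bases (bp)

mean_read_length = bases / reads
print(f"mean read length: about {mean_read_length:.0f} bp")
```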

Proteorhodopsin discovery

One of the most spectacular metagenomics-based discoveries was that of bacterial rhodopsin (Béjà et al. 2000). Rhodopsins were known in eukaryotes, where they function primarily as sensory proteins involved in, for example, colour vision in the human retina. Many archaeal rhodopsins, which are energy-transducing, light-driven proton or chloride pumps, were also known. The first indication of a bacterial rhodopsin came from a metagenomic study in which a DNA fragment containing an rRNA gene from an uncultivated proteobacterium (SAR86) also carried a gene closely resembling archaeal rhodopsins. These bacterial rhodopsins, named proteorhodopsins, are now known to be very abundant and are intensively studied.

References

Béjà, O. et al. (2000) Bacterial rhodopsin: evidence for a new type of phototrophy in the sea. Science 289: 1902–1906.

Nature Reviews Microbiology, June 2005. Focus on: Metagenomics.

Sjöling, S., Stafford, W. and Cowan, D. (2006) Soil metagenomics: exploring and exploiting the microbial gene pool. In Modern Soil Microbiology (eds van Elsas, Jansson & Trevors). Taylor and Francis, CRC Press.

Sjöling, S. and Cowan, D. (2008) Metagenomics: microbial community genomes revealed. In Psychrophiles: from Biodiversity to Biotechnology (eds Margesin, R. et al.). Springer-Verlag, Berlin Heidelberg.

Tringe, S. and Rubin, E.M. (2005) Metagenomics: DNA sequencing of environmental samples. Nat Rev Genet 6(11): 805–814.
