Transcriptome


The transcriptome is the set of all RNA molecules (transcripts) in a cell or a population of cells. It includes all of the functional RNA molecules and all other transcripts that may arise by spurious transcription or transcription of non-functional regions such as pseudogenes or virus fragments. A major goal of modern molecular biology is to determine which transcripts are functional and which ones are junk RNA.

The term transcriptome is a portmanteau of the words transcript and genome; it is associated with the process of transcript production during the biological process of transcription. The functional part of the transcriptome is dynamic — it changes with cell type, developmental stage, environment, and stimuli — and therefore represents the active gene expression state rather than the static DNA sequence (genome).

Eukaryotic transcriptomes tend to be more complex than bacterial transcriptomes and the transcriptomes of multicellular eukaryotes are even more complex than those of unicellular eukaryotes.

Etymology and history

The word transcriptome is a portmanteau of the words transcript and genome. It appeared along with other neologisms formed using the suffixes -ome and -omics to denote all studies conducted on a genome-wide scale in the fields of life sciences and technology. As such, transcriptome and transcriptomics were among the first words to emerge along with genome and proteome.<ref name="etymology">Template:Cite journal</ref> The first study to present a case of a collection of a cDNA library for silk moth mRNA was published in 1979.<ref>Template:Cite journal</ref> The first seminal study to mention and investigate the transcriptome of an organism was published in 1997 and it described 60,633 transcripts expressed in S. cerevisiae using serial analysis of gene expression (SAGE).<ref>Template:Cite journal</ref> With the rise of high-throughput technologies and bioinformatics and the subsequent increased computational power, it became increasingly efficient and easy to characterize and analyze enormous amounts of data.<ref name="etymology" /> Attempts to characterize the transcriptome became more prominent with the advent of automated DNA sequencing during the 1980s.<ref name="pertea" /> During the 1990s, expressed sequence tag sequencing was used to identify genes and their fragments.<ref name="microarrays">Template:Cite journal</ref> This was followed by techniques such as serial analysis of gene expression (SAGE), cap analysis of gene expression (CAGE), and massively parallel signature sequencing (MPSS).

Transcription

The transcriptome encompasses all the ribonucleic acid (RNA) transcripts present in a given organism or experimental sample.<ref name = Brown2018b>Template:Cite book</ref> RNA is the main carrier of genetic information that is responsible for the process of converting DNA into an organism's phenotype. A gene can give rise to a single-stranded messenger RNA (mRNA) through a molecular process known as transcription; this mRNA is complementary to the strand of DNA it originated from.<ref name="pertea">Template:Cite journal</ref> The enzyme RNA polymerase II attaches to the template DNA strand and catalyzes the addition of ribonucleotides to the 3' end of the growing sequence of the mRNA transcript.<ref name="transcription">Template:Cite journal</ref>

In order to initiate its function, RNA polymerase II needs to recognize a promoter sequence, located upstream (5') of the gene. In eukaryotes, this process is mediated by transcription factors, most notably Transcription factor II D (TFIID), which recognizes the TATA box and aids in the positioning of RNA polymerase at the appropriate start site. To finish the production of the RNA transcript, termination usually takes place several hundred nucleotides away from the termination sequence, and the transcript is cleaved.<ref name="transcription" /> This process occurs in the nucleus of a cell along with RNA processing, by which mRNA molecules are capped, spliced and polyadenylated to increase their stability before being taken to the cytoplasm. The mRNA gives rise to proteins through the process of translation, which takes place in ribosomes.

Types of RNA transcripts

Almost all functional transcripts are derived from known genes. The only exceptions are a small number of transcripts that might play a direct role in regulating gene expression near the promoters of known genes. (See Enhancer RNA.)

Genes occupy most of prokaryotic genomes, so most of their genomes are transcribed. Many eukaryotic genomes are very large, and known genes may take up only a fraction of the genome. In mammals, for example, known genes only account for 40-50% of the genome.<ref name="Francis&Wörheide2017">Template:Cite journal</ref> Nevertheless, identified transcripts often map to a much larger fraction of the genome, suggesting that the transcriptome contains spurious transcripts that do not come from genes. Some of these transcripts are known to be non-functional because they map to transcribed pseudogenes or degenerate transposons and viruses. Others map to unidentified regions of the genome that may be junk DNA.

Spurious transcription is very common in eukaryotes, especially those with large genomes that might contain a lot of junk DNA.<ref name = vanBakeletal2011>Template:Cite journal</ref><ref name = Jensenetal2013>Template:Cite journal</ref><ref name = Sverdlov2017>Template:Cite journal</ref><ref name = Wade&Grainger2018>Template:Cite journal</ref> Some scientists claim that if a transcript has not been assigned to a known gene then the default assumption must be that it is junk RNA until it has been shown to be functional.<ref name=vanBakeletal2011/><ref name = Palazzo&Lee2015>Template:Cite journal</ref> This would mean that much of the transcriptome in species with large genomes is probably junk RNA. (See Non-coding RNA)

The transcriptome includes the transcripts of protein-coding genes (mRNA plus introns) as well as the transcripts of non-coding genes (functional RNAs plus introns).

Scope of study

In the human genome, all genes get transcribed into RNA because that's how the molecular gene is defined. (See Gene.) The transcriptome consists of coding regions of mRNA plus non-coding UTRs, introns, non-coding RNAs, and spurious non-functional transcripts.

Several factors render the content of the transcriptome difficult to establish. These include alternative splicing, RNA editing and alternative transcription among others.<ref name="scitable">Template:Cite journal</ref> Additionally, transcriptome techniques are capable of capturing transcription occurring in a sample at a specific time point, although the content of the transcriptome can change during differentiation.<ref name="pertea" /> The main aims of transcriptomics are the following: "catalogue all species of transcript, including mRNAs, non-coding RNAs and small RNAs; to determine the transcriptional structure of genes, in terms of their start sites, 5′ and 3′ ends, splicing patterns and other post-transcriptional modifications; and to quantify the changing expression levels of each transcript during development and under different conditions".<ref name="biblio1">Template:Cite journal</ref>

The term can be applied to the total set of transcripts in a given organism, or to the specific subset of transcripts present in a particular cell type. Unlike the genome, which is roughly fixed for a given cell line (excluding mutations), the transcriptome can vary with external environmental conditions. Because it includes all mRNA transcripts in the cell, the transcriptome reflects the genes that are being actively expressed at any given time, with the exception of mRNA degradation phenomena such as transcriptional attenuation. The study of transcriptomics (which includes expression profiling, splice variant analysis, etc.) examines the expression level of RNAs in a given cell population, often focusing on mRNA, but sometimes including others such as tRNAs and sRNAs.

Methods of construction

Transcriptomics is the quantitative science that encompasses the assignment of a list of strings ("reads") to objects ("transcripts" in the genome). To calculate the expression strength, the density of reads corresponding to each object is counted.<ref name="cellerinopre" /> Initially, transcriptomes were analyzed and studied using expressed sequence tag libraries and serial and cap analysis of gene expression (SAGE and CAGE).
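
The idea of assigning reads to transcripts and counting their density can be illustrated with a minimal Python sketch. It assumes that reads have already been aligned and are represented only by a chromosome and a start position; the function names, the toy coordinates and the "reads per kilobase" density measure are illustrative assumptions, not part of any particular transcriptomics tool.

```python
from collections import Counter


def count_reads(read_positions, transcripts):
    """Assign each aligned read to the transcript interval containing it.

    read_positions: iterable of (chromosome, position) tuples (hypothetical input format).
    transcripts: dict mapping name -> (chromosome, start, end), with end exclusive.
    """
    counts = Counter({name: 0 for name in transcripts})
    for chrom, pos in read_positions:
        for name, (t_chrom, start, end) in transcripts.items():
            if chrom == t_chrom and start <= pos < end:
                counts[name] += 1
                break  # in this toy model a read is assigned to at most one transcript
    return dict(counts)


def read_density(counts, transcripts):
    """Return reads per kilobase of transcript length for each transcript."""
    density = {}
    for name, (_, start, end) in transcripts.items():
        length_kb = (end - start) / 1000
        density[name] = counts[name] / length_kb
    return density


# Toy data: two transcripts and five reads, one of which maps nowhere.
transcripts = {"geneA": ("chr1", 0, 2000), "geneB": ("chr1", 5000, 5500)}
reads = [("chr1", 10), ("chr1", 1500), ("chr1", 5100), ("chr1", 5200), ("chr2", 100)]

counts = count_reads(reads, transcripts)
density = read_density(counts, transcripts)
print(counts)
print(density)
```

Both transcripts receive two reads here, but the shorter one has the higher density, which is why raw counts alone are usually not compared between transcripts of different lengths.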

Currently, the two main transcriptomics techniques are DNA microarrays and RNA-Seq. Both techniques require RNA isolation through RNA extraction techniques, followed by its separation from other cellular components and enrichment of mRNA.<ref name="#9664454">Template:Cite book</ref><ref name="#2440339">Template:Cite journal</ref>

There are two general methods of inferring transcriptome sequences. One approach maps sequence reads onto a reference genome, either of the organism itself (whose transcriptome is being studied) or of a closely related species. The other approach, de novo transcriptome assembly, uses software to infer transcripts directly from short sequence reads and is used in organisms with genomes that are not sequenced.<ref name="scimag" />

DNA microarrays

DNA microarray used to detect gene expression in human (left) and mouse (right) samples

The first transcriptome studies were based on microarray techniques (also known as DNA chips). Microarrays consist of thin glass layers with spots on which oligonucleotides, known as "probes", are arrayed; each spot contains a known DNA sequence.<ref>Template:Cite journal</ref>

When performing microarray analyses, mRNA is collected from a control and an experimental sample, the latter usually representative of a disease. The RNA of interest is converted to cDNA to increase its stability and marked with fluorophores of two colors, usually green and red, for the two groups. The cDNA is spread onto the surface of the microarray, where it hybridizes with oligonucleotides on the chip, and a laser is used to scan the array. The fluorescence intensity on each spot of the microarray corresponds to the level of gene expression and, based on the color of the fluorophores selected, it can be determined which of the samples exhibits higher levels of the mRNA of interest.<ref name="microarrays" />
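
The comparison of the two fluorescence channels can be sketched in Python as a log2 intensity ratio per spot. This is only an illustration: the pseudocount, the two-fold threshold, the probe names and the intensity values are assumptions chosen for the example, and real analyses also involve background correction and normalization, which are omitted.

```python
import math


def log2_ratio(experimental_intensity, control_intensity, pseudocount=1.0):
    """Log2 ratio of the two channel intensities for one spot.

    The pseudocount (an assumption of this sketch) avoids division by zero
    for spots with no detectable signal.
    """
    return math.log2(
        (experimental_intensity + pseudocount) / (control_intensity + pseudocount)
    )


def classify_spot(ratio, threshold=1.0):
    """Label a spot using a hypothetical two-fold (|log2 ratio| >= 1) cutoff."""
    if ratio >= threshold:
        return "higher in experimental sample"
    if ratio <= -threshold:
        return "higher in control sample"
    return "similar in both samples"


# Hypothetical scanned intensities: (experimental channel, control channel).
spots = {
    "probe_1": (4095.0, 1023.0),
    "probe_2": (511.0, 2047.0),
    "probe_3": (800.0, 820.0),
}

for probe, (experimental, control) in spots.items():
    ratio = log2_ratio(experimental, control)
    print(probe, round(ratio, 3), classify_spot(ratio))
```

Working on a log scale makes a doubling and a halving of expression symmetric around zero, which is one reason it is a common choice for two-channel data.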

One microarray usually contains enough oligonucleotides to represent all known genes; however, data obtained using microarrays does not provide information about unknown genes. During the 2010s, microarrays were almost completely replaced by next-generation techniques that are based on DNA sequencing.

RNA sequencing

RNA sequencing is a next-generation sequencing technology; as such it requires only a small amount of RNA and no previous knowledge of the genome.<ref name="etymology" /> It allows for both qualitative and quantitative analysis of RNA transcripts, the former allowing discovery of new transcripts and the latter a measure of relative quantities for transcripts in a sample.<ref name="cellerino12" />

The three main steps of sequencing the transcriptome of any biological sample are RNA purification, the synthesis of an RNA or cDNA library, and sequencing of the library.<ref name="cellerino12">Template:Harvnb</ref> The RNA purification process is different for short and long RNAs.<ref name="cellerino12" /> This step is usually followed by an assessment of RNA quality, with the purpose of avoiding contaminants such as DNA or technical contaminants related to sample processing. RNA quality is measured using UV spectrometry at the absorbance peak of 260 nm.<ref name="cellerino13">Template:Harvnb</ref> RNA integrity can also be analyzed quantitatively by comparing the ratio and intensity of 28S RNA to 18S RNA, reported in the RNA Integrity Number (RIN) score.<ref name="cellerino13" /> Since mRNA is the species of interest and it represents only 3% of total RNA content, the RNA sample should be treated to remove rRNA, tRNA and tissue-specific RNA transcripts.<ref name="cellerino13" />
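
As a rough illustration of the UV measurement, the following Python sketch converts an absorbance reading at 260 nm into an RNA concentration and estimates how much of it would be mRNA. It relies on assumptions that are not stated in the text above: the common convention that one A260 unit corresponds to about 40 µg/mL of RNA in a 1 cm cuvette, and the use of the A260/A280 ratio (around 2.0 for clean RNA) as a purity indicator. The 3% mRNA fraction is the figure quoted above.

```python
# Assumed convention: 1 absorbance unit at 260 nm ~ 40 ug/mL RNA (1 cm path length).
RNA_UG_PER_ML_PER_A260 = 40.0

# Fraction of total RNA that is mRNA, as quoted in the text.
MRNA_FRACTION = 0.03


def rna_concentration_ug_per_ml(a260, dilution_factor=1.0, path_length_cm=1.0):
    """Estimate total RNA concentration from an A260 reading."""
    return a260 * RNA_UG_PER_ML_PER_A260 * dilution_factor / path_length_cm


def purity_ratio(a260, a280):
    """A260/A280 ratio; values near 2.0 are commonly taken to indicate clean RNA."""
    return a260 / a280


def expected_mrna_ug_per_ml(total_rna_ug_per_ml, mrna_fraction=MRNA_FRACTION):
    """Approximate mRNA concentration within a total RNA preparation."""
    return total_rna_ug_per_ml * mrna_fraction


# Hypothetical readings for a sample diluted 1:10 before measurement.
total = rna_concentration_ug_per_ml(a260=0.5, dilution_factor=10.0)
print("total RNA (ug/mL):", total)
print("A260/A280:", purity_ratio(0.5, 0.25))
print("approx. mRNA (ug/mL):", expected_mrna_ug_per_ml(total))
```

The small mRNA fraction shows why rRNA depletion or mRNA enrichment is normally performed before library preparation.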

The library preparation step, which aims to produce short cDNA fragments, begins with fragmentation of the RNA into pieces between 50 and 300 base pairs in length. Fragmentation can be enzymatic (RNA endonucleases), chemical (trismagnesium salt buffer, chemical hydrolysis) or mechanical (sonication, nebulisation).<ref name="cellerino18">Template:Harvnb</ref> Reverse transcription is used to convert the RNA templates into cDNA, and three priming methods can be used to achieve it: oligo-dT priming, random primers, or ligation of special adaptor oligos.

Single-cell transcriptomics

Transcription can also be studied at the level of individual cells by single-cell transcriptomics. Single-cell RNA sequencing (scRNA-seq) is a recently developed technique that allows the analysis of the transcriptome of single cells, including bacteria.<ref name="Toledo-Arana">Template:Cite journal</ref> With single-cell transcriptomics, subpopulations of cell types that constitute the tissue of interest are also taken into consideration.<ref>Template:Cite journal</ref> This approach makes it possible to determine whether changes in experimental samples are due to phenotypic cellular changes as opposed to proliferation, with which a specific cell type might be overrepresented in the sample.<ref>Template:Cite journal</ref> Additionally, when assessing cellular progression through differentiation, average expression profiles are only able to order cells by time rather than their stage of development and are consequently unable to show trends in gene expression levels specific to certain stages.<ref>Template:Cite journal</ref> Single-cell transcriptomic techniques have been used to characterize rare cell populations such as circulating tumor cells, cancer stem cells in solid tumors, and embryonic stem cells (ESCs) in mammalian blastocysts.<ref name="kanter">Template:Cite journal</ref>

Although there are no standardized techniques for single-cell transcriptomics, several steps need to be undertaken. The first step includes cell isolation, which can be performed using low- and high-throughput techniques. This is followed by a qPCR step and then single-cell RNAseq where the RNA of interest is converted into cDNA. Newer developments in single-cell transcriptomics allow for tissue and sub-cellular localization preservation through cryo-sectioning thin slices of tissues and sequencing the transcriptome in each slice. Another technique allows the visualization of single transcripts under a microscope while preserving the spatial information of each individual cell where they are expressed.<ref name="kanter" />

Analysis

A number of organism-specific transcriptome databases have been constructed and annotated to aid in the identification of genes that are differentially expressed in distinct cell populations.

As of 2013, RNA-seq was emerging as the method of choice for measuring the transcriptomes of organisms, though the older technique of DNA microarrays was still in use.<ref name="biblio1" /> RNA-seq measures the transcription of a specific gene by converting long RNAs into a library of cDNA fragments. The cDNA fragments are then sequenced using high-throughput sequencing technology and aligned to a reference genome or transcriptome, and the alignments are used to create an expression profile of the genes.<ref name="biblio1" />
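
One way to turn aligned read counts into an expression profile is to normalize them for transcript length and sequencing depth. The Python sketch below computes transcripts per million (TPM), a widely used normalization that is not named in the text above; it is shown as an assumption for illustration, with made-up gene names, counts and lengths.

```python
def tpm(counts, lengths_bp):
    """Convert raw read counts to transcripts per million (TPM).

    counts: dict mapping gene -> number of reads aligned to it.
    lengths_bp: dict mapping gene -> transcript length in base pairs.
    """
    # Step 1: length-normalize each count (reads per kilobase).
    rates = {gene: counts[gene] / (lengths_bp[gene] / 1000) for gene in counts}
    total_rate = sum(rates.values())
    if total_rate == 0:
        raise ValueError("no reads were assigned to any gene")
    # Step 2: rescale so that the values of all genes sum to one million.
    return {gene: rate / total_rate * 1_000_000 for gene, rate in rates.items()}


# Hypothetical counts: geneB has three times the reads of geneA but is
# also three times as long, so both end up with the same TPM value.
counts = {"geneA": 100, "geneB": 300, "geneC": 0}
lengths_bp = {"geneA": 1000, "geneB": 3000, "geneC": 500}

profile = tpm(counts, lengths_bp)
for gene, value in profile.items():
    print(gene, value)
```

Because TPM values within a sample always sum to the same total, they describe relative rather than absolute abundance.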

Applications

Humans

The number of protein-coding RNA sequences expressed by each organ varies significantly between organs, but also depends on the definitions and methodology used. In general, the brain, testes and lymphatic system show the highest activity, and the endometrium, gallbladder, seminal vesicle and smooth muscle show the lowest.<ref>Template:Cite journal</ref>

Table: The tissue elevated genes for each of the 36 tissue types, shown for the different categories of elevated expression.<ref>Template:Cite web</ref>

Tissue               Tissue enriched   Group enriched   Tissue enhanced   Total elevated
Brain                475               457              1265              2197
Testis               937               296              759               1992
Lymphoid tissue      209               307              954               1470
Liver                263               178              537               978
Intestine            128               250              571               949
Bone marrow          115               172              655               942
Skeletal muscle      59                269              593               921
Retina               134               240              411               785
Skin                 184               99               319               602
Tongue               3                 226              256               485
Kidney               58                148              254               460
Choroid plexus       33                133              281               447
Esophagus            20                75               337               432
Heart muscle         36                138              245               419
Stomach              35                81               205               321
Epididymis           96                73               146               315
Pancreas             64                74               173               311
Fallopian tube       19                114              177               310
Salivary gland       43                80               177               300
Placenta             67                47               179               293
Pituitary gland      25                115              139               279
Adipose tissue       4                 34               195               233
Adrenal gland        25                49               152               226
Parathyroid gland    29                36               136               201
Lung                 17                42               136               195
Urinary bladder      7                 37               147               191
Ovary                5                 28               145               178
Cervix               0                 40               136               176
Thyroid gland        13                27               134               174
Vagina               0                 34               113               147
Breast               20                37               75                132
Prostate             15                26               85                126
Endometrium          3                 13               73                89
Gallbladder          3                 16               69                88
Seminal vesicle      6                 13               54                73
Smooth muscle        0                 7                43                50
Total                3150              1583             6197              10930


Mammals

The transcriptomes of stem cells and cancer cells are of particular interest to researchers who seek to understand the processes of cellular differentiation and carcinogenesis. A pipeline using RNA-seq or gene array data can be used to track genetic changes occurring in stem and precursor cells and requires at least three independent gene expression datasets from the former cell type and mature cells.<ref>Template:Cite journal</ref>

Analysis of the transcriptomes of human oocytes and embryos is used to understand the molecular mechanisms and signaling pathways controlling early embryonic development, and could theoretically be a powerful tool in making proper embryo selection in in vitro fertilisation. [citation needed] Analyses of the transcriptome content of the placenta in the first trimester of pregnancy following in vitro fertilization and embryo transfer (IVF-ET) revealed differences in genetic expression which are associated with a higher frequency of adverse perinatal outcomes. Such insight can be used to optimize the practice.<ref>Template:Cite journal</ref> Transcriptome analyses can also be used to optimize cryopreservation of oocytes, by lowering injuries associated with the process.<ref>Template:Cite journal</ref>

Transcriptomics is an emerging and continually growing field in biomarker discovery for use in assessing the safety of drugs or chemical risk assessment.<ref name="David T Szabo">Template:Cite book</ref>

Transcriptomes may also be used to infer phylogenetic relationships among individuals or to detect evolutionary patterns of transcriptome conservation.<ref>Template:Cite journal</ref>

Transcriptome analyses were used to discover the incidence of antisense transcripts, their role in gene expression through interaction with surrounding genes, and their abundance in different chromosomes.<ref>Template:Cite journal</ref> RNA-seq was also used to show how RNA isoforms, transcripts stemming from the same gene but with different structures, can produce complex phenotypes from limited genomes.<ref name="scimag">Template:Cite journal</ref>

Plants

Transcriptome analyses have been used to study the evolution and diversification process of plant species. In 2014, the 1000 Plant Genomes Project was completed, in which the transcriptomes of 1,124 plant species from the clades Viridiplantae, Glaucophyta and Rhodophyta were sequenced. The protein-coding sequences were subsequently compared to infer phylogenetic relationships between plants and to characterize the time of their diversification in the process of evolution.<ref>Template:Cite journal</ref> Transcriptome studies have been used to characterize and quantify gene expression in mature pollen. Genes involved in cell wall metabolism and cytoskeleton were found to be overexpressed. Transcriptome approaches also made it possible to track changes in gene expression through different developmental stages of pollen, ranging from microspore to mature pollen grains; additionally, such stages could be compared across different plant species, including Arabidopsis, rice and tobacco.<ref>Template:Cite journal</ref>

Relation to other ome fields

General schema showing the relationships of the genome, transcriptome, proteome, and metabolome (lipidome).

Similar to other -ome based technologies, analysis of the transcriptome allows for an unbiased approach when validating hypotheses experimentally. This approach also allows for the discovery of novel mediators in signaling pathways.<ref name="cellerinopre">Template:Harvnb</ref> As with other -omics based technologies, the transcriptome can be analyzed within the scope of a multiomics approach. It is complementary to metabolomics but, contrary to proteomics, a direct association between a transcript and a metabolite cannot be established.

There are several -ome fields that can be seen as subcategories of the transcriptome. The transcriptome differs from the exome in that it includes only those RNA molecules found in a specified cell population, and usually includes the amount or concentration of each RNA molecule in addition to the molecular identities. Additionally, the transcriptome also differs from the translatome, which is the set of RNAs undergoing translation.

The term meiome is used in functional genomics to describe the meiotic transcriptome or the set of RNA transcripts produced during the process of meiosis.<ref>Template:Cite journal</ref> Meiosis is a key feature of sexually reproducing eukaryotes, and involves the pairing of homologous chromosomes, synapsis and recombination. Since meiosis in most organisms occurs in a short time period, meiotic transcript profiling is difficult due to the challenge of isolation (or enrichment) of meiotic cells (meiocytes). As with transcriptome analyses, the meiome can be studied at a whole-genome level using large-scale transcriptomic techniques.<ref>Template:Cite journal</ref> The meiome has been well-characterized in mammal and yeast systems and somewhat less extensively characterized in plants.<ref>Template:Cite journal</ref>

The thanatotranscriptome consists of all RNA transcripts that continue to be expressed or that start getting re-expressed in internal organs of a dead body 24–48 hours following death. The genes involved include some that are inhibited after fetal development. If the thanatotranscriptome is related to the process of programmed cell death (apoptosis), it can be referred to as the apoptotic thanatotranscriptome. Analyses of the thanatotranscriptome are used in forensic medicine.<ref>Template:Cite journal</ref>

eQTL mapping can be used to complement genomics with transcriptomics: it relates genetic variants at the DNA level to gene expression measures at the RNA level.<ref>Template:Cite journal</ref>
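
The relationship between a genetic variant and an expression measure can be sketched as a simple least-squares regression of expression on genotype. The Python example below is a toy illustration only: the 0/1/2 allele-dosage coding, the function name and the data are assumptions, and practical eQTL studies additionally handle covariates, significance testing and multiple-testing correction.

```python
def eqtl_fit(genotypes, expression):
    """Least-squares fit of expression = intercept + slope * genotype.

    genotypes: allele dosages per individual, coded 0, 1 or 2 (assumed coding).
    expression: expression level of one gene in the same individuals.
    Returns (slope, intercept); the slope is the estimated per-allele effect.
    """
    if len(genotypes) != len(expression):
        raise ValueError("genotypes and expression must have the same length")
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_e = sum(expression) / n
    covariance = sum((g - mean_g) * (e - mean_e) for g, e in zip(genotypes, expression))
    variance = sum((g - mean_g) ** 2 for g in genotypes)
    if variance == 0:
        raise ValueError("all individuals share the same genotype")
    slope = covariance / variance
    intercept = mean_e - slope * mean_g
    return slope, intercept


# Hypothetical data for six individuals at one variant and one gene.
genotypes = [0, 0, 1, 1, 2, 2]
expression = [5.0, 5.2, 6.1, 5.9, 7.0, 7.2]

slope, intercept = eqtl_fit(genotypes, expression)
print("per-allele effect:", round(slope, 3))
print("baseline expression:", round(intercept, 3))
```

A slope clearly different from zero suggests that the variant is associated with the expression level of the gene in this sample.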

Relation to proteome

The transcriptome can be seen as a precursor of the proteome, that is, the entire set of proteins expressed by a genome.

However, the analysis of relative mRNA expression levels can be complicated by the fact that relatively small changes in mRNA expression can produce large changes in the total amount of the corresponding protein present in the cell. One analysis method, known as gene set enrichment analysis, identifies coregulated gene networks rather than individual genes that are up- or down-regulated in different cell populations.Template:Ref

Although microarray studies can reveal the relative amounts of different mRNAs in the cell, levels of mRNA are not directly proportional to the expression level of the proteins they code for.<ref>Template:Cite journal</ref> The number of protein molecules synthesized using a given mRNA molecule as a template is highly dependent on translation-initiation features of the mRNA sequence; in particular, the ability of the translation initiation sequence is a key determinant in the recruiting of ribosomes for protein translation.

Transcriptome databases

  • Ensembl
  • OmicTools
  • Transcriptome Browser
  • ArrayExpress

Notes

Template:Reflist

References

Template:Refbegin

Template:Refend

Further reading

Template:Refbegin

  • Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, Mesirov JP. (2005). Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci USA 102(43):15545-50.
  • Laule O, Hirsch-Hoffmann M, Hruz T, Gruissem W, Zimmermann P. (2006). Web-based analysis of the mouse transcriptome using Genevestigator. BMC Bioinformatics 7:311.

Template:Refend
