Monoallelically-expressed Noncoding RNAs form nucleolar territories on NOR-containing chromosomes and regulate rRNA expression.
Hao Q, Liu M, Daulatabad SV, Gaffari S, Song YJ, Srivastava R, Bhaskar S, Moitra A, Mangan H, Tseng E, Gilmore RB, Frier SM, Chen X, Wang C, Huang S, Chamberlain S, Jin H, Korlach J, McStay B, Sinha S, Janga SC, Prasanth S, Prasanth KV. (2024 Jan 19)
Elife. pii: 80684. doi: 10.7554/eLife.80684
Inflammation primes the kidney for recovery by activating AZIN1 A-to-I editing.
Heruye S, Myslinski J, Zeng C, Zollman A, Makino S, Nanamatsu A, Mir Q, Janga SC, Doud EH, Eadon MT, Maier B, Hamada M, Tran TM, Dagher PC, Hato T. (2023 Nov 9)
bioRxiv. pii: 2023.11.09.566426. doi: 10.1101/2023.11.09.566426
The progression of kidney disease varies among individuals, but a general methodology to quantify disease timelines is lacking. Particularly challenging is the task of determining the potential for recovery from acute kidney injury following various insults. Here, we report that quantitation of post-transcriptional adenosine-to-inosine (A-to-I) RNA editing offers a distinct genome-wide signature, enabling the delineation of disease trajectories in the kidney. A well-defined murine model of endotoxemia permitted the identification of the origin and extent of A-to-I editing, along with temporally discrete signatures of double-stranded RNA stress and Adenosine Deaminase isoform switching. We found that A-to-I editing of Antizyme Inhibitor 1 (AZIN1), a positive regulator of polyamine biosynthesis, serves as a particularly useful temporal landmark during endotoxemia. Our data indicate that AZIN1 A-to-I editing, triggered by preceding inflammation, primes the kidney and activates endogenous recovery mechanisms. By comparing genetically modified human cell lines and mice locked in either A-to-I edited or uneditable states, we uncovered that AZIN1 A-to-I editing not only enhances polyamine biosynthesis but also engages glycolysis and nicotinamide biosynthesis to drive the recovery phenotype. Our findings imply that quantifying AZIN1 A-to-I editing could potentially identify individuals who have transitioned to an endogenous recovery phase. This phase would reflect their past inflammation and indicate their potential for future recovery.
Revelation of genetic diversity and genomic footprints of adaptation in Indian pig breeds.
A V, Kumar A, Mahala S, Chandra Janga S, Chauhan A, Mehrotra A, Kumar De A, Ranjan Sahu A, Firdous Ahmad S, Vempadapu V, Dutt T. (2024 Jan 30)
Gene. pii: S0378-1119(23)00791-6. doi: 10.1016/j.gene.2023.147950
In the present study, the genetic diversity measures among four Indian domestic breeds of pig, namely Agonda Goan, Ghurrah, Ghungroo, and Nicobari, from different agro-climatic regions of the country, were explored and compared with European commercial breeds, European wild boar and Chinese domestic breeds. The double digest restriction site-associated DNA sequencing (ddRADseq) data of Indian pigs (102 animals) and Landrace (10 animals) were generated, and whole genome sequencing data of exotic pigs (60 animals) from a public data repository were used in the study. The principal component analysis (PCA), admixture analysis and phylogenetic analysis revealed that Indian breeds were closer in ancestry to Chinese breeds than to European breeds. European breeds exhibited the highest genetic diversity measures among all the considered breeds. Among Indian breeds, Agonda Goan and Ghurrah were found to be more genetically diverse than Nicobari and Ghungroo. The selection signature regions in Indian pigs were explored using iHS and XP-EHH. The iHS analysis showed that genes related to growth, reproduction, health, meat quality, sensory perception and behavior were under selection pressure in Indian pig breeds. Strong selection signatures were recorded in the 24.25-25.25 Mb region of SSC18, the 123.25-124 Mb region of SSC15 and the 118.75-119.5 Mb region of SSC2 in most of the Indian breeds upon pairwise comparison with European commercial breeds using XP-EHH. These regions harbor some important genes such as EPHA4 for thermotolerance, TAS2R16, FEZF1, CADPS2 and PTPRZ1 for adaptability to the scavenging system of rearing, TRIM36 and PGGT1B for disease resistance and CCDC112, PIAS1, FEM1B and ITGA11 for reproduction.
Read-depth based approach on whole genome resequencing data reveals important insights into the copy number variation (CNV) map of major global buffalo breeds.
Ahmad SF, Chandrababu Shailaja C, Vaishnav S, Kumar A, Gaur GK, Janga SC, Ahmad SM, Malla WA, Dutt T. (2023 Oct 16)
BMC Genomics. pii: 10.1186/s12864-023-09720-8. doi: 10.1186/s12864-023-09720-8
Elucidating genome-wide structural variants, including copy number variations (CNVs), has gained increased significance in recent times owing to their contribution to genetic diversity and association with important pathophysiological states. The present study aimed to elucidate the high-resolution CNV map of six different global buffalo breeds using whole genome resequencing data at two coverages (10X and 30X). Post-quality control, the sequence reads were aligned to the latest draft release of the Bubaline genome. The genome-wide CNVs were elucidated using a read-depth approach in CNVnator with different bin sizes. Adjacent CNVs were concatenated into copy number variation regions (CNVRs) in different breeds and their genomic coverage was elucidated.
Experimental and computational methods for studying the dynamics of RNA-RNA interactions in SARS-COV2 genomes.
Srivastava M, Dukeshire MR, Mir Q, Omoru OB, Manzourolajdad A, Janga SC. (2024 Jan 18)
Brief Funct Genomics. pii: 7030841. doi: 10.1093/bfgp/elac050
Long-range ribonucleic acid (RNA)-RNA interactions (RRI) are prevalent in positive-strand RNA viruses, including Beta-coronaviruses, and these take part in regulatory roles, including the regulation of sub-genomic RNA production rates. Crosslinking of interacting RNAs and short read-based deep sequencing of resulting RNA-RNA hybrids have shown that these long-range structures exist in severe acute respiratory syndrome coronavirus (SARS-CoV)-2 on both genomic and sub-genomic levels and in dynamic topologies. Furthermore, co-evolution of coronaviruses with their hosts is navigated by genetic variations made possible by its large genome, high recombination frequency and a high mutation rate. SARS-CoV-2’s mutations are known to occur spontaneously during replication, and thousands of aggregate mutations have been reported since the emergence of the virus. Although many long-range RRIs have been experimentally identified using high-throughput methods for the wild-type SARS-CoV-2 strain, evolutionary trajectory of these RRIs across variants, impact of mutations on RRIs and interaction of SARS-CoV-2 RNAs with the host have been largely open questions in the field. In this review, we summarize recent computational tools and experimental methods that have been enabling the mapping of RRIs in viral genomes, with a specific focus on SARS-CoV-2. We also present available informatics resources to navigate the RRI maps and shed light on the impact of mutations on the RRI space in viral genomes. Investigating the evolution of long-range RNA interactions and that of virus-host interactions can contribute to the understanding of new and emerging variants as well as aid in developing improved RNA therapeutics critical for combating future outbreaks.
Sequoia: A Framework for Visual Analysis of RNA Modifications from Direct RNA Sequencing Data.
Koonchanok R, Daulatabad SV, Reda K, Janga SC. (2023)
Methods Mol Biol. doi: 10.1007/978-1-0716-2962-8_9
Oxford Nanopore-based long-read direct RNA sequencing protocols are being increasingly used to study the dynamics of RNA metabolic processes due to improvements in read lengths, increased throughput, decreasing cost, ease of library preparation, and convenience. Long-read sequencing enables single-molecule-based detection of posttranscriptional changes, promising novel insights into the functional roles of RNA. However, fulfilling this potential will necessitate the development of new tools for analyzing and exploring this type of data. Although there are tools that allow users to analyze signal information, such as comparing raw signal traces to a nucleotide sequence, they don’t facilitate studying each individual signal instance in each read or perform analysis of signal clusters based on signal similarity. Therefore, we present Sequoia, a visual analytics application that allows users to interactively analyze signals originating from nanopore sequencers and can readily be extended to both RNA and DNA sequencing datasets. Sequoia combines a Python-based backend with a multi-view graphical interface that allows users to ingest raw nanopore sequencing data in Fast5 format, cluster sequences based on electric-current similarities, and drill-down onto signals to find attributes of interest. In this tutorial, we illustrate each individual step involved in running Sequoia and in the process dissect input data characteristics. We show how to generate Nanopore sequencing-based visualizations by leveraging dimensionality reduction and parameter tuning to separate modified RNA sequences from their unmodified counterparts. Sequoia’s interactive features enhance nanopore-based computational methodologies. Sequoia enables users to construct rationales and hypotheses and develop insights about the dynamic nature of RNA from the visual analysis. Sequoia is available at https://github.com/dnonatar/Sequoia .
Epitranscriptomics in parasitic protists: Role of RNA chemical modifications in posttranscriptional gene regulation.
Catacalos C, Krohannon A, Somalraju S, Meyer KD, Janga SC, Chakrabarti K. (2022 Dec)
PLoS Pathog. pii: PPATHOGENS-D-22-01385. doi: 10.1371/journal.ppat.1010972
“Epitranscriptomics” is the new RNA code that represents an ensemble of posttranscriptional RNA chemical modifications, which can precisely coordinate gene expression and biological processes. Several RNA base modifications, such as N6-methyladenosine (m6A), 5-methylcytosine (m5C), and pseudouridine (Ψ), play pivotal roles in fine-tuning gene expression in almost all eukaryotes, and emerging evidence suggests that parasitic protists are no exception. In this review, we primarily focus on m6A, which is the most abundant epitranscriptomic mark and regulates numerous cellular processes, including nuclear export, mRNA splicing, polyadenylation, stability, and translation. We highlight the universal features of spatiotemporal m6A RNA modifications in eukaryotic phylogeny, their homologs, and unique processes in 3 unicellular parasites (Plasmodium sp., Toxoplasma sp., and Trypanosoma sp.), as well as some technological advances in this rapidly developing research area that can significantly improve our understanding of gene expression regulation in parasites.
Combining transfer learning with retinal lesion features for accurate detection of diabetic retinopathy.
Hassan D, Gill HM, Happe M, Bhatwadekar AD, Hajrasouliha AR, Janga SC. (2022)
Front Med (Lausanne). doi: 10.3389/fmed.2022.1050436
Diabetic retinopathy (DR) is a late microvascular complication of Diabetes Mellitus (DM) that could lead to permanent blindness in patients, without early detection. Although adequate management of DM
A Putative long-range RNA-RNA interaction between ORF8 and Spike of SARS-CoV-2.
Omoru OB, Pereira F, Janga SC, Manzourolajdad A. (2022)
PLoS One. pii: PONE-D-21-35236. doi: 10.1371/journal.pone.0260331
SARS-CoV-2 has affected people worldwide as the causative agent of COVID-19. The virus is related to the highly lethal SARS-CoV-1 responsible for the 2002-2003 SARS outbreak in Asia. Research is ongoing to understand why the two viruses have different spreading capacities and mortality rates. As in other beta coronaviruses, RNA-RNA interactions occur between different parts of the viral genomic RNA, resulting in discontinuous transcription and production of various sub-genomic RNAs. These sub-genomic RNAs are then translated into other viral proteins. In this work, we performed a comparative analysis for novel long-range RNA-RNA interactions that may involve the Spike region. Comparing in-silico fragment-based predictions between reference sequences of SARS-CoV-1 and SARS-CoV-2 revealed several predictions, amongst which a thermodynamically stable long-range RNA-RNA interaction between (23660-23703 Spike) and (28025-28060 ORF8) unique to SARS-CoV-2 was observed. The patterns of sequence variation using data gathered worldwide further supported the predicted stability of the sub-interacting region (23679-23690 Spike) and (28031-28042 ORF8). Such RNA-RNA interactions can potentially impact the viral life cycle, including sub-genomic RNA production rates.
FOXP3 exon 2 controls T(reg) stability and autoimmunity.
Du J, Wang Q, Yang S, Chen S, Fu Y, Spath S, Domeier P, Hagin D, Anover-Sombke S, Haouili M, Liu S, Wan J, Han L, Liu J, Yang L, Sangani N, Li Y, Lu X, Janga SC, Kaplan MH, Torgerson TR, Ziegler SF, Zhou B. (2022 Jun 24)
Sci Immunol. doi: 10.1126/sciimmunol.abo5407
CASowary: CRISPR-Cas13 guide RNA predictor for transcript depletion.
Krohannon A, Srivastava M, Rauch S, Srivastava R, Dickinson BC, Janga SC. (2022 Mar 2)
BMC Genomics. pii: 10.1186/s12864-022-08366-2. doi: 10.1186/s12864-022-08366-2
The recent discovery of the gene editing system CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) and its associated proteins (Cas) has resulted in its widespread use for improved understanding of a variety of biological systems. Cas13, a lesser studied Cas protein, has been repurposed to allow for efficient and precise editing of RNA molecules. The Cas13 system utilizes base complementarity between a crRNA/sgRNA (CRISPR RNA or single guide RNA) and a target RNA transcript to preferentially bind to only the target transcript. Unlike targeting the upstream regulatory regions of protein coding genes on the genome, the transcriptome is significantly more redundant, leading to many transcripts having wide stretches of identical nucleotide sequences. Transcripts also exhibit complex three-dimensional structures and interact with an array of RBPs (RNA Binding Proteins), both of which may impact the effectiveness of transcript depletion of target sequences. However, our understanding of the features and corresponding methods which can predict whether a specific sgRNA will effectively knock down a transcript is very limited.
Penguin: A tool for predicting pseudouridine sites in direct RNA nanopore sequencing data.
Hassan D, Acevedo D, Daulatabad SV, Mir Q, Janga SC. (2022 Jul)
Methods. pii: S1046-2023(22)00035-4. doi: 10.1016/j.ymeth.2022.02.005
Pseudouridine is one of the most abundant RNA modifications, occurring when uridines are catalyzed by Pseudouridine synthase proteins. It plays an important role in many biological processes and has been reported to have application in drug development. Recently, the single-molecule sequencing techniques such as the direct RNA sequencing platform offered by Oxford Nanopore technologies have enabled direct detection of RNA modifications on the molecule being sequenced. In this study, we introduce a tool called Penguin that integrates several machine learning (ML) models to identify RNA Pseudouridine sites on Nanopore direct RNA sequencing reads. Pseudouridine sites were identified on single molecule sequencing data collected from direct RNA sequencing resulting in 723 K reads in Hek293 and 500 K reads in Hela cell lines. Penguin extracts a set of features from the raw signal measured by the Oxford Nanopore and the corresponding basecalled k-mer. Those features are used to train the predictors included in Penguin, which in turn, can predict whether the signal is modified by the presence of Pseudouridine sites in the testing phase. We have included various predictors in Penguin, including Support vector machines (SVM), Random Forest (RF), and Neural network (NN). The results on the two benchmark data sets for Hek293 and Hela cell lines show outstanding performance of Penguin either in random split testing or in independent validation testing. In random split testing, Penguin has been able to identify Pseudouridine sites with a high accuracy of 93.38% by applying SVM to Hek293 benchmark dataset. In independent validation testing, Penguin achieves an accuracy of 92.61% by training SVM with Hek293 benchmark dataset and testing it for identifying Pseudouridine sites on Hela benchmark dataset. Thus, Penguin outperforms the existing Pseudouridine predictors in the literature by 16 % higher accuracy than those predictors using independent validation testing. 
Employing Penguin to predict Pseudouridine sites revealed a significant enrichment of “regulation of mRNA 3′-end processing” in the Hek293 cell line and “positive regulation of transcription from RNA polymerase II promoter involved in cellular response to chemical stimulus” in the Hela cell line. Penguin software and models are available on GitHub at https://github.com/Janga-Lab/Penguin and can be readily employed for predicting Ψ sites from Nanopore direct RNA-sequencing datasets.
Esophageal Microbiome in Healthy Children and Esophageal Eosinophilia.
Parashette KR, Sarsani VK, Toh E, Janga SC, Nelson DE, Gupta SK. (2022 May 1)
J Pediatr Gastroenterol Nutr. pii: 00005176-202205000-00015. doi: 10.1097/MPG.0000000000003413
There is limited knowledge about the role of esophageal microbiome in pediatric esophageal eosinophilia (EE). We aimed to characterize the esophageal microbiome in pediatric patients with and without EE.
Geographical Landscape and Transmission Dynamics of SARS-CoV-2 Variants Across India: A Longitudinal Perspective.
Jha N, Hall D, Kanakan A, Mehta P, Maurya R, Mir Q, Gill HM, Janga SC, Pandey R. (2021)
Front Genet. pii: 753648. doi: 10.3389/fgene.2021.753648
Globally, SARS-CoV-2 has moved from one tide to another with ebbs in between. Genomic surveillance has greatly aided the detection and tracking of the virus and the identification of the variants of concern (VOC). The knowledge and understanding from genomic surveillance is important for a populous country like India for public health and healthcare officials for advance planning. An integrative analysis of the publicly available datasets in GISAID from India reveals the differential distribution of clades, lineages, gender, and age over a year (Apr 2020-Mar 2021). The significant insights include the early evidence towards B.1.617 and B.1.1.7 lineages in the specific states of India. Pan-India longitudinal data highlighted that B.1.36* was the predominant clade in India until January-February 2021 after which it has gradually been replaced by the B.1.617.1 lineage, from December 2020 onward. Regional analysis of the spread of SARS-CoV-2 indicated that B.1.617.3 was first seen in India in the month of October in the state of Maharashtra, while the now most prevalent strain B.1.617.2 was first seen in Bihar and subsequently spread to the states of Maharashtra, Gujarat, and West Bengal. To enable a real time understanding of the transmission and evolution of the SARS-CoV-2 genomes, we built a transmission map available on https://covid19-indiana.soic.iupui.edu/India/EmergingLineages/April2020/to/March2021. Based on our analysis, the rate estimate for divergence in our dataset was 9.48 e-4 substitutions per site/year for SARS-CoV-2. This would enable pandemic preparedness with the addition of future sequencing data from India available in the public repositories for tracking and monitoring the VOCs and variants of interest (VOI). This would help aid decision making from the public health perspective.
Comparative Analysis of Alternative Splicing Profiles in Th Cell Subsets Reveals Extensive Cell Type-Specific Effects Modulated by a Network of Transcription Factors and RNA-Binding Proteins.
Mir Q, Lakshmipati DK, Ulrich BJ, Kaplan MH, Janga SC. (2021 Sep 28)
Immunohorizons. pii: immunohorizons.2100060. doi: 10.4049/immunohorizons.2100060
Alternative splicing (AS) plays an important role in the development of many cell types; however, its contribution to Th subsets has not been clearly defined. In this study, we compare mice naive CD4
Mutational Landscape and Interaction of SARS-CoV-2 with Host Cellular Components.
Srivastava M, Hall D, Omoru OB, Gill HM, Smith S, Janga SC. (2021 Aug 24)
Microorganisms. pii: microorganisms9091794. doi: 10.3390/microorganisms9091794
The emergence of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and its rapid evolution has led to a global health crisis. Increasing mutations across the SARS-CoV-2 genome have severely impacted the development of effective therapeutics and vaccines to combat the virus. However, the new SARS-CoV-2 variants and their evolutionary characteristics are not fully understood. Host cellular components such as the ACE2 receptor, RNA-binding proteins (RBPs), microRNAs, small nuclear RNA (snRNA), 18S rRNA, and the 7SL RNA component of the signal recognition particle (SRP) interact with various structural and non-structural proteins of SARS-CoV-2. Several of these viral proteins are currently being examined for designing antiviral therapeutics. In this review, we discuss current advances in our understanding of various host cellular components targeted by the virus during SARS-CoV-2 infection. We also summarize the mutations across the SARS-CoV-2 genome that direct the evolution of new viral strains. Because coronaviruses are rapidly evolving in humans, they can escape therapies and vaccine-induced immunity. In order to understand the virus’s evolution, it is essential to study its mutational patterns and their impact on host cellular machinery. Finally, we present a comprehensive survey of currently available databases and tools to study viral-host interactions that stand as crucial resources for developing novel therapeutic strategies for combating SARS-CoV-2 infection.
Sequoia: an interactive visual analytics platform for interpretation and feature extraction from nanopore sequencing datasets.
Koonchanok R, Daulatabad SV, Mir Q, Reda K, Janga SC. (2021 Jul 7)
BMC Genomics. pii: 10.1186/s12864-021-07791-z. doi: 10.1186/s12864-021-07791-z
Direct-sequencing technologies, such as Oxford Nanopore’s, are delivering long RNA reads with great efficacy and convenience. These technologies afford an ability to detect post-transcriptional modifications at a single-molecule resolution, promising new insights into the functional roles of RNA. However, realizing this potential requires new tools to analyze and explore this type of data.
Lantern: an integrative repository of functional annotations for lncRNAs in the human genome.
Daulatabad SV, Srivastava R, Janga SC. (2021 May 26)
BMC Bioinformatics. pii: 10.1186/s12859-021-04207-3. doi: 10.1186/s12859-021-04207-3
With advancements in omics technologies, the range of biological processes in which long non-coding RNAs (lncRNAs) are involved is expanding extensively, generating a need for lncRNA annotation resources. Although there is a plethora of resources for annotating genes, and despite the extensive corpus of lncRNA literature, resources with lncRNA ontology annotations remain rare.
Transcriptome-wide high-throughput mapping of protein-RNA occupancy profiles using POP-seq.
Srivastava M, Srivastava R, Janga SC. (2021 Jan 13)
Sci Rep. pii: 10.1038/s41598-020-80846-5. doi: 10.1038/s41598-020-80846-5
Interaction between proteins and RNA is critical for post-transcriptional regulatory processes. Existing high throughput methods based on crosslinking of the protein-RNA complexes and poly-A pull down are reported to contribute to biases and are not readily amenable for identifying interaction sites on non poly-A RNAs. We present Protein Occupancy Profile-Sequencing (POP-seq), a phase separation based method in three versions, one of which does not require crosslinking, thus providing unbiased protein occupancy profiles on whole cell transcriptome without the requirement of poly-A pulldown. Our study demonstrates that ~ 68% of the total POP-seq peaks exhibited an overlap with publicly available protein-RNA interaction profiles of 97 RNA binding proteins (RBPs) in K562 cells. We show that POP-seq variants consistently capture protein-RNA interaction sites across a broad range of genes including on transcripts encoding for transcription factors (TFs), RNA-Binding Proteins (RBPs) and long non-coding RNAs (lncRNAs). POP-seq identified peaks exhibited a significant enrichment (p value < 2.2e-16) for GWAS SNPs, phenotypic, clinically relevant germline as well as somatic variants reported in cancer genomes, suggesting the prevalence of uncharacterized genomic variation in protein occupied sites on RNA. We demonstrate that the abundance of POP-seq peaks increases with an increase in expression of lncRNAs, suggesting that highly expressed lncRNA are likely to act as sponges for RBPs, contributing to the rewiring of protein-RNA interaction network in cancer cells. Overall, our data supports POP-seq as a robust and cost-effective method that could be applied to primary tissues for mapping global protein occupancies.
The S-phase-induced lncRNA SUNO1 promotes cell proliferation by controlling YAP1/Hippo signaling pathway.
Hao Q, Zong X, Sun Q, Lin YC, Song YJ, Hashemikhabir S, Hsu RY, Kamran M, Chaudhary R, Tripathi V, Singh DK, Chakraborty A, Li XL, Kim YJ, Orjalo AV, Polycarpou-Schwarz M, Moriarity BS, Jenkins LM, Johansson HE, Zhu YJ, Diederichs S, Bagchi A, Kim TH, Janga SC, Lal A, Prasanth SG, Prasanth KV. (2020 Oct 27)
Elife. pii: 55102. doi: 10.7554/eLife.55102
Cell cycle is a cellular process that is subject to stringent control. In contrast to the wealth of knowledge of proteins controlling the cell cycle, very little is known about the molecular role of lncRNAs (long noncoding RNAs) in cell-cycle progression. By performing genome-wide transcriptome analyses in cell-cycle-synchronized cells, we observed cell-cycle phase-specific induction of >2000 lncRNAs. Further, we demonstrate that an S-phase-upregulated lncRNA,
Role of SARS-CoV-2 in Altering the RNA-Binding Protein and miRNA-Directed Post-Transcriptional Regulatory Networks in Humans.
Srivastava R, Daulatabad SV, Srivastava M, Janga SC. (2020 Sep 25)
Int J Mol Sci. pii: ijms21197090. doi: 10.3390/ijms21197090
The outbreak of a novel coronavirus SARS-CoV-2 responsible for the COVID-19 pandemic has caused a worldwide public health emergency. Due to the constantly evolving nature of the coronaviruses, SARS-CoV-2-mediated alterations on post-transcriptional gene regulations across human tissues remain elusive. In this study, we analyzed publicly available genomic datasets to systematically dissect the crosstalk and dysregulation of the human post-transcriptional regulatory networks governed by RNA-binding proteins (RBPs) and micro-RNAs (miRs) due to SARS-CoV-2 infection. We uncovered that 13 out of 29 SARS-CoV-2-encoded proteins directly interacted with 51 human RBPs, of which the majority of them were abundantly expressed in gonadal tissues and immune cells. We further performed a functional analysis of differentially expressed genes in mock-treated versus SARS-CoV-2-infected lung cells that revealed enrichment for the immune response, cytokine-mediated signaling, and metabolism-associated genes. This study also characterized the alternative splicing events in SARS-CoV-2-infected cells compared to the control, demonstrating that skipped exons and mutually exclusive exons were the most abundant events that potentially contributed to differential outcomes in response to the viral infection. A motif enrichment analysis on the RNA genomic sequence of SARS-CoV-2 clearly revealed the enrichment for RBPs such as SRSFs, PCBPs, ELAVs, and HNRNPs, suggesting the sponging of RBPs by the SARS-CoV-2 genome. A similar analysis to study the interactions of miRs with SARS-CoV-2 revealed functionally important miRs that were highly expressed in immune cells, suggesting that these interactions may contribute to the progression of the viral infection and modulate the host immune response across other human tissues. 
Given the need to understand the interactions of SARS-CoV-2 with key post-transcriptional regulators in the human genome, this study provided a systematic computational analysis to dissect the role of dysregulated post-transcriptional regulatory networks controlled by RBPs and miRs across tissue types during a SARS-CoV-2 infection.
STAT5 promotes accessibility and is required for BATF-mediated plasticity at the Il9 locus.
Fu Y, Wang J, Panangipalli G, Ulrich BJ, Koh B, Xu C, Kharwadkar R, Chu X, Wang Y, Gao H, Wu W, Sun J, Tepper RS, Zhou B, Janga SC, Yang K, Kaplan MH. (2020 Sep 28)
Nat Commun. pii: 10.1038/s41467-020-18648-6. doi: 10.1038/s41467-020-18648-6
T helper cell differentiation requires lineage-defining transcription factors and factors that have shared expression among multiple subsets. BATF is required for development of multiple Th subsets but functions in a lineage-specific manner. BATF is required for IL-9 production in Th9 cells but in contrast to its function as a pioneer factor in Th17 cells, BATF is neither sufficient nor required for accessibility at the Il9 locus. Here we show that STAT5 is the earliest factor binding and remodeling the Il9 locus to allow BATF binding in both mouse and human Th9 cultures. The ability of STAT5 to mediate accessibility for BATF is observed in other Th lineages and allows acquisition of the IL-9-secreting phenotype. STAT5 and BATF convert Th17 cells into cells that mediate IL-9-dependent effects in allergic airway inflammation and anti-tumor immunity. Thus, BATF requires the STAT5 signal to mediate plasticity at the Il9 locus.
A long non-coding RNA (Lrap) modulates brain gene expression and levels of alcohol consumption in rats.
Saba LM, Hoffman PL, Homanics GE, Mahaffey S, Daulatabad SV, Janga SC, Tabakoff B. (2021 Feb)
Genes Brain Behav. doi: 10.1111/gbb.12698
LncRNAs are important regulators of quantitative and qualitative features of the transcriptome. We have used QTL and other statistical analyses to identify a gene coexpression module associated with alcohol consumption. The “hub gene” of this module, Lrap (Long non-coding RNA for alcohol preference), was an unannotated transcript resembling a lncRNA. We used partial correlation analyses to establish that Lrap is a major contributor to the integrity of the coexpression module. Using CRISPR/Cas9 technology, we disrupted an exon of Lrap in Wistar rats. Measures of alcohol consumption in wild type, heterozygous and knockout rats showed that disruption of Lrap produced increases in alcohol consumption/alcohol preference. The disruption of Lrap also produced changes in expression of over 700 other transcripts. Furthermore, it became apparent that Lrap may have a function in alternative splicing of the affected transcripts. The GO category of “Response to Ethanol” emerged as one of the top candidates in an enrichment analysis of the differentially expressed transcripts. We validate the role of Lrap as a mediator of alcohol consumption by rats, and also implicate Lrap as a modifier of the expression and splicing of a large number of brain transcripts. A defined subset of these transcripts significantly impacts alcohol consumption by rats (and possibly humans). Our work shows the pleiotropic nature of non-coding elements of the genome, the power of network analysis in identifying the critical elements influencing phenotypes, and the fact that not all changes produced by genetic editing are critical for the concomitant changes in phenotype.
Granzyme A-producing T helper cells are critical for acute graft-versus-host disease.
Park S, Griesenauer B, Jiang H, Adom D, Mehrpouya-Bahrami P, Chakravorty S, Kazemian M, Imam T, Srivastava R, Hayes TA, Pardo J, Janga SC, Paczesny S, Kaplan MH, Olson MR. (2020 Sep 17)
JCI Insight. pii: 124465.doi: 10.1172/jci.insight.124465
Acute graft-versus-host disease (aGVHD) can occur after hematopoietic cell transplant in patients undergoing treatment for hematological malignancies or inborn errors. Although CD4+ T helper (Th) cells play a major role in aGVHD, the mechanisms by which they contribute, particularly within the intestines, have remained elusive. We have identified a potentially novel subset of Th cells that accumulated in the intestines and produced the serine protease granzyme A (GrA). GrA+ Th cells were distinct from other Th lineages and exhibited a noncytolytic phenotype. In vitro, GrA+ Th cells differentiated in the presence of IL-4, IL-6, and IL-21 and were transcriptionally unique from cells cultured with either IL-4 or the IL-6/IL-21 combination alone. In vivo, both STAT3 and STAT6 were required for GrA+ Th cell differentiation and played roles in maintenance of the lineage identity. Importantly, GrA+ Th cells promoted aGVHD-associated morbidity and mortality and contributed to crypt destruction within intestines but were not required for the beneficial graft-versus-leukemia effect. Our data indicate that GrA+ Th cells represent a distinct Th subset and are critical mediators of aGVHD.
Role of SARS-CoV-2 in altering the RNA binding protein and miRNA directed post-transcriptional regulatory networks in humans.
Srivastava R, Daulatabad SV, Srivastava M, Janga SC. (2020 Sep 22)
bioRxiv. pii: 2020.07.06.190348.doi: 10.1101/2020.07.06.190348
The outbreak of a novel coronavirus, SARS-CoV-2, responsible for the COVID-19 pandemic has caused a worldwide public health emergency. Due to the constantly evolving nature of the coronaviruses, SARS-CoV-2 mediated alteration on post-transcriptional gene regulation across human tissues remains elusive. In this study, we analyze publicly available genomic datasets to systematically dissect the crosstalk and dysregulation of human post-transcriptional regulatory networks governed by RNA binding proteins (RBPs) and micro-RNAs (miRs), due to SARS-CoV-2 infection. We uncovered that 13 out of 29 SARS-CoV-2 encoded proteins directly interact with 51 human RBPs, the majority of which were abundantly expressed in gonadal tissues and immune cells. We further performed a functional analysis of differentially expressed genes in mock-treated versus SARS-CoV-2 infected lung cells that revealed enrichment for immune response, cytokine-mediated signaling, and metabolism associated genes. This study also characterized the alternative splicing events in SARS-CoV-2 infected cells compared to control, demonstrating that skipped exons and mutually exclusive exons were the most abundant events that potentially contributed to differential outcomes in response to viral infection. Motif enrichment analysis on the RNA genomic sequence of SARS-CoV-2 clearly revealed the enrichment for RBPs such as SRSFs, PCBPs, ELAVs, and HNRNPs, suggesting the sponging of RBPs by the SARS-CoV-2 genome. A similar analysis to study the interactions of miRs with SARS-CoV-2 revealed functionally important miRs that were highly expressed in immune cells, suggesting that these interactions may contribute to the progression of the viral infection and modulate host immune response across other human tissues. Given the need to understand the interactions of SARS-CoV-2 with key post-transcriptional regulators in the human genome, this study provides a systematic computational analysis to dissect the role of dysregulated post-transcriptional regulatory networks controlled by RBPs and miRs, across tissue types during SARS-CoV-2 infection.
New Twists in Detecting mRNA Modification Dynamics.
Anreiter I, Mir Q, Simpson JT, Janga SC, Soller M. (2021 Jan)
Trends Biotechnol. pii: S0167-7799(20)30166-9.doi: 10.1016/j.tibtech.2020.06.002
Modified nucleotides in mRNA are an essential addition to the standard genetic code of four nucleotides in animals, plants, and their viruses. The emerging field of epitranscriptomics examines nucleotide modifications in mRNA and their impact on gene expression. The low abundance of nucleotide modifications and technical limitations, however, have hampered systematic analysis of their occurrence and functions. Selective chemical and immunological identification of modified nucleotides has revealed global candidate topology maps for many modifications in mRNA, but further technical advances to increase confidence will be necessary. Single-molecule sequencing introduced by Oxford Nanopore now promises to overcome such limitations, and we summarize current progress with a particular focus on the bioinformatic challenges of this novel sequencing technology.
Targeting Bim via a lncRNA Morrbid Regulates the Survival of Preleukemic and Leukemic Cells.
Cai Z, Aguilera F, Ramdas B, Daulatabad SV, Srivastava R, Kotzin JJ, Carroll M, Wertheim G, Williams A, Janga SC, Zhang C, Henao-Mejia J, Kapur R. (2020 Jun 23)
Cell Rep. pii: S2211-1247(20)30797-X.doi: 10.1016/j.celrep.2020.107816
Inhibition of anti-apoptotic proteins BCL-2 and MCL-1 to release pro-apoptotic protein BIM and reactivate cell death could potentially be an efficient strategy for the treatment of leukemia. Here, we show that a lncRNA, MORRBID, a selective transcriptional repressor of BIM, is overexpressed in human acute myeloid leukemia (AML), which is associated with poor overall survival. In both human and animal models, MORRBID hyperactivation correlates with two recurrent AML drivers, TET2 and FLT3
Bcl6 and Blimp1 reciprocally regulate ST2(+) Treg-cell development in the context of allergic airway inflammation.
Koh B, Ulrich BJ, Nelson AS, Panangipalli G, Kharwadkar R, Wu W, Xie MM, Fu Y, Turner MJ, Paczesny S, Janga SC, Dent AL, Kaplan MH. (2020 Nov)
J Allergy Clin Immunol. pii: S0091-6749(20)30340-7.doi: 10.1016/j.jaci.2020.03.002
Bcl6 is required for the development of T follicular helper cells and T follicular regulatory (Tfr) cells that regulate germinal center responses. Bcl6 also affects the function of regulatory T (Treg) cells.
Embryonic ethanol exposure alters expression of sox2 and other early transcripts in zebrafish, producing gastrulation defects.
Sarmah S, Srivastava R, McClintick JN, Janga SC, Edenberg HJ, Marrs JA. (2020 Mar 3)
Sci Rep. pii: 10.1038/s41598-020-59043-x.doi: 10.1038/s41598-020-59043-x
Ethanol exposure during prenatal development causes fetal alcohol spectrum disorder (FASD), the most frequent preventable birth defect and neurodevelopmental disability syndrome. The molecular targets of ethanol toxicity during development are poorly understood. Developmental stages surrounding gastrulation are very sensitive to ethanol exposure. To understand the effects of ethanol on early transcripts during embryogenesis, we treated zebrafish embryos with ethanol during the pre-gastrulation period and examined the transcripts by Affymetrix GeneChip microarray before gastrulation. We identified 521 significantly dysregulated genes, including 61 transcription factors, in ethanol-exposed embryos. Sox2, the key regulator of pluripotency and early development, was significantly reduced. Functional annotation analysis showed enrichment in transcription regulation, embryonic axes patterning, and signaling pathways, including Wnt, Notch and retinoic acid. We identified all potential genomic targets of 25 dysregulated transcription factors and compared their interactions with the ethanol-dysregulated genes. This analysis predicted that Sox2 targeted a large number of ethanol-dysregulated genes. A gene regulatory network analysis showed that many of the dysregulated genes are targeted by multiple transcription factors. Injection of sox2 mRNA partially rescued ethanol-induced gene expression, epiboly and gastrulation defects. Additional studies of this ethanol dysregulated network may identify therapeutic targets that coordinately regulate early development.
A Circular RNA from the MDM2 Locus Controls Cell Cycle Progression by Suppressing p53 Levels.
Chaudhary R, Muys BR, Grammatikakis I, De S, Abdelmohsen K, Li XL, Zhu Y, Daulatabad SV, Tsitsipatis D, Meltzer PS, Gorospe M, Janga SC, Lal A. (2020 Apr 13)
Mol Cell Biol. pii: MCB.00473-19.doi: 10.1128/MCB.00473-19
Circular RNAs (circRNAs) are a class of noncoding RNAs produced by a noncanonical form of alternative splicing called back-splicing. To investigate a potential role of circRNAs in the p53 pathway, we analyzed RNA sequencing (RNA-seq) data from colorectal cancer cell lines (HCT116, RKO, and SW48) that were untreated or treated with a DNA-damaging agent. Surprisingly, unlike the strong p53-dependent induction of hundreds of p53-induced mRNAs upon DNA damage, only a few circRNAs were upregulated from p53-induced genes.
Human protein-RNA interaction network is highly stable across mammals.
Ramakrishnan A, Janga SC. (2019 Dec 30)
BMC Genomics. pii: 10.1186/s12864-019-6330-9.doi: 10.1186/s12864-019-6330-9
RNA-binding proteins (RBPs) are crucial in modulating RNA metabolism in eukaryotes thereby controlling an extensive network of RBP-RNA interactions. Although previous studies on the conservation of RBP targets have been carried out in lower eukaryotes such as yeast, relatively little is known about the extent of conservation of the binding sites of RBPs across mammalian species.
SliceIt: A genome-wide resource and visualization tool to design CRISPR/Cas9 screens for editing protein-RNA interaction sites in the human genome.
Vemuri S, Srivastava R, Mir Q, Hashemikhabir S, Dong XC, Janga SC. (2020 Jun 1)
Methods. pii: S1046-2023(19)30111-2.doi: 10.1016/j.ymeth.2019.09.004
Several protein-RNA cross linking protocols have been established in recent years to delineate the molecular interaction of an RNA Binding Protein (RBP) and its target RNAs. However, functional dissection of the role of the RBP binding sites in modulating the post-transcriptional fate of the target RNA remains challenging. The CRISPR/Cas9 genome editing system is commonly employed to perturb both coding and noncoding regions in the genome. With the advancements in genome-scale CRISPR/Cas9 screens, it is now possible to not only perturb specific binding sites but also probe the global impact of protein-RNA interaction sites across cell types. Here, we present SliceIt (http://sliceit.soic.iupui.edu/), a database hosting an in silico sgRNA (single guide RNA) library to facilitate conducting such high throughput screens. SliceIt comprises ~4.8 million unique sgRNAs with an estimated range of 2-8 sgRNAs designed per RBP binding site, for eCLIP experiments of >100 RBPs in HepG2 and K562 cell lines from the ENCODE project. SliceIt provides a user friendly environment, developed using an advanced search engine framework, Elasticsearch. It is available in both table and genome browser views, facilitating the easy navigation of RBP binding sites, designed sgRNAs, and exon expression levels across 53 human tissues, along with the prevalence of SNPs and GWAS hits on binding sites. Exon expression profiles enable examination of locus specific changes proximal to the binding sites. Users can also upload custom tracks of various file formats directly onto the genome browser, to navigate additional genomic features in the genome and compare with other types of omics profiles. All the binding site-centric information is dynamically accessible via “search by gene”, “search by coordinates” and “search by RBP” options and readily available to download. Validation of the sgRNA library in SliceIt was performed by selecting RBP binding sites in the Lipt1 gene and designing sgRNAs. The effect of CRISPR/Cas9 perturbations on the selected binding sites in the HepG2 cell line was confirmed based on altered proximal exon expression levels using qPCR, further supporting the utility of the resource to design experiments for perturbing protein-RNA interaction networks. Thus, SliceIt provides a one-stop repertoire of guide RNAs to perturb RBP binding sites, along with several layers of functional information to design both low and high throughput CRISPR/Cas9 screens, for studying the phenotypes and diseases associated with RBP binding sites.
Long Non-Coding RNA Expression Levels Modulate Cell-Type-Specific Splicing Patterns by Altering Their Interaction Landscape with RNA-Binding Proteins.
Porto FW, Daulatabad SV, Janga SC. (2019 Aug 6)
Genes (Basel). pii: genes10080593.doi: 10.3390/genes10080593
Recent developments in our understanding of the interactions between long non-coding RNAs (lncRNAs) and cellular components have improved treatment approaches for various human diseases including cancer, vascular diseases, and neurological diseases. Although investigation of specific lncRNAs revealed their role in the metabolism of cellular RNA, our understanding of their contribution to post-transcriptional regulation is relatively limited. In this study, we explore the role of lncRNAs in modulating alternative splicing and their impact on downstream protein-RNA interaction networks. Analysis of alternative splicing events across 39 lncRNA knockdown and wildtype RNA-sequencing datasets from three human cell lines, HeLa (cervical cancer), K562 (myeloid leukemia), and U87 (glioblastoma), resulted in the high-confidence (false discovery rate (FDR) < 0.01) identification of 11,630 skipped exon events and 5895 retained intron events, implicating 759 genes to be impacted at the post-transcriptional level due to the loss of lncRNAs. We observed that a majority of the alternatively spliced genes in a lncRNA knockdown were specific to the cell type. In tandem, the functions annotated to the genes affected by alternative splicing across each lncRNA knockdown also displayed cell-type specificity. To understand the mechanism behind this cell-type-specific alternative splicing pattern, we analyzed RNA-binding protein (RBP)-RNA interaction profiles across the spliced regions in order to observe cell-type-specific alternative splice event RBP binding preference. Despite limited RBP binding data across cell lines, alternatively spliced events detected in lncRNA perturbation experiments were associated with RBPs binding in proximal intron-exon junctions in a cell-type-specific manner. The cellular functions affected by alternative splicing were also affected in a cell-type-specific manner. Based on the RBP binding profiles in HeLa and K562 cells, we hypothesize that several lncRNAs are likely to exhibit a sponge effect in disease contexts, resulting in the functional disruption of RBPs and their downstream functions. We propose that such lncRNA sponges can extensively rewire post-transcriptional gene regulatory networks by altering the protein-RNA interaction landscape in a cell-type-specific manner.
Odyssey: a semi-automated pipeline for phasing, imputation, and analysis of genome-wide genetic data.
Eller RJ, Janga SC, Walsh S. (2019 Jun 28)
BMC Bioinformatics. pii: 10.1186/s12859-019-2964-5.doi: 10.1186/s12859-019-2964-5
Genome imputation, admixture resolution and genome-wide association analyses are time-consuming and computationally intensive processes with many composite and requisite steps. Analysis time increases further when building and installing the run programs required for these analyses. For scientists who may not be well versed in programming languages but want to perform these operations hands-on, there is a lengthy learning curve to utilize the vast number of programs available for these analyses.
Early transcriptome profile of goat peripheral blood mononuclear cells (PBMCs) infected with peste des petits ruminant’s vaccine virus (Sungri/96) revealed induction of antiviral response in an interferon independent manner.
Manjunath S, Saxena S, Mishra B, Santra L, Sahu AR, Wani SA, Tiwari AK, Mishra BP, Singh RK, Janga SC, Kumar GR. (2019 Jun)
Res Vet Sci. pii: S0034-5288(18)30351-5.doi: 10.1016/j.rvsc.2019.03.014
The Sungri/96 vaccine strain is considered the most potent vaccine providing long-term immunity against peste des petits ruminants (PPR) in India. Previous studies in our laboratory highlighted induction of a robust antiviral response in an interferon independent manner at 48 h and 120 h post infection (p.i.). However, the immune response at the earliest time point, 6 h p.i. (the time taken to complete one PPRV life cycle), in PBMCs infected with Sungri/96 vaccine virus has not been investigated. This study was undertaken to profile global gene expression in goat PBMCs at 6 h p.i. with the Sungri/96 PPRV vaccine strain. A total of 1926 differentially expressed genes (DEGs) were identified, with 616 upregulated and 1310 downregulated. TLR7/TLR3, IRF7/IRF1, ISG20, IFIT1/IFIT2, IFITM3, IL27 and TREX1 were identified as key immune sensors and antiviral candidate genes. Interestingly, type I interferons (IFNα/β) were not differentially expressed at this time point either. TREX1, an exonuclease which inhibits type I interferons at the early stage of virus infection, was found to be highly upregulated. IL27, an important antiviral host immune factor, was significantly upregulated. ISG20, an antiviral interferon induced gene with exonuclease activity specific to ssRNA viruses, was highly expressed. Functional profiling of DEGs showed significant enrichment of immune system processes with 233 genes, indicating initiation of immune defense response in host cells. The protein interaction network showed important innate immune molecules in the immune network with high connectivity. The study highlights important immune and antiviral genes at the earliest time point.
Splicing factor ESRP1 controls ER-positive breast cancer by altering metabolic pathways.
Gökmen-Polar Y, Neelamraju Y, Goswami CP, Gu Y, Gu X, Nallamothu G, Vieth E, Janga SC, Ryan M, Badve SS. (2019 Feb)
EMBO Rep. pii: embr.201846078.doi: 10.15252/embr.201846078
The epithelial splicing regulatory proteins 1 and 2 (ESRP1 and ESRP2) control the epithelial-to-mesenchymal transition (EMT) splicing program in cancer. However, their role in breast cancer recurrence is unclear. In this study, we report that high levels of ESRP1, but not ESRP2, are associated with poor prognosis in estrogen receptor positive (ER+) breast tumors. Knockdown of ESRP1 in endocrine-resistant breast cancer models decreases growth significantly and alters the EMT splicing signature, which we confirm using TCGA SpliceSeq data of ER+ BRCA tumors. However, these changes are not accompanied by the development of a mesenchymal phenotype or a change in key EMT-transcription factors. In tamoxifen-resistant cells, knockdown of ESRP1 affects lipid metabolism and oxidoreductase processes, resulting in the decreased expression of fatty acid synthase (FASN), stearoyl-CoA desaturase 1 (SCD1), and phosphoglycerate dehydrogenase (PHGDH) at both the mRNA and protein levels. Furthermore, ESRP1 knockdown increases the basal respiration and spare respiration capacity. This study reports a novel role for ESRP1 that could form the basis for the prevention of tamoxifen resistance in ER+ breast cancer.
Large expert-curated database for benchmarking document similarity detection in biomedical literature search.
Brown P, RELISH Consortium, Zhou Y. (2019 Jan 1)
Database (Oxford). pii: 5608006.doi: 10.1093/database/baz085
Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents that cover a variety of research fields such that newly developed literature search techniques can be compared, improved and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH consortium consisting of more than 1500 scientists from 84 countries, who have collectively annotated the relevance of over 180 000 PubMed-listed articles with regard to their respective seed (input) article/s. The majority of annotations were contributed by highly experienced, original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings descriptors. No systematic biases were observed across different experience levels, research fields or time spent on annotations. More importantly, annotations of the same document pairs contributed by different scientists were highly concordant. We further show that the three representative baseline methods used to generate recommended articles for evaluation (Okapi Best Matching 25, Term Frequency-Inverse Document Frequency and PubMed Related Articles) had similar overall performances. Additionally, we found that these methods each tend to produce distinct collections of recommended articles, suggesting that a hybrid method may be required to completely capture all relevant articles. The established database server located at https://relishdb.ict.griffith.edu.au is freely available for the downloading of annotation data and the blind testing of new methods. We expect that this benchmark will be useful for stimulating the development of new powerful techniques for title and title/abstract-based search engines for relevant articles in biomedical research.
A conserved enhancer regulates Il9 expression in multiple lineages.
Koh B, Abdul Qayum A, Srivastava R, Fu Y, Ulrich BJ, Janga SC, Kaplan MH. (2018 Nov 15)
Nat Commun. pii: 10.1038/s41467-018-07202-0.doi: 10.1038/s41467-018-07202-0
Cytokine genes are regulated by multiple regulatory elements that confer tissue-specific and activation-dependent expression. The cis-regulatory elements of the gene encoding IL-9, a cytokine that promotes allergy, autoimmune inflammation and tumor immunity, have not been defined. Here we identify an enhancer (CNS-25) upstream of the Il9 gene that binds most transcription factors (TFs) that promote Il9 gene expression. Deletion of the enhancer in the mouse germline alters transcription factor binding to the remaining Il9 regulatory elements, and results in diminished IL-9 production in multiple cell types including Th9 cells, and attenuates IL-9-dependent immune responses. Moreover, deletion of the homologous enhancer (CNS-18) in primary human Th9 cultures results in significant decrease of IL-9 production. Thus, Il9 CNS-25/IL9 CNS-18 is a critical and conserved regulatory element for IL-9 production.
Epitranscriptomic Code and Its Alterations in Human Disease.
Kadumuri RV, Janga SC. (2018 Oct)
Trends Mol Med. pii: S1471-4914(18)30149-7.doi: 10.1016/j.molmed.2018.07.010
Innovations in epitranscriptomics have resulted in the identification of more than 160 RNA modifications to date. These developments, together with the recent discovery of writers, readers, and erasers of modifications occurring across a wide range of RNAs and tissue types, have led to a surge in integrative approaches for transcriptome-wide mapping of modifications and protein-RNA interaction profiles of epitranscriptome players. RNA modification maps and crosstalk between them have begun to elucidate the role of modifications as signaling switches, entertaining the notion of an epitranscriptomic code as a driver of the post-transcriptional fate of RNA. Emerging single-molecule sequencing technologies and development of antibodies specific to various RNA modifications could enable charting of transcript-specific epitranscriptomic marks across cell types and their alterations in disease.
Loss of epigenetic regulator TET2 and oncogenic KIT regulate myeloid cell transformation via PI3K pathway.
Palam LR, Mali RS, Ramdas B, Srivatsan SN, Visconte V, Tiu RV, Vanhaesebroeck B, Roers A, Gerbaulet A, Xu M, Janga SC, Takemoto CM, Paczesny S, Kapur R. (2018 Feb 22)
JCI Insight. pii: 94679.doi: 10.1172/jci.insight.94679
Mutations in KIT and TET2 are associated with myeloid malignancies. We show that loss of TET2-induced PI3K activation and -increased proliferation is rescued by targeting the p110α/δ subunits of PI3K. RNA-Seq revealed a hyperactive c-Myc signature in Tet2-/- cells, which is normalized by inhibiting PI3K signaling. Loss of TET2 impairs the maturation of myeloid lineage-derived mast cells by dysregulating the expression of Mitf and Cebpa, which is restored by low-dose ascorbic acid and 5-azacytidine. Utilizing a mouse model in which the loss of TET2 precedes the expression of oncogenic Kit, similar to the human disease, results in the development of a non-mast cell lineage neoplasm (AHNMD), which is responsive to PI3K inhibition. Thus, therapeutic approaches involving hypomethylating agents, ascorbic acid, and isoform-specific PI3K inhibitors are likely to be useful for treating patients with TET2 and KIT mutations.
Express: A database of transcriptome profiles encompassing known and novel transcripts across multiple development stages in eye tissues.
Budak G, Dash S, Srivastava R, Lachke SA, Janga SC. (2018 Mar)
Exp Eye Res. pii: S0014-4835(16)30560-7.doi: 10.1016/j.exer.2018.01.009
Advances in sequencing have facilitated nucleotide-resolution genome-wide transcriptomic profiles across multiple mouse eye tissues. However, these RNA sequencing (RNA-seq) based eye developmental transcriptomes are not organized for easy public access, making any further analysis challenging. Here, we present a new database “Express” (http://www.iupui.edu/~sysbio/express/) that unifies various mouse lens and retina RNA-seq data and provides user-friendly visualization of the transcriptome to facilitate gene discovery in the eye. We obtained RNA-seq data encompassing 7 developmental stages of lens in addition to that on isolated lens epithelium and fibers, as well as on 11 developmental stages of retina/isolated retinal rod photoreceptor cells from publicly available wild-type mouse datasets. These datasets were pre-processed, aligned, quantified and normalized for expression levels of known and novel transcripts using a unified expression quantification framework. Express provides heatmap and browser views allowing easy navigation of the genomic organization of transcripts or gene loci. Further, it allows users to search candidate genes and export both the visualizations and the embedded data to facilitate downstream analysis. We identified a total of >81,000 transcripts in the lens and >178,000 transcripts in the retina across all the included developmental stages. This analysis revealed that a significant number of the retina-expressed transcripts are novel. Expression of several transcripts in the lens and retina across multiple developmental stages was independently validated by RT-qPCR for established genes such as Pax6 and Lhx2 as well as for new candidates such as Elavl4, Rbm5, Pabpc1, Tia1 and Tubb2b. Thus, Express serves as an effective portal for analyzing pruned RNA-seq expression datasets presently collected for the lens and retina. It will provide a wild-type context for the detailed analysis of targeted gene-knockout mouse ocular defect models and facilitate the prioritization of candidate genes from Exome-seq data of eye disease patients.
Mutational landscape of RNA-binding proteins in human cancers.
Neelamraju Y, Gonzalez-Perez A, Bhat-Nakshatri P, Nakshatri H, Janga SC. (2018 Jan 2)
RNA Biol.doi: 10.1080/15476286.2017.1391436
RNA Binding Proteins (RBPs) are a class of post-transcriptional regulatory molecules which are increasingly documented to be dysfunctional in cancer genomes. However, our current understanding of these alterations is limited. Here, we delineate the mutational landscape of ∼1300 RBPs in ∼6000 cancer genomes. Our analysis revealed that RBPs have an average of ∼3 mutations per Mb across 26 cancer types. We identified 281 RBPs to be enriched for mutations (GEMs) in at least one cancer type. GEM RBPs were found to undergo frequent frameshift and inframe deletions as well as missense, nonsense and silent mutations when compared to those that are not enriched for mutations. Functional analysis of these RBPs revealed the enrichment of pathways associated with apoptosis, splicing and translation. Using the OncodriveFM framework, we also identified more than 200 candidate driver RBPs that were found to accumulate functionally impactful mutations in at least one cancer. Expression levels of 15% of these driver RBPs exhibited a significant difference when transcriptome groups with and without deleterious mutations were compared. The functional interaction network of the driver RBPs revealed the enrichment of spliceosomal machinery, suggesting a plausible mechanism for tumorigenesis, while network analysis of the protein interactions between RBPs unambiguously revealed higher degree, betweenness and closeness centrality for driver RBPs compared to non-drivers. Analysis to reveal cancer-specific Ribonucleoprotein (RNP) mutational hotspots showed extensive rewiring even among common drivers between cancer types. Knockdown experiments on pan-cancer drivers such as SF3B1 and PRPF8 in breast cancer cell lines revealed cancer subtype specific functions like selective stem cell features, indicating a plausible means for RBPs to mediate cancer-specific phenotypes. Hence, this study would form a foundation to uncover the contribution of the mutational spectrum of RBPs in dysregulating the post-transcriptional regulatory networks in different cancer types.
Transcriptome analysis of developing lens reveals abundance of novel transcripts and extensive splicing alterations.
Srivastava R, Budak G, Dash S, Lachke SA, Janga SC. (2017 Sep 14)
Sci Rep. pii: 10.1038/s41598-017-10615-4.doi: 10.1038/s41598-017-10615-4
Lens development involves a complex and highly orchestrated regulatory program. Here, we investigate the transcriptomic alterations and splicing events during mouse lens formation using RNA-seq data from multiple developmental stages, and construct a molecular portrait of known and novel transcripts. We show that the extent of novelty of expressed transcripts decreases significantly in post-natal lens compared to embryonic stages. Characterization of novel transcripts into partially novel transcripts (PNTs) and completely novel transcripts (CNTs) (novelty score ≥ 70%) revealed that the PNTs are both highly conserved across vertebrates and highly expressed across multiple stages. Functional analysis of PNTs revealed their widespread role in lens developmental processes while hundreds of CNTs were found to be widely expressed and predicted to encode proteins. We verified the expression of four CNTs across stages. Examination of splice isoforms revealed skipped exon and retained intron to be the most abundant alternative splicing events during lens development. We validated, by RT-PCR and Sanger sequencing, the predicted splice isoforms of several genes: Banf1, Cdk4, Cryaa, Eif4g2, Pax6, and Rbm5. Finally, we present a splicing browser, Eye Splicer (http://www.iupui.edu/~sysbio/eye-splicer/), to facilitate exploration of developmentally altered splicing events and to improve understanding of post-transcriptional regulatory networks during mouse lens development.
RNA Editing in Pathogenesis of Cancer.
Baysal BE, Sharma S, Hashemikhabir S, Janga SC. (2017 Jul 15)
Cancer Res. pii: 0008-5472.CAN-17-0520.doi: 10.1158/0008-5472.CAN-17-0520
Several adenosine or cytidine deaminase enzymes deaminate transcript sequences in a cell type or environment-dependent manner by a programmed process called RNA editing. RNA editing enzymes catalyze A>I or C>U transcript alterations and have the potential to change protein coding sequences. In this brief review, we highlight some recent work that shows aberrant patterns of RNA editing in cancer. Transcriptome sequencing studies reveal increased or decreased global RNA editing levels depending on the tumor type. Altered RNA editing in cancer cells may provide a selective advantage for tumor growth and resistance to apoptosis. RNA editing may promote cancer by dynamically recoding oncogenic genes, regulating oncogenic gene expression by noncoding RNA and miRNA editing, or by transcriptome scale changes in RNA editing levels that may affect innate immune signaling. Although RNA editing markedly increases complexity of the cancer cell transcriptomes, cancer-specific recoding RNA editing events have yet to be discovered. Epitranscriptomic changes by RNA editing in cancer represent a novel mechanism contributing to sequence diversity independently of DNA mutations. Therefore, RNA editing studies should complement genome sequence data to understand the full impact of nucleic acid sequence alterations in cancer.
PSIP1/p75 promotes tumorigenicity in breast cancer cells by promoting the transcription of cell cycle genes.
Singh DK, Gholamalamdari O, Jadaliha M, Ling Li X, Lin YC, Zhang Y, Guang S, Hashemikhabir S, Tiwari S, Zhu YJ, Khan A, Thomas A, Chakraborty A, Macias V, Balla AK, Bhargava R, Janga SC, Ma J, Prasanth SG, Lal A, Prasanth KV. (2017 Oct 1)
Carcinogenesis. pii: 3869807.doi: 10.1093/carcin/bgx062
Breast cancer (BC) is a highly heterogeneous disease, both at the pathological and molecular level, and several chromatin-associated proteins play crucial roles in BC initiation and progression. Here, we demonstrate the role of PSIP1 (PC4 and SF2 interacting protein)/p75 (LEDGF) in BC progression. PSIP1/p75, previously identified as a chromatin-adaptor protein, is found to be upregulated in basal-like/triple negative breast cancer (TNBC) patient samples and cell lines. Immunohistochemistry in tissue arrays showed elevated levels of PSIP1 in metastatic invasive ductal carcinoma. Survival data analyses revealed that the levels of PSIP1 showed a negative association with TNBC patient survival. Depletion of PSIP1/p75 significantly reduced the tumorigenicity and metastatic properties of TNBC cell lines while its over-expression promoted tumorigenicity. Further, gene expression studies revealed that PSIP1 regulates the expression of genes controlling cell-cycle progression, cell migration and invasion. Finally, by interacting with RNA polymerase II, PSIP1/p75 facilitates the association of RNA pol II to the promoter of cell cycle genes and thereby regulates their transcription. Our findings demonstrate an important role of PSIP1/p75 in TNBC tumorigenicity by promoting the expression of genes that control the cell cycle and tumor metastasis.
Paracrine IL-2 Is Required for Optimal Type 2 Effector Cytokine Production.
Olson MR, Ulrich BJ, Hummel SA, Khan I, Meuris B, Cherukuri Y, Dent AL, Janga SC, Kaplan MH. (2017 Jun 1)
J Immunol. pii: jimmunol.1601792.doi: 10.4049/jimmunol.1601792
IL-2 is a pleiotropic cytokine that promotes the differentiation of Th cell subsets, including Th1, Th2, and Th9 cells, but it impairs the development of Th17 and T follicular helper cells. Although IL-2 is produced by all polarized Th subsets to some level, how it impacts cytokine production when effector T cells are restimulated is unknown. We show in this article that Golgi transport inhibitors (GTIs) blocked IL-9 production. Mechanistically, GTIs blocked secretion of IL-2 that normally feeds back in a paracrine manner to promote STAT5 activation and IL-9 production. IL-2 feedback had no effect on Th1- or Th17-signature cytokine production, but it promoted Th2- and Th9-associated cytokine expression. These data suggest that the use of GTIs results in an underestimation of the presence of type 2 cytokine-secreting cells and highlight IL-2 as a critical component in optimal cytokine production by Th2 and Th9 cells in vitro and in vivo.
Seten: a tool for systematic identification and comparison of processes, phenotypes, and diseases associated with RNA-binding proteins from condition-specific CLIP-seq profiles.
Budak G, Srivastava R, Janga SC. (2017 Jun)
RNA. pii: rna.059089.116.doi: 10.1261/rna.059089.116
RNA-binding proteins (RBPs) control the regulation of gene expression in eukaryotic genomes at post-transcriptional level by binding to their cognate RNAs. Although several variants of CLIP (crosslinking and immunoprecipitation) protocols are currently available to study the global protein-RNA interaction landscape at single-nucleotide resolution in a cell, currently there are very few tools that can facilitate understanding and dissecting the functional associations of RBPs from the resulting binding maps. Here, we present Seten, a web-based and command line tool, which can identify and compare processes, phenotypes, and diseases associated with RBPs from condition-specific CLIP-seq profiles. Seten uses BED files resulting from most peak calling algorithms, which include scores reflecting the extent of binding of an RBP on the target transcript, to provide both traditional functional enrichment as well as gene set enrichment results for a number of gene set collections including BioCarta, KEGG, Reactome, Gene Ontology (GO), Human Phenotype Ontology (HPO), and MalaCards Disease Ontology for several organisms including fruit fly, human, mouse, rat, worm, and yeast. It also provides an option to dynamically compare the associated gene sets across data sets as bubble charts, to facilitate comparative analysis. Benchmarking of Seten using eCLIP data for IGF2BP1, SRSF7, and PTBP1 against their corresponding CRISPR RNA-seq in K562 cells as well as randomized negative controls, demonstrated that its gene set enrichment method outperforms functional enrichment, with scores significantly contributing to the discovery of true annotations. Comparative performance analysis using these CRISPR control data sets revealed significantly higher precision and comparable recall to that observed using ChIP-Enrich. 
Seten’s web interface currently provides precomputed results for about 200 CLIP-seq data sets, and both the command line and web interfaces can be used to analyze CLIP-seq data sets. We highlight several examples to show the utility of Seten for rapid profiling of various CLIP-seq data sets. Seten is available at http://www.iupui.edu/~sysbio/seten/.
Community-acquired rhinovirus infection is associated with changes in the airway microbiome.
Kloepfer KM, Sarsani VK, Poroyko V, Lee WM, Pappas TE, Kang T, Grindle KA, Bochkov YA, Janga SC, Lemanske RF Jr, Gern JE. (2017 Jul)
J Allergy Clin Immunol. pii: S0091-6749(17)30411-6.doi: 10.1016/j.jaci.2017.01.038
No abstract found.
Comparative and temporal transcriptome analysis of peste des petits ruminants virus infected goat peripheral blood mononuclear cells.
Manjunath S, Mishra BP, Mishra B, Sahoo AP, Tiwari AK, Rajak KK, Muthuchelvan D, Saxena S, Santra L, Sahu AR, Wani SA, Singh RP, Singh YP, Pandey A, Kanchan S, Singh RK, Kumar GR, Janga SC. (2017 Feb 2)
Virus Res. pii: S0168-1702(16)30549-4.doi: 10.1016/j.virusres.2016.12.014
Peste des petits ruminants virus (PPRV), a morbillivirus, causes an acute, highly contagious disease, peste des petits ruminants (PPR), affecting goats and sheep. Sungri/96 vaccine strain is widely used for mass vaccination programs in India against PPR and is considered the most potent vaccine providing long-term immunity. However, occurrence of outbreaks due to emerging PPR viruses may be a challenge. In this study, the temporal dynamics of immune response in goat peripheral blood mononuclear cells (PBMCs) infected with Sungri/96 vaccine virus was investigated by transcriptome analysis. Infected goat PBMCs at 48h and 120h post infection revealed 2540 and 2000 differentially expressed genes (DEGs), respectively, on comparison with respective controls. Comparison of the infected samples revealed 1416 DEGs to be altered across time points. Functional analysis of DEGs reflected enrichment of TLR signaling pathways, innate immune response, inflammatory response, positive regulation of signal transduction and cytokine production. The upregulation of innate immune genes during early phase (between 2-5 days), viz. interferon regulatory factors (IRFs), tripartite motifs (TRIM) and several interferon stimulated genes (ISGs), in infected PBMCs and interactome analysis indicated induction of a broad-spectrum anti-viral state. Several transcription factors (IRF3, FOXO3 and SP1) that govern immune regulatory pathways were identified to co-regulate the DEGs. The results from this study highlighted the involvement of both innate and adaptive immune systems, with the enrichment of complement cascade observed at 120h p.i. suggestive of a link between innate and adaptive immune response. Based on the transcriptome analysis and qRT-PCR validation, an in vitro mechanism for the induction of ISGs by IRFs in an interferon-independent manner to trigger a robust immune response was predicted in PPRV infection.
The RavA-ViaA Chaperone-Like System Interacts with and Modulates the Activity of the Fumarate Reductase Respiratory Complex.
Wong KS, Bhandari V, Janga SC, Houry WA. (2017 Jan 20)
J Mol Biol. pii: S0022-2836(16)30537-X.doi: 10.1016/j.jmb.2016.12.008
Regulatory ATPase variant A (RavA) is a MoxR AAA+ protein that functions together with a partner protein that we termed VWA interacting with AAA+ ATPase (ViaA) containing a von Willebrand Factor A domain. However, the functional role of RavA-ViaA in the cell is not yet well established. Here, we show that RavA-ViaA are functionally associated with anaerobic respiration in Escherichia coli through interactions with the fumarate reductase (Frd) electron transport complex. Expression analysis of ravA and viaA genes showed that both proteins are co-expressed with multiple anaerobic respiratory genes, many of which are regulated by the anaerobic transcriptional regulator Fnr. Consistently, the expression of both ravA and viaA was found to be dependent on Fnr in cells grown under oxygen-limiting condition. ViaA was found to physically interact with FrdA, the flavin-containing subunit of the Frd complex. Both RavA and the Fe-S-containing subunit of the Frd complex, FrdB, regulate this interaction. Importantly, Frd activity was observed to increase in the absence of RavA and ViaA. This indicates that RavA and ViaA modulate the activity of the Frd complex, signifying a potential regulatory chaperone-like function for RavA-ViaA during bacterial anaerobic respiration with fumarate as the terminal electron acceptor.
Differential Expression of miRNAs in Nontumor Liver Tissue of Patients With Hepatocellular Cancer Caused by Nonalcoholic Steatohepatitis Cirrhosis.
Liang T, Chalasani NP, Williams KE, Sarasani V, Janga SC, Vuppalanchi R. (2017 Mar)
Clin Gastroenterol Hepatol. pii: S1542-3565(16)30981-8.doi: 10.1016/j.cgh.2016.10.017
No abstract found.
Benchmarking of de novo assembly algorithms for Nanopore data reveals optimal performance of OLC approaches.
Cherukuri Y, Janga SC. (2016 Aug 22)
BMC Genomics. pii: 10.1186/s12864-016-2895-8.doi: 10.1186/s12864-016-2895-8
Improved DNA sequencing methods have transformed the field of genomics over the last decade. This has become possible due to the development of inexpensive short read sequencing technologies which have now resulted in three generations of sequencing platforms. More recently, a fourth generation of Nanopore-based single-molecule sequencing technology was developed, based on the MinION® sequencer, which is portable, inexpensive and fast. It is capable of generating reads of length greater than 100 kb. Though it has many specific advantages, the two major limitations of the MinION reads are high error rates and the need for the development of downstream pipelines. The algorithms for error correction have already emerged, while development of pipelines is still at a nascent stage.
ExSurv: A Web Resource for Prognostic Analyses of Exons Across Human Cancers Using Clinical Transcriptomes.
Hashemikhabir S, Budak G, Janga SC. (2016)
Cancer Inform. pii: cin-suppl.2-2016-017.doi: 10.4137/CIN.S39367
Survival analysis in biomedical sciences is generally performed by correlating the levels of cellular components with patients’ clinical features as a common practice in prognostic biomarker discovery. While the common and primary focus of such analysis in cancer genomics so far has been to identify the potential prognostic genes, alternative splicing – a posttranscriptional regulatory mechanism that affects the functional form of a protein due to inclusion or exclusion of individual exons giving rise to alternative protein products, has increasingly gained attention due to the prevalence of splicing aberrations in cancer transcriptomes. Hence, uncovering the potential prognostic exons can not only help in rationally designing exon-specific therapeutics but also increase specificity toward more personalized treatment options. To address this gap and to provide a platform for rational identification of prognostic exons from cancer transcriptomes, we developed ExSurv (https://exsurv.soic.iupui.edu), a web-based platform for predicting the survival contribution of all annotated exons in the human genome using RNA sequencing-based expression profiles for cancer samples from four cancer types available from The Cancer Genome Atlas. ExSurv enables users to search for a gene of interest and shows survival probabilities for all the exons associated with a gene and found to be significant at the chosen threshold. ExSurv also includes raw expression values across the cancer cohort as well as the survival plots for prognostic exons. Our analysis of the resulting prognostic exons across four cancer types revealed that most of the survival-associated exons are unique to a cancer type with few processes such as cell adhesion, carboxylic, fatty acid metabolism, and regulation of T-cell signaling common across cancer types, possibly suggesting significant differences in the posttranscriptional regulatory pathways contributing to prognosis.
Prediction and Validation of Transcription Factors Modulating the Expression of Sestrin3 Gene Using an Integrated Computational and Experimental Approach.
Srivastava R, Zhang Y, Xiong X, Zhang X, Pan X, Dong XC, Liangpunsakul S, Janga SC. (2016)
PLoS One. pii: PONE-D-16-00328.doi: 10.1371/journal.pone.0160228
SESN3 has been implicated in multiple biological processes including protection against oxidative stress, regulation of glucose and lipid metabolism. However, little is known about the factors and mechanisms controlling its gene expression at the transcriptional level. We performed in silico phylogenetic footprinting analysis of 5 kb upstream regions of a diverse set of human SESN3 orthologs for the identification of high confidence conserved binding motifs (BMo). We further analyzed the predicted BMo by a motif comparison tool to identify the TFs likely to bind these discovered motifs. Predicted TFs were then integrated with experimentally known protein-protein interactions and experimentally validated to delineate the important transcriptional regulators of SESN3. Our study revealed high confidence set of BMos (integrated with DNase I hypersensitivity sites) in the upstream regulatory regions of SESN3 that could be bound by transcription factors from multiple families including FOXOs, SMADs, SOXs, TCFs and HNF4A. TF-TF network analysis established hubs of interaction that include SMAD3, TCF3, SMAD2, HDAC2, SOX2, TAL1 and TCF12 as well as the likely protein complexes formed between them. We show using ChIP-PCR as well as over-expression and knock out studies that FOXO3 and SOX2 transcriptionally regulate the expression of SESN3 gene. Our findings provide an important roadmap to further our understanding on the regulation of SESN3.
Dissecting the expression relationships between RNA-binding proteins and their cognate targets in eukaryotic post-transcriptional regulatory networks.
Nishtala S, Neelamraju Y, Janga SC. (2016 May 10)
Sci Rep. pii: srep25711.doi: 10.1038/srep25711
RNA-binding proteins (RBPs) are pivotal in orchestrating several steps in the metabolism of RNA in eukaryotes, thereby controlling an extensive network of RBP-RNA interactions. Here, we employed CLIP (cross-linking immunoprecipitation)-seq datasets for 60 human RBPs and RIP-ChIP (RNP immunoprecipitation-microarray) data for 69 yeast RBPs to construct a network of genome-wide RBP-target RNA interactions for each RBP. We show in humans that the majority (~78%) of the RBPs are strongly associated with their target transcripts at transcript level, while ~95% of the studied RBPs were also found to be strongly associated with expression levels of target transcripts when protein expression levels of RBPs were employed. At transcript level, RBP-RNA interaction data for the yeast genome exhibited a strong association for 63% of the RBPs, confirming the association to be conserved across large phylogenetic distances. Analysis to uncover the features contributing to these associations revealed the number of target transcripts and the length of the selected protein-coding transcript of an RBP to be significant at the transcript level, while the intensity of the CLIP signal, the number of RNA-binding domains, and the location of the binding site on the transcript were significant at the protein level. Our analysis will contribute to improved modelling and prediction of post-transcriptional networks.
RNA-binding proteins in eye development and disease: implication of conserved RNA granule components.
Dash S, Siddam AD, Barnum CE, Janga SC, Lachke SA. (2016 Jul)
Wiley Interdiscip Rev RNA.doi: 10.1002/wrna.1355
The molecular biology of metazoan eye development is an area of intense investigation. These efforts have led to the surprising recognition that although insect and vertebrate eyes have dramatically different structures, the orthologs or family members of several conserved transcription and signaling regulators such as Pax6, Six3, Prox1, and Bmp4 are commonly required for their development. In contrast, our understanding of posttranscriptional regulation in eye development and disease, particularly regarding the function of RNA-binding proteins (RBPs), is limited. We examine the present knowledge of RBPs in eye development in the insect model Drosophila as well as several vertebrate models such as fish, frog, chicken, and mouse. Interestingly, of the 42 RBPs that have been investigated for their expression or function in vertebrate eye development, 24 (~60%) are recognized in eukaryotic cells as components of RNA granules such as processing bodies, stress granules, or other specialized ribonucleoprotein (RNP) complexes. We discuss the distinct developmental and cellular events that may necessitate potential RBP/RNA granule-associated RNA regulon models to facilitate posttranscriptional control of gene expression in eye morphogenesis. In support of these hypotheses, three RBPs and RNP/RNA granule components Tdrd7, Caprin2, and Stau2 are linked to ocular developmental defects such as congenital cataract, Peters anomaly, and microphthalmia in human patients or animal models. We conclude by discussing the utility of interdisciplinary approaches such as the bioinformatics tool iSyTE (integrated Systems Tool for Eye gene discovery) to prioritize RBPs for deriving posttranscriptional regulatory networks in eye development and disease. WIREs RNA 2016, 7:527-557. doi: 10.1002/wrna.1355
Knowledge Discovery Using Big Data in Biomedical Systems.
Janga SC, Zhu D, Chen JY, Zaki MJ. (2015 Jul-Aug)
IEEE/ACM Trans Comput Biol Bioinform.doi: 10.1109/tcbb.2015.2454551
No abstract found.
Building integrated ontological knowledge structures with efficient approximation algorithms.
Xiang Y, Janga SC. (2015)
Biomed Res Int.doi: 10.1155/2015/501528
The integration of ontologies builds knowledge structures which brings new understanding on existing terminologies and their associations. With the steady increase in the number of ontologies, automatic integration of ontologies is preferable over manual solutions in many applications. However, available works on ontology integration are largely heuristic without guarantees on the quality of the integration results. In this work, we focus on the integration of ontologies with hierarchical structures. We identified optimal structures in this problem and proposed optimal and efficient approximation algorithms for integrating a pair of ontologies. Furthermore, we extend the basic problem to address the integration of a large number of ontologies, and correspondingly we proposed an efficient approximation algorithm for integrating multiple ontologies. The empirical study on both real ontologies and synthetic data demonstrates the effectiveness of our proposed approaches. In addition, the results of integration between gene ontology and National Drug File Reference Terminology suggest that our method provides a novel way to perform association studies between biomedical terms.
OperomeDB: A Database of Condition-Specific Transcription Units in Prokaryotic Genomes.
Chetal K, Janga SC. (2015)
Biomed Res Int.doi: 10.1155/2015/318217
Background. In prokaryotic organisms, a substantial fraction of adjacent genes are organized into operons: codirectionally organized genes in prokaryotic genomes with a common promoter and terminator. Although several available operon databases provide information with varying levels of reliability, very few resources provide experimentally supported results. Therefore, we believe that the biological community could benefit from having a new operon prediction database with operons predicted using next-generation RNA-seq datasets. Description. We present operomeDB, a database which provides an ensemble of all the predicted operons for bacterial genomes using available RNA-sequencing datasets across a wide range of experimental conditions. Although several studies have recently confirmed that prokaryotic operon structure is dynamic with significant alterations across environmental and experimental conditions, there are no comprehensive databases for studying such variations across prokaryotic transcriptomes. Currently our database contains nine bacterial organisms and 168 transcriptomes for which we predicted operons. The user interface is simple and easy to use, in terms of visualization, downloading, and querying of data. In addition, because of its ability to load custom datasets, users can also compare their datasets with publicly available transcriptomic data of an organism. Conclusion. OperomeDB as a database should not only aid experimental groups working on transcriptome analysis of specific organisms but also enable studies related to computational and comparative operomics.
A Framework for Identifying Genotypic Information from Clinical Records: Exploiting Integrated Ontology Structures to Transfer Annotations between ICD Codes and Gene Ontologies.
Hashemikhabir S, Xia R, Xiang Y, Janga SC. (2018 Jul-Aug)
IEEE/ACM Trans Comput Biol Bioinform.doi: 10.1109/TCBB.2015.2480056
Although some methods are proposed for automatic ontology generation, none of them address the issue of integrating large-scale heterogeneous biomedical ontologies. We propose a novel approach for integrating various types of ontologies efficiently and apply it to integrate International Classification of Diseases, Ninth Revision, Clinical Modification (ICD9CM), and Gene Ontologies. This approach is one of the early attempts to quantify the associations among clinical terms (e.g., ICD9 codes) based on their corresponding genomic relationships. We reconstructed a merged tree for a partial set of GO and ICD9 codes and measured the performance of this tree in terms of associations’ relevance by comparing them with two well-known disease-gene datasets (i.e., MalaCards and Disease Ontology). Furthermore, we compared the genomic-based ICD9 associations to temporal relationships between them from electronic health records. Our analysis shows promising associations supported by both comparisons suggesting a high reliability. We also manually analyzed several significant associations and found promising support from literature.
Database of RNA binding protein expression and disease dynamics (READ DB).
Hashemikhabir S, Neelamraju Y, Janga SC. (2015)
Database (Oxford). pii: bav072.doi: 10.1093/database/bav072
RNA Binding Protein (RBP) Expression and Disease Dynamics database (READ DB) is a non-redundant, curated database of human RBPs. RBPs curated from different experimental studies are reported with their annotation, tissue-wide RNA and protein expression levels, evolutionary conservation, disease associations, protein-protein interactions, microRNA predictions, their known RNA recognition sequence motifs as well as predicted binding targets and associated functional themes, providing a one stop portal for understanding the expression, evolutionary trajectories and disease dynamics of RBPs in the context of post-transcriptional regulatory networks.
Differential miRNA Expression in Cells and Matrix Vesicles in Vascular Smooth Muscle Cells from Rats with Kidney Disease.
Chaturvedi P, Chen NX, O’Neill K, McClintick JN, Moe SM, Janga SC. (2015)
PLoS One. pii: PONE-D-15-06910.doi: 10.1371/journal.pone.0131589
Vascular calcification is a complex process and has been associated with aging, diabetes, and chronic kidney disease (CKD). Although there have been several studies that examine the role of miRNAs (miRs) in bone osteogenesis, little is known about the role of miRs in vascular calcification and their role in the pathogenesis of vascular abnormalities. Matrix vesicles (MV) are known to play an important role in initiating vascular smooth muscle cell (VSMC) calcification. In the present study, we performed miRNA microarray analysis to identify the dysregulated miRs between MV and VSMC derived from CKD rats to understand the role of post-transcriptional regulatory networks governed by these miRNAs in vascular calcification and to uncover the differential miRNA content of MV. The percentage of miRNA to total RNA was increased in MV compared to VSMC. Comparison of expression profiles of miRNA by microarray demonstrated 33 miRs to be differentially expressed, with the majority (~57%) of them down-regulated. Target genes controlled by differentially expressed miRNAs were identified utilizing two complementary computational approaches, miRanda and TargetScan, to understand the functions and pathways that may be affected due to the production of MV from calcifying VSMC, thereby contributing to the regulation of genes by miRs. We found several processes including vascular smooth muscle contraction, response to hypoxia and regulation of muscle cell differentiation to be enriched. Signaling pathways identified included MAP-kinase and Wnt signaling, which have previously been shown to be important in vascular calcification. In conclusion, our results demonstrate that miRs are concentrated in MV from calcifying VSMC, and that important functions and pathways are affected by the miR dysregulation between calcifying VSMC and the MV they produce.
This suggests that miRs may play a very important regulatory role in vascular calcification in CKD by controlling an extensive network of post-transcriptional targets.
The human RBPome: from genes and proteins to human disease.
Neelamraju Y, Hashemikhabir S, Janga SC. (2015 Sep 8)
J Proteomics. pii: S1874-3919(15)00230-4.doi: 10.1016/j.jprot.2015.04.031
RNA binding proteins (RBPs) play a central role in mediating post transcriptional regulation of genes. However less is understood about them and their regulatory mechanisms. In this study, we construct a catalogue of 1344 experimentally confirmed RBPs. The domain architecture of RBPs enabled us to classify them into three groups – Classical (29%), Non-classical (19%) and unclassified (52%). A higher percentage of proteins with unclassified domains reveals the presence of various uncharacterised motifs that can potentially bind RNA. RBPs were found to be highly disordered compared to Non-RBPs (p<2.2e-16, Fisher's exact test), suggestive of a dynamic regulatory role of RBPs in cellular signalling and homeostasis. Evolutionary analysis in 62 different species showed that RBPs are highly conserved compared to Non-RBPs (p<2.2e-16, Wilcox-test), reflecting the conservation of various biological processes like mRNA splicing and ribosome biogenesis. The expression patterns of RBPs from human proteome map revealed that ~40% of them are ubiquitously expressed and ~60% are tissue-specific. RBPs were also seen to be highly associated with several neurological disorders, cancer and inflammatory diseases. Anatomical contexts like B cells, T-cells, foetal liver and foetal brain were found to be strongly enriched for RBPs, implying a prominent role of RBPs in immune responses and different developmental stages. The catalogue and meta-analysis presented here should form a foundation for furthering our understanding of RBPs and the cellular networks they control, in years to come. This article is part of a Special Issue entitled: Proteomics in India.
Genomic analysis of host – Peste des petits ruminants vaccine viral transcriptome uncovers transcription factors modulating immune regulatory pathways.
Manjunath S, Kumar GR, Mishra BP, Mishra B, Sahoo AP, Joshi CG, Tiwari AK, Rajak KK, Janga SC. (2015 Feb 24)
Vet Res. pii: s13567-015-0153-8.doi: 10.1186/s13567-015-0153-8
Peste des petits ruminants (PPR) is an acute transboundary viral disease of economic importance, affecting goats and sheep. Mass vaccination programs around the world resulted in the decline of PPR outbreaks. Sungri 96 is a live attenuated vaccine, widely used in Northern India against PPR. This vaccine virus, isolated from goat, works efficiently in both sheep and goat. Global gene expression changes under PPR vaccine virus infection are not yet well defined. Therefore, in this study we investigated the host-vaccine virus interactions by infecting the peripheral blood mononuclear cells isolated from goat with PPRV (Sungri 96 vaccine virus), to quantify the global changes in the transcriptomic signature by RNA-sequencing. Viral genome of Sungri 96 vaccine virus was assembled from the PPRV infected transcriptome, confirming the infection and demonstrating the feasibility of building a complete non-host genome from the blood transcriptome. Comparison of infected transcriptome with control transcriptome revealed 985 differentially expressed genes. Functional analysis showed enrichment of immune regulatory pathways under PPRV infection. Key genes involved in immune system regulation, spliceosomal and apoptotic pathways were identified to be dysregulated. Network analysis revealed that the protein-protein interaction network among differentially expressed genes is significantly disrupted in infected state. Several genes encoding transcription factors (TFs) that govern immune regulatory pathways were identified to co-regulate the differentially expressed genes. These data provide insights into the host-PPRV vaccine virus interactome for the first time. Our findings suggested dysregulation of immune regulatory pathways and genes encoding TFs that govern these pathways in response to viral infection.
Uncovering RNA binding proteins associated with age and gender during liver maturation.
Chaturvedi P, Neelamraju Y, Arif W, Kalsotra A, Janga SC. (2015 Mar 31)
Sci Rep. pii: srep09512.doi: 10.1038/srep09512
In the present study, we perform an association analysis focusing on the expression changes of 1344 RNA Binding proteins (RBPs) as a function of age and gender in human liver. We identify 88 and 45 RBPs to be significantly associated with age and gender respectively. Experimental verification of several of the predicted associations in mice confirmed our findings. Our results suggest that a small fraction of the gender-associated RBPs (~40%) are expressed higher in males than females. Altogether, these observations show that several of these RBPs are important and conserved regulators in maintaining liver function. Further analysis of the protein interaction network of RBPs associated with age and gender based on the centrality measures like degree, betweenness and closeness revealed that several of these RBPs might be prominent players in aging liver and impart gender specific alterations in gene expression via the formation of protein complexes. Indeed, both age and gender-associated RBPs in liver were found to show significantly higher clustering coefficients and network centrality measures compared to non-associated RBPs. The compendium of RBPs and this study will help us gain insight into the role of post-transcriptional regulatory molecules in aging and gender specific expression of genes.
Prognostic impact of HOTAIR expression is restricted to ER-negative breast cancers.
Gökmen-Polar Y, Vladislav IT, Neelamraju Y, Janga SC, Badve S. (2015 Mar 5)
Sci Rep. pii: srep08765.doi: 10.1038/srep08765
Expression of HOX transcript antisense intergenic RNA (HOTAIR), a large intergenic noncoding RNA (lincRNA), has been described as a metastases-associated lincRNA in various cancers including breast, liver and colon cancers. We sought to determine if expression of HOTAIR could be used as a surrogate for assessing nodal metastases and evaluated RNA in situ hybridization (RNA-ISH) assay in a tissue microarray constructed from 133 breast cancer patients. The prognostic value of HOTAIR was further validated in large cohorts using The Cancer Genome Atlas (TCGA) breast cancer subjects. RNA-ISH analysis was successful in 94 cases (17% of cases scored 0, 32.9% scored 1, 30.8% scored 2, and 19.1% scored 3). The expression of HOTAIR did not correlate with nodal metastasis regardless of the scoring intensity or with other study parameters (age, tumor size and grade, expression status). Further analysis of TCGA dataset showed that HOTAIR expression was lower in ductal carcinomas but higher in ER-negative tumors. Overexpression of HOTAIR was not associated with nodal metastases or prognosis in ER-positive patients. Its function as a poor prognostic indicator in ER-negative patients was restricted to node-positive patients. HOTAIR appears to be a marker for lymphatic metastases rather than hematogenous metastases in ER-negative patients.
Expression levels of SF3B3 correlate with prognosis and endocrine resistance in estrogen receptor-positive breast cancer.
Gökmen-Polar Y, Neelamraju Y, Goswami CP, Gu X, Nallamothu G, Janga SC, Badve S. (2015 May)
Mod Pathol. pii: S0893-3952(22)01430-2.doi: 10.1038/modpathol.2014.146
De novo or acquired resistance to endocrine therapy limits its utility in a significant number of estrogen receptor-positive (ER-positive) breast cancers. It is crucial to identify novel targets for therapeutic intervention and improve the success of endocrine therapies. Splicing factor 3b, subunit 1 (SF3B1) mutations are described in luminal breast cancer albeit in low frequency. In this study, we evaluated the role of SF3B1 and SF3B3, critical parts of the SF3b splicing complex, in ER-positive endocrine resistance. To ascertain the role of SF3B1/SF3B3 in endocrine resistance, their expression levels were evaluated in ER-positive/endocrine-resistant cell lines (MCF-7/LCC2 and MCF-7/LCC9) using a real-time quantitative reverse transcription PCR (qRT-PCR). To further determine their clinical relevance, expression analysis was performed in a cohort of 60 paraffin-embedded ER-positive, node-negative breast carcinomas with low, intermediate, and high Oncotype DX recurrence scores. Expression levels of SF3B1 and SF3B3 and their prognostic value were validated in large cohorts using publicly available gene expression data sets including The Cancer Genome Atlas. SF3B1 and SF3B3 levels were significantly increased in ERα-positive cells with acquired tamoxifen (MCF-7/LCC2; both P<0.0002) and fulvestrant/tamoxifen resistance (MCF-7/LCC9; P=0.008 for SF3B1 and P=0.0006 for SF3B3). Expression levels of both MCF-7/LCC2 and MCF-7/LCC9 were not affected by additional treatments with E2 and/or tamoxifen. Furthermore, qRT-PCR analysis confirmed that SF3B3 expression is significantly upregulated in Oncotype DX high-risk groups when compared with low risk (P=0.019). Similarly, in publicly available breast cancer gene expression data sets, overexpression of SF3B3, but not SF3B1, was significantly correlated with overall survival. 
Furthermore, the correlation was significant in ER-positive, but not in ER-negative tumors. This is the first study to document the role of SF3B3 in endocrine resistance and prognosis in ER-positive breast cancer. Potential strategies for therapeutic targeting of the splicing mechanism(s) need to be evaluated.
Role of lncRNAs in health and disease-size and shape matter.
Mohanty V, Gökmen-Polar Y, Badve S, Janga SC. (2015 Mar)
Brief Funct Genomics. pii: elu034.doi: 10.1093/bfgp/elu034
Most of the mammalian genome including a large fraction of the non-protein coding transcripts has been shown to be transcribed. Studies related to these non-coding RNA molecules have predominantly focused on smaller molecules like microRNAs. In contrast, long non-coding RNAs (lncRNAs) have long been considered to be transcriptional noise. Accumulating evidence suggests that lncRNAs are involved in key cellular and developmental processes. Several critical questions regarding functions and properties of lncRNAs and their circular forms remain to be answered. Increasing evidence from high-throughput sequencing screens also suggests the involvement of lncRNAs in diseases such as cancer, although the underlying mechanisms still need to be elucidated. Here, we discuss the current state of research in the field of lncRNAs, questions that need to be addressed in light of recent genome-wide studies documenting the landscape of lncRNAs, their functional roles and involvement in diseases. We posit that with the availability of high-throughput data sets it is not only possible to improve methods for predicting lncRNAs but will also facilitate our ability to elucidate their functions and phenotypes by using integrative approaches.
Primate vaginal microbiomes exhibit species specificity without universal Lactobacillus dominance.
Yildirim S, Yeoman CJ, Janga SC, Thomas SM, Ho M, Leigh SR, Primate Microbiome Consortium, White BA, Wilson BA, Stumpf RM. (2014 Dec)
ISME J. pii: ismej201490.doi: 10.1038/ismej.2014.90
Bacterial communities colonizing the reproductive tracts of primates (including humans) impact the health, survival and fitness of the host, and thereby the evolution of the host species. Despite their importance, we currently have a poor understanding of primate microbiomes. The composition and structure of microbial communities vary considerably depending on the host and environmental factors. We conducted comparative analyses of the primate vaginal microbiome using pyrosequencing of the 16S rRNA genes of a phylogenetically broad range of primates to test for factors affecting the diversity of primate vaginal ecosystems. The nine primate species included: humans (Homo sapiens), yellow baboons (Papio cynocephalus), olive baboons (Papio anubis), lemurs (Propithecus diadema), howler monkeys (Alouatta pigra), red colobus (Piliocolobus rufomitratus), vervets (Chlorocebus aethiops), mangabeys (Cercocebus atys) and chimpanzees (Pan troglodytes). Our results indicated that all primates exhibited host-specific vaginal microbiota and that humans were distinct from other primates in both microbiome composition and diversity. In contrast to the gut microbiome, the vaginal microbiome showed limited congruence with host phylogeny, and neither captivity nor diet elicited substantial effects on the vaginal microbiomes of primates. Permutational multivariate analysis of variance and Wilcoxon tests revealed correlations among vaginal microbiota and host species-specific socioecological factors, particularly related to sexuality, including: female promiscuity, baculum length, gestation time, mating group size and neonatal birth weight. The proportion of unclassified taxa observed in nonhuman primate samples increased with phylogenetic distance from humans, indicative of the existence of previously unrecognized microbial taxa. 
These findings contribute to our understanding of host-microbe variation and coevolution, microbial biogeography, and disease risk, and have important implications for the use of animal models in studies of human sexual and reproductive diseases.
An intricate network of conserved DNA upstream motifs and associated transcription factors regulate the expression of uromodulin gene.
Srivastava R, Micanovic R, El-Achkar TM, Janga SC. (2014 Sep)
J Urol. pii: S0022-5347(14)00353-X.doi: 10.1016/j.juro.2014.02.095
Uromodulin is a kidney specific glycoprotein whose expression can modulate kidney homeostasis. However, the set of sequence specific transcription factors that regulate the uromodulin gene UMOD and their upstream binding locations are not well characterized. We built a high resolution map of its transcriptional regulation.
Whole-genome sequence of Sungri/96 vaccine strain of peste des petits ruminants virus.
Siddappa M, Gandham RK, Sarsani V, Mishra BP, Mishra B, Joshi CG, Sahoo AP, Tiwari AK, Janga SC. (2014 Feb 13)
Genome Announc. pii: 2/1/e00056-14.doi: 10.1128/genomeA.00056-14
We report the complete genome sequence of the Sungri/96 vaccine strain of peste des petits ruminants virus (PPRV). The whole-genome nucleotide sequence has 89 to 99% identity with the available PPRV genome sequences in the NCBI database. This study helps to understand the epidemiological and molecular characteristics of the Sungri/96 strain.
Dissecting the expression landscape of RNA-binding proteins in human cancers.
Kechavarzi B, Janga SC. (2014 Jan 10)
Genome Biol. pii: gb-2014-15-1-r14.doi: 10.1186/gb-2014-15-1-r14
RNA-binding proteins (RBPs) play important roles in cellular homeostasis by controlling gene expression at the post-transcriptional level.
Diversity and abundance of phosphonate biosynthetic genes in nature.
Yu X, Doroghazi JR, Janga SC, Zhang JK, Circello B, Griffin BM, Labeda DP, Metcalf WW. (2013 Dec 17)
Proc Natl Acad Sci U S A. pii: 1315107110.doi: 10.1073/pnas.1315107110
Phosphonates, molecules containing direct carbon-phosphorus bonds, compose a structurally diverse class of natural products with interesting and useful biological properties. Although their synthesis in protozoa was discovered more than 50 y ago, the extent and diversity of phosphonate production in nature remains poorly characterized. The rearrangement of phosphoenolpyruvate (PEP) to phosphonopyruvate, catalyzed by the enzyme PEP mutase (PepM), is shared by the vast majority of known phosphonate biosynthetic pathways. Thus, the pepM gene can be used as a molecular marker to examine the occurrence and abundance of phosphonate-producing organisms. Based on the presence of this gene, phosphonate biosynthesis is common in microbes, with ~5% of sequenced bacterial genomes and 7% of genome equivalents in metagenomic datasets carrying pepM homologs. Similarly, we detected the pepM gene in ~5% of random actinomycete isolates. The pepM-containing gene neighborhoods from 25 of these isolates were cloned, sequenced, and compared with those found in sequenced genomes. PEP mutase sequence conservation is strongly correlated with conservation of other nearby genes, suggesting that the diversity of phosphonate biosynthetic pathways can be predicted by examining PEP mutase diversity. We used this approach to estimate the range of phosphonate biosynthetic pathways in nature, revealing dozens of discrete groups in pepM amplicons from local soils, whereas hundreds were observed in metagenomic datasets. Collectively, our analyses show that phosphonate biosynthesis is both diverse and relatively common in nature, suggesting that the role of phosphonate molecules in the biosphere may be more important than is often recognized.
Prediction and validation of the unexplored RNA-binding protein atlas of the human proteome.
Zhao H, Yang Y, Janga SC, Kao CC, Zhou Y. (2014 Apr)
Proteins.doi: 10.1002/prot.24441
Detecting protein-RNA interactions is challenging both experimentally and computationally because RNAs are large in number, diverse in cellular location and function, and flexible in structure. As a result, many RNA-binding proteins (RBPs) remain to be identified. Here, a template-based, function-prediction technique SPOT-Seq for RBPs is applied to the human proteome and its result is validated by a recent proteomic experimental discovery of 860 mRNA-binding proteins (mRBPs). The coverage (or sensitivity) is 42.6% for 1217 known RBPs annotated in the Gene Ontology and 43.6% for 860 newly discovered human mRBPs. Consistent sensitivity indicates the robust performance of SPOT-Seq for predicting RBPs. More importantly, SPOT-Seq detects 2418 novel RBPs in the human proteome, 291 of which were validated by the newly discovered mRBP set. Among 291 validated novel RBPs, 61 are not homologous to any known RBPs. Successful validation of predicted novel RBPs permits further analysis of their phenotypic roles in disease pathways. The dataset of 2418 predicted novel RBPs along with confidence levels and complex structures is available at http://sparks-lab.org (in publications) for experimental confirmations and hypothesis generation.
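The coverage (sensitivity) figures quoted in this abstract follow the usual definition, recovered positives divided by all known positives. The short sketch below reproduces that arithmetic. It is an illustration only: the abstract reports percentages and set sizes, so the hit counts here are back-calculated by rounding and are assumptions, not numbers stated by the authors. The function names are hypothetical.

```python
def sensitivity(true_positives, total_positives):
    """Fraction of known positives that were recovered."""
    if total_positives <= 0:
        raise ValueError("total_positives must be positive")
    return true_positives / total_positives


def implied_hits(rate, total):
    """Back-calculate an approximate hit count from a reported rate (assumption)."""
    return round(rate * total)


# Rates and set sizes are taken from the abstract; hit counts are derived estimates.
known_rbp_hits = implied_hits(0.426, 1217)   # Gene Ontology annotated RBPs
mrbp_hits = implied_hits(0.436, 860)         # newly discovered mRBPs

known_rbp_sensitivity = sensitivity(known_rbp_hits, 1217)
mrbp_sensitivity = sensitivity(mrbp_hits, 860)

# Share of the 2418 novel predictions confirmed by the mRBP set (291 validated).
validated_fraction = 291 / 2418
```

Under these assumptions the two sensitivities differ by about one percentage point, which is the consistency the abstract points to, while roughly 12% of the novel predictions are confirmed by the mRBP set.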
Relationship between differential hepatic microRNA expression and decreased hepatic cytochrome P450 3A activity in cirrhosis.
Vuppalanchi R, Liang T, Goswami CP, Nalamasu R, Li L, Jones D, Wei R, Liu W, Sarasani V, Janga SC, Chalasani N. (2013)
PLoS One. pii: PONE-D-13-19922.doi: 10.1371/journal.pone.0074471
Liver cirrhosis is associated with decreased hepatic cytochrome P4503A (CYP3A) activity but the pathogenesis of this phenomenon is not well elucidated. In this study, we examined if certain microRNAs (miRNA) are associated with decreased hepatic CYP3A activity in cirrhosis.
From specific to global analysis of posttranscriptional regulation in eukaryotes: posttranscriptional regulatory networks.
Janga SC. (2012 Nov)
Brief Funct Genomics. pii: els046.doi: 10.1093/bfgp/els046
Regulation of gene expression occurs at several levels in eukaryotic organisms and is a highly controlled process. Although RNAs have been traditionally viewed as passive molecules in the pathway from transcription to translation, there is mounting evidence that their metabolism is controlled by a class of proteins called RNA-binding proteins (RBPs), as well as a number of small RNAs. In this review, I provide an overview of the recent developments in our understanding of the repertoire of RBPs across diverse model systems, and discuss the computational and experimental approaches currently available for the construction of posttranscriptional networks governed by them. I also present an overview of the different roles played by RBPs in the cellular context, based on their cis-regulatory modules identified in the literature and discuss how their interplay can result in the dynamic, spatial and tissue-specific expression maps of RNAs. I finally present the concept of posttranscriptional network of RBPs and their cognate RNA targets and discuss their cross-talk with other important posttranscriptional regulatory molecules such as microRNAs, resulting in diverse functional network motifs. I argue that with rapid developments in the genome-wide elucidation of posttranscriptional networks it would not only be possible to gain a deeper understanding of regulation at a level that has been under-appreciated in the past, but would also allow us to use the newly developed high-throughput approaches to interrogate the prevalence of these phenomena in different states, and thereby study their relevance to physiology and disease across organisms.
The RNA-binding protein Musashi1 affects medulloblastoma growth via a network of cancer-related genes and is an indicator of poor prognosis.
Vo DT, Subramaniam D, Remke M, Burton TL, Uren PJ, Gelfond JA, de Sousa Abreu R, Burns SC, Qiao M, Suresh U, Korshunov A, Dubuc AM, Northcott PA, Smith AD, Pfister SM, Taylor MD, Janga SC, Anant S, Vogel C, Penalva LO. (2012 Nov)
Am J Pathol. pii: S0002-9440(12)00601-3.doi: 10.1016/j.ajpath.2012.07.031
Musashi1 (Msi1) is a highly conserved RNA-binding protein that is required during the development of the nervous system. Msi1 has been characterized as a stem cell marker, controlling the balance between self-renewal and differentiation, and has also been implicated in tumorigenesis, being highly expressed in multiple tumor types. We analyzed Msi1 expression in a large cohort of medulloblastoma samples and found that Msi1 is highly expressed in tumor tissue compared with normal cerebellum. Notably, high Msi1 expression levels proved to be a sign of poor prognosis. Msi1 expression was determined to be particularly high in molecular subgroups 3 and 4 of medulloblastoma. We determined that Msi1 is required for tumorigenesis because inhibition of Msi1 expression by small-interfering RNAs reduced the growth of Daoy medulloblastoma cells in xenografts. To characterize the participation of Msi1 in medulloblastoma, we conducted different high-throughput analyses. Ribonucleoprotein immunoprecipitation followed by microarray analysis (RIP-chip) was used to identify mRNA species preferentially associated with Msi1 protein in Daoy cells. We also used cluster analysis to identify genes with similar or opposite expression patterns to Msi1 in our medulloblastoma cohort. A network study identified RAC1, CTGF, SDCBP, SRC, PRL, and SHC1 as major nodes of an Msi1-associated network. Our results suggest that Msi1 functions as a regulator of multiple processes in medulloblastoma formation and could become an important therapeutic target.
Extensive cross-talk and global regulators identified from an analysis of the integrated transcriptional and signaling network in Escherichia coli.
Antiqueira L, Janga SC, Costa Lda F. (2012 Nov)
Mol Biosyst.doi: 10.1039/c2mb25279a
To understand the regulatory dynamics of transcription factors (TFs) and their interplay with other cellular components we have integrated transcriptional, protein-protein and the allosteric or equivalent interactions which mediate the physiological activity of TFs in Escherichia coli. To study this integrated network we computed a set of network measurements followed by principal component analysis (PCA), investigated the correlations between network structure and dynamics, and carried out a procedure for motif detection. In particular, we show that outliers identified in the integrated network based on their network properties correspond to previously characterized global transcriptional regulators. Furthermore, outliers are highly and widely expressed across conditions, thus supporting their global nature in controlling many genes in the cell. Motifs revealed that TFs not only interact physically with each other but also obtain feedback from signals delivered by signaling proteins supporting the extensive cross-talk between different types of networks. Our analysis can lead to the development of a general framework for detecting and understanding global regulatory factors in regulatory networks and reinforces the importance of integrating multiple types of interactions in underpinning the interrelationships between them.
Synthesis of methylphosphonic acid by marine microbes: a source for methane in the aerobic ocean.
Metcalf WW, Griffin BM, Cicchillo RM, Gao J, Janga SC, Cooke HA, Circello BT, Evans BS, Martens-Habbena W, Stahl DA, van der Donk WA. (2012 Aug 31)
Science. pii: 337/6098/1104.doi: 10.1126/science.1219875
Relative to the atmosphere, much of the aerobic ocean is supersaturated with methane; however, the source of this important greenhouse gas remains enigmatic. Catabolism of methylphosphonic acid by phosphorus-starved marine microbes, with concomitant release of methane, has been suggested to explain this phenomenon, yet methylphosphonate is not a known natural product, nor has it been detected in natural systems. Further, its synthesis from known natural products would require unknown biochemistry. Here we show that the marine archaeon Nitrosopumilus maritimus encodes a pathway for methylphosphonate biosynthesis and that it produces cell-associated methylphosphonate esters. The abundance of a key gene in this pathway in metagenomic data sets suggests that methylphosphonate biosynthesis is relatively common in marine microbes, providing a plausible explanation for the methane paradox.
DNA sequence preferences of transcriptional activators correlate more strongly than repressors with nucleosomes.
Charoensawan V, Janga SC, Bulyk ML, Babu MM, Teichmann SA. (2012 Jul 27)
Mol Cell. pii: S1097-2765(12)00553-9.doi: 10.1016/j.molcel.2012.06.028
Transcription factors (TFs) and histone octamers are two abundant classes of DNA binding proteins that coordinate the transcriptional program in cells. Detailed studies of individual TFs have shown that TFs bind to nucleosome-occluded DNA sequences and induce nucleosome disruption/repositioning, while recent global studies suggest this is not the only mechanism used by all TFs. We have analyzed to what extent the intrinsic DNA binding preferences of TFs and histones play a role in determining nucleosome occupancy, in addition to nonintrinsic factors such as the enzymatic activity of chromatin remodelers. The majority of TFs in budding yeast have an intrinsic sequence preference overlapping with nucleosomal histones. TFs with intrinsic DNA binding properties highly correlated with those of histones tend to be associated with gene activation and might compete with histones to bind to genomic DNA. Consistent with this, we show that activators induce more nucleosome disruption upon transcriptional activation than repressors.
Construction, structure and dynamics of post-transcriptional regulatory network directed by RNA-binding proteins.
Janga SC, Mittal N. (2011)
Adv Exp Med Biol.doi: 10.1007/978-1-4614-0332-6_7
Gene expression is a highly controlled process which is known to occur at several levels in eukaryotic organisms. Although messenger RNAs have been traditionally viewed as passive molecules in the pathway from transcription to translation, there is increasing evidence that their metabolism is controlled by a class of proteins called RNA-binding proteins (RBPs). In this chapter, we provide an overview of the recent developments in our understanding of the repertoire of RBPs across diverse model systems and discuss the approaches currently available for the construction of post-transcriptional networks governed by them. We also present the first analysis of the network properties of a post-transcriptional system in a model eukaryote using currently available data and discuss the implications of understanding the dynamic properties of this important class of regulatory molecules as more data detailing their dynamic, spatial and tissue-specific maps across diverse model systems accumulates. We argue that such developments would not only allow us to gain a deeper understanding of regulation at a level that has been under-appreciated over the past decades, but would also allow us to use the newly developed high-throughput approaches to interrogate the prevalence of these phenomena in different states and thereby study their relevance to physiology and disease across organisms.
MicroRNAs as post-transcriptional machines and their interplay with cellular networks.
Janga SC, Vallabhaneni S. (2011)
Adv Exp Med Biol.doi: 10.1007/978-1-4614-0332-6_4
Gene expression is a highly controlled process which is known to occur at several levels in eukaryotic organisms. Although RNAs have been traditionally viewed as passive molecules in the pathway from transcription to translation, there is increasing evidence that their metabolism is controlled by a class of small noncoding RNAs called microRNAs (miRNAs). miRNAs control essential gene regulatory pathways in both plants and animals; however, our understanding of their function, evolution and interplay with other cellular components is only beginning to be elucidated. In this chapter, we provide an overview of the recent developments in our understanding of this class of RNAs from diverse perspectives including biogenesis, mechanism of their function, evolution of their clusters, and discuss the approaches currently available for the construction of post-transcriptional networks governed by them. We also present our current understanding of these post-transcriptional networks in the context of other cellular networks. We finally argue that such developments would not only allow us to gain a deeper understanding of regulation at a level that has been under-appreciated over the past decades, but would also allow us to use the newly developed high-throughput approaches to interrogate the prevalence of these phenomena in different states, and thereby exploit the functions of these RNA molecules for therapeutic advantage in higher eukaryotes.
Structural coupling between RNA polymerase composition and DNA supercoiling in coordinating transcription: a global role for the omega subunit?
Geertz M, Travers A, Mehandziska S, Sobetzko P, Chandra-Janga S, Shimamoto N, Muskhelishvili G. (2011)
mBio. pii: mBio.00034-11.doi: 10.1128/mBio.00034-11
In growing bacterial cells, the global reorganization of transcription is associated with alterations of RNA polymerase composition and the superhelical density of the DNA. However, the existence of any regulatory device coordinating these changes remains elusive. Here we show that in an exponentially growing Escherichia coli rpoZ mutant lacking the polymerase ω subunit, the impact of the Eσ(38) holoenzyme on transcription is enhanced in parallel with overall DNA relaxation. Conversely, overproduction of σ(70) in an rpoZ mutant increases both overall DNA supercoiling and the transcription of genes utilizing high negative superhelicity. We further show that transcription driven by the Eσ(38) and Eσ(70) holoenzymes from cognate promoters induces distinct superhelical densities of plasmid DNA in vivo. We thus demonstrate a tight coupling between polymerase holoenzyme composition and the supercoiling regimen of genomic transcription. Accordingly, we identify functional clusters of genes with distinct σ factor and supercoiling preferences arranging alternative transcription programs sustaining bacterial exponential growth. We propose that structural coupling between DNA topology and holoenzyme composition provides a basic regulatory device for coordinating genome-wide transcription during bacterial growth and adaptation. IMPORTANCE Understanding the mechanisms of coordinated gene expression is pivotal for developing knowledge-based approaches to manipulating bacterial physiology, which is a problem of central importance for applications of biotechnology and medicine. This study explores the relationships between variations in the composition of the transcription machinery and chromosomal DNA topology and suggests a tight interdependence of these two variables as the major coordinating principle of gene regulation. 
The proposed structural coupling between the transcription machinery and DNA topology has evolutionary implications and suggests a new methodology for studying concerted alterations of gene expression during normal and pathogenic growth both in bacteria and in higher organisms.
Transcriptional profiling of fetal hypothalamic TRH neurons.
Guerra-Crespo M, Pérez-Monter C, Janga SC, Castillo-Ramírez S, Gutiérrez-Rios RM, Joseph-Bravo P, Pérez-Martínez L, Charli JL. (2011 May 10)
BMC Genomics. pii: 1471-2164-12-222.doi: 10.1186/1471-2164-12-222
During murine hypothalamic development, different neuroendocrine cell phenotypes are generated in overlapping periods; this suggests that cell-type specific developmental programs operate to achieve complete maturation. A balance between programs that include cell proliferation, cell cycle withdrawal as well as epigenetic regulation of gene expression characterizes neurogenesis. Thyrotropin releasing hormone (TRH) is a peptide that regulates energy homeostasis and autonomic responses. To better understand the molecular mechanisms underlying TRH neuron development, we performed a genome wide study of its transcriptome during fetal hypothalamic development.
Interplay between posttranscriptional and posttranslational interactions of RNA-binding proteins.
Mittal N, Scherrer T, Gerber AP, Janga SC. (2011 Jun 10)
J Mol Biol. pii: S0022-2836(11)00353-6.doi: 10.1016/j.jmb.2011.03.064
RNA-binding proteins (RBPs) play important roles in the posttranscriptional control of gene expression. However, our understanding of how RBPs interact with each other at different regulatory levels to coordinate the RNA metabolism of the cell is rather limited. Here, we construct the posttranscriptional regulatory network among 69 experimentally studied RBPs in yeast to show that more than one-third of the RBPs autoregulate their expression at the posttranscriptional level and demonstrate that autoregulatory RBPs show reduced protein noise with a tendency to encode for hubs in this network. We note that in- and outdegrees in the posttranscriptional RBP-RBP regulatory network exhibit gaussian and scale-free distributions, respectively. This network was also densely interconnected with extensive cross-talk between RBPs belonging to different posttranscriptional steps, regulating varying numbers of cellular RNA targets. We show that feed-forward loops and superposed feed-forward/feedback loops are the most significant three-node subgraphs in this network. Analysis of the corresponding protein-protein interaction (posttranslational) network revealed that it is more modular than the posttranscriptional regulatory network. There is significant overlap between the regulatory and protein-protein interaction networks, with RBPs that potentially control each other at the posttranscriptional level tending to physically interact and being part of the same ribonucleoprotein (RNP) complex. Our observations put forward a model wherein RBPs could be classified into those that can stably interact with a limited number of protein partners, forming stable RNP complexes, and others that form transient hubs, having the ability to interact with multiple RBPs forming many RNPs in the cell.
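The network features this abstract describes (autoregulation, in- and out-degree, and feed-forward loops) can be made concrete with a toy directed graph. The sketch below is a minimal illustration under stated assumptions: the node names and edges are hypothetical and do not come from the yeast RBP network, and the loop finder counts every triple X→Y, Y→Z, X→Z without checking statistical significance against randomized networks, which a real motif analysis would require.

```python
from collections import Counter
from itertools import permutations

# Hypothetical directed regulatory edges (regulator, target); not real data.
EDGES = {("X", "Y"), ("Y", "Z"), ("X", "Z"), ("Z", "Z"), ("Z", "W")}


def autoregulators(edges):
    """Nodes that regulate themselves (self-loops)."""
    return sorted({u for u, v in edges if u == v})


def out_degrees(edges):
    """Number of targets per regulator."""
    return Counter(u for u, _ in edges)


def in_degrees(edges):
    """Number of regulators per target."""
    return Counter(v for _, v in edges)


def feed_forward_loops(edges):
    """All ordered triples (x, y, z) with x->y, y->z and x->z."""
    nodes = sorted({n for edge in edges for n in edge})
    return [
        (x, y, z)
        for x, y, z in permutations(nodes, 3)
        if (x, y) in edges and (y, z) in edges and (x, z) in edges
    ]


self_regulating = autoregulators(EDGES)
ffls = feed_forward_loops(EDGES)
outdeg = out_degrees(EDGES)
indeg = in_degrees(EDGES)
```

Here `Z` is the only autoregulatory node and `(X, Y, Z)` is the only feed-forward loop. Collecting `indeg` and `outdeg` over a full network gives the degree distributions that the abstract characterizes as gaussian and scale-free, respectively.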
A screen for RNA-binding proteins in yeast indicates dual functions for many enzymes.
Scherrer T, Mittal N, Janga SC, Gerber AP. (2010 Nov 11)
PLoS One.doi: 10.1371/journal.pone.0015499
Hundreds of RNA-binding proteins (RBPs) control diverse aspects of post-transcriptional gene regulation. To identify novel and unconventional RBPs, we probed high-density protein microarrays with fluorescently labeled RNA and selected 200 proteins that reproducibly interacted with different types of RNA from budding yeast Saccharomyces cerevisiae. Surprisingly, more than half of these proteins represent previously known enzymes, many of them acting in metabolism, providing opportunities to directly connect intermediary metabolism with posttranscriptional gene regulation. We mapped the RNA targets for 13 proteins identified in this screen and found that they were associated with distinct groups of mRNAs, some of them coding for functionally related proteins. We also found that overexpression of the enzyme Map1 negatively affects the expression of experimentally defined mRNA targets. Our results suggest that many proteins may associate with mRNAs and possibly control their fates, providing dense connections between different layers of cellular regulation.
Comparative analysis of gene expression and regulation of replicative aging associated genes in S. cerevisiae.
Dhami SP, Mittal N, Janga SC, Roy N. (2011 Feb)
Mol Biosyst.doi: 10.1039/c0mb00161a
Aging is a multi-factorial and complex phenomenon. Saccharomyces cerevisiae has been developed as a model of aging and has been widely studied in order to understand the mechanism of lifespan regulation. A large number of high-throughput studies were conducted to identify the genes which modulate lifespan. These studies provide the list of genes that regulate lifespan in yeast; however, the regulation of these aging associated genes has not been fully understood. In this study, we have shown that the genes whose deletion increases the replicative lifespan (RLS) of yeast show discrete expression patterns when compared with the genes that, on deletion, cause a decrease in lifespan. Expression of longlived (LL) genes decreases as the cell progresses from mid log to stationary phase, whereas expression of shortlived (SL) genes remains unchanged. This distinct expression of LL and SL gene-sets suggests their differential gene regulation. Further analysis of transcriptional regulation by transcription factors and epigenetic regulators (acetylation and methylation) suggests that this differential expression of the two gene-sets is due to their differential epigenetic regulations, rather than regulation by transcription factors. These results accentuate the importance of epigenetic modifications in aging. We deduce that future focused studies on epigenetic modification regulation will help lead to a better understanding of the aging process.
Genome-wide analysis of mRNA decay patterns during early Drosophila development.
Thomsen S, Anders S, Janga SC, Huber W, Alonso CR. (2010)
Genome Biol. pii: gb-2010-11-9-r93.doi: 10.1186/gb-2010-11-9-r93
The modulation of mRNA levels across tissues and time is key for the establishment and operation of the developmental programs that transform the fertilized egg into a fully formed embryo. Although the developmental mechanisms leading to differential mRNA synthesis are heavily investigated, comparatively little attention is given to the processes of mRNA degradation and how these relate to the molecular programs controlling development.
Dissecting the expression patterns of transcription factors across conditions using an integrated network-based approach.
Janga SC, Contreras-Moreira B. (2010 Nov)
Nucleic Acids Res. pii: gkq612.doi: 10.1093/nar/gkq612
In prokaryotes, regulation of gene expression is predominantly controlled at the level of transcription. Transcription in turn is mediated by a set of DNA-binding factors called transcription factors (TFs). In this study, we map the complete repertoire of ∼300 TFs of the bacterial model, Escherichia coli, onto gene expression data for a number of nonredundant experimental conditions and show that TFs are generally expressed at a lower level than other gene classes. We also demonstrate that different conditions harbor varying number of active TFs, with an average of about 15% of the total repertoire, with certain stress and drug-induced conditions exhibiting as high as one-third of the collection of TFs. Our results also show that activators are more frequently expressed than repressors, indicating that activation of promoters might be a more common phenomenon than repression in bacteria. Finally, to understand the association of TFs with different conditions and to elucidate their dynamic interplay with other TFs, we develop a network-based framework to identify TFs which act as markers, defined as those which are responsible for condition-specific transcriptional rewiring. This approach allowed us to pinpoint several marker TFs as being central in various specialized conditions such as drug induction or growth condition variations, which we discuss in light of previously reported experimental findings. Further analysis showed that a majority of identified markers effectively control the expression of their regulons and, in general, transcriptional programs of most conditions can be effectively rewired by a very small number of TFs. It was also found that closeness is a key centrality measure which can aid in the successful identification of marker TFs in regulatory networks. Our results suggest the utility of the network-based approaches developed in this study to be applicable for understanding other interactomic data sets.
Identification and genomic analysis of transcription factors in archaeal genomes exemplifies their functional architecture and evolutionary origin.
Pérez-Rueda E, Janga SC. (2010 Jun)
Mol Biol Evol. pii: msq033.doi: 10.1093/molbev/msq033
Archaea, which represent a large fraction of the phylogenetic diversity of organisms, are prokaryotes with eukaryote-like basal transcriptional machinery. This organization makes the study of their DNA-binding transcription factors (TFs) and their transcriptional regulatory networks particularly interesting. In addition, there are limited experimental data regarding their TFs. In this work, 3,918 TFs were identified and exhaustively analyzed in 52 archaeal genomes. TFs represented less than 5% of the gene products in all the studied species comparable with the number of TFs identified in parasites or intracellular pathogenic bacteria, suggesting a deficit in this class of proteins. A total of 75 families were identified, of which HTH_3, AsnC, TrmB, and ArsR families were universally and abundantly identified in all the archaeal genomes. We found that archaeal TFs are significantly small compared with other protein-coding genes in archaea as well as bacterial TFs, suggesting that a large fraction of these small-sized TFs could supply the probable deficit of TFs in archaea, by possibly forming different combinations of monomers similar to that observed in eukaryotic transcriptional machinery. Our results show that although the DNA-binding domains of archaeal TFs are similar to bacteria, there is an underrepresentation of ligand-binding domains in smaller TFs, which suggests that protein-protein interactions may act as mediators of regulatory feedback, indicating a chimera of bacterial and eukaryotic TFs’ functionality. The analysis presented here contributes to the understanding of the details of transcriptional apparatus in archaea and provides a framework for the analysis of regulatory networks in these organisms.
Dissecting the expression dynamics of RNA-binding proteins in posttranscriptional regulatory networks.
Mittal N, Roy N, Babu MM, Janga SC. (2009 Dec 1)
Proc Natl Acad Sci U S A. pii: 0906940106.doi: 10.1073/pnas.0906940106
In eukaryotic organisms, gene expression requires an additional level of coordination that links transcriptional and posttranslational processes. Messenger RNAs have traditionally been viewed as passive molecules in the pathway from transcription to translation. However, it is now clear that RNA-binding proteins (RBPs) play an important role in cellular homeostasis by controlling gene expression at the posttranscriptional level. Here, we show that RBPs, as a class of proteins, show distinct gene expression dynamics compared to other protein coding genes in the eukaryote Saccharomyces cerevisiae. We find that RBPs generally exhibit high protein stability, translational efficiency, and protein abundance but their encoding transcripts tend to have a low half-life. We show that RBPs are also most often posttranslationally modified, indicating their potential for regulation at the protein level to control diverse cellular processes. Further analysis of the RBP-RNA interaction network showed that the number of distinct targets bound by an RBP (connectivity) is strongly correlated with its protein stability, translational efficiency, and abundance. We also note that RBPs show less noise in their expression in a population of cells, with highly connected RBPs showing significantly lower noise. Our results indicate that highly connected RBPs are likely to be tightly regulated at the protein level as significant changes in their expression may bring about large-scale changes in global expression levels by affecting their targets. These observations might explain the molecular basis behind the cause of a number of disorders associated with misexpression or mutation in RBPs. Future studies uncovering the posttranscriptional networks in higher eukaryotes can help our understanding of the link between different levels of regulation and their role in pathological conditions.
Scaling relationship in the gene content of transcriptional machinery in bacteria.
Pérez-Rueda E, Janga SC, Martínez-Antonio A. (2009 Dec)
Mol Biosyst.doi: 10.1039/b907384a
The metabolic, defensive, communicative and pathogenic capabilities of eubacteria depend on their repertoire of genes and ability to regulate the expression of them. Sigma and transcription factors have fundamental roles in controlling these processes. Here, we show that sigma, transcription factors (TFs) and the number of protein coding genes occur in different magnitudes across 291 non-redundant eubacterial genomes. We suggest that these differences can be explained based on the fact that the universe of TFs, in contrast to sigma factors, exhibits a greater flexibility for transcriptional regulation, due to their ability to sense diverse stimuli through a variety of ligand-binding domains by discriminating over longer regions on DNA, through their diverse DNA-binding domains, and by their combinatorial role with other sigmas and TFs. We also note that the diversity of extra-cytoplasmic sigma factors and TF families is constrained in larger genomes. Our results indicate that most widely distributed families across eubacteria are small in size, while large families are relatively limited in their distribution across genomes. Clustering of the distribution of transcription and sigma families across genomes suggests that functional constraints could force their co-evolution, as was observed in sigma54, IHF and EBP families. Our results also indicate that large families might be a consequence of lifestyle, as pathogens and free-living organisms were found to exhibit a major proportion of these expanded families. Our results suggest that understanding proteomes from an integrated perspective, as presented in this study, can be a general framework for uncovering the relationships between different classes of proteins.
Structure and organization of drug-target networks: insights from genomic approaches for drug discovery.
Janga SC, Tzakos A. (2009 Dec)
Mol Biosyst.doi: 10.1039/B908147j
Recent years have seen an explosion in the amount of “omics” data and the integration of several disciplines, which has influenced all areas of life sciences including that of drug discovery. Several lines of evidence now suggest that the traditional notion of “one drug-one protein” for one disease does not hold any more and that treatment for most complex diseases can best be attempted using polypharmacological approaches. In this review, we formalize the definition of a drug-target network by decomposing it into drug, target and disease spaces and provide an overview of our understanding in recent years about its structure and organizational principles. We discuss advances made in developing promiscuous drugs following the paradigm of polypharmacology and reveal their advantages over traditional drugs for targeting diseases such as cancer. We suggest that drug-target networks can be decomposed to be studied at a variety of levels and argue that such network-based approaches have important implications in understanding disease phenotypes and in accelerating drug discovery. We also discuss the potential and scope network pharmacology promises in harnessing the vast amount of data from high-throughput approaches for therapeutic advantage.
Plasticity of transcriptional machinery in bacteria is increased by the repertoire of regulatory families.
Janga SC, Pérez-Rueda E. (2009 Aug)
Comput Biol Chem. pii: S1476-9271(09)00050-4.doi: 10.1016/j.compbiolchem.2009.06.004
Escherichia coli K12 and Bacillus subtilis 168 are two of the best characterized bacterial organisms with a long history in molecular biology for understanding various mechanisms in prokaryotic species. However, at the level of transcriptional regulation little is known on a comparative scale. Here we address the question of the degree to which transcription factors (TFs) and their evolutionary families are shared between them. We found that 59 proteins and 28 families are shared between these two bacteria, whereas different subsets were lineage specific. We demonstrate that the majority of the common families expand in a lineage-specific manner. More specifically, we found that AraC, ColD, Ebp, LuxR and LysR families are over-represented in E. coli, while ArsR, AsnC, MarR, MerR and TetR families have significantly expanded in B. subtilis. We introduce the notion of regulatory superfamilies based on an empirical number of functional categories regulated by them and show that these families are essentially different in the two bacteria. We further show that global regulators seem to be constrained to smaller regulatory families and generally originate from lineage-specific families. We find that although TF families may be conserved across genomes, their functional roles might evolve in a lineage-specific manner and need not be conserved, indicating convergence to be an important phenomenon involved in the functional evolution of TFs of the same family. Although topologically the networks of transcriptional interactions among TF families are similar in both the genomes, we found that the players are different, suggesting different evolutionary origins for the transcriptional regulatory machinery in both bacteria.
This study provides evidence from complete repertoires that not only do novel families originate in different lineages, but conserved TF families also expand/contract in a lineage-specific manner, and suggests that part of the global regulatory mechanisms might originate independently in different lineages.
Interfacing systems biology and synthetic biology.
Lister A, Charoensawan V, De S, James K, Janga SC, Huppert J. (2009)
Genome Biol. pii: gb-2009-10-6-309.doi: 10.1186/gb-2009-10-6-309
A report of BioSysBio 2009, the IET conference on Synthetic Biology, Systems Biology and Bioinformatics, Cambridge, UK, 23-25 March 2009.
Global functional atlas of Escherichia coli encompassing previously uncharacterized proteins.
Hu P, Janga SC, Babu M, Díaz-Mejía JJ, Butland G, Yang W, Pogoutse O, Guo X, Phanse S, Wong P, Chandran S, Christopoulos C, Nazarians-Armavil A, Nasseri NK, Musso G, Ali M, Nazemof N, Eroukova V, Golshani A, Paccanaro A, Greenblatt JF, Moreno-Hagelsieb G, Emili A. (2009 Apr 28)
PLoS Biol. pii: 08-PLBI-RA-4517.doi: 10.1371/journal.pbio.1000096
One-third of the 4,225 protein-coding genes of Escherichia coli K-12 remain functionally unannotated (orphans). Many map to distant clades such as Archaea, suggesting involvement in basic prokaryotic traits, whereas others appear restricted to E. coli, including pathogenic strains. To elucidate the orphans’ biological roles, we performed an extensive proteomic survey using affinity-tagged E. coli strains and generated comprehensive genomic context inferences to derive a high-confidence compendium for virtually the entire proteome consisting of 5,993 putative physical interactions and 74,776 putative functional associations, most of which are novel. Clustering of the respective probabilistic networks revealed putative orphan membership in discrete multiprotein complexes and functional modules together with annotated gene products, whereas a machine-learning strategy based on network integration implicated the orphans in specific biological processes. We provide additional experimental evidence supporting orphan participation in protein synthesis, amino acid metabolism, biofilm formation, motility, and assembly of the bacterial cell envelope. This resource provides a “systems-wide” functional blueprint of a model microbe, with insights into the biological and evolutionary significance of previously uncharacterized proteins.
Transcriptional regulation shapes the organization of genes on bacterial chromosomes.
Janga SC, Salgado H, Martínez-Antonio A. (2009 Jun)
Nucleic Acids Res. pii: gkp231.doi: 10.1093/nar/gkp231
Transcription factors (TFs) are the key elements responsible for controlling the expression of genes in bacterial genomes and when visualized on a genomic scale form a dense network of transcriptional interactions among themselves and with other protein coding genes. Although the structure of transcriptional regulatory networks (TRNs) is well understood, it is not clear what constraints govern them. Here, we explore this question using the TRNs of model prokaryotes and provide a link between the transcriptional hierarchy of regulons and their genome organization. We show that, to drive the kinetics and concentration gradients, TFs belonging to big and small regulons, depending on the number of genes they regulate, organize themselves differently on the genome with respect to their targets. We then propose a conceptual model that can explain how the hierarchical structure of TRNs might be ultimately governed by the dynamic biophysical requirements for targeting DNA-binding sites by TFs. Our results suggest that the main parameters defining the position of a TF in the network hierarchy are the number and chromosomal distances of the genes they regulate and their protein concentration gradients. These observations give insights into how the hierarchical structure of transcriptional networks can be encoded on the chromosome to drive the kinetics and concentration gradients of TFs depending on the number of genes they regulate and could be a common theme valid for other prokaryotes, proposing the role of transcriptional regulation in shaping the organization of genes on a chromosome.
Transcript stability in the protein interaction network of Escherichia coli.
Janga SC, Babu MM. (2009 Feb)
Mol Biosyst.doi: 10.1039/b816845h
Gene expression is a dynamic process which can be controlled by a number of mechanisms as genetic information flows from nucleic acids to proteins. The study of gene expression in the steady state, while informative, overlooks the underlying dynamics of the processes. Steady-state transcript levels are a result of both RNA synthesis and degradation, and as such, measurements of degradation rates can be used to determine their rates of synthesis as well as reveal regulation that occurs via changes in RNA stability. Messenger RNA degradation plays a central role in diverse cellular processes and is controlled primarily by the activity of the degradosome in prokaryotes. In this study, we use the currently available network of protein-protein interactions (PPIs) and mRNA half-lives in Escherichia coli to demonstrate that centrality of a protein in the PPI network is strongly correlated with its mRNA half-life. We find that interacting proteins tend to show similar half-lives, commonly referred to as assortative behavior in networks, which is frequently found in biological and social networks. While a major fraction of the interacting proteins show significantly lower differences in mRNA stabilities, a smaller but significant number of protein pairs tend to show higher differences than expected by chance. Higher differences in transcript stabilities often involved those that encode for transcription factors and enzymes, suggesting a feedback link at the post-translational level. We also note that although essential genes, which act as a proxy for in vivo centrality in PPI networks, are highly expressed compared to non-essential ones, they do not encode for more stable transcripts than non-essential genes. Our results provide a direct link between mRNA stability and centrality of a protein in PPI network indicating the importance of post-transcriptional mechanisms on nascent RNAs in the cell.
Network-based approaches for linking metabolism with environment.
Janga SC, Babu MM. (2008)
Genome Biol. pii: gb-2008-9-11-239.doi: 10.1186/gb-2008-9-11-239
Progress in the reconstruction of genome-wide metabolic maps has led to the development of network-based computational approaches for linking an organism with its biochemical habitat.
Eukaryotic gene regulation in three dimensions and its impact on genome evolution.
Babu MM, Janga SC, de Santiago I, Pombo A. (2008 Dec)
Curr Opin Genet Dev. pii: S0959-437X(08)00147-0.doi: 10.1016/j.gde.2008.10.002
Recent advances in molecular techniques and high-resolution imaging are beginning to provide exciting insights into the higher order chromatin organization within the cell nucleus and its influence on eukaryotic gene regulation. This improved understanding of gene regulation also raises fundamental questions about how spatial features might have constrained the organization of genes on eukaryotic chromosomes and how mutations that affect these processes might contribute to disease conditions. In this review, we discuss recent studies that highlight the role of spatial components in gene regulation and their impact on genome evolution. We then address implications for human diseases and outline new directions for future research.
Transcriptional regulation constrains the organization of genes on eukaryotic chromosomes.
Janga SC, Collado-Vides J, Babu MM. (2008 Oct 14)
Proc Natl Acad Sci U S A. pii: 0806317105.doi: 10.1073/pnas.0806317105
Genetic material in eukaryotes is tightly packaged in a hierarchical manner into multiple linear chromosomes within the nucleus. Although it is known that eukaryotic transcriptional regulation is complex and requires an intricate coordination of several molecular events both in space and time, whether the complexity of this process constrains genome organization is still unknown. Here, we present evidence for the existence of a higher-order organization of genes across and within chromosomes that is constrained by transcriptional regulation. In particular, we reveal that the target genes (TGs) of transcription factors (TFs) for the yeast, Saccharomyces cerevisiae, are encoded in a highly ordered manner both across and within the 16 chromosomes. We show that (i) the TGs of a majority of TFs show a strong preference to be encoded on specific chromosomes, (ii) the TGs of a significant number of TFs display a strong preference (or avoidance) to be encoded in regions containing particular chromosomal landmarks such as telomeres and centromeres, and (iii) the TGs of most TFs are positionally clustered within a chromosome. Our results demonstrate that specific organization of genes that allowed for efficient control of transcription within the nuclear space has been selected during evolution. We anticipate that uncovering such higher-order organization of genes in other eukaryotes will provide insights into nuclear architecture, and will have implications in genetic engineering experiments, gene therapy, and understanding disease conditions that involve chromosomal aberrations.
Functional organisation of Escherichia coli transcriptional regulatory network.
Martínez-Antonio A, Janga SC, Thieffry D. (2008 Aug 1)
J Mol Biol. pii: S0022-2836(08)00632-3.doi: 10.1016/j.jmb.2008.05.054
Taking advantage of available functional data associated with 115 transcription and 7 sigma factors, we have performed a structural analysis of the regulatory network of Escherichia coli. While the mode of regulatory interaction between transcription factors (TFs) is predominantly positive, TFs are frequently negatively autoregulated. Furthermore, feedback loops, regulatory motifs and regulatory pathways are unevenly distributed in this network. Short pathways, multiple feed-forward loops and negative autoregulatory interactions are particularly predominant in the subnetwork controlling metabolic functions such as the use of alternative carbon sources. In contrast, long hierarchical cascades and positive autoregulatory loops are overrepresented in the subnetworks controlling developmental processes for biofilm and chemotaxis. We propose that these long transcriptional cascades coupled with regulatory switches (positive loops) for external sensing enable the coexistence of multiple bacterial phenotypes. In contrast, short regulatory pathways and negative autoregulatory loops enable an efficient homeostatic control of crucial metabolites despite external variations. TFs at the core of the network coordinate the most basic endogenous processes by passing information onto multi-element circuits. Transcriptional expression data support broader and higher transcription of global TFs compared to specific ones. Global regulators are also more broadly conserved than specific regulators in bacteria, pointing to varying functional constraints.
Ten simple rules for organizing a scientific meeting.
Corpas M, Gehlenborg N, Janga SC, Bourne PE. (2008 Jun 27)
PLoS Comput Biol.doi: 10.1371/journal.pcbi.1000080
Structure and evolution of gene regulatory networks in microbial genomes.
Janga SC, Collado-Vides J. (2007 Dec)
Res Microbiol. pii: S0923-2508(07)00176-3.doi: 10.1016/j.resmic.2007.09.001
With the availability of genome sequences for hundreds of microbial genomes, it has become possible to address several questions from a comparative perspective to understand the structure and function of regulatory systems, at least in model organisms. Recent studies have focused on topological properties and the evolution of regulatory networks and their components. Our understanding of natural networks is paving the way to embedding synthetic regulatory systems into organisms, allowing us to expand the natural diversity of living systems to an extent we had never before anticipated.
Coordination logic of the sensing machinery in the transcriptional regulatory network of Escherichia coli.
Janga SC, Salgado H, Martínez-Antonio A, Collado-Vides J. (2007)
Nucleic Acids Res. pii: gkm743.doi: 10.1093/nar/gkm743
The active and inactive state of transcription factors in growing cells is usually directed by allosteric physicochemical signals or metabolites, which are in turn either produced in the cell or obtained from the environment by the activity of the products of effector genes. To understand the regulatory dynamics and to improve our knowledge about how transcription factors (TFs) respond to endogenous and exogenous signals in the bacterial model, Escherichia coli, we previously proposed to classify TFs into external, internal and hybrid sensing classes depending on the source of their allosteric or equivalent metabolite. Here we analyze how a cell uses its topological structures in the context of sensing machinery and show that, while feed forward loops (FFLs) tightly integrate internal and external sensing TFs connecting TFs from different layers of the hierarchical transcriptional regulatory network (TRN), bifan motifs frequently connect TFs belonging to the same sensing class and could act as a bridge between TFs originating from the same level in the hierarchy. We observe that modules identified in the regulatory network of E. coli are heterogeneous in sensing context with a clear combination of internal and external sensing categories depending on the physiological role played by the module. We also note that propensity of two-component response regulators increases at promoters, as the number of TFs regulating a target operon increases. Finally, we show that evolutionary families of TFs do not show a tendency to preserve their sensing abilities. Our results provide a detailed panorama of the topological structures of the E. coli TRN and of the way the TFs that compose them sense their surroundings by coordinating responses.
Operons and the effect of genome redundancy in deciphering functional relationships using phylogenetic profiles.
Moreno-Hagelsieb G, Janga SC. (2008 Feb 1)
Proteins.doi: 10.1002/prot.21564
Phylogenetic profiles (PPs) are one of the most promising methods for predicting functional relationships by genomic context. The idea behind PPs is that if the products of two genes have a functional interdependence, the genes should both be either present or absent across genomes. One of the main problems with PPs is that evolutionarily close organisms tend to share a higher number of genes resulting in the overscoring of PP-relatedness. The proper measure of the overscoring effect of evolutionary redundancy requires examples of both functionally related genes (positive gold standards) and functionally unrelated genes (negative gold standards). Since experimentally verified functional interactions are only available for a few model organisms, there is a need for an alternative to gold standards. The presence of operons (polycistronic transcription units formed of functionally related genes) in prokaryotic genomes offers such an alternative. Genes in operons are located next to each other in the same DNA strand, and thus their presence should result in a higher proportion of predicted functional interactions among adjacent genes in the same strand than among adjacent genes in opposite strands. Under the preceding principle, we present a confidence value (CV) designed for evaluating predictions of functional interactions obtained using PPs. We first show that the CV corresponds to a positive predictive value calculated using experimentally known operons and further validate operon predictions based on this CV in other organisms using available microarray data. Then, we use a fixed CV of 0.90 as a reference to compare PP predictions obtained using different nonredundant genome datasets filtered at varying thresholds of genomic similarity. Our results demonstrate that nonredundant genome datasets increase the number of high-quality predictions by an average of 20%. 
Confidence values as those presented here should help compare other strategies and scoring systems to use phylogenetic profiles and other genomic context methods for predicting functional interactions.
Conservation of transcriptional sensing systems in prokaryotes: a perspective from Escherichia coli.
Salgado H, Martínez-Antonio A, Janga SC. (2007 Jul 24)
FEBS Lett. pii: S0014-5793(07)00718-1.doi: 10.1016/j.febslet.2007.06.059
The activity of transcription factors is usually governed by allosteric physicochemical signals or metabolites, which are in turn produced in the cell or obtained from the environment by the activity of the products of effector genes. Previously, we identified a collection of more than 110 transcription factors and their corresponding effector genes in Escherichia coli K-12. Here, we introduce the notion of “triferog”, which relates to the identification of orthologous transcription factors and effector genes across genomes and show that transcriptional sensing systems known in E. coli are poorly conserved beyond Salmonella. We also find that enzymes that act as effector genes for the production of endogenous effector metabolites are more conserved than their corresponding effector genes encoding for transport and two-component systems for sensing exogenous signals. Finally, we observe that on an evolutionary scale enzymes are more conserved than their respective TFs, suggesting a homogenous cellular metabolism across genomes and the conservation of transcriptional control of critical cellular processes like DNA replication by a common endogenous signal. We hypothesize that extensive variation in the domain architecture of TFs and changes in endogenous conditions at large phylogenetic distances could be the major contributing factors for the observed differential conservation of TFs and their corresponding effector genes encoding for enzymes, causing variations in transcriptional responses across organisms.
Internal versus external effector and transcription factor gene pairs differ in their relative chromosomal position in Escherichia coli.
Janga SC, Salgado H, Collado-Vides J, Martínez-Antonio A. (2007 Apr 20)
J Mol Biol. pii: S0022-2836(07)00046-0.doi: 10.1016/j.jmb.2007.01.019
Transcription factors (TFs) play an important role in the genetic regulation of transcription in response to internal and external cellular stimuli. However, little is known about their functional and dynamic aspects on a large scale, even in a well-studied bacterium like Escherichia coli. To understand the regulatory dynamics and to improve our knowledge about how TFs respond to endogenous and exogenous signals in this simple bacterium model, we previously proposed that TFs can be classified into three classes, depending on how they sense their allosteric or equivalent metabolite: external class, internal class, and hybrid sensing class. Classification of these groups was done without considering the relative chromosomal positions of the TFs and their corresponding effector genes. Here, we analyze the genome organization of the genetic components of these sensing systems, using the classification described earlier. We report the chromosomal proximity of transcription factors and their effector genes to sense periplasmic signals or transported metabolites (i.e. transcriptional sensing systems from the external class) in contrast to the components for sensing internally synthesized metabolites, which tend to be distant on the chromosome. We strengthen our finding that external sensing genetic machinery behaves like chromosomal modules of regulation to respond rapidly to variations in external conditions through co-expression of their genetic components, which is corroborated with microarray data for E. coli. Furthermore, we show several lines of evidence supporting the need for the coordinated activity of external sensing systems in contrast to that of internal sensing machinery, which can explain their close chromosomal organization. The observed functional correlation between the chromosomal organization and the genetic machinery for environmental sensing should contribute to our understanding of the logical functioning and evolution of the transcriptional regulatory networks in bacteria.
The distinctive signatures of promoter regions and operon junctions across prokaryotes.
Janga SC, Lamboy WF, Huerta AM, Moreno-Hagelsieb G. (2006)
Nucleic Acids Res. pii: gkl563.doi: 10.1093/nar/gkl563
Here we show that regions upstream of first transcribed genes have oligonucleotide signatures that distinguish them from regions upstream of genes in the middle of operons. Databases of experimentally confirmed transcription units do not exist for most genomes. Thus, to expand the analyses into genomes with no experimentally confirmed data, we used genes conserved adjacent in evolutionarily distant genomes as representatives of genes inside operons. Likewise, we used divergently transcribed genes as representative examples of first transcribed genes. In model organisms, the trinucleotide signatures of regions upstream of these representative genes allow for operon predictions with accuracies close to those obtained with known operon data (0.8). Signature-based operon predictions have more similar phylogenetic profiles and higher proportions of genes in the same pathways than predicted transcription unit boundaries (TUBs). These results confirm that we are separating genes with related functions, as expected for operons, from genes not necessarily related, as expected for genes in different transcription units. We also test the quality of the predictions using microarray data in six genomes and show that the signature-predicted operons tend to have high correlations of expression. Oligonucleotide signatures should expand the number of tools available to identify operons even in poorly characterized genomes.
Bacterial regulatory networks are extremely flexible in evolution.
Lozada-Chávez I, Janga SC, Collado-Vides J. (2006)
Nucleic Acids Res. pii: 34/12/3434.doi: 10.1093/nar/gkl423
Over millions of years the structure and complexity of the transcriptional regulatory network (TRN) in bacteria has changed, reorganized and enabled them to adapt to almost every environmental niche on earth. In order to understand the plasticity of TRNs in bacteria, we studied the conservation of currently known TRNs of the two model organisms Escherichia coli K12 and Bacillus subtilis across complete genomes including Bacteria, Archaea and Eukarya at three different levels: individual components of the TRN, pairs of interactions and regulons. We found that transcription factors (TFs) evolve much faster than the target genes (TGs) across phyla. We show that global regulators are poorly conserved across the phylogenetic spectrum and hence TFs could be the major players responsible for the plasticity and evolvability of the TRNs. We also found that there is only a small fraction of significantly conserved transcriptional regulatory interactions among different phyla of bacteria and that there is no constraint on the elements of the interaction to co-evolve. Finally, our results suggest that the majority of the regulons in bacteria are rapidly lost, implying a high-order flexibility in the TRNs. We hypothesize that during the divergence of bacteria certain essential cellular processes like the synthesis of arginine, biotin and ribose, transport of amino acids and iron, availability of phosphate, replication process and the SOS response are well conserved in evolution. From our comparative analysis, it is possible to infer that transcriptional regulation is more flexible than the genetic component of the organisms and that its complexity and structure play an important role in the phenotypic adaptation.
Identification and analysis of DNA-binding transcription factors in Bacillus subtilis and other Firmicutes–a genomic approach.
Moreno-Campuzano S, Janga SC, Pérez-Rueda E. (2006 Jun 13)
BMC Genomics. pii: 1471-2164-7-147.doi: 10.1186/1471-2164-7-147
Bacillus subtilis is one of the best-characterized organisms among Gram-positive bacteria. It represents a paradigm of gene regulation in bacteria due to its complex lifestyle (which could involve a transition between stages as diverse as vegetative cell and spore formation). In order to gain insight into the organization and evolution of the B. subtilis regulatory network and to provide an alternative framework for further studies in bacteria, we identified and analyzed its repertoire of DNA-binding transcription factors in terms of their abundance, family distribution and regulated genes.
The partitioned Rhizobium etli genome: genetic and metabolic redundancy in seven interacting replicons.
González V, Santamaría RI, Bustos P, Hernández-González I, Medrano-Soto A, Moreno-Hagelsieb G, Janga SC, Ramírez MA, Jiménez-Jacinto V, Collado-Vides J, Dávila G. (2006 Mar 7)
Proc Natl Acad Sci U S A. pii: 0508502103.doi: 10.1073/pnas.0508502103
We report the complete 6,530,228-bp genome sequence of the symbiotic nitrogen fixing bacterium Rhizobium etli. Six large plasmids comprise one-third of the total genome size. The chromosome encodes most functions necessary for cell growth, whereas few essential genes or complete metabolic pathways are located in plasmids. Chromosomal synteny is disrupted by genes related to insertion sequences, phages, plasmids, and cell-surface components. Plasmids do not show synteny, and their orthologs are mostly shared by accessory replicons of species with multipartite genomes. Some nodulation genes are predicted to be functionally related with chromosomal loci encoding for the external envelope of the bacterium. Several pieces of evidence suggest an exogenous origin for the symbiotic plasmid (p42d) and p42a. Additional putative horizontal gene transfer events might have contributed to expand the adaptive repertoire of R. etli, because they include genes involved in small molecule metabolism, transport, and transcriptional regulation. Twenty-three putative sigma factors, numerous isozymes, and paralogous families attest to the metabolic redundancy and the genomic plasticity necessary to sustain the lifestyle of R. etli in symbiosis and in the soil.
Internal-sensing machinery directs the activity of the regulatory network in Escherichia coli.
Martínez-Antonio A, Janga SC, Salgado H, Collado-Vides J. (2006 Jan)
Trends Microbiol. pii: S0966-842X(05)00305-7.doi: 10.1016/j.tim.2005.11.002
Individual cells need to discern and synchronize transcriptional responses according to variations in external and internal conditions. Metabolites and chemical compounds are sensed by transcription factors (TFs), which direct the corresponding specific transcriptional responses. We propose a classification of the currently known TFs of Escherichia coli based on whether they respond to metabolites incorporated from the exterior, to internally produced compounds, or to both. When analyzing the mutual interactions of TFs, the dominant role of internal signal sensing becomes apparent, largely due to the role of global regulators of transcription. This work encompasses metabolite-TF interactions, bridging the gap between the metabolic and regulatory networks, thus advancing towards an integrated network model for the understanding of cellular behavior.
The network of transcriptional interactions imposes linear constrains in the genome.
Menchaca-Mendez R, Janga SC, Collado-Vides J. (2005 Summer)
OMICS.doi: 10.1089/omi.2005.9.139
Two prokaryotic organisms for which transcriptional regulatory interactions have been well elucidated by experimental means are Escherichia coli and Bacillus subtilis. Here we show, with the help of simulations and from known data, the importance of proximity of the transcription factor gene and the respective regulated gene in regulatory networks. We discuss the importance of the location of external sensing machinery close to the genes for transcription factors that regulate them in light of our finding.
Nebulon: a system for the inference of functional relationships of gene products from the rearrangement of predicted operons.
Janga SC, Collado-Vides J, Moreno-Hagelsieb G. (2005)
Nucleic Acids Res. pii: 33/8/2521.doi: 10.1093/nar/gki545
Since operons are unstable across Prokaryotes, it has been suggested that perhaps they re-combine in a conservative manner. Thus, genes belonging to a given operon in one genome might re-associate in other genomes revealing functional relationships among gene products. We developed a system to build networks of functional relationships of gene products based on their organization into operons in any available genome. The operon predictions are based on inter-genic distances. Our system can use different kinds of thresholds to accept a functional relationship, either related to the prediction of operons, or to the number of non-redundant genomes that support the associations. We also work by shells, meaning that we decide on the number of linking iterations to allow for the complementation of related gene sets. The method shows high reliability benchmarked against knowledge-bases of functional interactions. We also illustrate the use of Nebulon in finding new members of regulons, and of other functional groups of genes. Operon rearrangements produce thousands of high-quality new interactions per prokaryotic genome, and thousands of confirmations per genome to other predictions, making it another important tool for the inference of functional interactions from genomic context.
Conservation of adjacency as evidence of paralogous operons.
Janga SC, Moreno-Hagelsieb G. (2004)
Nucleic Acids Res. pii: 32/18/5392.doi: 10.1093/nar/gkh882
Most of the analyses on the conservation of gene order are limited to orthologous genes. However, the organization of genes into operons might also result in the conservation of gene order of paralogous genes. Thus, we sought computational evidence that conservation of gene order of paralogous genes represents another level of conservation of genes in operons. We found that pairs of genes within experimentally characterized operons of Escherichia coli K12 and Bacillus subtilis tend to have more adjacently conserved paralogs than pairs of genes at transcription unit boundaries. The fraction of same strand gene pairs corresponding to conserved paralogs averages 0.07 with a maximum of 0.22 in Borrelia burgdorferi. The use of evidence from the conservation of adjacency of paralogous genes can improve the prediction of operons in E. coli K12 by approximately 0.27 over predictions using conservation of adjacency of orthologous genes alone.