Bioinformatics Research Symposium 2016

NIH Library Bioinformatics Symposium
Monday, June 20, 2016
9:30 a.m. – 3:00 p.m.
Lipsett Amphitheater, NIH Clinical Center, Building 10
Free and open to the public

Registration preferred, but not required.
To register, visit

Attend the NIH Library Bioinformatics Symposium, Monday, June 20, to learn how scientists are using software licensed by the NIH Library Bioinformatics Support Program to analyze, integrate, and annotate data from multiple genomics technologies, including next generation sequencing. Discover the latest applications of these data analysis tools to problems in molecular biology. Find out how state-of-the-art knowledge bases and pathway analysis applications are transforming downstream functional analysis of high-throughput experiment data.


9:30 a.m.

Welcome – NIH Library Bioinformatics Support Program

9:45 a.m.

Merging Glioblastoma Data for Correlation and Network-Based Analysis Using GeneSpring 13

Dr. Dipa Roy Choudhury, Agilent Technologies, Inc.

This demonstration shows how GeneSpring can be used to study the correlation of transcriptomic and proteomic expression data, clustering across omic data types to identify curated signaling pathways that might be deregulated specifically in Glioblastoma (GBM). Homologene and BridgeDB are implemented in GeneSpring to facilitate integrative analysis through translation functions, linking probes across data types, array platforms, and organisms that map to the same biological entity. Since GeneSpring can create literature-derived networks, we then extended our investigation to identify Natural Language Processing (NLP)-derived networks of genes, consisting largely of interesting genes from the multi-omic experiment, that allow cross-talk between curated pathways that were differentially expressed in GBM datasets.

Software highlighted: GeneSpring

Keywords: correlation of expression of transcriptomic and proteomic data, pathway and network analysis

10:20 a.m.

From GWAS SNP to Molecular Mechanism: Insights Gained from Promoter Modeling and Network Analysis

Dr. Susan Dombrowski, Genomatix

Genome-Wide-Association-Studies (GWAS) have been used to search for genetic clues linked to diseases or population groups. A long-standing problem of GWAS approaches is that the result is a mere statistical correlation of some mutation (usually a SNP) with a specific condition or disease, and in most cases the correlated SNPs are located outside the coding region of any gene (≈ 80% of such SNPs according to ENCODE). In order to locate potential target genes, the next coding gene upstream or downstream of the SNP is taken as a candidate, an approach that often fails because many disease-correlated SNPs are in fact regulatory variants that affect genes distant from the SNP site. We have analyzed a regulatory SNP correlated with diabetic nephropathy in a GWAS study and leveraged Genomatix' unique capabilities in comparative promoter analysis to link that SNP to a complex regulatory network (PMID 23434934). This network in turn affects a complete pathway involving genes located on different chromosomes from the regulatory SNP of interest. The talk will illustrate the principles and highlight the results of this regulatory analysis as performed on the Genomatix Genome Analyzer.

Software highlighted: Genomatix Genome Analyzer

Keywords: Genome-Wide-Association-Studies (GWAS), promoter modeling, regulatory SNP, network analysis

10:55 a.m.

miRNA-Seq Analysis with Partek: Serum miRNA Study in Alcohol Use Disorder Subjects Suggests Alterations of CNS Structure and Function

Dr. Eric Seiser, Partek, Inc.

We will feature a successful miRNA-Seq based study of extracellular miRNAs in 20 individuals diagnosed with Alcohol Use Disorder (AUD). The talk will demonstrate how to go from raw sequence data to biological interpretation using Partek software. Analysis of the sequencing data using Partek Flow will include checking the quality of reads, generating aligned reads, quantifying miRNA levels, and determining differentially expressed miRNAs. By integrating miRNA-Seq results in Partek Genomics Suite, we will demonstrate using Partek Pathway to explore how differentially expressed miRNAs impact CNS structure and function. Lastly, miRNA expression microarray data will be analyzed in Partek Genomics Suite to validate findings from the next generation sequencing data.

Software highlighted: Partek Flow, Partek Genomics Suite, Partek Pathway

Keywords: next gen sequence data analysis, microarray data analysis, pathway analysis

11:30 a.m.

Moving Beyond Multi-Locus Sequence Typing (MLST): Using NGS K-mer and Whole Genome SNP Trees for Pathogen Typing and Outbreak Analysis

Dr. Jennifer Poitras, QIAGEN

Typing pathogenic bacteria is important in the surveillance of food safety and of public and animal health. Molecular methods using Next Generation Sequencing (NGS) data from whole pathogen genomes are increasingly being used for outbreak detection of common pathogens. Multi-Locus Sequence Typing (MLST) is a common method for typing pathogens. As sequencing has become less expensive and higher in throughput, and analysis tools have become more robust, full genome comparisons can be carried out on pathogens both to type isolates and to create a high-resolution phylogenetic mapping of them, which is pivotal information when tracing outbreaks. Using whole genome sequence data and QIAGEN’s CLC Genomics Workbench with the CLC Microbial Genomics Module, a strain of Salmonella enterica was typed and characterized. Additionally, using k-mer and whole genome-based high-resolution SNP tree generation, the origin of its outbreak was determined by comparing it to other potential isolates.

Software highlighted: CLC Microbial Genomics Module

Keywords: next gen sequence analysis, microbial genome analysis, pathogen identification

12:05 p.m.

Morning Wrap-Up

1:00 p.m.


Census of the Apoptosis Pathway

Dr. Philip L. Lorenzi, The University of Texas, MD Anderson Cancer Center

We recently compared several different “omic” approaches to constructing the autophagy pathway de novo, including siRNA screening, mass spectrometry-based proteomics, and three different pathway analysis software packages. Unexpectedly, although merging all of the validated data sets yielded 739 autophagy-modulating genes, each individual approach alone yielded sparse coverage of the autophagy pathway. The best individual siRNA screen, for example, yielded only 169 of the 739 (23%) genes. Nevertheless, text mining-based pathway analysis with Pathway Studio in conjunction with manual curation provided the most comprehensive coverage, yielding 417 targets (56% of the pathway). Here, we explore the generalizability of those findings by examining a better-characterized pathway: apoptosis. We compiled apoptosis-modulating genes from 12 published siRNA screens and two pathway analysis software packages, Ingenuity Pathway Analysis (IPA) and Pathway Studio. The resulting inventory of 6,882 proteins consisted of 215 targets identified by siRNA screening, 3,378 targets by IPA, and 6,381 targets by Pathway Studio. The extensive coverage (93%) of the apoptosis pathway provided by text mining with Pathway Studio can likely be attributed to recent upgrades in the software, including an expanded database and collection of full-text articles. Together with our previous autophagy pathway analysis, the new apoptosis results support the generalizable conclusions that: (1) siRNA screening has a large false negative rate (i.e., fails to identify many true “hits”), and (2) text mining-based pathway analysis using Pathway Studio provides the most comprehensive pathway coverage.

Software highlighted: Pathway Studio

Keywords: pathway analysis
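
The coverage percentages quoted in the abstract above are each approach's gene count divided by the size of the merged inventory. The following is a minimal sketch of that arithmetic, not the speaker's analysis: the counts are taken from the abstract, while the function and variable names are illustrative assumptions.

```python
# Counts quoted in the abstract; all names below are illustrative assumptions.
AUTOPHAGY_TOTAL = 739    # merged autophagy-modulating genes
APOPTOSIS_TOTAL = 6882   # merged apoptosis inventory

AUTOPHAGY_COUNTS = {
    "best siRNA screen": 169,
    "Pathway Studio + manual curation": 417,
}
APOPTOSIS_COUNTS = {
    "siRNA screening": 215,
    "IPA": 3378,
    "Pathway Studio": 6381,
}


def coverage(found: int, total: int) -> float:
    """Percentage of the merged inventory recovered by a single approach."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * found / total


def report(counts: dict, total: int) -> dict:
    """Map each approach to its coverage, rounded to a whole percent."""
    return {name: round(coverage(n, total)) for name, n in counts.items()}


if __name__ == "__main__":
    for label, counts, total in (
        ("autophagy", AUTOPHAGY_COUNTS, AUTOPHAGY_TOTAL),
        ("apoptosis", APOPTOSIS_COUNTS, APOPTOSIS_TOTAL),
    ):
        for name, pct in report(counts, total).items():
            print(f"{label}: {name}: {pct}%")
```

Rounded to whole percentages, this reproduces the 23%, 56%, and 93% figures given in the abstract. The per-approach counts sum to more than the merged total because the approaches identify overlapping targets.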

1:35 p.m.


Role of microRNA mRNA Interactions in Endometrioid Endometrial Carcinoma

Dr. Jennifer Poitras, QIAGEN

Endometrial adenocarcinoma is a common cause of gynecological cancer death in Europe and North America. The dominant subtype, Endometrioid Endometrial Cancer (EEC), accounts for greater than 80% of this cancer and is estrogen-dependent. At diagnosis, 75% of women have the disease confined to the uterus, which is considered Stage One. Five-year survival for Stage One patients is 80%; however, about 15–20% develop metastasis. EEC is generally associated with good prognosis; however, the clinical course may be unpredictable, as four different EEC subtypes have been recently defined. One “transcriptome” subtype has been identified and presents a worse prognosis. RNA expression was analyzed using QIAGEN’s CLC Genomics Workbench and Ingenuity Pathway Analysis (IPA) to elucidate biological parameters involved in EEC tumor progression, including signaling pathways, biological processes, and potential transcriptional drivers.

Software highlighted: CLC Genomics Workbench and Ingenuity Pathway Analysis (IPA)

Keywords: RNA expression data analysis, pathway analysis, driver transcription factors

2:10 p.m.


Molecular Profiles of Tumor Infiltrating T-Lymphocytes in Breast Cancer Patients Using Metacore

Dr. Matthew E. Wampole, Thomson Reuters Intellectual Property & Science

Cancer immunotherapies are poised to become an integral component of the standard of care across oncology indications. One immuno-oncology based scientific approach is targeting checkpoint inhibitors. CTLA-4 and PD-1 are two receptors that represent immune checkpoints expressed on T-cells. Antibody inhibition of these targets enhances the antitumor immune response [1], yielding high rates of objective clinical responses and ultimately FDA approvals in melanoma and lung cancer. Breast cancer clinical trials are also ongoing for these targets along with PD-L1 to address the unmet need of this second most common cancer among women in the United States [2].

CD4+ T-lymphocytes play a central role in the fight against cancer by activating the immune response within the tumor microenvironment. To characterize molecular differences between high and low infiltrating populations of CD4+ T-lymphocytes, we analyzed the gene expression profiles (GSE36765) of the sorted CD4+ T-cells within human breast cancer patient tumors. Patients with high infiltration profiles show unique immunological gene signatures including biomarkers such as IL-6, VEGFA, CTLA-4 and IDO1. Upstream causal reasoning analysis of the unique differentially expressed genes predicts activation and inhibition of multiple immunological pathways, including canonical PD-1 signaling (synergistic p-value = 8.45e-13), which includes statistically significant molecules such as SHP-1, Lck, PLC-gamma and CD3. These results explain mechanistically the unique molecular profiles of high versus low infiltrating CD4+ cells in the breast tumor microenvironment, provide insight into the genomic connections to T-cell population components, and could serve as biomarkers to stratify breast cancer patients.

[1] Adams JL, Smothers J, Srinivasan R, Hoos A. Big opportunities for small molecules in immuno-oncology. Nat Rev Drug Discov. 2015;14(9):603-22.

Software highlighted: Metacore

Keywords: gene expression data analysis, pathway analysis

2:45 p.m.


Final Remarks and Wrap-Up


Previous Bioinformatics Research Symposium
Jun 24, 2014 Bioinformatics Research Symposium - Day 1 High-Throughput Data Analysis: Next Generation Sequencing and Microarray
Jun 24, 2014 Bioinformatics Research Symposium - Day 2 Functional Analysis: Analysis of Pathways and Gene Sets

For more information contact the NIH Library at 301.496.1080.

Sign language interpreters and other reasonable accommodations can be provided. If you require such accommodations, please contact the NIH Library Information Desk at 301.496.1080 five business days in advance. For TDD users/callers, please call the above number through the Federal Relay Service at 1.800.877.8339.

The NIH Library in Building 10 serves the information needs of NIH staff and select Department of Health and Human Services (HHS) agencies. The NIH Library is part of the Office of Research Services (ORS) in the Office of the Director (OD).