Monday, January 9, 2012

Validating Active Modules with BiNGO.

Introduction

The Gene Ontology (GO) project provides a controlled vocabulary of terms for describing gene product characteristics and gene product annotation data. The aim of the project is to reduce the time spent searching for information that different databases describe in different words. The same protein, for example, may be described as participating in ‘translation’ in one database but in ‘protein synthesis’ in another. Gene Ontology aims to provide global, consistent descriptions by developing structured vocabularies, or ontologies, that match gene products to their associated biological processes, molecular functions and cellular components.

BiNGO is a Cytoscape plugin that assesses the overrepresentation of Gene Ontology categories in biological networks. It can be used on a list of genes pasted in as text or on sub-networks of biological networks visualised in Cytoscape. BiNGO uses the p-value as the indicator of the prominence of a functional category and colours the nodes accordingly. Additionally, when a whole branch of the GO hierarchy is highlighted as overrepresented, the nodes that are farthest down the hierarchy are the most relevant ones, because they describe the most specific categories.
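To make the idea of an overrepresentation p-value concrete, here is a minimal sketch of a hypergeometric test, which is one of the statistical tests BiNGO offers. It is an illustration only, not BiNGO's own code: the function name `hypergeom_pvalue` and all of the numbers in the example are assumptions, and the test actually used in this exercise depends on the settings shown later.

```python
from math import comb


def hypergeom_pvalue(x, n, X, N):
    """Probability of seeing at least x annotated genes in a sample of n genes,
    when X of the N genes in the reference set carry the annotation."""
    upper = min(n, X)
    favourable = sum(comb(X, k) * comb(N - X, n - k) for k in range(x, upper + 1))
    return favourable / comb(N, n)


# Hypothetical numbers, for illustration only (not taken from the exercise):
# a reference set of 1000 genes, 100 of them annotated with the category,
# and a module of 50 genes containing 15 annotated genes.
p = hypergeom_pvalue(x=15, n=50, X=100, N=1000)
print(f"p-value: {p:.3g}")
```

A small p-value means that the category turns up in the module more often than would be expected by chance. In practice the p-values for all tested categories also need a multiple testing correction, which this sketch leaves out.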

For the purpose of this exercise, the jActiveModules plugin was first used to find active modules (sub-networks) in the network provided. Next, BiNGO was used to assess the overrepresentation of GO categories in one of the active modules. Several relevant nodes were selected from the BiNGO output using the guideline described above. The genes annotated with the selected GO-IDs were then selected as sub-networks of the network, and the intersection of these sub-networks with the active module was analysed.

Methods

First, the jActiveModules plugin was used as in the previous post, and five active modules were identified in the network. The first module, which contained 95 nodes, was used for the exercise.


Figure 1 – active modules in the network.

The BiNGO plugin was started and the settings were specified as advised in the practical exercise directions.


Figure 2 – BiNGO plugin settings.

The graphical output from BiNGO was a graph in which the nodes were coloured according to their p-value. The yellow and orange nodes represent gene ontology categories that are overrepresented at the chosen significance level. Uncoloured nodes are not overrepresented themselves, but they are parents of overrepresented nodes further down. Some nodes could be immediately identified as most relevant: they were the most intensely coloured and were located away from the centre of the network. The graphical output and some of the relevant nodes are presented in Figure 3.
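The rule of thumb used here (prefer overrepresented nodes that lie farthest down a highlighted branch) can be sketched in code. This is a rough illustration and not something BiNGO does itself: the function name `most_specific_terms` is hypothetical, and the toy hierarchy uses made-up labels rather than real GO terms.

```python
def most_specific_terms(children, significant):
    """Return the significant terms that have no significant descendant."""

    def has_significant_descendant(term, seen):
        for child in children.get(term, ()):
            if child in seen:
                continue  # already explored via another parent
            seen.add(child)
            if child in significant or has_significant_descendant(child, seen):
                return True
        return False

    return {t for t in significant if not has_significant_descendant(t, set())}


# Toy hierarchy with made-up labels: each term maps to its child terms.
children = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1"]}
significant = {"a", "a1", "b1"}

print(sorted(most_specific_terms(children, significant)))
```

In this toy example the term "a" is dropped because its child "a1" is also significant, which mirrors the choice of the coloured nodes at the outer edge of the BiNGO graph.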


Figure 3 – graphical output from BiNGO and relevant nodes.

The data output from the BiNGO plugin is a list of the significantly overrepresented categories, with their p-values, frequencies and the list of genes included in each category. The output file, module1BP.bgo, is attached.


Figure 4 – BiNGO data output.

GO-ID 48731

GO-ID 48731 is annotated as “system development”. This is described in the GO database as “the biological process whose specific outcome is the progression of an organismal system over time, from its formation to the mature structure.” The node is relevant because it is intensely coloured and lies away from the centre of the graph. Selecting the node in the BiNGO data output and clicking “Select Nodes” highlighted all nodes in the original network (not only the ones in the active module) that are annotated with the “system development” category. The result was a network of 865 nodes. In Figure 5, ratlung-child is the network that was created from the active module, and ratlung-child.1 is the network that contains the genes annotated as “system development”.


Figure 5 – GO-ID 48731 in BiNGO output

Figure 6 – “system development” nodes in the network.

Next, the “Advanced Network Merge” plugin was used to find the intersection between those two networks.


Figure 7 – intersection between active module and nodes annotated as “system development”
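Conceptually, the intersection keeps only the nodes and edges that occur in both networks. The sketch below shows that idea with plain Python sets. It is an assumption about what an intersection means here, not the plugin's actual implementation, and the function name `intersect_networks` and the gene names are hypothetical.

```python
def intersect_networks(net_a, net_b):
    """Keep the nodes present in both networks and the edges present in both."""
    nodes = net_a["nodes"] & net_b["nodes"]
    # An edge is kept only if both networks contain it and both of its
    # endpoints survive the node intersection.
    edges = {e for e in net_a["edges"] & net_b["edges"] if e <= nodes}
    return {"nodes": nodes, "edges": edges}


# Hypothetical toy networks; undirected edges are stored as frozensets.
active_module = {
    "nodes": {"GeneA", "GeneB", "GeneC"},
    "edges": {frozenset({"GeneA", "GeneB"}), frozenset({"GeneB", "GeneC"})},
}
go_category = {
    "nodes": {"GeneB", "GeneC", "GeneD"},
    "edges": {frozenset({"GeneB", "GeneC"}), frozenset({"GeneC", "GeneD"})},
}

common = intersect_networks(active_module, go_category)
print(sorted(common["nodes"]))
print(len(common["edges"]))
```

The size of the node intersection relative to the size of the active module gives a quick measure of how much of the module is covered by the GO category.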

GO-ID 6357


Figure 8 – GO-ID 6357 in BiNGO graphical output

GO-ID 6357 is annotated as “regulation of transcription from RNA polymerase II promoter”. This is described as any process that modulates the frequency, rate or extent of transcription from an RNA polymerase II promoter. Selecting the node in the BiNGO data output and clicking “Select Nodes” resulted in a network of 292 nodes. This network was then intersected with active module 1.


Figure 9 – Intersection between active module and nodes annotated as “regulation of transcription from RNA polymerase II promoter”

References:

S. Maere, K. Heymans and M. Kuiper, BiNGO: a Cytoscape plugin to assess overrepresentation of Gene Ontology categories in biological networks, Bioinformatics, 21(16):3448-3449 (2005)

Gene Ontology Project Website

BiNGO Tutorial
