Title: | Enrichment Analysis Using Multiple Ontologies and False Discovery Rate |
---|---|
Description: | Background - Traditional gene set enrichment analyses are typically limited to a few ontologies and do not account for the interdependence of gene sets or terms, resulting in overcorrected p-values. To address these challenges, we introduce mulea, an R package offering comprehensive overrepresentation and functional enrichment analysis. Results - mulea employs a progressive empirical false discovery rate (eFDR) method, specifically designed for interconnected biological data, to accurately identify significant terms within diverse ontologies. mulea expands beyond traditional tools by incorporating a wide range of ontologies, encompassing Gene Ontology, pathways, regulatory elements, genomic locations, and protein domains. This flexibility enables researchers to tailor enrichment analysis to their specific questions, such as identifying enriched transcriptional regulators in gene expression data or overrepresented protein domains in protein sets. To facilitate seamless analysis, mulea provides gene sets (in standardised GMT format) for 27 model organisms, covering 22 ontology types from 16 databases and various identifiers resulting in almost 900 files. Additionally, the muleaData ExperimentData Bioconductor package simplifies access to these pre-defined ontologies. Finally, mulea's architecture allows for easy integration of user-defined ontologies, or GMT files from external sources (e.g., MSigDB or Enrichr), expanding its applicability across diverse research areas. Conclusions - mulea is distributed as a CRAN R package. It offers researchers a powerful and flexible toolkit for functional enrichment analysis, addressing limitations of traditional tools with its progressive eFDR and by supporting a variety of ontologies. Overall, mulea fosters the exploration of diverse biological questions across various model organisms. |
Authors: | Cezary Turek [aut] , Marton Olbei [aut] , Tamas Stirling [aut, cre] , Gergely Fekete [aut] , Ervin Tasnadi [aut] , Leila Gul [aut], Balazs Bohar [aut] , Balazs Papp [aut] , Wiktor Jurkowski [aut] , Eszter Ari [aut, cph] |
Maintainer: | Tamas Stirling <[email protected]> |
License: | GPL-2 |
Version: | 1.1.1 |
Built: | 2024-11-19 09:32:51 UTC |
Source: | https://github.com/eltebioinformatics/mulea |
Filtering ontology to contain entries having number of elements (genes or proteins) between a given range. The reason for this is enrichment analysis results can sometimes be skewed by overly specific or broad entries. Filtering ontologies allows you to customize the size of ontology entries, ensuring your analysis aligns with your desired scope.
filter_ontology(gmt, min_nr_of_elements = NULL, max_nr_of_elements = NULL)
filter_ontology(gmt, min_nr_of_elements = NULL, max_nr_of_elements = NULL)
gmt |
A |
min_nr_of_elements |
Minimum number of elements. Ontology entries containing as many or fewer elements (genes or proteins) will be excluded. |
max_nr_of_elements |
Maximum number of elements. Ontology entries containing as many or more elements (genes or proteins) will be excluded. |
Return a data.frame
which contains the entries (gene or protein
sets) in a similar format that produced by the read_gmt
function.
library(mulea) # loading and filtering the example ontology from a GMT file tf_gmt <- read_gmt(file = system.file( package="mulea", "extdata", "Transcription_factor_RegulonDB_Escherichia_coli_GeneSymbol.gmt")) tf_gmt_filtered <- filter_ontology(gmt = tf_gmt, min_nr_of_elements = 3, max_nr_of_elements = 400)
library(mulea) # loading and filtering the example ontology from a GMT file tf_gmt <- read_gmt(file = system.file( package="mulea", "extdata", "Transcription_factor_RegulonDB_Escherichia_coli_GeneSymbol.gmt")) tf_gmt_filtered <- filter_ontology(gmt = tf_gmt, min_nr_of_elements = 3, max_nr_of_elements = 400)
An S4 class to represent the gsea tests in mulea.
## S4 method for signature 'gsea' run_test(model)
## S4 method for signature 'gsea' run_test(model)
model |
Object of S4 class representing the mulea test. |
GSEA object. This object represents the result of the
gsea
tests.
run_test method for GSEA object. Returns results of the enrichment analysis.
run_test(gsea)
: runs test calculations.
gmt
A data.frame
representing the ontology GMT.
element_names
A vector of elements names (gene or protein names or identifiers) to include in the analysis.
element_scores
A vector of numeric values representing a score (e.g. p-value, z-score, log fold change) for each 'element_name', in the same number and order as element_name.
gsea_power
A power of weight. Default value is 1.
element_score_type
Defines the GSEA score type.
'pos': Only positive element_scores
'neg': Only negative element_scores
'std': standard, containing both positive and negative scores Default value is 'std'.
number_of_permutations
The number of permutations used in
gsea
test. Default value is 1000.
test
character
library(mulea) # loading and filtering the example ontology from a GMT file tf_gmt <- read_gmt(file = system.file( package="mulea", "extdata", "Transcription_factor_RegulonDB_Escherichia_coli_GeneSymbol.gmt")) tf_gmt_filtered <- filter_ontology(gmt = tf_gmt, min_nr_of_elements = 3, max_nr_of_elements = 400) # loading the example `data.frame` scored_gene_tab <- read.delim(file = system.file(package = "mulea", "extdata", "ordered_set.tsv")) # creating the GSEA model gsea_model <- gsea(gmt = tf_gmt_filtered, # the names of elements to test element_names = scored_gene_tab$Gene.symbol, # the logFC-s of elements to test element_scores = scored_gene_tab$logFC, # consider elements having positive logFC values only element_score_type = "pos", # the number of permutations number_of_permutations = 10000) library(mulea) # loading and filtering the example ontology from a GMT file tf_gmt <- read_gmt(file = system.file(package="mulea", "extdata", "Transcription_factor_RegulonDB_Escherichia_coli_GeneSymbol.gmt")) tf_gmt_filtered <- filter_ontology(gmt = tf_gmt, min_nr_of_elements = 3, max_nr_of_elements = 400) # loading the example `data.frame` scored_gene_tab <- read.delim(file = system.file(package = "mulea", "extdata", "ordered_set.tsv")) # creating the GSEA model gsea_model <- gsea(gmt = tf_gmt_filtered, # the names of elements to test element_names = scored_gene_tab$Gene.symbol, # the logFC-s of elements to test element_scores = scored_gene_tab$logFC, # consider elements having positive logFC values only element_score_type = "pos", # the number of permutations number_of_permutations = 10000) # running the test gsea_results <- run_test(gsea_model)
library(mulea) # loading and filtering the example ontology from a GMT file tf_gmt <- read_gmt(file = system.file( package="mulea", "extdata", "Transcription_factor_RegulonDB_Escherichia_coli_GeneSymbol.gmt")) tf_gmt_filtered <- filter_ontology(gmt = tf_gmt, min_nr_of_elements = 3, max_nr_of_elements = 400) # loading the example `data.frame` scored_gene_tab <- read.delim(file = system.file(package = "mulea", "extdata", "ordered_set.tsv")) # creating the GSEA model gsea_model <- gsea(gmt = tf_gmt_filtered, # the names of elements to test element_names = scored_gene_tab$Gene.symbol, # the logFC-s of elements to test element_scores = scored_gene_tab$logFC, # consider elements having positive logFC values only element_score_type = "pos", # the number of permutations number_of_permutations = 10000) library(mulea) # loading and filtering the example ontology from a GMT file tf_gmt <- read_gmt(file = system.file(package="mulea", "extdata", "Transcription_factor_RegulonDB_Escherichia_coli_GeneSymbol.gmt")) tf_gmt_filtered <- filter_ontology(gmt = tf_gmt, min_nr_of_elements = 3, max_nr_of_elements = 400) # loading the example `data.frame` scored_gene_tab <- read.delim(file = system.file(package = "mulea", "extdata", "ordered_set.tsv")) # creating the GSEA model gsea_model <- gsea(gmt = tf_gmt_filtered, # the names of elements to test element_names = scored_gene_tab$Gene.symbol, # the logFC-s of elements to test element_scores = scored_gene_tab$logFC, # consider elements having positive logFC values only element_score_type = "pos", # the number of permutations number_of_permutations = 10000) # running the test gsea_results <- run_test(gsea_model)
data.frame
.Converts a list of ontology elements (gene sets) to an ontology
(GMT) data.frame
object.
list_to_gmt(gmt_list)
list_to_gmt(gmt_list)
gmt_list |
A list with named character vectors. The name will become the
'ontology_id', and the elements in the vector will become the
'list_of_values' in the ontology (GMT) |
Returns ontology (GMT) data.frame
where the 'ontology_name'
contains random unique strings.
library(mulea) # creating a list of gene sets ontology_list <- list(gene_set1 = c("gene1", "gene2", "gene3"), gene_set2 = c("gene4", "gene5", "gene6")) # converting the list to a ontology (GMT) object new_ontology_object <- list_to_gmt(ontology_list)
library(mulea) # creating a list of gene sets ontology_list <- list(gene_set1 = c("gene1", "gene2", "gene3"), gene_set2 = c("gene4", "gene5", "gene6")) # converting the list to a ontology (GMT) object new_ontology_object <- list_to_gmt(ontology_list)
PRIVATE class : An S4 class to represent a Hypergeometric tests in mulea.
## S4 method for signature 'MuleaHypergeometricTest' run_test(model)
## S4 method for signature 'MuleaHypergeometricTest' run_test(model)
model |
Object of s4 class represents mulea Test. |
MuleaHypergeometricTest object. Used as private function.
run_test method for MuleaHypergeometricTest object. Used as private function.
run_test(MuleaHypergeometricTest)
: runs test calculations.
gmt
A data.frame representing the GMT model.
element_names
Data to be analysed across the model.
background_element_names
Background data used for the test.
An S4 class to represent a set based tests in mulea.
ora object. This object represents the result of the overrepresentation test in mulea.
method
The overrepresentation (ora) method. Possible values: "Hypergeometric", "SetBasedEnrichment".
gmt
A data.frame
representing the ontology GMT.
element_names
A vector of elements names (gene or protein names or identifiers) representing the target set to analyse. For example differentially expressed genes.
background_element_names
A vector of elements names (gene or protein names or identifiers) representing all the elements involved in the previous analyses For example all genes that were measured in differential expression analysis.
p_value_adjustment_method
A character string representing the type of the p-value adjustment method. Possible values:
'eFDR': empirical false discovery rate correction method
all method
options from stats::p.adjust
documentation.
number_of_permutations
A numeric value representing the number of permutations used to calculate the eFDR values. Default value is 10000.
nthreads
Number of processor's threads to use in calculations.
random_seed
Optional natural number (1, 2, 3, ...) setting the seed for the random generator, to make the results reproducible.
library(mulea) # loading and filtering the example ontology from a GMT file tf_gmt <- read_gmt(file = system.file( package="mulea", "extdata", "Transcription_factor_RegulonDB_Escherichia_coli_GeneSymbol.gmt")) tf_gmt_filtered <- filter_ontology(gmt = tf_gmt, min_nr_of_elements = 3, max_nr_of_elements = 400) # loading the example data sign_genes <- readLines(system.file(package = "mulea", "extdata", "target_set.txt")) background_genes <- readLines(system.file(package="mulea", "extdata", "background_set.txt")) # creating the ORA model ora_model <- ora(gmt = tf_gmt_filtered, # the test set variable element_names = sign_genes, # the background set variable background_element_names = background_genes, # the p-value adjustment method p_value_adjustment_method = "eFDR", # the number of permutations number_of_permutations = 10000, # the number of processor threads to use nthreads = 2) # running the ORA ora_results <- run_test(ora_model)
library(mulea) # loading and filtering the example ontology from a GMT file tf_gmt <- read_gmt(file = system.file( package="mulea", "extdata", "Transcription_factor_RegulonDB_Escherichia_coli_GeneSymbol.gmt")) tf_gmt_filtered <- filter_ontology(gmt = tf_gmt, min_nr_of_elements = 3, max_nr_of_elements = 400) # loading the example data sign_genes <- readLines(system.file(package = "mulea", "extdata", "target_set.txt")) background_genes <- readLines(system.file(package="mulea", "extdata", "background_set.txt")) # creating the ORA model ora_model <- ora(gmt = tf_gmt_filtered, # the test set variable element_names = sign_genes, # the background set variable background_element_names = background_genes, # the p-value adjustment method p_value_adjustment_method = "eFDR", # the number of permutations number_of_permutations = 10000, # the number of processor threads to use nthreads = 2) # running the ORA ora_results <- run_test(ora_model)
Plots barplot of p-values.
plot_barplot( reshaped_results, ontology_id_colname = "ontology_id", selected_rows_to_plot = NULL, p_value_type_colname = "eFDR", p_value_max_threshold = 0.05 )
plot_barplot( reshaped_results, ontology_id_colname = "ontology_id", selected_rows_to_plot = NULL, p_value_type_colname = "eFDR", p_value_max_threshold = 0.05 )
reshaped_results |
'data.table' in relaxed form, obtained as the
output of the |
ontology_id_colname |
Character, specifies the column name that contains ontology IDs in the input data. |
selected_rows_to_plot |
A numeric vector specifying which rows of the reshaped results 'data.frame' should be included in the plot. Default is 'NULL'. |
p_value_type_colname |
Character, specifies the column name for p-values in the input data. Default is 'eFDR'. |
p_value_max_threshold |
Numeric, representing the maximum p-value threshold for filtering data. Default is 0.05. |
Create a customized barplot of p-values, facilitating visual exploration and analysis of statistical significance within ontology categories.
Returns a barplot.
library(mulea) # loading and filtering the example ontology from a GMT file tf_gmt <- read_gmt(file = system.file(package="mulea", "extdata", "Transcription_factor_RegulonDB_Escherichia_coli_GeneSymbol.gmt")) tf_gmt_filtered <- filter_ontology(gmt = tf_gmt, min_nr_of_elements = 3, max_nr_of_elements = 400) # loading the example data sign_genes <- readLines(system.file(package = "mulea", "extdata", "target_set.txt")) background_genes <- readLines(system.file( package="mulea", "extdata", "background_set.txt")) # creating the ORA model ora_model <- ora(gmt = tf_gmt_filtered, # the test set variable element_names = sign_genes, # the background set variable background_element_names = background_genes, # the p-value adjustment method p_value_adjustment_method = "eFDR", # the number of permutations number_of_permutations = 10000, # the number of processor threads to use nthreads = 2) # running the ORA ora_results <- run_test(ora_model) # reshaping results for visualisation ora_reshaped_results <- reshape_results(model = ora_model, model_results = ora_results, # choosing which column to use for the indication of significance p_value_type_colname = "eFDR") # Plot barplot plot_barplot(reshaped_results = ora_reshaped_results, # the column containing the names we wish to plot ontology_id_colname = "ontology_id", # upper threshold for the value indicating the significance p_value_max_threshold = 0.05, # column that indicates the significance values p_value_type_colname = "eFDR")
library(mulea) # loading and filtering the example ontology from a GMT file tf_gmt <- read_gmt(file = system.file(package="mulea", "extdata", "Transcription_factor_RegulonDB_Escherichia_coli_GeneSymbol.gmt")) tf_gmt_filtered <- filter_ontology(gmt = tf_gmt, min_nr_of_elements = 3, max_nr_of_elements = 400) # loading the example data sign_genes <- readLines(system.file(package = "mulea", "extdata", "target_set.txt")) background_genes <- readLines(system.file( package="mulea", "extdata", "background_set.txt")) # creating the ORA model ora_model <- ora(gmt = tf_gmt_filtered, # the test set variable element_names = sign_genes, # the background set variable background_element_names = background_genes, # the p-value adjustment method p_value_adjustment_method = "eFDR", # the number of permutations number_of_permutations = 10000, # the number of processor threads to use nthreads = 2) # running the ORA ora_results <- run_test(ora_model) # reshaping results for visualisation ora_reshaped_results <- reshape_results(model = ora_model, model_results = ora_results, # choosing which column to use for the indication of significance p_value_type_colname = "eFDR") # Plot barplot plot_barplot(reshaped_results = ora_reshaped_results, # the column containing the names we wish to plot ontology_id_colname = "ontology_id", # upper threshold for the value indicating the significance p_value_max_threshold = 0.05, # column that indicates the significance values p_value_type_colname = "eFDR")
Plots graph representation of enrichment results.
plot_graph( reshaped_results, ontology_id_colname = "ontology_id", ontology_element_colname = "element_id_in_ontology", shared_elements_min_threshold = 0, p_value_type_colname = "eFDR", p_value_max_threshold = 0.05 )
plot_graph( reshaped_results, ontology_id_colname = "ontology_id", ontology_element_colname = "element_id_in_ontology", shared_elements_min_threshold = 0, p_value_type_colname = "eFDR", p_value_max_threshold = 0.05 )
reshaped_results |
Character, the input |
ontology_id_colname |
Character, the name of the column in the reshaped results that contains ontology identifiers or names. Default value is 'ontology_id'. |
ontology_element_colname |
Character, the name of the column in the reshaped results that contains element identifiers within the ontology. Default value is 'element_id_in_ontology'. |
shared_elements_min_threshold |
Numeric, threshold specifying the minimum number of shared elements required between two ontologies to consider them connected by an edge on the graph. Default value is 0. |
p_value_type_colname |
Character, the name of the column in the reshaped results that contains the type of p-values associated with the ontology elements. Default value is 'eFDR'. |
p_value_max_threshold |
Numeric, a threshold value for filtering rows in the reshaped results based on the p-values. Rows with p-values greater than this threshold will be filtered out. Default value is 0.05. |
This function generates a graph (network) visualization of the enriched ontology entries. On the plot each node represents an ontology entry below a given p-value threshold, and is coloured based on its significance level. A connection (edge) is drawn between two nodes if they share at least one common element (gene) belonging to the target set – in the case of ORA results – or all analysed elements – in the case of GSEA results.
Returns a graph plot.
library(mulea) # loading and filtering the example ontology from a GMT file tf_gmt <- read_gmt(file = system.file(package="mulea", "extdata", "Transcription_factor_RegulonDB_Escherichia_coli_GeneSymbol.gmt")) tf_gmt_filtered <- filter_ontology(gmt = tf_gmt, min_nr_of_elements = 3, max_nr_of_elements = 400) # loading the example data sign_genes <- readLines(system.file(package = "mulea", "extdata", "target_set.txt")) background_genes <- readLines(system.file( package="mulea", "extdata", "background_set.txt")) # creating the ORA model ora_model <- ora(gmt = tf_gmt_filtered, # the test set variable element_names = sign_genes, # the background set variable background_element_names = background_genes, # the p-value adjustment method p_value_adjustment_method = "eFDR", # the number of permutations number_of_permutations = 10000, # the number of processor threads to use nthreads = 2) # running the ORA ora_results <- run_test(ora_model) # reshaping results for visualisation ora_reshaped_results <- reshape_results(model = ora_model, model_results = ora_results, # choosing which column to use for the indication of significance p_value_type_colname = "eFDR") # Plot graph plot_graph(reshaped_results = ora_reshaped_results, # the column containing the names we wish to plot ontology_id_colname = "ontology_id", # upper threshold for the value indicating the significance p_value_max_threshold = 0.05, # column that indicates the significance values p_value_type_colname = "eFDR")
library(mulea) # loading and filtering the example ontology from a GMT file tf_gmt <- read_gmt(file = system.file(package="mulea", "extdata", "Transcription_factor_RegulonDB_Escherichia_coli_GeneSymbol.gmt")) tf_gmt_filtered <- filter_ontology(gmt = tf_gmt, min_nr_of_elements = 3, max_nr_of_elements = 400) # loading the example data sign_genes <- readLines(system.file(package = "mulea", "extdata", "target_set.txt")) background_genes <- readLines(system.file( package="mulea", "extdata", "background_set.txt")) # creating the ORA model ora_model <- ora(gmt = tf_gmt_filtered, # the test set variable element_names = sign_genes, # the background set variable background_element_names = background_genes, # the p-value adjustment method p_value_adjustment_method = "eFDR", # the number of permutations number_of_permutations = 10000, # the number of processor threads to use nthreads = 2) # running the ORA ora_results <- run_test(ora_model) # reshaping results for visualisation ora_reshaped_results <- reshape_results(model = ora_model, model_results = ora_results, # choosing which column to use for the indication of significance p_value_type_colname = "eFDR") # Plot graph plot_graph(reshaped_results = ora_reshaped_results, # the column containing the names we wish to plot ontology_id_colname = "ontology_id", # upper threshold for the value indicating the significance p_value_max_threshold = 0.05, # column that indicates the significance values p_value_type_colname = "eFDR")
Plots heatmap of enriched terms and obtained p-values.
plot_heatmap( reshaped_results, ontology_id_colname = "ontology_id", ontology_element_colname = "element_id_in_ontology", p_value_type_colname = "eFDR", p_value_max_threshold = 0.05 )
plot_heatmap( reshaped_results, ontology_id_colname = "ontology_id", ontology_element_colname = "element_id_in_ontology", p_value_type_colname = "eFDR", p_value_max_threshold = 0.05 )
reshaped_results |
data.table in relaxed form, obtained as the output
of the |
ontology_id_colname |
Character, specifies the column name that contains ontology IDs in the input data. |
ontology_element_colname |
Character, specifying the column name that contains ontology elements or terms in the input data. Default: 'element_id_in_ontology'. |
p_value_type_colname |
Character, specifies the column name for p-values in the input data. Default is 'eFDR'. |
p_value_max_threshold |
Numeric, representing the maximum p-value threshold for filtering data. Default is 0.05. |
The plot_heatmap
function provides a convenient way to create a ggplot2
heatmap illustrating the significance of enriched terms within ontology
categories based on their associated p-values.
Returns a ggplot2 heatmap.
library(mulea) # loading and filtering the example ontology from a GMT file tf_gmt <- read_gmt(file = system.file(package="mulea", "extdata", "Transcription_factor_RegulonDB_Escherichia_coli_GeneSymbol.gmt")) tf_gmt_filtered <- filter_ontology(gmt = tf_gmt, min_nr_of_elements = 3, max_nr_of_elements = 400) # loading the example data sign_genes <- readLines(system.file(package = "mulea", "extdata", "target_set.txt")) background_genes <- readLines(system.file( package="mulea", "extdata", "background_set.txt")) # creating the ORA model ora_model <- ora(gmt = tf_gmt_filtered, # the test set variable element_names = sign_genes, # the background set variable background_element_names = background_genes, # the p-value adjustment method p_value_adjustment_method = "eFDR", # the number of permutations number_of_permutations = 10000, # the number of processor threads to use nthreads = 2) # running the ORA ora_results <- run_test(ora_model) # reshaping results for visualisation ora_reshaped_results <- reshape_results( model = ora_model, model_results = ora_results, # choosing which column to use for the indication of significance p_value_type_colname = "eFDR") # Plot heatmap plot_heatmap(reshaped_results = ora_reshaped_results, # the column containing the names we wish to plot ontology_id_colname = "ontology_id", # column that indicates the significance values p_value_type_colname = "eFDR")
library(mulea) # loading and filtering the example ontology from a GMT file tf_gmt <- read_gmt(file = system.file(package="mulea", "extdata", "Transcription_factor_RegulonDB_Escherichia_coli_GeneSymbol.gmt")) tf_gmt_filtered <- filter_ontology(gmt = tf_gmt, min_nr_of_elements = 3, max_nr_of_elements = 400) # loading the example data sign_genes <- readLines(system.file(package = "mulea", "extdata", "target_set.txt")) background_genes <- readLines(system.file( package="mulea", "extdata", "background_set.txt")) # creating the ORA model ora_model <- ora(gmt = tf_gmt_filtered, # the test set variable element_names = sign_genes, # the background set variable background_element_names = background_genes, # the p-value adjustment method p_value_adjustment_method = "eFDR", # the number of permutations number_of_permutations = 10000, # the number of processor threads to use nthreads = 2) # running the ORA ora_results <- run_test(ora_model) # reshaping results for visualisation ora_reshaped_results <- reshape_results( model = ora_model, model_results = ora_results, # choosing which column to use for the indication of significance p_value_type_colname = "eFDR") # Plot heatmap plot_heatmap(reshaped_results = ora_reshaped_results, # the column containing the names we wish to plot ontology_id_colname = "ontology_id", # column that indicates the significance values p_value_type_colname = "eFDR")
Plots lollipop plot of p-values.
plot_lollipop( reshaped_results, ontology_id_colname = "ontology_id", selected_rows_to_plot = NULL, p_value_type_colname = "eFDR", p_value_max_threshold = 0.05 )
plot_lollipop( reshaped_results, ontology_id_colname = "ontology_id", selected_rows_to_plot = NULL, p_value_type_colname = "eFDR", p_value_max_threshold = 0.05 )
reshaped_results |
data.table in relaxed form, obtained as the output
of the |
ontology_id_colname |
Character, specifies the column name that contains ontology IDs in the input data. |
selected_rows_to_plot |
A numeric vector specifying which rows of the reshaped results data frame should be included in the plot. Default is NULL. frame should be included in the plot? |
p_value_type_colname |
Character, specifies the column name for p-values in the input data. Default is 'eFDR'. |
p_value_max_threshold |
Numeric, representing the maximum p-value threshold for filtering data. Default is 0.05. |
Create a customized lollipop plot of p-values, facilitating visual exploration and analysis of statistical significance within ontology categories.
Returns a lollipop plot
library(mulea) # loading and filtering the example ontology from a GMT file tf_gmt <- read_gmt(file = system.file(package="mulea", "extdata", "Transcription_factor_RegulonDB_Escherichia_coli_GeneSymbol.gmt")) tf_gmt_filtered <- filter_ontology(gmt = tf_gmt, min_nr_of_elements = 3, max_nr_of_elements = 400) # loading the example data sign_genes <- readLines(system.file(package = "mulea", "extdata", "target_set.txt")) background_genes <- readLines(system.file( package="mulea", "extdata", "background_set.txt")) # creating the ORA model ora_model <- ora(gmt = tf_gmt_filtered, # the test set variable element_names = sign_genes, # the background set variable background_element_names = background_genes, # the p-value adjustment method p_value_adjustment_method = "eFDR", # the number of permutations number_of_permutations = 10000, # the number of processor threads to use nthreads = 2) # running the ORA ora_results <- run_test(ora_model) # reshaping results for visualisation ora_reshaped_results <- reshape_results( model = ora_model, model_results = ora_results, # choosing which column to use for the indication of significance p_value_type_colname = "eFDR") # Plot lollipop plot_lollipop(reshaped_results = ora_reshaped_results, # the column containing the names we wish to plot ontology_id_colname = "ontology_id", # upper threshold for the value indicating the significance p_value_max_threshold = 0.05, # column that indicates the significance values p_value_type_colname = "eFDR")
library(mulea) # loading and filtering the example ontology from a GMT file tf_gmt <- read_gmt(file = system.file(package="mulea", "extdata", "Transcription_factor_RegulonDB_Escherichia_coli_GeneSymbol.gmt")) tf_gmt_filtered <- filter_ontology(gmt = tf_gmt, min_nr_of_elements = 3, max_nr_of_elements = 400) # loading the example data sign_genes <- readLines(system.file(package = "mulea", "extdata", "target_set.txt")) background_genes <- readLines(system.file( package="mulea", "extdata", "background_set.txt")) # creating the ORA model ora_model <- ora(gmt = tf_gmt_filtered, # the test set variable element_names = sign_genes, # the background set variable background_element_names = background_genes, # the p-value adjustment method p_value_adjustment_method = "eFDR", # the number of permutations number_of_permutations = 10000, # the number of processor threads to use nthreads = 2) # running the ORA ora_results <- run_test(ora_model) # reshaping results for visualisation ora_reshaped_results <- reshape_results( model = ora_model, model_results = ora_results, # choosing which column to use for the indication of significance p_value_type_colname = "eFDR") # Plot lollipop plot_lollipop(reshaped_results = ora_reshaped_results, # the column containing the names we wish to plot ontology_id_colname = "ontology_id", # upper threshold for the value indicating the significance p_value_max_threshold = 0.05, # column that indicates the significance values p_value_type_colname = "eFDR")
Reads gene set or ontology data from a Gene Matrix Transposed
(GMT) file and parse into a data.frame
.
read_gmt(file)
read_gmt(file)
file |
Character, a path to a file. |
Returns a data.frame
with three columns:
'ontology_id': Ontology identifier that uniquely identifies the element within the referenced ontology.
'ontology_name': Ontology name or description that provides a user-friendly label or textual description for the 'ontology_id'.
'list_of_values': Associated genes or proteins that is a vector of identifiers of genes or proteins belonging to the 'ontology_id'.
# import example gene set library(mulea) tf_gmt <- read_gmt(file = system.file( package="mulea", "extdata", "Transcription_factor_RegulonDB_Escherichia_coli_GeneSymbol.gmt"))
# import example gene set library(mulea) tf_gmt <- read_gmt(file = system.file( package="mulea", "extdata", "Transcription_factor_RegulonDB_Escherichia_coli_GeneSymbol.gmt"))
This function takes a model and model_results data, reshapes them into a suitable format for plotting, and returns the resulting data frame, which can be used for further analysis or visualization.
reshape_results( model = NULL, model_results = NULL, model_ontology_col_name = "ontology_id", ontology_id_colname = "ontology_id", p_value_type_colname = "eFDR", p_value_max_threshold = TRUE )
reshape_results( model = NULL, model_results = NULL, model_ontology_col_name = "ontology_id", ontology_id_colname = "ontology_id", p_value_type_colname = "eFDR", p_value_max_threshold = TRUE )
model |
a mulea model, created by the
|
model_results |
Result |
model_ontology_col_name |
Character, specifies the column name in the model that contains ontology IDs. It defines which column in the model should be used for matching ontology IDs. Possible values are 'ontology_id' and 'ontology_name'. The default value is 'ontology_id'. |
ontology_id_colname |
Character, specifies the column name for ontology IDs in the model results. It indicates which column in the model results contains ontology IDs for merging. Possible values are 'ontology_id' and 'ontology_name'. The default value is 'ontology_id'. |
p_value_type_colname |
Character, specifies the column name
for the type or raw or adjusted p-value in the result
|
p_value_max_threshold |
Logical, indicating whether to apply a p-value threshold when filtering the resulting data. If TRUE, the function filters the data based on a p-value threshold. |
Return detailed and relaxed data.table
where model and results are
merged for plotting purposes.
plot_graph
, plot_barplot
,
plot_heatmap
library(mulea) # loading and filtering the example ontology from a GMT file tf_gmt <- read_gmt(file = system.file(package="mulea", "extdata", "Transcription_factor_RegulonDB_Escherichia_coli_GeneSymbol.gmt")) tf_gmt_filtered <- filter_ontology(gmt = tf_gmt, min_nr_of_elements = 3, max_nr_of_elements = 400) # loading the example data sign_genes <- readLines(system.file( package = "mulea", "extdata", "target_set.txt")) background_genes <- readLines( system.file(package="mulea", "extdata", "background_set.txt")) # creating the ORA model ora_model <- ora(gmt = tf_gmt_filtered, # the test set variable element_names = sign_genes, # the background set variable background_element_names = background_genes, # the p-value adjustment method p_value_adjustment_method = "eFDR", # the number of permutations number_of_permutations = 10000, # the number of processor threads to use nthreads = 2) # running the ORA ora_results <- run_test(ora_model) # reshaping results for visualisation ora_reshaped_results <- reshape_results(model = ora_model, model_results = ora_results, # choosing which column to use for the indication of significance p_value_type_colname = "eFDR")
library(mulea) # loading and filtering the example ontology from a GMT file tf_gmt <- read_gmt(file = system.file(package="mulea", "extdata", "Transcription_factor_RegulonDB_Escherichia_coli_GeneSymbol.gmt")) tf_gmt_filtered <- filter_ontology(gmt = tf_gmt, min_nr_of_elements = 3, max_nr_of_elements = 400) # loading the example data sign_genes <- readLines(system.file( package = "mulea", "extdata", "target_set.txt")) background_genes <- readLines( system.file(package="mulea", "extdata", "background_set.txt")) # creating the ORA model ora_model <- ora(gmt = tf_gmt_filtered, # the test set variable element_names = sign_genes, # the background set variable background_element_names = background_genes, # the p-value adjustment method p_value_adjustment_method = "eFDR", # the number of permutations number_of_permutations = 10000, # the number of processor threads to use nthreads = 2) # running the ORA ora_results <- run_test(ora_model) # reshaping results for visualisation ora_reshaped_results <- reshape_results(model = ora_model, model_results = ora_results, # choosing which column to use for the indication of significance p_value_type_colname = "eFDR")
This is a generic function that chooses an enrichment analysis procedure based on the model class and runs the analysis.
run_test(model) ## S4 method for signature 'ora' run_test(model)
run_test(model) ## S4 method for signature 'ora' run_test(model)
model |
Object of S4 class representing the mulea test. |
The function requires the definition of a model. Models currently implemented in mulea include Gene Set Enrichment Analysis (GSEA) and Over-Representation Analysis (ORA). These models must be defined through their specific functions which are provided in this package.
Results in form of data.frame
. Structure of data.frame
depends on
object processed by this generic method.
In the case of run_test
was used with the model generated
by the ora
function the returned
data.frame
contains the following columns:
'ontology_id': Identifiers of the ontology elements.
'ontology_name': Names of the ontology elements.
'nr_common_with_tested_elements': Number of common elements between the
ontology element and the vector defined by the element_names parameter
of the ora
function.
'nr_common_with_background_elements': Number of common elements between
the ontology element and the vector defined by the
background_element_names parameter of the ora
function.
'p_value': The raw p-value of the overrepresentation analysis.
The adjusted p-value.
The column named based on the
p_value_adjustment_method parameter of the
ora
function, e.g. 'eFDR'
In the case of run_test
was used with the model
generated by the gsea
function the returned
data.frame
contains the following columns:
'ontology_id': Identifiers of the ontology elements.
'ontology_name': Names of the ontology elements.
'nr_common_with_tested_elements': Number of common elements between the
ontology element and the vector defined by the element_names parameter
of the gsea
function.
'p_value': The raw p-value of the gene set enrichment analysis.
'adjusted_p_value': The adjusted p-value.
run_test method for ora object. Returns the results of the overrepresentation analysis.
run_test(ora)
: ora test.
library(mulea) # loading and filtering the example ontology from a GMT file tf_gmt <- read_gmt(file = system.file( package="mulea", "extdata", "Transcription_factor_RegulonDB_Escherichia_coli_GeneSymbol.gmt")) tf_gmt_filtered <- filter_ontology(gmt = tf_gmt, min_nr_of_elements = 3, max_nr_of_elements = 400) # loading the example data sign_genes <- readLines(system.file(package = "mulea", "extdata", "target_set.txt")) background_genes <- readLines(system.file(package="mulea", "extdata", " background_set.txt")) # creating the ORA model ora_model <- ora(gmt = tf_gmt_filtered, # the test set variable element_names = sign_genes, # the background set variable background_element_names = background_genes, # the p-value adjustment method p_value_adjustment_method = "eFDR", # the number of permutations number_of_permutations = 10000, # the number of processor threads to use nthreads = 2) # running the ORA ora_results <- run_test(ora_model) library(mulea) # loading and filtering the example ontology from a GMT file tf_gmt <- read_gmt(file = system.file( package="mulea", "extdata", "Transcription_factor_RegulonDB_Escherichia_coli_GeneSymbol.gmt")) tf_gmt_filtered <- filter_ontology(gmt = tf_gmt, min_nr_of_elements = 3, max_nr_of_elements = 400) # loading the example data sign_genes <- readLines(system.file(package = "mulea", "extdata", "target_set.txt")) background_genes <- readLines(system.file(package="mulea", "extdata", "background_set.txt")) # creating the ORA model ora_model <- ora(gmt = tf_gmt_filtered, # the test set variable element_names = sign_genes, # the background set variable background_element_names = background_genes, # the p-value adjustment method p_value_adjustment_method = "eFDR", # the number of permutations number_of_permutations = 10000, # the number of processor threads to use nthreads = 2) # running the ORA ora_results <- run_test(ora_model)
library(mulea) # loading and filtering the example ontology from a GMT file tf_gmt <- read_gmt(file = system.file( package="mulea", "extdata", "Transcription_factor_RegulonDB_Escherichia_coli_GeneSymbol.gmt")) tf_gmt_filtered <- filter_ontology(gmt = tf_gmt, min_nr_of_elements = 3, max_nr_of_elements = 400) # loading the example data sign_genes <- readLines(system.file(package = "mulea", "extdata", "target_set.txt")) background_genes <- readLines(system.file(package="mulea", "extdata", " background_set.txt")) # creating the ORA model ora_model <- ora(gmt = tf_gmt_filtered, # the test set variable element_names = sign_genes, # the background set variable background_element_names = background_genes, # the p-value adjustment method p_value_adjustment_method = "eFDR", # the number of permutations number_of_permutations = 10000, # the number of processor threads to use nthreads = 2) # running the ORA ora_results <- run_test(ora_model) library(mulea) # loading and filtering the example ontology from a GMT file tf_gmt <- read_gmt(file = system.file( package="mulea", "extdata", "Transcription_factor_RegulonDB_Escherichia_coli_GeneSymbol.gmt")) tf_gmt_filtered <- filter_ontology(gmt = tf_gmt, min_nr_of_elements = 3, max_nr_of_elements = 400) # loading the example data sign_genes <- readLines(system.file(package = "mulea", "extdata", "target_set.txt")) background_genes <- readLines(system.file(package="mulea", "extdata", "background_set.txt")) # creating the ORA model ora_model <- ora(gmt = tf_gmt_filtered, # the test set variable element_names = sign_genes, # the background set variable background_element_names = background_genes, # the p-value adjustment method p_value_adjustment_method = "eFDR", # the number of permutations number_of_permutations = 10000, # the number of processor threads to use nthreads = 2) # running the ORA ora_results <- run_test(ora_model)
PRIVATE class : An S4 class to represent a Hypergeometric tests in mulea.
## S4 method for signature 'SetBasedEnrichmentTest' run_test(model)
## S4 method for signature 'SetBasedEnrichmentTest' run_test(model)
model |
Object of s4 class represents mulea Test. |
SetBasedEnrichmentTest object. Used as private function.
run_test method for SetBasedEnrichmentTest object. Used as private function.
run_test(SetBasedEnrichmentTest)
: runs test calculations.
gmt
A data.frame representing GMT's representation of model.
element_names
A data from experiment to analyse across model.
pool
A background data to count test.
nthreads
Number of processor's threads used in calculations.
random_seed
Setup seed for random generator.
PRIVATE class : An S4 class to represent a ranked based tests in mulea.
## S4 method for signature 'SubramanianTest' run_test(model)
## S4 method for signature 'SubramanianTest' run_test(model)
model |
Object of s4 class represents mulea Test. |
data.frame
with presented columns 'ontology_id', 'ontology_name',
'nr_common_with_tested_elements',
'p_value', 'adjusted_p_value'
run_test method for SubramanianTest object. Used as private function.
run_test(SubramanianTest)
: runs test calculations.
gmt
A data.frame
representing the ontology GMT.
element_names
A vector of elements names (gene or protein names or identifiers) to include in the analysis.
element_scores
A vector of numeric values representing a score (e.g. p-value, z-score, log fold change) for each 'element_name', in the same number and order as element_name.
p
A power of weight.
element_score_type
Defines the GSEA score type.
"pos": Only positive element_scores
"neg": only negative element_scores - "neg" and mixed
"std": standard – containing both positive and negative scores Default value is "std".
Writes gene set or ontology data.frame
with specific
formatting (columns representing ontology identifiers, descriptions, and
associated lists of values) and writes it to a file in a standardized Gene
Matrix Transposed (GMT) file format.
write_gmt(gmt, file)
write_gmt(gmt, file)
gmt |
A |
file |
Character, a path to a file. |
Returns the input as a GMT file at a specific location.
library(mulea) # loading and filtering the example ontology from a GMT file tf_gmt <- read_gmt(file = system.file( package="mulea", "extdata", "Transcription_factor_RegulonDB_Escherichia_coli_GeneSymbol.gmt")) # writing the filtered ontology to a GMT file write_gmt( gmt = tf_gmt, file = "Transcription_factor_RegulonDB_Escherichia_coli_GeneSymbol.gmt")
library(mulea) # loading and filtering the example ontology from a GMT file tf_gmt <- read_gmt(file = system.file( package="mulea", "extdata", "Transcription_factor_RegulonDB_Escherichia_coli_GeneSymbol.gmt")) # writing the filtered ontology to a GMT file write_gmt( gmt = tf_gmt, file = "Transcription_factor_RegulonDB_Escherichia_coli_GeneSymbol.gmt")