Following gene set enrichment analysis (GSEA), one often wants to explore the expression of the genes included in enriched pathways. For this purpose, this function builds a pathway database from MSigDB and creates signatures (module scores calculated with the respective functions of UCell or Seurat) from features found in a Seurat object and extracted from the supplied pathways. It also returns the feature names, which can be used to visualize their individual expression, for example with DotPlot_Heatmap.
Usage
GSEA_Signatures(
seurat_object,
assay = "RNA",
layer = "data",
species = "Homo sapiens",
category = NULL,
subcategory = NULL,
pathways,
min.features = 2,
signatures.names = "name",
method = "UCell",
only.features = FALSE,
fail.safe = 10,
verbose = TRUE,
...
)
Arguments
- seurat_object
A Seurat object.
- assay
Character. The name of an assay containing the layer with the expression matrix. If the seurat_object contains multiple 'RNA' assays, you may specify which one to use (for example, 'RNA2' if you have created a second 'RNA' assay you named 'RNA2'; see the Seurat v5 vignettes for more information). You may also use another assay, such as 'SCT', to pull feature expression from.
- layer
Character. The name of a layer (formerly known as slot) which stores the expression matrix. If the seurat_object contains split layers, the function will always join them before searching features and adding the signatures.
- species
Character. The species name to be internally passed to msigdbr to build the pathway database. Use msigdbr::msigdbr_species for the names of available species.
- category
Character. The names of one or several categories to be internally passed to msigdbr to build the pathway database. Use msigdbr::msigdbr_collections for the names of available categories (gs_cat column). If NULL, all categories will be used.
- subcategory
Character. The names of one or several subcategories to be internally passed to msigdbr to build the pathway database. Use msigdbr::msigdbr_collections for the names of available subcategories (gs_subcat column). If NULL, all subcategories will be used.
- pathways
Character. The names of one or several pathways to be searched in the pathway database and added as signatures. You may provide either a pathway id (for example, 'GO:0006574') or a name matching the pattern found in the gs_name column (uppercase letters and underscores between words). You may also provide a partial match (for example, 'TYPE_I_INTERFERON'), in which case the function will find all pathways containing this pattern. This is very handy to explore all pathways of interest in a particular biological process, but beware that it may result in a large number of pathways being added as signatures (using only.features = TRUE is highly recommended).
- min.features
Numeric. The minimum number of features present in the seurat_object for a pathway to be added as a signature.
- signatures.names
Character. Either 'id', which will add the ids of the pathways as signature names (for example, 'GO:0004657', 'hsa05200' etc), or 'name', which will add the names of the pathways as signature names (for example, 'GOBP_T_CELL_RECEPTOR_SIGNALING_PATHWAY'). You may also provide custom names to be used as signature names, whose length must match the length of pathways supplied. If multiple results are found for a pathway, the function will append a number to the corresponding custom signature name for each result.
- method
Character. The method used to calculate the module scores, either 'UCell' or 'Seurat'.
- only.features
Logical. If TRUE, the function will not add any signature to the seurat_object and will only return the feature names from the pathways found in the seurat_object and the feature names present in the seurat_object.
- fail.safe
Numeric. The maximum number of signatures the function will attempt to add to the seurat_object. If the number of signatures found is higher than this number, the function will not add any signature, and will instead return the seurat_object as well as the feature names from the pathways found in the seurat_object and the feature names present in the seurat_object. This prevents the function from adding a large number of signatures to the seurat_object by mistake.
- verbose
Logical. If FALSE, progress messages and output are not printed, but warnings and errors still are.
- ...
Additional arguments to be passed to AddModuleScore_UCell or AddModuleScore, such as nbin or maxRank.
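The way pathways, min.features and fail.safe interact can be pictured with a small base R sketch. This is not the function's actual implementation: the helper find_signatures, the toy pathway table and the feature vector are all hypothetical, and the sketch assumes that partial names are matched as fixed substrings of gs_name and that min.features is applied before the fail.safe check.

```r
# Toy stand-in for the pathway database. The real one is built from MSigDB
# through msigdbr; every id, name and gene below is made up for illustration.
pathway_db <- data.frame(
  gs_id = c("ID:1", "ID:1", "ID:1", "ID:2", "ID:2", "ID:3", "ID:3"),
  gs_name = c(rep("SET_RESPONSE_TO_TYPE_I_INTERFERON", 3),
              rep("SET_TYPE_I_INTERFERON_PRODUCTION", 2),
              rep("SET_CELL_CYCLE", 2)),
  gene_symbol = c("ISG15", "MX1", "IFIT1", "IRF7", "TLR9", "CDK1", "CCNB1"),
  stringsAsFactors = FALSE
)

# Hypothetical feature names of a Seurat object
object_features <- c("ISG15", "MX1", "IRF7", "CDK1", "CCNB1", "ACTB")

# Hypothetical helper mimicking the documented behaviour
find_signatures <- function(pathways, db, features,
                            min.features = 2, fail.safe = 10) {
  # A row is kept if a supplied value equals its id
  # or is contained in its name (assumed: fixed substring match)
  by_name <- Reduce(`|`, lapply(pathways, function(p) {
    grepl(p, db$gs_name, fixed = TRUE)
  }))
  matched <- db[db$gs_id %in% pathways | by_name, ]

  # All features of each matched pathway, then those present in the object
  all_features <- split(matched$gene_symbol, matched$gs_name)
  present <- lapply(all_features, intersect, features)

  # Drop pathways with fewer than min.features features in the object
  present <- present[lengths(present) >= min.features]

  list(
    all = all_features,
    present = present,
    # Signatures would only be added if their number does not exceed fail.safe
    add.signatures = length(present) <= fail.safe
  )
}

res <- find_signatures("TYPE_I_INTERFERON", pathway_db, object_features)
names(res$present)
res$add.signatures
```

In this toy example the partial pattern matches two pathways, but only one of them has at least two features in the object, so a single signature would be kept. Lowering min.features to 1 and fail.safe to 1 would keep both pathways and trip the fail-safe instead.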
Value
A list containing the seurat_object with added signatures, all feature names from the pathways found in the seurat_object, the feature names present in the seurat_object and the signature names. If only.features = TRUE, the function will instead return a list containing the feature names from the pathways found in the seurat_object and the feature names present in the seurat_object.