Get the top markers for fast annotation
Source: R/Find_Annotation_Markers.R
Find_Annotation_Markers.Rd
This function is a wrapper around FindMarkers that allows for parallelization and for filtering of mitochondrial, ribosomal and non-coding RNA features in human, as well as filtering of pseudogenes in mouse. It also directly returns the top markers for each identity, with the number set by top.markers.
Usage
Find_Annotation_Markers(
  seurat_object,
  ident.1 = NULL,
  ident.2 = NULL,
  min.pct = 0.25,
  top.markers = 5,
  unique.markers = TRUE,
  filter.mito = TRUE,
  filter.ribo = TRUE,
  filter.ncRNA = TRUE,
  species = "human",
  parallelized = FALSE,
  BPPARAM = NULL,
  name.features = FALSE,
  output.df = FALSE,
  output.list = FALSE,
  verbose = TRUE,
  ...
)
Arguments
- seurat_object
A Seurat object.
- ident.1
Character. (From the FindMarkers documentation) Identity class to define markers for; pass an object of class phylo or 'clustertree' to find markers for a node in a cluster tree; passing 'clustertree' requires BuildClusterTree to have been run. Leave NULL to find markers for all clusters.
- ident.2
Character. (From the FindMarkers documentation) A second identity class for comparison; if NULL, use all other cells for comparison; if an object of class phylo or 'clustertree' is passed to ident.1, must pass a node to find markers for.
- min.pct
Numeric. (From the FindMarkers documentation) Only test features that are detected in a minimum fraction of min.pct cells in either of the two populations. Meant to speed up the function by not testing features that are very infrequently expressed.
- top.markers
Numeric. The number of top markers to return. If set to Inf, all markers will be returned.
- unique.markers
Logical. If TRUE, unique markers will be returned for each identity, so that no feature is repeated across identities.
- filter.mito
Logical. If TRUE, mitochondrial features will be filtered out.
- filter.ribo
Logical. If TRUE, ribosomal features will be filtered out.
- filter.ncRNA
Logical. If TRUE, non-coding RNA features will be filtered out.
- species
Character. The species whose reference data is used to filter out features. If 'human', non-coding RNA features will be filtered out using a dataset named ncRNA_human, built from the genenames database. If 'mouse', only pseudogenes will be filtered out, using a dataset named pseudogenes_mouse, built from the dreamBase2 database. These datasets are loaded with RightOmicsTools and may be checked for more information.
- parallelized
Logical. If TRUE, FindMarkers will be parallelized using BiocParallel. Please note that parallelization is complex and depends on your operating system (Windows users might not see a gain or might even experience a slowdown).
- BPPARAM
A BiocParallelParam object to be used for parallelization. If NULL, the function will set this parameter to SerialParam, which uses a single worker (core) and is therefore not parallelized, in order to prevent accidental use of large computation resources. Ignored if parallelized = FALSE.
- name.features
Logical. If TRUE, and if output.df = FALSE, each feature will be named with the corresponding cluster identity.
- output.df
Logical. If TRUE, a data frame of feature names and associated statistics will be returned. If FALSE, a character vector of feature names will be returned.
- output.list
Logical. If TRUE, a list will be returned: one data frame per identity with feature names and statistics, or, if output.df = FALSE, one character vector of feature names per identity.
- verbose
Logical. If FALSE, does not print progress messages and output, but warnings and errors will still be printed.
- ...
Additional arguments to be passed to FindMarkers, such as test.use, or passed to other methods and to specific DE methods.
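As a rough illustration of what the filter.mito, filter.ribo and filter.ncRNA arguments describe, the sketch below drops features from a toy character vector in base R. The regular expressions (common human gene symbol prefixes) and the `ncrna_lookup` vector are assumptions made for this example only. The package relies on its own datasets, such as ncRNA_human, and its internal filtering may differ.

```r
# Toy feature names; a real analysis would use rownames(seurat_object).
features <- c("CCR7", "MT-CO1", "RPS12", "RPL13A", "LEF1",
              "MT-ND1", "MALAT1", "CD14")

# Assumed naming conventions for human gene symbols (not taken from the package).
mito_pattern <- "^MT-"
ribo_pattern <- "^RP[SL][0-9]"

# Hypothetical stand-in for a lookup table of non-coding RNA symbols.
ncrna_lookup <- c("MALAT1", "XIST")

# Flag each feature category separately so that every filter can be toggled.
is_mito  <- grepl(mito_pattern, features)
is_ribo  <- grepl(ribo_pattern, features)
is_ncrna <- features %in% ncrna_lookup

# Keep only the features that no filter flagged.
kept <- features[!(is_mito | is_ribo | is_ncrna)]
print(kept)
```

Keeping one logical vector per category makes it straightforward to skip a filter, which mirrors how each filter.* argument can be switched off independently.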
Value
A data frame or a list of data frames with feature names and associated statistics, or a character vector or a list of character vectors with feature names.
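The return shapes described above can be pictured with a toy example. The following base R sketch is not the package's implementation: the ranking by avg_log2FC, the rule that keeps a repeated feature only for the first cluster in which it appears, and the toy data are all assumptions chosen for illustration. It builds a data frame, a list of per-cluster data frames, and a named character vector similar to what name.features = TRUE produces.

```r
# Toy marker table; column names mirror the data frame output of the function.
markers <- data.frame(
  cluster    = c("A", "A", "A", "B", "B", "B"),
  feature    = c("G1", "G2", "G3", "G1", "G4", "G5"),
  avg_log2FC = c(3.0, 2.5, 1.0, 2.8, 2.0, 0.5),
  stringsAsFactors = FALSE
)
top_n <- 2  # plays the role of top.markers

# Rank features within each cluster by decreasing fold change (assumed criterion).
markers <- markers[order(markers$cluster, -markers$avg_log2FC), ]

# Assumed uniqueness rule: keep a feature only for the first cluster listing it.
markers_unique <- markers[!duplicated(markers$feature), ]

# List of data frames, one per cluster, each truncated to the top_n rows.
top_list <- lapply(split(markers_unique, markers_unique$cluster), head, n = top_n)

# Single data frame with all clusters stacked.
top_df <- do.call(rbind, top_list)

# Character vector of features, named with the cluster each one belongs to.
top_vec <- setNames(top_df$feature, top_df$cluster)
print(top_vec)
```

Here G1 is a candidate for both clusters but is retained only for cluster A, so cluster B falls back to G4 and G5.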
Examples
# Prepare data
pbmc3k <- Right_Data("pbmc3k")
# Example 1: default parameters and origin of markers
pbmc3k.markers <- Find_Annotation_Markers(pbmc3k,
                                          name.features = TRUE)
#> Finding markers for cluster Naive CD4 T against all other clusters
#> Finding markers for cluster CD14+ Mono against all other clusters
#> Finding markers for cluster Memory CD4 T against all other clusters
#> Finding markers for cluster B against all other clusters
#> Finding markers for cluster CD8 T against all other clusters
#> Finding markers for cluster FCGR3A+ Mono against all other clusters
#> Finding markers for cluster NK against all other clusters
#> Finding markers for cluster DC against all other clusters
#> Finding markers for cluster Platelets against all other clusters
pbmc3k.markers
#>  Naive CD4 T  Naive CD4 T  Naive CD4 T  Naive CD4 T  Naive CD4 T   CD14+ Mono
#>       "CCR7"       "LEF1"        "MAL"    "PIK3IP1"       "TCF7"      "FOLR3"
#>   CD14+ Mono   CD14+ Mono   CD14+ Mono   CD14+ Mono Memory CD4 T Memory CD4 T
#>    "S100A12"     "S100A8"     "S100A9"       "CD14"     "CD40LG"       "AQP3"
#> Memory CD4 T Memory CD4 T Memory CD4 T            B            B            B
#>      "SUSD3"        "CD2"      "TRAT1"     "VPREB3"      "CD79A"      "FCRLA"
#>            B            B        CD8 T        CD8 T        CD8 T        CD8 T
#>      "TCL1A"      "FCER2"       "GZMK"       "GZMH"       "CD8A"       "CCL5"
#>        CD8 T FCGR3A+ Mono FCGR3A+ Mono FCGR3A+ Mono FCGR3A+ Mono FCGR3A+ Mono
#>      "KLRG1"        "CKB"     "CDKN1C"     "MS4A4A"       "HES4"      "BATF3"
#>           NK           NK           NK           NK           NK           DC
#>     "AKR1C3"       "GZMB"     "SH2D1B"      "SPON2"     "FGFBP2"   "SERPINF1"
#>           DC           DC           DC           DC    Platelets    Platelets
#>     "FCER1A"      "CLIC2"    "CLEC10A"       "ENHO"     "LY6G6F"      "CLDN5"
#>    Platelets    Platelets    Platelets
#>        "GP9"     "ITGA2B"      "SEPT5"
# Example 2: parallelized FindAllMarkers
BPPARAM <- BiocParallel::registered()[[1]]
if (BPPARAM$workers > 4) BPPARAM$workers <- 4
pbmc3k.markers <- Find_Annotation_Markers(pbmc3k,
                                          min.pct = 0.01,
                                          top.markers = Inf,
                                          unique.markers = FALSE,
                                          filter.mito = FALSE,
                                          filter.ribo = FALSE,
                                          filter.ncRNA = FALSE,
                                          parallelized = TRUE,
                                          BPPARAM = BPPARAM,
                                          output.df = TRUE)
#> Finding markers for cluster FCGR3A+ Mono against all other clusters
#> Finding markers for cluster NK against all other clusters
#> Finding markers for cluster DC against all other clusters
#> Finding markers for cluster Platelets against all other clusters
#> Finding markers for cluster Naive CD4 T against all other clusters
#> Finding markers for cluster CD14+ Mono against all other clusters
#> Finding markers for cluster Memory CD4 T against all other clusters
#> Finding markers for cluster B against all other clusters
#> Finding markers for cluster CD8 T against all other clusters
head(pbmc3k.markers)
#>                p_val avg_log2FC pct.1 pct.2    p_val_adj     cluster feature
#> GTSCR1  1.720280e-08   7.163733 0.016 0.000 2.359193e-04 Naive CD4 T  GTSCR1
#> REG4    4.484775e-10   5.902153 0.022 0.000 6.150421e-06 Naive CD4 T    REG4
#> C2orf40 1.024936e-10   5.644114 0.023 0.000 1.405597e-06 Naive CD4 T C2orf40
#> MMP28   1.729283e-07   4.380462 0.016 0.000 2.371539e-03 Naive CD4 T   MMP28
#> NOG     5.025979e-10   4.148065 0.032 0.003 6.892627e-06 Naive CD4 T     NOG
#> FAM153A 8.848417e-08   3.880304 0.020 0.001 1.213472e-03 Naive CD4 T FAM153A