This function is a wrapper around FindMarkers that adds parallelization and filtering of mitochondrial, ribosomal and non-coding RNA features in human, as well as filtering of pseudogenes in mouse. It also directly returns the top markers for each identity (see top.markers).

Usage

Find_Annotation_Markers(
  seurat_object,
  ident.1 = NULL,
  ident.2 = NULL,
  min.pct = 0.25,
  top.markers = 5,
  unique.markers = TRUE,
  filter.mito = TRUE,
  filter.ribo = TRUE,
  filter.ncRNA = TRUE,
  species = "human",
  parallelized = FALSE,
  BPPARAM = NULL,
  name.features = FALSE,
  output.df = FALSE,
  output.list = FALSE,
  verbose = TRUE,
  ...
)

Arguments

seurat_object

A Seurat object.

ident.1

Character. (from FindMarkers documentation) Identity class to define markers for; pass an object of class phylo or 'clustertree' to find markers for a node in a cluster tree; passing 'clustertree' requires BuildClusterTree to have been run. Leave NULL to find markers for all clusters.

ident.2

Character. (from FindMarkers documentation) A second identity class for comparison; if NULL, use all other cells for comparison; if an object of class phylo or 'clustertree' is passed to ident.1, must pass a node to find markers for.

min.pct

Numeric. (from FindMarkers documentation) Only test features that are detected in a minimum fraction of min.pct cells in either of the two populations. Meant to speed up the function by not testing features that are very infrequently expressed.

top.markers

Numeric. The number of top markers to return for each identity. If set to Inf, all markers will be returned.

unique.markers

Logical. If TRUE, unique markers will be returned for each identity, so that no feature is repeated across identities.

filter.mito

Logical. If TRUE, mitochondrial features will be filtered out.

filter.ribo

Logical. If TRUE, ribosomal features will be filtered out.

filter.ncRNA

Logical. If TRUE, non-coding RNA features will be filtered out.

species

Character. The species whose reference data is used to filter out features. If 'human', non-coding RNA features will be filtered out using a dataset named ncRNA_human, built from the genenames database. If 'mouse', only pseudogenes will be filtered out, using a dataset named pseudogenes_mouse, built from the dreamBase2 database. These datasets are loaded with RightOmicsTools and may be checked for more information.

parallelized

Logical. If TRUE, FindMarkers will be parallelized using BiocParallel. Please note that parallelization is complex and depends on your operating system (Windows users might not see a gain or might even experience a slowdown).

BPPARAM

A BiocParallelParam object to be used for parallelization. If NULL, the function will set this parameter to SerialParam, which uses a single worker (core) and is therefore not parallelized, in order to prevent accidental use of large computation resources. Ignored if parallelized = FALSE.

name.features

Logical. If TRUE, and if output.df = FALSE, each feature will be named with the corresponding cluster identity.

output.df

Logical. If TRUE, a data frame of feature names and associated statistics will be returned. If FALSE, a character vector of feature names will be returned.

output.list

Logical. If TRUE, a list with one element per identity will be returned: data frames of feature names and statistics if output.df = TRUE, or character vectors of feature names if output.df = FALSE.

verbose

Logical. If FALSE, progress messages and output are not printed; warnings and errors are still printed.

...

Additional arguments to be passed to FindMarkers, such as test.use, or passed to other methods and to specific DE methods.
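
As a hedged illustration of the BPPARAM argument described above, the sketch below builds a BiocParallelParam object with an explicit worker count. The helper name make_bpparam and the default of two workers are assumptions made for this example and are not part of RightOmicsTools. MulticoreParam and SnowParam are BiocParallel back-ends; SnowParam is picked on Windows, where forked workers are not available.

```r
# Hypothetical helper (not part of RightOmicsTools): choose a BiocParallel
# back-end suited to the current operating system.
make_bpparam <- function(workers = 2) {
  if (!requireNamespace("BiocParallel", quietly = TRUE)) {
    stop("The BiocParallel package is required for this sketch.")
  }
  if (.Platform$OS.type == "windows") {
    # Forking is not available on Windows, so use socket-based workers
    BiocParallel::SnowParam(workers = workers)
  } else {
    # Forked workers on Unix-like systems (Linux, macOS)
    BiocParallel::MulticoreParam(workers = workers)
  }
}

# Usage, assuming `pbmc3k` was loaded as in the Examples section:
# markers <- Find_Annotation_Markers(pbmc3k,
#                                    parallelized = TRUE,
#                                    BPPARAM = make_bpparam(2))
```

Setting the worker count explicitly keeps resource use predictable, in the same spirit as the SerialParam default described for BPPARAM = NULL.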

Value

A data frame or a list of data frames with feature names and associated statistics, or a character vector or a list of character vectors with feature names.

Examples

# Prepare data
pbmc3k <- Right_Data("pbmc3k")

# Example 1: default parameters and origin of markers
pbmc3k.markers <- Find_Annotation_Markers(pbmc3k,
                                          name.features = TRUE)
#> Finding markers for cluster Naive CD4 T against all other clusters
#> Finding markers for cluster CD14+ Mono against all other clusters
#> Finding markers for cluster Memory CD4 T against all other clusters
#> Finding markers for cluster B against all other clusters
#> Finding markers for cluster CD8 T against all other clusters
#> Finding markers for cluster FCGR3A+ Mono against all other clusters
#> Finding markers for cluster NK against all other clusters
#> Finding markers for cluster DC against all other clusters
#> Finding markers for cluster Platelets against all other clusters
pbmc3k.markers
#>  Naive CD4 T  Naive CD4 T  Naive CD4 T  Naive CD4 T  Naive CD4 T   CD14+ Mono 
#>       "CCR7"       "LEF1"        "MAL"    "PIK3IP1"       "TCF7"      "FOLR3" 
#>   CD14+ Mono   CD14+ Mono   CD14+ Mono   CD14+ Mono Memory CD4 T Memory CD4 T 
#>    "S100A12"     "S100A8"     "S100A9"       "CD14"     "CD40LG"       "AQP3" 
#> Memory CD4 T Memory CD4 T Memory CD4 T            B            B            B 
#>      "SUSD3"        "CD2"      "TRAT1"     "VPREB3"      "CD79A"      "FCRLA" 
#>            B            B        CD8 T        CD8 T        CD8 T        CD8 T 
#>      "TCL1A"      "FCER2"       "GZMK"       "GZMH"       "CD8A"       "CCL5" 
#>        CD8 T FCGR3A+ Mono FCGR3A+ Mono FCGR3A+ Mono FCGR3A+ Mono FCGR3A+ Mono 
#>      "KLRG1"        "CKB"     "CDKN1C"     "MS4A4A"       "HES4"      "BATF3" 
#>           NK           NK           NK           NK           NK           DC 
#>     "AKR1C3"       "GZMB"     "SH2D1B"      "SPON2"     "FGFBP2"   "SERPINF1" 
#>           DC           DC           DC           DC    Platelets    Platelets 
#>     "FCER1A"      "CLIC2"    "CLEC10A"       "ENHO"     "LY6G6F"      "CLDN5" 
#>    Platelets    Platelets    Platelets 
#>        "GP9"     "ITGA2B"      "SEPT5" 
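
When name.features = TRUE, the result is a named character vector, so it can be regrouped by cluster with base R. The following is a minimal sketch on a toy vector that copies a few entries from the output above; the object names markers, cluster and markers_by_cluster are illustrative only. Per the Arguments section, output.list = TRUE is the documented way to obtain a list directly.

```r
# Toy named vector mimicking part of the Example 1 output shown above
markers <- c("Naive CD4 T" = "CCR7",   "Naive CD4 T" = "LEF1",
             "CD14+ Mono"  = "FOLR3",  "CD14+ Mono"  = "S100A12",
             "B"           = "VPREB3", "B"           = "CD79A")

# Keep clusters in their order of first appearance rather than alphabetical
cluster <- factor(names(markers), levels = unique(names(markers)))

# One unnamed character vector of markers per cluster
markers_by_cluster <- split(unname(markers), cluster)

# Markers recorded for the "B" cluster in the toy vector
markers_by_cluster[["B"]]
```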

# Example 2: parallelized FindAllMarkers
BPPARAM <- BiocParallel::registered()[[1]]
if (BPPARAM$workers > 4) BPPARAM$workers <- 4
pbmc3k.markers <- Find_Annotation_Markers(pbmc3k,
                                          min.pct = 0.01,
                                          top.markers = Inf,
                                          unique.markers = FALSE,
                                          filter.mito = FALSE,
                                          filter.ribo = FALSE,
                                          filter.ncRNA = FALSE,
                                          parallelized = TRUE,
                                          BPPARAM = BPPARAM,
                                          output.df = TRUE)
#> Finding markers for cluster FCGR3A+ Mono against all other clusters
#> Finding markers for cluster NK against all other clusters
#> Finding markers for cluster DC against all other clusters
#> Finding markers for cluster Platelets against all other clusters
#> Finding markers for cluster Naive CD4 T against all other clusters
#> Finding markers for cluster CD14+ Mono against all other clusters
#> Finding markers for cluster Memory CD4 T against all other clusters
#> Finding markers for cluster B against all other clusters
#> Finding markers for cluster CD8 T against all other clusters
head(pbmc3k.markers)
#>                p_val avg_log2FC pct.1 pct.2    p_val_adj     cluster feature
#> GTSCR1  1.720280e-08   7.163733 0.016 0.000 2.359193e-04 Naive CD4 T  GTSCR1
#> REG4    4.484775e-10   5.902153 0.022 0.000 6.150421e-06 Naive CD4 T    REG4
#> C2orf40 1.024936e-10   5.644114 0.023 0.000 1.405597e-06 Naive CD4 T C2orf40
#> MMP28   1.729283e-07   4.380462 0.016 0.000 2.371539e-03 Naive CD4 T   MMP28
#> NOG     5.025979e-10   4.148065 0.032 0.003 6.892627e-06 Naive CD4 T     NOG
#> FAM153A 8.848417e-08   3.880304 0.020 0.001 1.213472e-03 Naive CD4 T FAM153A
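
With output.df = TRUE, the returned data frame can be filtered further using the columns shown above. The sketch below works on a toy data frame that copies three rows of the head() output; the cutoffs max_padj and min_pct1 are hypothetical values chosen only for illustration, not recommendations from the package.

```r
# Toy data frame copying three rows of the head() output shown above
toy <- data.frame(
  p_val      = c(1.720280e-08, 4.484775e-10, 1.729283e-07),
  avg_log2FC = c(7.163733, 5.902153, 4.380462),
  pct.1      = c(0.016, 0.022, 0.016),
  pct.2      = c(0.000, 0.000, 0.000),
  p_val_adj  = c(2.359193e-04, 6.150421e-06, 2.371539e-03),
  cluster    = "Naive CD4 T",
  feature    = c("GTSCR1", "REG4", "MMP28"),
  row.names  = c("GTSCR1", "REG4", "MMP28"),
  stringsAsFactors = FALSE
)

# Hypothetical cutoffs, chosen only for illustration
max_padj <- 0.001
min_pct1 <- 0.02

# Keep rows passing both cutoffs, then sort by adjusted p-value
kept <- toy[toy$p_val_adj < max_padj & toy$pct.1 >= min_pct1, ]
kept <- kept[order(kept$p_val_adj), ]
kept$feature
```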