Builder of CIBERSORTx Reference Matrix
Source: R/Reference_Matrix_Builder.R
This function builds a Reference Matrix for CIBERSORTx from a Seurat object.
Usage
Reference_Matrix_Builder(
  seurat_object,
  assay = "RNA",
  layer = "counts",
  ident.1 = NULL,
  ident.2 = NULL,
  double.ident = TRUE,
  double.ident.sep = "_",
  reverse.double.ident = FALSE,
  subset.ident.1 = NULL,
  subset.ident.1.invert = FALSE,
  subset.ident.2 = NULL,
  subset.ident.2.invert = FALSE,
  downsample.object.first = NULL,
  downsample.object.last = NULL,
  downsample.cluster = NULL,
  automatic.downsample = FALSE,
  check.size = FALSE,
  max.matrix.size = 900,
  cell.barcodes = FALSE,
  file.name = "Reference_Matrix",
  file.format = "txt",
  file.sep = "\t",
  write.path = NULL,
  write = TRUE,
  verbose = TRUE
)
Arguments
- seurat_object
A Seurat object.
- assay
Character. The name of an assay containing the layer with the expression matrix. If the seurat_object contains multiple 'RNA' assays, you may specify which one to use (for example, 'RNA2' if you have created a second 'RNA' assay named 'RNA2'; see the Seurat v5 vignettes for more information). You may also use another assay, such as 'SCT', if you want to extract the expression matrix for projects other than CIBERSORTx.
- layer
Character. The name of a layer (formerly known as slot) which stores the expression matrix. Must be the 'counts' layer for CIBERSORTx. You may also specify another layer, such as 'data' (normalized counts) or any other layer present in the specified assay, if you want to extract the expression matrix for projects other than CIBERSORTx. If you have split layers, the function will always join them before extracting the expression matrix.
- ident.1
Character. The name of a metadata column (for example, 'orig.ident', 'seurat_clusters', etc.) from which to extract identities as column names in the Reference Matrix. If NULL, uses the active identities (active.ident).
- ident.2
Character. The name of a metadata column by which to split ident.1 identities (for example, if ident.1 is 'seurat_clusters' and ident.2 is 'orig.ident', identity 0 will be separated into several 0_ identities, each corresponding to an orig.ident: 0_sample1, 0_sample2, etc.).
- double.ident
Logical. If TRUE, column names of the Reference Matrix will be set with dual identities (ident.1_ident.2). If FALSE, column names will be set as ident.1 identities. Ignored if ident.2 = NULL.
- double.ident.sep
Character. The separator to use between the identity names of ident.1 and ident.2. Ignored if ident.2 = NULL or double.ident = FALSE.
- reverse.double.ident
Logical. If TRUE, the identity names of ident.1 and ident.2 will be reversed (ident.2_ident.1). Ignored if ident.2 = NULL or double.ident = FALSE.
- subset.ident.1
Character. The names of one or several ident.1 identities to select. If NULL, all identities are used.
- subset.ident.1.invert
Logical. If TRUE, inverts the selection from subset.ident.1 (for example, if subset.ident.1 = c('0','1','2') and subset.ident.1.invert = TRUE, all identities except 0, 1 and 2 will be kept). Ignored if subset.ident.1 = NULL.
- subset.ident.2
Character. The names of one or several ident.2 identities to select. If NULL, all identities are used. Ignored if ident.2 = NULL.
- subset.ident.2.invert
Logical. If TRUE, inverts the selection from subset.ident.2 (for example, if subset.ident.2 = c('patient1','patient4') and subset.ident.2.invert = TRUE, all identities except patient1 and patient4 will be kept). Ignored if subset.ident.2 = NULL or ident.2 = NULL.
- downsample.object.first
Numeric. The number of cells to downsample from the entire seurat_object (not from each identity) before subsetting with subset.ident.1 and/or subset.ident.2. If NULL, all cells are used.
- downsample.object.last
Numeric. The number of cells to downsample from the entire seurat_object after subsetting with subset.ident.1 and/or subset.ident.2. If NULL, all cells are used.
- downsample.cluster
Numeric. The number of cells to downsample from each ident.1 identity. Performed after downsample.object.last. If NULL, all cells are used.
- automatic.downsample
Logical. If TRUE, automatically downsamples the seurat_object so that the Reference Matrix file written to disk is just under the max.matrix.size limit (empirically estimated). Performed last. Ignored if check.size = TRUE or write = FALSE. Please report an issue if you see a significant difference between the size of the file written to disk and max.matrix.size (for example, max.matrix.size is set to 200 MB but the file is 400 MB or 100 MB).
- check.size
Logical. If TRUE, prints the estimated size of the Reference Matrix file that would be written to disk and the number of cells to downsample if need be.
- max.matrix.size
Numeric. The maximum size, in MB, of the Reference Matrix file written to disk. The function stops if the file is estimated to be over this limit; if automatic.downsample = TRUE, it instead downsamples the seurat_object so that the Reference Matrix written to disk is under the size limit. Ignored if write = FALSE.
- cell.barcodes
Logical. Must be FALSE for CIBERSORTx. If TRUE, keeps the cell barcodes as column names instead of renaming them with cell identities, which is useful if you want to extract the expression matrix for projects other than CIBERSORTx.
- file.name
Character. The name of the Reference Matrix file written to disk. Must not contain any spaces for CIBERSORTx; the function automatically replaces any space with an underscore. Ignored if check.size = TRUE or write = FALSE.
- file.format
Character. The format of the Reference Matrix file written to disk. Must be 'txt' or 'tsv' for CIBERSORTx, but you may also specify 'csv', for example, if you want to extract the expression matrix for projects other than CIBERSORTx. Accepts any format the fwrite function would accept. Ignored if check.size = TRUE or write = FALSE.
- file.sep
Character. The separator to use in the Reference Matrix file written to disk. Must be a tab for CIBERSORTx, but you may also specify a comma, for example, if you want to extract the expression matrix for projects other than CIBERSORTx. Accepts any separator the fwrite function would accept. Ignored if check.size = TRUE or write = FALSE.
- write.path
Character. The path to write the Reference Matrix into. If NULL, uses the current working directory. Ignored if check.size = TRUE or write = FALSE.
- write
Logical. If TRUE, writes the Reference Matrix file to disk. Ignored if check.size = TRUE.
- verbose
Logical. If FALSE, does not print progress messages and output, but warnings and errors will still be printed.
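The identity arguments above interact in a fixed way: cells are first selected with subset.ident.1 and subset.ident.2 (optionally inverted), and the surviving cells are then labelled with single or dual identities. The base R sketch below imitates that logic on a toy metadata table. It is an illustration only, not the package's implementation: keep_cells() and make_names() are hypothetical helpers, and the metadata values are made up.

```r
# Toy cell metadata; the values are made up for illustration.
meta <- data.frame(
  seurat_clusters = c("0", "0", "1", "1", "2"),
  orig.ident      = c("sample1", "sample2", "sample1", "sample3", "sample2"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of the package): TRUE for cells to keep,
# mirroring subset.ident.* and subset.ident.*.invert.
keep_cells <- function(values, subset = NULL, invert = FALSE) {
  if (is.null(subset)) return(rep(TRUE, length(values)))
  selected <- values %in% subset
  if (invert) !selected else selected
}

# Hypothetical helper (not part of the package): build column names,
# mirroring double.ident, double.ident.sep and reverse.double.ident.
make_names <- function(id1, id2 = NULL, double.ident = TRUE,
                       sep = "_", reverse = FALSE) {
  if (is.null(id2) || !double.ident) return(id1)
  if (reverse) paste(id2, id1, sep = sep) else paste(id1, id2, sep = sep)
}

# Keep clusters 0 and 1, and every sample except sample2.
kept <- keep_cells(meta$seurat_clusters, subset = c("0", "1")) &
  keep_cells(meta$orig.ident, subset = "sample2", invert = TRUE)
cells <- meta[kept, ]

col_names <- make_names(cells$seurat_clusters, cells$orig.ident)
print(col_names)
# "0_sample1" "1_sample1" "1_sample3"

reversed_names <- make_names(cells$seurat_clusters, cells$orig.ident,
                             reverse = TRUE)
print(reversed_names)
# "sample1_0" "sample1_1" "sample3_1"
```

Several cells can share the same label, so duplicated column names are expected in the resulting matrix.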
Value
A data.table object containing the raw counts from the seurat_object (or the data from any other specified assay and layer), with cell identities or barcodes as column names and feature names in the first column. If write = TRUE, the data.table object is also written to disk. If check.size = TRUE, the function instead prints the estimated size of the Reference Matrix file that would be written to disk and the number of cells to downsample if need be.
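To make the layout described above concrete, the sketch below builds a tiny table with feature names in the first column and repeated cell identities as column names, then writes it tab-separated. It uses base R only: write.table() stands in here for the fwrite call mentioned in the arguments, and the counts, gene names and identities are invented for illustration.

```r
# Toy count matrix: 2 features by 3 cells, with cell identities as column names.
# Identities repeat because several cells can share the same identity.
counts <- matrix(
  c(0, 2, 1, 0, 3, 0),
  nrow = 2,
  dimnames = list(c("GeneA", "GeneB"), c("NK", "NK", "CD8 T"))
)

# Feature names go in the first column; check.names = FALSE keeps the
# duplicated, space-containing identity names untouched.
ref <- data.frame(Gene = rownames(counts), counts,
                  check.names = FALSE, stringsAsFactors = FALSE)

# Write a tab-separated text file (base R stand-in for fwrite).
out_file <- tempfile(fileext = ".txt")
write.table(ref, out_file, sep = "\t", quote = FALSE, row.names = FALSE)

# The header line holds "Gene" followed by one identity per cell.
header <- readLines(out_file, n = 1)
print(header)
```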
Examples
# Prepare data
pbmc3k <- Right_Data("pbmc3k")
# Example 1:
refmat <- Reference_Matrix_Builder(pbmc3k,
                                   ident.1 = "seurat_annotations",
                                   ident.2 = "orig.ident",
                                   downsample.object.first = 1500,
                                   subset.ident.1 = c("Naive CD4 T",
                                                      "Memory CD4 T",
                                                      "CD8 T",
                                                      "NK"),
                                   subset.ident.2 = "Donor_2",
                                   subset.ident.2.invert = TRUE,
                                   write = FALSE)
#> Starting...
#> Downsampling...
#> Subsetting...
#> Subsetting...
#> Extracting the expression matrix...
#> Building the data.table...
#> Cleaning...
#> Done.
refmat[1:5, 1:5]
#>             Gene CD8 T_Donor_1 CD8 T_Donor_1 CD8 T_Donor_3 Naive CD4 T_Donor_3
#>           <char>         <num>         <num>         <num>               <num>
#> 1:    AL627309.1             0             0             0                   0
#> 2:    AP006222.2             0             0             0                   0
#> 3: RP11-206L10.2             0             0             0                   0
#> 4: RP11-206L10.9             0             0             0                   0
#> 5:     LINC00115             0             0             0                   0
# Example 2:
Reference_Matrix_Builder(pbmc3k,
                         check.size = TRUE,
                         max.matrix.size = 40)
#> Starting...
#> Extracting the expression matrix...
#> Current estimated Reference Matrix size on CIBERSORTx web portal is between 71 and 74 MB :
#> Matrix of 2700 cells by 13714 features
#> You may want to downsample your Seurat object to 1429 cells for a Reference Matrix under 40 MB
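As a rough sanity check on the output of Example 2, the cell count can be scaled in proportion to the size target. This is a naive back-of-the-envelope sketch, not the function's estimator: the function relies on its own empirical estimate and recommends 1429 cells, which is more conservative than the proportional figure computed here. The numbers are taken from the printed output above.

```r
# Figures printed by Example 2.
n_cells      <- 2700               # cells in the current matrix
estimated_mb <- c(low = 71, high = 74)  # estimated size range in MB
target_mb    <- 40                 # max.matrix.size

# Naive proportional scaling against the upper size estimate.
naive_target_cells <- floor(n_cells * target_mb / max(estimated_mb))
print(naive_target_cells)
# 1459
```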