| Title: | Cell DiffErential Expression by Pooling ('CellDEEP') |
|---|---|
| Description: | Pool cells together before running differential expression (DE) analysis. Tell 'CellDEEP' how many cells to pool together (a number that should be chosen based on the total number of cells in the data), then run the DE analysis. Cheng et al. (2026) <doi:10.64898/2026.03.09.710522>. |
| Authors: | Yiyi Cheng [aut, cre] (ORCID: <https://orcid.org/0009-0005-3329-6842>) |
| Maintainer: | Yiyi Cheng <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 1.0.1 |
| Built: | 2026-05-29 10:45:43 UTC |
| Source: | https://github.com/cran/CellDEEP |
Pools cells into "pseudocells" by applying k-means clustering to PCA embeddings. This reduces data sparsity while maintaining the biological grouping of sample, cluster, and condition.
CellDEEP.Kmean(dataset, n_cells = 10, nstart = 100, assay_name = "RNA", readcounts = "mean", min_cells_per_subgroup = 25)
dataset |
A Seurat object. Must have PCA reductions calculated. |
n_cells |
Integer. Target number of cells to pool into each pseudocell. |
nstart |
Integer. Number of random sets to start with in |
assay_name |
Character. The assay to pull counts from (default "RNA"). |
readcounts |
Character. Aggregation method: "mean" (rounded average), "sum", "10X" (mean * 10). |
min_cells_per_subgroup |
Integer. Minimum cells required in each sample-cluster subgroup to perform pooling (default 25). |
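The three readcounts options can be illustrated with a small base R sketch. The helper aggregate_counts below is hypothetical and is not part of CellDEEP; it only mirrors the descriptions given above ("mean" as a rounded average, "sum", and "10X" as the mean times 10). Rounding the "10X" value to an integer is an assumption made here, not something the documentation states.

```r
# Hypothetical helper (not CellDEEP code): collapse a genes x cells count
# matrix for one pool into a single pseudocell profile.
aggregate_counts <- function(counts, readcounts = c("mean", "sum", "10X")) {
  readcounts <- match.arg(readcounts)
  switch(readcounts,
    mean  = round(rowMeans(counts)),       # rounded average
    sum   = rowSums(counts),               # plain sum of counts
    "10X" = round(rowMeans(counts) * 10)   # mean * 10; rounding is an assumption
  )
}

# Toy pool of three cells and two genes (made-up values)
counts <- matrix(c(0, 2, 4,
                   1, 0, 3),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("geneA", "geneB"), paste0("cell", 1:3)))

pooled_mean <- aggregate_counts(counts, "mean")
pooled_sum  <- aggregate_counts(counts, "sum")
pooled_10x  <- aggregate_counts(counts, "10X")
```

With small pools the rounded mean discards much of the count resolution, which is presumably why the sum and scaled-mean variants are offered.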
A new Seurat object where each "cell" is a pooled group of original cells.
This function requires that PCA has already been run on the input dataset,
as it uses the "pca" reduction for clustering.
data("sim")
pool_input <- prepare_data(sim, sample_id = "DonorID", group_id = "Status", cluster_id = "cluster_id")
pooled_kmean <- CellDEEP.Kmean(pool_input, readcounts = "sum", n_cells = 3, min_cells_per_subgroup = 1, assay_name = "RNA")
pooled_kmean
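To make the idea behind CellDEEP.Kmean concrete, here is a minimal base R sketch of k-means pooling on a PCA embedding for a single sample-cluster subgroup. It is an illustration under stated assumptions, not the package's implementation: the toy counts are simulated, the number of clusters is taken as the subgroup size divided by n_cells, and PCA is computed with prcomp rather than read from a Seurat "pca" reduction.

```r
set.seed(1)
n_cells <- 10  # target number of cells per pseudocell

# Toy subgroup: 200 genes x 60 cells of simulated counts (made-up data)
counts <- matrix(rpois(200 * 60, lambda = 2), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), paste0("cell", 1:60)))

# PCA on log-transformed counts, with cells as rows
pca <- prcomp(t(log1p(counts)), rank. = 10)

# Assumption: number of pools = subgroup size / n_cells (at least 1)
k <- max(1, round(ncol(counts) / n_cells))
km <- kmeans(pca$x, centers = k, nstart = 100)

# Sum the counts of all cells assigned to the same cluster
pooled <- t(rowsum(t(counts), group = km$cluster))
colnames(pooled) <- paste0("pseudocell", colnames(pooled))
```

Because k-means does not enforce equal cluster sizes, n_cells acts as a target rather than an exact pool size; table(km$cluster) shows the actual sizes.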
Pools cells into pseudocells by random selection within biological groups. To ensure pooling quality, subgroups with fewer cells than min_cells_per_subgroup (default 25) are skipped.
CellDEEP.Random(dataset, n_cells = 10, assay_name = "RNA", min_cells_per_subgroup = 25, readcounts = "mean")
dataset |
A Seurat object. |
n_cells |
Integer. The number of cells to pool into each pseudocell. |
assay_name |
Character. The assay to use for counts (default "RNA"). |
min_cells_per_subgroup |
Integer. Minimum cells required in each sample-cluster subgroup to perform pooling (default 25). |
readcounts |
Character. Method to aggregate counts: "sum" or "mean". |
A new Seurat object containing the aggregated pseudocells.
Subgroups (sample-cluster combinations) with fewer cells than min_cells_per_subgroup (default 25) are automatically skipped. The function also generates a DimPlot to visualize the random pooling across samples.
data("sim")
pool_input <- prepare_data(sim, sample_id = "DonorID", group_id = "Status", cluster_id = "cluster_id")
pooled_random <- CellDEEP.Random(pool_input, readcounts = "sum", n_cells = 3, min_cells_per_subgroup = 1, assay_name = "RNA")
pooled_random
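The random pooling and the subgroup size filter described for CellDEEP.Random can be sketched in base R as follows. The helper random_pool_ids is hypothetical and not taken from the package; how CellDEEP handles cells left over when the subgroup size is not a multiple of n_cells is not documented here, so this sketch simply spreads them across the pools.

```r
set.seed(42)

# Hypothetical helper (not CellDEEP code): assign the n cells of one subgroup
# to random pools, or return NULL if the subgroup is too small.
random_pool_ids <- function(n, n_cells = 10, min_cells_per_subgroup = 25) {
  if (n < min_cells_per_subgroup) return(NULL)  # subgroup is skipped
  n_pools <- max(1, n %/% n_cells)
  # Balanced pool labels in random order (leftover handling is an assumption)
  sample(rep_len(seq_len(n_pools), n))
}

# Toy metadata: sample s1 has 40 cells, sample s2 only 12, all in cluster c1
meta <- data.frame(
  sample_id  = rep(c("s1", "s2"), times = c(40, 12)),
  cluster_id = "c1"
)

# Pool within each sample-cluster subgroup
subgroup <- interaction(meta$sample_id, meta$cluster_id, drop = TRUE)
pool_ids <- lapply(split(seq_len(nrow(meta)), subgroup),
                   function(idx) random_pool_ids(length(idx)))
```

In this toy case the s1.c1 subgroup is split into four pools of ten cells, while s2.c1 falls below the threshold and yields NULL.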
Runs differential expression analysis either directly with Seurat or after first aggregating cells into metacells with CellDEEP pooling.
FindMarker.CellDEEP(object, ident.1 = NULL, ident.2 = NULL, group.by = "group_id", sample_id = NULL, group_id = NULL, cluster_id = NULL, prepare = TRUE, test.use = "wilcox", Pool = TRUE, readcounts = "sum", n_cells = 10, assay = "RNA", min_cells_per_subgroup = 25, cell_selection = "kmean", name.only = TRUE, logfc.threshold = 0.25, min.pct = 0.01, p_cutoff = 0.05, full_list = FALSE, ...)
object |
A Seurat object. |
ident.1 |
Character. First identity group to compare. |
ident.2 |
Character. Second identity group to compare. |
group.by |
Character. Metadata column used for grouping (default "group_id"). |
sample_id |
Character. Input metadata column for sample IDs. |
group_id |
Character. Input metadata column for group IDs. |
cluster_id |
Character. Input metadata column for cluster IDs. |
prepare |
Logical. If TRUE, run |
test.use |
Character. DE test to use. |
Pool |
Logical. If TRUE, perform CellDEEP pooling before DE (default TRUE). |
readcounts |
Character. Pool aggregation method: |
n_cells |
Integer. Target number of cells per pool. |
assay |
Character. Assay to use (default "RNA"). |
min_cells_per_subgroup |
Integer. Minimum cells in each sample-cluster subgroup required for pooling. |
cell_selection |
Character. Pooling strategy: |
name.only |
Logical. If TRUE, return gene names only. |
logfc.threshold |
Numeric. Minimum log fold-change. |
min.pct |
Numeric. Minimum detection rate. |
p_cutoff |
Numeric. Adjusted p-value threshold. |
full_list |
Logical. If TRUE, return all genes regardless of p-value. |
... |
Additional arguments passed to |
A vector of gene names or a DE data.frame.
Standardizes metadata columns to sample_id, group_id, and
cluster_id so CellDEEP functions can run consistently.
prepare_data(Subset.Seurat, assay = "RNA", sample_id, group_id, cluster_id, file_path = NULL)
Subset.Seurat |
A Seurat object. |
assay |
Character. Assay to use (default "RNA"). |
sample_id |
Character. Metadata column name for sample IDs. |
group_id |
Character. Metadata column name for group IDs. |
cluster_id |
Character. Metadata column name for cluster IDs. |
file_path |
Character. Reserved for compatibility. |
A Seurat object with standardized metadata fields.
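As a rough picture of what standardizing the metadata means, the sketch below works on a plain data.frame instead of a Seurat object. The helper standardize_meta is hypothetical; whether prepare_data copies or renames the original columns is not stated in this documentation, so copying is an assumption. The column names DonorID, Status, and cluster_id are those used in the package examples, while the values are made up.

```r
# Hypothetical helper (not CellDEEP code): copy user-named metadata columns
# into the standard names sample_id, group_id, and cluster_id.
standardize_meta <- function(meta, sample_id, group_id, cluster_id) {
  cols <- c(sample_id = sample_id, group_id = group_id, cluster_id = cluster_id)
  missing_cols <- setdiff(cols, colnames(meta))
  if (length(missing_cols) > 0) {
    stop("Missing metadata column(s): ", paste(missing_cols, collapse = ", "))
  }
  for (target in names(cols)) {
    meta[[target]] <- meta[[cols[[target]]]]
  }
  meta
}

# Toy metadata (made-up values)
meta <- data.frame(
  DonorID    = c("d1", "d1", "d2"),
  Status     = c("ctrl", "ctrl", "stim"),
  cluster_id = c("T", "B", "T")
)

std <- standardize_meta(meta, sample_id = "DonorID", group_id = "Status",
                        cluster_id = "cluster_id")
```

After this step, downstream code can refer to sample_id, group_id, and cluster_id regardless of what the columns were originally called.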
A wrapper for Seurat::FindMarkers that simplifies the extraction of
Differentially Expressed (DE) genes. It supports p-value filtering and can
return either gene names or a full results table.
return.DE(dataset, test.use = "wilcox", DE.ident.1, DE.ident.2, DE.group, assay = "RNA", p_cutoff = 0.05, name.only = TRUE, logfc.threshold = 0.25, min.pct = 0.01, full_list = FALSE, ...)
dataset |
A Seurat object. |
test.use |
Character. DE test to use (default "wilcox"). |
DE.ident.1 |
Identifier(s) for the first group of cells. |
DE.ident.2 |
Identifier(s) for the second group of cells. |
DE.group |
Character. Metadata column to group by. |
assay |
Character. Assay to use (default "RNA"). |
p_cutoff |
Numeric. Adjusted p-value threshold (default 0.05). |
name.only |
Logical. If TRUE, return gene names only. |
logfc.threshold |
Numeric. Minimum log fold change (default 0.25). |
min.pct |
Numeric. Minimum fraction of cells expressing a gene. |
full_list |
Logical. If TRUE, return all genes and skip p-value filter. |
... |
Extra arguments passed to |
A character vector of genes or a marker data.frame.
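The interaction of p_cutoff, name.only, and full_list can be sketched on a mock results table. The table below imitates the columns that Seurat::FindMarkers returns (p_val, avg_log2FC, pct.1, pct.2, p_val_adj) with made-up values. The helper filter_markers is hypothetical and only approximates the post-processing described above; in particular, using a strict "<" comparison against p_cutoff is an assumption.

```r
# Mock DE table in the shape of a Seurat::FindMarkers result (made-up values)
markers <- data.frame(
  p_val      = c(1e-6, 0.002, 0.3),
  avg_log2FC = c(1.2, -0.8, 0.1),
  pct.1      = c(0.9, 0.4, 0.2),
  pct.2      = c(0.3, 0.7, 0.2),
  p_val_adj  = c(1e-4, 0.04, 1),
  row.names  = c("GeneA", "GeneB", "GeneC")
)

# Hypothetical helper (not CellDEEP code): apply the adjusted p-value cutoff
# unless full_list is TRUE, then return gene names or the whole table.
filter_markers <- function(markers, p_cutoff = 0.05, name.only = TRUE,
                           full_list = FALSE) {
  if (!full_list) {
    markers <- markers[markers$p_val_adj < p_cutoff, , drop = FALSE]
  }
  if (name.only) rownames(markers) else markers
}

sig_genes <- filter_markers(markers)
all_rows  <- filter_markers(markers, name.only = FALSE, full_list = TRUE)
```

Here sig_genes holds the two genes that pass the cutoff, and all_rows keeps the complete three-row table.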
A dataset containing 200 simulated cells (100 per group) for demonstrating CellDEEP functions. The data are available at doi:10.5281/zenodo.18863779.
data(sim)
A Seurat object
Data simulated with the muscat package.