A novel clustering tool improves detection of cell types and states from single-cell analyses

Single-cell sequencing technologies are used to understand the complexities of health and disease. A critical step in analyzing single-cell data is clustering cells based on their gene expression profiles to identify distinct cell types. Current clustering tools can make arbitrary choices and introduce biases, leading to overclustering (cells of one type are split into many meaningless groups) or underclustering (different cell types are lumped together).

FunGen-AD-funded investigator Ryan Corces and other investigators from the Gladstone Institute of Neurological Disease developed a cell clustering tool called CHOIR (cluster hierarchy optimization by iterative random forests). CHOIR uses random forest classifiers and permutation tests to add a statistically informed approach to clustering single-cell data. The performance of CHOIR was compared with 15 existing clustering methods across 230 simulated and 5 real datasets, including single-cell RNA sequencing, spatial transcriptomic, multi-omic, and ATAC-seq data. CHOIR outperformed the existing clustering methods in every case, especially in identifying rare or subtle cell populations that other clustering tools missed. The research team is now applying CHOIR to analyze Alzheimer’s disease (AD) data with the expectation that this more precise tool can advance AD research.
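To make the idea of a classifier-based permutation test concrete, here is a minimal sketch of how one might ask whether two candidate clusters are statistically distinguishable. It is not CHOIR's implementation: a nearest-centroid classifier stands in for the random forest so the example needs only the Python standard library, and the function names (`make_cells`, `holdout_accuracy`, `permutation_p_value`), the simulated data, and the 50/50 train/test split are all assumptions made for illustration.

```python
import random
from statistics import mean


def make_cells(center, n, dims, rng):
    """Simulate n cells as Gaussian noise around a shared expression level (toy data)."""
    return [[rng.gauss(center, 1.0) for _ in range(dims)] for _ in range(n)]


def holdout_accuracy(cells, labels, rng):
    """Train on a random half of the cells and score on the other half.

    A nearest-centroid classifier is a stand-in here for a random forest.
    """
    idx = list(range(len(cells)))
    rng.shuffle(idx)
    half = len(idx) // 2
    train, test = idx[:half], idx[half:]

    groups = {}
    for i in train:
        groups.setdefault(labels[i], []).append(cells[i])
    if len(groups) < 2:
        return 0.5  # only one label in the training half: treat as chance level

    centroids = {lab: [mean(col) for col in zip(*rows)] for lab, rows in groups.items()}

    def predict(cell):
        # Assign the label whose centroid is closest in squared Euclidean distance.
        return min(
            centroids,
            key=lambda lab: sum((x - c) ** 2 for x, c in zip(cell, centroids[lab])),
        )

    return sum(predict(cells[i]) == labels[i] for i in test) / len(test)


def permutation_p_value(cells, labels, n_perm=200, seed=0):
    """Compare accuracy on the true labels with accuracy on shuffled labels."""
    rng = random.Random(seed)
    observed = holdout_accuracy(cells, labels, rng)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any real link between cells and labels
        if holdout_accuracy(cells, shuffled, rng) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return (hits + 1) / (n_perm + 1)
```

Under these assumptions, a small p-value suggests the two groups differ by more than chance and could be kept as separate clusters, while a large p-value suggests the split is not supported and the groups could be merged.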

This research, partially supported by FunGen-AD grant U01AG072573, is published in Nature Genetics. You can read more about these findings in the following articles:

  • CHOIR improves significance-based detection of cell types and states from single-cell data (RNA-SEQ Blog)
  • Powerful New Tool Can Identify Cells Promoting Health or Disease (Gladstone Institutes)
  • Revolutionary Tool Pinpoints Health and Disease Cells (Mirage.News)
  • Revolutionary Tool Unveils Cells That Drive Health or Disease (Bioengineer)