Stemness Transport with Representation Alignment for Lineage Inference (StreamLine)

BGI-Qingdao/StreamLine

StreamLine: Stemness Transport with Representation Alignment for Lineage Inference

StreamLine is a framework designed to infer cell differentiation state transitions across time points using single-cell sequencing data.

Directory Structure

Please ensure the project directory is organized as follows to match the file paths in the scripts:

StreamLine/
├── data/
│   └── MouseSC.h5ad          # Input AnnData file
├── model/                    # Directory for model checkpoints
├── result/
│   └── temp/                 # Intermediate directory for cost matrix parquet files
├── autoencoder.py
├── generate_cost_dfs.py
└── save_problems_cc.py
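
The layout above can be checked before running anything. The following is a minimal sketch using only the Python standard library; the helper name `missing_paths` is hypothetical and not part of the repository. It reports which of the expected files and directories are absent under a given project root.

```python
from pathlib import Path

# Paths taken from the directory listing above, relative to the project root.
EXPECTED_PATHS = [
    "data/MouseSC.h5ad",
    "model",
    "result/temp",
    "autoencoder.py",
    "generate_cost_dfs.py",
    "save_problems_cc.py",
]


def missing_paths(root="."):
    """Return the expected paths that do not exist under `root` (hypothetical helper)."""
    root = Path(root)
    return [p for p in EXPECTED_PATHS if not (root / p).exists()]


if __name__ == "__main__":
    missing = missing_paths(".")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Directory layout looks complete.")
```

Run it from the StreamLine/ directory; an empty result means every path in the listing is present.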

Environment Configuration

The pipeline requires Python 3.10. It is recommended to create a dedicated virtual environment (e.g., using Conda) to manage dependencies.

Create and activate environment:

conda create -n streamline python=3.10
conda activate streamline
pip install torch scanpy moscot numpy pandas statsmodels scikit-learn matplotlib seaborn pyarrow fastparquet
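
After installation, it can help to confirm that the environment is usable. The sketch below is an assumption-laden convenience check, not part of the repository: `missing_modules` is a hypothetical helper, and the module list simply mirrors the pip command above (note that scikit-learn is imported as `sklearn`).

```python
import sys
from importlib.util import find_spec

# Import names for the packages in the pip command above.
# scikit-learn is installed as "scikit-learn" but imported as "sklearn".
REQUIRED_MODULES = [
    "torch", "scanpy", "moscot", "numpy", "pandas", "statsmodels",
    "sklearn", "matplotlib", "seaborn", "pyarrow", "fastparquet",
]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found (hypothetical helper)."""
    return [m for m in modules if find_spec(m) is None]


if __name__ == "__main__":
    if sys.version_info[:2] != (3, 10):
        print(f"Warning: expected Python 3.10, found {sys.version.split()[0]}")
    missing = missing_modules()
    print("Missing modules:", missing if missing else "none")
```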

Execution

To run StreamLine, execute the following two scripts in order:

Step 1: Generate Cost Matrices

Calculates the transport cost between time points based on the pre-computed Stemness Score and Autoencoder Embedding.

python generate_cost_dfs.py

Output: Saves intermediate cost files to ./result/temp/.

Step 2: Calculate Transition Matrix

Uses moscot to solve the optimal transport problem.

python save_problems_cc.py

Output: Saves the final Transition Matrix to ./result/MouseSC.
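
Because the second script depends on the files written by the first, the two steps can be wrapped in a small runner that stops at the first failure. This is a minimal sketch under the assumption that both scripts are run from the project root without arguments, as in the commands above; `build_commands` and `run_pipeline` are hypothetical names, not part of the repository.

```python
import subprocess
import sys

# Script names and their order are taken from the two steps above.
PIPELINE_STEPS = ["generate_cost_dfs.py", "save_problems_cc.py"]


def build_commands(python=sys.executable, steps=PIPELINE_STEPS):
    """Build one command per step, using the given Python interpreter."""
    return [[python, step] for step in steps]


def run_pipeline(commands):
    """Run each command in order; stop at the first non-zero exit code."""
    for cmd in commands:
        print("Running:", " ".join(cmd))
        # check=True raises subprocess.CalledProcessError on failure,
        # so later steps never run on incomplete intermediate files.
        subprocess.run(cmd, check=True)
```

Calling `run_pipeline(build_commands())` from the StreamLine/ directory runs Step 1 and then Step 2 with the interpreter of the active environment.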

Output Interpretation

The generated output ./result/MouseSC contains the solved Temporal Optimal Transport problem. The core result is the Transition Matrix, whose entries represent the likelihood that a cell at time $t_1$ differentiates into a specific cell at time $t_2$.

You can load the solution and access the matrix using the following code:

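from moscot.problems.time import TemporalProblem

# Load the solved problem saved by save_problems_cc.py
tp = TemporalProblem.load('./result/MouseSC')
# Transport matrix between time points 13.0 and 13.5
T = tp.solutions[13.0, 13.5].transport_matrix
print(T)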
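
To make the matrix easier to read, one common post-processing step is to normalize each row so that it sums to 1, giving a per-cell distribution over cells at the later time point. The sketch below assumes that `T` can be converted to a dense NumPy array with source cells (time $t_1$) as rows and target cells (time $t_2$) as columns; the function names are hypothetical, and a small toy matrix stands in for the real solution.

```python
import numpy as np


def row_normalize(T):
    """Scale each row to sum to 1; rows with no mass stay all-zero."""
    T = np.asarray(T, dtype=float)
    row_sums = T.sum(axis=1, keepdims=True)
    # `where` skips rows whose total is zero, avoiding division by zero.
    return np.divide(T, row_sums, out=np.zeros_like(T), where=row_sums > 0)


def most_likely_targets(T):
    """For each source cell (row), return the index of the highest-weight target cell."""
    return np.asarray(T).argmax(axis=1)


# Toy 3 x 2 matrix standing in for a real transport matrix (assumed layout:
# rows are cells at the earlier time point, columns are cells at the later one).
toy = np.array([[0.10, 0.30],
                [0.25, 0.05],
                [0.00, 0.00]])
P = row_normalize(toy)
print(P)
print(most_likely_targets(toy))
```

With the real solution, `row_normalize(T)` would be applied in place of the toy matrix, provided the orientation assumption holds for the loaded object.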

Data Preparation Note

The provided input file ./data/MouseSC.h5ad already contains the necessary pre-computed data fields:

  1. Stemness Score (CytoTRACE2_Score):

    • Calculated using CytoTRACE2.
    • Parameters used: species='mouse', smooth_batch_size=10000.
  2. Autoencoder Embedding (X_ae128):

    • (Optional) If you wish to re-train the model or regenerate these embeddings, you can run:
      python autoencoder.py
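
When substituting a different input file, it may be worth confirming that both fields are present before running the pipeline. The sketch below assumes that `CytoTRACE2_Score` is stored as a column of `adata.obs` and `X_ae128` as a key of `adata.obsm`; the note above gives the field names but not their locations, so these placements are a guess. The helper names are hypothetical.

```python
REQUIRED_OBS = "CytoTRACE2_Score"  # assumed to be a column of adata.obs
REQUIRED_OBSM = "X_ae128"          # assumed to be a key of adata.obsm


def missing_fields(adata):
    """Return descriptions of required fields absent from an AnnData-like object."""
    missing = []
    if REQUIRED_OBS not in adata.obs:
        missing.append(f"obs['{REQUIRED_OBS}']")
    if REQUIRED_OBSM not in adata.obsm:
        missing.append(f"obsm['{REQUIRED_OBSM}']")
    return missing


def check_file(path="./data/MouseSC.h5ad"):
    """Load an .h5ad file with scanpy and report missing fields."""
    import scanpy as sc  # imported lazily so missing_fields has no dependencies

    return missing_fields(sc.read_h5ad(path))
```

An empty list from `check_file()` means both pre-computed fields were found where this sketch expects them.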
