StreamLine: Stemness Transport with Representation Alignment for Lineage Inference

StreamLine is a framework designed to infer cell differentiation state transitions across time points using single-cell sequencing data.

Directory Structure

Please ensure the project directory is organized as follows to match the file paths in the scripts:

```
StreamLine/
├── data/
│   └── MouseSC.h5ad          # Input AnnData file
├── model/                    # Directory for model checkpoints
├── result/
│   └── temp/                 # Intermediate directory for cost matrix parquet files
├── autoencoder.py
├── generate_cost_dfs.py
└── save_problems_cc.py
```
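Before running the scripts, you can check this layout and create the empty output directories with a small helper. This is a minimal sketch. The helper name `prepare_layout` is hypothetical and not part of the repository, and it only uses the paths shown in the tree above.

```python
from pathlib import Path

# Paths from the directory tree above, relative to the project root.
REQUIRED_FILES = [
    "data/MouseSC.h5ad",
    "autoencoder.py",
    "generate_cost_dfs.py",
    "save_problems_cc.py",
]
OUTPUT_DIRS = ["model", "result/temp"]


def prepare_layout(root="."):
    """Hypothetical helper: create missing output directories and
    return the required files that are not present under root."""
    root = Path(root)
    for d in OUTPUT_DIRS:
        # parents=True also creates result/ when creating result/temp/
        (root / d).mkdir(parents=True, exist_ok=True)
    return [f for f in REQUIRED_FILES if not (root / f).is_file()]
```

For example, `prepare_layout("StreamLine")` returns an empty list once the input file and all three scripts are in place.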

Environment Configuration

The pipeline requires Python 3.10. It is recommended to create a dedicated virtual environment (e.g., using Conda) to manage dependencies.

Create and activate the environment, then install the dependencies:

```
conda create -n streamline python=3.10
conda activate streamline
pip install torch scanpy moscot numpy pandas statsmodels scikit-learn matplotlib seaborn pyarrow fastparquet
```
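To confirm the installation, you can check that each package is importable from the active environment. This is a minimal sketch with hypothetical helper names. Note that some import names differ from the pip names (scikit-learn is imported as `sklearn`).

```python
import importlib.util
import sys

# Import names for the packages installed with pip above.
MODULES = ["torch", "scanpy", "moscot", "numpy", "pandas", "statsmodels",
           "sklearn", "matplotlib", "seaborn", "pyarrow", "fastparquet"]


def missing_modules(names=MODULES):
    """Return the module names that cannot be found in the active environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_matches(required=(3, 10)):
    """True if the running interpreter's major.minor version equals `required`."""
    return tuple(sys.version_info[:2]) == tuple(required)
```

Inside the `streamline` environment, `missing_modules()` should return an empty list and `python_matches()` should return `True`.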

Execution

To run StreamLine, execute the following two scripts in order:

Step 1: Generate Cost Matrices

Computes the transport cost between cells at different time points from the pre-computed stemness score (CytoTRACE2_Score) and autoencoder embedding (X_ae128).

```
python generate_cost_dfs.py
```

Output: Saves intermediate cost files to ./result/temp/.
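This README does not specify how the two inputs are combined into a cost. The following sketch is **not** the formula used in `generate_cost_dfs.py`. It shows one plausible, hypothetical construction: the squared Euclidean distance between autoencoder embeddings, plus a penalty on transitions in which the stemness score rises. CytoTRACE2 assigns higher scores to more potent cells, so differentiating cells would be expected to lose stemness. The function name `stemness_cost` and the weight `alpha` are assumptions.

```python
import numpy as np


def stemness_cost(emb_src, emb_tgt, stem_src, stem_tgt, alpha=1.0):
    """Hypothetical cost between cells at t1 (rows) and t2 (columns).

    emb_src, emb_tgt: (n, d) and (m, d) embeddings (e.g. rows of X_ae128).
    stem_src, stem_tgt: (n,) and (m,) stemness scores (e.g. CytoTRACE2_Score).
    alpha: assumed weight of the stemness penalty.
    """
    emb_src = np.asarray(emb_src, dtype=float)
    emb_tgt = np.asarray(emb_tgt, dtype=float)
    stem_src = np.asarray(stem_src, dtype=float)
    stem_tgt = np.asarray(stem_tgt, dtype=float)
    # Pairwise squared Euclidean distances via ||x||^2 + ||y||^2 - 2 x.y
    sq = (
        np.sum(emb_src ** 2, axis=1)[:, None]
        + np.sum(emb_tgt ** 2, axis=1)[None, :]
        - 2.0 * emb_src @ emb_tgt.T
    )
    sq = np.maximum(sq, 0.0)  # clip small negatives caused by rounding
    # Penalize only pairs where the target is more stem-like than the source
    gain = np.maximum(stem_tgt[None, :] - stem_src[:, None], 0.0)
    return sq + alpha * gain
```

You could wrap a matrix like this in a `pandas.DataFrame` indexed by cell IDs and write it with `DataFrame.to_parquet`, which uses the pyarrow or fastparquet engine installed above.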

Step 2: Calculate Transition Matrix

Uses moscot to solve the optimal transport problem between time points with the cost matrices from Step 1.

```
python save_problems_cc.py
```

Output: Saves the solved problem, including the final transition matrix, to ./result/MouseSC.
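moscot solves entropy-regularized optimal transport problems through its OTT-JAX backend. To illustrate what "solving" means for a single pair of time points, here is a minimal balanced Sinkhorn iteration in NumPy. This is a textbook sketch for intuition, not moscot's solver. The function name, the uniform default marginals, `epsilon`, and the fixed iteration count are all assumptions.

```python
import numpy as np


def sinkhorn(cost, a=None, b=None, epsilon=0.05, n_iter=500):
    """Balanced entropic OT via Sinkhorn scaling (illustrative sketch).

    cost: (n, m) cost matrix between source (t1) and target (t2) cells.
    a, b: source/target marginals; uniform if omitted.
    Returns the (n, m) transport plan P = diag(u) K diag(v).
    """
    cost = np.asarray(cost, dtype=float)
    n, m = cost.shape
    a = np.full(n, 1.0 / n) if a is None else np.asarray(a, dtype=float)
    b = np.full(m, 1.0 / m) if b is None else np.asarray(b, dtype=float)
    # Gibbs kernel; shifting by the minimum cost does not change the plan
    # but reduces the risk of exp() underflowing
    K = np.exp(-(cost - cost.min()) / epsilon)
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iter):
        u = a / (K @ v)    # rescale so row sums match a
        v = b / (K.T @ u)  # rescale so column sums match b
    return u[:, None] * K * v[None, :]
```

Rows of the returned plan index cells at $t_1$ and columns index cells at $t_2$. A smaller `epsilon` gives a sharper, less diffuse coupling but needs more iterations to converge.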

Output Interpretation

The generated output ./result/MouseSC contains the solved temporal optimal transport problem. Its core result is the transition matrix, whose entry $(i, j)$ reflects the likelihood that cell $i$ at time $t_1$ differentiates into cell $j$ at time $t_2$.
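The raw matrix holds transport mass rather than probabilities. One common way to read row $i$ as a distribution over descendants is to normalize each row. This is a post-processing choice, not a step the scripts are stated to perform:

$$
P_{ij} = \frac{T_{ij}}{\sum_{k} T_{ik}}, \qquad \sum_{j} P_{ij} = 1,
$$

where $T_{ij}$ is the transport mass from cell $i$ at $t_1$ to cell $j$ at $t_2$.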

You can load the solution and access the matrix using the following code:

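```
from moscot.problems.time import TemporalProblem

# Load the solved temporal OT problem saved by save_problems_cc.py
tp = TemporalProblem.load('./result/MouseSC')

# Transport matrix between time points 13.0 and 13.5
# (rows: cells at t1, columns: cells at t2)
T = tp.solutions[13.0, 13.5].transport_matrix
print(T)
```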
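Cell-level matrices are often easier to interpret once summarized by cell type. The sketch below assumes you have one label per source cell and one per target cell, for example from an annotation column in `adata.obs` (this README does not name such a column). It sums the transport mass between each pair of types and then row-normalizes. The function name is hypothetical. moscot's `TemporalProblem.cell_transition` method offers a built-in alternative.

```python
import numpy as np


def cell_type_transitions(T, source_labels, target_labels):
    """Aggregate a cell-level transport matrix into a cell-type matrix.

    T: (n_source, n_target) transport matrix (e.g. the T loaded above).
    source_labels / target_labels: one label per row / column of T.
    Returns (source_types, target_types, M), where M[a, b] is the fraction
    of the transport mass leaving type a that arrives in type b.
    """
    T = np.asarray(T, dtype=float)
    src = np.asarray(source_labels)
    tgt = np.asarray(target_labels)
    src_types = np.unique(src)  # sorted unique labels
    tgt_types = np.unique(tgt)
    # One-hot membership matrices: (n_source, n_src_types), (n_target, n_tgt_types)
    S = (src[:, None] == src_types[None, :]).astype(float)
    G = (tgt[:, None] == tgt_types[None, :]).astype(float)
    M = S.T @ T @ G  # summed mass between every pair of types
    row_sums = M.sum(axis=1, keepdims=True)
    M = np.divide(M, row_sums, out=np.zeros_like(M), where=row_sums > 0)
    return src_types, tgt_types, M
```

The label arrays are assumed to follow the same cell order as the rows and columns of `T`.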

Data Preparation Note

The provided input file ./data/MouseSC.h5ad already contains the necessary pre-computed data fields:

Stemness Score (CytoTRACE2_Score):
- Calculated using CytoTRACE2.
- Parameters used: species='mouse', smooth_batch_size=10000.
Autoencoder Embedding (X_ae128):
- (Optional) To retrain the model or regenerate these embeddings, run:
```
python autoencoder.py
```
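To confirm that the input file contains both fields before running the pipeline, you can use a quick check like the one below. This is a minimal sketch. It assumes the AnnData conventions of storing the score as a column of `adata.obs` and the embedding in `adata.obsm`, and it infers a dimension of 128 from the name `X_ae128`. None of these locations are stated in this README, and the function name is hypothetical.

```python
def check_streamline_fields(adata, score_key="CytoTRACE2_Score",
                            emb_key="X_ae128", emb_dim=128):
    """Return a list of problems found with the pre-computed inputs.

    Assumes (not confirmed by the README) that the score is a column of
    adata.obs and the embedding is an (n_obs, emb_dim) array in adata.obsm.
    """
    problems = []
    if score_key not in adata.obs.columns:
        problems.append(f"missing obs column {score_key!r}")
    if emb_key not in adata.obsm:
        problems.append(f"missing obsm key {emb_key!r}")
    else:
        shape = tuple(adata.obsm[emb_key].shape)
        if shape != (adata.n_obs, emb_dim):
            problems.append(
                f"{emb_key!r} has shape {shape}, expected ({adata.n_obs}, {emb_dim})"
            )
    return problems
```

With scanpy, `adata = sc.read_h5ad("./data/MouseSC.h5ad")` followed by `check_streamline_fields(adata)` returns an empty list when both fields are found in the assumed locations.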

Repository: BGI-Qingdao/StreamLine