StreamLine is a framework designed to infer cell differentiation state transitions across time points using single-cell sequencing data.
Please ensure the project directory is organized as follows to match the file paths in the scripts:
StreamLine/
├── data/
│ └── MouseSC.h5ad # Input AnnData file
├── model/ # Directory for model checkpoints
├── result/
│ └── temp/ # Intermediate directory for cost matrix parquet files
├── autoencoder.py
├── generate_cost_dfs.py
└── save_problems_cc.py
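The layout above can be prepared and checked with a short helper. This is a minimal sketch using only the standard library; the function name `prepare_layout` and the two constant lists are illustrative and not part of StreamLine itself.

```python
from pathlib import Path

# Directories and files taken from the tree above.
EXPECTED_DIRS = ["data", "model", "result/temp"]
EXPECTED_FILES = [
    "data/MouseSC.h5ad",
    "autoencoder.py",
    "generate_cost_dfs.py",
    "save_problems_cc.py",
]


def prepare_layout(root):
    """Create the expected directories under `root` and report missing files.

    Returns the relative paths from EXPECTED_FILES that do not exist yet.
    """
    root = Path(root)
    for rel in EXPECTED_DIRS:
        # parents=True also creates result/ before result/temp/.
        (root / rel).mkdir(parents=True, exist_ok=True)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]
```

Calling `prepare_layout("StreamLine")` from the parent directory creates any missing folders and returns the files that still have to be put in place, such as the input AnnData file.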
The pipeline requires Python 3.10. It is recommended to create a dedicated virtual environment (e.g., using Conda) to manage dependencies.
Create and activate environment:
conda create -n streamline python=3.10
conda activate streamline
pip install torch scanpy moscot numpy pandas statsmodels scikit-learn matplotlib seaborn pyarrow fastparquet
To run StreamLine, execute the following two scripts in order:
Calculates the transport cost between time points based on the pre-computed Stemness Score and Autoencoder Embedding.
python generate_cost_dfs.py
Output: Saves intermediate cost files to ./result/temp/.
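The exact cost computed by generate_cost_dfs.py is not described in this README. As a purely illustrative sketch, one way a cost between two time points could combine the two inputs is to add the embedding distance to a penalty for transitions that increase the stemness score. The function name `illustrative_cost`, the `penalty` weight, and the formula itself are assumptions, not the script's implementation.

```python
import math


def illustrative_cost(src_emb, tgt_emb, src_stem, tgt_stem, penalty=1.0):
    """Hypothetical cost between source cells (earlier time) and target cells (later time).

    cost[i][j] = Euclidean distance between embeddings
                 + penalty * max(0, tgt_stem[j] - src_stem[i])

    The second term makes it more expensive to move a cell towards a
    higher stemness score than it started with.
    """
    cost = []
    for emb_i, stem_i in zip(src_emb, src_stem):
        row = []
        for emb_j, stem_j in zip(tgt_emb, tgt_stem):
            distance = math.dist(emb_i, emb_j)
            stemness_gain = max(0.0, stem_j - stem_i)
            row.append(distance + penalty * stemness_gain)
        cost.append(row)
    return cost
```

In this toy formulation, rows index cells at the earlier time point and columns index cells at the later one, which is the orientation an optimal transport solver would expect for a cost matrix between two time points.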
Uses moscot to solve the optimal transport problem.
python save_problems_cc.py
Output: Saves the final Transition Matrix to ./result/MouseSC.
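Because the second script depends on the files written by the first, the two steps can also be driven from one small wrapper that stops at the first failure. This is a minimal sketch; `run_pipeline` and `PIPELINE` are hypothetical names, and it assumes the scripts are run from the project root because they refer to relative paths such as ./result/temp/.

```python
import subprocess
import sys
from pathlib import Path

# Script order taken from the steps above.
PIPELINE = ["generate_cost_dfs.py", "save_problems_cc.py"]


def run_pipeline(project_dir, scripts=PIPELINE):
    """Run each script in order with the current interpreter.

    check=True raises subprocess.CalledProcessError on a non-zero exit code,
    so later scripts are not started after a failure.
    """
    project_dir = Path(project_dir)
    for script in scripts:
        subprocess.run([sys.executable, script], cwd=project_dir, check=True)
```

For example, `run_pipeline("StreamLine")` would run both scripts inside the project directory using the interpreter of the active environment.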
The generated output ./result/MouseSC contains the solved Temporal Optimal Transport problem. The core result is the Transition Matrix, which represents the likelihood of cells at time
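To make the idea of a transition matrix concrete, the sketch below row-normalizes a small coupling so that each row sums to 1 and can be read as transition probabilities for one source cell. The toy values and the helper name `row_normalize` are made up for illustration; the matrix returned by moscot may or may not already be normalized this way.

```python
def row_normalize(coupling):
    """Scale each row of a coupling matrix to sum to 1.

    Rows index source cells (earlier time point) and columns index target
    cells (later time point). Rows with zero total mass are left as zeros.
    """
    normalized = []
    for row in coupling:
        total = sum(row)
        if total == 0:
            normalized.append([0.0] * len(row))
        else:
            normalized.append([value / total for value in row])
    return normalized


# Toy coupling between two source cells and two target cells.
toy_coupling = [[0.3, 0.1], [0.0, 0.6]]
transition_probabilities = row_normalize(toy_coupling)
```

After normalization, entry (i, j) is the fraction of the mass leaving source cell i that arrives at target cell j.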
You can load the solution and access the matrix using the following code:
from moscot.problems.time import TemporalProblem
tp = TemporalProblem.load('./result/MouseSC')
T = tp.solutions[13.0, 13.5].transport_matrix
print(T)
The provided input file ./data/MouseSC.h5ad already contains the necessary pre-computed data fields:
- Stemness Score (CytoTRACE2_Score):
  - Calculated using CytoTRACE2.
  - Parameters used: species='mouse', smooth_batch_size=10000.
- Autoencoder Embedding (X_ae128):
  - (Optional) If you wish to re-train the model or regenerate this embedding, you can run:
    python autoencoder.py