nf-reclaim pipeline

scbirlab/nf-reclaim is a Nextflow pipeline that identifies bacterial orthologs, and the putative spectrum of activity, of targets with known inhibitors.

Table of contents

Processing steps
Requirements
Quick start
Inputs
Outputs
Credit
Issues, problems, suggestions
Further help

Processing steps

scbirlab/nf-reclaim carries out the following steps:

Fetch all high confidence targets with inhibitors from ChEMBL
Fetch all protein sequences for targets from UniProt
Fetch proteomes of query organisms from UniProt
BLAST target protein sequences against query organism proteomes
Annotate LOEUF genomic constraint scores for human targets
Output inhibitors for targets meeting identity, coverage, and LOEUF cutoffs from ChEMBL
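
The identity and coverage cutoffs in the last step can be pictured as a filter over a table of BLAST hits. The following is a minimal sketch, not the pipeline's own code. It assumes a tab-separated table with four columns (target, query protein, identity, coverage), with identity and coverage written as fractions between 0 and 1. The file names hits.tsv and filtered.tsv, the IDs T1 to T3 and P1 to P3, and the table layout are all hypothetical; the cutoffs are the ones used in the quick start example.

```shell
# Illustrative hit table: target, query protein, identity, coverage (fractions).
printf 'T1\tP1\t0.45\t0.80\n' >  hits.tsv
printf 'T2\tP2\t0.25\t0.90\n' >> hits.tsv
printf 'T3\tP3\t0.60\t0.40\n' >> hits.tsv

# Keep only hits that meet both the identity and the coverage cutoff.
awk -F'\t' -v min_id=0.3 -v min_cov=0.5 \
    '$3 >= min_id && $4 >= min_cov' hits.tsv > filtered.tsv

# Only the T1 row survives: T2 fails on identity, T3 fails on coverage.
cat filtered.tsv
```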

In parallel, the pipeline fetches data to assess LOEUF cutoffs for toxicity:

Fetch all inhibitors with cell IC50 or CC50 and target biochemical $K_i$ from ChEMBL
Fetch all targets of these inhibitors from ChEMBL
Filter down to inhibitors that have target biochemical $K_i$ from ChEMBL
Output cell line IC50 paired with target biochemical $K_i$

Requirements

You need Nextflow and one of Anaconda, Singularity, or Docker installed.

First time using Nextflow?

Crick users

If you're at the Crick or your shared cluster has Nextflow and Singularity already installed, try:

module load Nextflow Singularity

Others

Otherwise, if it's your first time using Nextflow on your system, you can install it using conda:

conda install -c bioconda nextflow

You may need to set the NXF_HOME environment variable. For example,

mkdir -p ~/.nextflow
export NXF_HOME=~/.nextflow

To make this a permanent change, you can do something like the following:

mkdir -p ~/.nextflow
echo "export NXF_HOME=~/.nextflow" >> ~/.bash_profile
source ~/.bash_profile
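
Running the commands above more than once appends a duplicate line to ~/.bash_profile each time. If that matters, one option is to append the line only when it is missing. The sketch below uses a hypothetical helper, add_line_once, which is not part of Nextflow or of this pipeline.

```shell
# Append a line to a file only if that exact line is not already present.
# $1 is the line to add, $2 is the file to add it to.
add_line_once() {
    touch "$2"
    grep -qxF -- "$1" "$2" || printf '%s\n' "$1" >> "$2"
}

# Usage: add_line_once 'export NXF_HOME=~/.nextflow' ~/.bash_profile
```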

Quick start

The easiest way to get going is by specifying parameters on the command-line:

nextflow run scbirlab/nf-reclaim \
    --organism_id 243273  \
    --min_identity 0.3 \
    --min_coverage 0.5 \
    --min_pchembl 7.0

Here's what the flags mean:

--organism_id: The Taxon ID of the organism, which you can find at NCBI or UniProt
--min_identity (optional): minimum amino acid identity for orthology
--min_coverage (optional): minimum coverage for orthology
--min_pchembl (optional): minimum reported pChEMBL (potency) for inhibitors

Other options are available.

Running with Singularity, Docker, or Conda

scbirlab/nf-reclaim uses the Singularity container engine by default to keep software versions consistent. If you have Docker installed, you can add -with-docker to use it instead, or if you have Conda you can add -with-conda.

Running more than one query in parallel

Make a sample sheet (see below) with columns representing the flags above, and, optionally, a nextflow.config file in the directory where you want the pipeline to run. Then simply run:

nextflow run scbirlab/nf-reclaim

Pipeline versions

If you want to run a particular tagged version of the pipeline, such as v0.0.2, you can do so using

nextflow run scbirlab/nf-reclaim -r v0.0.2

For help, use nextflow run scbirlab/nf-reclaim --help.

The first time you run the pipeline on your system, the software dependencies in environment.yml will be installed. This may take several minutes.

Inputs

Command-line usage

The pipeline can be run with command-line arguments:

nextflow run scbirlab/nf-reclaim --organism_id <taxon ID>

The following parameters are required:

--organism_id       Taxon ID for organism
# or if using sample sheet
--sample_sheet      CSV listing Taxon ID for multiple organisms

The following parameters are optional and have default values:

min_identity = 35: minimum amino acid identity for orthology
min_coverage = 0.7: minimum sequence coverage for orthology
min_loeuf = 0.515: minimum genomic constraint for human targets
min_pchembl = 6.0: minimum inhibitor pChEMBL
gnomad_version = "4.1": which gnomAD version to use for LOEUF values
tox_cell_lines = ["HCT116","HEK293T",...,"CHO"]: cell lines to fetch toxicity data
outputs = "outputs": output directory

Sample sheet

You can run multiple combinations in one command using a sample sheet. The sample sheet is a CSV file with one row per combination of parameters to run.

nextflow run scbirlab/nf-reclaim --sample_sheet path/to/sample-sheet.csv

Sample sheet structure

Here is an example sample sheet for finding putative inhibitors whose targets have orthologs in Mycoplasma genitalium:

organism_id	proteome_name
243273	"Mycoplasma genitalium"

Further examples are in the test directory of this repository.
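
A sample sheet like the one above can be written straight from the shell. This is a sketch under two assumptions: that the file is comma-separated (it is described as a CSV file), and that the columns are the two shown in the example. The second row (Taxon ID 83332, Mycobacterium tuberculosis) is added purely for illustration and does not come from this repository.

```shell
# Write a two-organism sample sheet; the second row is illustrative only.
cat > sample-sheet.csv <<'EOF'
organism_id,proteome_name
243273,"Mycoplasma genitalium"
83332,"Mycobacterium tuberculosis"
EOF

# Then run: nextflow run scbirlab/nf-reclaim --sample_sheet sample-sheet.csv
```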

Config-file usage (recommended)

For reproducibility, self-documentation, and to save typing, parameters with the same names as the command line flags above can be provided in a nextflow.config file in the working directory. For example:

params {
    organism_id = "243273"
}

Or with a sample sheet:

params {
    sample_sheet = "path/to/sample-sheet.csv"
}
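
Since parameters in the config file share their names with the command-line flags, the optional parameters can presumably be set in the same block. The sketch below writes such a file from the shell, using the default values listed under Command-line usage; treat the exact combination as an illustration rather than a recommended configuration.

```shell
# Write a nextflow.config in the current directory.
# Note: this overwrites any existing nextflow.config.
cat > nextflow.config <<'EOF'
params {
    organism_id = "243273"
    min_identity = 35
    min_coverage = 0.7
    min_pchembl = 6.0
    outputs = "outputs"
}
EOF
```

With this file in the working directory, the pipeline can be started with nextflow run scbirlab/nf-reclaim and no further flags.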

Outputs

Outputs are saved in the directory set by the outputs parameter (default: "outputs").

Issues, problems, suggestions

Add to the issue tracker.

Further help

Here are the pages of the software and databases used by this pipeline.

Databases:

ChEMBL for inhibitors and targets
UniProt for protein sequences
NCBI GenBank for taxonomy

Software:

diamond to BLAST many-against-many protein sequences.
