2.2 Example Run
In this example, we run a fully automated single-cell analysis of six 10x Genomics samples using scFlow. The input sparse matrices are the standard feature-barcode matrices generated by Cell Ranger; together they form a small, artificial dataset derived from mouse brain data.
2.2.1 Getting Started
2.2.1.1 Install Nextflow
To get started, install the latest version of Nextflow (>20.10) by following the official installation instructions.
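Before continuing, it can help to confirm that the installed release meets the minimum version. The sketch below is a hypothetical POSIX shell helper (version_at_least is not part of Nextflow) that compares two dotted version strings using sort -V, which is available in GNU coreutils and recent BSD sort. It treats the minimum as inclusive, which is an assumption about how ">20.10" should be read. Nextflow's installer one-liner is included as a comment.

```shell
#!/bin/sh
# Nextflow's installer downloads a `nextflow` launcher into the current
# directory (Java must already be installed):
#   curl -s https://get.nextflow.io | bash

# Hypothetical helper: succeed if version $1 is at least version $2.
# `sort -V` orders dotted version strings numerically, so $2 sorts first
# (or ties) exactly when $1 >= $2.
version_at_least() {
  lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
  [ "$lowest" = "$2" ]
}

# Example, substituting the version printed by `nextflow -version`:
#   version_at_least 21.04.1 20.10 && echo "Nextflow version OK"
```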
2.2.1.2 Download the Example Dataset
Next, create a folder for the analysis called scFlowExample. Inside this folder, create a conf sub-folder for storing configuration files and a refs sub-folder for reference files.
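On Linux or macOS, the folder layout can be created with a single command. mkdir -p creates any missing parent folders and does nothing if a folder already exists, so it is safe to re-run.

```shell
#!/bin/sh
# Create the analysis folder with its two sub-folders:
#   conf/  configuration files
#   refs/  reference files (manifest and sample sheet)
mkdir -p scFlowExample/conf scFlowExample/refs
```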
The sparse matrices for the example dataset are available in the scFlowExample folder of this Google Cloud Storage bucket: <https://console.cloud.google.com/storage/browser/scflowexamplegcp>. Save these files to your preferred location; we tend to keep sparse matrices in a separate storage location reserved for sequencing data.
Next, download the Manifest.txt and SampleSheet.tsv files into the refs folder. Finally, download the scflow_analysis.config and reddim_genes.yml files into the conf folder.
Your analysis folder, scFlowExample, should now have the following structure:
.
├── conf
│   ├── reddim_genes.yml
│   └── scflow_analysis.config
└── refs
    ├── Manifest.txt
    └── SampleSheet.tsv
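A quick way to confirm that all four downloads landed in the right place is a small check function. check_layout is a hypothetical helper written for this sketch; it only tests that each expected file exists, not that its contents are correct.

```shell
#!/bin/sh
# check_layout DIR: report any expected example file missing under DIR.
# Returns 0 if all four files are present, 1 otherwise.
check_layout() {
  missing=0
  for f in conf/reddim_genes.yml conf/scflow_analysis.config \
           refs/Manifest.txt refs/SampleSheet.tsv; do
    if [ ! -f "$1/$f" ]; then
      echo "missing: $1/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example (run from the folder that contains scFlowExample):
#   check_layout scFlowExample
```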
2.2.1.3 Edit the Manifest File
The Manifest.txt file maps each sample key to the absolute path of the folder containing that sample's sparse matrix. Edit these paths to point to the files you downloaded in the previous step, for example:
key | filepath |
---|---|
tipif | /foo/bar/scflowexamplegcp/scFlowExample/individual_1 |
jarul | /foo/bar/scflowexamplegcp/scFlowExample/individual_2 |
zoham | /foo/bar/scflowexamplegcp/scFlowExample/individual_3 |
sibod | /foo/bar/scflowexamplegcp/scFlowExample/individual_4 |
limuz | /foo/bar/scflowexamplegcp/scFlowExample/individual_5 |
horud | /foo/bar/scflowexamplegcp/scFlowExample/individual_6 |
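When all matrices sit under one parent folder, the paths can be rewritten and then verified from the shell. This is a sketch that assumes Manifest.txt is tab-separated with a header row (key, filepath); rebase_manifest and check_manifest are hypothetical helpers, and /foo/bar stands in for whatever prefix your copy of the file contains. The prefix is matched literally with awk's index(), so dots or slashes in paths are not treated as regular-expression characters.

```shell
#!/bin/sh
# Assumes Manifest.txt is tab-separated with a header row (key, filepath).

# rebase_manifest OLD NEW FILE: print FILE with the leading OLD prefix of
# each filepath replaced by NEW (literal match). The header is kept as-is.
rebase_manifest() {
  awk -F '\t' -v OFS='\t' -v old="$1" -v new="$2" '
    NR > 1 && index($2, old) == 1 { $2 = new substr($2, length(old) + 1) }
    { print }
  ' "$3"
}

# check_manifest FILE: warn about every filepath that is not an existing
# folder; returns 1 if any are missing.
check_manifest() {
  status=0
  header=1
  tab=$(printf '\t')
  # `|| [ -n "$key" ]` also processes a final line with no trailing newline.
  while IFS="$tab" read -r key path || [ -n "$key" ]; do
    if [ "$header" -eq 1 ]; then header=0; continue; fi  # skip header row
    [ -z "$key" ] && continue                            # skip blank lines
    if [ ! -d "$path" ]; then
      echo "missing folder for $key: $path" >&2
      status=1
    fi
  done < "$1"
  return "$status"
}

# Example:
#   rebase_manifest /foo/bar /data/seq refs/Manifest.txt > refs/Manifest.new &&
#     mv refs/Manifest.new refs/Manifest.txt
#   check_manifest refs/Manifest.txt
```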
2.2.1.4 Download Additional Resources
The reference files for automated cell-type annotation can be downloaded from the scFlowResources/refs/ctd folder of the Google Cloud Storage bucket. The ensembl_mappings.tsv file can be downloaded from the scFlowResources/src/ensembl-ids folder of the same bucket. These resources are shared across analyses, so they can be saved in a generic location outside the analysis folder.
To override the default parameter values with these locations, we will use a custom configuration file. Create a file with a .config extension (e.g. my_scflow.config) and add the locations, for example:
params {
  // Shared scFlow resources; edit these paths to match your system
  ensembl_mappings = "/foo/bar/scFlowResources/src/ensembl-ids/ensembl_mappings.tsv"
  ctd_folder = "/foo/bar/scFlowResources/refs/ctd"
}
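The same file can be generated from the shell, which avoids typing the long paths twice. In this sketch RESOURCES is a hypothetical variable pointing at wherever you saved the scFlowResources folder (it defaults to the example /foo/bar location), and the file is written to the current folder; move it or adjust the path to match the -c option you pass later.

```shell
#!/bin/sh
# Hypothetical variable: where you saved the shared scFlowResources folder.
RESOURCES="${RESOURCES:-/foo/bar/scFlowResources}"

# Write the parameter overrides to my_scflow.config in the current folder.
# The unquoted EOF lets the shell expand $RESOURCES inside the heredoc.
cat > my_scflow.config <<EOF
params {
  ensembl_mappings = "$RESOURCES/src/ensembl-ids/ensembl_mappings.tsv"
  ctd_folder = "$RESOURCES/refs/ctd"
}
EOF

# Warn early if either resource is not where the config says it is.
[ -f "$RESOURCES/src/ensembl-ids/ensembl_mappings.tsv" ] ||
  echo "warning: ensembl_mappings.tsv not found under $RESOURCES" >&2
[ -d "$RESOURCES/refs/ctd" ] ||
  echo "warning: ctd folder not found under $RESOURCES" >&2
```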
2.2.2 Setting up Nextflow
2.2.2.1 Enable Nextflow Tower
Nextflow Tower is an optional, though highly recommended, add-on for Nextflow. Its features include real-time monitoring of workflows, which is useful for troubleshooting. Register for an account at tower.nf/login and obtain an access token. Tower can then be enabled for your scFlow run by appending the following to the custom configuration file created above:
tower {
  // Personal access token from your Tower account
  accessToken = 'insertyourtokenhere'
  enabled = true
}
2.2.2.2 Infrastructure Configuration
Further details on infrastructure configuration, including example config files, are available in the nf-core/configs GitHub repository. Details on individual parameters are available in the Nextflow documentation.
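As a minimal illustration of what such a file contains, the sketch below writes a hypothetical infrastructure config that sends every process to a SLURM cluster. executor and queue are standard Nextflow process settings, but 'slurm' and the queue name 'short' are placeholders; the configs in nf-core/configs are the better starting point for a real site.

```shell
#!/bin/sh
# Write a minimal, hypothetical infrastructure config. 'slurm' and the
# queue name are placeholders; use the executor and queue of your system.
# The quoted 'EOF' stops the shell from expanding anything in the heredoc.
cat > my_infra.config <<'EOF'
process {
  executor = 'slurm'
  queue = 'short'
}
EOF
```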
2.2.3 First Run
The analysis can now be run, with the custom parameters and configuration options supplied via the -c parameter. For convenience, the command can be saved as a shell script:
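#!/bin/sh
# Run from inside the scFlowExample folder. Each -c adds a configuration file:
#   ./conf/scflow_analysis.config  analysis parameters for the example run
#   imperial_dri.config            site-specific infrastructure settings (replace with your own)
#   my_scflow.config               resource locations and Tower settings
# -resume reuses cached results from a previous run where possible.
nextflow run combiz/nf-core-scflow \
  -c ./conf/scflow_analysis.config \
  -c ~/combiz_config/imperial_dri.config \
  -c ~/combiz_config/my_scflow.config \
  -resume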
This example dataset uses the "filtered" feature-barcode matrices. Because Cell Ranger has already performed ambient RNA profiling on these matrices, this step can be skipped in scFlow.