Config file format
Each time you run a MARTi analysis on a sequencing run, you need to specify a config file which provides the details of the analysis to be performed.
This config file is generated by the MARTi launcher front-end (Desktop) or GUI (cluster/HPC).
The following tables specify the meaning of the parameters in the file. Keywords in bold are mandatory; others are optional.
Sample and global settings
Keyword | Example | Meaning |
---|---|---|
SampleName | BAMBI_1D_18042017 | Sample name. |
RawDataDir | /path/to/dir | Run directory, specifically the path to the directory containing the fastq_pass, fastq_fail etc. directories. If Guppy was run separately, the directory containing the fastq directory. |
SampleDir | /path/to/dir | Path to the directory to use for MARTi analysis files (created if it doesn't exist). |
ProcessBarcodes | 01,02,03 | For a barcoded sample, the barcodes to process. |
BarcodeId<n> | BarcodeSampleId1 | Sample ID to use for barcode n. |
Scheduler | local | Job scheduler to use. Currently only “local” works; it will soon be possible to specify “slurm” to run jobs via SLURM. |
Queue | ei-medium | The default job submission queue. Currently only required for SLURM, where it equates to the partition name. |
MaxJobs | 4 | Maximum number of concurrent jobs that can be run by the scheduler (local or SLURM). |
InactivityTimeout | 10 | How long (in seconds) to wait for new reads to appear before giving up. After this timeout, all remaining analysis is completed and the analysis stops. Default 10 seconds. |
StopProcessingAfter | 50000 | Stop analysis after this number of reads. Default is no limit. |
schedulerFileTimeout | 600000 | For SLURM, the time (in milliseconds) allowed between a job completing and its output file appearing before MARTi concludes the job has failed. Default 600000 (i.e. 10 minutes). |
SchedulerFileWriteDelay | 30000 | For SLURM, the delay (in milliseconds) after a job completes and its output file appears before MARTi attempts to read it. Default 30000 (i.e. 30 seconds). |
SchedulerResubmissionAttempts | 2 | For SLURM, how many times to resubmit a failed job before giving up. |
TaxonomyDir | /path/to/dir | Location of the NCBI taxonomy files (i.e. the directory containing nodes.dmp and names.dmp). |
AccessionMap | /path/to/file | An accession map for mapping accession IDs to taxa, generated from the NCBI accession2taxid data by a separate tool. This option should not be required for normal MARTi operation. |
ConvertFastQ | n/a | Deprecated. |
ReadsPerBlast | 4000 | BLAST chunk size: reads are batched into bundles of this number before BLASTing. |
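As a rough illustration of how `ProcessBarcodes` and `BarcodeId<n>` relate, the sketch below builds a barcode-to-sample-ID mapping from settings that have already been read into a dictionary. It assumes, from the description above, that `BarcodeId1` names barcode `01`, `BarcodeId2` names barcode `02`, and so on; `barcode_sample_map` is a hypothetical helper, not part of MARTi.

```python
from typing import Dict, Optional


def barcode_sample_map(settings: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Map each barcode in ProcessBarcodes to its BarcodeId<n> sample ID.

    Assumption: BarcodeId<n> refers to barcode n, so "01" looks up BarcodeId1.
    Barcodes with no BarcodeId<n> entry map to None.
    """
    raw = settings.get("ProcessBarcodes", "")
    barcodes = [b.strip() for b in raw.split(",") if b.strip()]
    return {bc: settings.get(f"BarcodeId{int(bc)}") for bc in barcodes}


if __name__ == "__main__":
    settings = {"ProcessBarcodes": "01,02,03", "BarcodeId1": "BarcodeSampleId1"}
    print(barcode_sample_map(settings))
    # {'01': 'BarcodeSampleId1', '02': None, '03': None}
```

An empty `ProcessBarcodes:` line, as in the example files later in this section, yields an empty mapping.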
Pre-filtering settings
Keyword | Example | Meaning |
---|---|---|
ReadFilterMinQ | 9 | Minimum mean quality value. Reads with a mean Q below this are not processed. Default 0 (all reads). |
ReadFilterMinLength | 500 | Minimum read length. Reads shorter than this are not processed. Default 0 (all reads). |
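The two pre-filters combine into a simple per-read test. The sketch below assumes that "mean quality" is the arithmetic mean of the Phred scores in a FASTQ quality string with the usual offset of 33; MARTi may define mean Q differently (for example, by averaging error probabilities), and the function names are hypothetical.

```python
def mean_phred(quality: str, offset: int = 33) -> float:
    """Arithmetic mean of the Phred scores in a FASTQ quality string (assumed definition)."""
    if not quality:
        return 0.0
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def passes_prefilter(seq: str, quality: str, min_q: float = 0, min_length: int = 0) -> bool:
    """Apply ReadFilterMinQ (min_q) and ReadFilterMinLength (min_length) to one read.

    Reads with a mean Q below min_q, or shorter than min_length, are rejected.
    The defaults of 0 keep every read, matching the documented defaults.
    """
    return len(seq) >= min_length and mean_phred(quality) >= min_q


if __name__ == "__main__":
    # '+' encodes Q10 and '!' encodes Q0 with an offset of 33.
    print(passes_prefilter("ACGT", "++++", min_q=9, min_length=500))  # False: too short
    print(passes_prefilter("ACGT", "++++", min_q=9))                   # True
```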
LCA classification settings
These Lowest Common Ancestor settings apply to BLAST results (see below).
Keyword | Example | Meaning |
---|---|---|
LCAMaxHits | 100 | Maximum number of BLAST hits to consider in LCA assignment. Default 100. |
LCAScorePercent | 90 | Only consider hits within this percentage of the top hit for a given read. Default 90. |
LCAMinIdentity | 60 | Only consider hits with this minimum percent identity. Default 60. |
LCAMinQueryCoverage | 70 | Only consider hits with this minimum percent coverage of the query. Default 0. |
LCAMinCombinedScore | 120 | Only consider hits where identity % added to query coverage % is greater than this value. Default 0. |
LCALimitToSpecies | n/a | Limit LCA classification to species level and no lower. |
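To make the interaction between these thresholds concrete, here is a minimal sketch of LCA assignment for a single read. It is not MARTi's implementation: the order in which the filters are applied, the use of bitscore as the "score", and the `Hit`/`lca_assign` names are assumptions, and `LCALimitToSpecies` is not modelled. The taxonomy is a toy child-to-parent map in the style of NCBI nodes.dmp, where the root is its own parent.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Hit:
    taxid: int
    bitscore: float
    identity: float        # percent identity
    query_coverage: float  # percent of the read covered by the alignment


def lineage(taxid: int, parent: Dict[int, int]) -> List[int]:
    """Taxa from taxid up to the root; the root is its own parent, as in nodes.dmp."""
    path = [taxid]
    while parent[taxid] != taxid:
        taxid = parent[taxid]
        path.append(taxid)
    return path


def lca_assign(hits: List[Hit], parent: Dict[int, int], max_hits: int = 100,
               score_percent: float = 90.0, min_identity: float = 60.0,
               min_query_coverage: float = 0.0,
               min_combined_score: float = 0.0) -> Optional[int]:
    """Assign one read to the LCA of its qualifying hits (None if none qualify)."""
    # LCAMaxHits: keep only the best-scoring hits.
    ranked = sorted(hits, key=lambda h: h.bitscore, reverse=True)[:max_hits]
    if not ranked:
        return None
    # LCAScorePercent: keep hits scoring at least score_percent of the top hit.
    threshold = ranked[0].bitscore * score_percent / 100.0
    kept = [h for h in ranked
            if h.bitscore >= threshold
            and h.identity >= min_identity                           # LCAMinIdentity
            and h.query_coverage >= min_query_coverage               # LCAMinQueryCoverage
            and h.identity + h.query_coverage > min_combined_score]  # LCAMinCombinedScore
    if not kept:
        return None
    # Lineages run leaf-to-root, so the first taxon shared by all kept hits is the deepest.
    common = lineage(kept[0].taxid, parent)
    for hit in kept[1:]:
        ancestors = set(lineage(hit.taxid, parent))
        common = [t for t in common if t in ancestors]
    return common[0]


if __name__ == "__main__":
    # Toy taxonomy: 1 is the root; 100 and 101 are siblings under 10.
    parent = {1: 1, 2: 1, 10: 2, 11: 2, 100: 10, 101: 10}
    hits = [Hit(100, 200, 95, 90), Hit(101, 190, 93, 88), Hit(11, 100, 99, 99)]
    print(lca_assign(hits, parent))                    # 10: the hit on 11 is below 90% of 200
    print(lca_assign(hits, parent, score_percent=40))  # 2: all three hits qualify
```

In this sketch, lowering `LCAScorePercent` admits more hits and so tends to push assignments towards higher taxonomic ranks.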
BLAST processes
You can run multiple BLAST processes. Each begins with the keyword BlastProcess.
Keyword | Example | Meaning |
---|---|---|
BlastProcess | n/a | Defines the start of a BLAST process. |
Name | nt | Name of the process. |
Program | megablast | BLAST algorithm to use: megablast or blastn. |
Database | /path/to/db | Path to the BLAST database. |
UseToClassify | n/a | Use BLAST results for classification (can only be set for one classification process). |
TaxaFilter | /path/to/file.txt | Taxa filter file to use with BLAST (e.g. to filter to bacteria/viruses). |
MaxE | 0.001 | Maximum E-value for BLAST. |
MaxTargetSeqs | 25 | Maximum number of target sequences for BLAST. |
RunMeganEvery | n/a | Deprecated. |
BlastThreads | 4 | Number of threads to use when running BLAST. Note: for the SLURM scheduler, MARTi also uses this value for the SLURM --cpus-per-task option. |
Memory | 16G | For the SLURM scheduler, the memory to use per BLAST job. Passed with the SLURM --mem parameter. |
Queue | ei-medium | The job submission queue to use. If omitted, the default queue (see above) is used. Currently only required for SLURM, where it equates to the partition name. |
Dust | 15 64 1 | DUST string to be passed to all BLAST commands for this BLAST process (optional). |
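The sketch below shows one plausible way the keywords in a `BlastProcess` block could map onto BLAST+ `blastn` options, together with the `ReadsPerBlast` batching described earlier. The options used (`-task`, `-db`, `-query`, `-out`, `-evalue`, `-max_target_seqs`, `-num_threads`, `-dust`) are standard BLAST+ options, but the exact command MARTi builds, including its output format and how `TaxaFilter` is applied, is not documented here, so treat the mapping and the helper names as assumptions. In BLAST+, megablast is a task of `blastn`, which is why the `Program` value (the example files use `megablast` and `blastn`) is passed to `-task`.

```python
import shlex
from typing import Dict, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batches(reads: Sequence[T], reads_per_blast: int) -> Iterator[Sequence[T]]:
    """Yield bundles of ReadsPerBlast reads; the final bundle may be smaller."""
    for start in range(0, len(reads), reads_per_blast):
        yield reads[start:start + reads_per_blast]


def blast_command(block: Dict[str, str], query_fasta: str, out_file: str) -> List[str]:
    """Hypothetical mapping from a BlastProcess block to a blastn argument list."""
    cmd = ["blastn",
           "-task", block["Program"],  # megablast and blastn are both blastn tasks
           "-db", block["Database"],
           "-query", query_fasta,
           "-out", out_file]
    optional = {"MaxE": "-evalue", "MaxTargetSeqs": "-max_target_seqs",
                "BlastThreads": "-num_threads", "Dust": "-dust"}
    for keyword, flag in optional.items():
        if keyword in block:
            # Each value stays one argument, so a Dust value like "15 64 1" is not split.
            cmd += [flag, block[keyword]]
    return cmd


if __name__ == "__main__":
    block = {"Name": "nt", "Program": "megablast", "Database": "/path/to/db",
             "MaxE": "0.001", "MaxTargetSeqs": "25", "BlastThreads": "4"}
    reads = [f"read{i}" for i in range(10)]
    for i, bundle in enumerate(batches(reads, 4)):
        print(len(bundle), shlex.join(blast_command(block, f"chunk_{i}.fasta", f"chunk_{i}.tsv")))
```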
Centrifuge processes
You can run multiple Centrifuge processes. Each begins with the keyword CentrifugeProcess.
Keyword | Example | Meaning |
---|---|---|
CentrifugeProcess | n/a | Defines the start of a Centrifuge process. |
Name | cent_nt | Name of the process. |
Database | /path/to/db | Path to the Centrifuge database. |
UseToClassify | n/a | Use Centrifuge results for classification (can only be set for one classification process). |
CentrifugeThreads | 4 | Number of threads to use when running Centrifuge. Note: for the SLURM scheduler, MARTi also uses this value for the SLURM --cpus-per-task option. |
Memory | 16G | For the SLURM scheduler, the memory to use per Centrifuge job. Passed with the SLURM --mem parameter. |
Queue | ei-medium | The job submission queue to use. If omitted, the default queue (see above) is used. Currently only required for SLURM, where it equates to the partition name. |
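For the SLURM scheduler, the threads, `Memory` and `Queue` keywords correspond to the sbatch options noted in the tables. The sketch below wraps a Centrifuge command in an `sbatch --wrap` submission. `sbatch_command` and `centrifuge_command` are hypothetical helpers, and the Centrifuge options used (`-x` index, `-U` unpaired reads, `-p` threads, `-S` output file) are standard ones rather than a record of what MARTi actually runs.

```python
import shlex
from typing import Dict, List, Optional


def centrifuge_command(block: Dict[str, str], reads_fastq: str, out_file: str) -> List[str]:
    """Hypothetical Centrifuge invocation for one batch of reads."""
    return ["centrifuge",
            "-x", block["Database"],                    # index prefix
            "-U", reads_fastq,                          # unpaired reads
            "-p", block.get("CentrifugeThreads", "1"),  # 1-thread fallback is an assumption
            "-S", out_file]                             # per-read classification output


def sbatch_command(tool_cmd: List[str], threads: str, memory: Optional[str] = None,
                   queue: Optional[str] = None,
                   default_queue: Optional[str] = None) -> List[str]:
    """Wrap a tool command in an sbatch submission.

    threads -> --cpus-per-task, memory -> --mem, and the block's Queue (or the
    global Queue when the block omits it) -> --partition.
    """
    cmd = ["sbatch", f"--cpus-per-task={threads}"]
    if memory:
        cmd.append(f"--mem={memory}")
    partition = queue or default_queue
    if partition:
        cmd.append(f"--partition={partition}")
    cmd.append("--wrap=" + shlex.join(tool_cmd))
    return cmd


if __name__ == "__main__":
    block = {"Name": "cent_nt", "Database": "/path/to/db",
             "CentrifugeThreads": "4", "Memory": "16G"}
    tool = centrifuge_command(block, "reads.fastq", "reads.tsv")
    job = sbatch_command(tool, block["CentrifugeThreads"], block.get("Memory"),
                         block.get("Queue"), default_queue="ei-medium")
    print(shlex.join(job))
```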
Kraken2 processes
You can run multiple Kraken2 processes. Each begins with the keyword Kraken2Process.
Keyword | Example | Meaning |
---|---|---|
Kraken2Process | n/a | Defines the start of a Kraken2 process. |
Name | k2_refseq | Name of the process. |
Database | /path/to/db/ | Path to the directory containing the Kraken2 database. |
UseToClassify | n/a | Use Kraken2 results for classification (can only be set for one classification process). |
Kraken2Threads | 4 | Number of threads to use when running Kraken2. Note: for the SLURM scheduler, MARTi also uses this value for the SLURM --cpus-per-task option. |
Memory | 16G | For the SLURM scheduler, the memory to use per Kraken2 job. Passed with the SLURM --mem parameter. |
Queue | ei-medium | The job submission queue to use. If omitted, the default queue (see above) is used. Currently only required for SLURM, where it equates to the partition name. |
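Because `Database` for a Kraken2 process is a directory, it can be worth checking it before starting a run. A built Kraken2 database directory contains `hash.k2d`, `opts.k2d` and `taxo.k2d`; the hypothetical helpers below check for those files and assemble a `kraken2` command using standard options (`--db`, `--threads`, `--output`, `--report`). MARTi's own invocation may differ.

```python
from pathlib import Path
from typing import Dict, List

# Files found in a built Kraken2 database directory.
KRAKEN2_DB_FILES = ("hash.k2d", "opts.k2d", "taxo.k2d")


def missing_kraken2_files(db_dir: str) -> List[str]:
    """Return the expected Kraken2 database files absent from db_dir."""
    root = Path(db_dir)
    return [name for name in KRAKEN2_DB_FILES if not (root / name).is_file()]


def kraken2_command(block: Dict[str, str], reads_fastq: str, prefix: str) -> List[str]:
    """Hypothetical kraken2 invocation for one batch of reads."""
    return ["kraken2",
            "--db", block["Database"],
            "--threads", block.get("Kraken2Threads", "1"),  # 1-thread fallback is an assumption
            "--output", f"{prefix}.kraken",                 # per-read classifications
            "--report", f"{prefix}.report",                 # per-taxon summary report
            reads_fastq]


if __name__ == "__main__":
    block = {"Name": "k2_refseq", "Database": "/path/to/db/", "Kraken2Threads": "4"}
    missing = missing_kraken2_files(block["Database"])
    if missing:
        print("Kraken2 database incomplete, missing:", ", ".join(missing))
    else:
        print(" ".join(kraken2_command(block, "reads.fastq", "batch_000")))
```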
Metadata
Metadata blocks are optional blocks that contain data describing the collection of samples. A metadata block could describe the whole run or a subset of barcodes.
Keyword | Example | Meaning |
---|---|---|
Metadata | n/a | Defines the start of a metadata block. |
Location | 52.62170,1.21900 | GPS coordinates of the location where the sample was collected. |
Date | 31/10/23 | Date of sample collection. |
Time | 11:41 | Time of sample collection. |
Temperature | 21.7C | Temperature at the location at the time of collection. |
Humidity | 49% | Humidity at the location at the time of collection. |
Keywords | field,potatoes,infected | Comma-separated list of keywords describing the sample. Used for searching. |
Barcodes | 01,02,03,04,05 | Optional comma-separated list of barcodes to which this metadata applies. Omit this field to apply the metadata to all barcodes. |
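Metadata values are written as plain strings, so interpreting them is up to whatever reads the file. The sketch below assumes `Location` is `latitude,longitude`, `Date` is `DD/MM/YY` (the example `31/10/23` can only be read day-first) and `Time` is `HH:MM`; these formats and the helper names are assumptions rather than documented requirements. It also applies the rule that a block without `Barcodes` covers every barcode.

```python
from datetime import datetime
from typing import Dict, Tuple


def parse_location(value: str) -> Tuple[float, float]:
    """'52.62170,1.21900' -> (52.6217, 1.219), assuming latitude comes first."""
    lat, lon = (float(part) for part in value.split(","))
    return lat, lon


def collection_time(date: str, time: str) -> datetime:
    """Combine Date (assumed DD/MM/YY) and Time (assumed HH:MM) into one datetime."""
    return datetime.strptime(f"{date.strip()} {time.strip()}", "%d/%m/%y %H:%M")


def applies_to(metadata: Dict[str, str], barcode: str) -> bool:
    """A metadata block with no Barcodes field applies to every barcode."""
    if "Barcodes" not in metadata:
        return True
    return barcode in {b.strip() for b in metadata["Barcodes"].split(",")}


if __name__ == "__main__":
    block = {"Location": "52.62170,1.21900", "Date": "31/10/23", "Time": "11:41",
             "Keywords": "field,potatoes,infected", "Barcodes": "01,02,03,04,05"}
    print(parse_location(block["Location"]))                          # (52.6217, 1.219)
    print(collection_time(block["Date"], block["Time"]).isoformat())  # 2023-10-31T11:41:00
    print(applies_to(block, "03"), applies_to(block, "07"))           # True False
```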
Example
Example file:
```
SampleName:BAMBI_1D_19092017_MARTi
RawDataDir:/Users/leggettr/Documents/Datasets/BAMBI_1D_19092017_MARTi
SampleDir:/Users/leggettr/Documents/Projects/MARTiTest/BAMBI_1D_19092017_MARTi
ProcessBarcodes:
BarcodeId1:SampleNameHere
Scheduler:local
LocalSchedulerMaxJobs:4
InactivityTimeout:10
StopProcessingAfter:50000000
TaxonomyDir:/Users/leggettr/Documents/Databases/taxonomy_6Jul20
LCAMaxHits:20
LCAScorePercent:90
LCAMinIdentity:60
LCAMinQueryCoverage:0
LCAMinCombinedScore:0
LCAMinLength:50
ConvertFastQ
ReadsPerBlast:8000
ReadFilterMinQ:9
ReadFilterMinLength:500
BlastProcess
Name:nt
Program:megablast
Database:/Users/leggettr/Documents/Databases/nt_30Jan2020_v5/nt
TaxaFilter:/Users/leggettr/Documents/Datasets/bacteria_viruses.txt
MaxE:0.001
MaxTargetSeqs:25
BlastThreads:4
UseToClassify
BlastProcess
Name:card
Program:blastn
Database:/Users/leggettr/Documents/Databases/card/nucleotide_fasta_protein_homolog_model.fasta
MaxE:0.001
MaxTargetSeqs:100
BlastThreads:1
Metadata
Location:52.62170,1.21900
Date:31/10/23
Time: 11:41
Temperature:21.7C
Humidity:49%
Keywords:bambi
```
Several classification processes can be run in the same MARTi analysis, but only one of them can have the “UseToClassify” field. The example below shows a config file that classifies reads using Kraken2 and searches for AMR hits using BLAST against the CARD database. Note that when a BLAST/CARD process is included, the walkout analysis, which gives the putative host taxa for AMR genes, is only performed if a BLAST process is also used to classify the reads.
```
SampleName:BAMBI_1D_19092017_MARTi
RawDataDir:/Users/leggettr/Documents/Datasets/BAMBI_1D_19092017_MARTi
SampleDir:/Users/leggettr/Documents/Projects/MARTiTest/BAMBI_1D_19092017_MARTi
ProcessBarcodes:
BarcodeId1:SampleNameHere
Scheduler:local
LocalSchedulerMaxJobs:4
InactivityTimeout:10
StopProcessingAfter:50000000
TaxonomyDir:/Users/leggettr/Documents/Databases/taxonomy_6Jul20
LCAMaxHits:20
LCAScorePercent:90
LCAMinIdentity:60
LCAMinQueryCoverage:0
LCAMinCombinedScore:0
LCAMinLength:50
ConvertFastQ
ReadsPerBlast:8000
ReadFilterMinQ:9
ReadFilterMinLength:500
Kraken2Process
Name:refseq_16
Database:/Users/leggettr/Documents/Databases/kraken2/k2_standard_16gb_20231009/
Kraken2Threads:4
UseToClassify
BlastProcess
Name:card
Program:blastn
Database:/Users/leggettr/Documents/Databases/card/nucleotide_fasta_protein_homolog_model.fasta
MaxE:0.001
MaxTargetSeqs:100
BlastThreads:1
```
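Both example files follow the same layout: `Keyword:value` lines (or bare keywords such as `UseToClassify`), global settings first, then blocks introduced by `BlastProcess`, `CentrifugeProcess`, `Kraken2Process` or `Metadata`. The parser below is a minimal sketch under that reading, not MARTi's own parser. It assumes every line after a block keyword belongs to the most recent block, splits each line at the first colon (so `Time: 11:41` keeps its full value), and rejects configs in which more than one process sets `UseToClassify`. The `parse_config` and `classifier` names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Keywords that open a new block; every following line belongs to that block.
BLOCK_KEYWORDS = {"BlastProcess", "CentrifugeProcess", "Kraken2Process", "Metadata"}


@dataclass
class Block:
    kind: str
    settings: Dict[str, str] = field(default_factory=dict)


@dataclass
class MartiConfig:
    settings: Dict[str, str] = field(default_factory=dict)  # global settings
    blocks: List[Block] = field(default_factory=list)


def parse_config(text: str) -> MartiConfig:
    config = MartiConfig()
    current = config.settings
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # Split at the first colon only, so values such as "11:41" survive.
        key, sep, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if not sep and key in BLOCK_KEYWORDS:
            block = Block(kind=key)
            config.blocks.append(block)
            current = block.settings
        else:
            # Bare keywords (UseToClassify, ConvertFastQ) are stored with an empty value.
            current[key] = value
    return config


def classifier(config: MartiConfig) -> Optional[Block]:
    """Return the process marked UseToClassify; more than one is an error."""
    marked = [b for b in config.blocks if "UseToClassify" in b.settings]
    if len(marked) > 1:
        names = ", ".join(b.settings.get("Name", b.kind) for b in marked)
        raise ValueError(f"UseToClassify is set on more than one process: {names}")
    return marked[0] if marked else None


if __name__ == "__main__":
    example = """\
SampleName:BAMBI_1D_19092017_MARTi
ProcessBarcodes:
Kraken2Process
Name:refseq_16
Kraken2Threads:4
UseToClassify
BlastProcess
Name:card
Program:blastn
"""
    cfg = parse_config(example)
    print([b.kind for b in cfg.blocks])      # ['Kraken2Process', 'BlastProcess']
    print(classifier(cfg).settings["Name"])  # refseq_16
```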