Config file format
Each time you run a MARTi analysis on a sequencing run, you need to specify a config file which provides the details of the analysis to be performed.
This config file is generated by the MARTi launcher front-end (Desktop) or GUI (cluster/HPC).
The following table specifies the meaning of the parameters in the file. Keywords in bold are mandatory, others are optional.
Sample and global settings
| Keyword | Example | Meaning |
|---|---|---|
| SampleName | BAMBI_1D_18042017 | Sample name |
| RawDataDir | /path/to/dir | Run directory: the path to the directory containing the fastq_pass, fastq_fail etc. directories. If guppy/dorado was run separately, the directory containing the fastq directory. |
| SampleDir | /path/to/dir | Path to the directory to use for MARTi analysis files (will be created if it doesn’t exist) |
| ProcessBarcodes | 01,02,03 | If a barcoded sample, indicates which barcodes to process |
| BarcodeId<n> | BarcodeSampleId1 | Sample ID to use for barcode n |
| Scheduler | local | Job scheduler to use - either “local” or “slurm”. |
| Queue | ei-medium | The default job submission queue. Currently only required for SLURM and equates to the partition name. |
| MaxJobs | 4 | Specifies the maximum number of concurrent jobs that can be run by the scheduler (local or SLURM). |
| InactivityTimeout | 10 | How long (in seconds) to wait for new reads to appear before giving up. After this timeout, all remaining analysis will be completed and analysis will stop. Default timeout is 10 seconds. |
| StopProcessingAfter | 50000 | Stop analysis after this number of reads. Default behaviour is no limit. |
| schedulerFileTimeout | 600000 | For SLURM, the allowed time (in milliseconds) between a job completing and an output file appearing before concluding a failure. Default 600000 (i.e. 10 minutes). |
| SchedulerFileWriteDelay | 30000 | For SLURM, the delay (in milliseconds) after a job completes and an output file appears before MARTi attempts to read it. Default 30000 (i.e. 30 seconds). |
| SchedulerResubmissionAttemplts | 2 | For SLURM, how many times to try resubmitting a failed job before giving up. |
| TaxonomyDir | /path/to/dir | Specifies location of NCBI taxonomy files (i.e. the directory containing nodes.dmp and names.dmp). |
| AccessionMap | /path/to/file | Specifies an accession map for mapping accession IDs to taxa. This is generated from the NCBI accession2taxid data by a separate tool. This option should not be required for normal MARTi operation. |
| ConvertFastQ | n/a | Deprecated. |
| ReadsPerBlast | 4000 | BLAST chunk size - reads are batched into bundles of this number before BLASTing. |
Pre-filtering settings
| Keyword | Example | Meaning |
|---|---|---|
| ReadFilterMinQ | 9 | Minimum mean quality value. Reads with mean Q below this are not processed. Default 0 (all reads). |
| ReadFilterMinLength | 150 | Minimum read length. Reads shorter than this are not processed. Default 150. |
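
The two pre-filters amount to a pass/fail check on each read. The sketch below is illustrative only: it assumes Phred+33 encoded qualities and takes the mean Q as the arithmetic mean of the per-base scores, which may differ from how MARTi computes it. The function names are hypothetical and not part of MARTi.

```python
def mean_phred(quality_string: str) -> float:
    """Arithmetic mean of Phred+33 quality scores (an assumption, see above)."""
    if not quality_string:
        return 0.0
    return sum(ord(ch) - 33 for ch in quality_string) / len(quality_string)


def passes_prefilter(sequence: str, quality_string: str,
                     min_q: float = 0.0, min_length: int = 150) -> bool:
    """Return True if a read clears thresholds named after
    ReadFilterMinQ (min_q) and ReadFilterMinLength (min_length)."""
    if len(sequence) < min_length:
        return False
    return mean_phred(quality_string) >= min_q


# 'I' encodes Phred 40 and '#' encodes Phred 2 in Phred+33.
print(passes_prefilter("A" * 200, "I" * 200, min_q=9, min_length=150))
print(passes_prefilter("A" * 100, "I" * 100, min_q=9, min_length=150))
print(passes_prefilter("A" * 200, "#" * 200, min_q=9, min_length=150))
```

With the defaults in the table (Q 0, length 150), only the length check can reject a read.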
LCA classification settings
These Lowest Common Ancestor settings apply to BLAST results (see below).
| Keyword | Example | Meaning |
|---|---|---|
| LCAMaxHits | 100 | Maximum number of BLAST hits to consider in LCA assignment. Default 100. |
| LCAScorePercent | 90 | Only consider hits within this percentage of the top hit for a given read. Default 90. |
| LCAMinIdentity | 70 | Only consider hits with this minimum identity. Default 70. |
| LCAMinLength | 150 | Minimum length of alignment to consider. Default 150. |
| LCAMinReadLength | 100 | Minimum length of read to consider. Default 0. Note, this comes after ReadFilterMinLength, so if set to a value lower than that it will have no effect. |
| LCAMinQueryCoverage | 70 | Only consider hits with this minimum percent coverage of query. Default 0. |
| LCAMinCombinedScore | 120 | Only consider hits where identity % added to query coverage % is greater than this value. Default 0. |
| LCALimitToSpecies | n/a | Limit LCA classification to species level and no lower. |
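
To make the interaction of these thresholds concrete, here is a minimal sketch of hit filtering followed by a lowest common ancestor lookup on a toy taxonomy. It is not MARTi's implementation: the `Hit` class, the function names, the use of bit score for the "percentage of the top hit" test, and the order in which the filters are applied are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    taxon: str        # taxon of the subject sequence
    bitscore: float
    identity: float   # percent identity
    length: int       # alignment length
    query_cov: float  # percent of the query covered


def filter_hits(hits, max_hits=100, score_percent=90.0, min_identity=70.0,
                min_length=150, min_query_cov=0.0, min_combined=0.0):
    """Apply thresholds named after the LCA* settings to one read's hits."""
    if not hits:
        return []
    ranked = sorted(hits, key=lambda h: h.bitscore, reverse=True)[:max_hits]
    cutoff = ranked[0].bitscore * score_percent / 100.0
    return [h for h in ranked
            if h.bitscore >= cutoff
            and h.identity >= min_identity
            and h.length >= min_length
            and h.query_cov >= min_query_cov
            and h.identity + h.query_cov > min_combined]


def lineage(taxon, parent):
    """Path from the root down to `taxon`, following a child -> parent map."""
    path = [taxon]
    while taxon in parent:
        taxon = parent[taxon]
        path.append(taxon)
    return path[::-1]


def lowest_common_ancestor(taxa, parent):
    """Deepest node shared by the lineages of all taxa (None if no taxa)."""
    lca = None
    for nodes in zip(*(lineage(t, parent) for t in taxa)):
        if len(set(nodes)) != 1:
            break
        lca = nodes[0]
    return lca


# Toy taxonomy (child -> parent); the root has no entry.
parent = {"E. coli": "Escherichia", "E. albertii": "Escherichia",
          "Escherichia": "Enterobacteriaceae",
          "S. enterica": "Salmonella", "Salmonella": "Enterobacteriaceae"}
hits = [Hit("E. coli", 1000, 98, 900, 95),
        Hit("E. albertii", 950, 96, 900, 95),
        Hit("S. enterica", 700, 85, 800, 90)]

kept = filter_hits(hits, score_percent=90)
print(lowest_common_ancestor([h.taxon for h in kept], parent))
```

In this toy case the third hit falls below 90% of the top score, so the read is placed at the genus. Loosening the score threshold enough to admit it pushes the assignment up to the family.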
BLAST processes
You can run multiple BLAST processes. Each begins with the keyword BlastProcess.
| Keyword | Example | Meaning |
|---|---|---|
| BlastProcess | n/a | Defines the start of a BLAST process |
| Name | nt | Name of process |
| Program | megablast | BLAST algorithm to use, e.g. megablast, blastn |
| Database | /path/to/db | Database (path)name. Note, this should be the same as you would specify to the BLAST command line with the -db parameter, i.e. it is typically a prefix, or may point to the FASTA file that the database was built from. |
| UseToClassify | n/a | Use BLAST results for classification (can only be set for 1 BLAST process) |
| TaxaFilter | /path/to/file.txt | Taxa filter file to use with BLAST (e.g. to filter to bacteria/viruses) |
| MaxE | 0.001 | Max E value for BLAST |
| MaxTargetSeqs | 100 | Maximum number of target sequences for BLAST |
| RunMeganEvery | n/a | Deprecated. |
| BlastThreads | 4 | Number of threads to use when running BLAST. Note: for SLURM scheduler, MARTi also uses this value for the SLURM --cpus-per-task option. |
| Memory | 16G | For SLURM scheduler, the memory to use per BLAST job. Passed with the SLURM --mem parameter. |
| Queue | ei-medium | The job submission queue to use. Can be left out and the default queue (see above) will be used. Currently only required for SLURM and equates to the partition name. |
| Dust | 15 64 1 | Dust string to be passed on to all BLAST commands for this BLAST process (optional). |
| Options | -ungapped | Any additional options to pass to BLAST (multiple options can be separated with spaces) |
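
As a rough guide to how these fields relate to a BLAST+ command line, the sketch below translates a BlastProcess-style dictionary into a blastn argument list. The mapping is an assumption for illustration (MARTi's actual invocation may differ, and it will add its own output options); `build_blast_command` and the query file name are hypothetical. Only standard blastn flags are used, and Diamond processes are not covered.

```python
import shlex


def build_blast_command(process: dict, query_fasta: str) -> list:
    """Translate BlastProcess fields into a blastn argument list (illustrative)."""
    cmd = ["blastn", "-task", process["Program"],
           "-db", process["Database"], "-query", query_fasta]
    if "MaxE" in process:
        cmd += ["-evalue", str(process["MaxE"])]
    if "MaxTargetSeqs" in process:
        cmd += ["-max_target_seqs", str(process["MaxTargetSeqs"])]
    if "BlastThreads" in process:
        cmd += ["-num_threads", str(process["BlastThreads"])]
    if "Dust" in process:
        # The dust string ("level window linker") is passed as one argument.
        cmd += ["-dust", process["Dust"]]
    if "Options" in process:
        # Space-separated extra options are split into separate arguments.
        cmd += shlex.split(process["Options"])
    return cmd


nt = {"Name": "nt", "Program": "megablast", "Database": "/path/to/db",
      "MaxE": 0.001, "MaxTargetSeqs": 100, "BlastThreads": 4,
      "Dust": "15 64 1", "Options": "-ungapped"}
print(shlex.join(build_blast_command(nt, "chunk_0.fasta")))
```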
Diamond processes
Diamond can be used to classify reads against a Diamond database that is built with taxonomy information. For example, to build a compatible Diamond database using NCBI taxonomy, use the command:
diamond makedb --threads 8 --in nr.gz -d nr.diamond-2.0.9 --taxonmap prot.accession2taxid.FULL.gz --taxonnodes nodes.dmp --taxonnames names.dmp
The fields --taxonmap, --taxonnodes, and --taxonnames must be specified for the database to be compatible with MARTi.
Diamond processes are a subset of BLAST processes, with the Program field set to diamond. All compatible fields from the BLAST process are passed through to Diamond. Diamond processes have an additional Options field to specify the sensitivity mode (or any other options). See below for an example.
Centrifuge processes
You can run multiple Centrifuge processes. Each begins with the keyword CentrifugeProcess.
| Keyword | Example | Meaning |
|---|---|---|
| CentrifugeProcess | n/a | Defines the start of a Centrifuge process |
| Name | cent_nt | Name of process |
| Database | /path/to/db | Path to Centrifuge database |
| UseToClassify | n/a | Use Centrifuge results for classification (can only be set for 1 classification process) |
| CentrifugeThreads | 4 | Number of threads to use when running Centrifuge. Note: for SLURM scheduler, MARTi also uses this value for the SLURM --cpus-per-task option. |
| Memory | 16G | For SLURM scheduler, the memory to use per Centrifuge job. Passed with the SLURM --mem parameter. |
| Queue | ei-medium | The job submission queue to use. Can be left out and the default queue (see above) will be used. Currently only required for SLURM and equates to the partition name. |
| MinHitLen | 500 | This value is passed to the Centrifuge option --min-hitlen for this process. |
| TaxaFilter | 544,550 | Passes through to Centrifuge’s --exclude-taxids option which is described as “a comma-separated list of taxonomic IDs that will be excluded in classification procedure. The descendants from these IDs will also be excluded.” |
| Options | --reorder | Any additional options to pass to Centrifuge (multiple options can be separated with spaces) |
Kraken2 processes
You can run multiple Kraken2 processes. Each begins with the keyword Kraken2Process.
| Keyword | Example | Meaning |
|---|---|---|
| Kraken2Process | n/a | Defines the start of a Kraken2 process |
| Name | k2_refseq | Name of process |
| Database | /path/to/db/ | Path to directory containing Kraken2 database |
| UseToClassify | n/a | Use Kraken2 results for classification (can only be set for 1 classification process) |
| Kraken2Threads | 4 | Number of threads to use when running Kraken2. Note: for SLURM scheduler, MARTi also uses this value for the SLURM --cpus-per-task option. |
| Memory | 16G | For SLURM scheduler, the memory to use per Kraken2 job. Passed with the SLURM --mem parameter. |
| Queue | ei-medium | The job submission queue to use. Can be left out and the default queue (see above) will be used. Currently only required for SLURM and equates to the partition name. |
| Options | --confidence 0.01 | Any additional options to pass to Kraken2 (multiple options can be separated with spaces) |
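
The BLAST, Centrifuge and Kraken2 tables all describe the same mapping from process fields to SLURM resources: the thread count becomes --cpus-per-task, Memory becomes --mem, and Queue (or the global default) names the partition. A minimal sketch of that mapping follows. The helper name `slurm_options` is hypothetical, and using --partition for the queue is an assumption based on the statement that the queue equates to the partition name.

```python
from typing import Optional


def slurm_options(process: dict, threads_key: str,
                  default_queue: Optional[str] = None) -> list:
    """Build sbatch-style resource options from one process block (illustrative)."""
    opts = []
    if threads_key in process:
        opts.append(f"--cpus-per-task={process[threads_key]}")
    if "Memory" in process:
        opts.append(f"--mem={process['Memory']}")
    # A per-process Queue overrides the global default queue.
    queue = process.get("Queue", default_queue)
    if queue:
        opts.append(f"--partition={queue}")
    return opts


kraken = {"Name": "k2_refseq", "Kraken2Threads": 4, "Memory": "16G"}
print(slurm_options(kraken, "Kraken2Threads", default_queue="ei-medium"))
```

The same helper would apply to a BLAST or Centrifuge block by passing "BlastThreads" or "CentrifugeThreads" as the thread key.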
AMR Walkout
| Keyword | Example | Meaning |
|---|---|---|
| WalkoutMinDistance | 50 | Minimum distance from AMR hit that host hit must extend |
| WalkoutMinID | 80 | Minimum percentage identity for an AMR hit |
| WalkoutMinLength | 100 | Minimum length for an AMR hit alignment |
Metadata
Metadata blocks are optional blocks that contain data describing the collection of samples. A metadata block could describe the whole run or a subset of barcodes.
| Keyword | Example | Meaning |
|---|---|---|
| Metadata | n/a | Defines the start of a metadata block |
| Location | 52.62170,1.21900 | GPS coordinates of location where sample was collected. |
| Date | 31/10/23 | Date of sample collection |
| Time | 11:41 | Time of sample collection |
| Temperature | 21.7C | Temperature at location at time of collection. |
| Humidity | 49% | Humidity at location at time of collection. |
| Keywords | field,potatoes,infected | Comma-separated list of keywords to describe the sample. Used for searching. |
| Barcodes | 01,02,03,04,05 | Optional comma-separated list of barcodes for which this metadata applies. Do not include this field to use metadata for all barcodes. |
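
For downstream scripts that consume these values, a metadata block can be turned into typed fields along the following lines. This is a sketch under assumptions inferred from the examples in the table: Location is taken to be latitude,longitude, Date is day/month/two-digit-year, and Time is 24-hour hours:minutes. `parse_metadata` is a hypothetical helper, not part of MARTi.

```python
from datetime import datetime


def parse_metadata(block: dict) -> dict:
    """Convert string fields of a metadata block into typed values (illustrative)."""
    parsed = {}
    if "Location" in block:
        lat, lon = (float(x) for x in block["Location"].split(","))
        parsed["latitude"], parsed["longitude"] = lat, lon
    if "Date" in block:
        stamp, fmt = block["Date"].strip(), "%d/%m/%y"
        if "Time" in block:
            stamp += " " + block["Time"].strip()
            fmt += " %H:%M"
        parsed["collected"] = datetime.strptime(stamp, fmt)
    if "Keywords" in block:
        parsed["keywords"] = [k.strip() for k in block["Keywords"].split(",") if k.strip()]
    # An absent Barcodes field means the block applies to every barcode.
    parsed["barcodes"] = ([b.strip() for b in block["Barcodes"].split(",")]
                          if "Barcodes" in block else None)
    return parsed


example = {"Location": "52.62170,1.21900", "Date": "31/10/23", "Time": "11:41",
           "Keywords": "field,potatoes,infected", "Barcodes": "01,02,03,04,05"}
print(parse_metadata(example))
```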
Example
Example file:
SampleName:BAMBI_1D_19092017_MARTi
RawDataDir:/Users/leggettr/Documents/Datasets/BAMBI_1D_19092017_MARTi
SampleDir:/Users/leggettr/Documents/Projects/MARTiTest/BAMBI_1D_19092017_MARTi
ProcessBarcodes:
BarcodeId1:SampleNameHere
Scheduler:local
LocalSchedulerMaxJobs:4
InactivityTimeout:10
StopProcessingAfter:50000000
TaxonomyDir:/Users/leggettr/Documents/Databases/taxonomy_6Jul20
LCAMaxHits:20
LCAScorePercent:90
LCAMinIdentity:60
LCAMinQueryCoverage:0
LCAMinCombinedScore:0
LCAMinLength:50
ConvertFastQ
ReadsPerBlast:8000
ReadFilterMinQ:9
ReadFilterMinLength:500
BlastProcess
Name:nt
Program:megablast
Database:/Users/leggettr/Documents/Databases/nt_30Jan2020_v5/nt
TaxaFilter:/Users/leggettr/Documents/Datasets/bacteria_viruses.txt
MaxE:0.001
MaxTargetSeqs:25
BlastThreads:4
UseToClassify
BlastProcess
Name:card
Program:blastn
Database:/Users/leggettr/Documents/Databases/card/nucleotide_fasta_protein_homolog_model.fasta
MaxE:0.001
MaxTargetSeqs:100
BlastThreads:1
Metadata
Location:52.62170,1.21900
Date:31/10/23
Time: 11:41
Temperature:21.7C
Humidity:49%
Keywords:bambi
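
Judging from the example above, the file is a sequence of Key:Value lines, with bare keywords (such as BlastProcess, UseToClassify or Metadata) acting as flags or block starts. The sketch below reads that layout into global settings plus a list of blocks. It is not MARTi's own parser: the rule that a block ends at the first key that is not a known block field is an assumption, as are the key sets and the function name. One consequence of that rule is that a global Queue line placed after a process block would be read as part of the block.

```python
BLOCK_STARTS = {"BlastProcess", "CentrifugeProcess", "Kraken2Process", "Metadata"}
PROCESS_KEYS = {"name", "program", "database", "usetoclassify", "taxafilter",
                "negativetaxafilter", "maxe", "maxtargetseqs", "runmeganevery",
                "blastthreads", "centrifugethreads", "kraken2threads", "memory",
                "queue", "dust", "options", "minhitlen"}
METADATA_KEYS = {"location", "date", "time", "temperature", "humidity",
                 "keywords", "barcodes"}


def parse_config(text: str):
    """Split config text into (global settings, list of blocks)."""
    settings, blocks, current = {}, [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # Split on the first colon only, so values such as "11:41" survive.
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key in BLOCK_STARTS:
            current = {"type": key}
            blocks.append(current)
            continue
        if current is not None:
            allowed = METADATA_KEYS if current["type"] == "Metadata" else PROCESS_KEYS
            if key.lower() in allowed:
                current[key] = value   # bare keywords get an empty value
                continue
            current = None             # unknown key: back to global scope
        settings[key] = value
    return settings, blocks


example = """\
SampleName:demo
Scheduler:local
BlastProcess
Name:nt
Program:megablast
MaxE:0.001
UseToClassify
LCAMaxHits:20
Metadata
Time: 11:41
Keywords:bambi
"""
settings, blocks = parse_config(example)
print(settings)
print(blocks)
```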
Different classification processes can be performed in the same MARTi process (but only one classification process can have the “UseToClassify” field). The example below shows a config file that classifies reads using Kraken2, and searches for AMR hits using BLAST and the CARD database. Note that if a BLAST/CARD process is used, a walkout analysis giving the putative host taxa for AMR genes is only performed if a BLAST process is used to classify the reads.
SampleName:BAMBI_1D_19092017_MARTi
RawDataDir:/Users/leggettr/Documents/Datasets/BAMBI_1D_19092017_MARTi
SampleDir:/Users/leggettr/Documents/Projects/MARTiTest/BAMBI_1D_19092017_MARTi
ProcessBarcodes:
BarcodeId1:SampleNameHere
Scheduler:local
LocalSchedulerMaxJobs:4
InactivityTimeout:10
StopProcessingAfter:50000000
TaxonomyDir:/Users/leggettr/Documents/Databases/taxonomy_6Jul20
LCAMaxHits:20
LCAScorePercent:90
LCAMinIdentity:60
LCAMinQueryCoverage:0
LCAMinCombinedScore:0
LCAMinLength:50
ConvertFastQ
ReadsPerBlast:8000
ReadFilterMinQ:9
ReadFilterMinLength:500
Kraken2Process
Name:refseq_16
Database:/Users/leggettr/Documents/Databases/kraken2/k2_standard_16gb_20231009/
Kraken2Threads:4
UseToClassify
BlastProcess
Name:card
Program:blastn
Database:/Users/leggettr/Documents/Databases/card/nucleotide_fasta_protein_homolog_model.fasta
MaxE:0.001
MaxTargetSeqs:100
BlastThreads:1
To classify using Diamond and a compatible database, use a BlastProcess with the Program field set to diamond. For example:
SampleName:BAMBI_1D_19092017_MARTi
RawDataDir:/Users/leggettr/Documents/Datasets/BAMBI_1D_19092017_MARTi
SampleDir:/Users/leggettr/Documents/Projects/MARTiTest/BAMBI_1D_19092017_MARTi
ProcessBarcodes:
BarcodeId1:SampleNameHere
Scheduler:local
LocalSchedulerMaxJobs:4
InactivityTimeout:10
StopProcessingAfter:50000000
TaxonomyDir:/Users/leggettr/Documents/Databases/taxonomy_6Jul20
LCAMaxHits:20
LCAScorePercent:90
LCAMinIdentity:60
LCAMinQueryCoverage:0
LCAMinCombinedScore:0
LCAMinLength:50
ConvertFastQ
ReadsPerBlast:8000
ReadFilterMinQ:9
ReadFilterMinLength:500
BlastProcess
Name:diamond-nr
Program:diamond
Database:/Users/leggettr/Documents/Databases/diamond/nr.diamond-2.0.9
MaxE:0.001
MaxTargetSeqs:100
BlastThreads:2
options: --sensitive --range-culling
Processing Barcodes Example
The following example demonstrates how to configure MARTi to process multiple barcodes.
RunName:Sample_Name
RawDataDir:/path/to/data/reads
SampleDir:/path/to/marti_output/Sample_Name
ProcessBarcodes:01,02,03,04,05,06,07,08,09,10,11,12
BarcodeId1:Kessingland1
BarcodeId2:Kessingland2
BarcodeId3:CarltonMarshes1
BarcodeId4:CarltonMarshes2
BarcodeId5:ThetfordForest1
BarcodeId6:ThetfordForest2
BarcodeId7:CityCentre1
BarcodeId8:CityCentre2
BarcodeId9:Brancaster1
BarcodeId10:Brancaster2
BarcodeId11:FoxleyWood1
BarcodeId12:FoxleyWood2
Scheduler:local
MaxJobs:64
InactivityTimeout:10
StopProcessingAfter:0
TaxonomyDir:/path/to/databases/taxonomy/taxdump_2024_03_09
ReadFilterMinQ:8
ReadFilterMinLength:150
ConvertFastQ
ReadsPerBlast:10000
BlastProcess
Name:nt
Program:megablast
Database:/path/to/databases/blast/ncbi/nt_20240305/nt
NegativeTaxaFilter:/path/to/results/marti/exclude/other_sequences_taxids.txt
MaxE:0.001
MaxTargetSeqs:25
UseToClassify
LCAMaxHits:100
LCAScorePercent:90.0
LCAMinIdentity:75
LCAMinQueryCoverage:0
LCAMinCombinedScore:0
LCAMinLength:150
The ProcessBarcodes line specifies which barcodes MARTi should analyse during the run. The lines following ProcessBarcodes (e.g., BarcodeId1:Kessingland1) are used to assign custom names to each barcode. If these lines are omitted, MARTi will assign default names using the run name followed by the barcode number (e.g., Sample_Name_bc01).
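
The naming rule can be sketched as follows. The two-digit zero padding is inferred from the Sample_Name_bc01 example, and `barcode_names` with its `custom_ids` argument (barcode number to name, as given by BarcodeId<n> lines) is a hypothetical helper rather than MARTi code.

```python
def barcode_names(run_name: str, process_barcodes: str, custom_ids=None) -> dict:
    """Map each barcode in a ProcessBarcodes value to its sample name.

    custom_ids maps barcode number -> name (from BarcodeId<n> lines);
    barcodes without an entry fall back to <run_name>_bc<NN>.
    """
    custom_ids = custom_ids or {}
    names = {}
    for code in process_barcodes.split(","):
        code = code.strip()
        if not code:
            continue
        number = int(code)
        names[code] = custom_ids.get(number, f"{run_name}_bc{number:02d}")
    return names


print(barcode_names("Sample_Name", "01,02,03", {1: "Kessingland1"}))
```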
Users can also rename barcodes after running MARTi. This can be done through the GUI or by creating an ids.json file in the MARTi output directory. For this example, the file would be placed at /path/to/marti_output/Sample_Name/ids.json.
Here is an example of an ids.json file to rename two samples after the analysis has been completed:
{
"Kessingland1": "Kessingland1_Autumn24",
"Kessingland2": "Kessingland2_Autumn24"
}
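
If the renames are produced by a script, the file can be written with a few lines of Python. This is a minimal sketch: `write_ids_file` is a hypothetical helper, and a temporary directory stands in for the real MARTi output directory so that the example can be run anywhere.

```python
import json
import tempfile
from pathlib import Path


def write_ids_file(output_dir, renames: dict) -> Path:
    """Write an ids.json mapping of current sample name -> new sample name."""
    path = Path(output_dir) / "ids.json"
    path.write_text(json.dumps(renames, indent=2) + "\n", encoding="utf-8")
    return path


renames = {"Kessingland1": "Kessingland1_Autumn24",
           "Kessingland2": "Kessingland2_Autumn24"}

# A temporary directory is used here in place of the MARTi output directory.
with tempfile.TemporaryDirectory() as output_dir:
    ids_path = write_ids_file(output_dir, renames)
    written = json.loads(ids_path.read_text(encoding="utf-8"))
    print(written)
```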