Config file format

Each time you run a MARTi analysis on a sequencing run, you need to specify a config file which provides the details of the analysis to be performed.

This config file is generated by the MARTi launcher front-end (Desktop) or GUI (cluster/HPC).

The following table specifies the meaning of the parameters in the file. Keywords in bold are mandatory, others are optional.

Sample and global settings

Keyword	Example	Meaning
SampleName	BAMBI_1D_18042017	Sample name
RawDataDir	/path/to/dir	Run directory, i.e. the path to the directory containing the fastq_pass, fastq_fail etc. directories. If Guppy was run separately, the directory containing the fastq directory.
SampleDir	/path/to/dir	Path to the directory to use for MARTi analysis files (will be created if it doesn’t exist)
ProcessBarcodes	01,02,03	If a barcoded sample, indicates which barcodes to process
BarcodeId<n>	BarcodeSampleId1	Sample ID to use for barcode n
Scheduler	local	Job scheduler to use. Currently only “local” works; specifying “slurm” to run jobs through SLURM is planned.
Queue	ei-medium	The default job submission queue. Currently only required for SLURM and equates to the partition name.
MaxJobs	4	Specifies the maximum number of concurrent jobs that can be run by the scheduler (local or SLURM).
InactivityTimeout	10	How long (seconds) before giving up waiting for new reads to appear. After this timeout, all remaining analysis will be completed and analysis will stop. Default timeout is 10 seconds.
StopProcessingAfter	50000	Stop analysis after this number of reads. Default behaviour is no limit.
schedulerFileTimeout	600000	For SLURM, the maximum time allowed between a job completing and its output file appearing before MARTi concludes the job has failed. Default 600000 ms (i.e. 10 minutes).
SchedulerFileWriteDelay	30000	For SLURM, the delay after a job completing and an output file appearing before MARTi attempts to read it. Default 30000 (i.e. 30s).
SchedulerResubmissionAttempts	2	For SLURM, how many times to try resubmitting a failed job before giving up.
TaxonomyDir	/path/to/dir	Specifies location of NCBI taxonomy files (i.e. the directory containing nodes.dmp and names.dmp).
AccessionMap	/path/to/file	Specifies an accession map for mapping accession IDs to taxa. This is generated from the NCBI accession2taxid data by a separate tool. This option should not be required for normal MARTi operation.
ConvertFastQ	n/a	Deprecated.
ReadsPerBlast	4000	BLAST chunk size - reads are batched into bundles of this number before BLASTing.
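
ReadsPerBlast controls how many reads are bundled together before each BLAST job. The batching can be sketched in Python as below. This is only an illustration: the function name chunk_reads is hypothetical and not part of MARTi, and how MARTi itself handles a final partial chunk is not stated here (the sketch simply emits it).

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunk_reads(reads: Iterable[T], reads_per_blast: int = 4000) -> Iterator[List[T]]:
    """Yield successive lists holding at most reads_per_blast reads each."""
    if reads_per_blast < 1:
        raise ValueError("reads_per_blast must be at least 1")
    iterator = iter(reads)
    while True:
        chunk = list(islice(iterator, reads_per_blast))
        if not chunk:
            return  # input exhausted
        yield chunk
```

With ReadsPerBlast:4000, a run of 10,000 reads would give chunks of 4000, 4000 and 2000 reads under this sketch.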

Pre-filtering settings

Keyword	Example	Meaning
ReadFilterMinQ	9	Minimum mean quality value. Reads with mean Q below this are not processed. Default 0 (all reads).
ReadFilterMinLength	500	Minimum read length. Reads shorter than this are not processed. Default 0 (all reads).
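
The two pre-filter thresholds combine as a simple pass/fail test per read. The Python sketch below illustrates this. The function names are hypothetical, and the way mean Q is computed here (arithmetic mean of Phred+33 scores) is an assumption; MARTi may calculate mean quality differently.

```python
def mean_phred(quality: str, offset: int = 33) -> float:
    """Arithmetic mean of Phred scores from a FASTQ quality string (assumed method)."""
    if not quality:
        return 0.0
    return sum(ord(char) - offset for char in quality) / len(quality)


def passes_prefilter(mean_q: float, length: int,
                     min_q: float = 0.0, min_length: int = 0) -> bool:
    """True if the read meets both ReadFilterMinQ and ReadFilterMinLength."""
    return mean_q >= min_q and length >= min_length
```

With the defaults of 0 for both thresholds, every read passes, which matches the "all reads" default described in the table.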

LCA classification settings

These Lowest Common Ancestor settings apply to BLAST results (see below).

Keyword	Example	Meaning
LCAMaxHits	100	Maximum number of BLAST hits to consider in LCA assignment. Default 100.
LCAScorePercent	90	Only consider hits within this percentage of the top hit for a given read. Default 90.
LCAMinIdentity	60	Only consider hits with this minimum identity. Default 60.
LCAMinLength	150	Minimum length of alignment to consider. Default 150.
LCAMinQueryCoverage	70	Only consider hits with this minimum percent coverage of query. Default 0.
LCAMinCombinedScore	120	Only consider hits where identity % added to query coverage % is greater than this value. Default 0.
LCALimitToSpecies	n/a	Limit LCA classification to species level and no lower.
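
To illustrate how these thresholds could interact, here is a Python sketch that filters the hits for one read and then takes the lowest common ancestor of what remains. It is not MARTi's implementation. The Hit class, the lineage representation (a root-to-taxon list of names), the use of bit score for LCAScorePercent, and the order in which the filters are applied are all assumptions made for the example. LCALimitToSpecies is not modelled.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Hit:
    bitscore: float
    identity: float          # percent identity
    length: int              # alignment length
    query_coverage: float    # percent of the query covered
    lineage: Sequence[str]   # root-to-taxon path (hypothetical representation)


def filter_hits(hits: List[Hit], max_hits: int = 100, score_percent: float = 90.0,
                min_identity: float = 60.0, min_length: int = 150,
                min_query_coverage: float = 0.0,
                min_combined_score: float = 0.0) -> List[Hit]:
    """Keep only the hits that satisfy every LCA threshold."""
    if not hits:
        return []
    ranked = sorted(hits, key=lambda h: h.bitscore, reverse=True)[:max_hits]
    cutoff = ranked[0].bitscore * score_percent / 100.0  # relative to the top hit
    return [h for h in ranked
            if h.bitscore >= cutoff
            and h.identity >= min_identity
            and h.length >= min_length
            and h.query_coverage >= min_query_coverage
            and h.identity + h.query_coverage > min_combined_score]


def lowest_common_ancestor(hits: List[Hit]) -> Optional[str]:
    """Return the deepest taxon shared by all hit lineages, or None."""
    if not hits:
        return None
    common = list(hits[0].lineage)
    for hit in hits[1:]:
        shared = 0
        for ours, theirs in zip(common, hit.lineage):
            if ours != theirs:
                break
            shared += 1
        common = common[:shared]
    return common[-1] if common else None
```

Raising LCAScorePercent towards 100 leaves fewer hits in play and so tends to give more specific assignments; lowering it has the opposite effect.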

BLAST processes

You can run multiple BLAST processes. Each begins with the keyword BlastProcess.

Keyword	Example	Meaning
BlastProcess	n/a	Defines the start of a BLAST process
Name	nt	Name of process
Program	megablast	BLAST algorithm to use - megablast or blastn
Database	/path/to/db	Path to BLAST database
UseToClassify	n/a	Use BLAST results for classification (can only be set for 1 BLAST process)
TaxaFilter	/path/to/file.txt	Taxa filter file to use with BLAST (e.g. to filter to bacteria/viruses)
MaxE	0.001	Max E value for BLAST
MaxTargetSeqs	25	Maximum number of target sequences for BLAST
RunMeganEvery	n/a	Deprecated.
BlastThreads	4	Number of threads to use when running BLAST. Note: for SLURM scheduler, MARTi also uses this value for the SLURM --cpus-per-task option.
Memory	16G	For SLURM scheduler, the memory to use per BLAST job. Passed with the SLURM --mem parameter.
Queue	ei-medium	The job submission queue to use. Can be left out and the default queue (see above) will be used. Currently only required for SLURM and equates to the partition name.
Dust	15 64 1	Dust string to be passed on to all blast commands for this blast process (optional).
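
The relationship between the per-process settings and SLURM resource options can be sketched in Python as follows. The function name is hypothetical and the exact command line MARTi builds is not shown in this document. In particular, the use of --partition for Queue is an assumption based on the statement that the queue equates to the partition name.

```python
from typing import Dict, List, Optional


def slurm_resource_args(process: Dict[str, str], threads_key: str,
                        default_queue: Optional[str] = None) -> List[str]:
    """Translate process settings into SLURM resource options (illustrative only)."""
    args: List[str] = []
    threads = process.get(threads_key)
    if threads:
        args.append(f"--cpus-per-task={threads}")
    memory = process.get("Memory")
    if memory:
        args.append(f"--mem={memory}")
    # A process-level Queue overrides the global default queue.
    queue = process.get("Queue", default_queue)
    if queue:
        args.append(f"--partition={queue}")
    return args
```

For a BLAST process, threads_key would be "BlastThreads"; the Centrifuge and Kraken2 processes described below use "CentrifugeThreads" and "Kraken2Threads" in the same way.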

Centrifuge processes

You can run multiple Centrifuge processes. Each begins with the keyword CentrifugeProcess.

Keyword	Example	Meaning
CentrifugeProcess	n/a	Defines the start of a Centrifuge process
Name	cent_nt	Name of process
Database	/path/to/db	Path to Centrifuge database
UseToClassify	n/a	Use Centrifuge results for classification (can only be set for 1 classification process)
CentrifugeThreads	4	Number of threads to use when running Centrifuge. Note: for SLURM scheduler, MARTi also uses this value for the SLURM --cpus-per-task option.
Memory	16G	For SLURM scheduler, the memory to use per Centrifuge job. Passed with the SLURM --mem parameter.
Queue	ei-medium	The job submission queue to use. Can be left out and the default queue (see above) will be used. Currently only required for SLURM and equates to the partition name.
MinHitLen	500	This value is passed to the Centrifuge option --min-hitlen for this process.

Kraken2 processes

You can run multiple Kraken2 processes. Each begins with the keyword Kraken2Process.

Keyword	Example	Meaning
Kraken2Process	n/a	Defines the start of a Kraken2 process
Name	k2_refseq	Name of process
Database	/path/to/db/	Path to directory containing Kraken2 database
UseToClassify	n/a	Use Kraken2 results for classification (can only be set for 1 classification process)
Kraken2Threads	4	Number of threads to use when running Kraken2. Note: for SLURM scheduler, MARTi also uses this value for the SLURM --cpus-per-task option.
Memory	16G	For SLURM scheduler, the memory to use per Kraken2 job. Passed with the SLURM --mem parameter.
Queue	ei-medium	The job submission queue to use. Can be left out and the default queue (see above) will be used. Currently only required for SLURM and equates to the partition name.

Metadata

Metadata blocks are optional blocks that contain data describing the collection of samples. A metadata block could describe the whole run or a subset of barcodes.

Keyword	Example	Meaning
Metadata	n/a	Defines the start of a metadata block
Location	52.62170,1.21900	GPS coordinates of location where sample was collected.
Date	31/10/23	Date of sample collection
Time	11:41	Time of sample collection
Temperature	21.7C	Temperature at location at time of collection.
Humidity	49%	Humidity at location at time of collection.
Keywords	field,potatoes,infected	Comma-separated list of keywords to describe the sample. Used for searching.
Barcodes	01,02,03,04,05	Optional comma-separated list of barcodes for which this metadata applies. Do not include this field to use metadata for all barcodes.
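
The Barcodes rule (a block without the field applies to every barcode) can be expressed as a short Python sketch. The function name and the dictionary representation of a metadata block are assumptions for illustration, not MARTi's internal data model.

```python
from typing import Dict, List


def metadata_for_barcode(blocks: List[Dict[str, str]], barcode: str) -> List[Dict[str, str]]:
    """Return the metadata blocks that apply to the given barcode."""
    selected = []
    for block in blocks:
        listed = block.get("Barcodes")
        # No Barcodes field means the block covers all barcodes.
        if listed is None or barcode in [b.strip() for b in listed.split(",")]:
            selected.append(block)
    return selected
```

A run-wide block and a barcode-specific block can therefore both apply to the same barcode.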

Example

Example file:

SampleName:BAMBI_1D_19092017_MARTi
RawDataDir:/Users/leggettr/Documents/Datasets/BAMBI_1D_19092017_MARTi
SampleDir:/Users/leggettr/Documents/Projects/MARTiTest/BAMBI_1D_19092017_MARTi
ProcessBarcodes:
BarcodeId1:SampleNameHere

Scheduler:local
LocalSchedulerMaxJobs:4

InactivityTimeout:10
StopProcessingAfter:50000000

TaxonomyDir:/Users/leggettr/Documents/Databases/taxonomy_6Jul20
LCAMaxHits:20
LCAScorePercent:90
LCAMinIdentity:60
LCAMinQueryCoverage:0
LCAMinCombinedScore:0
LCAMinLength:50

ConvertFastQ

ReadsPerBlast:8000

ReadFilterMinQ:9
ReadFilterMinLength:500

BlastProcess
    Name:nt
    Program:megablast
    Database:/Users/leggettr/Documents/Databases/nt_30Jan2020_v5/nt
    TaxaFilter:/Users/leggettr/Documents/Datasets/bacteria_viruses.txt
    MaxE:0.001
    MaxTargetSeqs:25
    BlastThreads:4
    UseToClassify

BlastProcess
    Name:card
    Program:blastn
    Database:/Users/leggettr/Documents/Databases/card/nucleotide_fasta_protein_homolog_model.fasta
    MaxE:0.001
    MaxTargetSeqs:100
    BlastThreads:1

Metadata
    Location:52.62170,1.21900
    Date:31/10/23
    Time: 11:41
    Temperature:21.7C
    Humidity:49%
    Keywords:bambi
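
For readers who want to inspect such a file programmatically, the layout above (Keyword:Value lines, bare keywords used as flags, and indented lines belonging to the preceding process or metadata block) can be read with a small Python sketch. This is not MARTi's own parser. The function name, the returned structure, and the rule that an unindented line ends a block are assumptions inferred from the example.

```python
from typing import Dict, List, Optional, Tuple

# Keywords that open an indented block, taken from the tables above.
BLOCK_KEYWORDS = {"BlastProcess", "CentrifugeProcess", "Kraken2Process", "Metadata"}

Block = Tuple[str, Dict[str, str]]


def parse_config(text: str) -> Tuple[Dict[str, str], List[Block]]:
    """Split config text into global settings and a list of (keyword, fields) blocks."""
    settings: Dict[str, str] = {}
    blocks: List[Block] = []
    current: Optional[Dict[str, str]] = None  # fields of the block being read
    for raw in text.splitlines():
        if not raw.strip():
            continue  # blank lines are separators only
        indented = raw[0] in " \t"
        # Split on the first colon only, so values such as 11:41 stay intact.
        key, _, value = raw.strip().partition(":")
        key, value = key.strip(), value.strip()
        if not indented:
            current = None
            if key in BLOCK_KEYWORDS and not value:
                current = {}
                blocks.append((key, current))
            else:
                settings[key] = value  # flags such as ConvertFastQ get ""
        elif current is not None:
            current[key] = value
        else:
            settings[key] = value  # indented line outside any block
    return settings, blocks
```

Flags such as UseToClassify appear as keys with an empty value, so their presence can be tested with the in operator.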

Different classification processes can be performed in the same MARTi process (but only one classification process can have the “UseToClassify” field). The example below shows a config file that classifies reads using Kraken2, and searches for AMR hits using BLAST and the CARD database. Note that if a BLAST/CARD process is used, a walkout analysis giving the putative host taxa for AMR genes is only performed if a BLAST process is used to classify the reads.

SampleName:BAMBI_1D_19092017_MARTi
RawDataDir:/Users/leggettr/Documents/Datasets/BAMBI_1D_19092017_MARTi
SampleDir:/Users/leggettr/Documents/Projects/MARTiTest/BAMBI_1D_19092017_MARTi
ProcessBarcodes:
BarcodeId1:SampleNameHere

Scheduler:local
LocalSchedulerMaxJobs:4

InactivityTimeout:10
StopProcessingAfter:50000000

TaxonomyDir:/Users/leggettr/Documents/Databases/taxonomy_6Jul20
LCAMaxHits:20
LCAScorePercent:90
LCAMinIdentity:60
LCAMinQueryCoverage:0
LCAMinCombinedScore:0
LCAMinLength:50

ConvertFastQ

ReadsPerBlast:8000

ReadFilterMinQ:9
ReadFilterMinLength:500

Kraken2Process
    Name:refseq_16
    Database:/Users/leggettr/Documents/Databases/kraken2/k2_standard_16gb_20231009/
    Kraken2Threads:4
    UseToClassify

BlastProcess
    Name:card
    Program:blastn
    Database:/Users/leggettr/Documents/Databases/card/nucleotide_fasta_protein_homolog_model.fasta
    MaxE:0.001
    MaxTargetSeqs:100
    BlastThreads:1
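
The UseToClassify rule and the walkout condition described above can be checked with a short Python sketch. The function name and the (keyword, fields) representation of a process are hypothetical; the check only restates the rules given in this document and says nothing about how MARTi enforces them.

```python
from typing import Dict, List, Optional, Tuple


def check_classifier(blocks: List[Tuple[str, Dict[str, str]]]) -> Tuple[Optional[str], bool]:
    """Return (classifying process keyword, whether BLAST is the classifier).

    Raises ValueError if more than one process sets UseToClassify.
    """
    classifiers = [keyword for keyword, fields in blocks
                   if keyword != "Metadata" and "UseToClassify" in fields]
    if len(classifiers) > 1:
        raise ValueError("only one process may set UseToClassify")
    chosen = classifiers[0] if classifiers else None
    # Walkout analysis needs a BLAST process as the classifier.
    return chosen, chosen == "BlastProcess"
```

For the Kraken2 plus CARD example above, this sketch reports Kraken2Process as the classifier and indicates that the walkout precondition is not met.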