Config file format

Each time you run a MARTi analysis on a sequencing run, you need to specify a config file which provides the details of the analysis to be performed.

This config file is generated by the MARTi launcher front-end (Desktop) or GUI (cluster/HPC).

The following table specifies the meaning of the parameters in the file. Keywords in bold are mandatory, others are optional.

Sample and global settings

Keyword	Example	Meaning
SampleName	BAMBI_1D_18042017	Sample name
RawDataDir	/path/to/dir	Run directory: the path to the directory containing the fastq_pass, fastq_fail, etc. directories. If Guppy or Dorado was run separately, the directory containing the fastq directory.
SampleDir	/path/to/dir	Path to the directory to use for MARTi analysis files (created if it doesn’t exist).
ProcessBarcodes	01,02,03	If a barcoded sample, indicates which barcodes to process
BarcodeId<n>	BarcodeSampleId1	Sample ID to use for barcode n
Scheduler	local	Job scheduler to use - either “local” or “slurm”.
Queue	ei-medium	The default job submission queue. Currently only required for SLURM and equates to the partition name.
MaxJobs	4	Specifies the maximum number of concurrent jobs that can be run by the scheduler (local or SLURM).
InactivityTimeout	10	How long (seconds) before giving up waiting for new reads to appear. After this timeout, all remaining analysis will be completed and analysis will stop. Default timeout is 10 seconds.
StopProcessingAfter	50000	Stop analysis after this number of reads. Default behaviour is no limit.
schedulerFileTimeout	600000	For SLURM, the maximum time (milliseconds) allowed between a job completing and its output file appearing before MARTi concludes that the job has failed. Default 600000 (i.e. 10 min).
SchedulerFileWriteDelay	30000	For SLURM, how long (milliseconds) MARTi waits, after a job completes and its output file appears, before attempting to read the file. Default 30000 (i.e. 30 s).
SchedulerResubmissionAttempts	2	For SLURM, how many times to try resubmitting a failed job before giving up.
TaxonomyDir	/path/to/dir	Specifies location of NCBI taxonomy files (i.e. the directory containing nodes.dmp and names.dmp).
AccessionMap	/path/to/file	Specifies an accession map for mapping accession IDs to taxa. This is generated from the NCBI accession2taxid data by a separate tool. This option should not be required for normal MARTi operation.
ConvertFastQ	n/a	Deprecated.
ReadsPerBlast	4000	BLAST chunk size - reads are batched into bundles of this number before BLASTing.

Pre-filtering settings

Keyword	Example	Meaning
ReadFilterMinQ	9	Minimum mean quality value. Reads with mean Q below this are not processed. Default 0 (all reads).
ReadFilterMinLength	150	Minimum read length. Reads shorter than this are not processed. Default 150.
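The two pre-filters amount to a simple keep/drop test on each read. The Python sketch below is illustrative only and is not MARTi's implementation: it assumes Phred+33 quality strings and takes "mean Q" to be the arithmetic mean of the per-base scores (other tools average per-base error probabilities instead, which can give a different value). The function names are hypothetical.

```python
# Illustrative sketch of ReadFilterMinQ / ReadFilterMinLength, NOT MARTi's code.
# ASSUMPTIONS: Phred+33 qualities; "mean Q" = arithmetic mean of per-base scores.


def mean_phred(quality: str, offset: int = 33) -> float:
    """Arithmetic mean of per-base Phred scores (assumed definition of mean Q)."""
    if not quality:
        return 0.0
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def passes_prefilter(sequence: str, quality: str,
                     min_q: float = 0.0, min_length: int = 150) -> bool:
    """Keep/drop decision for one read.

    Defaults mirror the documented defaults (ReadFilterMinQ 0, ReadFilterMinLength 150).
    """
    if len(sequence) < min_length:      # shorter than ReadFilterMinLength: dropped
        return False
    if mean_phred(quality) < min_q:     # mean Q below ReadFilterMinQ: dropped
        return False
    return True


if __name__ == "__main__":
    read = "ACGT" * 50                  # 200 bases
    qual = "+" * 200                    # '+' is ASCII 43, i.e. Phred 10 with offset 33
    print(passes_prefilter(read, qual, min_q=9))   # True: long enough, mean Q 10 >= 9
    print(passes_prefilter(read, qual, min_q=12))  # False: mean Q 10 < 12
```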

LCA classification settings

These Lowest Common Ancestor settings apply to BLAST results (see below).

Keyword	Example	Meaning
LCAMaxHits	100	Maximum number of BLAST hits to consider in LCA assignment. Default 100.
LCAScorePercent	90	Only consider hits within this percentage of the top hit for a given read. Default 90.
LCAMinIdentity	70	Only consider hits with this minimum identity. Default 70.
LCAMinLength	150	Minimum length of alignment to consider. Default 150.
LCAMinReadLength	100	Minimum length of read to consider. Default 0. Note: this filter is applied after ReadFilterMinLength, so a value lower than ReadFilterMinLength has no effect.
LCAMinQueryCoverage	70	Only consider hits with this minimum percent coverage of the query. Default 0.
LCAMinCombinedScore	120	Only consider hits where identity % added to query coverage % is greater than this value. Default 0.
LCALimitToSpecies	n/a	Limit LCA classification to species level and no lower.
LCAMinSupportDenominator	filteredReads	Specifies the denominator used when applying the tree minimum support: either filteredReads (all pass-filter reads) or assignedReads (classified/assigned reads only). Currently defaults to assignedReads for backwards compatibility, but we recommend using filteredReads.
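To illustrate how these settings interact, here is a minimal Python sketch that filters one read's BLAST hits and then takes the lowest common ancestor of the surviving hits' taxa. It is a sketch under stated assumptions, not MARTi's code: the hit fields, the use of bitscore as the score for LCAScorePercent, and the order in which the filters are applied are all assumptions, and the toy taxonomy uses made-up taxon IDs. LCAMinReadLength, LCALimitToSpecies and LCAMinSupportDenominator are not modelled.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Hit:
    """One BLAST hit for a read (hypothetical field names)."""
    taxid: int
    bitscore: float
    identity: float        # percent identity
    align_length: int      # alignment length in bases
    query_coverage: float  # percent of the read covered by the alignment


def filter_hits(hits: List[Hit], max_hits: int = 100, score_percent: float = 90.0,
                min_identity: float = 70.0, min_length: int = 150,
                min_query_coverage: float = 0.0,
                min_combined_score: float = 0.0) -> List[Hit]:
    """Apply LCA thresholds; defaults mirror the documented defaults.

    ASSUMPTION: filter order, and bitscore as the score for LCAScorePercent.
    """
    ranked = sorted(hits, key=lambda h: h.bitscore, reverse=True)[:max_hits]  # LCAMaxHits
    kept = [h for h in ranked
            if h.identity >= min_identity                              # LCAMinIdentity
            and h.align_length >= min_length                           # LCAMinLength
            and h.query_coverage >= min_query_coverage                 # LCAMinQueryCoverage
            and h.identity + h.query_coverage > min_combined_score]   # LCAMinCombinedScore
    if not kept:
        return []
    cutoff = kept[0].bitscore * score_percent / 100.0                  # LCAScorePercent
    return [h for h in kept if h.bitscore >= cutoff]


def lineage(taxid: int, parent: Dict[int, int]) -> List[int]:
    """Taxon IDs from taxid up to the root; the root is its own parent, as in nodes.dmp."""
    path = [taxid]
    while parent[taxid] != taxid:
        taxid = parent[taxid]
        path.append(taxid)
    return path


def lowest_common_ancestor(taxids: List[int], parent: Dict[int, int]) -> Optional[int]:
    """Deepest taxon that appears in the lineage of every input taxon."""
    if not taxids:
        return None
    first = lineage(taxids[0], parent)
    shared = set(first)
    for t in taxids[1:]:
        shared &= set(lineage(t, parent))
    for node in first:          # walk upwards: the first shared node is the deepest
        if node in shared:
            return node
    return None


# Toy taxonomy with made-up IDs: 1 = root, 10 = genus A (species 11, 12), 20 = genus B (species 21)
TOY_PARENT = {1: 1, 10: 1, 11: 10, 12: 10, 20: 1, 21: 20}

if __name__ == "__main__":
    hits = [Hit(11, 500, 98.0, 900, 95.0),
            Hit(12, 480, 96.0, 880, 94.0),
            Hit(21, 300, 85.0, 600, 70.0)]   # 300 < 90% of 500, so dropped
    kept = filter_hits(hits)
    print([h.taxid for h in kept])                                      # [11, 12]
    print(lowest_common_ancestor([h.taxid for h in kept], TOY_PARENT))  # 10
```

With the default LCAScorePercent of 90, the third hit falls below 90% of the top bitscore, so the read is assigned to the genus shared by the two remaining species rather than to the root.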

BLAST processes

You can run multiple BLAST processes. Each begins with the keyword BlastProcess.

Keyword	Example	Meaning
BlastProcess	n/a	Defines the start of a BLAST process
Name	nt	Name of process
Program	megablast	BLAST algorithm to use, e.g. megablast, blastn
Database	/path/to/db	Database (path)name. Note: this should be the same as you would specify to the BLAST command line with the -db parameter, i.e. it is typically a prefix, or may point to the FASTA file that the database was built from.
UseToClassify	n/a	Use BLAST results for classification (can only be set for one classification process)
TaxaFilter	/path/to/file.txt	Taxa filter file to use with BLAST (e.g. to filter to bacteria/viruses)
MaxE	0.001	Max E value for BLAST
MaxTargetSeqs	100	Maximum number of target sequences for BLAST
RunMeganEvery	n/a	Deprecated.
BlastThreads	4	Number of threads to use when running BLAST. Note: for SLURM scheduler, MARTi also uses this value for the SLURM --cpus-per-task option.
Memory	16G	For SLURM scheduler, the memory to use per BLAST job. Passed with the SLURM --mem parameter.
Queue	ei-medium	The job submission queue to use. Can be left out and the default queue (see above) will be used. Currently only required for SLURM and equates to the partition name.
Dust	15 64 1	Dust string to be passed on to all blast commands for this blast process (optional).
Options	-ungapped	Any additional options to pass to BLAST (multiple options can be separated with spaces)

Diamond processes

Diamond can be used to classify reads against a Diamond database that is built with taxonomy information. For example, to build a compatible Diamond database using NCBI taxonomy, use the command:

diamond makedb --threads 8 --in nr.gz -d nr.diamond-2.0.9 --taxonmap prot.accession2taxid.FULL.gz --taxonnodes nodes.dmp --taxonnames names.dmp

The fields --taxonmap, --taxonnodes, and --taxonnames must be specified for the database to be compatible with MARTi.

Diamond processes are a subset of BLAST processes, with the Program field set to diamond. All compatible fields from the BLAST process are passed through to Diamond. Diamond processes have an additional options field to specify the sensitivity mode (or any other options). See below for an example.

Centrifuge processes

You can run multiple Centrifuge processes. Each begins with the keyword CentrifugeProcess.

Keyword	Example	Meaning
CentrifugeProcess	n/a	Defines the start of a Centrifuge process
Name	cent_nt	Name of process
Database	/path/to/db	Path to Centrifuge database
UseToClassify	n/a	Use Centrifuge results for classification (can only be set for 1 classification process)
CentrifugeThreads	4	Number of threads to use when running Centrifuge. Note: for SLURM scheduler, MARTi also uses this value for the SLURM --cpus-per-task option.
Memory	16G	For SLURM scheduler, the memory to use per Centrifuge job. Passed with the SLURM --mem parameter.
Queue	ei-medium	The job submission queue to use. Can be left out and the default queue (see above) will be used. Currently only required for SLURM and equates to the partition name.
MinHitLen	500	This value is passed to the Centrifuge option --min-hitlen for this process.
TaxaFilter	544,550	Passed through to Centrifuge’s --exclude-taxids option, which is described as “a comma-separated list of taxonomic IDs that will be excluded in classification procedure. The descendants from these IDs will also be excluded.”
Options	--reorder	Any additional options to pass to Centrifuge (multiple options can be separated with spaces)

Kraken2 processes

You can run multiple Kraken2 processes. Each begins with the keyword Kraken2Process.

Keyword	Example	Meaning
Kraken2Process	n/a	Defines the start of a Kraken2 process
Name	k2_refseq	Name of process
Database	/path/to/db/	Path to directory containing Kraken2 database
UseToClassify	n/a	Use Kraken2 results for classification (can only be set for 1 classification process)
Kraken2Threads	4	Number of threads to use when running Kraken2. Note: for SLURM scheduler, MARTi also uses this value for the SLURM --cpus-per-task option.
Memory	16G	For SLURM scheduler, the memory to use per Kraken2 job. Passed with the SLURM --mem parameter.
Queue	ei-medium	The job submission queue to use. Can be left out and the default queue (see above) will be used. Currently only required for SLURM and equates to the partition name.
Options	--confidence 0.01	Any additional options to pass to Kraken2 (multiple options can be separated with spaces)

AMR Walkout

Keyword	Example	Meaning
WalkoutMinDistance	50	Minimum distance that the host hit must extend beyond the AMR hit
WalkoutMinID	80	Minimum percentage identity for an AMR hit
WalkoutMinLength	100	Minimum length for an AMR hit alignment

Metadata

Metadata blocks are optional blocks that contain data describing the collection of samples. A metadata block could describe the whole run or a subset of barcodes.

Keyword	Example	Meaning
Metadata	n/a	Defines the start of a metadata block
Location	52.62170,1.21900	GPS coordinates of location where sample was collected.
Date	31/10/23	Date of sample collection
Time	11:41	Time of sample collection
Temperature	21.7C	Temperature at location at time of collection.
Humidity	49%	Humidity at location at time of collection.
Keywords	field,potatoes,infected	Comma-separated list of keywords to describe the sample. Used for searching.
Barcodes	01,02,03,04,05	Optional comma-separated list of barcodes for which this metadata applies. Do not include this field to use metadata for all barcodes.
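The rule for the Barcodes field can be expressed as a small predicate. This Python sketch is illustrative only: it assumes a Metadata block has already been read into a dictionary of field names to string values, compares barcodes numerically (an assumption) so that "3" and "03" match, and also shows one way of splitting the Location field into latitude and longitude. The function names are hypothetical.

```python
from typing import Dict, Tuple


def metadata_applies(block: Dict[str, str], barcode: str) -> bool:
    """True if a Metadata block applies to the given barcode, e.g. "03"."""
    listed = block.get("Barcodes")
    if listed is None:                 # no Barcodes field: applies to all barcodes
        return True
    # ASSUMPTION: compare numerically so that "3" and "03" are the same barcode.
    wanted = {int(b) for b in listed.split(",") if b.strip()}
    return int(barcode) in wanted


def parse_location(value: str) -> Tuple[float, float]:
    """Split a Location value such as "52.62170,1.21900" into (latitude, longitude)."""
    lat, lon = (float(part) for part in value.split(","))
    return lat, lon


if __name__ == "__main__":
    site = {"Location": "52.62170,1.21900", "Barcodes": "01,02,03,04,05"}
    print(metadata_applies(site, "03"))                    # True
    print(metadata_applies(site, "07"))                    # False
    print(metadata_applies({"Keywords": "bambi"}, "07"))   # True: no Barcodes field
    print(parse_location(site["Location"]))                # (52.6217, 1.219)
```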

Example

Example file:

SampleName:BAMBI_1D_19092017_MARTi
RawDataDir:/Users/leggettr/Documents/Datasets/BAMBI_1D_19092017_MARTi
SampleDir:/Users/leggettr/Documents/Projects/MARTiTest/BAMBI_1D_19092017_MARTi
ProcessBarcodes:
BarcodeId1:SampleNameHere

Scheduler:local
LocalSchedulerMaxJobs:4

InactivityTimeout:10
StopProcessingAfter:50000000

TaxonomyDir:/Users/leggettr/Documents/Databases/taxonomy_6Jul20
LCAMaxHits:20
LCAScorePercent:90
LCAMinIdentity:60
LCAMinQueryCoverage:0
LCAMinCombinedScore:0
LCAMinLength:50

ConvertFastQ

ReadsPerBlast:8000

ReadFilterMinQ:9
ReadFilterMinLength:500

BlastProcess
    Name:nt
    Program:megablast
    Database:/Users/leggettr/Documents/Databases/nt_30Jan2020_v5/nt
    TaxaFilter:/Users/leggettr/Documents/Datasets/bacteria_viruses.txt
    MaxE:0.001
    MaxTargetSeqs:25
    BlastThreads:4
    UseToClassify

BlastProcess
    Name:card
    Program:blastn
    Database:/Users/leggettr/Documents/Databases/card/nucleotide_fasta_protein_homolog_model.fasta
    MaxE:0.001
    MaxTargetSeqs:100
    BlastThreads:1

Metadata
    Location:52.62170,1.21900
    Date:31/10/23
    Time: 11:41
    Temperature:21.7C
    Humidity:49%
    Keywords:bambi
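The file format is line-based: top-level Keyword:value settings, bare keywords such as ConvertFastQ and UseToClassify that act as flags, and block headers (BlastProcess, CentrifugeProcess, Kraken2Process, Metadata) followed by indented fields. The Python sketch below reads a file of this shape into global settings plus a list of blocks. It is a hypothetical reader for illustration, not the parser MARTi uses; in particular, ending a block at the next blank line is an assumption, and each line is split on its first colon only so that values such as 11:41 keep their own colons.

```python
from typing import Dict, List, Optional, Tuple, Union

# Block headers documented on this page.
BLOCK_HEADERS = {"BlastProcess", "CentrifugeProcess", "Kraken2Process", "Metadata"}

Value = Union[str, bool]
Fields = Dict[str, Value]


def parse_config(text: str) -> Tuple[Fields, List[Tuple[str, Fields]]]:
    """Split a config into global settings and a list of (block type, fields)."""
    settings: Fields = {}
    blocks: List[Tuple[str, Fields]] = []
    current: Optional[Fields] = None        # fields of the open block, if any

    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            current = None                  # ASSUMPTION: a blank line closes a block
            continue
        if line in BLOCK_HEADERS:
            current = {}
            blocks.append((line, current))
            continue
        key, sep, value = line.partition(":")          # split on the first colon only
        entry: Value = value.strip() if sep else True  # bare keyword acts as a flag
        target = current if current is not None else settings
        target[key.strip()] = entry
    return settings, blocks


if __name__ == "__main__":
    example = """SampleName:BAMBI_1D_19092017_MARTi
ConvertFastQ
ReadsPerBlast:8000

BlastProcess
    Name:nt
    Program:megablast
    UseToClassify

Metadata
    Time: 11:41
"""
    settings, blocks = parse_config(example)
    print(settings)
    print(blocks)
```

Values are kept as strings; converting numeric settings such as ReadsPerBlast is left to the caller.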

Different classification processes can be performed in the same MARTi process, but only one of them can have the “UseToClassify” field. The example below shows a config file that classifies reads using Kraken2 and searches for AMR hits using BLAST and the CARD database. Note that when a BLAST/CARD process is used, the walkout analysis, which gives the putative host taxa for AMR genes, is only performed if a BLAST process is also used to classify the reads.

SampleName:BAMBI_1D_19092017_MARTi
RawDataDir:/Users/leggettr/Documents/Datasets/BAMBI_1D_19092017_MARTi
SampleDir:/Users/leggettr/Documents/Projects/MARTiTest/BAMBI_1D_19092017_MARTi
ProcessBarcodes:
BarcodeId1:SampleNameHere

Scheduler:local
LocalSchedulerMaxJobs:4

InactivityTimeout:10
StopProcessingAfter:50000000

TaxonomyDir:/Users/leggettr/Documents/Databases/taxonomy_6Jul20
LCAMaxHits:20
LCAScorePercent:90
LCAMinIdentity:60
LCAMinQueryCoverage:0
LCAMinCombinedScore:0
LCAMinLength:50

ConvertFastQ

ReadsPerBlast:8000

ReadFilterMinQ:9
ReadFilterMinLength:500

Kraken2Process
    Name:refseq_16
    Database:/Users/leggettr/Documents/Databases/kraken2/k2_standard_16gb_20231009/
    Kraken2Threads:4
    UseToClassify

BlastProcess
    Name:card
    Program:blastn
    Database:/Users/leggettr/Documents/Databases/card/nucleotide_fasta_protein_homolog_model.fasta
    MaxE:0.001
    MaxTargetSeqs:100
    BlastThreads:1
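Because only one process may carry UseToClassify, it can be worth checking a hand-edited config before starting a run. The following Python sketch is a hypothetical check, not part of MARTi: it reports which process blocks contain UseToClassify, assumes each block ends at a blank line or at a Metadata header, and raises an error if more than one is found.

```python
from typing import Dict, List, Optional

PROCESS_HEADERS = {"BlastProcess", "CentrifugeProcess", "Kraken2Process"}


def classifying_processes(text: str) -> List[str]:
    """Return "Type/Name" for every process block that contains UseToClassify."""
    blocks: List[Dict[str, object]] = []
    current: Optional[Dict[str, object]] = None
    for raw in text.splitlines():
        line = raw.strip()
        if line in PROCESS_HEADERS:
            current = {"type": line, "name": "", "classify": False}
            blocks.append(current)
        elif not line or line == "Metadata":
            current = None                     # ASSUMPTION: these end a process block
        elif current is not None:
            key, _, value = line.partition(":")
            if key.strip() == "Name":
                current["name"] = value.strip()
            elif key.strip() == "UseToClassify":
                current["classify"] = True
    return [f"{b['type']}/{b['name']}" for b in blocks if b["classify"]]


def check_single_classifier(text: str) -> Optional[str]:
    """Raise if more than one process is marked UseToClassify; return the one found, if any."""
    chosen = classifying_processes(text)
    if len(chosen) > 1:
        raise ValueError(f"only one process may use UseToClassify, found: {chosen}")
    return chosen[0] if chosen else None


if __name__ == "__main__":
    config = """Kraken2Process
    Name:refseq_16
    Kraken2Threads:4
    UseToClassify

BlastProcess
    Name:card
    Program:blastn
"""
    print(check_single_classifier(config))    # Kraken2Process/refseq_16
```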

To classify using Diamond and a compatible database, use a BlastProcess with the Program field set to diamond. For example:

SampleName:BAMBI_1D_19092017_MARTi
RawDataDir:/Users/leggettr/Documents/Datasets/BAMBI_1D_19092017_MARTi
SampleDir:/Users/leggettr/Documents/Projects/MARTiTest/BAMBI_1D_19092017_MARTi
ProcessBarcodes:
BarcodeId1:SampleNameHere

Scheduler:local
LocalSchedulerMaxJobs:4

InactivityTimeout:10
StopProcessingAfter:50000000

TaxonomyDir:/Users/leggettr/Documents/Databases/taxonomy_6Jul20
LCAMaxHits:20
LCAScorePercent:90
LCAMinIdentity:60
LCAMinQueryCoverage:0
LCAMinCombinedScore:0
LCAMinLength:50

ConvertFastQ

ReadsPerBlast:8000

ReadFilterMinQ:9
ReadFilterMinLength:500

BlastProcess
    Name:diamond-nr
    Program:diamond
    Database:/Users/leggettr/Documents/Databases/diamond/nr.diamond-2.0.9
    MaxE:0.001
    MaxTargetSeqs:100
    BlastThreads:2
    options: --sensitive --range-culling

Processing Barcodes Example

The following example demonstrates how to configure MARTi to process multiple barcodes.

RunName:Sample_Name
RawDataDir:/path/to/data/reads
SampleDir:/path/to/marti_output/Sample_Name

ProcessBarcodes:01,02,03,04,05,06,07,08,09,10,11,12
BarcodeId1:Kessingland1
BarcodeId2:Kessingland2
BarcodeId3:CarltonMarshes1
BarcodeId4:CarltonMarshes2
BarcodeId5:ThetfordForest1
BarcodeId6:ThetfordForest2
BarcodeId7:CityCentre1
BarcodeId8:CityCentre2
BarcodeId9:Brancaster1
BarcodeId10:Brancaster2
BarcodeId11:FoxleyWood1
BarcodeId12:FoxleyWood2

Scheduler:local
MaxJobs:64
InactivityTimeout:10
StopProcessingAfter:0
TaxonomyDir:/path/to/databases/taxonomy/taxdump_2024_03_09
ReadFilterMinQ:8
ReadFilterMinLength:150
ConvertFastQ
ReadsPerBlast:10000

BlastProcess
    Name:nt
    Program:megablast
    Database:/path/to/databases/blast/ncbi/nt_20240305/nt
    NegativeTaxaFilter:/path/to/results/marti/exclude/other_sequences_taxids.txt
    MaxE:0.001
    MaxTargetSeqs:25
    UseToClassify

LCAMaxHits:100
LCAScorePercent:90.0
LCAMinIdentity:75
LCAMinQueryCoverage:0
LCAMinCombinedScore:0
LCAMinLength:150

The ProcessBarcodes line specifies which barcodes MARTi should analyse during the run. The lines following ProcessBarcodes (e.g., BarcodeId1:Kessingland1) are used to assign custom names to each barcode. If these lines are omitted, MARTi will assign default names using the run name followed by the barcode number (e.g., Sample_Name_bc01).
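The naming rule above can be sketched in a few lines of Python. This illustrates the documented behaviour rather than reproducing MARTi's code: it assumes the settings have been read into a dictionary, accepts either RunName or SampleName as the run name, assumes BarcodeId<n> uses the barcode number without leading zeros (BarcodeId1 for barcode 01, as in the example), and zero-pads default names to two digits as in Sample_Name_bc01. The function name is hypothetical.

```python
from typing import Dict


def barcode_names(settings: Dict[str, str]) -> Dict[str, str]:
    """Map each barcode in ProcessBarcodes to the sample name implied by the rule above."""
    run_name = settings.get("RunName") or settings.get("SampleName", "")
    names: Dict[str, str] = {}
    for bc in settings.get("ProcessBarcodes", "").split(","):
        bc = bc.strip()
        if not bc:
            continue                                    # tolerate "ProcessBarcodes:" with no list
        number = int(bc)                                # "01" -> 1
        custom = settings.get(f"BarcodeId{number}")     # ASSUMPTION: BarcodeId1 names barcode 01
        names[bc] = custom or f"{run_name}_bc{number:02d}"  # default, e.g. Sample_Name_bc03
    return names


if __name__ == "__main__":
    settings = {
        "RunName": "Sample_Name",
        "ProcessBarcodes": "01,02,03",
        "BarcodeId1": "Kessingland1",
        "BarcodeId2": "Kessingland2",
    }
    print(barcode_names(settings))
    # {'01': 'Kessingland1', '02': 'Kessingland2', '03': 'Sample_Name_bc03'}
```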

Users can also rename barcodes after running MARTi. This can be done through the GUI or by creating an ids.json file in the MARTi output directory. For this example, the file would be placed at /path/to/marti_output/Sample_Name/ids.json.

Here is an example of an ids.json file to rename two samples after the analysis has been completed:

{
    "Kessingland1": "Kessingland1_Autumn24",
    "Kessingland2": "Kessingland2_Autumn24"
}
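If you prefer to script renames rather than use the GUI, an ids.json file like the one above can be written with Python's standard json module. This sketch is a hypothetical helper, not a MARTi tool: keys are the current sample names as in the example, and it merges new entries into any existing ids.json so that earlier renames are kept (a design choice; overwriting the file would discard them).

```python
import json
import tempfile
from pathlib import Path
from typing import Dict


def write_ids_json(output_dir: str, renames: Dict[str, str]) -> Path:
    """Merge renames (current name -> new name) into <output_dir>/ids.json."""
    path = Path(output_dir) / "ids.json"
    existing: Dict[str, str] = {}
    if path.exists():                                   # keep earlier renames
        existing = json.loads(path.read_text(encoding="utf-8"))
    existing.update(renames)
    path.write_text(json.dumps(existing, indent=4) + "\n", encoding="utf-8")
    return path


if __name__ == "__main__":
    # Demonstrated in a temporary directory; in practice pass the MARTi output
    # directory, e.g. /path/to/marti_output/Sample_Name
    with tempfile.TemporaryDirectory() as tmp:
        path = write_ids_json(tmp, {
            "Kessingland1": "Kessingland1_Autumn24",
            "Kessingland2": "Kessingland2_Autumn24",
        })
        print(path.read_text(encoding="utf-8"))
```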