Config file format

Each time you run a MARTi analysis on a sequencing run, you need to specify a config file which provides the details of the analysis to be performed.

This config file is generated by the MARTi launcher front-end (Desktop) or GUI (cluster/HPC).

The following table specifies the meaning of the parameters in the file. Keywords in bold are mandatory, others are optional.

Sample and global settings

| Keyword | Example | Meaning |
|---|---|---|
| SampleName | BAMBI_1D_18042017 | Sample name. |
| RawDataDir | /path/to/dir | Run directory: specifically, the directory containing the fastq_pass, fastq_fail, etc. directories. If Guppy was run separately, the directory containing the fastq directory. |
| SampleDir | /path/to/dir | Path to the directory to use for MARTi analysis files (created if it doesn't exist). |
| ProcessBarcodes | 01,02,03 | For a barcoded sample, the barcodes to process. |
| `BarcodeId<n>` | BarcodeSampleId1 | Sample ID to use for barcode n. |
| Scheduler | local | Job scheduler to use. Currently only "local" works; support for specifying "slurm" (to run jobs through SLURM) is planned. |
| Queue | ei-medium | The default job submission queue. Currently only required for SLURM, where it equates to the partition name. |
| MaxJobs | 4 | Maximum number of concurrent jobs the scheduler (local or SLURM) can run. |
| InactivityTimeout | 10 | How long (in seconds) to wait for new reads to appear before giving up. After this timeout, all remaining analysis is completed and the analysis stops. Default 10 seconds. |
| StopProcessingAfter | 50000 | Stop analysis after this number of reads. Default: no limit. |
| schedulerFileTimeout | 600000 | For SLURM, the time allowed (in milliseconds) between a job completing and its output file appearing before MARTi concludes that the job failed. Default 600000 (i.e. 10 minutes). |
| SchedulerFileWriteDelay | 30000 | For SLURM, the delay (in milliseconds) after a job completes and its output file appears before MARTi attempts to read it. Default 30000 (i.e. 30 seconds). |
| SchedulerResubmissionAttempts | 2 | For SLURM, how many times to resubmit a failed job before giving up. |
| TaxonomyDir | /path/to/dir | Location of the NCBI taxonomy files (i.e. the directory containing nodes.dmp and names.dmp). |
| AccessionMap | /path/to/file | Accession map used to map accession IDs to taxa. It is generated from the NCBI accession2taxid data by a separate tool. This option should not be required for normal MARTi operation. |
| ConvertFastQ | n/a | Deprecated. |
| ReadsPerBlast | 4000 | BLAST chunk size: reads are batched into bundles of this size before BLASTing. |
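To make the relationship between ProcessBarcodes and `BarcodeId<n>` concrete, here is a minimal Python sketch that maps each listed barcode to its sample ID. It is not MARTi's code. The function name `barcode_sample_ids` is hypothetical, and the sketch makes two further assumptions: the config has already been parsed into a dictionary of strings, and `n` is the barcode number without leading zeros (barcode `01` maps to `BarcodeId1`).

```python
from typing import Optional


def barcode_sample_ids(config: dict[str, str]) -> dict[str, Optional[str]]:
    """Map each barcode in ProcessBarcodes to the sample ID from BarcodeId<n>.

    Hypothetical helper. Assumes n is the barcode number without leading
    zeros (barcode "01" -> BarcodeId1). Barcodes with no BarcodeId<n> entry
    map to None.
    """
    listed = config.get("ProcessBarcodes", "").strip()
    if not listed:
        return {}  # treated here as "not a barcoded run" (assumption)
    mapping: dict[str, Optional[str]] = {}
    for barcode in listed.split(","):
        barcode = barcode.strip()
        if not barcode:
            continue  # tolerate a trailing comma
        mapping[barcode] = config.get(f"BarcodeId{int(barcode)}")
    return mapping
```

For example, `{"ProcessBarcodes": "01,02,03", "BarcodeId1": "SoilA", "BarcodeId2": "SoilB"}` would map `01` to `SoilA`, `02` to `SoilB` and `03` to `None`.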

Pre-filtering settings

| Keyword | Example | Meaning |
|---|---|---|
| ReadFilterMinQ | 9 | Minimum mean quality value. Reads with a mean Q below this are not processed. Default 0 (all reads). |
| ReadFilterMinLength | 500 | Minimum read length. Reads shorter than this are not processed. Default 0 (all reads). |
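The two pre-filters can be sketched as a single per-read check. This is an illustration only. `passes_prefilter` and `mean_phred` are hypothetical names, and quality strings are assumed to be Phred+33 encoded. "Mean quality" is taken here as the arithmetic mean of the Phred scores. Some nanopore tools instead average the per-base error probabilities, which gives a lower value, so MARTi's exact definition may differ.

```python
def mean_phred(quality: str, offset: int = 33) -> float:
    """Arithmetic mean of the Phred scores in a FASTQ quality string (assumed Phred+33)."""
    if not quality:
        return 0.0
    return sum(ord(c) - offset for c in quality) / len(quality)


def passes_prefilter(sequence: str, quality: str,
                     min_q: float = 0, min_length: int = 0) -> bool:
    """Mimic ReadFilterMinQ / ReadFilterMinLength for one read.

    The defaults of 0 keep every read, matching the documented defaults.
    """
    if len(sequence) < min_length:   # shorter than ReadFilterMinLength: dropped
        return False
    return mean_phred(quality) >= min_q  # mean Q below ReadFilterMinQ: dropped
```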

LCA classification settings

These Lowest Common Ancestor settings apply to BLAST results (see below).

| Keyword | Example | Meaning |
|---|---|---|
| LCAMaxHits | 100 | Maximum number of BLAST hits to consider in LCA assignment. Default 100. |
| LCAScorePercent | 90 | Only consider hits within this percentage of the top hit's score for a given read. Default 90. |
| LCAMinIdentity | 60 | Only consider hits with at least this percent identity. Default 60. |
| LCAMinLength | 150 | Minimum alignment length to consider. Default 150. |
| LCAMinQueryCoverage | 70 | Only consider hits with at least this percent coverage of the query. Default 0. |
| LCAMinCombinedScore | 120 | Only consider hits where percent identity plus percent query coverage is greater than this value. Default 0. |
| LCALimitToSpecies | n/a | Limit LCA classification to species level and no lower. |
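The sketch below shows one plausible way these settings combine: it filters a read's BLAST hits, then computes the lowest common ancestor of the hits that remain. It is not MARTi's implementation. The filter order is an assumption: LCAMaxHits first, then the per-hit thresholds, then LCAScorePercent relative to the best remaining hit. The `Hit` record and its root-to-leaf `lineage` field are also hypothetical. LCALimitToSpecies is not modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hit:
    lineage: list[str]      # taxa from root down to the hit's taxon (hypothetical field)
    bitscore: float
    identity: float         # percent identity
    length: int             # alignment length
    query_coverage: float   # percent of the read covered by the alignment


def lca_of_hits(hits: list[Hit], max_hits: int = 100, score_percent: float = 90,
                min_identity: float = 60, min_length: int = 150,
                min_query_coverage: float = 0,
                min_combined_score: float = 0) -> Optional[list[str]]:
    """Lineage of the LCA of the hits passing the filters, or None if none pass."""
    # LCAMaxHits: only the best max_hits hits (by bit score) are considered.
    ranked = sorted(hits, key=lambda h: h.bitscore, reverse=True)[:max_hits]
    # LCAMinIdentity, LCAMinLength, LCAMinQueryCoverage, LCAMinCombinedScore.
    passing = [
        h for h in ranked
        if h.identity >= min_identity
        and h.length >= min_length
        and h.query_coverage >= min_query_coverage
        and h.identity + h.query_coverage > min_combined_score
    ]
    if not passing:
        return None
    # LCAScorePercent: keep hits scoring at least this percentage of the best score.
    threshold = passing[0].bitscore * score_percent / 100
    kept = [h for h in passing if h.bitscore >= threshold]
    # The LCA is the longest lineage prefix shared by every kept hit.
    lca: list[str] = []
    for level in zip(*(h.lineage for h in kept)):
        if all(taxon == level[0] for taxon in level):
            lca.append(level[0])
        else:
            break
    return lca
```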

BLAST processes

You can run multiple BLAST processes. Each begins with the keyword BlastProcess.

| Keyword | Example | Meaning |
|---|---|---|
| BlastProcess | n/a | Defines the start of a BLAST process. |
| Name | nt | Name of the process. |
| Program | megablast | BLAST algorithm to use: megablast or blastn. |
| Database | /path/to/db | Path to the BLAST database. |
| UseToClassify | n/a | Use the BLAST results for classification (can only be set for one classification process). |
| TaxaFilter | /path/to/file.txt | Taxa filter file to use with BLAST (e.g. to filter to bacteria/viruses). |
| MaxE | 0.001 | Maximum E-value for BLAST. |
| MaxTargetSeqs | 25 | Maximum number of target sequences for BLAST. |
| RunMeganEvery | n/a | Deprecated. |
| BlastThreads | 4 | Number of threads to use when running BLAST. Note: for the SLURM scheduler, MARTi also uses this value for the SLURM `--cpus-per-task` option. |
| Memory | 16G | For the SLURM scheduler, the memory to use per BLAST job. Passed with the SLURM `--mem` option. |
| Queue | ei-medium | The job submission queue to use. If omitted, the default queue (see above) is used. Currently only required for SLURM, where it equates to the partition name. |
| Dust | 15 64 1 | Optional DUST string passed to all BLAST commands for this BLAST process. |
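The sketch below illustrates how the BLAST process keywords might relate to BLAST+ and SLURM options. It turns a parsed BlastProcess block into a `blastn` command line and a list of `sbatch` options. `blast_command` and `sbatch_options` are hypothetical names, and both the keyword-to-option mapping (for example, `Program` to `-task`) and the tabular `-outfmt 6` output are assumptions. MARTi builds its own commands and may use different options. TaxaFilter and UseToClassify are left out of this sketch.

```python
def blast_command(process: dict[str, str], query: str, output: str) -> list[str]:
    """Build a BLAST+ blastn command from a parsed BlastProcess block (illustrative)."""
    cmd = [
        "blastn",
        "-task", process["Program"],   # "megablast" or "blastn"
        "-db", process["Database"],
        "-query", query,
        "-out", output,
        "-outfmt", "6",                # tabular output: an assumption for this sketch
    ]
    if "MaxE" in process:
        cmd += ["-evalue", process["MaxE"]]
    if "MaxTargetSeqs" in process:
        cmd += ["-max_target_seqs", process["MaxTargetSeqs"]]
    if "BlastThreads" in process:
        cmd += ["-num_threads", process["BlastThreads"]]
    if "Dust" in process:
        cmd += ["-dust", process["Dust"]]  # e.g. "15 64 1" as a single argument
    return cmd


def sbatch_options(process: dict[str, str], default_queue: str) -> list[str]:
    """SLURM options implied by the Queue, BlastThreads and Memory keywords."""
    opts = ["--partition=" + process.get("Queue", default_queue)]
    if "BlastThreads" in process:
        opts.append("--cpus-per-task=" + process["BlastThreads"])
    if "Memory" in process:
        opts.append("--mem=" + process["Memory"])
    return opts
```

Returning a list of arguments, rather than one shell string, keeps values such as the DUST string `15 64 1` intact without shell quoting.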

Centrifuge processes

You can run multiple Centrifuge processes. Each begins with the keyword CentrifugeProcess.

| Keyword | Example | Meaning |
|---|---|---|
| CentrifugeProcess | n/a | Defines the start of a Centrifuge process. |
| Name | cent_nt | Name of the process. |
| Database | /path/to/db | Path to the Centrifuge database. |
| UseToClassify | n/a | Use the Centrifuge results for classification (can only be set for one classification process). |
| CentrifugeThreads | 4 | Number of threads to use when running Centrifuge. Note: for the SLURM scheduler, MARTi also uses this value for the SLURM `--cpus-per-task` option. |
| Memory | 16G | For the SLURM scheduler, the memory to use per Centrifuge job. Passed with the SLURM `--mem` option. |
| Queue | ei-medium | The job submission queue to use. If omitted, the default queue (see above) is used. Currently only required for SLURM, where it equates to the partition name. |
| MinHitLen | 500 | Value passed to the Centrifuge option `--min-hitlen` for this process. |
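A Centrifuge process block could map onto a `centrifuge` command line as sketched below. The `-x`, `-U`, `-S`, `--report-file` and `-p` options are standard Centrifuge options, and `--min-hitlen` is the option named in the table. The function name and the output file arguments are hypothetical, and MARTi's actual invocation may differ.

```python
def centrifuge_command(process: dict[str, str], reads: str,
                       output: str, report: str) -> list[str]:
    """Build a centrifuge command from a parsed CentrifugeProcess block (illustrative)."""
    cmd = [
        "centrifuge",
        "-x", process["Database"],     # index to search
        "-U", reads,                   # unpaired reads
        "-S", output,                  # per-read classification output
        "--report-file", report,       # per-taxon summary
    ]
    if "CentrifugeThreads" in process:
        cmd += ["-p", process["CentrifugeThreads"]]
    if "MinHitLen" in process:
        cmd += ["--min-hitlen", process["MinHitLen"]]
    return cmd
```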

Kraken2 processes

You can run multiple Kraken2 processes. Each begins with the keyword Kraken2Process.

| Keyword | Example | Meaning |
|---|---|---|
| Kraken2Process | n/a | Defines the start of a Kraken2 process. |
| Name | k2_refseq | Name of the process. |
| Database | /path/to/db/ | Path to the directory containing the Kraken2 database. |
| UseToClassify | n/a | Use the Kraken2 results for classification (can only be set for one classification process). |
| Kraken2Threads | 4 | Number of threads to use when running Kraken2. Note: for the SLURM scheduler, MARTi also uses this value for the SLURM `--cpus-per-task` option. |
| Memory | 16G | For the SLURM scheduler, the memory to use per Kraken2 job. Passed with the SLURM `--mem` option. |
| Queue | ei-medium | The job submission queue to use. If omitted, the default queue (see above) is used. Currently only required for SLURM, where it equates to the partition name. |
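For Kraken2, the sketch below shows a plausible command line built from a Kraken2Process block. It also includes a parser for one line of Kraken2's standard per-read output, whose first three tab-separated columns are the status (`C` or `U`), the read ID and the taxonomy ID. Both functions are hypothetical illustrations, not part of MARTi. The parser assumes Kraken2 was run without `--use-names`, so the third column is a bare taxonomy ID.

```python
def kraken2_command(process: dict[str, str], reads: str,
                    output: str, report: str) -> list[str]:
    """Build a kraken2 command from a parsed Kraken2Process block (illustrative)."""
    cmd = ["kraken2", "--db", process["Database"],
           "--output", output, "--report", report]
    if "Kraken2Threads" in process:
        cmd += ["--threads", process["Kraken2Threads"]]
    return cmd + [reads]  # reads file goes last


def parse_kraken2_line(line: str) -> tuple[str, bool, int]:
    """Return (read_id, classified, taxid) from one Kraken2 per-read output line."""
    status, read_id, taxid = line.rstrip("\n").split("\t")[:3]
    return read_id, status == "C", int(taxid)  # unclassified reads have taxid 0
```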

Metadata

Metadata blocks are optional blocks that contain data describing the collection of samples. A metadata block could describe the whole run or a subset of barcodes.

| Keyword | Example | Meaning |
|---|---|---|
| Metadata | n/a | Defines the start of a metadata block. |
| Location | 52.62170,1.21900 | GPS coordinates of the location where the sample was collected. |
| Date | 31/10/23 | Date of sample collection. |
| Time | 11:41 | Time of sample collection. |
| Temperature | 21.7C | Temperature at the location at the time of collection. |
| Humidity | 49% | Humidity at the location at the time of collection. |
| Keywords | field,potatoes,infected | Comma-separated list of keywords describing the sample. Used for searching. |
| Barcodes | 01,02,03,04,05 | Optional comma-separated list of barcodes to which this metadata applies. Omit this field to apply the metadata to all barcodes. |
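Metadata values are stored as plain strings. The sketch below shows one way to turn a parsed metadata block into typed values and to decide whether the block applies to a given barcode. It makes several assumptions: `Location` is `latitude,longitude`, `Date` is day/month/two-digit year (as in the `31/10/23` example), and `Temperature` is in degrees Celsius. `SampleMetadata`, `parse_metadata` and `applies_to` are hypothetical names.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class SampleMetadata:
    latitude: Optional[float] = None
    longitude: Optional[float] = None
    collected: Optional[datetime] = None
    temperature_c: Optional[float] = None
    humidity_percent: Optional[float] = None
    keywords: list[str] = field(default_factory=list)
    barcodes: Optional[list[str]] = None  # None means "applies to all barcodes"


def parse_metadata(block: dict[str, str]) -> SampleMetadata:
    """Convert a parsed Metadata block (keyword -> string) into typed fields."""
    meta = SampleMetadata()
    if "Location" in block:  # assumed to be "latitude,longitude"
        meta.latitude, meta.longitude = (float(x) for x in block["Location"].split(","))
    if "Date" in block:      # assumed DD/MM/YY; Time defaults to midnight if absent
        when = block["Date"].strip() + " " + block.get("Time", "00:00").strip()
        meta.collected = datetime.strptime(when, "%d/%m/%y %H:%M")
    if "Temperature" in block:  # assumed Celsius, e.g. "21.7C"
        meta.temperature_c = float(block["Temperature"].strip().rstrip("Cc"))
    if "Humidity" in block:     # e.g. "49%"
        meta.humidity_percent = float(block["Humidity"].strip().rstrip("%"))
    if "Keywords" in block:
        meta.keywords = [k.strip() for k in block["Keywords"].split(",") if k.strip()]
    if "Barcodes" in block:
        meta.barcodes = [b.strip() for b in block["Barcodes"].split(",") if b.strip()]
    return meta


def applies_to(meta: SampleMetadata, barcode: str) -> bool:
    """A block without Barcodes applies to every barcode."""
    return meta.barcodes is None or barcode in meta.barcodes
```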

Example

Example file:

```
SampleName:BAMBI_1D_19092017_MARTi
RawDataDir:/Users/leggettr/Documents/Datasets/BAMBI_1D_19092017_MARTi
SampleDir:/Users/leggettr/Documents/Projects/MARTiTest/BAMBI_1D_19092017_MARTi
ProcessBarcodes:
BarcodeId1:SampleNameHere

Scheduler:local
LocalSchedulerMaxJobs:4

InactivityTimeout:10
StopProcessingAfter:50000000

TaxonomyDir:/Users/leggettr/Documents/Databases/taxonomy_6Jul20
LCAMaxHits:20
LCAScorePercent:90
LCAMinIdentity:60
LCAMinQueryCoverage:0
LCAMinCombinedScore:0
LCAMinLength:50

ConvertFastQ

ReadsPerBlast:8000

ReadFilterMinQ:9
ReadFilterMinLength:500

BlastProcess
    Name:nt
    Program:megablast
    Database:/Users/leggettr/Documents/Databases/nt_30Jan2020_v5/nt
    TaxaFilter:/Users/leggettr/Documents/Datasets/bacteria_viruses.txt
    MaxE:0.001
    MaxTargetSeqs:25
    BlastThreads:4
    UseToClassify

BlastProcess
    Name:card
    Program:blastn
    Database:/Users/leggettr/Documents/Databases/card/nucleotide_fasta_protein_homolog_model.fasta
    MaxE:0.001
    MaxTargetSeqs:100
    BlastThreads:1

Metadata
    Location:52.62170,1.21900
    Date:31/10/23
    Time: 11:41
    Temperature:21.7C
    Humidity:49%
    Keywords:bambi
```
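The file format is a list of `Keyword:value` lines. Flag keywords (such as `UseToClassify`) are written on their own, and process or metadata blocks start with a block keyword followed by indented lines. A minimal parser sketch for that layout is shown below. It is not MARTi's parser. Two of its rules are assumptions based on the example above: indentation is what places a line inside a block, and each line is split on its first colon (so a value such as ` 11:41` keeps its own colon). `parse_marti_config` and `BLOCK_KEYWORDS` are hypothetical names.

```python
BLOCK_KEYWORDS = {"BlastProcess", "CentrifugeProcess", "Kraken2Process", "Metadata"}


def parse_marti_config(text: str) -> tuple[dict[str, str], list[tuple[str, dict[str, str]]]]:
    """Return (global settings, [(block keyword, block settings), ...]) from config text."""
    settings: dict[str, str] = {}
    blocks: list[tuple[str, dict[str, str]]] = []
    current = None  # settings dict of the block being filled, if any
    for raw in text.splitlines():
        if not raw.strip():
            continue  # blank lines carry no meaning here
        indented = raw[0] in " \t"
        key, _, value = raw.strip().partition(":")  # split on the first colon only
        key, value = key.strip(), value.strip()     # flags such as UseToClassify get ""
        if not indented and key in BLOCK_KEYWORDS:
            current = {}
            blocks.append((key, current))
        elif indented and current is not None:
            current[key] = value
        else:
            current = None  # an unindented setting ends any open block
            settings[key] = value
    return settings, blocks
```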

Several classification processes can be run in the same MARTi analysis, but only one of them can have the “UseToClassify” field. The example below shows a config file that classifies reads using Kraken2 and searches for AMR hits using BLAST against the CARD database. Note that when a BLAST/CARD process is included, the walkout analysis, which reports the putative host taxa for AMR genes, is only performed if a BLAST process is used to classify the reads. It is therefore not performed with this configuration.

```
SampleName:BAMBI_1D_19092017_MARTi
RawDataDir:/Users/leggettr/Documents/Datasets/BAMBI_1D_19092017_MARTi
SampleDir:/Users/leggettr/Documents/Projects/MARTiTest/BAMBI_1D_19092017_MARTi
ProcessBarcodes:
BarcodeId1:SampleNameHere

Scheduler:local
LocalSchedulerMaxJobs:4

InactivityTimeout:10
StopProcessingAfter:50000000

TaxonomyDir:/Users/leggettr/Documents/Databases/taxonomy_6Jul20
LCAMaxHits:20
LCAScorePercent:90
LCAMinIdentity:60
LCAMinQueryCoverage:0
LCAMinCombinedScore:0
LCAMinLength:50

ConvertFastQ

ReadsPerBlast:8000

ReadFilterMinQ:9
ReadFilterMinLength:500

Kraken2Process
    Name:refseq_16
    Database:/Users/leggettr/Documents/Databases/kraken2/k2_standard_16gb_20231009/
    Kraken2Threads:4
    UseToClassify

BlastProcess
    Name:card
    Program:blastn
    Database:/Users/leggettr/Documents/Databases/card/nucleotide_fasta_protein_homolog_model.fasta
    MaxE:0.001
    MaxTargetSeqs:100
    BlastThreads:1
```
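Two rules can be checked before a run starts: only one process may carry UseToClassify, and the AMR walkout analysis depends on BLAST being used for classification. The sketch below works on blocks given as `(block keyword, settings dictionary)` pairs. That representation and the function names `check_classifiers` and `walkout_enabled` are hypothetical, not part of MARTi.

```python
CLASSIFIER_BLOCKS = {"BlastProcess", "CentrifugeProcess", "Kraken2Process"}


def check_classifiers(blocks: list[tuple[str, dict[str, str]]]) -> list[str]:
    """Return a list of problems with UseToClassify usage (empty list means OK)."""
    classifiers = [opts.get("Name", "?") for kind, opts in blocks
                   if kind in CLASSIFIER_BLOCKS and "UseToClassify" in opts]
    problems = []
    if len(classifiers) > 1:
        problems.append("UseToClassify is set on more than one process: "
                        + ", ".join(classifiers))
    return problems


def walkout_enabled(blocks: list[tuple[str, dict[str, str]]]) -> bool:
    """The AMR walkout analysis requires a BLAST process that classifies the reads."""
    return any(kind == "BlastProcess" and "UseToClassify" in opts
               for kind, opts in blocks)
```

Applied to the Kraken2 + CARD example, `check_classifiers` would report no problems and `walkout_enabled` would return `False`.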