Config file format

Each time you run a MARTi analysis on a sequencing run, you need to specify a config file which provides the details of the analysis to be performed.

This config file is generated by the MARTi launcher front-end (Desktop) or GUI (cluster/HPC).

The following tables describe the parameters in the file. Keywords in bold are mandatory; all others are optional.

Sample and global settings

| Keyword | Example | Meaning |
| --- | --- | --- |
| `SampleName` | `BAMBI_1D_18042017` | Sample name. |
| `RawDataDir` | `/path/to/dir` | Run directory: the directory containing the `fastq_pass`, `fastq_fail` etc. directories. If Guppy was run separately, the directory containing the `fastq` directory. |
| `SampleDir` | `/path/to/dir` | Directory to use for MARTi analysis files (created if it doesn't exist). |
| `ProcessBarcodes` | `01,02,03` | For barcoded samples, the comma-separated list of barcodes to process. |
| `BarcodeId<n>` | `BarcodeSampleId1` | Sample ID to use for barcode *n*. |
| `Scheduler` | `local` | Job scheduler to use. Currently only `local` works; support for specifying `slurm` to run jobs via SLURM is planned. |
| `LocalSchedulerMaxJobs` | `4` | Maximum number of concurrent jobs the local scheduler can run. |
| `InactivityTimeout` | `10` | Time in seconds to wait for new reads to appear before giving up. After this timeout, all remaining analysis is completed and analysis stops. Default 10. |
| `StopProcessingAfter` | `50000` | Stop analysis after this number of reads. Default: no limit. |
| `TaxonomyDir` | `/path/to/dir` | Location of the NCBI taxonomy files (the directory containing `nodes.dmp` and `names.dmp`). |
| `AccessionMap` | `/path/to/file` | Accession map for mapping accession IDs to taxa, generated from the NCBI accession2taxid data by a separate tool. Not required for normal MARTi operation. |
| `ConvertFastQ` | n/a | Deprecated. |
| `ReadsPerBlast` | `4000` | BLAST chunk size: reads are batched into bundles of this size before BLASTing. |
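The following minimal Python sketch shows how `ReadsPerBlast` and `StopProcessingAfter` interact. It cuts a stream of reads into BLAST chunks. It is an illustration, not MARTi's code, and `blast_batches` is a hypothetical name. It assumes that a final partial chunk is still BLASTed, which fits the statement that all remaining analysis is completed after `InactivityTimeout`.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Optional


def blast_batches(read_ids: Iterable[str],
                  reads_per_blast: int = 4000,
                  stop_processing_after: Optional[int] = None) -> Iterator[List[str]]:
    """Yield lists of at most reads_per_blast reads (hypothetical helper)."""
    if reads_per_blast < 1:
        raise ValueError("reads_per_blast must be at least 1")
    reads: Iterator[str] = iter(read_ids)
    if stop_processing_after is not None:
        # StopProcessingAfter: ignore everything after the first N reads.
        reads = islice(reads, stop_processing_after)
    while True:
        # ReadsPerBlast: take the next chunk; the last one may be smaller.
        batch = list(islice(reads, reads_per_blast))
        if not batch:
            return
        yield batch


# Ten reads, chunk size 4, limit 9 -> chunks of 4, 4 and 1 reads.
chunks = list(blast_batches([f"read{i}" for i in range(10)],
                            reads_per_blast=4, stop_processing_after=9))
print([len(c) for c in chunks])  # [4, 4, 1]
```

Larger chunks mean fewer BLAST invocations but a longer wait before the first results for each chunk appear.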

Pre-filtering settings

| Keyword | Example | Meaning |
| --- | --- | --- |
| `ReadFilterMinQ` | `9` | Minimum mean quality value. Reads with a mean Q below this are not processed. Default 0 (all reads). |
| `ReadFilterMinLength` | `500` | Minimum read length. Reads shorter than this are not processed. Default 0 (all reads). |
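Here is a minimal Python sketch of the two pre-filters. It assumes each read arrives as a FASTQ sequence string and a Phred+33 quality string, and the function names are hypothetical. The source does not say how MARTi averages quality. This sketch takes the arithmetic mean of the per-base Phred scores. Some nanopore tools instead average the per-base error probabilities, which gives a mean Q lower than or equal to this one for the same read.

```python
def mean_phred(quality: str, offset: int = 33) -> float:
    """Arithmetic mean of the Phred scores in a FASTQ quality string (Phred+33 assumed)."""
    if not quality:
        return 0.0
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def passes_prefilter(sequence: str, quality: str,
                     min_q: float = 0, min_length: int = 0) -> bool:
    """Hypothetical check mirroring ReadFilterMinQ and ReadFilterMinLength."""
    if len(sequence) < min_length:
        return False  # shorter than ReadFilterMinLength
    return mean_phred(quality) >= min_q  # mean Q below ReadFilterMinQ fails


# With the example settings (ReadFilterMinQ 9, ReadFilterMinLength 500):
read_seq = "A" * 600
read_qual = "+" * 600  # '+' encodes Phred 10 in Phred+33
print(passes_prefilter(read_seq, read_qual, min_q=9, min_length=500))  # True
```

With both defaults at 0, every read passes, matching the "Default 0 (all reads)" behaviour in the table.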

LCA classification settings

These lowest common ancestor (LCA) settings apply to BLAST results (see below).

| Keyword | Example | Meaning |
| --- | --- | --- |
| `LCAMaxHits` | `20` | Maximum number of BLAST hits to consider in LCA assignment. Default 20. |
| `LCAScorePercent` | `90` | Only consider hits within this percentage of the top hit for a given read. Default 90. |
| `LCAMinIdentity` | `70` | Only consider hits with at least this percent identity. Default 0. |
| `LCAMinQueryCoverage` | `70` | Only consider hits with at least this percent coverage of the query. Default 0. |
| `LCAMinCombinedScore` | `120` | Only consider hits where percent identity plus percent query coverage is greater than this value. Default 0. |
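The following Python sketch shows one plausible reading of how these settings combine for a single read:

1. Keep at most `LCAMaxHits` hits.
2. Drop hits that fall outside the score window or below the identity and coverage thresholds.
3. Take the lowest common ancestor of the surviving hits' lineages.

It assumes that "within this percentage of the top hit" means a bitscore of at least `LCAScorePercent`% of the best bitscore. It also assumes each hit already carries a root-to-leaf lineage from the NCBI taxonomy. `Hit`, `filter_hits` and `classify` are hypothetical names, and MARTi's actual implementation may differ.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Hit:
    bitscore: float
    identity: float        # percent identity (0-100)
    query_coverage: float  # percent of the read covered by the alignment (0-100)
    lineage: Sequence[str]  # root-to-leaf taxa for the subject sequence


def filter_hits(hits: Sequence[Hit], max_hits: int = 20, score_percent: float = 90,
                min_identity: float = 0, min_query_coverage: float = 0,
                min_combined_score: float = 0) -> List[Hit]:
    # LCAMaxHits: only the best max_hits hits by bitscore are considered.
    ranked = sorted(hits, key=lambda h: h.bitscore, reverse=True)[:max_hits]
    if not ranked:
        return []
    # LCAScorePercent (assumed meaning): score >= score_percent % of the top score.
    threshold = ranked[0].bitscore * score_percent / 100
    return [h for h in ranked
            if h.bitscore >= threshold
            and h.identity >= min_identity                           # LCAMinIdentity
            and h.query_coverage >= min_query_coverage               # LCAMinQueryCoverage
            and h.identity + h.query_coverage > min_combined_score]  # LCAMinCombinedScore


def lowest_common_ancestor(lineages: Sequence[Sequence[str]]) -> Optional[str]:
    """Deepest taxon shared by every lineage (longest common prefix)."""
    common: Optional[str] = None
    for taxa_at_depth in zip(*lineages):
        if any(t != taxa_at_depth[0] for t in taxa_at_depth):
            break
        common = taxa_at_depth[0]
    return common


def classify(hits: Sequence[Hit], **settings) -> Optional[str]:
    kept = filter_hits(hits, **settings)
    return lowest_common_ancestor([h.lineage for h in kept]) if kept else None


# Simplified, illustrative lineages (not full NCBI lineages).
hits = [
    Hit(200, 99, 95, ["root", "Bacteria", "Pseudomonadota", "Escherichia coli"]),
    Hit(190, 98, 90, ["root", "Bacteria", "Pseudomonadota", "Salmonella enterica"]),
    Hit(100, 80, 50, ["root", "Bacteria", "Bacillota"]),
]
print(classify(hits))                    # Pseudomonadota (third hit is below 90% of 200)
print(classify(hits, score_percent=40))  # Bacteria
```

Loosening `LCAScorePercent` lets weaker hits into the LCA step, which pushes the assignment towards higher, less specific taxa.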

BLAST processes

You can run multiple BLAST processes. Each process definition begins with the keyword BlastProcess, followed by that process's settings.

| Keyword | Example | Meaning |
| --- | --- | --- |
| `BlastProcess` | n/a | Marks the start of a BLAST process definition. |
| `Name` | `nt` | Name of the process. |
| `Program` | `megablast` | BLAST algorithm to use: `megablast` or `blastn`. |
| `Database` | `/path/to/db` | Path to the BLAST database. |
| `UseToClassify` | n/a | Use this process's BLAST results for classification (can only be set for one BLAST process). |
| `TaxaFilter` | `/path/to/file.txt` | Taxa filter file to use with BLAST (e.g. to filter to bacteria/viruses). |
| `MaxE` | `0.001` | Maximum E-value for BLAST. |
| `MaxTargetSeqs` | `25` | Maximum number of target sequences for BLAST. |
| `RunMeganEvery` | n/a | Deprecated. |
| `BlastThreads` | `4` | Number of threads to use when running BLAST. |
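As an illustration of what the per-process settings control, this Python sketch maps a `BlastProcess` block onto BLAST+ `blastn` arguments. Both `megablast` and `blastn` are accepted values of the standard `blastn -task` option.

The mapping itself is an assumption. The source does not describe MARTi's actual command line, including how `TaxaFilter` is applied and which output format MARTi requests. For that reason, `TaxaFilter` and `UseToClassify` are left out, and the tabular `-outfmt 6` is chosen only for illustration. `blast_command` and the chunk file names are hypothetical.

```python
import shlex
from typing import Dict, List


def blast_command(process: Dict[str, str], query_fasta: str, out_file: str) -> List[str]:
    """Hypothetical mapping from a BlastProcess block to a BLAST+ blastn argument list."""
    cmd = [
        "blastn",
        "-task", process["Program"],  # megablast or blastn
        "-db", process["Database"],
        "-query", query_fasta,
        "-out", out_file,
        "-outfmt", "6",  # tabular output, chosen only for illustration
    ]
    # Optional settings map onto standard blastn options when present.
    optional_flags = {
        "MaxE": "-evalue",
        "MaxTargetSeqs": "-max_target_seqs",
        "BlastThreads": "-num_threads",
    }
    for keyword, flag in optional_flags.items():
        if keyword in process:
            cmd += [flag, process[keyword]]
    return cmd


nt = {"Name": "nt", "Program": "megablast", "Database": "/path/to/db",
      "MaxE": "0.001", "MaxTargetSeqs": "25", "BlastThreads": "4"}
print(shlex.join(blast_command(nt, "chunk_0001.fasta", "chunk_0001.tsv")))
```

The function returns an argument list rather than a shell string, so it could be passed directly to `subprocess.run` without quoting issues.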

Example

Example file:

```
SampleName:BAMBI_1D_19092017_MARTi
RawDataDir:/Users/leggettr/Documents/Datasets/BAMBI_1D_19092017_MARTi
SampleDir:/Users/leggettr/Documents/Projects/MARTiTest/BAMBI_1D_19092017_MARTi
ProcessBarcodes:
BarcodeId1:SampleNameHere

Scheduler:local
LocalSchedulerMaxJobs:4

InactivityTimeout:10
StopProcessingAfter:50000000

TaxonomyDir:/Users/leggettr/Documents/Databases/taxonomy_6Jul20
LCAMaxHits:20
LCAScorePercent:90
LCAMinIdentity:60
LCAMinQueryCoverage:0
LCAMinCombinedScore:0
LCAMinLength:50

ConvertFastQ

ReadsPerBlast:8000

ReadFilterMinQ:9
ReadFilterMinLength:500

BlastProcess
    Name:nt
    Program:megablast
    Database:/Users/leggettr/Documents/Databases/nt_30Jan2020_v5/nt
    TaxaFilter:/Users/leggettr/Documents/Datasets/bacteria_viruses.txt
    MaxE:0.001
    MaxTargetSeqs:25
    RunMeganEvery:0
    BlastThreads:4

BlastProcess
    Name:card
    Program:blastn
    Database:/Users/leggettr/Documents/Databases/card/nucleotide_fasta_protein_homolog_model.fasta
    MaxE:0.001
    MaxTargetSeqs:100
    BlastThreads:1
```
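Here is a minimal Python sketch of reading a file in this format. It rests on three assumptions:

- Each line is `Keyword:value`, split at the first colon so that values such as Windows paths survive.
- Keywords without a value, such as `ConvertFastQ`, act as flags.
- The indented lines after a `BlastProcess` line belong to that process.

MARTi's own parser may treat these cases differently, and `parse_marti_config` is a hypothetical name.

```python
from typing import Dict, List, Optional, Tuple


def parse_marti_config(text: str) -> Tuple[Dict[str, str], List[Dict[str, str]]]:
    """Split config text into global settings and a list of BlastProcess blocks."""
    settings: Dict[str, str] = {}
    processes: List[Dict[str, str]] = []
    current: Optional[Dict[str, str]] = None  # BlastProcess block being filled
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # blank lines carry no meaning
        # Split at the first colon only, so values like "C:/data" stay intact.
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "BlastProcess":
            current = {}
            processes.append(current)
        elif current is not None and raw_line[:1].isspace():
            current[key] = value  # indented line: belongs to the open process
        else:
            current = None
            settings[key] = value  # flags such as ConvertFastQ get ""
    return settings, processes


example = """SampleName:BAMBI_1D_19092017_MARTi
ConvertFastQ
ReadsPerBlast:8000

BlastProcess
    Name:nt
    Program:megablast
    MaxE:0.001

BlastProcess
    Name:card
    Program:blastn
"""
settings, processes = parse_marti_config(example)
print(settings["ReadsPerBlast"], [p["Name"] for p in processes])  # 8000 ['nt', 'card']
```

Applied to the full example file, this would return the global settings plus two processes named `nt` and `card`. All values are kept as strings, so numeric settings such as `ReadsPerBlast` still need converting before use.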