What is the format of the input files? FASTA. Files must be in FASTA format, with one file per sample. Sequences may use IUPAC codes.
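As a rough pre-upload check, the format requirement above can be sketched in Python. This is only an illustration, not the validation the server performs: the function name `check_fasta` is hypothetical, and the accepted alphabet is assumed to be the standard IUPAC nucleotide codes without gap characters.

```python
import io

# Standard IUPAC nucleotide codes. Rejecting gap characters ('-', '.')
# is an assumption made for this sketch, not a documented server rule.
IUPAC_NUCLEOTIDES = set("ACGTURYSWKMBDHVN")


def check_fasta(handle):
    """Return a list of problems found in a FASTA text stream.

    An empty list means the stream looks like nucleotide FASTA.
    """
    problems = []
    n_headers = 0
    warned_orphan = False
    for line_no, raw in enumerate(handle, start=1):
        line = raw.strip()
        if not line:
            continue  # blank lines are ignored
        if line.startswith(">"):
            n_headers += 1
            continue
        if n_headers == 0 and not warned_orphan:
            problems.append(f"line {line_no}: sequence data before any '>' header")
            warned_orphan = True
        bad = set(line.upper()) - IUPAC_NUCLEOTIDES
        if bad:
            problems.append(f"line {line_no}: non-IUPAC characters {sorted(bad)}")
    if n_headers == 0:
        problems.append("no '>' header found")
    return problems


# Small in-memory examples; a real file can be passed as open(path).
good = io.StringIO(">contig_1\nACGTNNRY\nKMSW\n")
bad = io.StringIO(">contig_1\nACGTXX\n")
good_report = check_fasta(good)
bad_report = check_fasta(bad)
```

Here `good_report` is empty, while `bad_report` contains one entry pointing at the line with the unexpected `X` characters.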
Is it possible to analyze multiple files at once? Yes. Up to 10 files can be submitted at the same time. Each FASTA file should contain only one genome.
Can I upload raw reads from a sequencer (FASTQ file)? No. The ClermonTyping method only works on pre-assembled contigs. For assembly, you can use other programs such as SPAdes or ngopt for Illumina data.
Is there an upload limit? Yes. The maximum file size is currently 10 MB. An E. coli assembly should be under 5 MB. If your file is larger than 10 MB, it is most likely a poor-quality assembly. In that case, try manually removing small contigs to reduce the file size.
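Removing small contigs can also be scripted. The sketch below is one possible way to do it in Python, not a tool provided by the service: the function names and the default `min_length=500` threshold are assumptions chosen for illustration, so pick a threshold that suits your assembly.

```python
def parse_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def drop_small_contigs(lines, min_length=500):
    """Keep only contigs of at least min_length bases (hypothetical default)."""
    return [(h, s) for h, s in parse_fasta(lines) if len(s) >= min_length]


def format_fasta(records, width=60):
    """Render (header, sequence) pairs as FASTA text wrapped at `width` columns."""
    out = []
    for header, seq in records:
        out.append(f">{header}")
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "".join(line + "\n" for line in out)


# Tiny in-memory example with an artificially low threshold.
example = [">big", "ACGTACGTACGT", "ACGT", ">small", "AC"]
kept = drop_small_contigs(example, min_length=10)
filtered_text = format_fasta(kept, width=8)
```

To apply this to a real assembly, pass an open file as `lines` and write the string returned by `format_fasta` to a new file, then check that the new file is below the upload limit.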
The quadruplex column. The quadruplex column shows the presence (+) or absence (-) of the four markers described in Clermont O, Christenson JK, Denamur E, Gordon DM. The Clermont Escherichia coli phylo-typing method revisited: improvement of specificity and detection of new phylo-groups. The markers are listed in this order: arpA, chuA, yjaA and TspE4.C2.
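To make the marker order concrete, here is a minimal Python sketch that maps a presence/absence pattern onto the four marker names. The exact text layout of the quadruplex column is not specified here, so the function (a hypothetical helper, `decode_quadruplex`) simply reads the `+` and `-` characters in order and ignores everything else.

```python
# Marker order as given in the text above.
MARKERS = ("arpA", "chuA", "yjaA", "TspE4.C2")


def decode_quadruplex(value):
    """Map a +/- pattern to {marker: present?}.

    Assumes the value contains exactly four '+'/'-' signs in marker order;
    any other characters (spaces, brackets, commas) are ignored.
    """
    signs = [c for c in value if c in "+-"]
    if len(signs) != len(MARKERS):
        raise ValueError(f"expected {len(MARKERS)} signs, found {len(signs)}")
    return {marker: sign == "+" for marker, sign in zip(MARKERS, signs)}


profile = decode_quadruplex("+ - + -")
```

With this input, `profile` marks arpA and yjaA as present and chuA and TspE4.C2 as absent.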
What should I do if the phylogroup and mash columns differ? A difference may indicate that your sample is contaminated by another one (in this example, IAI24 is suspect because the phylogroup column gives E but the mash column gives D). You can increase the contig size cutoff in the advanced parameters. Further analyses, such as running a recombination event detection pipeline, can also be done.
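When many samples are analyzed, the comparison between the two columns can be automated. The Python sketch below assumes the results have already been loaded as a list of dictionaries. The `sample` key, the helper name `flag_discordant` and the second row (`sample_2`) are hypothetical, while the IAI24 values come from the example above.

```python
def flag_discordant(rows):
    """Return the samples whose phylogroup and mash assignments differ."""
    flagged = []
    for row in rows:
        phylogroup = row["phylogroup"].strip().upper()
        mash = row["mash"].strip().upper()
        if phylogroup != mash:
            flagged.append(row["sample"])
    return flagged


results = [
    {"sample": "IAI24", "phylogroup": "E", "mash": "D"},     # from the example above
    {"sample": "sample_2", "phylogroup": "A", "mash": "A"},  # made-up concordant row
]
suspects = flag_discordant(results)
```

Here `suspects` contains only IAI24, which is the sample to re-examine, for instance with a higher contig size cutoff.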