The Dengue Virus Automated Typing Tool: Introduction.
The definition of a dengue genotype is based on older studies that relied on partial sequencing and on the sequenced strains then available in the public domain. Since then, the virus has diverged into more complex lineages. We expect that the virus will continue to evolve and that new lineages might turn into new serotypes and genotypes in time. We also consider a wide range of variables when defining a genotype, e.g. monophyly, pairwise distance within and between groups, net genetic diversity within groups, etc.
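Two of these criteria, the mean pairwise distance within a group and between groups, can be illustrated with a short example. This is only a minimal sketch and not the tool's implementation: it assumes the sequences are already aligned to the same length, it uses the uncorrected p-distance (proportion of differing sites) instead of a model-based distance, and the function names and toy sequences are hypothetical.

```python
from itertools import combinations, product


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap ('-') or an 'N' are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "-N" or y in "-N":
            continue
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def mean_within(group):
    """Mean p-distance over all pairs of sequences inside one group."""
    pairs = list(combinations(group, 2))
    if not pairs:
        raise ValueError("need at least two sequences in the group")
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


def mean_between(group1, group2):
    """Mean p-distance over all pairs with one sequence from each group."""
    pairs = list(product(group1, group2))
    if not pairs:
        raise ValueError("both groups must be non-empty")
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


# Toy aligned sequences (hypothetical, for illustration only).
group_a = ["ACGTACGTAC", "ACGTACGTAT", "ACGTACGTAC"]
group_b = ["ACGAACGTTC", "ACGAACGTTT"]

within_a = mean_within(group_a)
between_ab = mean_between(group_a, group_b)
print(f"within A: {within_a:.3f}, between A and B: {between_ab:.3f}")
```

In this toy case the between-group distance is larger than the within-group distance, which is the pattern expected when two groups are candidates for separate genotypes. Real analyses would normally use a substitution model and also require each group to be monophyletic in the tree.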
We understand that there is no clear consensus in the dengue community on how to name genotypes. The global spread of the virus has made the previous practice of naming genotypes by geographic association irrelevant. DENV-1, -3, and -4 have genotypes named by Roman numerals (e.g. I, II, III, etc.), but there are still publications with conflicting classifications.
DENV-2 genotypes, by contrast, are named by geographic association. Because associating certain genotypes with a geographic region has become difficult, we find that Roman numerals (e.g. I, II, III, etc.) could be more appropriate. A numeral system might also make it easier to fit newly diverging lineages and genotypes into a consistent structure. However, our tool still maintains the geographic names for DENV-2 (e.g. the SE Asia and American genotypes of DENV-2), as researchers who deal with this public health issue are accustomed to these designations.
We have carefully selected reference sequences that represent the diversity of each genotype. In addition, we performed extensive testing to be sure that our reference strains accurately classify other sequences.
- We include serotypes 1, 2, 3 and 4.
- Serotype 1 includes 1I, 1II, 1III, 1IV and 1V genotypes.
- Serotype 2 includes 2I (American), 2II (Cosmopolitan), 2III (Southern Asian-American), 2IV (Asian II), 2V (Asian I) and 2VI (Sylvatic) genotypes.
- Serotype 3 includes 3I, 3II, 3III and 3V genotypes.
- Serotype 4 includes 4I, 4II, 4III and 4IV genotypes.
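The designations listed above can be captured in a small lookup table. The sketch below is only an illustration of the naming scheme, not the tool's internal data structure; the dictionary names and the `genotype_label` helper are hypothetical.

```python
# Genotype numerals per serotype, as listed above.
GENOTYPES = {
    1: ["I", "II", "III", "IV", "V"],
    2: ["I", "II", "III", "IV", "V", "VI"],
    3: ["I", "II", "III", "V"],
    4: ["I", "II", "III", "IV"],
}

# Geographic names retained for the DENV-2 genotypes.
DENV2_GEOGRAPHIC_NAMES = {
    "I": "American",
    "II": "Cosmopolitan",
    "III": "Southern Asian-American",
    "IV": "Asian II",
    "V": "Asian I",
    "VI": "Sylvatic",
}


def genotype_label(serotype, numeral):
    """Build a label such as '1III' or '2II (Cosmopolitan)'."""
    if serotype not in GENOTYPES:
        raise ValueError(f"unknown serotype: {serotype}")
    if numeral not in GENOTYPES[serotype]:
        raise ValueError(f"serotype {serotype} has no genotype {numeral}")
    label = f"{serotype}{numeral}"
    if serotype == 2:
        # DENV-2 keeps its geographic name alongside the numeral.
        label += f" ({DENV2_GEOGRAPHIC_NAMES[numeral]})"
    return label


print(genotype_label(1, "III"))
print(genotype_label(2, "II"))
```

Keeping the numeral as the key and the geographic name as an annotation means the DENV-2 labels stay recognizable while all four serotypes share one scheme.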
Partial vs. whole genome. Whole genome sequences (WGS) are not commonly produced for epidemiological purposes, and few are available in the public domain. Most of the available WGS came from the GRID consortium, sponsored by the Broad Institute of Harvard and MIT, a few years ago. As a result of this project, collaborating sites deposited hundreds of WGS in GenBank. However, redundancy still exists in these datasets, e.g. many samples from one region and the same year. There is consensus that the envelope glycoprotein gene (E) is well suited to phylogenetic classification. This gene has been very effective because it contains sufficient phylogenetic signal (1,485 bp) to distinguish dengue from other viruses and to differentiate between serotypes and genotypes. Phylogenetic trees inferred from E gene sequence data have topologies very similar to whole genome trees. Therefore, investigators working with dengue sequences can expect similar classification accuracy for WGS and the E gene.
Fig. 1. Dengue virus phylogenetic tree for WGS and E gene