  • Original Article
  • Open access

Insights on genomic diversity of Vibrio spp. through Pan-genome analysis



The aquaculture sector is a major contributor to the economic and nutritional security of many countries. India’s seafood exports in 2017–2018 were worth US$7082 million. One of the major setbacks in this sector is frequent disease outbreaks, often caused by bacterial pathogens. Vibriosis, caused by bacteria of the genus Vibrio, is one of the major diseases and causes significant economic loss to the aquaculture sector. The objective of this study was to understand the genetic composition of Vibrio spp.


Thirty-five complete genomes representing seven Vibrio species, namely, Vibrio alginolyticus, V. anguillarum, V. campbellii, V. harveyi, V. furnissii, V. parahaemolyticus, and V. vulnificus, were downloaded from GenBank. Pan-genome analysis was carried out on the coding sequences (CDS) predicted in all the Vibrio genomes. In addition, the genomes were mined for toxin-antitoxin systems, antibiotic resistance genes, genomic islands, and virulence factors.


Results revealed an open pan-genome comprising 2004 core, 8249 accessory, and 6780 unique genes. Downstream analysis of the genomes and of the unique genes identified 312 antibiotic resistance genes and 430 genes coding for toxin-antitoxin systems, along with 4802 and 4825 putative virulence genes in the genomic island regions and the unique gene sets, respectively.


Pan-genome and other downstream analytical procedures followed in this study have the potential to predict strain-specific genes and their association with habitat and pathogenicity.


Aquaculture, the farming of aquatic organisms such as finfish, crustaceans, mollusks, and seaweed, has the potential to contribute significantly to society and the economy. In 2016, production reached 80 million tons worth USD 231.6 billion, with 19.3 million people employed worldwide, which signifies the production potential and importance of the aquaculture sector (FAO 2018). One of the major impediments to the sector is disease, especially disease caused by infectious agents such as bacteria and viruses. Vibriosis, caused by Vibrio species, is one such disease and significantly affects aquaculture productivity. Vibrio spp. also cause a significant number of human infections (Kim and Lee 2017; Ina-Salwany et al. 2019).

Vibrios are naturally occurring Gram-negative, facultatively anaerobic bacteria found in brackish and marine water. They are opportunistic pathogens that cause disease in finfish and shellfish when the host is under stress (Ina-Salwany et al. 2019). They also cause human illnesses such as septicemia, gastroenteritis, and necrotizing wound infections (Ina-Salwany et al. 2019). More than 100 Vibrio species have been identified, comprising both virulent and non-virulent strains. Non-virulent strains have at times also been found to cause chronic infections after acquiring virulence factors (Le Roux et al. 2015). Species known to cause vibriosis in finfish and shellfish include V. parahaemolyticus, V. vulnificus, V. campbellii, and V. alginolyticus. Recent studies have reported virulence genes, viz., chiA, vhpA, luxR, flaC, tdh, and tlh (Mohamad et al. 2019; Mok et al. 2019; Soto-Rodriguez et al. 2019); resistance to beta-lactams, tetracycline, sulfamethizole, etc., together with the associated genes, viz., aadA16, blaTEM1B, ARR3, sul1, tet(B), tet(E), tet(R), dfrA27, Zeo, qacEA, and an RND efflux pump (Silvester et al. 2015; Mohamad et al. 2019; Mok et al. 2019; Soto-Rodriguez et al. 2019); and toxin-antitoxin systems, especially Phd/doc and ccdvfi (Guérout et al. 2013; Lloyd et al. 2019), HigBA, HipBA, ParDE, etc. (Szekeres et al. 2007; Guérout et al. 2013), located both within and outside the superintegrons of Vibrio species.

Recent advances in sequencing technologies and public repositories have given researchers access to large volumes of genomic data, including whole-genome sequences. Bioinformatic analysis can reveal the gene composition of these genomes and the functions of their genes, and this structural and functional information paves the way for inter- and intra-species comparison of large numbers of genomes. As of 21 February 2019, 181 complete genome sequences of Vibrio spp. were available in GenBank. Vibrios have two chromosomes and considerable genetic diversity shaped by recombination and horizontal gene transfer (Baker-Austin et al. 2018). Comparative analysis of different species from the same genus helps in understanding their evolution, gene repertoire, and minimal genome size, which is where the concept of the pan-genome comes in. In this study, pan-genome analysis was performed to understand the gene profiles and evolutionary patterns of Vibrio spp. genomes. Further, information was gathered on the toxin-antitoxin (TA) systems, virulence, and antibiotic resistance of Vibrio spp.

Materials and methods

Genome retrieval

Thirty-five genome sequences of seven Vibrio species were downloaded from GenBank: three strains of V. alginolyticus (ATCC 17749, ATCC 33787, ZJ-T), four strains of V. anguillarum (90-11-286, 775, M3, NB10), three strains of V. campbellii (ATCC-BB120, LB102, LMB29), one strain of V. furnissii (NCTC 11218), three strains of V. harveyi (ATCC 33843, ATCC 43516, QT520), 13 strains of V. parahaemolyticus (ATCC 17802, BB22OP, CDC K4557, CHN25, FDA R31, FORC_004, FORC_006, FORC_008, FORC_014, FORC_018, FORC_023, RIMD 2210633, UCM-V493), and eight strains of V. vulnificus (93U204, ATL 6-1306, CMCP6, FORC_009, FORC_016, FORC_017, MO6-24/O, YJ016). Among these, V. campbellii LB102, sequenced in-house, is the only scaffold-level assembly; all the others are closed/complete genomes. V. cholerae N16961 was included as an outgroup for phylogenetic comparisons. Table 1 summarizes the genomes included in this study. Genome similarity was visualized as a BLAST atlas generated with the BLAST Ring Image Generator (BRIG) (Alikhan et al. 2011).
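As a quick consistency check, the strain list above can be transcribed into a small data structure and tallied to confirm that the per-species counts add up to the 35 ingroup genomes. This is a minimal sketch; the `STRAINS` dictionary and `strain_counts` helper are illustrative names, not part of the original workflow.

```python
# Strains per species, transcribed from the genome retrieval list.
# The dictionary name and layout are illustrative only.
STRAINS = {
    "V. alginolyticus": ["ATCC 17749", "ATCC 33787", "ZJ-T"],
    "V. anguillarum": ["90-11-286", "775", "M3", "NB10"],
    "V. campbellii": ["ATCC-BB120", "LB102", "LMB29"],
    "V. furnissii": ["NCTC 11218"],
    "V. harveyi": ["ATCC 33843", "ATCC 43516", "QT520"],
    "V. parahaemolyticus": [
        "ATCC 17802", "BB22OP", "CDC K4557", "CHN25", "FDA R31",
        "FORC_004", "FORC_006", "FORC_008", "FORC_014", "FORC_018",
        "FORC_023", "RIMD 2210633", "UCM-V493",
    ],
    "V. vulnificus": [
        "93U204", "ATL 6-1306", "CMCP6", "FORC_009", "FORC_016",
        "FORC_017", "MO6-24/O", "YJ016",
    ],
}


def strain_counts(strains):
    """Return the number of strains per species and the overall total."""
    per_species = {species: len(names) for species, names in strains.items()}
    return per_species, sum(per_species.values())


if __name__ == "__main__":
    per_species, total = strain_counts(STRAINS)
    for species, n in per_species.items():
        print(f"{species}: {n}")
    # The outgroup V. cholerae N16961 is not part of this tally.
    print(f"Total: {total}")
```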

Table 1 Table showing general statistics of 35 genomes retrieved from GenBank

Genome annotation

The sequences were annotated with Prokka, a rapid command-line annotation tool written in Perl (Seemann 2014). In addition to predicting coding sequences (CDS), Prokka identifies rRNA genes using RNAmmer (Lagesen et al. 2007) and clustered regularly interspaced short palindromic repeats (CRISPRs) using MinCED (Bland et al. 2007). The coding sequences generated by Prokka were used in all further analyses.
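The per-genome Prokka runs could be scripted along the following lines. This sketch only builds the argument lists and does not execute them. The flags used (`--outdir`, `--prefix`, `--kingdom`, `--genus`, `--species`, `--strain`, `--rnammer`, `--cpus`) are standard Prokka options; `--rnammer` asks Prokka to prefer RNAmmer over its default Barrnap for rRNA prediction, matching the tools named above. The file names, output layout, CPU count, and the `prokka_command` helper are assumptions, not settings reported in the study.

```python
import shlex
from pathlib import Path


def prokka_command(fasta, genus, species, strain, outroot="prokka_out", cpus=4):
    """Build (but do not run) a Prokka command for one genome assembly.

    File names, output layout and CPU count are illustrative assumptions.
    """
    # Filesystem-safe prefix, e.g. "RIMD 2210633" -> "RIMD_2210633"
    prefix = "".join(c if c.isalnum() else "_" for c in strain)
    return [
        "prokka",
        "--outdir", str(Path(outroot) / prefix),
        "--prefix", prefix,
        "--kingdom", "Bacteria",
        "--genus", genus,
        "--species", species,
        "--strain", strain,
        "--rnammer",  # prefer RNAmmer over the default Barrnap for rRNAs
        "--cpus", str(cpus),
        str(fasta),
    ]


if __name__ == "__main__":
    cmd = prokka_command("RIMD_2210633.fna", "Vibrio", "parahaemolyticus", "RIMD 2210633")
    print(shlex.join(cmd))
    # To execute: subprocess.run(cmd, check=True)
```

Building the argument list explicitly, rather than a shell string, avoids quoting problems with strain names that contain spaces or slashes, such as `MO6-24/O`.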

Pan-genome analysis

The coding sequences of all the genomes were concatenated into a single FASTA file. Orthologous genes were clustered using USEARCH (v10.0.240_win32) (Edgar 2010), and the resulting orthologous clusters were used to generate a binary presence/absence matrix. Pan-genome analysis was carried out with the Bacterial Pan Genome Analysis pipeline (BPGA) (Chaudhari et al. 2016). Shared genes were calculated with the stepwise addition of each genome, and the resulting trend was plotted as core and pan profile curves (Fig. 2). Along with the pan-genome profiles, BPGA reports genome-wise counts of exclusively absent genes based on the orthologous clusters, and it then generates the pan and core phylogenies. The pan phylogeny was built from the pan-matrix data and the core phylogeny from the concatenated core genes; both trees were constructed by the neighbor-joining method with the default combination value of 20 iterations, using V. cholerae as the outgroup. Functional and pathway analysis was carried out by searching the Clusters of Orthologous Groups (COG) and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases and assigning COG and KEGG categories to the sequences from the best hits obtained.
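The split into core, accessory, and unique genes follows directly from the binary presence/absence matrix: a cluster present in every genome is core, one present in exactly one genome is unique, and anything in between is accessory. A cluster present in all genomes but one is "exclusively absent" from that genome. The sketch below illustrates these definitions, not BPGA's own code; it assumes clusters are given as a mapping from a cluster ID to the set of genomes contributing a member, and the toy clusters and genome names are hypothetical.

```python
def presence_absence(clusters, genomes):
    """Binary matrix: one row per cluster, one column per genome (1 = present)."""
    return {cid: [1 if g in members else 0 for g in genomes]
            for cid, members in clusters.items()}


def classify(matrix, genomes):
    """Label each cluster core/accessory/unique and count exclusively absent genes.

    Assumes at least three genomes, so "unique" and "exclusively absent"
    cannot describe the same cluster.
    """
    n = len(genomes)
    labels = {}
    exclusively_absent = {}
    for cid, row in matrix.items():
        present = sum(row)
        if present == n:
            labels[cid] = "core"
        elif present == 1:
            labels[cid] = "unique"
        else:
            labels[cid] = "accessory"
            if present == n - 1:
                # Present everywhere except one genome
                missing = genomes[row.index(0)]
                exclusively_absent[missing] = exclusively_absent.get(missing, 0) + 1
    return labels, exclusively_absent


if __name__ == "__main__":
    genomes = ["A", "B", "C"]  # hypothetical genomes
    clusters = {"c1": {"A", "B", "C"}, "c2": {"A", "B"}, "c3": {"C"}, "c4": {"B", "C"}}
    labels, absent = classify(presence_absence(clusters, genomes), genomes)
    print(labels)
    print(absent)
```

Percentages such as those reported for the pan-genome are then the count of each label divided by the total number of clusters.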

Genomic islands

The genomes were submitted to IslandViewer 4 (Bertelli et al. 2017) for the prediction of genomic islands. IslandViewer combines multiple prediction methods, namely, IslandPath-DIMOB, SIGI-HMM, IslandPick, and Islander. It also flags antibiotic resistance and virulence genes located in the predicted islands, based on the Comprehensive Antibiotic Resistance Database (CARD) (Jia et al. 2017), the Virulence Factor Database (VFDB) (Chen et al. 2004), and the Pathosystems Resource Integration Center (PATRIC) (Wattam et al. 2017).
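Passing the CDS located in genomic islands on to virulence prediction requires intersecting each predicted island (a start and end coordinate on a contig) with the CDS coordinates from the annotation. The sketch below does this with sorted intervals and a binary search. The tuple layouts are assumptions, not the IslandViewer export format, and a CDS is counted if it overlaps an island by at least one base; a stricter rule such as full containment would be equally defensible.

```python
from bisect import bisect_right


def cds_in_islands(cds, islands):
    """Return IDs of CDS that overlap any genomic island.

    cds:     list of (cds_id, contig, start, end), 1-based inclusive
    islands: list of (contig, start, end), 1-based inclusive
    Both layouts are illustrative assumptions.
    """
    # Group island intervals by contig and sort them by start coordinate
    by_contig = {}
    for contig, start, end in islands:
        by_contig.setdefault(contig, []).append((start, end))
    for intervals in by_contig.values():
        intervals.sort()

    hits = []
    for cds_id, contig, start, end in cds:
        intervals = by_contig.get(contig, [])
        # Only islands that start at or before the CDS end can overlap it
        i = bisect_right(intervals, (end, float("inf")))
        if any(island_end >= start for _, island_end in intervals[:i]):
            hits.append(cds_id)
    return hits


if __name__ == "__main__":
    cds = [("g1", "chr1", 100, 400), ("g2", "chr1", 900, 1200), ("g3", "chr2", 50, 80)]
    islands = [("chr1", 1000, 1500), ("chr1", 350, 500), ("chr2", 10, 40)]
    print(cds_in_islands(cds, islands))
```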

Virulence prediction

VirulentPred (Garg and Gupta 2008), a web-based tool based on a cascade of support vector machine (SVM) classifiers, was used to predict virulent genes. The CDS located in the genomic islands and the unique genes predicted by the pan-genome analysis were submitted to VirulentPred with default settings.

Antibiotic resistance

Antibiotic resistance genes were identified by two methods: (1) the genome sequences were submitted to ResFinder (Zankari et al. 2012), a web-based tool, with default parameters, and (2) the antibiotic resistance gene data available in the PATRIC database were collected. The two sources were combined to capture as many of the resistance genes present in the genomes as possible.
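Combining the two sources amounts to a union keyed on genome and gene, so that a gene reported by both ResFinder and PATRIC is counted once while the contributing source is still recorded. The sketch assumes both outputs have already been reduced to (genome, gene name) pairs. Case-folding the gene name is a simplifying assumption, since the two resources do not always name the same gene identically; the genome IDs in the example are hypothetical.

```python
def merge_resistance_calls(resfinder, patric):
    """Union of (genome, gene) calls, recording which source(s) reported each.

    Inputs are iterables of (genome, gene) tuples. Case-folding the gene
    name is a simplifying assumption for matching across databases.
    """
    merged = {}
    for source, calls in (("ResFinder", resfinder), ("PATRIC", patric)):
        for genome, gene in calls:
            key = (genome, gene.casefold())
            merged.setdefault(key, set()).add(source)
    return merged


if __name__ == "__main__":
    resfinder = [("genome_1", "tet(B)"), ("genome_2", "sul1")]  # invented calls
    patric = [("genome_1", "TET(B)"), ("genome_1", "catB")]
    for (genome, gene), sources in sorted(merge_resistance_calls(resfinder, patric).items()):
        print(genome, gene, sorted(sources))
```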

Toxin-antitoxins prediction

TAfinder (Xie et al. 2018), a web-based tool that identifies type II TA loci in bacterial genomes, was used to predict the type II TA systems present in the genomes. Predictions used both the homology search module and the operon detection module, with the whole-genome sequences and the corresponding GenBank files as input.
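Type II TA loci are typically organized as small two-gene operons, which is why operon-level context helps distinguish genuine loci from isolated homologs. The sketch below illustrates that general idea only, not TAfinder's actual algorithm: adjacent genes on the same strand are paired when one is a toxin hit and the other an antitoxin hit and the intergenic gap is small. The 150 bp cutoff, the input layout, and the `pair_ta_genes` helper are hypothetical.

```python
def pair_ta_genes(genes, max_gap=150):
    """Pair neighbouring toxin/antitoxin genes into candidate type II TA loci.

    genes:   list of (gene_id, start, end, strand, role) sorted by start;
             role is "T" (toxin hit), "A" (antitoxin hit) or None.
    max_gap: largest allowed intergenic distance in bp (hypothetical cutoff).
             Overlapping genes give a negative gap and are accepted.
    Returns a list of (toxin_id, antitoxin_id) tuples.
    """
    loci = []
    for left, right in zip(genes, genes[1:]):
        left_id, _, left_end, left_strand, left_role = left
        right_id, right_start, _, right_strand, right_role = right
        if left_strand != right_strand or {left_role, right_role} != {"T", "A"}:
            continue
        gap = right_start - left_end - 1
        if gap <= max_gap:
            if left_role == "T":
                loci.append((left_id, right_id))
            else:
                loci.append((right_id, left_id))
    return loci


if __name__ == "__main__":
    genes = [  # invented coordinates
        ("g1", 100, 400, "+", "A"),
        ("g2", 397, 700, "+", "T"),
        ("g3", 1000, 1300, "-", "T"),
        ("g4", 1310, 1600, "+", "A"),
    ]
    print(pair_ta_genes(genes))
```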

Results and discussion

Genome features

The genome sequences of seven Vibrio species, namely, V. alginolyticus, V. anguillarum, V. campbellii, V. furnissii, V. harveyi, V. parahaemolyticus, and V. vulnificus, were retrieved from GenBank. These species were selected because of their common occurrence in aquaculture. Genome size ranged from 4.03 to 6.17 megabase pairs (Mbp), and the numbers of genes and proteins ranged from 3693 to 5886 and from 3426 to 5574, respectively. Among the 35 genomes, V. campbellii had the highest numbers of genes and proteins. V. harveyi, reported to be very closely related to V. campbellii, ranked second in gene and protein count, whereas V. anguillarum ranked last. The GC content of the selected genomes ranged from 44.37 to 50.63%. Similarity between the genomes was visualized as a BLAST atlas (Fig. 1) by aligning the 35 genomes, along with V. cholerae as an outgroup, against V. parahaemolyticus RIMD 2210633 using BLASTn with an upper identity threshold of 90% and a lower identity threshold of 70%. The BLAST atlas shows that the genome of V. parahaemolyticus is more similar to those of V. alginolyticus, V. campbellii, and V. harveyi than to those of the other species, a relationship also observed in the phylogenetic analysis.
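The two identity thresholds effectively bin every aligned region into three classes, which is what the shading in each ring of the atlas encodes. The sketch below shows that binning for a list of BLASTn percent-identity values. The class labels and helper names are illustrative; BRIG assigns colors internally, and how it shades intermediate values along its gradient is not modeled here.

```python
def identity_class(pident, lower=70.0, upper=90.0):
    """Classify a BLASTn hit by percent identity using the two atlas thresholds."""
    if pident >= upper:
        return "solid"  # full ring color: 90% identity or above
    if pident >= lower:
        return "gray"   # lighter shading: 70% to below 90% identity
    return "blank"      # below the lower threshold: left empty


def ring_summary(pidents):
    """Count aligned regions per identity class for one genome ring."""
    counts = {"solid": 0, "gray": 0, "blank": 0}
    for p in pidents:
        counts[identity_class(p)] += 1
    return counts


if __name__ == "__main__":
    print(ring_summary([95.0, 90.0, 89.9, 70.0, 69.9]))
```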

Fig. 1

Image showing the BLAST atlas of the 35 genomes (with V. cholerae as an outgroup) aligned against the reference genome V. parahaemolyticus RIMD 2210633, drawn as concentric rings. The color gradient in each ring indicates percentage identity: gray regions denote 70% identity and solid-colored regions denote 90% identity or above

Genome annotation

The genomes were re-annotated with Prokka before further analysis to standardize the annotations. Because different annotation pipelines may have been used for the genome sequences submitted to GenBank, this step removed inconsistencies between annotations. Prokka predicted CDSs, rRNAs, and CRISPRs using Prodigal, RNAmmer, and MinCED, respectively. On average, each genome contained 4609 CDSs, 31 rRNA genes, 117 tRNA genes, and 1 tmRNA gene; genome-wise statistics are given in Supplementary Table 1.

Pan-genome analysis

The pan-genome of the seven Vibrio species (35 strains) consisted of 17,033 genes: 2004 (11.76%) core, 8249 (48.43%) accessory, and 6780 (39.81%) unique genes. The variable (accessory plus unique) fraction, 88.24% of the pan-genome, reflects the high genetic diversity of Vibrio spp. Genome-wise statistics on core, accessory, unique, and exclusively absent genes are given in Supplementary Table 2.

Core-pan profile

The core-pan profile was plotted from the numbers of core, accessory, unique, and exclusively absent genes. The profile shows that the pan-genome is open: as genomes are added sequentially, the pan-genome keeps growing substantially, again reflecting the diversity of Vibrio genomes. Open pan-genomes are generally observed in bacteria that readily acquire genes by horizontal gene transfer (Hurtado et al. 2018). Figure 2 shows the increase in the number of new genes in the pan-genome and the decrease in the number of core genes as genomes are added.
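A core-pan profile of this kind can be reproduced by adding genomes one at a time and tracking the running union (pan-genome) and intersection (core genome) of gene clusters. Openness is commonly judged by fitting a power law, pan(n) = κ·n^γ, to the pan-genome curve: a clearly positive exponent γ means the pan-genome keeps growing (open), while γ near zero indicates saturation (closed). The sketch uses a single genome order and a log-log least-squares fit; BPGA averages over many genome orderings and uses its own fitting procedure, so this is an illustration rather than a re-implementation. The toy gene sets are invented.

```python
import math


def core_pan_profile(gene_sets):
    """Cumulative (pan size, core size) as genomes are added in the given order."""
    pan, core = set(), None
    profile = []
    for genes in gene_sets:
        pan |= genes
        core = set(genes) if core is None else core & genes
        profile.append((len(pan), len(core)))
    return profile


def fit_power_law(sizes):
    """Least-squares fit of log(size) = log(kappa) + gamma * log(n), n = 1..len(sizes).

    Needs at least two points.
    """
    xs = [math.log(n) for n in range(1, len(sizes) + 1)]
    ys = [math.log(s) for s in sizes]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    gamma = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    kappa = math.exp(my - gamma * mx)
    return kappa, gamma


if __name__ == "__main__":
    genomes = [{"a", "b", "c"}, {"a", "b", "d"}, {"a", "c", "e"}]  # invented
    profile = core_pan_profile(genomes)
    print(profile)  # [(3, 3), (4, 2), (5, 1)]
    kappa, gamma = fit_power_law([pan for pan, _ in profile])
    print(f"gamma = {gamma:.2f}")
```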

Fig. 2

Image showing the core-pan profile of the genomes. The blue boxes represent the number of new genes for each genome and the green boxes represent the core gene count

Pan and Core phylogeny

Both the pan and core phylogenies were computed using the neighbor-joining method with the default combination of 20 iterations. The pan phylogeny (Fig. 3) and the core phylogeny (Fig. 4) showed similar clustering patterns, and in both, each of the seven species is monophyletic. The strains of V. parahaemolyticus and V. alginolyticus cluster close to each other, as do those of V. campbellii and V. harveyi, in both phylogenies, whereas V. furnissii clusters near V. vulnificus in the pan phylogeny and near V. anguillarum in the core phylogeny. Overall, the species were clearly separated in the phylogenies built from both core and pan genes.
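For a gene-content (pan) tree, the input to neighbor joining is a pairwise distance matrix derived from the presence/absence profiles. The distance measure BPGA uses is not stated here; the sketch below assumes the Jaccard distance (1 minus the number of shared clusters divided by the number of clusters present in either genome), a common choice for gene-content trees. The genome profiles are invented, and the output layout (lower triangle including the diagonal) is chosen to match what Biopython's `Bio.Phylo.TreeConstruction.DistanceMatrix` accepts.

```python
def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| for two sets of gene-cluster IDs."""
    union = a | b
    return 0.0 if not union else 1.0 - len(a & b) / len(union)


def lower_triangle(profiles):
    """Genome names and a lower-triangular Jaccard distance matrix (diagonal included).

    profiles: dict mapping genome name -> set of cluster IDs present.
    """
    names = list(profiles)
    matrix = [
        [jaccard_distance(profiles[names[i]], profiles[names[j]]) for j in range(i + 1)]
        for i in range(len(names))
    ]
    return names, matrix


if __name__ == "__main__":
    profiles = {"A": {"c1", "c2", "c3"}, "B": {"c1", "c2", "c4"}, "C": {"c5"}}
    names, matrix = lower_triangle(profiles)
    print(names, matrix)
```

If Biopython is installed, `DistanceTreeConstructor().nj(DistanceMatrix(names, matrix))` builds a neighbor-joining tree from this output; rooting on the outgroup is a separate step.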

Fig. 3

Image showing the phylogenetic tree of the pan-genome

Fig. 4

Image showing the phylogenetic tree of the core genome

Functional and pathway analysis

Functional and pathway analyses were carried out using KEGG and COG annotations. The COG annotations (Fig. 5) revealed that most genes were associated with metabolism and with information storage and processing. At a finer level, the largest group of genes was involved in transcription, followed by signal transduction. Similarly, the KEGG annotations (Fig. 6) showed that, at the broadest level, genes were mainly involved in metabolism and environmental information processing. A finer breakdown showed genes involved in carbohydrate, amino acid, and lipid metabolism; signal transduction; replication and repair; translation; cell motility; and energy metabolism. The core genes are mostly involved in metabolism, whereas the accessory and unique genes are mainly involved in information storage and processing related to the environment and other factors. These results suggest that genes from both the core and the variable gene sets contribute to survival under changing conditions and thereby to the virulence and pathogenicity of the organism.
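Assigning COG or KEGG categories "from the best hits" means keeping, for each query gene, the database hit with the highest score and inheriting that hit's functional category; category percentages like those in Figs. 5 and 6 are then category counts divided by the number of assigned genes. The sketch assumes tabular hits of (query, subject, bitscore, e-value) and a subject-to-category lookup. The e-value cutoff of 1e-5, the subject IDs, and the toy data are hypothetical, and BPGA's own thresholds may differ.

```python
from collections import Counter


def best_hits(hits, max_evalue=1e-5):
    """Keep the highest-bitscore subject per query among hits passing the e-value cutoff."""
    best = {}
    for query, subject, bitscore, evalue in hits:
        if evalue > max_evalue:
            continue
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def category_percentages(best, subject_category):
    """Percentage of assigned genes falling into each functional category."""
    counts = Counter(subject_category[s] for s in best.values() if s in subject_category)
    total = sum(counts.values())
    return {cat: 100.0 * n / total for cat, n in counts.items()} if total else {}


if __name__ == "__main__":
    hits = [  # invented hits; subject IDs are placeholders
        ("g1", "cogA", 250.0, 1e-40),
        ("g1", "cogB", 300.0, 1e-50),
        ("g2", "cogC", 80.0, 1e-10),
        ("g3", "cogA", 40.0, 0.01),
    ]
    categories = {"cogA": "Metabolism", "cogB": "Metabolism",
                  "cogC": "Information storage and processing"}
    print(category_percentages(best_hits(hits), categories))
```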

Fig. 5

Percentage of genes involved in information processing, cellular processes, and metabolism in Vibrio spp., based on COG analysis

Fig. 6

Percentage of genes involved in information processing, cellular processes, and metabolism in Vibrio spp., based on KEGG analysis

Toxin-antitoxin systems

TAfinder predicted 394 type II toxins and antitoxins in total across the 35 genomes; the TAs present in each genome are listed in Supplementary Table 3. Among the 35 genomes, V. harveyi QT520 contained the most toxin-antitoxin systems, followed by the V. vulnificus and V. campbellii strains (Fig. 7). The most commonly observed toxin/antitoxin (T/A) pairs were relE/relB, relE/PHD, relE/RHH, GNAT/RHH, and HipA/Xre. TA systems are of interest because they bear on both the pathogenicity of bacteria and their defense mechanisms. In a type II TA system, the toxin is a protein that can arrest growth or cause cell death under unfavorable conditions, while the antitoxin is a protein that neutralizes the toxin and thereby helps the organism cope with stresses caused by various external and internal factors. Every one of the 35 genomes carried at least one toxin-antitoxin pair.
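Summaries such as the genome with the most TA systems and the most frequent toxin/antitoxin family pairs reduce to simple counting over the per-genome predictions. The sketch assumes the TAfinder results have been flattened to (genome, toxin family, antitoxin family) records; the records and genome names shown are invented for illustration and are not the study's data.

```python
from collections import Counter


def summarize_ta(records, top=5):
    """Per-genome TA counts and the most common toxin/antitoxin family pairs."""
    per_genome = Counter(genome for genome, _, _ in records)
    pairs = Counter(f"{toxin}/{antitoxin}" for _, toxin, antitoxin in records)
    return per_genome, pairs.most_common(top)


if __name__ == "__main__":
    records = [  # invented records
        ("genome_1", "relE", "relB"),
        ("genome_1", "relE", "PHD"),
        ("genome_1", "HipA", "Xre"),
        ("genome_2", "relE", "relB"),
        ("genome_3", "GNAT", "RHH"),
    ]
    per_genome, top_pairs = summarize_ta(records, top=2)
    print(per_genome.most_common(1), top_pairs)
```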

Fig. 7

Percentage of genes encoding the toxin-antitoxin families in 35 Vibrio genomes

Antibiotic resistance

Antibiotic resistance in bacteria is becoming one of the major threats to human health. As diseases emerge and progress and bacteria become resistant to one or more drugs, it is important to understand the mechanisms bacteria use to protect themselves against antibiotics, and hence to delineate the genes that confer antibiotic resistance.

The antibiotic resistance prediction identified 312 resistance genes in total (Supplementary Table 4). Twenty-one of the 36 genomes carried antibiotic resistance genes, and 6 of them had at least one gene coding for multidrug resistance. The predicted resistance genes act primarily through target protection, antibiotic efflux, antibiotic inactivation, and reduced antibiotic permeability. They confer resistance to important antibiotics such as tetracycline, beta-lactams, chloramphenicol, fluoroquinolones, fosfomycin, sulfonamides, and trimethoprim (Guglielmini and Van Melderen 2011). Six strains, viz., V. alginolyticus ATCC 17749, V. anguillarum, V. furnissii, V. parahaemolyticus RIMD 2210633, and V. vulnificus CMCP6 and YJ016, carried multidrug resistance genes of two types: (i) multidrug resistance transporters and (ii) multidrug efflux pumps.

Virulence factors

Genes coding for virulence factors are among the most important components of a genome, so predicting potentially virulent genes is essential. Genomic islands are genomic regions acquired by horizontal gene transfer. The genes in the genomic islands and the unique genes were subjected to virulence prediction using VirulentPred. Of the 7098 genes in the genomic islands, 4802 were predicted to be potentially virulent, as were 4825 of the 6780 unique genes (Supplementary Table 5). Among the virulent unique genes, 1885 had proper functional annotations, 2756 were hypothetical, 64 were unknown, and 120 were annotated as uncharacterized. Similarly, among the virulent genomic island genes, 1773 had proper annotations and 3029 were annotated as hypothetical proteins.
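A breakdown of predicted virulent genes into properly annotated, hypothetical, unknown, and uncharacterized proteins can be reproduced by binning the product descriptions with keyword rules. The study does not state how the categories were assigned, so the rules below are an assumption. The main block also computes the fraction of genes predicted virulent from the counts reported above.

```python
from collections import Counter


def annotation_category(product):
    """Bin a product description; the keyword rules are illustrative assumptions."""
    text = product.lower()
    if "hypothetical" in text:
        return "hypothetical"
    if "uncharacterized" in text or "uncharacterised" in text:
        return "uncharacterized"
    if "unknown" in text:
        return "unknown"
    return "annotated"


def tally(products):
    """Count product descriptions per annotation category."""
    return Counter(annotation_category(p) for p in products)


if __name__ == "__main__":
    print(tally(["hypothetical protein", "Flagellin C", "protein of unknown function"]))
    # Fraction predicted virulent, using the counts reported in the text
    for label, virulent, total in (("genomic islands", 4802, 7098),
                                   ("unique genes", 4825, 6780)):
        print(f"{label}: {virulent}/{total} = {100 * virulent / total:.1f}%")
```

With the reported counts, roughly two thirds of the genomic island genes and a little over 70% of the unique genes were predicted to be virulent.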

The unique genes of V. campbellii strain LB102 were studied in detail because of its predominant occurrence in shrimp hatcheries in the coastal regions of India. The 131 unique genes of V. campbellii LB102 were extracted and subjected to virulence prediction, which identified 19 virulence genes: 13 with known functions and 6 hypothetical proteins. Investigating the functions of these 13 genes in detail may help clarify strain-specific host-pathogen interactions.


The pan-genome analysis helped to delineate the functions of the core genes and of the dispensable (accessory and unique) genes. The pan-genome, genomic island, and phylogenetic analyses were useful for understanding the genetic diversity, evolutionary patterns, and relationships of Vibrio spp. In addition, the genes identified for toxin-antitoxin systems, virulence factors, and antibiotic resistance shed light on the pathogenicity, survival, and defense mechanisms of Vibrio spp. The unique genes could serve as targets for developing diagnostics and help in designing other measures to control outbreaks.




The Indian Council of Agricultural Research (ICAR), New Delhi, provided financial support under the “Network Project on Agricultural Bioinformatics and Computational Biology.”

Author information

Authors and Affiliations


Corresponding author

Correspondence to Ashok Kumar Jangam.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Research involving human participants and/or animals


Informed consent


Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material


(XLSX 11 kb)


(XLSX 9 kb)


(XLSX 18 kb)


(XLSX 22 kb)


(XLSX 491 kb)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit


About this article


Cite this article

Nathamuni, S., Jangam, A.K., Katneni, V.K. et al. Insights on genomic diversity of Vibrio spp. through Pan-genome analysis. Ann Microbiol 69, 1547–1555 (2019).
