Genomic analysis of a functional haloacid-degrading gene of Bacillus megaterium strain BHS1 isolated from Blue Lake (Mavi Gölü, Turkey)

Bacillus megaterium strain BHS1, isolated from an alkaline water sample taken from Mavi Gölü (Blue Lake, Turkey), can grow on minimal medium containing 2,2-dichloropropionic acid. We characterized this bacterium at the genomic level. The genome of strain BHS1 was sequenced on the HiSeq platform, followed by de novo assembly and scaffolding. Next, genome data were analyzed to demarcate DNA regions containing protein-coding genes and determine the function of certain BHS1 genes. Finally, results from a colorimetric chloride ion-release assay demonstrated that strain BHS1 produces dehalogenase. De novo assembly of the BHS1 genomic sequence revealed a genome size of ~5.37 Mb with an average G+C content of 38%. The predicted genome harbors 5509 protein-coding genes, 1353 tRNA genes, 67 rRNA genes, and 6 non-coding RNA genes. Genomic mapping of strain BHS1 revealed genes for two families of dehalogenases (Cof-type haloacid dehalogenase IIB family hydrolase and haloacid dehalogenase type II), suggesting that these enzymes can participate in the catabolism of halogenated organic acids. The mapping also identified seven Na+/H+ antiporter subunits that are vital for adaptation of the bacterium to an alkaline environment. A pairwise comparison with well-established L-2-haloacid dehalogenases, together with whole-cell analysis, strongly suggested that the haloacid dehalogenase type II might act stereospecifically on L-2-chloropropionic acid, D,L-2-chloropropionic acid, and 2,2-dichloropropionic acid. Whole-cell studies confirmed the utilization of these three substrates and the gene's role in dehalogenation. To our knowledge, this is the first report of the full genome sequence for strain BHS1, which enabled the characterization of selected genes having specific metabolic activities and their roles in the biodegradation of halogenated compounds.


Introduction
Mavi Gölü (Blue Lake) is a soda lake in Turkey and a famous tourist destination known for its uniquely beautiful turquoise water, which is retained in the lakebed from June to December. It is formed in the Göksu Creek, the only carbonated water that flows into the Black Sea. Given the lake's unique and rugged landscape, identifying and characterizing its culturable microbial communities is essential for expanding the repertoire of known alkaliphilic microbes and strengthening the databases available for them. Such data are particularly pertinent for basic research that elucidates various applications of alkaliphiles, such as in food industries, bioremediation, and medicine (Batumalaie et al. 2018; Bagherbaigi et al. 2013; Neelam et al. 2019; Kevbrin, 2019). Despite being unpolluted, Mavi Gölü may contain organobromines, which are produced naturally by an array of the lake's biological and chemical processes. These processes might trigger certain bacteria to produce dehalogenases (Gribble, 2000). Hence, we investigated the properties of dehalogenases produced by bacteria in heavily polluted areas as well as natural ecosystems that produce environmentally detrimental halogenated compounds.
Because BHS1 produces a dehalogenase, it can utilize 2,2-dichloropropionic acid (2,2-DCP) as a sole carbon source (Wahhab et al. 2020). Therefore, we sequenced the genome of strain BHS1 to understand the regulatory mechanisms for overexpression of dehalogenase genes and how dehalogenases contribute to BHS1 survival in an alkali-laden ecosystem. Previous literature reported that B. megaterium produces proteins of unknown function (Korneli et al. 2013) that could potentially be used for bioremediation. Thus, to consider its appropriate applications and understand its adaptability to alkali-laden environments such as soda lakes, the acquisition of a full genomic sequence for BHS1 is of interest.
Preparation of pure genomic DNA
B. megaterium strain BHS1 was isolated from water samples taken from Mavi Gölü and grown aerobically in minimal medium containing 20 mM 2,2-DCP (pH 9.0) at 30°C with rotation. After 18 h, the sample was harvested via centrifugation (13,000×g, 20 min, 4°C). The QIAamp DNA Minikit (Qiagen, Germany) was used to extract and purify genomic DNA, the quality of which was validated using a Qubit 2.0 fluorometer (Apical Scientific, Malaysia). The purity of DNA samples (UV absorbance ratio A260/A280) was assessed using a NanoDrop spectrophotometer (ThermoFisher).

Analysis of the 16S rRNA gene
Amplification of the 16S rRNA gene. Genomic DNA from strain BHS1 was extracted from bacterial cultures grown on minimal media containing 20 mM 2,2-DCP using the Wizard® Genomic DNA Purification kit. PCR was used to amplify the target DNA fragments using the universal primers fP1 (5′-AGAGTTTGATCCTGGCTCAG-3′) and rP1 (5′-ACGGTCATACCTTGTTACGACTT-3′) (Fulton and Cooper, 2005). PCR was carried out by 30 cycles of denaturation at 94°C for 1 min and annealing at 55°C for 1 min, with final extension at 72°C for 10 min. Amplicons were purified using the QIAquick PCR purification kit (Qiagen, Germany) and sequenced by 1st Base Laboratories Sdn Bhd. (Malaysia).
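As a quick sanity check on the two universal primers, their length and a rough melting temperature can be computed. The sketch below uses the Wallace rule (2°C per A/T, 4°C per G/C), which is only a coarse estimate for short oligonucleotides and is not stated in the text as the basis for the annealing temperature used. The primer strings assume that the spaces inside the printed sequences are typesetting artifacts.

```python
def wallace_tm(primer):
    """Rough melting temperature (deg C) by the Wallace rule: 2(A+T) + 4(G+C)."""
    seq = primer.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc != len(seq):
        raise ValueError("primer contains non-ACGT characters")
    return 2 * at + 4 * gc


# Primer sequences as given in the text, with internal spaces removed (assumption).
FP1 = "AGAGTTTGATCCTGGCTCAG"
RP1 = "ACGGTCATACCTTGTTACGACTT"

tm_fp1 = wallace_tm(FP1)
tm_rp1 = wallace_tm(RP1)
print(len(FP1), tm_fp1)
print(len(RP1), tm_rp1)
```

Nearest-neighbor thermodynamic models generally give better estimates than this rule, especially for the longer 23-nt reverse primer.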
Evolutionary distances were computed using the Maximum Composite Likelihood method (Tamura et al. 2013), with units being the number of base substitutions per site. This analysis involved 15 sequences for bacterial 16S rDNA.

Genome sequencing, de novo assembly, and scaffolding
The genome of strain BHS1 was sequenced using the HiSeq platform. A library with inserts of ≤500 bp was prepared using standard protocols from New England Biolabs Inc. and sequenced on the Illumina HiSeq™ 2000 platform (Apical Scientific, Malaysia). All sequences were uploaded to the UCLA CNSI Hoffman 2 computer cluster lab located at the Malaysia Genome Institute for assembly. Read mapping followed the bioinformatics analysis pipeline provided by Apical Scientific. During data processing, the original optical density data obtained by high-throughput sequencing (Illumina platform) were transformed into raw sequence reads by CASAVA (Hosseini et al. 2010) and stored in FASTQ format, which holds both the read sequences and the corresponding sequencing-quality information. Quality control was then performed, and low-quality sequences were removed.
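The quality-control step can be illustrated with a minimal sketch that drops reads whose mean Phred score falls below a cutoff. The text does not state which tool or threshold was used, so the cutoff of 20, the Phred+33 encoding, and the function names here are assumptions chosen for illustration only.

```python
from io import StringIO


def mean_phred(quality_line, offset=33):
    """Mean Phred score of one FASTQ quality string (Phred+33 assumed)."""
    return sum(ord(ch) - offset for ch in quality_line) / len(quality_line)


def filter_fastq(handle, min_mean_quality=20.0):
    """Yield (header, sequence, quality) for reads at or above the cutoff.

    Assumes the simple four-line-per-record FASTQ layout.
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line
        quality = handle.readline().rstrip("\n")
        if quality and mean_phred(quality) >= min_mean_quality:
            yield header, sequence, quality


# Two made-up reads: 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
example = StringIO("@read1\nACGT\n+\nIIII\n@read2\nACGT\n+\n####\n")
kept = list(filter_fastq(example))
print(kept)
```

Production pipelines typically also trim adapters and low-quality read ends, which this sketch does not attempt.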
De novo genome assembly was accomplished using Velvet assembler v1.2.10 at the Malaysia Genome Institute (Zerbino and Birney, 2008). High-quality, short paired-end Illumina reads were assembled using VelvetOptimiser.pl v2.2.5. An optimal k-mer length was identified, and the best assembly was chosen based on N50, number of contigs, and assembly size. Scaffolding was performed using SSPACE-Standard v3.0 (Boetzer et al. 2010). SSPACE used the distance information from the paired-end Illumina reads to assess the order, distance, and orientation of the assembled contigs, which were combined into scaffolds. The assembled genome was further refined using the multi-draft-based scaffolder MeDuSa v1.6 (Bosi et al. 2015). This tool allows for the use of multiple reference genomes during scaffolding.
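Because the best assembly was chosen partly on N50, a short sketch of that statistic may help. N50 is the contig length at which the contigs of that length or longer together cover at least half of the assembly. The k-mer values and contig lengths below are invented for illustration and are not results from this study; ranking candidates by N50 alone is also a simplification of the selection described above.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no contigs given")


def assembly_summary(contig_lengths):
    """Collect the three statistics named in the text for one assembly."""
    return {
        "n50": n50(contig_lengths),
        "contigs": len(contig_lengths),
        "size": sum(contig_lengths),
    }


# Hypothetical assemblies keyed by k-mer length (toy numbers).
candidates = {
    31: [500, 400, 300, 200, 100],
    51: [900, 400, 200],
}
best_k = max(candidates, key=lambda k: n50(candidates[k]))
print(best_k, assembly_summary(candidates[best_k]))
```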
The final assembly consisted of one continuous scaffold with a total genome size of 5.37 Mb, a GC content of 38%, and 0.05% gaps (N bases). For data accessibility, the B. megaterium BHS1 (L1) genome was deposited into NCBI under BioProject PRJNA637885. The FASTA genome submission is under accession number CP058255 (https://submit.ncbi.nlm.nih.gov).

Genome data analysis
Following genome assembly, genetic prediction or annotation was performed to identify DNA regions containing protein-coding genes. To determine the function of genes in the B. megaterium BHS1 genome, suitable software and databases, such as BLAST (Basic Local Alignment Search Tool), InterProScan, KEGG, and Blast2Go, were used. BLAST is a program that compares protein or DNA sequences with sequences listed inside various databases, i.e., nr, ref-seq, or SWISS-PROT.
The BLAST search tools used in this study were the Blast2GO high-performance cloud server, BLAST Search, and the stand-alone NCBI-BLAST+ with DIAMOND high-performance analysis with a local database. BLAST analysis aided in finding sequence similarity with proteins in the non-redundant database GenBank with the default parameter set BLOSUM62. Automated annotation using Blast2GO version 5.2.5 (2018) [http://www.blast2go.com] was conducted to characterize the coding regions in B. megaterium strain BHS1. OmicsBox version 1.3 (https://www.biobam.com/omicsbox) was used for mapping, annotation, and visualization as well as the quantitative and statistical analyses for the scaffold of strain BHS1. This program was utilized because of its advanced features compared with Blast2GO (Götz et al., 2008).
Using the Kyoto Encyclopedia of Genes and Genomes (KEGG) (Galperin et al. 2014; Zhang et al. 2019) with BLASTP, several databases were utilized for genome annotation and gene function prediction, including Gene Ontology, Clusters of Orthologous Groups (COG), and "shn." Java version 1.8.0_152 (2019), which is the OS version for Java 10.0 (http://java.sun.com), was used for accessing most databases. Comparison against other reference genomes was performed using the progressiveMauve tool; Mauve is a system for constructing multiple genome alignments that provide a basis for comparative genomics and the study of genome-wide evolutionary dynamics (Darling et al. 2010).
Whole-cell analysis of maximum chloride released from 2,2-DCP, D,L-2-chloropropionic acid (D,L-2CP), L-2-chloropropionic acid (L-2CP), and D-2CP
B. megaterium BHS1 was incubated with each selected substrate (2,2-DCP, D,L-2CP, L-2CP, and D-2CP) under specific growth conditions as previously described (Wahhab et al. 2020). Pure-grade substrates were purchased from Sigma-Aldrich or Merck (USA). The chloride ion released from each substrate was determined colorimetrically (Bergmann and Sanik, 1957): the color was allowed to develop for 10 min at 20°C and was then measured at A460 nm.

Organism information
The general features of strain BHS1 have been described by Wahhab et al. (2020). Its full 16S rRNA gene sequence was deposited in GenBank under accession number MT883351 and was analyzed to determine its evolutionary relationships with other bacterial species. The final dataset contained 1591 bases. Evolutionary analyses were conducted with MEGA X (Kumar et al. 2018). As shown in Figure 1, BHS1 is closely related to B. megaterium strains NCT-2 and WSH-002. Hence, it was confirmed that BHS1 belongs to the species B. megaterium (homotypic synonym of Priestia megaterium) (Gupta et al. 2020).
Comparison with other complete reference genome sequences
A draft genome for BHS1 was generated based on a comparison with complete reference genomes of B. megaterium strains available in the NCBI database. The choice of the reference strains WSH-002 (NC_01738) and NCT-2 (NZ_CP032527) was consistent with the 16S rRNA gene analysis. Figure 2 shows the alignment of the whole genome, where each genome is laid out horizontally. Homologous genome segments in the BHS1 draft genome are shown as colored blocks connected by lines to similarly colored blocks of the reference genomes for the other two strains. Regions that are entirely white could not be aligned and probably contain sequence elements that were recently acquired and thus are unique to the particular strain.

Whole-genome sequencing information for strain BHS1
Analysis of one scaffold revealed that B. megaterium strain BHS1 has a 5,376,285-bp circular chromosome with an average G+C content of 38%. Analysis of the genome of strain BHS1 predicted ~5509 protein-coding genes, 79 tRNA genes, 67 rRNA genes, and 6 non-coding RNA genes. Figure 3 shows the circular map of the genome, which was generated using CGView based on open reading frames with COG information. Table 1 summarizes other general features of the genome, and Table 2 lists the functional categories in COG for the full genome sequence of strain BHS1. The main categories are cell wall/membrane/envelope biogenesis (M), ~28.3%; transcription (K), 8.0%; translation, ribosomal structure and biogenesis (J), 3.6%; and transport and metabolism of nucleotides (F), ~3%.
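The average G+C content reported above is simply the share of G and C among the called bases. A minimal sketch follows, assuming that ambiguous positions such as the N bases in scaffold gaps are excluded from the denominator; the toy sequence is invented and is not BHS1 sequence.

```python
def gc_content(sequence):
    """Percent G+C among unambiguous bases (A, C, G, T); N and others ignored."""
    called = [base for base in sequence.upper() if base in "ACGT"]
    if not called:
        raise ValueError("sequence has no unambiguous bases")
    gc = sum(1 for base in called if base in "GC")
    return 100.0 * gc / len(called)


# Toy sequence with two gap bases (N): 2 of the 8 called bases are G or C.
toy = "ATGCNNATAT"
print(gc_content(toy))
```

Whether gap bases are counted in the denominator changes the result slightly, so the convention should be stated alongside any reported value.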
With respect to potential industrial biotechnology applications, the genome of BHS1 harbors several important industrial enzymes such as α-2 and β-amylase 1. Moreover, although plants, algae, fungi, and yeast have been the traditional primary sources for urease, this enzyme is also produced by strain BHS1. Other biotechnologically relevant enzymes encoded in the BHS1 genome include thirteen proteases, nine ureases, seven aminopeptidases, five lipases, four aminomutases, three serine proteases, three enzymes for L-lysine synthesis, two glucose dehydrogenases, two pullulanases, and one mutarotase. Thus, alkaliphilic Bacillus strains are important to explore, including for the production of biotechnologically important enzymes such as the annotated extracellular hydrolases.

Fig. 1 Phylogenetic tree highlighting the relative position of the Bacillus genus with another closely related genus/species and strains. The evolutionary history was inferred using the Neighbor-Joining method (Saitou and Nei, 1987). The optimal tree, with a sum of branch lengths of 0.287, is shown. The percentages of replicate trees in which the associated taxa clustered together in the bootstrap test (500 replicates) are shown next to the branches (Felsenstein, 1985). The tree is drawn to scale, with branch lengths in the same units as those of the evolutionary distances used to infer the phylogenetic tree

Dehalogenase genes and dehalogenase subunits for adaptation of strain BHS1 to an alkaline environment
The key finding of the genomic annotation for strain BHS1 was the identification of a metabolic pathway that includes two dehalogenase families, consistent with the ability of the cells to grow optimally on minimal medium containing 2,2-DCP as the sole carbon source, as reported by Wahhab et al. (2020).
Genes linked to these two dehalogenases, each with a locus tag, were found in the genome of strain BHS1: the first gene encodes a Cof-type haloacid dehalogenase IIB family hydrolase, and the second encodes a haloacid dehalogenase type II (Table 3).
The existence of more than a single dehalogenase gene in a bacterial genome has been reported only for Rhizobium sp. RC1, and this bacterium has the ability to grow solely on 2,2-DCP and D,L-2-chloropropionate as the only sources of both carbon and energy (Allison et al. 1983; Leigh et al. 1986). To date, no study has explored the possibility that dehalogenase genes could contribute to adaptation to an alkali-laden environment (Ismail et al. 2017). The 2-haloacid dehalogenases act on 2-haloacids, liberating a halide ion(s) and the corresponding 2-hydroxy acid. The 2-haloacid dehalogenases have been categorized phylogenetically into two groups, namely I and II. Group II, or L-type enzymes, include L-2-haloacid dehalogenases (i.e., L-DEXs) that specifically target L-2-haloacids. The group II enzymes are more common than group I dehalogenases, which act on both D- and L-haloacids (Wang et al. 2018).

Co-transport mechanisms
Many cation/proton antiporters have been identified in bacteria (Krulwich et al. 2009; Tsujii et al. 2020). Cation/proton antiporters contribute to maintaining appropriate intracellular pH homeostasis, particularly in alkaline environments that require import of hydrogen ions from a comparatively proton-poor external environment. Three families of cation/proton antiporters of the Gene Ontology Membrane Transporter Group have been identified in bacteria and perform significant roles in maintaining cation equilibrium in the cytoplasm in extreme environmental settings. For instance, monovalent cation/proton antiporters are integral membrane proteins that facilitate the efflux of cytoplasmic sodium, potassium, or lithium ions in exchange for external hydrogen ions (protons) (Morino et al. 2014). The genome of strain BHS1, which was isolated from the alkaline lake, encodes all seven cation/proton antiporter subunits (A-G) in a single gene. These seven subunits and their locus tags are shown in Table 4. Na+/H+ antiporters have numerous functions, namely the creation of an essential electrochemical gradient of Na+ across the plasma membrane for Na+-driven flagellar rotation, the efflux of Na+ and Li+ (both of which are toxic if accumulated at high concentrations within cells), and regulation of intracellular pH under alkaline conditions (Wai Liew et al. 2007). Interestingly, strain BHS1 encodes several types of Na+/H+ antiporters (Table 4) that might be vital for maintaining intracellular pH during growth in alkaline environments, e.g., pH 10.5, similar to the previously reported Na+/H+ antiporter gene (g1-nhaC) of alkaliphilic Bacillus sp. G1 when expressed in Escherichia coli (Wai Liew et al. 2007). The full genome sequence of Bacillus may not only reveal individual bacilli functions but also play a role in identifying unique properties that are essential for the survival and adaptation of bacilli in alkaline ecosystems.

Organization of dehalogenase genes and a putative regulatory gene
The regulation of dehalogenase gene expression is poorly understood from the perspective of whole-genome function (Huyop and Cooper, 2011). Hence, analysis of the full genome of BHS1 may shed light on this aspect via the identification of an operon consisting of a functional gene cluster, i.e., a regulatory gene and structural genes, the expression of which depends on their location within the operon. A putative haloacid dehalogenase operon comprising an L-type dehalogenase (group II haloacid dehalogenase or L-specific dehalogenase) was identified in the BHS1 genome [4,389,524_4,390,183; 659 bp], with downstream genes consisting of a hypothetical protein [4,390,313_4,390,738; 425 bp], a capsular biosynthesis protein [439,171_439,333; 1562 bp], and the polyglutamate capsule biosynthesis protein CAPE [4,393,457_4,393,660; 203 bp]. Conversely, the upstream region of the operon encodes the glycoside hydrolase family 10 [4,387,790_4,388,125; 335 bp] and a helix-turn-helix (HTH)-type transcriptional regulator [4,385,958_4,386,812; 854 bp] (Fig. 4). Thus, the acquisition of the full genomic sequence for BHS1 is crucial for determining the most appropriate applications of these gene clusters for biotechnology purposes. Toward that end, we are currently using cloning analysis to decipher the regulatory mechanisms for these genes, especially given the existence of an HTH-type transcriptional regulator gene.
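The reported gene sizes can be cross-checked against the coordinates. In the sketch below, each length is computed as end minus start, which reproduces the bracketed values for the five genes included here; that convention is inferred from the numbers rather than stated in the text. The capsular biosynthesis protein is left out because its printed coordinates do not match its stated length. The intergenic gaps are derived values that the text does not report.

```python
# (start, end) coordinates copied from the text; names are labels only.
genes = [
    ("HTH-type transcriptional regulator", 4385958, 4386812),
    ("glycoside hydrolase family 10", 4387790, 4388125),
    ("L-type dehalogenase", 4389524, 4390183),
    ("hypothetical protein", 4390313, 4390738),
    ("polyglutamate capsule biosynthesis protein CAPE", 4393457, 4393660),
]


def span_length(start, end):
    """Length as end minus start, the convention that matches the reported sizes."""
    return end - start


lengths = {name: span_length(start, end) for name, start, end in genes}

# Distance between the end of one listed gene and the start of the next.
ordered = sorted(genes, key=lambda gene: gene[1])
gaps = {
    (left[0], right[0]): right[1] - left[2]
    for left, right in zip(ordered, ordered[1:])
}

for name, length in lengths.items():
    print(name, length)
```

An inclusive count (end minus start plus one) would give values one base larger, so the convention matters when comparing against annotation files.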
Analysis of a putative L-type dehalogenase (group II haloacid dehalogenase) and HTH-type transcriptional regulator
The L-type dehalogenase gene was 657 bp long, equivalent to 219 amino acid residues. The L-type dehalogenase of strain BHS1 was designated DehLBHS1. The program ProtParam (Gasteiger et al. 2005) indicated that DehLBHS1 has a molecular mass of 25,636.47 Da and a […] (Hisano et al. 1996; Li et al. 1998) revealed three key residues, namely D (Asp), R (Arg), and S (Ser), that may play key roles in the catalytic mechanism of DehLBHS1. The functional residues of L-DEX YL were determined to be D10, R41, S118, and D180, which are equivalent to the BHS1 residues D10, R41, S127, and D180, respectively (Fig. 5). Although the amino-acid sequences of the L-haloacid dehalogenases from BHS1 and Pseudomonas sp. YL (L-DEX YL) are only 46% identical, the key residues are 100% identical and thus may have similar functions. Analysis of the HTH-type transcriptional regulator, designated DehRBHS1 (BHS1 strain dehalogenase regulator protein), revealed very low sequence identity (< 17%) compared with the well-established regulator protein (DehR) of Rhizobium sp. RC1 (data not shown). Nevertheless, DehRBHS1 may have a similar regulatory function in the dehalogenase operon system that controls dehalogenases.
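Percent identity between two aligned proteins, as quoted above for DehLBHS1 versus L-DEX YL, can be computed from a pairwise alignment. The sketch below counts matching residues over ungapped columns, which is one of several conventions in use; the text does not say which was applied. The aligned strings are toy examples, not the real sequences.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identical residues over alignment columns without a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip gapped columns
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy alignment: 4 matches over 5 ungapped columns.
identity = percent_identity("MDRS-K", "MDKSAK")

# The reported gene length corresponds to the reported residue count.
residues = 657 // 3
print(identity, residues)
```

Using the full alignment length or the shorter sequence length as the denominator would give a lower figure, so identity values are only comparable when the same convention is used.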
Functional characterization of dehalogenase-producing strain BHS1
Dehalogenase-producing B. megaterium strain BHS1 was incubated with 5 mM of each substrate (2,2-DCP, D,L-2CP, L-2CP, and D-2CP) under proper growth conditions, and the maximum amount of chloride ion released was monitored. In the presence of 2,2-DCP or D,L-2CP, 50% of the chloride was released (Fig. 6). In the presence of L-2CP alone, essentially all chloride ions were released (4.5 μmol Cl−/ml), whereas with D-2CP no chloride ion was detected, as expected, suggesting the stereospecificity of the enzyme. A control experiment, L-2CP without any cells, showed no dehalogenation reaction, confirming that the substrate does not degrade on its own (Fig. 6).
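The release percentages can be related to the theoretical maximum, which depends on how many chlorine atoms each substrate carries. The sketch below assumes complete dechlorination as the upper bound and uses the fact that 1 mM equals 1 μmol/ml; the only measured value taken from the text is 4.5 μmol Cl−/ml for L-2CP.

```python
# Chlorine atoms per molecule of each substrate.
CHLORINES = {"2,2-DCP": 2, "D,L-2CP": 1, "L-2CP": 1, "D-2CP": 1}


def max_chloride(substrate, conc_mM=5.0):
    """Theoretical maximum chloride (umol/ml) if every C-Cl bond were cleaved."""
    return CHLORINES[substrate] * conc_mM  # 1 mM == 1 umol/ml


def fraction_released(substrate, measured_umol_per_ml, conc_mM=5.0):
    """Measured chloride as a fraction of the theoretical maximum."""
    return measured_umol_per_ml / max_chloride(substrate, conc_mM)


# Value reported in the text for L-2CP at 5 mM substrate.
l2cp_fraction = fraction_released("L-2CP", 4.5)
print(round(l2cp_fraction, 2))
print(max_chloride("2,2-DCP"))
```

On this basis, 50% release from the racemic D,L-2CP is what would be expected if only the L-isomer were dehalogenated, which is consistent with the absence of chloride release from D-2CP.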

Conclusions
This study is the first full genome sequence analysis of the functional haloacid-degrading gene from the alkaliphilic B. megaterium strain BHS1 from Mavi Gölü, a strain with the potential to degrade haloalkanoic acids in an alkaline environment and to use haloacids as carbon and energy sources. The genomic data acquired during this study may facilitate the discovery of new dehalogenases and their regulatory functions. The knowledge may also provide insights into the bacterium's genetic and metabolic regulatory pathways, which will be vital for potential application of BHS1 to the bioremediation of contaminated alkali-laden ecosystems.

Fig. 4 Physical map of the putative L-2-haloacid dehalogenase gene (encoding the putative DehLBHS1 protein) and its putative HTH-type transcriptional regulator gene, together with the locations of other genes

Fig. 5 Pairwise sequence alignment of DehLBHS1 and L-DEX YL from Pseudomonas sp. strain YL (Nardi-Dei et al., 1997). Marked in bold are three key amino acids (D, Asp; R, Arg; S, Ser), which play an important role in the catalytic function of L-DEX YL haloacid dehalogenase. The three key amino acids (in bold) are also located in DehLBHS1 at the same positions, except for S (Ser). The * symbol indicates a stop codon