Geographical location and habitat predict variation in prokaryotic community composition of Suberites diversicolor

Marine lakes are unique habitats that house diverse assemblages of benthic and planktonic organisms, including endemic species. In this study, we aimed to assess to what extent geographical location (Berau versus Papua) and the degree of marine lake connectivity to the surrounding marine environment (relatively open versus closed) structure the prokaryotic community composition of the sponge species Suberites diversicolor. Sponge specimens were sampled in five marine lakes in Borneo and Papua and one open sea habitat in Taiwan. Prokaryotic communities of S. diversicolor were dominated by members assigned to the Proteobacteria (particularly Alphaproteobacteria and Gammaproteobacteria) and Cyanobacteria, which together accounted for 78 to 87% of sequences in each sample. The dominant operational taxonomic units (OTUs) in most samples, OTUs 1 and 3, were both assigned to the alphaproteobacterial order Rhodospirillales, with OTU-1 dominant in the marine lakes of Berau and Papua and OTU-3 in Taiwan. OTU-3 was also largely absent from Papuan samples but present in all Berau samples. Compositionally, S. diversicolor samples clustered according to geographical location, with the main axis of variation separating marine lake samples collected in Berau from those collected in Papua, and the second axis separating open sea samples collected in Taiwan from all marine lake samples. Our results also suggest that the degree of lake connectivity to the open sea influences prokaryotic composition. Although previous studies have shown that sponge-associated microbial communities tend to be stable across geographical and environmental gradients, in the present study both geography and local environmental conditions were significant predictors of variation in the prokaryotic community composition of S. diversicolor.


Introduction
Marine lakes, also known as anchialine systems, are small bodies of seawater isolated in the interior of islands and connected to the surrounding marine environment through subterranean fissures, tunnels or small dissolution channels in the surrounding rock (Holthuis 1973; Hamner and Hamner 1998; Becking et al. 2011, 2013a). Based on their connection to the sea, marine lakes can be limnologically classified as holomictic or meromictic. Holomictic lakes are well connected to the outer marine environment, and their surface and deeper waters mix at least once each year. In contrast, meromictic lakes have a limited connection to the surrounding marine environment. Consequently, their waters tend to stratify, with denser, more saline water in the deeper parts and less dense, fresher water closer to the surface (Gotoh et al. 2011; Saitoh et al. 2011). The connection between marine lakes and the outer marine environment strongly influences the communities inhabiting these lakes (Saitoh et al. 2011). Lakes with a limited connection tend to house fewer species, including numerous endemics (Becking et al. 2013a, 2013b; Strelkov et al. 2014), whereas more connected lakes tend to house more species, but with greater compositional similarity to open sea habitat. Lake size also influences the number of species, with larger lakes housing more species than smaller ones (Becking et al. 2013b). Besides species number, connectivity also influences the lake environment: less connected lakes are usually characterised by lower pH, lower salinity and higher water temperature than more connected lakes (Becking et al. 2011). These differences in environmental conditions make comparisons among lakes with different degrees of connection to the open sea particularly interesting. 
Marine lakes may provide some information on the long-term impact of low pH conditions on symbiont composition, although care must be taken in interpreting results due to the confounding effect of lower salinity, which can also have a profound effect on prokaryotic composition (Meyerhof et al. 2016).
Previous studies have focused on the abundance and diversity of various organisms, including algae, ascidians, molluscs and sponges, inhabiting marine lakes in Indonesia, Palau and Vietnam (Hoeksema 2004; Cerrano et al. 2006; Becking et al. 2011). Sponges (phylum Porifera) are the most basal of Metazoans (Borchiellini et al. 2001) and have been considered living fossils (Müller 1998; Oláh et al. 2017). They inhabit a range of habitats from tropical to polar seas, shallow to deep waters and marine, brackish and freshwater environments (Ruetzler 2004). Sponges have been shown to house numerous microorganisms, including Bacteria, Archaea, fungi and dinoflagellates (Lee et al. 2011; He et al. 2014; Cleary 2019). These microorganisms can make up to 35% of sponge biomass and have been shown to play key roles in sponge metabolism, including carbon and nitrogen cycling and chemical defence (Webster et al. 2010; Bolaños et al. 2015). The role of symbiotic microorganisms in bioactive metabolite production has further fostered interest in these organisms (Park et al. 2019). Although the interactions between sponges and their symbiotic microorganisms have been studied and characterised (Bolaños et al. 2015), the majority of symbiotic bacteria found in sponges have not yet been cultivated (Cleary et al. 2013; He et al. 2014). Nucleic acid-based molecular techniques have emerged to overcome this problem, providing much more detailed information on the diversity and composition of sponge-associated microorganisms (Cleary et al. 2013; He et al. 2014).
The diversity and abundance of sponge-associated prokaryotes have been previously studied for different sponge species (Cleary et al. 2013; de Voogd et al. 2015; Polónia et al. 2015). These studies have revealed pronounced variation in prokaryotic composition among sponge species, and between sponges and the surrounding environment, including water and sediment. There is, however, some debate on the degree to which prokaryotic communities vary within the same sponge species and on the role of environmental conditions and geographical isolation herein (Burgsdorf et al. 2014; Luter et al. 2015; Moitinho-Silva et al. 2017). Lee et al. (2011), for example, found relatively little variation in the bacterial composition of the sponge species Hyrtios erectus, Stylissa carteri and Xestospongia testudinaria despite pronounced differences in the environments from which the specimens were collected. In contrast, Luter et al. (2015) found marked differences in the bacterial composition of specimens of the sponge Carteriospongia foliascens collected at the inshore Fantome and Orpheus Islands and at the Great Barrier Reef sites of Green Island and Davies Reef. Likewise, Swierts et al. (2018) found geography to be a significant determinant of prokaryotic community composition in Indo-Pacific barrel sponges (Xestospongia spp.). To address this knowledge gap and understand how geographical location and marine lake connectivity influence prokaryotic community composition, we assessed the prokaryotic communities of the sponge S. diversicolor Becking and Lim 2009 (Demospongiae: Hadromerida: Suberitidae) across three distinct geographical locations (Borneo, Papua and Taiwan) and two distinct marine lake habitats (relatively open and closed marine lakes). The distribution of S. diversicolor makes it an ideal model to study intraspecific variation in prokaryotic composition. 
This sponge species can be found inside and outside of marine lakes with different degrees of connection to the surrounding sea and is able to adapt to environments with unfavourable conditions, such as low salinity or periodic exposure to air (Becking et al. 2013b). The genus Suberites presently contains 78 species with a worldwide distribution. Only seven have been reported from the Indo-Pacific region. Suberites diversicolor is mainly known from marine lakes in Vietnam, Indonesia and possibly Palau but also occurs in coastal mangrove systems and a man-made pool in northern Australia (Becking and Lim 2009).
Previous studies of the prokaryotic composition of S. diversicolor have shown relatively minor compositional variation between samples from inside and outside lakes within the Berau Delta barrier reef system (Cleary et al. 2013). However, differences between bacterial communities were observed when comparing specimens of related species from different environments, such as coral reefs and hydrothermal vents (Coelho et al. 2018). Both studies identified Alphaproteobacteria as the dominant class within S. diversicolor, with the dominant OTUs assigned to the order Kiloniellales (Cleary et al. 2013; Coelho et al. 2018).
The aims of the present study were to (i) compare the richness and composition of sponge-associated microbial communities of S. diversicolor specimens collected from different environments (marine lakes with different degrees of connectivity to the open sea, and open sea habitat) and geographical locations (Borneo, Papua, Taiwan), (ii) identify the most abundant OTUs and closely related sponge-associated microorganisms using BLAST searches, and (iii) evaluate whether geographical location and the degree of connectivity of marine lakes to the surrounding open sea are significant predictors of variation in the composition of sponge prokaryotic communities.
The coastal system of Berau consists of different ecosystems, including coral reefs and mangroves. The marine lakes in the present study were located on the islands of Kakaban and Maratua. Three marine lakes were sampled in Berau, namely, lake Kakaban, lake Haji Buang and lake Tanah Bamban. Previous studies have provided detailed descriptions of these lakes (Tomascik and Mah 1994; Tomascik et al. 1997; Becking et al. 2011; Cleary et al. 2018a). In brief, Kakaban lake (BrK, N02°08′ 23.5″ E118°30′ 31.9″) is located within Kakaban island and is surrounded by mangroves. Kakaban lake, the largest lake sampled in this study (ca. 4 km²), had a tidal amplitude dampened to 11% of that of the surrounding sea and a 3.5-h tidal phase delay. Salinity in lake Kakaban varied from 23 to 24 ppt and temperature from 29 to 31.5°C. These values indicate that the connection with the surrounding sea is limited (Tomascik and Mah 1994; Becking et al. 2011; Cleary et al. 2018a). Kakaban lake also differs from the marine lakes of Maratua in being a former atoll lagoon rather than a karstic marine lake (Tomascik T, pers. comm.). Haji Buang lake (BrM, N02°12′ 31.2″ E118°35′ 46.7″) and Tanah Bamban lake (BrT, N02°13′ 50.0″ E118°34′ 50.7″) are located on Maratua Island. Both lakes are considerably smaller than lake Kakaban (ca. 0.14 km² for Haji Buang and ca. 0.12 km² for Tanah Bamban) and more connected to the surrounding sea. The tidal amplitude of Haji Buang was dampened to 48% of that of the surrounding sea, with a tidal delay of 2.5 h. Salinity and temperature in Haji Buang ranged from 26 to 28.5 ppt and from 29 to 30°C, respectively. Salinity and temperature in Tanah Bamban ranged from 26 to 28.5 ppt and from 29 to 30°C, respectively. The reported presence of a resident saltwater crocodile prevented a more in-depth study of the lake environment of Tanah Bamban; however, Tanah Bamban had the greatest connectivity to the open sea, as indicated by its relatively high salinity.
The marine lakes of West Papua are located on two different islands in Southeast Misool, Raja Ampat region, West Papua Province, Indonesia. Southeast Misool includes a 343,200-ha marine protected area established in 2009 (KKPD Misool Timur-Selatan). The waters surrounding both islands are highly biodiverse. The lakes do not yet have formal names and are here identified as marine lake K (PaK) and marine lake M (PaM). Marine lake K (S02°13′ 13.77″ E130°27′ 32.03″) is located on Karawop island and measures 72 by 64 m at its widest point. Marine lake K had a tidal amplitude of 45 cm and a 3-h tidal delay, indicating a limited connection to the sea (Cleary et al. 2018b). Salinity and temperature recorded in marine lake K were 28.9 ppt and 31.5°C, respectively. Marine lake M (S01°59′ 02.29″ E130°30′ 58.95″) is located within a small island to the east of Misool Island. There was relatively little difference between the lake and the sea in terms of tidal amplitude and tidal lag, indicating a pronounced connection with the surrounding sea. Salinity and temperature recorded in marine lake M were 25.9 ppt and 31.7°C, respectively. The relatively low salinity indicates an important source of freshwater to the lake. These lakes were previously described by Becking et al. (2011), Klei (2016) and Cleary et al. (2018b).

Material and methods
Suberites diversicolor specimens from open sea habitat were collected in the Penghu archipelago, off the western coast of Taiwan (Fig. 1 in Huang et al. 2016). The coastal areas of the islands and islets are dominated by characteristic fringing reefs with a high abundance and diversity of subtidal sponges (Huang et al. 2016; Coelho et al. 2018). The waters surrounding this archipelago are influenced by three distinct currents: the cold southbound Chinese Coastal Current, the warm northbound South China Sea Current and the Kuroshio Current. A detailed description of this area can be found in Coelho et al. (2018) and Huang et al. (2016).

Sampling
Specimens of S. diversicolor were collected in Berau from the 21st to 30th August 2012, in Papua from the 15th to 16th September 2013 and in Taiwan from the 25th to 27th July 2014.
Three replicates were sampled in both Papuan lakes and in Kakaban and Tanah Bamban; two replicates were sampled in Haji Buang marine lake and in the open sea habitat in Taiwan. All specimens were collected from relatively shallow water (< 5 m depth). To capture as much of the prokaryotic community as possible, cores including segments of both the surface and the interior of the sponge were taken from each specimen and stored in 95% EtOH for microbial analysis. Sponge specimens were preserved in 70% EtOH for later identification by Nicole de Voogd at Naturalis Biodiversity Center (Leiden, Netherlands). In addition to sponge samples, we collected water samples at 1-2 m depth with a 1.5-L bottle; 1 L of this water was subsequently filtered through a Millipore® White Isopore Membrane Filter (0.22 μm pore size) to obtain water prokaryotic communities.

DNA extraction and next-generation sequencing analysis
We isolated PCR-ready genomic DNA from S. diversicolor samples using the FastDNA® SPIN Kit (MPbiomedicals) following the manufacturer's instructions, an extraction method frequently used for this purpose (Costa et al. 2013; Urakawa et al. 2010). Briefly, the whole membrane filter (for water samples) and 500 mg of sponge tissue were cut into small pieces and transferred to Lysing Matrix E tubes containing a mixture of ceramic and silica particles. Cell lysis was performed in the FastPrep® Instrument (Q Biogene) for 80 s at a speed setting of 6.0. The extracted DNA was eluted into DNase/Pyrogen-Free Water to a final volume of 50 μL and stored at −20°C until use. The 16S rRNA gene V3V4 variable region PCR primers 341F 5′-CCTACGGGNGGCWGCAG-3′ and 785R 5′-GACTACHVGGGTATCTAATCC-3′ (Klindworth et al. 2013), with a barcode on the forward primer, were used in a 30-cycle PCR assay using the HotStarTaq Plus Master Mix Kit (Qiagen, USA) under the following conditions: 94°C for 3 min, followed by 28 cycles of 94°C for 30 s, 53°C for 40 s and 72°C for 1 min, and a final elongation step at 72°C for 5 min. After amplification, PCR products were checked on a 2% agarose gel to determine the success of amplification and the relative intensity of bands. Multiple samples were pooled in equal proportions based on their molecular weight and DNA concentrations. Pooled samples were purified using calibrated Ampure XP beads, and the pooled, purified PCR product was used to prepare the DNA library following the Illumina TruSeq DNA library preparation protocol. Paired-end sequencing was performed at MRDNA (Molecular Research LP; http://www.mrdnalab.com/; last checked on 18 November 2016) on an Illumina MiSeq device (Illumina Inc., San Diego, CA, USA) following the manufacturer's guidelines. 
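Both primers are degenerate: the IUPAC ambiguity codes (N and W in 341F; H and V in 785R) each stand for several bases, so each primer is in practice a small mixture of oligonucleotides. The Python sketch below, assuming the standard IUPAC nucleotide code table, expands a primer into the concrete sequences it encodes. It illustrates the primer design only and is not part of the sequencing provider's pipeline.

```python
from itertools import product
from typing import List

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_primer(primer: str) -> List[str]:
    """Return every concrete sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


PRIMERS = {
    "341F": "CCTACGGGNGGCWGCAG",
    "785R": "GACTACHVGGGTATCTAATCC",
}

for name, seq in PRIMERS.items():
    variants = expand_primer(seq)
    print(f"{name}: {len(variants)} variants, e.g. {variants[0]}")
```

For 341F this gives 4 × 2 = 8 variants and for 785R 3 × 3 = 9 variants, all of the same length as the written primer.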
Paired reads were joined after Q25 quality trimming of their ends; any reads in the 3′-5′ orientation were then reoriented to 5′-3′, and short reads (< 150 bp) were removed. The resulting files were analysed using the QIIME software package (Quantitative Insights into Microbial Ecology; Caporaso et al. 2010; http://www.qiime.org/).
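The read-processing steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: Phred scores are already decoded to integers, read orientation is supplied by the caller through a hypothetical `is_reverse` flag, and the end-trimming rule (strip bases from each end until one reaches Q25) is an assumption, since the exact rules used by the sequencing provider are not given in the text.

```python
from typing import List, Optional

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def trim_ends(seq: str, quals: List[int], min_q: int = 25) -> str:
    """Strip low-quality bases (< min_q) from both ends of a read."""
    start, end = 0, len(seq)
    while start < end and quals[start] < min_q:
        start += 1
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return seq[start:end]


def reverse_complement(seq: str) -> str:
    """Reverse-complement a read so it matches the forward (5'-3') orientation."""
    return seq.translate(COMPLEMENT)[::-1]


def process_read(seq: str, quals: List[int], is_reverse: bool,
                 min_q: int = 25, min_len: int = 150) -> Optional[str]:
    """Trim, reorient and length-filter one joined read; None if discarded."""
    trimmed = trim_ends(seq, quals, min_q)
    if is_reverse:
        trimmed = reverse_complement(trimmed)
    return trimmed if len(trimmed) >= min_len else None


# Example: 10 low-quality bases at the start and 20 at the end are removed.
read = "ACGT" * 50                          # 200 bp
quals = [12] * 10 + [35] * 170 + [12] * 20
kept = process_read(read, quals, is_reverse=False)
print(len(kept) if kept else "discarded")
```

Trimming is applied before the length filter, so a read that starts long but loses many low-quality bases at its ends can still be discarded.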

16S rRNA gene sequencing analysis
For a detailed description of the sequence analysis, see Coelho et al. (2018) and Cleary et al. (2018a). Briefly, we used QIIME (Caporaso et al. 2010) and USEARCH (https://www.drive5.com/usearch/) for quality filtering and OTU clustering (97% sequence similarity threshold). Taxonomy was assigned to the representative sequences of OTUs using default arguments in the assign_taxonomy.py script with the rdp method (Wang et al. 2007). In the assign_taxonomy.py script, we used a FASTA file containing reference sequences from the SILVA_128_QIIME_release and the UCLUST classifier method in QIIME (Quast et al. 2012). OTUs not classified as Bacteria or Archaea, or classified as chloroplasts or mitochondria, were removed from the OTU table prior to statistical analysis. Finally, we used the make_otu_table.py script to generate an OTU × sample count matrix, which was subsequently rarefied to 10,500 sequences per sample with the single_rarefaction.py script, yielding 168,000 sequences assigned to 4522 OTUs. The DNA sequences generated in this study can be downloaded from the NCBI SRA: PRJNA479655.
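The taxonomic filtering and rarefaction steps described above can be sketched in plain Python. This is an illustration only, assuming SILVA-style taxonomy strings of the form `D_0__Bacteria;D_1__...`; it is not QIIME's single_rarefaction.py, although it follows the usual approach of subsampling reads without replacement to a fixed depth. The last lines check that the replicate numbers in the Sampling section and the rarefaction depth account for the 168,000 sequences reported.

```python
import random
from collections import Counter
from typing import Dict


def domain_of(taxonomy: str) -> str:
    """First rank of a QIIME-style string, e.g. 'D_0__Bacteria;...' -> 'Bacteria'."""
    return taxonomy.split(";")[0].strip().split("__")[-1]


def keep_otu(taxonomy: str) -> bool:
    """Keep Bacteria/Archaea; drop chloroplast and mitochondrial sequences."""
    lowered = taxonomy.lower()
    return (domain_of(taxonomy) in {"Bacteria", "Archaea"}
            and "chloroplast" not in lowered
            and "mitochondria" not in lowered)


def rarefy(counts: Dict[str, int], depth: int, rng: random.Random) -> Dict[str, int]:
    """Subsample `depth` reads without replacement from one sample's OTU counts."""
    pool = [otu for otu, n in counts.items() for _ in range(n)]
    if len(pool) < depth:
        raise ValueError("sample has fewer reads than the rarefaction depth")
    return dict(Counter(rng.sample(pool, depth)))


# Replicates from the Sampling section: 3 each in Kakaban, Tanah Bamban,
# lake K and lake M; 2 each in Haji Buang and Taiwan.
replicates = {"BrK": 3, "BrT": 3, "PaK": 3, "PaM": 3, "BrM": 2, "Taw": 2}
n_samples = sum(replicates.values())
print(n_samples, n_samples * 10_500)
```

Sixteen sponge samples at 10,500 sequences each give exactly the 168,000 rarefied sequences reported.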

Statistical analysis
The rarefied table of OTU counts per sample was imported into R (R Core Team 2013) and used to compare community composition, estimate richness and assess the relative abundance of selected higher taxa. The OTU table was first ln(x + 1) transformed and a distance matrix was constructed using the Bray-Curtis index. Variation in the prokaryotic composition of S. diversicolor samples from different habitats was assessed with principal coordinates analysis (PCO) using the Bray-Curtis distance matrix as input. Compositional differences among habitats were tested for significance with adonis analysis in the vegan package in R (https://cran.r-project.org/web/packages/vegan/vegan.pdf), using the Bray-Curtis distance matrix as the response variable and habitat as the independent variable, with 999 permutations. We performed an additional analysis including only marine lake samples to test for differences between locations (Berau versus Papua) and degrees of lake connectivity (open versus closed). Weighted average scores were computed for OTUs on the first four PCO axes. SIMPER analysis in vegan was used to identify OTUs that significantly discriminated between pairs of habitats, i.e., the OTUs contributing most to the compositional differences between each pair.
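The core of this workflow (ln(x + 1) transformation, Bray-Curtis distances, PCO and a permutation test of group differences) can be sketched with NumPy. This is a minimal sketch, not vegan's implementation: the test shown is a one-way PERMANOVA pseudo-F, the principle underlying adonis, whereas some models in this study included more than one term, and the toy count matrix is hypothetical.

```python
import numpy as np


def bray_curtis(X: np.ndarray) -> np.ndarray:
    """Pairwise Bray-Curtis dissimilarities between the rows (samples) of X."""
    n = X.shape[0]
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            den = (X[i] + X[j]).sum()
            D[i, j] = D[j, i] = np.abs(X[i] - X[j]).sum() / den if den > 0 else 0.0
    return D


def pcoa(D: np.ndarray, k: int = 2):
    """Principal coordinates (classical scaling) of a distance matrix."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centring matrix
    B = -0.5 * J @ (D ** 2) @ J                # Gower-centred matrix
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1]             # largest eigenvalues first
    vals, vecs = vals[order], vecs[:, order]
    coords = vecs[:, :k] * np.sqrt(np.clip(vals[:k], 0, None))
    return coords, vals


def pseudo_f(D: np.ndarray, groups: np.ndarray) -> float:
    """One-way PERMANOVA pseudo-F computed from squared distances."""
    n, labels = len(groups), np.unique(groups)
    ss_total = (D[np.triu_indices(n, 1)] ** 2).sum() / n
    ss_within = 0.0
    for g in labels:
        idx = np.flatnonzero(groups == g)
        sub = D[np.ix_(idx, idx)]
        ss_within += (sub[np.triu_indices(len(idx), 1)] ** 2).sum() / len(idx)
    a = len(labels)
    return ((ss_total - ss_within) / (a - 1)) / (ss_within / (n - a))


def permanova(D, groups, permutations=999, seed=1):
    """Pseudo-F and permutation p-value (observed F counted as one permutation)."""
    groups = np.asarray(groups)
    rng = np.random.default_rng(seed)
    f_obs = pseudo_f(D, groups)
    hits = sum(pseudo_f(D, rng.permutation(groups)) >= f_obs
               for _ in range(permutations))
    return f_obs, (hits + 1) / (permutations + 1)


# Hypothetical counts: 4 samples x 3 OTUs from two habitats.
counts = np.array([[10, 0, 5], [12, 1, 4], [0, 10, 1], [1, 9, 0]])
X = np.log1p(counts)                 # ln(x + 1) transformation
D = bray_curtis(X)
coords, eigenvalues = pcoa(D, k=2)
F, p = permanova(D, ["lake", "lake", "sea", "sea"], permutations=999)
print(np.round(coords, 3), round(F, 2), p)
```

With only four samples the permutation p-value cannot be small, because few distinct groupings exist; the example only demonstrates the mechanics.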
Following Cleary et al. (2018a), we tested for significant differences in the relative abundance of selected phyla, classes and orders among habitats from different geographical locations with an analysis of deviance in R using the glm() function. Post hoc comparisons of estimated marginal means were performed with the emmeans() function, with p-values adjusted using the false discovery rate (fdr) method. A heatmap was constructed to visualise the distribution of the dominant OTUs (≥ 400 sequences). OTU counts were log-transformed, and OTUs were clustered according to their occurrence by UPGMA hierarchical clustering. For a detailed description of the statistical analysis, see .
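To our understanding, the "fdr" adjustment used for the post hoc comparisons corresponds to the Benjamini-Hochberg procedure (R's `p.adjust` treats "fdr" as an alias of "BH"). A minimal Python sketch of that adjustment, applied to hypothetical p-values:

```python
from typing import List, Sequence


def p_adjust_fdr(pvals: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (as in p.adjust method 'fdr'/'BH')."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])   # ascending p-values
    adjusted = [0.0] * m
    running_min = 1.0                                   # also caps values at 1
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]      # hypothetical pairwise-contrast p-values
print([round(p, 3) for p in p_adjust_fdr(raw)])   # [0.02, 0.04, 0.04, 0.02]
```

Each p-value is scaled by m/rank and then replaced by the smallest scaled value at or above its rank, so adjusted values never decrease as raw values increase.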

Results
In the present study, 168,000 sequences (after rarefying to 10,500 sequences per sample) were assigned to 4522 OTUs after quality screening and OTU filtering. OTUs were assigned to 57 phyla, 134 classes and 181 orders. Rarefied OTU richness was highest in samples from marine lakes and lowest in open sea habitat (Supplementary File 3).
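Rarefied OTU richness is the number of OTUs with at least one read in a sample after all samples have been subsampled to the same depth; comparing OTU counts across samples of unequal depth would confound richness with sequencing effort. A minimal sketch using a hypothetical two-sample table:

```python
from typing import Dict


def observed_richness(table: Dict[str, Dict[str, int]]) -> Dict[str, int]:
    """Count OTUs with at least one read in each sample of a rarefied table."""
    return {sample: sum(1 for n in counts.values() if n > 0)
            for sample, counts in table.items()}


# Hypothetical rarefied counts (equal depth of 10 reads per sample).
table = {
    "lake_sample": {"otu1": 5, "otu2": 3, "otu3": 2},
    "sea_sample": {"otu1": 9, "otu2": 0, "otu3": 1},
}
print(observed_richness(table))
```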
The PCO analysis revealed compositional differences among locations and between relatively closed and more connected lakes (Fig. 1a). When only the most open and the most closed marine lake from each of Berau and Papua were included (Fig. 1c), the factors location (Berau versus Papua; adonis F(1,8) = 3.34, R² = 0.207, P < 0.001) and connection (BrK and PaK versus BrT and PaM; adonis F(1,8) = 2.50, R² = 0.154, P = 0.003) were both highly significant predictors of variation in composition. There was also a highly significant interaction (adonis F(1,8) = 2.33, R² = 0.144, P = 0.002). The main axes of variation clustered samples according to location (Fig. 1a), forming three clusters: (1) samples from open sea habitat in Taiwan (Taw), (2) samples from marine lakes in Berau (BrK, BrM and BrT) and (3) samples from marine lakes in Papua (PaK and PaM). The first axis separated S. diversicolor samples from marine lakes in Berau from those sampled in the marine lakes in Papua and open sea habitat. The second axis separated samples from Taiwan from marine lake samples (Fig. 1a). The fourth axis of variation (Fig. 2) separated samples according to the presumed level of connection to the marine environment, with S. diversicolor samples from less connected marine lakes (BrK and PaK) at low axis 4 values and samples from more connected marine lakes (BrT and PaM) and open sea habitat at high axis 4 values. A further PCO analysis including water samples from marine lakes and open sea habitat revealed pronounced compositional differences between the prokaryotic communities of S. diversicolor specimens and those of the water (Fig. 1d). This was confirmed by adonis analysis (S. diversicolor versus water; adonis F(6,20) = 3.075, R² = 0.480, P < 0.001 and F(2,24) = 4.303, R² = 0.264, P < 0.001).
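Because adonis partitions the total sum of squares sequentially among the model terms and the residual, each pseudo-F can be recovered from the reported R² values: F = (R²_term / df_term) / (R²_residual / df_residual), with R²_residual equal to one minus the summed term R² values. The Python check below applies this relation to the statistics reported above, treating the sponge-versus-water models as single-term models; small discrepancies in the second decimal place reflect rounding of R² to three decimals. The residual df of 8 in the marine-lake model also implies 12 samples, i.e. three replicates in each of BrK, BrT, PaK and PaM.

```python
def pseudo_f(r2_term: float, df_term: int, r2_resid: float, df_resid: int) -> float:
    """PERMANOVA pseudo-F written in terms of R^2 (= SS_term / SS_total)."""
    return (r2_term / df_term) / (r2_resid / df_resid)


# Marine-lake model (Fig. 1c): three 1-df terms and 8 residual df.
terms = {"location": 0.207, "connection": 0.154, "interaction": 0.144}
r2_resid = 1 - sum(terms.values())
lake_f = {name: pseudo_f(r2, 1, r2_resid, 8) for name, r2 in terms.items()}

# Sponge versus water models (Fig. 1d), treated as single-term models.
water_f = [pseudo_f(0.480, 6, 1 - 0.480, 20), pseudo_f(0.264, 2, 1 - 0.264, 24)]

print({k: round(v, 2) for k, v in lake_f.items()}, [round(f, 2) for f in water_f])
```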
Overall, in S. diversicolor specimens, Proteobacteria was the most abundant phylum (108,947 sequences, 1801 OTUs), followed by Cyanobacteria (29,696 sequences, 89 OTUs), Bacteroidetes (8110 sequences, 448 OTUs), Actinobacteria (6335 sequences, 155 OTUs), Planctomycetes (4943 sequences, 426 OTUs) and Spirochaetes (2972 sequences, 37 OTUs) (Supplementary File 1). Proteobacteria and Cyanobacteria were the most abundant phyla at all locations (Fig. 3). After correcting for multiple comparisons, however, the only significant differences among phyla were a higher relative abundance of Cyanobacteria in marine lakes with lower connectivity to the open sea (BrK, BrM and PaK) than in the more connected Berau marine lake (BrT) and in the open sea habitat in Taiwan, and a higher relative abundance of Thaumarchaeota in Taiwan than in all other habitats (Supplementary File 4). At lower taxonomic levels, the relative abundance of Deltaproteobacteria and of the order Rhodobacterales also differed significantly among lakes. Deltaproteobacteria were significantly more abundant in Haji Buang than in all other habitats, and in lake K than in lake M in Papua. Rhodobacterales were significantly more abundant in lake Tanah Bamban than in all other habitats except Haji Buang (Supplementary File 4).
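As an arithmetic check, the pooled relative abundances implied by these sequence counts can be computed against the 168,000 rarefied sequences. The pooled Proteobacteria plus Cyanobacteria share (about 82.5%) lies within the 78 to 87% per-sample range given in the abstract; pooled and per-sample percentages are, however, different quantities.

```python
TOTAL = 16 * 10_500   # rarefied sponge sequences (168,000)

phyla = {
    "Proteobacteria": 108_947,
    "Cyanobacteria": 29_696,
    "Bacteroidetes": 8_110,
    "Actinobacteria": 6_335,
    "Planctomycetes": 4_943,
    "Spirochaetes": 2_972,
}

# Pooled percentage of all rarefied sequences per phylum.
share = {name: 100 * n / TOTAL for name, n in phyla.items()}
for name, pct in share.items():
    print(f"{name:<15} {pct:5.1f}%")

combined = share["Proteobacteria"] + share["Cyanobacteria"]
print(f"Proteobacteria + Cyanobacteria: {combined:.1f}%")
```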
The most abundant OTUs were OTUs 1 (46,749 sequences), 2 (26,240 sequences) and 3 (20,608 sequences) (Fig. 4). The heatmap in Fig. 4, which only includes the most abundant OTUs, confirms the results of the PCO ordination, with samples from open sea habitat in Taiwan and from marine lake habitat in Berau and Papua forming distinct clusters. In total, 28 OTUs with ≥ 400 sequences significantly discriminated (P < 0.01) between pairs of habitats (Fig. 4 and Supplementary File 5). For example, OTU-1, assigned to the Rhodospirillales, and related to an organism obtained  Files 5 and 6). OTU-3, in contrast, also assigned to the Rhodospirillales, was most abundant in Kakaban and Taiwan and was closely related to the same marine sediment organism as OTU-1. It significantly discriminated between Taiwan, Kakaban and the remaining marine lakes. OTU-167, assigned to the Thaumarchaeota, is closely related to an organism obtained from sediment and significantly discriminated between open sea habitat in Taiwan and all marine lakes (Supplementary File 6). Some OTUs were locally abundant in a single marine lake. For example, OTU-49, assigned to the Actinobacteria, significantly discriminated between hosts from Haji Buang and hosts from all other habitats. OTUs 2 and 180, assigned to the Cyanobacteria, were least abundant in Tanah Bamban marine lake. OTU-180 significantly discriminated between hosts from the more connected Tanah Bamban marine lake and all other marine lakes. OTUs 2 and 180 are closely related to organisms obtained from seawater in Northern China (100% and 99.77% similarity, respectively).
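SIMPER ranks OTUs by their average contribution to the Bray-Curtis dissimilarity between two groups: for each between-group pair of samples i and j, OTU k contributes |x_ik − x_jk| / Σ(x_i + x_j), and these contributions are averaged over all pairs. The NumPy sketch below implements that decomposition on a hypothetical matrix. It omits the permutation p-values that vegan's simper() can report, on which the significance threshold above depends.

```python
import numpy as np


def simper(X, groups, g1, g2):
    """Average per-OTU contribution to Bray-Curtis dissimilarity between g1 and g2."""
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    A, B = X[groups == g1], X[groups == g2]
    # One row per between-group sample pair, one column per OTU.
    contrib = np.array([np.abs(a - b) / (a + b).sum() for a in A for b in B])
    avg = contrib.mean(axis=0)             # one value per OTU
    ranking = np.argsort(avg)[::-1]        # most discriminating OTUs first
    return avg, ranking


# Hypothetical (e.g. log-transformed) abundances: 4 samples x 3 OTUs.
X = [[3.0, 0.0, 1.0], [2.5, 0.5, 1.0], [0.0, 2.8, 1.0], [0.4, 3.1, 1.0]]
groups = ["lake", "lake", "sea", "sea"]
avg, ranking = simper(X, groups, "lake", "sea")
print(np.round(avg, 3), ranking)
```

The per-OTU contributions sum to the mean between-group Bray-Curtis dissimilarity, which is what makes them interpretable as shares of the compositional difference.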

Discussion
Much research in sponge microbial ecology has focused on the prevalent role of sponge identity or biotope in structuring the microbial community, often neglecting the role of geographical variation (Webster et al. 2010; Lee et al. 2011). In the present study, we assessed the prokaryotic communities associated with the sponge S. diversicolor across a range of habitats, from open sea to marine lakes, and across different geographical areas. OTUs assigned to Proteobacteria and Cyanobacteria were the most abundant members of the prokaryotic community of S. diversicolor. Less abundant phyla detected in this study included Actinobacteria, Bacteroidetes, Planctomycetes, Spirochaetae, Tenericutes, Firmicutes and Chlamydiae, all of which have been observed in previous sponge surveys (Hentschel et al. 2002; Giles et al. 2012; Schmitt et al. 2012; Cleary et al. 2013, 2015, 2018a; de Voogd et al. 2015).
At a lower taxonomic level, Alphaproteobacteria were by far the most abundant class, followed by Gamma-, Delta- and Betaproteobacteria. Members of these taxa have been observed in different sponge species from several locations, such as the Indian and Pacific Oceans, the Caribbean, Mediterranean and Red Sea, the Berau Delta and barrier reef system in East Kalimantan, Indonesia (Cleary et al. 2013), Orpheus Island in Australia (Webster et al. 2013) and Korsfjord in Norway (Jensen et al. 2017). The dominance of Alphaproteobacteria was mainly due to members of the order Rhodospirillales and, to a much lesser extent, the order Rhodobacterales. Members of the Rhodospirillales (purple non-sulphur bacteria) and Rhodobacterales have previously been found in corals and in water (McDevitt-Irwin et al. 2017; Thiele et al. 2017). Previous studies have also associated Rhodospirillales and Rhodobacterales with sponges. Cleary et al. (2015) reported Rhodospirillales in sediment samples and in samples of the sponge species Hyrtios erectus and Xestospongia testudinaria. Recently, Karimi et al. (2019) obtained 46 isolates from the sponge species Spongia officinalis, 38 of which were assigned to the order Rhodobacterales. Other studies, however, reported Rhodobacterales as more abundant in seawater samples than in sponge samples (Cleary et al. 2018b, 2018c). McDevitt-Irwin et al. (2017) reported an increased abundance of these taxa in response to different stressors (climate change, water pollution and overfishing), and Thiele et al. (2017) observed members of these orders in water samples close to a polluted area.
In the present study, the recently updated SILVA database (Balvočiūtė and Huson 2017) was used instead of the Greengenes database, which has not been updated since May 2013. The same representative sequence was assigned to the order Rhodospirillales using the SILVA database and to the order Kiloniellales using the Greengenes database. In this study, Rhodospirillales was the most abundant alphaproteobacterial order, whereas Cleary et al. (2013) and Coelho et al. (2018) previously identified OTUs assigned to the Kiloniellales as the most dominant members of the Alphaproteobacteria in S. diversicolor. This difference is thus due to the use of different taxonomic databases, namely SILVA versus Greengenes. The same applies to lower cyanobacterial taxa. For example, the representative sequence of OTU-2 was assigned to subsection I at order level using the SILVA database and to the order Synechococcales using the Greengenes database.
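Comparisons across studies therefore need to be made at matching ranks and, ideally, against the same reference database. A small helper that extracts a given rank from QIIME-style taxonomy strings makes such side-by-side comparisons explicit. The two strings below are illustrative, assuming the SILVA 128 QIIME prefix style (`D_0__`, `D_1__`, ...) and the Greengenes style (`k__`, `p__`, ...); they are not copied from this study's output files.

```python
def rank_name(taxonomy: str, rank_index: int) -> str:
    """Name at a given rank (0 = domain/kingdom) of a QIIME-style taxonomy string."""
    levels = [lvl.strip() for lvl in taxonomy.split(";") if lvl.strip()]
    if rank_index >= len(levels):
        return "unassigned"
    name = levels[rank_index].split("__", 1)[-1]   # drop the rank prefix
    return name or "unassigned"                    # empty label, e.g. 'o__'


ORDER = 3  # domain, phylum, class, order
# Illustrative assignments for one representative sequence (hypothetical strings).
silva = "D_0__Bacteria;D_1__Proteobacteria;D_2__Alphaproteobacteria;D_3__Rhodospirillales"
greengenes = "k__Bacteria; p__Proteobacteria; c__Alphaproteobacteria; o__Kiloniellales"

print(rank_name(silva, ORDER), "vs", rank_name(greengenes, ORDER))
```

Both strings agree down to class level and diverge only at order, which is the pattern described above for the dominant alphaproteobacterial OTUs.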
Archaea made up a relatively minor component of the prokaryotic communities of S. diversicolor. There was, however, a significantly higher relative abundance of Thaumarchaeota in open sea samples from Taiwan than in all other habitats. Previous studies have shown that members of the Thaumarchaeota can be abundant in various sponge species (Turque et al. 2010; Polónia et al. 2015, 2016; Coelho et al. 2018). Polónia et al. (2015) previously showed that the abundance of mesophilic Crenarchaeota (now placed in the phylum Thaumarchaeota) may be an indication of pollution levels. The open sea samples of S. diversicolor in Taiwan were also collected close to the densely inhabited main Penghu island of Magong, which is subject to a number of perturbations related to fishing, tourism and local industry. Alternatively, environmental conditions inside marine lakes may be unfavourable to members of the Thaumarchaeota. Nitrosopumilus members have been shown to predominate in coastal environments compared with freshwater or lake environments (Xie et al. 2014; Polónia and Cleary 2019), while the Cenarchaeum symbiosum species complex seems to include phylotypes specialised in open water as well as phylotypes specialised in marine lake environments.
Overall, S. diversicolor samples clustered primarily according to geographical location, suggesting that spatial factors, or local environmental conditions prevalent in each location, play a role in structuring the prokaryotic communities of this sponge species. This finding is in line with recent studies (Luter et al. 2015; Swierts et al. 2018). Luter et al. (2015) observed that the microbial community of Carteriospongia foliascens varied according to geographical location, while Swierts et al. (2018) showed that location was a more important explanatory factor than genetic relatedness in predicting variation in the prokaryotic communities of closely related Xestospongia species. Schmitt et al. (2012), studying sponges from eight different locations, reported that subtropical sponge microbial communities were more similar to each other than to those found in tropical sponges. Sponge and water prokaryotic communities also differed significantly in the present study, corroborating previous studies showing that sponge-associated prokaryotic communities differ from those found in seawater (Hentschel et al. 2002; Taylor et al. 2005, 2007; Schmitt et al. 2012; Cleary et al. 2018a).
There was also evidence that lake connectivity to the open sea plays a role in structuring prokaryotic community composition, albeit a secondary one. The limited connection to the sea of lake Kakaban and lake K in Papua suggests that they behave as meromictic lakes. Indeed, both lakes are characterised by pronounced stratification of the water column due to, among other things, limited water turbulence (Becking et al. 2011; Gotoh et al. 2011; Saitoh et al. 2011). Differences in physical conditions can greatly influence phytoplankton communities (Reynolds 1980). The marine lakes of Berau were also shown to have archaeal, bacterial and eukaryotic planktonic communities distinct from those found in the surrounding open sea (Cleary and Gomes 2019). Low water turbulence, as found in sheltered or meromictic lakes, can lead to the formation of dense cyanobacterial populations, whereas when turbulence is high or mixing patterns are irregular, Cyanobacteria tend to be out-competed (Steinberg and Hartmann 1988). This is consistent with our finding that S. diversicolor hosts from habitats with a more pronounced connection to the open sea (Tanah Bamban and Taiwan) harboured a lower abundance of OTUs assigned to Cyanobacteria, namely OTUs 2 and 180.
pH also tends to be lower in lakes with a limited connection to the open sea. Cyanobacteria, particularly members of the genus Synechococcus, have been associated with low-pH environments, suggesting an environmental mechanism through which lake connectivity may structure the prokaryotic communities of S. diversicolor (Cleary et al. 2013, 2018b; Morrow et al. 2016; Coelho et al. 2018; Cleary and Gomes 2019). The Synechococcophycideae are, furthermore, known to be widely distributed in marine environments, and a subgroup containing important sponge symbionts has been identified (Burgsdorf et al. 2015). Members of the Synechococcophycideae include photosymbionts that play a significant role in biogeochemical cycles and can fix carbon and transfer it to the sponge host (Freeman et al. 2013; Callieri 2017).
Despite pronounced differences in the relative abundance of several OTUs among habitats, these differences were much less pronounced at higher taxonomic levels, with the exception of a few taxa including the Cyanobacteria and Thaumarchaeota. In other words, hosts from different locations shared similar higher-level taxa but harboured location-specific OTUs. Such OTU-level geographic specificity has previously been suggested for the giant barrel sponge species complex Xestospongia spp. (Swierts et al. 2018). Together, these results suggest that S. diversicolor hosts provide a similar environment to their symbionts and that OTU composition may be driven by stochastic factors, including spatial processes (Goldford et al. 2018).
OTUs 2 and 180, assigned to Cyanobacteria and, at a lower taxonomic level, to the order Synechococcales, were present in all habitats but significantly discriminated lake Tanah Bamban from the other marine lakes. The most abundant OTU overall, OTU-1, was assigned to the order Rhodospirillales and was recorded in every specimen of S. diversicolor, although its abundance varied and was highest in the Papuan marine lakes. Members of the Rhodospirillaceae can grow under a wide range of conditions, including chemotrophic, phototrophic, organotrophic and autotrophic growth as well as anoxic conditions (Esposti et al. 2019). Additionally, some members of the Rhodospirillales can thrive in darker habitats because they can use light in the infrared range (Solon et al. 2018). This ability may be advantageous for S. diversicolor in marine lakes, where light penetration is reduced by a dense canopy or by coloured terrestrial organic matter (Karlsson et al. 2009). Although recorded in all samples, the order Rhodospirillales was most abundant in marine lake samples from Papua and open sea samples from Taiwan, and less abundant in the marine lakes of Berau, particularly lake Haji Buang.

Conclusion
In recent years, a number of studies have shown that sponge-associated microbial communities are largely host species specific and tend to be stable across different geographical and environmental settings (Webster and Thomas 2016). Although previous studies have shown that the host sponge species is the major factor structuring the sponge symbiont community, our results suggest that geographical location and, for sponges inhabiting marine lakes, the degree of connectivity to the surrounding marine environment also play a significant role in structuring sponge prokaryotic communities.
Additional file 6. List of the most abundant OTUs (≥ 400 sequences), including OTU numbers; total sequences (Abund); taxonomic assignment of each OTU; GenBank GenInfo sequence identifiers (GI) of closely related organisms identified using BLAST; sequence identity (Seq) of these organisms with our representative OTU sequences; and isolation source of the organisms identified using BLAST (Source).