Identification of small RNAs involved in nitrogen fixation in Anabaena sp. PCC 7120 based on RNA-seq under steady state conditions

Anabaena sp. PCC7120 is a genetically tractable model organism for nitrogen fixation and photosynthesis research. The importance of small regulatory RNAs (sRNAs) as mediators of a number of cellular processes in bacteria has begun to be recognized. Bacterial sRNA binds to target genes through base pairing, and play a regulatory role. Many studies have shown that bacterial sRNA can regulate cell stress response, carbon and nitrogen fixation, and so on. However, little is known about sRNAs in Anabaena sp. PCC 7120 regarded to nitrogen fixation under later steady state. To provide a comprehensive study of sRNAs in this model organism, the sRNA (< 200 nt) extracted from Anabaena sp. PCC 7120 under nitrogen step-down treatment of 12 days, together with the sRNA from the control, was analyzed using deep RNA sequencing. Possible target genes regulated by all identified putative sRNAs were predicted by IntaRNA and further analyzed for functional categorizations for biological pathways. Totally, 14,132 transcripts were produced from the de novo assembly. Among them, transcripts that are located either in the intergenic region or antisense strand were kept, which resulted in 1219 sRNA candidates, for further analysis. RPKM-based differential expression analysis showed that 418 sRNAs were significantly differentially expressed between the samples from control (nitrogen addition, N+) and nitrogen depletion, (N−). Among them, 303 sRNAs were significantly upregulated, whereas 115 sRNAs were significantly downregulated. RT-PCR of 18 randomly chosen sRNAs showed a similar pattern as RNA-seq result, which confirmed the reliability of the RNA-seq data. In addition, the possible target genes regulated by unique sRNAs of Anabaena sp. PCC 7120 under nitrogen addition (N+) condition or that under nitrogen depletion (N−) condition were analyzed for functional categorization and biological pathways, which provided the evidences that sRNAs were indeed involved in many different metabolic pathways. The information from the present study provides a valuable reference for understanding the sRNA-mediated regulation of the nitrogen fixation in Anabaena PCC 7120 under steady state conditions.


Introduction
Cyanobacteria are a group of photoautotrophs originated nearly 3.5 billion years ago, during which they have evolved a diverse array of metabolic capabilities (Carr and Whitton 1973;Herrero and Flores 2008;Kaushik et al. 2016;Srivastava et al. 2017). They have been studied as prokaryotic hosts for sustainable biofuel production due to their high photosynthetic efficiency, genetic manipulability, and diverse metabolic pathways (Dismukes et al. 2008;Ducat et al. 2011;Halfmann et al. 2014b;Halfmann et al. 2014a;Chen et al. 2015;Burnat et al. 2018;Singh et al. 2019;Ishikawa et al. 2019). Some filamentous cyanobacteria, including Anabaena PCC strain 7120, are able to differentiate heterocyst to fix atmospheric dinitrogen in response to nitrogen deficiency Kumar et al. 2010;Xu et al. 2015;Esteves-Ferreira et al. 2018;Olmedo-Verd et al. 2019). Understanding of the underlying mechanisms of nitrogen fixation and diazotrophic growth in cyanobacteria will shed light on basic mechanisms of bacterial genetic regulation and physiology character. Besides, it can help the catalog of genetically modified cyanobacterial strains to increase and produce renewable chemicals and biofuels.
In Anabaena PCC 7120, 5 to 10% vegetative cells along the filament differentiate into heterocysts in a semi-regular pattern after deprivation of combined nitrogen (Maldener and Muro-Pastor 2001). Metabolite exchange occurs between the heterocysts and neighboring vegetative cells, that is, vegetative cells supply fixed carbon, such as sucrose, to adjacent heterocysts, while heterocysts provide vegetative cells with fixed nitrogen in the form of amino acids (López-Igual et al. 2010;Vargas et al. 2010;Brenes-Álvarez et al. 2019). It is known that the heterocysts differentiate from vegetative cells, involving substantial changes in cell morphology and physiology that result in a micro-oxic environment for the expression and function of oxygen-sensitive nitrogenase (Wolk et al. 2004;Flores and Herrero 2010). The development of heterocysts is mainly regulated by two transcription factors, NtcA and HetR (Buikema and Haselkorn 1991;Herrero et al. 2004;Zhao et al. 2010;Chen et al. 2011;Herrero et al. 2013). Deprivation of combined nitrogen leads to the accumulation of 2oxogluatarate which is the backbone for nitrogen assimilation (Laurent et al. 2005). The increased cellular levels of 2-oxoglutarate in consequence activates NtcA to initiate the heterocysts differentiation . The hetR gene is required for an early step of heterocyst differentiation, whenever under N− conditions or N+ conditions (Buikema and Haselkorn 1991).
Small regulatory RNAs (sRNAs) are known to be key genetic regulators in many organisms including bacteria, plants, and animals (Babski et al. 2014; Morris and Mattick 2014;Higo et al. 2017;Lambrecht et al. 2018). They perform many important regulatory functions, such as in stress response, regulation of virulence genes, carbon source uptake, and metabolism (Gottesman 2005;Gottesman and Storz 2011). In bacteria, sRNAs typically range from 50 to 200 nt in length , and are often encoded in intergenic regions in trans to their target genes or in cis on the opposite strand of their target gene (Babski et al. 2014). The former ones usually interact with their target mRNAs near the ribosomal binding site (RBS) through imperfect base-pairing. The latter ones have full complementarity to their target mRNA. The interactions between sRNA and its targeted mRNA can result in gene translational repression by masking the RBS or activation by making the RBS accessible, as well as affecting mRNA degradation and mRNA stability Babski et al. 2014).
sRNAs in cyanobacteria are frequently implicated in photosynthesis, stress responses, and nitrogen fixation (Nakamura et al. 2007;Hernández et al. 2010;Ionescu et al. 2010;Gong and Xu 2012;Georg et al. 2014;Muro-Pastor 2014;Klähn et al. 2015;de Porcellinis et al. 2016;Higo et al. 2018). For example, in Synechocystis sp. PCC 6803, the small RNA PsrR1 is a regulatory factor controlling photosynthetic functions. In filamentous cyanobacterium Anabaena PCC 7120, furA, the gene for the ferric uptake regulator, is interfered by a cis-acting antisense RNA whose knockout mutation results in an iron deficiency phenotype (Hernández et al. 2006;Hernández et al. 2010). In unicellular cyanobacterium Synechocystis, the antisense RNA IsrR regulates the expression of iron stress-induced protein IsiA in a co-degradation mechanism under iron stress (Dühring et al. 2006).
Examples of sRNA involved in nitrogen fixation in cyanobacterium are nitrogen stress-induced RNAs (NsiR). NsiR1 is the first known bacterial non-coding RNA specifically upregulated in response to nitrogen step-down, and its expression requires NtcA and HetR (Ionescu et al. 2010;Mitschke et al. 2011a), it has been suggested to be an early marker for cells undergoing differentiation (Muro-Pastor 2014). NsiR2 and NsiR3 that were identified in a differential RNA-seq analysis to Anabaena PCC 7120 annotated an unknown function (Mitschke et al. 2011a). NsiR4 was firstly found to be strongly upregulated in Synechocystis sp. PCC 6803 under nitrogen depletion (Mitschke et al. 2011b;Kopf et al. 2014). Later, the ortholog in Anabaena PCC 7120 was found to play an important role in the regulation of glutamine synthetase, a key enzyme in biological nitrogen assimilation (Klähn et al. 2015). Recently, using the Northern blot, a study showed that NsiR8 and NsiR9 were strongly induced in the WT control, but not in a HetR mutant subjected to a 24-h nitrogen step-down treatment, suggesting they could be related to heterocysts differentiation or function (Brenes-Álvarez et al. 2016).
Although many sRNAs related with nitrogen fixation have been identified in Anabaena PCC 7120 using RNA sequencing-based approaches (Flaherty et al. 2011;Mitschke et al. 2011a). They were all identified from cells at the early stage (1-24 h) of nitrogen depletion. However, up to now, there has not been any report on the characterization of the sRNA regulation profiles in Anabaena PCC 7120 under steady state after nitrogen step-down. In this study, we have employed the RNAseq method to analyze the sRNA transcripts (< 200 nt) in Anabaena PCC 7120 after 12 d treatment of nitrogen step-down. By applying this approach, 1219 sRNA candidates were identified. Among them, 1124 were shared by the control of nitrogen addition (N+) and the treatment of nitrogen depletion (N−); 46 ones were unique in the N− group and 49 in the control group. Possible target genes regulated by all identified putative sRNAs were predicted by IntaRNA and further analyzed for functional categorizations for biological pathways. Overall, our results provide a new insight towards understanding the complex regulatory network of sRNAs in Anabaena sp. PCC 7120 after combined nitrogen depletion under steady state conditions.

Strains, growth conditions, and sample preparations
Anabaena PCC 7120 was grown in AA/8 medium with KNO 3 and NaNO 3 as the combined nitrogen source (Allen and Arnon 1955) and incubated at 30°C with shaking at 120 rpm in a shaker under continuous whitelight (ca. 50 μE m −2 s −1 ) illumination. To conduct the nitrogen step-down treatment, Anabaena PCC 7120 grown in AA/8(N+) medium were collected by centrifugation at 5000 × g for 10 min and washed 3 times in AA/8 nitrate free medium AA/8(N−). Then cells were resuspended and grown in 300-ml AA/8(N−) medium until OD 700 was around 0.5. The nitrogen step-down treatments were repeated 4 times. After culturing 12 d, cells were harvested by centrifugation at 5000 × g for 10 min, cell pellets were quickly frozen in liquid nitrogen and stored at − 80°C for later use. The control group was prepared similarly as the nitrogen-deprived samples except grown in AA/8(N+) medium.

RNA isolation and sequencing
Total RNA from Anabaena sp. PCC 7120 strain samples cultured on media of AA/8(N+) and AA/8(N−) were separately extracted using TRIzol® reagent (INVITRO-GEN, Carlsbad, CA, USA). The Anabaena sp. PCC 7120 strain samples above cultured on media of AA/8(N+) and AA/8(N−) were mixtures of three repeated samples.
RNA quantity and quality including concentration, RIN value, and 23S/16S ratio were assessed using Agilent 2100 Bioanalyzer (Agilent Technologies). Samples with high quality of 260/280 and 260/230 ratios were selected for Illumina sequencing library construction. RNA samples were treated with DNase I (Promega), and 3 μg of total RNA from each sample was electrophoresed by agarose gel electrophoresis for RNA fractionation, then the electrophoretic bands of 18-200 nt RNA segments were excised from the PAGE gel. The library was constructed according to Illumina TruSeq TM Small RNA Sample Preparation protocol. The sequencing run was conducted on an Illumina HiSeq™ 2000 platform at the Beijing Genome Institute, Shenzhen, China. The deep sequencing data have been submitted to the NCBI Sequence Read Archive with the accession number SUB1682364.

Sequence data processing, assembly, and annotation
The raw reads generated from the high-throughput sequencing were firstly cleaned by removing adaptor sequences. Then sequences were filtered off based on low quality value (Q 20 ratio > 40%) and the high N numbers in the reads (> 10%).Then, all the clean reads from all replications were merged and subjected to de novo assembly using the Trinity program (Haas et al. 2013). The assembled transcripts were mapped onto the Anabaena sp. PCC 7120 genome and its six megaplasmids using BWA (Li and Durbin 2009). For the annotation of candidate sRNAs, we searched databases of sRNAMap (Huang et al. 2009), sRNATarBase (Cao et al. 2010), SIPHI (http://newbio.cs.wisc.edu/sRNA/), and BSRD (Li et al. 2012) based on the similarity of sequences using Blast (e value < 0.00001).

Identification of differentially expressed sRNA
The RPKM method (reads per kilo base per million mapped reads) was used for length normalization and calculation of the transcript expression levels (Mortazavi et al. 2008). An FDR (false discovery rate) of < 0.001 was used as the threshold p value in multiple tests to judge the degree of differences in gene expression (Reiner et al. 2003). In this study, the sRNA differential expression between two groups was considered when the p value was less than 0.001 and the expression level was at least a two-fold change between the two groups.

Target prediction
To obtain the target information, we searched the target genes of all the putative sRNAs by using the program IntaRNA (Busch et al. 2008;Wright et al. 2014) to take the sRNAs as query with default parameters. GO annotations with the default parameters were performed with the functional annotation tool WEIGO (Ye et al. 2006).
All putative sRNA/target pairs were subsequently filtered by correlation analysis of the sRNA and transcript expression. Only sRNA/target pairs with strong negative correlation were retained as putatively sRNA regulated transcripts.

qRT-PCR validation
Total RNA was treated with RNase-free DNase I (FER-MENTAS, Life Sciences, Germany), then it was used as a template to synthesize first-strand cDNA using a One Step SYBR PrimeScript RT-PCR kit (TAKARA) following the manual. We selected 18 differentially expressed sRNA for qRT-PCR analysis to evaluate our Illumina sequencing result. The sRNA primers were designed using Primer 5 and listed in Table 1. The expression level of the 18 sRNA was analyzed by using the comparative CT method (2 -ΔΔCT method). The housekeeping gene rrn16Sa in Anabaena sp. PCC7120 was used as the internal control for normalization (Pinto et al., 2012), and used the Mix of AceQ TM qPCR SYBR® Green Master to establish the qRT-PCR reaction system. qRT-PCR reactions were carried out by using BIO-RAD CFX96 in three biological repeats for every treatment. Primers, reaction systems, and programs are listed in Additional file. GraphPad Prism 6 was applied to analysis and construction.

Strain growth observations on different media
When cultured on different media, filament cells of Anabaena sp. PCC 7120 showed different growth, heterocysts differentiated directly on AA/8(N−) medium (Fig. 1a, c) after 3 d culturing; however, only the vegetative cells grown on AA/8(N+) medium (Fig. 1b, d). It is obvious that nitrogen step-down of the medium induced the heterocysts differentiation.

sRNA assembly and functional annotation
To comprehensively identify sRNA involved in nitrogen fixation at a steady state, we compared the sRNAs profile (18-200 nt) of Anabaena sp. PCC 7120 grown on media of AA/8(N+) and AA/8(N−) after 12 d nitrogen step-down treatment. A total of 13,219,856 clean reads were obtained from the control group (N+), and 13,778, 584 reads from the nitrogen-depleted group (N−); while Q ≥ 20. Totally, 14,132 transcripts were produced from the de novo assembly. These assembled transcripts were then mapped onto the Anabaena sp. PCC 7120 genome (92.96%) and its 6 megaplasmids (7.04%). The mapped transcripts were classified into 4 groups: group I contains sRNAs that were partially overlapped to mRNA (PM, 6.68%); group II comprises intergenic sRNAs that were mapped to an intergenic region (IGR, 24.24%); group III were referred to as antisense sRNAs which were located antisense to known genes (AM, 29.76%); and group IV were those located within mRNAs (IM, 39.32%) (Fig. 2). Transcripts that were placed into either group II or III, with a total expression both in N+ and N− media bigger than 20 were kept as candidate sRNAs; thus, a total of 1219 candidate sRNAs were selected for further analysis   Table 1). It should be noted that 157 candidate sRNAs were matched to the registered sRNAs by searching the databases of sRNAMap (Huang et al. 2009), sRNATarBase (Cao et al. 2010), SIPHI (http://newbio.cs. wisc.edu/sRNA/), and BSRD (Li et al. 2012), indicating the reliability of our method. As more as 83.1% of the candidate sRNAs was 20-80 nt; the size distribution of the candidate sRNAs is shown in Supplementary Fig. 1. Among all the sRNA candidates, 1124 were shared by the nitrogen-depleted group and the control group; 46 were unique in the nitrogen-depleted group and 49 in the control group.

sRNA differential expression
We identified differentially expressed sRNAs between the control (N+) and nitrogen depletion treatment (N−) groups by comparing the relative transcript abundance using the RPKM-based method (supplementary Table  2). Principally, any sRNA that has a value of (|log2Ratio| ≥ 1 and FDR ≤ 0.001) was kept as significantly differentially expressed sRNA. Totally, 418 sRNA were found to be differentially expressed between the control (N+) and nitrogen-depleted (N−) samples (supplementary By adding different adaptors to the 3′ and 5′ ends of each RNA molecule in the sample prior to cDNA synthesis, RNA-seq analysis here enabled us to distinguish the direction of each transcription. In total, 110 antisense RNAs were found and 29 of them were significantly differentially expressed between the (N−) and (N+) groups (Table 2). In addition, 1109 intergenic region sRNAs were found and 389 were significantly differentially expressed (Table 3). We further characterized these antisense sRNAs and intergenic region sRNAs by retrieving putative target gene function information from the cyanobase (Table 2 and Table 3).

RT-PCR validation
For validation of the Illumina sequencing results, a number of 18 sRNA were randomly chosen for qRT-PCR analysis. The qRT-PCR results showed in Fig. 3, 11 sRNA showed significant differences in the relative expression between N+ and N− groups, and 7 sRNAs were in a significantly upregulated trend in N− group compared with N+ group, while 4 sRNAs were significantly downregulated in N− group. Similar trends were observed for the expression of these 18 sRNA analyzed by the 2 methods, the biggest discrepancy was around 7fold difference in sRNA1133, and the lowest was around 0.5-fold difference in sRNA0703 (Fig. 3). However, the relative expression results from these 2 analyses did not match perfectly, perhaps due to sequencing biases or different normalization controls.

sRNA target gene prediction
To better understand the role of sRNAs in different biological modules, sRNA-regulating target genes were predicted using IntaRNA (Busch et al. 2008;Wright et al. 2014). Totally, 6043 target genes, which covering 97.11% of ORFs in Anabaena 7120, were found from the prediction using the 418 differentially expressed sRNAs as queries (supplementary Table 3). This indicates that single sRNA could target multiple genes. To further explore the differences of biological modules between the (N+) and (N−) groups, target genes that were predicted using the 49 (N+) unique sRNAs and 46 (N-) unique sRNAs as queries were subjected to functional categorization using the Gene ontology (GO) database (Consortium 2015). In total, 1674 target genes were predicted from the (N+) unique sRNAs, and 1811 were predicted from the (N−) unique sRNAs, while 630 were predicted in both. Functional annotation tool WEIGO (Ye et al. 2006) was used to analyze the target GO annotation terms. As shown in Fig. 4, there is no major difference for the GO analysis between these 2 groups. The GO terms were distributed widely with regard to their respective biological processes, from metabolic process to translation or transcription regulation and signal transducing.

Discussion
With the development of high-through sequencing methods such as RNA-seq, small RNAs have been increasingly recognized as major modulators of gene expression in bacteria (Prévost et al. 2011;Eisenhut et al. 2012;Sakurai et al. 2012;Yan et al. 2013;Behrens et al. 2014;Papenfort et al. 2015 ;Luo et al. 2019). However, only a few studies have sought to identify the global profile of sRNAs in cyanobacteria (Voß et al. 2007;Georg et al. 2009;Voß et al. 2009;Mitschke et al. 2011b;Bi et al. 2018). Flaherty et al. (2011) used the directional RNA-seq to analyze the Anabaena transcriptome during nitrogen step-down. Their RNA-seq data provided information on transcript abundance and boundaries, including detection of operons and the length of the untranslated region (UTR) of each transcript. Totally, they found that 434 and 396 genes were substantially upregulated at 12 h and 21 h, respectively. In contrast, only 32 and 35 genes were downregulated at 12 h and 21 h, respectively (Flaherty et al. 2011). Using the differential RNA sequencing method, Mitschke et al. (2011aMitschke et al. ( , 2011b addressed the differential use of transcriptional start sites (TSS) in Anabaena PCC 7120 after nitrogen depletion in the wild type strain and a hetR mutant which was unable to differentiate heterocysts (Buikema and Haselkorn 1991;Mitschke et al. 2011a). They compared RNA profiles between these 2 strains grown in medium with or without combined nitrogen for 8 h. They identified > 900 putative TSS with a minimal 8-fold change in response to nitrogen deficiency. Among them, 209 were not induced in the hetR mutant indicating their involvement in heterocyst differentiation (Mitschke et al. 2011a). No study has been carried out to characterize the transcriptome in Anabaena PCC 7120 at later steady stage after combined nitrogen deprivation. In this study, we utilized the deep RNA-seq method for the identification of sRNAs in the model cyanobacterium Anabaena sp. strain PCC 7120 under steady state after 12-day nitrogen step-down. We identified 14,132 transcripts in total, which was comparable to transcripts (13,705) identified by Mitschke et al. (2011aMitschke et al. ( , 2011b. Generally, sRNAs are transcribed from the non-coding regions, such as the antisense region and 5′ or 3′ UTR region (Gottesman and Storz 2011;Tsai et al. 2015). So we kept 1219 sRNAs that were located either on the antisense strand or intergenic region for further analysis (supplementary Table  2). Among them, 29 antisense sRNAs and 389 intergenic sRNAs were significantly differently expressed between the (N−) and (N+) samples (supplementary Table 3).
The 2 studies based on deep sequencing mentioned above have shown the presence of multiple non-coding RNA and antisense transcripts (Flaherty et al. 2011;Mitschke et al. 2011a). For example, Mitschke et al.  Table 3 Intergenic region sRNA with more than 10-fold expression change after nitrogen step-down  Table 3 Intergenic region sRNA with more than 10-fold expression change after nitrogen step-down (Continued) Our RNA-seq data showed sRNAs throughout the Anabaena transcriptome. For example, we identified an antisense transcript sRNA0414 covering 21 nt of 3′ end of alr1690 and 73-nt downstream sequence. It partially overlapped with a previously identified cis-acting antisense RNA, α-furA, which was co-transcribed with alr1690 and could modulate ferric uptake regulation Table 3 Intergenic region sRNA with more than 10-fold expression change after nitrogen step-down (Continued)  Fig. 3 Comparison of the expression profiles of selected genes as determined by Illumina Hiseq TM 2000 sequencing (grey) and qRT-PCR (black). Error bars indicated standard deviations of averages from three replicates. sRNAs (1-18) represent sRNA0246, sRNA0320, sRNA0411, sRNA0423, sRNA0468, sRNA0530, sRNA0580, sRNA0645, sRNA0703, sRNA1133, sRNA0092, sRNA0188, sRNA0410, sRNA0699, sRNA0987, sRNA1073, sRNA1153, and sRNA1169, respectively protein FurA (All1691) expression (Hernández et al. 2006;Hernández et al. 2010). Also, we found 8 transcripts around a heterocyst regulatory gene alr3546 (hetF). One of them, sRNA0787, matched the noncoding RNA NsiR1 (Ionescu et al. 2010;Muro-Pastor 2014). This indicates that the NsiR1 might reach its expression peak around 12 h (Ionescu et al. 2010;Muro-Pastor 2014), then maintains a low abundance to regulate target genes at later steady state. Regulation of nitrogen fixation genes is the culminating event of heterocyst differentiation. Our RNA-seq data showed that strong upregulation of sRNA3661 and sRNA3671 that target intergenic region of all1517 (nifB)-all1518 and all1455 (nifH)-all1456 (nifU) after nitrogen step-down (Table 3). These transcripts might be the degradation of long mRNA of these genes, which indicates that nitrogenase enzymes still maintain active functions at later steady state. However, other well-characterized nitrogen fixation genes, such as heterocyst exopolysaccharide synthesis (hep) and heterocyst glycolipid synthesis (hgl) were not observed to be regulated by differently expressed sRNAs here. This indicates that the mature heterocyst might not require an upregulation of these genes at later steady state or not regulated by sRNA presented in this study.

Conclusion
We have comprehensively analyzed sRNA expression profile after nitrogen step-down using Anabaena PCC 7120 and detected 418 differentially expressed sRNAs. Though the targets of sRNAs are currently unknown and need to be further analyzed, the antisense transcripts and intergenic RNAs perhaps play important roles in regulation during the response to combined nitrogen deprivation. This study provides more information about the regulatory network of sRNAs and will advance our understanding of sRNA functions in cyanobacteria.