Plant metabolites are well recognized and explored. In contrast, the biosynthetic capability of fungi is relatively undervalued. Fungi (>3 million species) have enormous diversity and are rich in biocatalysts and secondary metabolites15,16,17. In particular, fungi produce a large portfolio of volatile terpenoids, including sesquiterpenes (e.g., β-caryophyllene and cubenol18) and monoterpenes (linalool14, limonene14, and 1,8-cineole19). To our knowledge, only a few fungal monoterpene synthases have been reported to date (Hyp3, a 1,8-cineole synthase19, and the very recently identified PpSTS25, a bifunctional myrcene/linalool synthase20). Here we address this gap by specifically exploring fungal linalool synthases.
Identification of linalool synthase in agaric mushroom
As A. aegerita is known to produce linalool and its genome has recently been sequenced21, we sought to identify the linalool synthase in its genome. In our previous study, we identified 11 putative sesquiterpene synthases, 9 of which were functional and produced various sesquiterpenes but not linalool3. A re-evaluation of the raw genomic data led to an additional putative TPS sequence (AAE3_109435, accession number MN954676). The gene is present in the Illumina sequencing data but absent from the PacBio results. PCR amplification of AAE3_109435 with subsequent sequencing confirmed the presence of the gene in the genome (Supplementary Fig. S1).
AAE3_109435 was subsequently expressed in a GPP-accumulating E. coli strain that co-expressed the native enzymes DXS and IDI and the ispA_S80F mutant (GPPS). The resulting strain (GPPS+9435) produced the acyclic monoterpene linalool as the main product and a small amount of nerolidol (Fig. 2 and mass spectra in Supplementary Fig. S2). Geraniol was detected in GPPS+9435 as well as in its control strain (GPPS_ctrl), indicating that geraniol is not the product of AAE3_109435 but instead that of native E. coli enzymes (such as PhoA, a phosphatase, and NudB, a Nudix hydrolase22). Furthermore, AAE3_109435 was expressed in an E. coli strain that accumulates FPP by overexpressing DXS and IDI (without ispA_S80F); this strain was named ‘FPPS+9435.’ Its main product (>96% of the peak area of all detected terpenes) was nerolidol, and only traces of linalool could be detected (Fig. 2A). The control strain (FPPS_ctrl) with an empty vector produced neither linalool nor nerolidol. These results show that the TPS encoded by AAE3_109435 is a bifunctional LNS, which is able to convert FPP into nerolidol and GPP into linalool. Accordingly, it is named Aa.LNS. Furthermore, the linalool produced by Aa.LNS is mainly (R)-linalool (95% ee, Fig. 2B).
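The relation between the reported enantiomeric excess and the underlying enantiomer composition can be sketched as follows. This is a minimal illustration that assumes the standard definition ee = (major − minor)/(major + minor); the function name is hypothetical and is not part of the workflow described here.

```python
from typing import Tuple


def enantiomer_fractions(ee_percent: float) -> Tuple[float, float]:
    """Return (major, minor) enantiomer shares in percent for a given ee.

    Assumes ee = (major - minor) / (major + minor) * 100
    and major + minor = 100.
    """
    if not 0.0 <= ee_percent <= 100.0:
        raise ValueError("ee must be between 0 and 100 percent")
    major = (100.0 + ee_percent) / 2.0
    minor = (100.0 - ee_percent) / 2.0
    return major, minor


# 95% ee is the value reported for linalool produced by Aa.LNS.
r_pct, s_pct = enantiomer_fractions(95.0)
print(f"(R)-linalool: {r_pct:.1f}%, (S)-linalool: {s_pct:.1f}%")
```

Under this definition, 95% ee corresponds to 97.5% (R)-linalool and 2.5% (S)-linalool.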
Bioinformatics prediction of other fungal LNSs
Aa.LNS was used to probe other potential fungal LNSs. The first focus was on A. pediades, another sequenced fungal species of the genus Agrocybe. Linalool was detected in the headspace of A. pediades cultures grown in malt extract medium in our laboratory. A BLAST search of Aa.LNS against the A. pediades genome using the online tool of the Joint Genome Institute (https://genome.jgi.doe.gov/Agrped1/Agrped1.home.html) returned 11 TPS homologs (Fig. 2C).
To improve the prediction confidence, we combined two strategies: (1) full-sequence alignment (Fig. 2C and protein sequence in Supplementary Data 1) and (2) comparison of predicted active sites (Supplementary Tables S1 and S2; the results were analyzed with the PDB structures 4LXW (epi-isozizaene synthase from Streptomyces coelicolor) and 5NX5 (the bacterial linalool synthase from Streptomyces clavuligerus, or Sc.LNS) as templates). Here 4LXW and 5NX5 were chosen based on two criteria: (1) their higher sequence similarity to Aa.LNS and (2) the availability of large ligands (either substrate or product analogs) in the crystal structures, which can facilitate the identification of the active-site residues. Four TPS homologs (Agrped1_820868, Agrped1_694262, Agrped1_689671, and Agrped1_689675) were found to be closely related to Aa.LNS (Fig. 2C). It was hypothesized that the two enzymes Agrped1_820868 and Agrped1_694262 are more similar to Agr8 (a γ-muurolene/β-cadinene synthase)3, as all three enzymes share almost identical active sites based on our algorithm (Supplementary Table S1). Hence, we studied Agrped1_689671 and Agrped1_689675. As predicted, the strain expressing Agrped1_689671 (renamed Ap.LNS) produced linalool and nerolidol, similar to Aa.LNS (Fig. 2D). Surprisingly, linalool was the sole terpene product of the E. coli clone expressing Agrped1_689675, indicating that it is a monofunctional linalool synthase (renamed Ap.LS). We attribute this to the high specificity of Ap.LS, an interpretation underpinned by the fact that FPP is more abundant in microbial cells than GPP13. This high specificity is possibly due to steric hindrance of the larger substrate FPP by amino acid residues surrounding the binding pocket, which we examine in a later section of this article. In addition, Ap.LS produced enantiopure (R)-linalool (Fig. 2B).
Next, we asked whether we could use Ap.LS to probe other fungal linalool synthases. We carried out a UniProt BLAST search of Ap.LS and collected those hits with the highest alignment score (score >700, Supplementary Fig. S3): three from Galerina marginata (Galma_223690, UniProt ID A0A067THX9; Galma_63556, A0A067T8I8; Galma_266794, A0A067T571); two from Hypholoma sublateritium (Hypsu1_148365, A0A0D2NH86; Hypsu1_148385, A0A0D2NA50), and one from Hebeloma cylindrosporum (M413_27416, A0A0C2YLE7).
Four of them (Galma_223690, Galma_63556, Hypsu1_148365, and Hypsu1_148385) clustered into a branch or subgroup with Ap.LS and Ap.LNS in both analyses using full-sequence and active-site alignment (Fig. 3A, B and protein sequence in Supplementary Data 1). Overall, the six homologs share >90% identity of active-site residues (Supplementary Tables S1 and S2). In particular, the active sites of Hypsu1_148385 and Galma_223690 show 95% (36/38) and 97–100% (37–38/38) identity with those of Ap.LS and Ap.LNS, respectively, indicating that they are potential fungal LNSs. The other two amino acid sequences (Galma_266794 and M413_27416) were more closely related to Agr8, with as high as 95% active-site identity (Supplementary Table S1). Thus it was hypothesized that they were more likely to produce 1,10-cyclization products of FPP (e.g., muurolene and cadinene). This hypothesis was tested by expressing Galma_223690, Hypsu1_148385, or Galma_266794 in the FPP-accumulating E. coli. As predicted, Galma_223690 and Hypsu1_148385 were found to be bifunctional LNSs (Fig. 3C) and were thus renamed Gm.LNS and Hs.LNS, respectively. Moreover, like the strain expressing Agr8, the strain expressing Galma_266794 produced germacrene D (1,10-cyclization) as the main product and a few minor products (γ-muurolene and (+)-δ-cadinene, Supplementary Fig. S4), thus validating our hypothesis.
Here the synergistic use of BLAST search, full-sequence alignment, and active-site alignment was explored. With this combination, we achieved relatively high predictability in hunting for biocatalysts of the same function (e.g., linalool synthases). Such a method could potentially be applied to the identification of other kinds of enzymes. BLAST search helps identify overall similar enzymes that form our initial screening candidates. Full-sequence alignment and the phylogenetic tree facilitate the classification of enzymes. Different classes often have distinct catalytic functions (e.g., different cyclization positions). Active-site prediction further supplements the prediction in two main ways. One is to filter out those enzyme candidates with incomplete binding pockets (e.g., Agr11 is missing the NSE triad, Supplementary Tables S1 and S2). The other is to complement the enzyme classification of full-sequence alignment. This is based on the hypothesis that enzymes of the same function may have overall low similarity but more conserved active sites. For example, Galma_266794 shares comparable full-sequence similarities with Ap.LS (61%) and Agr8 (62%); however, it has much higher active-site identity with Agr8 (95%) than with Ap.LS (71%, Supplementary Table S1). Indeed, the products of Galma_266794 are similar to those of Agr8.
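The distinction between full-sequence identity and active-site identity can be sketched with a short example. This is an illustrative sketch only: the function, the toy aligned sequences, and the chosen active-site columns are hypothetical placeholders, and the identity definition used (matches divided by compared alignment columns) is an assumption rather than the algorithm used for Supplementary Tables S1 and S2.

```python
from typing import Iterable, Optional


def percent_identity(seq_a: str, seq_b: str,
                     positions: Optional[Iterable[int]] = None) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    positions: 0-based alignment columns to compare (e.g., predicted
    active-site columns); None compares every column.
    Columns where both sequences have a gap ('-') are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    columns = range(len(seq_a)) if positions is None else list(positions)
    compared = 0
    matches = 0
    for i in columns:
        a, b = seq_a[i], seq_b[i]
        if a == "-" and b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Toy aligned fragments (hypothetical, not real enzyme sequences).
query = "ADDIFENSEWLK"
candidate = "GDDVFENSEYLR"
active_site_columns = [1, 2, 5, 6, 7, 8]  # hypothetical pocket columns

print(f"full-sequence identity: {percent_identity(query, candidate):.1f}%")
print(f"active-site identity: "
      f"{percent_identity(query, candidate, active_site_columns):.1f}%")
```

In this toy case the two fragments are only about 67% identical overall but 100% identical at the selected columns, which mirrors the situation described for Galma_266794 and Agr8, where active-site identity is more discriminating than full-sequence similarity.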
Purification and characterization of fungal linalool synthases and LNSs
To date, a number of plant linalool synthases and LNSs have been identified. However, only one bacterial LNS, from S. clavuligerus, was recently identified11; it shares only 15.2% identity with Ap.LS (Supplementary Fig. S5). With the fungal enzymes studied in this work, linalool synthases and LNSs in three kingdoms have been identified. Next, we sought to compare their catalytic activities and mechanisms by in vitro and in vivo assays, sequence alignments, and three-dimensional (3D) structural models.
Protein purification is a prerequisite for studying the in vitro kinetics of the fungal enzymes. Although all five bacterial strains expressing fungal enzymes produced linalool, the expression levels of the enzymes in E. coli differed greatly. Aa.LNS had the highest expression level, followed by Gm.LNS, Ap.LNS, and Ap.LS. The expression of Hs.LNS (Hypsu1_148385) was so low that it was not detectable in a protein gel (Supplementary Fig. S6A). As Aa.LNS had the highest expression level and Ap.LS is the only specific monoterpene synthase, they were chosen as the representatives of fungal LNS and linalool synthase for further studies. However, neither of them was soluble based on solubility analysis with B-PER II reagent (Thermo Scientific™) (Supplementary Fig. S6A). Many approaches were tested but failed to improve their solubility (such as abiotic condition optimization: lowering incubation temperature, tuning inducer dosages, media additives, and protein fusion). Refolding of the insoluble fraction could be another solution, which we did not test because optimizing the conditions is very time-consuming. The N-terminal fusion of Aa.LNS with a maltose-binding protein or thioredoxin did not help (Supplementary Fig. S6B). Different chaperone systems (DnaK-dnaJ, GroES-GroEL) and trigger factor (TF) in E. coli were further tested. It was found that the TF chaperone could slightly improve the solubility of the synthases. With the optimal condition (3.3 mM arabinose to induce TF chaperone and 0.1 mM IPTG to induce Aa.LNS, Supplementary Fig. S7) and further separation by size exclusion chromatography, we managed to purify enough soluble Aa.LNS for in vitro characterization. Yet its purity was quite low (~16.3%; Fig. 4A and full gel image in Supplementary Fig. S10). In contrast, relatively high purity of soluble Ap.LS (~71.2%) was obtained with the same experimental conditions (Fig. 4B and full gel image in Supplementary Fig. S11). Consistent with the results from the E. coli cultures producing the respective synthases, assays with the purified enzymes reconfirmed that Aa.LNS can use FPP and GPP to produce nerolidol and linalool, respectively. However, Ap.LS was active only with GPP and not with FPP (Fig. 4). Based on the data in Fig. 4A, B (Supplementary Data 2 and 3), Km and kcat values of Ap.LS and Aa.LNS were calculated. The Km and kcat values of Aa.LNS for FPP were 9.0 ± 2.3 μM and 3.3 ± 0.3 min−1, respectively, and were lower for GPP (6.7 ± 4.6 μM and 0.5 ± 0.1 min−1, respectively) (Table 1). The Km value of Ap.LS for GPP (3.8 ± 0.7 μM) was slightly lower than that of Aa.LNS, whereas its kcat was much higher (6 ± 0.3 min−1). Among the known linalool synthases and LNSs compared, the kcat/Km value of Ap.LS was the highest, about 21-fold, 29-fold, 3-fold, and 4-fold higher than those of Aa.LNS, La.LS (Q2XSC5) from Lavandula angustifolia (Lavender)5, Ma.LS (Q8H2B4) from Mentha aquatica23, and the bacterial Sc.LNS11, respectively. As for Aa.LNS as a nerolidol synthase, although the kcat value of Aa.LNS for FPP was more than five times lower than that of the bacterial enzyme, the Km value was similar to that of the bacterial Sc.LNS11 and about half that of Zm.LNS from Zea mays (Maize) (Table 1)24.
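The catalytic-efficiency comparison can be reproduced from the Km and kcat values quoted above. The following is a minimal sketch that uses only the mean values stated in the text and ignores their uncertainties; the dictionary layout and function name are illustrative assumptions.

```python
# Mean kinetic parameters quoted in the text: (Km in uM, kcat in min^-1).
# Uncertainties are deliberately ignored in this sketch.
kinetics = {
    "Ap.LS (GPP)": (3.8, 6.0),
    "Aa.LNS (GPP)": (6.7, 0.5),
    "Aa.LNS (FPP)": (9.0, 3.3),
}


def catalytic_efficiency(km_um: float, kcat_per_min: float) -> float:
    """Return kcat/Km in uM^-1 min^-1."""
    if km_um <= 0:
        raise ValueError("Km must be positive")
    return kcat_per_min / km_um


efficiency = {name: catalytic_efficiency(km, kcat)
              for name, (km, kcat) in kinetics.items()}

for name, value in efficiency.items():
    print(f"{name}: kcat/Km = {value:.3f} uM^-1 min^-1")

# Fold difference between Ap.LS and Aa.LNS on the shared substrate GPP.
fold = efficiency["Ap.LS (GPP)"] / efficiency["Aa.LNS (GPP)"]
print(f"Ap.LS vs Aa.LNS on GPP: {fold:.1f}-fold")
```

With these mean values the ratio is about 21, in agreement with the 21-fold difference stated for Ap.LS relative to Aa.LNS.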
In vivo activity comparison of linalool synthases and LNSs from three kingdoms and applications in linalool production
Due to potential issues such as poor expression and solubility when expressed in cells and the localization difference (cytosolic/membrane bound in vivo versus a one-pot aqueous reaction), the advantages of in vitro enzyme kinetics (Table 1) may not be readily transferable into cellular applications, such as metabolic engineering, where in vivo activities are more critical than in vitro ones. To test the best candidate for microbial linalool production, our previously engineered E. coli strain was used to compare linalool synthases from three kingdoms: Ap.LS, Sc.LNS, and Cb.LS from Clarkia breweri (Q96376) as representatives for fungi, bacteria, and plants, respectively. They were separately cloned into pET-11a vector (Novagen). Together with a p15A vector carrying the whole mevalonate pathway genes2, the bacterial strains grown in ZYM media produced linalool at 381.2, 8.7, and 1.3 mg/L for fungal, bacterial, and plant linalool synthases, respectively (Fig. 4C). The linalool yield using Ap.LS (fungal) is about 44- and 287-fold as high as that using Sc.LNS (bacterial) and Cb.LS (plant), respectively. As the bacterial densities for different strains are similar, around 10–12 (Fig. 4D), the high yield of linalool in the Ap.LS strain was due to its relatively high in vivo activity (here we refer to the total activity that is the result of both the specific activity and the amount of active enzyme) and not to biomass. A previous study also showed that the bacterial Sc.LNS is better than plant linalool synthases in linalool production in terrific broth (TB) media10. In the same TB media, the linalool titers reached 601.2 mg/L for the Ap.LS strain, about 65% higher than previously reported using Sc.LNS10. Our study demonstrated that fungal Ap.LS is superior even to the bacterial one, in both activity and selectivity. Although Sc.LNS has a higher activity than plant Cb.LS, it prefers FPP (lower Km and higher kcat) to GPP as the substrate11. Therefore, Sc.LNS produced a larger amount of nerolidol than linalool in E. coli, whose cytosol contained both FPP and GPP; in contrast, Ap.LS produced 100% linalool.
High activity contributes to high titers, rates, and yields (TRYs) of linalool production and low manufacturing cost. High specificity would greatly simplify the downstream purification process and further reduce the overall production cost. The superior activity and selectivity of Ap.LS make it more suitable for microbial production of linalool than its plant and bacterial counterparts. Thus, this study lays a foundation for future work on linalool bioproduction that is greener, safer, sustainable, and of exceptional enantiopurity ((R)-linalool), as compared to chemical synthesis. However, to translate into commercial applications, more studies are required to further improve the linalool TRYs and to overcome the toxicity issue of linalool, which can be addressed by genetic engineering (e.g., metabolic engineering, efflux transporter engineering25), directed evolution, and bioprocess developments (e.g., in situ product removal fermentation using suitable liquid solvents and/or solid absorbents)26.
Structural comparison of linalool synthases and LNSs from three kingdoms (plants, fungi, and bacteria)
Next, we generated a phylogenetic tree with 35 plant enzymes (including 4 nerolidol synthases, 9 linalool synthases, and 22 LNSs), 1 bacterial LNS (D5SL78), and 9 fungal enzymes (Supplementary Table S3). The enzymes were clearly separated into two major clades (clade 1, plant, and clade 2, microbial; Fig. 5). The bacterial LNS was closer to the fungal ones, in clade 2. Specifically, the sequence identity among fungal, bacterial, and plant LNSs or linalool synthases is only 8–15%, including the metal-binding sites (Supplementary Fig. S5). Overall, plant linalool synthases/LNSs are larger (500–900 amino acids) than microbial ones (300–400 amino acids). One particular enzyme, D8RNZ9, an LNS isolated from the nonseed plant Selaginella moellendorffii (Spikemoss)27, is more closely related to the bacterial LNS than to plant LNSs. It was hypothesized that it could stem from horizontal gene transfer from microbes to plants or that seed plants lost these LNS enzymes during evolution from nonseed plants27. The phylogenetic tree indicates the evolutionary divergence of fungal, bacterial, and plant linalool synthases and LNSs.
Subsequently, we compared the protein structures of the linalool synthases from the fungus (A. pediades), the bacterium (S. clavuligerus), and the plant (M. aquatica). The crystal structure of Sc.LNS is 5NX5 (PDB ID)10. The homology models of Ma.LS (Q8H2B4) and Ap.LS were built based on the crystal structures of (+)-bornyl diphosphate synthase from Salvia officinalis (1N1B/1N21)28 and 1,8-cineole synthase from S. clavuligerus (5NX7)10, respectively. As the sequence similarity between Ap.LS and the 1,8-cineole synthase is only 20.3% (and active-site identity is 49%), the Ap.LS model may deviate somewhat from its real structure. Nevertheless, their active-site regions are highly conserved (Fig. 6A). Fungal linalool synthase and bacterial linalool synthase are much more similar to each other than to plant linalool synthase in both active-site regions (Supplementary Table S4) and overall structures (Fig. 6B–D). As a typical plant monoterpene synthase, Ma.LS has two domains (α and β domains), with the active site residing only in the α domain (catalytic domain, Supplementary Fig. S8), and thus is noticeably larger than the other two synthases. In contrast, the microbial linalool synthases, Sc.LNS and Ap.LS, are similar to typical class I terpene cyclases with a single domain, despite having acyclic products (Supplementary Fig. S9). Both GPP (its analog, 2-fluorogeranyl diphosphate) and FPP were docked into the three models. We mainly analyzed the interactions of the three linalool synthases with GPP. With 9 hydrophobic interactions with GPP, Ap.LS had the highest number, as compared to 7 for Sc.LNS and 6 for Ma.LS (Fig. 6B–D). Except for the negatively charged pyrophosphate (PPi) head, GPP is largely hydrophobic; thus these hydrophobic interactions may contribute to the high activity of Ap.LS. The number of hydrogen bonds identified for the three enzymes was similar. In addition, Ap.LS had the highest binding affinity (−7.6 kcal/mol) to GPP, followed by Sc.LNS (−7 kcal/mol) and Ma.LS (−6.4 kcal/mol). The binding affinity inversely correlated with the Km values of the three enzymes (Table 1), where higher binding affinity contributed to a lower Km value. Comparing Ap.LS and Ap.LNS, Ap.LS has higher binding affinity to GPP than Ap.LNS (−7.3 kcal/mol) but lower binding affinity (−8.5 kcal/mol) to FPP than Ap.LNS (−9.0 kcal/mol). The binding affinity data correlate well with their difference in monoterpene and sesquiterpene activities.
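To give a sense of scale for the docking scores above, the sketch below converts a difference in binding energy into a fold difference in dissociation constant using Kd(a)/Kd(b) = exp(ΔΔG/RT). This rests on strong assumptions that are not made in the study: the docking scores are taken at face value as binding free energies, and the temperature (298.15 K) is an arbitrary choice. The function and variable names are hypothetical.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal/(mol*K)


def kd_fold_difference(dg_a: float, dg_b: float,
                       temperature_k: float = 298.15) -> float:
    """Return Kd(b) / Kd(a) for two binding energies in kcal/mol.

    A value > 1 means ligand binding to 'a' is tighter than to 'b'.
    Assumes dG = RT * ln(Kd), i.e., scores are true free energies.
    """
    if temperature_k <= 0:
        raise ValueError("temperature must be positive")
    rt = R_KCAL_PER_MOL_K * temperature_k
    return math.exp((dg_b - dg_a) / rt)


# Docking scores for GPP quoted in the text (kcal/mol).
scores_gpp = {"Ap.LS": -7.6, "Sc.LNS": -7.0, "Ma.LS": -6.4}

for other in ("Sc.LNS", "Ma.LS"):
    fold = kd_fold_difference(scores_gpp["Ap.LS"], scores_gpp[other])
    print(f"Ap.LS vs {other}: ~{fold:.1f}-fold tighter predicted binding")
```

Because docking scores are approximate, such ratios indicate a ranking at best and should not be read as measured affinities.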
Furthermore, we superimposed the 3D structures of the active sites of the three enzymes (Fig. 7A and Supplementary Table S4). Residues in the binding pocket of plant linalool synthase showed the greatest divergence (green regions, Fig. 7A), although the overall fold remains conserved. Overall, as shown in the gray regions (Fig. 7A) and highlighted in Supplementary Table S4, there were 8 conserved residues among the three enzymes, including the aspartate-rich motif, D(D/E)XXD, responsible for Mg2+ cofactor and substrate binding1, and the NSE triad, (N/D)Dxx(S/T)xxxE, responsible for the substrate binding and coordination of the diphosphate and trinuclear Mg2+ [PPi-(Mg2+)3] cluster1. Together, these structural analyses partially explain the activity difference among the three linalool synthases from different kingdoms. Nevertheless, other factors might also contribute to the high in vivo activity of Ap.LS in E. coli, such as the non-active-site residues, protein expression, and solubility (Supplementary Fig. S6).
Through the structural comparison of linalool synthases from different species, we have observed that the binding affinities of enzymes to the substrates (GPP or FPP) correlated well with their activities, especially the Km values, where higher binding affinity typically contributes to a lower Km value. However, it is much more complex to explain the difference in kcat values, due to the large structural difference among these enzymes. To address this, molecular dynamics simulation is advantageous in evaluating the dynamic interactions between enzyme and substrates, intermediates, and products. For that purpose, it is more appropriate to use accurate crystal structures of these enzymes, which is part of our future work.
Mechanism study on the Ap.LS selectivity
Lastly, we attempted to understand the specificity of Ap.LS as compared to other fungal LNSs. In particular, Ap.LNS and Ap.LS share the highest identity (77.9%); hence, we compared the residues surrounding their substrate-binding pockets. In total, five residues were found to be different (Fig. 7B and Supplementary Table S5): (1) in the PPi head region (A59:S58, E316:Q315, and I153:V152, where the former and latter residues refer to those of Ap.LS and Ap.LNS, respectively; notably, E316 is expected to have charged interactions with the PPi group of GPP/FPP); (2) in the FPP tail region (L60:M59, which could affect the interaction with FPP); and (3) in the GPP tail region (G181:A180, which might affect the flexibility of the helix where the tail end of GPP resides). A series of single mutants of Ap.LS was constructed: A59S, L60M, G181A, and E316Q. However, none of these single mutants had any effect on the selectivity of Ap.LS (Fig. 7E). We speculated that the specificity might be the synergistic result of multiple residues. The region A59–L60 was particularly interesting, as A59 and L60 are in close proximity to both the PPi head and hydrocarbon tail of FPP (Fig. 7D). Indeed, the combination of A59S and L60M mutations resulted in the production of a trace amount of nerolidol, ~2% of the total amount of linalool and nerolidol produced (Fig. 7C, E), which indicated that the two mutations are sufficient to convert Ap.LS from a monofunctional linalool synthase to a bifunctional LNS. Adjacent to A59–L60, another residue also differs between Ap.LS (V61) and Ap.LNS (I60). Although the single mutation V61I had no effect on the selectivity of the wild-type Ap.LS, introducing V61I into the double mutant A59S–L60M enhanced nerolidol production 12-fold (to ~40% of the total linalool and nerolidol produced) and decreased linalool production by 45% (Fig. 7C, E). It seems that the double mutation A59S–L60M favors sesquiterpene activity (nerolidol formation) by stabilizing the ligand in a favorable position (A59S) and by facilitating the departure of the PPi group from the binding pocket (L60M, Fig. 7D). The third mutation V61I further enhances the effect by pushing M60 and S59 closer to FPP (Fig. 7D).
In addition, we combined the mutations of the two regions and obtained the quadruple mutant A59S–L60M–V61I–E316Q; however, it did not further increase nerolidol production. Furthermore, we observed that the wild-type Gm.LNS, whose corresponding residues are more similar to residues 59–61 of Ap.LS (S59:A59, L60:L60, V61:V61, Supplementary Fig. S9), produced the highest amount of linalool (15%) among all the wild-type fungal LNSs (Fig. 3C). As such, we concluded that the residues in this region (A59–V61) play an essential role in the high specificity of Ap.LS.
Lastly, we further tested whether mutating the same region could change the selectivity of Ap.LNS so that it produces only linalool. By introducing 5 mutations (S58A+M59L+I60V+T62P+V63L, or “58–63” mutant in Fig. 7F) or 6 mutations (S58A+M59L+I60V+T62P+V63L+E64G, or “58–64” mutant), the linalool percentage was increased from 11% (wild type) to 86% (“58–64” mutant). However, none of the mutations could completely eliminate nerolidol production, indicating that additional residues also play roles in regulating the selectivity.
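The mutant notation used above (e.g., A59S–L60M–V61I or S58A+M59L+I60V) can be made concrete with a small helper that applies such point mutations to a sequence and checks the stated wild-type residue at each position. This is an illustrative sketch: the function is hypothetical, and the sequence below is a placeholder in which only positions 59–61 (A, L, V) reflect residues named in the text; it is not the real Ap.LS sequence.

```python
import re

# One point mutation: wild-type letter, 1-based position, new letter.
_MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_mutations(sequence: str, mutations: str) -> str:
    """Apply mutations such as 'A59S+L60M' (also '-' or en-dash separated).

    Raises ValueError if a token is malformed, out of range, or if the
    stated wild-type residue does not match the sequence.
    """
    residues = list(sequence)
    for token in re.split(r"[+\u2013-]", mutations):
        match = _MUTATION.match(token.strip())
        if match is None:
            raise ValueError(f"malformed mutation: {token!r}")
        wild_type, position, new = (match.group(1), int(match.group(2)),
                                    match.group(3))
        if not 1 <= position <= len(residues):
            raise ValueError(f"position out of range: {token}")
        if residues[position - 1] != wild_type:
            raise ValueError(
                f"{token}: expected {wild_type} at {position}, "
                f"found {residues[position - 1]}")
        residues[position - 1] = new
    return "".join(residues)


# Placeholder sequence: glycine filler with A59, L60, V61 as in the text.
placeholder = "G" * 58 + "ALV" + "G" * 10

triple_mutant = apply_mutations(placeholder, "A59S+L60M+V61I")
print(triple_mutant[58:61])  # residues 59-61 after mutation
```

Checking the wild-type residue guards against off-by-one numbering errors, which are easy to make when two homologs such as Ap.LS and Ap.LNS are numbered one position apart (e.g., A59:S58).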
Here, aided by structural comparison of Ap.LS and Ap.LNS, we managed to identify the key residues that alter the selectivity of Ap.LS. This success has two broad implications. First, such a structure-based method is of general application in understanding catalytic mechanisms. Our method works even without protein crystal structures, using homology models instead, and the recent achievement of Google DeepMind’s AlphaFold 2 (https://deepmind.com/blog/article/alphafold-a-solution-to-a-50-year-old-grand-challenge-in-biology) can further support it. Second, the identification of key residues inspires future studies, such as the rational design and engineering of linalool synthases, nerolidol synthases, or bifunctional linalool and nerolidol synthases. More broadly, it might also encourage studies on the selectivities of various TPSs. It is known that some are highly specific and have only a single product, whereas others have multiple products; a notable example is the γ-humulene synthase from Abies grandis, which generates 52 different sesquiterpenes29. The underlying catalytic mechanism is fascinating but not fully understood. Our study provides some insights, and further research is required.