Phylogeny-based genome mining builds on the largely modular structure of biosynthetic gene clusters. This modularity is theorized to arise from a rapidly evolving defense system in which new molecules are produced by randomly swapping and shuffling domains and modules. As an example, the program Natural Product Domain Seeker (NaPDoS) constructs a phylogenetic tree based on the ketosynthase (KS) domains of PKS genes and the condensation (C) domains of NRPS genes. KS and C domains are two of the enzyme families used to construct phylogenetic trees in order to predict compound structures. Such a tree can provide information about the function of the queried PKS or NRPS gene, its evolutionary history, and the novelty of the products made by the secondary metabolite cluster containing the gene.
Lastly, resistance-gene-directed genome mining and target-directed genome mining involve identifying biosynthetic gene clusters that contain self-resistance genes. Organisms that produce antibiotics or antifungals must develop a self-resistance mechanism to avoid suicide. One self-resistance mechanism is the use of efflux pumps to transport the compounds to the extracellular space. Another involves the inclusion of self-resistance enzymes (SREs) in biosynthetic gene clusters. These SREs are mutated copies of the housekeeping target that retain activity but are not inhibited by the natural product, thereby keeping the organism alive. Because SREs are typically found in secondary metabolite clusters, searching for them offers a route to finding biosynthetic gene clusters.
Utilizing this knowledge, Moore et al. were among the first groups to apply a targeted genome mining approach. They screened 86 closely related Salinispora strains for additional copies of housekeeping genes and checked whether those copies were located near biosynthetic gene clusters. They identified a second copy of a bacterial fatty acid synthase colocalized within a cluster that contained a PKS-NRPS hybrid gene. They annotated the cluster, heterologously expressed the genes, and, after chemical characterization, established that the cluster produces thiolactomycin, a fatty acid synthase inhibitor. Target-directed genome mining has since been used to locate biosynthetic gene clusters with known biomolecular targets, to discover natural products with desired biomolecular targets, and to discover the biomolecular targets of known natural products. Therefore, searching for resistance enzymes in secondary metabolite clusters has become an increasingly appealing genome mining approach for finding new clusters and, subsequently, novel natural products.
In recent years, tools and programs have been developed to search for new biosynthetic gene clusters more quickly. These programs can predict the entire secondary metabolite gene cluster. For example, antiSMASH identifies polyketide synthase and non-ribosomal peptide synthetase core genes in potential clusters and then presents the cluster information in a user-friendly, searchable interface. Secondary Metabolite Unknown Regions Finder (SMURF) is another such program. SMURF evaluates secondary metabolite gene clusters by scoring the proximity of the core gene to the tailoring genes around it. A more specialized program is Antibiotic Resistant Target Seeker (ARTS). ARTS specifically queries bacterial genomes for antibiotic resistance genes that can lead to biosynthetic gene clusters for possible novel drug targets.
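The colocalization logic behind target-directed genome mining can be illustrated with a minimal Python sketch. It is not the implementation used by ARTS or by Moore et al.; the `Gene` and `Cluster` records, the family labels, the coordinates, and the function name `colocalized_duplicates` are all hypothetical and serve only to show the idea: a housekeeping gene family that occurs more than once in a genome, with one copy sitting inside a predicted cluster, marks that copy as a candidate self-resistance gene.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str     # locus tag (hypothetical)
    family: str   # protein family label from annotation (hypothetical)
    contig: str
    start: int
    end: int


@dataclass(frozen=True)
class Cluster:
    name: str     # predicted biosynthetic gene cluster (hypothetical)
    contig: str
    start: int
    end: int


def colocalized_duplicates(genes, clusters, housekeeping_families):
    """Return (gene, cluster) name pairs for duplicated housekeeping genes
    that lie entirely within a predicted cluster."""
    # Count how many copies of each housekeeping family the genome carries.
    copies = {}
    for gene in genes:
        if gene.family in housekeeping_families:
            copies[gene.family] = copies.get(gene.family, 0) + 1

    hits = []
    for gene in genes:
        # Only families present in two or more copies are of interest.
        if copies.get(gene.family, 0) < 2:
            continue
        for cluster in clusters:
            inside = (gene.contig == cluster.contig
                      and gene.start >= cluster.start
                      and gene.end <= cluster.end)
            if inside:
                hits.append((gene.name, cluster.name))
    return hits


# Illustrative data only: two copies of a fatty acid synthase family,
# one of which lies inside a cluster that also holds a PKS-NRPS hybrid.
genes = [
    Gene("fabF_1", "FabF", "contig1", 1_000, 2_200),
    Gene("fabF_2", "FabF", "contig1", 50_000, 51_200),
    Gene("gyrB_1", "GyrB", "contig1", 60_000, 62_400),
    Gene("pks_1", "PKS-NRPS", "contig1", 52_000, 58_000),
]
clusters = [Cluster("cluster_A", "contig1", 45_000, 80_000)]

example_hits = colocalized_duplicates(genes, clusters, {"FabF", "GyrB"})
print(example_hits)
```

In this toy genome only the second fatty acid synthase copy is flagged: the single-copy `gyrB_1` lies in the cluster but has no duplicate, and `fabF_1` is duplicated but sits outside the cluster. A real screen would also need sequence-based family assignment and some tolerance for cluster boundary errors, neither of which is modeled here.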
We utilized antiSMASH to elucidate the non-plant olivetolic acid biosynthetic pathway. Since its inception, the Tang lab has used various genome mining methods to identify many natural products and novel enzymes, to elucidate the biosynthetic pathways of natural products, and to produce novel natural products by engineering biosynthetic genes. One example is the further characterization of the biosynthetic pathway of zearalenone, a member of the resorcylic acid lactone (RAL) family produced by the fungal species Gibberella zeae, and the production of novel resorcylic acid lactones through reconstitution of the polyketide synthase involved in zearalenone biosynthesis. RALs are polyketides, produced exclusively by fungi, that consist of a macrolactone ring with an embedded 2,4-dihydroxybenzoic acid moiety. The first RAL discovered, radicicol, was characterized from the fungal species Monocillium nordinii in 1953, and 200 more RALs have since been identified from a variety of fungal species. RALs are potent molecules that exhibit a variety of biological activities, including antimalarial, anticancer, antimicrobial, mitogen-activated protein (MAP) kinase inhibitory, TAK1 inhibitory, heat shock protein inhibitory, and estrogen receptor agonist properties. Many RALs contain 14-membered lactone rings, although RALs with 10-, 12-, and 16-membered lactone rings also exist. The RAL biosynthetic gene cluster typically encodes two polyketide synthases: a highly reducing polyketide synthase (HRPKS) and a non-reducing polyketide synthase (NRPKS). In the RAL biosynthetic pathway, the HRPKS synthesizes a reduced chain whose terminal hydroxyl group later serves as the macrocyclizing nucleophile. The chain is then transferred to the NRPKS, where it is further elongated and undergoes aldol cyclization to form the enzyme-bound resorcylic thioester.
A fused thioesterase domain in the NRPKS then performs macrocyclization to release the final RAL product. Furthermore, considerable structural diversity at the C6 position of the RAL can be generated by utilizing different HRPKSs that are able to synthesize a variety of reduced products.
Type I polyketide synthases contain multiple functional and catalytic domains and generate most of the polyketides that have been characterized. They are divided into two categories: iterative type I and modular type I. Modular type I polyketide synthases are more commonly found in bacteria. They are large multimodular enzymes with assembly-line-like characteristics, condensing acyl substrates module by module, so that the order of the modules defines the order of the functional groups in the final elaborated compound. Iterative type I polyketide synthases, more commonly found in fungi, contain a single set of domains that is used iteratively, much as fatty acid synthases operate, to generate the programmed polyketide product. Type III polyketide synthases are found mainly in plants, although a few have been elucidated from microbes, and are much smaller than type I and type II PKSs. They are homodimers of ketosynthases that extend the chain through iterative decarboxylative Claisen condensation and are responsible for producing plant compounds such as stilbenes, flavonoids, and alkylresorcinols. Type III PKSs release their products to either the active site cysteine of the enzyme or the carrier molecule, coenzyme A. There have also been reports of type III PKSs utilizing an acyl carrier protein-bound substrate as the starter substrate, similar to type I and type II PKSs.
Since our platform utilizes two iterative type I polyketide synthases, it is appropriate to go into more detail concerning these megasynthases. Fungal PKSs resemble bacterial type II PKSs in that the catalytic domains are used iteratively during polyketide synthesis, and they resemble bacterial type I modular PKSs in that the catalytic domains are linearly arranged.
However, fungal PKSs differ from bacterial type I modular PKSs in the rules governing chain elongation, regioselective cyclization, and starter-unit selection. There are three types of fungal polyketide synthases: highly reducing polyketide synthases (HRPKSs), partially reducing polyketide synthases (PRPKSs), and non-reducing polyketide synthases (NRPKSs). HRPKSs generate highly reduced compounds that can be further modified to produce compounds such as lovastatin. Fungal HRPKSs contain, minimally, a ketosynthase (KS) domain, a malonyl-CoA:acyl carrier protein transacylase (AT) domain, and an acyl carrier protein (ACP) domain. These HRPKSs also contain tailoring domains such as an enoyl reductase (ER) domain, a dehydratase (DH) domain, a methyltransferase (MT) domain, and a ketoreductase (KR) domain. These domains are used iteratively to produce the reduced polyketide product, with the HRPKS employing the tailoring domains in different combinations in each extension cycle. PRPKSs typically synthesize phenolic aromatic compounds such as 2,4-dihydroxybenzene and 6-methylsalicylic acid (6-MSA). As their name implies, these enzymes use their domains iteratively to generate partially reduced polyketide compounds. The ketoreductase domain is the key domain controlling reductive programming in PRPKSs, through selective reduction of the polyketide intermediates.
6-MSA is a good example of this: the PRPKS that produces it carries out just one round of reduction by the KR domain and one round of dehydration by the DH domain. NRPKSs, like HRPKSs and PRPKSs, minimally contain KS, AT, and ACP domains. Unlike the other two types, however, NRPKSs also harbor a starter unit:acyl carrier protein transacylase (SAT) domain, which takes up the starter unit, and a product template (PT) domain, which acts as an aldol cyclase. They may also contain a methyltransferase domain and usually contain a product-release domain such as a thioesterase (TE) domain. The SAT domain takes up the starter unit and transfers it onto the ACP domain, from which it is moved to the KS domain, where it undergoes decarboxylative Claisen condensation with an extender unit transferred from the AT domain. The starter unit may be, for example, a malonyl-CoA unit or, when the NRPKS works in conjunction with a HRPKS, the product of the HRPKS. Iterative use of these domains extends the chain, the PT domain cyclizes the product, and the releasing domain then releases it.
All RALs elucidated to date contain the 2,4-dihydroxybenzoic acid moiety, otherwise known as the β-resorcylic acid moiety, the same moiety that forms the core of tetrahydrocannabinol, cannabidiol, cannabigerol, and the other cannabinoids from the Cannabis sativa plant. Furthermore, the first key intermediate in the cannabinoid biosynthetic pathway is olivetolic acid (OA), a β-resorcylic acid with a pentyl chain at the C6 position. Olivetolic acid is found only in small quantities in Cannabis sativa extracts; therefore, this key intermediate is expensive. Additionally, although its biological activity has not been fully studied, it is proposed to have antimicrobial, photoprotective, and cytotoxic activities.
Given the similarities between olivetolic acid and RALs, with which the Tang lab is quite familiar, we hypothesized that fungal biosynthetic pathways containing a tandem PKS pair may be able to produce olivetolic acid or related molecules that vary in chain length and saturation at the C6 position. We therefore reasoned that genome mining for tandem fungal polyketide synthases could uncover a fungal biosynthetic gene cluster that produces olivetolic acid. The terminal TE domains in the NRPKSs that produce RALs are responsible for the macrocyclization reaction. To produce a resorcylic acid instead of a RAL, the releasing enzyme must catalyze hydrolysis instead of esterification. In fungal PKS pathways, TEs that catalyze hydrolytic release have been characterized and are typically free-standing enzymes. With this in mind, we mined sequenced fungal genomes for biosynthetic gene clusters that encode a HRPKS, a NRPKS, and a standalone TE. Among the clusters identified by antiSMASH, one set of homologous clusters satisfied this criterion. The ova cluster from Metarhizium anisopliae encodes a typical HRPKS and a NRPKS that is not fused to a terminal TE domain. Instead, a didomain enzyme, Ma_OvaC, containing an N-terminal ACP and a C-terminal TE is present in the cluster. Further sequence analysis of the ACP domain showed that the DSL triad, which is well conserved in all functional ACPs and whose serine is post-translationally phosphopantetheinylated, is mutated to NQI. This suggests that the ACP domain is unlikely to carry out the canonical function of acyl chain shuttling; the enzyme is therefore designated a ψACP-TE.
Previously, a ψACP-methyltransferase (ψACP-MT) fusion enzyme was found in a fungal PKS pathway, in which the ψACP facilitates protein-protein interactions between the NRPKS and the ψACP-MT to enable methylation of the growing polyketide intermediate. Hence, we hypothesize that the ψACP domain in Ma_OvaC may play a similar role in facilitating the catalytic function of the TE domain on a PKS-bound intermediate. The M. anisopliae cluster contains additional genes encoding a transcription factor and a flavin-dependent monooxygenase. Alignment of homologous clusters from various fungal species showed that the HRPKS, NRPKS, and ψACP-TE are conserved, including the inactivated ACP triad. None of these clusters has been characterized, and no product has been reported in the literature. Based on these analyses, we predict that the trio of HRPKS, NRPKS, and ψACP-TE will make resorcylic acids that are structurally related to OA.
To examine the product profile of the ψACP-TE-containing pathways, we heterologously expressed Ma_OvaA, B, and C in the model fungus Aspergillus nidulans A1145 ΔSTΔEM. This strain has been used to reconstitute fungal biosynthetic pathways and carries genetic deletions that inactivate biosynthesis of the endogenous metabolites sterigmatocystin and emericellamide B.
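The mining criterion used here (a HRPKS, a NRPKS lacking a fused TE, and a standalone TE whose ACP has lost the conserved triad) can be written down as a small Python sketch. This is only an illustration of the filtering logic, not the antiSMASH workflow that was actually run: the `Protein` record, the domain lists, and all function names are hypothetical, the domain architectures of the two PKSs are generic placeholders, and the HRPKS rule (requiring KR, DH, and ER together) is a simplifying assumption. Only the ACP-TE architecture of Ma_OvaC and its NQI triad are taken from the text.

```python
from dataclasses import dataclass

MINIMAL = {"KS", "AT", "ACP"}      # minimal PKS domain set
REDUCTIVE = {"KR", "DH", "ER"}     # reductive tailoring domains


@dataclass(frozen=True)
class Protein:
    name: str
    domains: tuple        # domain labels, N- to C-terminus
    acp_triad: str = ""   # residues at the conserved ACP triad, if any


def is_hrpks(p):
    # Simplifying assumption: a HRPKS carries all three reductive domains.
    d = set(p.domains)
    return MINIMAL <= d and REDUCTIVE <= d


def is_nrpks_without_te(p):
    # NRPKS hallmarks are SAT and PT; no reductive domains, no fused TE.
    d = set(p.domains)
    return ((MINIMAL | {"SAT", "PT"}) <= d
            and not (d & REDUCTIVE)
            and "TE" not in d)


def is_standalone_te(p):
    # A TE that is not part of a PKS polypeptide (no KS domain).
    d = set(p.domains)
    return "TE" in d and "KS" not in d


def te_label(p):
    # An ACP whose triad is no longer DSL is treated as inactive (psiACP).
    if "ACP" not in p.domains:
        return "TE"
    return "ACP-TE" if p.acp_triad == "DSL" else "psiACP-TE"


def matches_criterion(cluster):
    return (any(is_hrpks(p) for p in cluster)
            and any(is_nrpks_without_te(p) for p in cluster)
            and any(is_standalone_te(p) for p in cluster))


# Placeholder architectures; only Ma_OvaC follows the description in the text.
hrpks = Protein("hrpks", ("KS", "AT", "DH", "ER", "KR", "ACP"), "DSL")
ova_like = [
    hrpks,
    Protein("nrpks", ("SAT", "KS", "AT", "PT", "ACP"), "DSL"),
    Protein("Ma_OvaC", ("ACP", "TE"), "NQI"),
]
ral_like = [
    hrpks,
    Protein("nrpks_te", ("SAT", "KS", "AT", "PT", "ACP", "TE"), "DSL"),
]

print(matches_criterion(ova_like), matches_criterion(ral_like))
print(te_label(ova_like[2]))
```

The RAL-like cluster is rejected because its only TE is fused to the NRPKS, which is the distinction the search relied on: a fused TE is expected to macrocyclize, whereas a free-standing TE is more likely to release the product by hydrolysis.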