Whole-genome sequencing (WGS) of WT and CRISPR/Cas9-edited grapevine plants
Recently, we established an efficient CRISPR/Cas9 genome editing system in grapevine and obtained 22 VvWRKY52 mutant plants from 72 T-DNA-inserted transgenic plants12. Here, we designed four CRISPR/Cas9 sgRNAs based on the sequence of VvbZIP36, a gene that has been shown to play a role in drought stress responses47, and obtained one mutant grapevine plant. The Agrobacterium-mediated genetic transformation process used is shown in Fig. 1. In our previous study, we used targeted sequencing to examine six potential off-target sites in 12 transgenic VvWRKY52 lines with biallelic mutations. No off-target mutations were identified12, but the method used has a limited detection range. To comprehensively evaluate potential off-target effects on a genome-wide scale, we performed WGS of six VvWRKY52 lines with biallelic mutations, as well as one VvbZIP36 mutant and three WT (cv. Thompson Seedless) plants. The sequencing depth was approximately 58×–67×; the depth for each independent line is listed in Table S3. For each gene, four sgRNAs were designed, as shown in Table S1. The three WT plants (controls) were regenerated from embryogenic calli, and pro-embryonal masses (PEMs) were induced from floral explants (Fig. 1).
WGS detection of on-target mutations
In our previous study, we tested for on-target mutations at the target sites of four sgRNAs (sgRNA1, 2, 3, and 4) in Cas9-edited VvWRKY52 lines by Sanger sequencing12. The results showed that the efficiency of gene editing was 28%, 6%, 17%, and 25% for the four sgRNAs, respectively. In total, we obtained 22 mutant plants from 72 T-DNA-containing transgenic plants12. After identifying the targeted mutations, we selected 6 lines (W52_37, 38, 42, 51, 52, and 60) with biallelic mutations for use in WGS. In this study, we also designed four sgRNAs (sgRNA5, 6, 7, and 8) for VvbZIP36 (Table S1) and constructed a CRISPR/Cas9 multitarget vector, which was used for the transformation of Thompson Seedless plants. A total of 85 positive transgenic lines were obtained, of which only one mutant line (B36_45) was identified by Sanger sequencing (Fig. S1); this line was also selected for WGS.
To select an appropriate reference genome for WGS analysis, the clean reads of the 10 samples were mapped to both the grape reference genome (PN40024; https://www.ncbi.nlm.nih.gov/) and the Thompson Seedless genome (http://openprairie.sdstate.edu/vitis_vinifera_sultanina/1). Although Thompson Seedless was the cultivar used for genetic transformation in this study, the mapping rate was higher with PN40024 as the reference genome than with the Thompson Seedless genome in all 10 samples (Table S4). One reasonable explanation is that the PN40024 reference genome is more complete than that of Thompson Seedless. Given its higher mapping rate and better annotation, the PN40024 reference genome was used for the following analysis.
The WGS data showed that specific on-target mutations were present in the CRISPR/Cas9-edited plants but not in the WT plants (Fig. 2). We found that three sgRNAs (sgRNA1, 3, and 4) induced on-target mutations in VvWRKY52 and one sgRNA (sgRNA5) induced on-target mutations in VvbZIP36 (Fig. 2). For sgRNA1, we detected five mutation types, including a short insertion (+1), short deletions (−1, −3, and −8), and a large deletion (−29). For sgRNA4, we detected five mutation types, including a short insertion (+1) and short deletions (−1, −2, −5, and −11). For sgRNA5, only one short insertion (+1) was detected. In addition, a 52-bp deletion was detected in W52_38 and W52_52 between sgRNA3 and sgRNA4, consistent with the previous results12. These results indicate that different sgRNAs can induce different types of mutations and that the most common types are short insertions and deletions, suggesting that the CRISPR/Cas9 system can be used for precise genome editing in grapevine.
SNP and indel analysis in WT and Cas9-edited plants
To identify potential off-target mutations, we analyzed the number of SNPs and indels in the 7 Cas9-edited plants. As shown in Table 1, compared to the grape reference genome, between 7,295,904 and 7,463,331 SNPs and between 617,915 and 639,742 indels were present as variants in the three WT plants. Most were genetic variations between Thompson Seedless, used in this study, and PN40024, used as the reference genome. A total of 6,551,278 SNPs and 513,774 indels were common to all three WT plants (Figs. S2, S3); we refer to this shared set as the core variation. In addition, compared to the grape reference genome, we identified between 7,308,740 and 7,724,670 SNPs and between 621,999 and 718,423 indels in the 7 Cas9-edited plants (Table 1). After subtracting the core variation, between 757,462 and 1,173,392 SNPs and between 108,225 and 204,649 indels remained in the Cas9-edited plants (Table 1). Of these, between 202,008 and 272,397 SNPs and between 26,391 and 55,414 indels in the Cas9-edited transgenic grapevines were not present in any of the three WT plants (Table 1 and Figs. S2, S3). We refer to these as “private variations”.
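The core and private variation counts described above amount to set operations on variant calls. The following is a minimal sketch under stated assumptions, not the pipeline used in this study: variants are represented as hypothetical (chromosome, position, ref, alt) tuples, and the function name and toy data are illustrative only. The ranges in Table 1 are consistent with subtracting the core count from each plant's total (for example, 7,308,740 − 6,551,278 = 757,462 SNPs).

```python
def core_and_private(wt_calls, edited_calls):
    """Summarise edited-plant variants relative to the WT controls.

    wt_calls     : list of sets, one per WT plant (hypothetical variant keys)
    edited_calls : dict mapping line name -> set of variant keys
    """
    core = set.intersection(*wt_calls)   # shared by every WT plant
    wt_union = set.union(*wt_calls)      # seen in at least one WT plant
    summary = {}
    for name, calls in edited_calls.items():
        summary[name] = {
            "beyond_core": len(calls - core),   # not in the core variation
            "private": len(calls - wt_union),   # absent from all WT plants
        }
    return core, summary


# Toy data (illustrative only; not taken from the study)
wt = [
    {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 50, "G", "A")},
    {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T")},
    {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr3", 7, "T", "C")},
]
edited = {
    "line_A": {
        ("chr1", 100, "A", "G"),
        ("chr1", 200, "C", "T"),
        ("chr2", 50, "G", "A"),
        ("chr4", 9, "C", "CT"),
    }
}
core, summary = core_and_private(wt, edited)
```

In this toy case the variant on chr2 lies outside the core but is not private, because one WT plant also carries it; only the chr4 insertion is private to the edited line.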
The annotation of these variations indicated that the fewest variations occurred in exonic regions and that most occurred in intergenic regions (Table 2). We found between 27,224 and 35,927 SNPs and between 668 and 893 indels in exonic regions in the WT plants and between 36,549 and 47,086 SNPs and between 898 and 1,270 indels in exonic regions in the Cas9-edited plants (Table 2). When analyzing the SNP mutation types, we found that A to G (15.03–15.27%), C to T (19.54–19.92%), G to A (19.57–19.92%), and T to C (15.06–15.30%) were the four most frequent mutations in the Cas9-edited plants (Fig. 3a). The most common indel variations were 1–2 bp in length, and these variations occurred more frequently in Cas9-edited plants than in WT plants (Fig. 3b).
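The four most frequent SNP types listed above are the four transitions (A↔G and C↔T), which together account for roughly 70% of the SNPs. A spectrum of this kind can be tallied as in the following minimal sketch; the function names and the toy (ref, alt) pairs are assumptions for illustration and do not reproduce the analysis behind Fig. 3a.

```python
from collections import Counter

# Purine<->purine and pyrimidine<->pyrimidine substitutions
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def substitution_spectrum(snps):
    """Return the percentage of each ref>alt substitution type."""
    counts = Counter(snps)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {f"{ref}>{alt}": 100.0 * n / total for (ref, alt), n in counts.items()}


def transition_fraction(snps):
    """Fraction of SNPs that are transitions (0.0 for an empty list)."""
    if not snps:
        return 0.0
    return sum((ref, alt) in TRANSITIONS for ref, alt in snps) / len(snps)


# Toy input (illustrative only): (ref, alt) pairs for five SNPs
toy_snps = [("A", "G"), ("C", "T"), ("C", "T"), ("G", "A"), ("A", "C")]
spectrum = substitution_spectrum(toy_snps)
ts_fraction = transition_fraction(toy_snps)
```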
Off-target detection in Cas9-edited grapevine plants
To identify possible off-target mutations, the eight sgRNA sequences were aligned with the grape reference genome using Cas-OFFinder software44. Potential off-target sites with ≤5 mismatches relative to the sgRNA sequences were selected for further analysis. These comprised 603 (PAM: NGG), 939 (PAM: NAG), and 1,730 (PAM: NGA) potential sites (Fig. 4, Fig. S4, Table 3 and Data S1). Across these sites in the 7 Cas9-edited plants, we found only one indel variation, in W52_52 (Table 3), which is likely due to the off-target activity of the Cas9 nuclease. Subsequently, Sanger sequencing was used to confirm this off-target mutation (Fig. 5). As reported previously, the types of mutations caused by the CRISPR/Cas9 system are often short insertions or short deletions13. Interestingly, the only off-target mutation we found was a 35-bp insertion (Fig. 5). These results suggest that the application of CRISPR/Cas9 to grapevine is highly specific and that few off-target mutations are generated.
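The candidate-site criterion described above (a protospacer with ≤5 mismatches followed by an NGG, NAG, or NGA PAM) can be illustrated with a naive scan. This is a minimal sketch and not Cas-OFFinder itself: it assumes an uppercase A/C/G/T sequence, counts substitutions only (no bulges), and all names and the toy sequences are hypothetical.

```python
def revcomp(seq):
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def pam_matches(site, pattern):
    """True if site fits the PAM pattern, where N matches any base."""
    return len(site) == len(pattern) and all(
        p == "N" or p == s for p, s in zip(pattern, site)
    )


def find_candidate_sites(genome, guide, pam="NGG", max_mismatches=5):
    """List (strand, start, protospacer, mismatches) for candidate sites.

    For the '-' strand, start is an offset into the reverse complement
    of the genome, not a forward-strand coordinate.
    """
    hits = []
    k = len(guide)
    window = k + len(pam)
    for strand, seq in (("+", genome), ("-", revcomp(genome))):
        for i in range(len(seq) - window + 1):
            if not pam_matches(seq[i + k:i + window], pam):
                continue
            proto = seq[i:i + k]
            mismatches = sum(a != b for a, b in zip(guide, proto))
            if mismatches <= max_mismatches:
                hits.append((strand, i, proto, mismatches))
    return hits


# Toy example (sequences are invented for illustration)
guide = "GACGTTAGCCATGCAATCGG"      # 20-nt guide
variant = "TACGTAAGCCATGCAATCGG"    # same site with 2 mismatches
genome = "TTT" + guide + "AGG" + "CCCC" + variant + "TGG" + "AA"
hits = find_candidate_sites(genome, guide, pam="NGG", max_mismatches=5)
strict_hits = find_candidate_sites(genome, guide, pam="NGG", max_mismatches=1)
```

Calling the function once per PAM pattern ("NGG", "NAG", "NGA") mirrors the three site classes reported above; a genome-scale search would need an indexed or parallel tool rather than this quadratic scan.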
Analysis of new potential off-target sites generated by genetic variation between Thompson Seedless and the grape reference genome, PN40024
Considerable genomic variation between the Thompson Seedless cultivar, which is often used for grapevine transformation, and the reference cultivar (PN40024) was identified. Considering that the analysis of potential off-target sites is based on the grape reference genome, such variations might affect the interpretation of the results of this study. To take this into account, we used the 6,551,278 SNPs and 513,774 indels overlapping in the three WT plants to “correct” the grape reference genome (Figs. S2, S3), and the newly generated reference genome was used for potential off-target mutation analysis. This resulted in the identification of 47 (PAM: NGG), 60 (PAM: NAG), and 136 (PAM: NGA) new potential off-target sites (Table 4 and Data S2). When we analyzed the variation in these new potential off-target sites, no mutations marking off-target events were identified based on the WGS data.
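The idea of “correcting” the reference with the shared WT variants can be sketched as substituting each alternative allele into the reference sequence. The example below is a simplified illustration under stated assumptions, not the procedure used here: variants are hypothetical (0-based position, ref, alt) tuples on a single sequence, they do not overlap, and every variant is applied regardless of zygosity (how heterozygous calls were handled is not stated in this section).

```python
def apply_variants(reference, variants):
    """Return the reference with each (pos, ref, alt) variant applied.

    Variants are applied from right to left so that indels do not shift
    the coordinates of variants still to be applied. Assumes 0-based
    positions and non-overlapping variants.
    """
    seq = reference
    for pos, ref, alt in sorted(variants, reverse=True):
        if seq[pos:pos + len(ref)] != ref:
            raise ValueError(f"reference allele mismatch at position {pos}")
        seq = seq[:pos] + alt + seq[pos + len(ref):]
    return seq


# Toy example (illustrative only): one SNP and one 1-bp deletion
reference = "ACGTACGTAC"
shared_variants = [(2, "G", "T"), (5, "CG", "C")]
corrected = apply_variants(reference, shared_variants)
```

The corrected sequence can then be searched for candidate sites in the same way as the original reference; sites that appear only in the corrected sequence correspond to the “new” potential off-target sites described above.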