We conducted a comparative analysis using two NGS platforms to clarify the structure of the sSMC in two different patients. We first performed WGS using an Illumina short-read sequencer. Without enrichment, 99% of all sequence reads mapped to off-target regions and were thus wasted. To avoid this, we applied adaptive sampling, which can be easily performed by running the MinKNOW software with the FASTA file of the target region. The ONT guidelines recommend that the target region cover 0.1 to 10% of the whole genome, which for the human genome corresponds to about 3–300 Mb. This recommended range is well suited to the sizes of sSMCs. The increase in on-target coverage that we observed varied between the patient samples and was about 11.5-fold and 11.0-fold for cases 1 and 2, respectively. The enrichment region was specified as the entire chromosome 19 for the pericentric sSMC of case 1, but only the short arm of chromosome 10 for the paracentric sSMC of case 2. The enrichment achieved in the specified region may also depend on various factors related to ONT sequencing, such as the number of active pores per flow cell and the read length of each sample. The average read length in the on-target region was 7.3 kbp for case 1 and 1.9 kbp for case 2, even though the two samples were prepared under almost identical conditions. Few previous studies are currently available as a reference for adaptive sampling. More data are needed to optimize the variables that determine the efficiency of on-target enrichment in adaptive sampling.
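The target-size arithmetic above can be illustrated with a minimal sketch. It assumes a round human genome size of 3 Gb (the figure implied by the 3–300 Mb range) and an approximate GRCh38 length for chromosome 19; the function names and the coverage values passed to `fold_enrichment` are hypothetical and are not taken from our analysis pipeline.

```python
# Minimal sketch: check whether an adaptive-sampling target falls inside the
# 0.1-10% window recommended by ONT, and compute fold enrichment.
# All names and example values here are illustrative assumptions.

GENOME_SIZE_BP = 3_000_000_000  # assumed round figure for the human genome


def target_fraction(target_bp, genome_bp=GENOME_SIZE_BP):
    """Return the fraction of the genome covered by the target region."""
    return target_bp / genome_bp


def within_recommended_range(target_bp, genome_bp=GENOME_SIZE_BP,
                             low=0.001, high=0.10):
    """True if the target covers between 0.1% and 10% of the genome."""
    return low <= target_fraction(target_bp, genome_bp) <= high


def fold_enrichment(coverage_enriched, coverage_unenriched):
    """Ratio of on-target coverage with and without adaptive sampling."""
    if coverage_unenriched <= 0:
        raise ValueError("unenriched coverage must be positive")
    return coverage_enriched / coverage_unenriched


if __name__ == "__main__":
    chr19_bp = 58_600_000  # approximate GRCh38 length of chromosome 19
    print(f"chr19 fraction of genome: {target_fraction(chr19_bp):.2%}")
    print("within recommended range:", within_recommended_range(chr19_bp))
```

Under these assumptions, a whole-chromosome target such as chromosome 19 covers roughly 2% of the genome and therefore falls well inside the recommended window.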
Long-read sequencing can extract split reads rather than discordant reads, so that breakpoint junctions can be predicted at nucleotide resolution. The breakpoint junctions that include the center of inverted repeats (i.e., BP1 of case 1 and BP2 of case 2) were difficult to confirm by PCR because they adopted a hairpin structure during the PCR and sequencing reactions. However, based on predictions from our long-read sequencing data, we were able to design primers that would not form hairpin structures and would thus more readily confirm the breakpoint junctions (Figs. 1 and 2). In the present study, we examined whether cuteSV, which has been reported to perform very well as a variant caller for ONT long-read sequencers, could be a supportive tool for detecting structural variants. However, we did not find it useful for detecting breakpoints because of its high rate of false-positive variant calls (Tables S1, S2). We speculate that the number of reads obtained from the GridION is insufficient for software-based detection of such complex chromosomal rearrangements, even when the enrichment process is successful. We therefore needed to inspect the data manually using IGV to detect the breakpoints of the chromosomal rearrangements. Nevertheless, the long reads enriched in the target region were sufficient to identify complex rearrangement sites that could not be uncovered by short-read sequencing. We contend from our present findings that adaptive sampling is an excellent method for accurately predicting chromosome structure with a small number of reads.
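To illustrate how split reads carry breakpoint information, the following is a minimal sketch that parses the `SA` (supplementary alignment) tag defined in the SAM specification, in which each entry has the form `rname,pos,strand,CIGAR,mapQ,NM;`. It is not the procedure used in this study, where breakpoints were identified by manual inspection in IGV; the function names and the example tag value are hypothetical.

```python
# Minimal sketch: parse a SAM "SA" tag value and flag reads whose
# supplementary segment maps to the opposite strand, as expected at an
# inverted-duplication junction. Names and example data are hypothetical.
from typing import List, NamedTuple


class SuppAlignment(NamedTuple):
    chrom: str
    pos: int      # 1-based leftmost mapping position
    strand: str   # "+" or "-"
    cigar: str
    mapq: int
    nm: int       # edit distance


def parse_sa_tag(sa_value: str) -> List[SuppAlignment]:
    """Split an SA tag value into its semicolon-delimited alignments."""
    records = []
    for entry in sa_value.strip().split(";"):
        if not entry:
            continue  # skip the empty field after the trailing semicolon
        chrom, pos, strand, cigar, mapq, nm = entry.split(",")
        records.append(
            SuppAlignment(chrom, int(pos), strand, cigar, int(mapq), int(nm))
        )
    return records


def has_inverted_segment(primary_strand: str, sa_value: str) -> bool:
    """True if any supplementary segment lies on the opposite strand."""
    return any(s.strand != primary_strand for s in parse_sa_tag(sa_value))


if __name__ == "__main__":
    example_sa = "chr19,1000,-,50S100M,60,2;"  # invented example value
    print(parse_sa_tag(example_sa))
    print(has_inverted_segment("+", example_sa))
```

A read flagged in this way would still need to be examined in a genome browser, since repeat-rich regions can produce spurious supplementary alignments.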
When considering the complex genome rearrangement of sSMCs, it is noteworthy that most of them may consist of non-reference genomic sequences such as centromeric satellite sequences. This may affect both the enrichment by adaptive sampling and the sequencing itself because sequence reads cannot be accurately mapped in these regions. Indeed, in the present study, an accumulation of sequence reads was observed at the border of the centromere, suggesting that mapping there was affected by non-specific repeat sequences from the non-reference region (Fig. 3). One possible solution to these sequencing problems is to use the sequence data from the recently released telomere-to-telomere project as the reference genome. These sequence data include all centromeric satellite arrays and the short arms of all five acrocentric chromosomes, which will enable accurate mapping of non-reference sequences in adaptive sampling. The excessive mapping to non-reference sequences observed in case 2 may also be avoided by using the complete genome reference.
The two sSMC cases analyzed in our current study were de novo occurrences, as the parents of each patient had a normal karyotype. Since both cases presented with somatic mosaicism, the generation of the sSMC is predicted to have occurred post-fertilization in both instances. To clarify the underlying developmental mechanism, SNV genotyping was performed with trio samples in case 1 and demonstrated that the sSMC was derived from the mother, while the normal chromosomes 19 were derived from both parents. The genotypes of the two maternally derived chromosomes in case 1, i.e., the sSMC and a normal chromosome 19, matched, suggesting that they were sister chromatids. Recently, some sSMCs in a mosaic form, with UPD of the normal counterpart, have been reported [10, 11, 22]. Kurtas et al. have also proposed a mechanism whereby cells with sSMCs were originally trisomic owing to a meiotic error and the sSMC is the result of trisomy rescue in the post-fertilization embryo. These authors speculated that chromosomes that lag during a segregation error in the early embryo might form micronuclei, which may then undergo complex genomic rearrangements such as chromothripsis to create sSMCs. It is postulated that the same mechanism underlies SMC generation and ring chromosome formations, de novo unbalanced translocations, and insertional translocations when the lagging chromosome is generated in a disomic cell [11, 23, 24]. We believe that the sSMCs in both of our current study cases likely developed via this mechanism.
It is intriguing to note that both of our current study cases showed inverted duplications, the presence of which implies that the fragmented chromosomes are produced in association with DNA replication. Our breakpoint analysis indicated a small intervening region as a boundary between the inverted and non-inverted segments. The sizes of this small intervening region were 676 and 2181 bp. Further to this, we identified microhomology or microinsertions at the breakpoint junctions. These features are similar to those that form dicentric chromosomes [20, 25] and indicate the involvement of a replication fork block followed by a restart via template switching within the same replication fork. In micronuclei, where complex genome rearrangements are thought to occur, reduced or delayed DNA replication induces a replication stall leading to chromothripsis-like complex chromosomal rearrangements [26, 27]. Reports on previous sSMC cases suggest that copy number gain is a common phenomenon in micronuclei, indicating that impaired DNA replication is a key mechanism in the formation of SMCs [11, 22].
In conclusion, for the structural analysis of sSMCs, particularly in cases that show discontinuous copy number abnormalities, long-read sequencing with adaptive sampling overcomes the complexity issues associated with breakpoints and enables the efficient decoding of sSMC structures. This methodology has allowed us to further speculate on the possible mechanism of sSMC formation, which is likely to involve trisomy rescue via impaired replication and chromosome shattering. Our present results also indicate that long-read sequencing with adaptive sampling is a valuable tool for analyzing complex structural rearrangements.