Ultra-fast proteomics with Scanning SWATH

Materials

Water (LC–MS grade, Optima, no. 10509404), acetonitrile (LC–MS grade, Optima, no. 10001334), methanol (LC–MS grade, Optima, no. A456-212) and formic acid (LC–MS grade, Thermo Scientific Pierce, no. 13454279) were purchased from Fisher Chemicals. Human cell lysate (MS Compatible Human Protein Extract, Digest, no. V6951) and trypsin (Sequence grade, no. V511X) were purchased from Promega. DL-dithiothreitol (BioUltra, no. 43815), iodoacetamide (BioUltra, no. I1149), ammonium bicarbonate (eluent additive for LC–MS, no. 40867), yeast nitrogen base without amino acids (no. Y0626) and glass beads (acid washed, 425–600 µm, Sigma, no. G8772) were purchased from Sigma-Aldrich. Urea (puriss. p.a., reag. Ph. Eur., no. 33247H) and acetic acid (eluent additive for LC–MS, no. 49199) were purchased from Honeywell Research Chemicals. Rosuvastatin calcium (no. S2169), fluvastatin sodium (no. S1909), pyrimethamine (no. S2006), pitavastatin calcium (no. S1759), pemetrexed disodium hydrate (no. S7785), pravastatin sodium (no. S3036), clotrimazole (no. S1606), miconazole nitrate (no. S1956), lovastatin (no. S2061), ketoconazole (no. S1353), atorvastatin (no. S2077), methotrexate disodium (no. S5097), simvastatin (no. S1792), uniconazole (no. S3660) and itraconazole (no. S2476) were purchased from Selleck Chemicals, and pralatrexate (no. A4350) was purchased from APExBIO. Control samples for the SARS-CoV-2 study were prepared from commercial human plasma (EDTA, Pooled Donor, Genetex, no. GTX73265).

Clinical samples from patients with COVID-19

Sampling was performed as part of the Pa-COVID-19 study, a prospective observational cohort study assessing the pathophysiology and clinical characteristics of patients with COVID-19 at Charité Universitätsmedizin Berlin47. All patients with SARS-CoV-2 infection as proven by positive PCR from respiratory specimens and willing to provide written informed consent were eligible for inclusion. Exclusion criteria were refusal to participate in the clinical study by the patient or their legal representative, or clinical conditions that do not allow for blood sampling. The study assessed epidemiological and demographic parameters, medical history, clinical course, morbidity and quality of life during the hospital stay of patients with COVID-19. Serial, high-quality biosampling consisting of various sample types with deep molecular, immunological and virological phenotyping was performed. Treatment and medical interventions followed the standard of care as recommended by current international and German guidelines for COVID-19. The severity of illness in the present study follows the WHO ordinal outcome scale48. The Pa-COVID-19 study was carried out according to the Declaration of Helsinki and the principles of Good Clinical Practice (International Conference on Harmonization 1996) where applicable, and was approved by the ethics committee of Charité Universitätsmedizin Berlin (no. EA2/066/20).

Sample preparation

The human cell lysate was obtained commercially (Promega) and was dissolved in 0.1% formic acid. Plasma samples were prepared as previously described23.

The yeast samples for drug response measurements were prepared and digested as follows: The auxotrophic S. cerevisiae strain BY4741 (∆his3, ∆leu2, ∆ura3, ∆met15) was rendered prototrophic by genomic knock-in of the missing genes. This prototrophic, wild-type strain was grown on agar plates containing synthetic minimal medium for 3 days. Subsequently, colonies were inoculated in synthetic minimal liquid medium (25 ml) and incubated at 30 °C for 1 day. The yeast culture was transferred to 96-deep-well plates and drugs were added to achieve a working concentration of 10 μM (1 ml total volume per well). The yeast culture was incubated at 30 °C and was grown overnight to exponential phase. Cells were pelleted by centrifugation at 3,220 relative centrifugal force for 5 min, the supernatant was discarded and plates were stored at −80 °C until further processing.

Then, 200 μl of 0.1 M ammonium bicarbonate in 7 M urea and glass beads (~100 mg per well) were added to the frozen pellet. Subsequently, the plates were sealed (Cap mats, Spex, no. 2201) and the cells were lysed in a bead beater for 5 min at 1,500 r.p.m. (Spex Geno/Grinder). After 1 min of centrifugation at 4,000 r.p.m., 20 μl of 55 mM DL-dithiothreitol was added (final concentration 5 mM) with mixing, and the samples were incubated for 1 h at 30 °C. Subsequently, 20 μl of 120 mM iodoacetamide was added (final concentration 10 mM) and the samples were incubated for 30 min in the dark at room temperature. One milliliter of 100 mM ammonium bicarbonate was added, the plates were centrifuged for 3 min at 4,000 r.p.m. and 230 μl was transferred to prefilled trypsin plates (9 μl of 0.1 μg μl–1 trypsin). After incubation of the samples for 17 h at 37 °C, 24 μl of 10% formic acid was added. The digestion mixtures were cleaned using C18 96-well plates (96-Well MACROSpin C18, 50–450 μl, The Nest Group, no. SNS SS18VL). For solid-phase extraction, 1-min centrifugation steps at the described speeds (Eppendorf Centrifuge 5810 R) were applied to force liquids through the stationary phase. A liquid handler (Biomek NXP) was used to pipette the liquids onto the material, allowing four 96-well plates to be processed per batch. The plates were conditioned with methanol (200 μl, centrifuged at 50g), washed twice with 50% acetonitrile (ACN, 200 μl, centrifuged at 50g and flow-through discarded) and equilibrated three times with 3% ACN and 0.1% formic acid (200 μl, centrifuged at 50g, 80g and 100g, respectively, and flow-through discarded). Then, 200 μl of digested samples was loaded (centrifuged at 100g) and washed three times with 3% ACN and 0.1% formic acid (200 μl, centrifuged at 100g). After the last washing step, the plates were centrifuged once more at 180g before elution of peptides in three steps, twice with 120 μl and once with 130 μl of 50% ACN (180g), into a collection plate (1.1 ml, square well, V-bottom).
The collected material was completely dried on a vacuum concentrator (Eppendorf Concentrator Plus) and redissolved in 40 μl of 3% ACN and 0.1% formic acid before transfer to a 96-well plate (700 μl round, Waters, no. 186005837). QC samples for repeat injections were prepared by pooling 3 μl of each digested sample. All pipetting was done with a liquid handling robot (Biomek NXP automated liquid handler), shaking was performed with a thermomixer (Eppendorf Thermomixer C) after each step and, for incubation, a Memmert IPP55 incubator was used.
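
The stated final concentrations of the reduction and alkylation reagents can be checked with simple dilution arithmetic. The following is a minimal sketch; the function name is illustrative, and it assumes that the volume of the cell pellet and glass beads can be ignored.

```python
def final_concentration_mm(stock_mm: float, added_ul: float, volume_before_ul: float) -> float:
    """Concentration (mM) after adding `added_ul` of stock to `volume_before_ul`."""
    return stock_mm * added_ul / (volume_before_ul + added_ul)


LYSIS_BUFFER_UL = 200.0  # 0.1 M ammonium bicarbonate in 7 M urea

# Assumption: pellet and glass-bead volumes are not counted.
dtt_mm = final_concentration_mm(55.0, 20.0, LYSIS_BUFFER_UL)          # 20 ul of 55 mM DTT
iaa_mm = final_concentration_mm(120.0, 20.0, LYSIS_BUFFER_UL + 20.0)  # 20 ul of 120 mM iodoacetamide
trypsin_ug = 9.0 * 0.1                                                # 9 ul of 0.1 ug/ul per well

print(f"DTT {dtt_mm:.1f} mM, iodoacetamide {iaa_mm:.1f} mM, trypsin {trypsin_ug:.1f} ug")
```

Under these assumptions the calculation reproduces the 5 mM and 10 mM final concentrations given in the protocol.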

LC–MS

Liquid chromatography was performed on an Agilent Infinity II ultra-high-pressure system coupled to a Sciex TripleTOF 6600. Peptides were separated in reversed-phase mode using an InfinityLab Poroshell 120 EC-C18 at a column temperature of 30 °C. The dimensions of the columns were 2.1 mm internal diameter, 30 mm length and 1.9-μm particle size for the yeast drug screen, and 2.1 mm internal diameter, 50 mm length and 1.9-μm particle size for all other measurements. For K562 benchmarks, a gradient was applied that ramps from 3 to 36% buffer B in 5 min (buffer A: 1% acetonitrile and 0.1% formic acid; buffer B: acetonitrile and 0.1% formic acid) with a flow rate of 800 µl min–1. For washing the column, the flow rate was increased to 1 ml min–1 and the organic solvent was increased to 80% buffer B in 0.5 min, and was maintained for 0.2 min at this composition before reverting to 3% buffer B in 0.1 min. Subsequently the column was equilibrated for 2.1 min (Supplementary Table 10). An IonDrive Turbo V Source was used with ion source gas 1 (nebulizer gas), ion source gas 2 (heater gas) and curtain gas set to 50 psi, 40 psi and 25 psi, respectively. The source temperature was set to 450 °C and the ion spray voltage to 5,500 V.
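
For orientation, the K562 gradient program described above can be written down as a list of steps. This is only a sketch that assumes the steps run back to back; the step names are illustrative, and Supplementary Table 10 remains the authoritative description of the method.

```python
# (step name, duration in min, % buffer B at the end of the step)
K562_GRADIENT = [
    ("separation", 5.0, 36.0),      # 3 -> 36% B at 800 ul/min
    ("ramp to wash", 0.5, 80.0),    # flow raised to 1 ml/min
    ("hold wash", 0.2, 80.0),
    ("return to start", 0.1, 3.0),
    ("equilibration", 2.1, 3.0),
]


def total_minutes(steps) -> float:
    """Sum of all step durations, assuming no gaps between steps."""
    return sum(duration for _, duration, _ in steps)


def overhead_minutes(steps) -> float:
    """Time spent outside the analytical separation (wash and equilibration)."""
    return sum(duration for name, duration, _ in steps if name != "separation")


print(f"total {total_minutes(K562_GRADIENT):.1f} min, "
      f"overhead {overhead_minutes(K562_GRADIENT):.1f} min")
```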

For comparison of different gradient lengths (0.5, 1, 3 and 5 min), we applied linear gradients ramping from 3 to 36% buffer B (buffer A: 1% acetonitrile and 0.1% formic acid; buffer B: acetonitrile and 0.1% formic acid) with a flow rate of 800 µl min–1. For Scanning SWATH and conventional stepped SWATH the duty cycles were adjusted accordingly (Supplementary Tables 1 and 3). For conventional SWATH this was done by adjusting the number of variable windows to reach cycle times comparable to Scanning SWATH duty cycles (Supplementary Tables 2 and 3). For this particular comparison, the accumulation times of MS1 and MS/MS scans were 10 and 25 ms, respectively.

The 1-min gradients used for the measurement of patient samples were slightly adjusted: 3 µg of the digested proteins was injected and we applied linear ramping from 3 to 15% buffer B (buffer A: 1% acetonitrile and 0.1% formic acid; buffer B: acetonitrile and 0.1% formic acid) in 0.1 min, followed by linear ramping from 15 to 40% buffer B in 0.9 min (Supplementary Table 7 shows detailed gradient parameters).

For the yeast drug screen we reduced the column length to 3 cm (InfinityLab Poroshell 120 EC-C18, 2.1 mm internal diameter, 30 mm length and 1.9-μm particle size) and increased the flow rate during the column wash to 2.3 ml min–1, which reduced the method overhead time to 140 s (Supplementary Table 5 shows detailed gradient parameters).

Scanning SWATH settings, operation and calibration

The Scanning SWATH runs were acquired with a Scanning SWATH beta version. If not mentioned otherwise, the following settings were applied in the Scanning SWATH runs: the precursor isolation window was set to 10 m/z and a mass range of 400–900 m/z was covered in 0.5 s. These settings provided a compromise between identification and quantification performance. We optimized the window size on yeast (S. cerevisiae) whole-proteome tryptic digests and a 5-min, high-flow, water-to-acetonitrile gradient where we tested window sizes ranging from 3 to 20 m/z, covering a precursor range of 400–900 m/z. The best results in terms of identification and quantitative precision were achieved with a window size of 10 m/z (Supplementary Fig. 2a). Reducing the window size further would have resulted in even higher identification numbers due to reduced interference, but the resulting shorter effective accumulation times would have lowered quantitative precision. Raw data were binned in the quadrupole or precursor dimension into 2-m/z bins, providing a resolution in the Q1 dimension that allowed the effective use of Q1 scores. The MS1 scan was omitted for the benchmarks, and data were acquired in high-sensitivity mode.

The instrument control software calculates a radio frequency/direct current (RF/DC) ramp that is applied to quadrupole filter 1. The ramp is calculated from the experimental start transmission mass, stop transmission mass, transmission width and cycle time. The calculation uses previously acquired calibrations to calculate ramps for the mass DACs and resolution DACs. The quadrupole start mass is calculated as experiment start mass minus transmission width, and the quadrupole stop mass as experiment stop mass plus transmission width. This allows for correct precursor profiles of all fragments at the boundaries of the experimental mass range. Collision energy is calculated using the +2 Rolling Collision energy equation based on the center masses for each transmission window. This results in a small collision energy spread depending on the width of the transmission window relative to the range being scanned. In these experiments the effect typically amounts to a spread of around 1 eV for a given precursor.
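
The boundary rule for the quadrupole ramp can be illustrated with the default settings (400–900 m/z, 10-m/z window, 0.5-s cycle). The sketch below is not the instrument control software; the class and function names are hypothetical, and it assumes a linear ramp that uses the entire cycle time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanSettings:
    start_mz: float = 400.0   # experiment start mass
    stop_mz: float = 900.0    # experiment stop mass
    width_mz: float = 10.0    # transmission width
    cycle_s: float = 0.5      # cycle time


def quadrupole_ramp(s: ScanSettings) -> dict:
    """Ramp boundaries as described in the text, plus derived scan speed."""
    quad_start = s.start_mz - s.width_mz
    quad_stop = s.stop_mz + s.width_mz
    # Assumption: the ramp is linear and spans the whole cycle time.
    speed = (quad_stop - quad_start) / s.cycle_s
    return {
        "quad_start_mz": quad_start,
        "quad_stop_mz": quad_stop,
        "scan_speed_mz_per_s": speed,
        # Time for which any single precursor m/z lies inside the window.
        "time_in_window_ms": 1000.0 * s.width_mz / speed,
    }


ramp = quadrupole_ramp(ScanSettings())
print(ramp)
```

With these assumptions the quadrupole scans from 390 to 910 m/z, so each precursor is transmitted for roughly 10 ms per cycle.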

The instrument acquisition software organizes ion detection responses into calculated 2-m/z precursor isolation bins given the current TOF pusher pulse number relative to the start of the scan applying the Scanning SWATH offset curve described above. The 2-m/z precursor isolation bins are organized in the data file as adjacent experiments, allowing for the extraction of precursor profiles for any given fragment in a given cycle by tracing fragment response across experiments, as well as normal chromatographic profiles across cycles.

The bins-to-sum (consolidation of data points in time-of-flight dimension) was set to 4 (4 × 25 picoseconds (ps) = 100 ps) for the K562 benchmark experiment, and to 8 (8 × 25 ps = 200 ps) for all other experiments.

Scanning SWATH calibration was obtained while processing each sample file from the sample data. An automated algorithm finds the maximum residual precursors for each transmission window across the entire sample. This results in several accurate mass TOF measurements paired with the centroid of the quadrupole mass traces per quadrupole transmission region, of which there are usually ten or more per 100 daltons (Da). For example, if the algorithm used the three best residual precursors across the LC for a given transmission region and the scan range was 500 Da with a transmission width of 10 Da, there would be 500/10 × 3 = 150 calibration point pairs consisting of quadrupole mass and TOF accurate mass. Since it is possible that an intense peak within the quadrupole transmission region is not in fact a residual precursor, a selection algorithm filters out points using an outlier rejection algorithm that considers local variance. Typically a point is evaluated relative to its neighbors in a 50–100-Da region. Once a multipoint calibration curve is obtained, the calibration is applied to the data by updating the begin and end mass region defined in the header from each experiment stored, such that the center is calculated from the calibration function while maintaining continuity of boundaries in adjacent experiments.
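
The counting example and the idea of local outlier rejection can be sketched as follows. The rejection rule shown here (median and median absolute deviation of neighboring offsets) is a stand-in chosen for illustration, since the text specifies only that local variance is considered; the function names, the threshold `k` and the floor `min_scale_da` are assumptions.

```python
from statistics import median


def n_calibration_pairs(scan_range_da: float, width_da: float, per_region: int) -> int:
    """Number of (quadrupole mass, TOF mass) pairs, e.g. 500/10 x 3 = 150."""
    return int(scan_range_da / width_da) * per_region


def reject_outliers(pairs, window_da=50.0, k=3.0, min_scale_da=0.05):
    """Keep pairs whose offset (quad - TOF) agrees with neighbors within window_da.

    pairs: list of (tof_mass, quad_centroid). Assumed rule, not the vendor algorithm.
    """
    kept = []
    for idx, (tof, quad) in enumerate(pairs):
        neighbors = [q - t for j, (t, q) in enumerate(pairs)
                     if j != idx and abs(t - tof) <= window_da]
        if len(neighbors) < 3:          # too few neighbors to judge: keep the point
            kept.append((tof, quad))
            continue
        center = median(neighbors)
        mad = median([abs(x - center) for x in neighbors])
        scale = max(1.4826 * mad, min_scale_da)   # robust spread with a floor
        if abs((quad - tof) - center) <= k * scale:
            kept.append((tof, quad))
    return kept


# Synthetic data: offsets near 0.1 Da, with one false "residual precursor".
pairs = [(400.0 + 10 * i, 400.0 + 10 * i + 0.1 + 0.01 * ((i % 3) - 1)) for i in range(20)]
pairs[7] = (470.0, 473.0)
kept = reject_outliers(pairs)
print(n_calibration_pairs(500, 10, 3), len(pairs), len(kept))
```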

Matching precursors to MS/MS fragment traces in DIA–NN

The DIA–NN method takes full advantage of the fourth dimension in Scanning SWATH data. In DIA–NN, a set of scores is calculated for each precursor–spectrum match (PSM), to distinguish true signals from noise using linear classifiers and an ensemble of deep NNs. DIA–NN also selects the ‘best’ fragment ion per PSM, as the one with the clearest signal, with other fragment ions then being assessed by comparing their MS2 traces to those of the best fragment22. Scores specifically related to Q1 profile assessment have now been added to DIA–NN algorithms. The Q1 profiles are extracted at the apex of the respective elution peak and the following scores are calculated. (1) Those that reflect the similarity of the Q1 profiles of the fragments and the nonfragmented precursor to the Q1 profile of the best fragment. One score is calculated as the sum of correlations between Q1 profiles of the fragments and the Q1 profile of the best fragment, as designated by DIA–NN during candidate elution peak identification22. The other score is the correlation of the Q1 profile of the nonfragmented precursor and the Q1 profile of the best fragment. (2) Scores that reflect how well Q1 profile shapes match the expected triangular shape. For each fragment, a score is calculated with values between 0 and 1, reflecting whether its Q1 profile increases monotonically to the left from the apex. These scores are then multiplied by the correlation between elution profiles of the fragments and the best fragment, and summed across all the fragments. A similar sum is calculated reflecting whether Q1 profiles are monotonically decreasing to the right from the apex. (3) The difference between the centroid of the Q1 profile of the best fragment and the library precursor mass is calculated. 
DIA–NN calculates the scores listed in (1) – (3) at three different scales by using the three, seven or 11 bins closest to the Q1 profile apex, respectively, yielding 3 × (2 + 2 + 1) = 15 scores in total (Supplementary Fig. 1 gives further details on the algorithm).
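
To make the three score families more concrete, the sketch below computes simplified analogs on a toy Q1 profile: a correlation with the best fragment, the fraction of monotonic steps on either side of the apex, and the profile centroid. These are illustrative stand-ins and not the formulas implemented in DIA–NN; all function names and the example profiles are hypothetical.

```python
import math


def pearson(x, y) -> float:
    """Pearson correlation; returns 0.0 if either profile is flat."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def crop(profile, apex: int, n_bins: int):
    """The n_bins bins closest to the apex (n_bins is odd: 3, 7 or 11)."""
    half = n_bins // 2
    return profile[max(0, apex - half): apex + half + 1]


def monotone_fraction(values) -> float:
    """Fraction of adjacent steps that do not decrease (value between 0 and 1)."""
    steps = list(zip(values, values[1:]))
    if not steps:
        return 1.0
    return sum(b >= a for a, b in steps) / len(steps)


def shape_scores(profile, apex: int):
    """(rising towards the apex from the left, falling away from it to the right)."""
    rising = monotone_fraction(profile[: apex + 1])
    falling = monotone_fraction(profile[apex:][::-1])
    return rising, falling


def centroid(mz, intensity) -> float:
    return sum(m * i for m, i in zip(mz, intensity)) / sum(intensity)


mz = [500 + 2 * i for i in range(11)]          # centers of 2-m/z bins
best = [0, 1, 2, 3, 4, 5, 4, 3, 2, 1, 0]       # ideal triangular profile
other = [0, 1, 2, 3, 5, 5, 4, 2, 2, 1, 0]      # slightly distorted fragment
apex = 5

for n_bins in (3, 7, 11):                      # the three scales used in the text
    r = pearson(crop(best, apex, n_bins), crop(other, apex, n_bins))
    print(n_bins, round(r, 3))
print(shape_scores(best, apex), centroid(mz, best))
```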

Only the monoisotopic fragment masses are used for Q1 profile assessment because the Q1 profiles of different fragment isotopologs are shifted relative to each other. We illustrate this for a doubly charged precursor (Supplementary Fig. 8). As one would expect, the Q1 profiles of the +1 13C and the +2 13C fragment isotopologs are shifted by ~0.5 and ~1 m/z relative to the monoisotopic mass, respectively. Depending on precursor mass and fragment mass, a small fraction of the monoisotopic fragments might also result from a +1 13C precursor isotope. This slightly distorts the Q1 profile of monoisotopic fragments but, as this distortion is in the range of the mass accuracy of the quadrupole, its impact is negligible in practice.

Conventional DIA and SWATH runs (for benchmark)

The conventional 5-min SWATH method is based on one previously published23. To render it comparable to the developed Scanning SWATH method, we applied the same 0.5-s duty cycle and the same precursor mass range of 400–900 m/z as in the developed Scanning SWATH method. Each duty cycle consists of one MS1 scan with 20-ms accumulation time, and 17 MS/MS scans with variable windows (Supplementary Table 11) and 25-ms accumulation time.

The DIA–FAIMS data, acquired on an Evosep One LC system coupled to an Orbitrap Exploris 480, were downloaded from ProteomeXchange (dataset PXD016662). Triplicate runs with 500-ng HeLa tryptic digests loaded on a column (the highest load in this dataset), a compensation value of −45 V for FAIMS, a resolving power of 15,000 and a cycle time of 1 s were considered because these runs provided the best identification numbers while maintaining quantitative accuracy27. DIA–FAIMS data were analyzed with a project-specific library acquired on the same setup (PXD016662; ‘5min-library.kit’). For the analysis in DIA–NN, the library was exported from Spectronaut (v.13.12.200217.43655 (Laika)) with the ‘Export Spectral Library’ function and reannotated with the ‘Reannotate’ function in DIA–NN using the UniProt83 human canonical proteome (UP000005640). The DIA–FAIMS data were analyzed with Spectronaut (v.13.12.200217.43655 (Laika)) and DIA–NN but, as the identification numbers were higher with DIA–NN, we used these values for the benchmark.

Data processing and analysis

Raw data processing was carried out with DIA–NN v.1.7.12 and with default settings in ‘robust LC (high accuracy)’ mode. Protein quantities were obtained using the MaxLFQ algorithm84 as implemented in either DIA–NN (yeast drug screen) or the diann R package (https://github.com/vdemichev/diann-rpackage) (all other samples).

The data processing and batch correction for patient measurements were done as described previously23. Briefly, the report was filtered at 0.01 precursor-level q-value and 0.05 protein-group-level q-value. Intrabatch correction was performed for each peptide precursor separately and based on the sample preparation controls, using linear regression on the injection number. Linear regression was applied only when at least ten data points were available. Testing of the relation between log2-transformed protein levels and WHO severity grade (as classified according to the WHO ordinal scale48) was performed using Kendall’s Tau test as implemented in the EnvStats R package85 (adjusted P < 0.01, Benjamini–Hochberg for multiple testing82). Proteins were considered for differential expression analysis only when identified in at least 90% of individuals/patients.
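
The intrabatch correction can be pictured as fitting a line through the control measurements of one precursor against injection number and removing that trend from all samples. The following is a minimal sketch only: the function names are hypothetical, and working on log-scale intensities and centering the trend on the mean control injection are assumptions rather than details taken from the original analysis.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares slope and intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return slope, my - slope * mx


def correct_drift(injection, log_intensity, is_control, min_points=10):
    """Remove the control-derived linear drift for one precursor within one batch."""
    ctrl_x = [i for i, c in zip(injection, is_control) if c]
    ctrl_y = [v for v, c in zip(log_intensity, is_control) if c]
    if len(ctrl_x) < min_points:      # too few control points: leave data unchanged
        return list(log_intensity)
    slope, _ = fit_line(ctrl_x, ctrl_y)
    center = mean(ctrl_x)             # assumption: keep the level at the mean control injection
    return [v - slope * (i - center) for i, v in zip(injection, log_intensity)]


# Synthetic batch: 40 injections, every fourth one a control, steady signal loss.
injection = list(range(1, 41))
is_control = [i % 4 == 0 for i in injection]
log_intensity = [20.0 - 0.01 * i for i in injection]
corrected = correct_drift(injection, log_intensity, is_control)
print(round(max(corrected) - min(corrected), 6))
```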

For the analysis of yeast drug screen data, proteins were considered only if detected in >50% of the samples, and samples were removed if they had <80% of the maximum identification number across samples. Only proteins identified with proteotypic (that is, specific) peptides and 0.01 protein q-value were considered. The differential expression analysis (drug-treated versus DMSO-treated) for the yeast drug screen was done on the log2-transformed protein quantities using a t-test (two-sided), considering proteins detected in at least three out of the four replicates. The Benjamini–Hochberg procedure82 was used for multiple testing correction. Drugs were considered for the subsequent analysis only if they had >20 differentially expressed proteins, and samples treated with folic acid and FIN56 were excluded from the analysis because these do not belong to the three studied drug classes.
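
The completeness filters and the multiple-testing correction described above can be sketched in a few lines. The function names and toy inputs are hypothetical, and applying the sample filter before the protein filter is an assumption; the P values would come from the two-sided t-test, which is not reimplemented here.

```python
def keep_samples(ids_per_sample: dict, min_fraction: float = 0.8) -> set:
    """Drop samples with fewer than min_fraction of the maximum identification number."""
    best = max(ids_per_sample.values())
    return {s for s, n in ids_per_sample.items() if n >= min_fraction * best}


def keep_proteins(detected_in: dict, samples: set, min_fraction: float = 0.5) -> set:
    """Keep proteins detected in more than min_fraction of the retained samples."""
    return {p for p, found in detected_in.items()
            if len(found & samples) > min_fraction * len(samples)}


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):                 # from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


ids = {"s1": 1000, "s2": 950, "s3": 700}
detected = {"P1": {"s1", "s2"}, "P2": {"s1"}, "P3": {"s3"}}
samples = keep_samples(ids)
proteins = keep_proteins(detected, samples)
adj = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(sorted(samples), sorted(proteins), [round(a, 3) for a in adj])
```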

Coefficients of variation were calculated for each protein or precursor as its empirical standard deviation divided by its empirical mean, and are reported in percentages. CV values were calculated for proteins or precursors identified in at least two replicate measurements. PCA was always performed only on ubiquitously identified proteins; imputation was not used. Heatmaps were generated with the ComplexHeatmap R package and default settings86. Pathway enrichment was performed with the clusterProfiler R package87 and wikipathways database88. Z-scores were calculated by dividing the (centered) protein quantities by their standard deviations. All plots were generated with R (v.3.6.3)89.
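
The coefficient of variation and Z-score definitions translate directly into code. This sketch uses the sample standard deviation (n − 1 denominator), which is an assumption because the text does not state which estimator was used; the function names are illustrative.

```python
from statistics import mean, stdev


def cv_percent(quantities):
    """CV in percent; None if fewer than two replicate measurements are available."""
    if len(quantities) < 2:
        return None
    return 100.0 * stdev(quantities) / mean(quantities)


def z_scores(quantities):
    """Centered quantities divided by their standard deviation."""
    m, s = mean(quantities), stdev(quantities)
    return [(q - m) / s for q in quantities]


replicates = [90.0, 100.0, 110.0]
print(cv_percent(replicates), z_scores(replicates))
```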

Spectral libraries

The libraries for the K562 benchmark experiments and for the yeast drug screen were generated from ‘gas-phase fractionation’ runs using Scanning SWATH and small precursor isolation windows. First, 5 µg of K562 cell lysate (Promega) or 5 µg of yeast digests was injected and run on a nanoAcquity ultra-performance LC (Waters) coupled to a SCIEX TripleTOF 6600 with a DuoSpray Turbo V source. Peptides were separated on a Waters HSS T3 column (150 mm × 300 µm, 1.8-µm particles) with a column temperature of 35 °C and a flow rate of 5 µl min–1. A 55-min linear gradient ramping from 3% ACN/0.1% formic acid to 40% ACN/0.1% formic acid was applied. The ion source gas 1 (nebulizer gas), ion source gas 2 (heater gas) and curtain gas were set to 15 psi, 20 psi and 25 psi, respectively. The source temperature was set to 75 °C and the ion spray voltage to 5,500 V. In total, 12 injections were run with the following m/z mass ranges: 400–450, 445–500, 495–550, 545–600, 595–650, 645–700, 695–750, 745–800, 795–850, 845–900, 895–1,000 and 995–1,200. The precursor isolation window was set to m/z 1 except for the mass ranges m/z 895–1,000 and m/z 995–1,200, where the precursor windows were set to m/z 2 and m/z 3, respectively. The cycle time was 3 s, consisting of high- and low-energy scans, and data were acquired in ‘high-resolution’ mode. The spectral libraries were generated using library-free analysis with DIA–NN directly from these Scanning SWATH acquisitions. For this DIA–NN analysis, MS2 and MS1 mass accuracies were set to 25 and 20 ppm, respectively, and scan window size was set to 6.

For the analysis of COVID-19 plasma samples, a project-independent public spectral library24 was used as described previously23. The Human UniProt83 isoform sequence database (UP000005640) was used to annotate the library. The library was first automatically refined based on the dataset in question at 0.01 global q-value (using the ‘Generate spectral library’ option in DIA–NN). DIA–NN performs such refinement by finding the highest-scoring identification for each library precursor across all runs in the experiment, and then replacing the library data with the empirically observed spectrum and retention time.

Empirical FDR estimation with two-species library

Because FDR calculations are software- and acquisition-mode-specific, thus potentially affecting benchmarking results, we also compared Scanning SWATH data with conventional stepped SWATH using the two-species library approach, which estimates true-positive calls in an unbiased fashion on the basis of an empirically measured FDR8,22. We augmented the human library with A. thaliana precursors, obtained from ProteomeXchange (dataset PXD012710, Arabidopsis proteome spectral library, ‘Arabidopsis_Library_TripleTOF5600_Spectronaut.xls’), as negative controls. Peptides that matched both the UniProt83 human canonical proteome (UP000005640) and the UniProt A. thaliana canonical proteome (UP000005648) were removed from the library. Spectra and retention times in the merged human/A. thaliana library were replaced with in silico predicted values whenever possible using the deep-learning-based prediction integrated in DIA–NN. Empirical FDR was estimated as previously described22. In short, empirical FDR is the ratio of identified A. thaliana precursors to identified human precursors, multiplied by the ratio of human precursors to A. thaliana precursors in the library (only precursors ranging 400–900 m/z were considered).
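
The empirical FDR estimate is a product of two ratios and can be written as a one-line function. The function name and the counts in the example are hypothetical and serve only to show the arithmetic; they are not results from this study.

```python
def empirical_fdr(decoy_identified: int, target_identified: int,
                  target_in_library: int, decoy_in_library: int) -> float:
    """(A. thaliana IDs / human IDs) x (human library size / A. thaliana library size)."""
    return (decoy_identified / target_identified) * (target_in_library / decoy_in_library)


# Hypothetical counts, restricted to precursors in the 400-900 m/z range.
fdr = empirical_fdr(decoy_identified=50, target_identified=20_000,
                    target_in_library=100_000, decoy_in_library=50_000)
print(f"{fdr:.4f}")
```

The second ratio corrects for the different sizes of the two library parts: if the negative-control set is half the size of the human set, each false A. thaliana hit is taken to stand for about two false human hits.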

Reporting Summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.