Phosphorylation site report from a DIA-MS experiment
Skowronek et al. 2022 reported a large difference in performance between Spectronaut and DIA-NN. For DIA-MS-based phosphoproteomics, one of the issues was that it was not easy to extract a phosphosite report from the different processing tools. In Pham et al. 2024, we presented a software tool called sitereport in the Python package msproteomics that extracts a phosphosite report for both Spectronaut and DIA-NN, allowing a comparative assessment of their performance.
We are curious how many phosphosites the newly released DIA-NN 2.0.2 reports!
To that end, we analyze the same dataset, consisting of 4 timsTOF DIA-MS runs from Skowronek et al., with DIA-NN 2.0.2 using the same spectral library. The --export-quant switch is turned on. The result can be downloaded here.
We then convert the DIA-NN parquet output to a tab-delimited text format. This can be done in R as in our previous blogpost. For convenience, we have also added a script for this conversion to the latest release of the msproteomics package, version 1.1.0. Note that this step requires installing an additional Python package, pyarrow.
python -m pip install pyarrow
diann_parquet_to_tsv -i report.parquet -o report.tsv
Then we continue the processing as before:
read_diann -o report_msproteomics.tsv -E Intensities -f uniprot-reviewed_yes_AND_organism__Homo_sapiens__Human___9606___.fasta report.tsv
sitereport report_msproteomics.tsv
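For repeated use, the three commands above can be chained from a small Python script. The following is only a sketch: the function names build_commands and run_pipeline are hypothetical helpers, not part of msproteomics, and the command-line arguments are copied verbatim from the commands shown above.

```python
import shlex
import subprocess


def build_commands(parquet, fasta, prefix="report"):
    """Return the three command lines used above, in order."""
    tsv = f"{prefix}.tsv"
    converted = f"{prefix}_msproteomics.tsv"
    return [
        # 1. DIA-NN parquet report -> tab-delimited text
        ["diann_parquet_to_tsv", "-i", parquet, "-o", tsv],
        # 2. DIA-NN report -> msproteomics input
        ["read_diann", "-o", converted, "-E", "Intensities", "-f", fasta, tsv],
        # 3. phosphosite report
        ["sitereport", converted],
    ]


def run_pipeline(parquet, fasta, prefix="report", dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_commands(parquet, fasta, prefix):
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step
```

Calling run_pipeline("report.parquet", "my.fasta") only prints the commands; pass dry_run=False to actually run them, which assumes msproteomics and pyarrow are installed.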
The Venn diagram shows the number of phosphosites returned by the three software packages/versions and their overlaps. DIA-NN 2.0.2 detects considerably more phosphosites than DIA-NN 1.8.2 beta 27: 23440 versus 19730 sites, an increase of about 19%. Note that the number of phosphosites includes multiplicities, as in MaxQuant and Spectronaut. Please refer to our publication for details.
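The overlap counts behind such a Venn diagram, and the relative gain between two versions, can be computed with plain Python sets. This is a minimal sketch: the site identifiers below are made-up placeholders, and how site keys are read from the sitereport output is left out because it depends on the column layout of that file.

```python
def venn_counts(sites_a, sites_b):
    """Count sites unique to each set and shared between the two."""
    a, b = set(sites_a), set(sites_b)
    return {"only_a": len(a - b), "shared": len(a & b), "only_b": len(b - a)}


def relative_gain(new_count, old_count):
    """Fractional increase of new_count over old_count."""
    return (new_count - old_count) / old_count


# Toy identifiers (hypothetical format: protein_site_multiplicity)
old_sites = ["PROT1_S15_M1", "PROT1_T20_M1", "PROT2_S7_M2"]
new_sites = ["PROT1_S15_M1", "PROT2_S7_M2", "PROT3_Y99_M1", "PROT3_S5_M1"]

print(venn_counts(old_sites, new_sites))
# Site counts quoted in the text: 23440 (DIA-NN 2.0.2) vs 19730 (1.8.2 beta 27)
print(f"{relative_gain(23440, 19730):.1%}")
```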
Other filtering
According to the discussion here, an alternative filtering for DIA-NN consists of:
- both lib & global q-value filter at 0.5 (that is 50%)
- both lib & global peptidoform q-value filter at 0.5
- run-specific q-value filter 0.01
- run-specific PEP filter 0.01
- run-specific peptidoform q-value filter 0.01
- if using channels, run-specific channel q-value filter 0.01
We can apply this filtering to the DIA-NN report before creating the tab-delimited text file. We will use R for this step:
# Read the DIA-NN parquet report (requires the arrow package)
raw <- arrow::read_parquet("report.parquet")
# Concatenate the fragment quantities into one ';'-separated column
raw$Intensities <- paste(raw$Fr.0.Quantity, raw$Fr.1.Quantity, raw$Fr.2.Quantity,
                         raw$Fr.3.Quantity, raw$Fr.4.Quantity, raw$Fr.5.Quantity,
                         raw$Fr.6.Quantity, raw$Fr.7.Quantity, raw$Fr.8.Quantity,
                         raw$Fr.9.Quantity, raw$Fr.10.Quantity, raw$Fr.11.Quantity,
                         sep = ";")
# Rows passing all filters; the channel q-value filter is not applied here.
# Note: the PEP cutoff is 0.05 here, whereas the list above states 0.01.
selected <- raw$Lib.Q.Value <= 0.5 &
  raw$Global.Q.Value <= 0.5 &
  raw$Lib.Peptidoform.Q.Value <= 0.5 &
  raw$Global.Peptidoform.Q.Value <= 0.5 &
  raw$Q.Value <= 0.01 & raw$PEP <= 0.05 &
  raw$Peptidoform.Q.Value <= 0.01
write.table(raw[selected, ], "report-filtered.tsv", sep = "\t",
            row.names = FALSE, quote = FALSE)
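For readers who prefer to stay in Python, the same row filter can be sketched on an already converted tab-delimited report using only the standard library. This is an illustration, not part of msproteomics: it assumes the TSV keeps the DIA-NN column names used in the R code, that every filter column holds a number in every row (a missing value such as NA would raise an error), and it mirrors the thresholds of the R code, including its PEP cutoff of 0.05. The names MAX_VALUES, passes and filter_tsv are hypothetical.

```python
import csv
import io

# Upper limits per column, mirroring the R code above
MAX_VALUES = {
    "Lib.Q.Value": 0.5,
    "Global.Q.Value": 0.5,
    "Lib.Peptidoform.Q.Value": 0.5,
    "Global.Peptidoform.Q.Value": 0.5,
    "Q.Value": 0.01,
    "PEP": 0.05,
    "Peptidoform.Q.Value": 0.01,
}


def passes(row, limits=MAX_VALUES):
    """True if every filtered column is at or below its limit."""
    return all(float(row[col]) <= limit for col, limit in limits.items())


def filter_tsv(src, dst, limits=MAX_VALUES):
    """Copy passing rows from file object src to dst; return how many were kept."""
    reader = csv.DictReader(src, delimiter="\t")
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    kept = 0
    for row in reader:
        if passes(row, limits):
            writer.writerow(row)
            kept += 1
    return kept
```

A typical call would open report.tsv for reading and report-filtered.tsv for writing (both with newline="") and pass the two file objects to filter_tsv. Note that, unlike the R code, this sketch does not create the Intensities column.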
Then we process the filtered data as before, but without the default filters. This can be done by setting the filter values to ‘none’ as follows.
read_diann -o report_msproteomics-filtered.tsv -E Intensities -f uniprot-reviewed_yes_AND_organism__Homo_sapiens__Human___9606___.fasta report-filtered.tsv
sitereport report_msproteomics-filtered.tsv -output_site new_filtering_site.tsv -output_peptide new_filtering_peptide.tsv -site_filter_double_less none none -site_filter_double_greater none none -peptide_filter_double_less none none
The number of sites with the new filtering is slightly lower than with the default msproteomics filtering. The large number of overlapping sites suggests that the two filtering approaches give highly similar results. Note that this number is much higher than those reported in the DIA-NN site reports, report.phosphosites_90.tsv (14441 sites) and report.phosphosites_99.tsv (12467 sites). This is most likely because msproteomics sitereport counts sites per phosphorylation multiplicity.
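To illustrate how counting multiplicities inflates the number of sites, here is a toy sketch. The observations are invented, and the tuple layout (protein, position, multiplicity) is an assumption made for this example rather than the actual sitereport format; multiplicity is taken to mean the number of phosphorylations on the peptide carrying the site.

```python
def count_sites(observations, with_multiplicity=True):
    """Count distinct sites from (protein, position, multiplicity) tuples."""
    if with_multiplicity:
        # The same residue seen on singly and doubly phosphorylated
        # peptides counts as two separate entries.
        keys = {(prot, pos, mult) for prot, pos, mult in observations}
    else:
        keys = {(prot, pos) for prot, pos, _ in observations}
    return len(keys)


toy = [
    ("PROT1", 15, 1),  # S15 on a singly phosphorylated peptide
    ("PROT1", 15, 2),  # S15 again, on a doubly phosphorylated peptide
    ("PROT1", 20, 2),
    ("PROT2", 7, 1),
]

print(count_sites(toy, with_multiplicity=True))   # 4
print(count_sites(toy, with_multiplicity=False))  # 3
```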