Kinase-substrate relationship inference bioinformatics tools | Protein interaction analysis - omicX
Introduction
Protein kinases are major effectors of cellular signaling, in the context of which they form a highly complex and tightly regulated network that can sense and integrate a multitude of external stimuli or internal cues. Deregulation of kinase signaling can lead to severe diseases and is observed in almost every cancer [Hanahan and Weinberg]. For instance, a single constitutively active kinase, originating from the fusion of the BCR and ABL genes, can give rise to and sustain chronic myeloid leukemia [Sawyers].
Accordingly, the small molecule inhibitor of the BCR-ABL kinase, Imatinib, has shown unprecedented therapeutic effectiveness in affected patients [Sawyers et al.].
These promising clinical results, the essential role of kinases in the patho-mechanism of cancer, and the fact that kinases are in general pharmacologically tractable [Zhang et al.] have made kinases attractive therapeutic targets. However, not all eligible patients respond equally well, and in addition, cancers often develop resistance to initially successful therapies. This calls for a deeper understanding of kinase signaling and how it can be exploited therapeutically [Cutillas]. By definition, the activity of a kinase is reflected in the occurrence of phosphorylation events catalyzed by this kinase.
Thus, analysis of kinase activity was traditionally achieved by monitoring the phosphorylation status of a limited number of sites known to be targeted by the kinase of interest using immunochemical techniques [Bertacchini et al.]. This, however, requires substantial prior knowledge and yields a comparatively low throughput.
Other approaches exist. High-coverage phosphoproteomics data should indirectly contain information about the activity of many active kinases. The high-content nature of phosphoproteomics data, however, poses challenges for computational analysis. For example, only a small subset of the described phosphorylation sites can be explicitly associated with functional impact [Beltrao et al.].
As a means to extract functional insight, methods to infer kinase activities from phosphoproteomics data based on prior knowledge about kinase-substrate relationships have been put forward [Qi et al.]. This knowledge is compiled in databases like PhosphoSitePlus [Hornbeck et al.] and Phospho.ELM [Dinkel et al.]. Alternatively, computational resources to predict kinase-substrate relationships based on kinase recognition motifs and contextual information have been used to enrich the collections of substrates per kinase [Horn et al.].
The inferred kinase activities can in turn be used to reconstruct kinase network circuitry or to predict therapeutically relevant features such as sensitivity to kinase inhibitor drugs [Casado et al.]. In this chapter, we start with a brief description of phosphoproteomics data acquisition, highlighting challenges for the computational analysis that may arise out of the experimental process. Subsequently, we will present different computational methods for the estimation of kinase activities based on phosphoproteomics data, preceded by the kinase-substrate resources these methods use.
One of these methods, namely KSEA (Kinase-Substrate Enrichment Analysis), will be explained in more detail in the form of a guided, stepwise protocol that is available as part of the open-source Python toolbox kinact (Toolbox for Kinase Activity Scoring) at http:
Phosphoproteomics Data Acquisition
For a summary of technical variations or available systems for the experimental setup of phosphoproteomics data acquisition, we refer the interested reader to dedicated publications such as [Riley and Coon; Nilsson].
Here we provide a short overview of the experimental process to make it easier to understand the common challenges for the data analysis on which we will focus.
Mass spectrometry-based detection of peptides with post-translational modifications (PTMs) usually requires the same steps, independent of the modification of interest. After the experimental work, additional data processing steps are required to identify the position of the modification.
For almost every step, different protocols are available, ranging from various proteases for protein digestion to different data acquisition methods for MS [Riley and Coon].
Phosphopeptide Enrichment
Naturally, the enrichment of phosphopeptides is a pivotal step for phosphoproteomics. Next to the enrichment method used, the choice of the protease also plays a role [Giansanti et al.].
For phosphopeptide enrichment, the field is dominated by immobilized metal affinity chromatography (IMAC) and metal oxide affinity chromatography (MOAC), both of which exploit the affinity of the phosphate group for metal ions. Alternatively, more traditional biochemical methods involving immunoaffinity purification are also in use for enrichment of phosphopeptides, although these are generally limited to studies of phosphotyrosine [Rush et al.].
Of note, the different enrichment methods show limited overlap in the detected phosphopeptides, although this can also be observed for replicates of runs using the identical enrichment method, as discussed below [Ruprecht et al.]. Variations in the chromatography method used as well as the multitude of mass spectrometry instrument types are reviewed in detail elsewhere [Riley and Coon]. Generally, the quality of the chromatographic separation will have a big impact on the number of phosphopeptides that can confidently be identified.
Chromatography runs of higher quality also take more time, so that a trade-off between resolution and throughput must be found for each experiment. In data-dependent acquisition (DDA), precursor ions from a first survey scan are selected, typically based on relative ion abundance, in order to generate fragmentation spectra in a second MS run [Domon and Aebersold], for which a database search yields the corresponding peptide sequences [Nesvizhskii].
As a result, peptide detection in DDA is biased towards high-abundance species and is also considerably irreproducible due to stochastic precursor ion selection [Liu et al.]. However, this problem may be solved to some extent by extracting ion chromatograms of the peptides that are missing in some of the runs that are being compared [Cutillas and Vanhaesebroeck; Cutillas et al.].
This targeted approach overcomes many of the issues of shotgun methods, but is usually not feasible for large-scale investigation of the complete phosphoproteome.
In data-independent acquisition (DIA), however, the spectra generated are usually highly complex and require intricate data extraction techniques, which is even more challenging for modified peptides.
Recently, a computational resource for the detection of modified peptides has been put forward [Keller et al.].
Overall, the available methods for DIA have yet to mature in order to challenge the use of DDA in large-scale studies of the phosphoproteome [Riley and Coon].
Phosphoproteomics-Based Profiling of Kinase Activities in Cancer Cells
Quantitative Phosphoproteomics
As for regular proteomics, several experimental methods or post-acquisition tools exist to quantitate detected phosphopeptides. These can roughly be divided into isotope labeling and label-free quantitation. In general, stable isotope labeling requires more experimental effort than label-free quantitation, but at the same time enables multiplexing of samples with different isotopes or combinations.
Stable isotope labeling by amino acids in cell culture (SILAC) relies on metabolic incorporation and is mainly used for cell cultures, whose medium is supplemented with amino acids carrying different stable isotopes that are incorporated into the proteins of the cells. At the point of analysis, cell extracts are mixed and then jointly investigated with mass spectrometry.
Mass differences between peptide pairs due to the isotopic labeling can be exploited for relative quantitation [Ong et al.]. Currently, up to three conditions (light, medium, heavy) can be multiplexed. Chemical modification of peptides with tandem mass tags (TMT) or isobaric tags for relative and absolute quantitation (iTRAQ) are two different methods based on tags with reactive groups that bind to amine groups of the peptides after protein digestion. In the first MS run, the peptides with different isobaric tags are indistinguishable, but upon fragmentation in the second MS run, each tag generates a unique reporter ion in the fragmentation spectrum, which can be used for relative quantitation of the tagged peptides [Thompson et al.].
Label-free quantitation (LFQ), on the other hand, relies mainly on post-acquisition data analysis, so that no modification of the essential experimental workflow needs to be implemented. Comparison of an, in theory, unlimited number of different samples is therefore possible, with the downside of prolonged analysis time, as samples cannot be multiplexed.
While label-free approaches usually provide a deeper coverage of the proteome than label-based methods, the reproducibility and precision of quantification are inferior, so that more technical replicates are needed for confident quantification in LFQ [Li et al.]. Typically, label-free quantitation is achieved by integration of peak area measurements.
For the case of phosphoproteomics, in contrast to regular proteomics, an additional challenge for quantitation arises from the fact that information from different peptides of the same protein cannot be integrated.
While in regular proteomics the abundances of every peptide in the protein can be combined, the quantitation of a single phosphosite depends on direct measurements of peptides with the specific modification. This may give rise to problems for subsequent analysis, if this analysis is conducted on protein rather than on phosphosite level.
Several such search engines now exist; popular ones include Mascot, Sequest, Protein Prospector, and Andromeda [54-57]. The false discovery rate (FDR) may be determined by performing parallel searches against scrambled or reversed protein databases containing the same number of sequences as the authentic protein database.
The FDR is then calculated as the number of positive peptide identifications in the decoy database divided by the number derived from the forward search. Deriving peptide sequences with these methods is a relatively straightforward process. However, site localization can be a problem when peptide sequences contain more than one amino acid residue that can be phosphorylated.
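The decoy-based FDR estimate described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of any particular search engine: the function names, the score values, and the threshold are hypothetical and chosen only for the example.

```python
def hits_above(scores, threshold):
    """Count identifications whose score is at or above the threshold."""
    return sum(1 for s in scores if s >= threshold)


def decoy_fdr(n_decoy_hits, n_forward_hits):
    """Estimate the FDR as decoy identifications divided by forward identifications."""
    if n_forward_hits <= 0:
        raise ValueError("need at least one forward identification")
    return n_decoy_hits / n_forward_hits


# Hypothetical peptide scores from the forward and the decoy search.
forward_scores = [55.0, 48.2, 41.7, 39.9, 33.1, 30.4, 25.0, 21.3]
decoy_scores = [31.0, 22.5, 12.8, 9.4]
threshold = 30.0  # assumed score cut-off

fdr = decoy_fdr(hits_above(decoy_scores, threshold),
                hits_above(forward_scores, threshold))
print(f"estimated FDR at score >= {threshold}: {fdr:.3f}")
```

In practice the threshold would be tuned until the estimated FDR falls below the level accepted for the study.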
To address this problem, several methods to determine precise localization of phosphorylation within a phosphopeptide have been published. Ascore uses a probabilistic approach to assess correct site assignment [58] and the algorithm has been applied alongside the Sequest search engine.
The Mascot delta score, introduced by the Kuster group, simply determines the differences in Mascot scores between the different possibilities for phosphosite localization within a phosphopeptide [59]. The larger the delta score, the greater the probability of correct site assignment.
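The idea of a delta score can be illustrated with a short Python sketch that takes the difference between the best and the second-best score among the candidate site assignments of one phosphopeptide. The function name and the scores are hypothetical, and the exact definition used in the cited work may differ in detail.

```python
def localization_delta(scores_by_site):
    """Return (best_site, delta), where delta is best score minus second-best score."""
    if len(scores_by_site) < 2:
        raise ValueError("need at least two candidate sites to compute a delta")
    ranked = sorted(scores_by_site.items(), key=lambda kv: kv[1], reverse=True)
    (best_site, best_score), (_, second_score) = ranked[0], ranked[1]
    return best_site, best_score - second_score


# Hypothetical search-engine scores for three possible phosphosite positions.
candidates = {"S3": 42.0, "T5": 30.5, "Y9": 18.0}
site, delta = localization_delta(candidates)
print(site, delta)
```

A large delta means that one assignment clearly outscores the alternatives; a delta near zero means that the site cannot be localized with confidence.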
Other similar methods have been published [60] and some of them are now incorporated into search engines [61]. The output of the phosphopeptide identification step generally contains scores for both the probability of correct peptide sequence identification and phosphosite localization.
Pitfalls in the Analysis of MS-Based Phosphoproteomics Data
Although the available experimental methods for MS-based phosphoproteomics data acquisition have evolved considerably in recent years, leading to a steadily increasing number of detected phosphosites, several limitations remain for the investigation of signaling processes using phosphoproteomics data.
While it has been estimated that there are around [...] phosphorylation sites in the human proteome [62], the number of phosphosites that can be identified in a single MS experiment usually ranges from around 10,[...] up to 40,[...] [63].
Therefore, the sampled phosphoproteomic picture is incomplete.
It has to be taken into account, though, that not all possible phosphorylation sites are expected to be modified at the same time point. This is caused by context-dependent regulation of phosphosites. For example, some phosphosites are controlled differentially at different cell cycle stages, while others only change under specific external stimulation such as growth factors or other effector molecules [64, 65]. The hope is therefore that a significantly larger portion of phosphosites could be mapped with improving technology and by increasing the diversity of biologically relevant conditions analyzed.
So far, though, a distinct set of phosphosites is usually detected in different MS runs or replicates, as the selection of precursor ions is stochastic.
This leads to incomplete datasets with a high number of missing data points, challenging computational investigation of the data such as clustering or correlation analysis. However, as discussed above, approaches in which phosphopeptide intensities are compared across MS runs post-acquisition minimize this problem [38]. The functional impact of a phosphorylation event is known only in the minority of cases [15]. Indeed, it has been hypothesized that a substantial fraction of phosphorylation sites are non-functional [66], since phosphorylation sites tend to be poorly conserved throughout species [67].
Deciphering kinase-substrate relationships by analysis of domain-specific phosphorylation network.
Although approaches to studying the function of individual phosphorylation events have been proposed [68], it may be that a large part of the detected phosphosites serves no function at all. Thus, non-functional sites add a substantial amount of noise to phosphoproteomics data and complicate the computational analysis. The inference of kinase activity from phosphoproteomics data that will be described in the next section aims to overcome these limitations by integrating the information from many phosphosites, along with prior knowledge on kinase-substrate relationships, into a single measure for the kinase activity.
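To make the idea of a single per-kinase measure concrete, the following Python sketch computes a z-score in the spirit of KSEA: the mean log fold change of a kinase's known substrates is compared with the mean of all measured phosphosites and scaled by the number of substrates and the overall standard deviation. This is an illustrative sketch under stated assumptions, not the kinact API; the function name, the site identifiers, the fold changes, and the use of the population standard deviation are all choices made for the example.

```python
import math
from statistics import mean, pstdev


def ksea_like_zscore(log_fc, substrates):
    """Score one kinase from the log fold changes of its measured substrate sites.

    log_fc: dict mapping phosphosite id -> log fold change (all measured sites)
    substrates: iterable of phosphosite ids annotated as targets of the kinase
    Returns None when no substrate was measured or the data have no variance.
    """
    observed = [log_fc[s] for s in substrates if s in log_fc]
    if not observed:
        return None
    background = list(log_fc.values())
    sd = pstdev(background)
    if sd == 0:
        return None
    return (mean(observed) - mean(background)) * math.sqrt(len(observed)) / sd


# Hypothetical measurements and a hypothetical substrate annotation.
log_fc = {"A_S10": 2.0, "B_T20": 1.0, "C_S5": -1.0, "D_Y7": -2.0}
kinase_substrates = ["A_S10", "B_T20", "X_S1"]  # X_S1 was not measured

z = ksea_like_zscore(log_fc, kinase_substrates)
print(f"kinase activity z-score: {z:.3f}")
```

Substrates that were not detected are simply skipped, which is one reason the missing-data problem described above directly affects such scores.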
It is important, though, to keep in mind that any bias in the experimental workflow will affect these scores. In particular, since highly abundant precursor ions are more likely to be selected for fragmentation and therefore detection, targets of upregulated kinases are more likely to be detected.
Therefore, highly active kinases will be preferentially detected, although downregulated kinases may be identified when comparing different conditions.
Computational Methods for Inference of Kinase Activity
Traditionally, biochemical methods have been common to study kinase activities in vitro and are still broadly used [69, 70]. However, those methods are generally limited in throughput and time-consuming. In addition, in vitro methods might not accurately reflect the in vivo activities of kinases in a specific cellular context.
MS-based methods have also been applied for assaying kinase activity [9, 10]. Here, the abundances of known target phosphosites are monitored by MS after an in vitro enzymatic reaction. Since every phosphorylation event results, by definition, from the activity of a kinase, phosphoproteomics data should be suitable to infer the activity of many kinases with comparatively low experimental effort.
This task requires computational analysis of the detected phosphorylation sites (phosphosites), since thousands of phosphosites can routinely be measured in a single experiment. Several methods have been proposed in recent years, all of which utilize prior knowledge about kinase-substrate interactions, either from curated databases or from information about kinase recognition motifs.
Resources for Kinase-Substrate Relationships
As the large-scale detection of phosphorylation events using mass spectrometry became routine, many freely available databases that collect experimentally verified phosphosites have been set up, including PhosphoSitePlus [20] and Phospho.ELM, among others.
The databases differ in size and aim; PHOSIDA, for example, provides a tool for the prediction of putative phosphorylation sites and recently also added acetylation and other posttranslational modification sites to its scope.
Phospho.ELM computes a score for the conservation of a phosphosite. Signor is focused on interactions between proteins participating in signal transduction. PhosphoNetworks [73] is dedicated to kinase-substrate interactions, but the information is on the level of proteins, not phosphosites. Arguably the most prominent database for expert-edited and curated interactions between kinases and individual phosphosites that have not been derived from in vitro studies is PhosphoSitePlus, currently encompassing 16,[...] individual kinase-substrate relationships.
As estimates of the number of possible phosphosites in the human proteome are far larger [76, 62], the evident low coverage of the curated databases motivated the development of computational tools to predict in vivo kinase-substrate relationships. These methods identify putative new kinase-substrate relationships based on experimentally derived kinase recognition motifs, an approach pioneered by Scansite [77], which uses position-specific scoring matrices (PSSMs) obtained by positional scanning of peptide libraries [78] or phage display methods [79].
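How a position-specific scoring matrix ranks candidate sites can be sketched with a toy example in Python. The matrix below is invented for illustration and does not reproduce any Scansite matrix; the function name and the five-residue window are likewise assumptions.

```python
def pssm_score(window, pssm, default=0.0):
    """Sum the per-position scores of a peptide window under a PSSM.

    window: string of residues, one per matrix position
    pssm: list of dicts, one per position, mapping residue -> score
    default: score used for residues not listed at a position
    """
    if len(window) != len(pssm):
        raise ValueError("window length must match the number of PSSM positions")
    return sum(column.get(residue, default)
               for residue, column in zip(window, pssm))


# Toy matrix over positions -3 to +1 around the phosphoacceptor (invented values).
toy_pssm = [
    {"R": 2.0, "K": 1.0},   # position -3
    {"R": 1.0},             # position -2
    {},                     # position -1 (no preference)
    {"S": 1.5, "T": 1.0},   # phosphoacceptor
    {"P": -1.0, "L": 0.5},  # position +1
]

print(pssm_score("RRASL", toy_pssm))  # window matching the toy motif well
print(pssm_score("GAASP", toy_pssm))  # window matching it poorly
```

Windows that score above a chosen cut-off would be reported as putative substrates of the kinase the matrix represents.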
Another approach, NetPhorest [80], tries to classify phosphorylation sites according to the relevant kinase family instead of predicting individual kinase-substrate links.
However, the in vitro specificity of kinases differs significantly from the kinase activity inside the cell, biasing the experimentally identified kinase recognition motifs [81]. The integration of contextual information, for example co-expression, protein-protein interactions, or subcellular colocalization, markedly improves the accuracy of the predictions [69]. The software packages NetworKIN [82] (recently extended in the context of the resource KinomeXplorer [22], correcting for biases caused by over-studied proteins) and iGPS [23] are examples of methods that combine information about kinase recognition motifs, in vivo phosphorylation sites, and contextual information.
Recently, Wagih et al. proposed an approach based on the assumption that functional interaction partners of kinases, derived from the STRING database, are more likely to be phosphorylated by the respective kinase and should therefore contain an amino acid motif conferring kinase specificity.
This can then be uncovered by motif enrichment. The described methods provide predictions that are very valuable but not free from error, for example due to the described differences in in vitro and in vivo kinase specificity or the influence of subcellular localization.
Thus, the predicted kinase-substrate interactions should be considered hypotheses to be tested experimentally [85].
To efficiently utilize local sequence information and functional information, we develop multiple kernels using the radial basis function (RBF) [Scholkopf et al.].
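A radial basis function kernel, and a weighted sum of several such kernels of the kind that multiple kernel learning (MKL) optimizes, can be sketched in plain Python as follows. The feature vectors, the gamma values, and the fixed weights are hypothetical; in MKL the weights are learned from the data, and the kernels and features actually used by the authors are not given in this excerpt.

```python
import math


def rbf_kernel(x, y, gamma):
    """RBF kernel: exp(-gamma * squared Euclidean distance between x and y)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def combined_kernel(views_x, views_y, gammas, weights):
    """Weighted sum of one RBF kernel per feature view (e.g. sequence, function)."""
    return sum(w * rbf_kernel(x, y, g)
               for x, y, g, w in zip(views_x, views_y, gammas, weights))


# Two hypothetical samples, each described by a sequence view and a functional view.
sample_a = ([0.0, 0.0], [1.0, 0.0, 0.5])
sample_b = ([1.0, 1.0], [0.0, 0.0, 0.5])
gammas = (0.5, 1.0)   # assumed kernel widths
weights = (0.7, 0.3)  # assumed fixed weights; MKL would learn these

print(combined_kernel(sample_a, sample_b, gammas, weights))
```

The resulting combined kernel value can be passed to any kernel-based classifier, such as a support vector machine that accepts precomputed kernels.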
Subsequently, we use multiple kernel learning (MKL) to combine multiple kernels and build a support vector machine (SVM) model using the combined kernel. The comparative analysis is based on a [...]-fold cross-validation process and the data collected from Phospho.ELM [Dinkel et al.]. Furthermore, with an independent test dataset extracted from PhosphoSitePlus [Hornbeck et al.], the results show that ksrMKL has better prediction performance than these existing tools.
Materials and Methods
Data collection and preparation
In this study, we use a dataset of experimentally identified human phosphorylation sites with kinase information, including 1,[...] unique phosphorylation sites in substrates collected from the latest version of Phospho.ELM.
For a specific kinase, the phosphorylation sites that are known to be modified by this kinase are considered positive samples, and the phosphorylation sites that are not known to be modified by this kinase are used as negative samples. To ensure reliable results [Li et al.] [...] The detailed information on this dataset is summarized in Table S1. In addition, the local sequences of the corresponding phosphorylation sites are extracted, containing seven residues upstream and seven residues downstream.
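Extracting such a local sequence window can be sketched in Python as follows. The function name, the example protein sequence, and the use of "X" to pad windows that run past a protein terminus are assumptions for illustration; the excerpt does not state how the authors handle sites near the termini.

```python
def local_window(sequence, site_index, flank=7, pad="X"):
    """Return the residues from site_index - flank to site_index + flank.

    site_index is the 0-based position of the phosphorylated residue.
    Positions beyond either terminus are filled with the pad character,
    so the result always has length 2 * flank + 1.
    """
    if not 0 <= site_index < len(sequence):
        raise IndexError("site_index is outside the sequence")
    left = sequence[max(0, site_index - flank):site_index]
    right = sequence[site_index + 1:site_index + 1 + flank]
    return (pad * (flank - len(left)) + left
            + sequence[site_index]
            + right + pad * (flank - len(right)))


protein = "MSTKRRASLPEYGDQWNV"  # hypothetical sequence
print(local_window(protein, 7))  # serine with complete flanks on both sides
print(local_window(protein, 1))  # serine near the N-terminus, left side padded
```

With seven residues on each side, every window has 15 positions, which gives all sites a representation of the same length.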
The 15-residue local sequence is converted to a [...]-dimensional vector. In addition, several recent studies [Fan et al.] [...]