Activating mutations in protein kinases drive the development and metastasis of many cancers. However, the identification of these mutations has relied on direct experimental evidence or sequence-based comparison with previously characterised mutations, approaches that are impractical for functional characterisation and for guiding clinical decisions.
We developed Kinact, a method that combines experimental evidence on kinase point mutations with machine learning techniques to generate a robust predictive model. We manually curated a high-quality dataset of 384 mutations across 42 different kinases, each supported by strong experimental evidence. We calculated the structural effects of these mutations on the wild-type residue environment, interatomic interactions, and protein stability and flexibility; these features, together with residue conservation within the kinase family group, were used to train supervised learning algorithms. Kinact accurately identified activating mutations, achieving a precision of 90% and an AUC of 0.96 under 10-fold cross-validation, and a precision of 81% and an AUC of 0.89 across blind tests, significantly outperforming the gold-standard methods SIFT and PolyPhen-2 (p-value < 0.01).
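For readers less familiar with the reported metrics, the following is a minimal sketch of how precision and AUC can be computed from a classifier's outputs, and how samples might be assigned to cross-validation folds. It is not the Kinact implementation: the function names (`precision`, `roc_auc`, `k_fold_indices`), the interleaved fold assignment, and the toy labels and scores are all assumptions made for illustration.

```python
def precision(y_true, y_pred):
    """Fraction of predicted activating (1) mutations that are truly activating."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if (tp + fp) else 0.0


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def k_fold_indices(n_samples, k=10):
    """Assign sample indices to k folds by interleaving (an illustrative choice)."""
    return [list(range(i, n_samples, k)) for i in range(k)]


# Toy data, invented for illustration only (1 = activating, 0 = non-activating).
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
toy_calls = [1 if s >= 0.5 else 0 for s in toy_scores]  # hypothetical 0.5 cut-off

toy_precision = precision(toy_labels, toy_calls)
toy_auc = roc_auc(toy_labels, toy_scores)
```

In a 10-fold setting, each fold returned by `k_fold_indices` would be held out once while a model is trained on the remaining nine, and the metrics above would be computed on the held-out predictions.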
Kinact provides a framework to better understand the role of kinase dysregulation in cancer. By using Kinact to systematically examine the effects of clinically observed kinase variants in the COSMIC database of cancer somatic mutations, we were able to identify key activating cancer variants. This highlights the importance of tools that aid the identification and understanding of these mutations. To facilitate the rapid characterisation of how variants are likely to affect kinase activity, we have made Kinact freely available as a user-friendly web server at <http://biosig.unimelb.edu.au/kinact/>.