Affinity estimation models of proteins for intelligent drug design based on pseudoconvolutions and nonlinear regressors
DOI:
https://doi.org/10.33448/rsd-v11i8.31222
Keywords:
Affinity Markers; Amino Acids, Peptides and Proteins; Artificial intelligence.
Abstract
Purpose: The emergence of new viruses and, consequently, new diseases makes the rapid and precise design of new drugs increasingly necessary. With large databases of proteins and affinity measurements now available, it is possible to build scoring functions that predict molecular affinity. These functions are fundamental to intelligent drug design. Objective: In this work, we propose a scoring function to predict the affinity between two proteins. The method extracts features by transfer learning from sequences represented by pseudo-convolutions. Method: The pseudo-convolutions organize the sequences into base neighborhood distributions, and each distribution is represented as an image. The two proteins are thus transformed into two images, which are concatenated to form a third image. Through deep transfer learning, this resulting image is represented by a vector of attributes, whose dimensionality is then reduced with Random Forest. Finally, the reduced attribute vector is fed to a regression learning machine that returns the degree of affinity between the two proteins. Results: We used the Affinity Benchmark Version 2 database, with 145 complexes for model training and 35 for testing. The method performed as well as or better than state-of-the-art methods for evaluating protein affinity, as measured by the Pearson, Spearman, and Kendall correlation coefficients. The best results were 0.66, 0.70, and 0.52, respectively. Conclusion: The proposed method can characterize protein sequences so that the binding affinity between two proteins can be estimated without simulating the three-dimensional structure of the complex.
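The three correlation coefficients named in the Results can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names and the toy affinity values below are hypothetical, and the Kendall variant shown (tau-b, which corrects for ties) is an assumption, since the abstract does not say which variant was used.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson linear correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall tau-b: concordant minus discordant pairs, tie-corrected."""
    concordant = discordant = ties_x = ties_y = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0 and dy == 0:
            continue  # tied in both variables: counted nowhere
        if dx == 0:
            ties_x += 1
        elif dy == 0:
            ties_y += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    denom = math.sqrt((concordant + discordant + ties_x)
                      * (concordant + discordant + ties_y))
    return (concordant - discordant) / denom


# Hypothetical measured vs. predicted affinities for five complexes.
measured = [-10.2, -8.1, -12.5, -9.4, -11.0]
predicted = [-9.8, -8.9, -11.7, -8.5, -11.9]

print("Pearson :", round(pearson(measured, predicted), 2))
print("Spearman:", round(spearman(measured, predicted), 2))
print("Kendall :", round(kendall_tau_b(measured, predicted), 2))
```

Pearson measures linear agreement between predicted and measured values, while Spearman and Kendall depend only on the ordering of the complexes. On the same data, Kendall's coefficient is typically smaller in magnitude than Spearman's, which matches the pattern of the reported values.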
License
Copyright (c) 2022 Laila Barros Campos; Janderson Romário Borges da Cruz Ferreira; Wellington Pinheiro dos Santos
This work is licensed under a Creative Commons Attribution 4.0 International License.