Affinity estimation models of proteins for intelligent drug design based on pseudoconvolutions and nonlinear regressors

Authors

Laila Barros Campos; Janderson Romário Borges da Cruz Ferreira; Wellington Pinheiro dos Santos

DOI:

https://doi.org/10.33448/rsd-v11i8.31222

Keywords:

Affinity Markers; Amino Acids, Peptides and Proteins; Artificial intelligence.

Abstract

Purpose: The emergence of new viruses and, consequently, new diseases makes the rapid and precise design of new drugs increasingly necessary. With the availability of large databases of proteins and affinity measures, it is possible to build scoring functions for predicting molecular affinity. These functions are fundamental to intelligent drug design. Objective: In this work, we propose a scoring function to predict the affinity between two proteins. The method extracts features by transfer learning from sequences represented by pseudo-convolutions. Method: The pseudo-convolutions organize the sequences into base neighborhood distributions, and each distribution is then represented by an image. The two proteins are thus transformed into two images, which are concatenated to form a third image. Through deep transfer learning, this resulting image is represented by a vector of attributes, whose dimensionality is reduced by Random Forest. Finally, the reduced attribute vector is fed to a regression learning machine that returns the degree of affinity between the two proteins. Results: We used the Affinity Benchmark Version 2 database, with 145 complexes for model training and 35 for testing. The results showed a performance equal to or better than that of state-of-the-art methods for evaluating protein affinity, considering the Pearson, Spearman, and Kendall correlation coefficients. The best results were 0.66, 0.70, and 0.52, respectively. Conclusion: The proposed method can characterize protein sequences so that the binding affinity between two proteins can be estimated without simulating the three-dimensional structure of the complex.
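As a rough illustration of two steps named in the abstract (building one concatenated image from a pair of sequences, and scoring predictions with the Pearson, Spearman, and Kendall coefficients), the Python sketch below may help. It is not the authors' implementation: the abstract does not define the pseudo-convolution in detail, so the residue co-occurrence matrix used here is only an assumed stand-in for a "neighborhood distribution", and all function names and the `offset` parameter are hypothetical. The transfer-learning, Random Forest reduction, and regression stages are omitted.

```python
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def neighborhood_image(sequence, offset=1):
    """ASSUMPTION: a 20x20 matrix of normalized counts of residue pairs
    (a, b) where b occurs `offset` positions after a in the sequence."""
    n = len(AMINO_ACIDS)
    image = [[0.0] * n for _ in range(n)]
    total = 0
    for a, b in zip(sequence, sequence[offset:]):
        if a in INDEX and b in INDEX:  # skip non-standard symbols
            image[INDEX[a]][INDEX[b]] += 1.0
            total += 1
    if total:
        image = [[v / total for v in row] for row in image]
    return image


def concatenate_images(left, right):
    """Place two images side by side, row by row."""
    return [l_row + r_row for l_row, r_row in zip(left, right)]


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """Ranks starting at 1; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall tau-b, computed over all pairs with a tie correction."""
    conc = disc = tie_x = tie_y = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx * dy > 0:
                conc += 1
            elif dx * dy < 0:
                disc += 1
            elif dx == 0 and dy != 0:
                tie_x += 1
            elif dy == 0 and dx != 0:
                tie_y += 1
    return (conc - disc) / sqrt((conc + disc + tie_x) * (conc + disc + tie_y))


if __name__ == "__main__":
    # Toy sequences and affinities, invented purely for illustration.
    pair_image = concatenate_images(neighborhood_image("ACDKLACD"),
                                    neighborhood_image("MKVLAAGK"))
    print(len(pair_image), "x", len(pair_image[0]))
    measured = [1.0, 2.0, 3.0, 4.0, 5.0]
    predicted = [1.0, 3.0, 2.0, 5.0, 4.0]
    print(pearson(measured, predicted),
          spearman(measured, predicted),
          kendall_tau_b(measured, predicted))
```

In a full pipeline, the concatenated image would be passed to a pretrained network for feature extraction, and the three coefficients would be computed between the measured and predicted affinities of the test complexes.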

Author Biographies

Laila Barros Campos, Universidade de Pernambuco

Laila Campos holds a degree in Electrical Engineering from the University of Pernambuco (2019). She was a teaching assistant for Electromagnetism 1 for three semesters (2016.2, 2017.1 and 2017.2) at Escola Politécnica de Pernambuco. She interned in the area of renewable energy at Aeroespacial Tecnologia e Sistemas Renováveis Ltda for 1 year and 4 months (November 2016 to March 2018), working on the optimization of energy production in wind and solar parks using software such as Windographer for statistical analysis of meteorological data, as well as WindPRO and WindSim for layout planning. She has worked at Petrobras Transpetro S/A since July 2019, on project improvements that aim to query existing data in the SAP business management software more autonomously, using the Python programming language and SQL databases.

Janderson Romário Borges da Cruz Ferreira, Universidade de Pernambuco

Janderson Ferreira is a PhD student in Computer Engineering at UPE, Brazil. He holds a Master's degree in Computer Engineering from UPE, Brazil, and an undergraduate degree in Computer Science from FACAPE, Brazil, and completed a sandwich period at the University of Santiago de Compostela, Campus Santiago, Spain. He works in research and consulting in the areas of Computer Vision, Machine Learning, and Artificial Intelligence.

Wellington Pinheiro dos Santos, Universidade Federal de Pernambuco

Wellington Pinheiro dos Santos holds a degree in Electrical Electronic Engineering (2001) and a Master's degree in Electrical Engineering (2003) from the Federal University of Pernambuco, and a PhD in Electrical Engineering from the Federal University of Campina Grande (2009). He is currently an Associate Professor (exclusive dedication) at the Department of Biomedical Engineering at the Center for Technology and Geosciences - School of Engineering of Pernambuco, Federal University of Pernambuco, working in the Undergraduate Program in Biomedical Engineering and in the Graduate Program in Biomedical Engineering, of which he was one of the founders (2011). He founded the Center for Social Technologies and Bioengineering at the Federal University of Pernambuco, NETBio-UFPE (2012). He has also been a member of the Graduate Program in Computer Engineering at Escola Politécnica de Pernambuco, Universidade de Pernambuco, since 2009. He has experience in the area of Computer Science, with an emphasis on Graphic Processing (Graphics), working mainly on the following topics: digital image processing, pattern recognition, computer vision, evolutionary computing, numerical optimization methods, computational intelligence, image formation techniques, virtual reality, game design, and applications of Computing and Engineering in Medicine and Biology. He is a member of the Brazilian Society of Biomedical Engineering (SBEB), of the Brazilian Society of Computational Intelligence (SBIC, ex-SBRN), and of the International Federation of Medical and Biological Engineering (IFMBE).

Published

24/06/2022

How to Cite

CAMPOS, L. B.; FERREIRA, J. R. B. da C.; SANTOS, W. P. dos. Affinity estimation models of proteins for intelligent drug design based on pseudoconvolutions and nonlinear regressors. Research, Society and Development, [S. l.], v. 11, n. 8, p. e40311831222, 2022. DOI: 10.33448/rsd-v11i8.31222. Available at: https://rsdjournal.org/index.php/rsd/article/view/31222. Accessed on: 16 Apr. 2024.

Section

Health Sciences