Converting the genomic knowledge base to build protein specific machine learning prediction models; a classification study on thermophilic serine protease

Jithin S. Sunny, Atul Kumar, Khairun Nisha, Lilly M. Saleena

Research output: Contribution to journal › Article in scientific journal › Peer review

Abstract

Several machine learning models have been formulated to classify proteins by thermostability, an important prerequisite for industrial use. Here we describe a classification model for one specific enzyme, serine protease. To build the classifier, serine protease sequences were mined from 283 thermophilic and 200 mesophilic bacterial genomes. Features were extracted from 760 sequences and then subjected to feature selection. We deployed a random forest-based classifier that distinguished thermophilic from non-thermophilic serine proteases with an accuracy of 97.11%, higher than that of other benchmark machine learning methods. Knowledge of thermostability and of amino acid positional shifts can be vital for downstream protein engineering. We therefore propose a web platform to showcase the real-time application of this enzyme-specific classification model. The framework is designed to help protein engineers combine their own sequence data with the classification model, align query sequences against custom databases, and identify similar novel enzymes along with their thermophilic nature.
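The abstract does not say which features were extracted from the 760 sequences. As a hedged illustration only, the sketch below computes two sequence descriptors commonly used in protein classification: amino acid composition (20 values) and dipeptide composition (400 values). Whether the authors used these descriptors is an assumption, and the helper names (`amino_acid_composition`, `dipeptide_composition`, `feature_vector`) are hypothetical.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids in one-letter code (fixed order = fixed feature order).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# All 400 ordered dipeptides: "AA", "AC", ..., "YY".
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(sequence: str) -> list[float]:
    """Fraction of each standard amino acid (hypothetical helper).

    Non-standard symbols such as X, B or Z are skipped, so the 20
    fractions sum to 1.0 whenever any standard residue is present.
    """
    residues = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]
    if not residues:
        return [0.0] * len(AMINO_ACIDS)
    counts = Counter(residues)
    return [counts[aa] / len(residues) for aa in AMINO_ACIDS]


def dipeptide_composition(sequence: str) -> list[float]:
    """Fraction of each overlapping dipeptide (hypothetical helper).

    Only adjacent pairs made of two standard residues are counted.
    """
    seq = sequence.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    valid = [p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS]
    if not valid:
        return [0.0] * len(DIPEPTIDES)
    counts = Counter(valid)
    return [counts[dp] / len(valid) for dp in DIPEPTIDES]


def feature_vector(sequence: str) -> list[float]:
    """Concatenate both composition descriptors into one 420-value vector."""
    return amino_acid_composition(sequence) + dipeptide_composition(sequence)


if __name__ == "__main__":
    # Toy fragment for illustration, not a full serine protease sequence.
    features = feature_vector("GDSGGPLV")
    print(len(features))  # 420
```

Vectors like these could then pass through feature selection and into a random forest, for example scikit-learn's `RandomForestClassifier`; that training step is not shown here and is not necessarily the setup the authors used.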

Original language: English
Pages (from-to): 3615-3622
Number of pages: 8
Journal: Biologia
Volume: 77
Issue: 12
DOI
Status: Published - Dec 2022

Subject classification (UKÄ)

  • Medical Genetics
