2019: Article in press
Research Article

Multi-Site Protein Subcellular Localization Based on Deep Convolutional Neural Network

Cong H
School of Information Science and Engineering, Shandong Normal University, Jinan, China
Liu H
Shandong Provincial Key Laboratory for Novel Distributed Computer Software Technology, Jinan, China
Chen Y
School of Information Science and Engineering, University of Jinan, Jinan, China
Zhao Y
Shandong Provincial Key Laboratory of Network Based Intelligent Computing, University of Jinan, Jinan, China
Wang L
Shandong Provincial Key Laboratory of Network Based Intelligent Computing, University of Jinan, Jinan, China
Published February 6, 2019
Keywords
  • Multi-site subcellular localization
  • Deep convolutional neural network
  • Ensemble learning
  • Feature fusion

Abstract

The internal structures of a cell, commonly referred to as subcellular compartments, are highly ordered and interconnected, and each has a unique function. Experiments show that delivering a protein to the wrong subcellular location can cause human disease, so studies of protein localization can help clarify pathogenesis and identify treatments. Because protein subcellular localization plays such an important role in biology, research in this area is extremely active. Most existing protein subcellular localization methods, however, are better suited to single-site localization. This paper proposes an algorithm based on a deep convolutional neural network that is suited to multi-site protein subcellular localization, and implements it on a human protein database to verify and analyze its performance. To further improve the classification results, the algorithm is combined with ensemble learning and feature fusion. The experiments indicate that the proposed algorithm is effective for multi-site protein subcellular localization: its overall classification accuracy is 59.13%, which is higher than that of SAE, SVM, and RF. The results of the proposed algorithm are also more uniform and less affected by the number of samples. Differences in the data samples have some effect on the classification results, but the overall classification remains good. In addition, ensemble learning and feature fusion are both effective in improving the classification results.