Background The performance of machine learning classification methods relies heavily on the choice of features. In many domains, feature generation can be labor-intensive and require domain knowledge, and feature selection methods do not scale well in high-dimensional datasets. Deep learning has shown success in feature generation but requires large datasets to achieve high classification accuracy. Biology domains typically exhibit these challenges with numerous handcrafted features (high-dimensional) and small amounts of training data (low volume). Method A hybrid learning approach is proposed that first trains a deep network on the training data, extracts features from the deep network, and then uses these features to re-express the data for input to a non-deep learning method, which is trained to perform the final classification. Results The approach is systematically evaluated to determine the best layer of the deep learning network from which to extract features and the threshold on training data volume that prefers this approach. Results from several domains show that this hybrid approach outperforms standalone deep and non-deep learning methods, especially on low-volume, high-dimensional datasets. The diverse collection of datasets further supports the robustness of the approach across different domains. Conclusions The hybrid approach combines the strengths of deep and non-deep learning paradigms to achieve high performance on high-dimensional, low volume learning tasks that are typical in biology domains.
【저자키워드】 deep learning, machine learning, classification, Hybrid learning, Representation learning,