Article

Teacher-student collaborative knowledge distillation for image classification

Chuanyun Xu, College of Computer and Information Science, Chongqing Normal University, Chongqing, 401331, China
Wenjian Gao, School of Artificial Intelligence, Chongqing University of Technology, Chongqing, 400054, China
Tian Li, Computer Science Department, RWTH Aachen University, Aachen, 52074, Germany
Nanlan Bai, School of Artificial Intelligence, Chongqing University of Technology, Chongqing, 400054, China
Gang Li, School of Artificial Intelligence, Chongqing University of Technology, Chongqing, 400054, China
Yang Zhang, College of Computer and Information Science, Chongqing Normal University, Chongqing, 401331, China
2022, English
ABI

Abstract

A single model trained on limited data usually cannot learn all the appropriate features, which leads to poor performance on test data. To improve model performance, we propose a teacher-student collaborative knowledge distillation (TSKD) method that combines knowledge distillation and self-distillation. The method consists of two parts: learning from the teacher network and self-teaching in the student network. Learning from the teacher network lets the student network use the teacher's knowledge. Self-teaching builds the student as a multi-exit network based on self-distillation, in which deep features serve as supervised information for training the shallower exits. In the inference stage, the classification results of the multiple sub-models in the student network are combined by ensemble voting. Experimental results show that our method outperforms both a traditional knowledge distillation method and a self-distillation-based multi-exit network.
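
As an illustration only, the two training signals and the voting step described in the abstract can be sketched in plain Python. The abstract does not give the loss formulation, so everything specific here is an assumption: the function names, the temperature `T`, the weights `alpha` and `beta`, and the use of the deepest exit's softened logits as a stand-in for the "deep features" supervision. The sketch uses standard temperature-scaled KL distillation and a simple majority vote, not necessarily the authors' exact method.

```python
import math
from collections import Counter


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Cross-entropy of one sample against its hard label."""
    return -math.log(softmax(logits)[label])


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions; q is clamped away from zero."""
    return sum(pi * math.log(pi / max(qi, 1e-12)) for pi, qi in zip(p, q) if pi > 0)


def exit_loss(exit_logits, teacher_logits, deepest_logits, label,
              T=3.0, alpha=0.3, beta=0.3):
    """Loss for one student exit (assumed form, not taken from the paper).

    ce: supervision from the hard label
    kd: distillation from the teacher network
    sd: self-distillation from the student's deepest exit
    """
    student_soft = softmax(exit_logits, T)
    ce = cross_entropy(exit_logits, label)
    kd = kl_divergence(softmax(teacher_logits, T), student_soft) * T * T
    sd = kl_divergence(softmax(deepest_logits, T), student_soft) * T * T
    return (1 - alpha - beta) * ce + alpha * kd + beta * sd


def total_loss(all_exit_logits, teacher_logits, label, **kwargs):
    """Sum the per-exit losses; the last exit is treated as the deepest one."""
    deepest = all_exit_logits[-1]
    return sum(exit_loss(e, teacher_logits, deepest, label, **kwargs)
               for e in all_exit_logits)


def ensemble_vote(all_exit_logits):
    """Majority vote over the class predicted by each exit.

    With most_common(1), a tie goes to the class that was predicted first.
    """
    preds = [max(range(len(l)), key=l.__getitem__) for l in all_exit_logits]
    return Counter(preds).most_common(1)[0][0]


# Hypothetical logits for one image, three student exits, three classes.
exits = [[2.0, 1.0, 0.1], [0.5, 1.5, 0.2], [0.1, 2.2, 0.3]]
teacher = [0.2, 3.0, 0.1]
loss = total_loss(exits, teacher, label=1)
prediction = ensemble_vote(exits)
```

In this toy input the first exit predicts class 0 while the two deeper exits predict class 1, so the vote returns class 1. For the deepest exit the self-distillation term is zero, since it is compared with itself.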
