Volume 4 number 1 (03)

Original research

PERFORMANCE OF VISION TRANSFORMER ON GARBAGE IMAGE CLASSIFICATION

Pages 25-34

DOI 10.61552/JEMIT.2026.01.003

ORCID Nam Tran Quy


Abstract: This study evaluates the performance of the Vision Transformer model with a 16x16 patch size (ViT 16x16) for classifying garbage images. Several convolutional neural networks (CNNs) with transfer learning, namely VGG16, ResNet50, InceptionV3, and EfficientNetB7, are employed for comparison. Each model is implemented with the same image-augmentation techniques and the same hyper-parameter values, such as the optimizer, activation function, and learning rate. The same garbage dataset, with an identical split into training, validation, and test sets, is applied to all models. The dataset contains 12 image labels covering various kinds of garbage. The experimental results show that ViT 16x16 achieves the best accuracy at 92%, higher than the second-best model, VGG16, at 86%, and much higher than most of the other pre-trained models in garbage image classification.

Keywords: Garbage Image, Vision Transformer, Transfer Learning, Classification, Performance.

Received: 21.08.2024. Revised: 22.09.2024. Accepted: 21.10.2024.