TY  - THES
T1  - CoVirNet: a multimodal web-based book genre classification system using a dual-color-space input CNN-ViT architecture
A1  - Acompañado, Emiline Barcent Jloise S.
A2  - Yusiong, John Paul T.
LA  - English
UL  - https://tuklas.up.edu.ph/Record/UP-8027390931312009872
AB  - Book genres are not always clearly defined, and classifying them based solely on visual or textual patterns can be unreliable. While recent models attempt to improve genre classification by combining both cues, key limitations remain in how input representations and model architectures are designed. This paper proposes CoVirNet, a multimodal classification system that predicts a book's genre from its cover image and title. The model processes two color space representations of the image through a hybrid architecture in which a Convolutional Neural Network (CNN) and a Vision Transformer (ViT) operate in parallel, while the book title is processed by a BERT-based encoder. Evaluated on the BookCover30 dataset, CoVirNet outperforms both the state-of-the-art results and its baseline variants, achieving 65.23% Top-1 accuracy and 84.70% Top-3 accuracy. These results underscore the benefits of color space fusion, architectural hybridization, and deep text modeling in improving multimodal book genre classification.
CN  - LG 993.5 2025 C66 A36
KW  - Machine learning.
ER  -