CLASS-DEPENDENT AND CROSS-MODAL MEMORY NETWORK CONSIDERING SENTIMENTAL FEATURES FOR VIDEO-BASED CAPTIONING

The video-based commonsense captioning task aims to add multiple commonsense descriptions to video captions so that video content can be understood better. This paper considers the importance of cross-modal mapping. We propose a combined framework, the Class-dependent and Cross-modal Memory Network considering SENtimental features (CCMN-SEN), for video-based captioning to enhance commonsense caption generation.

First, we develop a class-dependent memory that records the alignment between video features and text. It allows cross-modal interactions and generation only on cross-modal matrices that share the same labels. Then, to understand the sentiments conveyed in the videos and generate accurate captions, we add sentiment features to facilitate commonsense caption generation.
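To illustrate the label-restricted cross-modal interaction described above, here is a minimal sketch of one plausible realization: a cross-attention step in which each video feature may attend only to text features carrying the same class label. The function name, the numpy implementation, and the scaled dot-product scoring are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

def class_masked_cross_attention(video_feats, text_feats, video_labels, text_labels):
    """Cross-modal attention restricted to same-label video/text pairs.

    video_feats: (V, d) array, text_feats: (T, d) array,
    video_labels: (V,) int array, text_labels: (T,) int array.
    Assumes every video label appears among the text labels.
    """
    d = video_feats.shape[1]
    # Scaled dot-product similarity between every video/text feature pair.
    scores = video_feats @ text_feats.T / np.sqrt(d)
    # Block interactions between pairs whose class labels differ.
    same_label = video_labels[:, None] == text_labels[None, :]
    scores = np.where(same_label, scores, -1e9)
    # Row-wise softmax over text positions (stabilized by subtracting the max).
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    # Each video feature is updated only from same-label text features.
    return weights @ text_feats, weights
```

Masking with a large negative constant before the softmax drives cross-label attention weights to zero, so the memory update for each video feature mixes in only text features of the matching class.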

Experimental results demonstrate that our proposed CCMN-SEN significantly outperforms state-of-the-art methods. These results have practical significance for better understanding of video content.
