AI Driven Approach for Description of Visual Contents

Chirag Jindal (Chandigarh University); Tushar Kr Sharma; Satyam Gupta; Pulkit; Jyoti

doi:10.63169/GCARED2025.p7

Abstract

Over the past decade, deep learning has achieved performance comparable to or better than that of human experts in many domains. Recently, deep learning methods have attracted considerable attention owing to their high-level capabilities and the ready availability of accelerated computing resources. Such approaches have been extended to the interpretation of multimedia content by exploiting its structural and temporal organization. In this paper, we propose an approach that combines video captioning techniques with Natural Language Processing systems to generate a title and a matching abstract for a video. This technique could be used in numerous application areas, including the cinema industry, video search engines, security surveillance, video warehouses/databases, data centers, and many more.
We also present a video description framework based on deep learning that first extracts visual features from video frames using deep convolutional neural networks (CNNs) and then passes the resulting representations into a long short-term memory (LSTM) based language model. To capture precise details of human presence, an adjusted combined CNN is introduced. The proposed pipeline is trainable end to end and is capable of learning dense visual features together with an accurate model for generating natural language descriptions of video streams. Evaluation is performed by computing Metric for Evaluation of Translation with Explicit ORdering (METEOR) and Recall-Oriented Understudy for Gisting Evaluation (ROUGE) scores between the machine-generated and human-written video descriptions on a carefully constructed dataset. The video descriptions generated through a standard component analysis and through the proposed deep learning framework are also compared using their ROUGE scores.
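As an illustration of the ROUGE-based evaluation mentioned above, the following is a minimal sketch of ROUGE-N and ROUGE-L computed between a generated caption and a human reference. It is not the paper's evaluation code: the function names `rouge_n` and `lcs_length`, the whitespace tokenization, and the example captions are all assumptions made for illustration.

```python
from collections import Counter


def rouge_n(candidate, reference, n=1):
    """Return ROUGE-N recall, precision, and F1 for two whitespace-tokenized strings."""
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand = ngrams(candidate.lower().split())
    ref = ngrams(reference.lower().split())
    # Clipped overlap: each n-gram counts at most as often as it appears in both texts.
    overlap = sum((cand & ref).values())
    recall = overlap / max(sum(ref.values()), 1)
    precision = overlap / max(sum(cand.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"recall": recall, "precision": precision, "f1": f1}


def lcs_length(candidate, reference):
    """Length of the longest common subsequence of tokens (the basis of ROUGE-L)."""
    a, b = candidate.lower().split(), reference.lower().split()
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, tok_a in enumerate(a, start=1):
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[len(a)][len(b)]


# Hypothetical captions, used only to exercise the functions.
generated = "a man is playing a guitar"
human = "a man plays a guitar on stage"
scores = rouge_n(generated, human, n=1)
rouge_l_recall = lcs_length(generated, human) / len(human.split())
```

Recall is measured against the human reference, which is why ROUGE is described as recall-oriented; precision and F1 are included here only for completeness.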
Most prior work has relied on Recurrent Neural Networks (RNNs), and more recently attention mechanisms have also been applied so that the model focuses on particular parts of the video while generating each word of a descriptive sentence. In this paper, we also focus on a sequence-to-sequence approach with a temporal attention mechanism. We examine and compare the results from various attention model configurations.
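To make the idea of temporal attention concrete, here is a minimal sketch of one attention step over per-frame features. It assumes a simple dot-product score between each frame feature and the current decoder state; the paper does not specify its scoring function, and learned additive scoring is also common. The names `temporal_attention`, `frame_features`, and `decoder_state` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention(frame_features, decoder_state):
    """Weight each frame by its relevance to the decoder state.

    Assumption: relevance is a plain dot product, so frame features and the
    decoder state must have the same dimensionality.
    """
    scores = [
        sum(f * h for f, h in zip(frame, decoder_state))
        for frame in frame_features
    ]
    weights = softmax(scores)
    dim = len(frame_features[0])
    # Context vector: attention-weighted sum of the frame features.
    context = [
        sum(w * frame[d] for w, frame in zip(weights, frame_features))
        for d in range(dim)
    ]
    return weights, context


# Three toy 2-dimensional "frame features" and one decoder state.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
state = [2.0, 0.0]
weights, context = temporal_attention(frames, state)
```

In a sequence-to-sequence captioner, a step like this would be repeated for every generated word, so the context vector can draw on different frames as the sentence progresses.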