

Video Metadata Classifier
Built a machine learning model that classifies and tags videos by content, enabling more efficient library management.
Skills, Tech Stack, and Libraries
Skills: Natural Language Processing, Text Classification, Video Content Analysis, Model Deployment
Tech Stack: Python, SQL, AWS (S3, EC2), Flask
Libraries: PyTorch, TensorFlow, OpenCV, Transformers (Hugging Face), Pandas, NumPy, NLTK
Description and Approach
Objective:
I developed a machine-learning classifier that analyzes video content and automatically generates metadata tags. The system enhances video management and searchability by categorizing videos based on their content.
Approach:
Data Collection and Preprocessing:
Extracted audio and visual content from videos using OpenCV and FFmpeg.
Used speech-to-text models to transcribe audio into textual data.
Cleaned and tokenized textual data using NLTK to prepare it for classification tasks.
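As a minimal sketch of the cleaning step, the snippet below lowercases a transcript, tokenizes it with a regex, and filters stopwords. The small hardcoded stopword set is an assumption standing in for NLTK's full English list (the original pipeline used NLTK's tokenizer and stopword corpus):

```python
import re

# Tiny stopword set standing in for NLTK's English stopword list (assumption:
# the real pipeline used nltk.word_tokenize and nltk.corpus.stopwords).
STOPWORDS = {"the", "a", "an", "is", "are", "to", "of", "and", "in", "on"}

def clean_and_tokenize(transcript: str) -> list:
    """Lowercase the transcript, keep word-like tokens, drop stopwords."""
    tokens = re.findall(r"[a-z0-9']+", transcript.lower())
    return [t for t in tokens if t not in STOPWORDS]

tokens = clean_and_tokenize("The quick demo shows a cooking tutorial on pasta.")
print(tokens)  # ['quick', 'demo', 'shows', 'cooking', 'tutorial', 'pasta']
```

The cleaned token list is what the downstream classifier consumes in place of raw transcript text.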
Feature Engineering:
From the video frames, extracted visual features using pre-trained convolutional neural networks (CNNs) like ResNet.
Combined visual features with text embeddings generated using Transformers (Hugging Face).
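A common way to combine the two modalities is to L2-normalize each feature vector and concatenate them. The dimensions below are assumptions based on typical defaults (2048-d pooled ResNet-50 features, 768-d base Transformer embeddings), shown with random placeholder vectors:

```python
import numpy as np

# Hypothetical dimensions: 2048-d CNN features, 768-d text embeddings.
visual_feat = np.random.rand(2048).astype(np.float32)  # from a CNN backbone
text_feat = np.random.rand(768).astype(np.float32)     # from a text encoder

def fuse(visual, text):
    """L2-normalize each modality, then concatenate into one vector."""
    v = visual / (np.linalg.norm(visual) + 1e-8)
    t = text / (np.linalg.norm(text) + 1e-8)
    return np.concatenate([v, t])

fused = fuse(visual_feat, text_feat)
print(fused.shape)  # (2816,)
```

Normalizing before concatenation keeps one modality from dominating the fused representation purely because of its scale.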
Model Development:
Built a multi-modal model leveraging both text and visual features for classification.
Fine-tuned a pre-trained Transformer model for text classification to generate contextual tags.
Trained the model using labeled datasets of video content and associated metadata.
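A multi-modal classifier of this shape can be sketched in PyTorch as two projection heads fused by concatenation. The layer sizes and the 10-tag output are illustrative assumptions, not the project's actual configuration:

```python
import torch
import torch.nn as nn

class MultiModalClassifier(nn.Module):
    """Project each modality to a shared size, concatenate, then classify.

    Dimensions are assumptions: 2048-d CNN features, 768-d text
    embeddings, and a hypothetical set of 10 tag classes.
    """
    def __init__(self, visual_dim=2048, text_dim=768, hidden=256, n_tags=10):
        super().__init__()
        self.visual_proj = nn.Sequential(nn.Linear(visual_dim, hidden), nn.ReLU())
        self.text_proj = nn.Sequential(nn.Linear(text_dim, hidden), nn.ReLU())
        self.head = nn.Linear(2 * hidden, n_tags)

    def forward(self, visual, text):
        fused = torch.cat([self.visual_proj(visual), self.text_proj(text)], dim=-1)
        return self.head(fused)  # raw logits, one score per candidate tag

model = MultiModalClassifier()
logits = model(torch.randn(4, 2048), torch.randn(4, 768))
print(logits.shape)  # torch.Size([4, 10])
```

Training would pair these logits with a multi-label loss such as `BCEWithLogitsLoss`, since a video can carry several tags at once.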
Pipeline Creation:
Designed a pipeline to process new video uploads, extract features, and generate metadata tags in real time.
Deployed the pipeline as a REST API using Flask for easy integration with video platforms.
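The REST endpoint can be sketched as follows; `predict_tags` is a hypothetical placeholder for the full feature-extraction and inference pipeline, and the route name and payload shape are assumptions:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def predict_tags(video_url):
    # Placeholder for the real pipeline (fetch video, extract audio/visual
    # features, run the multi-modal model); returns fixed tags to illustrate.
    return ["tutorial", "cooking"]

@app.route("/tag", methods=["POST"])
def tag_video():
    payload = request.get_json(force=True)
    tags = predict_tags(payload["video_url"])
    return jsonify({"video_url": payload["video_url"], "tags": tags})

# Exercise the endpoint without starting a server, via Flask's test client.
client = app.test_client()
resp = client.post("/tag", json={"video_url": "s3://bucket/video.mp4"})
print(resp.get_json())
```

A video platform would POST an uploaded video's location and store the returned tags alongside the video record.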
Visualization and Search:
Stored the generated metadata tags in a SQL database for efficient querying.
Built a lightweight search interface that allowed users to retrieve videos based on metadata keywords.
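The storage-and-search layer reduces to a tags table plus a keyword query. The sketch below uses an in-memory SQLite database as a stand-in for the production SQL database, with a hypothetical schema:

```python
import sqlite3

# In-memory SQLite stands in for the production SQL database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE video_tags (video_id TEXT, tag TEXT)")
rows = [("vid001", "cooking"), ("vid001", "tutorial"), ("vid002", "travel")]
conn.executemany("INSERT INTO video_tags VALUES (?, ?)", rows)

def search(keyword):
    """Return distinct video IDs whose tags contain the keyword."""
    cur = conn.execute(
        "SELECT DISTINCT video_id FROM video_tags WHERE tag LIKE ?",
        (f"%{keyword}%",),
    )
    return [r[0] for r in cur.fetchall()]

print(search("cook"))  # ['vid001']
```

The search interface simply forwards a user's keyword to a query like this and lists the matching videos.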
Code Flow:
Load videos using OpenCV, extract audio, and perform speech-to-text transcription.
Process text data for classification using Transformers.
Extract visual features from frames using pre-trained CNNs.
Combine text and visual features for classification using a multi-modal model.
Deploy the model pipeline via Flask and integrate metadata with a SQL database.
Results
The project successfully delivered an automated metadata tagging system with the following outcomes:
Enhanced Video Searchability: Metadata improved search precision, enabling quick retrieval of relevant videos.
Time Savings: Automated tagging reduced manual annotation time by 80%.
High Accuracy: The model achieved 90% accuracy in categorizing videos based on content.
Scalable Deployment: The REST API pipeline enabled seamless integration with existing video platforms.
This project demonstrated the effective use of machine learning for video content analysis, making video libraries more accessible and manageable.
Git Link
For more information and code, visit the Git link.