Image-to-Video Generation of Anime/Comics
3D Latent Video Diffusion with Multi-Stage Generative Pipeline
📋 Project Overview
Fine-tuned a 3D latent video diffusion model on 400K+ anime/comic frames, achieving a 71.8% SSIM improvement over baseline interpolation methods. Designed a multi-stage generative pipeline combining encoder-decoder networks, RIFE frame interpolation, and motion-aware conditioning, as sketched below.
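A minimal sketch of how the stages could compose, in PyTorch. The module and function names (`LatentEncoder`, `LatentDecoder`, `diffuse_latent_video`, `rife_interpolate`), shapes, and conditioning vector are illustrative assumptions; the diffusion sampler and RIFE network are stand-in placeholders, not the project's actual implementation:

```python
import torch
import torch.nn as nn

# Hypothetical stage modules -- names and shapes are assumptions for illustration.
class LatentEncoder(nn.Module):
    """Maps an RGB keyframe (B, 3, H, W) to a compact spatial latent (B, C, h, w)."""
    def __init__(self, latent_channels=4):
        super().__init__()
        self.net = nn.Conv2d(3, latent_channels, kernel_size=8, stride=8)

    def forward(self, x):
        return self.net(x)

class LatentDecoder(nn.Module):
    """Maps latents back to RGB frames."""
    def __init__(self, latent_channels=4):
        super().__init__()
        self.net = nn.ConvTranspose2d(latent_channels, 3, kernel_size=8, stride=8)

    def forward(self, z):
        return self.net(z)

def diffuse_latent_video(z_start, num_frames, motion_cond):
    """Placeholder for the 3D (spatio-temporal) latent diffusion sampler.
    A real sampler would denoise a latent clip conditioned on z_start and
    motion_cond; here the latent is simply repeated along a new time axis."""
    b, c, h, w = z_start.shape
    return z_start.unsqueeze(2).expand(b, c, num_frames, h, w).contiguous()

def rife_interpolate(frames):
    """Placeholder for RIFE-style interpolation: inserts a midpoint frame
    between each consecutive pair (a simple average stands in for the flow net)."""
    mids = 0.5 * (frames[:, :, :-1] + frames[:, :, 1:])
    out = torch.empty(frames.shape[0], frames.shape[1],
                      2 * frames.shape[2] - 1, *frames.shape[3:])
    out[:, :, 0::2] = frames
    out[:, :, 1::2] = mids
    return out

# End-to-end flow: keyframe -> latent -> latent video -> decoded frames -> interpolated video
encoder, decoder = LatentEncoder(), LatentDecoder()
image = torch.rand(1, 3, 256, 256)            # input anime/comic keyframe
motion_cond = torch.rand(1, 16)               # motion-aware conditioning vector (assumed)
z = encoder(image)
z_video = diffuse_latent_video(z, num_frames=8, motion_cond=motion_cond)
frames = torch.stack([decoder(z_video[:, :, t]) for t in range(z_video.shape[2])], dim=2)
video = rife_interpolate(frames)              # (1, 3, 15, 256, 256)
```

In this sketch, diffusion runs over the temporal axis in the compact latent space, and RIFE-style interpolation raises the frame rate after decoding.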
Benchmarked generative quality using Fréchet Video Distance (FVD) and SSIM, surpassing state-of-the-art baselines in temporal consistency and visual fidelity.
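For reference, a small sketch of the SSIM side of the evaluation, assuming scikit-image's `structural_similarity` and uint8 videos of shape (T, H, W, 3). FVD is not reproduced here because it compares distributions of features from a pretrained I3D video classifier:

```python
import numpy as np
from skimage.metrics import structural_similarity

def mean_ssim(generated, reference):
    """Average per-frame SSIM between two videos of shape (T, H, W, 3), uint8."""
    assert generated.shape == reference.shape
    scores = [
        structural_similarity(g, r, channel_axis=-1, data_range=255)
        for g, r in zip(generated, reference)
    ]
    return float(np.mean(scores))

# Example: compare a generated clip against ground-truth frames (random data here)
gen = (np.random.rand(16, 256, 256, 3) * 255).astype(np.uint8)
ref = (np.random.rand(16, 256, 256, 3) * 255).astype(np.uint8)
print(f"mean SSIM: {mean_ssim(gen, ref):.3f}")
```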
⚡ Key Highlights
- 3D Latent Video Diffusion: Fine-tuned on 400K+ frames from the anime/comics domain
- 71.8% SSIM Improvement: Over baseline interpolation methods
- Multi-Stage Pipeline: Encoder-decoder networks, RIFE interpolation, and motion-aware conditioning (see the conditioning sketch after this list)
- Benchmarking: FVD and SSIM evaluation, surpassing SOTA in temporal consistency and visual fidelity
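A rough sketch of one way motion-aware conditioning could be wired in, assuming a hypothetical `MotionConditioner` that summarizes frame-to-frame differences of a reference clip into an embedding; the actual pipeline may derive motion cues differently (for example, from optical flow):

```python
import torch
import torch.nn as nn

class MotionConditioner(nn.Module):
    """Illustrative motion-aware conditioning: summarize per-frame differences
    of a reference clip into an embedding the diffusion sampler can use.
    Using raw frame differences instead of flow is an assumption of this sketch."""
    def __init__(self, embed_dim=16):
        super().__init__()
        self.proj = nn.Linear(1, embed_dim)

    def forward(self, clip):
        # clip: (B, 3, T, H, W); motion magnitude from consecutive-frame differences
        diffs = (clip[:, :, 1:] - clip[:, :, :-1]).abs()   # (B, 3, T-1, H, W)
        magnitude = diffs.mean(dim=(1, 3, 4))               # (B, T-1)
        per_frame = self.proj(magnitude.unsqueeze(-1))      # (B, T-1, embed_dim)
        return per_frame.mean(dim=1)                        # (B, embed_dim) clip-level embedding

# Usage: condition the latent diffusion sampler on a reference clip's motion statistics
conditioner = MotionConditioner(embed_dim=16)
reference_clip = torch.rand(1, 3, 8, 256, 256)
motion_cond = conditioner(reference_clip)   # could be passed to the diffusion sampler above
print(motion_cond.shape)                    # torch.Size([1, 16])
```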
Skills Demonstrated
Video Diffusion
Deep Learning
RIFE
Encoder-Decoder
SSIM
FVD
Generative AI
PyTorch
Computer Vision
More details and images coming soon.