Image-to-Video Generation of Anime/Comics

3D Latent Video Diffusion with Multi-Stage Generative Pipeline

📋 Project Overview

Fine-tuned a 3D latent video diffusion model on 400K+ frames, achieving a 71.8% SSIM improvement over baseline interpolation methods. Designed a multi-stage generative pipeline with encoder-decoder networks, RIFE interpolation, and motion-aware conditioning.
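The multi-stage pipeline described above (encode to a latent space, denoise with a diffusion model, decode, then interpolate intermediate frames) can be sketched at a high level. Everything below is a hypothetical stand-in: `encode`/`decode` mimic a learned VAE pair with simple resampling, `denoise_latent` replaces the actual diffusion model with a placeholder update, and `interpolate` substitutes a linear blend for RIFE's flow-based interpolation.

```python
import numpy as np

def encode(frame):
    # Stand-in for a learned encoder: 2x spatial downsampling into "latent" space.
    return frame[::2, ::2]

def denoise_latent(latent, steps=4):
    # Stand-in for iterative diffusion denoising; a real model would
    # predict and subtract noise at each step, optionally conditioned
    # on motion cues from neighbouring frames.
    for _ in range(steps):
        latent = 0.9 * latent
    return latent

def decode(latent):
    # Stand-in for a learned decoder: nearest-neighbour upsample back
    # to the original frame resolution.
    return latent.repeat(2, axis=0).repeat(2, axis=1)

def interpolate(a, b, t=0.5):
    # Stand-in for RIFE: linear blend instead of flow-based warping.
    return (1 - t) * a + t * b

def generate(key_frame):
    # Compose the stages: encode -> denoise -> decode -> interpolate.
    frame = decode(denoise_latent(encode(key_frame)))
    mid = interpolate(key_frame, frame)
    return [key_frame, mid, frame]
```

The value of this staged structure is that each component can be trained or swapped independently, e.g. replacing the interpolation stage without retraining the diffusion backbone.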

Benchmarked generative quality using FVD and SSIM, surpassing state-of-the-art baselines in temporal consistency and visual fidelity.
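SSIM, one of the two metrics used above, compares luminance, contrast, and structural statistics between a generated frame and a reference. The sketch below computes a simplified global SSIM over whole-image statistics; standard evaluations (e.g. `skimage.metrics.structural_similarity`) use a sliding Gaussian window instead.

```python
import numpy as np

def ssim_global(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """Simplified SSIM using global image statistics.

    Real benchmarks compute SSIM over local windows and average;
    this global variant illustrates the formula only.
    """
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    )

rng = np.random.default_rng(0)
frame = rng.random((64, 64))
print(ssim_global(frame, frame))  # identical frames -> 1.0
```

FVD, by contrast, is a distributional metric (a Fréchet distance between video feature embeddings), so it captures temporal consistency across whole clips rather than per-frame fidelity.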

⚡ Key Highlights

  • 3D Latent Video Diffusion: Fine-tuned on 400K+ frames for anime/comics domain
  • 71.8% SSIM Improvement: Over baseline interpolation methods
  • Multi-Stage Pipeline: Encoder-decoder networks, RIFE interpolation, motion-aware conditioning
  • Benchmarking: FVD and SSIM evaluation, surpassing SOTA in temporal consistency and visual fidelity

Skills Demonstrated

Video Diffusion · Deep Learning · RIFE · Encoder-Decoder · SSIM · FVD · Generative AI · PyTorch · Computer Vision

More details and images coming soon.