Meal Nutrition Analysis

Multimodal CNN+LSTM Approach for Nutrition Estimation

📋 Project Overview

Built a multimodal pipeline that integrates CNNs (meal images), LSTMs (glucose logs), and demographic embeddings, improving prediction accuracy by 34% relative to benchmarks.

Evaluated feature importance using regression metrics and correlation heat maps, reducing test loss to 0.34 and identifying top predictors of calorie absorption.
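A correlation heat map is built from pairwise correlation coefficients, so the ranking step behind it can be sketched in a few lines. The example below is a minimal illustration that assumes Pearson correlation; the feature names and values are hypothetical placeholders, not data from this project.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / (spread_x * spread_y)


def rank_features(features, target):
    """Sort feature names by absolute correlation with the target, strongest first."""
    scores = {name: pearson(values, target) for name, values in features.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical toy data, for illustration only.
features = {
    "portion_grams": [100.0, 200.0, 300.0, 400.0],
    "hour_of_day": [8.0, 8.0, 8.0, 8.0],
}
calories = [150.0, 310.0, 440.0, 610.0]
ranking = rank_features(features, calories)
```

Ranking by absolute value keeps strongly negative predictors near the top, which matches how a heat map is usually read.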

💡 Problem Statement

Manual nutrition tracking is tedious and often inaccurate. Key challenges include:

  • Food Recognition: Identifying multiple food items in a single image with varying appearances
  • Portion Estimation: Determining serving sizes from 2D images without depth information
  • Occlusion: Food items may be partially hidden or overlapping
  • Variability: Same food can look different due to preparation methods, lighting, and angles
  • Multi-food Scenarios: Complex meals with multiple ingredients and dishes
  • Nutrition Database: Mapping recognized foods to accurate nutritional information

⚡ Solution Approach

The system employs a multimodal CNN+LSTM architecture:

  • CNN Feature Extraction: ResNet-based encoder extracts visual features from meal images
  • Food Detection: Object detection identifies individual food items in the image
  • LSTM Sequence Modeling: Processes detected foods sequentially to understand meal composition
  • Portion Estimation: Uses reference objects and depth estimation techniques
  • Nutrition Prediction: Multi-output regression predicts calories, proteins, carbs, fats, and vitamins
  • Database Integration: Matches detected foods with USDA nutrition database
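The last two steps, portion estimation and database matching, can be sketched as a lookup that scales per-100 g values by the estimated portion mass. This is a minimal sketch, not the project's actual code: the table, its keys, and its numbers are placeholders made up for illustration and are not real USDA figures.

```python
# Placeholder per-100 g values for illustration only (NOT actual USDA data).
NUTRITION_PER_100G = {
    "rice": {"calories": 130.0, "protein_g": 2.5, "carbs_g": 28.0, "fat_g": 0.5},
    "chicken": {"calories": 160.0, "protein_g": 30.0, "carbs_g": 0.0, "fat_g": 4.0},
}


def meal_totals(detections, table=NUTRITION_PER_100G):
    """Sum nutrients over (label, grams) detections.

    Returns (totals, unmatched), where unmatched lists labels that have
    no entry in the nutrition table.
    """
    totals = {}
    unmatched = []
    for label, grams in detections:
        entry = table.get(label.lower())
        if entry is None:
            unmatched.append(label)
            continue
        scale = grams / 100.0  # table values are per 100 g
        for nutrient, per_100g in entry.items():
            totals[nutrient] = totals.get(nutrient, 0.0) + per_100g * scale
    return totals, unmatched


# Hypothetical detector output: (class label, estimated grams).
totals, unmatched = meal_totals([("Rice", 200.0), ("chicken", 50.0), ("kale", 30.0)])
```

Returning the unmatched labels separately makes database gaps visible instead of silently under-counting the meal.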

🛠️ Technical Implementation

Architecture Components

  • Image Preprocessing: Normalization, resizing, and augmentation (rotation, brightness, contrast)
  • CNN Backbone: ResNet-50/101 for feature extraction with transfer learning
  • Object Detection: YOLO or Faster R-CNN for food item localization
  • Feature Fusion: Concatenates visual features with contextual information
  • LSTM Network: Bidirectional LSTM processes food sequence for meal understanding
  • Attention Mechanism: Focuses on important food items for nutrition calculation
  • Regression Head: Fully connected layers predict nutritional values
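To make the attention step concrete, the sketch below pools per-item feature vectors into a single meal vector using softmax weights. The source does not say which attention variant was used, so dot-product scoring against a single query vector is an assumption here. In the full model the inputs would be the bidirectional LSTM outputs and the query would be learned. Plain Python lists stand in for tensors.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(item_features, query):
    """Weight each item's feature vector by its dot-product score with the query.

    Returns (pooled_vector, weights); the weights sum to 1.
    """
    scores = [sum(f * q for f, q in zip(feat, query)) for feat in item_features]
    weights = softmax(scores)
    dim = len(item_features[0])
    pooled = [
        sum(w * feat[d] for w, feat in zip(weights, item_features))
        for d in range(dim)
    ]
    return pooled, weights


# Two hypothetical food-item feature vectors and an assumed query vector.
pooled, weights = attention_pool([[1.0, 0.0], [0.0, 1.0]], query=[2.0, 0.0])
```

The pooled vector is what a regression head would consume, so items with larger weights contribute more to the predicted nutritional values.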

Training Pipeline

  • Dataset: Food-101, UEC-Food100, and custom annotated meal images
  • Data Augmentation: Random crops, flips, color jittering, and mixup techniques
  • Loss Function: Combined MSE loss for regression and cross-entropy for classification
  • Optimization: Adam optimizer with cosine annealing learning rate schedule
  • Multi-task Learning: Simultaneous food recognition and nutrition prediction
  • Evaluation Metrics: Mean Absolute Error (MAE), Mean Squared Error (MSE), and accuracy
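The combined loss and the learning rate schedule listed above can be written out for a single sample. This is a sketch under stated assumptions: the classification weight `cls_weight` and the default `lr_max` and `lr_min` values are hypothetical, since the source does not give the weighting or the learning rates. A PyTorch training loop would typically rely on the library's built-in loss modules and scheduler rather than hand-written versions.

```python
import math


def mse(pred, target):
    """Mean squared error over the nutrient outputs."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def cross_entropy(logits, true_class):
    """Cross-entropy for one sample, computed with a stable log-sum-exp."""
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[true_class]


def combined_loss(nutrient_pred, nutrient_target, class_logits, true_class,
                  cls_weight=1.0):
    """Multi-task loss: regression MSE plus weighted classification cross-entropy.

    cls_weight is an assumed hyperparameter, not a value from the project.
    """
    return mse(nutrient_pred, nutrient_target) + cls_weight * cross_entropy(
        class_logits, true_class
    )


def cosine_annealing_lr(step, total_steps, lr_max=1e-3, lr_min=0.0):
    """Cosine annealing: decay smoothly from lr_max at step 0 to lr_min at the end."""
    cosine = math.cos(math.pi * step / total_steps)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + cosine)


loss = combined_loss([1.0, 2.0], [1.0, 4.0], class_logits=[0.0, 0.0], true_class=0)
```

Because the two terms are on different scales, the weight on the classification term usually needs tuning so that neither task dominates the gradient.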

🏆 Key Achievements

  • High accuracy in multi-food item recognition
  • Calorie estimation within a ±15% error margin
  • Robust performance across diverse cuisines and food types
  • Real-time inference capability for mobile applications
  • Comprehensive nutrition breakdown including micronutrients
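One way to check a claim like the ±15% calorie margin is to count the share of meals whose prediction falls within 15% of the reference value. The sketch below assumes that relative-error reading of the margin; the sample numbers are hypothetical, not results from this project.

```python
def within_margin_rate(predicted, actual, margin=0.15):
    """Fraction of predictions whose relative error is at most `margin`.

    Reference values must be positive, since relative error is undefined at zero.
    """
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    if any(a <= 0 for a in actual):
        raise ValueError("reference values must be positive")
    hits = sum(
        1 for p, a in zip(predicted, actual) if abs(p - a) <= margin * a
    )
    return hits / len(actual)


# Hypothetical calorie predictions versus reference values.
rate = within_margin_rate([100.0, 220.0, 500.0], [110.0, 200.0, 400.0])
```

Reporting this rate next to MAE helps distinguish a model that is slightly off everywhere from one that is accurate on most meals but badly wrong on a few.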

💡 Challenges Overcome

  • Handling occluded and overlapping food items
  • Accurate portion size estimation from 2D images
  • Dealing with lighting and angle variations
  • Managing large-scale food databases and matching
  • Balancing model complexity with inference speed

📚 Key Learnings

  • Multimodal Learning: Combining visual and sequential information for better understanding
  • Transfer Learning: Leveraging pre-trained models for food recognition tasks
  • Object Detection: Techniques for localizing multiple objects in complex scenes
  • Sequence Modeling: Using LSTM to understand relationships between food items
  • Regression Tasks: Predicting continuous values with deep learning
  • Data Collection: Challenges in building comprehensive food image datasets

🚀 Future Enhancements

  • 3D reconstruction for more accurate portion estimation
  • Integration with wearable devices for automatic meal detection
  • Personalized nutrition recommendations based on user health data
  • Multi-language support for global food recognition
  • Real-time video analysis for continuous meal tracking
  • Integration with recipe databases for cooking suggestions
  • Allergen detection and dietary restriction compliance

Skills Demonstrated

PyTorch, CNN, LSTM, Deep Learning, Computer Vision, Object Detection, Transfer Learning, Multimodal Learning, Image Classification, Regression, Python, OpenCV, Data Augmentation, ResNet