System Architecture

SafeWalk integrates mobile, vision, and language AI technologies into a seamless system that assists visually impaired individuals in real time. Below is a breakdown of our core components.

[Figure: SafeWalk system architecture. The front-end app orchestrates the detection and description flows.]

1. Mobile Front-End

  • Developed in Flutter for cross-platform capability
  • Real-time video feed with overlaid obstacle detection results
  • Voice-based interaction and feedback with gesture control
  • Front/back camera toggle and a fast alert trigger

2. YOLOv5s Obstacle Detection

  • Trained on 90,855 labeled images across 27 object classes
  • Converted to TorchScript for on-device performance
  • Detects and classifies objects with bounding boxes
  • Provides spatial inputs for the voice alert logic (see the sketch after this list)
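
To make the hand-off from detection to voice alerts concrete, here is a minimal sketch of how decoded YOLOv5s detections could be bucketed into coarse spatial zones. The artifact name "yolov5s.torchscript", the three-zone split, and the confidence threshold are illustrative assumptions, and the raw-output decoding/NMS step is elided.

```python
# Minimal sketch: TorchScript inference plus a spatial bucketing step.
# Assumes detections have already been decoded and NMS-filtered into
# (x1, y1, x2, y2, confidence, class_id) rows.
import torch

model = torch.jit.load("yolov5s.torchscript")  # illustrative artifact name
model.eval()

def detection_zone(x1: float, x2: float, frame_width: int) -> str:
    """Bucket a bounding box into left/center/right by its horizontal center."""
    center = (x1 + x2) / 2.0
    if center < frame_width / 3:
        return "left"
    if center < 2 * frame_width / 3:
        return "center"
    return "right"

def spatial_alerts(detections, class_names, frame_width, min_conf=0.5):
    """Turn decoded detections into (class_name, zone) pairs for voice alerts."""
    for x1, y1, x2, y2, conf, cls in detections:
        if conf >= min_conf:
            yield class_names[int(cls)], detection_zone(x1, x2, frame_width)

# Usage: run the module on a preprocessed frame, decode + NMS its raw
# output (elided here), then feed the resulting rows to spatial_alerts().
```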

3. Qwen2 Visual Language Model (VLM)

  • Cloud-hosted using AWS SageMaker + Lambda
  • Receives cropped image input and returns natural-language captions (client sketch after this list)
  • Optimized prompt engineering for environmental awareness
  • Audio output is synthesized using TTS for user-friendly feedback
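
The caption request path can be exercised with a small client like the one below. This is a sketch under assumptions: the endpoint name "safewalk-qwen2-vl", the JSON payload fields, and the "caption" response field are illustrative, not the project's actual API contract.

```python
# Minimal sketch of a caption request to the cloud-hosted Qwen2 VLM,
# assuming a SageMaker endpoint that accepts a base64-encoded crop and
# a prompt in a JSON body. Names and fields are hypothetical.
import base64
import json

import boto3

runtime = boto3.client("sagemaker-runtime")

def describe_crop(jpeg_bytes: bytes,
                  prompt: str = "Describe obstacles ahead for a pedestrian.") -> str:
    """Send a cropped frame to the VLM endpoint and return its caption."""
    payload = {
        "image": base64.b64encode(jpeg_bytes).decode("ascii"),
        "prompt": prompt,  # prompt tuned for environmental awareness
    }
    response = runtime.invoke_endpoint(
        EndpointName="safewalk-qwen2-vl",   # hypothetical endpoint name
        ContentType="application/json",
        Body=json.dumps(payload),
    )
    return json.loads(response["Body"].read())["caption"]  # assumed field
```

The returned caption is then handed to the app's TTS layer for spoken feedback.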

Deployment Strategy

  • YOLOv5s: Runs on-device via TorchScript for low latency and offline use
  • Qwen2: Accessed via a secure REST API for scalable, language-rich output (a Lambda handler sketch follows this list)
  • Designed for modular expansion: wearable camera support, an iOS app, etc.
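
On the cloud side, the REST entry point can be as thin as a Lambda that forwards requests to the SageMaker endpoint. The handler below is a sketch assuming an API Gateway proxy integration and the same illustrative payload schema as above; authentication, validation, and error handling are omitted.

```python
# Minimal sketch of the Lambda behind the REST API. The endpoint name
# and the payload/response fields are hypothetical.
import json

import boto3

runtime = boto3.client("sagemaker-runtime")

def lambda_handler(event, context):
    body = json.loads(event["body"])  # API Gateway proxy passes the raw body
    response = runtime.invoke_endpoint(
        EndpointName="safewalk-qwen2-vl",   # hypothetical endpoint name
        ContentType="application/json",
        Body=json.dumps({"image": body["image"],
                         "prompt": body.get("prompt", "")}),
    )
    caption = json.loads(response["Body"].read())  # assumed JSON response
    return {"statusCode": 200, "body": json.dumps(caption)}
```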