Developing Context-Aware XR Assistance for Perception and Comprehension Using LLMs
Problem Statement:
While XR technologies are making the world more immersive, not all users can fully benefit because of limitations such as color vision deficiency, low contrast sensitivity, or cognitive overload. Current tools often rely on generic filters or static overlays that do not adapt to individual needs or specific tasks. This project proposes a context-aware AR assistant that uses a camera feed, scene understanding, and a reasoning engine (an LLM) to provide real-time, personalized explanations of, or alternatives to, what a user sees.
Task:
Your job will be to build or prototype a system that connects scene understanding from AR devices (like object recognition or semantic segmentation) with the reasoning power of an LLM. This system should be able to assist users in recognizing, interpreting, or re-describing objects and scenes based on their specific needs (e.g., “tell me which item is raw meat,” or “explain the color contrast between these lines”).
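A minimal sketch of such a pipeline is given below. It assumes that an upstream detector already provides labelled objects for the current camera frame and that the OpenAI Python client serves as the reasoning engine; the SceneObject fields, the assist() helper, and the model name are illustrative assumptions, not fixed design choices.

    # Sketch: connect per-frame scene understanding to an LLM reasoning step.
    # Assumes labels/colors come from an upstream detector (not shown) and that
    # the OpenAI Python client is installed with an API key in the environment.
    from dataclasses import dataclass
    from openai import OpenAI

    @dataclass
    class SceneObject:
        label: str        # e.g. "chicken breast"
        mean_rgb: tuple   # average color of the detected region
        bbox: tuple       # (x, y, w, h) in image coordinates

    def assist(objects: list[SceneObject], user_request: str) -> str:
        """Describe the detected scene to the LLM and return its answer."""
        scene_lines = [
            f"- {o.label}: mean RGB {o.mean_rgb}, bounding box {o.bbox}"
            for o in objects
        ]
        prompt = (
            "You assist a user with limited color perception.\n"
            "Objects detected in the current camera frame:\n"
            + "\n".join(scene_lines)
            + f"\n\nUser request: {user_request}\n"
            "Answer briefly and refer to objects by position, not by color alone."
        )
        client = OpenAI()  # reads OPENAI_API_KEY from the environment
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # illustrative model choice
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content

    # Example call once a detector has filled `objects`:
    # assist(objects, "Tell me which item is raw meat.")

The same hand-off would also work with a local or multimodal model; the point of the sketch is the structured transfer from perception to reasoning rather than a raw image caption.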
Research Scope:
1. Context-Aware AR Interface:
- Use AR and computer vision to capture environmental data (color, labels, objects, location, etc.).
- Integrate this into an app that overlays helpful information in real time (a capture sketch follows this list).
2. LLM-Based Semantic Reasoning:
- Use an LLM (such as GPT) to process contextual input (e.g., "what’s happening in this scene?") and generate support content such as simplified descriptions or alternatives (see the second sketch after this list).
3. User-Centered Evaluation:
- Evaluate the usefulness of this support with real users, preferably those with mild visual or cognitive constraints (e.g., color vision deficiency, visual stress, or overload during complex tasks).
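For point 1, one way to turn a camera frame into the structured scene description used earlier is sketched below: OpenCV grabs a frame, a detector returns (label, bounding box) pairs, and the mean color of each region is computed. The detect() stub is a placeholder assumption; any object detector or segmentation model with this output shape would fit.

    # Sketch for point 1: capture a frame and extract per-object label/color data.
    # detect() is a placeholder; any detector or segmentation model returning
    # (label, bounding box) pairs can be substituted.
    import cv2
    import numpy as np

    def detect(frame):
        """Placeholder detector: must return a list of (label, (x, y, w, h))."""
        h, w = frame.shape[:2]
        return [("unknown object", (0, 0, w, h))]  # whole frame as a dummy region

    def describe_frame(frame):
        """Build the structured scene description consumed by the LLM step."""
        objects = []
        for label, (x, y, w, h) in detect(frame):
            roi = frame[y:y + h, x:x + w]
            b, g, r = np.mean(roi.reshape(-1, 3), axis=0)  # OpenCV stores BGR
            objects.append({"label": label,
                            "mean_rgb": (int(r), int(g), int(b)),
                            "bbox": (x, y, w, h)})
        return objects

    cap = cv2.VideoCapture(0)  # default webcam; an AR device feed in practice
    ok, frame = cap.read()
    if ok:
        print(describe_frame(frame))
    cap.release()

The resulting dictionaries map directly onto the SceneObject fields from the earlier sketch and could be refreshed every few frames to keep overlays current.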
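For point 2, the LLM's answers are easier to trust when the prompt carries measured context instead of asking the model to judge colors it cannot verify. The second sketch below computes a WCAG 2.x contrast ratio between two sampled colors (the choice of metric and the sample values are assumptions) so that a request such as "explain the color contrast between these lines" can be answered from a concrete number.

    # Sketch for point 2: ground the contrast explanation in a measured value.
    # Uses the WCAG 2.x relative-luminance formula; the metric choice is illustrative.

    def _linearize(c: float) -> float:
        """sRGB channel value in 0..1 -> linear light, per WCAG 2.x."""
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

    def relative_luminance(rgb: tuple) -> float:
        r, g, b = (_linearize(v / 255.0) for v in rgb)
        return 0.2126 * r + 0.7152 * g + 0.0722 * b

    def contrast_ratio(rgb_a: tuple, rgb_b: tuple) -> float:
        """WCAG contrast ratio, from 1:1 (identical) to 21:1 (black vs. white)."""
        lighter, darker = sorted(
            (relative_luminance(rgb_a), relative_luminance(rgb_b)), reverse=True)
        return (lighter + 0.05) / (darker + 0.05)

    # Example with two arbitrary line colors sampled from a chart:
    ratio = contrast_ratio((200, 60, 60), (90, 160, 90))
    prompt_context = (
        f"The two lines have a contrast ratio of {ratio:.2f}:1 "
        "(WCAG recommends at least 3:1 for graphical objects). "
        "Explain in one sentence whether a user with low contrast sensitivity "
        "can reliably tell them apart."
    )
    print(prompt_context)  # appended to the LLM prompt before the request is sent

The resulting string would simply be appended to the prompt built by assist() in the first sketch.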
Work:
- Theory: 20%
- Programming: 60%
- Writing: 20%