Virtual Eye Vision With HoloLens (DeveloperWeek Global 2020)

Discover the capabilities of HoloLens and Cognitive Services in a demo showcasing face detection, text-to-speech synthesis, and multimodal interaction, with potential applications in virtual assistance, education, and entertainment.

Key takeaways
  • HoloLens: A self-contained holographic device whose sensor array maps and identifies the environment around the wearer. It is a full computer in itself, with a Qualcomm CPU, a dedicated holographic processing unit, memory, persistent storage, and wireless connectivity.
  • Cognitive Services: AI-powered services that run in the cloud or at the edge, with machine learning built in, so developers can add AI capabilities to applications without being data scientists.
  • Face Detection and Identification: A live demo on the HoloLens device showcases face detection and identification, including estimation of age, gender, emotions, and other facial attributes.
  • Text-to-Speech Synthesis: Text-to-speech synthesis is also demonstrated, showing how voice style, speaking rate, pitch, and volume can be customized.
  • Programmability: The Cognitive Services API is accessible through REST or SDKs, allowing developers to build applications using various programming languages (e.g., .NET, Java, JavaScript, Python).
  • Edge Computing: The HoloLens device can run Cognitive Services locally, reducing the need for cloud connectivity and enabling offline use.
  • Accessibility: The technology has the potential to enhance the lives of people with visual impairments, enabling them to “see” the world around them through voice commands and guidance.
  • Multimodal Interaction: The HoloLens device supports multimodal interaction, combining voice commands, hand gestures, and eye tracking; in the demo, camera feeds are converted into spoken instructions.
  • Neural Voice: A neural voice option uses deep neural networks to produce speech that sounds as natural and human-like as possible.
  • Video Recognition: The Cognitive Services API can recognize scenes, objects, and facial expressions in videos.
  • Application Possibilities: The technology has the potential to be used in various applications, such as virtual assistance, education, and entertainment.
  • Open-Source: The source code for the demo is available on GitHub, allowing developers to work with and learn from the technology.
  • Directions and Guidance: The device can provide directions and guidance to users, enhancing their navigation and interaction with the environment.
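
The face detection and identification takeaway can be sketched in code. The snippet below shows the shape of a Face API REST `detect` call and how age, gender, and the dominant emotion are read out of its response; the region, key, and sample response are illustrative placeholders, not values from the talk.

```python
# Sketch of calling the Azure Face API over REST and summarizing the face
# attributes it returns. Endpoint/region, the subscription key, and the
# sample response below are illustrative placeholders.

DETECT_URL = "https://westus.api.cognitive.microsoft.com/face/v1.0/detect"
HEADERS = {
    "Ocp-Apim-Subscription-Key": "<your-face-api-key>",
    "Content-Type": "application/octet-stream",  # raw image bytes in the body
}
PARAMS = {"returnFaceAttributes": "age,gender,emotion"}

def summarize_faces(response_json):
    """Reduce a Face API detect response to (age, gender, top emotion) tuples."""
    summaries = []
    for face in response_json:
        attrs = face["faceAttributes"]
        emotions = attrs["emotion"]
        # The emotion field maps emotion names to confidence scores;
        # pick the highest-scoring one.
        top_emotion = max(emotions, key=emotions.get)
        summaries.append((attrs["age"], attrs["gender"], top_emotion))
    return summaries

# Illustrative response in the documented shape (not captured from the demo):
sample = [{
    "faceId": "c5c24a82-6845-4031-9d5d-978df9175426",
    "faceAttributes": {
        "age": 31.0,
        "gender": "male",
        "emotion": {"anger": 0.0, "happiness": 0.96, "neutral": 0.04},
    },
}]

print(summarize_faces(sample))  # [(31.0, 'male', 'happiness')]
```

In a real client, the image bytes would be POSTed to `DETECT_URL` with `HEADERS` and `PARAMS`, and the JSON body of the response passed to `summarize_faces`.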
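
The text-to-speech customization mentioned above is driven by SSML in the Speech service. A minimal sketch of building such a request body, with placeholders for the voice and style names (the specific voice used in the demo is an assumption here):

```python
# Minimal SSML builder for Azure Speech text-to-speech, showing where voice
# style, rate, pitch, and volume plug in. The voice and style names are
# placeholders; swap in any voice available in your Speech resource.

def build_ssml(text, voice="en-US-AriaNeural", style="cheerful",
               rate="+10%", pitch="+2st", volume="+20%"):
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        'xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="en-US">'
        f'<voice name="{voice}">'
        f'<mstts:express-as style="{style}">'
        f'<prosody rate="{rate}" pitch="{pitch}" volume="{volume}">'
        f'{text}'
        '</prosody></mstts:express-as></voice></speak>'
    )

ssml = build_ssml("There is a person about two meters ahead of you.")
print(ssml)
```

The resulting SSML string would be sent as the body of a Speech synthesis request (via REST or the Speech SDK).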
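
The accessibility scenario, converting what the camera sees into spoken guidance, boils down to a capture-describe-speak loop. The functions below are hypothetical stand-ins (the source does not show this code); on device they would be backed by the HoloLens camera feed, a Computer Vision image-description call, and text-to-speech playback.

```python
# Sketch of the camera-to-voice loop described in the talk. All three helper
# functions are hypothetical stubs standing in for real device and service
# calls.

def capture_frame():
    # Stand-in for grabbing a frame from the HoloLens camera feed.
    return b"<raw image bytes>"

def describe_frame(frame_bytes):
    # Stand-in for a Computer Vision call that returns a scene caption.
    return "a person standing near a doorway"

def speak(text):
    # Stand-in for text-to-speech playback; here we just format the prompt.
    return f"I can see {text}."

def assist_once():
    """One iteration of the see-describe-speak loop."""
    frame = capture_frame()
    caption = describe_frame(frame)
    return speak(caption)

print(assist_once())  # I can see a person standing near a doorway.
```

Running this loop continuously, with the stubs replaced by real calls, is what lets a visually impaired user "see" the surroundings through voice.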