Microsoft Research Lab Asia.
- VITRA Redefines VLA Pre-training Paradigms via Human Video Reconstruction by Microsoft Research Team on May 29, 2026 at 7:49 am
When you see robots participating in running races or performing folk dances on stage, you might envision a future where a simple natural language command is all it takes for a robot to tidy up a desk, clean a room, or even serve tea. For a robot to truly “understand human speech,” “perceive the world,” ... The post VITRA Redefines VLA Pre-training Paradigms via Human Video Reconstruction appeared first on Microsoft Research.
- Phi-Ground: Improving how AI agents navigate screen interfaces by Microsoft Research Team on January 19, 2026 at 7:28 am
Imagine an AI assistant that can navigate a computer the same way humans do—clicking buttons, filling out forms, and moving between applications—all by simply interpreting what’s on the screen. This vision is becoming a reality through computer use agents—AI systems designed to operate software interfaces autonomously. Yet for these agents to function, they need to ...
- Deep Video Discovery: Using agentic search to analyze long-form video by Microsoft Research Team on December 19, 2025 at 8:01 am
Extracting useful information from long videos, whether meeting recordings, experimental data, or lecture content, requires painstaking manual review. AI tools offer some help: language-vision models can summarize short clips or answer questions when videos are divided into clear scenes or chapters. But for hours‑long recordings packed with information and lacking obvious structure, current models are ...
- Where AI meets neuroscience: Yansen Wang’s pursuit of human-centered innovation by Microsoft Research Team on December 11, 2025 at 10:27 am
“Curiosity drives scientific breakthroughs, and the tools we create often reflect the human motivations behind that curiosity.” For Yansen Wang, a senior researcher at Microsoft Research Asia, this philosophy has guided his work at the intersection of AI and neuroscience. Wang’s interest in science began early. While his classmates spent hours searching for information to ...
- UI-E2I-Synth: Realistic and challenging UI grounding benchmark for computer-use agents by Microsoft Research Team on November 24, 2025 at 4:15 am
AI assistants, designed to perform actions on behalf of users, may not be as capable as current benchmarks suggest. New research reveals that existing tests for UI grounding—the ability of assistants to locate elements in the graphical user interface (GUI)—have been overestimating the performance of visual language models (VLMs), which power these assistants. This becomes ...
- UI-Evol: Compute-use Agents Act on Knowledge by Microsoft Research Team on November 17, 2025 at 4:21 am
Computer-use agents are AI systems that autonomously navigate and interact with software applications through graphical user interfaces (GUIs), and they are emerging as a new capability in artificial intelligence. By navigating and manipulating the same visual interfaces that people use, they can perform complex tasks on behalf of users, from filling out forms to managing ...
- DocReward: Advancing professional document design through AI evaluation by Microsoft Research Team on November 13, 2025 at 4:00 am
In recent years, as the shift toward agentic AI has accelerated, automation has advanced to handle increasingly complex tasks, from document and code generation to image creation, visual understanding, and mathematical reasoning. This trend points to the growing need to transform traditional software into intelligent agents. When core productivity platforms like Microsoft Office evolve into ...
- OPA-DPO: Efficiently minimizing hallucinations in large vision-language models by Microsoft Research Team on October 27, 2025 at 3:54 am
Large vision-language models are improving at describing images, yet hallucinations still erode trust by introducing contradictions and fabricated details that limit practical applications. In response, Microsoft Research Asia has developed On-Policy Alignment DPO (OPA-DPO), a new algorithm that aligns expert feedback with the model’s own output distribution before training begins. This “on-policy” alignment slightly alters ...
- Microsoft study shows AI assistants help with development for programmers who are blind or have low vision by Microsoft Research Team on September 30, 2025 at 1:30 am
Developers who are blind or have low vision have historically been limited to back-end programming, but new research suggests AI programming assistants are changing that in remarkable ways. A Microsoft Research Asia study found that developers who use screen readers can now tackle previously challenging tasks like UI development through an AI-assisted software development technique ...
- StreamMind: AI system that responds to video in real time by Microsoft Research Team on August 15, 2025 at 3:00 am
Imagine a pair of smart glasses that detects its surroundings and speaks up at critical moments, such as when a car is approaching. That kind of split-second assistance could be transformative for people with low vision, but today’s visual AI assistants often miss those moments. The problem isn’t that the technology can’t detect its environment.














