A GUI agent application based on UI-TARS (a vision-language model) that allows you to control your computer using natural language.
Last updated: 9 hours ago
A new multi-shot video understanding benchmark, Shot2Story20K, with detailed shot-level captions and comprehensive video summaries.
Last updated: 3 days ago
Simulating the fractional quantum Hall effect with neural network variational Monte Carlo.
Last updated: 3 days ago
PaSa -- an advanced paper search agent powered by large language models. It can autonomously make a series of decisions, including invoking search ...
Last updated: 6 days ago
vArmor is a cloud-native container sandbox based on LSM. It includes multiple built-in protection rules that are ready to use out of the box.
Last updated: 7 days ago
Elkeid is a Cloud-Native Host-Based Intrusion Detection solution project to provide next-generation Threat Detection and Behavior Audition with mod...
Last updated: 7 days ago
This repo contains the code for our paper "An Image is Worth 32 Tokens for Reconstruction and Generation".
Last updated: 8 days ago