GitHub - QwenLM/Qwen3-Omni: Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.
Well, for a 15-second video, it needs at least 70 GB of VRAM. Personally, I don't have that.