AI Models
Explore our collection of AI models covering image generation, video generation, text chat, and more, with different capabilities and pricing.
All Models
Qwen-Image is a multimodal AI model adept at understanding and generating responses based on visual inputs. It interprets images; recognizes objects, scenes, and context; and answers related questions with high accuracy. Designed for diverse applications, it supports visual reasoning, content moderation, and accessibility tools. Trained on vast datasets, Qwen-Image delivers robust performance across languages and domains, making it ideal for developers seeking intelligent image analysis that integrates seamlessly into their platforms or services.
Qwen-Image-Fast is an accelerated variant of Qwen-Image that applies LoRA (Low-Rank Adaptation) and quantization, offering significantly faster image generation while maintaining high quality. The model delivers rapid text-to-image generation with strong support for Chinese prompts and natural language descriptions, making it well suited to applications that require quick turnaround without compromising image quality.
Qwen-Image-Edit-2509 is an image editing model released by Alibaba's Tongyi Qianwen (Qwen) team in September 2025. Built on the original architecture with further training, it introduces multi-image editing: through image concatenation it supports semantic-level fusion and consistent editing across multiple source images. The model responds to text instructions with precise local or global modifications to single or multiple images, and performs especially well at preserving character features and at iterative editing.
FLUX.1-dev is an open-source text-to-image diffusion model developed by Black Forest Labs, featuring powerful image generation capabilities and exceptional detail representation. The model excels in complex scene understanding, human pose control, and multilingual prompt support, while also supporting high-resolution output. As a development version of the FLUX.1 series, it provides researchers and developers with flexible fine-tuning and deployment options, representing a significant advancement in the open-source image generation field.
Wan2.2 I2V 14B is an image-to-video generation model developed by Alibaba's Tongyi Wanxiang Lab, released in July 2025. With 14 billion parameters, it uses a Mixture of Experts (MoE) architecture containing high-noise and low-noise expert models that handle overall video structure and detail optimization respectively, transforming static images into dynamic videos with cinematic aesthetic styles.
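The two-expert design described above can be sketched conceptually: high-noise denoising steps (early in sampling, when the video is mostly noise) go to the expert that lays out overall structure, while low-noise steps go to the expert that refines detail. The following minimal Python illustration is an assumption-laden sketch of that routing idea, not the model's actual code; the `boundary` threshold and expert names are hypothetical.

```python
# Conceptual sketch of two-expert MoE routing by noise level
# (hypothetical threshold and expert names; not Wan2.2's real code).

def route_denoising_step(noise_level: float, boundary: float = 0.5) -> str:
    """Pick which expert handles a denoising step.

    High-noise steps (early in sampling) go to the expert shaping overall
    video structure; low-noise steps go to the detail-refinement expert.
    """
    if not 0.0 <= noise_level <= 1.0:
        raise ValueError("noise_level must be in [0, 1]")
    return "high_noise_expert" if noise_level >= boundary else "low_noise_expert"

# Walk a simple linear noise schedule from pure noise (1.0) toward clean (0.0).
schedule = [1.0, 0.8, 0.6, 0.4, 0.2, 0.0]
assignments = [route_denoising_step(t) for t in schedule]
print(assignments)
# ['high_noise_expert', 'high_noise_expert', 'high_noise_expert',
#  'low_noise_expert', 'low_noise_expert', 'low_noise_expert']
```

The practical upshot of this split is that only one expert is active per step, so the 14B-parameter model's per-step compute stays close to that of a single smaller expert.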
Gemini-2.5-Pro is Google's multimodal large language model, able to understand text, images, audio, and video content simultaneously with powerful reasoning capabilities. The model supports a context window of up to 1 million tokens, excels at code generation, transformation, and editing, and can process video inputs up to 3 hours long. It also performs exceptionally well in logical analysis, world knowledge, and complex problem solving.