- Context Window: 1M tokens (roughly 1,500 pages of text or 30K lines of code)
  - Constraints: Maximum capacity requires the Ultra tier; standard tiers are limited to 32K tokens
  - Applicable Modalities: Text, Image, Video
- Media Resolution Control: High and low resolution modes, selected via the media_resolution parameter
  - Constraints: High resolution maximizes fidelity but increases token consumption and latency
  - Applicable Modalities: Image, Video
- Video Frame Rate Processing: Optimized for high-speed sampling at 10+ FPS
  - Constraints: Higher frame rates significantly increase computational demands
  - Applicable Modalities: Video
- Spatial Pointing Precision: Pixel-precise 2D coordinate output
  - Constraints: Requires clear visual references; accuracy degrades with occlusion or blur
  - Applicable Modalities: Image
- Native Aspect Ratio Processing: Preserves the original aspect ratio of images and videos
  - Constraints: Improves quality but requires input handling that accommodates varying dimensions
  - Applicable Modalities: Image, Video
- Thinking Mode Context: 192K tokens in Deep Think mode
  - Constraints: Limited to 10 prompts per day on the Ultra tier
  - Applicable Modalities: Text, Image, Video
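
The context figures above can be turned into a rough budget check. The sketch below is only an illustration. It assumes the per-page and per-line ratios implied by "1M tokens is roughly 1,500 pages or 30K lines of code" hold on average, and the names `CONTEXT_LIMITS`, `estimate_tokens`, and `modes_that_fit` are hypothetical helpers, not part of any API. Real token counts depend on the tokenizer and the content.

```python
# Rough context-budget check using only the figures listed above.
# The ratios are derived from "1M tokens ~ 1,500 pages ~ 30K lines of code";
# they are estimates, not tokenizer output.

# Hypothetical labels for the three limits mentioned in the list.
CONTEXT_LIMITS = {
    "ultra_full": 1_000_000,  # maximum context window (Ultra tier)
    "deep_think": 192_000,    # Deep Think mode
    "standard": 32_000,       # standard tiers
}

TOKENS_PER_PAGE = 1_000_000 / 1_500        # about 667 tokens per page
TOKENS_PER_CODE_LINE = 1_000_000 / 30_000  # about 33 tokens per line


def estimate_tokens(pages: float = 0, code_lines: float = 0) -> int:
    """Estimate the token count for a mix of text pages and code lines."""
    return round(pages * TOKENS_PER_PAGE + code_lines * TOKENS_PER_CODE_LINE)


def modes_that_fit(tokens: int) -> list[str]:
    """Return the labels of every limit that can hold `tokens`."""
    return [name for name, limit in CONTEXT_LIMITS.items() if tokens <= limit]


if __name__ == "__main__":
    needed = estimate_tokens(pages=200, code_lines=1_000)
    print(needed, modes_that_fit(needed))
```

Under these assumptions, a 200-page document is too large for a standard-tier window but fits within the Deep Think limit. For anything close to a limit, measure the input with the service's own token counter instead of relying on these averages.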