Elon Musk-led artificial intelligence (AI) startup xAI introduced its first model with visual processing capabilities, Grok 1.5 Vision (Grok 1.5V), on Friday.
xAI debuts its first AI model with visual processing
According to xAI’s press release, Grok-1.5V is the company’s first-generation multimodal AI model. The company outlined the features of the new version of its Grok language model, including its strong text capabilities.
In addition to text, Grok-1.5V can process a wide range of visual information, such as documents, diagrams, charts, screenshots, and photographs.
The AI startup also disclosed seven examples of how the model can utilize visual data, including the following:
- Writing code from a diagram
- Calculating calories
- From a drawing to a bedtime story
- Explaining a meme
- Converting a table to CSV
- Help with rotten wood on a deck
- Solving a coding problem
Comparison with other AI models
xAI also presented benchmark results comparing the new Grok 1.5V with its multimodal competitors, including OpenAI’s GPT-4V, Anthropic’s Claude 3 Sonnet and Claude 3 Opus, and Google’s Gemini Pro 1.5.
The Grok 1.5V benchmark results are shown below:
Benchmark | Grok-1.5V | GPT-4V | Claude 3 Sonnet | Claude 3 Opus | Gemini Pro 1.5 |
MMMU Multi-discipline | 53.6% | 56.8% | 53.1% | 59.4% | 58.5% |
MathVista Math | 52.8% | 49.9% | 47.9% | 50.5% | 52.1% |
AI2D Diagrams | 88.3% | 78.2% | 88.7% | 88.1% | 80.3% |
TextVQA Text reading | 78.1% | 78.0% | – | – | 73.5% |
ChartQA Charts | 76.1% | 78.5% | 81.1% | 80.8% | 81.3% |
DocVQA Documents | 85.6% | 88.4% | 89.5% | 89.3% | 86.5% |
RealWorldQA Real-world understanding | 68.7% | 61.4% | 51.9% | 49.8% | 67.5% |
As the table shows, Grok 1.5V posted the highest score in three of the seven benchmarks: MathVista (Math), TextVQA (Text reading), and RealWorldQA (Real-world understanding).
“We are particularly excited about Grok’s capabilities in understanding our physical world. Grok outperforms its peers in our new RealWorldQA benchmark that measures real-world spatial understanding.”
xAI
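The three-benchmark tally can be checked directly against the published table. The following is a minimal Python sketch that assumes the scores are transcribed exactly as reported above; the names `MODELS`, `SCORES`, `top_model`, and `benchmarks_led_by` are illustrative and do not come from xAI.

```python
# Scores (percent) copied from the benchmark table above.
# None marks a score that was not reported for that model.
MODELS = ["Grok-1.5V", "GPT-4V", "Claude 3 Sonnet", "Claude 3 Opus", "Gemini Pro 1.5"]
SCORES = {
    "MMMU": [53.6, 56.8, 53.1, 59.4, 58.5],
    "MathVista": [52.8, 49.9, 47.9, 50.5, 52.1],
    "AI2D": [88.3, 78.2, 88.7, 88.1, 80.3],
    "TextVQA": [78.1, 78.0, None, None, 73.5],
    "ChartQA": [76.1, 78.5, 81.1, 80.8, 81.3],
    "DocVQA": [85.6, 88.4, 89.5, 89.3, 86.5],
    "RealWorldQA": [68.7, 61.4, 51.9, 49.8, 67.5],
}


def top_model(benchmark):
    """Return the name of the highest-scoring model on one benchmark."""
    reported = [
        (score, name)
        for score, name in zip(SCORES[benchmark], MODELS)
        if score is not None  # skip models with no reported score
    ]
    return max(reported)[1]


def benchmarks_led_by(model):
    """List the benchmarks, in table order, where the given model ranks first."""
    return [name for name in SCORES if top_model(name) == model]


print(benchmarks_led_by("Grok-1.5V"))
```

Models without a reported score are left out of the comparison rather than counted as zero, so the TextVQA ranking covers only the three models with published results.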
Musk open sources Grok
xAI founder Elon Musk announced in early March the company’s plans to open source Grok. The announcement came after the American billionaire publicly criticized OpenAI for allegedly ditching its not-for-profit mission.
Musk even filed a lawsuit against OpenAI, which he helped establish, for not open-sourcing its GPT models.
The American business tycoon also promptly denied reports in January claiming that xAI had been raising capital. He said at the time that the company had no intention of raising funds.
xAI aims to further improve Grok’s capabilities across various domains, including images, audio, and video. xAI plans to launch the new Grok 1.5V to early testers and existing Grok users “soon” without specifying the exact date.