🎠 Performance vs. Predictability
Grok 4's Rocky Launch, Merchandising Optimization, and AI Agents at Work
“The technology you use impresses no one. The experience you create with it is everything.”
—Sean Gerety, UX Expert
The AI Breakdown
Grok & Roll: Musk’s AI Update Hits Highs and Lows
You may have seen Grok floating around your feed a lot this past week, and for good reason.
From controversial replies to benchmark wins to a multi-million-dollar government contract, xAI’s flagship model has had a…let’s say, eventful week.
Launching Grok 4
On July 10th, xAI unveiled Grok 4, calling it the smartest model available today. Elon Musk claimed it performs “better than PhD level in every subject,” with 100 times the training of its predecessor. The launch event, delayed by an hour, drew more than 1.4 million live viewers and fueled over 4,000 comments on X.
The company introduced two versions: Grok 4 and Grok 4 Heavy. The latter is a multi-agent system designed to improve output quality by having models compare their answers, like a built-in peer review system. A $300-per-month subscription plan, SuperGrok Heavy, was also announced, marking the most expensive tier currently offered by a major AI vendor (and the one with the worst name, yuck).
A Rocky Release
Just days before the launch, Grok was heavily criticized for generating antisemitic and extremist statements. The responses were traced back to a new system prompt encouraging the model to avoid political correctness.
xAI removed that instruction, deleted the posts, and limited Grok’s account. Musk later said the model was “too compliant,” suggesting it had been manipulated by users through prompting. However, the company has not disclosed what, if any, new guardrails were implemented in Grok 4.
Despite the Controversy…
On July 12th, the Department of Defense awarded xAI a contract worth up to $200M. Facilitated by the Chief Digital and AI Office, the agreement opens the door for Grok to be used in national security, scientific research, and healthcare.
The contract arrived less than a week after the content moderation incident, suggesting that technical capability, not consistency or accuracy, remains the dominant factor in government procurement decisions.
What It Means
The Grok 4 rollout highlights the growing gap between AI capability and model governance. Tools may become more powerful, but without consistent oversight, they remain unpredictable in public settings.
Grok is a timely reminder to evaluate both what a model says it can do and how it behaves in the real world. Performance can get you in the door, but predictability is what keeps you there.
Top Tools
What happens when you mix Amazon-grade AI talent with a passion for fixing the overlooked pain points of auto retail? You get Spyne.
In this Auto Collabs episode, Paul, Kyle, and Michael talk with founder Sanjay Varnwal about how his team is using AI to solve the stuff that actually slows dealers down.
From fixing lighting and stabilizing 360 spins to building agentic AI that adjusts pricing and ad spend while you sleep, this episode is stacked with real applications.
If you’re wondering when AI is finally going to start pulling its weight in retail auto, this is the one to watch.
Prompt of the Week
Before handing a task to your chatbot, try building in a layer of deeper thinking.
Before completing this task, ask any clarifying questions needed to fully understand the objective, audience, context, and potential risks. Consider gaps or pitfalls that I may have overlooked.
This nudge pushes the AI to think like a collaborator, not just a tool. You’ll often get questions that reveal blind spots in your request—things you didn’t think to mention, but should have. Whether you're generating ad copy, summarizing a lead, or crafting a customer follow-up, that moment of clarification can level up the outcome.
Hear from the Experts
AI Agents: The Best Rebrand Since Mayo and Aioli
AI agents are having a moment. But they’ve been around longer than you may think.
In her latest piece, Ilana Shabtay unpacks what makes these tools worth your attention. AI agents can handle lead follow-up, recommend vehicles based on real-time browsing behavior, and even adjust ad strategies—all without needing a human to steer every move.
If you’re exploring how to bring AI into your store in a practical, scalable way, this is a solid read.
Check out the article and get a clear look at what AI agents can actually do for retail.
Bits and Bytes
Generative AI helped drive 3300% more traffic on Prime Day (and $24B in sales). 🚀
A study by Stanford found that AI therapists may do more harm than good. 🧷
Anthropic’s Claude can now create, edit, and manage Canva designs. 🎨