On Thursday, weeks after launching its most capable AI model yet, Gemini 2.5 Pro, Google published a technical report showing the results of its internal safety evaluations. However, the report is light on the details, experts say, making it difficult to determine which risks the model might pose.
Technical reports provide useful, and at times unflattering, information that companies don't always widely advertise about their AI. By and large, the AI community sees these reports as good-faith efforts to support independent research and safety evaluations.
Google takes a different safety reporting approach than some of its AI rivals, publishing technical reports only once it considers a model to have graduated from the "experimental" stage. The company also doesn't include findings from all of its "dangerous capability" evaluations in these write-ups; it reserves those for a separate audit.
Several experts TechCrunch spoke with were nonetheless disappointed by the sparsity of the Gemini 2.5 Pro report, which they noted doesn't mention Google's Frontier Safety Framework (FSF). Google introduced the FSF last year in what it described as an effort to identify future AI capabilities that could cause "severe harm."
"This [report] is very sparse, contains minimal information, and came out weeks after the model was already made available to the public," Peter Wildeford, co-founder of the Institute for AI Policy and Strategy, told TechCrunch. "It's impossible to verify if Google is living up to its public commitments and thus impossible to assess the safety and security of their models."
Thomas Woodside, co-founder of the Secure AI Project, said that while he's glad Google released a report for Gemini 2.5 Pro, he's not convinced of the company's commitment to delivering timely supplemental safety evaluations. Woodside pointed out that the last time Google published the results of dangerous capability tests was in June 2024, for a model announced in February of that same year.
Not inspiring much confidence, Google hasn't made available a report for Gemini 2.5 Flash, a smaller, more efficient model the company announced last week. A spokesperson told TechCrunch a report for Flash is "coming soon."
"I hope this is a promise from Google to start publishing more frequent updates," Woodside told TechCrunch. "Those updates should include the results of evaluations for models that haven't been publicly deployed yet, since those models could also pose serious risks."
Google may have been one of the first AI labs to propose standardized reports for models, but it's not the only one that's been accused of underdelivering on transparency lately. Meta released a similarly skimpy safety evaluation of its new Llama 4 open models, and OpenAI opted not to publish any report for its GPT-4.1 series.
Hanging over Google's head are assurances the tech giant made to regulators to maintain a high standard of AI safety testing and reporting. Two years ago, Google told the U.S. government it would publish safety reports for all "significant" public AI models "within scope." The company followed up that promise with similar commitments to other countries, pledging to "provide public transparency" around AI products.
Kevin Bankston, a senior adviser on AI governance at the Center for Democracy and Technology, called the trend of sporadic and vague reports a "race to the bottom" on AI safety.
"Combined with reports that competing labs like OpenAI have shaved their safety testing time before release from months to days, this meager documentation for Google's top AI model tells a troubling story of a race to the bottom on AI safety and transparency as companies rush their models to market," he told TechCrunch.
Google has said in statements that, while not detailed in its technical reports, it conducts safety testing and "adversarial red teaming" for models ahead of release.