These systems use mathematical probabilities to guess the best response, not a strict set of rules defined by human engineers.
As offered up in these essays more than once: not until the AI community really, really, truly comes to grips with this shit will AI be trustworthy. And the way to do that is to marry AI with RDBMS; that's the "strict set of rules defined by human engineers" side of the process. You heard it here first, and quite a while ago. At least, so far as I know.
And it gets worser. Some other notes from the report:
Last month, an A.I. bot that handles tech support for Cursor, an up-and-coming tool for computer programmers, alerted several customers about a change in company policy. It said they were no longer allowed to use Cursor on more than just one computer.
Now you know where the title came from. Cue the Twilight Zone theme.
In angry posts to internet message boards, the customers complained. Some canceled their Cursor accounts. And some got even angrier when they realized what had happened: The A.I. bot had announced a policy change that did not exist.
"We have no such policy. You're of course free to use Cursor on multiple machines," the company's chief executive and co-founder, Michael Truell, wrote in a Reddit post. "Unfortunately, this is an incorrect response from a front-line A.I. support bot."
[S]o-called reasoning systems from companies like OpenAI, Google and the Chinese start-up DeepSeek are generating more errors, not fewer. As their math skills have notably improved, their handle on facts has gotten shakier. It is not entirely clear why.
Really?? The interTubes is rife with shit. Shit in, shit out. Once again, with feeling: AI without ground facts as its basis will continue to shift shit.
When running another test called SimpleQA, which asks more general questions, the hallucination rates for o3 and o4-mini were 51 percent and 79 percent. The previous system, o1, hallucinated 44 percent of the time.
QED, kinda.
We still don't know how these models work exactly.
-- Hannaneh Hajishirzi
What was that myth about Pandora and her box?