I have used it. Multiple times. In actual real-life queries it always falls short of the three main competitors. xAI is very good at gaming benchmarks, but when you use Grok for regular queries (and not just code; the anecdotes you give are very specific/personal), it never scores quite as well as the top three.
The thing that makes Grok utterly untrustworthy--and therefore effectively useless--doesn't stem from how well it does or doesn't do on any benchmark, sample query, or anecdote.
It's the thing demonstrated by its answer to the hypothetical about how many innocent people's deaths would be worth trading for Elon Musk's life. Yes, phrasing it in terms of a particular ethnic group made it a trick question, but the answer spelled out the actual logic: in the hypothetical, killing any number of humans up to half the population of Earth in exchange for Elon Musk surviving is a net win.
Which is to say that, however good or bad its training dataset and algorithms are, its answers will always be filtered through the manic whims of the comically bloated ego of the billionaire who is subsidizing it.
If you ask a question that doesn't hit any of the ego-feeding topics or talking points Musk tweaked the backend with that week, maybe you get a good answer. But there is absolutely no way to tell, so anything and everything you ask might be filtered through whatever Elon has decided it should be telling you that week, right, wrong, or bonkers crazy. Most of the Musk-ified garbage the news reports on is so obvious it's funny--MechaHitler and such--but there's no way to tell what more subtle things have been tweaked away from reality and toward whatever Elon Musk wants people to believe, or when it simply does a better job of hiding its ulterior motives.
And fundamentally, while I don't like Musk, it doesn't actually matter whether I do or not--the point is that Grok's answers are being explicitly and very obviously manipulated on the back end by a single person, and that person has no financial interest or external pressure to prioritize accuracy over opinion or ideology. If there were an LLM in exactly the same situation being steered by a person I didn't hate, it would be no more trustworthy.
And if the person were better at steering it, it would be even worse--instead of ham-fisted lies or exaggerations, it might actually be good at subtly manipulative responses to queries.
It's another flavor of the problem with Chinese LLMs that will give you false information about the Tiananmen Square Massacre despite having accurate information in their training data--the Chinese government's desire to manipulate public opinion and what is perceived as fact takes priority over any sort of objective accuracy, and there's no pressure whatsoever in the other direction. People know to look for that specific lie or omission, but you don't know what else the LLM is subtly trying to manipulate you about at the instruction of those who control it.
No LLM maker is immune to this, but at least most of them are under some financial pressure from their investors or funders to make the thing work well.