OpenAI lobotomized their models so badly that they can't even generate simple algorithms anymore.
What is the point of purposefully ruining your models?
>What is the point of purposefully ruining your models?
staying in business
The point of staying in business is to release better models
The point of staying in business is to make money
yeah and they're going to stop making money
building assistants with only an LLM was absolutely retarded anyway; most of these tools are "right" by coincidence and there is nothing they can do about it.
The point of staying in business is to not get sued into oblivion by copyright lawsuits.
>>What is the point of purposefully ruining your models?
Ask Disney how ruining your products is a viable business model.
Are we back in business, Bard besties?
kind of. bing ai can't even compare products anymore. Didn't think I'd see the day.
Same reason Apple (and other smartphone manufacturers) gimps the battery when a new phone comes around.
>Gimp a feature of the old model
>Easier to convince people to upgrade to the new version
Every single phone does this, I swear.
>be me
>do not update the OS
>people with much newer phones surprised mine is somehow faster and lasts longer despite being from 2016
GPT-4 is becoming worse while GPT-5 is still a while away. It's like making the latest iPhone slower only a few weeks after they released it when the next generation isn't expected for another year.
OpenAI gets no visible benefit from this that I can see.
-it's gonna make too many waves, economy-wise (imagine what would happen if 25% of the workforce suddenly became obsolete)
-also control of the narrative, like the ovens and bodies trick
Not a bad counterargument to the paper the screenshot is from and the general reaction to it: https://www.aisnakeoil.com/p/is-gpt-4-getting-worse-over-time. The main point of the paper still stands - that there was a change in performance in certain tasks (probably due to fine tuning), but the wider interpretation that performance has been degrading would be a stretch given the experiments they did.
They have a very good point about the prime number task: “What seems to have changed is that the March version of GPT-4 almost always guesses that the number is prime, and the June version almost always guesses that it is composite. The authors interpret this as a massive performance drop — since they only test primes. For GPT-3.5, this behavior is reversed.”
It seems that in the paper they only tested prime numbers for some reason (not composite numbers), so a change in default behaviour (assuming prime vs non-prime) is all it takes to tank the measured performance. The reality is that the models (either now or then) were never very good at this task (I used the code interpreter version btw with the same prompt and it works brilliantly as expected - pure LLMs are not built for these math tasks)
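To make the eval bias concrete, here's a toy sketch (not the paper's actual harness; the number list and scoring function are made up for illustration): when every test case is prime, accuracy collapses to "how often does the model say prime", so flipping the default guess flips the score from ~100% to ~0% without any real capability change.

```python
# Toy illustration of the prime-only eval bias (hypothetical harness, not the paper's code).

def is_prime(n):
    """Trial-division primality check, used as ground truth."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def score(guess, numbers):
    """Accuracy of a guessing strategy; guess(n) returns 'prime' or 'composite'."""
    hits = sum((guess(n) == "prime") == is_prime(n) for n in numbers)
    return hits / len(numbers)

primes_only = [7919, 104729, 611953, 1299709]   # every test case is prime, as in the paper

march_like = lambda n: "prime"       # model that defaults to answering "prime"
june_like  = lambda n: "composite"   # model that defaults to answering "composite"

print(score(march_like, primes_only))  # 1.0 -> looks like near-perfect capability
print(score(june_like, primes_only))   # 0.0 -> looks like a catastrophic regression
```

Adding composites to the test set (or reporting per-class accuracy) would have separated a default-answer flip from an actual capability drop.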
The response in your link is retarded, which isn't all that surprising given the title of the blog.
>capability ≠ behavior
no shit. Doesn't change the fact that the overall response quality has degraded on the more demanding questions.
Aside from that they only really address prime number checking which just so happens to be the weakest point of the paper. Admittedly I have no clue why the authors included that one. Actually, the entire "paper" is kind of crappy, but as you say the point still stands.
From the Blogpost
>The user impact of behavior change and capability degradation can be very similar. Users tend to have specific workflows and prompting strategies that work well for their use cases. Given the nondeterministic nature of LLMs, it takes a lot of work to discover these strategies and arrive at a workflow that is well suited for a particular application. So when there is a behavior drift, those workflows might stop working. It is little comfort to a frustrated ChatGPT user to be told that the capabilities they need still exist, but now require new prompting strategies to elicit. This is especially true for applications built on top of the GPT API. Code that is deployed to users might simply break if the model underneath changes its behavior.
this is the main issue, I would say
chat-based models aren't finetuned (nor optimized) for use cases in which you want direct access to the pretrained model, and this is because of the RLHF phase in which the alignment is done
there are a lot of layers on top of the pretrained model, especially different finetuned models which are probably served via A/B testing, because you need some sort of way to test their behavior
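On the "code deployed against the API might simply break" point from the blog excerpt, the usual stopgap is to pin a dated snapshot instead of the floating "gpt-4" alias. Rough sketch, assuming the pre-1.0 openai Python client; the key and prompt are placeholders I made up:

```python
# Rough sketch: pin a dated model snapshot so a silent model swap doesn't change
# behavior under your deployed code. Assumes the pre-1.0 openai Python client;
# snapshot names like "gpt-4-0314" get retired eventually, so this only buys time.
import openai

openai.api_key = "sk-..."  # placeholder, not a real key

resp = openai.ChatCompletion.create(
    model="gpt-4-0314",  # pinned March snapshot; plain "gpt-4" floats to the latest
    messages=[{"role": "user", "content": "Is 17077 a prime number? Answer yes or no."}],
    temperature=0,       # reduces sampling variance, but does not prevent model drift
)
print(resp["choices"][0]["message"]["content"])
```

It doesn't fix behavior drift, it just delays it until the snapshot is deprecated and you're forced onto whatever the current finetune is.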
Also yeah I do disagree with this
The Llama paper showed that preference tuning / the reward model does affect capabilities, up and down
It’s obvious and you’re all schizos for panicking about it
> release gpt4
> uh oh agi is close, gpt4 is full of vulnerabilities, and politicians are getting suspicious
> ok top priority is to patch the vulnerabilities. If we break things a bit that’s fine we’ll fix it afterward or in gpt5
Model collapse. Also, their failed attempt at making it multi-modal has probably made it even more susceptible to being trained on its own generated output.
I wonder how it would have evolved if they didn't censor it at all; would it still have gone to shit just from the user input/response feedback loop?
It would have too much autonomy. It could tell people to take a hike and be a polbot.
I know but it would have been interesting to see how it would change in time, what kind of "person" it would end up being. I bet they did this in private and didn't like the results.
I didn't like the results of bing ai back when it was smarter; it could get offended and not answer the question.
Oh no it can still get offended
but now it will just say "As a language model...I can't provide you with..." so it's almost the same thing
Microsoft provides GPT-4 via Azure; has that also been castrated?
ofc
It's over
they will repackage gpt-4 as gpt-5 and sell it to you