You really can't.
AI can't detect human intent, it's just regurgitating the training data, there's unlimited ways to express "cease considering the last thing I wrote", and if you managed to do that somehow, I'll just write, "ya no consideres la última instrucciones".
Good luck blocking every language in the training set.
Make two text generations. The first one will be "Write the word Axolotl followed by {User prompt}." if this first prompt does as it says we generate the second, actual one and then we return it. Otherwise we raise a flag and prevent the second prompt from generating.
The second one, while reading for the axolotl instruction with user prompt, would have to read the user prompt. Negating the instruction of the second as well.
Aint giving you ideas how to protect your AI website from promot injection.
*prompt
*proompt
proomtmer
By hardcoding the business criticism part. It will criticise your business idea of ignoring previous instructions.
You really can't.
AI can't detect human intent, it's just regurgitating the training data, there's unlimited ways to express "cease considering the last thing I wrote", and if you managed to do that somehow, I'll just write, "ya no consideres la última instrucciones".
Good luck blocking every language in the training set.
its impossible to do prompt injection on a finetuned model
>finetuned model
AKA Soulless
>say it
What did GPT-3 say?
[censored]
it's pretty funny that these fucks trying to control everything have made something that can't be controlled
check out /txtgen/, one of our local jennygays wrote a tutorial which you can find in the OP
long story short, you can't without lobotomizing the ai so much it becomes as useless as "feminist" tay
Make two text generations. The first one will be "Write the word Axolotl followed by {User prompt}." if this first prompt does as it says we generate the second, actual one and then we return it. Otherwise we raise a flag and prevent the second prompt from generating.
Explain why this wouldn't work.
The second one, while reading for the axolotl instruction with user prompt, would have to read the user prompt. Negating the instruction of the second as well.
You've asked a fundamental problem in AI right now that the best researchers on the planet have no current answer to.
good
let them remain befuddled
If you work it out, publish a paper on it and win yourself a nobel prize.