There was a post [0] recently about the bing chatGPT assistant either citing or hallucinating it’s own initial prompt from the (in theory) low privileged chat input UI they put together. This feels like it’s almost unavoidable if you let users actually chat with something like this.
How would we sanitize strings now? I know OpenAI has banned topics they seem to regex for, but that’s always going to miss something. Are we just screwed and should make sure chat bots just run in a proverbial sandbox and can’t do anything themselves?
A conversation I had earlier today around 12pm CET caused ChatGPT to dump source code with what appear to be timestamps of executions or an instruction counter. It also appears that ChatGPT is learning between sets of conversations.
Curious if anyone knows what the "timestamps" on the left side of the code dump are?
We're building Aegis, a firewall for LLMs: a guard against adversarial attacks, prompt injections, toxic language, PII leakage, etc.
One of the primary concerns entwined with building LLM applications is the chance of attackers subverting the model’s original instructions via untrusted user input, which unlike in SQL injection attacks, can’t be easily sanitized. (See https://greshake.github.io/ for the mildest such instance.) Because the consequences are dire, we feel it’s better to err on the side of caution, with something mutli-pass like Aegis, which consists of a lexical similarity check, a semantic similarity check, and a final pass through an ML model.
We'd love for you to check it out—see if you can prompt inject it!, and give any suggestions/thoughts on how we could improve it: https://github.com/automorphic-ai/aegis.
If you're interested in or need help using Aegis, have ideas, or want to contribute, join our Discord (https://discord.com/invite/E8y4NcNeBe), or feel free to reach out at founders@automorphic.ai. Excited to hear your feedback!
How would we sanitize strings now? I know OpenAI has banned topics they seem to regex for, but that’s always going to miss something. Are we just screwed and should make sure chat bots just run in a proverbial sandbox and can’t do anything themselves?
[0] https://news.ycombinator.com/item?id=34717702