Sandhini Agarwal: We have many next steps. I definitely think that ChatGPT going viral has caused a lot of issues that we knew about to really flare up and become critical – things we want to address as soon as possible. For instance, we know the model is still very biased. And yes, ChatGPT is very good at refusing bad requests, but it’s also pretty easy to write prompts that make it not refuse things we’d like it to refuse.
Liam Fedus: It’s been exciting to see the diverse and creative ways people are using the model, but we’re always focused on areas that need improvement. We believe that through an iterative process – deploy, get feedback, improve – we can produce the most consistent and capable technology. And as our technology evolves, new issues inevitably arise.
Sandhini Agarwal: In the weeks since launch, we’ve gone through some of the most egregious examples people have found – the worst things people have seen in the wild. We evaluated each of them and talked about how it should be fixed.
Jan Leike: Sometimes things go viral on Twitter, but some people also reach out to us quietly.
Sandhini Agarwal: A lot of the things we found were jailbreaks, which is definitely a problem we need to fix. But because users have to try these convoluted methods to get the model to say something bad, it’s not something we completely missed or something that really surprised us. Still, it’s something we’re actively working on right now. When we find a jailbreak, we add it to our training and testing data. All the data we see feeds into a future model.
Jan Leike: Whenever we have a better model, we want to put it out and test it. We’re very optimistic that some targeted adversarial training can go a long way toward improving the jailbreak situation. It’s not clear whether these problems will go away entirely, but we think we can make jailbreaking a lot more difficult. Again, it’s not as if we didn’t know jailbreaking was possible before release. I think it’s very difficult to really anticipate what the actual safety problems with these systems will be once they’re deployed. So we put a lot of emphasis on monitoring what people are using the system for, seeing what happens, and then reacting to it. That doesn’t mean we shouldn’t proactively mitigate safety problems when we can anticipate them. But yes, it’s very hard to foresee everything that will actually happen when a system hits the real world.
In January, Microsoft unveiled Bing Chat, a search chatbot that many believe is a version of OpenAI’s unannounced GPT-4. (OpenAI says, “Bing is powered by one of our next-generation models that Microsoft customized specifically for search. It incorporates advances from ChatGPT and GPT-3.5.”) The adoption of chatbots by multibillion-dollar tech giants creates new challenges for those tasked with building the underlying models.