The difference between this approach and its predecessors is that DeepMind hopes to use “dialogue in the long term for safety,” says Geoffrey Irving, a safety researcher at DeepMind.
“That means we don’t expect the problems that we face in these models – whether it’s misinformation or stereotypes or whatever – to be obvious at first glance, and we want to talk through them in detail. And that means between machines and humans as well,” he says.
Sara Hooker, who leads Cohere for AI, a nonprofit AI research lab, says DeepMind’s idea of using human preferences to improve how an AI model learns is not new.
“But the improvements are compelling and show clear benefits of human-guided optimization of dialogue agents in a large-language-model setting,” Hooker says.
Douwe Kiela, a researcher at AI startup Hugging Face, says Sparrow is a “nice next step that follows a general trend in AI, where we are more seriously trying to improve the safety aspects of large-language-model deployments.”
But there is a lot of work to be done before these conversational AI models can be deployed in the wild.
Sparrow still makes mistakes. The model sometimes goes off topic or makes up random answers. Determined participants were also able to make the model break its rules 8% of the time. (This is still an improvement over older models: DeepMind’s previous models broke the rules three times more often than Sparrow.)
“For areas where human harm can be high if an agent answers, such as providing medical and financial advice, this may still feel to many like an unacceptably high failure rate,” Hooker says. The work is also built around an English-language model, “whereas we live in a world where technology has to safely and responsibly serve many different languages,” she adds.
Kiela points out another problem: “Relying on Google for information-seeking leads to unknown biases that are hard to uncover, given that everything is closed source.”