Generative AI Is Making Companies Even More Thirsty for Your Data
Zoom, the company that normalized attending business meetings in your pajama pants, was forced to unmute itself this week to reassure users that it would not use personal data to train artificial intelligence without their consent.
A keen-eyed Hacker News user last week noticed that an update to Zoom’s terms and conditions in March appeared to essentially give the company free rein to slurp up voice, video, and other data, and shovel it into machine learning systems.
The new terms stated that customers “consent to Zoom’s access, use, collection, creation, modification, distribution, processing, sharing, maintenance, and storage of Service Generated Data” for purposes including “machine learning or artificial intelligence (including for training and tuning of algorithms and models).”
The discovery prompted critical news articles and angry posts across social media. Soon, Zoom backtracked. On Monday, Zoom’s chief product officer, Smita Hashim, wrote a blog post stating, “We will not use audio, video, or chat customer content to train our artificial intelligence models without your consent.” The company also updated its terms to say the same.
Those updates seem reassuring enough, but of course many Zoom users or admins for business accounts might click “OK” to the terms without fully realizing what they’re handing over. And employees required to use Zoom may be unaware of the choice their employer has made. One lawyer notes that the terms still permit Zoom to collect a lot of data without consent. (Zoom did not respond to a request for comment.)
The kerfuffle shows the lack of meaningful data protections at a time when the generative AI boom has made the tech industry even more hungry for data than it already was. Companies have come to view generative AI as a kind of monster that must be fed at all costs—even if it isn’t always clear what exactly that data is needed for or what those future AI systems might end up doing.
The ascent of AI image generators like DALL-E 2 and Midjourney, followed by ChatGPT and other clever-yet-flawed chatbots, was made possible by huge amounts of training data, much of it copyrighted, that was scraped from the web. And all manner of companies are currently looking to use the data they own, or that is generated by their customers and users, to build generative AI tools.
Zoom is already on the generative bandwagon. In June, the company introduced two text-generation features for summarizing meetings and composing emails about them. Zoom could conceivably use data from its users’ video meetings to develop more sophisticated algorithms. These might summarize or analyze individuals’ behavior in meetings, or perhaps even render a virtual likeness for someone whose connection temporarily dropped or who hasn’t had time to shower.
The problem with Zoom’s effort to grab more data is that it reflects the broad state of affairs when it comes to our personal data. Many tech companies already profit from our information, and many of them, like Zoom, are now on the hunt for ways to source more data for generative AI projects. And yet it is up to us, the users, to try to police what they are doing.
“Companies have an extreme desire to collect as much data as they can,” says Janet Haven, executive director of the think tank Data and Society. “This is the business model—to collect data and build products around that data, or to sell that data to data brokers.”
The US lacks a federal privacy law, leaving consumers more exposed to the pangs of ChatGPT-inspired data hunger than people in the EU. Proposed legislation, such as the American Data Privacy and Protection Act, offers some hope of providing tighter federal rules on data collection and use, and the Biden administration’s AI Bill of Rights also calls for data protection by default. But for now, public pushback like that in response to Zoom’s moves is the most effective way to curb companies’ data appetites. Unfortunately, this isn’t a reliable mechanism for catching every questionable decision by companies trying to compete in AI.
In an age when the most exciting and widely praised new technologies are built atop mountains of data collected from consumers, often in ethically questionable ways, it seems that new protections can’t come soon enough. “Every single person is supposed to take steps to protect themselves,” Haven says. “That is antithetical to the idea that this is a societal problem.”