
Privacy in an AI Era: How Do We Protect Our Personal Information?

When I talk about the data supply chain, I mean the ways that AI systems raise issues on both the data input side and the data output side. On the input side, I’m referring to training data, where the worry is whether an individual’s personal information is being scraped from the internet and included in a system’s training set. The presence of our personal information in the training set can, in turn, influence the output side. For example, a generative AI system might have memorized my personally identifiable information and provide it as output. Or a generative AI system could reveal something about me based on an inference drawn from multiple data points that aren’t otherwise known or connected and that are unrelated to any personally identifiable information in the training dataset. At present, we depend on the AI companies to remove personal information from their training data or to set guardrails that prevent personal information from coming out on the output side. That’s not really an acceptable situation, because it leaves us dependent on their choosing to do the right thing.

Article Link: https://hai.stanford.edu/news/privacy-ai-era-how-do-we-protect-our-personal-information