We use a multi-tiered safety system to limit DALL·E 3’s ability to generate potentially harmful imagery, including violent, adult or hateful content. Safety checks run over user prompts and the resulting imagery before it is surfaced to users. We also worked with early users and expert red-teamers to identify and address gaps in coverage for our safety systems which emerged with new model capabilities. For example, the feedback helped us identify edge cases for graphic content generation, such as sexual imagery, and stress test the model’s ability to generate convincingly misleading images.
As part of the work done to prepare DALL·E 3 for deployment, we’ve also taken steps to limit the model’s likelihood of generating content in the style of living artists, images of public figures, and to improve demographic representation across generated images. To read more about the work done to prepare DALL·E 3 for wide deployment, see the DALL·E 3 system card.
User feedback will help make sure we continue to improve. ChatGPT users can share feedback with our research team by using the flag icon to inform us of unsafe outputs or outputs that don’t accurately reflect the prompt you gave to ChatGPT. Listening to a diverse and broad community of users and having real-world understanding is critical to developing and deploying AI responsibly and is core to our mission.
We’re researching and evaluating an initial version of a provenance classifier—a new internal tool that can help us identify whether or not an image was generated by DALL·E 3. In early internal evaluations, it is over 99% accurate at identifying whether an image was generated by DALL·E when the image has not been modified. It remains over 95% accurate when the image has been subject to common types of modifications, such as cropping, resizing, JPEG compression, or when text or cutouts from real images are superimposed onto small portions of the generated image. Despite these strong results on internal testing, the classifier can only tell us that an image was likely generated by DALL·E, and does not yet enable us to make definitive conclusions. This provenance classifier may become part of a range of techniques to help people understand if audio or visual content is AI-generated. It’s a challenge that will require collaboration across the AI value chain, including with the platforms that distribute content to users. We expect to learn a great deal about how this tool works and where it might be most useful, and to improve our approach over time.