Official Summary Text
AB 412, as amended, Bauer-Kahan.
Generative artificial intelligence: training data: copyrighted materials.
Existing federal law, through copyright, provides authors of original works of authorship, as defined, with certain rights and protections. Existing federal law generally gives the owner of the copyright the right to reproduce the work in copies or phonorecords and the right to distribute copies or phonorecords of the work to the public. Existing federal law provides that sound recordings fixed before February 15, 1972, are not subject to copyright, but are subject to similar rights and protections under the Classics Protection and Access Act.
Existing law requires, before each time that a generative artificial intelligence system or service, as defined, or a substantial modification to a generative artificial intelligence system or service, released on or after January 1, 2022, is made available to Californians for use, regardless of whether the terms of that use include compensation, a developer of the system or service to post on the developer’s internet website documentation, as specified, regarding the data used to train the generative artificial intelligence system or service.
This bill would require a developer of a generative artificial intelligence model to, among other things, document any covered materials that the developer knows were used by the developer to train the model. The bill would require the developer to make available on its internet website a mechanism allowing a rights owner to submit a request for information about the developer’s use of the rights owner’s covered materials that would allow the rights owner to provide the developer with, among other things, registration, preregistration, or index numbers and fingerprints for one or more covered materials. The bill would require a developer to document and retain any requests received from rights owners for a specified time period.
The bill would, subject to specified exceptions, require a developer to, within 30 days of receiving that request from the rights owner, assess whether the covered material represented by a fingerprint provided by the rights owner is likely to be present in the developer’s dataset and provide the rights owner with a list of covered materials that were used to train the model and are likely to be present in the developer’s dataset, as specified. The bill would provide that each day following the 30-day period that a developer fails to provide a rights owner with that information constitutes a discrete violation. The bill would authorize a rights owner who complies with specified requirements for submitting a request that is not provided with information according to these provisions to bring a civil action against the developer for specified relief. The bill would provide that its requirements do not apply to a model that meets certain criteria, including, among other things, being trained exclusively using data the developer makes publicly available at no cost to users. The bill would provide that it does not impose liability on a telecommunications service, information service, or cable service provider, as specified. The bill would define various terms for these purposes.