Andrew Alam-Nist

Artistic License or Theft? – Generative AI and Intellectual Property

Updated: Jul 3, 2023






Consider the three paintings above: all three depict me, but they differ radically in their use of color, stroke, composition, and style. The first is a relatively traditional oil painting, depicting a certain Andrew Alam-Nist on a summer day in a white shirt with an expressive impressionist background; the second resembles pop art, juxtaposing its orthodox framing of my head with vibrant purple hair and an abstract purple, green, and orange backdrop; the final one eschews the medium of paint entirely – it is digital art that plays around with simplifying the general color schemes of my face and body, and geometricizing their setting. From impressionism to iPad art, each painting is an exemplar of a different movement.


What do all three works have in common (other than depicting yours truly)? They aren’t actually paintings. All three were generated using Lensa, an iOS and Android app that, for the price of 2 or 3 dollars, lets you choose from a list of themes and then generates an array of different stylized images. While each image the app produces resembles the work of a professional artist, none is made by a skilled artisan. They are instead generated by a nameless processor in an unspecified Amazon Web Services data center.


Lensa is only one of a number of services that have sprung up to offer art at a bargain rate since the AI explosion at the end of 2022. With 1.5 million active users, OpenAI’s DALL-E 2 allows the user to produce an image corresponding to any prompt they input for the meager price of one credit (you are given 10 credits for free each month and can top up by paying $15 for 115 more); Stability AI’s Stable Diffusion, starting at $9, allows a user to fine-tune image generation with greater precision; Photoshop’s new ‘AI Generate’ tool, included in its regular subscription, allows designers to frictionlessly add features to an image that never existed in the first place. These services, while powerful and immensely useful, raise a number of concerns. One of these is particularly pressing for the art industry – the intellectual property problem.


Even at its highest levels, the artistic landscape is being shaken up by generative AI. The Museum of Modern Art in New York City recently hosted an exhibition of artwork produced by a generative AI model that was trained on the museum’s own collection. The Mauritshuis in The Hague hung an AI-produced variant of Vermeer’s Girl with a Pearl Earring while the original was away on loan. Every day, AI-generated art becomes more widely used, more recognized, and more respected. Given its low cost, ease of use, and increasing artistic merit, there is a serious risk that AI will make conventional artists redundant or, at the very least, assume overwhelming market dominance.


***


Many new technologies disrupt existing market structures, and some economists will (probably rightly) rationalize that disruption as a necessary side effect of the ‘creative destruction’ inherent in innovation. Generative AI is different: it seems likely that it does or should have some kind of legal responsibility to the artists it threatens. The reason for this lies in how generative AI develops its models. Programmers train generative models by feeding the AI colossal amounts of data – in the case of an app like Lensa or DALL-E, a huge number of images. DALL-E, for instance, is estimated to have been trained on 400 million labeled images, including photographs, paintings, and other pieces of artwork. There is very little oversight of what these datasets actually consist of, or whether the material in them is copyrighted. The upshot of this is that software companies’ right to use them to train their models is legally nebulous. In the early days of Lensa, the program used to develop the images at the top of this article, the model would often even display a chaotic squiggle in the bottom right corner of its images, a relic of the signatures artists write on their art. Since then, the company has made efforts to remove this artifact from its images – we can only guess why.


Because generative AI is so new, a number of questions about its legal status remain unanswered. Do software companies need to acquire the copyright to each image they use to train their models? Does using copyrighted images in models qualify as fair use? Should software companies recompense anyone whose copyrighted images are used without their consent? As of now, there is no established regulation or legal precedent for any of these issues. For want of a better description, we are in the generative AI Wild West. However, several lawsuits that are currently underway promise to help illuminate where the law stands.


***


Stability AI, a startup promising to generate images from text descriptions, is currently facing several lawsuits over its alleged intellectual property infringement. The London-based firm, which openly discloses the origin of its dataset, is being sued by Getty Images, a stock images provider, for misusing over 12 million images owned by Getty. Getty characterizes Stability AI’s actions as “brazen misuse” of its intellectual property. Stability AI, along with two other companies, is also facing a class action lawsuit in San Francisco from a large number of artists who claim Stability AI’s models have violated the copyright in the art they create. The facts of these cases are, for the most part, clear-cut. In both instances, the models do, at least to some extent, actively use images for which the plaintiffs – whether Getty or the individual artists – hold copyrights. However, the legal implications of these facts are far less certain.


In the U.S., the benchmark that will likely be used to determine whether generative AI violates intellectual property laws is whether its usage of copyrighted images to train models falls under the “fair use” of intellectual property – in effect, whether the output of the AI model is sufficiently transformative that it qualifies as its own work.


At first glance, this benchmark favors the defendants. A case that will probably have precedential value is the Google Books case. This case revolves around Google Books, a service from Google that scans large numbers of books, and stores them in a database which subsequently provides snippets of the books to users of Google’s search engine. Despite many books in Google’s database being copyrighted, the U.S. Court of Appeals for the Second Circuit upheld Google’s business practices. Its reasoning was as follows: “The purpose of the copying is highly transformative, the public display of text is limited, and the revelations do not provide a significant market substitute for the protected aspects of the originals. Google’s commercial nature and profit motivation do not justify denial of fair use.” This decision is important because it suggests, at least in U.S. law, that the process of copying – employed by all generative AI – falls under fair use when the output product is substantively different. Stability AI will likely use this conclusion to argue that, just as snippets are substantively different from a whole book, AI art is substantively different from true human art and thus falls under fair use. Can AI really be a substitute for that human touch in authentic artwork?


Maybe not. But also, maybe it can. Even in this ruling, the fault lines in the defendants’ case quickly become apparent. The Second Circuit found it essential that “the revelations do not provide a significant market substitute for the protected aspects of the originals.” Piggybacking on this, Getty will almost certainly argue that, by providing cheap, easily accessible images that resemble the originals, generative AI does provide a significant market substitute for the laborious, expensive process of making art by hand. In the end, it will be up to the courts to decide whose arguments are more credible.


***


Regardless of the outcome of the lawsuits, they are unlikely to be sufficient on their own to answer generative AI’s intellectual property problems. As with any legal case dealing with a new technology, the cases are trying to apply laws which were not written to deal with the issues at hand. There seems to be a good case that, even if generative AI does fall under fair use according to existing legislation, the authors of the original artwork still deserve some form of pecuniary compensation. To me, there seems to be something morally objectionable about large software companies using the fruits of ordinary artists’ labor to make those same artists obsolete. If legislators agree, they could introduce laws that address precisely this – requiring the owners of copyrighted material to be compensated for their art, or giving them the right to withdraw their art from training sets.


Regardless of whether you buy this ethical argument, there are further intellectual property issues legislators should look to. There are a number of problems that current U.S. and E.U. intellectual property frameworks are not even calibrated to detect. Large open databases, such as those of Common Crawl, a 501(c)(3) non-profit organization that scrapes the web and freely distributes its datasets to the public, are generally allowed to include copyrighted material. This is not itself objectionable, as such organizations have no intention of making a profit from their databases. Serving the common good, they are the favorite children of American fair use policies. However, less altruistic siblings such as Stability AI or OpenAI often subsequently use such databases for activities they do intend to profit from. They can do so because Common Crawl makes its databases openly available, giving anyone a right to use them. The upshot of this is that, through the middleman of non-profit organizations, many companies can circumvent copyright laws. This likely needs to be addressed.


However, doing so would not be easy. Much of the discussion of what goes into the databases used to train large generative models, including the use of middleman organizations, is purely speculative. This is because most generative AI models are notoriously opaque. The only reason Stability AI has had to face multiple lawsuits is that it openly publishes the sources of its training datasets. Most companies are not so generous. Since it is not publicly known what data most firms train on, there is little possibility of other firms or individuals knowing whether their copyrighted work has been used, and launching lawsuits accordingly. Unlike the copying that copyright law has traditionally targeted, an image produced by generative AI is essentially impossible to trace back to any particular influence. If I were to print T-shirts with an anthropomorphized walking mouse in red shorts, it would be pretty easy to tell I was copying a certain Disney property rhyming with ‘Picky Louse’. It is much harder with the images at the top of this article, which bear only minuscule similarities to any original. This means that, without open tracking and disclosure of models’ datasets, it is impossible to enforce copyright laws.


***


There is thus a spate of issues which the U.S., E.U., and others should address with future legislation. At the moment, none of these issues is a direct focus of the world’s most important legislatures. Neither the draft AI Act in Europe, nor any resolution introduced in the United States House of Representatives or Senate, contains any stipulations specifically referring to the intellectual property issues raised by AI. This is something that lawmakers should change.


Intellectual property is not a problem that, in the world of AI, is unique to image-generating models. Similar concerns exist for large language models, such as ChatGPT, a program that generates dynamic text which often is based on copyrighted books or journal entries. Likewise, intellectual property is far from the only concern raised by large image-generating models. Image-generating models could be misused for misinformation, cybercrime, pornography, and noxious actions we are not even aware of yet. Good, bad, and sometimes ugly, large data models may turn out to be one of the most impactful technological advances of the early 21st century. Our legal and legislative systems must now determine what society gives back to the artists, photographers, and countless others who are paving the road to these models’ greatness.



