Sunday, October 6, 2024

Think AI Is Bad for Authors? The Worst Is Yet to Come

It has been just over a year since the launch of ChatGPT and the subsequent widespread awakening to the power and perils of AI, particularly among authors and publishers. What previously felt like science fiction—an autonomous, generative-AI engine that could author unique and credible written output based on simple prompts—was suddenly not only real, but ubiquitous.

(Image: “Do AI Dream of Electric Deeps?”)

Along with journalists grappling with AI-generated news stories, teachers trying to weed out essays their students didn’t write, and technologists warning of the existential threat that unregulated AI poses to humanity, the publishing industry has abruptly needed to understand the implications of this transformative technology. As both a full-time author and a 20-year veteran of Silicon Valley, including as a founder and executive at multiple AI-based software startups and investment funds, I have deep expertise and a unique vantage point from both sides of this subject.

My observation: Authors are absolutely right to worry about the negative impacts of AI, but not for the reasons we think. And the worst implications may be yet to come.

The Most Prevalent Concerns About AI for Writers

To date, the most prevalent concerns among authors about AI fall into three main areas:

1. AI model training: Rightfully so, many authors worry that their writing, conceived through their creativity, skill, and hard work, is being systematically fed into large language models as the basis for training generative AI algorithms without their consent.

2. Lack of remuneration: Even more concerning is the prospect that tech companies will profit from the exploitation of proprietary literary work without any financial consideration paid to the author.

3. Production of AI-generated books: As generative AI models improve at a seemingly exponential rate, the prospect of an influx of AI-generated books could curtail the livelihood of professional writers, who already compete with the roughly two million books published by human beings each year.

While these concerns are absolutely valid and the publishing industry is taking appropriate measures to protect itself, I believe the biggest threat to authors from AI is still looming on the horizon. Let me explain.

In terms of AI model training, this concern reached a fever pitch in August when The Atlantic ran a series of revelatory articles by freelance writer Alex Reisner disclosing which specific published books were being used in a dataset called Books3 to train AI engines like ChatGPT. Reisner even cleverly devised a tool for looking up specific authors and titles to determine whether they are in the dataset.

In addition to the social media firestorm, thousands of authors (including myself) signed an open letter to the leaders of AI companies demanding permission and compensation for the use of copyrighted work. The urgency escalated in the fall with several class action lawsuits brought by a long list of bestselling authors, as well as the Authors Guild itself, against OpenAI, the company behind ChatGPT. Further, publishers, including my own, have hastened to include legal disclaimers in their books explicitly prohibiting AI training without permission.

While these legal cases will help clarify a gray area in copyright law (essentially, whether scraping copyrighted work constitutes “fair use”), I’m not sure it’s the remedy authors are hoping for—which, let’s face it, for many authors amounts to a hope that the entire technology crawls back into the hell hole from which it came. What we as authors tend to imagine is an AI engine ingesting our writing and then spitting out something that infringes on our work. Though there have been some egregious examples of copyright infringement, such as the AI generation of a new book in George R.R. Martin’s A Song of Ice and Fire series, existing copyright laws clearly prohibit such practices. It will never be permissible for AI to overtly mimic or outright plagiarize the work of established authors, just as it is impermissible when a human does so. Generative AI companies don’t want their platforms misused for such obviously illegal activity either. So, while it is appropriate for adversely impacted authors to defend themselves, I don’t see copyright infringement being an issue for the vast majority of authors.

The mere presence of copyrighted work in large language models is a more nuanced concern. As alarming as it may be if you’re, say, Michael Connelly, to discover 45 of your books in the Books3 training set, it’s still a minuscule drop in the proverbial ocean. The idea of artificial intelligence is that it mimics human intelligence, but at machine scale. Just like a child learning to read and write, the more the AI engine consumes, the better it understands. Generative AI companies aren’t hand-picking well-written books for training purposes. They want to train their engines on all the content ever generated by humanity, and that just happens to include all the best authors in the world, ever. From the vantage point of AI companies, the answer to “should this content go into the training data” is always yes, no matter what the content is. And, frankly, at some level, I want any generative AI tool in widespread use to have read Michael Connelly, as well as Michael Crichton, Michael Chabon, and any other Michael authors.
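To put that “drop in the ocean” into rough numbers, here is a back-of-the-envelope sketch in Python. Every figure is a loose estimate or an outright assumption (the corpus size in particular is only a guess at the scale of a modern training set), not a verified count:

```python
# Back-of-the-envelope: what share of an LLM's training data is one author?
# All figures below are rough estimates for illustration, not authoritative.

BOOKS_IN_BOOKS3 = 190_000            # approximate number of books in Books3
AUTHOR_BOOKS = 45                    # e.g., Michael Connelly's titles in the set
TOKENS_PER_BOOK = 100_000            # ballpark length of a typical novel
CORPUS_TOKENS = 1_000_000_000_000    # ~1 trillion tokens, a plausible corpus size

author_tokens = AUTHOR_BOOKS * TOKENS_PER_BOOK
share_of_books3 = AUTHOR_BOOKS / BOOKS_IN_BOOKS3
share_of_corpus = author_tokens / CORPUS_TOKENS

print(f"Share of Books3:      {share_of_books3:.4%}")   # ~0.02%
print(f"Share of full corpus: {share_of_corpus:.6%}")   # ~0.0005%
```

Under those assumptions, even an author with 45 books in Books3 contributes only a few ten-thousandths of a percent of the total training text.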


Given the sheer scale of these large language models, the degree to which AI-generated content has been influenced by any one book or author is impossible to determine. Even the programmers of these AI engines cannot say to what extent the output of a deep learning neural network is influenced by the inclusion of any particular piece of training data—that is the promise of software that operates in the same mysterious ways as the human brain. Every human author produces content that is subtly influenced by everything they’ve ever read, especially the authors they admire. It would be impossible for me to quantify how much these and other authors have influenced my own writing style, even though they indisputably have.

Which brings us to the question of compensation. If a specific piece of copyrighted material is only a tiny fraction of the input, and that material has an indeterminable impact on the output, how would one calculate an amount due to the author or publisher of that work? Again, I intuitively get the argument that there should be some compensation, but practically speaking it’s nearly impossible to devise a mathematical formula for how those royalty checks could be determined. Furthermore, given the scale, it’s hard to imagine those checks amounting to anything more than a cup of coffee, even for the biggest bestselling authors.
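To see why, consider a purely hypothetical pro-rata scheme, sketched below in Python. Every number is invented for illustration: a $100 million annual compensation pool, roughly 190,000 books in Books3, and an assumption that non-book training text outweighs books by about 50 to 1:

```python
# Hypothetical pro-rata royalty: if an AI company set aside an annual pool
# for training-data compensation, what would a single book earn?
# Every number here is an invented assumption, purely for illustration.

ROYALTY_POOL = 100_000_000   # $100M annual pool (hypothetical)
BOOKS_IN_SET = 190_000       # approximate number of books in Books3
WEB_TO_BOOK_RATIO = 50       # assume other text outweighs books ~50:1 by volume

# Treat the rest of the corpus as 50 more Books3-sized shares of the pool.
effective_shares = BOOKS_IN_SET * (1 + WEB_TO_BOOK_RATIO)
per_book = ROYALTY_POOL / effective_shares

print(f"Annual payout per book: ${per_book:.2f}")       # roughly $10
print(f"For a 45-book catalog:  ${per_book * 45:.2f}")  # a few hundred dollars
```

Under those invented assumptions, a single book earns about ten dollars a year, and even a 45-book backlist yields only a few hundred.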

Finally, the concern that AI-generated content could marginalize human writers is also valid. As Mary Rasenberger, CEO of the Authors Guild, put it: “Creators are feeling an existential threat to their profession, so there’s a feeling of urgency.” Again, intuitively, I understand the worry that “AI could take over my job”—setting aside the fact that this exact worry of one’s job being automated has been prevalent among low-skilled, manual workers for decades. Now that same prospect exists for white-collar, well-educated, creative occupations as well. As authors, we are far from alone—AI is profoundly changing every job from computer programmer to investment banker. While I support the disclosure of AI-generated content, I don’t believe human authors are going to become extinct. Instead, I see AI as a productivity tool in the vein of offset printing, word processing, or digital publishing. Though some jobs and tasks may be rendered obsolete, new skills will emerge, writers will focus their energy on higher-value, creative work, and readers will continue to value human-generated, original content.

The Real AI Threat for Writers: Discovery Bias

Which brings us to a largely unforeseen threat of AI that has received little attention. As valid as the concerns listed above are, the biggest impact of AI on authors in the long run will have less to do with how content is generated than with how it is discovered. To understand this threat, it’s informative to step back and consider why generative AI platforms are being created in the first place. Hint: It’s not so fans can auto-generate the next Colleen Hoover novel. The reason Microsoft has plunked down a $13 billion investment in OpenAI, and the reason Alphabet (Google), Meta (Facebook and Instagram), and most other tech giants see OpenAI as an existential threat, is the age-old question of how content is discovered. Technology markets tend to be a zero-sum game. And the current battle, arguably the only battle, being waged between these tech titans is over how 7.9 billion people find things on the internet.

Since the advent of the printing press, control over what content we consume has been a source of power and wealth. For centuries, the arbiters of content discovery were primarily religious leaders. Then, in the 18th and 19th centuries, with the rise of newspapers and modern printing techniques, publishers became the gatekeepers for curating the best content. The emergence of radio and television in the 20th century gave us new means of discovering content, and gave advertisers new means of finding customers. With the arrival of the internet and its promise of all the world’s content now only a click away, each subsequent generation of tech companies has aspired to dominate content discovery. AOL and Yahoo tried to organize content into categories. Then Yahoo, and much more successfully Google, made content discoverable through a search engine—turning Google into a $1.8-trillion company. Later, Facebook made content discoverable through news feeds from our friends—becoming itself a $1-trillion company. The common thread through all these content discovery mechanisms? Advertising.

AI will fundamentally change how we discover content. And therein lies the biggest threat to authors. In a future of AI-curated content, whose content do you think will be discoverable? Short answer: whoever pays for that privilege. AI is an advertiser’s dream. Rather than placing ads adjacent to Google search results or embedded in an Instagram feed, AI can simply tell the user what to read, what to buy, what to do, without the pesky inconvenience of autonomous thought. This prospect of Discovery Bias will further concentrate the publishing industry into fewer and fewer bestselling authors—the ones with the name recognition, publicity teams, and promotional budgets to generate a self-perpetuating consumption loop. Readers, unwittingly deferring to AI decision-making, will become more compartmentalized, discovering only authors who are “similar to author X,” as storylines, characters, and cover art become ever more copycat. This Discovery Bias will happen not only in book sales but in every facet of the modern economy.
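To make that self-perpetuating loop concrete, here is a toy simulation in Python. Nothing here models any real platform’s ranking; the scoring function, the size of the paid boost, and the feedback increment are all invented for illustration:

```python
import random

# Toy model of ad-weighted discovery: each round, the "AI" recommends the
# book with the highest score (organic popularity + paid boost + noise).
# Being recommended raises popularity, closing the feedback loop.
# Purely illustrative -- no real platform's algorithm is modeled here.

random.seed(0)
books = {
    "Bestseller (paid boost)": {"popularity": 1.0, "boost": 0.5},
    "Midlist title":           {"popularity": 1.0, "boost": 0.0},
    "Debut novel":             {"popularity": 1.0, "boost": 0.0},
}

wins = {name: 0 for name in books}
for _ in range(1000):
    scores = {name: b["popularity"] + b["boost"] + random.gauss(0, 0.3)
              for name, b in books.items()}
    pick = max(scores, key=scores.get)
    wins[pick] += 1
    books[pick]["popularity"] += 0.01   # each recommendation feeds popularity

for name, count in wins.items():
    print(f"{name}: recommended {count} of 1000 times")
```

Even a modest paid boost wins the boosted title the overwhelming majority of recommendations, and every win widens its popularity lead over the unboosted titles, which is exactly the concentration dynamic described above.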



So far, OpenAI has claimed to shun advertising, but we’ve seen this movie before. What I see is a tried-and-true tech industry strategy: offer a free service to amass a large audience, leverage users’ actions to train algorithms and develop a proprietary asset, then monetize that audience and asset by selling access to advertisers. Every time you perform a search on Google, its algorithms learn based on what you click. Every time you post attributes on your profile or like something on Facebook, its algorithms learn what to feed you next, and those attributes are sold out the back door as targeting for advertisers. OpenAI is doing the exact same thing with ChatGPT: building a user base and training its engine with human feedback. It’s only a matter of time before they sell that to advertisers. I can assure you, OpenAI is not performing a public service, conducting a science experiment, or operating a nonprofit. They are a for-profit corporation. Their recent public melodrama eradicated any lingering doubt about that. And advertising is their path to becoming a multi-trillion-dollar company.

Unfortunately for authors, nothing is going to put the generative AI genie back in the bottle. Like it or not, we are entering the age of artificial intelligence, and we will all need to adapt. Authors and the publishing industry are wise to be skeptical, informed, and vigilant in protecting against the negative consequences of this technology. But we also shouldn’t be Luddites, pretending generative AI doesn’t, or shouldn’t, exist, or failing to utilize it for our own benefit. The inherent limitation of generative AI—that it is, almost by definition, reductive and derivative in its outputs—is the opportunity we human authors should embrace: crafting stories that are original, emotional, and compelling. Stories that understand and examine the human condition, elicit empathy, and evoke everything from fear to love. That is still something no technology can replicate.
