Authors:
(1) PIOTR MIROWSKI and KORY W. MATHEWSON, DeepMind, United Kingdom (both authors contributed equally to this research);
(2) JAYLEN PITTMAN, Stanford University, USA (work done while at DeepMind);
(3) RICHARD EVANS, DeepMind, United Kingdom.
Table of Links
Storytelling, The Shape of Stories, and Log Lines
The Use of Large Language Models for Creative Text Generation
Evaluating Text Generated by Large Language Models
Conclusions, Acknowledgements, and References
A. RELATED WORK ON AUTOMATED STORY GENERATION AND CONTROLLABLE STORY GENERATION
B. ADDITIONAL DISCUSSION FROM PLAYS BY BOTS CREATIVE TEAM
C. DETAILS OF QUANTITATIVE OBSERVATIONS
E. FULL PROMPT PREFIXES FOR DRAMATRON
F. RAW OUTPUT GENERATED BY DRAMATRON
Abstract
Language models are increasingly attracting interest from writers. However, such models lack long-range semantic coherence, limiting their usefulness for longform creative writing. We address this limitation by applying language models hierarchically, in a system we call Dramatron. By building structural context via prompt chaining, Dramatron can generate coherent scripts and screenplays complete with title, characters, story beats, location descriptions, and dialogue. We illustrate Dramatron’s usefulness as an interactive co-creative system with a user study of 15 theatre and film industry professionals. Participants co-wrote theatre scripts and screenplays with Dramatron and engaged in open-ended interviews. We report critical reflections both from our interviewees and from independent reviewers who watched stagings of the works to illustrate how both Dramatron and hierarchical text generation could be useful for human-machine co-creativity. Finally, we discuss the suitability of Dramatron for co-creativity, ethical considerations—including plagiarism and bias—and participatory models for the design and deployment of such tools.
1 INTRODUCTION
Large language models (LLMs) are becoming remarkable and useful in co-creative applications as their ability to generate text improves [12, 25, 55]. While their use is primarily limited to assisting in natural language processing tasks [28, 111], these models show particular promise for automatic story generation [3, 86], as an augmentative tool for human writers [105], and for live performance. Examples of such creative uses of LLMs include the generation of the scripts of the short films Sunspring (2016) [74], It’s No Game (2017), Sollicitors (2020) [1] and The First Horror Movie Written Entirely By Bots (2021) [79]; improvisational theatre alongside robots by the company Improbotics (2016) [10, 65–67]; collaborative script writing for the theatre play AI [106]; and THEaiTRE company’s [90–92, 96] AI: When a Robot Writes a Play (2021).
Models able to generate coherent stories could be useful for co-writing theatre scripts and screenplays. This is a difficult task for LLMs because the narrative of a script or screenplay must exhibit long-term coherence and reincorporation, and LLMs are limited in their ability to model long-range dependencies (e.g., to reincorporate information from many pages ago). This limitation stems from the context window of LLMs, which today is limited to at most 2048 tokens (i.e. about 1500 words) in state-of-the-art models [76, 84].
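To make the scale of this limitation concrete, the following back-of-the-envelope sketch uses only the two figures quoted above (2048 tokens, about 1500 words). It assumes a fixed words-to-tokens ratio, which is a simplification, and the function names are illustrative rather than part of Dramatron.

```python
# Figures taken from the text: a 2048-token window holds about 1500 words.
CONTEXT_TOKENS = 2048
WORDS_PER_WINDOW = 1500


def tokens_for_words(n_words: int) -> int:
    """Estimate the tokens needed for n_words, rounding up.

    Assumes a fixed words-to-tokens ratio (an approximation).
    """
    return -(-n_words * CONTEXT_TOKENS // WORDS_PER_WINDOW)


def visible_fraction(script_words: int) -> float:
    """Fraction of a script that fits in a single context window."""
    if script_words <= 0:
        raise ValueError("script_words must be positive")
    return min(1.0, WORDS_PER_WINDOW / script_words)
```

Under these assumptions, a 15,000-word script would need roughly 20,480 tokens, so a model generating it sequentially could attend to only about a tenth of the text at any point.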
In this work, we present Dramatron, a system that uses LLMs to generate scripts and screenplays hierarchically through a method we call hierarchical story generation. Dramatron leverages the strengths of LLMs and combines well-designed prompts (see Appendix E) and prompt chaining [118] with structured generation for long-range coherence across the entire script. This process results in greater story coherence than “flat” sequential text generation. Our method is, in spirit, similar to hierarchical neural story generation [37], but generates scripts that far surpass 1000 words. Hierarchical generation of stories can produce an entire script (sometimes tens of thousands of words) from a single user-provided summary of the central dramatic conflict, called the log line [103]. From the input log line, Dramatron can generate an entire script with a title, list of characters, a plot (i.e. a list of scene summaries with settings and beats), location descriptions, and dialogue (see Fig. 1). The user can intervene at any stage of the hierarchical generation. They can solicit alternative generations, edit and rewrite output text, or continue text generation. In this way, the human interactively co-writes the script. Our methods can be used with any LLM that accepts an input prompt and then predicts which tokens come next.
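The hierarchical generation described above can be sketched as a chain of prompts, where each stage feeds its output into the prompt of the next. This is a minimal illustration, not Dramatron's implementation: the prompt wording, the `Script` container, the `generate_script` function and the `toy_llm` stand-in are all hypothetical, and the actual prompt prefixes are those given in Appendix E. The only assumption about the model is the one stated in the text: it accepts a prompt and returns a continuation.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical interface: any function mapping a prompt to a continuation.
LLM = Callable[[str], str]


@dataclass
class Script:
    log_line: str
    title: str = ""
    characters: str = ""
    scenes: List[str] = field(default_factory=list)     # one summary per scene
    locations: List[str] = field(default_factory=list)  # one description per scene
    dialogues: List[str] = field(default_factory=list)  # one dialogue per scene


def generate_script(log_line: str, llm: LLM) -> Script:
    """Generate a script top-down: title, characters, plot, then each scene."""
    script = Script(log_line=log_line)
    context = f"Log line: {log_line}"

    # Each stage's output is appended to the context used by later stages.
    script.title = llm(f"{context}\nTitle:").strip()
    context += f"\nTitle: {script.title}"

    script.characters = llm(f"{context}\nCharacters:").strip()
    context += f"\nCharacters: {script.characters}"

    plot = llm(f"{context}\nPlot (one scene summary per line):")
    script.scenes = [line.strip() for line in plot.splitlines() if line.strip()]

    # Scene-level generation is conditioned on the shared high-level context,
    # not on the full text of earlier scenes, so prompts stay short.
    for scene in script.scenes:
        scene_context = f"{context}\nScene: {scene}"
        location = llm(f"{scene_context}\nLocation description:").strip()
        dialogue = llm(f"{scene_context}\nLocation: {location}\nDialogue:").strip()
        script.locations.append(location)
        script.dialogues.append(dialogue)
    return script


def toy_llm(prompt: str) -> str:
    """Stand-in for a real model: returns canned text keyed on the last prompt line."""
    canned = {
        "Title:": "The Storm",
        "Characters:": "ANNA, a sailor; BEN, her brother",
        "Plot (one scene summary per line):": "Anna leaves port.\nBen warns her.\n",
        "Location description:": "A harbour at dusk.",
        "Dialogue:": "ANNA: We sail tonight.",
    }
    return canned[prompt.rstrip().splitlines()[-1]]
```

Calling `generate_script("A sailor defies a storm.", toy_llm)` runs the whole chain with the canned stand-in. In an interactive setting, a human edit could be applied to any field (for example the character list) before the next stage is prompted, which is one way to realise the intervention points the text describes.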
To evaluate Dramatron’s usability and capabilities, instead of relying on online crowd-sourced annotation and evaluation from non-expert raters, we engaged 15 experts in two-hour-long user study sessions to co-write a script alongside Dramatron. The experts, playwrights and screenwriters from the theatre and film industry, were paid a consulting fee for their engagement. They provided feedback on the interactive co-authorship process, as well as artistic opinion and analysis of the outputs co-written with Dramatron. Our inclusive research methodology invited participation from experts during the creative design and development process: their feedback directly led to incremental improvements of the system. We provide a summary of the iterative tool refinement process that emerged from their feedback. A collection of scripts co-written with this process was produced and staged at the Edmonton International Fringe Theatre Festival in August 2022. Reflections from the creative team are presented, as are comments from reviewers, as these represent critical reflections on human-machine co-creativity. Our study design and data collection process were validated and approved by HuBREC (Human Behavioral Research Ethics Committee), a research ethics committee run within DeepMind that includes and is chaired by academics from outside the company. To the best of our knowledge, this work represents the largest expert user study conducted on co-creative authorship to date.
The paper is structured as follows. Section 2 provides background on storytelling and how log lines become full length scripts. Section 3 provides background on LLMs and their use in creative text generation and details on interaction with Dramatron. Section 4 provides details on the design of our human co-authorship study. Section 5 presents the major themes summarizing the qualitative interviews. Section 6 covers the quantitative results from the human user study. Section 7 explores the potential impact of these systems on the broader creative community. Finally, the Appendix includes related work on automated story generation (Appendix A), as well as detailed prompt sets (Appendix E), an example of a raw generated script (Appendix F) and four examples of edited scripts (Appendix G). Overall, this paper presents Dramatron and a pathway toward human-machine co-creativity that uplifts human writers and artists while leveraging novel artificial intelligence systems such as LLMs.
This paper is available on arXiv under a CC 4.0 license.