This article explores the notion of human and computational creativity as well as core challenges for computational musical creativity. First, it examines the philosophical dilemma of computational creativity as being suspended between algorithmic determinism and random sampling, and suggests a resolution from a perspective that conceives of “creativity” as an essentially functional concept dependent on a problem space, a frame of reference (e.g. a standard strategy, a gatekeeper, another mind, or a community), and relevance. Second, it proposes four challenges for artificial musical creativity and musical AI: (1) the cognitive challenge that musical creativity requires a model of music cognition, (2) the challenge of the external world, that many cases of musical creativity require references to the external world, (3) the embodiment challenge, that many cases of musical creativity require a model of the human body, the instrument(s) and the performative setting in various ways, (4) the challenge of creativity at the meta-level, that musical creativity across the board requires creativity at the meta-level. Based on these challenges, it is argued that the general capacity of music and its creation fundamentally involves general (artificial) intelligence and that therefore musical creativity at large is fundamentally an AI-complete problem.
Computational creativity and creative AI are amongst the areas of computer science that attract the most attention and interest. We are celebrating successes and breakthroughs, from performances of artificially composed Beethoven-style string quartets, Flow Machines and impressive fully computationally generated pop songs (or pop music excerpts) like Daddy’s Car, to AI-written screenplays like Sunspring.1 While research is continuously advancing, it is useful to step back and critically reconsider the achievements and some of the conceptual foundations underlying human, natural and artificial creativity, as well as the challenges involved in musical creativity. This philosophical article provides some reflection and critical discussion on this topic from a neutral stance for researchers and artists to reflect upon and position themselves towards. The text has two distinct parts: (1) an analysis of the notion of “creativity” and of two classical philosophical dilemmas that come with it (Section 2), and (2) a discussion of four challenges (Section 3) suggesting the AI-completeness of the problem of general musical creativity.2
Recent years witnessed a major growth of research on human and computational musical creativity in computer science, MIR and music psychology (Iñesta et al., 2016; Sturm et al., 2019a; Carnovalini and Rodà, 2020; Fernández and Vico, 2013; Schiavio and Benedek, 2020; Miranda, 2021). There are many different problem settings involved. One of the most central tasks is style replication at audio or symbolic level, ranging from certain genres or personal styles (such as Bach’s chorale style, or string quartets) to common-practice, ethnic, jazz or pop styles in general. Other problem settings include computational expressive performance (Widmer and Goebl, 2004; Kirke and Miranda, 2009), computer-assisted composition (Papadopoulos et al., 2016; Huang et al., 2019), human-AI musical interaction (Holland et al., 2019), up to general musical creativity or even music as a part of the domain-general creativity of an AI (Wiggins, 2018).
Taking one step back, what do we mean by “creativity”, and how do we relate novelty, innovation or transformation with the concept? For instance, are models “creative” that generate jazz lead sheets, chorales in Bach’s style, Indian tabla, or Balinese Gamelan? Is style replication “creative”? Human or computational style replication creates novel instances, but the style and its constraints existed beforehand, even if its boundaries may be soft. Is an algorithm that constructs a set of some hundred stimuli for a perception experiment “creative”? Success in the replication of a closed domain (even though this notion of a self-contained, “closed” domain is problematic in reality—see Section 3), may often produce detailed insights into the properties and rules of the system as well as the engineering of its underlying mechanics (e.g. Kippen, 1992; Ebcioglu, 1992; Allan and Williams, 2005) rather than genuine “creativity”.
What we call “creative” depends on the perspective. First, it is useful to distinguish between music generation and creative models, in which the former aims to generate instances within a given, predefined setting or style, and the latter focuses on the modeling of the phenomenon of “creativity” itself. Within the latter, attempts to explicate “creativity” commonly require properties of the outcome to go beyond mere generation and replication, such as novelty, originality, discovery, something unexpected, sometimes termed capital “C” Creativity (e.g. Cohen, 1999). To some extent, such ideas and the focus placed on them have their origin in the fairly recent, romantic aesthetic notion of the “genius” and the “creative spark”.3 Originality or novelty, however, are not sufficient by themselves. The so-called “standard definition of creativity” (Runco and Jaeger, 2012) involves two major aspects: originality and effectiveness (in terms of the usefulness of the invention), often also termed novelty and value (Stein, 1953; Barron, 1955; Sarkar and Chakrabarti, 2011; Carnovalini and Rodà, 2020). Orthogonally, creativity may also be characterized as a continuum scaling from small, single micro-choices up to large-scale invention.4
Yet these accounts of creativity are still very general. When it comes to the details—in particular, regarding constructive accounts and the generation of originality—attempts to characterize creativity quickly appear problematic. Although there have been attempts to define creativity, and in particular, characterize kinds of creative acts, processes or results (Rhodes, 1961; Ritchie, 2001; Boden, 2004; Colton et al., 2011; Schiavio and Benedek, 2020; Carnovalini and Rodà, 2020), it is also a commonplace in the discourse on creativity that its heart defies attempts of explicit definition (e.g. de Sousa, 2008). In a similar vein, Cope (2005) and Jordanous (2012) discuss the circularity in encyclopedic definitions of creativity. For such reasons, many authors avoid an explicit definition (Colton and Wiggins, 2012; Wiggins et al., 2015). While it may seem intuitive that creativity is too hard to define, this insight bears two substantial philosophical dilemmas concerning computational creativity.
Science is what we understand well enough to explain to a computer. Art is everything else we do. (Donald Knuth)
You insist that there is something a machine cannot do. If you tell me precisely what it is a machine cannot do, then I can always make a machine which will do just that. (John von Neumann)
There are two classical philosophical problems concerning formal accounts of creative models and the way their results are generated. According to the first, any formal account of a creative model will be limited to its algorithmic definition and thus establishes a contradiction in terms with the flexible and free originality required by the notion of creativity.
By analogy, a formal account of creativity may thus appear as self-contradictory as an explicit algorithm predetermining all the single choices that supposedly establish agency with free will.5 This philosophical concern about algorithmic creative models has been raised throughout the history of artificial intelligence. An early, similar formulation, stating that an algorithm does not truly “originate anything; it can do whatever we know how to order it to perform”, goes back to Ada Lovelace (Lovelace, 1842; Bringsjord et al., 2001). It is quoted and discussed by Alan Turing (1950) along with the Turing test, as well as in other classical texts on computational creativity, e.g. Hofstadter (1979) and Hofstadter (1995), who also notes analogies to John Searle’s Chinese room argument against an artificial intelligence possessing linguistic understanding (Searle, 1980). Both philosophical problems (for the second, see the next paragraph) are also brought up, but not resolved, in Marsden (2000) and Carnovalini and Rodà (2020, Section 2.2).
Following from the first, the second philosophical problem relates to the distinction between algorithmic determinism and randomness in creative models. The notion of computation in the Church-Turing paradigm involves full computability and entirely deterministic behavior (Turing, 1950; Church, 1936). Within the functional paradigm, an algorithm f : input → output entails a fully explicit definition of its functional composition (Milewski, 2018). Thus, an algorithm modeling creativity can only result in a fully deterministic, static account of creative production, running into the contradiction outlined above. If the algorithm, on the other hand, involves non-determinism introduced through random numbers or random choices, in functional terms f : input × rand → output, the root of “creativity” will essentially be founded in (or even reduced to) a source of random numbers. From a philosophical perspective, this leads to a dilemma, as the way formal accounts obtain results is fixed by providing the algorithm, which seems to prevent originality. A deterministic algorithm produces merely a deterministic result, and a non-deterministic algorithm grounds its decisions purely in randomness. Thus, this dilemma prompts the (apparent) conclusion that formal accounts cannot satisfactorily capture the notion of creativity by definition (particularly for followers of the romantic notion of the creative spark). If the source of creativity, however, is argued to lie in the richness of the model’s input, the creative aspect is outsourced, hence (apparently) rendering the model no longer an intrinsically creative model.
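The two functional signatures contrasted above (input → output versus input × rand → output) can be made concrete in a few lines. The following is a minimal sketch, not a model of creativity: the melody representation (lists of MIDI pitch numbers) and the function names `transpose_up` and `vary` are illustrative assumptions. It shows that once the random source is passed in as an explicit argument, the "non-deterministic" variant is again a deterministic function of its arguments, which is the point of the dilemma.

```python
import random
from typing import List

Melody = List[int]  # MIDI pitch numbers; a hypothetical toy representation


def transpose_up(melody: Melody) -> Melody:
    """Deterministic case, input -> output: same input, same output."""
    return [pitch + 2 for pitch in melody]


def vary(melody: Melody, rng: random.Random) -> Melody:
    """Randomised case, input x rand -> output: the random source is an
    explicit argument, so all variation originates in `rng`."""
    return [pitch + rng.choice([-2, 0, 2]) for pitch in melody]


seed_melody = [60, 62, 64, 65]

# Deterministic algorithm: repeated calls agree.
assert transpose_up(seed_melody) == transpose_up(seed_melody)

# Randomised algorithm: fixing the random source fixes the output.
assert vary(seed_melody, random.Random(0)) == vary(seed_melody, random.Random(0))
```

Seen this way, any variety in the output of `vary` is attributable either to the input melody or to the random source, which mirrors the two horns of the dilemma.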
Both dilemmas seem to point to an intrinsic conceptual inexplicability at the very heart of the notion of creativity—as does the quote by Knuth above—, the freedom and flexibility of which seems to elude the very concept of formal or algorithmic definition. Furthermore, in the aftermath of each modeling success, the strategy and even the problem setting itself may be regarded as a task of engineering rather than an instance of creativity, successively shifting the boundary of the domain of creativity (the “moving target problem”)—as hinted at by the von Neumann quote above. Yet, despite the accumulation of practical solutions, a general, formal computational account of creativity still seems out of sight. It is hard to assess something that one cannot define, and this reflects down to the difficulties in evaluating the success of models of general creativity without resorting to the “oracle” of human evaluators.
This section and the following present a detailed analytical argument suggesting that both outlined dilemmas resolve as pseudo-problems when changing the perspective on the notion of creativity.
To begin, one way in which we conceive of creativity in nature itself is in evolutionary processes (Dawkins, 1990). They involve mutation, recombination (cross-over), drift, and, particularly, selection and adaptation to an external environment, which constitutes an external source of unpredictable complexity (or uncertainty, from the perspective of the system6) and defines the feature landscape that the genetic process explores. Such an evolutionary or systemic perspective involves two sources of randomness: one within the system (mutation, etc.), and one external to the system (stemming from the complexity of the environment). The latter offers a first response to the dilemma by framing creativity as an outcome of the entire system, and partially by rooting creativity in the uncertain properties of the environment combined with the genetic operations. In this vein, there are many applications of evolutionary or genetic algorithms in musical creativity (Loughran and O’Neill, 2020), for instance, conceptual blending that recombines material to model originality (e.g. Fauconnier and Turner, 2002; Kaliakatsos-Papakostas and Queiroz, 2017; Cope, 2005). Most learning-based models draw much of their generated diversity from their training data.
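The evolutionary ingredients named above (mutation, recombination, and selection against an external environment) can be sketched as a small genetic loop. This is an illustrative toy under loudly stated assumptions: the genome is a list of MIDI pitches, and `environment_fitness` is a hypothetical stand-in for the environment that merely rewards stepwise motion and a return to the opening pitch. It is not taken from any of the cited systems.

```python
import random
from typing import Callable, List

Genome = List[int]  # a short melody as MIDI pitches (toy assumption)


def environment_fitness(g: Genome) -> float:
    """Hypothetical environment: rewards steps of 1-2 semitones and
    ending on the starting pitch."""
    steps = sum(1 for a, b in zip(g, g[1:]) if 1 <= abs(a - b) <= 2)
    closure = 1 if g[0] == g[-1] else 0
    return steps + closure


def mutate(g: Genome, rng: random.Random, rate: float = 0.2) -> Genome:
    """Internal randomness: nudge some pitches by one semitone."""
    return [p + rng.choice([-1, 1]) if rng.random() < rate else p for p in g]


def crossover(a: Genome, b: Genome, rng: random.Random) -> Genome:
    """Recombination: splice two parents at a random cut point."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def evolve(pop: List[Genome], fitness: Callable[[Genome], float],
           rng: random.Random, generations: int = 50) -> Genome:
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        parents = ranked[: len(pop) // 2]  # selection by the environment
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(len(pop) - len(parents))
        ]
        pop = parents + children  # parents survive, so the best never gets worse
    return max(pop, key=fitness)


rng = random.Random(1)
population = [[rng.randrange(55, 80) for _ in range(8)] for _ in range(20)]
best = evolve(population, environment_fitness, rng)
print(best, environment_fitness(best))
```

The fitness function plays the role of the external environment: replacing it changes what counts as a good outcome without touching the genetic operators.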
Addressing the second dilemma, the evolutionary perspective points to an important insight, namely that it is less crucial for characterizing creativity to look at how an outcome was created, but more important to analyze what was created and how predictable or original it is for the observer or programmer. Even purely deterministic algorithms, such as fractal or chaotic processes, may be sufficiently complex to be unpredictable in practice and produce outcomes that are stunning to their programmers (see even historical programs as early as Winograd’s SHRDLU, Winograd, 1972). One reason why cases like deep dreaming (Mordvintsev et al., 2015), GPT-3 (Brown et al., 2020), or AlphaZero (Silver et al., 2018) are considered so creative is that their outcomes are highly surprising, unpredictable and interesting to the programmers or interacting humans. In fact, this characterizes the essence of the Lovelace test (Bringsjord et al., 2001), which attests creativity once the algorithm produces an output that cannot be explained by its inventor or observer. Effectively, this turns the previous dilemma of algorithmic creativity on its head, leading to a criterion of creativity. Also, this aspect points, more deeply, to the fact that creativity is a concept that is closely intertwined with a systemic or an external observer’s reference, as argued in the next section.
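The claim that a purely deterministic process can be unpredictable in practice can be illustrated with the logistic map, a standard textbook example of chaos that is used here as an assumption of this sketch and is not discussed in the text itself. Two starting values that differ by only 1e-10 produce orbits that soon bear no resemblance to each other, although each orbit is fully reproducible.

```python
from typing import List


def logistic_orbit(x0: float, steps: int, r: float = 4.0) -> List[float]:
    """Iterate the logistic map x -> r * x * (1 - x); chaotic for r = 4."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


a = logistic_orbit(0.2, 100)
b = logistic_orbit(0.2 + 1e-10, 100)

# Fully deterministic: the same input always yields the same orbit.
assert a == logistic_orbit(0.2, 100)

# Yet a tiny perturbation of the input grows until the orbits are unrelated.
max_gap = max(abs(x - y) for x, y in zip(a, b))
print(f"largest gap between the two orbits: {max_gap:.3f}")
```

For an observer who cannot measure the starting value to arbitrary precision, such an orbit is effectively unpredictable, even though no random source is involved.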
Generalizing this argument, the understanding of creativity is conceptually shifted towards solutions in a large and complex possibility space that are hard to find—an understanding that is also shared in the psychological discourse. Whether the solution is found, e.g., in a deterministic way, by an evolutionary algorithm, or by other sampling or Monte Carlo search methods, plays a secondary role.
Creativity is not absolute, but fundamentally relative to a frame of reference. Refining the standard definition above, this subsection argues that human and computational creativity can be understood as producing a solution in a complex possibility space defined by a certain problem setting (Boden, 2004), which (a) is difficult to find in comparison with a given reference (strategies, minds, context), (b) lies within the boundaries of the problem setting (for open problem domains), and (c) is of use or relevant. This characterization of creativity is systemic because the three conditions involve reference to the overarching system. Depending on the problem setting, all three conditions on the solution may be assessed formally, computationally or by human gatekeepers that are part of the system or the social setting (Csikszentmihalyi, 1991). Several analytical examples shall underpin this conception of creativity.
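The relativity of conditions (a) to (c) to a frame of reference can be sketched as a predicate. This is a deliberately crude illustration: "difficult to find" is approximated as "not reached by any of the given reference strategies", and the toy problem (finding numbers whose square ends in 444) as well as all names (`is_creative`, `short_scan`, `full_scan`) are hypothetical and chosen only to keep the example self-contained.

```python
from typing import Callable, Iterable, List


def is_creative(
    solution: int,
    in_problem_space: Callable[[int], bool],
    is_relevant: Callable[[int], bool],
    reference_strategies: Iterable[Callable[[], List[int]]],
) -> bool:
    """Crude reading of (a)-(c): (a) not reached by any reference strategy,
    (b) inside the problem setting, (c) relevant."""
    reached = any(solution in strategy() for strategy in reference_strategies)
    return in_problem_space(solution) and is_relevant(solution) and not reached


# Hypothetical toy problem: find n in 1..1000 whose square ends in 444.
def in_space(n: int) -> bool:
    return 1 <= n <= 1000


def relevant(n: int) -> bool:
    return (n * n) % 1000 == 444


def short_scan() -> List[int]:
    """Reference strategy with a small search budget: scans only 1..100."""
    return [n for n in range(1, 101) if relevant(n)]


def full_scan() -> List[int]:
    """A stronger reference strategy that searches the whole space."""
    return [n for n in range(1, 1001) if relevant(n)]


# The same solution is judged differently under different references.
print(is_creative(462, in_space, relevant, [short_scan]))             # True
print(is_creative(462, in_space, relevant, [short_scan, full_scan]))  # False
```

The judgment about the solution 462 changes with the reference strategies supplied, while the solution itself and the way it was computed stay the same.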
Although chess had been regarded as practically “solved” from an engineering perspective at the latest since the famous Deep Blue matches against Garry Kasparov (in 1996 and 1997), the success story of DeepMind’s AlphaZero made front-cover news in science and chess magazines in 2018 (Silver et al., 2018). Unlike previous engines, which required some forms of human knowledge, in particular, a database of chess openings or sample games as well as an expert-coded evaluation function, AlphaZero acquired the entirety of its knowledge by unsupervised reinforcement learning from millions of games against itself. What amazed the chess world was the unprecedented “creativity” with which it played, such as finding unexpected positional resources or risky long-term, positionally motivated piece sacrifices (Zwanzger, 2018), which had been very hard to find for previous engines. Relating this example to the definition above, chess rules define a complex possibility space, and because the game of chess is a closed domain, every successful strategy lies entirely within this problem setting and is relevant (conditions (b) and (c)). The remarkable creativity of AlphaZero comes about because it locates regions in the search space that have been inaccessible to the reference strategies, i.e. previous engines and expert minds (condition (a)).
Generally, the creativity of an identified solution is relative to the capacities of a human or computational “mind” of reference and its strategies for exploring the search space—irrespective of how the solutions were computed, hence resolving the dilemmas above. Practically, even purely deterministic chess engines have been regarded as sources of creativity for human experts, while it was the opposite in the late 1980s (Marsland and Schaeffer, 1990). Conversely, research on the beauty of artificial chess puzzles involves formalizing expert intuition as well as reference strategies of human cognitive heuristics including their particular difficulties in traversing the search tree (Iqbal and Yaacob, 2006; Iqbal, 2006).
Although music is not a game like chess and has no simple reward signal like winning that facilitates reinforcement learning methods, certain problem settings in music composition are of a similar nature. Some musical rule-systems, such as counterpoint, harmonic syntax, voice-leading and free polyphony, define closed formal problem settings that span an enormous search space, which is hard to traverse and affords rare, original solutions and strategies to be found. Analogously, the creativity of the solution may be assessed in comparison with human or computational reference strategies (Wiggins, 2006; Jordanous, 2012; Agres et al., 2016; Gifford et al., 2018). In contrast to chess, however, the third aspect of creativity, relevance, cannot merely be checked by the rules of the system but only by assessing whether it convinces a human mind (a gatekeeper) or a cognitive model (see the challenge raised in Section 3.1).
Further, unlike the closed domain of chess, breaking the system’s rules and out-of-the-box thinking may be part of a creative strategy in music (Meyer, 1996). Out-of-the-box thinking and rule breaking come in different varieties, in particular, overcoming the main problem-solving strategies as well as transforming the problem setting itself. Consider the puzzle in Figure 1. This small, well-known puzzle illustrates a problem with a solution that requires overcoming the initial search strategy and the possibility space that are in some way primed by the problem statement. Because there exists no solution where the lines remain inside the boundaries of the circumscribing square, a creative solution requires overcoming this constraint (there is even a creative solution involving only three lines!) in terms of transforming the primed possibility space (the puzzle statement never included a constraint with the boundaries of the circumscribing square). This case constitutes a simple example of transformational creativity (Boden, 2004). Similar problems exist in the domain of chess: specifically, there are puzzles, such as ones employing fortress positions, that even non-expert humans can solve and (classical) chess engines cannot because the solution does not appear within the horizon of their search tree. Known for a long time, such positions are mostly characterized by requiring the solver to drop standard search strategies and to identify logical invariants that characterize the evaluation of the position. What happens in these cases, crucially, is not that the games or the possibility spaces are transformed, but that the strategies change: in both cases, the solutions (drawing outside the box or using logical invariances in chess) are not ‘prohibited’ by the rules, but they lie outside common strategies or primed possibility spaces, yet inside the overall possibility space defined by the game or puzzle. Such kinds of solutions may often turn out to be valuable for better understanding the complex possibility space.
Furthermore, there is creativity at the meta-level that transforms or innovates the problem setting or the rules themselves, and with it the possibility space (Boden, 2004). This is particularly relevant in open domains like art, music, or science. An example of a very specific, yet open problem setting is the task to modify the rules of chess such that there are fewer indecisive (draw) results overall (Tomašev et al., 2020). Other creative challenges may be almost entirely unrestricted, such as the classical divergent thinking task of coming up with as many uses of paperclips as possible (Guilford, 1967). In artistic domains, the problem setting itself is generally open and only vaguely defined (e.g. the composition of a “convincing” piece) and therefore, many particular problem settings are restrictive (e.g. generate a style-conformant jazz leadsheet). Creativity at the meta-level may innovate the selective problem instantiation (e.g. compose using quotations, compose a musical joke, compose combining different musical forms), often by including other factors that were outside the scope of simpler settings (or even meaningless within them). Such kinds of creativity at the meta-level are very common in music from the 19th century and beyond. Because of its open nature, creativity at the meta-level may require strong intelligence and is therefore particularly challenging for current-day computational engineering (thus motivating the challenge proposed in Section 3.4).
Because of the general openness of their problem settings and the relevance of creativity at the meta-level, domains like art or science require gatekeepers (experts, communities, markets) to assess and select for creative success and relevance, following Csikszentmihalyi’s argument (Csikszentmihalyi, 1991, 1996). As a simple example, the color drawing of a small child may be a sign of creativity for her age and for her parents. It may also reflect our innate human disposition for creativity tracing back to prehistoric cave drawings or music (Morley, 2013; Higham et al., 2012). Yet the same drawing is regarded differently in reference to human art history. Childlike drawings by Miró, however, reminiscent of childhood innocence, rather constitute a milestone in 20th-century art history in this appropriate frame of reference. Creativity is assessed differently for the child than for an art museum. Why the Miró painting is regarded as creative depends on the trajectory of art history and expert judgment on its intellectual underpinning and its fit to the trajectory. In the extreme case, the same object or performance may be regarded as an act of artistic creativity or as worthless depending on the context and the gatekeepers (see, for instance, Marcel Duchamp’s Fountain, or La Monte Thornton Young’s Compositions 1960). Because of their centering on creativity at the meta-level, artistic output in avant-garde art and music is the most complex to assess.7 Yet even for more self-contained problem settings like evaluating artificial jazz leadsheets, Bach chorales, Swedish Slängpolska, etc., borderline cases call upon the open, fluid boundaries of natural styles that require intelligent gatekeepers (experts, communities or markets) to judge whether they lie within and meaningfully extend the genre.
Altogether, assessing creative outcomes in the light of open problem settings, previous approaches, artistic domain norms, or even social, historical context and discourse context requires substantial world knowledge and strong general intelligence, which is why human social systems resort to gatekeepers to assess (human or computational) creativity.
The argued relativity of creativity also resolves the moving target problem described in Section 2.1, which conflates domain creativity with creativity in computational modeling of a domain. Despite successful computer models, domains like chess, poker, Indian tabla, or four-part chorale composition remain interesting creative challenges for humans. This is because even though the involved algorithms are well understood, they are impossible to consistently and efficiently execute for humans, and also because no patterns or shortcuts are found that render the activity uninteresting, such as a simple analysis would show in the case of Tic-Tac-Toe. What computational engineering of creative domains may continue to reveal, however, is that many domains of human creativity that are hard for humans may be easy for machines and vice versa (see also Mitchell, 2021).
To conclude, this section argued for an account of creativity in terms of locating relevant solutions in the problem’s possibility space that are hard to find in relation to a reference strategy or mind. Generally, creativity may be conceived of as a functional concept, similarly to the functionalist accounts in the philosophy of mind (Levin, 2018; Cunningham, 2000), that is independent of its instantiation and generation, and may be realized in multiple ways (Putnam, 1967). As such, it is less important how solutions were computed, but what they are and how they relate to the problem setting and the required relevance, the possibility space and known strategies. After all, computational creative models may employ strategies very different from human ones (Marsden, 2000). While the innovation of new strategies in the light of reference strategies may be in principle evaluated computationally (as in chess), open domains like general music composition require substantial intelligence, and therefore (still) human gatekeepers, particularly, in terms of assessing domain boundaries, norms, context and relevance. As a consequence, general musical creativity may ultimately require strong, general artificial intelligence (being AI-complete). The next section elaborates this question in the light of four central challenges.
Music is often conceived of as pure structure or “absolute music” (Hanslick, 1854; Dahlhaus, 1991), and, consequently, it has often been a prime domain for computational creativity and witnessed some of the earliest attempts at computational composition (e.g. Hiller, 1970). However, most varieties of music in our world exhibit properties that go substantially beyond mere play with structure; they play with the mind and employ a plethora of means of expression, rich references to the world and the body, varieties of meaning, higher-order thought, embodiment, or even forms of humor. It is such cases that make music in its varieties particularly human and relevant for humans and their societies. If such forms of music are the goal of artificial creativity, their features raise substantial challenges for creative musical AI and entail that the overall problem of general human-like music creation (as opposed to a narrow problem setting like specific style replication) should be considered AI-complete. In other words, the full modeling of the capacity of music is not a partial AI problem, but will require human-level cognition and general intelligence (Adams et al., 2012) to a very far extent and thus require properties of strong AI (Russell and Norvig, 2021; Bach, 2009; Hofstadter, 1979). Four challenges to musical AI illustrate and outline this point.
As argued above, creativity lies in the eye of the beholder, and in music, it is the listener’s mind that knows in an instant whether a new musical creation “works”. Fundamentally, music is a cognitive phenomenon; it is there to be experienced by minds (Pearce and Rohrmeier, 2012; Koelsch, 2012) including their biological foundations (see Challenge 3). Outside the (human) mental sphere, music does not exist (Wiggins et al., 2010), would not have emerged and would have little meaning.
Music is the product of a long evolution that has shaped it for the human mind and its constraints—similarly to language (Christiansen and Chater, 2008). To a large extent, musical structure is adapted to the conditions of human perception, learning, representation, reproduction, and performance (Peretz and Zatorre, 2005; Peretz, 2006; Huron, 2006). In addition, the mind is a sense-making and intentionality machine (Dennett, 2008, 1971), and hence, many aspects and effects of music exist for their interpretation and sense-making by an attending mind, including musical intentionality, meaning, and semantics (Nattiez, 1990; Koelsch, 2011; Polth, 2001; Rohrmeier and Koelsch, 2012, see also Challenge 2). Altogether, these cognitive foundations of music raise the first challenge:
Challenge 1. General artificial musical creativity will ultimately require a cognitive model of music.
Here, “cognitive model of music” refers to a computational model of the different aspects of music perception and processing identified by music theory, psychology and neuroscience (Wiggins, 2012a; Pearce and Rohrmeier, 2012). To give an example, a musical interactive agent will contain, at least, approximations of representations of basic cognitive structures of music (Temperley, 2001), even if implicit, such as beat and metrical inference, stream analysis, voice segregation, harmonic inference, melodic analysis, or whatever the modelled musical style demands; in addition, it will require an internal representation of the piece, its structure, the parts that other (human or non-human) agents take, and a plan of events and musical stages at different timescales. Another challenge in this context is that many of these “basic” cognitive features of music are not constant, but may also vary between cultures, such as the perception of strong and weak beats (e.g. Stobart and Cross, 2000).
Like a human composer, who gauges the intended effects against the inner ear and an assumed listener, artificial composition requires reference to a cognitive model, or an (implicit) approximation, in order to stage effects in a domain that is ultimately made to be experienced by human minds. Many structures encountered in music involve this form of grounding in terms of an interplay between a creating and a listening model, such as effects of tension, delay, anticipation, surprise, revision (Huron, 2006; Rohrmeier, 2013), effects of form (such as one-more-time patterns (Schmalfeldt, 1992), or a looming sense of finality), or harmonic effects such as preparatory or contrastive harmony (Rohrmeier, 2020a). For example, creating effects of delay, anticipation, or surprise requires setting up a musical context in which a model of the listener would anticipate a certain continuation, in order to continue with an effective divergence that elicits the desired effect. In a similar vein, it is the cognitive interplay of musical structures at different timescales and long-term dependencies, which many computational models are still lacking and which triggers a lot of research activity (Roberts et al., 2018; Guo et al., 2021). Glitches such as those in the computational screenplay Sunspring (see Note 1), make the storyline incoherent and may have an unintentional, somewhat comic effect. The same holds for musical creations based on Markov or n-gram models that drift in disoriented Brownian motion between good local creations, unable to capture long-term dependencies as established in music theory and neuropsychological research (e.g. Koelsch et al., 2013).
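The interplay between a generating model and a listener model described above can be sketched with a first-order Markov (bigram) model over chord symbols. This is a minimal illustration under stated assumptions: the three-progression corpus, the chord alphabet and add-one smoothing are choices of this sketch, and surprisal (negative log probability) is used as a simple proxy for the listener's expectation. Such a model captures only local anticipation and shares the limitation noted above, namely that it cannot represent long-term dependencies.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Set


def train_bigram(corpus: List[List[str]]) -> Dict[str, Counter]:
    """Count how often each chord follows each other chord."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for progression in corpus:
        for prev, nxt in zip(progression, progression[1:]):
            counts[prev][nxt] += 1
    return counts


def surprisal(counts: Dict[str, Counter], prev: str, nxt: str,
              alphabet: Set[str]) -> float:
    """Surprisal in bits of hearing `nxt` after `prev`, with add-one smoothing."""
    following = counts[prev]
    total = sum(following.values()) + len(alphabet)
    return -math.log2((following[nxt] + 1) / total)


# Hypothetical toy corpus of chord progressions (Roman numerals).
corpus = [
    ["I", "IV", "V", "I"],
    ["I", "ii", "V", "I"],
    ["I", "IV", "V", "vi"],
]
alphabet = {"I", "ii", "IV", "V", "vi"}
listener = train_bigram(corpus)

# After V, the listener model expects I most strongly; vi diverges from
# that expectation, and ii (never heard after V) diverges even more.
for chord in ["I", "vi", "ii"]:
    print(chord, round(surprisal(listener, "V", chord, alphabet), 3))
```

A generating model that wants to stage a surprise could query such a listener model for the expected continuation and then choose a divergent one, whereas sampling from the bigram model alone would yield the locally plausible but globally drifting output described above.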
One central counterargument to this point is that such features and psychological effects are all implicitly contained in the training data, as well as, to some extent, in music theoretical rules. However, the adaptation, generalization and, particularly, creative expansion of such examples rely on feedback from a (human or artificial) cognitive model. For instance, it is a major challenge for a computational creative model to come up with cases such as a failed cadence, a kind of contrastive modulation, or a groove pattern that have not been observed in the data and are nonetheless predicted by the model to work convincingly for human listeners.
In this context, many structures identified in music theory provide useful formalizations of relevant cognitive structures (Jackendoff and Lerdahl, 2006; Wiggins, 2012b; Koelsch, 2012; Rohrmeier and Pearce, 2018; Cecchetti et al., 2020). As such, they may inform computational modeling about relevant kinds of latent concepts, relations and dependencies to take into account. Depending on the style of interest, many of these structures still pose substantial, often unsolved challenges for computational modeling and generation today, such as free polyphony and counterpoint between voices, non-local and long-term coherence, compound lines, interpretable harmonic relations, overarching musical form, and motivic and repetition structure. Music-theoretical structures, however, are not sufficient for generation by themselves: just as a (perfect) grammar of English is not a sufficient model of literature or poetry writing because it would have nothing to express, a mere generative grammar model of music (e.g. Tymoczko and Meeùs, 2003; Rohrmeier, 2020b; Quick and Hudak, 2013) is similarly incomplete, abstracting from factors that establish individual expressive coherence within pieces. As proxies to a cognitive model, however, music-theoretical structures can provide essential input as well as benchmarks for evaluating computer-generated music.
Probably one of the hardest of the cognitive challenges for artificial composition is musical humor. There are many levels at which humor may act in music (Schimmel, 2014; Kitts and Baxter-Moore, 2019), and the world of musical jokes ranges from classical compositions like Haydn’s Joke Quartet (op. 33 #2) to current artists like Bobby McFerrin, Peter Schickele, Gerard Hoffnung, or Igudesman & Joo. Several examples shall illustrate the different kinds of complexity involved in musical jokes. Mozart’s A Musical Joke (K. 522) exhibits bad counterpoint and voice-leading, wrong key transitions, and a messed up fugato; yet all are framed within a reference of perfect composition technique, which sets up the context for humor to arise. Beethoven’s sonata #16, op. 31/1 in G major involves parody on Italian opera,8 i.e., complex world-reference. Another kind is the joke of the double basses failing to keep up with the cellos at the beginning of the second movement of Shostakovich’s 1st symphony.
An analysis of a little joke on Mozart’s Rondo Alla Turca (K331, III) adapted from Hans Liberg (see Figure 2) shall illustrate the complexity involved in this case.9 Apart from the right timing and musical gestures during a performance, the example relies on world-knowledge of the audience’s musical experience in order to select a suitable, famous cliché piece which employs a simple repeated pattern; it requires knowledge that the modification of a famous piece will cause surprise (violating veridical expectancy) and knowledge of the conditions under which a short pattern repeated for too long may become absurd; it further needs to find the right kind of problem abstraction to identify and continue the piece’s pattern in a consistent and immediately apparent way (to the presumed listener’s mind), in this case by maintaining the left hand, rhythm and harmony; in the final measure, world and embodied knowledge are required when the upper end of the piano range is reached.
There are cognitive and computational accounts of humor (Binsted, 1996; Hurley et al., 2011; West and Horvitz, 2019), which involve concepts such as a play on mental parsing and debugging strategies (for another cognitive analysis of musical humor, see also Huron, 2006, p. 283–288). Nonetheless, humor continues to pose very substantial challenges for computational creativity. Not only is it hard to possess sufficient musical, world, embodied and situational knowledge for setting up a joke (see Challenges 2 and 3), but, in addition, preparing a (musical) joke requires a theory of mind of a listener, similar to the setting-up of other cognitive effects as argued above.
The points outlined above provide an argument that general computational musical creativity ultimately requires (or implicitly contains) an overarching cognitive model that encompasses the psychological and theoretical foundations of human music involving the complexity of even up to a theory of mind.
Humans live in the world. In numerous cases, music makes references to the external world through some kind of semiotics, semantics or pragmatics (Eco, 1976; Nattiez, 1990; Jankélévitch, 1961; Schlenker, 2017, 2019; Koelsch, 2011), and involves “ecological listening” (Clarke, 2008). This, in turn, requires world-knowledge and a world model for both the understanding and production of such world references. This aspect of the music capacity gives rise to the second challenge:
Challenge 2. General artificial musical creativity will ultimately require a way of establishing semiotic, semantic and pragmatic references to the external world.
Music across the board involves world reference and ecological listening. Pictorial music, program music, and film music include countless examples, such as hammering and nails in Bach’s St Matthew Passion; depictions of water such as in Ravel’s Jeux d’eau, Debussy’s La Mer and Smetana’s Moldau; imitation of bells in Grieg’s lyrical piano piece Klokkeklang; a heavy cart approaching in Mussorgsky’s Bydło in his Pictures at an Exhibition; Messiaen’s very naturalistic imitation of birds in the Catalogue d’oiseaux and the Quatuor pour la fin du temps; or Janáček’s close imitation of speech prosody, just to name a few. One of Wilhelm Killmayer’s Five Romances for violin and piano (1987) alludes to a gramophone and the music being stuck in a repetitive loop. Music may also directly incorporate or embed sounds from the world, such as cityscapes, streets, conversations, the sea, etc., and make reference and statements about these contexts within other aspects of the music. Steve Reich’s Different Trains (1988) constitutes a well-known example.
In most cases of musical semantics, the concrete observed note events cannot merely be derived by reference to a model of the music theoretical systems of tonality and form (Polth, 2001), but also require a model of the sound properties and motion dynamics of external world objects (Schlenker, 2019). In addition to the examples above, this adaptation of motion dynamics is exemplified by rhythms imitating horse hooves such as in Schubert’s Erlkönig, Rossini’s William Tell Overture, Aaron Copland’s Rodeo Suite or Aerosmith’s Back in the Saddle Again. The peculiar occurrences of the octave leaps B♮5-B♮4 in the first movement of Bruckner’s ninth symphony (mm.219–223) may only be explicable in terms of associations of distant flashes (Polth, 2001). Other instances are depictions of approaching and receding objects, such as a funeral procession, e.g. in Chopin’s second sonata, 3rd movement, or Mahler’s fifth symphony, 1st movement. Such kinds of musical meaning and associations may further be coupled with film to underline the video semantics, such as Strauss’s Thus Spake Zarathustra employed with the sun rising at the beginning of Stanley Kubrick’s 2001: A Space Odyssey (Schlenker, 2019). It is not only that such pieces refer to or mimic the world in some fashion, but that several compositional decisions and musical developments can only be fully understood and created in terms of their world reference. Many of these examples only work because of cognitive cross-domain mappings in music such as pitch-height priming physical space (high-low, wide-narrow), priming the speed of moving objects, transformations or actions, or even priming higher-order thoughts such as good/bad (Eitan and Timmers, 2010), linking back to Challenge 1.
Moving from world reference in terms of modeling the shape and dynamics of a world object to more complex cases, plenty of musical pieces reference and comment on affairs in the world. An example is Jimi Hendrix’s guitar solo on the Star Spangled Banner at Woodstock 1969, which involves references to sirens, guns, screaming and others, and is considered a political statement and anti-war protest (see also Clarke, 2008). In their song Go West (1993), the Pet Shop Boys very strikingly employ the same harmonic (Romanesca) schema that underlies the Russian national anthem. Similarly, the much earlier example of the beginning of Bach’s chorale Es ist genug constitutes an example of music-text relations and musical meta-text, which is integral to Bach’s chorale style (Daniel, 2000), but could not be reasonably derived from a mere MIDI-encoded data set alone.
Finally, not all semantic aspects of musical composition appear in the sound. Compositions may employ features that are barely audible or downright inaudible, such as merely enharmonically respelled pitches in piano music, or symbolism in the score. Again this concerns the common problem setting of modeling Bach’s chorale style. Examples from a different angle involve references between melody and names through the musical and literal writing system, such as occurrences of B-A-C-H (B♮) in Bach’s works, which became a meme in many subsequent works, D–E♭–C–B (= D–Es–C–H) as Dmitri Shostakovich’s signature, and Schumann’s ABEGG Variations. Such kinds of references require very substantial world-knowledge and reasoning beyond the musical domain.
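How little of such a reference is visible in the notes themselves can be shown with a short sketch. The Python example below is only an illustration under stated assumptions: it hard-codes the German note-letter convention (B denotes B♭, H denotes B♮) and maps the letters of a name to pitch names. The table and the function `spell_name` are hypothetical helpers, not part of any cited system.

```python
# German note-letter convention: B denotes B-flat and H denotes B-natural.
# Pitch classes are numbered with C = 0.
GERMAN_LETTER_TO_PITCH_CLASS = {
    "C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 10, "H": 11,
}

PITCH_CLASS_NAMES = ["C", "C#", "D", "Eb", "E", "F", "F#", "G", "Ab", "A", "Bb", "B"]

def spell_name(letters):
    """Map a string of letters to pitch names; return None if any letter has no pitch."""
    result = []
    for letter in letters.upper():
        if letter not in GERMAN_LETTER_TO_PITCH_CLASS:
            return None
        result.append(PITCH_CLASS_NAMES[GERMAN_LETTER_TO_PITCH_CLASS[letter]])
    return result

bach = spell_name("BACH")          # B-flat, A, C, B-natural
abegg = spell_name("ABEGG")        # A, B-flat, E, G, G
unmappable = spell_name("MOZART")  # None: most names cannot be spelled in pitches
```

The mapping itself is trivial once the convention is known. The hard part, in the sense of Challenge 2, is that nothing in a MIDI file indicates that a four-note figure is a name: that knowledge resides in the writing system and in cultural context.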
Humans, and their minds, live in a body. And therefore, music does not merely arise in the airless space of Platonic ideas and plain formal structures. Humans inhabit a world, and make music on physical instruments for an experience grounded in biological psychophysiology (Schiavio et al., 2014; Korsakova-Kreyn, 2018). Cognition depends on the biology of the body (Varela et al., 2016), and meaning and conceptual spaces depend on the body and its world (Lakoff and Johnson, 2003; Lakoff, 1989). Also, the human mind does not end at the boundaries of the body (Clark, 2008); rather, the mind and the body’s proprioception are extended to instruments. Embodiment and embodied cognition have shaped music and music making in profound ways, and it is these factors that pose another main challenge for musical AI:
Challenge 3. To be relevant for humans, general artificial musical creativity will need to involve a model of the relevant aspects of the human body, the instrument, the interaction and the performative context.
Music is naturally bound by properties and limits of the human body (Koelsch, 2012). Musical timescales are adapted to the timescales of the body. Dance rhythms, such as those in Swedish Slängpolska or funk drum beats, are adapted to human danceability (Witek et al., 2014). Music may portray the human heartbeat, and also manipulate it (Koelsch and Jäncke, 2015). Pitch is used respecting the limits of human pitch perception and optimized for discrimination by the inner ear (Huron, 2016), for example in terms of chord spacing (Huron and Sellmer, 1992). For a computational model without a natural human-like hearing system and organism, such constraints and peculiarities may be hard to acquire and to generalize to unseen cases.
Furthermore, the possibilities and constraints of the human body and of instruments are reflected in musical structure. The voice-setting and texture of a string quartet are different from those of piano music. The shape and ability of the hand, and the existence of two hands with four fingers and a flexible thumb each, shape the piano and piano music (Sudnow, 1978) as well as other instrumental music. Similar conclusions may be drawn for cello music being represented in, or shaped by, the body of the composer/performer (Le Guin, 2005). It is the acoustic properties of the piano and silently touched notes that create particular effects in György Ligeti’s etude Touches bloquées. The sizzling effect of Chopin’s “thirds etude” (op. 25 #6) or Ondine from Ravel’s Gaspard de la nuit is amplified by the near impossibility of executing the figure precisely with the hand. Furthermore, it is the very impossibility of humans not to entrain to a beat or an external source (Clayton, 2005) that makes Steve Reich’s Piano Phase (1967) or Violin Phase (1967) have such interesting effects and be so hard to perform. In sum, the play with such cases and with the limits of human instrumental playing may pose various challenges for computational models without a model of the human body and its perceptual system.
Finally, other complex cases arise from live musical interaction and the communication between musicians (Cross, 2013; Moran, 2013). Aspects of the music as recorded may arise from the stage setting and live interaction of the musicians and even the audience. In an Oscar Peterson live performance (Munich, 1999), an overly long introduction is explained by the musicians looping until every member of the group slowly walks onto the stage one by one to join in. Many musical phenomena arise from the playing situation and interactions, be it in a jazz ensemble, a rock band or a string quartet and its score. Such effects of embodied live interaction involve peculiar challenges for models of automatic music generation and their inference methods.
Humans are capable of dealing with complex hierarchies and higher-order structures and thought across all domains of cognition (Jackendoff, 2007; Hofstadter, 1979). Because these forms of higher-order organization come so easily to humans, they also occur ubiquitously in music, and thus give rise to the fourth challenge:
Challenge 4. General artificial musical creativity requires modeling creativity at the meta-level.
Examples from several domains shall illustrate this point.
Higher-order concepts in composition techniques and musical form. Often, the form of a composition involves an overarching higher-order idea that lies outside traditional theoretical (first-order) models of musical form, composition technique or tonality. This involves compositional concepts like Kirnberger or Mozart coming up with the very idea of musical dice games, or Bach with an entire retrograde composition, or Thomas Tallis with the composition of a motet (Spem in alium) for no fewer than 40 simultaneous voices (eight 5-voice choirs), a piece which then required extensive subsequent additions to compositional technique (Roth, 1998). The pieces in Ligeti’s Musica Ricercata are organized by adding one additional pitch class to the available material for each successive piece. Another line of examples concerns form embeddings: the idea to incorporate a fugue within the sonata concept (Liszt, B minor sonata, S.178; Beethoven, op. 109, 110, 134; Brahms, op. 38; Schubert, D 760), or a Mazurka within a Polonaise (Chopin, Polonaise F♯m, op. 44). Various modern collage compositions or Vaporwave in pop music also fall into a similar category. While many of these compositional concepts could be engineered or hard-wired once defined, the creative achievement lies not in carrying them out but in their discovery. As much as Ligeti’s etude #1, Désordre or Steve Reich’s Violin Phase and Piano Phase may be recreatable algorithmically (Taube, 2003), the creative point lies in having the very idea in the first place, in other words, in the much harder problem of creativity at the meta-level.
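The asymmetry between carrying out a concept and discovering it can be illustrated with a small sketch. The Python example below is a simplified, discrete stand-in for a phasing process, written under stated assumptions: the six-note pattern is hypothetical MIDI data rather than material from any score, and the gradual tempo drift of an actual performance is reduced to whole-note shifts. The function names are invented for this illustration.

```python
def rotate(pattern, shift):
    """Return the pattern rotated left by `shift` positions."""
    shift %= len(pattern)
    return pattern[shift:] + pattern[:shift]

def phase_process(pattern):
    """Pair a fixed voice with a copy that moves ahead by one note per stage.

    Yields (shift, pairs) for each stage, ending once the voices realign.
    """
    for shift in range(len(pattern) + 1):
        moving = rotate(pattern, shift)
        yield shift, list(zip(pattern, moving))

# Hypothetical six-note pattern as MIDI pitch numbers (not taken from a score).
pattern = [64, 66, 71, 73, 74, 66]

stages = list(phase_process(pattern))

# Stages in which both voices play identical notes throughout.
unison_stages = [shift for shift, pairs in stages if all(a == b for a, b in pairs)]
```

A dozen lines suffice to enumerate every stage of the process, which is precisely the point made above: once the concept exists, its execution is almost mechanical, whereas nothing in this code would have suggested the concept in the first place.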
Even in traditional tonal styles, many musical pieces are driven by various ideas at the meta-level that do not come from standard tonality and composition technique: the rising and continuously reharmonized peak characterizing the formal evolution in Schumann’s Träumerei Op.15 #7; the key changes every second bar in John Coltrane’s Moment’s Notice, illustrating exactly the title’s concept; the ever-drifting, unstable key center in Bill Evans’s Time Remembered, illustrating the dreamlike vanishing nature evoked by the theme of the composition; the very idea to compose a piece solely based on the (unusual) major-third cycle in John Coltrane’s Giant Steps; the melodic note over ♭IImaj7 on the words “slightly out of tune” in Antônio Carlos Jobim’s Desafinado; the idea to compose an entire A-part on a single note and the B-part on simplistic scale movement in Jobim’s One Note Samba; the concept of “deformation” in classical sonata (Hepokoski and Darcy, 2006). Creative organization at the meta-level is not only ubiquitous in jazz and avant-garde music (Sutherland, 1994; Whittall, 2000), it is also common in non-Western music, such as the scalar expansion technique in classical North Indian Alaps (Widdess, 1981; Finkensiep et al., 2019). Generally, creativity at the meta-level is an integral part of music across history and cultures. Even simpler phenomena like motivic development and variation constitute ideas to establish higher-order organization. Also, musical humor, as discussed in Challenge 1, constitutes a major case of creativity at the meta-level.
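The major-third cycle mentioned for Giant Steps is a further case where the formal object is simple to state and compute. The following Python sketch is only an illustration: it enumerates the pitch classes reached by repeated transposition by a fixed interval. The starting pitch B and the descending direction are assumptions chosen for the example, and `interval_cycle` is a hypothetical helper.

```python
NOTE_NAMES = ["C", "Db", "D", "Eb", "E", "F", "Gb", "G", "Ab", "A", "Bb", "B"]

def interval_cycle(start, interval):
    """Pitch classes visited by repeatedly transposing `start` by `interval` semitones."""
    cycle = []
    current = start % 12
    while current not in cycle:
        cycle.append(current)
        current = (current + interval) % 12
    return cycle

# Descending major thirds (-4 semitones) starting on B (an illustrative choice).
major_third_cycle = [NOTE_NAMES[pc] for pc in interval_cycle(11, -4)]

# For comparison: the familiar cycle of fifths visits all twelve pitch classes.
fifth_cycle = interval_cycle(0, 7)
```

Because a major third divides the octave into three equal parts, the cycle closes after only three key centers, whereas the cycle of fifths passes through all twelve. Computing this is elementary; deciding to build an entire piece on it is the meta-level step that the text identifies as the creative act.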
The difficulties in addressing such challenges shall be illustrated by the hard problem of form embedding in a classical music problem setting. What would be required in order to model the embedding of a fugue into a sonata as in Liszt’s B-minor sonata, or Beethoven’s late works? First, such a model would require a successful model of fugue. Second, it would require a successful model of sonata form including everything from motives, to themes, to formal functions, to repetition structure and overarching coherence in terms of a dramaturgic plan. Third, it would need to come up with the idea to combine the two, and not as a sequence but within a part of the sonata form without breaking the overarching concept. Fourth, it may choose to generate a theme that works for a fugue as well as a sonata, and then, fifth, ensure smooth transitions and embedding within the overarching formal and dramaturgic plan. To conceive of a musical piece like this requires strong intelligence and creativity at the meta-level, even though the concrete milestones may differ for human and computational composition strategies. It is already a significant challenge to merely reverse-engineer this case, yet the hard problems lie in a creative model discovering the very idea of form combination, a problem class that is itself very hard to generalize or even to define in concrete terms. Challenges like this are hard nuts to crack and require a high degree of intelligent abstraction.
Musical quotations and allusions. Another frequent phenomenon across musical styles is the use of quotations and allusions, which mostly come with semantic or pragmatic intentions (see also Challenge 2). Simple, well-known examples are Berlioz quoting Dies Irae in his Symphonie fantastique or the Pet Shop Boys alluding to the Russian Anthem in Go West as discussed before. Another one is Hans Zimmer’s film music at the finale of Interstellar alluding to Stanley Kubrick’s quoting of Strauss’s Zarathustra reflecting on Nietzsche. John Scofield’s Not You Again is a contrafact over the chord changes of the jazz standard There Will Never Be Another You. Quotations and allusions may even be purely conceptual, without any audible event, such as Ligeti’s three silent Bagatelles referencing John Cage’s 4’33”. The application of quotations and allusions through sampling techniques is also ubiquitous in electronic and pop music and employed to establish all kinds of stylistic, intra- and extra-musical references, commentaries, or parodies (Shuker, 2013). The challenge of quotations for computational creativity lies in their nature as second-order compositional operations above first-order compositional techniques as well as in their semantic and pragmatic content. The fact that such kinds of creativity can arise in an entirely spontaneous manner, such as Michel Petrucciani and Eddy Louiss discovering and adopting When the Saints Go Marching In within the different tune Caraibes (in their album Conférence de Presse II), highlights the extent to which such higher-order concepts and strategies are easily and readily available for human musicians, even while playing and solving other more immediate tasks of musical performance, improvisation and creation.
The range of points and examples above outlines various challenges to musical AI and general artificial music creation. Overall, the issues raised concern (a) the requirement of a cognitive model for a wide range of musical phenomena, (b) the need for complex world-knowledge and world-reference in order to understand and to set up forms of musical meaning, (c) the ways in which embodied cognition and performative context shape music, and (d) the ways in which creativity at the meta-level plays a major role in music creation across the board. As it often involves all four challenges, generating musical humor is probably one of the hardest problems for musical AI.
Human music in general is a very open and diverse phenomenon. The wide range of cases and examples raised in the context of the four challenges illustrates what the general human musical creative capacity is capable of and renders it an AI-complete problem. As pointed out, even seemingly simple genres like Bach’s chorales that are used as standard cases (Ebcioglu, 1992; Allan and Williams, 2005; Hadjeres et al., 2017) cannot be fully isolated and comprehensively modeled without their world reference and creativity at the meta-level. Because of music’s AI-completeness, it is sensible that tests of general artificial intelligence have been proposed that rely on musical creativity (e.g. the Musical Directive Toy Test, or the Musical Output Toy Test, Ariza, 2009).
The overarching problem that all these challenges point to is that major aspects of music likely cannot be explained (and thereby modeled) by reference to musical structure or a dataset of musical examples alone, but that messy and complex factors of the surrounding world contribute to making music human and relevant for humans. Conversely, it is no coincidence that breakthroughs in AI such as the Atari game models involve making a world (external to the model) part of the modeling procedure (Mnih et al., 2013). Despite major breakthroughs like GPT-3, it will be hard to overcome some of the challenges by mere learning from ever larger musical datasets. Many of the examples of creativity at the meta-level above are unique, with their overarching idea hard to abstract and to generalize from sparse cases without strong abstracting intelligence. Such creations may come out distorted or strange, for instance, because the concept may be only carried out halfway, or underlying world references may be messed up. New methodological innovation in model architectures may be needed in order to model aspects of creativity at the meta-level in more advanced ways.
While problems of creativity at the meta-level and examples like the ones discussed are certainly worth detailed attention and investigation, creative modeling efforts should not be slowed down or limited by dragging the problem setting exclusively onto the court of human exercises and forms of creativity. There are plenty of innovative and impressive creative models entirely remote from traditional composition tasks and techniques. On the contrary, successful computational creativity will likely produce ideas and musical strategies that are very different from human ones; nonetheless, the goal of artificial creativity remains music relevant for humans (pace Loughran and O’Neill, 2017).
Although it is almost a cliché or a taboo in AI research, it is useful to reflect on why humans make music at all. Being much more than a play of structure and a stimulus for background, mood and dancing, music created for humans and their societies plays indispensable roles in social contexts and situations, political expression, rituals, religion, bonding, mother-infant interaction, or artistically reflecting shades of the human condition of existence. Such aspects comprise essential motivations for artists and social groups, and constitute major driving forces of music in human evolution and history (Wallin et al., 2000; Morley, 2013; Cross, 2003; Huron, 2001; Honing, 2018). While such aspects leave their traces in musical structure as illustrated above, they involve full and humanized intelligence and hence require strong general AI, which is why they play less of a role in (practical) research on computational music creativity.10 Yet, they are useful for a general perspective as well as for a reference and ground for further discussion in light of rapidly advancing computational achievements (Brown, 2021).
The four challenges as well as the cases outlined above may help to inspire or provoke future developments in artificial musical creativity. They may provide conceptual clarifications, point out blind spots, or give inspiration for new kinds of model architectures, milestone problems, setups or evaluation procedures. It will require interdisciplinary exchange with research insights from cognitive science, psychology, neuroscience, music theory, and cross-cultural research to tackle the challenges of strong AI in a domain as complex as human music.
To conclude, the consequences of advancing artificial musical creativity deserve brief reflection. Despite bleak predictions, chess computers did not end the human chess sport and professional competitions, and they did not result in chess games becoming over-explored and uninteresting. On the contrary, they generated a huge resource of innovation, lifting the game to a new level (Sadler and Regan, 2019). What will be the consequences of advanced intelligent musical composition and musical tools, of machine performers that outperform humans, or of gazillions of machine compositions flooding the market? Developments and breakthroughs in artificial musical creativity will have major economic and social consequences and a significant impact on the balance of the art sector and its market dynamics, and they will come with new legal, intellectual property and copyright issues. Such topics require responsibility and more extensive discussion amongst experts and in the public sphere (advancing, e.g., discussions of Holzapfel et al., 2018; Ben-Tal et al., 2020; Sturm et al., 2019b; Brown, 2021).
3See, e.g., Kant (1790). The notion of the genius and its creativity, however, is far from absolute and rather represents a fairly recent idea in the history of Western aesthetics (e.g. Goehr, 1992; Bauman, 2004).
4This understanding of creativity touches upon the related philosophical problems of decision making, formation of free will, or even conscious agency (Wiggins, 2018; List, 2019; Cunningham, 2000; Glymour, 2015).
7Once creative value is almost exclusively placed on creativity at the meta-level, the “meta” strategy takes on a life of its own and consequently collapses the evaluation of creativity. The continuous loosening of assumptions and restrictions naturally produces creativity to a certain extent by transcending known strategies. On the downside, however, this process exhausts itself fairly quickly and leads to an accumulation of hollow strategies that make solutions within a known possibility space seem obvious or familiar, but no longer generate new and relevant problem solutions.
8See the Masterclass by András Schiff: https://www.theguardian.com/music/classical/page/0,,1943867,00.html.
9The example is adapted from Hans Liberg’s video at https://www.youtube.com/watch?v=wfm-3EbOXyE.
10See, however, Hofstadter’s impassioned critique based on such considerations (Hofstadter, 1979, ch. 19); for a different perspective, which argues for disregarding who music is for and avoiding a humanized perspective when evaluating creativity, see also Loughran and O’Neill (2017).
This research has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (GA No. 760081), and from the Swiss National Science Foundation (GA No. 182811). I thank Claude Latour for supporting this research through the Latour Chair in Digital Musicology. I owe particular thanks to the editors, the three reviewers, and to Christoph Finkensiep, Robert Lieck, Gabriele Cecchetti, and Markus Neuwirth for many valuable comments and discussions.
The author has no competing interests to declare.
Adams, S. S., Arel, I., Bach, J., Coop, R., Furlan, R., Goertzel, B., Hall, J. S., Samsonovich, A., Scheutz, M., Schlesinger, M., Shapiro, S. C., and Sowa, J. F. (2012). Mapping the landscape of human-level artificial general intelligence. AI Magazine, 33(1):25–42. DOI: https://doi.org/10.1609/aimag.v33i1.2322
Agres, K., Forth, J., and Wiggins, G. A. (2016). Evaluation of musical creativity and musical metacreation systems. Computers in Entertainment, 14(3):1–33. DOI: https://doi.org/10.1145/2967506
Allan, M. and Williams, C. (2005). Harmonising chorales by probabilistic inference. In Weiss, Y., Schölkopf, B., and Platt, J. C., editors, Advances in Neural Information Processing Systems (NIPS), pages 25–32.
Ariza, C. (2009). The interrogator as critic: The Turing Test and the evaluation of generative music systems. Computer Music Journal, 33(2):48–70. DOI: https://doi.org/10.1162/comj.2009.33.2.48
Bach, J. (2009). Principles of Synthetic Intelligence PSI: An Architecture of Motivated Cognition. Oxford University Press, Oxford. DOI: https://doi.org/10.1093/acprof:oso/9780195370676.001.0001
Barron, F. (1955). The disposition towards originality. Journal of Abnormal and Social Psychology, 51:478–485. DOI: https://doi.org/10.1037/h0048073
Bauman, T. (2004). Becoming original: Haydn and the cult of genius. The Musical Quarterly, 87(2):333–357. DOI: https://doi.org/10.1093/musqtl/gdh014
Boden, M. (2004). The Creative Mind: Myths and Mechanisms. Routledge, London, UK. DOI: https://doi.org/10.4324/9780203508527
Bringsjord, S., Bello, P., and Ferrucci, D. (2001). Creativity, the Turing Test, and the (better) Lovelace Test. Minds and Machines, 11(1):3–27. DOI: https://doi.org/10.1023/A:1011206622741
Brown, O. (2021). Sociocultural and design perspectives on AI-based music production: Why do we make music and what changes if AI makes it for us? In Miranda, E. R., editor, Handbook of Artificial Intelligence for Music: Foundations, Advanced Approaches, and Developments for Creativity, pages 1–20. Springer, Cham, CH. DOI: https://doi.org/10.1007/978-3-030-72116-9_1
Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D. M., Wu, J., Winter, C., Hesse, C., Chen, M., Sigler, E., Litwin, M., Gray, S., Chess, B., Clark, J., Berner, C., McCandlish, S., Radford, A., Sutskever, I., and Amodei, D. (2020). Language models are few-shot learners. In 34th Conference on Neural Information Processing Systems (NeurIPS).
Carnovalini, F. and Rodà, A. (2020). Computational creativity and music generation systems: An introduction to the state of the art. Frontiers in Artificial Intelligence, 3(14). DOI: https://doi.org/10.3389/frai.2020.00014
Christiansen, M. H. and Chater, N. (2008). Language as shaped by the brain. The Behavioral and Brain Sciences, 31(5):489–508. DOI: https://doi.org/10.1017/S0140525X08004998
Church, A. (1936). An unsolvable problem of elementary number theory. American Journal of Mathematics, 58:345–363. DOI: https://doi.org/10.2307/2371045
Clayton, M. (2005). Observing entrainment in Indian music performance: Video-based observational analysis of tanpura playing and beat marking. Musicae Scientiae, 11(1):27–60. DOI: https://doi.org/10.1177/102986490701100102
Colton, S., Charnley, J., and Pease, A. (2011). Computational creativity theory: The FACE and IDEA descriptive models. In Ventura, D., Gervás, P., Harrell, D., Maher, M. L., Pease, A., and Wiggins, G., editors, Proceedings of the 2nd International Conference on Computational Creativity (ICCC), pages 90–95, Mexico City.
Cross, I. (2013). “Does not compute”? Music as real-time communicative interaction. AI and Society, 28(4):415–430. DOI: https://doi.org/10.1007/s00146-013-0511-x
Csikszentmihalyi, M. (1991). Society, culture and person: A systems view of creativity. In Sternberg, R. J., editor, The Nature of Creativity: Contemporary Psychological Perspectives, pages 325–340. Cambridge University Press, Cambridge, UK.
Dahlhaus, C. (1991). The Idea of Absolute Music. University of Chicago Press, Chicago. DOI: https://doi.org/10.2307/431491
Dennett, D. C. (1971). Intentional systems. The Journal of Philosophy, 68(4):87–106. DOI: https://doi.org/10.2307/2025382
Ebcioglu, K. (1992). An expert system for harmonizing chorales in the style of J. S. Bach. In Balaban, M., Ebcioglu, K., and Laske, O., editors, Understanding Music With AI: Perspectives on Music Cognition, pages 294–333. MIT Press, Cambridge, MA.
Eco, U. (1976). A Theory of Semiotics. Indiana University Press, Bloomington, Indiana. DOI: https://doi.org/10.1007/978-1-349-15849-2
Eitan, Z. and Timmers, R. (2010). Beethoven’s last piano sonata and those who follow crocodiles: Cross-domain mappings of auditory pitch in a musical context. Cognition, 114(3):405–422. DOI: https://doi.org/10.1016/j.cognition.2009.10.013
Fernández, J. D. and Vico, F. (2013). AI methods in algorithmic composition: A comprehensive survey. Journal of Artificial Intelligence Research, 48:513–582. DOI: https://doi.org/10.1613/jair.3908
Finkensiep, C., Widdess, R., and Rohrmeier, M. (2019). Modelling the syntax of North Indian melodies with a generalized graph grammar. In Proceedings of the 20th International Society for Music Information Retrieval Conference (ISMIR), pages 462–469.
Gifford, T., Knotts, S., McCormack, J., Kalonaris, S., Yee-King, M., and d’Inverno, M. (2018). Computational systems for music improvisation. Digital Creativity, 29(1):19–36. DOI: https://doi.org/10.1080/14626268.2018.1426613
Guo, Z., Makris, D., and Herremans, D. (2021). Hierarchical recurrent neural networks for conditional melody generation with long-term structure. In Proceedings of the International Joint Conference on Neural Networks (IJCNN).
Hepokoski, J. and Darcy, W. (2006). Elements of Sonata Theory: Norms, Types, and Deformations in the Late-Eighteenth-Century Sonata. Oxford University Press, Oxford. DOI: https://doi.org/10.1093/acprof:oso/9780195146400.001.0001
Higham, T., Basell, L., Jacobi, R., Wood, R., Ramsey, C. B., and Conard, N. J. (2012). Testing models for the beginnings of the Aurignacian and the advent of figurative art and music: The radiocarbon chronology of Geißenklösterle. Journal of Human Evolution, 62(6):664–676. DOI: https://doi.org/10.1016/j.jhevol.2012.03.003
Hiller, L. (1970). Music composed with computers: A historical survey. In Lincoln, H. B., editor, The Computer and Music, pages 42–96. Cornell University Press, Ithaca, NY. DOI: https://doi.org/10.7591/9781501744167-007
Holland, S., Mudd, T., Wilkie-McKenna, K., McPherson, A., and Wanderley, M. M. (2019). New Directions in Music and Human-Computer Interaction. Springer, Cham, CH. DOI: https://doi.org/10.1007/978-3-319-92069-6
Holzapfel, A., Sturm, B. L., and Coeckelbergh, M. (2018). Ethical dimensions of music information retrieval technology. Transactions of the International Society for Music Information Retrieval, 1(1):44–55. DOI: https://doi.org/10.5334/tismir.13
Honing, H. (2018). The Origins of Musicality. MIT Press, Cambridge, MA. DOI: https://doi.org/10.7551/mitpress/10636.001.0001
Huang, C. Z. A., Hawthorne, C., Roberts, A., Dinculescu, M., Wexler, J., Hong, L., and Howcroft, J. (2019). Bach Doodle: Approachable music composition with machine learning at scale. In Proceedings of the 20th International Society for Music Information Retrieval Conference (ISMIR), pages 793–800.
Hurley, M. M., Dennett, D. C., and Adams, R. B. (2011). Inside Jokes: Using Humor to Reverse-Engineer the Mind. MIT Press, Cambridge, MA. DOI: https://doi.org/10.7551/mitpress/9027.001.0001
Huron, D. (2001). Is music an evolutionary adaptation? Annals of the New York Academy of Sciences, 930:43–61. DOI: https://doi.org/10.1111/j.1749-6632.2001.tb05724.x
Huron, D. (2006). Sweet Anticipation: Music and the Psychology of Expectation. MIT Press, Cambridge, MA. DOI: https://doi.org/10.7551/mitpress/6575.001.0001
Huron, D. (2016). Voice Leading: The Science Behind a Musical Art. MIT Press, Cambridge, MA. DOI: https://doi.org/10.7551/mitpress/9780262034852.001.0001
Huron, D. and Sellmer, P. (1992). Critical bands and the spelling of vertical sonorities. Music Perception, 10(2):129–149. DOI: https://doi.org/10.2307/40285604
Iñesta, J. M., Conklin, D., and Ramírez, R. (2016). Machine learning and music generation. Journal of Mathematics and Music, 10(2):87–91. DOI: https://doi.org/10.1080/17459737.2016.1216369
Jackendoff, R. (2007). Language, Consciousness, Culture: Essays on Mental Structure. MIT Press, Cambridge, MA. DOI: https://doi.org/10.7551/mitpress/4111.001.0001
Jackendoff, R. and Lerdahl, F. (2006). The capacity for music: What is it, and what’s special about it? Cognition, 100:33–72. DOI: https://doi.org/10.1016/j.cognition.2005.11.005
Jordanous, A. (2012). A standardised procedure for evaluating creative systems: Computational creativity evaluation based on what it is to be creative. Cognitive Computation, 4(3):246–279. DOI: https://doi.org/10.1007/s12559-012-9156-1
Kaliakatsos-Papakostas, M. and Queiroz, M. (2017). Conceptual blending of harmonic spaces for creative melodic harmonisation. Journal of New Music Research, 46(4):305–328. DOI: https://doi.org/10.1080/09298215.2017.1355393
Kirke, A. and Miranda, E. R. (2009). A survey of computer systems for expressive music performance. ACM Computing Surveys, 42(1). DOI: https://doi.org/10.1145/1592451.1592454
Kitts, T. M. and Baxter-Moore, N., editors (2019). The Routledge Companion to Popular Music and Humor. Routledge, New York, NY. DOI: https://doi.org/10.4324/9781351266642
Koelsch, S. (2011). Towards a neural basis of processing musical semantics. Physics of Life Reviews, 8(2):89–105. DOI: https://doi.org/10.1016/j.plrev.2011.04.004
Koelsch, S. and Jäncke, L. (2015). Music and the heart. European Heart Journal, 36(44):3043–3049. DOI: https://doi.org/10.1093/eurheartj/ehv430
Koelsch, S., Rohrmeier, M., Torrecuso, R., and Jentschke, S. (2013). Processing of hierarchical syntactic structure in music. Proceedings of the National Academy of Sciences of the United States of America, 110(38):15443–15448. DOI: https://doi.org/10.1073/pnas.1300272110
Korsakova-Kreyn, M. (2018). Two-level model of embodied cognition in music. Psychomusicology: Music, Mind, and Brain, 28(4):240–259. DOI: https://doi.org/10.1037/pmu0000228
Lakoff, G. and Johnson, M. (2003). Metaphors We Live By. University of Chicago Press, Chicago. DOI: https://doi.org/10.7208/chicago/9780226470993.001.0001
Le Guin, E. (2005). Boccherini’s Body: An Essay in Carnal Musicology. University of California Press, Berkeley. DOI: https://doi.org/10.1525/california/9780520240179.001.0001
List, C. (2019). Why Free Will Is Real. Harvard University Press, Cambridge, MA. DOI: https://doi.org/10.4159/9780674239807
Loughran, R. and O’Neill, M. (2017). Limitations from assumptions in generative music evaluation. Journal of Creative Music Systems, 2. DOI: https://doi.org/10.5920/JCMS.2017.12
Loughran, R. and O’Neill, M. (2020). Evolutionary music: Applying evolutionary computation to the art of creating music. Genetic Programming and Evolvable Machines, 21:55–85. DOI: https://doi.org/10.1007/s10710-020-09380-7
Lovelace, A. (1842). Translator’s notes to an article on Babbage’s Analytical Engine. In Taylor, R., editor, Scientific Memoirs: Selected from the Transactions of Foreign Academies of Science and Learned Societies, and from Foreign Journals, pages 691–731. Richard and John E. Taylor, London, UK, 3rd edition.
Luhmann, N. (2000). Art as a Social System. Stanford University Press, Stanford. DOI: https://doi.org/10.1515/9781503618763
Marsden, A. (2000). Music, intelligence and artificiality. In Miranda, E. R., editor, Readings in Music and Artificial Intelligence, pages 15–28. Harwood Academic Publishers, Amsterdam. DOI: https://doi.org/10.1007/978-1-4613-9080-0
Marsland, T. A. and Schaeffer, J., editors (1990). Computers, Chess, and Cognition. Springer, New York. DOI: https://doi.org/10.1007/978-94-009-8947-4
Miranda, E. R. (2021). Handbook of Artificial Intelligence for Music: Foundations, Advanced Approaches, and Developments for Creativity. Springer, Cham, CH. DOI: https://doi.org/10.1007/978-3-030-72116-9
Mitchell, M. (2021). Why AI is harder than we think. arXiv preprint, arXiv:2104.12871v2. DOI: https://doi.org/10.1145/3449639.3465421
Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., and Riedmiller, M. (2013). Playing Atari with deep reinforcement learning. In NIPS Deep Learning Workshop. arXiv preprint, arXiv:1312.5602.
Moran, N. (2013). Music, bodies and relationships: An ethnographic contribution to embodied cognition studies. Psychology of Music, 41(1):5–17. DOI: https://doi.org/10.1177/0305735611400174
Mordvintsev, A., Olah, C., and Tyka, M. (2015). Inceptionism: Going deeper into neural networks. In Google Research Blog, https://ai.googleblog.com/2015/06/inceptionism-going-deeper-into-neural.html
Morley, I. (2013). The Prehistory of Music: Human Evolution, Archaeology, and the Origins of Musicality. Oxford University Press, Oxford. DOI: https://doi.org/10.1093/acprof:osobl/9780199234080.001.0001
Papadopoulos, A., Roy, P., and Pachet, F. (2016). Assisted lead sheet composition using FlowComposer. In International Conference on Principles and Practice of Constraint Programming, pages 769–785. Springer, Cham. DOI: https://doi.org/10.1007/978-3-319-44953-1_48
Pearce, M. and Rohrmeier, M. (2012). Music cognition and the cognitive sciences. Topics in Cognitive Science, 4(4):468–484. DOI: https://doi.org/10.1111/j.1756-8765.2012.01226.x
Peretz, I. (2006). The nature of music from a biological perspective. Cognition, 100(1):1–32. DOI: https://doi.org/10.1016/j.cognition.2005.11.004
Peretz, I. and Zatorre, R. J. (2005). Brain organization for music processing. Annual Review of Psychology, 56:89–114. DOI: https://doi.org/10.1146/annurev.psych.56.091103.070225
Quick, D. and Hudak, P. (2013). Grammar-based automated music composition in Haskell. In Proceedings of the First ACM SIGPLAN Workshop on Functional Art, Music, Modeling & Design (FARM), pages 59–70. DOI: https://doi.org/10.1145/2505341.2505345
Roberts, A., Engel, J., Raffel, C., Hawthorne, C., and Eck, D. (2018). A hierarchical latent vector model for learning long-term structure in music. In Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden.
Rohrmeier, M. (2013). Musical expectancy: Bridging music theory, cognitive and computational approaches. Zeitschrift der Gesellschaft für Musiktheorie, 10(2):343–371. DOI: https://doi.org/10.31751/724
Rohrmeier, M. (2020a). The syntax of jazz harmony: Diatonic tonality, phrase structure, and form. Music Theory and Analysis, 7(1):1–63. DOI: https://doi.org/10.11116/MTA.7.1.1
Rohrmeier, M. and Pearce, M. (2018). Musical syntax I: Theoretical perspectives. In Bader, R., editor, Springer Handbook of Systematic Musicology, pages 473–486. Springer, Berlin and Heidelberg. DOI: https://doi.org/10.1007/978-3-662-55004-5_25
Rohrmeier, M. A. and Koelsch, S. (2012). Predictive information processing in music cognition: A critical review. International Journal of Psychophysiology, 83(2):164–175. DOI: https://doi.org/10.1016/j.ijpsycho.2011.12.010
Runco, M. A. and Jaeger, G. J. (2012). The standard definition of creativity. Creativity Research Journal, 24(1):92–96. DOI: https://doi.org/10.1080/10400419.2012.650092
Sarkar, P. and Chakrabarti, A. (2011). Assessing design creativity. Design Studies, 32(4):348–383. DOI: https://doi.org/10.1016/j.destud.2011.01.002
Schiavio, A. and Benedek, M. (2020). Dimensions of musical creativity. Frontiers in Neuroscience, 14(578932). DOI: https://doi.org/10.3389/fnins.2020.578932
Schiavio, A., Menin, D., and Matyja, J. (2014). Music in the flesh: Embodied simulation in musical understanding. Psychomusicology: Music, Mind, and Brain, 24(4):340–343. DOI: https://doi.org/10.1037/pmu0000052
Schlenker, P. (2017). Outline of music semantics. Music Perception, 35(1):3–37. DOI: https://doi.org/10.1525/mp.2017.35.1.3
Schlenker, P. (2019). Prolegomena to music semantics. Review of Philosophy and Psychology, 10(1):35–111. DOI: https://doi.org/10.1007/s13164-018-0384-5
Schmalfeldt, J. (1992). Cadential processes: The evaded cadence and the “one more time” technique. Journal of Musicological Research, 12(1–2):1–52. DOI: https://doi.org/10.1080/01411899208574658
Searle, J. (1980). Minds, brains and programs. Behavioral and Brain Sciences, 3(3):417–457. DOI: https://doi.org/10.1017/S0140525X00005756
Shuker, R. (2013). Understanding Popular Music. Routledge, London and New York. DOI: https://doi.org/10.4324/9780203188019
Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai, M., Guez, A., Lanctot, M., Sifre, L., Kumaran, D., Graepel, T., Lillicrap, T., Simonyan, K., and Hassabis, D. (2018). A general reinforcement learning algorithm that masters chess, shogi, and go through self-play. Science, 362(6419):1140–1144. DOI: https://doi.org/10.1126/science.aar6404
Stein, M. I. (1953). Creativity and culture. Journal of Psychology, 36:311–322. DOI: https://doi.org/10.1080/00223980.1953.9712897
Stobart, H. and Cross, I. (2000). The Andean anacrusis? Rhythmic structure and perception in Easter songs of Northern Potosí, Bolivia. British Journal of Ethnomusicology, 9(2):63–92. DOI: https://doi.org/10.1080/09681220008567301
Sturm, B. L., Ben-Tal, O., Monaghan, U., Collins, N., Herremans, D., Chew, E., Hadjeres, G., Deruty, E., and Pachet, F. (2019a). Machine learning research that matters for music creation: A case study. Journal of New Music Research, 48(1):36–55. DOI: https://doi.org/10.1080/09298215.2018.1515233
Sturm, B. L. T., Iglesias, M., Ben-Tal, O., Miron, M., and Gómez, E. (2019b). Artificial intelligence and music: Open questions of copyright law and engineering praxis. Arts, 8(3):115. DOI: https://doi.org/10.3390/arts8030115
Turing, A. M. (1950). Computing machinery and intelligence. Mind, 59(236):433–460. DOI: https://doi.org/10.1093/mind/LIX.236.433
Varela, F. J., Thompson, E., Rosch, E., and Kabat-Zinn, J. (2016). The Embodied Mind: Cognitive Science and Human Experience. MIT Press, Cambridge, MA. DOI: https://doi.org/10.7551/mitpress/9780262529365.001.0001
West, R. and Horvitz, E. (2019). Reverse-engineering satire, or “Paper on computational humor accepted despite making serious advances”. In 33rd AAAI Conference on Artificial Intelligence. AAAI Press. DOI: https://doi.org/10.1609/aaai.v33i01.33017265
Widdess, R. (1981). Aspects of form in North Indian ālāp and dhrupad. In Widdess, D. and Wolpert, R., editors, Music and Tradition: Essays Presented to Laurence Picken, pages 143–181. Cambridge University Press, Cambridge, UK.
Widmer, G. and Goebl, W. (2004). Computational models of expressive music performance: The state of the art. Journal of New Music Research, 33(3):203–216. DOI: https://doi.org/10.1080/0929821042000317804
Wiggins, G. A. (2006). A preliminary framework for description, analysis and comparison of creative systems. Knowledge-Based Systems, 19(7):449–458. DOI: https://doi.org/10.1016/j.knosys.2006.04.009
Wiggins, G. A. (2012a). Computer models of (music) cognition. In Rebuschat, P., Rohrmeier, M. A., Hawkins, J. A., and Cross, I., editors, Language and Music as Cognitive Systems, pages 169–188. Oxford University Press, Oxford. DOI: https://doi.org/10.1093/acprof:oso/9780199553426.003.0018
Wiggins, G. A. (2012b). Music, mind and mathematics: Theory, reality and formality. Journal of Mathematics and Music, 6(2):111–123. DOI: https://doi.org/10.1080/17459737.2012.694710
Wiggins, G. A. (2018). Creativity, information, and consciousness: The information dynamics of thinking. Physics of Life Reviews, 1:1–39. DOI: https://doi.org/10.1016/j.plrev.2018.05.001
Wiggins, G. A., Müllensiefen, D., and Pearce, M. T. (2010). On the non-existence of music: Why music theory is a figment of the imagination. Musicae Scientiae, 14(1, suppl.):231–255. DOI: https://doi.org/10.1177/10298649100140S110
Wiggins, G. A., Tyack, P., Scharff, C., and Rohrmeier, M. A. (2015). The evolutionary roots of creativity: Mechanisms and motivations. Philosophical Transactions of the Royal Society B: Biological Sciences, 370(1664):20140099. DOI: https://doi.org/10.1098/rstb.2014.0099
Winograd, T. (1972). Understanding Natural Language. Academic Press, New York. DOI: https://doi.org/10.1016/0010-0285(72)90002-3
Witek, M. A., Clarke, E. F., Wallentin, M., Kringelbach, M. L., and Vuust, P. (2014). Syncopation, body-movement and pleasure in groove music. PLoS ONE, 9(4):e94446. DOI: https://doi.org/10.1371/journal.pone.0094446