Although music is full of repetitive motifs and themes, artificially intelligent temporal sequence models have yet to demonstrate the ability to model or generate musical compositions that satisfy the steerable, long-range constraints needed to evoke such repetitions. Markovian approaches inherently assume a strictly limited range of memory, while neural approaches—despite recent advances in evoking long-range dependencies—remain largely unsteerable. More recently, models have been published that attempt to evoke repetitive motifs by imposing unary constraints at intervals or by collating copies of musical segments. Although the results of these methods satisfy long-range dependencies, they come with significant—potentially prohibitive—sacrifices in the musical coherence of the generated composition or in the breadth of satisfying compositions the model can create. We present a method that generates compositions according to Markov probabilities, ensuring local cohesion throughout the sequence, while satisfying steerable, long-range, dynamic dependencies that allow the imposition of widespread motifs and repetitions.

Music is full of long-range dependencies. Examples include themes that recur at regular or irregular intervals; motifs across pitch, rhythm, and harmonic progression; and lengths of segments and phrases. The repetitive nature of music contributes to the fluency with which a song is processed by the human mind and has been linked to the success with which music earns popularity and renown (

Despite the significance of such long-range dependencies, nearly all computational models for algorithmic music composition exhibit significant limitations in their ability to model such dependencies. Resolving this discrepancy between the long-range dependencies in human-composed music and the lack thereof in computationally-composed music has been considered one of the foremost challenges for the next decade of music information research (

Although several models of composition have been proposed, they generally fall into one of a few common types. Neural approaches constitute one popular approach to sequence generation. Long short-term memory (LSTM) models reliably produce unique and musically-coherent sequences (

Markovian models represent a second common type of music composition model. Markov models are a class of probabilistic sequence models that fundamentally assume the composition of a sequence depends strictly on the immediate context, without regard for what precedes that context (

Others have focused on the specific case of imposing

Despite numerous efforts, methods have thus far not been found to impose constraints to control generation in deep learning and neural architectures (

Besides constraint-driven approaches, another common approach to creating long-range repetitive and phrasal structure in music is using a form of “copy-paste”, generating musical subsequences that are then duplicated at intervals (

We present a novel solution to address the long-range dependency problem. To our knowledge, this is the first method which has been demonstrated to generate compositions according to Markov probabilities—ensuring a local cohesion throughout the sequence—with steerable, long-range, dynamic dependencies, allowing the imposition of wide-spread motifs and repetitions. We demonstrate how the model is capable of generating music that simultaneously exhibits a variety of patterns and motifs in pitch, rhythm, harmony, and lyrics.

Our approach involves the combination of

Although our main purpose is ultimately to define a model capable of enforcing

The implementation and application of relational constraints (which we define below) has not yet been carefully examined in the context of Markov models. In our experience implementing such constraints, we happened upon some unique, initially unanticipated applications of these constraints that are worthy of mention. These are also discussed in this section.

In speaking of long- and short-range dependencies in a sequence of music, we can formally define a sequence as an assignment of values to random variables X_1, …, X_n, where each X_i takes a value from a finite alphabet Σ. A binary relation ρ over Σ is a subset of Σ × Σ. Examples include {(x_1, x_2) | x_1 and x_2 are notes of the same pitch} or {(x_1, x_2) | x_1 is a word that rhymes with x_2}.

Given a sequence of random variables X_1, …, X_n, a unary constraint (X_i, ρ) with ρ ⊆ Σ requires that X_i = x_i only if x_i ∈ ρ, and a binary relational constraint (X_i, X_j, ρ) with ρ ⊆ Σ × Σ requires that X_i = x_i and X_j = x_j only if (x_i, x_j) ∈ ρ (e.g., (X_3, {…})).
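These constraint types can be sketched as simple predicates. The following is an illustrative toy of our own (not the paper's implementation); positions are 0-indexed and the rhyme test is a crude suffix check standing in for a real rhyme relation:

```python
# Toy illustration (not the paper's code) of unary and binary relational
# constraints over a sequence of tokens.

def satisfies_unary(seq, i, allowed):
    """Unary constraint (X_i, rho): the value at position i must lie in rho."""
    return seq[i] in allowed

def satisfies_binary(seq, i, j, rho):
    """Binary relational constraint (X_i, X_j, rho): the pair must satisfy rho."""
    return rho(seq[i], seq[j])

same_pitch = lambda x, y: x == y          # e.g., notes of the same pitch
rhymes = lambda x, y: x[-2:] == y[-2:]    # crude stand-in for a rhyme relation

seq = ["a", "dish", "of", "pickled", "fish"]
print(satisfies_unary(seq, 0, {"a", "the"}))   # True
print(satisfies_binary(seq, 1, 4, rhymes))     # True: "dish" / "fish"
```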

A Markov model is a pair M = (P_i, P_T), where P_i is an initial probability distribution over Σ (the P_i(x) sum to 1 over all x ∈ Σ) and P_T is a transition probability matrix (for each x, the P_T(x, y) sum to 1 over all y ∈ Σ), which together define a probability distribution over sequences (X_1, …, X_n).

As defined by Pachet et al. (…), a non-homogeneous Markov model (NHMM) generalizes the Markov model M = (P_i, P_T) by allowing the transition probabilities to vary by position in the sequence.

Sequences that satisfy the imposed constraints are generated by the NHMM in proportion to their probability under the original Markov model.

An NHMM for sequences of length l is represented by a sequence of transition matrices M_1, …, M_{l−1}, where each matrix M_i is derived from P_T by zeroing entries that would violate a constraint and renormalizing.

The properties, construction, and normalization of the NHMM are defined for a first-order model, which generates a sequence according to the probability function P(x_1, …, x_n) = P(x_1) · P(x_2 | x_1) · … · P(x_n | x_{n−1}). However, a Markov model of order d conditions each element on the d preceding elements: P(x_i | x_1, …, x_{i−1}) = P(x_i | x_{i−d}, …, x_{i−1}). The fact that a model of order d can be converted to an equivalent first-order model over the alphabet Σ^d (where each element is a d-tuple of adjacent symbols) allows the NHMM formulation to be applied to higher-order models as well.

In a first-order model over Σ^d, a transition from a d-tuple x_j to a d-tuple x_k is valid only if the last d − 1 symbols of x_j match the first d − 1 symbols of x_k.
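The standard order-d to first-order conversion can be sketched as follows (our own illustration, not the paper's code):

```python
# Sketch of the standard conversion: an order-d Markov model over an alphabet
# becomes a first-order model whose states are d-tuples of adjacent symbols;
# a transition u -> v is valid only when v shifts u along by one symbol.
from itertools import product

def tuple_states(alphabet, d):
    """All |alphabet|^d states of the equivalent first-order model."""
    return list(product(alphabet, repeat=d))

def valid_transition(u, v):
    """u, v: d-tuples; they must overlap in their last/first d-1 symbols."""
    return u[1:] == v[:-1]

states = tuple_states(["C", "E", "G"], 2)
print(len(states))                                # 9 states
print(valid_transition(("C", "E"), ("E", "G")))   # True: overlap on 'E'
print(valid_transition(("C", "E"), ("G", "C")))   # False
```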

Figure: Pruning of NHMM states X_1, X_2, and X_3 according to two unary constraints: c_3 = (X_3, {…}) and c_4 = (X_4, {…}). States marked with an ‘X’ are pruned due to the c_3 constraint. This constraint is an example of a floating constraint in that the part of speech (POS) constraint is effectively satisfied by any satisfying token appearing at sequence positions 3, 4, or 5. States marked with a black ‘X’ are pruned due to the further addition of the c_4 rhyme constraint. The c_4 constraint is an example of a dynamic constraint in that in the 6-word sequence generated from the model, the rhyme group relating the 4th and 6th words is dynamically chosen at run time. Grey transitions represent transition probabilities that are zeroed as a result of applying or ensuring arc-consistency of constraints.

The NHMM model was originally defined to enforce unary constraints. With a higher-order model over tuples, however, a binary relational constraint can be recast as a unary constraint: a rhyme dependency, for example, can be imposed by pruning the domain of X_3 to remove 3-tuples in which the first and last words do not rhyme (see

In a traditional first-order NHMM, the matrix M_{i−1} represents transitions from X_{i−1} to X_i, and a unary constraint (X_i, ρ) is enforced by zeroing the columns of M_{i−1} corresponding to values excluded by ρ. However, because M_{i−1} represents pairs of adjacent values, a binary relational constraint (X_{i−1}, X_i, ρ) can likewise be enforced by zeroing the individual entries of M_{i−1} corresponding to pairs that violate ρ. Thus without increasing the polynomial runtime required for NHMM construction, we can extend the range of short-range dependencies from the Markov order (
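As a toy sketch of this idea (our own illustration, using a hypothetical "ascending pitch" relation rather than any constraint from the paper), a relational constraint between adjacent positions can be imposed by zeroing matrix entries and renormalizing rows:

```python
# Sketch (our illustration): enforce a binary relational constraint
# (X_{i-1}, X_i, rho) by zeroing the entries of the position-specific
# transition matrix whose (row, col) pair violates rho, then renormalizing.

def constrain_matrix(M, alphabet, rho):
    out = {}
    for x in alphabet:
        row = {y: (p if rho(x, y) else 0.0) for y, p in M[x].items()}
        z = sum(row.values())
        # Rows left all-zero correspond to dead states that arc-consistency
        # would subsequently prune.
        out[x] = {y: p / z for y, p in row.items()} if z > 0 else row
    return out

alphabet = ["C", "D", "E"]
M = {x: {y: 1 / 3 for y in alphabet} for x in alphabet}
ascending = lambda x, y: x < y      # hypothetical toy relation: pitch must ascend
M1 = constrain_matrix(M, alphabet, ascending)
print(M1["C"])  # only D and E remain, each with probability 0.5
```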

See endnote 1 for a link to our source code for building higher-order NHMMs with relational constraints.

There are many different types of dependencies that can be represented as relational constraints in higher-order NHMMs. We will look at two examples here. The same types of dependencies are subsequently demonstrated as long-range dependencies, but the narrower range of short-range dependencies provides a better context for an introduction. For ease of understanding we use natural language to demonstrate the examples, but they apply just as powerfully to sequences of pitches, rhythms, chords, or other musical viewpoints. When referencing parts of speech (POS), we use the POS tags from the Penn Treebank Project.

As mentioned in the introduction, one way of evoking the impression that two positions of a composition are related (e.g., in a repeated motif) is to impose identical or related unary constraints at each position. For example, Barbieri et al. (

Using static constraints to create (the illusion of) dependencies comes with huge limitations. As an example, consider that the rhyme group to which “today” belongs is only one of thousands of possible rhyme groups, meaning the range of viable solutions that are accessible to the model is reduced by several orders of magnitude. In the case of music where we might constrain pitches, rhythms, or chords to match in order to create a motif, static constraints have a similarly reducing effect on the solution space, with the exact reduction being equal to the size of the pitch, rhythm, and chord domains. This means an 88× reduction to model a pitch dependency with static constraints (if we limit ourselves to the range of pitches on a standard keyboard); a 15× reduction to model a rhythm dependency (if our shortest duration is a 16th note, our longest a whole note, and allowing double-dotted durations); and a 4,000× or more reduction to model chords.

Relational constraints in higher-order NHMMs enable a class of dependency we call dynamic dependencies.

A dynamic dependency is achieved in a higher-order NHMM by constraining a state X_i or a transition from X_i to X_{i+1} to represent a subsequence in which two elements of interest satisfy the desired property (e.g., rhyme). Without additional modifications to the NHMM, this requires that the dynamically constrained positions be within the Markov order of one another.

In many applications the dynamic constraints of interest relate elements at very distant positions. Such dynamic constraints are not well-suited for the approach we have outlined thus far: as the Markov order grows to span the constrained positions, the state space, and with it the time and memory needed to build the model, grows exponentially.

The examples in Section 2.4 use values of

Often when composing a sequence with dependencies, it is desirable to allow some flexibility in precisely which position satisfies the dependency. Because each state in a higher-order NHMM represents a tuple of adjacent elements, a constraint on a transition (X_{i−d}, X_{i−d+1}, ρ) can be defined to be satisfied when any of the elements it represents satisfies the desired property; we call such dependencies floating dependencies.

An example of the application of floating dependencies is in defining semantic constraints. Barbieri et al. (

Floating dependencies can also be used to avoid over-restricting syllable-level NHMMs when imposing word-level constraints. Using a Markov process over words, Barbieri et al. (

Floating dependencies solve this problem by providing a means of imposing word-level POS templates without sacrificing expressiveness. Associated with each syllable is the POS of the word from which the syllable was derived, enabling extraction of a floating POS constraint on the transition (X_i, X_{i+1}, ρ) that is satisfied when any syllable in the transition derives from a word with the required POS.
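A floating POS check of this kind can be sketched as follows (our own toy illustration; the syllable spans and tags are hypothetical, using Penn Treebank tag names):

```python
# Sketch (our illustration): a floating word-level POS requirement is satisfied
# if ANY syllable in a window derives from a word with that POS, rather than
# pinning the POS to one exact syllable position.

def satisfies_floating_pos(syllable_pos_tags, window, required_pos):
    """window: (start, end) span of syllable positions, end exclusive."""
    start, end = window
    return any(tag == required_pos for tag in syllable_pos_tags[start:end])

# Syllables of "a pillar that was a mirror", each tagged with the Penn
# Treebank POS of its source word.
tags = ["DT", "NN", "NN", "WDT", "VBD", "DT", "NN", "NN"]
print(satisfies_floating_pos(tags, (1, 4), "NN"))   # a noun syllable in 1..3
print(satisfies_floating_pos(tags, (3, 5), "NN"))   # no noun syllable in 3..4
```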

In each of the three following examples we demonstrate a unique advantage afforded by dynamic and/or floating dependencies in NHMMs.

We demonstrate the use of dynamic dependencies in the context of a lyric-generation problem which we call the Down by the Bay (DBTB) problem: generating novel rhyming endings in the style of the children's song “Down by the Bay”.

After careful consideration of several prototypical solutions to the DBTB problem, we found that solutions generally followed one of 5 rhythmic templates (i.e., sequences of stressed [1] and unstressed [0] syllables): [011001], [0110101], [01010010], [01101011], or [01010101010]. Because satisfying solutions vary in length and the length of an NHMM-generated sequence X_1, …, X_n is fixed, we trained a separate syllable-level NHMM for each template. Each model imposes the following dependencies:

(1) a unary constraint on X_1 (…);

(2) a unary constraint involving X_1 (…);

(3) unary stress constraints derived from the rhythmic template: each syllable is constrained to be stressed if the corresponding template value is 1 and unstressed if it is 0;

(4) a dynamic dependency requiring the rhyming positions to rhyme (…);

(5) a second dynamic dependency spanning the rhyming positions, conditioned on the value of X_n (…);

(6) a floating POS dependency (X_2, X_3, {…}).
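The stress constraints derived from a rhythmic template can be sketched as follows (a toy illustration of our own; the template and the stressed/unstressed syllable sets are hypothetical):

```python
# Sketch (our illustration): turn a rhythmic template of stressed (1) /
# unstressed (0) syllables into per-position unary constraints, i.e., the
# set of syllables allowed at each position.

def stress_constraints(template, stressed, unstressed):
    """Return, for each template position, the set of allowed syllables."""
    return [stressed if t == 1 else unstressed for t in template]

template = [0, 1, 0, 1, 0, 1]               # hypothetical toy template
stressed = {"dish", "pick", "fish"}         # hypothetical stressed syllables
unstressed = {"a", "of", "led"}             # hypothetical unstressed syllables
allowed = stress_constraints(template, stressed, unstressed)

seq = ["a", "dish", "of", "pick", "led", "fish"]
print(all(s in dom for s, dom in zip(seq, allowed)))  # True
```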

| | NHMM 1 | NHMM 2 | NHMM 3 | NHMM 4 | NHMM 5 |
|---|---|---|---|---|---|
| Markov order (d) | 4 | 5 | 5 | 6 | 8 |
| NHMM length (l) | 3 | 3 | 4 | 3 | 4 |
| Syllable sequence length (n) | 6 | 7 | 8 | 8 | 11 |
| Rhythmic template | [011001] | [0110101] | [01010010] | [01101011] | [01010101010] |
| Training sentences | 3,892,039 | 2,654,884 | 2,654,884 | 2,051,040 | 1,239,850 |
| Training pronunciations | 366,062,046 | 286,075,704 | 286,075,704 | 255,086,072 | 208,330,754 |
| Solutions generated | 5 | 5 | 4 | | |
| Generated example | “a dish of pickled fish” | “a cot and a chamber pot” | “a pillar that was a mirror” | “a mouse or a rat in the house” | |

| | Computer-generated | Human-generated |
|---|---|---|
| Average novelty | 3.65 | 3.66 |
| Average rhyme | 4.18 | 4.00 |
| Average rhythm | 3.12 | 3.16 |
| Average amusement | 2.39 | 2.48 |
| Average likability | 2.51 | 2.66 |

Note that dependencies 4 and 5 are examples of dynamic dependencies spanning the rhyming positions. Dependency 6 is an example of a floating dependency that uses a syntax tree with POS templates derived from traditional DBTB solutions (see

As training data for our DBTB models we used the Corpus of Contemporary American English (COCA) (

We trained five NHMMs, one for each of the rhythmic templates identified above. All but one of the five trained models found satisfying solutions (see

To evaluate the effectiveness of higher-order NHMMs, we performed a quantitative assessment of subjective reactions to the DBTB solutions generated by our models. We obtained 62 human-generated solutions by inviting university faculty and students to “come up with your own novel ending”. Of these we sampled a random subset to obtain an equal number of computer-generated and human-generated solutions (44 of each). We conducted 94 surveys in which participants were asked to rate five DBTB solutions (randomly and evenly sampled from our mixed pool) on Likert scales for novelty, rhyme, rhythm, amusement, and overall likability (each solution thus being rated an average of 5.34 times). Results are shown in

The top five highest-scoring computer-generated solutions, their scores (averaged over the 5 criteria), and the NHMM that created them were:

“a way out of the bay” – 4.04 – NHMM 1

“a place in Chevy Chase” – 3.85 – NHMM 1

“a sign of the decline” – 3.80 – NHMM 1

“a scar shaped like a star” – 3.75 – NHMM 1

“a stream that wound like a dream” – 3.74 – NHMM 2

The five median-scoring computer-generated solutions were:

“a boy with a new toy” – 3.16 – NHMM 1

“a tree where I could see” – 3.06 – NHMM 1

“a thread from the side of his head” – 3.04 – NHMM 4

“a sheet over the seat” – 3.03 – NHMM 1

“a sty in her left eye” – 3.00 – NHMM 1

The five lowest-scoring computer-generated solutions were:

“a pillar that was a mirror” – 2.7 – NHMM 3

“a man who trained and ran” – 2.67 – NHMM 1

“a ticket at the last minute” – 2.51 – NHMM 3

“a comer of the room mother” – 2.25 – NHMM 3

“a vision of the ambition” – 2.24 – NHMM 3

We found that the novelty scores of computer-generated solutions were competitive with those of solutions generated by humans. The rhyme scores were also very competitive, with the difference being partially explained by our failure to consider the matching of syllable onsets in multisyllabic words (e.g., “pillar”, “mirror”) in NHMM 3. We hypothesize that the difference observed in the rhythm score may be due to the confounding variables of rhyme and grammaticality. For example, the poor rhyming performance of NHMM 3 (the only model with multisyllabic rhymes) may have carried over into the ratings for other aspects of NHMM 3 solutions. Likewise it seems a reasonable hypothesis that, although poor grammar does not affect rhythm, it may nonetheless be perceived to affect the rhythm; however, we did not include grammaticality in our assessment. The amusement and likability scores, with which the computer struggled most, may have been affected by the genre-appropriateness of some of the computer-generated solutions (e.g., “a flood of spurting blood”, “an orphan and an abortion”).

To demonstrate the generality of floating dependencies, we designed two different syllable-level NHMMs to generate haiku with floating semantic dependencies. The first was a 5th-order NHMM of length 13 with a

We imposed a

For generating prosodic rhythm, we designed a 4th-order NHMM over rhythm tokens of length 6 to generate rhythms for the lyrics “No more monkeys jumping on the bed”. We trained the model on lyric/rhythm sequences from the Wikifonia dataset. Given the rhythmic stress template of the lyrics, [111010101], the model imposes floating stress constraints requiring only two thirds of syllable stresses to match appropriately to emphasized rhythmic positions (one constraint over the first five syllables and another over the last four). The model has 4/4 floating time signature constraints at all positions (see

In the first example in

Most motifs and dependencies in music occur across ranges far longer than any reasonable Markov order. To address this need we demonstrate a new method for modeling long-range dependencies with NHMMs. To our knowledge, this is the first method which has been demonstrated to generate compositions according to Markov probabilities, ensuring a local cohesion throughout the sequence, with long-range, dynamic dependencies, allowing the imposition of wide-spread motifs and repetitions. There is no limit on the range of dependencies using this approach, and the Markov order can be chosen without impact on the ability to impose long-range dependencies. We demonstrate the abilities of this model by generating sheet music compositions with verse-chorus structures, including repeating pitch, rhythm, harmonic, and lyric motifs.

An important key to generating music with long-range dependencies is the recognition that such dependencies can be expressed as regular constraints, i.e., enforced via a deterministic finite automaton (DFA).

The exposition of the method for imposing long-range dependencies relies on the definitions of relational constraints, Markov models, and NHMMs provided in Section 2.1. In addition, we define a DFA as a quintuple (Q, Σ, δ, q_0, F), where:

Q is a finite set of states;

Σ is a finite set of symbols termed the alphabet;

δ : Q × Σ → Q is a transition function;

q_0 ∈ Q is an initial state; and

F ⊆ Q is a set of accept states.

A sequence (x_1, …, x_n) is accepted by the DFA if there exists a sequence of states (q_0, …, q_n) such that δ(q_{i−1}, x_i) = q_i for 1 ≤ i ≤ n and q_n ∈ F.
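The acceptance condition can be sketched directly. The following toy DFA is our own construction (not one from the paper): it tracks the pair (first symbol, last symbol) and accepts when they match, a minimal repetition constraint.

```python
# Standard DFA acceptance, following the quintuple definition (Q, S, delta, q0, F).

def accepts(delta, q0, accept, seq):
    """delta maps (state, symbol) -> state; missing transitions reject."""
    q = q0
    for x in seq:
        if (q, x) not in delta:
            return False
        q = delta[(q, x)]
    return q in accept

# Toy DFA over {a, b}: states name (first, last) symbol seen; "s" is the start.
delta = {("s", "a"): "aa", ("s", "b"): "bb",
         ("aa", "a"): "aa", ("aa", "b"): "ab",
         ("ab", "a"): "aa", ("ab", "b"): "ab",
         ("bb", "b"): "bb", ("bb", "a"): "ba",
         ("ba", "b"): "bb", ("ba", "a"): "ba"}
accept = {"aa", "bb"}
print(accepts(delta, "s", accept, list("aba")))  # True: first == last
print(accepts(delta, "s", accept, list("ab")))   # False
```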

Note that when combining a DFA and a Markov model, the domain Σ of the variables X_1, …, X_n and the alphabet Σ of the DFA must be the same.

Given a set of binary relational constraints of the form (X_i, X_j, ρ), our goal is to construct a DFA that accepts exactly the sequences s = s_1 … s_n ∈ Σ^n such that (s_j, s_k) ∈ ρ for each constraint (X_j, X_k, ρ) and such that the Markov probability of s is greater than 0.

The algorithm takes the general approach of building breadthwise a tree-like DFA where each layer (of edges) in the tree represents a position in the sequence to be generated. Each state q_i is labeled with a triple ⟨S_{q_i}, V_{q_i}, q_i⟩, where S_{q_i} records the previously seen symbols that remain relevant to unresolved constraints and V_{q_i} records the constraints (X_k, ρ) that remain to be satisfied along paths through q_i.

When expanding the DFA from a state ⟨S_{q_i}, V_{q_i}, q_i⟩ in layer j, a new state ⟨S_{q′}, V_{q′}, q′⟩ is created for each symbol x_j such that 1) x_j is a valid transition according to the Markov model and 2) x_j does not violate any constraint (X_j, X_k, ρ) recorded in V_{q_i}. The sets S_{q′} and V_{q′} are updated accordingly, and a state in the final layer n is an accept state only if all constraints are satisfied along the paths reaching it.

Figure: A DFA constructed for a dynamic rhyme constraint (X_1, X_4, ρ_rhyme).

The time and space requirements for the algorithm vary significantly depending on the inputs. Each constraint in

One way to optimize the construction of a constraint-enforcing DFA is to reduce a binary constraint to a unary constraint whenever one of its two positions is fixed: if X_1 is constrained to the single value “Mary”, for example, a binary constraint (X_1, X_4, ρ) reduces to the unary constraint (X_4, ρ_Mary), where ρ_Mary = {y | (“Mary”, y) ∈ ρ}.
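This reduction can be sketched as follows (our own toy illustration; the vocabulary and the crude suffix-based rhyme test are hypothetical):

```python
# Sketch (our naming): when one side of a binary constraint (X_1, X_4, rho) is
# fixed to a single value v, the constraint reduces to the unary constraint
# (X_4, {y : (v, y) in rho}), so no DFA states are needed to remember X_1.

def reduce_to_unary(rho, alphabet, fixed_value):
    return {y for y in alphabet if rho(fixed_value, y)}

rhymes = lambda x, y: x[-2:] == y[-2:]     # crude stand-in for a rhyme relation
alphabet = {"Mary", "carry", "lamb", "jam"}
rho_mary = reduce_to_unary(rhymes, alphabet, "Mary")
print(sorted(rho_mary))  # ['Mary', 'carry']
```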

Given a DFA and a Markov model M, we define the probability of a sequence (x_1, …, x_n) accepted by the DFA to be proportional to its probability under M, with sequences rejected by the DFA assigned zero probability.

As automata lack probabilities, we use the DFA (Q, Σ, δ, q_0, F) together with the Markov model M = (P_i, P_T) to construct an NHMM over X_1, …, X_n that generates exactly the accepted sequences, each in proportion to its Markov probability P(x_1, …, x_n).

The set of initial probabilities and the position-specific transition probability functions of the NHMM are derived from P_i and P_T by zeroing the probabilities of transitions not permitted by the DFA and renormalizing.

A new set of unary “state-sensitive” constraints (X_i, ρ) is derived from the DFA to restrict each position of the NHMM to values consistent with the automaton's states.

The final step of Algorithm 2 constructs an NHMM from the unnormalized transition matrices.
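The overall pipeline can be sketched as follows. This is our own simplified illustration, not the paper's Algorithm 2: it uses a first-order model, a toy two-symbol alphabet, and a hypothetical "first symbol equals last symbol" DFA standing in for a repetition constraint. A backward pass accumulates the unnormalized mass of valid completions, and sampling renormalizes locally so every generated sequence is accepted by the DFA while respecting Markov probabilities.

```python
import random

def completion_mass(alphabet, P, delta, q0, accept, n):
    """w[i][(q, x)]: unnormalized Markov mass of valid ways to finish the
    sequence after emitting x at position i with the DFA in state q."""
    states = {q0} | set(delta.values())
    w = [dict() for _ in range(n + 1)]
    for q in states:
        for x in alphabet:
            w[n][(q, x)] = 1.0 if q in accept else 0.0
    for i in range(n - 1, 0, -1):
        for q in states:
            for x in alphabet:
                w[i][(q, x)] = sum(P[x][y] * w[i + 1][(delta[(q, y)], y)]
                                   for y in alphabet if (q, y) in delta)
    return w

def sample(alphabet, P_init, P, delta, q0, n, w, rng=random):
    """Draw one sequence; local renormalization makes acceptance certain."""
    opts = {x: P_init[x] * w[1][(delta[(q0, x)], x)]
            for x in alphabet if (q0, x) in delta}
    x = rng.choices(list(opts), list(opts.values()))[0]
    q, seq = delta[(q0, x)], [x]
    for i in range(2, n + 1):
        opts = {y: P[x][y] * w[i][(delta[(q, y)], y)]
                for y in alphabet if (q, y) in delta}
        x = rng.choices(list(opts), list(opts.values()))[0]
        q = delta[(q, x)]
        seq.append(x)
    return seq

# Toy setup: uniform first-order Markov model; DFA tracks (first, last) symbol
# and accepts when they match.
alphabet = ["a", "b"]
P_init = {"a": 0.5, "b": 0.5}
P = {x: {y: 0.5 for y in alphabet} for x in alphabet}
delta = {("s", "a"): "aa", ("s", "b"): "bb",
         ("aa", "a"): "aa", ("aa", "b"): "ab",
         ("ab", "a"): "aa", ("ab", "b"): "ab",
         ("bb", "b"): "bb", ("bb", "a"): "ba",
         ("ba", "b"): "bb", ("ba", "a"): "ba"}
accept = {"aa", "bb"}
w = completion_mass(alphabet, P, delta, "s", accept, 4)
print(all(s[0] == s[-1] for s in
          (sample(alphabet, P_init, P, delta, "s", 4, w) for _ in range(50))))
```

Computing all completion masses ahead of time is what trades build time for fast sampling: each draw is a simple left-to-right walk with no backtracking.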

Papadopoulos et al. (

The approach we present, shown in Algorithm 2, samples sequences under regular and Markov constraints with exact probability using an NHMM. An NHMM computes all marginal distributions ahead of time, speeding up sampling time at the cost of an increased build time (see

Verse-chorus structure is effected in lyrical sectional-form music when subsequences of different types or

Repetition of subsequences requires the use of a series of binary relational constraints using the equality binary relation. Rather than hand-craft a set of relational constraints for each viewpoint v, we extracted constraints of the form (X_i, X_j, {(x, y) | x = y}) from the structure of existing compositions.
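Deriving equality constraints from a sectional form can be sketched as follows (our own illustration; the helper name and fixed section length are hypothetical simplifications):

```python
# Sketch (our illustration): derive equality constraints
# (X_i, X_j, {(x, y) | x = y}) from a sectional form such as ABA, so that
# repeated sections repeat verbatim.

def equality_constraints(form, section_len):
    """Pair each position in a repeated section with the matching position
    in the section's first occurrence."""
    first_seen = {}
    constraints = []
    for s, label in enumerate(form):
        if label in first_seen:
            base = first_seen[label] * section_len
            here = s * section_len
            constraints += [(base + k, here + k) for k in range(section_len)]
        else:
            first_seen[label] = s
    return constraints

print(equality_constraints("ABA", 4))
# [(0, 8), (1, 9), (2, 10), (3, 11)]: the second A repeats the first
```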

In addition to the global, long-range structure imposed by these binary relational constraints, we added the following constraints:

unary constraints on X_1 (…) and rhyme constraints of the form (X_i, X_j, {(x, y) | x rhymes with y}).

We tried different Markov orders for the models used to generate each of the viewpoints. We found that a Markov order of 4–5 worked well for generating eighth-note sequences of chords; 3–4 for eighth-note pitch sequences; 3–4 for eighth-note rhythm sequences; and 4–5 for lyrics (using a syllable-level model).

The Markov models M_H (harmony), M_P (pitch), and M_R (rhythm) and the syllable-level lyric model were each trained on (…).

Given the relational constraints R_v for each viewpoint v, we constructed a DFA and an NHMM per viewpoint and generated the composition's harmony, pitch, rhythm, and lyric sequences from these NHMMs.

To demonstrate the efficacy of the model, we generated 12 compositions at random and analyzed their long-range dependencies. Of these 12 compositions, 7 were generated so as to satisfy a set of 147 constraints extracted from

Long-range dependency constraints satisfied in generated compositions.

| | Constraint Set 1 | Constraint Set 2 |
|---|---|---|
| Form | Ternary (ABA) | 32-bar form (AABA) |
| Measures | 12 | 32 |
| Events per measure | 8 | 8 |
| Composition length (eighth notes) | 96 | 256 |
| Constraint count: Harmony | 48 | 64 |
| Constraint count: Rhythm | 16 | 96 |
| Constraint count: Lyrics | 32 | 26 |
| Constraint count: Pitch | 48 | 80 |
| Constraint count: Rhyme | 3 | 7 |
| Average constraint range (eighth notes): Harmony | 48 | 184 |
| Average constraint range (eighth notes): Rhythm | 80 | 138 |
| Average constraint range (eighth notes): Lyrics | 64 | 192 |
| Average constraint range (eighth notes): Pitch | 48 | 160 |
| Average constraint range (eighth notes): Rhyme | 16 | 27 |

Despite being generated from only 2 different sets of constraints, the 12 compositions are otherwise significantly varied. As shown in

Although for simplification we have focused exclusively on binary relations, the algorithms and logic in this approach could be adapted to allow for general n-ary relational constraints of the form (X_{i_1}, …, X_{i_n}, ρ).

Of particular interest is the runtime required to construct a

The solution we have presented enables probabilistic sampling of sequences that exhibit complex relational structure while maintaining natural, fully Markovian transitions and, via low Markov orders, allowing a broad range of expression. The ability to impose long-range dependencies using relational and regular constraints represents a critical achievement for AI composition, enabling researchers to focus on how to effectively use such tools to model music at the level of musically meaningful events and patterns.

Source code and data available at

i.e., values for variables—even values which do not violate constraints—are pruned if such values cannot possibly be part of a solution consistent with constraints.

See

The authors have no competing interests to declare.