Pitch-class distributions are of central relevance in music information retrieval, computational musicology and various other fields, such as music perception and cognition. However, despite their structure being closely related to the cognitively and musically relevant properties of a piece, many existing approaches treat pitch-class distributions as fixed templates.

In this paper, we introduce the Tonal Diffusion Model, which provides a more structured and interpretable statistical model of pitch-class distributions by incorporating geometric and algebraic structures known from music theory as well as insights from music cognition. Our model explains the pitch-class distributions of musical pieces by assuming tones to be generated through a latent cognitive process on the Tonnetz, a well-established representation for harmonic relations. Specifically, we assume that all tones in a piece are generated by taking a sequence of interval steps on the Tonnetz starting from a unique tonal origin. We provide a description in terms of a Bayesian generative model and show how the latent variables and parameters can be efficiently inferred.

The model is quantitatively evaluated on a corpus of 248 pieces from the Baroque, Classical, and Romantic eras and describes the empirical pitch-class distributions more accurately than conventional template-based models. On three concrete musical examples, we demonstrate that our model captures relevant harmonic characteristics of the pieces in a compact and interpretable way, also reflecting stylistic aspects of the respective epoch.

However, despite the meaningful structure of PCDs, many existing approaches do not make the relevant aspects explicit in the way PCDs are modeled and represented. In fact, PCDs are mostly treated as fixed templates, whose structure is only interpreted in a post-hoc manner. This is also reflected in the way PCDs are commonly visualized, namely as categorical distributions of twelve chromatic semitones, where visual proximity does not necessarily reflect musical relations. Ignoring music-theoretical insights about the structure of tonal space for representing and visualizing PCDs conceals deeper structural relations that are reflected in these distributions.

The goal of this paper therefore is to provide a compact, structured and interpretable representation of PCDs. The proposed

We quantitatively evaluate our model against several baseline models on a corpus of 248 pieces from the Baroque, Classical, and Romantic eras, demonstrating that the TDM provides music-theoretically reasonable interpretations. Moreover, it models the empirical PCDs more accurately than the unstructured baseline models as measured by the

Many large corpora are available in the MIDI format, which encodes pitch through an integer representation, commonly interpreted as

This implies three shortcomings (see Appendix A for a more detailed discussion): 1) the visual representation in a linear arrangement suggests some implicit ordering and proximity relation that is not explicitly modeled; 2) the commonly used chromatically ascending order does not reflect tonal relations well, especially with respect to harmony and key, where an arrangement along the line or circle of fifths would be better suited; and 3) the space of pitch classes exhibits an inherent cyclic topology, which is not reflected in a linear arrangement. The TDM overcomes these limitations by using the Tonnetz as the basis representation for PCDs, which preserves pitch spelling, and by representing proximity relations via steps on the Tonnetz. Moreover, the Tonnetz extends the purely fifth-based representation by additionally allowing for proximity relations based on major and minor third steps.

The pervasive categorical representation of PCDs is also used to display the results of psychological probe-tone experiments where participants rate different degrees of stability of tones in 12TET (

Several studies have shown that PCDs suggest an arrangement that is essentially equivalent to the Tonnetz (see Section 3.2). Krumhansl and Kessler (

The cognitive relevance of this structure has been further investigated by Krumhansl (

Milne and Holland (

Our approach builds upon these prior works on the cognitive relevance of the Tonnetz by using it more generally to represent empirical PCDs.

The basic structure of the TDM (see Section 4) is similar to that of probabilistic topic models (

A topic model for music should therefore go beyond those developed for natural language by explicitly modeling the structural relations of pitch classes in tonal music and incorporating the geometric and algebraic structures known from music theory as well as insights from music cognition.

The only music-specific application of topic models we are aware of is by Hu and Saul (

Consequently, while Hu and Saul evaluate their model by comparing its classification accuracy to that of established template-based key finders, we use the KLD to the empirical distribution as a performance measure, which makes a direct comparison of the present approach with that of Hu and Saul difficult. Note, however, that our baseline model with two static key profiles is conceptually very similar to the model by Hu and Saul.
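The performance measure mentioned above can be illustrated with a minimal sketch. It assumes the divergence is taken from the empirical distribution to the model distribution, uses natural logarithms, and clips model probabilities at a small `eps` to avoid division by zero; these choices, the function names, and the toy counts are illustrative assumptions rather than details confirmed by the text.

```python
import math


def normalize(counts):
    """Turn raw pitch-class counts into a probability distribution."""
    total = float(sum(counts))
    return [c / total for c in counts]


def kl_divergence(empirical, model, eps=1e-12):
    """D_KL(empirical || model) in nats for two distributions over the same bins.

    The direction and the clipping constant `eps` are assumptions of this sketch.
    """
    if len(empirical) != len(model):
        raise ValueError("distributions must have the same number of bins")
    total = 0.0
    for p, q in zip(empirical, model):
        if p > 0.0:  # bins without empirical mass contribute nothing
            total += p * math.log(p / max(q, eps))
    return total


# Hypothetical toy counts over five neighbouring pitch classes.
empirical = normalize([2, 10, 20, 10, 2])
uniform = [0.2] * 5
print(kl_divergence(empirical, empirical))      # 0.0
print(kl_divergence(empirical, uniform) > 0.0)  # True
```

A lower value means that the model distribution is closer to the empirical PCD, which is why it can serve as a loss for training and as a score for comparison.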

Based on the music-theoretical considerations discussed above, we present a cognitive interpretation of the Tonnetz and describe a model of the latent cognitive process, grounded in the perception of tonal music.

We represent tones on the

Tonal space ℑ and interval space ℐ on line of fifths: tonal pitch-classes and tonal pitch-class intervals along with their integer representation in terms of the number of steps along the line of fifths (using C as a reference tone).

Note that in this representation, we are able to distinguish enharmonically equivalent intervals, such as +A4, the ascending augmented fourth (tritone), and +d5, the ascending diminished fifth. This is a musically and cognitively relevant distinction: since the interval +d5 commonly resolves inwards into a third (+M3 or +m3) and +A4 (the tritone) commonly resolves outwards into a sixth (+m6 or +M6), these intervals create different harmonic expectations that imply different cognitive representations. As discussed in Section 2.1, a major drawback of the commonly used MIDI format is that it cannot be used to represent these musically important distinctions.
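The distinction can be made concrete with a small sketch of the line-of-fifths encoding, using C as the reference tone with index 0. The helper names `tpc` and `midi_pitch_class` and the ASCII accidentals `#` and `b` are conventions chosen for this illustration only.

```python
# Natural tones on the line of fifths, with C as the reference (index 0).
NATURALS = {"F": -1, "C": 0, "G": 1, "D": 2, "A": 3, "E": 4, "B": 5}


def tpc(name):
    """Line-of-fifths index of a spelled pitch class such as 'F#' or 'Gb'."""
    letter, accidentals = name[0], name[1:]
    index = NATURALS[letter]
    for accidental in accidentals:
        if accidental == "#":
            index += 7  # a sharp moves seven fifths upwards
        elif accidental == "b":
            index -= 7  # a flat moves seven fifths downwards
        else:
            raise ValueError(f"unknown accidental {accidental!r}")
    return index


def midi_pitch_class(index):
    """Collapse a line-of-fifths index to one of the 12 enharmonic pitch classes."""
    return (7 * index) % 12


for name in ["F#", "Gb"]:
    print(name, tpc(name), midi_pitch_class(tpc(name)))
# F# 6 6
# Gb -6 6
```

Relative to C, the augmented fourth (F♯) and the diminished fifth (G♭) lie six steps in opposite directions on the line of fifths, yet both collapse to the same twelve-tone pitch class.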

Our goal in this paper is to provide a statistical model for music that allows us to describe tonal pitch-class distributions of musical pieces in a compact, structured and interpretable way. In the following, the term

While the line of fifths connects all possible pitch classes and is of central importance in tonal music (

Primary intervals with respect to C.

The infinite expansion of this graph is called the

Baroque: Johann Sebastian Bach, Prelude in C major, BWV 846 (1722),

Classical: Ludwig van Beethoven, Sonata op. 31, no. 2 in D minor ‘Tempest’, 1st mov. (1802),

Romantic: Franz Liszt,

Tonal pitch-class distributions plotted as heatmaps on the Tonnetz (darker colors correspond to higher probabilities). The plots were generated with the Python library

In the case of tonal pitch classes, the Tonnetz is topologically equivalent to the

When looking at the distributions in Figure

A central idea of the TDM is that the generation of tones is associated with certain motions on the Tonnetz. Specifically, the Tonnetz captures harmonic relations, so that moving on the Tonnetz corresponds to typical harmonic changes occurring in a piece. These changes happen on several time scales, from large-scale modulations between different sections of a piece, to harmonic progressions, and polyphony on the surface level. Motion on the Tonnetz thus reflects the deep structural dynamics of a piece.

Different paths through the Tonnetz express different harmonic interpretations of the involved tones. This distinction of harmonic functions expressed in different paths on the Tonnetz is an explicit part of our model.

For instance, the tone E can be reached from the tone C by ascending four perfect fifths or by ascending a major third. In the first case, E is conceived to be the perfect fifth of A, which is the perfect fifth of D, which is the perfect fifth of G, which, ultimately, is the perfect fifth of C, reflecting a recursive application of applied dominants, while in the second case, the tone E stands in a direct major third relation to C. We take this distinction as expressing two different interpretations of the harmonic function of E relative to C. That is, we assume that it corresponds to a difference on the cognitive level, because it relates the tones C and E in two different ways.
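The two derivations of E from C can be enumerated explicitly. The sketch below assumes that the six primary intervals are encoded as signed steps on the line of fifths (for example, an ascending minor third from C leads to E♭, three fifths below C); the dictionary `PRIMARY` and the function `paths` are hypothetical names introduced for this illustration.

```python
from itertools import product

# Primary intervals as signed steps on the line of fifths (C = 0).
PRIMARY = {"+P5": 1, "-P5": -1, "+M3": 4, "-M3": -4, "+m3": -3, "-m3": 3}


def paths(start, goal, length):
    """All ordered step sequences of the given length leading from start to goal."""
    return [
        steps
        for steps in product(PRIMARY, repeat=length)
        if start + sum(PRIMARY[s] for s in steps) == goal
    ]


C, E = 0, 4
print(paths(C, E, 1))                  # [('+M3',)]
print(len(paths(C, E, 2)))             # 2
print(("+P5",) * 4 in paths(C, E, 4))  # True
```

The single-step path is the direct major third, the four-step path of ascending fifths is among the length-four derivations, and the two length-two paths combine an ascending fifth with a descending minor third in either order.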

In some contexts, this might be reflected in different tunings and ways of hearing, as ascending four perfect fifths leads to the Pythagorean major third E, while the E reached by ascending a major third corresponds to the major third E in just intonation. However, since these different paths from C to E are indistinguishable in a notated score, our cognitive model does

In the TDM, we assume that different paths have different probabilities of occurrence, that is, that some paths are more likely than others. For instance, we assume that different steps occur with different probabilities and that paths cannot have an arbitrary length. The step probabilities and the path-length distribution are explicit components of the TDM (Section 4). Altogether, these aspects reflect different ways of hearing tonal relations, as well as different stylistic characteristics of tonal music in general and of concrete pieces more specifically.

Tonal music is fundamentally characterized by the existence of one or more tonal centers that are related to each other in a hierarchical manner, for instance, by modulating through different local keys (

The purpose of the TDM is to model the intuitions presented in Section 3 formally and derive quantitative measures that capture the diffusion of probability mass from the tonal origin along the different axes of the Tonnetz. To this end, we define an explicit generative model, in which each tone in a piece is generated by starting at the tonal origin and taking a number of steps on the Tonnetz. As described in Section 3.3, there are many possible paths connecting two tones on the Tonnetz, which correspond to different cognitive interpretations of their harmonic relation. In our model, these different derivations are treated as a latent representation, which is marginalized out to determine the overall probability of reaching a particular tone.
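The marginalization over paths can be sketched as a simple diffusion on the line of fifths. The following is an illustration under stated assumptions, not the authors' implementation: it assumes a single tonal origin, a Poisson path-length distribution with a hypothetical parameter `rate`, intervals encoded as signed line-of-fifths steps, and a truncation of the sum over path lengths once the remaining path-length probability falls below a tolerance `tol`.

```python
import math

# Primary intervals as signed steps on the line of fifths (C = 0).
PRIMARY = {"+P5": 1, "-P5": -1, "+M3": 4, "-M3": -4, "+m3": -3, "-m3": 3}


def step(dist, weights):
    """One diffusion step: move probability mass along every weighted interval."""
    out = {}
    for tone, p in dist.items():
        for name, w in weights.items():
            target = tone + PRIMARY[name]
            out[target] = out.get(target, 0.0) + p * w
    return out


def tonal_diffusion(origin, weights, rate, tol=1e-5):
    """Marginal tone distribution for one origin and Poisson path lengths."""
    current = {origin: 1.0}  # distribution over tones after n steps
    result = {}
    p_n = math.exp(-rate)    # Poisson probability of n = 0 steps
    remaining = 1.0          # path-length probability not yet accounted for
    n = 0
    while remaining > tol:
        for tone, p in current.items():
            result[tone] = result.get(tone, 0.0) + p_n * p
        remaining -= p_n
        n += 1
        p_n *= rate / n      # Poisson recursion: P(n) = P(n - 1) * rate / n
        current = step(current, weights)
    total = sum(result.values())  # renormalize after truncation
    return {tone: p / total for tone, p in result.items()}


# Hypothetical weights: a purely fifth-based, symmetric piece starting on C.
pcd = tonal_diffusion(origin=0, weights={"+P5": 0.5, "-P5": 0.5}, rate=2.0)
print(max(pcd, key=pcd.get))  # 0
```

Because the distribution after each step is reused for the next one, the individual paths never have to be enumerated; only the distribution over tones after each number of steps is tracked.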

In the general formulation of our model, we include the possibility of multiple tonal origins, and allow for an arbitrary set of intervals, as well as a generic path-length distribution. For the evaluation of the model on tonal music, presented in Section 5, we then make the more specific assumptions motivated in Section 3. Specifically, we assume a single tonal origin (Section 3.4) and restrict the allowed interval steps to the set of primary intervals present in the Tonnetz (Section 3.2).

The TDM has the basic structure of a topic model with two nested levels of generation, as shown in Figure

We extend this basic structure by splitting the corpus-level and piece-level variables into multiple distinct variables with a clear semantics and by replacing the inner generative step for a single tone with a model of the underlying latent cognitive process. The complete model is shown in Figure

Let ℑ ≡ ℤ be the space of all possible TPCs and ℐ = {

For each piece

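that is, the distribution of tonal origins $c$ is drawn from a prior with parameters $H_c$ and $\alpha_c$, the interval weights $w$ from a prior with parameters $H_w$ and $\alpha_w$, and the path-length parameters $\lambda$ from a prior $h_\lambda$,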

where the prior depends on the specific path-length distribution being used: for a Poisson distribution the conjugate prior is a gamma distribution, for a binomial distribution it is a beta distribution.

For each tone

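that is, a number of steps $n$ is drawn from the path-length distribution, and a path $\tau^0, \ldots, \tau^n$ is generated by drawing $\tau^0$ from the distribution of tonal origins and then transitioning from $\tau^i$ to $\tau^{i+1}$ by adding an interval; finally, the last tone $\tau^n$ of the path is the generated tone.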
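The generation of a single tone can be sketched as a forward-sampling routine. This is a minimal illustration assuming a binomial path-length distribution with hypothetical parameters `n_trials` and `p_success`, a finite dictionary of tonal origins, and intervals encoded as signed steps on the line of fifths; none of these names are taken from the original.

```python
import random

# Primary intervals as signed steps on the line of fifths (C = 0).
PRIMARY = {"+P5": 1, "-P5": -1, "+M3": 4, "-M3": -4, "+m3": -3, "-m3": 3}


def sample_tone(origin_dist, weights, n_trials, p_success, rng):
    """Draw one tone: pick a path length and an origin, then walk on the Tonnetz."""
    # Binomial path length as a sum of independent Bernoulli draws.
    n = sum(rng.random() < p_success for _ in range(n_trials))
    # Starting tone drawn from the distribution of tonal origins.
    origins = list(origin_dist)
    tone = rng.choices(origins, weights=[origin_dist[o] for o in origins])[0]
    # Each step adds one interval drawn according to the interval weights.
    names = list(weights)
    for _ in range(n):
        name = rng.choices(names, weights=[weights[k] for k in names])[0]
        tone += PRIMARY[name]
    return tone  # the last tone of the path is the generated tone


rng = random.Random(0)
weights = {"+P5": 0.4, "-P5": 0.4, "+M3": 0.1, "-M3": 0.1}  # hypothetical values
tones = [sample_tone({0: 1.0}, weights, 6, 0.3, rng) for _ in range(1000)]
print(len(tones))  # 1000
```

A histogram of many sampled tones approximates the marginal distribution of the model, while the sampled paths themselves remain latent.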

For a given corpus and hyperparameters $H_c, \alpha_c, H_w, \alpha_w, h_\lambda$

where, following Bayes’ theorem, ^{0},…,^{n},n

The expansion of the marginal likelihood ^{i}^{i}^{–1},^{0}|^{0},…,^{n},n

The marginal likelihood ^{i}^{0},…,^{i}^{–1} has already been marginalized out: the latent cognitive process is a Markov chain in the tonal space ℑ and the marginal distribution ^{i}

Computing the marginal likelihood ^{0},…, ^{n},n^{–5}).


For our evaluation, we make three additional assumptions that are specific to tonal music and well-established in music theory (see Section 3): 1) we assume that only a single tonal origin per piece exists and that all tones are a priori equally likely to become the tonal origin; 2) we restrict the allowed interval steps to the six primary intervals present in the Tonnetz; 3) we assume a uniform prior over the path-length variable

We infer values for the piece-level variables

We evaluate the TDM in two ways: first, we introduce several baseline models (Section 5.1) and perform a quantitative comparison (Section 5.2) on a corpus of 248 pieces from different historical epochs. Second, we perform a detailed qualitative analysis (Section 5.3) on three exemplary pieces, inspecting the inferred parameters and discussing our musical interpretation of these results.

We introduce several baseline models to verify whether the structural assumptions incorporated in our model effectively improve performance. In particular, we are interested in validating the impact of the Tonnetz topology and the assumed latent process on model performance.

The two static baseline models,

The static baseline model consists of a single global PCD, that is, a fixed template or profile. The model has an individual parameter for each tonal pitch class; together these parameters determine the shape of the profile and they are trained via gradient descent on the entire corpus. The profile is individually matched to each piece (during training and for evaluation) by shifting it along the line of fifths. The model thus has a large number of continuous corpus-level parameters (one for each tonal pitch class) but only a single discrete piece-specific variable (the transposition). The optimal profile corresponds to the purple line in Figures
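The matching of a fixed profile to a piece can be sketched as a search over transpositions along the line of fifths. The sketch assumes that distributions are stored as dictionaries from line-of-fifths indices to probabilities, that the best shift is the one with the lowest Kullback-Leibler divergence from the empirical PCD, and that a finite range of candidate shifts suffices; the training of the profile itself by gradient descent is not shown.

```python
import math


def shifted(profile, shift):
    """Transpose a profile (line-of-fifths index -> probability) by `shift` fifths."""
    return {tone + shift: prob for tone, prob in profile.items()}


def best_transposition(empirical, profile, shifts=range(-15, 16), eps=1e-12):
    """Shift of the profile that minimizes D_KL(empirical || shifted profile)."""

    def divergence(shift):
        model = shifted(profile, shift)
        return sum(
            p * math.log(p / max(model.get(tone, 0.0), eps))
            for tone, p in empirical.items()
            if p > 0.0
        )

    return min(shifts, key=divergence)


# Hypothetical three-tone profile and a piece that matches it two fifths higher.
profile = {-1: 0.2, 0: 0.5, 1: 0.3}
piece = shifted(profile, 2)
print(best_transposition(piece, profile))  # 2
```

The transposition is the only piece-specific variable of this baseline, which is why it cannot adapt the shape of the profile to an individual piece.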

This model is identical to the static model described above but comprises two different static PCDs. Matching with a specific piece is done by choosing the best profile

To specifically test the impact of the Tonnetz topology, we introduce a reduced version of the TDM that uses only the line-of-fifths topology. This model is identical to the full TDM (with binomial path-length distribution), except that it only uses fifth steps (no major or minor thirds). The model has three piece-specific variables (one for weighting +P5 against –P5 and two for the binomial distribution) and produces bell-shaped PCDs on the line of fifths, including Gaussian and skewed bell shapes.

For the quantitative evaluation of the TDM, we used a corpus of 248 pieces that are representative of the Baroque, Classical, and Romantic eras: all preludes and fugues from Bach’s

All models were trained on the entire corpus to minimize the

Comparison of model performance for composers from different historical epochs: Bach (Baroque), Beethoven (Classical), Liszt (Romantic). The box plots indicate quartiles (25%, 50%, 75%), whiskers extend 1.5 times the interquartile range (IQR). The mean is indicated as a dashed red line. Single pieces are shown as black dots in a swarm plot, outliers as larger diamonds.

As expected, the static model with a single global PCD performs worst. The inferred profile can be seen in the detailed analysis of the single pieces in Figure

Also not surprisingly, the static model with two profiles performs considerably better. Notably, for the Bach pieces this model performs as well as the TDM (Poisson) and almost as well as the TDM (Binomial). Presumably, the main reason for the strong performance of the static model on Bach’s pieces is the fact that these pieces are typically confined to a relatively well-defined range on the line of fifths, after which the PCD quickly drops to zero. This shape does not correspond to the smooth decay assumed in the TDM, which is more typically observed in the pieces by Beethoven. In contrast to the relatively good performance for Bach’s pieces, for Beethoven and Liszt the static two-profile model performs even worse than the reduced line-of-fifths TDM (Binomial, 1D).

Inspecting the two inferred profiles in Figure

Finally, we compare the performance of the different versions of the TDM. For the line-of-fifths-based TDM (Binomial, 1D), we observe decreasing performance from Bach to Beethoven and Liszt. This corresponds to the music-theoretic insight that Bach’s harmony is predominantly fifths-based, while Beethoven incorporates an increasing amount of third-based harmonic progressions, which is yet again extended by Liszt. Beethoven’s pieces therefore tend to be multimodal on the line of fifths – but not on the Tonnetz. In contrast, some of Liszt’s pieces are multimodal even on the Tonnetz and tend to be fragmented on the line of fifths.

This is also strongly reflected in the different performances of the full TDMs, which both show the best results for Beethoven. However, the reasons for the decrease in performance for Bach and Liszt as compared to Beethoven are presumably different.

As mentioned above, Bach’s pieces tend to have PCDs with a relatively sharp decay, which conflicts with the smooth decay of the TDM, more commonly found in Beethoven’s pieces. On the other hand, because Liszt’s pieces tend to be multimodal even on the Tonnetz (due to their extended tonality), they may not be well captured by the TDM even though they have a smooth decay. A slight modification of the TDM to allow for multiple tonal origins might make it possible to model this kind of extended tonality appropriately.

We chose three exemplary pieces to evaluate whether the TDM is able to capture characteristics of the respective piece and historical period. We used the same three pieces for which the empirical PCDs are shown in Figure

Baroque: Johann Sebastian Bach, Prelude in C major, BWV 846 (1722),

Classical: Ludwig van Beethoven, Sonata op. 31, no. 2 in D minor ‘Tempest’, 1st mov. (1802),

Late-Romantic: Franz Liszt,

As described above, the piece-level variables were determined via MAP/ML estimation. The results are shown in Figure

Comparison of the empirical PCD (gray bars with shaded background) with different models (colored plots). The corresponding Kullback-Leibler divergence is indicated in square brackets after the model.

For each of the three example pieces, the MAP estimates for the parameters of the binomial TDM are shown in Figure _{i}_{i}

Optimal parameters for the TDM (Binomial) model (see text for details).

For Bach’s prelude (Figures $w_{-\mathrm{P5}} = 0.475$) reflects that the tonal origin G lies a fifth above the most frequent tone and global tonic C. The strong descending fifth is balanced by the combination of the ascending fifth ($w_{+\mathrm{P5}} = 0.288$) and the descending minor third that also ascends along the line of fifths ($w_{-\mathrm{m3}} = 0.137$). Interestingly, explaining the PCD using the global tonic C as the tonal origin (results not shown) is only marginally less accurate with correspondingly changed weights (stronger ascending fifth and a strong ascending major third instead of the descending minor third). Given the lack of temporal information, this ambiguity is musically consistent and reflects the harmonically close relation of G and C (also see Section 3.4). The general importance of the line of fifths for the organization of the tonal material is characteristic of Baroque pieces.

The optimal parameters for Beethoven’s Sonata movement (Figures $w_{+\mathrm{P5}} = 0.331$, $w_{-\mathrm{P5}} = 0.356$). However, as opposed to Bach’s piece, a significant amount of the probability mass (≈30%) is assigned to the two major third components ($w_{+\mathrm{M3}} = 0.160$, $w_{-\mathrm{M3}} = 0.137$), again essentially symmetric. This is typical for pieces in the minor mode due to the above-mentioned overall prominence of the (minor) tonic and (major) dominant triads. However, it can also be interpreted as reflecting the stylistic changes in the Classical period that allow for broader ranges of mediantic (i.e. third-based) local key relations, as can also be observed in this movement. The approximate point symmetry of the empirical distribution around the pitch class A and along the three axes of the Tonnetz (see Figure

The most diverse distribution of pitch classes is found in Liszt’s piano piece (Figures

The model was able to capture a number of important characteristics of this piece. The harmonic structure of the entire piece is fundamentally governed by major third relations. Its three sections (

Since each of the sections is largely diatonic with some ornamental chromaticism, it is not surprising that also for this Romantic piece the perfect fifths together account for more than 50% of the overall weights, followed by the descending minor and major thirds (0.216 and 0.192, respectively). This entails, for example, that the upper major third A♯ of the frequently used F♯ major triad is largely explained by the combination of an ascending fifth and a descending minor third, while D and B♭ are more directly explained as descending major third steps. In particular, the difference of the model explanations between B♭ and A♯ is meaningful since these tonal pitch classes bear different harmonic implications, which would be lost in a neutral pitch-class representation.

We presented the

The model was evaluated quantitatively on a corpus of 248 pieces showing superior performance to traditional models. Comparing against several baseline models, the positive impact of incorporating the Tonnetz structure was demonstrated. Furthermore, a detailed analysis of three exemplary pieces showed that the TDM is able to capture characteristic properties of these pieces and the respective period.

The TDM is well-suited to study a range of relevant questions in digital musicology and MIR. It may be extended and adapted in multiple ways. For instance, it can be adapted to incorporate more specific assumptions about the underlying cognitive processes and the relevant musical style, and it allows for corpus-based studies to investigate historical developments and stylistic differences among a large number of musical pieces. In MIR, the piece-level variables (tonal origin, interval weights, and path-length distribution) can be conceived as a tonal fingerprint of the piece, going beyond the notion of a pitch profile and thus making it possible to determine its tonality in a novel way. The model can be extended to include a larger (or infinite) set of intervals ℐ by employing a full Dirichlet process prior, and the modeled cognitive process can be adapted by using independent path-length distributions for the different intervals. The tonal space can be augmented to include tuning differences and take into account different interpretations for different generation paths. Finally, the model may be generalized to include a time component that takes sequential and syntactic dependencies in musical pieces into account.

The TDM thus provides a novel approach to modeling PCDs in a compact and musically interpretable way, while outperforming existing approaches in terms of accuracy. It may thereby serve as a broad foundation for further developing generative models of PCDs in music and opens up multiple highly promising directions for future research.

The additional file for this article can be found as follows:

Appendix. DOI:

The data and code to reproduce our results can be found at

This project was partially funded through the Swiss National Science Foundation within the project “Distant Listening – The Development of Harmony over Three Centuries (1700–2000)”. Also, this project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program under grant agreement No 760081 – PMSB. Martin Rohrmeier acknowledges the kind support by Mr Claude Latour through the Latour chair of Digital Musicology at EPFL.

The authors have no competing interests to declare.