We implement a novel approach to automatic harmonic analysis using a clustering method on pitch-class vectors (chroma vectors). The advantage of this method is its lack of top-down assumptions, allowing us to objectively validate the basic music theory premise of a chord lexicon consisting of triads and seventh chords, which is presumed by most research in automatic harmonic analysis. We use the discrete Fourier transform and hierarchical clustering to analyse features of the clustering solutions and illustrate associations between the features and the distribution of clusters over sections of the sonata forms. We also analyse the transition matrix, recovering elements of harmonic function theory.

Automatic harmonic analysis has been a goal of music information retrieval and music cognition researchers since the infancy of the fields. The first attempts (such as

Micchi et al. (

There are many studies that use unsupervised methods, such as neural networks and hidden Markov models (HMMs), to perform harmonic analysis and non-harmonic tone identification tasks (

Research on harmonic identification from music audio also typically requires a given chord lexicon. The elements of these chord vocabularies are usually expressed as binary pitch-class vectors, the equivalent of pitch-class sets (

In another large body of research, the main goal is characterizing harmonic successions. In such contexts researchers usually avoid the need for automatic harmonic analysis through the use of human annotators. This method has been used for Bach chorales (

Another strategy that has been used for simplifying the process of harmony identification in order to study norms of harmonic succession is to focus exclusively on music with simple textures, especially the chorale repertoire (

In most empirical research on harmony, then, chord lexicons of traditional triads and seventh chords serve as prior assumptions backed by music theoretic traditions rather than empirical verification. A smaller body of research exists that empirically tests such lexicons by making fewer theoretically freighted initial assumptions about what kinds of harmonic objects qualify as chords, inspired by Quinn (

In the present study, we explore a different method of investigating chord identity and succession in a corpus without prior assumptions about the chord vocabulary. Unlike the HMM-based approaches just described, which infer chord identities from norms of harmonic succession, we use a clustering-based approach which makes no inferences from chord succession, but instead requires a fixed temporal segmentation relying on metrical conventions of the repertoire. This clustering process may be seen as an alternative to a filtering process as applied by Sears and Widmer (

The primary distinguishing strategy of this approach is to represent the basic harmonic object not as a simple set of pitches or pitch-class sets, but as a twelve-place vector that assigns a weight to each pitch class. This kind of mathematical object, known as a

In the present study we take pitch-class counts over quarter-note beats, a timescale associated with harmonies as opposed to keys, and apply

A basic assumption in traditional theories of tonal harmony is what we might call

The present study is primarily exploratory, with the most important conclusion being the efficacy of this method. We organize it into three parts: an analysis of the clustering solution (Experiment 1), an analysis of the distribution of clusters across parts of the sonata forms (Experiment 2), and an analysis of first-order transitions between clusters (Experiment 3). While the results largely confirm traditional music theory, Experiment 1 suggests a chord lexicon that differs substantially from the lists of binary-valued triads and seventh chords that serve as the starting point for most research in automatic harmonic analysis. A relatively small number of triads and seventh chords are needed to classify harmony in the corpus, but the central harmonic functions, tonic and dominant, come in multiple forms. Experiments 2 and 3 show that the same triads can have different functions, identifiable in transition data and the distribution across formal sections, and that these distinct functions are actually discernible on the basis of pitch-class content (they are associated with distinct clusters).

We also show that applying the discrete Fourier transform (DFT) on pitch-class vectors produces an effective map of keys and harmonic functions. This method has previously been used to characterize musical keys (

Our corpus consists of seventeen movements from Mozart’s Piano Sonatas. These were chosen for the relative simplicity and conventionality of their harmony, and their status as standards of the classical “common practice” norms of harmony and form. They also stand out for a diversity of texture that makes automatic harmonic analysis a difficult task. The harmonic successions of this repertoire also have been previously studied by Tymoczko (

Because our method requires isolating beats, we restricted our dataset to movements with quarter-note beats, meaning meters of 2/4, 3/4, or 4/4. To take advantage of the conventions of sonata form in analyzing the data, we also restricted the dataset to movements in major-mode sonata form or closely related forms: one exposition-recapitulation movement (also known as sonata without development) and one sonata-rondo movement are included. We omitted minor-mode pieces because the corpus contains too few of them to form a representative sample. We used data from the Yale Classical Archives (

Dataset. All pieces are piano sonata movements by W.A. Mozart.

PIECE | KEY | QUARTER NOTES | METER | MEASURES
---|---|---|---|---
K279 i | C major | 546 | 4/4 | 136
K279 ii | F major | 306 | 3/4 | 102
K279 iii | C major | 426 | 2/4 | 213
K280 i | F major | 593 | 3/4 | 198
K282 i | E♭ major | 276 | 4/4 | 69
K283 i | G major | 356 | 3/4 | 119
K284 i | D major | 1010 | 4/4 | 252
K309 i | C major | 1174 | 4/4 | 294
K311 i | D major | 584 | 3/4 | 195
K330 i | C major | 719 | 2/4 | 360
K330 iii | C major | 703 | 2/4 | 352
K332 i | F major | 665 | 3/4 | 222
K332 ii | B♭ major | 160 | 4/4 | 40
K333 i | B♭ major | 658 | 4/4 | 164
K545 i | C major | 267 | 4/4 | 67
K570 i | B♭ major | 600 | 3/4 | 200
K576 ii | A major | 200 | 3/4 | 67

The method of parsing scores by quarter notes is not ideal, because the quarter note might have a different meaning at different tempos and in different movement types. However, the present corpus is fairly uniform and mostly consists of allegro-type first movements, so we expected this convenient parsing to be reasonably effective. In future research it would be worth exploring the coupling of our procedure with automatic parsing methods.

The

We applied the

Reduction and clustering procedure illustrated on mm. 1–2 of K.279.

The

To find an optimal number of clusters, we used the within-cluster sum of squares (WCSS) method, which sums the squared distances of all the points within each cluster to that cluster's centroid. As the number of clusters increases, the WCSS value of the model decreases. We determined that 20 is an optimal number of clusters by plotting the WCSS against the number of clusters and finding the inflection point of the elbow-shaped curve.
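The WCSS quantity described above can be illustrated with a minimal sketch. The function name `wcss` and the toy two-dimensional points are hypothetical and stand in for the 12-dimensional pitch-class vectors; the sketch assumes Euclidean distance and centroids taken as cluster means, which may differ from the exact configuration used in the study.

```python
def wcss(points, labels):
    """Within-cluster sum of squares: squared Euclidean distance of every
    point to the mean (centroid) of the cluster it is assigned to."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        # Centroid = coordinate-wise mean of the cluster members.
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        total += sum((p[d] - centroid[d]) ** 2 for p in members for d in range(dim))
    return total


# Hypothetical toy data: two tight pairs of points far apart.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
one_cluster = wcss(points, [0, 0, 0, 0])   # everything in a single cluster
two_clusters = wcss(points, [0, 0, 1, 1])  # one cluster per pair
```

In this toy case the WCSS drops sharply when moving from one cluster to two, and further clusters could only remove the small remainder. That kind of bend is what the elbow criterion looks for when WCSS is plotted against the number of clusters.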

We used the _{0}) simply sums the weights and the last five duplicate the information of the first through the sixth (_{1}–_{6}). The indexes of these coefficients denote equal divisions of the octave. Their magnitudes are transposition independent and indicate how strongly the vector weights that division of the octave, and the transposition-dependent phase values correspond to the nearest transposition of that division of the octave. The fifth coefficient, _{5}, corresponds to the weighting of the pitch-class vector on the circle of fifths, what can be called its _{1} gives a weighting on the pitch-class circle. Other important coefficients, _{3} and _{4}, indicate when a vector is concentrated around some division of the pitch-class circle by three or four respectively, and so are useful for identifying triads and seventh chords. Previous research (_{3} and _{5}, are effective in estimating the key of passages of tonal music and sorting harmonic functions. Because typical pitch-class profiles of major and minor keys have most of their energy in _{3} and _{5}, a two-dimensional space on the phases of these, denoted φ_{3} and φ_{5}, can serve as a map of key relatedness (
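As an illustration of the properties described in this paragraph, the following sketch computes the DFT of a 12-place pitch-class vector using only the Python standard library. The function names `pc_dft` and `transpose_pcs` are hypothetical, and the sign and phase conventions are those of the plain textbook DFT, which is an assumption: conventions for phase differ between authors and may not match the ones used in this study.

```python
import cmath
import math


def pc_dft(vector):
    """Return DFT coefficients 0..6 of a 12-place pitch-class vector.
    Coefficients 7..11 are complex conjugates of 5..1, so they are omitted."""
    n = len(vector)
    return [
        sum(w * cmath.exp(-2j * math.pi * k * pc / n) for pc, w in enumerate(vector))
        for k in range(n // 2 + 1)
    ]


def transpose_pcs(vector, semitones):
    """Rotate a pitch-class vector upward by a number of semitones."""
    n = len(vector)
    return [vector[(pc - semitones) % n] for pc in range(n)]


# Binary C major triad (C, E, G) and its transposition to G major (G, B, D).
c_major_triad = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]
g_major_triad = transpose_pcs(c_major_triad, 7)

coeffs_c = pc_dft(c_major_triad)
coeffs_g = pc_dft(g_major_triad)
magnitudes_c = [abs(f) for f in coeffs_c]
magnitudes_g = [abs(f) for f in coeffs_g]
phases_c = [cmath.phase(f) for f in coeffs_c]  # radians in (-pi, pi]
phases_g = [cmath.phase(f) for f in coeffs_g]
```

The zeroth magnitude is the sum of the weights, and the magnitudes of the two triads agree coefficient by coefficient, while their phases differ. This matches the statement that magnitudes are transposition independent and phases are transposition dependent.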

We sorted the clusters using their DFT _{5} and _{3}, the coefficients that dominate pitch-class profiles for keys (according to

For low-dimensional data, it is possible to visualize clusters using coordinates, but this is difficult for 12-dimensional data. We used hierarchical clustering to produce a dendrogram relating the clusters obtained by the

where _{c}_{1},C_{2}) the merging cost of combining the clusters C_{1} and C_{2}.

We also examined the possibility that Mozart’s usage of harmonies in the different clusters may have evolved over time, from 1774 to 1788. We ran simple regressions between the dates of pieces and the frequency of each cluster, and found no significant correlations, so we did not pursue this any further.

_{5} coefficient of their DFTs, from flat to sharp (see below). Each is given a short-hand name according to the following rules. If a pitch class has a weight greater than the average for that centroid, it appears in the name; if it exceeds one and a third standard deviations for that centroid, it appears at the beginning of the name, otherwise it is in parentheses at the end. This gives an approximation to the pitch-class content good enough to give each of the twenty centroids a unique name.

Centroids of the twenty clusters resulting from applying

The two metrics gave similar results. Both include 3–4 clusters for C major and G major triads, two for F major triads, one for an A minor triad, and one or two for D major triads or seventh chords. They also both include 4–6 clusters for C or G major scales or scale segments, and one for a D minor scale or C♯ diminished seventh. The Manhattan solution also has an F♯ diminished seventh cluster (0). Overall, this suggests that Mozart’s harmony largely consists of I, IV, V, ii, and vii^{o7}/V chords in the home key (C major) and dominant key (G major), and that there are multiple forms of I and V, distinguishable by the weightings of chord tones and non-harmonic tones.

The analysis of centroid spectra resulted in groups for _{3} and _{5}, _{4} and _{5}, and _{1} and _{5}.

This confirmed the hypothesis that _{3} and _{5} would be principal dimensions for most clusters. In particular, _{5} appears to be pervasive, reflecting the strong diatonicity of the style, while the other dimensions of harmonic activity, defined by _{4} and _{1}, may sometimes take precedence over _{3}. We will refer to the three groups as _{3} is the coefficient of a function dividing the octave into three parts, it will tend to be large for triads or subsets of triads, hence the term “triadic.” The term “tetradic,” by analogy, refers to a division of the octave into four parts and the fact that _{4} will tend to be large for seventh chords or their subsets. Where the groups overlap, we choose the larger of _{3}, _{4}, or _{1} to classify clusters. Only one cluster in each solution does not have _{5} in its top three (Manhattan number 0 and Euclidean number 16), and both of these have _{4} as the top coefficient, so we classify them as tetradic. Cluster 0 of the Manhattan solution clearly represents a diminished seventh chord and is dominated by _{4}. (Cluster 16 of the Euclidean solution is less clear.) _{1}–_{6}, divided up into the three groups.

Spectra of the centroids of the 20 clusters, grouped according to whether |_{3}| _{4}| _{1}|

As the name implies, the triadic group captures the clusters that are clearly centered around some major or minor triad. The tetradic group mostly includes clusters near a dominant or diminished seventh chord. The scalar clusters center on contiguous sets of three or four notes from the C major or G major scale and probably represent moments in the music where only a melodic line is present, or the harmony is represented by a single note rather than a complete chord.

_{3}/φ_{5}, φ_{4}/φ_{5}, and φ_{1}/φ_{5}, respectively (φ_{k} is the phase of coefficient _{3} is among the top three coefficients in magnitude, tetradic if _{4} is among the top three, and scalar if _{1} is among the top three, and these groups overlap. In general, the phase value for a given coefficient, φ_{k}_{k}

Phase space plot for centroids of _{5}, in all cases, while the horizontal axis is the phase of _{3}, _{4}, and _{1} respectively. Ranges show the circular variance for each cluster:

The space for triadic clusters, φ_{3}/φ_{5}, is essentially Krumhansl’s (_{5} dimension separates the clusters by position on the circle of fifths or sharpness and flatness. In all three groups, the values of φ_{5} are limited to a narrow region, about 1/3 of the entire cycle. This reflects the overall conservatism of the harmonic style of this corpus and underlines the importance of the diatonic dimension for tonal harmony. In contrast, the centroids spread out fairly evenly in the other dimensions (_{3}, _{4}, and _{1}).

The φ_{3} and φ_{4} dimensions in _{1} dimension represents locations in the octave, which are also relatively evenly represented across scalar clusters.

Dendrogram showing hierarchical clustering solution.

All the pieces included in the corpus except two are in sonata form, and we expected that certain clusters would be characteristic of certain parts of the form since these are characterized by a conventional modulatory scheme. A typical sonata form begins with a main theme in the home key (C major) and then modulates to the key of the dominant (G major) for a subordinate-theme section. This is called the

To investigate whether clusters were characteristic of certain formal sections, we recorded the probability of observing each cluster for a given formal section. We then made a correlation matrix of these probabilities for the 20 clusters and grouped clusters with large (> 0.5) correlations. Finally, we plotted the cluster centroids in φ_{3}/φ_{5} space to investigate possible associations between the pattern of occurrence across formal sections and regions in this space.
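The correlation step can be sketched as follows. The cluster names, the probability profiles over five formal sections, and the helper names `pearson` and `correlated_pairs` are all hypothetical. The numbers are invented for illustration and are not taken from the corpus.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlated_pairs(profiles, threshold=0.5):
    """Return cluster pairs whose section profiles correlate above threshold."""
    names = sorted(profiles)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if pearson(profiles[a], profiles[b]) > threshold
    ]


# Hypothetical probabilities of three clusters in five formal sections
# (e.g. two parts of the exposition, development, two parts of the recapitulation).
profiles = {
    "a": [0.6, 0.2, 0.2, 0.6, 0.6],
    "b": [0.5, 0.1, 0.2, 0.5, 0.4],
    "c": [0.1, 0.5, 0.3, 0.1, 0.1],
}
pairs = correlated_pairs(profiles)
```

With these invented profiles, clusters "a" and "b" rise and fall together across sections and are paired, while "c" follows the opposite pattern and remains separate.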

The cluster probabilities across formal sections grouped very clearly into three patterns, shown in

Probability of each cluster appearing in each of five parts of a sonata form, split into three groups.

The resulting grouping clearly relates to the standard modulatory scheme of sonata form: clusters typical of the exposition are those associated with the dominant key, and are relatively infrequent in recapitulations. Clusters typical of developments are those associated with common minor keys: A minor (cluster 16), D minor (cluster 1), and G minor (cluster 0). Home key clusters are particularly infrequent in the subordinate themes of expositions, and reflect harmonies characteristic of C major, especially the subdominant (F major) and dominant seventh (G^{7}).

The grouping of clusters based on formal section is consistent with their positioning in the φ_{3}/φ_{5} space, shown in _{5} and concentrate in one half of the full φ_{3} cycle, with the other half occupied by the home key group. Clusters typical of developments appear in peripheral parts of the space, reflecting the use of minor keys which have a greater φ_{5} spread. The use of Cluster 10 in developments probably reflects the conventional concluding dominant pedal.

The twenty cluster centroids in the phase space for _{5} and _{3}, with regions grouping clusters that have a similar frequency profile over the five main sections of a sonata form.

A topic of widespread interest in research on eighteenth-century tonal harmony is harmonic succession, which is often described by cognitive music theorists as a kind of syntax (

The complete transition data are a 20-by-20 matrix, giving the probability of cluster

To analyze the transitions in the two form-based groups (the subordinate key and home key groups described above, omitting the development group which contains only four clusters) we constructed smaller transition matrices containing only the clusters in each group. We eliminated diagonals since transitions from a cluster to itself only indicate when harmonies last longer than a single quarter note. We then renormalized so that transitions could be interpreted as percentages within the smaller group.
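A minimal sketch of this reduction is given below. It assumes the input is a sequence of per-beat cluster labels, and it reads "renormalized" as dividing by the total count over the whole reduced matrix rather than row by row; both are assumptions. The function name `group_transition_matrix` and the label sequence are hypothetical.

```python
def group_transition_matrix(labels, group):
    """Transition proportions among the clusters in `group` only.

    Self-transitions are dropped, transitions involving clusters outside
    the group are ignored, and the remaining counts are divided by their
    total so that the whole matrix sums to 1.
    """
    index = {cluster: i for i, cluster in enumerate(group)}
    size = len(group)
    counts = [[0] * size for _ in range(size)]
    for current, following in zip(labels, labels[1:]):
        if current == following:
            continue  # a harmony lasting longer than one quarter note
        if current in index and following in index:
            counts[index[current]][index[following]] += 1
    total = sum(sum(row) for row in counts)
    if total == 0:
        return [[0.0] * size for _ in range(size)]
    return [[count / total for count in row] for row in counts]


# Hypothetical sequence of cluster labels, one per quarter-note beat.
beat_labels = [5, 5, 4, 5, 3, 4, 5, 9, 5]
home_key_group = [3, 4, 5]
matrix = group_transition_matrix(beat_labels, home_key_group)
```

Rows and columns follow the order of `home_key_group`, so `matrix[1][2]` is the share of retained transitions that go from cluster 4 to cluster 5. In a real corpus the label sequences of separate movements would need to be processed separately so that no transition is counted across a movement boundary.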

We converted these to sum and difference matrices by adding and subtracting the transpose. The sum matrices are symmetrical and represent how often representatives of two clusters are juxtaposed, regardless of order. The difference matrix is anti-symmetrical and represents how much more often a representative of cluster

In our second analysis we grouped clusters based on similar transition behavior and constructed a transition matrix on the resulting groups. To make these groups we first made a correlation matrix by calculating Pearson’s

We analyzed the resulting matrix in the same way as the form-based matrices. We eliminated the diagonal, renormalized, and took sum and difference matrices. For the sum matrix, we considered values that exceeded the average of their row and column plus the average standard deviation of their row and column. For the difference matrix, we considered values in excess of one standard deviation.
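The sum and difference matrices used in these analyses can be sketched as follows. The function name `sum_and_difference` and the 3-by-3 matrix are hypothetical, and the thresholding rules described above are left out of the sketch.

```python
def sum_and_difference(matrix):
    """Return (T + T^T, T - T^T) for a square transition matrix T."""
    n = len(matrix)
    transposed = [list(column) for column in zip(*matrix)]
    total = [[matrix[i][j] + transposed[i][j] for j in range(n)] for i in range(n)]
    difference = [[matrix[i][j] - transposed[i][j] for j in range(n)] for i in range(n)]
    return total, difference


# Hypothetical renormalized transition matrix with an empty diagonal.
transitions = [
    [0.0, 0.2, 0.0],
    [0.0, 0.0, 0.4],
    [0.2, 0.2, 0.0],
]
sum_matrix, diff_matrix = sum_and_difference(transitions)
```

Here `sum_matrix[1][2]` measures how often the second and third clusters are adjacent in either order, and a positive `diff_matrix[1][2]` indicates that the second cluster precedes the third more often than the reverse. Because the sum matrix is symmetric and the difference matrix antisymmetric, only one triangle of each carries independent information.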

Finally, for each of these we plot the resulting transitions in a φ_{4}/φ_{5} space, which we found to be the most suitable for illustrating these results.

_{4}/φ_{5} space, with heavy arrows for large values in both the sum and difference matrices, lighter arrows for values that appear only in the difference matrix, or double-headed for those only in the sum matrix. Typical predominant–dominant (16–17, 3–4, 7–9) and dominant–tonic (17–13, 14–13, 4–5, 9–5) functional successions of the C major and G major keys, directed from left to right, are evident. (The term “predominant” refers to chords that typically lead to the dominant, such as IV, ii, or V/V.) Both keys also include a cadential 6–4 progression, going right-to-left from a fifth-heavy tonic triad to a dominant.

Sum and difference matrices for subordinate key and home key clusters in percentages over just the transitions involving these clusters and excluding trivial transitions. Values exceeding 1 standard deviation are highlighted. Redundant values (given by symmetry and antisymmetry) are excluded (in difference matrices we retain just the positive values).

Transition data for subordinate key and home key groups, plotted in φ_{4}/φ_{5} space. Large arrows show high values in both the difference and sum matrices, small arrows in the sum or difference matrices only (double-headed for sum).

The reduction process for the 20×20 matrix resulted in 6 groups of clusters with 4 clusters remaining ungrouped. The six groups reflect identifiable functions: tonic (ton), dominant (dom), and predominant (pd) functions in the two keys, home key (HK) and subordinate key (SK):

SKdom: 0.C(E♭F♯), 14.ABC(D), 17.DF♯A(C)

SKpd: 1.GB♭(C♯D), 11.CE(G), 15.D(GB), 16.A(CE), 18.E(CG).

HKpd: 3.FAC, 7.FA(D), 8.G(CE)

HKdom: 4.FG(BD), 9.DF(EG)

HKton: 5.C(E)

SKton: 10.G, 13.G(BD)

The remaining four are ungrouped clusters: 2.D(F), 6.FG(EA), 12.D, and 19.B(DG).

Comparing these groups to the hierarchical clustering solution shown in

_{4}/φ_{5} plot. Dotted lines connect the clusters to the group label, except for the cadential 6–4 chords (8 in the HKpd and 15 in the SKpd group) which are not close to the rest of their group. Other than the cadential 6–4 chords, the space effectively sorts the six primary functions: home key below and dominant key above, and the three within-key functions arranged left-to-right such that functional motion always cycles in this direction. As in

Sum and difference matrices for cluster groups. Values exceeding 1.5 standard deviations are highlighted. Redundant values (given by symmetry and antisymmetry) are excluded (in difference matrices we retain just the positive values).

Transition data for the cluster groups plotted in φ_{4}/φ_{5} space. Group labels are positioned roughly in between their members, which are connected by dotted lines (with the exception of clusters 8 and 15 for clarity). Heavy arrows show high values in both the difference and sum matrices, lighter arrows in the sum or difference matrices only (double-headed for sum).

The functional logic is apparent from the patterns in ^{o7}/ii in the home key). The only retrograde motion is the weaker tendency to convert tonic of the SK to a dominant of the HK (e.g., by adding a seventh, F, to a G major triad). Of the four ungrouped clusters (besides HKton), only one makes an appearance in the sum and difference matrices, cluster 19, a third-weighted G major triad. This largely behaves like SKton, except that it has a more symmetrical relationship with SKdom, and therefore remains separate from that group.

Pitch-class vectors, which have been shown by a wealth of previous research to be effective representatives of keys (

The present study is primarily preliminary and exploratory, and details of the results are clearly dependent on the chosen corpus. Nonetheless, we can already draw some significant conclusions about the efficacy of a weighted pitch-class model of tonal harmony and derive some implications for music theory.

Music theory conventions make a few implicit claims about how tonal harmony works. First, Roman numeral conventions limit the possible harmonic objects to a set of triads and seventh chords drawn from common tonal scales. They also tend to imply that, except for differences of inversion, all such triads and sevenths are unitary objects. Second, as a partial redress to the shortcomings of the Roman numeral conventions, theories of harmonic function typically highlight certain elements of the Roman numeral lexicon as being of special significance, such as tonic and dominant, and will also sometimes assign multiple functions to a single harmonic object (e.g., IV as “subdominant” vs. “predominant”).

In broad strokes, our results support this received music theory in many respects. First, the majority of clusters we found can be reliably associated with triads and seventh chords, supporting the basic Roman numeral convention. Second, we found that a small handful of these triads and seventh chords are very prevalent, while the majority are either not common enough or not distinct enough to be detected by the clustering procedure. On the other hand, for the more important tonic and dominant functions, the clustering solution distinguished three or four varieties of the same triad. These all behave differently: in no case did our transition-based grouping procedure combine any of the four C major triad clusters or the three G major triad clusters. This generally supports the notion from function theory that the same harmonic object can function in multiple ways, although our data support this idea for tonic and dominant triads, whereas function theory tends to focus more on multiple functions for other kinds of triads.

This support for music theory conventions, however, is only in broad strokes. When we consider details, our results show that the implications of music theory conventions are imprecise and in some respects may oversimplify the reality.

First, the idea of limiting harmony to triads and seventh chords is largely, but not entirely, supported. In the clusters obtained using the Manhattan metric, we found four “scalar” clusters (

Second, the results support the idea from function theory that two triadic harmonies, tonic and dominant, are overwhelmingly the most important in this repertoire. They also support the idea of three functions, predominant, dominant, and tonic, one of which (predominant) is based on similarity of behavior, not on similarity of pitch-class content. In the groupings of clusters based on transition data (section 4.2), tonic and dominant categories group chords close together in the hierarchical clustering solution of

Third, the results support the idea of multiple functions for individual triads, especially for tonic and dominant triads. The clustering solution identified four C major triads (clusters 5, 8, 11, and 18) and three G major triads (clusters 13, 15, and 19). None of these were grouped on the basis of transitions in section 4.2. Some might be associated with distinct inversions, especially numbers 8 and 15, which appear to be cadential 6–4 chords. Our analysis also suggested that cluster 11 represents IV of G major, distinguishing it from the other C major clusters as tonics of C major. Taking this one step further, then, we might conjecture that clusters 18 and 19 represent first-inversion triads. If this is the case, it means that first-inversion triads have a distinct function, specifically in the case of tonic triads, but also that these distinctly functioning chords are identifiable purely through registral weighting of pitch classes, without necessarily separating out the bass line. This might be possible due to the dependency of doubling on inversion observable, e.g., in Aarden and Von Hippel’s (

That these functional distinctions emerge is remarkable since, unlike in a hidden-Markov algorithm, as applied, for example, by Mavromatis (

We have also found that applying the DFT to pitch-class vectors provides an effective dimensional reduction in which aspects of similarity in pitch-class content and harmonic function remain observable. This recommends its use as a feature representation in studies of harmony in notated music, much as Harte et al. (_{3}/φ_{5} space, effective in sorting harmonies by key, and a different φ_{4}/φ_{5} space useful for observing the norms of harmonic succession, and magnitudes of _{1}, _{3}, and _{4} useful for sorting scalar, triadic, and tetradic chord types. Extrapolating from this, we might conjecture that scalar types (high |_{1}|) are generally incidental and non-functional, triadic types (high |_{3}|) important for defining key, and tetradic types (high |_{4}|) important for local functional succession.

This study suggests further research in a few directions. The corpus we used is relatively limited, and different results could be expected from a larger or more harmonically expansive or varied corpus. The results also depend upon musical texture, so a corpus featuring different kinds of textures (such as more contrapuntal textures) would make a valuable comparison. One limitation of the present study is that the quarter-note parsing used throughout does not necessarily optimally isolate individual harmonies. A method of parsing based on similarity of pitch-class content may greatly improve the clustering method.

We used Python’s scikit-learn (sklearn) library for all machine learning methods. Code is available at

Thanks to the editor and anonymous reviewers for the helpful comments on the drafts, which greatly improved the paper.

The authors have no competing interests to declare.