Modeling Popularity and Temporal Drift of Music Genre Preferences

In this paper, we address the problem of modeling and predicting the music genre preferences of users. We introduce a novel user modeling approach, BLL u , which takes into account the popularity of music genres as well as temporal drifts of user listening behavior. To model these two factors, BLL u adopts a psychological model that describes how humans access information in their memory. We evaluate our approach on a standard dataset of Last.fm listening histories, which contains fine-grained music genre information. To investigate performance for different types of users, we assign each user a mainstreaminess value that corresponds to the distance between the user’s music genre preferences and the music genre preferences of the (Last.fm) mainstream. We adopt BLL u to model the listening habits and to predict the music genre preferences of three user groups: listeners of (i) niche, low-mainstream music, (ii) mainstream music, and (iii) medium-mainstream music that lies in-between. Our results show that BLL u provides the highest accuracy for predicting music genre preferences, compared to five baselines: (i) group-based modeling, (ii) user-based collaborative filtering, (iii) item-based collaborative filtering, (iv) frequency-based modeling, and (v) recency-based modeling. Besides, we achieve the most substantial accuracy improvements for the low-mainstream group. We believe that our findings provide valuable insights into the design of music recommender systems.


Introduction
Music recommender systems play a pivotal role in popular streaming platforms such as Last.fm, Pandora, or Spotify to help users find music that suits their taste. Existing music recommender systems typically employ collaborative filtering algorithms based on the users' interactions with music items (i.e., listening behavior or ratings), sometimes in combination with content features (e.g., acoustic features of songs) in the form of hybrid music recommender systems (Celma, 2010; Schedl et al., 2018b).
Problem. While music recommender systems can provide quality recommendations to listeners of popular music, related research (Schedl and Bauer, 2018; van den Oord et al., 2013) has shown that they tend to fail listeners who prefer niche artists and genres. One reason is the scarcity of usage data for such music, as music consumption patterns are biased towards popular artists (van den Oord et al., 2013; Celma, 2010; Celma and Cano, 2008). In this paper, we introduce a novel user modeling and genre prediction approach for users with different music consumption patterns and listening habits. We focus on three user groups: (i) LowMS, i.e., listeners of niche music, (ii) HighMS, i.e., listeners of mainstream (MS) music, and (iii) MedMS, i.e., listeners of music that lies in-between. The main problem we address in this work is how to exploit variations in listening habits to improve personalization for all three user groups. We investigate this problem by predicting the music genres a user is going to listen to in the future.
Approach and methods. We model the users' listening behavior in terms of fine-grained music genre preferences. To that end, we use behavioral data in the form of listening events, i.e., the history of which genres a user has listened to in the past. Our approach is based on the Base-Level Learning (BLL) equation from the cognitive architecture ACT-R (Anderson and Schooler, 1991; Anderson et al., 2004) that accounts for the time-dependent decay of item exposure in human memory. It quantifies the usefulness of a piece of information based on how frequently and recently a user accessed it in the past. This time-dependent decay takes the shape of a power-law distribution. Related work has employed the [...]
Research on music preferences in light of psychology. Research in music psychology (North and Hargreaves, 2008) has shown that a range of factors impact music preferences (Schedl et al., 2015), such as emotional state (Cantor and Zillmann, 1973; Juslin and Sloboda, 2001; Rodà et al., 2014), a user's current activity, their self-view and self-esteem (North and Hargreaves, 1999), the cognitive functions of music (e.g., music as a way to communicate and to self-reflect) (Schäfer and Sedlmeier, 2010), as well as personality (Cattell and Anderson, 1953; Arnett, 1992; Dollinger, 1993; Rentfrow and Gosling, 2003; George et al., 2007; Delsing et al., 2008; Dunn et al., 2012; Schedl et al., 2018a).
For instance, Rentfrow and Gosling (2003) showed that the Big Five personality traits (i.e., openness to experience, agreeableness, extraversion, neuroticism, and conscientiousness) influence genre preferences in music and that music preferences can be categorized along specific dimensions (e.g., reflective & complex, intense & rebellious, upbeat & conventional, and energetic & rhythmic music); the structure of music preferences is also discussed by Delsing et al. (2008). Greenberg et al. (2015) found that a person's cognitive approach (i.e., their tendency towards empathy versus systemizing versus balancing both) impacts their music genre preferences. A user's music preference is also impacted by familiarity (Pereira et al., 2011; Schubert, 2007). This has been attributed to the so-called mere exposure effect (Peretz et al., 1998), which means that prior exposure can positively influence music liking. In our work, we also incorporate prior exposure (in this case, to a music genre) into our model.
Temporal dynamics of music preferences. Music preferences are often dynamic due to variations in user taste (Kim et al., 2018), or evolving music taste (Moore et al., 2013). One can distinguish between research on long-term temporal dynamics of listening behavior and research on short-term dynamics. Studies of long-term dynamics examine, for example, how music preferences of children and young adults evolve (Hargreaves et al., 2015; Leadbeater, 2014), or how user tastes change over time and how artists develop (Moore et al., 2013).
Studies investigating short-term dynamics typically assess users' listening behaviors (Aizenberg et al., 2012; Park and Kahng, 2010) on a fine-grained basis (e.g., time of the day) to detect patterns and periodicity in listening behavior, or in the case of Krause and North (2018), to study the relationship between music preferences and seasons of the year. The latter approaches are typically intended to help build predictive models of music preferences, e.g., to create playlist recommendations for music streaming services. As we describe in detail in Section 3, in our data, we observe interesting temporal dynamics in users' genre listening histories. Specifically, the time-dependent decay of the number of plays per genre follows a power-law distribution, so our users tend to listen to genres to which they have recently listened.

Personalization for music recommendation.
A number of aspects make personalization in music recommender systems challenging, such as the variability of listening intent and purpose of music consumption, insufficient ratings and usage data, as well as users' tendency to appreciate recommendations of items that have been previously recommended (Schedl et al., 2018b), but also the dependence of music preferences on the user's personality traits or emotional state. In this vein, Selvi and Sivasankar (2019) extracted the user's emotional context from social media messages as well as their current time context and incorporated both to generate personalized music recommendations. Ferwerda et al. (2015) used a specific personality-enriched dataset that provided links to users' listening histories on Last.fm to leverage personality traits to predict a user's genre preferences. Zheng et al. (2018) proposed a tag-aware dynamic music recommendation framework that represents musical tracks via user-generated tags and generates time-sensitive recommendations. Koenigstein et al. (2011) incorporated a temporal analysis of user ratings assigned to music pieces and item popularity trends into a matrix factorization approach to mitigate the issue of insufficient item ratings. The latter is a common problem that causes (music) recommender systems to suffer from bias towards popular items. Due to insufficient amounts of usage data for less popular items, many recommendation algorithms cannot provide useful recommendations for consumers of less popular and niche items (Abdollahpouri et al., 2019; Celma, 2010; van den Oord et al., 2013). Recent work (Vall et al., 2019) has, however, provided evidence that deep-learning-based methods (i.e., recurrent neural networks) seem to be less biased towards popular items.
In our work, we use only listening histories as a data source to model user preferences and to generate recommendations. As we show in Section 3, we observe that all users in our dataset tend to consume items they have listened to frequently and recently in the past, where the time-dependent decay of this item consumption count follows a power-law distribution. Correspondingly, the Base-Level Learning (BLL) equation from the cognitive architecture ACT-R (Anderson and Schooler, 1991; Anderson et al., 2004) describes a time-dependent decay of item exposure in human memory in the form of a power-law distribution. Leveraging these similarities between characteristics of music consumption patterns and cognition models (i.e., ACT-R in our case), we propose here to use the BLL equation to describe listeners' behavioral music consumption traces.

Data and Method
In this section, we present the dataset we use for our study and the statistical analyses we carry out. We then outline our approach and the baselines that we employ to validate it.

Dataset and Statistical Analyses
First, we describe the Last.fm dataset, as well as the selected genre mapping procedure. We report statistical analyses for (i) music genre popularity, (ii) average pairwise user similarity, (iii) popularity of music genre preferences, and (iv) temporal drifts of music genre preferences.
Dataset description and availability. For our study, we use a dataset gathered from the online music service Last.fm, namely the LFM-1b dataset. LFM-1b contains listening histories of more than 120,000 users, totaling about 1.1 billion individual listening events accrued between January 2005 and August 2014. Each listening event is characterized by a user identifier, artist, album, track name, and a timestamp (Schedl, 2016). Besides, the LFM-1b dataset contains user-specific demographic data such as country, age, gender as well as additional features such as mainstreaminess, which is defined as the overlap between the user's listening history and the aggregated listening history of all Last.fm users in the dataset. More precisely, the mainstreaminess of a user corresponds to the average distance between all artists' relative frequencies in the user's listening profile and the artists' relative frequencies among all users in the dataset (Schedl and Hauger, 2015).
Mapping listening events to music genres. Since we are interested in modeling and predicting music genre preferences, we enhance the listening events in the LFM-1b dataset with additional genre information. To this end, we use an extension of the LFM-1b dataset, termed LFM-1b User-Genre-Profile (i.e., LFM-1b UGP) dataset (Schedl and Ferwerda, 2017), which describes the genres of an artist in a listening event by exploiting social tags from Last.fm.
Among others, LFM-1b UGP contains a weighted mapping of 1,998 music genres and styles available in the online database Freebase to Last.fm artists. In part, this taxonomy includes particular descriptors such as "Progressive Psytrance" or "Melodic Black Metal", and therefore allows for a fine-grained representation of musical styles. The weightings correspond to the relative frequency of tags assigned to artists in Last.fm. For example, for the artist "Metallica" the top tags and their corresponding relative frequencies are "thrash metal" (1.0), "metal" (.91), "heavy metal" (.74), "hard rock" (.41), "rock" (.34) and "seen live" (.3). This means that the tag "thrash metal" is the most popular genre tag assigned to "Metallica" and thus, its weighting is 1.0. From this list, we remove all tags that are not part of the 1,998 Freebase genres (i.e., "seen live" in our example) as well as all tags with a relative frequency smaller than .5 (i.e., "hard rock" and "rock" in our example). Thus, for "Metallica", we end up with three genres, namely "thrash metal", "metal" and "heavy metal" that we assign to all listening events of the artist "Metallica". Overall, this process gives us, on average, 2-3 genres per artist (i.e., mean = 2.466). Furthermore, 96.25% of the genres are assigned to more than one artist.
User groups based on mainstreaminess. The LFM-1b dataset contains a mainstreaminess value for each user, which defines the distance from this user's music genre preferences to the music genre preferences of the (Last.fm) mainstream. To study different types of users, we split the dataset into three equally sized groups based on their mainstreaminess (i.e., low, medium, and high). We sort the users in the dataset based on their mainstreaminess value and assign the 1,000 users with the lowest values to the LowMS group, the 1,000 users with the highest values to the HighMS group, and the 1,000 users with a value that lies around the average mainstreaminess (=.379) to the MedMS group.
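The tag filtering step described above can be illustrated with a minimal Python sketch. The function name `filter_genre_tags` and the small `freebase_genres` set are hypothetical stand-ins (the real taxonomy has 1,998 entries); only the Metallica weights and the .5 threshold come from the text.

```python
def filter_genre_tags(tag_weights, freebase_genres, min_weight=0.5):
    """Keep tags that are known genres and whose relative frequency is at least min_weight."""
    return [tag for tag, weight in tag_weights.items()
            if tag in freebase_genres and weight >= min_weight]

# Tag weights for "Metallica" as quoted in the text.
metallica_tags = {
    "thrash metal": 1.0, "metal": 0.91, "heavy metal": 0.74,
    "hard rock": 0.41, "rock": 0.34, "seen live": 0.3,
}
# Hypothetical excerpt of the Freebase genre list ("seen live" is not a genre).
freebase_genres = {"thrash metal", "metal", "heavy metal", "hard rock", "rock"}

metallica_genres = filter_genre_tags(metallica_tags, freebase_genres)
print(metallica_genres)
```

Under these assumptions, the three genres named in the text remain and would be attached to every listening event of the artist.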
Here, we consider only users with at least 6,000 and at most 12,000 listening events, a choice we made based on the average number of listening events per user in the dataset (i.e., 9,043) as well as the kernel density distribution of the data.With this method, on the one hand, we exclude users with too little data available for training our algorithms (i.e., users with <6,000 listening events), and on the other hand, we exclude so-called power listeners (i.e., users with >12,000 listening events) who might distort our results.
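A rough sketch of this group construction follows. The text does not specify how "around the average" is operationalized, so picking the users closest to the mean mainstreaminess of the eligible users is an assumption, as are the function name and the tuple layout.

```python
def select_user_groups(users, group_size=1000, min_le=6000, max_le=12000):
    """users: list of (user_id, mainstreaminess, n_listening_events) tuples.

    Returns (low, med, high) lists of user tuples.
    """
    # Keep only users whose number of listening events is in the allowed range.
    eligible = [u for u in users if min_le <= u[2] <= max_le]
    ranked = sorted(eligible, key=lambda u: u[1])
    low = ranked[:group_size]
    high = ranked[-group_size:]

    # Assumption: MedMS = remaining users closest to the mean mainstreaminess.
    mean_ms = sum(u[1] for u in eligible) / len(eligible)
    taken = {u[0] for u in low + high}
    rest = [u for u in ranked if u[0] not in taken]
    med = sorted(rest, key=lambda u: abs(u[1] - mean_ms))[:group_size]
    return low, med, high

# Toy example with group_size=2; user "g" is dropped for having too few events.
toy_users = [("a", 0.10, 7000), ("b", 0.20, 7000), ("c", 0.35, 7000),
             ("d", 0.40, 7000), ("e", 0.80, 7000), ("f", 0.90, 7000),
             ("g", 0.50, 100)]
low, med, high = select_user_groups(toy_users, group_size=2)
```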
Furthermore, this high average number of listening events per user also means that we have enough listening events (i.e., between 6.9 and 8.2 million) to train and test the music genre preference modeling and prediction approaches, even if we only consider 1,000 users per group. Table 1 summarizes the statistics and characteristics of these three groups.
(ii) MedMS. The MedMS group represents the |U| = 1,000 users whose mainstreaminess values lie between those of the LowMS and HighMS groups (i.e., their mainstreaminess values lie around the average). This group has an average mainstreaminess value of [...]. Also, this user group exhibits the highest number of distinct genres (|G| = 973).
Average pairwise user similarity. Finally, the boxplots in Figure 1 show the average pairwise user similarity in the three user groups. We calculate these scores based on the genre distributions of the users and using the cosine similarity metric. We see that users in the LowMS group have a very individual listening behavior (mean user similarity = .118), while users in the HighMS group tend to listen to similar music genres (mean user similarity = .691). Again, the users in the MedMS group lie in between (mean user similarity = .392). Given these results, we expect a collaborative filtering approach based on user similarities to deliver good genre prediction results for the HighMS group.
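As an illustration of this similarity analysis, the following sketch computes the mean cosine similarity over all user pairs in a group. Representing each user's genre distribution as a dict of genre counts is an assumption, and the sketch returns a single group mean rather than the per-user values a boxplot would need.

```python
import math
from itertools import combinations

def cosine(p, q):
    """Cosine similarity between two sparse genre-count vectors (dicts)."""
    dot = sum(value * q.get(genre, 0) for genre, value in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0

def mean_pairwise_similarity(profiles):
    """profiles: dict user_id -> {genre: listening count}."""
    pairs = list(combinations(profiles.values(), 2))
    return sum(cosine(p, q) for p, q in pairs) / len(pairs)

# Toy group: two users share a genre, the third listens to something else.
toy_profiles = {"u1": {"rock": 1}, "u2": {"rock": 1}, "u3": {"jazz": 1}}
toy_similarity = mean_pairwise_similarity(toy_profiles)
```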
Popularity of music genre preferences. In Figure 2, we compare the music genre popularity distributions of the LowMS, MedMS, and HighMS groups. To this end, we plot the number of listening events (LE) for the groups' top-30 genres. We find that there are some dominating genres with more than 2 million LE counts in the HighMS group, while the genres are much more evenly distributed in the LowMS group, with a LE count of around 500,000 for the most popular genres. We can describe the genre distribution of the MedMS group as an intermediate of the LowMS and HighMS distributions. We analyze the actual top-30 genres in these groups, and while the most popular genres Rock and Pop dominate the other genres [...]
Based on the dataset characteristics, we expect that a group-based modeling approach, which models a user's music genre preferences utilizing the most-frequently listened genres of all users in the group, performs well for HighMS in relation to other modeling techniques, while for the LowMS group, a personalized modeling technique would be preferable. In the MedMS group, we expect both modeling approaches to work well due to the group being an intermediate of the HighMS and LowMS groups.
Temporal drift of music genre preferences. Next, we investigate the temporal drift of music genre preferences. The plots (a), (b), and (c) of Figure 3 show the effect of time on the genre listening behavior of our LowMS, MedMS, and HighMS user groups. We plot the relistening count of music genres over the time (in hours) since the last listening events of these genres on a log-log scale. For example, if a user u has listened to artists with genre g twice in a time interval of 1 hour, then the relistening count for "1 hour" is incremented by 1. We repeat this process for all listening events, which gives us a relistening count for each hour. We observe similar results for all three groups, which means that the shorter the time since the last listening event of a genre g, the higher its relistening count. In all three plots, we see a peak after 24 hours, which indicates that people tend to listen to similar music genres daily at the same time. However, we also see that when people have not listened to a genre for a longer period, i.e., one month (around 750 hours), the relistening count of this genre drastically drops.
Finally, we also plot the linear regression lines of the empirical data in the plots of Figure 3. In the log-log-scaled plots, we can observe a good fit of the data, which indicates that the data likely follows a power-law distribution (cf. Anderson and Schooler, 1991). This claim is supported by the high R^2 values of the fits, which are between .870 and .895. Concerning the slopes α of the lines, which describe how strongly temporal listening drifts influence the user groups, we observe values between -1.480 and -1.587. We can use the absolute values of these slopes as the d parameter of the BLL equation (Anderson et al., 2004), cf. Equation 6.
Taken together, we observe interesting temporal effects in all three user groups: Last.fm users tend to listen to genres they have listened to recently. Moreover, we find that this temporal drift of music genre preferences follows a power-law distribution. Correspondingly, we can model this drift with the BLL equation.
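The relistening count described above can be sketched as follows. The event layout (user, genre, Unix timestamp in seconds) and the choice to round each gap up to the next full hour, with a minimum of one hour so that it can be shown on a log scale, are assumptions made for illustration.

```python
import math
from collections import Counter

def relistening_counts(events):
    """events: iterable of (user, genre, timestamp_in_seconds) in any order.

    Returns a Counter mapping "hours since the previous listening event of the
    same genre by the same user" to the number of times this gap was observed.
    """
    last_seen = {}
    counts = Counter()
    for user, genre, ts in sorted(events, key=lambda e: e[2]):
        key = (user, genre)
        if key in last_seen:
            # Assumption: gaps are bucketed by rounding up to full hours (>= 1).
            hours = max(1, math.ceil((ts - last_seen[key]) / 3600))
            counts[hours] += 1
        last_seen[key] = ts
    return counts

toy_events = [
    ("u", "rock", 0), ("u", "rock", 3600), ("u", "rock", 3600 + 86400),
    ("u", "jazz", 100), ("v", "rock", 200),
]
toy_counts = relistening_counts(toy_events)
```

In the toy data, user u relistens to "rock" once after 1 hour and once after 24 hours, so those two buckets each receive one count.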
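A minimal sketch of such a log-log fit is given below: ordinary least squares on log10-transformed data, returning the slope, the intercept, and R^2. The function name and the synthetic data are hypothetical; the sketch only illustrates how a slope such as -1.480 could be obtained from hourly relistening counts.

```python
import math

def loglog_fit(xs, ys):
    """Least-squares line through (log10 x, log10 y); returns (slope, intercept, r2)."""
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    n = len(lx)
    mean_x, mean_y = sum(lx) / n, sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(lx, ly))
    ss_tot = sum((b - mean_y) ** 2 for b in ly)
    return slope, intercept, 1.0 - ss_res / ss_tot

# Synthetic power law y = 1000 * x^(-1.5); the fit should recover the exponent.
hours = [1, 2, 4, 8, 16]
counts = [1000 * h ** -1.5 for h in hours]
slope, intercept, r2 = loglog_fit(hours, counts)
```

A straight line in log-log space corresponds to y proportional to x^slope, which is why a good linear fit is read as evidence for a power law.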

Modeling and Prediction of Music Genre Preferences
In this section, we describe five baseline approaches (i.e., TOP, CF u , CF i , POP u , and TIME u ) as well as our approach based on the BLL equation for modeling and predicting music genre preferences (i.e., BLL u ).
Figure 3: The effect of time on genre relistening behavior for the (a) LowMS, (b) MedMS, and (c) HighMS Last.fm user groups. For all three groups, we find that the shorter the time since the last listening event of a genre, the higher its relistening count. Additionally, we plot the linear fits of the data and report the corresponding R^2 estimates as well as the slopes α. We can observe a very good fit of the data, which indicates that the data likely follows a power-law distribution.

Group-based baseline: TOP. Motivated by our analysis in Figure 2, the TOP approach models a user u's music genre preferences using the overall top-k (e.g., top-30) genres of all users in the user group $UG_u$ (i.e., LowMS, MedMS, HighMS) to which u belongs. This is given by:

$$\widetilde{G}_u^k = \underset{g}{\arg\max^k} \left( |GA_{g,UG_u}| \right) \tag{1}$$

where $\arg\max^k$ refers to the "arguments of the maxima" function for the top-k genres with maximum values, $\widetilde{G}_u^k$ denotes the set of k predicted genres for user u, and $|GA_{g,UG_u}|$ corresponds to the number of times g occurs in all genre assignments GA of $UG_u$. Thus, we describe this approach as a group-based modeling technique since it reflects the preferences of the whole user group LowMS, MedMS or HighMS. As our analysis in Figure 2 shows that the genre distribution in the HighMS group is the least evenly distributed one, we expect the TOP approach to provide good prediction accuracy results for the HighMS group while performing worse for the LowMS group in relation to other modeling techniques.
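The group-based TOP baseline amounts to counting genre assignments across a whole user group and returning the k most frequent ones. A minimal sketch, assuming the group's genre assignments are available as a flat list of genre labels (the function name is hypothetical):

```python
from collections import Counter

def top_predict(group_genre_assignments, k=30):
    """Return the k most frequent genres among all genre assignments of a user group."""
    counts = Counter(group_genre_assignments)
    return [genre for genre, _ in counts.most_common(k)]

# Every user in the group receives the same prediction.
toy_assignments = ["rock", "rock", "pop", "rock", "pop", "jazz"]
toy_top = top_predict(toy_assignments, k=2)
```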
User-based collaborative filtering baseline: CF u. User-based collaborative filtering approaches aim to find similar users for a target user u, i.e., the set of neighbors $N_u$. $N_u$ is calculated using the cosine similarity between u's genre distribution and the genre distributions of all other users. Then, the top-20 users are defined as $N_u$. Finally, CF u predicts the genres these similar users in $N_u$ have listened to (Shi et al., 2014), which is formally given by:

$$\widetilde{G}_u^k = \underset{g}{\arg\max^k} \left( \sum_{v \in N_u} sim(G_u, G_v) \cdot |GA_{g,v}| \right) \tag{2}$$

where $sim(G_u, G_v)$ is the cosine similarity between the genre distributions of user u and neighbor v, and $|GA_{g,v}|$ indicates how often v has listened to genre g. Since CF u relies on user similarities, we expect it to provide good results for the HighMS group compared to other modeling approaches (see also Figure 1).
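The following sketch illustrates user-based collaborative filtering on genre profiles as described above: neighbors are ranked by cosine similarity, and each neighbor's genre counts are weighted by that similarity. The profile layout (dict of genre counts per user) and all names are assumptions for illustration.

```python
import math
from collections import defaultdict

def cosine(p, q):
    """Cosine similarity between two sparse genre-count vectors (dicts)."""
    dot = sum(value * q.get(genre, 0) for genre, value in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0

def cf_u_predict(target, profiles, k=10, n_neighbors=20):
    """profiles: dict user_id -> {genre: listening count}."""
    sims = [(cosine(profiles[target], prof), v)
            for v, prof in profiles.items() if v != target]
    neighbors = sorted(sims, key=lambda sv: sv[0], reverse=True)[:n_neighbors]
    scores = defaultdict(float)
    for sim, v in neighbors:
        for genre, count in profiles[v].items():
            scores[genre] += sim * count  # similarity-weighted listening count
    return sorted(scores, key=scores.get, reverse=True)[:k]

toy_profiles = {
    "u": {"rock": 5, "pop": 1},
    "v": {"rock": 4, "metal": 3},
    "w": {"jazz": 10},
}
toy_cf = cf_u_predict("u", toy_profiles, k=2, n_neighbors=1)
```

With a single neighbor, user v is selected because it shares "rock" with u, so the prediction consists of v's genres ordered by weighted count.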
Item-based collaborative filtering baseline: CF i. Similar to CF u, CF i is a collaborative filtering-based approach, but instead of finding similar users for the target user u, it aims to find similar items (i.e., music artists). Then it predicts the genres that are assigned to these similar artists as given by:

$$\widetilde{G}_u^k = \underset{g}{\arg\max^k} \left( \sum_{a \in A_u} \sum_{s \in S_a} sim(G_a, G_s) \cdot |GA_{g,s}| \right) \tag{3}$$

Here, $A_u$ is the set of artists u has listened to, $S_a$ is the set of similar artists for an artist a, $sim(G_a, G_s)$ is the cosine similarity between the genres assigned to a and the genres assigned to a similar artist s, and $|GA_{g,s}|$ indicates how often genre g was assigned to the similar artist s (hence, in our case either 0 or 1). Again, a neighborhood size $|S_{A_u}| = 20$ leads to the best genre prediction results, and we also set $A_u$ to the set of the 20 artists that u has listened to most frequently.
Frequency-based baseline: POP u. The POP u approach is a personalized music genre preference modeling technique, which predicts the k most frequently listened to (i.e., most popular) genres in the listening history of a user u. POP u corresponds to the modeling approach presented by Schedl and Ferwerda (2017) and is given by the following equation:

$$\widetilde{G}_u^k = \underset{g \in G_u}{\arg\max^k} \left( |GA_{g,u}| \right) \tag{4}$$

where $G_u$ is the set of genres u has listened to and $|GA_{g,u}|$ denotes the number of times u has listened to tracks with genre g (i.e., the frequency). Thus, it ranks the genres u has listened to in the past by popularity. Therefore, in relation to other modeling algorithms, we expect POP u to generate good genre predictions for all users in our three user groups, but especially for HighMS, in which the popularity feature is the most important one (see Figure 2).
Recency-based baseline: TIME u. The personalized, recency-based modeling of music genre preferences is motivated by our analysis presented in Figure 3, where we find that people tend to listen to genres to which they have listened very recently. Thus, TIME u predicts the most recently listened to genres that are present in the listening history of a user u, which is given by:

$$\widetilde{G}_u^k = \underset{g \in G_u}{\arg\max^k} \left( \frac{1}{t_{u,g,n}} \right) \tag{5}$$

where $t_{u,g,n}$ is the time since the last (i.e., the n-th) listening event of g by u. Since we find that the temporal drift of music genre preferences is an important feature for all our three user groups, TIME u should provide good prediction accuracy results for LowMS, MedMS, and HighMS in relation to other modeling approaches.
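The two personalized baselines can be sketched side by side. The history layout (a list of (genre, timestamp) pairs for one user) and the reference time `now`, which is assumed to be later than every timestamp in the history, are illustrative assumptions.

```python
from collections import Counter

def pop_u_predict(history, k=10):
    """Frequency-based: the k genres the user has listened to most often."""
    counts = Counter(genre for genre, _ in history)
    return [genre for genre, _ in counts.most_common(k)]

def time_u_predict(history, now, k=10):
    """Recency-based: the k genres with the most recent listening event.

    Ranks genres by 1 / (time since last listening event); requires now > all timestamps.
    """
    last = {}
    for genre, ts in history:
        last[genre] = max(ts, last.get(genre, ts))
    return sorted(last, key=lambda g: 1.0 / (now - last[g]), reverse=True)[:k]

toy_history = [("rock", 1), ("rock", 2), ("pop", 3), ("rock", 4), ("jazz", 5)]
toy_pop = pop_u_predict(toy_history, k=1)
toy_time = time_u_predict(toy_history, now=10, k=2)
```

In the toy history, "rock" is the most frequent genre, while "jazz" is the most recent one, so the two baselines disagree on the top prediction.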
Our approach based on the BLL equation: BLL u. To combine the frequency-based modeling method POP u with the recency-based modeling method TIME u, we utilize the BLL equation from the declarative memory module of the cognitive architecture ACT-R (Anderson et al., 2004). The BLL equation quantifies the importance of information in human memory (e.g., a word or a music genre) by considering how recently (i.e., temporal drift) and frequently (i.e., popularity) it was used in the past. In our setting, we define it as follows:

$$B_{u,g} = \ln \left( \sum_{j=1}^{n} t_{u,g,j}^{-d} \right) \tag{6}$$

Here, g is a genre user u has listened to in the past, and n is the number of times u has listened to g. Further, $t_{u,g,j}$ is the time since the j-th listening event of g by u, and d is the power-law decay factor that accounts for the feature of the temporal drift of music genre preferences. We set d to the absolute values of the slopes α identified in the analysis of Figure 3 (i.e., 1.480 for LowMS, 1.574 for MedMS, and 1.587 for HighMS). The resulting base-level activation values $B_{u,g}$ are normalized using a simple softmax function in order to map them onto a range of [0,1] where they sum to 1 (Kowald et al., 2017b):

$$B'_{u,g} = \frac{\exp(B_{u,g})}{\sum_{g' \in G_u} \exp(B_{u,g'})} \tag{7}$$

Again, $G_u$ is the set of distinct genres listened to by u. Finally, BLL u predicts the top-k genres $\widetilde{G}_u^k$ with the highest $B'_{u,g}$ values for u:

$$\widetilde{G}_u^k = \underset{g \in G_u}{\arg\max^k} \left( B'_{u,g} \right) \tag{8}$$

Comparison of approaches. Table 2 shows how the five baselines, as well as BLL u, cover our four features of interest, i.e., (i) personalization, (ii) collaboration, (iii) popularity, and (iv) temporal drift.
Here, our BLL u approach is the only one that covers the features of personalization, popularity, and temporal drifts.Moreover, TOP, CF u , and CF i are the only approaches that consider collaboration among users and, thus, investigate the listening events of all users.We further examine which feature combination works best for predicting genres in our setting in the next section of this paper.
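The BLL u computation described in this section (base-level activation, softmax normalization, top-k selection) can be sketched as follows. This is an illustrative sketch, not the TagRec implementation: the history layout, the reference time `now`, the time unit, and all function names are assumptions, and `now` must be later than every timestamp.

```python
import math

def bll_activations(history, now, d):
    """B_{u,g} = ln(sum_j t_{u,g,j}^(-d)), with t the time since the j-th event of g."""
    sums = {}
    for genre, ts in history:
        sums[genre] = sums.get(genre, 0.0) + (now - ts) ** (-d)
    return {genre: math.log(s) for genre, s in sums.items()}

def softmax(activations):
    """Map activations to values in [0, 1] that sum to 1 (max-shifted for stability)."""
    m = max(activations.values())
    exps = {g: math.exp(b - m) for g, b in activations.items()}
    z = sum(exps.values())
    return {g: e / z for g, e in exps.items()}

def bll_u_predict(history, now, d=1.480, k=10):
    """Return the k genres with the highest normalized activation."""
    probs = softmax(bll_activations(history, now, d))
    return sorted(probs, key=probs.get, reverse=True)[:k]

# Toy history of (genre, timestamp) pairs in hours (assumed unit).
toy_history = [("rock", 0), ("rock", 90), ("pop", 95), ("jazz", 10)]
toy_probs = softmax(bll_activations(toy_history, now=100, d=1.5))
toy_bll = bll_u_predict(toy_history, now=100, d=1.5, k=2)
```

In this toy case, "pop" ranks first although "rock" was played twice, because the single "pop" event is much more recent. Note also that, since exp(ln(x)) = x, the softmax in this formulation reduces to dividing each genre's decayed sum by the total over all genres.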

Experiments and Results
In this section, we outline the experimental setup (see Section 4.1) and in Section 4.2, we present the results of our study on evaluating the usefulness for modeling music genre preferences using the BLL equation.

Experimental Setup
To measure the accuracy of our music genre preference modeling approaches, we conduct a study, in which we predict the genres assigned to the artists a user is going to listen to in the future.
Evaluation protocol. We split the datasets into train and test sets (Cremonesi et al., 2008) and make sure that our evaluation protocol preserves the temporal order of the listening events, which simulates a real-world scenario in which we predict (genres of) future listening events based on past ones (Kowald et al., 2017b; Seitlinger et al., 2015). This also means that a classic k-fold cross-validation evaluation protocol with random splits is not useful.
Therefore, we put the most recent 1% of the listening events of each user into the test set and keep the remaining listening events for training.We do not use a classic 80/20 or 90/10 split as the number of listening events per user is large (i.e., on average 7,689 per user).Furthermore, although we only use the most recent 1% of listening events per user, this process leads to three large test sets with 69,153 listening events for LowMS, 79,007 listening events for MedMS, and 82,510 listening events for HighMS.On average, there are 76 listening events per user for which we predict the assigned genres.
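A per-user temporal split of this kind could look like the sketch below. How the 1% is rounded for a given user is not stated in the text, so truncating and keeping at least one test event is an assumption, as are the function name and event layout.

```python
def temporal_split(events, test_fraction=0.01):
    """events: list of (timestamp, payload) tuples for one user, in any order.

    Returns (train, test), where test holds the most recent events.
    """
    ordered = sorted(events, key=lambda e: e[0])
    # Assumption: truncate the test size, but keep at least one test event.
    n_test = max(1, int(len(ordered) * test_fraction))
    return ordered[:-n_test], ordered[-n_test:]

# Toy example with a larger fraction so that the split is visible.
toy_events = [(5, "e"), (3, "c"), (1, "a"), (4, "d"), (2, "b")]
toy_train, toy_test = temporal_split(toy_events, test_fraction=0.2)
```

Because the split is done by time rather than at random, no training event is more recent than any test event of the same user.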
In Figure 4, we present boxplots showing the average duration in days per user we have available in our three test sets.We see that the average duration per user is evenly distributed across all three user groups with a median value of 11.8 days, which is also around 1% of the median value of the overall average duration per user (i.e., the sum of training and test durations).This corresponds to the 1% of the listening events per user we use for the test sets.Thus, we are going to predict the genres a user is going to listen to in this period.
Following this evaluation protocol, our goal is to validate whether our BLL-based approach (i.e., BLL u) provides better prediction accuracy results than the five baseline approaches (i.e., TOP, CF u, CF i, POP u, and TIME u). When investigating the numbers shown in Table 1, we also see that our prediction task is not trivial since |GA|/|LE|, i.e., the number of genre assignments per listening event (=what should be predicted), is much smaller than $|G_u|$, i.e., the average number of genres a user u has listened to (=what could be predicted).

Table 2: Comparison of our five baselines as well as our approach based on the BLL equation for modeling and predicting music genre preferences. In this table, a "✓" indicates that a specific approach covers a specific feature. While TOP, CF u and CF i also consider collaboration among users (i.e., investigate listening events of all users), our BLL u approach is the only one that is personalized and accounts for the features of popularity as well as temporal drifts.
Figure 4: Boxplots showing the average duration in days per user we have available in our three test sets (x-axis: user groups LowMS, MedMS, HighMS). Across all three user groups, the average duration per user is evenly distributed with a median value of 11.8 days.

Evaluation metrics. To measure the prediction quality of the approaches, we use the following six state-of-the-art metrics (Baeza-Yates and Ribeiro-Neto, 2011):
(i) Recall: R@k. Recall is calculated as the number of correctly predicted genres divided by the number of relevant genres (i.e., from the test set). It is a measure of the completeness of the predictions.
(ii) Precision: P@k. Precision is calculated as the number of correctly predicted genres divided by the number of predictions k and is a measure of the accuracy of the predictions. We report recall and precision for k = 1 … 10 predicted genres in the form of recall/precision plots.
(iii) F1-score: F1@5. F1-score is the harmonic mean of recall and precision. If 10 genres are predicted, the F1-score typically reaches its highest value for k = 5. Thus, we report it for k = 5.
(iv) Mean Reciprocal Rank: MRR@10. MRR is the mean of reciprocal ranks of all relevant genres in the list of predicted genres.
(v) Mean Average Precision: MAP@10. MAP is the mean of the average precision scores at all ranks where relevant genres are predicted. With this, it also takes the ranking of the correctly predicted genres into account.
We report MRR, MAP, and nDCG for k = 10 predicted music genres, where these metrics reach their highest values.
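For the set-based metrics, a minimal per-user sketch is shown below; the values would then be averaged over all users. The function names are hypothetical, predictions are assumed to be a ranked list of distinct genres, and the ranking-based metrics (MRR, MAP, nDCG) are left out because their exact variants are not fully specified in the text.

```python
def precision_at_k(predicted, relevant, k):
    """Share of the top-k predicted genres that are relevant."""
    return len(set(predicted[:k]) & set(relevant)) / k

def recall_at_k(predicted, relevant, k):
    """Share of the relevant genres that appear among the top-k predictions."""
    return len(set(predicted[:k]) & set(relevant)) / len(relevant)

def f1_at_k(predicted, relevant, k):
    """Harmonic mean of precision@k and recall@k (0 if both are 0)."""
    p = precision_at_k(predicted, relevant, k)
    r = recall_at_k(predicted, relevant, k)
    return 2 * p * r / (p + r) if p + r else 0.0

toy_predicted = ["rock", "pop", "jazz", "metal", "folk"]
toy_relevant = {"rock", "metal", "punk"}
toy_p = precision_at_k(toy_predicted, toy_relevant, 5)
toy_r = recall_at_k(toy_predicted, toy_relevant, 5)
toy_f1 = f1_at_k(toy_predicted, toy_relevant, 5)
```

Here two of the five predictions are correct and two of the three relevant genres are found.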
Evaluation framework. For reasons of reproducibility, we conduct the prediction study using our recommendation benchmarking framework TagRec (Kowald et al., 2017a), which provides the evaluation protocol and metrics described in this section. Furthermore, we also implement the modeling approaches described in Section 3.2 using TagRec. It is freely available via our GitHub repository.

Results and Discussion
In this section, we report and discuss our prediction accuracy results on evaluating the usefulness of our BLL-based music genre preference modeling approach (i.e., BLL u) compared to five baseline approaches: (i) group-based modeling (i.e., TOP), (ii) user-based collaborative filtering (CF u), (iii) item-based collaborative filtering (CF i), (iv) frequency-based modeling (i.e., POP u), and (v) recency-based modeling (i.e., TIME u).
Based on the features introduced in Table 2, we discuss these results concerning the influence of (i) personalization, (ii) collaboration, (iii) popularity, and (iv) temporal drift.Furthermore, we compare the results of our BLL u approach for our user groups and different numbers of predicted genres in Figure 6 as well as show the performance of the approaches in a cold-start setting in Figure 7. Finally, we also discuss the implications of our findings for personalized music recommendation.

Influence of personalization.
The personalized approaches (i.e., POP u, CF u, CF i, TIME u, and BLL u) outperform the group-based TOP approach in the LowMS setting. This is in line with our analysis presented in Figure 2, where we found that the music genre popularity distribution in the LowMS group is the most evenly distributed one. The same is true for the MedMS group, in which we observe a very similar performance of CF u, CF i, POP u, and TIME u. However, in the HighMS setting, only the four personalized approaches that utilize the popularity feature (i.e., POP u, CF u, CF i, and BLL u) outperform TOP. This shows that the influence of personalization on the prediction accuracy becomes more important as the mainstreaminess of the users decreases (i.e., in the LowMS setting).

Table 3: Genre prediction accuracy results of our study comparing our BLL u approach with a group-based baseline (TOP), a user-based collaborative filtering baseline (CF u), an item-based collaborative filtering baseline (CF i), a frequency-based baseline (POP u) and a recency-based baseline (TIME u). For all three user groups (i.e., LowMS, MedMS, and HighMS), the combination of popularity and temporal drift of music genre preferences in the form of BLL u provides the best results for all metrics. According to a t-test with α = .001, "***" indicates statistically significant differences between BLL u and all other approaches for all user groups.

Influence of collaboration.
We investigate the genre prediction accuracy of three approaches (i.e., TOP, CF u, and CF i) that consider collaboration among users, i.e., that analyze the listening events of all users. Here, the personalized CF u and CF i approaches provide better results than the non-personalized TOP approach for all three user groups.
Furthermore, CF u provides its best results for the HighMS group.This is in line with our analysis presented in Figure 1, which shows that the average pairwise user similarity is the highest for high-mainstream users.This is also the reason why CF i does not outperform CF u in the HighMS but outperforms it in the LowMS and MedMS settings.
Influence of popularity. We evaluate four popularity-based approaches. The first approach provides non-personalized genre predictions based on the preferences of all users (i.e., TOP), and the second offers personalized predictions based on user similarities (i.e., CF u). The third approach provides personalized predictions using item similarities (i.e., CF i), and the fourth produces personalized genre predictions based on the preferences of the individual user (i.e., POP u). While the prediction accuracy of TOP increases with the level of mainstreaminess, the prediction accuracy of POP u decreases with the level of mainstreaminess. The prediction accuracy of CF u and CF i is relatively stable over all three user groups, with the only exception that CF u provides better results than CF i in the HighMS setting.
Thus, in the HighMS group, TOP provides a higher prediction accuracy than in the other two groups.These results are in line with our analysis presented in Figure 2, where we find that there are some dominating genres in the HighMS group, which explains the good results of TOP, CF u , and POP u in this setting.When further comparing CF u with CF i , we see that CF i outperforms CF u in the LowMS and MedMS settings.
Influence of temporal drift. Our analysis in Figure 3 reveals that users in Last.fm tend to listen to genres which they have listened to very recently. In other words, time is important for all three user groups. However, as shown in Table 3 and Figure 5, TIME u provides the weakest accuracy results for HighMS and good prediction accuracy results for LowMS and MedMS. Thus, for HighMS, popularity is a more important feature than recency.
BLL u outperforms TIME u in all experiments.This means that our personalized modeling approach, which also considers the features of popularity and temporal drifts, can provide accurate genre predictions for all three groups in relation to other modeling techniques.
Accuracy of BLL u for different values of k. In Figure 6, we show the recall/precision results of BLL u for k = 1…10 predicted genres for the three user groups. We observe apparent differences in the accuracy value ranges when comparing the three groups. While BLL u outperforms the five baselines in all three settings (with significant differences between BLL u and all other approaches according to a t-test with α = .001), the accuracy estimates are much higher in the LowMS group (i.e., R@10 = .827 and P@1 = .559) than in the MedMS group (i.e., R@10 = .674 and P@1 = .419) and the HighMS group (i.e., R@10 = .603 and P@1 = .377). This shows that our approach is especially useful to predict the genre preferences of users with low inclination to listen to mainstream music.
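To make the R@k and P@k values reported above concrete, the following minimal sketch computes both metrics for one user. It assumes that precision divides the number of hits by k and recall divides it by the number of relevant genres in the test set; the function name and the example genres are illustrative and are not taken from our evaluation framework.

```python
def precision_recall_at_k(predicted, relevant, k):
    """Compute (P@k, R@k) for a single user.

    predicted: genres ranked by predicted preference (best first).
    relevant:  set of genres actually observed in the user's test set.
    Assumption: precision is hits / k, recall is hits / |relevant|.
    """
    top_k = predicted[:k]
    hits = sum(1 for genre in top_k if genre in relevant)
    precision = hits / k if k > 0 else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Illustrative data only: four ranked predictions, three relevant genres.
ranked = ["rock", "ambient", "pop", "jazz"]
observed = {"rock", "jazz", "black metal"}
for k in (1, 2, 4):
    p, r = precision_recall_at_k(ranked, observed, k)
    print(f"k={k}: P@k={p:.3f}, R@k={r:.3f}")
```

Averaging these per-user values over all users in a group would yield group-level curves comparable in shape to a recall/precision plot.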
Performance in cold-start setting. Since recommender systems are often faced with situations in which users only have a few interactions available to train the underlying recommendation algorithms, we also evaluate our BLL u approach in a cold-start setting (Schein et al., 2002). For this, we extract the 1,000 users with the lowest number of LEs from the LFM-1b dataset. As we need to make sure that we have at least 1 LE per user available for training the algorithms, this procedure leads to 1,000 users with a minimum of 2 LEs and a maximum of 46 LEs per user. For these users, we have precisely 1 LE in the test set, for which we predict the assigned genres.
Our results for this experiment are shown in the recall/precision plot of Figure 7. Here, we observe very similar results to the ones of our LowMS, MedMS, and HighMS settings (see Figure 6). Thus, again BLL u provides the best accuracy results followed by TIME u, POP u, CF i, and CF u. As expected, the non-personalized TOP approach provides the worst results in this setting. These results show that BLL u is also capable of effectively predicting music genre preferences in cold-start settings where users only have a few listening events available for training.
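The cold-start user selection described above can be sketched as follows. This is a simplified illustration, not the code used in our experiments: the tuple layout of the listening events, the function name, and the choice to hold out the most recent LE as the single test event are assumptions made for the example.

```python
from collections import defaultdict


def cold_start_split(listening_events, n_users=1000, min_events=2):
    """Select the users with the fewest listening events (LEs) and hold out one LE each.

    listening_events: iterable of (user_id, timestamp, genres) tuples (assumed layout).
    min_events=2 guarantees at least 1 LE for training and exactly 1 LE for testing.
    Assumption: the held-out LE is the most recent one of each user.
    """
    by_user = defaultdict(list)
    for user_id, timestamp, genres in listening_events:
        by_user[user_id].append((timestamp, genres))

    # Keep only users with enough LEs, with their events in chronological order.
    eligible = {
        user: sorted(events, key=lambda event: event[0])
        for user, events in by_user.items()
        if len(events) >= min_events
    }
    # Users with the fewest LEs come first; ties are broken by user id.
    selected = sorted(eligible, key=lambda user: (len(eligible[user]), user))[:n_users]

    train = {user: eligible[user][:-1] for user in selected}
    test = {user: eligible[user][-1] for user in selected}
    return train, test


# Illustrative data only.
events = [
    ("a", 1, ["rock"]), ("a", 2, ["pop"]),
    ("b", 1, ["jazz"]),
    ("c", 1, ["ambient"]), ("c", 2, ["ambient"]), ("c", 3, ["rock"]),
]
train, test = cold_start_split(events, n_users=1)
print(train, test)
```

In this toy example, user "b" is dropped because a single LE cannot be split into a training and a test part.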

Implications for personalized music recommendation.
In this section, so far, we have shown that BLL u outperforms the baseline approaches concerning prediction accuracy in different settings (i.e., LowMS, MedMS, HighMS, and cold-start). When looking at Figure 6, this is especially true for the LowMS group, in which users do not follow the preferences of the mainstream, and thus, a personalization technique, as given by the BLL equation, is critical. If we relate this to music recommender systems, which exploit the listening histories of users to suggest other music that they might also like, our findings lead to interesting implications. Schedl and Hauger (2015) have shown that standard recommendation algorithms such as collaborative filtering cannot provide suitable music recommendations for users with low mainstreaminess. The results presented in this section support this. In other words, such users need different music recommendation algorithms that account for their highly individual listening preferences.
One way to achieve this could be to combine state-of-the-art music recommendation algorithms (see Section 2) with our music genre preference modeling approach based on the BLL equation presented in this paper. We could use the calculated B′ u,g values given by our approach as an input for these algorithms or to rerank recommendation results based on the importance of a genre for a user. We elaborate on these ideas as well as other plans for future work in Section 5.
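As a rough illustration of the reranking idea, the sketch below blends the score of an arbitrary base recommender with a user's genre weights. It is a hypothetical design rather than an evaluated method: the linear blend, the parameter alpha, the use of the maximum genre weight per item, and all names are assumptions introduced for this example.

```python
def rerank_by_genre(candidates, genre_weights, alpha=0.5):
    """Rerank recommendation candidates using per-user genre weights.

    candidates:    list of (item_id, base_score, genres) from any base recommender.
    genre_weights: dict mapping genre -> weight for one user (e.g., normalized
                   BLL values); genres missing from the dict count as 0.
    alpha:         hypothetical blending factor in [0, 1]; 0 keeps the base ranking.
    """
    def combined_score(candidate):
        _, base_score, genres = candidate
        # Assumption: an item is as relevant as its best-matching genre.
        genre_score = max((genre_weights.get(g, 0.0) for g in genres), default=0.0)
        return (1 - alpha) * base_score + alpha * genre_score

    return sorted(candidates, key=combined_score, reverse=True)


# Illustrative data only: a niche listener with a strong preference for Ambient.
candidates = [("track_1", 0.9, ["pop"]), ("track_2", 0.8, ["ambient"])]
weights = {"ambient": 0.7, "pop": 0.1}
print([item for item, _, _ in rerank_by_genre(candidates, weights)])
```

With alpha = 0.5, the Ambient track moves ahead of the Pop track even though the base recommender scored it lower, which is the intended effect for users whose preferences deviate from the mainstream.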

Conclusion and Future Work
In this paper, we presented BLL u, an approach that utilizes the features of popularity and temporal drifts to model and predict music genre preferences via fine-grained genres. We leveraged the LFM-1b dataset of more than one billion music listening events, created by approximately 120,000 users of the online music service Last.fm. We divided the users into three groups based on the proximity of their music genre preferences to the mainstream: (i) LowMS, i.e., listeners of niche music, (ii) HighMS, i.e., listeners of mainstream music, and (iii) MedMS, i.e., listeners of music that lies in-between. To take into account the popularity and temporal drift of music genre preferences, we proposed to use the Base-Level Learning (BLL) equation from the cognitive architecture ACT-R, which quantifies the importance of information in human memory (e.g., a music genre) by considering how frequently (i.e., popularity) and recently (i.e., temporal drift) it was used in the past. A comparison between BLL u and a group-based baseline (i.e., TOP), a user-based collaborative filtering baseline (i.e., CF u), an item-based collaborative filtering baseline (i.e., CF i), a frequency-based baseline (i.e., POP u) as well as a recency-based baseline (i.e., TIME u) showed that BLL u outperforms all other approaches for all three user groups in terms of prediction accuracy.
Furthermore, our results indicate that BLL u is especially useful to predict the music genre preferences of users with interest in low-mainstream music (i.e., the LowMS user group), which opens up interesting possibilities for future work in the research area of personalized music recommender systems.
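For readers who want a compact view of how frequency and recency interact in the BLL equation, the following sketch implements its standard ACT-R form, B = ln(Σ_j (t_ref − t_j)^(−d)), for the genres of a single user. The decay value d = 0.5 is the common ACT-R default and is only a placeholder here; the softmax normalization, the data layout, and the function names are likewise assumptions made for illustration.

```python
import math


def bll_scores(genre_timestamps, t_ref, d=0.5):
    """Base-level activation per genre for one user.

    genre_timestamps: dict mapping genre -> list of listening timestamps.
    t_ref:            reference time (must be later than the events considered).
    d:                power-law decay parameter (0.5 is a placeholder default).
    """
    scores = {}
    for genre, timestamps in genre_timestamps.items():
        # Each past listening event contributes less the longer ago it happened.
        total = sum((t_ref - t) ** (-d) for t in timestamps if t < t_ref)
        if total > 0:
            scores[genre] = math.log(total)
    return scores


def softmax_normalize(scores):
    """Map activations to weights in (0, 1) that sum to 1 (assumed normalization)."""
    if not scores:
        return {}
    max_score = max(scores.values())  # subtract the maximum for numerical stability
    exps = {genre: math.exp(b - max_score) for genre, b in scores.items()}
    norm = sum(exps.values())
    return {genre: value / norm for genre, value in exps.items()}


# Illustrative data only: two recent Rock events versus one old Jazz event.
history = {"rock": [90, 99], "jazz": [10]}
weights = softmax_normalize(bll_scores(history, t_ref=100))
print(sorted(weights.items(), key=lambda kv: kv[1], reverse=True))
```

A genre that was listened to both often and recently receives the highest weight, which is how the equation combines popularity and temporal drift in a single value.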
Limitations and future work. So far, we limited our approach to the BLL equation of the declarative memory module of ACT-R. Since the BLL equation is only one part of the more comprehensive ACT-R framework and does not consider contextual information, one needs to keep this limitation in mind when utilizing our approach. For example, when we model music genre preferences exclusively via past listening behavior, phenomena such as over-personalization or filter-bubble effects could occur (Nguyen et al., 2014). To overcome this, we plan to extend our model to the full activation equation of ACT-R, which also considers contextual information via its associative activation (Anderson et al., 2004). Moreover, we plan to extend our model by other components of ACT-R, for example, to investigate further context dimensions such as the mood or the current activity of the user (see, e.g., Ferwerda et al. (2015)). We could achieve this by defining and implementing so-called production rules from ACT-R's procedural memory module as, for instance, done in the SNIF-ACT model (Pirolli and Fu, 2003; Fu and Pirolli, 2007). Another limitation of our work is that we employed a rather simple definition for the mainstreaminess of a user. We, therefore, plan to extend our analysis to include more sophisticated mainstreaminess measures, e.g., based on rank-order correlation or Kullback-Leibler divergence (Schedl and Bauer, 2018). As part of future work, we plan to integrate our findings into music recommendation algorithms, with particular attention to addressing the low mainstreaminess group, since standard collaborative filtering approaches tend to fail to provide suitable music recommendations for this user group (Schedl and Hauger, 2015). For example, we plan to integrate the preference values we obtain for a specific user and a particular genre via our approach as a context dimension into a matrix factorization-based approach (Mnih and Salakhutdinov, 2008; Koenigstein et al., 2011) or a deep learning-based approach (Lin et al., 2018; Sachdeva et al., 2018).
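To indicate what a divergence-based mainstreaminess measure might look like, the sketch below computes the Kullback-Leibler divergence between a user's genre distribution and a global genre distribution. This is one possible reading and not the definition from the cited work: the direction of the divergence (user relative to global), the smoothing constant eps, and the function name are assumptions, and a lower value would correspond to a more mainstream user.

```python
import math


def kl_from_mainstream(user_counts, global_counts, eps=1e-9):
    """KL divergence D(user || global) over genre listening counts.

    user_counts / global_counts: dicts mapping genre -> number of listening events.
    eps: hypothetical floor for global probabilities to avoid division by zero
         when a user listens to a genre that is absent from the global profile.
    Returns 0.0 for identical distributions; larger values mean less mainstream.
    """
    user_total = sum(user_counts.values())
    global_total = sum(global_counts.values())
    if user_total == 0 or global_total == 0:
        raise ValueError("both distributions need at least one listening event")

    divergence = 0.0
    for genre, count in user_counts.items():
        p = count / user_total
        q = global_counts.get(genre, 0) / global_total
        if p > 0:
            divergence += p * math.log(p / max(q, eps))
    return divergence


# Illustrative data only.
mainstream = {"rock": 50, "pop": 50}
print(kl_from_mainstream({"rock": 5, "pop": 5}, mainstream))  # identical shape
print(kl_from_mainstream({"rock": 10}, mainstream))           # rock-only listener
```

A rank-order correlation between the two genre rankings would be an alternative that ignores the magnitude of the counts and only compares their ordering.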
Furthermore, we aim to apply our approach to the problem of music playlist continuation, which was also the task of the ACM RecSys Challenge 2018. We believe that our findings concerning the temporal relistening patterns of music genres (see Section 3.1) could help identify genres that users commonly listened to consecutively. We could then, for example, incorporate such genre sequences into the two-stage convolutional neural network (CNN) model for automatic playlist continuation that was proposed by Volkovs et al. (2018). Finally, we would like to highlight that researchers and practitioners could easily apply our approach to other related tasks (e.g., recommending music artists) and not only to genre prediction. Thus, we hope that our insights will inform future work in the areas of user modeling and music recommendation.

Reproducibility
To foster the reproducibility of our research, we use the publicly available LFM-1b Last.fm dataset (see Section 3.1). Furthermore, we provide our evaluation framework TagRec (see Section 4.1) freely for academic purposes. We hope that the approach presented in this paper and its implementation in TagRec, as well as the dataset, will attract further research on music preference modeling and recommender systems.

Figure 1: Boxplots show the average pairwise user similarity in our user groups using the cosine similarity metric computed on the users' genre distributions. While users in the LowMS group show a very individual listening behavior, users in the HighMS group tend to listen to similar music genres.

Figure 2: Number of listening events LE (in millions) for the top-30 genres of our LowMS, MedMS, and HighMS Last.fm user groups. We find that there are some dominating genres in the HighMS group, while the genre distribution in the LowMS group is more evenly distributed.

Figure 5: Recall/precision plots of the baselines and our BLL u approach for the three user groups LowMS, MedMS, and HighMS. We see that BLL u provides the best results for all groups and for all k = 1…10 predicted genres.

Table 1: Dataset statistics for the LowMS, MedMS, and HighMS Last.fm user groups. Here, |U| is the number of distinct users, |A| is the number of distinct artists, |G| is the number of distinct genres, |LE| is the number of listening events, |GA| is the number of genre assignments, |GA|/|LE| is the number of genre assignments per listening event, u G is the average number of genres a user u has listened to, MS is the average mainstreaminess value, and Age is the average age of users in the group.

User group  |U|    |A|     |G|  |LE|       |GA|        |GA|/|LE|  u G      MS    Age
LowMS       1,000  82,417  931  6,915,352  14,573,028  2.107      85.771   .125  24.582
MedMS       1,000  86,249  933  7,900,726  20,264,870  2.565      126.439  .379  25.352
HighMS      1,000  92,690  973  8,251,022  22,498,370  2.727      186.010  .688  21.486

HighMS group (LE count of Rock = 2,269,861), in the LowMS group, it is not as dominant (LE count of Rock = 685,998). Furthermore, we find several genres that are not popular in the MedMS and HighMS groups but are popular in the LowMS group, such as Ambient and Black Metal.

Figure 6: Recall/precision plot of our BLL u approach for k = 1…10 predicted genres for the three user groups LowMS, MedMS, and HighMS. We see that BLL u provides good prediction accuracy results for all groups but especially in the LowMS setting. This shows that our approach is especially useful for predicting the music genre preferences of users with low mainstreaminess values.