Aligning daily activities with personality: towards a recommender system for improving wellbeing

Recommender systems have not been explored to a great extent for improving health and subjective wellbeing. Recent advances in mobile technologies and user modelling present the opportunity for delivering such systems; however, the key issue is understanding the drivers of subjective wellbeing at an individual level. In this paper we propose a novel approach for deriving personalized activity recommendations to improve subjective wellbeing by maximizing the congruence between activities and personality traits. To evaluate the model, we leveraged a rich dataset collected in a smartphone study, which contains three weeks of daily activity probes, the Big-Five personality questionnaire and subjective wellbeing surveys. We show that the model correctly infers a range of activities that are 'good' or 'bad' (i.e. that are positively or negatively related to subjective wellbeing) for a given user, and that the derived recommendations closely match real-world outcomes.


INTRODUCTION AND RELATED WORK
The pursuit of happiness is the ultimate goal for many people, described by Aristotle as the meaning and purpose of life. The scientific pursuit of happiness and life satisfaction (together referred to as "subjective wellbeing", or SWB) intensified in the last quarter of the 20th century. Since then, the number of articles in the domain of SWB has grown exponentially [23]. We now have a wealth of knowledge [5] on how to measure SWB and a deeper understanding of how it correlates with cognitive, behavioral, environmental, biological and genetic factors. In parallel, interdisciplinary research has demonstrated the potential of mobile computing to quantify and monitor human behaviors accurately and at a scale larger than ever before [1,21,22,28], to gain insight into mental wellbeing [18,19,26] and to automatically predict user characteristics such as personality traits [16,17]. Advances in mobile technologies, coupled with knowledge from SWB research, can open the door to proactive recommendations that help people make decisions or adapt their daily activities to maximize happiness and life satisfaction.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org. RecSys '19
While the field of recommender systems (RSs) has provided numerous tools to support user decision making by identifying personalized and relevant content, services or products [25], RSs that provide personalized suggestions to boost SWB have not yet attracted considerable research interest.
One of the key issues in applying advances in RSs to SWB recommendations is the scarcity of appropriate datasets. Unlike the rich datasets on movies, music, books, products, etc. matched with user profiles, equally large datasets that match precise information about users' activities with ground-truth information about their SWB are not trivial to collect. This represents a common 'cold-start' problem, a situation of having sparse historical data or not having enough information about new users [13,29]. To tackle the cold-start problem in developing RSs for SWB, we use personality traits in this study as a proxy for user profiles, relevant for matching daily activities to subjective wellbeing at an individual level. Our approach is inspired by the congruence between personality and daily activities identified in psychology [6] as well as by recent trends of leveraging personality traits to address the cold-start problem in RSs [8,13,30]. Personality has been used successfully to recommend movies [9,15,31], music [9,10,13], books [9], and also leisure activities and events [24]. However, the aim of these studies was to optimize item or event selection rates rather than to optimize users' SWB. Behavioral economics suggests that people have biases in understanding the link between their behaviours and their SWB [14]. Importantly, the literature also suggests that it is possible to improve an individual's SWB [6,20]. Our work is novel in providing a technological foundation for an RS to support people in improving SWB, with the following distinct contributions:
• A machine learning algorithm that predicts users' SWB based on the congruence between their reported Big-Five personality traits and the distribution of their activities (Section 3).
• Evaluation of the predictive power of the algorithm in simulating distributions of activities that would result in low or high SWB for a specific user, thus constituting a personalized whitelist and blacklist for increasing SWB (Section 4).

IN-THE-WILD DATA COLLECTION
To build the dataset required for the SWB RS, we used a smartphone app that captures patterns of daily activities that people were engaged in over a period of 2-3 weeks. The onboarding survey (delivered through the smartphone app) prompted participants to answer the Big-Five personality questionnaire [12] and an SWB question. We used the question 'Overall, how satisfied are you with your life nowadays' as the ground-truth information for SWB, referred to as "life satisfaction" [7]. During the following 2-3 weeks, we applied the Ecological Momentary Assessment method [27] by prompting participants at five random times distributed over a day to report the activity they were engaged in at that moment. The data collection was conducted in two trials. The first trial (first dataset) ran between February and August 2018 with 151 participants completing the study. Participants in this sample were recruited through a recruitment agency in five different countries (UK, Spain, Colombia, Peru and Chile). The second trial (second dataset) was conducted between January and March 2019 with 256 participants from a major UK university completing the study. Participants in this sample were recruited using an email sent to students and staff at the university. Participants in both samples received a monetary reward for completing the study. We used the first dataset to build the model described in Section 3 and the second to validate the RS described in Section 4. We considered this to be a more rigorous way of testing the generality of our model than, for instance, performing cross-validation or using dataset splitting methodologies. This is because the two datasets included different demographics (in terms of level of education, socio-economic status, age and nationality) and were collected during different periods of the year to account for potential seasonal effects.

MODELLING CONGRUENCE AND ITS RELATIONSHIP WITH WELLBEING
Psychologists have emphasized that the congruence between internal and external factors is a predictor of SWB [6]. In practice, internal factors are usually static or slow-changing (inherent to an individual's innate characteristics), whereas external factors are mostly dynamic (exhibited by an individual's behaviour and environmental circumstances). Inspired by this, we defined the congruence user model that quantifies the alignment between the internal and external factors (i.e. internal and external user models). Figure 1 summarizes the development process of the congruence model, which takes into account the difference between the internal and external user models to predict SWB. As an example, the external model may indicate that a user behaves as an introverted person (based on her activities) although she is an extroverted person (based on the Big-Five questionnaire). The congruence model quantifies the gap between the two. Subsequently, we explore whether a machine learning model that considers this misalignment between who she is (internal model) and how she behaves (external model) predicts a lower SWB. The performance of this model is evaluated using the first dataset, and then the generated recommendations at an individual level are validated with the second dataset.
Figure 1: Classification of high/low SWB based on the congruence user model.

Congruence User Model
We used the Big-Five personality traits [12] to represent the user's internal model, as they are one of the most representative personal characteristics that describe and predict human behaviour [3]. As personality is obtained from a self-reported questionnaire, we refer to this as the reported personality. Formally, for user $j$, the reported personality vector is given by:
$$\vec{p}_r^{(j)} = \langle e_r^{(j)}, a_r^{(j)}, c_r^{(j)}, n_r^{(j)}, o_r^{(j)} \rangle$$
where $e_r^{(j)}$, $a_r^{(j)}$, $c_r^{(j)}$, $n_r^{(j)}$ and $o_r^{(j)}$ represent the five reported personality scores (extraversion, agreeableness, conscientiousness, neuroticism and openness) respectively, for the $j$th user.
We built a user's external user model from his/her patterns of daily activities, e.g., eating, working, watching TV, shopping, listening to music, using social media, exercising, and so on. Due to the variety of momentary activity items, we grouped the reported activity items into $n$ activity categories as defined by Goldberg [11]. Goldberg conducted a 10-year-long study with 800 individuals, and clustered 400 activity items into 33 categories. For simplicity, we refer to the activity categories as activities throughout the paper. To model the alignment between personality and activities, we also used Goldberg's study as the state-of-the-art source of correlations between activities and personality traits.
First, we captured the distribution of a user's activities, which corresponds to the normalized frequencies of all the activities that the $j$th user reports. Mathematically, this is represented by the vector:
$$\overrightarrow{act}^{(j)} = \langle act_1^{(j)}, act_2^{(j)}, \ldots, act_n^{(j)} \rangle$$
where $act_i^{(j)}$ is the normalized frequency of activity $i$ for user $j$. Moreover, the sum of all components of $\overrightarrow{act}^{(j)}$ equals 1 for any user $j$. Our external model builds a secondary personality that is exhibited through the user's activity distribution, so that it can be compared directly against the internal model. We computed a dynamic construct called exhibited personality, $\vec{p}_{ex}^{(j)}$, which has the same five dimensions as $\vec{p}_r^{(j)}$ and is modelled on the user's activity distribution $\overrightarrow{act}^{(j)}$. For a user $j$, the exhibited personality vector is given by:
$$\vec{p}_{ex}^{(j)} = f(\overrightarrow{act}^{(j)}) = \langle e_{map}^{(j)}, a_{map}^{(j)}, c_{map}^{(j)}, n_{map}^{(j)}, o_{map}^{(j)} \rangle$$
where $e_{map}^{(j)}$, $a_{map}^{(j)}$, $c_{map}^{(j)}$, $n_{map}^{(j)}$ and $o_{map}^{(j)}$ represent the five mapped personality scores, obtained through the function $f$. To obtain $f$, we first computed a weight vector $\vec{w}^{(j)}$ that defines the positive or negative accumulated effect (or weight) of each activity on the traits. Using the correlation matrix between activities and personality defined by Goldberg [11], denoted $C$, we define the weight of exhibited activities on the personality as:
$$\vec{w}^{(j)} = C \cdot \overrightarrow{act}^{(j)}$$
Here, $\cdot$ is the matrix product and $\vec{w}^{(j)}$ is obtained by treating $\overrightarrow{act}^{(j)}$ as a column matrix. Using this, we derived the weighted median personality $\vec{p}_{median} \odot \vec{w}^{(j)}$, which represents the change above or below the median personality exhibited by one's activity patterns. Thus, $\vec{p}_{ex}^{(j)}$ is obtained as the vector sum of the median personality $\vec{p}_{median}$ and the weighted median personality:
$$\vec{p}_{ex}^{(j)} = \vec{p}_{median} + \vec{p}_{median} \odot \vec{w}^{(j)}$$
indicating that $f(\vec{x}) = \vec{p}_{median} \odot (1 + C \cdot \vec{x})$, where $\odot$ is the Hadamard product. Finally, the congruence user model of a person is the difference between $\vec{p}_{ex}^{(j)}$ and $\vec{p}_r^{(j)}$. This difference, or delta, is given by:
$$\vec{p}_\Delta^{(j)} = \langle e_\Delta^{(j)}, a_\Delta^{(j)}, c_\Delta^{(j)}, n_\Delta^{(j)}, o_\Delta^{(j)} \rangle$$
where $e_\Delta^{(j)}$, $a_\Delta^{(j)}$, $c_\Delta^{(j)}$, $n_\Delta^{(j)}$ and $o_\Delta^{(j)}$ are the delta components of extraversion, agreeableness, conscientiousness, neuroticism and openness respectively.
As each component of $\vec{p}_\Delta^{(j)}$ decreases, the user's behaviour becomes more congruent with his/her personality and, per our hypothesis, the user's SWB is higher.
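The mapping from an activity distribution to the delta features can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the correlation values in `C`, the median scores and the example user are invented placeholders (not Goldberg's figures), the function names are hypothetical, and the delta is assumed here to be the absolute per-trait difference.

```python
# Minimal sketch of the congruence user model (pure Python, no dependencies).
# All numbers are made-up placeholders, NOT values from Goldberg's study.

TRAITS = ["extraversion", "agreeableness", "conscientiousness",
          "neuroticism", "openness"]


def exhibited_personality(p_median, C, act):
    """Compute f(act) = p_median * (1 + C . act), one trait at a time."""
    p_ex = []
    for trait_idx, median_score in enumerate(p_median):
        # Accumulated effect (weight) of the activity distribution on this trait.
        w = sum(c * a for c, a in zip(C[trait_idx], act))
        p_ex.append(median_score * (1.0 + w))
    return p_ex


def delta(p_reported, p_exhibited):
    """Per-trait gap between reported and exhibited personality.

    Assumption: the gap is taken as an absolute difference.
    """
    return [abs(r - e) for r, e in zip(p_reported, p_exhibited)]


# Hypothetical example with n = 3 activity categories.
C = [  # 5 traits x 3 activities, illustrative correlations only
    [0.30, -0.20, 0.00],
    [0.10, 0.00, 0.05],
    [-0.10, 0.25, 0.00],
    [0.00, -0.05, 0.10],
    [0.05, 0.10, -0.15],
]
p_median = [30.0, 30.0, 30.0, 30.0, 30.0]  # assumed medians on the 10-50 scale
act = [0.5, 0.3, 0.2]                      # normalized frequencies, sum to 1
p_r = [40.0, 32.0, 28.0, 25.0, 35.0]       # reported Big-Five scores

p_ex = exhibited_personality(p_median, C, act)
p_delta = delta(p_r, p_ex)

for name, d in zip(TRAITS, p_delta):
    print(f"{name}: delta = {d:.2f}")
```

In this toy case the extraversion weight is 0.3 × 0.5 − 0.2 × 0.3 = 0.09, so the exhibited extraversion is 30 × 1.09 = 32.7, leaving a gap of 7.3 against the reported score of 40.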

Experimentation and Results
We built a machine learning model that predicts the SWB score for a user $j$ using the individual delta scores along the five personality dimensions, $\vec{p}_\Delta^{(j)}$. For this analysis, we use the first dataset described in Section 2. We cluster momentary activity items into the 15 activity categories, out of the 33 defined in [11], that are most relevant to the items reported in the dataset. The values of each component of $\vec{p}_r$ range from 10 to 50, and SWB is rated on a scale from 1 to 10. We treat the prediction of SWB as a binary classification problem by dividing the continuous variable into high (1) and low (0), using the median value as the threshold. We use the leave-one-sample-out method to evaluate the model accuracy. We tested different machine learning algorithms, namely random forest, naïve Bayes and support vector machine (SVM), and observed that the latter performs best. For brevity, we report results with SVM only. To assess the added value of the congruence user model, we compared it to benchmark classifiers built on the same inputs, such as one that used only personality traits $\vec{p}_r$; the results are reported in Table 1. We observed that the classifier relying on the congruence user model and the computed delta features outperforms the other classifiers in predicting SWB.
This illustrates the strength of incorporating the congruence theory into the SWB classification model, in comparison to using a typical "black-box" model that relies on the same inputs, namely personality and activity distributions. By relying on this model, an RS to improve a user's SWB would aim to suggest activity distributions that improve the personality-activities alignment, i.e. reduce the gap quantified through the delta features. Certainly, there is more than one distribution of activities that corresponds to low values of the delta features for a given user, and minimizing each of the five delta values (i.e. alignment with each of the five personality traits) is not equally important. In our experiments, we observed that the relative ratio among different activities and also among delta features matters more than the absolute values of activity frequency or the delta values. This also corroborates the intuitive assumption that there is no one unique lifestyle beneficial for an individual, i.e. understanding the range of activity distributions is beneficial to improving a user's SWB.
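The evaluation protocol described in this section (median split into high/low SWB, then leave-one-sample-out accuracy) can be sketched as follows. To keep the example dependency-free, a nearest-centroid classifier stands in for the SVM reported in the paper; the toy delta features and SWB scores are invented, the function names are hypothetical, and treating scores strictly above the median as "high" is an assumption.

```python
from statistics import median


def binarize_by_median(scores):
    """Label a score 1 (high) if strictly above the sample median, else 0 (low)."""
    threshold = median(scores)
    return [1 if s > threshold else 0 for s in scores]


def centroid(rows):
    """Mean feature vector of a list of equally sized feature rows."""
    return [sum(col) / len(col) for col in zip(*rows)]


def nearest_centroid_predict(train_x, train_y, x):
    """Stand-in classifier (not an SVM): choose the class with the closest mean."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [f for f, y in zip(train_x, train_y) if y == label]
        c = centroid(rows)
        dist = sum((a - b) ** 2 for a, b in zip(x, c))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def leave_one_out_accuracy(features, labels):
    """Hold out each sample once, train on the rest, and score the held-out one."""
    hits = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if nearest_centroid_predict(train_x, train_y, features[i]) == labels[i]:
            hits += 1
    return hits / len(features)


# Toy data: one row of five delta features per user, plus a 1-10 SWB rating.
deltas = [
    [1, 1, 1, 1, 1], [2, 1, 2, 1, 2], [1, 2, 1, 2, 1],
    [8, 9, 8, 9, 8], [9, 8, 9, 8, 9], [8, 8, 9, 9, 8],
]
swb = [9, 8, 9, 3, 4, 2]

labels = binarize_by_median(swb)
accuracy = leave_one_out_accuracy(deltas, labels)
print(labels, accuracy)
```

The toy data are deliberately separable (small deltas paired with high SWB), so the sketch only demonstrates the mechanics of the protocol and says nothing about the accuracy obtained on the real dataset.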

ACTIVITY RANGE RECOMMENDATION
In this section we describe a methodology to recommend a range of relevant activities that are 'good' or 'bad' for a user, and subsequently test the results using the second dataset to assess whether the model generalizes to a different population. The flow diagram of the RS, along with the validation procedure, is summarized in Figure 2. First, the activity range recommender compares the exhibited personalities of all simulated combinations of activity distributions with a user's reported personality, and uses the SWB classifier described in Section 3.2 to determine which combinations will result in high or low SWB. Through this process we obtain the range of activity distributions that form the whitelist (good) and the blacklist (bad) for the user's SWB. The whitelist recommendation is validated by comparing it against the actual activity distribution of users in the second dataset who have high SWB. The same is done for the blacklist recommendation with users who have low SWB.

Methodology
Although the sum of all components of $\overrightarrow{act}^{(j)}$ equals 1 for a user $j$, i.e., $\sum_{i=1}^{n} act_i^{(j)} = 1$, some of these components vary significantly across the sample, while others show a lower variance. The activities that do not vary across the sample also do not substantially affect the calculated values of exhibited personality $\vec{p}_{ex}$. It is also intuitively clear that certain activities (usually those over which we have less control, such as working, studying, sleeping and eating) occupy a similar proportion of time for most people. Hence, to narrow the recommendations down to activities that may be more actionable, i.e. more under a user's control (such as using social media, watching TV, reading and exercising), we consider only those activities that have high variance in the sample. Without loss of generality, it is assumed that there are $m$ activities that have high variance, from $act_1^{(j)}$ to $act_m^{(j)}$, such that:
$$\sum_{i=1}^{m} act_i^{(j)} = 1 - \lambda$$
Here, $\lambda$ is a relatively low value (∼0.2) that covers the joint variance of the remaining $n - m$ activities. For the selected $m$ activities, we obtained all potential combinations in increments of 0.1 such that this condition is met. For each of these combinations, we calculate the corresponding $\vec{p}_\Delta$ and pass it to the SWB classifier described in Section 3, marking the cases where the predicted SWB is high. Using these, we determined the range (sorted from lowest to highest) of distributions for each of the $m$ activities that are expected to give high SWB. We performed the same procedure for the cases of low SWB, indicating the range of activity distributions that result in low SWB.
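The simulation step can be sketched as a brute-force enumeration. The sketch below assumes that the proportions of the m high-variance activities must sum to 1 − λ, and it works in integer steps of 0.1 to avoid floating-point drift. The names `enumerate_distributions`, `activity_ranges` and `predicts_high` are hypothetical; `predicts_high` is a toy rule standing in for the full pipeline (activity distribution → exhibited personality → delta features → SWB classifier).

```python
from itertools import product


def enumerate_distributions(m, total_steps):
    """Yield every m-tuple of step counts (units of 0.1) summing to total_steps."""
    for combo in product(range(total_steps + 1), repeat=m - 1):
        remainder = total_steps - sum(combo)
        if remainder >= 0:
            # The last activity takes whatever share is left over.
            yield combo + (remainder,)


def activity_ranges(m, total_steps, predicts_high, step=0.1):
    """Per-activity (min, max) proportion over distributions classified as high."""
    lows = [None] * m
    highs = [None] * m
    for combo in enumerate_distributions(m, total_steps):
        dist = [k * step for k in combo]
        if not predicts_high(dist):
            continue
        for i, k in enumerate(combo):
            lows[i] = k if lows[i] is None else min(lows[i], k)
            highs[i] = k if highs[i] is None else max(highs[i], k)
    return [
        (None, None) if lo is None else (round(lo * step, 1), round(hi * step, 1))
        for lo, hi in zip(lows, highs)
    ]


def predicts_high(dist):
    """Toy stand-in for the classifier: high SWB if activity 0 takes at least 0.4."""
    return dist[0] >= 0.35


# m = 3 activities sharing 1 - lambda = 0.8 of the time, i.e. 8 steps of 0.1.
ranges = activity_ranges(m=3, total_steps=8, predicts_high=predicts_high)
print(ranges)
```

The enumeration grows quickly with m (for m = 8 and eight steps the loop visits 9^7 candidate tuples before filtering), so a recursive generator that prunes partial sums early would be a natural refinement for larger settings.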

Validation and Results
We use the second dataset described in Section 2 to validate the hypothetical outputs of our RS. As with the first dataset, we use the same $n = 15$ categories to cluster the activities. We observed that $m = 8$ clusters have significant variance (> 0.1) across the sample and have the most effect on $\vec{p}_{ex}^{(j)}$; in this dataset, $\lambda = 0.1$. Using the SWB reports of users in the second dataset, we assign each user to either the high or the low class, depending on whether their score is over or under the median value respectively (as done previously with the first dataset). For each of the users in the high SWB class and the low SWB class, we obtained the range of activity distributions that would provide high and low SWB respectively, according to our model. We evaluated the extent to which the activity proportions of the user fall within the range for all selected activities (all 8 activities) or for the majority (at least 5) of them. These results are reported in Table 2. It is important to note that testing an activity RS proactively would be a difficult endeavour, as behaviour change is a complex task and evaluating the effectiveness of interventions is particularly difficult for health and wellbeing [2]. Contextual factors, external events and sense of autonomy play a large role in determining the extent to which a recommendation will be followed and performed by users [4]. Hence, we evaluated the effectiveness of the RS for both an ideal case (user performs all activities as per the recommendation) and a realistic case (user performs the majority of the activities).
When considering all the activities that are important for high and low SWB, 51% of users in the high SWB class fall in the range, while for the low SWB class the number increases to 74%. This indicates that our method is able to infer the range of activity distributions that are beneficial for a user's SWB fairly well, and furthermore that it is more successful in inferring the ranges of activities that have negative consequences for SWB. When considering only the majority of the activities (at least 5), these numbers improve significantly, rising to 71% for users in the high SWB class and 92% in the low SWB class. This resonates well with real-life scenarios, as users may not be able to maintain the exact proportion for all activities all the time.
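The two validation criteria (all activities within range versus a majority of them) reduce to a simple coverage count. The sketch below uses invented users with three activities each, so "all" means 3 hits and "majority" means at least 2; in the study the corresponding thresholds are 8 and 5. The function names are hypothetical and range bounds are assumed to be inclusive.

```python
def within_range_count(proportions, ranges):
    """Number of activities whose observed proportion lies inside its range."""
    return sum(1 for p, (lo, hi) in zip(proportions, ranges) if lo <= p <= hi)


def coverage(users, min_hits):
    """Fraction of users with at least min_hits activities inside their ranges."""
    hits = sum(
        1 for proportions, ranges in users
        if within_range_count(proportions, ranges) >= min_hits
    )
    return hits / len(users)


# Hypothetical per-user data: (observed proportions, recommended ranges).
# Ranges are personalized in the paper; one shared set is reused here for brevity.
example_ranges = [(0.1, 0.3), (0.0, 0.2), (0.2, 0.5)]
users = [
    ([0.2, 0.1, 0.3], example_ranges),  # 3 of 3 activities in range
    ([0.2, 0.3, 0.3], example_ranges),  # 2 of 3
    ([0.5, 0.3, 0.3], example_ranges),  # 1 of 3
    ([0.1, 0.2, 0.5], example_ranges),  # 3 of 3, on the inclusive boundaries
]

print(coverage(users, min_hits=3))  # ideal case: every activity in range
print(coverage(users, min_hits=2))  # realistic case: majority in range
```

Relaxing the criterion from all activities to a majority can only keep or increase the coverage, which matches the direction of the change reported in this section.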

CONCLUSION
In this paper, we presented a novel approach to developing an RS that supports users in balancing daily activities to improve their subjective wellbeing (SWB). Developing an SWB RS is a challenging task, as it faces the cold-start problem due to the lack of large-scale datasets that map user characteristics, daily activities and ground-truth information about SWB. We addressed this limitation by collecting a dataset with the above information and developed a user model based on psychological literature suggesting that the congruence between internal and external factors impacts SWB [6]. Using this model, we built a binary classifier that predicts SWB based on the alignment between an individual's behavior (in terms of distribution of activities) and his/her personality. Our model outperformed the three benchmarking classifiers that relied on the same input parameters by 9-18%. Subsequently, we simulated the range of activity distributions that would result in high or low SWB for users in another dataset, and compared personalized recommendations (i.e. white- and black-listed activity distributions) to the ground truth of users' activity distributions. Our model inferred the range of activities that are 'good' or 'bad' for a given user with accuracy up to 92%, demonstrating that the model successfully captures the link between SWB and the alignment of real-life activity patterns with personality. We believe that this work will encourage more research, both in RSs devoted to SWB and in psychology, aiming to deepen understanding of the congruence between an individual's daily activities and personality. Beyond its development, testing the impact of an RS of this nature is a challenge, as inferring and providing activity distributions that are perfectly aligned with one's SWB does not guarantee that a user will follow the recommendations. For this reason, we have created a tool that suggests daily activities to promote SWB.
We plan to test this tool in the near future and explore the change in SWB achieved through the suggestions provided by our system.