How I Cried in 2022: An Analysis of 365 Days of Personal Data
An investigation into my crying patterns using data I collected on myself
I am obsessed with collecting data on myself. Every day of 2022, I filled out a Google Form I made, tracking items such as whether I cried, exercised, drank coffee, or washed my hair. I also collected data from Apple Health and Google Location History to get a more complete picture of my patterns and behaviors throughout the year. In this article, I provide insights into my personal experiences in 2022 through a combination of all of this data.
For me, 2022 was a year of big changes and new opportunities — I moved to New York, started a new job, and traveled to many cities. To reflect on the year that just happened in a proper data scientist fashion, I combined all of this data and analyzed it to understand my patterns of crying — where I cried, when I cried, how often I cried, and a tiny bit of insight into why I cried. The hope is that these insights will prepare me for many more days of crying to come in the New Year. (Note: these analyses are purely for fun and are not meant to be very rigorous. No statistical claims are made).
Part 1: Analyzing crying patterns in structured data
An overview of the data sources
I combined the following data:
Apple Health data, exported into a CSV (following instructions). Included walking speed, step asymmetry, and distance walked/run.
Garmin (exercise watch) data, exported into a CSV. Included heart rate, step count, and flights of stairs climbed.
Period health data from Flo, exported into a CSV.
Google Location History data, obtained through Google Takeout. Google location data is very granular in terms of geography (very specific latitude and longitude coordinates) as well as time (down to the minute). I rounded the lat/lng to the city level and kept the city I was in most often each day.
Google Form survey data, exported into a CSV. Included what kind of exercise I did, how much coffee I drank, and whether or not I cried.
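The location step above (collapse minute-level coordinates to one city per day) could be sketched roughly as follows. This is a minimal illustration, not the code used for the article: the ping format, the coordinate-to-city lookup, and the function names are all assumptions, and the real Google Takeout export has a different schema.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical pings: (timestamp, latitude, longitude). Made-up values.
PINGS = [
    ("2022-01-03 08:15", 43.6532, -79.3832),
    ("2022-01-03 13:40", 43.6611, -79.3957),
    ("2022-01-03 21:05", 40.7128, -74.0060),
    ("2022-01-04 09:00", 40.7306, -73.9866),
    ("2022-01-04 18:30", 40.7412, -73.9897),
]

# Assumed lookup from coordinates rounded to whole degrees to a city label.
CITY_BY_ROUNDED_COORD = {(44, -79): "Toronto", (41, -74): "New York"}


def to_city(lat, lng):
    """Round coordinates coarsely and map them to a city (or 'Other')."""
    return CITY_BY_ROUNDED_COORD.get((round(lat), round(lng)), "Other")


def most_common_city_per_day(pings):
    """Return {date: city seen in the most pings that day}."""
    cities_by_day = defaultdict(Counter)
    for timestamp, lat, lng in pings:
        day = datetime.fromisoformat(timestamp).date()
        cities_by_day[day][to_city(lat, lng)] += 1
    return {day: counts.most_common(1)[0][0] for day, counts in cities_by_day.items()}


city_per_day = most_common_city_per_day(PINGS)
for day, city in city_per_day.items():
    print(day, city)
```

Rounding to whole degrees is far too coarse for neighboring cities; a real version would need a finer grid or a reverse-geocoding step.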
In 2022, I cried in many different locations
In 2022, I cried on a total of 48 days. I split my days among New York (where I live), Toronto (where my partner lives), Seattle (where my work is), visiting my parents, and traveling. Nearly 36% of my crying days were in Toronto. Even though I spent only 49 days in Toronto (compared to 239 in New York), I cried way more often in Toronto than anywhere else.
I also looked at how often I cried when I was in a certain location (days I cried in a city divided by total days spent in that city). Compared to other locations, I spent nearly a quarter of my time in Toronto crying.
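The two numbers above answer different questions: the share of all crying days that fell in a city, versus the crying rate within that city. Here is a small pandas sketch of both, using a made-up toy log rather than my real data; the column names are assumptions.

```python
import pandas as pd

# Toy daily log (made-up values): one row per day.
days = pd.DataFrame({
    "city": ["New York", "New York", "New York", "New York",
             "Toronto", "Toronto", "Seattle", "Seattle"],
    "cried": [True, False, False, False, True, False, False, False],
})

# Count crying days and total days per city.
summary = days.groupby("city")["cried"].agg(crying_days="sum", total_days="count")

# Share of all crying days that happened in each city.
summary["share_of_all_crying_days"] = summary["crying_days"] / summary["crying_days"].sum()

# Crying rate within each city: crying days divided by days spent there.
summary["crying_rate_in_city"] = summary["crying_days"] / summary["total_days"]

print(summary)
```

In this toy example New York and Toronto each account for half of the crying days, but the rate within Toronto is twice as high because fewer days were spent there.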
So why did I cry so much in Toronto? One reason could be that I spent January and February 2022 there: not only was it excruciatingly cold, but Toronto was still under lockdown at that time. There was nothing else to do.
I waited for the weekends … to cry
Over 50% of my crying days were on the weekend. I cried a lot more on Saturdays and Sundays than on the weekdays. No fun weekend activity quite like crying.
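For context, weekends make up only about 29% of the calendar, so a weekend share above 50% is well above what an even spread would give. A rough sketch of that comparison is below; the crying dates are hypothetical placeholders, not my real log.

```python
from datetime import date, timedelta

# Every calendar day of 2022 (not a leap year).
all_days = [date(2022, 1, 1) + timedelta(days=i) for i in range(365)]

# weekday() returns 5 for Saturday and 6 for Sunday.
weekend_days = [d for d in all_days if d.weekday() >= 5]
baseline_weekend_share = len(weekend_days) / len(all_days)


def weekend_share(dates):
    """Fraction of the given dates that fall on a Saturday or Sunday."""
    return sum(d.weekday() >= 5 for d in dates) / len(dates)


# Hypothetical crying dates, for illustration only.
crying_dates = [date(2022, 1, 1), date(2022, 1, 2), date(2022, 1, 5), date(2022, 1, 8)]

print(f"baseline weekend share: {baseline_weekend_share:.1%}")
print(f"weekend share of crying days: {weekend_share(crying_dates):.1%}")
```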
I cried whether or not I exercised
Exercise is great for mental health. I guessed that I might cry less on the days I exercised. Above is a breakdown of types of exercise I did in different locations. I did martial arts (Muay Thai, boxing, and a singular Wing Chun class) mostly in New York and Toronto while I did more cardio (running, biking, hiking) during my visits to Seattle or to see my parents. Days of dancing (salsa, hip hop), walking (a catch-all for days I walked over 10K steps but didn’t log a specific workout), and other exercise (including yoga, home workouts, and going to the normal gym) are sprinkled among the different locations.
I calculated the percent of days I spent crying on days I did certain exercises, relative to total days I did that exercise. At first sight, it looks like I cried way more on days I went dancing. But in reality, I only went dancing 8 days in the entire year, so the fact that I cried 2 of those days is not a strong enough indicator that dancing causes crying (or vice versa, that crying causes dancing? Although that would be pretty funny).
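One rough way to see why 2 crying days out of 8 dancing days is weak evidence is to put an uncertainty interval around the 25%. The sketch below uses a Wilson score interval, which is my own choice for illustration and not part of the original analysis.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# 2 crying days out of 8 dancing days, as reported above.
low, high = wilson_interval(2, 8)
print(f"observed rate: {2 / 8:.0%}, interval: {low:.0%} to {high:.0%}")
```

With only 8 days the interval is very wide, so the dancing number cannot really be distinguished from the other exercise types.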
On the other hand, I cried the least on days I did martial arts. This makes sense, given that a large reason I go to the boxing gym is to get all of my rage and frustration out. Even if the correlation is spurious, it’s still a good reason to keep doing what I’m doing.
I was surprised that I didn’t cry as much as I thought on days I didn’t exercise at all. I guess it’s good to know that on days I don’t exercise, I don’t just spend all my extra time crying at home.
I cried on different parts of my monthly cycle
I spent a large part of 2022 learning more about how different parts of women’s monthly cycles affect mood, hormonal health, and so much more. I was curious about which part of my cycle I cried the most in. Anecdotally, it felt like I was always crying before my period started, so I hypothesized that I would see a lot more crying right before my period started.
I looked at the percent of time I spent crying for each day of my period cycle. I colored the days corresponding to the period cycle: Menstrual phase (days 1-5), Follicular phase (days 5-14), Ovulation (days 14-15), Luteal phase (days 15-28).
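Here is a minimal sketch of how a calendar date can be turned into a cycle day and a phase. The period start dates are hypothetical, and because the day ranges listed above overlap at their boundaries (days 5, 14, and 15), the code assumes each boundary day belongs to the earlier phase.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical period start dates, sorted ascending (not real data).
PERIOD_STARTS = [date(2022, 1, 10), date(2022, 2, 7), date(2022, 3, 7)]


def cycle_day(day, period_starts):
    """Day of cycle, where day 1 is the most recent period start on or before `day`."""
    idx = bisect_right(period_starts, day) - 1
    if idx < 0:
        return None  # no period start recorded yet
    return (day - period_starts[idx]).days + 1


def phase(day_of_cycle):
    """Map a cycle day to a phase; boundary days go to the earlier phase (assumption)."""
    if day_of_cycle <= 5:
        return "menstrual"
    if day_of_cycle <= 14:
        return "follicular"
    if day_of_cycle <= 15:
        return "ovulation"
    return "luteal"


d = cycle_day(date(2022, 1, 25), PERIOD_STARTS)
print(d, phase(d))
```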
A large part of crying happens on Day 16 of my cycle. This is usually right after ovulation happens, and this makes sense given that a lot of hormonal fluctuation is happening then. I also cried a lot on Days 21 and 22, which is about a week before my period starts and usually when I feel the worst PMS. I cried the least on Days 13 and 15 (right before ovulation) and Days 5, 8, and 9 (the first few days after the menstrual phase ended).
So, the reality is not as clear-cut as “I cry a lot before my period.” I cry during all parts of my 28-day cycle, but not equally on each day. I cry more on days of greater hormonal fluctuations, such as right after ovulation and about a week before the menstrual phase begins again. But I now know to keep in mind that the time after ovulation is especially susceptible to tears.
Thanks for reading My World in AI! Subscribe for free to receive new posts and support my work.
Part 2: Analyzing crying patterns in unstructured text
I couldn’t analyze my personal data without including at least a little bit of machine learning. In this second part of the article, I used unstructured data to further analyze my crying habits.
I journaled every single day in 2022. I used OpenAI’s text embeddings to map each day’s journal into a document level embedding (essentially, a list of numbers that capture the essence of a snippet of text).
These embeddings are very high dimensional, so I used PCA to reduce the embeddings to 2 dimensions. I plotted the first two principal components and colored each document embedding based on whether or not I cried that day. (Note: the first two principal components only explained 7% of the entire variance, which is not very high.) At first glance, there didn’t seem to be much of a clear distinction between the embeddings for days I cried vs. days I didn’t cry. Perhaps this is because the reasons for crying vary from one instance to the next, and it is likely I didn’t write about crying in similar ways each time it happened.
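The PCA step can be sketched with scikit-learn as follows. Random vectors stand in for the journal embeddings here (the real ones would come from the embedding model), and the embedding dimension and crying labels are arbitrary placeholders.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Placeholder data: 365 "journal embeddings" of an arbitrary dimension.
embeddings = rng.normal(size=(365, 64))
# Placeholder labels: True on days marked as crying days.
cried = rng.random(365) < 0.13

# Project the embeddings onto their first two principal components.
pca = PCA(n_components=2)
coords = pca.fit_transform(embeddings)

# Fraction of total variance captured by the two components together.
explained = pca.explained_variance_ratio_.sum()
print(f"variance explained by 2 components: {explained:.1%}")

# coords[cried] and coords[~cried] can then be drawn as two scatter series.
```

A low explained-variance figure means the 2D picture hides most of the structure in the embeddings, so a lack of visible separation in the plot does not rule out separation in the full space.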
Predicting future crying days
Finally, I wanted to see whether it was possible to predict which days I would be more likely to cry in the future.
For the machine learning folks: I split my dataset into train/test sets based on time (80% train, 20% test). In the test data, there were only 12 days of crying (out of 72 days). This is an example of class imbalance, in which there are way more days of not crying than crying. In terms of modeling, I kept things as simple as possible. I used an out-of-the-box Gradient Boosting Classifier from sklearn. I tried other models, such as logistic regression and random forest, but the results were so bad I didn’t include those. I did not do any hyperparameter tuning or additional feature engineering.
I built two classifiers. Each one predicted whether or not I cried on a given day:
The first made predictions based on all of the structured features (e.g. Google location, Apple health, survey data)
The second made predictions based on the journal embeddings
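The setup described above (a chronological 80/20 split and an untuned gradient boosting classifier) can be sketched like this. The features and labels are random placeholders rather than my data, so the resulting numbers mean nothing; only the shape of the pipeline is the point.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(0)

# Placeholder data: one row per day, in chronological order.
n_days = 365
X = rng.normal(size=(n_days, 10))               # stand-in features
y = (rng.random(n_days) < 0.13).astype(int)     # 1 = cried, 0 = did not cry

# Time-based split: the first 80% of days train, the last 20% test.
split = n_days * 8 // 10
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

# Out-of-the-box model with default hyperparameters.
model = GradientBoostingClassifier(random_state=0)
model.fit(X_train, y_train)
pred = model.predict(X_test)

# Rows are true labels, columns are predicted labels, in the order [0, 1].
cm = confusion_matrix(y_test, pred, labels=[0, 1])
print(cm)
```

Splitting by time instead of at random keeps the model from training on days that come after the ones it is tested on.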
For the machine learning folks, I show the confusion matrices depicting the results of each classifier. The first model (trained without embeddings) was more likely to predict a day as crying, even if it wasn’t. The second model (trained with embeddings) did not incorrectly predict a day as crying, but it also missed most of the actual days of crying. The two models did not differ greatly.
All of this to say: neither model was very good at detecting actual days of crying. It’s easy to get a high accuracy by just predicting “not crying” for every day, due to the data imbalance (there were many more days of not crying than crying). However, it’s difficult (at least at this early stage of modeling, without doing anything fancy) to draw any simple conclusions about clear indicators for crying. My journal entries, especially, did not give clear indications for crying. This further supports the idea that each crying session is varied in its cause, type, and essence. Predicting whether or not I’ll cry on a given day is pretty difficult!
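To make the imbalance point concrete: with 12 crying days out of 72 test days, a "model" that always predicts "not crying" is right on 60 of 72 days, about 83% accuracy, while catching none of the crying days. A small sketch of that arithmetic (the helper function name is my own):

```python
def accuracy_and_recall(y_true, y_pred):
    """Accuracy over all days, and recall on the positive (crying) class."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    true_positives = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    actual_positives = sum(y_true)
    accuracy = correct / len(y_true)
    recall = true_positives / actual_positives if actual_positives else 0.0
    return accuracy, recall


# Test set from the article: 12 crying days (1) and 60 non-crying days (0).
y_true = [1] * 12 + [0] * 60
# Baseline that predicts "not crying" every day.
y_pred = [0] * 72

accuracy, recall = accuracy_and_recall(y_true, y_pred)
print(f"accuracy={accuracy:.3f}, recall={recall:.3f}")
```

This is why accuracy alone is a poor yardstick here, and why the confusion matrices are more informative.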
I love New Year’s. It’s my favorite holiday. I love resolving and resoluting into the new year and reflecting on the past year. There’s something special about using my personal data to reflect on my year, including my crying and exercise habits for 2022.
Not every insight was useful. According to these pie charts, I cried more often on days I washed my hair, on days I did art, and on days I drank coffee. As all three of these activities either bring me joy or are good for me, I’m not going to stop doing them.
If I had more time, I would have liked to include data from other parts of my life, such as Spotify (music listening habits), Toggl (which I use to track my working hours), and expense tracking (where does all my money go?). Additionally, I would have liked to use Apple screen data (currently not possible to export) and sleep data (didn’t track). These are things I can aim to include in next year’s analysis!
Thank you for reading my article! Please feel free to leave a comment if you have any feedback! If there is any interest in the data cleaning process, let me know and I can share that as well.