So, my Gamification of Life has grown into a much bigger project than I thought it would. Now the gamification part is dropping off and the idea of personal analytics of my life is gaining ground. Basically, I want to collect data about myself and analyse it to be kinda self-aware. In the end, I want to make a dashboard that shows me the basic stats and maybe helps me find patterns where I can improve myself. This post talks about the chats I have been having for 4-5 years on various IMs, mainly WhatsApp and Facebook.
Let’s first talk about the data. I have had this chat thing in mind for a year or two. Hence, during all this time, whenever I had a conversation with someone on WhatsApp, I used to email it to myself. To reduce my work, I do it whenever I think a particular chat has been going on for a long time. There was no other method to get them out of the app. Then, I manually downloaded every text file and ran a script which gave me the clean CSVs. For the purposes of this post, I have only extracted the timestamps and the person sending the replies.
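For a rough idea of what such a cleaning script might look like, here is a minimal Python sketch. It is not the actual script from the repo. The line format (`31/12/2015, 23:45 - Alice: text`) is an assumption, since WhatsApp exports differ between app versions and locales, and the names `LINE_RE`, `parse_export` and `write_csv` are made up for illustration.

```python
import csv
import io
import re
from datetime import datetime

# Assumed export line format: "31/12/2015, 23:45 - Alice: message text".
# Real exports vary by app version and locale, so adjust the pattern as needed.
LINE_RE = re.compile(r"^(\d{2}/\d{2}/\d{4}), (\d{2}:\d{2}) - ([^:]+): ")


def parse_export(lines):
    """Yield (timestamp, sender) for every line that starts a new message."""
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            # Continuation of a multi-line message, or a system notice.
            continue
        date_part, time_part, sender = match.groups()
        stamp = datetime.strptime(f"{date_part} {time_part}", "%d/%m/%Y %H:%M")
        yield stamp, sender


def write_csv(rows, out_file):
    """Write the (timestamp, sender) pairs as a two-column CSV."""
    writer = csv.writer(out_file)
    writer.writerow(["timestamp", "sender"])
    for stamp, sender in rows:
        writer.writerow([stamp.isoformat(), sender])


sample = [
    "31/12/2015, 23:45 - Alice: happy new year!",
    "and one more line of the same message",
    "01/01/2016, 00:02 - Me: same to you",
]
buffer = io.StringIO()
write_csv(parse_export(sample), buffer)
print(buffer.getvalue())
```

Only the timestamp and the sender are kept, which matches what this post needs; the message text itself is thrown away.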
Getting the Facebook chat data was a time-consuming task. I tried a few non-coding ways to get all the data, something that might give me the chats in the form of text files, but none helped. I tried a Chrome plugin, Facebook Chat Downloader. This was useful to get around 30-40k messages. But I have a friend with whom I have talked more than that. Even if I had chosen this method, I wasn’t sure how to get all the data for that person. Besides, there was a lot of manual labor there. Then I thought, let’s download my Facebook data archive; they say that my chat data will be included there. But, as expected, the chats weren’t complete. Facebook is quite restrictive about letting you get your own data out of Facebook. Their Graph API is restrictive as well. I have never seen any Facebook dataset in the public domain. It kind of makes sense for Facebook, as user data is the fuel they are running on. But, as a user (or, more like, as a data nerd), this brings some issues. Since I couldn’t get my data out through the proper channels, I had to look at the requests the browser was making and mimic the same requests with my code. It takes some time to get the parameters right. Data scraping mostly deals with this concept: figure out the requests your browser is making to show you the data you want scraped, and then mimic those requests with code, giving the server the impression that you are the one making the requests. Thus, I had huge (not in size of file, but in the length of data) JSONs, which were converted into the respective CSVs. I even scraped the data of group chats.
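The idea of mimicking the browser can be sketched roughly like this in Python. Everything specific here is hypothetical: the endpoint URL, the form parameters (`friend_id`, `offset`, `limit`) and the JSON shape (`messages`, `timestamp`, `author`) are placeholders for illustration, not Facebook’s real ones. The real values have to be read off the browser’s network tab, as described above.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint, for illustration only.
ENDPOINT = "https://example.com/chat/history"


def build_request(cookie, friend_id, offset, limit=200):
    """Build (but do not send) a request shaped like the one the browser makes."""
    form = urllib.parse.urlencode(
        {"friend_id": friend_id, "offset": offset, "limit": limit}
    ).encode("ascii")
    headers = {
        "Cookie": cookie,             # copied from the logged-in browser session
        "User-Agent": "Mozilla/5.0",  # look like a regular browser
    }
    # Passing data makes this a POST request.
    return urllib.request.Request(ENDPOINT, data=form, headers=headers)


def rows_from_payload(payload_text):
    """Flatten one JSON page (assumed shape) into (timestamp_ms, sender) rows."""
    payload = json.loads(payload_text)
    return [(msg["timestamp"], msg["author"]) for msg in payload["messages"]]


request = build_request("session=PLACEHOLDER", friend_id="123", offset=0)
# Sending it would look like:
#   body = urllib.request.urlopen(request).read().decode("utf-8")

sample_page = (
    '{"messages": [{"timestamp": 1441065600000, "author": "Me", "body": "hi"}]}'
)
print(request.get_method(), rows_from_payload(sample_page))
```

In practice the request would be repeated with a growing `offset` until a page comes back empty, and the flattened rows from every page would be appended to one CSV per conversation.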
The above plot shows the messages I sent to any of my friends on WhatsApp or FB. The year 2012 (mid-2012, to be more precise) was when I started being a bit more active on Facebook than just having an account. Interestingly, that’s also the year I started college. The blank stretch each day comes from sleeping time and college time, except on weekends of course. That gap in January 2016, covering almost all of January, was when I went on a holiday, and after returning I got busy with my internship.
Man, I thought this plot would look great, but it looks okay-ish. I am all over the place. My inconsistent sleeping pattern, with the majority of activity during late nights, even stretching to 5 or 6 o’clock in the morning, pretty much shows that I am a night owl. I am trying to change that though. I try to sleep early every day, but then I remember something that needs to be done (like writing this post at 3:40 am while listening to Human Qualities by Explosions in the Sky).
After joining college, I talked pretty much only to someone who is a very good friend of mine now. A very knowledgeable dude with whom I had interesting conversations. The messages in the year 2012 are pretty much all with him. At the end of 2012, I picked up the pace and made some new friends. By 2013, I was very busy talking. After that, I went through something rough and dropped quite low, and then picked up the pace again. Since then, with some sudden high spikes, I have been pretty much consistent in talking.
All in all, out of a whopping 90,000 messages sent by me, I was most chatty in the years 2013 (24,500 messages) and 2015 (26,000 messages). The number of messages I sent dropped by almost 30% from 2013 to 2014. Damn! I didn’t realize that rough episode was this rough. This was a new insight. And the year 2016 is going pretty well relative to the previous years. I am talking more this year. Interesting. Talks about the college major project, the internship, and with seniors add up quickly.
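Numbers like these come from a plain group-by on the year of each timestamp. A minimal Python sketch, with made-up toy timestamps rather than my real data, and with hypothetical helper names `messages_per_year` and `percent_change`:

```python
from collections import Counter
from datetime import datetime


def messages_per_year(timestamps):
    """Count how many messages fall in each calendar year."""
    return Counter(stamp.year for stamp in timestamps)


def percent_change(old, new):
    """Relative change from old to new, in percent (negative means a drop)."""
    return (new - old) * 100 / old


# Toy data: four sent messages in 2013 and three in 2014.
sent = (
    [datetime(2013, month, 1, 22, 0) for month in (1, 4, 7, 10)]
    + [datetime(2014, month, 1, 22, 0) for month in (2, 6, 9)]
)
counts = messages_per_year(sent)
change = percent_change(counts[2013], counts[2014])
print(dict(counts), change)
```

With the toy data the change comes out negative, which is how a year-over-year drop shows up.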
My talking to just one friend during the latter half of 2012 kind of shows in this plot, with the monthly unique friend average reaching almost 2, except in the last month or so. In 2013, I made many friends, with the average reaching more than 6. Naturally, conversing with that many friends shows up in the previous plot. 2013 was a good year. Then that rough thing happened and, man, I reduced the talking. In the most chatty year, 2015, that peak at around September was interesting to find out about. That was when the college placement season started. That month was quite active. I was a placement coordinator. I was confused about my career choices. There were other things on my mind too. Consequently, during that month, I talked to a total of 47 different friends, college seniors and mentors, bringing the monthly average to an all-time high of more than 9 people.
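One way to compute a figure like this is sketched below in Python. The exact definition is an assumption on my reading of the plot: count the distinct people talked to on each day, average those daily counts over the days of the month that have any chat, and separately count the distinct people over the whole month. The function name `monthly_unique_friends` and the sample rows are made up for illustration.

```python
from collections import defaultdict
from datetime import date


def monthly_unique_friends(rows):
    """rows: iterable of (day, friend) pairs.

    Returns {(year, month): (average unique friends per active day,
                             total unique friends in the month)}.
    """
    friends_per_day = defaultdict(set)
    for day, friend in rows:
        friends_per_day[day].add(friend)

    daily_counts = defaultdict(list)
    friends_per_month = defaultdict(set)
    for day, friends in friends_per_day.items():
        key = (day.year, day.month)
        daily_counts[key].append(len(friends))
        friends_per_month[key] |= friends

    return {
        key: (sum(counts) / len(counts), len(friends_per_month[key]))
        for key, counts in daily_counts.items()
    }


rows = [
    (date(2015, 9, 1), "A"), (date(2015, 9, 1), "B"),
    (date(2015, 9, 2), "A"),
    (date(2015, 9, 3), "C"), (date(2015, 9, 3), "A"), (date(2015, 9, 3), "B"),
]
stats = monthly_unique_friends(rows)
print(stats)
```

Note that days with no chats at all are left out of the average here; dividing by the number of calendar days in the month instead would give lower values.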
I always knew that I don’t write much in my replies. I usually tend to favor one-word replies wherever I can. So, the plot above shows the truth in that, but it’s also a bit misleading. My data also included Facebook group chats. Clearly, in that case, there are more messages coming in than going out. But the above plot generally shows my preference for sending fewer texts, and it also clearly shows my messages increasing along with the incoming messages, which, I think, is obvious. This also shows that I don’t ignore the texts of others. Just kidding, this plot barely shows that. There are also some interesting points in the plot where the average incoming and outgoing messages were equal. Well, I’ll look at that some other time.
There are a lot of things that can be done with this kind of data. I guess I have just scratched the surface. So, that good friend I talked to during most of 2012: we have the largest number of messages, totaling around 50,000 (incoming and outgoing). When I was calculating the average messages per friend over the days we have talked, the “score” for that friend came out to be pretty low. And the score with someone (who do you think that someone is?) I have been talking to a lot lately came out to be quite high. I found this quite interesting and quite obvious too. If the messages are spread out over a larger range of dates then the score will be lower, and vice versa. Of course, the number of messages has to be a bit significant in both cases. This “score” is kind of a representative of the “density” of messages over the dates. It’d be fascinating to make some visualization out of it to see whose “density” is changing with time, or the density throughout the history of chats, and further trends I can’t think of right now. I guess that’s work for some other time.
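A “density” score of this kind can be sketched in a few lines of Python. The definition used here is an assumption: total messages with a friend divided by the number of distinct days on which there was any message with that friend. Dividing by the span between the first and the last message would be another reasonable reading. The name `density_scores` and the toy rows are made up for illustration.

```python
from collections import defaultdict
from datetime import datetime


def density_scores(rows):
    """rows: iterable of (timestamp, friend) pairs.

    Returns {friend: messages per active day}.
    """
    message_counts = defaultdict(int)
    active_days = defaultdict(set)
    for stamp, friend in rows:
        message_counts[friend] += 1
        active_days[friend].add(stamp.date())
    return {
        friend: message_counts[friend] / len(active_days[friend])
        for friend in message_counts
    }


# Toy data: the same number of messages, spread over six days vs. two days.
rows = (
    [(datetime(2012, 8, day, 21, 0), "old_friend") for day in range(1, 7)]
    + [(datetime(2016, 5, 1, 22, minute), "recent_friend") for minute in range(3)]
    + [(datetime(2016, 5, 2, 22, minute), "recent_friend") for minute in range(3)]
)
scores = density_scores(rows)
print(scores)
```

Both toy friends have six messages, but the one whose messages are packed into fewer days gets the higher score, which is the effect described above.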
As a part of the Quantified Self, I also log my sleep times. It’d be amusing to look at how this data aligns with my sleep. In general, I think that after ending the last talk at night, I am up for 2-3 hours more. It’ll be really interesting to find the actual number. Another post of mine about sleep times is due. I will definitely talk about this then. My present priority is to make the dashboard mentioned at the start of the post. And, while working on this post, I also learned d3.js. I was planning to make an interactive plot and add it here, but there was a lot of personal data, so I decided against it. Instead, I am going to use d3 in my dashboard. I hope I get time to work on the dashboard.
PS: Code for the above analysis and scripts to get the data can be found here - Chats