Dataclysm: Book Review

Dataclysm: What Our Online Lives Tell Us About Our Offline Selves, by Christian Rudder

Dataclysm is written by Christian Rudder, co-founder and president of the American dating website OkCupid. As such, he has access to most, if not all, of the data generated by the site, and with its millions of users that data is a treasure trove of information. In particular, it makes it possible to observe what users actually do, in contrast with what they say they will do. Until now, most psychological research has been done either through small-scale experiments or with large questionnaires. With the huge amounts of data now being generated and recorded, a new type of research is possible: using data from sites such as OkCupid, the actual behaviour of users can be observed on a vast scale.

The book is full of interesting, insightful and beautifully depicted facts deciphered from data. For example, Facebook can predict someone's sexual orientation quite accurately using only their likes. The author also explains that, when it comes to getting messages on the site, it's better to be both loved and hated than to be rated average by everyone. Google can use search terms to track the spread of the flu, but it can also analyse sentiments regarding race, sexual preference and drug use. These are topics people may not speak about honestly in questionnaires, but on which their search terms reveal their true feelings and beliefs.

Another great example of the gap between what people say and what they actually do is the age preference of men on the website. With clear explanations and graphs, Rudder shows the discrepancy between men's stated preference and their actual preference. For example, even though 40-year-old men will say they are looking for women aged 27 to 45, the women they message most are 30 years old. And judging by the star ratings users give each other, the women who look best to them are actually 21. This holds for men of all ages: women of around 20 look best to them, and they systematically message women at the lower end of their stated age range. Insights such as this, where no individual user's data matters but aggregating the data of millions of users reveals a clear pattern, set this book apart.

Most striking of all, in my opinion, is the way in which the data is treated and presented. Rudder not only explains where the data comes from and how it has been handled; he clearly has a great understanding of the limitations and opportunities it presents. The book is not just a collection of interesting facts: it shows a way to treat data with respect while still gaining insight from it. He has a clear vision of what is and isn't possible, and moral, to do with data, and that makes me willing to trust him with my data.