10 July 2014

When words become big data

The book Big Data: A Revolution That Will Transform How We Live, Work and Think, by Viktor Mayer-Schönberger and Kenneth Cukier, gives an overview of what big data is and how it is being applied. It is a topic I am becoming passionate about, and this book really satisfied my curiosity.

Translation will increasingly be a big data issue

There is enormous potential value in the examination of big data, and one of the most interesting examples is how translation software has been developed.
As we know, rather than relying on a team of translators, developers used an enormous number of documents that had already been translated from one language to another to build the translation models. This has been so effective that there is a joke that translation software works better when linguists are not involved.
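
The idea of learning translations from already translated documents can be illustrated with a toy sketch. It is not how any real translation engine works; it only assumes a handful of made-up English-Italian sentence pairs and scores word pairs by how often they appear in matching sentences (a Dice coefficient). The corpus and the names `train` and `best_translation` are hypothetical and chosen for illustration.

```python
from collections import Counter
from itertools import product

# Hypothetical, hand-made parallel corpus (English, Italian).
PARALLEL_CORPUS = [
    ("i see the big house", "vedo la casa grande"),
    ("i see the small house", "vedo la casa piccola"),
    ("i read the big book", "leggo il libro grande"),
    ("i see the small door", "vedo la porta piccola"),
]

def train(pairs):
    """Count in how many sentence pairs each word and each word pair occurs."""
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        src_words, tgt_words = set(src.split()), set(tgt.split())
        src_count.update(src_words)
        tgt_count.update(tgt_words)
        pair_count.update(product(src_words, tgt_words))
    return src_count, tgt_count, pair_count

def best_translation(word, model):
    """Return the target word with the highest Dice score, or None if unseen."""
    src_count, tgt_count, pair_count = model
    scores = {
        t: 2 * n / (src_count[s] + tgt_count[t])
        for (s, t), n in pair_count.items()
        if s == word
    }
    return max(scores, key=scores.get) if scores else None

model = train(PARALLEL_CORPUS)
for word in ("house", "big", "small", "door"):
    print(word, "->", best_translation(word, model))
```

No grammar rule or dictionary is involved: the pairing comes only from counting. With so few sentences, frequent words such as "the" or "see" are still matched poorly, which is exactly why the real systems need enormous amounts of text.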

I don’t know if there could be any copyright issues, but since I read the Kindle ebook, I wanted to share my highlights with you. I have grouped them into sections that follow topics rather than the chapters of the book. I hope you enjoy these small bites of food for thought.

When words become data

“Culturomics” is a computational lexicology that tries to understand human behavior and cultural trends through the quantitative analysis of texts.

Fewer than half the number of English words that appear in books are included in dictionaries.

The cornucopia of words consists of lexical ‘dark matter’ undocumented in standard references.

Transforming words into data unleashes numerous uses. Yes, the data can be used by humans for reading and by machines for analysis.
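
As a minimal sketch of what turning words into data can look like, the snippet below counts the words of a short text and lists those missing from a reference word list. The sample sentence, the tiny "dictionary" and the function names `datafy` and `undocumented` are assumptions made up for this illustration; they are not taken from the book.

```python
import re
from collections import Counter

def datafy(text):
    """Turn running text into a table of word frequencies."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)

def undocumented(counts, dictionary):
    """Words that occur in the text but not in the reference dictionary."""
    return sorted(word for word in counts if word not in dictionary)

# Hypothetical sample text and a deliberately tiny reference dictionary.
SAMPLE_TEXT = (
    "The blog post went viral. "
    "The post was retweeted, and the blog was bookmarked."
)
DICTIONARY = {"the", "blog", "post", "went", "viral", "was", "and"}

counts = datafy(SAMPLE_TEXT)
print(counts.most_common(3))
print(undocumented(counts, DICTIONARY))
```

The same table serves both uses mentioned above: a person can read it, and a program can analyze it, for example to estimate how much of a text is lexical "dark matter".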

A classic example of data’s innovative reuse is search terms.

The data had a primary use—to prove the user was human—but it also had a secondary purpose: to decipher unclear words in digitized texts.

The “quantified self” movement

Data can be cleverly reused to become a fountain of innovation and new services. The data can reveal secrets to those with the humility, the willingness, and the tools to listen.

Big data is about what, not why.

This type of thinking was a function of a “small data” environment: with so few things to measure, we had to treat what we did bother to quantify as precisely as possible.

We don’t give up on exactitude entirely; we only give up our devotion to it. What we lose in accuracy at the micro level we gain in insight at the macro level.

We don’t always need to know the cause of a phenomenon; rather, we can let data speak for itself.

What we are able to collect and process will always be just a tiny fraction of the information that exists in the world. It can only be a simulacrum of reality, like the shadows on the wall of Plato’s cave.

We are at a historical impasse where “god is dead.” That is to say, the certainties that we believed in are once again changing. But this time they are being replaced, ironically, by better evidence. What role is left for intuition, faith, uncertainty, acting in contradiction of the evidence, and learning by experience?

Collecting the information is crucial but not enough, since most of data’s value lies in its use, not its mere possession.

To datafy a phenomenon is to put it in a quantified format so it can be tabulated and analyzed.

The imprecision inherent in tagging is about accepting the natural messiness of the world.

We are beginning to realize not only that it may be impossible for a single version of the truth to exist, but also that its pursuit is a distraction.

Big data may require us to change, to become more comfortable with disorder and uncertainty.

As we transition from a hypothesis-driven world to a data-driven world, we may be tempted to think that we also no longer need theories.

Tomorrow, subsequent generations may have a “big-data consciousness”.

Big companies using big data

Amazon can recommend the ideal book, Google can rank the most relevant website, Facebook knows our likes, and LinkedIn divines whom we know. The same technologies will be applied to diagnosing illnesses, recommending treatments, perhaps even identifying “criminals” before one actually commits a crime. Just as the Internet radically changed the world by adding communications to computers, so too will big data change fundamental aspects of life by giving it a quantitative dimension it never had before.

Facebook tracks users’ “status updates” and “likes” to determine the most suitable ads to display on its website to earn revenue.

Another way of looking at it is that every Facebook user was worth around $100, since users are the source of the information that Facebook collects.

In the spirit of Google or Facebook, the new thinking is that people are the sum of their social relationships, online interactions, and connections with content.

Google tested 41 gradations of blue to see which ones people used more, to determine the color of a toolbar on the site. Google’s deference to data has been taken to extremes.

Your E-Book Is Reading You: The devices record each time users underline a passage or take notes in the margins. The ability to gather this kind of information transforms reading, long a solitary act, into a sort of communal experience.

…exactly what I did with this e-book!