The book Big Data: A Revolution That Will Transform How We Live, Work, and Think, by Viktor Mayer-Schönberger and Kenneth Cukier, gives an overview of what big data is and how it is being applied. It is a topic I’m becoming passionate about, and this book really satisfied my curiosity.
Translation will increasingly be a big data issue
There is enormous potential value in the examination of big data, and one of the most interesting examples is how translation software has been developed.
As we know, rather than relying on a team of translators, developers used an enormous number of documents that had already been translated from one language to another to build the translation models. This has been so effective that there is a joke that translation software works better when linguists are not involved.
I don’t know if there could be any copyright issues, but since I read the Kindle ebook, I wanted to share my highlights with you. I gathered them under sections that reflect topics rather than the chapters of the book. I hope you enjoy these small bites of food for thought.
When words become data
“Culturomics” is a computational lexicology
that tries to understand human behavior and cultural trends through the
quantitative analysis of texts.
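To make the idea of quantitative text analysis concrete, here is a minimal sketch in Python. It is not the method used by the culturomics researchers; the tiny corpus, the `tokenize` helper, and the `relative_frequency` function are all hypothetical and only illustrate how the share of a word can be tracked by publication year.

```python
from collections import Counter
import re

# Hypothetical toy corpus of (publication year, text) pairs.
# Real studies work on millions of digitized books.
CORPUS = [
    (1900, "The telegraph carried the news across the ocean."),
    (1900, "News by telegraph reached the city at dawn."),
    (2000, "The internet carried the news across the world."),
    (2000, "She read the news on the internet every morning."),
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def relative_frequency(corpus, word):
    """Return {year: share of that year's tokens that equal `word`}."""
    totals = Counter()  # total tokens seen per year
    hits = Counter()    # occurrences of `word` per year
    for year, text in corpus:
        tokens = tokenize(text)
        totals[year] += len(tokens)
        hits[year] += tokens.count(word)
    return {year: hits[year] / totals[year] for year in sorted(totals)}


if __name__ == "__main__":
    for word in ("telegraph", "internet"):
        print(word, relative_frequency(CORPUS, word))
```

Even on four sentences, "telegraph" only shows up in 1900 and "internet" only in 2000, which is the kind of trend such an analysis looks for at scale.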
Fewer than half the number of English words
that appear in books are included in dictionaries.
The cornucopia of words consists of lexical
‘dark matter’ undocumented in standard references.
Transforming words into data unleashes
numerous uses. Yes, the data can be used by humans for reading and by machines
for analysis.
A classic example of data’s innovative
reuse is search terms.
The data had a primary use—to prove the
user was human—but it also had a secondary purpose: to decipher unclear words
in digitized texts.
The “quantified self” movement
Data can be cleverly reused to become a
fountain of innovation and new services. The data can reveal secrets to those
with the humility, the willingness, and the tools to listen.
Big data is about what, not why.
This type of thinking was a function of a
“small data” environment: with so few things to measure, we had to treat what
we did bother to quantify as precisely as possible.
We don’t give up on exactitude entirely; we
only give up our devotion to it. What we lose in accuracy at the micro level we
gain in insight at the macro level.
We don’t always need to know the cause of a
phenomenon; rather, we can let data speak for itself.
What we are able to collect and process
will always be just a tiny fraction of the information that exists in the
world. It can only be a simulacrum of reality, like the shadows on the wall of
Plato’s cave.
We are at a historical impasse where “god
is dead.” That is to say, the certainties that we believed in are once again
changing. But this time they are being replaced, ironically, by better
evidence. What role is left for intuition, faith, uncertainty, acting in
contradiction of the evidence, and learning by experience?
Collecting the information is crucial but
not enough, since most of data’s value lies in its use, not its mere
possession.
To datafy a phenomenon is to put it in a quantified format so it can be tabulated and
analyzed.
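As a small illustration of datafying something, the sketch below turns free-text diary notes into rows that can be tabulated and averaged. The diary entries, the sentence pattern, and the `datafy` function are my own assumptions for the example, not something taken from the book.

```python
import re
from statistics import mean

# Hypothetical free-text diary entries, in a fixed sentence shape.
DIARY = [
    "Mon: slept 7.5 hours, walked 8000 steps",
    "Tue: slept 6 hours, walked 10500 steps",
    "Wed: slept 8 hours, walked 4000 steps",
]

# Assumed pattern for the entries above; real notes would be messier.
PATTERN = re.compile(r"(\w+): slept ([\d.]+) hours, walked (\d+) steps")


def datafy(entries):
    """Convert each matching entry into a dict of typed, quantified fields."""
    rows = []
    for entry in entries:
        match = PATTERN.match(entry)
        if match is None:
            continue  # skip entries that do not fit the assumed pattern
        day, hours, steps = match.groups()
        rows.append({"day": day, "sleep_hours": float(hours), "steps": int(steps)})
    return rows


if __name__ == "__main__":
    rows = datafy(DIARY)
    for row in rows:
        print(row)
    print("average steps:", mean(row["steps"] for row in rows))
```

Once the notes are in this form, any tabulation or analysis tool can work with them, which is the point of the quotation.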
The imprecision inherent in tagging is
about accepting the natural messiness of the world.
We are beginning to realize not only that
it may be impossible for a single version of the truth to exist, but also that
its pursuit is a distraction.
Big data may require us to change, to
become more comfortable with disorder and uncertainty.
As we transition from a hypothesis-driven
world to a data-driven world, we may be tempted to think that we also no longer
need theories.
Tomorrow, subsequent generations may have a
“big-data consciousness”.
Big companies using big data
Amazon can recommend the ideal book, Google
can rank the most relevant website, Facebook knows our likes, and LinkedIn
divines whom we know. The same technologies will be applied to diagnosing
illnesses, recommending treatments, perhaps even identifying “criminals” before
one actually commits a crime. Just as the Internet radically changed the world
by adding communications to computers, so too will big data change fundamental
aspects of life by giving it a quantitative dimension it never had before.
Facebook tracks users’ “status updates” and
“likes” to determine the most suitable ads to display on its website to earn
revenue.
Another way of looking at it is that every
Facebook user was worth around $100, since users are the source of the
information that Facebook collects.
In the spirit of Google or Facebook, the
new thinking is that people are the sum of their social relationships, online
interactions, and connections with content.
[Google tested] 41 gradations of blue to see which ones
people used more, to determine the color of a toolbar on the site. Google’s
deference to data has been taken to extremes.
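The kind of comparison described here can be sketched as a simple test across variants. This is only an illustration under my own assumptions: the shades, the click counts, and the `best_variant` function are hypothetical, and a real experiment would also need a statistical significance check before declaring a winner.

```python
# Hypothetical results: shade (hex colour) -> (clicks, impressions).
RESULTS = {
    "#2200CC": (480, 10_000),
    "#1A0DAB": (530, 10_000),
    "#0000EE": (505, 10_000),
}


def click_through_rate(clicks, impressions):
    """Share of impressions that led to a click (0.0 if never shown)."""
    return clicks / impressions if impressions else 0.0


def best_variant(results):
    """Return the shade with the highest click-through rate."""
    return max(results, key=lambda shade: click_through_rate(*results[shade]))


if __name__ == "__main__":
    for shade, (clicks, impressions) in RESULTS.items():
        print(shade, click_through_rate(clicks, impressions))
    print("most clicked shade:", best_variant(RESULTS))
```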
Your E-Book Is Reading You: The devices record each time users underline a passage
or take notes in the margins. The ability to gather this kind of information
transforms reading, long a solitary act, into a sort of communal experience.
…exactly what I did with this e-book!