
When words become big data

The book Big Data: A Revolution That Will Transform How We Live, Work and Think, by Viktor Mayer-Schönberger and Kenneth Cukier, provides an overview of big data: what it is and how it is being applied. It is a topic I’m becoming passionate about, and this book really satisfied my curiosity.

Translation will increasingly be a big data issue

There is enormous potential value in the examination of big data, and one of the most interesting examples is how translation software has been developed.
Rather than relying on a team of translators, developers built the translation models from an enormous number of documents that had already been translated from one language to another. This has been so effective that there is a joke that translation software works better when linguists are not involved.
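To make the idea concrete, here is a minimal sketch of how word correspondences can be learned from already-translated text. It is only a toy: the four sentence pairs are invented, the function name `learn_word_pairs` is hypothetical, and the Dice coefficient is my own choice of scoring. Real translation systems use vastly larger corpora and far more sophisticated models.

```python
from collections import Counter
from itertools import product

# Toy parallel corpus (invented data): English sentences with Italian translations.
PARALLEL_CORPUS = [
    ("the house", "la casa"),
    ("a house", "una casa"),
    ("the cat", "il gatto"),
    ("a cat", "un gatto"),
]


def learn_word_pairs(corpus):
    """Pair each source word with the target word it co-occurs with most strongly."""
    src_counts, tgt_counts, pair_counts = Counter(), Counter(), Counter()
    for src_sentence, tgt_sentence in corpus:
        src_words = set(src_sentence.split())
        tgt_words = set(tgt_sentence.split())
        src_counts.update(src_words)
        tgt_counts.update(tgt_words)
        # Count every source/target word pair seen in the same sentence pair.
        pair_counts.update(product(src_words, tgt_words))

    best = {}
    for (src, tgt), together in pair_counts.items():
        # Dice coefficient: 1.0 when the two words always appear together.
        score = 2 * together / (src_counts[src] + tgt_counts[tgt])
        if src not in best or score > best[src][1]:
            best[src] = (tgt, score)
    return {src: tgt for src, (tgt, _) in best.items()}


lexicon = learn_word_pairs(PARALLEL_CORPUS)
print(lexicon["house"], lexicon["cat"])  # casa gatto
```

No grammar rule or dictionary is supplied anywhere: the pairing of "house" with "casa" emerges purely from counting, which is the point of the joke about linguists.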

I don’t know whether there could be any copyright issues, but since I read the Kindle ebook, I wanted to share my highlights with you. I have grouped them into sections that reflect topics rather than the chapters of the book. I hope you will enjoy these small bites of food for thought.

When words become data

“Culturomics” is a computational lexicology that tries to understand human behavior and cultural trends through the quantitative analysis of texts.

Fewer than half the number of English words that appear in books are included in dictionaries.

The cornucopia of words consists of lexical ‘dark matter’ undocumented in standard references.
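As a rough illustration of what this kind of quantitative analysis looks like, the sketch below measures how many distinct words in a text are missing from a reference word list. The sample sentence, the tiny "dictionary" and the function name `undocumented_share` are all invented for the example; it says nothing about the actual figures quoted in the book.

```python
import re


def undocumented_share(text, dictionary):
    """Return the share of distinct words in `text` missing from `dictionary`."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    if not words:
        return 0.0, set()
    missing = words - dictionary
    return len(missing) / len(words), missing


# Invented sample data: a sentence and a deliberately small word list.
sample_text = "The selfie went viral while the blogosphere slept"
toy_dictionary = {"the", "went", "viral", "while", "slept"}

share, missing = undocumented_share(sample_text, toy_dictionary)
print(f"{share:.0%} of distinct words are undocumented: {sorted(missing)}")
```

Run over millions of digitized books instead of one sentence, the same counting is what lets researchers estimate the size of the lexical "dark matter".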

Transforming words into data unleashes numerous uses. Yes, the data can be used by humans for reading and by machines for analysis.

A classic example of data’s innovative reuse is search terms.

The data had a primary use—to prove the user was human—but it also had a secondary purpose: to decipher unclear words in digitized texts.

The “quantified self” movement

Data can be cleverly reused to become a fountain of innovation and new services. The data can reveal secrets to those with the humility, the willingness, and the tools to listen.

Big data is about what, not why.

This type of thinking was a function of a “small data” environment: with so few things to measure, we had to treat what we did bother to quantify as precisely as possible.

We don’t give up on exactitude entirely; we only give up our devotion to it. What we lose in accuracy at the micro level we gain in insight at the macro level.

We don’t always need to know the cause of a phenomenon; rather, we can let data speak for itself.

What we are able to collect and process will always be just a tiny fraction of the information that exists in the world. It can only be a simulacrum of reality, like the shadows on the wall of Plato’s cave.

We are at a historical impasse where “god is dead.” That is to say, the certainties that we believed in are once again changing. But this time they are being replaced, ironically, by better evidence. What role is left for intuition, faith, uncertainty, acting in contradiction of the evidence, and learning by experience?

Collecting the information is crucial but not enough, since most of data’s value lies in its use, not its mere possession.

To datafy a phenomenon is to put it in a quantified format so it can be tabulated and analyzed.

The imprecision inherent in tagging is about accepting the natural messiness of the world.

We are beginning to realize not only that it may be impossible for a single version of the truth to exist, but also that its pursuit is a distraction.

Big data may require us to change, to become more comfortable with disorder and uncertainty.

As we transition from a hypothesis-driven world to a data-driven world, we may be tempted to think that we also no longer need theories.

Tomorrow, subsequent generations may have a “big-data consciousness”.

Big companies using big data

Amazon can recommend the ideal book, Google can rank the most relevant website, Facebook knows our likes, and LinkedIn divines whom we know. The same technologies will be applied to diagnosing illnesses, recommending treatments, perhaps even identifying “criminals” before one actually commits a crime. Just as the Internet radically changed the world by adding communications to computers, so too will big data change fundamental aspects of life by giving it a quantitative dimension it never had before.

Facebook tracks users’ “status updates” and “likes” to determine the most suitable ads to display on its website to earn revenue.

Another way of looking at it is that every Facebook user was worth around $100, since users are the source of the information that Facebook collects.

In the spirit of Google or Facebook, the new thinking is that people are the sum of their social relationships, online interactions, and connections with content.

Google once tested 41 gradations of blue to see which ones people used more, in order to determine the color of a toolbar on the site. Google’s deference to data has been taken to extremes.

 Your E-Book Is Reading You: The devices record each time users underline a passage or take notes in the margins. The ability to gather this kind of information transforms reading, long a solitary act, into a sort of communal experience.

…exactly what I did with this e-book!

