
When words become big data

The book Big Data: A Revolution That Will Transform How We Live, Work and Think, by Kenneth Cukier and Viktor Mayer-Schönberger, gives an overview of what big data is and how it is being applied. It is a topic I am becoming passionate about, and this book really satisfied my curiosity.

Translation will increasingly be a big data issue

There is an enormous amount of potential value in the examination of big data and one of the most interesting examples is how translation software has been developed.
As we know, rather than relying on a team of translators, developers built the translation models from an enormous number of documents that had already been translated from one language to another. This has been so effective that there is a joke that translation software works better when linguists are not involved.
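To make the idea concrete, here is a minimal sketch of how word translations can be guessed from already-translated text alone. It is not how any real translation engine works: the four sentence pairs are made up, the names `PARALLEL_CORPUS`, `build_model` and `translate_word` are my own, and I score candidates with a simple Dice coefficient on co-occurrence counts, which is an assumption chosen for brevity.

```python
from collections import Counter
from itertools import product

# Hypothetical toy corpus: aligned English/Italian sentence pairs.
PARALLEL_CORPUS = [
    ("the house", "la casa"),
    ("the book", "il libro"),
    ("a house", "una casa"),
    ("a book", "un libro"),
]


def build_model(pairs):
    """Count how often each source word appears alongside each target word."""
    cooc = Counter()       # (source word, target word) -> shared sentence pairs
    src_count = Counter()  # source word -> sentences containing it
    tgt_count = Counter()  # target word -> sentences containing it
    for src, tgt in pairs:
        src_words, tgt_words = set(src.split()), set(tgt.split())
        src_count.update(src_words)
        tgt_count.update(tgt_words)
        cooc.update(product(src_words, tgt_words))
    return cooc, src_count, tgt_count


def translate_word(word, model):
    """Return the target word with the highest Dice score, or None if unseen."""
    cooc, src_count, tgt_count = model
    scores = {
        t: 2 * n / (src_count[s] + tgt_count[t])
        for (s, t), n in cooc.items()
        if s == word
    }
    return max(scores, key=scores.get) if scores else None


model = build_model(PARALLEL_CORPUS)
print(translate_word("house", model))
print(translate_word("book", model))
```

With this toy corpus, "house" is matched to "casa" and "book" to "libro" without any dictionary or grammar rule being written down. The more translated text you feed in, the more reliable the counts become.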

I don’t know whether this raises any copyright issues, but since I read the Kindle ebook, I wanted to share my highlights with you. I have grouped them into sections that reflect topics rather than the chapters of the book. I hope you enjoy these small bites of food for thought.

When words become data

“Culturomics” is a computational lexicology that tries to understand human behavior and cultural trends through the quantitative analysis of texts.

Fewer than half the number of English words that appear in books are included in dictionaries.

The cornucopia of words consists of lexical ‘dark matter’ undocumented in standard references.
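The kind of measurement behind these two highlights can be sketched in a few lines. This is only an illustration under my own assumptions: the sample sentence and the tiny dictionary are invented, and `dictionary_coverage` is a hypothetical helper, not the method used in the research the book describes.

```python
import re


def dictionary_coverage(text, dictionary):
    """Return the share of distinct words found in `dictionary`
    and the sorted list of words that are missing from it."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    if not words:
        return 0.0, []
    missing = sorted(words - dictionary)
    return 1 - len(missing) / len(words), missing


# Hypothetical sample text and mini-dictionary.
SAMPLE_TEXT = "The blog post went viral after a retweet"
MINI_DICTIONARY = {"the", "post", "went", "after", "a", "viral"}

coverage, dark_matter = dictionary_coverage(SAMPLE_TEXT, MINI_DICTIONARY)
print(coverage, dark_matter)
```

Here six of the eight distinct words are in the dictionary, so "blog" and "retweet" would be the lexical "dark matter" of this tiny sample.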

Transforming words into data unleashes numerous uses. Yes, the data can be used by humans for reading and by machines for analysis.

A classic example of data’s innovative reuse is search terms.

The data had a primary use—to prove the user was human—but it also had a secondary purpose: to decipher unclear words in digitized texts.

The “quantified self” movement

Data can be cleverly reused to become a fountain of innovation and new services. The data can reveal secrets to those with the humility, the willingness, and the tools to listen.

Big data is about what, not why.

This type of thinking was a function of a “small data” environment: with so few things to measure, we had to treat what we did bother to quantify as precisely as possible.

We don’t give up on exactitude entirely; we only give up our devotion to it. What we lose in accuracy at the micro level we gain in insight at the macro level.

We don’t always need to know the cause of a phenomenon; rather, we can let data speak for itself.

What we are able to collect and process will always be just a tiny fraction of the information that exists in the world. It can only be a simulacrum of reality, like the shadows on the wall of Plato’s cave.

We are at a historical impasse where “god is dead.” That is to say, the certainties that we believed in are once again changing. But this time they are being replaced, ironically, by better evidence. What role is left for intuition, faith, uncertainty, acting in contradiction of the evidence, and learning by experience?

Collecting the information is crucial but not enough, since most of data’s value lies in its use, not its mere possession.

To datafy a phenomenon is to put it in a quantified format so it can be tabulated and analyzed.

The imprecision inherent in tagging is about accepting the natural messiness of the world.

We are beginning to realize not only that it may be impossible for a single version of the truth to exist, but also that its pursuit is a distraction.

Big data may require us to change, to become more comfortable with disorder and uncertainty.

As we transition from a hypothesis-driven world to a data-driven world, we may be tempted to think that we also no longer need theories.

Tomorrow, subsequent generations may have a “big-data consciousness”.

Big companies using big data

Amazon can recommend the ideal book, Google can rank the most relevant website, Facebook knows our likes, and LinkedIn divines whom we know. The same technologies will be applied to diagnosing illnesses, recommending treatments, perhaps even identifying “criminals” before one actually commits a crime. Just as the Internet radically changed the world by adding communications to computers, so too will big data change fundamental aspects of life by giving it a quantitative dimension it never had before.

Facebook tracks users’ “status updates” and “likes” to determine the most suitable ads to display on its website to earn revenue.

Another way of looking at it is that every Facebook user was worth around $100, since users are the source of the information that Facebook collects.

In the spirit of Google or Facebook, the new thinking is that people are the sum of their social relationships, online interactions, and connections with content.

41 gradations of blue to see which ones people used more, to determine the color of a toolbar on the site. Google’s deference to data has been taken to extremes.
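A test like this boils down to comparing click rates across variants. The sketch below shows that comparison with only three shades; the color codes and the click numbers are made up, `best_variant` is a hypothetical helper, and I make no claim that this mirrors Google's actual procedure.

```python
def best_variant(results):
    """Return the variant with the highest click-through rate.

    `results` maps a variant name to a (clicks, impressions) tuple.
    """
    rates = {
        name: clicks / shown
        for name, (clicks, shown) in results.items()
        if shown > 0
    }
    if not rates:
        raise ValueError("no variant has any impressions")
    return max(rates, key=rates.get)


# Made-up results for three shades of blue: (clicks, impressions).
SHADE_RESULTS = {
    "#1a0dab": (120, 4000),
    "#2200cc": (150, 4100),
    "#0044cc": (90, 3900),
}

print(best_variant(SHADE_RESULTS))
```

With these numbers the second shade wins. A real experiment would also check whether the difference is statistically significant before changing the toolbar.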

Your E-Book Is Reading You: The devices record each time users underline a passage or take notes in the margins. The ability to gather this kind of information transforms reading, long a solitary act, into a sort of communal experience.

…exactly what I did with this e-book!
