10 July 2014

When words become big data

The book Big Data: A Revolution That Will Transform How We Live, Work and Think, by Kenneth Cukier and Viktor Mayer-Schönberger, provides an overview of what big data is and how it is being applied. I'm becoming passionate about this topic, and this book really satisfied my curiosity.

Examining big data has enormous potential value, and one of the most interesting examples is the way translation software has been developed.
Rather than relying on a team of translators, developers built the translation models from an enormous number of documents that had already been translated from one language into another. This approach has been so effective that there is a joke that translation software performs better when linguists are not involved!
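To get a feel for how already-translated text can "teach" a machine, here is a toy sketch of my own, not the method described in the book and far simpler than real statistical machine translation (which uses proper word-alignment and phrase models). It scores every source/target word pair with the Dice coefficient, counted over sentence-aligned pairs, and picks the target word that co-occurs most consistently. The four English-Italian sentence pairs are invented for illustration.

```python
from collections import Counter
from itertools import product


def dice_scores(parallel_pairs):
    """Score (source_word, target_word) pairs by the Dice coefficient,
    counting how often they appear in the same aligned sentence pair."""
    src_counts, tgt_counts, pair_counts = Counter(), Counter(), Counter()
    for src, tgt in parallel_pairs:
        src_words = set(src.lower().split())
        tgt_words = set(tgt.lower().split())
        src_counts.update(src_words)
        tgt_counts.update(tgt_words)
        # Every source word co-occurs with every target word of its pair.
        pair_counts.update(product(src_words, tgt_words))
    return {
        (s, t): 2 * n / (src_counts[s] + tgt_counts[t])
        for (s, t), n in pair_counts.items()
    }


def best_translation(word, scores):
    """Return the highest-scoring target word for a source word, or None."""
    candidates = {t: score for (s, t), score in scores.items() if s == word}
    return max(candidates, key=candidates.get) if candidates else None


# Invented sentence pairs (English -> Italian), for illustration only.
pairs = [
    ("the red house", "la casa rossa"),
    ("the house is big", "la casa è grande"),
    ("a red car", "una macchina rossa"),
    ("a house", "una casa"),
]
scores = dice_scores(pairs)
print(best_translation("house", scores))  # casa
print(best_translation("red", scores))    # rossa
```

No linguist wrote a rule here; the equivalents emerge only from counting, which is the spirit of the joke above.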

I don’t know whether there could be any copyright issues, but since I read the Kindle e-book, I want to share my highlights with you. I have grouped them into sections by topic; the sections do not correspond to the book's chapters. I hope you enjoy these small bites of food for thought.

When words become data

“Culturomics” is a computational lexicology that tries to understand human behavior and cultural trends through the quantitative analysis of texts.
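The core move in culturomics is simple to sketch: track how often a word appears in texts from each year, normalised by the total amount of text for that year, so that years with more books don't look artificially "louder". This is a minimal illustration of mine under stated assumptions: the corpus is a hypothetical list of (year, text) pairs, and tokenisation is a naive whitespace split.

```python
from collections import Counter


def relative_frequency_by_year(corpus, word):
    """Return {year: occurrences of `word` per 1,000 tokens}.

    `corpus` is a list of (year, text) pairs (hypothetical data).
    Tokens are produced by a naive lowercase whitespace split.
    """
    totals, hits = Counter(), Counter()
    for year, text in corpus:
        tokens = text.lower().split()
        totals[year] += len(tokens)
        hits[year] += tokens.count(word.lower())
    # Skip years with no tokens to avoid dividing by zero.
    return {y: 1000 * hits[y] / totals[y] for y in sorted(totals) if totals[y]}


corpus = [(1990, "the web is new"), (2000, "the web the web grows")]
print(relative_frequency_by_year(corpus, "web"))  # {1990: 250.0, 2000: 400.0}
```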

Fewer than half the number of English words that appear in books are included in dictionaries.

The cornucopia of words consists of lexical ‘dark matter’ undocumented in standard references.
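Finding that "dark matter" boils down to a set difference: collect every word form that appears in the texts and subtract the words a reference dictionary already lists. A minimal sketch, assuming a plain word list stands in for the dictionary; the sample texts and word list are invented, and the regular expression (lowercase letters, optionally hyphenated) is a deliberately crude tokeniser.

```python
import re


def lexical_dark_matter(texts, dictionary):
    """Return the sorted word forms found in `texts` but absent from `dictionary`."""
    known = {w.lower() for w in dictionary}
    found = set()
    for text in texts:
        # Crude tokeniser: runs of letters, optionally joined by hyphens.
        found.update(re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower()))
    return sorted(found - known)


texts = ["The blogger felt infoxicated.", "Culturomics is fun."]
dictionary = {"the", "blogger", "felt", "is", "fun"}
print(lexical_dark_matter(texts, dictionary))  # ['culturomics', 'infoxicated']
```

The same idea is a cheap first filter for neologism hunting: whatever the dictionary doesn't know is a candidate worth a human look.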

Transforming words into data unleashes numerous uses. Yes, the data can be used by humans for reading and by machines for analysis.

A classic example of data’s innovative reuse is search terms.

The data had a primary use—to prove the user was human—but it also had a secondary purpose: to decipher unclear words in digitized texts.

The “quantified self” movement

Data can be cleverly reused to become a fountain of innovation and new services. The data can reveal secrets to those with the humility, the willingness, and the tools to listen.

Big data is about what, not why.

This type of thinking was a function of a “small data” environment: with so few things to measure, we had to treat what we did bother to quantify as precisely as possible.

We don’t give up on exactitude entirely; we only give up our devotion to it. What we lose in accuracy at the micro level we gain in insight at the macro level.

We don’t always need to know the cause of a phenomenon; rather, we can let data speak for itself.

What we are able to collect and process will always be just a tiny fraction of the information that exists in the world. It can only be a simulacrum of reality, like the shadows on the wall of Plato’s cave.

We are at a historical impasse where “god is dead.” That is to say, the certainties that we believed in are once again changing. But this time they are being replaced, ironically, by better evidence. What role is left for intuition, faith, uncertainty, acting in contradiction of the evidence, and learning by experience?

Collecting the information is crucial but not enough, since most of data’s value lies in its use, not its mere possession.

To datafy a phenomenon is to put it in a quantified format so it can be tabulated and analyzed.

The imprecision inherent in tagging is about accepting the natural messiness of the world.

We are beginning to realize not only that it may be impossible for a single version of the truth to exist, but also that its pursuit is a distraction.

Big data may require us to change, to become more comfortable with disorder and uncertainty.

As we transition from a hypothesis-driven world to a data-driven world, we may be tempted to think that we also no longer need theories.

Tomorrow, subsequent generations may have a “big-data consciousness”.

Big companies using big data

Amazon can recommend the ideal book, Google can rank the most relevant website, Facebook knows our likes, and LinkedIn divines whom we know. The same technologies will be applied to diagnosing illnesses, recommending treatments, perhaps even identifying “criminals” before one actually commits a crime. Just as the Internet radically changed the world by adding communications to computers, so too will big data change fundamental aspects of life by giving it a quantitative dimension it never had before.

Facebook tracks users’ “status updates” and “likes” to determine the most suitable ads to display on its website to earn revenue.

Another way of looking at it is that every Facebook user was worth around $100, since users are the source of the information that Facebook collects.

In the spirit of Google or Facebook, the new thinking is that people are the sum of their social relationships, online interactions, and connections with content.

[Google tested] 41 gradations of blue to see which ones people used more, to determine the color of a toolbar on the site. Google’s deference to data has been taken to extremes.
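A test like the 41-blues one is, at heart, an A/B (or multivariate) test: show each variant to a share of users, record impressions and clicks, and keep the variant with the highest click-through rate. The sketch below is my own illustration; the colour codes and the numbers are entirely made up, not Google's data.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    name: str          # hypothetical colour code for the toolbar
    impressions: int   # users who saw this variant
    clicks: int        # users who clicked

    @property
    def click_rate(self) -> float:
        return self.clicks / self.impressions if self.impressions else 0.0


def best_variant(variants):
    """Pick the variant with the highest click-through rate."""
    return max(variants, key=lambda v: v.click_rate)


shades = [
    Variant("#0000ff", 1000, 52),
    Variant("#1111ee", 1000, 61),
    Variant("#2222dd", 1000, 58),
]
winner = best_variant(shades)
print(winner.name, winner.click_rate)
```

A real test would also check that the winning margin is statistically significant rather than noise before changing the colour for everyone.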

Your E-Book Is Reading You: The devices record each time users underline a passage or take notes in the margins. The ability to gather this kind of information transforms reading, long a solitary act, into a sort of communal experience.

…exactly what I did with this e-book!

26 June 2014

Cross-language search: benefits for terminology research... and your smartphone

The “multilinguality” of Web content provides opportunities for users to directly access and use previously incomprehensible sources of Web information. 

Monolingual search engines only allow users to enter a search query in one language. This restriction clearly limits the amount and type of information that an individual user can access. In a global community, users are looking for online information access systems or services that can help them find and use information presented in native or non-native languages.

A cross-language search service enables Web users to access information that was previously out of their reach.

To perform a cross-language search, users simply write the query in their native language, select the target language for the SERP (Search Engine Results Page), and get the results.
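I don't know how 2lingual or Google translate queries internally (most likely with machine translation), so here is only a terminologist's sketch of the idea: translate the query with a bilingual glossary, preferring the longest multi-word term that matches, and keep unknown words unchanged. The Italian-English glossary entries are hypothetical, and both queries would then be run side by side.

```python
def translate_query(query, glossary, max_len=3):
    """Translate a query using a glossary, greedily matching the longest term first.

    `glossary` maps lowercase source terms (possibly multi-word) to target terms.
    Words not covered by the glossary are kept as they are.
    """
    words = query.lower().split()
    out, i = [], 0
    while i < len(words):
        # Try the longest possible term starting at position i, then shorter ones.
        for n in range(min(max_len, len(words) - i), 0, -1):
            chunk = " ".join(words[i:i + n])
            if chunk in glossary:
                out.append(glossary[chunk])
                i += n
                break
        else:
            out.append(words[i])  # no glossary match: keep the word unchanged
            i += 1
    return " ".join(out)


# Hypothetical Italian -> English glossary.
glossary = {"scarica batteria": "battery drain", "batteria": "battery", "problema": "problem"}
query = "iPhone scarica batteria"
print({"source": query, "target": translate_query(query, glossary)})
```

Matching the longest term first matters: "scarica batteria" becomes "battery drain" instead of a word-by-word "scarica battery", which is exactly why good terminology makes cross-language search work.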

A practical example of cross-language search? Let me tell you about my personal experience.

Consider instruction manuals, for example: companies hardly write them anymore. What do we do when we need a solution to a problem with our smartphone? We search Google in our own language but may not find anything useful. With a cross-language search, we may discover that a blogger somewhere has already posted a solution to our problem, but in a language we don’t know. We then "machine-translate" the post to understand the gist of the content. If we realise that it is exactly what we are looking for, we find a way to get the whole content properly translated.

The best cross-language search tool I know of so far, and one I use very often, is 2lingual.

2lingual is a useful dual-language search tool that makes it easy to search in 2 separate languages.

It performs both a Google Search and a Google Cross-language Search. It also provides a query translation option that can be activated or deactivated for Cross-language Google Searches. The top-ranking Google Search Results from 2 different languages are presented side-by-side in separate lists.

Currently, 37 Google Search Languages are supported.

Enjoy 2lingual for your terminology research! As you can see from the image below, not only did I fix my iPhone's battery-drain problem, but I could also check the (most popular) equivalent terms in the target language.


19 June 2014

Football or soccer, which came first?

With the World Cup underway in Brazil, many people are asking whether we should call the "global round-ball game" "soccer" or "football". I can see this from the queries of the readers who reach my blog: my most visited post ever is “Differenza tra football e soccer” (“The difference between football and soccer”), and since we are in the middle of the World Cup craze, I think this topic deserves a post.

According to a paper published in May by the University of Michigan and written by the sports economist Stefan Szymanski, "soccer" is not a semantically bizarre American invention but a British import.

Soccer comes from "association football", and the term was used in the UK to distinguish the game from rugby football. In countries with their own forms of football (such as the USA and Australia), "soccer" became the generic term, basically a synonym for 'football' in the international sense, used to distinguish it from the domestic game.

If the word "soccer" originated in England, why did it fall into disuse there and become dominant in the States? "Soccer" was a recognized term in Britain in the first half of the twentieth century, but it wasn't widely used until after World War II, perhaps because of the influence of American troops stationed in Britain during the war and the allure of American culture in its aftermath. In the 1980s, however, Brits began rejecting the term, as soccer became a more popular sport in the United States: too much of an Americanism for British English to bear!

18 June 2014

The tech-savvy terminologist

After reading this very interesting article about the need for translators to become technologically savvy, I thought that the same is true for terminologists.

Today a terminologist needs to know how to use terminology software to carry out terminology projects effectively. Monolingual and bilingual term extraction, selection and collection of terms, editing and management of data, updating, integration with CAT tools, interoperability (data exchange with other systems): these tasks can only be performed properly with software.

Terminology management software can save a great deal of time and makes it possible to process large amounts of data.
Terminologists have to be technologically savvy and able to deal with the different software available today, but first they have to learn how to do it.

I think that a course or webinar providing an overview of the most useful and effective terminology management systems would give terminologists the know-how they need to find solutions to the challenges of modern-day translation.
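To show what monolingual term extraction does under the hood, here is a minimal frequency-based sketch of my own, not how any particular tool works: it counts n-grams that neither start nor end with a stopword and keeps those that occur at least `min_freq` times. The stopword list is a tiny illustrative one; real extractors add part-of-speech patterns and statistical termhood measures.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "for",
             "with", "on", "by", "can", "be"}


def candidate_terms(text, n=2, min_freq=2):
    """Return (candidate term, frequency) pairs for n-grams seen at least min_freq times."""
    tokens = re.findall(r"[a-z]+", text.lower())
    ngrams = zip(*(tokens[i:] for i in range(n)))  # sliding window of n tokens
    counts = Counter(
        " ".join(gram) for gram in ngrams
        if gram[0] not in STOPWORDS and gram[-1] not in STOPWORDS
    )
    return [(term, freq) for term, freq in counts.most_common() if freq >= min_freq]


text = ("Terminology management software can save time. "
        "Terminology management software integrates with CAT tools. "
        "CAT tools use term bases.")
print(candidate_terms(text))
```

Candidates still need a terminologist's validation: the software only narrows the haystack.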

A course on terminology management systems could ideally include the systems listed on this page, which I have just updated: Terminology Management Software.

Linguatech by Bruno Ciola is so far the only training course available on terminology management systems. On Linguatech you can also find a page providing a list of tutorials. 

12 June 2014

Blushing at the amazing result!

Thanks to everyone who voted for me in @babla's Top 25 Language Twitterers competition.

Top 25 Language Twitterers 2014

It is so exciting to be in this list with such great colleagues!

Twitter is actually where I am most active; I use this blog as a secondary platform to store content and ideas. Twitter takes less time: just a sentence or a retweet and it is done. And what I like most is that it enables more engagement with other Tweeps. The blog is my online home, but I’m always serendipitously wandering around Twitter, finding yummy terminology cherries.

What this competition reveals, at least from my point of view, is that terminology is becoming more and more popular. Look at the results: @TermCoord no. 4 and @Terminologia no. 6 in the blog section, and @WordLo in the Top 25 Twitterers!

Once known as a boring subject confined within the walls of academia, terminology is now starting to be seen as interesting and sometimes even fun.

Three years ago, Termcoord realised that a great many people would appreciate having contact with the EU institution, so it went social and started sharing its resources online. Its 4th place on the list clearly shows how much the online community loves Termcoord today: people appreciate the great work done by the permanent staff and the trainees of the Terminology Coordination Unit, and the never-ending enthusiasm for terminology that the blog shows.

Terminologia etc. is my source of inspiration: it was as a fervent reader of Licia's blog that I decided in 2010 to create my own (with funny results at the beginning, because I was basically stealing her content; see this post on scraped content). Since then, I have never missed any of her posts. The blog is a must-read resource for terminologists and translators, but also interesting reading for the general public, since it shows that terminology can be spotted in everyday life.

As I write this post, I keep getting distracted (what a pleasure!) by Tweeps tweeting congratulations and kind words to each other for the good results. This makes me think of this quote, which expresses what makes this Community of Language Lovers so lively and joyful:

Patricia Brenes just published a post mentioning me, Termcoord and Terminologia etc.: What a great week for terminology! 

26/06/2014 - Lexiophiles has just mentioned this post in: TOP 100 LANGUAGE LOVERS COMPETITION 2014 – THE FINAL BRIEF. Thank you, guys! This competition really warms a Word Lover's heart!

10 June 2014

6 funny things I learnt at the TAUS-TaaS workshop at Localization World 2014

Tweets and cherries from the TAUS-TaaS workshop at Localization World 2014.

I promise I will stop talking about this amazing conference and saying how happy I was to be invited as a speaker, etc. etc. But before I do, I just want to share with you what I learnt from the TAUS-TaaS Workshop at Localization World Dublin:

1. If you search Google for “What does a terminologist do”, THIS is what you get:
2. This is the modern translator:
3. Terminology can be yummy:
4. That a single terminology entry costs $150!!
5. …so guess what IATE is worth
6. That inconsistent terminology can cause disasters:

05 June 2014

WordLo at LocWorld: the day after...

...and now here I am, back in Luxembourg, while my mind is still in Dublin. I’m following the conference passively by reading the #Locworld Twitter stream and thinking: “Oh, right now I would be going to this workshop or that presentation...”.

I enjoyed the TAUS-TaaS Workshop so much: the presentations were really interesting, and so was the debate, enlivened by Tex Texin's brilliant questions. I met the Tilde team (Indra Samite and Andrejs Vasiļjevs), Uwe Muegge, Luigi Muzii, Jaap Van Der Meer and the lovely Anne-Maj.

I have only one regret: I didn’t meet Catherine Christaki! We were both there and neither of us knew! Arrrgh! :)

Here is a Storify of the TAUS-TaaS Workshop (unfortunately, it doesn’t let me display the full stream):

02 June 2014

WordLo at LocWorld: Among Terminology VIPs

I’m so thrilled I will be speaking at the TAUS-TaaS Workshop at Localization World on 4 June in Dublin! I will be among the VIPs of terminology, all the people I follow on social media and whose every post and essay I read!

Well, what will I talk about? Simply about terminology from my point of view: as a blogger, as an enthusiast, as someone who simply enjoys this subject.
“Calling it a hobby isn’t sufficient, but calling it professional makes it seem like it is work”.
I will show some terminology trends and share some examples from my experience as a blogger:
  1. Communicating about terminology by using social networks;
  2. Social networks as available data for carrying out terminology research, in particular for monitoring language changes such as neologisms;
  3. Websites are made of content and terminology is the critical part of the user experience (I already wrote about this topic here);
  4. Managing and sharing terminological data: cloud-based, collaborative and social platforms;
  5. The subject field of terminology is overwhelming, so some websites provide terminological resources in a few clicks.
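Sharing terminological data between tools (point 4 above) depends on an exchange format, and for terminology the standard one is TBX (ISO 30042). Below is a simplified sketch, assuming Python's standard ElementTree, that builds a single concept entry with one `langSet` per language and one `tig`/`term` per term. As I understand it, a complete TBX file also wraps entries in a `martif`/`text`/`body` structure and adds data categories such as definitions or part of speech; those are left out here, and the entry ID and terms are invented.

```python
import xml.etree.ElementTree as ET

# ElementTree serialises this namespace as the reserved "xml:" prefix.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def term_entry(entry_id, terms):
    """Build a simplified TBX-style <termEntry> from {language_code: [terms]}."""
    entry = ET.Element("termEntry", id=entry_id)
    for lang, lang_terms in terms.items():
        lang_set = ET.SubElement(entry, "langSet", {XML_LANG: lang})
        for t in lang_terms:
            tig = ET.SubElement(lang_set, "tig")  # term information group
            ET.SubElement(tig, "term").text = t
    return entry


entry = term_entry("c1", {"en": ["battery drain"],
                          "it": ["scaricamento della batteria"]})
xml = ET.tostring(entry, encoding="unicode")
print(xml)
```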

I will (try to) keep you updated from Dublin on Twitter: @WordLo

21 May 2014

Yay! I've been nominated!

I am so excited to announce that my Twitter account “@WordLo” and this blog have been nominated for the Top 100 Language Lovers 2014 competition hosted by bab.la language portal and Lexiophiles language blog!

I’m honored to be among so many talented language lovers.
Thank you to those who supported my nomination and to those who will vote for me!

Voting has just started and runs from 20 May to 9 June. The winners will be announced on 12 June.

Please, click on the following banners to vote!

Vote the Top 100 Language Professional Blogs 2014 Vote the Top 100 Language Twitterer 2014

06 May 2014

No more up all night to get lucky

The subject field of terminology is so overwhelming that it is easy to get infoxicated (lost in so much information).

To save us from spending nights searching the Internet, more and more institutions, researchers, companies and simply passionate people are taking the initiative to develop websites and blogs that apply, in Google's words, the “I’m feeling lucky” approach: finding the information you are looking for in a one-stop-shop website. (Hi Patricia! I stole this expression from your blog; I really love it!)

So, look no further and enjoy the resources that, in my opinion, best embody the “feeling lucky” approach.

Terminology Forum: Terminology Forum is a global non-profit information forum for freely available terminological information online. The Forum, maintained by Anita Nuopponen with the help of her students at the Dept. of Communication Studies, University of Vaasa, Finland, provides information on terminological activities including terminology work, research and education, online glossaries and termbanks from different fields as well as on general language dictionaries in various languages. 
Strong point: just check Terminological Organisations, Terminological Education and Bibliography; you won't find such a wealth of information anywhere else. Really well done.

In My Own Terms: This newborn blog deserves all our attention, if only for the enthusiasm of its creator, Patricia Brenes. The blog provides basic information on terminology, glossaries and resources, with useful sources or a bibliography at the end.
Strong point: Frequently updated, fresh material added every day (and she quotes me among the resources ^_^).

Termcoord.eu: How could I not mention Termcoord? The staff of the Terminology Coordination Unit of the European Parliament, with the help of the trainees, never stops surprising us with resources from the EU and beyond: a weekly selection of terms from the EU terminology database IATE related to current events, news from the most important terminology conferences and many useful tools, such as the frequently updated list of selected glossaries, Glossary Links.
Strong Point: It is the Mecca of terminology, what else?

TAUS Directory: TAUS has just released this new directory of translation technologies as a free and open service to the global translation industry. The directory contains listings of translation support tools, machine translation engines and language technology tools. It is not strictly about terminology, but terminology is of course directly involved, as many resources are listed.
Strong Point: I was really impressed by the fresh, minimal look, and I have to say that this is how a real “feeling lucky” layout should be: white background, fresh vivid colours, flat design, HTML5 (it seems) and just a search box and four buttons to refine your search.

Update 09/05/2014 - Lingua Greca Toolbox: I definitely had to add this impressive list of resources by Catherine Christaki. Everybody in the online language community knows and loves her, because she always tries to help by sharing useful resources, recommending potential clients and so on.
Strong Point: It is very hard to put such an amount of resources in one page and Catherine successfully managed to do that. Here you don't have the sensation of being lost.
Also take a look at the Weekly Favorites section to keep up with the latest news, interesting blog posts and online articles on translation, interpreting and language, as well as freelancing, blogging, business and social media. If you missed any of the great content, here is your chance to catch up.