
Free Online Term Extractors

This page provides a set of free terminology extraction tools available online.

Online terminology extraction is the extraction of terms from a text through a web service that relies on linguistic and/or statistical algorithms. Given some text, the service returns a list of terms, ideally with the most relevant ones first. Terms can be returned in a variety of formats and used for a variety of purposes:

  • Customising a machine translation system;

  • Supporting the translation process (multilingual terminology management);

  • Raising a website's visibility and improving its SEO by using extracted terms as keywords, tags, and meta-tags;

  • Maintaining a company thesaurus (the more classical approach).

The list is divided into two groups. The first, with more detailed descriptions, includes the easiest tools to use: you simply enter the source text or paste its URL, press a button and get the term list. There is no software to install, no manual to read and, of course, no price to pay. The second group also includes SEO tools and APIs.

OneClick Terms: a simple terminology extraction interface that gives easy access to term extraction functionality. It is powered by Sketch Engine technology.

Terminology Extraction by Translated uses Poisson statistics, maximum likelihood estimation and inverse document frequency to compare the frequency of words in a given document with their frequency in a generic corpus of 100 million words per language. It uses a probabilistic part-of-speech tagger to take into account the probability that a particular word sequence is a term, and it builds word n-grams by minimizing relative entropy. Terminology Extraction by Translated can also be used to improve results in traditional search engines (e.g. Google) by giving a better estimate of how relevant a keyword is to a document.

Uploading: Texts may be submitted for analysis by entering them into the text window.

Languages supported: English, Italian, French.

Supported file formats: .txt
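
The post does not give Translated's actual formula. As a rough illustration of scoring a document's words against a generic reference corpus with Poisson statistics, the sketch below treats each word's reference-corpus rate as the expected Poisson rate for the document and scores it with the Poisson likelihood-ratio statistic, using the observed count as the maximum likelihood estimate. The reference counts and corpus size are hypothetical inputs, and the add-one smoothing and choice of statistic are assumptions, not Translated's method.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def poisson_keyness(doc_text: str, ref_counts: dict[str, int], ref_size: int) -> list[tuple[str, float]]:
    """Rank document words by how over-represented they are versus a reference corpus.

    Assumed model (illustrative only):
      expected count  lam = doc_size * (ref_count + 1) / (ref_size + vocab)   (add-one smoothing)
      score           2 * (k * ln(k / lam) - (k - lam)), with k the observed count (MLE of the rate)
    """
    counts = Counter(tokenize(doc_text))
    doc_size = sum(counts.values())
    vocab = len(set(ref_counts) | set(counts))
    scores = []
    for word, k in counts.items():
        lam = doc_size * (ref_counts.get(word, 0) + 1) / (ref_size + vocab)
        if k <= lam:  # keep only words seen more often than the reference corpus predicts
            continue
        scores.append((word, 2 * (k * math.log(k / lam) - (k - lam))))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical reference-corpus counts out of 1,000,000 words.
reference = {"the": 60000, "is": 20000, "text": 500, "fast": 300, "splits": 50}
for word, score in poisson_keyness("The tokenizer splits the text; the tokenizer is fast.", reference, 1_000_000):
    print(word, round(score, 2))
```

Words that are frequent in the document but rare in the reference corpus, such as domain terms, get the highest scores, while common function words are pushed down or dropped.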

TerMine: Developed specifically for the biomedical domain.

Technical terms are important for knowledge mining, especially in the biomedical domain, where a vast number of documents is available. The number of terms (e.g. names of genes, proteins, chemical compounds, drugs, organisms, etc.) is growing at an astounding rate in the biomedical literature, and existing terminological resources and scientific databases cannot keep up with this growth of neologisms. A domain-independent method for term recognition is therefore very useful for recognising such terms automatically.

Uploading: Texts may be submitted for analysis through any of the following ways:

  • entering the text you would like to analyze into the topmost text window;

  • specifying a text file (*.txt or *.pdf) from your computer's hard drive;

  • entering the URL of a web resource (*.html or *.pdf).

Languages supported: all Unicode-compliant languages.
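
TerMine is built on the C-value method for recognising multi-word terms, which rewards frequent candidate phrases but discounts those that mostly occur nested inside longer candidates. The sketch below is a minimal Python illustration of the C-value score, not TerMine's code: it assumes the candidate phrases and their frequencies have already been produced by a linguistic filter (for example adjective/noun patterns), and the example phrases and counts are invented.

```python
import math


def is_nested(short: tuple[str, ...], long: tuple[str, ...]) -> bool:
    """True if `short` occurs as a contiguous word sequence inside the longer candidate `long`."""
    n = len(short)
    return len(long) > n and any(long[i:i + n] == short for i in range(len(long) - n + 1))


def c_value(freqs: dict[tuple[str, ...], int]) -> dict[tuple[str, ...], float]:
    """C-value for multi-word candidates (tuples of words) given their frequencies.

    Not nested:  C(a) = log2(|a|) * f(a)
    Nested:      C(a) = log2(|a|) * (f(a) - mean frequency of the longer candidates containing a)
    Single words always score 0 because log2(1) = 0; the measure targets multi-word terms.
    """
    scores = {}
    for a, f_a in freqs.items():
        containers = [b for b in freqs if is_nested(a, b)]  # longer candidates containing a
        penalty = sum(freqs[b] for b in containers) / len(containers) if containers else 0.0
        scores[a] = math.log2(len(a)) * (f_a - penalty)
    return scores


# Invented candidate frequencies, for illustration only.
candidates = {
    ("basal", "cell", "carcinoma"): 5,
    ("squamous", "cell", "carcinoma"): 3,
    ("cell", "carcinoma"): 8,
}
for term, score in sorted(c_value(candidates).items(), key=lambda kv: -kv[1]):
    print(" ".join(term), round(score, 2))
```

Here "cell carcinoma" occurs 8 times, but most of those occurrences belong to the two longer terms, so its score is reduced accordingly.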

fivefilters: This is a free software project that provides easy term extraction through a web service.

Given some text, it will return a list of terms with the most relevant first.

The list is returned in JSON format. It is a free alternative to Yahoo's Term Extraction service. It is being developed as part of the Five Filters project to promote alternative, non-corporate media.

Languages supported: English

AlchemyAPI employs sophisticated statistical algorithms and natural language processing technology to analyze data, extracting keywords that can be utilized to index content, generate tag clouds, and more. API endpoints are provided for performing keyword extraction on Internet-accessible URLs and posted HTML files or text content.

Extracted metadata may be returned in XML, JSON, RDF, and Microformats rel-tag formats.

In short, it extracts topic keywords from HTML, plain text or other web-based content.

Languages supported: English, French, German, Italian, Portuguese, Russian, Spanish, Swedish.

Maui indexer: Maui automatically identifies the main topics in text documents. Depending on the task, topics can be tags, keywords, keyphrases, vocabulary terms, descriptors, index terms or titles of Wikipedia articles. It also shows how keyphrases can be extracted from document text.

File formats supported: text, PDF, Microsoft Word.
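
Maui extends the KEA keyphrase extractor: it generates candidate phrases from the text, describes each with features such as TF-IDF and the position of its first occurrence, and uses a model trained on manually indexed documents to decide which candidates are keyphrases. The sketch below covers only the candidate and feature steps, under stated assumptions: the stopword list and document-frequency table are hypothetical, and ranking by TF-IDF alone stands in for Maui's trained model.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; real systems use a much longer one.
STOPWORDS = {"a", "an", "and", "are", "as", "for", "in", "is", "of", "on", "the", "to", "with"}


def keyphrase_features(text, doc_freq, n_docs, max_len=3):
    """Map each candidate phrase to (tf_idf, first_occurrence).

    Candidates are 1..max_len word n-grams that neither start nor end with a stopword.
    first_occurrence is the token index of the first match divided by the document length.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    counts, first = Counter(), {}
    for i in range(len(tokens)):
        for n in range(1, max_len + 1):
            gram = tokens[i:i + n]
            if len(gram) < n or gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            phrase = " ".join(gram)
            counts[phrase] += 1
            first.setdefault(phrase, i / len(tokens))
    total = sum(counts.values())
    features = {}
    for phrase, count in counts.items():
        # Smoothed IDF from a hypothetical collection of n_docs documents.
        idf = math.log((n_docs + 1) / (doc_freq.get(phrase, 0) + 1))
        features[phrase] = (count / total * idf, first[phrase])
    return features


def rank_keyphrases(features, top=5):
    """Sort by TF-IDF (descending), breaking ties by earlier first occurrence."""
    ordered = sorted(features.items(), key=lambda item: (-item[1][0], item[1][1]))
    return [phrase for phrase, _ in ordered[:top]]


doc = ("Keyphrase extraction selects phrases. Keyphrase extraction uses features "
       "such as the first occurrence of a phrase.")
print(rank_keyphrases(keyphrase_features(doc, {"extraction": 40, "phrase": 30, "features": 50}, 100)))
```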

VocabGrabber analyzes any text, generating lists of the most useful vocabulary words and showing how those words are used in context. The vocabulary list it creates from the text can then be sorted, filtered and saved. Selecting any word in the list shows a snapshot of the Visual Thesaurus map and the definitions for that word, along with examples of the word in the text.

Languages supported: English

Supported file formats: all formats.

Uploading: Copy/paste into the box, and click on the "Grab Vocabulary!" button.
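
Showing how a word is used in context, as VocabGrabber does, is essentially a keyword-in-context (KWIC) concordance. A minimal Python sketch of that idea, not VocabGrabber's implementation; the `kwic` function and its `width` parameter (the number of words shown on each side) are hypothetical names chosen for illustration.

```python
import re


def kwic(text: str, keyword: str, width: int = 3) -> list[str]:
    """Return keyword-in-context lines: up to `width` words either side of each match."""
    tokens = re.findall(r"\w+(?:'\w+)?", text)  # words, keeping internal apostrophes
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():  # case-insensitive match
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines


text = "The glossary lists terms. Each glossary entry links to the term in context."
for line in kwic(text, "glossary", width=2):
    print(line)
```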

A lot more:

Anchovy: a free multilingual cross-platform glossary editor and term extraction tool based on the open Glossary Markup Language (GlossML) format.

Araya Bilingual Terminology Extraction: an automated terminology extraction service, based on a statistical method, that produces bilingual term pairs.

Bibclassify - A module in CDS Invenio (CERN’s document server software) for the automatic assignment of terms from SKOS vocabularies, developed using the High Energy Physics vocabulary. Developed in a collaboration between CERN and DESY.

Dandelion dataTXT: a web-based multilingual named entity extraction API.

Extractor - Commercial software for keyword extraction in different languages. There is also a demo. Developed at the National Research Council of Canada.

TerminologyExtractor - A tool that extracts word and collocation lists, with frequencies, from Microsoft Word documents, HTML, Rich Text Format and plain text files.

TexLexAn - An open-source text summariser and keyword extractor.

Topia term extractor - Part-of-speech and frequency-based term extraction tool implemented in Python.

Yahoo term extraction - Web-service based content analysis via term extraction, includes a demo.

WordFish: standard corpus and terminology extraction.
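
Topia and TerminologyExtractor both pair word patterns with frequency counts. A minimal sketch of part-of-speech pattern extraction in that spirit, not either tool's actual algorithm: it assumes the text has already been tagged with Penn Treebank tags by some tagger (tagging itself is out of scope), keeps maximal adjective/noun runs that end in a noun, and counts them. The `pattern_terms` function and its `min_freq` threshold are hypothetical.

```python
from collections import Counter

NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}
ADJ_TAGS = {"JJ"}


def pattern_terms(tagged: list[tuple[str, str]], min_freq: int = 1) -> list[tuple[str, int]]:
    """Count maximal adjective/noun runs that end in a noun; most frequent first."""
    counts = Counter()
    run = []
    for word, tag in tagged + [("", "END")]:  # sentinel flushes the final run
        if tag in NOUN_TAGS or tag in ADJ_TAGS:
            run.append((word.lower(), tag))
            continue
        while run and run[-1][1] not in NOUN_TAGS:  # trim trailing adjectives
            run.pop()
        if run:
            counts[" ".join(w for w, _ in run)] += 1
        run = []
    kept = [(term, c) for term, c in counts.items() if c >= min_freq]
    return sorted(kept, key=lambda tc: (-tc[1], tc[0]))


# "Term extraction needs clean text. Clean text helps term extraction." (pre-tagged)
tagged = [("Term", "NN"), ("extraction", "NN"), ("needs", "VBZ"), ("clean", "JJ"),
          ("text", "NN"), (".", "."), ("Clean", "JJ"), ("text", "NN"), ("helps", "VBZ"),
          ("term", "NN"), ("extraction", "NN"), (".", ".")]
print(pattern_terms(tagged))
```

Because only maximal runs are kept, a shorter term nested inside a longer run is not counted separately.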

 - Last updated: 30/11/2011 -
