Online term extractors: Terminology Extraction by Translated

Terminology Extraction by Translated uses Poisson statistics, maximum likelihood estimation, and inverse document frequency to compare the frequency of words in a given document with their frequency in a generic corpus of 100 million words per language. It uses a probabilistic part-of-speech tagger to take into account the probability that a particular sequence of words could be a term. It builds n-grams of words by minimizing the relative entropy.
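
To make the frequency comparison more concrete, here is a minimal Python sketch of one way a Poisson model can flag words that occur more often in a document than a generic corpus would predict. It is not Translated's actual implementation, which is not published in this post: the function names (`poisson_log_pmf`, `term_scores`), the add-one smoothing, and the tiny reference counts are all assumptions made for illustration, and the part-of-speech tagging and n-gram steps are left out.

```python
import math
from collections import Counter


def poisson_log_pmf(k, lam):
    """Natural log of the Poisson probability of seeing exactly k events at rate lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def term_scores(doc_tokens, ref_counts, ref_total):
    """Score each word by how surprising its document count is.

    Assumption: the expected count comes from the word's relative frequency
    in the reference corpus, with add-one smoothing so unseen words get a
    small non-zero rate.
    """
    doc_counts = Counter(doc_tokens)
    doc_len = len(doc_tokens)
    vocab_size = len(ref_counts) + 1  # +1 slot for unseen words (assumption)
    scores = {}
    for word, k in doc_counts.items():
        p_ref = (ref_counts.get(word, 0) + 1) / (ref_total + vocab_size)
        expected = doc_len * p_ref  # Poisson rate for this document length
        # Only over-represented words are term candidates.
        scores[word] = -poisson_log_pmf(k, expected) if k > expected else 0.0
    return scores


# Toy data, invented for this example (not the 100-million-word corpus).
reference = {"the": 60000, "of": 30000, "source": 100,
             "code": 100, "bounds": 50, "entropy": 2}
document = "the entropy of the source bounds the entropy of the code".split()

ranked = sorted(term_scores(document, reference, 1_000_000).items(),
                key=lambda item: item[1], reverse=True)
for word, score in ranked:
    print(f"{word:10s} {score:6.2f}")
```

In this toy run, a word that is rare in the reference corpus but repeated in the document ("entropy") ranks above very common words such as "the", which is the behaviour the description above suggests.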

Terminology Extraction by Translated can also be used to improve search results in traditional search engines (e.g., Google) by giving a better estimate of how relevant a keyword is to a document.

Texts can only be submitted for analysis by entering them into the text window.

Languages supported: English, Italian, French.

Website: Translated

Online term extractors: AlchemyAPI

AlchemyAPI extracts topic keywords from HTML, text, or web-based content.

AlchemyAPI employs statistical algorithms and natural language processing technology to analyze data, extracting keywords that can be used to index content, generate tag clouds, and more.

API endpoints are provided for performing keyword extraction on Internet-accessible URLs and posted HTML files or text content.

Extracted meta-data may be returned in XML, JSON, RDF, and Microformats rel-tag formats.

Keyword extraction is supported in over a half-dozen different languages, enabling even foreign-language content to be categorized and tagged:
English, French, German, Italian, Portuguese, Russian, Spanish, Swedish.

Website: AlchemyAPI

Online term extractors: TerMine

Particularly suitable for the biomedical area.

Technical terms are important for knowledge mining, especially in the biomedical area, where a vast number of documents are available. The number of terms (e.g., names of genes, proteins, chemical compounds, drugs, organisms, etc.) is increasing at an astounding rate in the biomedical literature. Existing terminological resources and scientific databases cannot keep up to date with the growth of neologisms. A domain-independent method for term recognition is very useful to automatically

Texts may be submitted for analysis in any of the following ways: entering the text you would like to analyze into the topmost text window;
specifying a text file (*.txt or *.pdf) from your computer's hard drive;
entering the URL of a Web resource (*.html or *.pdf).

Languages supported: all Unicode-compliant languages.


Website: TerMine