21 January 2016

Corpus Italiano: corpus of contemporary Italian texts from the web

This corpus of contemporary Italian texts from the web was created as part of the PAISÀ project. Its aim is to provide a large, freely available collection of Italian texts so that learners can study the language through authentic material.

It is a unique language resource for Italian because it combines two features: it is a corpus of web texts (harvested in September/October 2010), and it is composed entirely of freely available and freely distributable texts.

Although it was created primarily for language learning, the corpus is also a rich resource for researchers and translators. To serve these different user groups, the interface will offer several modes of access, ranging from precompiled searches to fully flexible search options for constructing complex queries.

For more detailed information, please check: Corpus Italiano


The Accademia della Crusca offers an exhaustive list of databases, corpora and historical documents.

Banche dati, corpora e archivi testuali (databases, corpora and text archives) - Treccani