This software performs statistical analysis of strings. More precisely, the application counts the letters, digits, accented characters, punctuation marks, words and periods (sentences) in a text. It also calculates the number of letters per word and words per period (showing measures of position and dispersion for these quantitative variables), and identifies the longest and shortest periods and the longest and shortest words. There are two special modules: the first lets students use the frequency distribution of letters to decode messages encrypted with the Caesar cipher, and the second lets them investigate power laws in the frequency distribution of words in a text (Zipf's Law).
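
The kind of counting described above can be illustrated with a minimal Python sketch. It is not the application's own code: the function name `text_statistics`, the definition of a word as a run of letters, and the rule that a period ends at `.`, `!` or `?` are all assumptions made for this example.

```python
import re
import statistics
import string


def text_statistics(text):
    """Count basic features of a text and summarize word lengths.

    Hypothetical helper for illustration only.
    """
    letters = sum(ch.isalpha() for ch in text)
    digits = sum(ch.isdigit() for ch in text)
    # Only ASCII punctuation marks are counted here.
    punctuation = sum(ch in string.punctuation for ch in text)
    # Assumption: a word is a maximal run of letters (accented ones included).
    words = re.findall(r"[^\W\d_]+", text)
    # Assumption: a period (sentence) ends at '.', '!' or '?'.
    periods = [p for p in re.split(r"[.!?]+", text) if p.strip()]
    lengths = [len(w) for w in words]
    return {
        "letters": letters,
        "digits": digits,
        "punctuation": punctuation,
        "words": len(words),
        "periods": len(periods),
        # Position and dispersion measures of letters per word.
        "mean_letters_per_word": statistics.mean(lengths) if lengths else None,
        "median_letters_per_word": statistics.median(lengths) if lengths else None,
        "stdev_letters_per_word": statistics.pstdev(lengths) if lengths else None,
        # Longest distinct words first; ties are broken alphabetically.
        "longest_words": sorted(set(words), key=lambda w: (-len(w), w))[:3],
    }


if __name__ == "__main__":
    for name, value in text_statistics("I have 2 cats. They sleep!").items():
        print(name, value)
```

Words per period could be summarized the same way by applying the word pattern to each element of `periods`.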

The main objective of our software is to provide an interactive environment in which students and teachers can experiment with, explore and enjoy the use of statistics in a real-world application (namely, text mining) and, through this exercise in a linguistic context, to promote the learning of statistical concepts. The proposal also has a practical advantage: data for analysis is very easy to find on the Internet (free books, poems, speeches, song lyrics, etc.).

IMPORTANT NOTICE!

Since Oracle upgraded the Java security system in January 2014, running many of the applications below requires adding this site (http://www.uff.br/cdme/) to Java's Exception Site List. To do this, follow the instructions on the official Java page “How can I configure the Exception Site List?” and add the addresses http://www.uff.br/cdme/ and http://www.cdme.im-uff.mat.br (the latter is a mirror of the main site). Even with this setting, Java will still ask you to confirm access to the applications. After you confirm, the application will open. For questions, contact us by e-mail: conteudosdigitais@im.uff.br.


MODULES

Module 1:
Statistics of Letters and Cryptography


 

Module 2:
Statistics of Letters, Words and Periods


 

Module 3:
Zipf's Law


 

Module 4:
Vocabulary Growth
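
As a rough illustration of the ideas behind Modules 1 and 3, the Python sketch below guesses a Caesar shift from letter frequencies and builds a rank-frequency table of words. It is not the modules' own code: the function names are hypothetical, the guess assumes that the most frequent plaintext letter is "e" (reasonable for long English texts, unreliable for short ones or other languages), and words are taken to be lowercase runs of letters.

```python
import re
import string
from collections import Counter


def caesar_shift(text, shift):
    """Shift every ASCII letter by `shift` positions, preserving case."""
    out = []
    for ch in text:
        if ch in string.ascii_lowercase:
            out.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        elif ch in string.ascii_uppercase:
            out.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
        else:
            out.append(ch)  # digits, spaces and punctuation are left alone
    return "".join(out)


def guess_shift(ciphertext, most_common_plain="e"):
    """Guess the shift, assuming the top cipher letter stands for 'e'."""
    counts = Counter(
        ch for ch in ciphertext.lower() if ch in string.ascii_lowercase
    )
    if not counts:
        raise ValueError("ciphertext contains no ASCII letters")
    top_cipher_letter = counts.most_common(1)[0][0]
    return (ord(top_cipher_letter) - ord(most_common_plain)) % 26


def rank_frequency(text):
    """Return (rank, word, count) triples, most frequent word first."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    ranked = Counter(words).most_common()
    return [(rank, word, n) for rank, (word, n) in enumerate(ranked, start=1)]


if __name__ == "__main__":
    plain = "Meet me here at seven, then we leave the secret letter near the tree."
    cipher = caesar_shift(plain, 3)
    shift = guess_shift(cipher)
    print(cipher)
    print(caesar_shift(cipher, -shift))
    print(rank_frequency("the cat and the dog and the bird")[:2])
```

Under Zipf's Law, the count of a word is roughly inversely proportional to its rank, so plotting the logarithm of the count against the logarithm of the rank from `rank_frequency` for a long text should give points close to a straight line.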


 



Creative Commons License

Responsible: Humberto José Bortolossi.
Concept: Humberto José Bortolossi.
Programming: Humberto José Bortolossi.
Revision: Thiago Gomes Pereira and Humberto José Bortolossi.

Problems? Suggestions? We provide technical support. Please contact us by e-mail:
conteudosdigitais@im.uff.br.