LINKS TO CORPORA
Claim identification corpus
314 persuasive essays annotated for claims, the effectiveness of claims, relationships among the claims, and holistic essay scores. The essays were written by undergraduate students at a public university in the United States who were native speakers of English.
The CommonLit Ease of Readability (CLEAR) corpus provides unique readability scores for ~5,000 text excerpts leveled for 3rd-12th grade readers, along with metadata such as each excerpt's year of publication and genre.
The English Language Learner Insight, Proficiency and Skills Evaluation (ELLIPSE) Corpus is a freely available corpus of ~6,500 ELL writing samples scored for overall holistic language proficiency and for analytic proficiency in cohesion, syntax, vocabulary, phraseology, grammar, and conventions. The ELLIPSE corpus also provides individual and demographic information for the ELL writers, including economic status, gender, grade level (8-12), and race/ethnicity.
PERSUADE corpus 1.0
This is the repository for the Persuasive Essays for Rating, Selecting, and Understanding Argumentative and Discourse Elements (PERSUADE) corpus, which contains over 280,000 discourse annotations for over 25,000 argumentative essays. The annotated discourse elements are leads, positions, claims, counterclaims, rebuttals, evidence, and concluding summaries.
PERSUADE corpus 2.0
The PERSUADE 2.0 corpus comprises over 25,000 argumentative essays written by 6th-12th grade students in the United States in response to 15 prompts across two writing tasks: independent and source-based writing. PERSUADE 2.0 also provides detailed individual and demographic information for each writer, effectiveness ratings for the discourse elements annotated in PERSUADE 1.0, and holistic writing quality scores for every essay in PERSUADE 1.0.
LINKS TO OTHER FREELY AVAILABLE CORPUS/NLP TOOLS
AntConc is a great free concordancing program written by Laurence Anthony. Check it out here. Laurence has a number of other helpful tools on his software page as well, so check them out.
Coh-Metrix is an advanced natural language processing tool that analyses a number of textual features related to cohesion, lexical sophistication, syntactic complexity, and text difficulty. Check it out here.
VocabProfile is a great text analysis program with a free online interface. It is particularly useful for quickly assessing learner text difficulty. The web interface is maintained by Tom Cobb and is based on the program Range, which was written by Heatley and Nation. Check out Tom's other programs here as well.
L2 Syntactic Complexity Analyzer
Xiaofei Lu has a number of tools on his website that measure the syntactic and lexical complexity of texts. He also has a few tools for comparing Chinese and English.