Canterbury corpus

The Canterbury corpus is a collection of files intended for use as a benchmark for testing lossless data compression algorithms. It was created in 1997 at the University of Canterbury, New Zealand, and was designed to replace the Calgary corpus. The files were selected based on their ability to provide representative performance results.^[1]

Size (bytes)	File name	Description
152,089	alice29.txt	English text
125,179	asyoulik.txt	Shakespeare
24,603	cp.html	HTML source
11,150	fields.c	C source
3,721	grammar.lsp	LISP source
1,029,744	kennedy.xls	Excel spreadsheet
426,754	lcet10.txt	Technical writing
481,861	plrabn12.txt	Poetry
513,216	ptt5	CCITT test set
38,240	sum	SPARC executable
4,227	xargs.1	GNU manual page
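
A benchmark run over files like those listed above can be sketched as follows. This is a minimal illustration, not the corpus authors' methodology: it assumes the files have been unpacked into a local directory (the `corpus_dir` argument is hypothetical), uses only compressors from the Python standard library, and reports compressed size in bits per input byte, one common way of expressing such results. The names `COMPRESSORS`, `bits_per_byte`, and `benchmark` are invented for this example.

```python
import bz2
import lzma
import zlib
from pathlib import Path

# Standard-library compressors; any lossless codec with a
# bytes -> bytes interface could be added to this table.
COMPRESSORS = {
    "zlib": zlib.compress,
    "bz2": bz2.compress,
    "lzma": lzma.compress,
}


def bits_per_byte(data: bytes, compress) -> float:
    """Return the compressed size in bits per input byte (lower is better)."""
    if not data:
        raise ValueError("cannot measure an empty input")
    return 8 * len(compress(data)) / len(data)


def benchmark(corpus_dir: str) -> dict:
    """Measure every non-empty regular file in corpus_dir with every compressor.

    corpus_dir is a hypothetical local directory holding the unpacked
    corpus files (alice29.txt, asyoulik.txt, and so on).
    """
    results = {}
    for path in sorted(Path(corpus_dir).iterdir()):
        if path.is_file() and path.stat().st_size > 0:
            data = path.read_bytes()
            results[path.name] = {
                name: bits_per_byte(data, fn)
                for name, fn in COMPRESSORS.items()
            }
    return results
```

Calling `benchmark` on the directory containing the corpus returns a mapping from file name to per-compressor scores, which makes it easy to compare how each algorithm handles, for example, English text versus the spreadsheet or the executable.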


External links

The Canterbury Corpus
