CuneiForm (software)

From Infogalactic: the planetary knowledge core

CuneiForm
Original author(s) Cognitive Technologies
Developer(s) Cognitive Technologies
Initial release April 2, 2008 (source code)[1]
Stable release 1.1 / April 19, 2011
Written in C and C++
Operating system Cross-platform
Type Optical character recognition
License Freeware/BSD licenses
Website en.openocr.org

CuneiForm is a software tool for optical character recognition (OCR). It was originally developed at Cognitive Technologies and, after a few years with no development, was released as freeware on December 12, 2007. The kernel of the OCR engine was released under the open-source BSD license at the beginning of April 2008.[1]

Features

The algorithms used in CuneiForm are based on the rules by which letters are written and on letter topology, and they do not require pattern-recognition training. CuneiForm recognizes text in any printed font (scanned from books, newspapers, magazines, laser-printer output, dot-matrix printer output, typewritten text, etc.). It does not recognize handwritten or pseudo-handwritten text, nor does it recognize decorative fonts (e.g. Gothic). CuneiForm has special settings for recognizing text from dot-matrix printers and from 200×100 DPI faxes.

CuneiForm can preserve text formatting and can also recognize complex tables of any structure.

It recognizes Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, French, German, Hungarian, Italian, Latvian, Lithuanian, Polish, Portuguese, Romanian, Russian, Russian-English bilingual, Serbian, Slovene, Spanish, Swedish, Turkish, and Ukrainian text.

CuneiForm can save recognized text in RTF, HTML, or plain text format. It can also pass text to Microsoft Word or Microsoft Excel.

User interface

CuneiForm can be used as a stand-alone command-line application or as a back end for other programs. It also comes with its own graphical user interface. In addition, CuneiForm can be used as an OCR engine in OCRFeeder.[2]

History

Once one of the leading OCR packages in Russia, CuneiForm competed with ABBYY FineReader.

In 1993, Cognitive Technologies signed an OEM contract with Corel Corporation, which allowed the Cognitive recognition library to be built into the popular publishing package Corel Draw 3.0 (and subsequent versions).

In 1996, CuneiForm'96 was released, the first OCR package to include an adaptive character-recognition method. This method combines two types of printed-character recognition algorithms: multifont and omnifont. The self-learning system can recognize poorly printed characters by building an internal font from those characters that were printed well enough to be recognized. Recognition is thus dynamically adjusted (adapted) to the specific input characters.
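The general idea of adaptive recognition can be illustrated with a toy two-pass scheme. This is only a sketch under stated assumptions, not CuneiForm's actual algorithm or code: glyphs are assumed to be plain numeric feature vectors, a generic nearest-prototype classifier stands in for the font-independent stage, and the names `nearest`, `adaptive_recognize`, and `accept_threshold` are hypothetical. Glyphs that the generic stage matches confidently are averaged into a per-document "internal font", which is then used to re-read the glyphs that were rejected in the first pass.

```python
import math
from collections import defaultdict


def nearest(glyph, prototypes):
    """Return (label, distance) of the prototype closest to glyph."""
    best_label, best_dist = None, math.inf
    for label, protos in prototypes.items():
        for p in protos:
            d = math.dist(glyph, p)
            if d < best_dist:
                best_label, best_dist = label, d
    return best_label, best_dist


def adaptive_recognize(glyphs, generic_font, accept_threshold):
    """Toy adaptive OCR (hypothetical): learn a document font from confident glyphs."""
    results = [None] * len(glyphs)
    samples = defaultdict(list)

    # Pass 1: generic classifier; accept only glyphs matched within the threshold.
    for i, g in enumerate(glyphs):
        label, d = nearest(g, generic_font)
        if d <= accept_threshold:
            results[i] = label
            samples[label].append(g)

    # Build the internal font: the mean vector of the well-printed glyphs per label;
    # labels with no confident samples fall back to the generic prototypes.
    internal_font = {
        label: ([tuple(sum(col) / len(samples[label]) for col in zip(*samples[label]))]
                if samples[label] else list(protos))
        for label, protos in generic_font.items()
    }

    # Pass 2: re-read the rejected (poorly printed) glyphs against the internal font.
    for i, g in enumerate(glyphs):
        if results[i] is None:
            results[i], _ = nearest(g, internal_font)
    return results


generic_font = {"a": [(0.0, 0.0)], "b": [(10.0, 0.0)]}
glyphs = [(4.0, 0.0), (3.8, 0.0), (10.0, 0.0), (9.6, 0.0), (5.5, 0.0)]
print(adaptive_recognize(glyphs, generic_font, accept_threshold=4.2))
```

With these made-up numbers, the generic stage alone would read the last glyph as `b`, but after adapting to the document's well-printed `a` glyphs (centred near 3.9) the second pass reads it as `a`, so the script prints `['a', 'a', 'b', 'b', 'a']`.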

In June 2008, Cognitive Technologies launched a free online recognition service on OpenOCR.org[where?].

Opening sources

Cognitive Technologies has started a program to make OCR available to all users. Its first step was releasing CuneiForm as freeware.

Cognitive Technologies plans to start developing a new version of the software, acting as the project's investor and coordinator. The developers chose the BSD license for the release after taking legal and technical considerations into account, but the whole program or some of its modules may later be released under the GPL.[3]

In September 2008, part of CuneiForm was released as open-source software. One of the missing parts is table analysis; however, Cognitive has promised to release this component in the future.

CuneiForm is being ported to Linux, BSD and Mac OS X.[4] This branch of the code is eventually to be merged with the Cognitive codebase.[when?]

References

External links

  • (English) Official website
  • (Russian) Official website
    • Russian and English versions of the installer, as well as the source code, can be downloaded here.
  • Puma.NET is a wrapper library for the Cognitive Technologies CuneiForm recognition engine. It makes it easy to incorporate OCR functionality into any .NET Framework 2.0 (or higher) application.