OcrEngineRelatedCharacters

Related characters are used during OCR text searches to work around misdetections by the OCR engine. A common example is the engine detecting a lowercase l instead of an uppercase I (can you tell them apart?). To avoid a failed search in such cases, the engine accepts a result that contains a related character defined in this class, with a small penalty (0.5) to the accuracy. For example, if the engine searches for "Hello WORLD", the string "HeIlo W0RLO" is found with an accuracy of 1.5 instead of the accuracy of 3 it would get without related characters.

The LEADOcrEngine benefits from this even more, because it provides a second and third guess for every detected character. Related characters are accepted from the second and third guess as well.
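To illustrate how additional guesses can help, here is a minimal sketch that scores one detected position carrying up to three guesses. It is not the engine's implementation: `PositionCost` and the `related` array are hypothetical names, and the sketch assumes that a lower-ranked guess is scored exactly like the first one (0 for an exact match, 0.5 for a related character, 1 otherwise).

```csharp
using System;
using System.Linq;

// A single relation is enough for this illustration.
char[] related = { 'l', 'i', 'I' };

// True if both characters belong to the relation above.
bool AreRelated(char a, char b) =>
    Array.IndexOf(related, a) >= 0 && Array.IndexOf(related, b) >= 0;

// Hypothetical helper: cost of one detected position, best guess first.
// Assumption: every guess is scored the same way, and the cheapest guess wins.
double PositionCost(char searched, char[] guesses) =>
    guesses.Min(g => g == searched ? 0.0
                   : AreRelated(g, searched) ? 0.5
                   : 1.0);

// The engine's first guess is '1', its second guess is 'I'.
Console.WriteLine(PositionCost('l', new[] { '1' }));            // cost 1 (first guess only)
Console.WriteLine(PositionCost('l', new[] { '1', 'I', '|' }));  // cost 0.5 (second guess is related)
```

With only the first guess the position counts as a full mismatch; once the second guess is considered, the related character I brings the cost down to 0.5.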

By default the following character relations are defined:

{'o','O','0','@','D','e' },
{'s','5','S'},
{'R','B'},
{'r','n'},
{'l','i','I'},
{'.',',','-'},
{':',';','¦','|' }
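The "Hello WORLD" example can be reproduced with the default relations. The following is a minimal sketch, not the engine's code: `AreRelated` and `Cost` are hypothetical helpers, the sketch only handles strings of equal length, and the cost of 1 for an unrelated mismatch is inferred from the example rather than stated by the class.

```csharp
using System;
using System.Linq;

// Default relations as listed above.
char[][] relations =
{
    new[] { 'o', 'O', '0', '@', 'D', 'e' },
    new[] { 's', '5', 'S' },
    new[] { 'R', 'B' },
    new[] { 'r', 'n' },
    new[] { 'l', 'i', 'I' },
    new[] { '.', ',', '-' },
    new[] { ':', ';', '¦', '|' },
};

// True if both characters appear in the same relation.
bool AreRelated(char a, char b) =>
    relations.Any(r => Array.IndexOf(r, a) >= 0 && Array.IndexOf(r, b) >= 0);

// Sum of per-character costs: 0 for an exact match, 0.5 for a related
// character, 1 otherwise (the value 1 is an assumption, see above).
double Cost(string searched, string detected, bool useRelations)
{
    if (searched.Length != detected.Length)
        throw new ArgumentException("This sketch only compares strings of equal length.");

    double total = 0;
    for (int i = 0; i < searched.Length; i++)
    {
        if (searched[i] == detected[i]) continue;
        total += useRelations && AreRelated(searched[i], detected[i]) ? 0.5 : 1.0;
    }
    return total;
}

Console.WriteLine(Cost("Hello WORLD", "HeIlo W0RLO", useRelations: true));   // cost 1.5
Console.WriteLine(Cost("Hello WORLD", "HeIlo W0RLO", useRelations: false));  // cost 3
```

The three differing positions (l/I, O/0 and D/O) each fall inside one relation, so each contributes 0.5 instead of 1.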

Not only individual characters can be related, but also whole character groups, i.e. strings. Common examples are ("rı", "n") or ("ld", "kt"). Related character groups work in much the same way, but are penalized with 0.8 in the accuracy calculation. Note: Related character groups are currently not supported for SearchLevel.StringComparison.

By default no related character groups are defined.
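A group relation differs from a character relation in that the related strings may have different lengths. The sketch below shows one way such a substitution could be scored. It is an illustration only: the `groups` array holds the two examples from the text (they are not defaults), `StartsAt` and `Cost` are hypothetical helpers, and the cost of 1 for a plain mismatch is an assumption inferred from the "Hello WORLD" example.

```csharp
using System;

// Example groups taken from the text above; none are defined by default.
string[][] groups =
{
    new[] { "rı", "n" },
    new[] { "ld", "kt" },
};

const double GroupPenalty = 0.8;     // from the text above
const double MismatchPenalty = 1.0;  // assumption, inferred from the "Hello WORLD" example

// True if 'value' occurs in 'text' starting exactly at 'index'.
bool StartsAt(string text, int index, string value) =>
    index + value.Length <= text.Length &&
    string.CompareOrdinal(text, index, value, 0, value.Length) == 0;

// Lowest cost of aligning searched[i..] with detected[j..], or +infinity if
// the remainders cannot be aligned. Plain recursion, fine for short strings.
double Cost(string searched, string detected, int i = 0, int j = 0)
{
    if (i == searched.Length && j == detected.Length) return 0;
    double best = double.PositiveInfinity;

    // Option 1: compare one character against one character.
    if (i < searched.Length && j < detected.Length)
    {
        double step = searched[i] == detected[j] ? 0 : MismatchPenalty;
        best = step + Cost(searched, detected, i + 1, j + 1);
    }

    // Option 2: swap one member of a group for another member of the same group.
    foreach (string[] group in groups)
        foreach (string a in group)
            foreach (string b in group)
                if (a != b && StartsAt(searched, i, a) && StartsAt(detected, j, b))
                    best = Math.Min(best,
                        GroupPenalty + Cost(searched, detected, i + a.Length, j + b.Length));

    return best;
}

Console.WriteLine(Cost("band", "barıd"));  // cost 0.8 ("n" detected as "rı")
Console.WriteLine(Cost("held", "hekt"));   // cost 0.8 ("ld" detected as "kt"), cheaper than two mismatches
```

Without the ("rı", "n") group, "barıd" could not be aligned with "band" at all in this sketch, because the two strings differ in length.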

Methods

AddToRelation

Add a new character or string to an existing relation.

ClearRelatedChars

Clear all character relations.

ClearRelatedCharGroups

Clear all character group (string) relations.

NewRelation

Create a new character or character group (string) relation.

Properties

RelatedChars

The complete list of all character relations.

Syntax: public IReadOnlyList<char[]> RelatedChars { get; }

RelatedCharGroups

The complete list of all character group (string) relations.

Syntax: public IReadOnlyList<string[]> RelatedCharGroups { get; }
