OcrEngineRelatedCharacters
Related characters
Related characters are used during OCR text searches to work around misdetections by the OCR engine. A common example is the engine detecting a lowercase "l" instead of an uppercase "I", which look almost identical in many fonts. To avoid a failed search in such cases, the engine accepts a result that contains a related character defined in this class, with a small penalty (0.5) to the accuracy. E.g. if the engine searches for "Hello WORLD", the string "HeIlo W0RLO" would be found with an accuracy of 1.5 (three related-character substitutions at 0.5 each) instead of an accuracy of 3 without the related characters.
The LEADOcrEngine benefits from this even more, because it provides a second and third guess for every detected character. It also accepts related characters from the second and third guesses.
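The arithmetic of the "Hello WORLD" example can be sketched as follows. This is only an illustration under stated assumptions, not the engine's actual implementation: the class RelatedCharPenaltySketch, its members, and the three example relations are hypothetical, and the cost of 1 for an unrelated mismatch is inferred from the example above (three mismatches giving 3). The relation list uses the same shape as the RelatedChars property.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RelatedCharPenaltySketch
{
    // Hypothetical relations for this example only; NOT the library defaults.
    public static readonly IReadOnlyList<char[]> ExampleRelations = new List<char[]>
    {
        new[] { 'l', 'I' },
        new[] { 'O', '0' },
        new[] { 'D', 'O' },
    };

    // True if both characters appear together in at least one relation.
    public static bool AreRelated(char a, char b, IReadOnlyList<char[]> relations)
    {
        return relations.Any(r => r.Contains(a) && r.Contains(b));
    }

    // Position-by-position penalty for two strings of equal length:
    // 0 for an exact match, 0.5 for a related character,
    // 1 for any other mismatch (assumed, see lead-in).
    public static double Penalty(string expected, string detected,
                                 IReadOnlyList<char[]> relations)
    {
        if (expected.Length != detected.Length)
            throw new ArgumentException("This sketch only compares strings of equal length.");

        double penalty = 0.0;
        for (int i = 0; i < expected.Length; i++)
        {
            if (expected[i] == detected[i])
                continue;
            penalty += AreRelated(expected[i], detected[i], relations) ? 0.5 : 1.0;
        }
        return penalty;
    }
}
```

With these assumed relations, Penalty("Hello WORLD", "HeIlo W0RLO", ExampleRelations) evaluates to 1.5, and the same call with an empty relation list evaluates to 3.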
By default the following character relations are defined:
Related character groups (strings)
Not only individual characters can be related, but also whole character groups, i.e. strings. Common examples are ("rı", "n") or ("ld", "kt"). These related character groups work in much the same way, but are penalized with 0.8 in the accuracy calculation. Note: Related character groups are currently not supported for SearchLevel.StringComparison.
By default no related character groups are defined.
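One way to picture how a group relation such as ("ld", "kt") could enter the calculation is the sketch below. It is a simplified model, not the engine's algorithm: the class RelatedGroupPenaltySketch and its members are hypothetical, and it assumes that penalties add up, that an unrelated single-character mismatch costs 1, and that the cheapest alignment wins. The two list parameters use the same shapes as the RelatedChars and RelatedCharGroups properties.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RelatedGroupPenaltySketch
{
    // True if `part` occurs in `text` starting exactly at `pos`.
    private static bool StartsAt(string text, int pos, string part)
    {
        return pos + part.Length <= text.Length
            && string.CompareOrdinal(text, pos, part, 0, part.Length) == 0;
    }

    // Smallest total penalty for reading `detected` as `expected`,
    // or positive infinity if this simplified model finds no alignment.
    public static double Penalty(string expected, string detected,
                                 IReadOnlyList<char[]> relatedChars,
                                 IReadOnlyList<string[]> relatedGroups)
    {
        int n = expected.Length, m = detected.Length;

        // best[i, j] = cheapest way to align the first i expected
        // characters with the first j detected characters.
        var best = new double[n + 1, m + 1];
        for (int i = 0; i <= n; i++)
            for (int j = 0; j <= m; j++)
                best[i, j] = double.PositiveInfinity;
        best[0, 0] = 0.0;

        for (int i = 0; i <= n; i++)
        {
            for (int j = 0; j <= m; j++)
            {
                double current = best[i, j];
                if (double.IsPositiveInfinity(current))
                    continue;

                // Single-character step: exact 0, related 0.5, otherwise 1 (assumed).
                if (i < n && j < m)
                {
                    char e = expected[i], d = detected[j];
                    double cost = e == d ? 0.0
                        : relatedChars.Any(r => r.Contains(e) && r.Contains(d)) ? 0.5
                        : 1.0;
                    best[i + 1, j + 1] = Math.Min(best[i + 1, j + 1], current + cost);
                }

                // Group step: one related string replaced by another costs 0.8.
                foreach (var group in relatedGroups)
                    foreach (var source in group)
                        foreach (var target in group)
                        {
                            if (source.Length == 0 || target.Length == 0 || source == target)
                                continue;
                            if (StartsAt(expected, i, source) && StartsAt(detected, j, target))
                            {
                                int ni = i + source.Length, nj = j + target.Length;
                                best[ni, nj] = Math.Min(best[ni, nj], current + 0.8);
                            }
                        }
            }
        }
        return best[n, m];
    }
}
```

Under these assumptions, searching for "world" against the detected text "workt" with the single group ("ld", "kt") yields 0.8, whereas without any groups the two mismatched characters cost 2.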
Methods
AddToRelation
Add a new char or string to an already existing relation.
ClearRelatedChars
Clear all character relations.
ClearRelatedCharGroups
Clear all character group (strings) relations.
NewRelation
Create a new character or character group (string) relation.
Properties
RelatedChars
The complete list of all character relations.
Syntax: public IReadOnlyList<char[]> RelatedChars { get; }
RelatedCharGroups
The complete list of all character group (string) relations.
Syntax: public IReadOnlyList<string[]> RelatedCharGroups { get; }