The Hebrew Alphabet | How OCR Works

The Hebrew alphabet has a limited character set of 27 symbols: 22 letters, five of which also have a final form (“sofit”) that is used when the letter appears at the end of a word.
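To make the count concrete, here is a minimal Python sketch built on the Unicode Hebrew block, where the letters occupy U+05D0 through U+05EA. The names `HEBREW_LETTERS`, `FINAL_TO_REGULAR` and `fold_final_forms` are made up for this illustration; folding final forms is just one way a post-processing step might treat them, not a description of how any particular OCR engine works.

```python
# The 27 letter shapes sit at U+05D0..U+05EA in Unicode.
HEBREW_LETTERS = [chr(cp) for cp in range(0x05D0, 0x05EB)]

# Each final ("sofit") form mapped to its regular form.
FINAL_TO_REGULAR = {
    "\u05DA": "\u05DB",  # final kaf   -> kaf
    "\u05DD": "\u05DE",  # final mem   -> mem
    "\u05DF": "\u05E0",  # final nun   -> nun
    "\u05E3": "\u05E4",  # final pe    -> pe
    "\u05E5": "\u05E6",  # final tsadi -> tsadi
}


def fold_final_forms(text):
    """Replace every final form by its regular form (hypothetical helper)."""
    return text.translate(str.maketrans(FINAL_TO_REGULAR))


if __name__ == "__main__":
    print(len(HEBREW_LETTERS))         # number of symbols
    print(len(FINAL_TO_REGULAR))       # number of final forms
    shalom = "\u05E9\u05DC\u05D5\u05DD"  # ends in final mem
    print([hex(ord(ch)) for ch in fold_final_forms(shalom)])
```

Folding can be handy when comparing recognized words against a dictionary, but the final forms should of course be kept in the text that is delivered to the user.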

Hebrew has block letters. Don’t look for case differences: HEBREW HAS NO LOWERCASE OR UPPERCASE LETTERS – JUST “PRINT” LETTERS!

The symbol set is an “abjad”, a writing system composed of consonants. The vowels are not usually written (except in the Bible, poetry and books for children and foreign learners) but inferred. (The reader must know how each word is pronounced.) THR R N LWRCS R PPRCS LTTRS N HBRW!
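When the vowels are written, they appear as small points around the consonants, and in Unicode they are stored as separate combining marks. The sketch below uses Python's standard `unicodedata` module to remove them, turning pointed text into the usual consonant-only spelling. The function name `strip_hebrew_points` is invented for this example, and the range U+0591 to U+05C7 is an assumption about which marks to drop (it also covers the cantillation marks found in Bible texts).

```python
import unicodedata


def strip_hebrew_points(text):
    """Drop Hebrew combining marks, keep the consonants (hypothetical helper)."""
    # NFD splits precomposed presentation forms into letter + mark.
    decomposed = unicodedata.normalize("NFD", text)
    kept = [
        ch
        for ch in decomposed
        # "Mn" = nonspacing mark; the range limits removal to Hebrew marks.
        if not (0x0591 <= ord(ch) <= 0x05C7 and unicodedata.category(ch) == "Mn")
    ]
    # NFC puts any non-Hebrew accented letters back together.
    return unicodedata.normalize("NFC", "".join(kept))


if __name__ == "__main__":
    pointed = "\u05E9\u05B8\u05C1\u05DC\u05D5\u05B9\u05DD"  # "shalom" with points
    bare = strip_hebrew_points(pointed)
    print(len(pointed), len(bare))
```

Punctuation such as the maqaf (the Hebrew hyphen) is not a combining mark and is left untouched.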

Hebrew is written from right to left, just like Arabic. (Let’s face it: Hebrew is not the easiest language to read...) !WRBH N SRTTL SCRPP R SCRWL N R RHT

However, Hebrew does not have separate numerals of its own. The standard Western numerals (1, 2, 3, etc.) are used instead. Furthermore, these numbers are written from left to right, and so are embedded phrases in Latin script. In other words: when numbers and Latin words are inserted in Hebrew texts, both the reading direction and the alphabet change in mid-sentence.
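The change of direction can be illustrated with a small Python sketch that cuts a string into right-to-left and left-to-right runs, based on the bidirectional class that `unicodedata.bidirectional` reports for each character. This is a deliberate simplification and not the full Unicode Bidirectional Algorithm: the function name `direction_runs` is hypothetical, and attaching spaces and punctuation to the preceding run is an assumption made to keep the example short.

```python
import unicodedata


def direction_runs(text):
    """Split text into (direction, substring) runs (simplified sketch)."""
    runs = []
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):      # Hebrew and Arabic letters
            direction = "rtl"
        elif bidi in ("L", "EN"):    # Latin letters and Western digits
            direction = "ltr"
        else:                        # spaces, punctuation, other neutrals
            direction = None
        if runs and (direction is None or runs[-1][0] == direction):
            # Same direction, or a neutral character: extend the current run.
            runs[-1][1] += ch
        else:
            runs.append([direction or "neutral", ch])
    return [(d, s) for d, s in runs]


if __name__ == "__main__":
    # Three Hebrew letters, a year in digits, then a Latin abbreviation.
    sample = "\u05E9\u05E0\u05EA 1948 OCR"
    for direction, chunk in direction_runs(sample):
        print(direction, len(chunk))
```

For this sample the sketch yields one right-to-left run (the Hebrew word and the space after it) followed by one left-to-right run holding the number and the Latin letters.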

And that’s not the only challenge when you develop an OCR engine for Hebrew. Here’s another peculiarity you don’t find in the other alphabets (Latin, Greek, Cyrillic and Arabic): Hebrew text is not written on lines. Rather, the text hangs from a line above the letters! (In technical terms: the “base line” is above the characters, not under them!)
