Towards Automatic Transcription of Estrangelo Script

INTRODUCTION

Syriac manuscripts dating back to before the 6th century CE are available in large quantities and are undergoing manual transcription into machine-readable form for scholarly analysis, commentary, and publication. Manual transcription and keyboarding is a tedious and laborious task that few are willing and qualified to undertake. Syriac scholars would welcome a computer-based system able to transcribe manuscripts into machine-readable form with reasonable accuracy; any errors made by the automatic transcriber could then be corrected manually during on-line proofreading. Syriac is a useful vehicle for research on automatic handwriting transcription because many sources were carefully written by scribes. Therefore, as far as the designers of optical character recognition (OCR) algorithms are concerned, Syriac manuscripts present a large corpus that is intermediate in difficulty between typewritten text and unconstrained handwriting. OCR of clearly typewritten Roman-style text is essentially solved, whereas OCR of unconstrained handwriting will remain a challenging research problem for the foreseeable future. In scribe-written texts, by contrast, there is sufficient regularity for the OCR problem to be tractable, yet sufficient variation to require techniques more sophisticated than standard OCR methods. This rationale also motivated our previous work on automatic transcription of scribe-written Arabic [14, 6]. Syriac is one of the simpler early Semitic languages, lacking the grammatical complexity of classical Arabic and the unpredictability of biblical Hebrew. Although the system described in this paper does not have comprehensive competence, the relative simplicity of Syriac motivates further development towards a complete system for Syriac handwriting transcription.
Of the several script forms in use, we focus here on Estrangelo, which is found in the oldest manuscripts and was later widely used in Europe for printed books.

Fig. 1. The word ܩܢܘܡܐ qnoma ‘person, self’ from MS.

No previous work has been published on automatic recognition of Syriac handwriting, but this work falls into the general category of off-line cursive script recognition, an area that has received much effort [23, 21, 2]. From a character recognition perspective, Syriac is similar to Arabic, and existing research on Arabic character recognition has recently been comprehensively surveyed [15].

The system described in this paper implements a standard statistical classification framework [12]. Figure 2 shows the components of the system. In training mode, a model is constructed from the input data; in recognition mode, the model is used to classify previously unseen input data.

The results described below were obtained from a handwritten manuscript source (MS) and a typeset source (TS), both in Estrangelo script. The MS is a leaf taken from Peter of Callinicum’s Adversus Damianum, a 6th-century commentary on the Trinity [8]. The TS consists of the 36 pages of Mark’s Gospel taken from Burkitt’s 1904 edition [3] of the Evangelion Da-Mepharreshe, typeset in the late 19th century. Pages were scanned at 300 dpi and saved as 8-bit greyscale images. Any editorial apparatus (brackets, verse numbers, footnotes) was removed manually. Figure 1 shows an example of the word ܩܢܘܡܐ qnoma ‘person, self’ from MS.

The trials described in Section 4 are mainly concerned with recognizing characters within the word. However, for comparison purposes, a few trials on the recognition of whole words are also described. In practice, word recognition [23] or ‘word spotting’ [19] techniques are less useful for Syriac because it is a highly inflected language: spellings change according to grammatical function, and almost all grammatical functions are written as word prefixes or suffixes rather than as separate words. A word recognition approach would therefore require a combinatorially large lexicon. For this reason, we focus on a character recognition approach while remaining attentive to relevant insights from the word recognition approach.

IMAGE PROCESSING

Given a page image from a source, image processing proceeds as follows. First, the connected components of the image are extracted using the standard two-pass algorithm [13]. In the first pass, a provisional label is assigned to each pixel, and label equivalences are recorded on the basis of connectivity with the pixel's eight neighbours. Equivalence classes are then determined, and a second pass updates each pixel in a connected component with a label unique to that component. The running time is approximately O(N) in the number of pixels N. The bounding box of each component is then determined. Next, words are found by calculating the frequency distribution (histogram) of the horizontal separation between neighbouring bounding boxes. The idea is that the distance between words tends to be larger than the distance between components within a word [22]. The minimum between the two maxima of the histogram is located to determine a threshold above which inter-component separations are interpreted as inter-word spaces (Figure 4). In the data we have considered, there is a clear gap between the modes of the histogram, and the method works successfully on both MS and TS sources (see Figure 3).
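The two steps above can be sketched in Python with NumPy. This is a minimal illustration under our own assumptions, not the authors' code: the function names `label_components`, `bounding_boxes` and `word_gap_threshold` are hypothetical, the equivalence table is kept as a simple union-find structure, and the histogram valley is taken as the middle of the lowest run of bins between the two highest local maxima (the bin count is an arbitrary choice).

```python
import numpy as np

def label_components(img):
    """Two-pass connected-component labelling with 8-connectivity.

    img: 2-D array, nonzero = ink. Returns an int array of the same shape in
    which each component has a unique label 1..n and the background is 0.
    """
    rows, cols = img.shape
    labels = np.zeros((rows, cols), dtype=int)
    parent = [0]  # union-find forest over provisional labels; 0 = background

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    # First pass: provisional labels plus recorded equivalences.
    for r in range(rows):
        for c in range(cols):
            if not img[r, c]:
                continue
            # 8-neighbours already visited in raster order: W, NW, N, NE.
            neigh = [labels[rr, cc]
                     for rr, cc in ((r, c - 1), (r - 1, c - 1), (r - 1, c), (r - 1, c + 1))
                     if 0 <= rr < rows and 0 <= cc < cols and labels[rr, cc]]
            if not neigh:
                parent.append(len(parent))      # new label is its own root
                labels[r, c] = len(parent) - 1
            else:
                root = min(find(n) for n in neigh)
                labels[r, c] = root
                for n in neigh:                 # merge equivalent labels
                    parent[find(n)] = root

    # Second pass: replace each provisional label by a compact component label.
    compact = {}
    for r in range(rows):
        for c in range(cols):
            if labels[r, c]:
                labels[r, c] = compact.setdefault(find(labels[r, c]), len(compact) + 1)
    return labels

def bounding_boxes(labels):
    """(top, left, bottom, right), inclusive, for each component label."""
    boxes = {}
    for lab in range(1, labels.max() + 1):
        rs, cs = np.nonzero(labels == lab)
        boxes[lab] = (int(rs.min()), int(cs.min()), int(rs.max()), int(cs.max()))
    return boxes

def word_gap_threshold(gaps, bins=20):
    """Gap threshold at the histogram valley between its two highest peaks.

    Gaps larger than the returned value are interpreted as inter-word spaces.
    """
    hist, edges = np.histogram(gaps, bins=bins)
    padded = np.concatenate(([-1], hist, [-1]))
    peaks = [i for i in range(len(hist))
             if padded[i + 1] > padded[i] and padded[i + 1] >= padded[i + 2]]
    if len(peaks) < 2:
        raise ValueError("gap histogram is not bimodal")
    lo, hi = sorted(sorted(peaks, key=lambda i: hist[i])[-2:])
    seg = hist[lo:hi + 1]
    flat = np.flatnonzero(seg == seg.min())   # all bins at the minimum count
    valley = lo + flat[len(flat) // 2]        # middle of the valley floor
    return float((edges[valley] + edges[valley + 1]) / 2)
```

In use, the gaps passed to `word_gap_threshold` would be the horizontal distances between the right edge of one bounding box and the left edge of its neighbour on the same text line.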

Fig. 2. Block diagram of the recognition system. The system operates in training mode or recognition mode. Recognition mode requires that a model be available; the model is built during training mode.

Fig. 3. Portion of MS showing bounding boxes around words spotted automatically.

Fig. 4. A frequency distribution of the horizontal separation between neighbouring bounding boxes of connected components.

Fig. 5. Illustration of projections used by the segmentation algorithm.

(a) The horizontal projection of the word sample of Figure 1, showing the upper and lower baselines. The normal density estimated from data near the lower baseline is superimposed in grey. (b) At a point P within a shape, the vertical run V and horizontal run H through P are shown. The number of pixels in each run gives the respective run length.

Character Segmentation

One of the main difficulties in cursive word recognition comes from segmentation of the connected characters within the word. In most cases the precise point of segmentation is indeterminate, and in some cases segmentation points can be ambiguous without using higher level contextual information such as the spelling of a word.


Our approach is to score each pixel in a word with a likelihood of being a valid segmentation point based on general principles.

Because segmentation points tend to lie on horizontal strokes near the baseline, each pixel is scored according to its distance from the lower baseline and approximations to the thickness and direction of the stroke through it. All measurements are calculated efficiently from horizontal and vertical projections and run lengths of the pixels in the image (Figure 5). For the purposes of definition, let the pixels in a word image be represented as the array W[r, c] with rows 1 to R and columns 1 to C; the lower left corner pixel is W[1, 1]. The likelihood that the pixel at (r, c) is a segmentation point may be conveniently modelled as

(1)

The baseline likelihood is estimated by using the horizontal projection h of the whole word

$$h(i) = \sum_{c=1}^{C} W[i, c] \qquad (2)$$

for each row i, and then normalising h. Syriac words tend to be formed so that h has two modes: one at the upper baseline and one at the lower baseline. The data between the lower mode and a point halfway between the two modes are used to estimate the mean and variance of a normal density modelling the horizontal projection of the word near the baseline. The likelihood of a pixel at row r being on the baseline is therefore

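$$B(r) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left( -\frac{(r - \mu)^2}{2\sigma^2} \right) \qquad (3)$$

where μ and σ² are the estimated mean and variance.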
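A minimal NumPy sketch of the baseline likelihood of Eq. (3) follows. The way the two modes of h are located (the two highest local maxima) and the use of h as weights when estimating the mean and variance are our assumptions, since the text does not specify them; the function name is hypothetical. Note that NumPy puts row 0 at the top of the image, the reverse of the W[1, 1] convention above.

```python
import numpy as np

def baseline_likelihood(word):
    """Per-row likelihood of lying on the lower baseline (a normal density in r).

    word: 2-D array, nonzero = ink. Row 0 is the TOP row here (NumPy
    convention), so the lower baseline has the larger row index.
    """
    h = (np.asarray(word) != 0).sum(axis=1).astype(float)  # horizontal projection
    if h.sum() == 0:
        raise ValueError("empty word image")
    h /= h.sum()                                            # normalise
    # Assumed mode finder: the two highest local maxima of h.
    padded = np.concatenate(([-1.0], h, [-1.0]))
    peaks = [i for i in range(len(h))
             if padded[i + 1] > padded[i] and padded[i + 1] >= padded[i + 2]]
    if len(peaks) < 2:
        raise ValueError("horizontal projection is not bimodal")
    upper, lower = sorted(sorted(peaks, key=lambda i: h[i])[-2:])
    # Fit a normal density to the rows between the midpoint and the lower mode,
    # weighting each row by its share of the projection.
    rows = np.arange(len(h))
    sel = (rows >= (upper + lower) / 2) & (rows <= lower)
    w = h[sel]
    mu = np.average(rows[sel], weights=w)
    var = max(np.average((rows[sel] - mu) ** 2, weights=w), 1e-6)  # avoid zero variance
    return np.exp(-(rows - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
```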

Fig. 6. (a) Segmented word showing oversegmentation. (b) Detail of the two spurious cuts made within the rightmost letter ܩ (qoph). (c) Segmentation corrected by eliminating ‘nested’ segmentations.

The horizontal and vertical run lengths of the pixels connected to r, c are measured and normalised by dividing by C and R respectively:

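$$L_H(r, c) = \frac{|H(r, c)|}{C} \qquad (4)$$

$$L_V(r, c) = \frac{|V(r, c)|}{R} \qquad (5)$$

where |H(r, c)| and |V(r, c)| are the numbers of pixels in the horizontal and vertical runs H and V through (r, c) (Figure 5b).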

Equation 1 therefore expresses the dependence of segmentation on proximity to the baseline and on the horizontal and vertical run lengths through the pixel. The probability of segmentation is greatest for a point close to the baseline and within a thin horizontal stroke. The trough of a ‘V’ shape is one such point, which helps explain why some segmentation techniques use curvature (e.g. [2]).
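The normalised run lengths of Eqs. (4) and (5) can be computed for every ink pixel as below. This sketch (the function name is hypothetical) covers only these two ingredients; the exact way Eq. (1) combines them with the baseline likelihood is not reproduced here.

```python
import numpy as np

def run_lengths(word):
    """Normalised horizontal and vertical run lengths through each ink pixel.

    word: 2-D boolean array (True = ink). Returns (LH, LV): LH[r, c] is the
    length of the horizontal run of ink containing (r, c) divided by the number
    of columns C, and LV[r, c] the vertical run length divided by the number of
    rows R. Background pixels get 0.
    """
    ink = np.asarray(word, dtype=bool)
    R, C = ink.shape

    def runs_along_rows(a):
        out = np.zeros(a.shape, dtype=int)
        for r in range(a.shape[0]):
            c = 0
            while c < a.shape[1]:
                if a[r, c]:
                    start = c
                    while c < a.shape[1] and a[r, c]:
                        c += 1
                    out[r, start:c] = c - start  # every pixel in the run gets its length
                else:
                    c += 1
        return out

    LH = runs_along_rows(ink) / C
    LV = runs_along_rows(ink.T).T / R  # vertical runs are horizontal runs of the transpose
    return LH, LV
```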

Pixels at which the likelihood of Equation 1 is maximal are chosen as segmentation points, and a vertical cut is made at each such point, stopping when the background is reached (Figure 6). The result is usually oversegmented in a systematic way. To correct this, spurious ‘nested’ segmentations are detected as follows. First, the bounding boxes of the segments are found. If a bounding box is entirely enclosed by another bounding box, the segmentation points given by the inner box are ignored. Single cuts within an enclosing bounding box are also ignored.
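The nested-box rule can be expressed compactly. In this hypothetical sketch, segments are represented by inclusive (top, left, bottom, right) boxes, and a segment whose box is enclosed by another segment's box is discarded, so the cuts it introduced are ignored; the separate rule for single cuts inside an enclosing box is not modelled.

```python
def encloses(outer, inner):
    """True if box `inner` lies entirely within box `outer`.

    Boxes are (top, left, bottom, right) tuples with inclusive coordinates.
    """
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])

def drop_nested(boxes):
    """Keep only segment boxes that are not enclosed by a different box.

    Identical boxes do not count as nested, so duplicates are both kept.
    """
    keep = []
    for i, b in enumerate(boxes):
        nested = any(j != i and o != b and encloses(o, b)
                     for j, o in enumerate(boxes))
        if not nested:
            keep.append(b)
    return keep
```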

The segmentation method fails in two particular cases: for the unconnected letter ܢ (nun), because it crosses the baseline, and for the letter ܚ (heth), because it contains two places resembling plausible points of segmentation. The segmentation algorithm finds usable segmentations for about 70% of the characters. To construct a database of segmented characters for the classification trials, the remaining 30% were corrected manually. Curiously, this is the same over-segmentation rate as recently reported for a very sophisticated segmentation algorithm applied to neat italic English handwriting [2].

Feature Extraction

It is useful to represent character image data as a small set of features, partly to reduce the size of the model, and partly to characterise the data in ways that are invariant to typically encountered transformations and deformations. Geometric moments invariant to a variety of transformations are widely used in computer vision [13]. We have considered several alternative methods using moment functions. The first method follows the well-known approach of using a set of predefined moment functions (e.g. [17]). The second method starts from the generalized moment functions (GMFs) recently introduced by Chang and Grover [4].

Fig. 7. Image of the letter ܐ (alaph) and its size-normalised polar map.

Pre-defined Moment Functions

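We use the feature set defined as follows. Given an image function f(x, y) with mass

$$m_{00} = \sum_x \sum_y f(x, y)$$

and centroid $(\bar{x}, \bar{y}) = (m_{10}/m_{00},\; m_{01}/m_{00})$, where $m_{pq} = \sum_x \sum_y x^p y^q f(x, y)$, the normalised central moments are

$$\eta_{pq} = \frac{\mu_{pq}}{\mu_{00}^{(p+q)/2 + 1}} \qquad (6)$$

where

$$\mu_{pq} = \sum_x \sum_y (x - \bar{x})^p (y - \bar{y})^q f(x, y)$$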

Following [17], selected moment functions are

(7)

(8)

(9)

(10)
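For reference, the normalised central moments can be computed directly from their definition. This sketch assumes the standard form η_pq = μ_pq / μ_00^((p+q)/2+1) for Eq. (6) and takes x along columns and y along rows; the particular moment functions of Eqs. (7) to (10), which follow [17], are not reproduced, and the function name is hypothetical.

```python
import numpy as np

def normalised_central_moment(f, p, q):
    """eta_pq = mu_pq / mu_00 ** ((p + q) / 2 + 1) for a 2-D image f.

    x indexes columns and y indexes rows (an assumed orientation).
    """
    f = np.asarray(f, dtype=float)
    ys, xs = np.mgrid[0:f.shape[0], 0:f.shape[1]]   # row and column index grids
    m00 = f.sum()                                    # mass (equal to mu_00)
    xbar, ybar = (xs * f).sum() / m00, (ys * f).sum() / m00
    mu_pq = ((xs - xbar) ** p * (ys - ybar) ** q * f).sum()
    return mu_pq / m00 ** ((p + q) / 2 + 1)
```

Because the moments are taken about the centroid, the result is unchanged when the shape is translated within the image.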

In the experiments described in the next section, these moments are applied to images of several kinds: the whole image of the character, subimages of overlapping and non-overlapping windows, and a polar transformed image (with windows). The polar transformation, similar to the log-polar transform [25] widely used in computer vision research, is a conformal mapping from points (x, y) in the image to points (ρ, θ) in the polar image. We adapt this by defining an ‘origin’ O given by the centroid $(\bar{x}, \bar{y})$. Where d is the maximum distance between O and all pixels in the image, the mapping is described by

$$\rho = \frac{\sqrt{(x - \bar{x})^2 + (y - \bar{y})^2}}{d} \qquad (11)$$

$$\theta = \operatorname{atan2}(y - \bar{y},\; x - \bar{x}) \qquad (12)$$

We map onto a polar image of size 64 × 64, giving a representation that is size invariant and in which rotations are transformed into translations (Figure 7). Because the resampling is dense and the data are reduced, there is also a certain degree of smoothing of shape distortions. We have used this adaptation of the polar image previously for Arabic handwriting recognition [6], with comparable results.
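The polar map can be sketched as a forward mapping that accumulates each pixel into a 64 × 64 grid of (ρ, θ) bins. The binning scheme, the use of ink pixels only when computing d, and the name `polar_map` are our assumptions; the point of the sketch is that dividing by d gives size invariance and that a rotation about the centroid becomes a cyclic shift along the θ axis.

```python
import numpy as np

def polar_map(img, size=64):
    """Forward-map an image onto a size x size polar image about its centroid.

    Output rows index normalised radius rho in [0, 1]; output columns index
    angle theta in [-pi, pi). Each ink pixel's value is accumulated into the
    bin it falls in, so total mass is preserved.
    """
    img = np.asarray(img, dtype=float)
    ys, xs = np.nonzero(img)
    w = img[ys, xs]
    xbar, ybar = np.average(xs, weights=w), np.average(ys, weights=w)  # origin O
    dx, dy = xs - xbar, ys - ybar
    dist = np.hypot(dx, dy)
    d = dist.max() if dist.max() > 0 else 1.0   # max distance from O to an ink pixel
    rho = dist / d                               # in [0, 1]
    theta = np.arctan2(dy, dx)                   # in [-pi, pi]
    r_idx = np.minimum((rho * size).astype(int), size - 1)
    t_idx = ((theta + np.pi) / (2 * np.pi) * size).astype(int) % size  # wrap theta = pi
    out = np.zeros((size, size))
    np.add.at(out, (r_idx, t_idx), w)            # accumulate, allowing repeated bins
    return out
```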

Fig. 8. Four probing functions used by Chang and Grover (here redrawn from [4]). The leftmost function gives a result equivalent to the mass cen- troid. Each function is used in both the x and y directions.

Fig. 9. Six-degree polynomial signature superimposed onto the letter ܐ (alaph).

More Generalized Moment Functions

The method of Chang and Grover [4] convolves the object with up to four different predefined ‘probing’ functions (better described as basis functions), as shown in Fig. 8. The basis functions are one-dimensional, so, following [9], they are combined using a complex convolution to scan the input image and compute a moment.

Within a window, convolution of each basis function with the image results in a distinct generalized centroid (G-centroid) at the convolution’s zero-crossing point. Chang and Grover pair G-centroids into phasors that are used as features. However, we further generalise the GMF method by generating a set of basis functions from each character in a training set instead of using predefined basis functions. Furthermore, because our basis functions are not necessarily symmetric about an origin, the concept of a G-centroid is not justified, so we must use a pair of basis functions, one in x and one in y, and use the moment value as a feature; this defines the ‘more generalized’ generalized moment over the image function.

Using our More Generalized GMF method (MGGMF), a model is defined by selecting one sample from each of the 25 letter shapes. A pair of basis functions is generated for each shape in the model, giving a total of 50 basis functions. The first function of the pair is found by regressing an n-degree polynomial in x to the pixels of the character image, interpreted as unit-weighted points in an x-y scatterplot, as shown in Fig. 9. The second function, a polynomial in y, is found similarly from the same character image. The justification for this approach lies in the basis polynomial representing a ‘signature’: a representation of the distribution of the mass of the character as a function of x and of y. The fitting method minimises the mean squared error. Goodness of fit is not really an issue, as the resulting curve is intended simply as a discriminable signature of the shape, not a faithful copy of it.

Given a character, a feature vector of length 25 is found by convolving the character image with each of the 25 basis-function pairs. We have experimented with polynomials of different degrees, and have also experimented with increasing the resolution of the method by finding signatures for the four quarters of the bounding box of each character. This increases the number of values in the feature vector to 100.
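The signature fit described above amounts to ordinary least-squares polynomial regression on the coordinates of the ink pixels. A minimal sketch using NumPy's `Polynomial.fit` is shown below; the function name and the names `g_x`, `g_y` for the two polynomials are hypothetical, and the subsequent convolution that turns a pair of signatures into a feature value is not reproduced because its exact form is not given here.

```python
import numpy as np

def signature_polynomials(char_img, degree=6):
    """Fit the pair of 'signature' polynomials for one character image.

    Every ink pixel is a unit-weight point (x, y) in a scatterplot, with x the
    column and y the row index. Returns (g_x, g_y): a least-squares polynomial
    giving y as a function of x, and one giving x as a function of y.
    """
    ys, xs = np.nonzero(char_img)
    g_x = np.polynomial.Polynomial.fit(xs, ys, degree)  # minimises squared error in y
    g_y = np.polynomial.Polynomial.fit(ys, xs, degree)  # minimises squared error in x
    return g_x, g_y
```

For the quartered variant (MGGMF 6Q), the same fit would be applied separately to each quarter of the character's bounding box.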

CLASSIFICATION

Each letter in the alphabet is associated with one or more classes. Some letters are associated with more than one class because their variants have quite different shapes.

Fig. 10. Recognition rate (in percent) obtained with tenfold cross-validation for tabulated values of (C, γ) for the trial FW2-PW2-FW8 on source TS.

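For example, the letter mim is associated with two classes, one for each of its two variants. We use the ‘one against one’ approach, in which k(k − 1)/2 classifiers are constructed for k classes and each classifier is trained on data from two different classes. Each classifier is a support vector machine (SVM) [7, 26], in which n-dimensional training vectors are mapped into a sufficiently high-dimensional space where linear separation exists. In practice, a separating hyperplane may not exist, for example when the noise level is high. Therefore, slack variables can be introduced to relax the classification constraints at the risk of some misclassification. By using a kernel function, it is possible to compute the separating hyperplane without explicitly mapping into the higher-dimensional space. We use the radial basis function (RBF) kernel, defined for patterns $\mathbf{x}_i$ and $\mathbf{x}_j$ as $K(\mathbf{x}_i, \mathbf{x}_j) = \exp(-\gamma \lVert \mathbf{x}_i - \mathbf{x}_j \rVert^2)$, with $\gamma > 0$. Given training patterns $\mathbf{x}_1, \ldots, \mathbf{x}_\ell$ with associated labels $y_i \in \{-1, +1\}$, the SVM algorithm solves a dual quadratic optimisation problem to find the Lagrange expansion coefficients $\alpha_i$ that specify the separating hyperplane [20]:

$$\max_{\boldsymbol{\alpha}} \; \sum_{i=1}^{\ell} \alpha_i - \frac{1}{2} \sum_{i=1}^{\ell} \sum_{j=1}^{\ell} \alpha_i \alpha_j y_i y_j K(\mathbf{x}_i, \mathbf{x}_j) \qquad (13)$$

$$\text{subject to} \quad 0 \le \alpha_i \le C, \; i = 1, \ldots, \ell, \qquad \sum_{i=1}^{\ell} \alpha_i y_i = 0 \qquad (14)$$

Those patterns whose $\alpha_i$ are non-zero are called support vectors. This leads to the nonlinear decision function (classifier)

$$f(\mathbf{x}) = \operatorname{sgn}\left( \sum_{i=1}^{\ell} y_i \alpha_i K(\mathbf{x}, \mathbf{x}_i) + b \right) \qquad (15)$$

where b is a bias term obtained from the solution of the optimisation problem.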
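The one-against-one scheme and the decision function of Eq. (15) can be sketched as follows. With k classes there are k(k − 1)/2 binary classifiers, i.e. 378 for the 28-class trials and 300 for the 25-class trials. The sketch assumes the products α_i y_i and the bias b have already been obtained by solving Eqs. (13) and (14), for example with LIBSVM, and combines the pairwise decisions by majority voting, a common rule for one-against-one classification; the function and parameter names are hypothetical.

```python
import numpy as np
from itertools import combinations

def rbf_kernel(A, B, gamma):
    """K(a, b) = exp(-gamma * ||a - b||^2) for every row a of A and row b of B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))  # clip tiny negative round-off

def decision(x, sv, alpha_y, b, gamma):
    """sum_i alpha_i y_i K(x, x_i) + b over support vectors only (Eq. 15 without sgn)."""
    return rbf_kernel(sv, x[None, :], gamma)[:, 0] @ alpha_y + b

def one_vs_one_predict(x, classes, models, gamma):
    """models[(i, j)] = (sv, alpha_y, b) for a binary SVM with class i as +1 and
    class j as -1. Each pairwise classifier casts one vote; the most votes wins
    (ties go to the class listed first)."""
    votes = dict.fromkeys(classes, 0)
    for i, j in combinations(classes, 2):
        sv, alpha_y, b = models[(i, j)]
        votes[i if decision(x, sv, alpha_y, b, gamma) > 0 else j] += 1
    return max(classes, key=votes.__getitem__)
```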

The classifier tends to be very efficient because most $\alpha_i$ become 0, so only the support vectors are needed. The SVM model we use has two parameters: the kernel ‘spread’ γ and the relaxation cost trade-off C. Model selection is performed by enumerating values of the parameter pair (C, γ) to find the pair that gives the highest accuracy under 10-fold cross-validation (CV-10). In the CV-v procedure, the samples are randomly divided into v disjoint sets; the classifier is trained v times, each time with a different set held out as a validation set. The estimated performance is the mean of the v validation results. Figure 10 shows an example of the recognition rate as a function of (C, γ). Note the ridge along which the highest recognition rates are obtained, suggesting a correlation between the best values of C and γ.
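The model-selection loop can be sketched independently of the SVM solver. In this hypothetical sketch, `make_fit_predict(C, gamma)` is assumed to return a function that trains a classifier on one set of samples and predicts labels for another; each (C, γ) pair is scored by its mean v-fold cross-validation accuracy, using the same folds for every pair, and the best pair is kept.

```python
import numpy as np
from itertools import product

def cv_accuracy(fit_predict, X, y, v=10, seed=0):
    """Mean validation accuracy of v-fold cross-validation.

    fit_predict(X_train, y_train, X_val) -> predicted labels for X_val.
    Samples are shuffled once and split into v disjoint folds; each fold is held
    out exactly once while the classifier is trained on the remaining folds.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), v)
    accs = []
    for k in range(v):
        val = folds[k]
        train = np.concatenate([folds[j] for j in range(v) if j != k])
        pred = fit_predict(X[train], y[train], X[val])
        accs.append(np.mean(pred == y[val]))
    return float(np.mean(accs))

def grid_search(make_fit_predict, X, y, Cs, gammas, v=10):
    """Enumerate (C, gamma) pairs; return (best_C, best_gamma, best_accuracy)."""
    best = None
    for C, gamma in product(Cs, gammas):
        acc = cv_accuracy(make_fit_predict(C, gamma), X, y, v)  # fixed seed: same folds
        if best is None or acc > best[2]:
            best = (C, gamma, acc)
    return best
```

Exponentially spaced grids (for example C = 2^-5, 2^-3, ..., 2^15 and γ = 2^-15, 2^-13, ..., 2^3) are often suggested for RBF-kernel SVMs, but the ranges used in the trials reported here are not stated.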

Table 1. Results (in percent recognition rate) of 28- and 25-class trials using features of character and word samples. Each column is headed c/s, indicating the number of classes c and the number of samples s per class. Results for the feature set F were considered too unpromising to be included in later trials.

Once a satisfactory (C, γ) pair is found, the one-against-one method is used to train a k-class discrimination problem in which all classifiers use the same parameter values.

RESULTS

A database of character images was obtained from both MS and TS sources. Character images were size-normalised to 64 × 64 pixels, and a polar transformed image was also obtained. Several classification trials were carried out, variously using the image, the polar image, and regions within the images. To take into account the context-sensitive variations of character shape in Syriac, the model built during training mode used either 28 or 25 classes, depending on how the training set was constructed. For example, the two variants of the letter mim were assigned to different classes during training. Most other variants are distinguished only by having a longer baseline stroke, and these variants were assigned to the same class because the segmentation tended to trim the baseline to an approximately uniform length. This study did not use vowel marks, as the sources were not vowelled.

The classification trials are identified as follows:

F: The five features were obtained from the 64 × 64 pixel character image, giving a feature vector of length 5.

F/PW2: The character/polar image was divided into 2 non-overlapping windows of 64 rows and 32 columns, and the five features were obtained from each window, resulting in a feature vector of length 10.

F/PS2W8: The character/polar image was divided into 29 regions of 8 × 8 pixels overlapped by 6 pixels, and the five features were obtained from each window, resulting in a feature vector of length 145.

F/PS4W8: The character/polar image was divided into 15 regions of 8 × 8 pixels overlapped by 4 pixels, and the five features were obtained from each window, resulting in a feature vector of length 75.
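The feature-vector lengths in this list follow from simple window arithmetic. Assuming, as the counts imply, that the 8-pixel windows slide along one 64-pixel image axis with a stride equal to the window size minus the overlap, the number of windows n and the resulting vector lengths are:

$$n_{\mathrm{S2W8}} = \frac{64 - 8}{8 - 6} + 1 = 29, \qquad 29 \times 5 = 145$$

$$n_{\mathrm{S4W8}} = \frac{64 - 8}{8 - 4} + 1 = 15, \qquad 15 \times 5 = 75$$

$$n_{\mathrm{W2}} = 2, \qquad 2 \times 5 = 10$$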

Table 2. Results (in percent recognition rate) for composite feature vectors. Compared with like trials in Table 1, a composite vector gives the highest recognition rate for the MSC source.

Table 1 shows the results of the trials carried out. Columns MSC and TSC refer to character samples taken from the manuscript source and the typeset source respectively. Under each source are columns showing results for different sample sizes and numbers of classes. The first trial used ten samples of each character. A second character recognition trial was undertaken using a different association of character shapes to classes: 25 classes were defined by merging classes that showed insignificant differences in the previous trial. A larger sample set for the third trial was constructed by duplicating the original samples, doubling the sample size.

We also evaluated the classifier on a word recognition task, for which character segmentation is unnecessary. Column TSW of the table refers to trials carried out on a sample of 990 word images taken from the typeset source TS: 10 examples of each of the 99 most frequent words in TS. This trial was done only for comparison with other cursive word recognition studies [6], and the recognition rates are comparable. A relatively high word recognition rate is expected because of the uniform quality of the TS sample and the inherently more pronounced distinctions between word shapes relative to character shapes. A word recognition trial was not carried out for the MS source because an insufficient sample of each word was available.

Table 2 shows character recognition trials in which long feature vectors were generated by concatenating the vectors obtained from previous trials. The trials using concatenated feature vectors, such as FW2-PW2-FW8, show higher recognition rates, possibly because these trials use both the character image and the polar transformed image in the same feature vector, as well as a combination of window sizes. Despite the longer feature vectors for these trials, the peaking phenomenon [11] is not in evidence. With a few exceptions, the recognition rate in Table 1 increases as the number of samples is increased, even if the new samples are simply duplicates. In other trials not shown in the table, the recognition rate reached 100% when the samples of each character were replicated to 200 (i.e. still only 10 unique samples). This result should be treated with caution because of two sources of bias when the sample size is increased. First, because cross-validation constructs the training set essentially by sampling without replacement, the training set drawn from a larger sample is more likely to represent the diversity within the sample, even if the proportion held out is unchanged; in particular, when samples are duplicated, copies of a validation sample may also appear in the training set. Second, if the classifier shows poor generalisation, then a small increase in the diversity of the training set might cause a disproportionately higher recognition rate. The cross-validation procedure is designed to limit bias [12], but some combination of these effects may account for the increase in recognition rate in certain trials.


We then considered a situation where the classifier was trained on the typeset source TS and the resulting model was used for character recognition on the manuscript source MS (Table 3). The motivation was to test the performance of the system on a multi-font problem in which no training data were obtained from the test source. Although classification repeatability is confirmed by the high recognition rate when the model is tested with samples taken solely from the training set, the recognition rate is low when the model is tested on samples from the manuscript source. Several factors may account for this. First, the characters in the TS source are too uniform to provide the variation needed for the model to generalise well. Second, there are systematic differences in design between the characters in the MS and TS: in general, the MS characters have a thicker stroke width and a lower width/height ratio, and individual characters differ slightly in shape. These factors suggest that the system is unable to treat the TS and MS sources as interchangeable, and that further work will be required to design a system with multi-font capability.

Table 3. Results of recognition trials on MSC using a model trained on characters from the TS source.

Table 4. Results (in percent recognition rate) of trials using features of character samples from the manuscript source (MS) and typeset source (TS). To provide a basis for comparison, training and recognition were also performed with ten geometric moment features [16], seven Hu features [10], and ten Legendre polynomial features [24]. The MGGMF method used a 6-degree polynomial signature on the whole character image; MGGMF 6Q used a 6-degree signature on each of the four quarters of the character image. All recognition trials used twenty samples of each character from each source.

The final experiments concern our More Generalized Generalized Moment Function (MGGMF), comparing its performance with that of well-known non-generalized moment functions. Table 4 shows the results. Columns MS and TS refer to character samples taken from the manuscript source and the typeset source respectively. Under each source are columns showing results for different moment functions. As one might expect, the recognition rate is better for the typeset source than for the manuscript source, probably owing to the regularity of TS. The performance of the MGGMF method applied to the whole character image suggests that the signature is insufficiently discriminative. However, when signatures are found for each quarter of the character image, a dramatic improvement is observed. One explanation is that the signatures are thereby more closely identified with separate strokes of the character.

CONCLUSION

This paper has described a system for recognising cursive Syriac text (Estrangelo) from ancient scribe-written and early modern typeset sources. Given a document, the system finds words and then segments each word into characters. These preliminary stages require some manual intervention to remove editorial apparatus and to correct certain systematic oversegmentations. Each character is then recognized using a trainable classifier constructed from support vector machines. Recognition rates vary from 61% to 100% depending on the method used and the source of text. Some trials may exhibit methodological bias, and these results should be treated with caution. Excluding these, the highest recognition rate on scribe-written manuscript samples, 94%, was obtained using the MGGMF 6Q feature vector of length 24. The support vector classifier has been evaluated using a 10-fold cross-validation procedure, under which it achieved high classification accuracy. Because the trained classifier depends only on the support vectors, which are typically few, recognition is more efficient than with the Hidden Markov Model classifier used in our previous work on similarly sized data sets [6].

It is important to stress that the system described here is at a very preliminary stage of development. It has been a useful laboratory research tool, but it is not ready to be used on arbitrary documents, nor can it conveniently be used by people other than the developers. The entire system is essentially ‘knowledge free’ in the sense that no knowledge of characteristic Syriac letter shapes or statistics has been used in the system design. Future work should concentrate on improving the segmentation algorithm and extending the system to deal with articulation marks and punctuation. Steps can also be taken to improve the robustness of the system on documents that have been badly reproduced. Both areas of work might benefit from building in knowledge of Syriac from the letter-formation level to the morphological and lexical levels [18]. At the letter-formation level, matching flexible templates might be a more productive approach than geometric moment functions, and a start in this direction has recently been reported for Arabic [1]. However, that method treats each character as an isolated shape, thus presuming that some type of segmentation has already been applied. Finally, because Syriac is written in several forms, it would also be useful to investigate whether the system could be trained and tested equally well on the East Syriac and Serto (West Syriac) forms, as well as on font-specific variants within the main script systems.

ACKNOWLEDGMENTS

We thank Chih-Jen Lin of National Taiwan University for assistance in using his LIBSVM library. P.P.J. Fernando is supported by a studentship from the Bishop’s Conference of Sri Lanka. We are grateful to Sebastian Brock of the University of Oxford, Rifaat Ebied of the University of Sydney and George Kiraz of Beth Mardutho: The Syriac Institute, for valuable advice, source manuscripts and encouragement. This paper is an expanded version of [5].

BIBLIOGRAPHY

[1] Al-Shaher, A. and E.R. Hancock. “Arabic character recognition with shape mixtures.” In Proc. 13th British Machine Vision Conference, Cardiff, Wales, September 2002.

[2] Arica, N. and F.T. Yarman-Vural. “Optical character recognition for cursive handwriting.” IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(6):801–813, 2002.

[3] Burkitt, F.C. Evangelion Da-Mepharreshe. Cambridge University Press, 1904.

[4] Chang, S. and C.P. Grover. “Generalized moment functions and conformal transforms.” Proceedings of SPIE, 4790:102–113, 2002.

[5] Clocksin, W.F. and P.P.J. Fernando. “Towards automatic recognition of Syriac handwriting.” In Proceedings of the IEEE International Conference on Image Analysis and Processing, Mantova, Italy, September 2003.

[6] Clocksin, W.F. and M. Khorsheed. “Word recognition in Arabic handwriting.” In Proc. 8th Int. Conf. on Artificial Intelligence Applications, pages 271–279, Cairo, February 2000.

[7] Cortes, C. and V. Vapnik. “Support-vector networks.” Machine Learning, 20:273–297, 1995.

[8] Ebied, R.Y., A. Van Roey, and L.R. Wickham. Petri Callinicensis Patriarchae Antiocheni: Tractatus contra Damianum, volume 32 of Corpus Christianorum, Series Graeca. University of Louvain Press, Louvain, 1996.

[9] Freeman, M.O. and B.E.A. Saleh. “Optical location of centroids of nonoverlapping objects.” Applied Optics, 26(14):2752–2759, 1987.

[10] Hu, M.K. “Visual pattern recognition by moment invariants.” IRE Transactions on Information Theory, IT-8:179–187, 1962.

[11] Jain, A.K. and B. Chandrasekaran. “Dimension and sample size considerations in pattern recognition practice.” In P.R. Krishnaiah and L.N. Kanal, editors, Handbook of Statistics, pages 835–855. North-Holland, Amsterdam, 1982.

[12] Jain, A.K., R.P.W. Duin, and J. Mao. “Statistical pattern recognition: A review.” IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(1):4–37, 2000.

[13] Jain, R., R. Kasturi, and B.G. Schunck. Machine Vision. McGraw-Hill, New York, 1995.

[14] Khorsheed, M. and W.F. Clocksin. “Structural features of cursive Arabic script.” In Proc. 10th British Machine Vision Conference, pages 422–431, Nottingham, England, 1999.

[15] Khorsheed, M. “Off-line Arabic character recognition – a review.” Pattern Analysis and Applications, 5(1):31–45, 2002.

[16] Kim, J.H., K.K. Kim, and C.Y. Suen. “An HMM-MLP hybrid model for cursive script recognition.” Pattern Analysis and Applications, 3(4):314–324, 2000.

[18] Kiraz, G.A. “Syriac morphology: From a linguistic model to a computational implementation.” In R. Lavenant, editor, VII Symposium Syriacum, Rome, 1996. Orientalia Christiana Analecta.

[19] Manmatha, R., C. Han, and E.M. Riseman. “Word spotting: A new approach to indexing handwriting.” In Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, pages 631–637, San Francisco, June 1996.

[20] Müller, K.-R., S. Mika, G. Rätsch, K. Tsuda, and B. Schölkopf. “An introduction to kernel-based learning algorithms.” IEEE Transactions on Neural Networks, 12(2):181–202, 2001.

[21] Plamondon, R. and S.N. Srihari. “On-line and off-line handwriting recognition: A comprehensive review.” IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(1):63–84, 2000.

[22] Seni, G. and E. Cohen. “External word segmentation of off-line handwritten text lines.” Pattern Recognition, 27(1):41–52, 1994.

[23] Steinherz, T., E. Rivlin, and N. Intrator. “Offline cursive script word recognition – A survey.” International Journal on Document Analysis and Recognition, 2(2/3):90–110, 1999.

[24] Teague, M.R. “Image analysis via the general theory of moments.” Journal of the Optical Society of America, 70(8):375–397, 1980.

[25] Tistarelli, M. and G. Sandini. “On the advantage of polar and log-polar mapping for direct estimation of time-to-impact from optical flow.” IEEE Transactions on Pattern Analysis and Machine Intelligence, 15(4):401–410, 1993.

[26] Vapnik, V. Statistical Learning Theory. Wiley, New York, 1998.