Towards Automatic Transcription of Estrangelo Script
William F. Clocksin, Department of Computing, Oxford Brookes University
Prem P. J. Fernando, Computer Laboratory, University of Cambridge
TEI XML encoding by James E. Walters, Beth Mardutho: The Syriac Institute
2003, Volume 6.2
For this publication, a Creative Commons Attribution 4.0 International license has been granted by the author(s), who retain full copyright.
https://hugoye.bethmardutho.org/article/hv6n2clocksin_fernando
https://hugoye.bethmardutho.org/pdf/vol6/HV6N2Clocksin_Fernando.pdf
Hugoye: Journal of Syriac Studies, Beth Mardutho: The Syriac Institute, 2003, vol. 6, issue 2, pp. 249–268.
Hugoye: Journal of Syriac Studies is an electronic journal dedicated to the study of the Syriac tradition, published semi-annually (in January and July) by Beth Mardutho: The Syriac Institute. Published since 1998, Hugoye seeks to offer the best scholarship available in the field of Syriac studies.
Abstract
This paper surveys several computer-based techniques we have developed for the automatic transcription of Estrangelo handwriting from historical manuscripts. The Syriac language has been a neglected area for research into automatic handwriting transcription, yet is interesting because the preponderance of scribe-written manuscripts offers a challenging yet tractable medium between the extremes of type-written text and free handwriting. The methods described here do not need to find strokes or contours of the characters, but exploit characteristic measures of shape that are calculated by geometric moment functions. Both whole words and character shapes are used in recognition experiments. After segmentation using a novel probabilistic method, features of character-like shapes are found that tolerate variation in formation and image quality. Each shape is recognised individually using a discriminative support vector machine with 10-fold cross-validation. We describe experiments using a variety of segmentation methods and combinations of features. Images from scribe-written historical manuscripts are used, and the recognition results are compared with those for images taken from clearer 19th century typeset documents. Recognition rates vary from 61–100% depending on the algorithms used and the size and source of the data set.
INTRODUCTION
Syriac manuscripts dating back to before the 6th century CE are available in large quantities and are undergoing the process of manual transcription into machine-readable form for scholarly analysis, commentary, and publication. Manual transcription and keyboarding is a tedious and laborious task that few are willing and qualified to undertake. Syriac scholars would welcome a computer-based system that is able to provide transcriptions into machine-readable form with a reasonable accuracy. Any errors made by the automatic transcriber could then be corrected manually as part of on-line proofreading. Syriac is a useful vehicle for automatic handwriting transcription research because many sources are carefully written by scribes. Therefore, as far as the designers of optical character recognition (OCR) algorithms are concerned, Syriac manuscripts present a large corpus that is intermediate in difficulty between type-written text and unconstrained handwriting. OCR of clearly typewritten Roman-style text is essentially solved, and OCR of unconstrained handwriting will continue to be a challenging research problem far into the foreseeable future. By contrast, in scribe-written texts there is sufficient regularity for the OCR problem to be tractable, while there is sufficient variation to require the development of techniques more sophisticated than standard OCR methods. This rationale has also motivated our previous work in automatic transcription of scribe-written Arabic [14, 6]. Syriac is one of the simpler early Semitic languages, lacking the grammatical complexity of classical Arabic and the unpredictability of biblical Hebrew. Although the system described in this paper does not have comprehensive competence, the relative simplicity of Syriac offers motivation for further development of a complete system for Syriac handwriting transcription. Of the several script forms in use, here we focus on Estrangelo, which is found in the oldest manuscripts and was also later widely used in Europe for printed books.
Fig. 1. The word ܩܢܘܡܐ qnoma ‘person, self’ from MS.
No previous work has been published on automatic recognition of Syriac handwriting, but this work falls into the general category of off-line cursive script recognition, an area in which there has been much effort [23, 21, 2]. However, from a character recognition perspective, Syriac is similar to Arabic, and the existing research in Arabic character recognition has recently been comprehensively surveyed [15]. The system described in this paper implements a standard statistical classification framework [12]. Figure 2 shows the components of the system. In the training mode, a model is constructed using the input data as training data. In the recognition mode, the model is used to classify the previously unseen input data.
The results described below were obtained from a handwritten manuscript source (MS) and a typeset source (TS). Both sources were written in Estrangelo. The MS is a leaf taken from Peter of Callinicum’s Adversus Damianum, a 6th century commentary on the Trinity [8]. (The leaf is British Library Add. MS 7191, Folio 100va–101rb, which contains the end of Chapter XXIV and the beginning of Chapter XXV of Book III.) The TS consists of the 36 pages of Mark’s Gospel taken from Burkitt’s 1904 edition [3] of the Evangelion Da-Mepharreshe, typeset in the late 19th century. Pages were scanned at 300 dpi and saved as 8-bit greyscale images. Any editorial apparatus (brackets, verse numbers, footnotes) was removed manually. Figure 1 shows an example of the word ܩܢܘܡܐ qnoma ‘person, self’ from MS.
The trials described in Section 4 are mainly concerned with recognizing characters within the word. However, for comparison purposes, a few trials on the recognition of whole words are also described. Practically, word recognition [23] or ‘word spotting’ [19] techniques are less useful for Syriac because it is a highly inflected language: spellings change according to grammatical function, and almost all grammatical functions are written as word prefixes or suffixes instead of as separate words. Therefore, a combinatorially large lexicon would be required to support a word recognition approach. For this reason, we focus on a character recognition approach and remain attentive to relevant insights arising from the word recognition approach.
IMAGE PROCESSING
Given a page image from a source, image processing proceeds as follows. First, the connected components of the image are extracted using the standard two-pass algorithm [13], in which a label is assigned to each pixel in the first pass, with label equivalence based on pixel connectivity with its eight neighbours. Equivalence classes are determined, and a second pass updates each pixel in a connected component with a label unique to the component. This algorithm has a running time of approximately O(N) in the number of pixels. The bounding boxes of each component are then determined. Next, words are found by calculating the frequency distribution (histogram) of the horizontal separation between neighbouring bounding boxes. The idea is that the distance between words tends to be larger than the distance between components within a word [22]. The minimum between the two maxima of the histogram is located to determine a threshold above which inter-component separations are interpreted as inter-word spaces (Figure 4). In the data we have considered, there is a clear gap between the modes of the histogram, leading to successful use of this method on both MS and TS sources (see Figure 3).
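The word-spotting step above can be sketched in Python as follows. This is an illustrative reconstruction, not the paper's code: the function names, the peak/valley search, and the synthetic separation data are ours, and bounding boxes are assumed to have already been extracted from the connected components.

```python
from collections import Counter

def word_gap_threshold(separations):
    """Locate the minimum between the two modes of the separation
    histogram; separations above it are treated as inter-word spaces."""
    hist = Counter(separations)
    lo, hi = min(hist), max(hist)
    counts = [hist.get(v, 0) for v in range(lo, hi + 1)]
    # Take the two most populated bins as the modes, then find the
    # least populated bin between them.
    peaks = sorted(range(len(counts)), key=lambda i: counts[i], reverse=True)[:2]
    a, b = sorted(peaks)
    valley = min(range(a, b + 1), key=lambda i: counts[i])
    return lo + valley

def group_into_words(boxes, threshold):
    """Group bounding boxes (left, right), sorted along the line, into
    words wherever the gap to the previous box does not exceed threshold."""
    words, current = [], [boxes[0]]
    for prev, box in zip(boxes, boxes[1:]):
        if box[0] - prev[1] > threshold:
            words.append(current)
            current = []
        current.append(box)
    words.append(current)
    return words
```

With separations drawn from two clearly separated modes (small intra-word gaps around 1–2, large inter-word gaps around 8–9), the threshold falls in the empty region between them, as the paper reports for both sources.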
Fig. 2. Block diagram of the recognition system. The system operates in training mode or recognition mode. Recognition mode requires that a model is available; the model is built during training mode.
Fig. 3. Portion of MS showing
bounding boxes around words spotted automatically.
Fig. 4. A
frequency distribution of the horizontal separation between neighbouring
bounding boxes of connected components.
Fig. 5. Illustration of projections used by the
segmentation algorithm.
(a) The horizontal projection from the word sample of
Figure 1, showing the upper and lower baseline. The normal density estimated
from data near the lower baseline is superimposed in grey. (b) At point P within
a shape, the vertical run V and horizontal run H through P are shown. The number
of pixels in these runs gives the respective run lengths.
Character Segmentation
One of the main difficulties in cursive word recognition comes from segmentation
of the connected characters within the word. In most cases the precise point of
segmentation is indeterminate, and in some cases segmentation points can be
ambiguous without using higher level contextual information such as the spelling
of a word.
Our approach is to score each pixel in a word with a likelihood of being a valid
segmentation point based on general principles.
Because segmentation points lie on horizontal strokes near the baseline, pixels are given a score based on the distance from the lower baseline and approximations to the thickness and direction of the stroke. All measurements are efficiently calculated from horizontal and vertical projections and run lengths of the pixels in the image (Figure 5). For the purposes of definition, let pixels in a word image be represented as the array W[r, c] having rows 1 to R and columns 1 to C; the lower left corner pixel is W[1, 1]. The likelihood that a pixel at r, c is a segmentation point may be conveniently modelled as
L(r, c) = p(r) (1 − Ĥ(r, c)) (1 − V̂(r, c))   (1)
The baseline likelihood is estimated by using the horizontal projection h of the whole word,
h(i) = Σ_{c=1..C} W[i, c]   (2)
for each row i, then normalising h. Syriac words tend to be formed so that h has two modes: one at the upper baseline and one at the lower baseline. The data between the lower mode and a point halfway between the modes is used to estimate the mean μ and variance σ² of a normal density modelling the horizontal projection of the word near the baseline. The likelihood of a pixel at row r being on the baseline is therefore
p(r) = (1 / (σ √(2π))) exp(−(r − μ)² / (2σ²))   (3)
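The baseline estimation can be sketched as follows; a minimal reconstruction in Python under assumptions of ours (the lower mode is assumed to lie in the bottom half of the projection, and rows are indexed from 0 rather than the paper's 1-based convention).

```python
import math

def horizontal_projection(word):
    """h[i] = number of ink pixels in row i of a binary word image."""
    return [sum(row) for row in word]

def baseline_density(h):
    """Estimate (mu, var) of a normal density from the rows between the
    lower mode and the point halfway towards the upper mode, treating h
    as a frequency distribution over row indices."""
    lower = max(range(len(h) // 2), key=lambda i: h[i])          # lower-baseline mode
    upper = max(range(len(h) // 2, len(h)), key=lambda i: h[i])  # upper-baseline mode
    half = (lower + upper) // 2
    rows = [i for i in range(lower, half + 1) for _ in range(h[i])]
    mu = sum(rows) / len(rows)
    var = sum((r - mu) ** 2 for r in rows) / len(rows) or 1.0
    return mu, var

def baseline_likelihood(r, mu, var):
    """Normal likelihood, as in equation (3), that row r is on the baseline."""
    return math.exp(-(r - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)
```

Rows near the lower mode then receive a much higher baseline likelihood than rows near the upper baseline, which is what steers cuts towards the baseline strokes.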
Fig. 6. (a) Segmented word showing oversegmentation. (b) Detail of the two spurious cuts made within the rightmost letter ܩ (qoph). (c) Segmentation corrected by eliminating ‘nested’ segmentations.
The horizontal and vertical run lengths of the pixels connected to r, c are measured and normalised by dividing by C and R respectively:
Ĥ(r, c) = H(r, c) / C   (4)
V̂(r, c) = V(r, c) / R   (5)
Equation 1 therefore expresses the dependency of segmentation upon proximity to the baseline, and the width of the horizontal and vertical run lengths of the neighbouring pixels. The probability of segmentation is maximised when a point is closest to the baseline, within a narrow horizontal stroke. This probability is maximised, for example, at the trough of a ‘V’ shape, which explains why some segmentation techniques use curvature (e.g. [2]).
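The run-length measurements can be sketched as follows. The combination of terms in `segmentation_score` is one plausible reading of the product model described above (thin strokes near the baseline score highest); it is our illustration, not necessarily the paper's exact formula.

```python
def run_lengths(img, r, c):
    """Horizontal and vertical ink-run lengths through pixel (r, c) of a
    binary image given as a list of rows of 0/1; returns (H, V)."""
    if not img[r][c]:
        return 0, 0
    left = right = c
    while left > 0 and img[r][left - 1]:
        left -= 1
    while right < len(img[r]) - 1 and img[r][right + 1]:
        right += 1
    up = down = r
    while up > 0 and img[up - 1][c]:
        up -= 1
    while down < len(img) - 1 and img[down + 1][c]:
        down += 1
    return right - left + 1, down - up + 1

def segmentation_score(img, r, c, baseline_lik):
    """Combine the baseline likelihood with normalised run lengths so that
    the score peaks on thin strokes near the baseline."""
    R, C = len(img), len(img[0])
    H, V = run_lengths(img, r, c)
    return baseline_lik * (1 - H / C) * (1 - V / R)
```

In a toy image, a pixel on a short vertical descender scores higher than a pixel in the middle of a long horizontal stroke at the same baseline likelihood, matching the 'V'-trough intuition above.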
Pixels where the segmentation likelihood is maximal are chosen as segmentation points, and a vertical cut is made at each such point, stopping when the background is reached (Figure 6). The result is usually oversegmented in a systematic way. To correct this, spurious ‘nested’ segmentations are detected in the following way. First, the bounding boxes of segments are found. If a bounding box is entirely enclosed by another bounding box, the segmentation points given by the inner box are ignored. Single cuts within an enclosing bounding box are also ignored.
The segmentation method fails in two particular cases: for the unconnected letter ܢ (nun), because it crosses the baseline, and for the letter ܚ (heth), because it contains two places resembling plausible points of segmentation. The segmentation algorithm finds usable segmentations for about 70% of the characters. For the purposes of constructing a database of segmented characters to which classification trials could be applied, the remaining 30% were corrected manually. Curiously, this is the same over-segmentation rate as recently reported using a very sophisticated segmentation algorithm on neat italic English handwriting [2].
Feature Extraction
It is useful to represent character image data as a small set of features, partly to reduce the size of the model, and partly to characterise the data in ways that are invariant to typically encountered transformations and deformations. Geometric moments invariant to a variety of transformations are widely used in computer vision [13]. We have considered several alternative methods using moment functions. The first method follows the well known approach of using a set of predefined moment functions (e.g. [17]). The second method starts from the generalized moment functions (GMFs) recently introduced by Chang and Grover [4].
Fig. 7. Image of the letter ܐ (alaph) and its size-normalised polar map.
Pre-defined Moment Functions
We use the feature set defined as follows. Given an image function f(x, y) with mass μ₀₀ = Σ_x Σ_y f(x, y), the normalised central moments are
η_pq = μ_pq / μ₀₀^γ, where γ = (p + q)/2 + 1   (6)
and μ_pq = Σ_x Σ_y (x − x̄)^p (y − ȳ)^q f(x, y). Following [17], selected moment functions are
(7)
(8)
(9)
(10)
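The normalised central moments of equation (6) can be computed as in the sketch below (our own illustration for binary images; the particular selected functions (7)–(10) are not reproduced here).

```python
def central_moment(img, p, q):
    """mu_pq of a binary image given as a list of rows of 0/1."""
    mass = sum(sum(row) for row in img)
    xbar = sum(x * v for row in img for x, v in enumerate(row)) / mass
    ybar = sum(y * v for y, row in enumerate(img) for v in row) / mass
    return sum(((x - xbar) ** p) * ((y - ybar) ** q) * v
               for y, row in enumerate(img) for x, v in enumerate(row))

def normalised_central_moment(img, p, q):
    """eta_pq = mu_pq / mu_00^gamma with gamma = (p + q)/2 + 1, making the
    feature invariant to uniform scaling (and, via central moments, to
    translation)."""
    gamma = (p + q) / 2 + 1
    return central_moment(img, p, q) / (central_moment(img, 0, 0) ** gamma)
```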
In the experiments described in the next section, these moments are applied to images of several kinds: the whole image of the character, subimages of overlapping and non-overlapping windows, and a polar-transformed image (with windows). The polar transformation, similar to the log-polar transform [25] widely used in computer vision research, is a conformal mapping from points (x, y) in the image to points (ρ, θ) in the polar image. We adapt this by defining an ‘origin’ given by the centroid (x̄, ȳ). Where d is the maximum distance between the centroid and any pixel of the character, the mapping is described by
ρ = √((x − x̄)² + (y − ȳ)²) / d   (11)
θ = arctan((y − ȳ)/(x − x̄))   (12)
We map onto a polar image of size 64 × 64, giving a representation that is size invariant and for which rotations have been transformed to translations (Figure 7). Because the resampling is dense and data is reduced, there is also a certain degree of smoothing of shape distortions. We have used this adaptation of the polar image previously for Arabic handwriting recognition [6], with comparable results.
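A resampling view of this mapping can be sketched as follows; nearest-neighbour sampling and the row/column layout (radius along rows, angle along columns) are our assumptions.

```python
import math

def polar_map(img, size=64):
    """Resample a binary character image into a size x size polar image:
    rows index normalised radius rho in [0, 1], columns index angle theta,
    both measured about the mass centroid."""
    mass = sum(sum(row) for row in img)
    xbar = sum(x * v for row in img for x, v in enumerate(row)) / mass
    ybar = sum(y * v for y, row in enumerate(img) for v in row) / mass
    # d: maximum distance from the centroid to any ink pixel.
    d = max(math.hypot(x - xbar, y - ybar)
            for y, row in enumerate(img) for x, v in enumerate(row) if v) or 1.0
    polar = [[0] * size for _ in range(size)]
    for i in range(size):            # radius samples
        for j in range(size):        # angle samples
            rho = (i / (size - 1)) * d
            theta = 2 * math.pi * j / size
            x = int(round(xbar + rho * math.cos(theta)))
            y = int(round(ybar + rho * math.sin(theta)))
            if 0 <= y < len(img) and 0 <= x < len(img[0]):
                polar[i][j] = img[y][x]
    return polar
```

Rotating the character about its centroid circularly shifts the columns of the polar image, which is the rotation-to-translation property noted above.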
Fig. 8. Four probing functions used by Chang and Grover (here redrawn from [4]). The leftmost function gives a result equivalent to the mass centroid. Each function is used in both the x and y directions.
Fig. 9. Six-degree polynomial signature superimposed onto the letter ܐ (alaph).
More Generalized Moment Functions
The method of Chang and Grover [4] convolves the object with up to four different predefined ‘probing’ functions (better described as basis functions), as shown in Fig. 8. The basis functions are one-dimensional, so following [9] they are combined using a complex convolution to scan the input image. Within a window, convolution of each basis function with the image will result in a distinct generalized centroid (G-centroid) at the convolution’s zero-crossing point. Chang and Grover pair G-centroids into phasers that are used as features. However, we further generalise the GMF method by generating a set of basis functions from each character in a training set instead of using predefined basis functions. Furthermore, because our basis functions are not necessarily symmetric about an origin, the concept of a G-centroid is not justified, so we must use a pair of basis functions, g_x and g_y, and use the moment value as a feature. Thus the ‘more generalized’ generalized moment over the image function f(x, y) is defined as
M = Σ_x Σ_y f(x, y) g_x(x) g_y(y)
Using our More Generalized GMF method (MGGMF), a model is defined by selecting one sample from each of the 25 letter shapes. A pair of basis functions g_x and g_y is generated for each shape in the model, giving a total of 50 basis functions. The function g_x is found by regressing an n-degree polynomial onto the pixels of the character image, interpreted as unit-weighted points in an x–y scatterplot, as shown in Fig. 9. Function g_y is similarly found. The justification for this approach lies with the basis polynomial representing a ‘signature’: a representation of the distribution of the mass of the character as functions of x and of y. The fitting method minimizes the mean squared error. Goodness of fit is not really an issue, as the resulting curve is intended simply as a discriminable signature of the shape, and not a faithful copy of the shape.
Given a character, a feature vector of length 25 is found by convolving the character image with each pair g_x and g_y. We have experimented with polynomials of several degrees, and have also experimented with increasing the resolution of the method by finding signatures for the four quarters of the bounding box of each character. This increases the number of values in the feature vector to 100.
CLASSIFICATION
Each letter in the alphabet is associated with one or more classes. Some letters
are associated with more than one class because their variants are quite
different shapes.
Fig. 10. Recognition rate (in percent) obtained with tenfold cross-validation for tabulated values of the SVM parameters (spread and cost C) for the trial FW2-PW2-FW8 on source TS.
For example, the letter mim is associated with two classes, one for each variant. We use the ‘one against one’ approach, in which for k classes, k(k − 1)/2 classifiers are constructed and each classifier trains on data from two different classes. Each classifier is a support vector machine (SVM) [7, 26], in which n-dimensional training vectors are mapped into a sufficiently high dimensional space where linear separation exists. In practice, a separating hyperplane may not exist, for example in cases of high noise level. Therefore, slack variables can be introduced in order to relax classification constraints at a risk of misclassification. By using a kernel function, it is possible to compute the separating hyperplane without explicitly mapping into the higher dimensional space. We use the radial basis function (RBF) kernel, defined for patterns x_i and x_j as
K(x_i, x_j) = exp(−‖x_i − x_j‖² / (2σ²))
Given training patterns x_i with associated labels y_i ∈ {−1, +1}, the SVM algorithm solves a dual quadratic optimisation problem to find Lagrange expansion coefficients α_i that specify the separating hyperplane [20]:
maximise Σ_i α_i − ½ Σ_{i,j} α_i α_j y_i y_j K(x_i, x_j)   (13)
subject to Σ_i α_i y_i = 0 and 0 ≤ α_i ≤ C   (14)
Those patterns x_i whose α_i are non-zero are called support vectors. This leads to the nonlinear decision function (classifier)
f(x) = sgn(Σ_i α_i y_i K(x_i, x) + b)   (15)
The classifier tends to be very efficient because most α_i become 0, so the support vectors are the only training patterns needed. The SVM model we use has two parameters: the kernel ‘spread’ σ and the relaxation cost trade-off C. Model selection is performed by enumerating values of the parameter pairs (σ, C) to find the pair that gives the highest cross-validation accuracy for each fold of a 10-fold cross-validation procedure (CV-10). In the CV-10 procedure, the samples are randomly divided into 10 disjoint sets; the classifier is trained 10 times, each time with a different set held out as a validation set. The estimated performance is the mean of these errors. Figure 10 shows an example of the recognition rate as a function of (σ, C). Note the ridge along which the highest recognition rates are obtained, suggesting correlation between σ and C.
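The fold construction and parameter enumeration can be sketched as follows; this is our own outline (the `evaluate` callback standing in for SVM training is an assumption), not the paper's implementation.

```python
import random

def cv_folds(n_samples, k=10, seed=0):
    """Randomly partition sample indices into k disjoint validation sets,
    as in the CV-10 procedure."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def grid_search(samples, labels, spreads, costs, evaluate):
    """Enumerate (spread, C) pairs and return (accuracy, spread, C) for the
    pair with the highest mean cross-validation accuracy; `evaluate` trains
    on one split and returns the validation accuracy."""
    best = None
    for s in spreads:
        for c in costs:
            accs = []
            for held_out in cv_folds(len(samples)):
                train = [i for i in range(len(samples)) if i not in set(held_out)]
                accs.append(evaluate(train, held_out, s, c))
            mean = sum(accs) / len(accs)
            if best is None or mean > best[0]:
                best = (mean, s, c)
    return best
```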
Table 1. Results (in percent recognition rate) of 28- and 25-class trials using features of character and word samples. Each column is headed with c/s, indicating the class size c and sample size s per class. Results for the feature set F were considered too unpromising to be included in later trials.
Once a satisfactory parameter pair is found, the one-against-one method is used for training the k-class discrimination problem in which all classifiers use the same model.
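The decision function of equation (15) reduces to a kernel expansion over the support vectors; a minimal sketch with handcrafted toy support vectors and coefficients (the real α_i and b come from solving the dual problem, which is not reproduced here).

```python
import math

def rbf(xi, xj, spread):
    """RBF kernel with 'spread' sigma: exp(-||xi - xj||^2 / (2 sigma^2))."""
    d2 = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-d2 / (2 * spread ** 2))

def svm_decide(x, support_vectors, labels, alphas, b, spread):
    """Nonlinear SVM decision function: the sign of the kernel expansion
    over the support vectors, which are the only patterns with alpha > 0."""
    s = sum(a * y * rbf(sv, x, spread)
            for sv, y, a in zip(support_vectors, labels, alphas)) + b
    return 1 if s >= 0 else -1
```

A test point is assigned the label of the support vector it lies nearest to in kernel space, which is why recognition cost scales with the number of support vectors rather than the full training set.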
RESULTS
A database of character images was obtained from both MS and TS sources. Character images were size-normalised to 64 × 64 pixels, and a polar-transformed image was also obtained. Several classification trials were carried out, variously using the image, polar image, and regions within the images. To take into account the context-sensitive variations of character shape in Syriac, the model built during the training mode used 28 or 25 classes, depending on how the training set was constructed. For example, the variants of the letter mim were assigned different classes during training. Most variants are distinguished by having a longer baseline, and these variants were assigned to the same class because the segmentation tended to trim the baseline to an approximately uniform length. This study did not use the vowels, as the sources were not vowelled.
The classification trials are identified as follows:
F: The five features were obtained from the 64 × 64 pixel character image, giving a feature vector of length 5.
F/PW2: The character/polar image was divided into 2 non-overlapping windows of 64 rows and 32 columns, and the five features obtained from each window, resulting in a feature vector of length 10.
F/PS2W8: The character/polar image was divided into 29 regions of 8 × 8 pixels overlapped by 6 pixels, and the five features obtained from each window, resulting in a feature vector of length 145.
F/PS4W8: The character/polar image was divided into 15 regions of 8 × 8 pixels overlapped by 4 pixels, and the five features obtained from each window, resulting in a feature vector of length 75.
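The window counts in the trial names follow from the overlap arithmetic: with window size w and overlap v, the step is w − v, giving (L − w)/(w − v) + 1 positions along an axis of length L. A small sketch (sliding along one axis is our reading of the trial definitions):

```python
def band_starts(length=64, window=8, overlap=6):
    """Starting offsets of sliding windows of the given size and overlap
    along one axis; step = window - overlap."""
    step = window - overlap
    return list(range(0, length - window + 1, step))
```

For a 64-pixel axis this yields 29 windows at overlap 6 (step 2) and 15 at overlap 4 (step 4), matching the region counts quoted for F/PS2W8 and F/PS4W8.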
Table 2. Results (in percent recognition rate) for composite feature vectors. Comparing like trials in Table 1, a composite vector gives the highest recognition rate for the MSC source.
Table 1 shows the results from the first set of trials carried out. Columns MSC and TSC refer to character samples taken from the manuscript source and the typeset source respectively. Under each source are columns showing results for different sample sizes and class sizes. The first trial used ten samples of each character. A second character recognition trial was undertaken using a different association of character shapes to classes. In this trial 25 classes were defined by merging classes having insignificant differences according to the previous trial. A larger sample set used for the third trial was constructed by duplicating the original sample size.
We also evaluated the classifier on a word recognition task, for which character segmentation is unnecessary. Column TSW of the table refers to trials carried out on a sample of 990 word images taken from the typeset source TS. The sample consisted of 10 examples of each of the 99 most frequent words in TS. This trial was done only for comparison to other cursive word recognition studies [6], and the recognition rates are comparable. A relatively high word recognition rate is expected because of the uniform quality of the TS sample and the inherently more pronounced distinctions between word shapes relative to character shapes. A word recognition trial was not carried out for the MS source because an insufficient sample of each word was available.
Table 2 illustrates character recognition trials in which long feature vectors were generated by concatenating the vectors obtained from previous trials. The trials using concatenated feature vectors, such as FW2-PW2-FW8, show higher recognition rates, possibly because these trials use both the character image and the polar transformed image in the same feature vector, as well as a combination of window sizes. Despite the longer feature vectors for these trials, the peaking phenomenon [11] is not in evidence. With a few exceptions, the recognition rate in Table 1 increases as the number of samples is increased, even if the new samples are simply duplicates. In other trials not shown in the table, the recognition rate reached 100% when the number of samples per character was replicated to 200 (i.e. still only 10 unique samples). This result should be treated with caution because of two sources of bias when sample size is increased. First, because cross-validation constructs the training set essentially by sampling without replacement, it is more likely that the training set of a larger sample size represents more diversity within the sample, even if the proportion held out is unchanged. Second, if the classifier shows poor generalisation, then a small increase in the diversity of the training set might cause a disproportionately higher recognition rate. The cross-validation procedure is designed to limit bias [12], but some combination of these effects may account for an increase in recognition rate in certain trials.
We then considered a situation where the classifier was trained on the typeset source TS, then the resulting model used for character recognition on the manuscript source MS (Table 3). The motivation for this was to test the performance of the system on a multi-font problem in which no training data were obtained from the test source. Although classification repeatability is confirmed by the high recognition rate when the model is tested with samples taken solely from the training set, a low rate is shown when the model is tested against samples from the manuscript source. A number of factors may account for this. First, the uniformity of the characters in the TS source provides insufficient variation for the model to have good generalization behaviour. Second, there are systematic differences in design between the characters in the MS and TS. In general, the MS characters have a thicker stroke width and a lower width/height ratio. Also, individual characters have slight differences in shape. These factors suggest that the system is unable to treat the TS and MS sources as interchangeable, and that further work will be required to design a system with multi-font capability.
Table 3.
Results of recognition trials on MSC using model obtained from characters
from TS source.
Table 4. Results (in percent recognition rate) of trials using features of character samples from the manuscript source (MS) and typeset source (TS). To provide a basis for comparison, training and recognition was also performed with ten geometric moment features [16], seven Hu features [10], and ten Legendre polynomial features [24]. The MGGMF method used a 6-degree polynomial signature on the whole character image; MGGMF 6Q used a 6-degree signature on each of four quarters of the character image. All recognition trials used twenty samples of each character from each source.
The final experiments concern the use of our More Generalized GMF (MGGMF), comparing its performance with that of well-known non-generalized moment functions. Table 4 shows the results from the trials carried out. Columns MS and TS refer to character samples taken from the manuscript source and the typeset source respectively. Under each source are columns showing results for different moment functions. As one might expect, recognition rate is better for the typeset source than the manuscript source, no doubt owing to the regularity of the TS. The performance of the MGGMF method applied to the whole character image suggests that the signature is insufficiently discriminative. However, when signatures are found for each quarter of the character image, a dramatic improvement is noticed. One explanation is that signatures are thereby more closely identified with separate strokes of the character.
CONCLUSION
This paper has described a system for recognising cursive Syriac text (Estrangelo) from ancient scribe-written and early modern typeset sources. Given a document, the system finds words and then segments each word into characters. These preliminary stages require some manual intervention to remove editorial apparatus and to correct certain systematic oversegmentations. Each character is then recognized using a trainable classifier constructed using a support vector machine. Recognition rates vary from 61% to 100% based on the method used and the source of text. Some trials may exhibit methodological bias, and these results should be treated with caution. Excluding these, the highest recognition rate on scribe-written manuscript samples, 94%, has been obtained using the MGGMF 6Q feature vector of length 24. The support vector classifier has been tested using a 10-fold cross-validation procedure, which has provided a high accuracy of classification. Because the number of support vectors is minimised during the training stage, recognition is more efficient than the Hidden Markov Model classifier used in our previous work on similar sized data sets [6].
It is important to stress that the system described here is at a most preliminary stage of development. It has been a useful laboratory research tool, but is not ready to be used on arbitrary documents, nor may it be conveniently used by people other than the developers. The entire system is essentially ‘knowledge free’ in the sense that no knowledge of characteristic Syriac letter shapes or statistics has been used in the system design. Future work should concentrate on improving the segmentation algorithm, and extending the system to deal with articulation marks and punctuation. Steps can also be taken to improve the robustness of the system on documents that have been badly reproduced. Both these areas of work might benefit from building in knowledge of Syriac from the letter-formation level to the morphological and lexical levels [18]. At the letter-formation level, matching flexible templates might be a productive approach instead of geometric moment functions, and a start in this direction has been recently reported for Arabic [1]. However, that method treats each character as an isolated shape, thus presuming some type of segmentation will have been applied. Finally, because Syriac is written in several forms, it would also be useful to investigate whether the system could be trained and tested equally well on the East Syriac and Serto (West Syriac) forms, as well as font-specific variants within the main script systems.
ACKNOWLEDGMENTS
We thank Chih-Jen Lin of National Taiwan University for assistance in using his LIBSVM library. P.P.J. Fernando is supported by a studentship from the Bishop’s Conference of Sri Lanka. We are grateful to Sebastian Brock of the University of Oxford, Rifaat Ebied of the University of Sydney, and George Kiraz of Beth Mardutho: The Syriac Institute, for valuable advice, source manuscripts and encouragement. This paper is an expanded version of [5].
BIBLIOGRAPHY
Al-Shaher, A. and E.R. Hancock. “Arabic character recognition with shape mixtures.” In Proc. 13th British Machine Vision Conference, Cardiff, Wales, September 2002.
Arica, N. and F.T. Yarman-Vural. “Optical character recognition for cursive handwriting.” IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(6):801–813, 2002.
Burkitt, F. Crawford. Evangelion Da-Mepharreshe. Cambridge University Press, 1904.
Chang, S. and C.P. Grover. “Generalized moment functions and conformal transforms.” Proceedings of SPIE, 4790:102–113, 2002.
Clocksin, W.F. and P.P.J. Fernando. “Towards automatic recognition of Syriac handwriting.” In Proceedings of the IEEE International Conference on Image Analysis and Processing, Mantova, Italy, September 2003.
Clocksin, W.F. and M. Khorsheed. “Word recognition in Arabic handwriting.” In Proc. 8th Int. Conf. on Artificial Intelligence Applications, pages 271–279, Cairo, February 2000.
Cortes, C. and V. Vapnik. “Support-vector network.” Machine Learning, 20:273–297, 1995.
Ebied, R.Y., A. Van Roey, and L.R. Wickham. Petri Callinicensis Patriarchae Antiocheni: Tractatus contra Damianum, volume 32 of Corpus Christianorum, Series Graeca. University of Louvain Press, Louvain, 1996.
Freeman, M.O. and B.E.A. Saleh. “Optical location of centroids of non-overlapping objects.” Applied Optics, 26(14):2752–2759, 1987.
Hu, M.K. “Visual pattern recognition by moment invariants.” IRE Trans. Information Theory, IT-8:179–187, 1962.
Jain, A.K. and B. Chandrasekaran. “Dimension and sample size considerations in pattern recognition practice.” In P.R. Krishnaiah and L.N. Kanal, editors, Handbook of Statistics, pages 835–855. North-Holland, Amsterdam, 1982.
Jain, A.K., R.P.W. Duin, and J. Mao. “Statistical pattern recognition: A review.” IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(1):4–37, 2000.
Jain, R., R. Kasturi, and B.G. Schunck. Machine Vision. McGraw Hill, New York, 1995.
Khorsheed, M. and W.F. Clocksin. “Structural features of cursive Arabic script.” In Proc. 10th British Machine Vision Conference, pages 422–431, Nottingham, England, 1999.
Khorsheed, M. “Off-line Arabic character recognition – a review.” Pattern Analysis and Applications, 5(1):31–45, 2002.
Kim, J.H., K.K. Kim, and C.Y. Suen. “An HMM-MLP hybrid model for cursive script recognition.” Pattern Analysis and Applications, 3(4):314–324, 2000.
Kiraz, G.A. “Syriac morphology: From a linguistic model to a computational implementation.” In R. Lavenant, editor, VII Symposium Syriacum, Rome, 1996. Orientalia Christiana Analecta.
Manmatha, R., Chengfeng Han, and E.M. Riseman. “Word spotting: A new approach to indexing handwriting.” In Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, pages 631–637, San Francisco, June 1996.
Müller, Klaus-Robert, Sebastian Mika, Gunnar Rätsch, Koji Tsuda, and Bernhard Schölkopf. “An introduction to kernel-based learning algorithms.” IEEE Transactions on Neural Networks, 12(2):181–202, 2001.
Plamondon, R. and S.N. Srihari. “On-line and off-line handwriting recognition: A comprehensive review.” IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(1):63–84, 2000.
Seni, G. and E. Cohen. “External word segmentation of off-line handwritten text lines.” Pattern Recognition, 27(1):41–52, 1994.
Steinherz, Tal, Ehud Rivlin, and Nathan Intrator. “Offline cursive script word recognition – A survey.” International Journal on Document Analysis and Recognition, 2(2/3):90–110, 1999.
Teague, M.R. “Image analysis via the general theory of moments.” Journal of the Optical Society of America, 70(8):375–397, 1980.
Tistarelli, M. and G. Sandini. “On the advantage of polar and log-polar mapping for direct estimation of time-to-impact from optical flow.” IEEE Transactions on Pattern Analysis and Machine Intelligence, 15(4):401–410, 1993.
Vapnik, V. Statistical Learning Theory. Wiley, New York, 1998.