ABSTRACT — METHOD FOR EVALUATING THE QUALITY OF AN IMAGE OF A DOCUMENT
Patent abstract:
A method comprising processing an image (1) into a text image (7) with a number of text blobs, classifying the text blobs according to a calculation indicating whether they belong to a foreground or background layer in the OCR processing, and generating a quality value of the image (1) according to the classified text blobs. Through this generation, images within the image (1) that are not relevant for OCR are not taken into account in the evaluation of the quality of the image (1). The volume of data to be processed is reduced, which allows the method to be executed in real time. The quality evaluation criterion, based on the division of blobs into foreground and background layers (prior knowledge of the OCR system), gives a good indication of OCR accuracy. Publication number: BE1024836B1 Application number: E2016/5960 Filing date: 2016-12-22 Publication date: 2018-07-23 Inventors: Jianglin Ma; Michel Dauw Applicant: I.R.I.S. IPC main class:
Patent description:
(30) Priority data: (73) Holder(s): I.R.I.S., 1435 MONT-SAINT-GUIBERT, Belgium (72) Inventor(s): MA Jianglin, 1348 LOUVAIN-LA-NEUVE, Belgium; DAUW Michel, 1831 MACHELEN, Belgium (54) ABSTRACT — METHOD FOR EVALUATING THE QUALITY OF AN IMAGE OF A DOCUMENT (57) Method comprising processing an image (1) into a text image (7) with a number of text blobs, classifying the text blobs based on a calculation of whether they belong to a foreground or background layer in OCR processing, and generating a quality value of the image (1) from the classified text blobs. Through this generation, images within the image (1) that are not relevant for OCR are not taken into account in the evaluation of the quality of the image (1). The volume of data to be processed is therefore reduced, which allows the method to be executed in real time. The quality evaluation criterion, based on the division of blobs into foreground and background layers (prior knowledge of the OCR system), gives a good indication of OCR accuracy. Quality assessment method 100. BELGIAN INVENTION PATENT — FPS Economy, SMEs, Middle Classes & Energy. Publication number: 1024836. Deposit number: BE2016/5960. Intellectual Property Office. International Classification: G06K 9/00, G06K 9/03. Date of issue: 07/23/2018. The Minister of the Economy, Having regard to the Paris Convention of March 20, 1883 for the Protection of Industrial Property; Having regard to the law of March 28, 1984 on invention patents, Article 22, for patent applications filed before September 22, 2014; Having regard to Title 1 "Patents for invention" of Book XI of the Code of Economic Law, Article XI.24, for patent applications filed from September 22, 2014; Having regard to the Royal Decree of December 2, 1986 relating to the filing, the grant and the maintenance in force of invention patents, Article 28; Having regard to the patent application received by the Intellectual Property Office on December 22, 2016.
Whereas, for patent applications falling within the scope of Title 1, Book XI of the Code of Economic Law (hereinafter CDE), in accordance with Article XI.19, §4, paragraph 2, of the CDE, if the patent application has been the subject of a search report mentioning a lack of unity of invention within the meaning of §1 of Article XI.19 cited above, and in the event that the applicant neither limits the application nor files a divisional application in accordance with the results of the search report, the granted patent will be limited to the claims for which the search report has been drawn up. Decrees: Article 1. - A Belgian invention patent with a duration of 20 years, subject to the payment of the annual fees referred to in Article XI.48, §1 of the Code of Economic Law, is granted to I.R.I.S., Rue du Bosquet 10, Parc Scientifique de Louvain-la-Neuve, 1435 MONT-SAINT-GUIBERT, Belgium, represented by GEVERS PATENTS, Holidaystraat 5, 1831 DIEGEM, for: ABSTRACT — METHOD FOR EVALUATING THE QUALITY OF AN IMAGE OF A DOCUMENT. INVENTOR(S): MA Jianglin, Voie du roman pays 29 box 102, 1348 LOUVAIN-LA-NEUVE; DAUW Michel, Calenbergstraat 26, 1831 MACHELEN. PRIORITY(IES): DIVISION: divided from the basic application: filing date of the basic application: Article 2. - This patent is granted without prior examination of the patentability of the invention, without guarantee of the merit of the invention or of the accuracy of the description thereof, and at the risk and peril of the applicant(s). Brussels, 07/23/2018. By special delegation: Method for evaluating the quality of a document image. The present invention relates to a computer-implemented method for evaluating the quality of an image of a composite document comprising images and/or text. In particular, the present invention relates to a quality evaluation method for indicating whether the image can be used for OCR processing.
Methods for evaluating the quality of an image of a composite document are generally used to predict the accuracy of optical character recognition (OCR). Since digitized document images became available, great research efforts have been devoted to document image quality assessment. Recently, however, with the increasing popularity of mobile devices such as smartphones and compact digital cameras, interest in quality assessment methods for document images from these devices has grown. For example, more and more employees on business trips take photos of important documents with the camera of their smartphone or tablet and send them to their company for specific processing. In this scenario, it is essential that the photos sent by employees have sufficient quality to allow further processing such as OCR, extraction and classification of document information, manual examination, etc. Therefore, a method for accurately evaluating the quality of document images is important and must be executed on the mobile device. The known methods generally comprise two steps. First, features representing the degradation of document images are extracted. Second, the extracted features are related to the accuracy of OCR. The first step can be carried out using methods based on image sharpness, methods based on characters, hybrid methods or methods based on feature learning. The second step can be carried out using learning-based methods or empirical methods. J. Kumar, F. Chen, and D. Doermann, "Sharpness Estimation for Document and Scene Images", Proc. ICPR, pp. 3292-3295, 2012 describe a sharpness-based method that calculates the change in gray scale values, i.e. the disparity, observed at an edge of a character in a document image. Although this method gives good results with quick calculations, several parameters must be adjusted to obtain the best results.
Other sharpness-based methods are described more generally for images, but have also been applied to document images. These include, for example, R. Ferzli and L. Karam, "A no-reference objective image sharpness metric based on the notion of just noticeable blur (jnb)", IEEE Trans. on Image Processing, 18, pp. 717-728, 2009; X. Zhu and P. Milanfar, "Automatic parameter selection for denoising algorithms using a no-reference measure of image content", IEEE Transactions on Image Processing, 19(12), pp. 3116-3132, 2010; N. Narvekar and L. Karam, "A no-reference image blur metric based on the cumulative probability of blur detection (cpbd)", IEEE Trans. on Image Processing, 20(9), pp. 2678-2683, 2011; and R. Hassen, Z. Wang, and M. Salama, "Image sharpness assessment based on local phase coherence", IEEE Transactions on Image Processing, 22(7), pp. 2798-2810, 2013. A limitation of these methods is that the various criteria used for quality evaluation are very slow to calculate. In addition, these methods do not take into account the characteristics of document images and therefore may not be applicable to document images. L.R. Blando, J. Kanai, T.A. Nartker, and J. Gonzalez, "Prediction of OCR accuracy", tech. rep., 1995; M. Cannon, J. Hochberg, and P. Kelly, "Quality assessment and restoration of typewritten document images", International Journal on Document Analysis and Recognition 2(2-3), pp. 80-89, 1999; and A. Souza, M. Cheriet, S. Naoi, and C.Y. Suen, "Automatic filter selection using image quality assessment procedures", Proceedings of ICDAR 1, pp. 508-512, 2003 describe character-based methods which have been specifically designed for scanned document images, but which can also be applied to camera document images. These methods are based on the calculation of measurements representing characteristics for which poor OCR is expected, such as heavy print, which tends to make many characters touch, and/or broken characters, which are usually fragmented into small pieces. However, these methods work on a binarized image, assuming that the acquired color or gray scale image has been correctly binarized, which is not necessarily always the case in real situations. N. Nayef and J. Ogier, "Metric-based no-reference quality assessment of heterogeneous document images", Proc. SPIE 9402, Document Recognition and Retrieval XXII, 94020L, February 8, 2015; and X. Peng, H. Cao, K. Subramanian, R. Prasad, and P. Natarajan, "Automated image quality assessment for camera-captured OCR", Proc. ICIP, pp. 2621-2624, 2011 describe hybrid methods combining methods based on image sharpness and methods based on characters. First, the sharpness of the image is calculated, then character-based quality metrics are estimated. These two measurements are then combined to represent image quality. While these hybrid methods are well suited to predicting the OCR accuracy of camera document images, they also suffer from the same disadvantages as methods based on image sharpness and methods based on characters. P. Ye and D. Doermann, "Learning features for predicting OCR accuracy", in 21st International Conference on Pattern Recognition (ICPR), pp. 3204-3207, 2012; and L. Kang, P. Ye, Y. Li, and D. Doermann, "A deep learning approach to document image quality assessment", in Image Processing (ICIP), 2014 IEEE International Conference on, pp. 2570-2574 describe methods based on feature learning. Although these methods are very promising, their configuration takes a long time because the systems must undergo training by processing many images.
After the extraction of the necessary features, the image quality evaluation measure must be linked to the extracted features. This can be done using empirical methods that calculate a weighted sum of the extracted features, and by showing that this measure is correlated with OCR accuracy. In particular, the weighting factor for each extracted feature, that is to say a feature that expresses a level of deterioration, can be estimated experimentally using the method of least squares. A disadvantage of these methods is that the weighting factors are adjustable parameters which must be estimated through experiments. In these methods, the configuration takes a long time. Alternatively, learning-based quality prediction methods do not assume that the quality evaluation prediction is in linear correlation with the extracted normalized features. Instead, a more complicated mapping function is developed to link the multi-dimensional extracted features with the quality evaluation measure or with the accuracy of OCR. As with methods based on feature learning, the configuration of systems using these methods takes a long time because the systems must undergo training by processing numerous images. An object of the present invention is to provide a precise quality evaluation method for evaluating the quality of an image of a composite document, which can be executed in real time.
This object is achieved according to the invention with a computer-implemented quality evaluation method intended for the evaluation of the quality of an image for OCR processing, the method comprising the steps of: a) processing the image into a text image comprising a number of text blobs; b) classifying the text blobs in the text image into first and second types of text blobs on the basis of a calculation determining whether they belong to a foreground layer or a background layer in the OCR processing; and c) generating an image quality value based on the classified text blobs. By generating the quality value on the basis of classified text blobs, images within the image, which are not relevant for OCR, are not taken into account in the evaluation of the image quality. In addition, the quality evaluation criterion is based on prior knowledge of the OCR system, since it is based on the division of blobs into a foreground layer and a background layer, which gives a good indication of OCR accuracy. In one embodiment, step b) comprises: b1) calculating a text compression cost and an image compression cost for each text blob; b2) calculating the ratio of the text compression cost to the image compression cost for each text blob; b3) comparing said ratio to a predetermined threshold to determine whether said ratio is less than the predetermined threshold; b4) classifying said text blob as a blob of the second type if the ratio is less than the predetermined threshold; and b5) classifying said text blob as a blob of the first type if the ratio is not less than the predetermined threshold.
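Steps b1) to b5) above can be sketched as follows. This is a minimal illustration, not the patented implementation: the dictionary keys, the default threshold value, and the direction of the ratio (taken literally from the claim wording) are all assumptions.

```python
def classify_text_blobs(blobs, threshold=1.0):
    """Classify each text blob as first type or second type, following
    steps b1)-b5).  `blobs` is a list of dicts carrying precomputed
    compression costs; the key names and default threshold are
    illustrative assumptions, not values given in the patent."""
    first_type, second_type = [], []
    for blob in blobs:
        cost_text = blob["cost_foreground"] + blob["cost_mask"]  # b1) cost(text)
        cost_image = blob["cost_background"]                     # b1) cost(image)
        ratio = cost_text / cost_image                           # b2)
        if ratio < threshold:                                    # b3)
            second_type.append(blob)                             # b4)
        else:
            first_type.append(blob)                              # b5)
    return first_type, second_type
```

A usage sketch: blobs whose text-compression cost is small relative to their image-compression cost end up in the second type, the others in the first type.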
In a preferred embodiment, step b1) comprises: b11a) calculating a background compression cost for each text blob; b12a) calculating a foreground compression cost for each text blob; and b13a) calculating a mask compression cost for each text blob. Preferably, step b11a) comprises the calculation of a sum of the squares of the gray scale differences between a target pixel in said text blob and the eight pixels touching it, step b12a) comprises the calculation of a sum of the squares of the gray scale differences between the gray scale value of a target pixel in said text blob and an average gray scale value of the pixels in the text blob, and step b13a) comprises the calculation of a perimeter of said text blob. In an alternative preferred embodiment, step b1) comprises: b11b) calculating a surrounding compression cost for each text blob; b12b) calculating a foreground compression cost for each text blob; and b13b) calculating a mask compression cost for each text blob. Preferably, step b11b) comprises the calculation of a sum of the squares of the gray scale differences between surrounding pixels and an average color of the surrounding pixels multiplied by a preset factor, the surrounding pixels being background pixels near an edge of the text blob, step b12b) comprises the calculation of a sum of the squares of the gray scale differences between the gray scale value of a target pixel in said text blob and an average gray scale value of the pixels in the text blob, and step b13b) comprises the calculation of a perimeter of said text blob. In one embodiment, step c) comprises the calculation of the ratio of the number of blobs of the first type to the total number of text blobs. In one embodiment, the image is a color image and step a) includes processing the image to form a gray scale image. In this embodiment, a quality value can also be derived for color images.
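The mask compression cost of steps b13a)/b13b) is the blob's perimeter. A minimal sketch, assuming the blob is represented as a 0/1 grid over its bounding box (a simplification of the patent's blob representation) and that the perimeter is counted as 4-neighbour edges leaving the blob:

```python
def mask_cost_perimeter(mask):
    """Mask compression cost of one text blob, taken as its perimeter
    (steps b13a/b13b): the number of blob-pixel edges that border a
    non-blob pixel or the grid boundary.  `mask` is a 2-D list of 0/1
    values covering the blob's bounding box."""
    h, w = len(mask), len(mask[0])
    perimeter = 0
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            # Count the 4-neighbour edges that leave the blob.
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if ny < 0 or ny >= h or nx < 0 or nx >= w or not mask[ny][nx]:
                    perimeter += 1
    return perimeter
```

For example, a single pixel has perimeter 4 and a 2x2 square has perimeter 8.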
In one embodiment, step a) comprises: a1) binarizing the image to form a binary image; and a2) generating the text image from the binary image. In a preferred embodiment, step a2) comprises: a21) identifying blobs in the binary image; and a22) classifying each blob as one of a picture element and a text element. In another preferred embodiment, step a22) comprises: classifying each blob as a picture element if the surface of said blob is too large or too small compared to predefined thresholds. In another preferred embodiment, step a22) further comprises: calculating a stroke of each blob; and classifying each blob as a picture element if the stroke of said blob is too large compared to a predefined threshold. In another preferred embodiment, step a22) further comprises: calculating a width and a height of each blob; and classifying each blob as a picture element if at least one of the width and the height of said blob is too large compared to a predefined threshold. In an advantageous embodiment, the image (1) is divided into at least two image tiles, and a quality value of each image tile is generated according to the method described above. In this advantageous embodiment, it is possible to identify tiles of higher quality and of lower quality in the image. Another object of the present invention is to provide a quality evaluation method for evaluating the quality of an image of a composite document which can be executed in real time. This object is achieved according to the invention with a computer-implemented quality evaluation method intended for the evaluation of the quality of a composite document image, the method comprising the steps of: i) separating the composite document image into a text image and a picture image; and ii) generating a quality value of the composite document image by evaluating the quality of the text image.
By generating the quality value based on the text elements in the image, what is not relevant for OCR is not taken into account in the evaluation of the image quality. The amount of data to be processed is therefore reduced, which allows the execution of the method in real time. The invention will be described in detail below with reference to the accompanying drawings. Figure 1a represents a high quality image, taken by a camera, of a composite document. Figure 1b represents the mask image identified by the intelligent high quality compression (iHQC) described in EP-A-2 800 028 (or US 2008/0273807 A1) for the high quality image in Figure 1a. Figure 2a represents a low quality image, taken by a camera, of the same composite document as in Figure 1a. Figures 2b and 2c respectively represent a mask image identified by iHQC and a background image identified by iHQC for the low quality image in Figure 2a. Figure 3 shows a general flow diagram of an image quality evaluation method according to the present invention. Figures 4a to 4c respectively represent a gray scale camera image of a composite document, a binary image of this document using the Sauvola method and a binary image of this document using the multi-scale Sauvola method. Figures 5a to 5c respectively represent a color camera image of a color document, a binary image of this document using the Sauvola method, and a binary image of this document using the multi-scale Sauvola method. Figure 6 represents a detailed flow diagram of the image and text separation step used in the flow diagram of Figure 3. Figure 7 shows a detailed flow diagram of the text blob classification step used in the flow diagram of Figure 3. Figure 8 represents the result of applying the method illustrated in Figure 3 independently to multiple tiles of an image.
The present invention will be explained below by describing particular embodiments with reference to the accompanying drawings, but the invention is only limited by the appended claims. The drawings described are purely schematic and in no way limitative. In the drawings, the size of some elements may be exaggerated and not to scale for illustrative purposes. The dimensions and the relative dimensions do not necessarily correspond to actual reductions to practice of the invention. Furthermore, the terms "first", "second", "third" and so on in the description and in the appended claims are used to distinguish between similar elements and not necessarily to describe a sequential or chronological order. The terms are interchangeable under appropriate circumstances and the embodiments of the invention may operate in sequences other than those described or illustrated herein. As used herein, the term "color image" is intended to mean a raster image in color, that is, a mapping of pixels in which each pixel represents a color value. As used herein, the term "gray scale image" is intended to mean a mapping of pixels in which each pixel represents an intensity value. As used herein, the term "binary image" is intended to mean a two-tone image, for example a black and white image, that is, a mapping of pixels in which each pixel represents a binary value (all or nothing, 1 or 0, black or white). As used herein, the term "binarization" is intended to refer to an operation that transforms a color image or a gray scale image into a binary image. As used herein, the term "text image" is intended to mean a binary image comprising only text elements. A known compression method used in connection with OCR for scanned document images is the intelligent high quality compression (iHQC) method described in EP-A-2 800 028 (or US 2008/0273807 A1). An essential component of iHQC is the implementation of a Mixed Raster Content (MRC) model of ITU-T T.44.
According to this model, the document image is divided into layers: a binary mask image layer, a foreground image layer and a background image layer. As used herein, the term "mask image" is intended to refer to the binary image generated in the iHQC process from foreground objects, the foreground objects generally being text elements in black and white. A pixel that is ON in the binary mask layer indicates that, when decompressing, the color must be taken from the foreground layer. A pixel that is OFF in the binary mask layer indicates that, when decompressing, the color should be taken from the background layer. As used herein, the term "foreground image" is intended to refer to the color image generated in the iHQC process which represents the color information of the foreground objects. As used herein, the term "background image" is intended to refer to the color image generated in the iHQC process from the background objects, the background objects usually being colors of background elements or images. As used herein, the term "blob" is intended to refer to a region of connected pixels, that is, pixels with the same value (e.g. 0 or 1), in a binary image. In the iHQC process, the foreground image and the background image are compressed using JPEG2000, while the mask image is compressed using JBIG2. OCR can then be performed on the mask image, which must contain all the text elements. However, when the iHQC method is applied to camera document images, the separation results depend on the image quality, as illustrated in Figures 1a, 1b and 2a to 2c. Figure 1a represents a high quality image taken from the publicly available DIQA database (J. Kumar, P. Ye, and D. Doermann, "DIQA: Document image quality assessment datasets", http://lampsrv02.umiacs.umd.edu/projdb/project.php?id=73) and Figure 1b represents the mask image identified by the iHQC method.
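The decompression rule of the MRC model described above can be sketched as follows; this is a simplified illustration (plain 2-D lists of gray levels rather than compressed JPEG2000/JBIG2 layers):

```python
def recompose_mrc(mask, foreground, background):
    """Recompose one MRC pixel layer choice: where the binary mask is
    ON the color is taken from the foreground layer, where it is OFF
    from the background layer.  All three inputs are 2-D lists of
    equal size."""
    return [
        [fg if on else bg
         for on, fg, bg in zip(mask_row, fg_row, bg_row)]
        for mask_row, fg_row, bg_row in zip(mask, foreground, background)
    ]
```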
It is obvious that the mask image contains all the text elements, that is to say all the alphanumeric characters, and that it is usable for OCR. Figure 2a represents a low quality image, that is to say a blurred image, of the same document as the one used for the image in Figure 1a (the image is again taken from the DIQA database). Figures 2b and 2c respectively represent the mask image and the background image identified by the iHQC method. Obviously, since the image is blurred, many text elements have been classified as picture elements and have been placed in the background image. The mask image therefore does not contain all the text elements and cannot be used for OCR. Based on the example above, if the majority of the text elements are placed in the mask image and only a few text elements are placed in the background image, the image has good quality. Preferably, for a perfect quality image, no text element is placed in the background image. Based on this general rule, a quality evaluation method 100 is derived, as shown in Figure 3, which can be used for the evaluation of the quality of color or gray scale input images to predict the accuracy of processing the input image using OCR. In step 110, an input color image 1 is preprocessed. This preprocessing can include noise reduction, image enhancement, image deconvolution, color image transformation, etc. In a preferred embodiment, when the input image 1 is a color image, it is transformed into a gray scale image 3. Alternatively, if the input image 1 is already a gray scale image, this transformation can be skipped. In step 120, the gray scale image 3 is binarized. In the art, several binarization methods are known, such as the Sauvola method proposed by J. Sauvola and M. Pietikainen, "Adaptive document image binarization", Pattern Recogn. 33, pp. 225-236, 2000; or the multi-scale Sauvola method proposed by G. Lazzara and T.
Géraud, "Efficient multiscale Sauvola's binarization", Springer-Verlag Berlin Heidelberg, 2013; or the binarization method using an adaptive local algorithm described in EP-A-2 800 028. The result of step 120 is a binary image 5. In step 130, the binary image 5 is separated into a text image 7 and a picture image. This separation is performed on the basis of a number of rules for filtering text blobs in a binary image and its connected components, as will be described below. In an alternative embodiment, the text-image separation can also be carried out without using filtering rules, for example by using a multi-resolution morphological method, as will be described below. In step 140, blobs are identified in the text image 7 and each blob is classified into a first type and a second type of blob, in this case blobs prone to text compression and blobs prone to image compression. This classification is based on the cost of compression if the blob is compressed as a text element and the cost of compression if the blob is compressed as a picture element. In step 150, a quality evaluation value is calculated based on the classification of the blobs. As mentioned above, as a general rule in the iHQC process, if the majority of the blobs in a text image are placed in the mask image, i.e. if they are classified as text elements, that is blobs prone to text compression, and only some of the blobs are placed in the background image, i.e. they are classified as picture elements, that is blobs prone to image compression, then the image has good quality. The ratio of the number of blobs prone to text compression to the total number of blobs can therefore be considered as a criterion of document image quality. The quality evaluation method 100 has the advantage that, because images are removed, the quality evaluation is only performed on a text image.
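The quality criterion of step 150 can be sketched directly from the ratio described above. A minimal sketch; the label names are illustrative assumptions:

```python
def quality_value(blob_types):
    """Step 150: quality value as the ratio of the number of blobs
    prone to text compression to the total number of text blobs.
    `blob_types` is a list of 'text'/'image' labels produced by the
    classification of step 140 (label names are illustrative)."""
    if not blob_types:
        return 0.0  # no text blobs: nothing usable for OCR
    text_prone = sum(1 for t in blob_types if t == "text")
    return text_prone / len(blob_types)
```

For example, three text-prone blobs out of four give a quality value of 0.75.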
Images in the document image that are not relevant to OCR are not taken into account in the assessment of the quality of the document image. In addition, the quality evaluation criterion is based on prior knowledge of the OCR system, since it is based on the cost of compression. It therefore provides a good indication of the accuracy of OCR, as will be explained below. In particular, the quality evaluation method 100 mainly focuses on one aspect of image deterioration, namely the blurring effect. Other deterioration factors, such as geometric distortion, high noise and low contrast, are not directly taken into account in the quality evaluation value delivered by this method. Figures 4a to 4c and 5a to 5c illustrate the binarization of step 120 using the Sauvola method and the multi-scale Sauvola method, respectively on a camera image of a composite document and on a camera image of a color document. The Sauvola method is a known image binarization method based on the idea that each pixel in the image must have its own binarization threshold, which is determined by local statistics and configurations. The threshold formula is

T(x, y) = m(x, y) * [1 + k * (s(x, y) / R - 1)]   (1)

where m(x, y) is the average of the intensity of the pixels inside a w*w window, s(x, y) is the standard deviation of the intensity of the pixels inside the w*w window, R is the maximum standard deviation, and k is an arbitrary constant in a range of 0.01 to 0.5, this range being established on the basis of empirical results and already being known to those skilled in the art. To obtain good binarization results, the parameters w and k must be carefully adjusted. A general rule for adjusting these two parameters is: w must be set larger than the character stroke, and k must be set smaller in text regions and relatively large in regions without text. However, since the stroke of the text characters and the location of the text characters are unknown, adjusting these parameters is not easy.
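Equation (1) can be sketched for a single window of pixels. A minimal sketch: the default values k = 0.2 and R = 128 are common illustrative choices, not values fixed by the patent (which only states that k lies in 0.01-0.5 and that R is the maximum standard deviation):

```python
from statistics import mean, pstdev

def sauvola_threshold(window_pixels, k=0.2, R=128):
    """Sauvola threshold (equation (1)) for the pixel at the centre of
    a w*w window: T = m * (1 + k * (s / R - 1)), where m and s are the
    mean and (population) standard deviation of the window intensities.
    `window_pixels` is a flat list of the w*w gray levels."""
    m = mean(window_pixels)
    s = pstdev(window_pixels)
    return m * (1 + k * (s / R - 1))
```

On a flat window (s = 0) the threshold drops to m * (1 - k), so dark uniform regions are not binarized as text; on high-contrast windows (s close to R) the threshold approaches the local mean.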
For example, when w is too small compared to the stroke of the text characters, the Sauvola method generates a binary image 5 with hollow text characters, as shown in Figure 4b. In addition, when k is set too small, the delivered binary image 5 includes noise in its background. When k is set too large, the delivered binary image 5 has a clear background, and therefore certain text elements can no longer be recovered. In a preferred embodiment, the binarization in step 120 is performed on the basis of the multi-scale Sauvola method, which was designed to overcome the problem of setting w. The basic idea of this method is that, instead of using a fixed window size for all the pixels in the image, each pixel in the image has its own adaptive window. If the pixel is part of a large text character, a large window is used; otherwise a smaller window is used. The multi-scale Sauvola method thus makes it possible to completely recover foreground objects without generating hollow characters, as is clearly shown in Figures 4c and 5c. Another advantage of using the multi-scale Sauvola method is the generation of large blobs for picture elements, which facilitates the separation of text and picture in step 130, as will be described below. This advantage becomes clear when comparing Figures 5b and 5c, arising respectively from the Sauvola method and the multi-scale Sauvola method. It clearly appears that, in the multi-scale Sauvola binarization method, image blobs are merged into large blobs (cf. Figure 5c), and that, in the Sauvola binarization method, image blobs are more fragmented (cf. Figure 5b). Since large image blobs are easier to separate from text blobs, the multi-scale Sauvola method has an obvious advantage. Those skilled in the art can appreciate that other binarization methods can also be used in place of or in addition to the Sauvola method and the multi-scale Sauvola method. Figure
6 represents a detailed flow diagram of the separation of text and image in step 130. In step 131, the input binary image 5 is analyzed with a connected component analysis method and a plurality of blobs 9 are identified. As mentioned above, each blob 9 comprises a region of connected pixels, that is to say pixels having the same value, for example 0 or 1. In step 132, the number of pixels of each blob 9, that is to say the surface value of a particular blob, is calculated and compared with predefined thresholds. If the number of pixels in a blob 9 is too large or too small, the blob 9 is classified as a picture element in step 138. One of the predefined thresholds is based on the empirical observation that the binarization in step 120 (cf. Figure 3) generates large blobs for images. As described above, the multi-scale Sauvola method is well suited for generating these large image blobs. Another of the predefined thresholds is based on the empirical observation that blobs with a small number of pixels usually belong to noise. Noise is often generated by fragmented picture elements. In step 133, the blobs 9 whose strokes are categorized as being thick, that is to say bold blobs, are classified as picture elements in step 138. A predefined value for categorizing a stroke as thick is based on the empirical evidence that the stroke of normal text is generally thin, and that thick strokes often appear in images. In a preferred embodiment, the predefined value to which the strokes are compared can be 16 pixels. However, it is possible that large text characters, such as titles, may be considered bold blobs. Therefore, in a preferred embodiment, a threshold is established to determine whether the blob is large enough to be considered as a picture element. In step 134, blobs 9 whose width or height is too large are classified as picture elements in step 138.
Predefined values for categorizing a blob as too wide and/or too tall are based on the empirical observation that, when the width or height of a particular blob exceeds a certain threshold, there is a high probability that the particular blob belongs to a picture. In a preferred embodiment, the predefined value to which the widths and/or heights are compared can be 128 pixels. Blobs 9 that were not classified as picture elements are classified as text elements in step 137. These blobs can now be used to generate the text image in step 139. Those skilled in the art can appreciate that the order of the steps can also be changed. For example, steps 132 to 134 can be applied in any possible sequence. In addition, other and/or additional text blob filtering rules can also be used. Those skilled in the art can appreciate that other methods can also be applied for the separation of text and picture elements in a binary image 5. For example, in one embodiment of the invention, a multiresolution morphology approach can be used to analyze the layout of the document image, to remove picture elements, and to keep text content for further processing (D. S. Bloomberg, "Multiresolution morphological approach to document image analysis", International Conference on Document Analysis and Recognition, pp. 963-971, Saint-Malo, France, Sept. 30 - Oct. 2, 1991). Figure 7 shows a detailed flow diagram of the text blob classification in step 140. In step 141, the input text image 7 is analyzed and a plurality of text blobs 11 are identified. Each text blob 11 comprises a region of connected pixels, that is to say pixels having the same value, for example 0 or 1. In one embodiment, step 141 is not executed and the blobs 9 which have been classified as text elements in step 137 are used as text blobs 11.
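As an illustration, the preprocessing described above (binarization in step 120 and the size-based filtering of steps 131 to 134) can be sketched as follows. This is a simplified sketch, not the patented implementation: it uses the single-scale Sauvola threshold rather than the multi-scale variant, omits the stroke-width test of step 133, and the `min_area` and `max_area` values are illustrative (only the 128-pixel side limit is taken from the text above):

```python
import numpy as np
from scipy import ndimage

def sauvola_binarize(gray, w=15, k=0.2, R=128.0):
    """Step 120, single-scale variant: T = m * (1 + k * (s / R - 1)),
    with m and s the local mean and standard deviation in a w x w window.
    Returns 1 for foreground (text) pixels, 0 for background."""
    gray = gray.astype(np.float64)
    m = ndimage.uniform_filter(gray, size=w)          # local mean
    m2 = ndimage.uniform_filter(gray ** 2, size=w)    # local mean of squares
    s = np.sqrt(np.clip(m2 - m ** 2, 0.0, None))      # local std deviation
    return (gray <= m * (1.0 + k * (s / R - 1.0))).astype(np.uint8)

def separate_text_blobs(binary, min_area=4, max_area=2000, max_side=128):
    """Steps 131-134: label connected components and discard blobs that are
    too small (noise), too large, or too wide/tall (picture elements).
    Returns the label image and the ids of blobs kept as text (step 137)."""
    labels, _ = ndimage.label(binary)
    text_ids = []
    for i, sl in enumerate(ndimage.find_objects(labels), start=1):
        height = sl[0].stop - sl[0].start
        width = sl[1].stop - sl[1].start
        area = int((labels[sl] == i).sum())
        if min_area <= area <= max_area and width <= max_side and height <= max_side:
            text_ids.append(i)
    return labels, text_ids
```

For a document photograph, `separate_text_blobs(sauvola_binarize(gray))` would yield the candidate text blobs that feed the classification of step 140.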
In step 142, a compression cost is calculated for each text blob 11. The compression cost is derived from the principle of minimum description length, with which it is determined whether the blob is better compressed as part of the foreground layer or as part of the background layer. If it is determined that the blob is in the background, it is compressed in the background image. If it is determined that the blob is in the foreground, it is compressed in the bit mask image and in the foreground image. Consequently, two compression costs are calculated, namely a first cost, cost(image), in which the text blob 11 is compressed as a background element (equation (2)), and a second cost, cost(text), in which the text blob 11 is compressed as a mask element and foreground element (equation (3)):

cost(image) = cost(background) (2)

cost(text) = cost(foreground) + cost(mask) (3)

To calculate the compression costs, gray scale pixel values are required. Therefore, as illustrated in Figure 7, the gray scale image 3, which was used as input for the binarization in a previous step, is also retrieved as input for step 142. In one embodiment, the cost(background) is estimated as the sum of the errors between the pixel colors and the local average color, the local average color referring to the average color of the background pixels surrounding the particular pixel, i.e. the eight pixels that touch it. This estimate is based on the assumption that the color changes smoothly in the background image. In one embodiment, the cost(foreground) is estimated as the sum of the errors between the pixel colors and the average color, the average color referring to the average color of the foreground pixels inside the particular blob of which the pixel is a part. This estimate is based on the assumption that all the pixels of a blob have the same color for all the foreground objects.
In one embodiment, the cost(mask) depends on the perimeter of the blob and it can be estimated by the following formula:

cost(mask) = perimeter * factor, (4)

where "factor" is a predefined value which is determined empirically. In alternative embodiments, alternative compression cost estimation methods may be used. In another embodiment, the square of the errors is calculated before making the sums, that is to say that the cost(background) is estimated as the sum of the squared errors between the pixel colors and the local average color, and the cost(foreground) is estimated as the sum of the squared errors between the pixel colors and the average color, the average color referring to the average color of the foreground pixels inside the particular blob of which the pixels are a part. In another embodiment, the pixels surrounding the blob are also taken into account. A pixel surrounding the blob is a background pixel near the edge of the blob. Equation (3) then becomes:

cost(text) = cost(foreground) + cost(surrounding) + cost(mask) (3')

where the cost(surrounding) is the sum (or the sum of the squares) of the errors between the surrounding pixels and the average color of the surrounding pixels, multiplied by a factor, f_s, which is determined empirically using image samples. This estimate is based on the assumption that all the pixels surrounding a blob must have the same color. This is often true for text elements printed on a homogeneous background. The surrounding pixels are normally the background pixels that touch the edges of the blob. However, it has been found that it is better to take, as surrounding pixels, the background pixels which are one pixel away from the edges of the blob. This is explained by the fact that the colors of these pixels are more uniform at this distance. In step 143, for each text blob 11, a ratio of the two compression costs, that is to say the cost(text) and the cost(image), is calculated.
Equation (5) summarizes how the ratio is calculated:

ratio = cost(text) / cost(image). (5)

According to equation (5), a small cost(image) and/or a large cost(text) gives a large ratio, and vice versa. To determine whether a text blob 11 is better classified as a blob prone to text compression or as a blob prone to image compression, the ratio calculated in step 143 is compared to a threshold value. This comparison is made in step 144. As indicated in equation (6), if the ratio is below the threshold, the text blob 11 must be compressed as if it were an image element. The text blob 11 is then classified as a blob prone to image compression in step 145. Alternatively, if the ratio is greater than or equal to the threshold, the text blob 11 is compressed as if it were a text element. The text blob 11 is then classified as a blob prone to text compression in step 146:

ratio < threshold → blob prone to image compression
ratio ≥ threshold → blob prone to text compression. (6)

In one embodiment, the threshold value is set between 0 and 1, preferably between 0.45 and 0.6, and more preferably at 0.55. This value is determined on the basis of empirical experiments. As described above, in step 150, a quality evaluation value is calculated based on the classification of the text blobs 11. In particular, as indicated in equation (7), the ratio of the number of blobs prone to text compression over the total number of text blobs 11, that is to say
the sum of the number of blobs prone to text compression and the number of blobs prone to image compression, is considered an assessment of document image quality:

quality = number(blobs prone to text compression) / [number(blobs prone to text compression) + number(blobs prone to image compression)] (7)

As described above, this quality evaluation value is based on the empirical observation that, in the iHQC process, for a good quality image, it is more economical to compress text objects as text elements than to compress them as picture elements. In the iHQC process, for a good quality image, the majority of text objects are compressed as text elements, while for a low quality image, the majority of text objects are compressed as background picture elements. The ratio of the number of blobs prone to text compression over the total number of text blobs 11 is therefore a good indication of the quality of the document image. In particular, if the quality is high, the majority of text blobs 11 are blobs prone to text compression whereas, if the quality is low, the majority of text blobs 11 are blobs prone to image compression. In an alternative embodiment, the quality evaluation criterion comprises several image quality values as a function of the location of the text blob 11 in the text image 7. These local image quality values can be defined as described above. The present embodiment is illustrated in Figure 8, arising from the application of the quality evaluation criterion independently to several patches of an input image 1. It appears that, in a region of low image quality, the quality evaluation value is low while, in a region of high image quality, the quality evaluation value is high. Specifically, from left to right and from top to bottom, the image quality values are respectively 0.833, 0.762, 0.644, 1, 0.968, 0.945, 1, 1 and 1. Local image quality values may be advantageous for some applications.
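A much-simplified sketch of equations (2) to (7) follows. The cost estimates are crude stand-ins for the embodiments described above: the local background average is approximated by a 3×3 neighbourhood mean, the surrounding-pixel term of equation (3') is omitted, and `mask_factor` is a hypothetical value for the empirical "factor" of equation (4):

```python
import numpy as np
from scipy import ndimage

def blob_costs(gray, blob_mask, mask_factor=1.5):
    """Equations (2)-(4): return (cost(image), cost(text)) for one blob."""
    gray = gray.astype(np.float64)
    pix = gray[blob_mask]
    # cost(foreground): squared error against the blob's average gray value
    cost_fg = float(((pix - pix.mean()) ** 2).sum())
    # cost(background): squared error against a local average, here
    # approximated by a 3x3 neighbourhood mean
    local_mean = ndimage.uniform_filter(gray, size=3)
    cost_bg = float(((pix - local_mean[blob_mask]) ** 2).sum())
    # cost(mask): proportional to the blob perimeter, as in equation (4)
    perimeter = int(blob_mask.sum() - ndimage.binary_erosion(blob_mask).sum())
    return cost_bg, cost_fg + perimeter * mask_factor

def quality_value(ratios, threshold=0.55):
    """Equations (6)-(7): classify each ratio, return the quality value."""
    n_text = sum(r >= threshold for r in ratios)  # blobs prone to text compression
    return n_text / len(ratios)
```

The ratio of equation (5) is then `cost_text / cost_image` for each blob, compared with the 0.55 threshold as in equation (6); `quality_value` applies equations (6) and (7) to the resulting ratios.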
For example, local quality values can be used when, for the same document, several images have been acquired. It is possible to carry out the proposed method on each patch of the different images and to recover a matrix of image quality values indicating local image qualities. Using this matrix, it is possible to select the image patches having the highest local quality. These image patches can then be merged into an artificial image having a better quality than any of the acquired images. In certain embodiments of the present invention, the quality evaluation method 100 is implemented on a portable device, such as a smartphone, a camera, a tablet, etc. In this case, the method 100 can be used in real time to ask a user to take another image of a document when the quality evaluation value is considered not high enough to allow good OCR accuracy. Alternatively or in addition, the portable device can automatically take several images of a document and decide, using the quality evaluation method 100, which of the various images has a sufficiently high quality. If none of the images is of sufficiently high quality, the portable device can continue to take images automatically until a good quality image is obtained. In another embodiment, the quality evaluation method 100 is implemented on a computer and/or a network system to which users upload document images. The computer/network can then determine the quality of the uploaded document images using the quality evaluation method 100. If the quality is not high enough, the computer/network can ask the user to upload a new document image of higher quality. The quality evaluation method 100 was tested using the publicly available DIQA database. This database contains 25 documents which are extracted from publicly available datasets. For each document, 6 to 8 images were taken at a distance allowing the acquisition of the complete page.
The camera was focused at varying distances to generate a series of images with focal blur, including one sharp image. In total, 25 such sets, each consisting of 6 to 8 high resolution images (3264 by 1840), were created using an Android phone with an 8 megapixel camera. The dataset contains a total of 175 images. The ground truth corresponding to each image is also available. The character accuracy level for each acquired image is calculated using the ISRI-OCR assessment tool. Since the DIQA database provides images and their OCR accuracy, the evaluation of the document image quality evaluation method 100 is straightforward. The Spearman rank order correlation coefficient (SROCC) and the Pearson (linear) correlation coefficient (LCC) were used to assess the correlation between the quality evaluation value and the OCR accuracy calculated using the DRS toolkit (an IRIS OCR engine). LCC is defined as follows:

LCC(A, B) = cov(A, B) / (σ_A σ_B), (8)

where A and B are vectors of size N, and σ_A, σ_B are their corresponding standard deviations. In particular, LCC is a measure of the degree of linear dependence between two variables. SROCC is defined as a Pearson correlation coefficient between the rank variables:

SROCC(A, B) = 1 − 6 Σ_i (A'_i − B'_i)² / (N(N² − 1)), (9)

where A' and B' are the rank vectors of A and B; SROCC is a correlation measure based on ranks. LCC and SROCC can be calculated globally or locally. The global LCCs and SROCCs examine the behavior of the measurement without constraint on the target document object. Local LCCs and SROCCs are calculated when the target document is fixed. In the DIQA database, there are local LCCs and SROCCs for each document object. In total, there are therefore 25 local LCC and SROCC values. The median value of the local LCCs and SROCCs is used as a measure of overall performance.
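The two correlation measures of equations (8) and (9) can be sketched directly (assuming no tied ranks, so that the simplified Spearman form of equation (9) applies):

```python
import numpy as np

def lcc(a, b):
    """Equation (8): Pearson linear correlation coefficient."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    da, db = a - a.mean(), b - b.mean()
    return float((da * db).sum() / np.sqrt((da ** 2).sum() * (db ** 2).sum()))

def srocc(a, b):
    """Equation (9): Spearman rank order correlation (no tied ranks)."""
    rank = lambda x: np.argsort(np.argsort(x)) + 1.0   # 1-based ranks
    ra, rb = rank(np.asarray(a)), rank(np.asarray(b))
    n = len(ra)
    return float(1.0 - 6.0 * ((ra - rb) ** 2).sum() / (n * (n ** 2 - 1)))
```

A monotone but nonlinear relation yields SROCC = 1 while LCC stays below 1, which is why both measures are reported.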
The median value of the local LCCs and SROCCs indicates how the quality evaluation method 100 responds when taking images of the same document under varying imaging conditions. The global LCCs and SROCCs indicate whether the quality evaluation method 100 can be extended to other documents sharing the same imaging condition. The performance measures are recorded in Tables I and II. For comparison with the method of the present invention described above, three standard quality evaluation methods were used:

- Laplacian maximum: in this method, Laplacian filtering is performed on the input image, then the maximum value of the filtered image is taken as the quality evaluation value;
- Laplacian standard deviation: in this method, Laplacian filtering is performed on the input image, then the standard deviation of the filtered image is taken as the quality evaluation value;
- Entropy: in this method, the entropy of the image is calculated using an image histogram.

Table I: LCC performance

                               Median local LCC    Global LCC
Laplacian maximum              0.7554              0.5917
Laplacian standard deviation   0.7846              0.5601
Entropy                        0.6950              0.0856
Method of the invention 100    0.9488              0.8523

Table II: SROCC performance

                               Median local SROCC  Global SROCC
Laplacian maximum              0.7143              0.6515
Laplacian standard deviation   0.9048              0.7059
Entropy                        0.5714              0.0458
Method of the invention 100    0.9375              0.8498

As indicated in Tables I and II, the quality evaluation method 100 has an obvious advantage over the standard methods due to a higher correlation with the OCR accuracies. Similar tests were performed on low resolution images since, in real life applications, low resolution images are often acquired. For each of the high resolution images in the DIQA database, the spatial resolution was reduced by 2 in the horizontal direction and in the vertical direction, i.e. a 4:1 subsampling was used.
The tests confirmed that the quality evaluation method 100 also works on reduced resolution images. The above tests were carried out on a Windows 7 computer with an Intel i7-3630QM CPU at 2.4 GHz and 20 GB RAM, running Win32. The average execution time of the quality evaluation method 100 was 265 milliseconds for an image of the high resolution DIQA database and 64 milliseconds for an image of the reduced resolution DIQA database. Although aspects of this disclosure have been described in connection with specific embodiments, it should be appreciated that these aspects can be implemented in other forms.
Claims (15)

1. A computer implemented quality evaluation method intended for evaluating the quality of an image (1) for OCR processing, the method comprising the steps of:
a) processing the image into a text image (7) comprising a number of text blobs (11);
b) classifying the text blobs (11) in the text image (7) into a first type and a second type of text blobs on the basis of a calculation indicating whether they belong to a foreground layer in OCR processing or to a background layer in OCR processing; and
c) generating a quality value of the image (1) based on the classified text blobs (11),
where step b) comprises:
b1) calculating a text compression cost and an image compression cost for each text blob (11);
b2) calculating a ratio of the text compression cost and the image compression cost for each text blob (11);
b3) comparing said ratio to a predetermined threshold to determine whether said ratio is less than the predetermined threshold;
b4) classifying said text blob (11) as a blob of the second type if the ratio is less than the predetermined threshold; and
b5) classifying said text blob (11) as a blob of the first type if the ratio is not less than the predetermined threshold.

2. A computer implemented method according to claim 1, characterized in that step b1) comprises:
b11a) calculating a background compression cost for each text blob (11);
b12a) calculating a foreground compression cost for each text blob (11); and
b13a) calculating a mask compression cost for each text blob (11).

3. A computer implemented method according to claim 2, characterized in that step b1) further comprises:
b14a) adding the foreground compression cost and the mask compression cost to calculate the text compression cost for each text blob (11); and
b15a) equating the image compression cost with the background compression cost for each text blob (11).

4.
A computer implemented method according to claim 2 or 3, characterized in that step b11a) comprises calculating a sum of the squares of gray scale differences between a target pixel in said text blob (11) and the eight pixels that touch it, in that step b12a) comprises calculating a sum of the squares of gray scale differences between a gray scale value of a target pixel in said text blob (11) and an average gray scale value of pixels in the text blob (11), and in that step b13a) comprises calculating a perimeter of said text blob (11).

5. A computer implemented method according to claim 1, characterized in that step b1) comprises:
b11b) calculating a background compression cost for each text blob (11);
b12b) calculating a surrounding compression cost for each text blob (11);
b13b) calculating a foreground compression cost for each text blob (11); and
b14b) calculating a mask compression cost for each text blob (11).

6. A computer implemented method according to claim 5, characterized in that step b1) further comprises:
b15b) adding the foreground compression cost, the mask compression cost and the surrounding compression cost to calculate the text compression cost for each text blob (11); and
b16b) equating the image compression cost with the background compression cost for each text blob (11).

7.
A computer implemented method according to claim 5 or 6, characterized in that step b11b) comprises calculating a sum of the squares of gray scale differences between the surrounding pixels and an average color of the surrounding pixels, multiplied by a predefined factor, the surrounding pixels being background pixels near an edge of the text blob (11), in that step b12b) comprises calculating a sum of the squares of gray scale differences between a gray scale value of a target pixel in said text blob (11) and an average gray scale value of pixels in the text blob (11), and in that step b13b) comprises calculating a perimeter of said text blob (11).

8. A computer implemented method according to any one of the preceding claims, characterized in that step c) comprises calculating a ratio of the number of blobs of the first type over the total number of text blobs (11).

9. A computer implemented method according to any one of the preceding claims, characterized in that the image (1) is a color image and in that step a) comprises processing the image to form a gray scale image (3).

10. A computer implemented method according to any one of the preceding claims, characterized in that step a) further comprises:
a1) binarizing the image to form a binary image (5); and
a2) separating the text elements from the binary image (5) to form a text image (7).

11. A computer implemented method according to claim 10, characterized in that step a2) comprises:
a21) identifying blobs (9) in the binary image (5); and
a22) classifying each blob (9) as one of an image element and a text element.

12. A computer implemented method according to claim 11, characterized in that step a22) comprises: classifying each blob (9) as an image element if the area of said blob (9) is too large or too small compared to predefined thresholds.

13.
A computer implemented method according to claim 11 or 12, characterized in that step a22) further comprises: calculating a stroke width of each blob (9); and classifying each blob (9) as an image element if the stroke of said blob (9) is too thick compared to a predefined threshold.

14. A computer implemented method according to any one of claims 11 to 13, characterized in that step a22) further comprises: calculating a width and a height of each blob (9); and classifying each blob (9) as an image element if at least one of the width and the height of said blob (9) is too large compared to a predefined threshold.

15. A computer implemented quality evaluation method intended for evaluating the quality of an image (1), the method comprising the steps of: dividing the image (1) into at least two image patches; and generating a quality value of each image patch according to the method of any one of the preceding claims.

16. A computer implemented quality evaluation method according to any one of the preceding claims, wherein the first type of text blob is a blob prone to text compression and the second type of text blob is a blob prone to image compression.
[Drawings: flow diagrams of the quality evaluation method 100 — color image input, preprocessing 110, gray scale image 3, binarization 120, binary image 5, separation of text and image 130 (with blob classification as text element 137 and text image generation 139), text image 7, text blob classification 140 (classification as blob prone to text compression 146 or blob prone to image compression 145), and calculation of the quality evaluation value 150.]
Similar technologies:
公开号 | 公开日 | 专利标题 Bianco et al.2018|On the use of deep learning for blind image quality assessment US10949952B2|2021-03-16|Performing detail enhancement on a target in a denoised image US10395393B2|2019-08-27|Method for assessing the quality of an image of a document Kirchner et al.2010|On detection of median filtering in digital images Dudhane et al.2019|Ri-gan: An end-to-end network for single image haze removal RU2538941C1|2015-01-10|Recognition quality enhancements by increasing image resolution EP2880623B1|2016-12-14|Method and device for reconstructing super-resolution images Chen et al.2013|A two-stage quality measure for mobile phone captured 2D barcode images US20130271616A1|2013-10-17|Method of analyzing motion blur using double discrete wavelet transform Alaei et al.2015|Document image quality assessment based on improved gradient magnitude similarity deviation CN108369649B|2021-11-12|Focus detection BE1022635B1|2016-06-22|METHOD AND SYSTEM FOR CORRECTING PROJECTIVE DISTORTIONS USING OWN POINTS Alaei et al.2018|Blind document image quality prediction based on modification of quality aware clustering method integrating a patch selection strategy Alaei et al.2016|Document image quality assessment based on texture similarity index BE1024836B1|2018-07-23|ABREGE METHOD OF EVALUATION OF THE QUALITY OF AN IMAGE OF A DOCUMENT Kerouh et al.2015|Wavelet-based blind blur reduction Chabardes et al.2015|Local blur estimation based on toggle mapping Iuliani et al.2017|A hybrid approach to video source identification Landge et al.2013|Blur detection methods for digital images-a survey Dutta et al.2018|Segmentation of meaningful text-regions from camera captured document images Bui et al.2018|Predicting mobile-captured document images sharpness quality Júnior et al.2019|A prnu-based method to expose video device compositions in open-set setups Kaur et al.2017|A novel approach to no-reference image quality assessment using canny magnitude based upon neural network 
Lawgaly2017|Digital camera identification using sensor pattern noise for forensics applications Priyanka et al.2015|A Comparative Study of Binarization Techniques for Enhancement of Degraded Documents
Patent family:
Publication number | Publication date
BE1024836A1 | 2018-07-16
BE1024836A9 | 2018-08-21
BE1024836B9 | 2018-08-29
Cited documents:
Publication number | Filing date | Publication date | Applicant | Patent title
US9418310B1 | 2012-06-21 | 2016-08-16 | Amazon Technologies, Inc. | Assessing legibility of images
Legal status:
2018-10-03 | FG | Patent granted | Effective date: 2018-07-23
Priority:
Application number | Filing date | Patent title
BE20165960A | 2016-12-22 | METHOD FOR EVALUATING THE QUALITY OF AN IMAGE OF A DOCUMENT