专利摘要:
An encoding method having a structure defined in at least a first schema inaccessible to a decoder and allowing the decoder to at least partially decode a structured document resulting from a change in at least a second schema accessible to the decoder Wherein the structured document comprises information elements surrounded by one another, wherein the information elements of the document are associated in at least first and second schemas with respective element types respectively defining respective element structures of the information elements. The first schema may not have access to the decoder and the second schema may have access to the decoder, and the first schema may contain at least one derived information element derived from a corresponding element defined in the second schema. And the encoding method comprises: defining each document of the document using the first and second schemas; Encoding said document with a binary stream comprising a binary sequence encoding said information element for an element, said first schema defining the structure of said derived element in said binary sequence encoding said derived information element. And inserting a reference pointing to the reference schema, wherein the reference pointing to the first schema is defined in a schema reference list that includes references to all schemas used to encode the document and the schema reference list is defined by the decoder. It is made accessible.
公开号:KR20030085527A
申请号:KR10-2003-7010295
申请日:2002-02-04
公开日:2003-11-05
发明作者:끌로드 쎄이라;쎄드리 띠에노
申请人:이엑스피웨이;
IPC主号:
专利说明:

Method and system for compressing structured descriptions of documents
[2] Within a few years, computer networks became the main media for communication. It's no exaggeration to say that computers are now shared networks, the operating system makes it easy for applications to exchange messages, the Internet infrastructure allows computers to discover their delegates, and applications synchronize themselves using complex algorithms. do.
[3] Against the backdrop of interoperability, generalized markup languages provide a solution for handling documents. In fact, the structure of a document plays a major role in its use. Formatting, printing and indexing the document is practically done along its structure. SGML was designed from the ground up to easily separate document representations from document structure and content. Because of its ability to encode structures, XML has drawn attention from others interested in non-literal applications. The XML audience is broader to include e-commerce, database knowledge representation groups (among others).
[4] XML and more generally markup languages are widely used to represent and structure current documents (metadata). A structured document contains several elements of information surrounded by one another. Information elements are distinguished and separated from each other by tags. The tags represent the element types of the information elements. A structured document generally includes a first information element, or basic element, separated by tags that represent the entire document and mark the beginning and end of the document. This first element contains an information subelement such as, for example, a paragraph of text. Each information subelement is distinguished by a tag indicating the start and end of the element. Tags may be associated with a tag attribute that indicates one or more characteristics of the information element.
[5] The tag content generally represents information that is intended to be displayed or manipulated by the user. The tag content may be optional or required depending on the type of tag, and may include content and attributes and other surrounding information sub-elements that are subsequently limited by the tags.
[6] A structured document may be associated with a schema that reflects the rules that a structured document must validate to be considered "valid." It contains information about default values, element and attribute types and type hierarchies. Validity ensures that the received document conforms to the schema. Thus the received document has the intended meaning. Moreover, it determines the characteristic, ie the type of each description item (information element or attributes). The XML standard includes an XML Schema language designed to specify the grammar for classes of XML documents with the same structure.
[7] But XML is a verbose language. Therefore, it is ineffectively processed and transmitted expensively. For this reason, ISO / IEC 15938-1 and, more specifically, Moving Picture Expert Group (MPEG-7), propose a binary format and a method of encoding (compressing) the description of structured documents and decoding such binary formats. This standard is further designed to handle highly structured data such as multimedia data.
[8] In order to obtain compression efficiency, this method is based on the schema analysis state. During this state, internal tables are computed to relate the binary code to the respective XML elements, types and attributes. This method requires sufficient knowledge of the same schema by the encoder and the corresponding decoder.
[9] When the schema used to encode structured documents requires extension, the best solution should be to create an extension schema that is useful for the decoder. In special cases, however, it is not possible to easily update the decoder to access the extension schema.
[1] TECHNICAL FIELD The present invention relates to the field of general computer systems, and to a method and system for compressing a structured document using a document representation in a generalized markup language such as Standard Generalized Markup Language (SGML) and eXtensible Markup Language (XML). It is about. Such a document may include media information.
[31] Various forms and preferred embodiments of the invention may be better understood by reference to the following description and the accompanying drawings.
[32] 1 is a binary format of the tree structure of a structured document in accordance with the MPEG-7 standard.
[33] 2 is a block diagram of an MPEG-7 decoder.
[34] FIG. 2A is a block diagram of details of the decoder shown in FIG. 2; FIG.
[35] 3 shows a binary format of an encoded information element according to the present invention.
[36] 4 shows element types defined for each other in a tree structure;
[37] 5 illustrates a structured document that includes information elements organized in a tree structure.
[38] -Translation of the description of the drawings-
[39] <Figure 1>
[40] 1: document
[41] 2: header
[42] 3: set of elements
[43] 31: Element header
[44] 32: element body
[45] 33: length
[46] 35: type
[47] 36: property
[48] 37: value
[49] Integer: Integer
[50] Key: key
[51] <FIG. 2>
[52] 7: Encoded Document
[53] 8: Decrypted Document
[54] 9: schema
[55] 10: decoder
[56] 11: Schema Compilation
[57] 12: Finite state automata
[58] 13: binary syntax
[59] Figure 2a
[60] 9: schema
[61] 11: Schema Compilation
[62] 12: Finite state automata
[63] 15: Realized schema
[64] 16: syntax tree
[65] 17: normalized syntax tree
[66] 3
[67] 4: replace
[68] 5: mode
[69] 32: element body
[70] 33: length
[71] 34: Content
[72] 35: type
[73] 36: property
[74] 37: value
[75] 38: Sch
[76] 41: Alternate pull lag
[77] 42: Sch substitute
[78] 43: element number
[79] <Figure 4>
[80] extention: extension
[81] extention: extension
[82] extention: extension
[83] restriction
[84] restriction
[85] extention: extension
[86] extention: extension
[87] extention: extension
[88] <Figure 5>
[89] freeze schema: freeze schema
[90] frozen schema: frozen schema
[91] no schema change: no schema change
[92] schema change: schema change
[10] Summary of the Invention
[11] It is an object of the present invention to provide a method for encoding a structured document in such a way that the document can be partially decoded even if everything necessary for the schema is not known by the decoder.
[12] Another object of the invention is to ensure forward and backward compatibility, i.e. having a structure defined in at least a first schema inaccessible to a decoder, and at least a structured structure resulting from a change in a second schema accessible to the decoder. An encoding method that enables a decoder to at least partially decode a document, wherein the structured document includes information elements surrounded by each other, wherein the information elements of the document each define a respective element structure of the information elements. Is associated with at least a first type and a second schema, wherein the first schema is inaccessible to the decoder and the second schema is inaccessible to the decoder, wherein the first schema is defined in the second schema. Provides an encoding method for defining at least one derived information element derived from the corresponding element There is a help.
[13] According to the present invention, the encoding method comprises the steps of: encoding the document into a binary stream comprising a binary sequence encoding the information element for each information element of the document using the first and second schemas; Inserting a reference to the first schema that defines the structure of the derived element in the binary sequence encoding the derived information element, wherein the reference pointing to the first schema is used to encode the document. A schema reference list is defined that includes references to all the schemas used in the schema reference list and is made accessible to the decoder.
[14] According to an aspect of the present invention, the binary sequence for encoding each element of the document includes a content field containing an encoded value of the element, and an encoded value of a length of the content field located before the content field. Contains a length field.
[15] According to another aspect of the present invention, the derived information element is associated in the first schema with a structure type restricted to the structure type of the corresponding information element in the second schema, and the binary sequence encoding the derived elements is A reference to the content field and the first schema and a reference to the structure type of the derived element is attached to the content field and defined in the second schema.
[16] According to another aspect of the present invention, the derived information element is associated in the first schema with a structure type that is extended to the structure type of the corresponding information element in the second schema, and the structure type of the derived information element is A second portion having a structure type of the corresponding information element defined in the second schema and a second portion having a structure type defined in the first schema that is specific to the derived information element, wherein the binary sequence comprises Encodes the derived elements comprising a field, wherein the content field comprises a field containing a reference pointing to the second schema and a structure type reference indicating a structure type of the corresponding element in the second schema. A field containing an encoded value of the first portion, a field containing a reference pointing to the first schema, A field containing a structure type reference indicating the structure type of a second part, and a field containing a coded value of the second part.
[17] According to another aspect of the present invention, a binary sequence encoding an information element includes a substitute field including a substitute flag indicating whether an information element is renamed, and if the substitute flag indicates a change, the element name reference field indicates the information. Include a reference to the new name of the element, and the Schema Reference field contains a reference to the schema in which the new name reference is defined.
[18] According to another aspect of the present invention, the binary sequence for encoding one information element in the encoded document includes a schema state mode field, wherein the schema state mode field corresponds to the information element in the second schema. A first state indicating that there is no change in the first schema for the element, a second state indicating that no sub-elements of the information element change in the first schema for the corresponding element in the second schema, and A third state indicating that the information element in the first schema changes for the corresponding element in a second schema, wherein the encoded information element refers to a schema when the schema state mode field is in the first state And any other change information, wherein the schema state mode field is Any sub-element of the encoded information elements when the other does not contain any change information with any schema reference.
[19] According to another aspect of the invention, the binary sequence for encoding at least one information element in the encoded document includes a schema state mode field, the schema state mode field, for the element in the second schema A first state indicating that the information element does not change in the first schema, a second state indicating that no sub-elements of the information element change in the first schema for the corresponding element in the second schema; And a third state indicating that the information element in the first schema changes with respect to the corresponding element in the second schema, and wherein the information element in the first schema changes with respect to the corresponding element in the second schema. And any sub-elements of the information element are the corresponding in the second schema. A fourth state indicating that there is no change in the first schema for an element, wherein the encoded information element includes any schema information that is different from any schema reference when the schema state mode field is in the first state; When the schema state mode field is in the second state or the fourth state, no sub-elements of the information element contain any change information other than the schema reference.
[20] According to another aspect of the invention, the schema reference list including references pointing to all schemas used to encode the structured document is inserted in a header associated with the binary stream encoding the structured document.
[21] It is another object of the present invention to have a structure defined in at least a first schema inaccessible to a decoder, and at least partially to encode a binary stream encoding a structured document resulting from a change in a second schema inaccessible to the decoder. A decryption method for decryption, the structured document comprising information elements surrounded by each other, wherein the information elements of the document each element type defining at least one element structure of each of the information elements and at least first and first elements. 2 is associated in a schema, wherein the first schema is inaccessible to the decoder and the second schema is inaccessible to the decoder, and the first schema is derived from at least one element defined in the second schema. The present invention provides a decoding method for defining a derived information element.
[22] According to another aspect of the present invention, the decoding method, a binary stream for encoding the structured document using the second schema to detect in the binary stream a binary sequence for encoding each information element of the document. Reading sequentially, detecting a reference pointing to the first schema in each detected binary sequence of encoded information elements as defined in the schema reference known by the decoder, and in the detected binary sequence. If the reference pointing to the first schema is not detected, a field containing a structure type reference pointing to the corresponding element in the second schema and a structure type of the corresponding element in the second schema and the encoded portion of the first portion Fields containing values and fields containing references pointing to the first schema Decoding the detected binary sequence according to a field comprising a structure type reference indicating the structure type of the second portion; and sequentially reading and decoding the binary stream, the sequence associated with the first schema. Skipping binary data.
[23] According to another aspect of the present invention, the binary sequence encoding each element of the document includes a content field including an encoded value of the element, and an end of the binary sequence located in front of the content field and encoding an element. It includes a length field containing the length coded value used by the decoder for determining the.
[24] According to another aspect of the present invention, the decoding method includes reading and decoding a length code value within the binary sequence including a reference indicating the first schema, and the binary of the reference pointing to the first schema. Determining a length of binary data to skip as a function of the decoded length value and the position in a sequence.
[25] According to another aspect of the present invention, the derived information element is associated in the first schema with a structure type restricted to the structure type of the corresponding information element in the second schema, and the binary sequence encoding the derived elements. Includes a content field, a reference to the content field and the first schema and a reference to the structure type of the derived element are attached to the content field and defined in the second schema.
[26] According to another aspect of the present invention, the derived information element is associated in the first schema with a structure type that is extended to the structure type of the corresponding information element in the second schema, and the structure type of the derived information element is A second portion having a structure type of the corresponding information element defined in the second schema and a second portion having a structure type defined in the first schema that is specific to the derived information element, wherein the binary sequence comprises Encodes the derived elements comprising a field, wherein the content field comprises a field containing a reference pointing to the second schema and a structure type reference indicating a structure type of the corresponding element in the second schema. A field containing an encoded value of the first portion, a field containing a reference pointing to the first schema, A field containing a structure type reference indicating the structure type of a second part, and a field containing a coded value of the second part.
[27] According to another aspect of the invention, the derived information element has a name in the first schema that changes for the name of the corresponding information element in the second schema, and the binary sequence encoding the derived element is derived A replacement field including a replacement flag indicating whether a renamed information element has been renamed, and if the replacement flag indicates a change, a schema reference field includes a reference pointing to the first schema, and the derived information in the first schema. Contains an element name reference to the name of the element.
[28] According to another aspect of the present invention, the binary sequence for encoding one information element in the encoded document includes a schema state mode field, wherein the schema state mode field corresponds to the information element in the second schema. A first state indicating that there is no change in the first schema for the element, a second state indicating that no sub-elements of the information element change in the first schema for the corresponding element in the second schema, and A third state indicating that the information element in the first schema changes for the corresponding element in a second schema, wherein the encoded information element refers to a schema when the schema state mode field is in the first state And any other change information, the schema state mode field is When the state of any sub-element of said coded information element does not include any other change information with any schema reference.
[29] According to another aspect of the invention, the binary sequence for encoding at least one information element in the encoded document includes a schema state mode field, the schema state mode field, for the element in the second schema A first state indicating that the information element does not change in the first schema, a second state indicating that no sub-elements of the information element change in the first schema for the corresponding element in the second schema; A third state indicating that the information element in the first schema changes for the corresponding element in the second schema, and the information element in the first schema changes for the corresponding element in the second schema And any sub-elements of the information element are the corresponding in the second schema. A fourth state indicating that there is no change in the first schema for an element, wherein the encoded information element includes any schema information that is different from any schema reference when the schema state mode field is in the first state; When the schema state mode field is in the second state or the fourth state, no sub-elements of the information element contain any change information other than the schema reference.
[30] According to another aspect of the invention, the schema reference list comprising references pointing to all schemas used to encode the structured document is read in a header associated with the binary stream encoding the structured document.
[93] Referring to FIG. 1, the binary format of a coded structured document 1 according to the MPEG-7 standard specifies a document header 2 specifying at least one encoding mode and at least one structured information element or set of elements 3. Include.
[94] The document header 2 encodes the following parameters.
[95] "allows_skipping": This parameter specifies whether element lengths are encoded in coded documents. This parameter may have the following values.
[96] 00- length is not encoded.
[97] 01- length is optionally encoded.
[98] 10-length encoding is essential.
[99] "allows_partial_instantiation": This parameter specifies whether structured information elements in all subtrees or documents are fully instantiated. This parameter has the following values:
[100] 0-no partial instantiation is allowed. All elements of the document appear in the encoded document.
[101] 1-partial instantiation is allowed.
[102] "allows_subtyping": This parameter specifies whether the subtree has information elements or element attributes that may be homogeneous, i.e., other possible subtypes of data type. This parameter has the following values.
[103] 0-homogeneous heterogeneity is not encoded in the subtree.
[104] 1-homozygous is allowed.
[105] In FIG. 1, the set of structured information elements 3 comprises an element set header 31 and a set of elements 32 and / or elements 3. The element set header 31 is a key having the purpose of clarifying potential ambiguity with respect to the type of element during the decryption state, and the number of occurrences of similar elements or element sets in the element 3 of the next higher hierarchy level Nb.
[106] If the schema defining the structure of the document specifies a single mandatory value for the type of element or set of elements, the value is not shown in the encoded document.
[107] Element 32 contains the following fields.
[108] A length parameter 33 specifying the length in bits of the encoded value of the element,
[109] A type (35) containing the number of data types specifying the element value and the structure of the element's attributes,
[110] Attribute values 36 of the element, ordered for example in alphabetical order,
[111] A value 37 encoding the value of the element itself, when the element has a single type such as an integer or a string, etc., this field contains one or more elements or single values having the structure of the element 32 or the set of elements 3; You may.
[112] The length parameter 33 may not exist depending on the value of the "allows_skipping" parameter in the header 2 of the encoded document. When the length is encoded within element 32, quick access to a particular element in the document is possible by using the length information to skip the previous element in the document.
[113] The fields 35, 36, 37 may not appear in the encoded element. If the possible data type of the element is unique in the document's schema, the type number 35 is not encoded. If a type not specified in the schema has attributes, then attribute values 36 are not encoded. If the element is empty, for example in case of partial instantiation of the document, the value 37 of the element is not encoded.
[114] Referring to FIG. 2, the MPEG-7 decoder 10 uses a schema, such as an XML schema, to obtain a binary syntax code 13 that is executed to decode the encoded document 7 applied at the input of the decoder 10. Includes a schema compiler 11 designed to receive and process them. The latter serves to output the decrypted document 8, for example in XML format.
[115] In FIG. 2A, schema compiler process 11 includes four states. The first state of schema realization is when to evenly type inheritance and when to resolve namespace support. This state produces the realized schema 15. The second state is when constructing a syntax tree from each complex type. The syntax tree 16 thus generated is modified to facilitate binary coding and to improve compression. The third state is when normalizing the syntax tree generated by the previous state to generate a signal for each tree node and normalized syntax trees 17. These signals are used in the next state that creates a relationship between the tree keys and the header keys appearing in the element header 31. The fourth state is when the finite state automata 12 used for the decoding process is generated.
[116] In the schema realization state, the schema is interpreted to list all the names used in the schema to give extended names to each element, attribute, type, and group of elements. These names form a so-called namespace. Extended names are constructed by concatenating namespace delimiters and item names. Realized complex types are created by replacing element references with the definition of the referenced elements in each complex type.
[117] In the syntax tree generation state, item nodes are created by associating an element name with an element type. By including relationships through group nodes, the linked item nodes make up the syntax tree created for each complex type. In special cases, the syntax trees are reduced to improve the compactness of the resulting binary format. This process is performed to simplify complex type definitions in a non-destructive way.
[118] Syntax tree normalization aims to give every group node a unique name in the syntax tree. This process is the act of ordering the nodes and eventually giving them a key. This key is required during automata configuration.
[119] The signal is generated for every element or group of elements. Every node signal is generated by concatenating its child node signals. In the case of other groups or groups of out of order nodes, the node signals are sorted and added in alphabetical order. In the case of an ordered group of nodes, node signals are added in the order of definition in the schema. Item node signals are the same as their element names.
[120] In the finite state automata generation state, complex type automatons are defined recursively by the following rules.
[121] 1. Every node in the syntax tree creates an automaton.
[122] 2. A complex type automaton is an automaton that is produced by its root node.
[123] 3. Every node automaton is created by summing its child automata. The nature of this sum depends on the nature of the node.
[124] At the end of the process, automata are realized to generate their transition encoding keys. This automata contains two type states and element transitions that connect the latter for each item node in the syntax tree. Element transitions intersect when the element appears in the input stream of the document to be decoded. The type state is triggered when a particular decoder is activated, which may be a specific data type decoder for simple types and a basic or specific type of decoder for complex types.
[125] In some cases, the schema defining the structure of the document needs to be revised and modified. If a new schema of the encoded document applied to the decoder is available, the latter can provide full forward and backward compatibility for both text and binary formats. In fact, the entire representation can be validated, and namespaces can be attached to each element and attribute. Moreover, the type of hierarchy is useful and scalability points can be made external to the application. Schema updates address many technical issues raised by scalability and compatibility. However, standard MPEG-7 does not provide any information about the schema used to decode encoded documents. Therefore, a decoder configured to use a schema cannot decrypt a document encoded with the modified schema (no backward compatibility). This problem can be solved simply by providing the decoder with information about the schema to be used before decoding the encoded document. Alternatively, such automata or syntax trees may be transformed into binary syntax code that is executed by a decoder that decrypts the document while forming the input binary stream.
[126] In special cases, the complexity caused by schema updates may be problematic. Such a case occurs when the decoder cannot be easily updated to access the modified schema (forward compatibility), for example when the number of decoders to be updated is very high or when the decoder is not designed to use a different schema. To address this particular situation it is an object of the present invention to encode and decode an information element binary format and cause the decoder to at least " partially " decode a document having a modified or extended structure for a schema known to the decoder. In providing.
[127] In this respect, the level of compatibility is selected in the encoding state. This level is expressed in terms of the set of schemas for which compatibility is to be guaranteed. The encoding process adds the redundancy necessary to ensure the expected compatibility. In extreme cases, if compatibility with one current schema is required, redundancy is removed and the resulting binary coded document follows the current schema.
[128] Basically, the encoding process is added in the document information encoded for the schema used in the encoding state. This information, including schema delimiters and length information, allows elements to be skipped within the document to be encoded using a schema that is not useful for the decoder.
[129] Schema delimiters are defined by a schema delimiter dictionary known from the decoder. For example, schema delimiters are inserted in the header 2 of the encoded document or in the configuration file used by the decoder initialization and the decoder. This schema delimiter dictionary contains all possible schema delimiters to be used by the decoder which decodes the received encoded document. This dictionary can be read by the decoder using the following binary syntax:
[130] SchemaDictionary () {Number of bits NumberOfSchemaIdentifiersUINTVLC i = 0while (i <NumberOfSchemaIdentifiers) {SchemaIdentifier [i]String i ++}}
[131] Here, "NumberOfSchemaIdentifiers" and "SchemaIdentifier []" each represent an array including the number of schema separators and schema separators in the dictionary. "UINTVLC" specifies a format for encoding unsigned integers of indefinite size. In the format "UINTVLC", integers are encoded by a negated set of integer chunks. Each chunk consists of 5 bir. The first bit of each chunk specifies whether another chunk follows the current chunk, and the last fourth bit of that chunk encodes its integer.
[132] This binary syntax indicates that the schema dictionary contains counts. The count is followed by a list of each schema delimiter in the form of a character string. The location of each schema delimiter in the list automatically defines the number of schemas that may be used in the encoded document that identifies a particular schema. The read processing of the dictionary sequentially reads the number of schemas "NumberOfSchemaIdentifiers" from the dictionary, initializes the counter i, reads the i-th value in the array "SchemaIdentifier []", and 0 loop instructions including a small step of incrementing the counter i, where i < NumberOfSchemaIdentifiers. In the following table, the dictionary contains three schema delimiters. Each schema is internally related to the number of binary schemas defined by the rank of the schema delimiter in the dictionary.
[133] Schema IdentifierImplicit Number S 1 00 S 2 01 S 3 10
[134] The length (number of bits) of the binary schema number is defined as a function of the number of schema identifiers in the dictionary and is equal to E (log2 (NumberOfSchemaIdentifiers)). E (x) is equal to the next higher integer rounded x.
[135] As shown in FIG. 3, each type number field 35 in the encoded document is associated with a schema number field 38. The schema count corresponds to the schema delimiter defined in the schema dictionary. This schema count identifies the schema for which the associated type is defined.
[136] In this way, the encoder and decoder according to the present invention may use several schemas for encoding and decoding the document.
[137] Three types of changes are considered in the XML language between schema S0 and modified schema S1. These types of changes are combined together. The first type of change is the global element name minus another global element name. Global elements are defined in the schema in a way that is independent of other elements (not surrounded by other elements). Such a change is defined in the XML schema language as follows:
[138] In XML Schema S0:
[139] ...
[140] <complexType name = "S0: t0" base = "string">
[141] ...
[142] <element name = "e0" type = "S0: t0" />
[143] And XML Schema S1:
[144] ...
[145] <element name = "e1" type = "S0: t0" substitutionGroup = "S0: e0" />
[146] ...
[147] This syntax defines two elements named "e0" and "e1", which have a coarse type "t0". Element "e1" is replaced by element "e0". The element "e1" having the value "value1" may appear in an XML document based on schema S1 as follows.
[148] ...
[149] <e1> value1 </ e1>
[150] ...
[151] According to the invention such a modification is encoded by adding a subtraction field 4 to element 32. Subtraction field 4 includes the following.
[152] A subtraction flag 41,
[153] The number of subtraction elements (43),
[154] The number of schemas 42 in which the number of subtraction elements 43 is defined.
[155] When the subtraction flag is zero, the following subtraction fields 42, 43 do not appear in the coded element.
[156] The element "e1" encoded with the schema S1 including the definition of "e1" includes the following field values as shown in FIG.
[157] [length] 1 [S1] [e1] [S0] [t0] [value1]
[158] Where [length] is the binary encoded length of the element, "1" is the value of the subtraction flag indicating subtraction, and [S1] and [S0] are the encoded numbers of schemas S1 and S0 defined by the schema dictionary. , [t0] is the encoded type number of type "t0", [value1] is the encoded value of the value "value1" of the element "e1", and [e1] is the encoded value of the element "e1". Such binary code will be decoded in accordance with the present invention by a decoder that only knows schema S0 (and thus "e0"). The decoded element can be interpreted as follows.
[159] <e0 DDL: IsSubstituted = "true"> value1 </ e0>
[160] Therefore, the decoder cannot understand the meaning of [e1] and considers that "value1" is the value of element "e0" replaced by another unknown element.
[161] The second type of change between schema S0 and new schema S1 is to derive from type "t0" by limiting the new type "t1". Such a restriction reduces the maximum number of occurrences of the at least one element or group of elements in the type definition, increases the minimum number of occurrences, changes the type t0 to floating t1, or respects the predefined limits in type t0. To define a new group of elements.
[162] Such a change is expressed in the XML schema language in the following example. XML schema S0 contains the following text syntax:
[163] <complexType name = "t0">
[164] <sequence>
[165] <element name = "e1" type = "string" maxOccurs = "unbounded" />
[166] </ sequence>
[167] </ complexType>
[168] ...
[169] <element name = "e" type = "S0: t0" />
[170] ...
[171] In XML Schema S1:
[172] ...
[173] <complexType name = "t1">
[174] <complexContent>
[175] <restriction base = "S0: t0">
[176] <sequence>
[177] <element name = "e1" type = "string" maxOccurs = "1" />
[178] </ sequence>
[179] </ restriction>
[180] </ complexContent>
[181] </ complexType>
[182] ...
[183] In this example, type "t0" is defined in the XML Schema language that includes any occurrence of element "e1". Type "t1" is defined as a restriction of type "t0" and is defined in the XML Schema language containing zero or one single element "e1" which is a special case of type "t0". Thus, type "t1" enforces type "t0" constraints and the same binary format may be used.
[184] The document to be encoded includes the following information.
[185] <S0: e xsi: type = "S1: t1">
[186] <e1> value1 </ e1>
[187] </ S0: e>
[188] This means that the document contains element "e" with type "t1" which is a subtype of expected type "t0" of element "e" defined in schema S0.
[189] If omni-directional compatibility is not required, this element may be encoded as follows.
[190] [length] 0 [S1] [t1] [value1]
[191] Here, "0" is a value of a replacement flag not set for subtraction.
[192] A decoder that knows S0 rather than schema S1 decodes the fields [length] and [S1]. Since S1 is not defined in the schema dictionary, it can be seen that other fields related to element "e" cannot be interpreted using schema S0 and skip element "e" using length information. If the decoder knows schema S1 (and thus S0), it receives the original structure of element "e".
[193] If omni-directional compatibility is required, element "e" is encoded according to the structure defined in FIG. 3 as follows.
[194] [length] 0 [S0] [t0] [value1] [S1] [t1]
[195] A decoder that knows both schemas S0 and S1 can decode all fields of the encoded element "e". Next, by comparing the length field with the length of the decoded stream, it can be seen that there are no more floats and will receive the original structure of element "e". If the decoder knows S0 but not schema S1, it will decode the first part of the fields and skip the [S1] and [t1] fields (using the length field). Thus it creates the following text tree:
[196] <S0: e xsi: type = "S0: t0" DDL: isSubtype = "true">
[197] <e1> value1 </ e1>
[198] </ S0: e>
[199] The third type of change between the old schema S0 and the new schema S1 is to derive from the type "t0" by extending the new type "t1". If forward compatibility is to be ensured, the new type "t1" must be defined in two parts, the first part has type "t0" and the second part has type "t1" which must be defined in the new schema S1.
[200] Such a change is expressed in the XML schema language in the following example. XML schema S0 contains the following text:
[201] <complexType name = "t0">
[202] <sequence>
[203] <element name = "e1" type = "string" />
[204] </ sequence>
[205] </ complexType>
[206] ...
[207] <element name = "e" type = "S0: t0" />
[208] ...
[209] In XML Schema S1, the new type "t1" is defined as follows as an extension of type "t0".
[210] ...
[211] <complexType name = "t1">
[212] <complexContent>
[213] <extension base = "S0: t0">
[214] <sequence>
[215] <element name = "e2" type = "string" />
[216] </ sequence>
[217] </ extension>
[218] </ complexContent>
[219] </ complexType>
[220] ...
[221] If omnidirectional compatibility is required, the XML document to be encoded includes the following information.
[222] <S0: e xsi: type = "S1: t1">
[223] <e1> value1 </ e1>
[224] <e2> value2 </ e2>
[225] </ S0: e>
[226] Here, value1 is part of type "t0" in type "t1". "value2" is a part specific to type "t1" and is added to type "t0". The information is encoded as follows.
[227] [length] 0 [S0] [t0] [value1] [S1] [t1] [value2]
[228] If the decoder knows schema (not S1) S0, it will decode the first part of the binary coded information and skip the t1 part (ie, [t1] [value2]) using the length information. Such processing is performed on the following text tree.
[229] <S0: e xsi: type = "S0: t0" ddl: isSubtype = "true">
[230] <e1> value1 </ e1>
[231] </ S0: e>
[232] Therefore a certain part of t1 will not be decrypted. If the decoder knows two schemes S0 and S1, it will receive the original text tree.
[233] The above description may be combined. Thus, the definition of complex hierarchies of a type that may be defined within other schemas is allowed. 4 shows an example of a hierarchy where types t5 and t6 are two extensions of type t4. Type t4 is an extension of type t3, and type t3 is a restriction of type t1. In a similar way, types t8 and t9 are restrictions and extensions of type t7, respectively. Type t7 is an extension of type t2. Type t2 is an extension of type t1. Considering that each of the types t1 to t9 is defined in each of the other schemas S1 to S9, if compatibility is required for the schemas S1, S3, S4 and S5, then an element of type t5 is encoded in the following form.
[234] [length] 0 [S1] [t1] [att1] [value1] [S3] [t3] [S4] [t4] [att4] [value4] [S5] [t5] [att5] [value5]
[235] Here, the fields [att-i] are respective encoded attribute values of type t-i. [att-i] [value-i] is the t-i part of type t5 (-i = 1, 4, 5). If compatibility is required for schemas S3 and S5, then an element of type t5 is encoded in the following form.
[236] [length] 0 [S3] [t3] [att3] [value3] [S5] [t5] [att5] [value5]
[237] Here, the fields [att3] [value3] are t1 portions of type t5 encoded according to type t3, and [att5] [value5] are t4 + t5 portions of type t5 encoded according to type t5.
[238] Of course, the aforementioned first type of change between schemas may be combined with the second and third type of change by adding replacement field 4 to the encoded element.
[239] As mentioned above omnidirectional compatibility is obtained by adding binary replacement fields (fields 41, 42, 43) and a schema number field associated with each element type to the documented document. Therefore, this solution increases the size of the resulting coded document. To optimize the encoding, it appears that parts of the encoded document are often linked to the same schema. In this respect, the invention aims at adding a field 5 to each encoded element 32 including a schema state code indicating a schema state mode, as shown in FIG. This field defines whether the element and its subelements require a specification from the same schema. If the same schema is used to encode (and decode) the element and all the latter sub-elements, the fields 38 and 42 added to ensure omni-compatibility (including schema numbers) are added to the encoded element. Is removed from. Thus increasing the resulting compression rate.
[240] The following table defines the possible values of the schema status codes.
[241] Schema Status ModeValue no change0 freeze the schema10 Change the schema11
[242] Here, "no change" means that the schema used for the decoding process of the element is the same as the schema used for the element itself. Thus, element 32 does not include schema number field 38 and alternate schema field 42.
[243] "freeze the schema" means that the schema used for the decoding process of the element and all subelements of the element (all subtrees of the element) are the same as the schema used for the element itself. Therefore, element 32 and all subelements of the element do not include any schema number fields 38 and 42 and schema state field 5.
[244] "change the schema" means that the element's schema changes. Hence schema number fields 38 and 42 appear within element 32.
[245] The mechanism controlled by the schema status code is shown by FIG. This figure shows a tree structure of a structured document comprising a main element 51 comprising three sub-elements 52, 53, 54. Element 52 includes two subelements 55, 56, and element 54 includes three subelements 57, 58, 59. Element 55 includes subelements 60, 61.
[246] Elements 51, 53, 54, 57, 59 are encoded with schema status code 5 set to 0 ("no change"). This element does not contain any schema number fields 38, 42. Schema status code 5 of element 52 is set to 10 ("freeze the schema"). Therefore, all subelements 55, 56, 60, 61 of element 52 are encoded with the same schema defined in element 52. These subelements do not contain any schema number fields 38 and 42 and a schema status code field 5.
[247] Element 58 is modified and defined in another schema that includes two subelements 62 and 63. Therefore, the schema status code 5 of the element 58 is set to 11 ("Change the schema"). If elements 62 and 63 are defined in the same schema as element 58, then schema status code 5 of elements 62 and 63 is set to zero.
[248] Further optimization may be performed by adding one possible value to the schema status code 5 as follows.
[249] Schema Status ModeValue no change0 freeze the schema10 Change the schema110 Change and freeze the schema111
[250] When the schema status code of an element is set to 110 or 111, the schema of the element changes. If this code is set to 110, the latter subelement contains a schema status code (5). When this code is set to 111, all subelements of the element are encoded using the latter schema.
[251] In the example of FIG. 5, the schema status code of element 58 may be set to 111 to specify that elements 62, 63 are defined in the same schema as element 58. Therefore the schema status code field 5 is not required in elements 62 and 63.
[252] The present invention is characterized by providing a method of encoding a structured document in such a way that the document can be partially decoded even though everything necessary for the schema is not known by the decoder.
[253] The present invention also provides a structured document that ensures forward and backward compatibility, i.e. has a structure defined in at least a first schema that is inaccessible to a decoder, and at least results in a change in a second schema that can access the decoder. Provided is an encoding method that enables the decoder to at least partially decode.
权利要求:
Claims (17)
[1" claim-type="Currently amended] An encoding method having a structure defined in at least a first schema inaccessible to a decoder and allowing the decoder to at least partially decode a structured document resulting from a change in at least a second schema accessible to the decoder Wherein the structured document comprises information elements surrounded by one another, wherein the information elements of the document are associated in at least first and second schemas with respective element types respectively defining respective element structures of the information elements. The first schema may not have access to the decoder and the second schema may have access to the decoder, and the first schema may contain at least one derived information element derived from a corresponding element defined in the second schema. Define,
The encoding method is
Encoding the document into a binary stream comprising a binary sequence encoding the information element for each information element of the document using the first and second schemas;
Inserting a reference to the first schema that defines the structure of the derived element in the binary sequence that encodes the derived information element,
Wherein the reference pointing to the first schema is defined in a schema reference list including references to all schemas used to encode the document and the schema reference list is made accessible to the decoder. Coding method.
[2" claim-type="Currently amended] According to claim 1,
The binary sequence for encoding each element of the document,
A content field containing the encoded value of the element,
And a length field located in front of the content field and including a coded value of the length of the content field.
[3" claim-type="Currently amended] The method of claim 2,
The derived information element is associated in the first schema with a structure type restricted to the structure type of the corresponding information element in the second schema, and the binary sequence encoding the derived elements includes a content field, the content A reference to a field and the first schema and a reference to the structure type of the derived element are attached to the content field and defined in the second schema.
[4" claim-type="Currently amended] The method according to claim 2 or 3,
The derived information element is associated in the first schema with a structure type extended to the structure type of the corresponding information element in the second schema, and the structure type of the derived information element is the corresponding information element defined in the second schema. A second portion having a structure type defined in the first schema and specific to the derived information element with a first portion having a structure type of and wherein the binary sequence includes the derived elements including a content field. The content field,
A field containing a reference pointing to the second schema,
A field containing a structure type reference indicating a structure type of the corresponding element in the second schema;
A field containing the encoded value of the first portion;
A field containing a reference pointing to the first schema,
A field containing a structure type reference indicating the structure type of the second portion;
And a field including the encoded value of the second portion.
[5" claim-type="Currently amended] The method according to any one of claims 1 to 4,
The binary sequence encoding the information element includes a replacement field that includes a replacement flag indicating whether the information element is renamed, and if the replacement flag indicates a change, the element name reference field refers to a reference indicating the new name of the information element. And a schema reference field includes a reference that points to a schema in which the new name reference is defined.
[6" claim-type="Currently amended] The method according to any one of claims 1 to 5,
The binary sequence for encoding one information element in the encoded document includes a schema state mode field, and the schema state mode field includes:
A first state indicating that the information element does not change in a first schema for that element in the second schema,
A second state indicating that no sub-elements of the information element change in the first schema for the corresponding element in the second schema,
A third state indicating that the information element in the first schema changes for the corresponding element in the second schema,
The encoded information element contains any schema reference and any other change information when the schema state mode field is in the first state, and any portion of the encoded information element when the schema state mode field is in the second state. And an element does not contain any schema reference and any other change information.
[7" claim-type="Currently amended] The method according to any one of claims 1 to 5,
The binary sequence encoding at least one information element in the encoded document includes a schema state mode field, wherein the schema state mode field is:
A first state indicating that the information element in the first schema is unchanged for that element in the second schema,
A second state indicating that no sub-elements of the information element change in the first schema for the corresponding element in the second schema,
A third state indicating that the information element in the first schema changes with respect to the corresponding element in the second schema,
Indicating that the information element in the first schema changes for the corresponding element in the second schema and that no sub-elements of the information element change in the first schema for the corresponding element in the second schema. Includes a fourth state,
When the schema state mode field is in the first state, the encoded information element includes any change information different from any schema reference, and when the schema state mode field is in the second state or the fourth state, Encoding method characterized in that no sub-elements contain schema references and any other change information.
[8" claim-type="Currently amended] The method according to any one of claims 1 to 7,
And the schema reference list including references pointing to all schemas used to encode the structured document is inserted into a header associated with the binary stream encoding the structured document.
[9" claim-type="Currently amended] A decoding method for at least partially decoding a binary stream having a structure defined in at least a first schema inaccessible to a decoder and at least partially encoding a structured document resulting from a change in a second schema inaccessible to the decoder. Wherein the structured document comprises information elements surrounded by one another, wherein the information elements of the document are associated in at least first and second schemas with respective element types respectively defining respective element structures of the information elements. The first schema may not have access to the decoder and the second schema may have access to the decoder, and the first schema may contain at least one derived information element derived from a corresponding element defined in the second schema. Define,
The decoding method,
Sequentially reading a binary stream encoding the structured document using the second schema to detect in the binary stream a binary sequence encoding each information element of the document;
Detecting in each detected binary sequence of encoded information elements a reference pointing to the first schema as defined in a schema reference known by the decoder;
If the reference pointing to the first schema is not detected in the detected binary sequence, decoding the detected binary sequence according to the structure defined in the second schema of the information element encoded with the detected binary sequence; Steps,
If the reference pointing to the first schema is detected in the detected binary sequence, information of the structured document having a structure defined in the first schema is distinguished from the detected reference pointing to the first schema binary data. Encoding an element and skipping said distinguished binary data while sequentially reading said binary stream.
[10" claim-type="Currently amended] The method according to claim 9,
The binary sequence for encoding each element of the document,
A content field containing the encoded value of the element,
And a length field located in front of the content field and including the length coded value used by the decoder to determine the end of the binary sequence that encodes the element.
[11" claim-type="Currently amended] The method of claim 10,
Reading and decoding a length code value in the binary sequence that includes a reference pointing to the first schema;
Determining a length of binary data for skipping as a function of the decoded length value and the position in the binary sequence of the reference pointing to the first schema.
[12" claim-type="Currently amended] The method according to any one of claims 9 to 11,
The derived information element is associated in the first schema with a structure type restricted to the structure type of the corresponding information element in the second schema, and the binary sequence encoding the derived elements includes a content field, the content A reference to a field and the first schema and a reference to the structure type of the derived element are attached to the content field and defined in the second schema.
[13" claim-type="Currently amended] The method according to any one of claims 9 to 12,
The derived information element is associated in the first schema with a structure type extended to the structure type of the corresponding information element in the second schema, and the structure type of the derived information element is the corresponding information element defined in the second schema. A second portion having a structure type defined in the first schema and specific to the derived information element with a first portion having a structure type of and wherein the binary sequence includes the derived elements including a content field. The content field,
A field containing a reference pointing to the second schema,
A field containing a structure type reference indicating a structure type of the corresponding element in the second schema;
A field containing the encoded value of the first portion;
A field containing a reference pointing to the first schema,
A field containing a structure type reference indicating the structure type of the second portion;
And a field including the encoded value of the second portion.
[14" claim-type="Currently amended] The method according to any one of claims 9 to 13,
The derived information element has a name in the first schema that changes in the second schema with respect to the name of the corresponding information element, and the binary sequence encoding the derived element indicates whether the derived information element is renamed. A replacement field including a replacement flag, wherein if the replacement flag indicates a change, a schema reference field includes a reference pointing to the first schema, and an element name reference that refers to the name of the derived information element in the first schema Decoding method comprising a.
[15" claim-type="Currently amended] The method according to any one of claims 9 to 14,
The binary sequence for encoding one information element in the encoded document includes a schema state mode field, and the schema state mode field includes:
A first state indicating that the information element does not change in a first schema for that element in the second schema,
A second state indicating that no sub-elements of the information element change in the first schema for the corresponding element in the second schema,
A third state indicating that the information element in the first schema changes for the corresponding element in the second schema,
The coded information element does not contain any schema reference and any other change information when the schema state mode field is in the first state and any of the coded information elements when the schema state mode field is in the second state. And the sub-elements do not contain any schema references and any other change information.
[16" claim-type="Currently amended] The method according to claim 9, wherein
The binary sequence encoding at least one information element in the encoded document includes a schema state mode field, wherein the schema state mode field is:
A first state indicating that the information element in the first schema is unchanged for that element in the second schema,
A second state indicating that no sub-elements of the information element change in the first schema for the corresponding element in the second schema,
A third state indicating that the information element in the first schema changes with respect to the corresponding element in the second schema,
Indicating that the information element in the first schema changes for the corresponding element in the second schema and that no sub-elements of the information element change in the first schema for the corresponding element in the second schema. Includes a fourth state,
When the schema state mode field is in the first state, the encoded information element includes any change information different from any schema reference, and when the schema state mode field is in the second state or the fourth state, Decryption method characterized in that no sub-elements contain schema references and any other change information.
[17" claim-type="Currently amended] The method according to any one of claims 9 to 16,
The schema reference list comprising references pointing to all schemas used to encode the structured document is read in a header associated with the binary stream encoding the structured document.
类似技术:
公开号 | 公开日 | 专利标题
Clark et al.1999|XML path language |
Buswell et al.2004|The open math standard
US8572127B2|2013-10-29|Structure based storage, query, update and transfer of tree-based documents
US7089567B2|2006-08-08|Efficient RPC mechanism using XML
KR100566019B1|2006-03-31|Xml encoding scheme
US7898441B2|2011-03-01|Method and apparatus for character stream transcoding
US7366982B2|2008-04-29|Packages that contain pre-paginated documents
US7418652B2|2008-08-26|Method and apparatus for interleaving parts of a document
US8661332B2|2014-02-25|Method and apparatus for document processing
CN100580661C|2010-01-13|Method and devices for encoding/decoding structured documents, especially XML documents
US7783862B2|2010-08-24|Method and apparatus for an inductive doubling architecture
US7873663B2|2011-01-18|Methods and apparatus for converting a representation of XML and other markup language data to a data structure format
US7231394B2|2007-06-12|Incremental bottom-up construction of data documents
KR101109280B1|2012-01-31|Methods and systems for defining documents with selectable and/or sequenceable parts
US7802180B2|2010-09-21|Techniques for serialization of instances of the XQuery data model
US7487448B2|2009-02-03|Document mark up methods and systems
US5870749A|1999-02-09|Automatic translation between CMIP PDUs and custom data structures
US7739586B2|2010-06-15|Encoding of markup language data
Ferragina et al.2006|Compressing and searching XML data via two zips
US7590644B2|2009-09-15|Method and apparatus of streaming data transformation using code generator and translator
CA2501858C|2012-10-30|Modular document format
US7565452B2|2009-07-21|System for storing and rendering multimedia data
CA2445300C|2007-04-24|Method for compressing/decompressing a structured document
US7392243B2|2008-06-24|Using permanent identifiers in documents for change management
JP4561150B2|2010-10-13|A database model for hierarchical data formats.
同族专利:
公开号 | 公开日
CA2437123C|2007-05-29|
CA2437123A1|2002-08-15|
US6825781B2|2004-11-30|
JP4615827B2|2011-01-19|
JP2004518231A|2004-06-17|
WO2002063775A3|2003-11-27|
WO2002063775A2|2002-08-15|
CN1552126A|2004-12-01|
EP1388211A2|2004-02-11|
KR100737606B1|2007-07-10|
CN100337407C|2007-09-12|
US20040068696A1|2004-04-08|
AU2002253002B2|2005-03-17|
引用文献:
公开号 | 申请日 | 公开日 | 申请人 | 专利标题
法律状态:
2001-02-05|Priority to US26590101P
2001-02-05|Priority to US60/265,901
2002-02-04|Application filed by 이엑스피웨이
2002-02-04|Priority to PCT/EP2002/001333
2003-11-05|Publication of KR20030085527A
2007-07-10|Application granted
2007-07-10|Publication of KR100737606B1
优先权:
申请号 | 申请日 | 专利标题
US26590101P| true| 2001-02-05|2001-02-05|
US60/265,901|2001-02-05|
PCT/EP2002/001333|WO2002063775A2|2001-02-05|2002-02-04|Method and system for compressing structured documents|
[返回顶部]