专利摘要:
data distribution method, data storage method, related apparatus, and system. Embodiments of the present invention disclose a data distribution method for improving the performance of a distributed storage system. The method in the embodiments of the present invention includes: receiving, by a data distribution apparatus, a storage instruction from a user, dividing the data to be stored, which the storage instruction instructs to store, in p data segments, to determine a storage node group that corresponds to each data segment, and finally distribute the data segment to a primary node in the corresponding storage node group. The embodiments of the present invention further provide a data storage method, a related apparatus and a system.
公开号:BR102016012739B1
申请号:R102016012739-4
申请日:2016-06-03
公开日:2019-04-24
发明作者:Daohui Wang;Feng Zhang;Wei Fan;Zhile Zhang;Yongqiang Zeng
申请人:Huawei Technologies Co., Ltd.;
IPC主号:
专利说明:

DATA DISTRIBUTION METHOD AND APPARATUS ”
FIELD OF TECHNIQUE [1] The present invention relates to the field of data storage, and in particular to a method of data distribution, a method of data storage, a related apparatus and a system.
BACKGROUND [2] Currently, distributed storage systems increasingly use erase encryption technology (erase encoding, EC) to store data. A principle of erasure coding technology is to divide data into m data blocks, and perform parity coding on the m data blocks using a redundancy algorithm to generate k parity blocks, in which the m data blocks and the k parity blocks constitute an EC range. Each EC range can tolerate a loss of k data blocks or parity blocks.
[3] In erase encoding technology, a data distribution device performs erase encoding on data to generate m data blocks and k parity blocks, and then distributes the m data blocks and k parity blocks to m + k storage nodes. If multiple data distribution devices concurrently read or write data in the same EC range, a problem of a conflict between multiple data distribution devices needs to be solved with the use of a distributed lock.
[4] However, if distributed blocking is used to resolve a conflict between multiple data distribution devices, when an upper tier application distributes IO requests using different data distribution devices, blocking and switching of frequent locks are caused between data distribution devices. Because locking switching is a relatively time-consuming operation, frequent locking switching causes a relatively long read / write delay, and may not respond
Petition 870190015896, of 02/15/2019, p. 15/20
2/51 to a real application requirement.
SUMMARY [5] The modalities of the present invention provide a method of data distribution to improve the read / write performance of a distributed storage system.
[6] A first aspect of the modalities of the present invention provides a data distribution method applied to a distributed storage system, in which the distributed storage system stores data using the EC erasure coding strips, each stripe being of EC includes a data part and a parity part, where the data part of each EC band includes m data blocks, the par part of each EC band includes k parity blocks which are obtained after coding of parity to be performed on the m data blocks, with the distributed storage system including multiple storage nodes, where the multiple storage nodes constitute multiple groups of storage nodes, an amount of storage nodes included in each group of nodes of storage is not less than m + k, and a primary storage node is specified in each group of storage nodes, where both mo k are positive integers; and the method includes:
receiving a storage instruction, in which the storage instruction carries the data to be stored;
divide the data to be stored in P data segments, where each data segment corresponds to a range of EC, a size of each data segment is not greater than Z, where Z is a size of m blocks data and P is a positive integer;
determine a group of storage nodes that correspond to each data segment; and distributing the data segment to a primary storage node in the given storage node group that corresponds to the data segment.
[7] With reference to the first aspect of the modalities of the present invention, in a first form of implantation of the first aspect
Petition 870160025486, of 6/3/2016, p. 11/104
3/51 of the embodiments of the present invention, a logical volume of the distributed storage system includes multiple logical partitions, and each of the logical partitions has a size of Z and they do not overlap; and the division of the data to be stored in P data segments includes:
divide the data to be stored in the P data segments according to logical addresses of the data to be stored, where each data segment is located within one of the logical partitions.
[8] With reference to the first way of implementing the first aspect of the modalities of the present invention, in a second way of implementing the first aspect of the modalities of the present invention, an initial address of a first data segment in the P data segments is a starting address of the data to be stored, and starting address of a pth data segment in the P data segments is a starting address of a logical partition within which the th data segment is located, where * 2 <p < P.
[9] With reference to the first form of implantation of the first aspect of the modalities of the present invention, a third form of implantation of the first aspect of the modalities of the present invention additionally includes:
predefine and record a match between each logical partition and each group of storage nodes; and determining a group of storage nodes that corresponds to each data segment includes:
determine, according to the correspondence between each logical partition and each group of storage nodes, the group of storage nodes that corresponds to a logical partition within which the data segment is located.
[10] With reference to the third way of implementing the first aspect of the modalities of the present invention, in a fourth way of implementing the first aspect of the modalities of the present invention, each logical partition corresponds exclusively to a key value, and the default and recording a match between each logical partition and
Petition 870160025486, of 6/3/2016, p. 11/114
4/51 each group of storage nodes includes:
predefine and register a key value that corresponds to each group of storage nodes; and determining, according to the correspondence between each logical partition and each group of storage nodes, the group of storage nodes that corresponds to a logical partition within which the data segment is located, includes:
determining a key value of the data segment according to the logical partition within which the data segment is located; and determining, according to the key value of the data segment, the group of storage nodes that corresponds to the data segment.
[11] A second aspect of the modalities of the present invention provides a data storage method applied to a distributed storage system, in which the distributed storage system stores data using the EC erase coding strips, each stripe being of EC includes a data part and a parity part, where the data part of each EC band includes m data blocks, the par part of each EC band includes k parity blocks which are obtained after coding of parity to be performed on the m data blocks, the distributed storage system including multiple storage nodes, where the multiple storage nodes constitute multiple groups of storage nodes, an amount of storage nodes included in each group of nodes of storage is not less than m + k, and a primary storage node is specified in each group of storage nodes, where both m and k are positive integers; and a primary storage node in any group of storage nodes in the multiple groups of storage nodes performs the data storage method, and the method includes:
receiving a first data segment, where a size of the first data segment is not greater than Z, where Z is a size of m data blocks;
perform erase coding according to the first
Petition 870160025486, of 6/3/2016, p. 11/124
5/51 data segment to obtain a first EC range, where the first EC range includes the first m data blocks and the first k parity blocks; and distribute the first EC range to m + k storage nodes to perform storage, where each storage node on the m + k storage nodes is responsible for storing any of the first m data blocks or the first k parity blocks of the first EC range.
[12] With reference to the second aspect of the modalities of the present invention, in a first way of implementing the second aspect of the modalities of the present invention, a logical volume of the distributed storage system includes multiple logical partitions, where each of the logical partitions has a size of Z and they do not overlap, and the first data segment is located within one of the logical partitions.
[13] With reference to the first form of implantation of the second aspect of the modalities of the present invention, in a second form of implantation of the second aspect of the modalities of the present invention, before the distribution of the first EC strip to the m + k storage nodes perform storage, the method additionally includes:
receiving a second data segment, where the second data segment and the first data segment are located within the same logical partition, and the logical addresses of the second data segment overlap the logical addresses of the first data segment; and performing erasure coding according to the second data segment to obtain a second EC strip, wherein the second EC strip includes m second data blocks and k second parity blocks; and the distribution of the first EC strip for the m + k storage nodes to perform storage includes specifically: determining a serial distribution sequence of the first EC strip and the second EC strip, and distributing the first EC strip and the second strip of EC for the m + k storage nodes, in series, according to the serial distribution sequence.
Petition 870160025486, of 6/3/2016, p. 11/134
6/51 [14] With reference to the second form of implantation of the second aspect of the modalities of the present invention, in a third form of implantation of the second aspect of the modalities of the present invention, the determination of a serial distribution sequence of the first EC band and the second EC strip, and the distribution of the first EC strip and the second EC strip to the m + k storage nodes, in series, according to the serial distribution sequence to perform storage includes:
in case a reception time of the first data segment is prior to a reception time of the second data segment, first distribute the first EC strip to the m + k storage nodes to perform storage, and, after receiving a message of response that indicates that the m + k storage nodes successfully store the first EC range, distribute the second EC range for the m + k storage nodes to perform storage; or if a reception time for the first data segment is later than a reception time for the second data segment, first distribute the second EC strip to the m + k storage nodes to perform storage, and, after receiving a response message which indicates that the m + k storage nodes successfully store the second range of EC, distribute the first range of EC to the m + k storage nodes to perform storage.
[15] With reference to the first form of implantation of the second aspect of the modalities of the present invention, in a fourth form of implantation of the second aspect of the modalities of the present invention, before performing erasure coding according to the first data segment for To obtain a first EC range, the method additionally includes:
receiving a third data segment, where the third data segment and the first data segment are located within the same logical partition, and the logical addresses of the third data segment do not overlap the logical addresses of the first data segment; and combine the third data segment into the first data segment, and trigger the step of performing encoding by erasure accordingly
Petition 870160025486, of 6/3/2016, p. 11/144
7/51 with the first data segment to obtain a first EC range.
[16] With reference to the second aspect or any one of the first to the fourth forms of implantation of the second aspect of the modalities of the present invention, in a fifth form of implantation of the second aspect of the modalities of the present invention, perform erasure coding according with the first data segment to obtain a first EC range, where the first EC range includes the first m data blocks and the first k parity blocks, includes:
when the size of the first data segment is equal to Z, divide the first data segment into first m data blocks, and perform parity coding on the first m data blocks to obtain the first k parity blocks; or when the size of the first data segment is less than Z, use stored data to complement the first data segment so that the size is equal to Z, then divide the first complemented data segment into the first m data blocks, and perform parity coding on the first m data blocks to obtain the first k parity blocks.
[17] A third aspect of the modalities of the present invention provides a data distribution device applied to a distributed storage system, in which the distributed storage system stores data using the EC erase coding strips, each stripe being of EC includes a data part and a parity part, where the data part of each EC band includes m data blocks, the par part of each EC band includes k parity blocks which are obtained after coding of parity to be performed on the m data blocks, the distributed storage system including multiple storage nodes, where the multiple storage nodes constitute multiple groups of storage nodes, an amount of storage nodes included in each group of nodes of storage is not less than m + k, and a primary storage node is specified in each group of storage nodes, where both the omc such as k are positive integers; and the data distribution device includes:
Petition 870160025486, of 6/3/2016, p. 11/154
8/51 an instruction receiving module, configured to receive a storage instruction, wherein the storage instruction carries the data to be stored;
a data segment division module, configured to divide the data to be stored in P data segments, where each data segment corresponds to a range of EC, one size of each data segment is not greater than Z, where Z is a size of m data blocks and P is a positive integer;
a node group determination module, configured to determine a storage node group that corresponds to each data segment; and a data segment distribution module, configured to distribute the data segment to a primary storage node in the given storage node group that corresponds to the data segment.
[18] With reference to the third aspect of the modalities of the present invention, in a first way of implanting the third aspect of the modalities of the present invention, a logical volume of the distributed storage system includes multiple logical partitions, and each of the logical partitions has a size of Z and they do not overlap; and the data segment division module is specifically configured to:
divide the data to be stored in the P data segments according to the logical addresses of the data to be stored, where each data segment is located within one of the logical partitions.
[19] With reference to the first way of deploying the third aspect of the modalities of the present invention, in a second way of deploying the third aspect of the modalities of the present invention, an initial address of a first data segment in the P data segments is a starting address of the data to be stored, and starting address of a pth data segment in the P data segments is a starting address of a logical partition within which the th data segment is located, where * 2 <p < P.
Petition 870160025486, of 6/3/2016, p. 11/164
9/51 [20] With reference to the first form of implantation of the third aspect of the modalities of the present invention, in a third form of implantation of the third aspect of the modalities of the present invention, the data distribution apparatus additionally includes:
a matching module, configured to predefine and record a match between each logical partition and each group of storage nodes; and the node group determination module is specifically configured to:
determine, according to the correspondence between each logical partition and each group of storage nodes, the group of storage nodes that corresponds to a logical partition within which the data segment is located.
[21] With reference to the third way of implementing the third aspect of the modalities of the present invention, in a fourth way of implementing the third aspect of the modalities of the present invention, each logical partition corresponds, exclusively, to a key value, and the module correspondence is specifically configured to:
predefine and register a key value that corresponds to each group of storage nodes; and the node group determination module is specifically configured to:
determining a key value of the data segment according to the logical partition within which the data segment is located; and determining, according to the key value of the data segment, the group of storage nodes that corresponds to the data segment.
[22] A fourth aspect of the modalities of the present invention provides a data storage device, applied to a distributed storage system, in which the distributed storage system stores data using the EC erase coding ranges, each of which EC range includes a data part and a parity part, where the data part of each EC range includes m data blocks, the par part of each EC range includes k parity blocks that
Petition 870160025486, of 6/3/2016, p. 11/174
10/51 are obtained after parity coding is performed on the m data blocks, and the distributed storage system includes multiple storage nodes, where multiple storage nodes constitute multiple groups of storage nodes, a number of nodes of storage included in each group of storage nodes is not less than m + k, and a primary storage node is specified in each group of storage nodes, where both om and ok are positive integers; and the data storage device is deployed to a primary storage node in any group of storage nodes in the multiple groups of storage nodes, and the data storage device includes:
a data receiving module, configured to receive a first data segment, where a size of the first data segment is not greater than Z, where Z is a size of m data blocks;
a data coding module, configured to perform erase coding according to the first data segment to obtain a first EC strip, where the first EC strip includes the first m data blocks and the first k parity blocks ; and a data distribution module, configured to distribute the first EC range for m + k storage nodes to perform storage, where each storage node on the m + k storage nodes is responsible for storing any of the first m blocks of data or the first k parity blocks of the first EC range.
[23] With reference to the fourth aspect of the modalities of the present invention, in a first way of implementing the fourth aspect of the modalities of the present invention, a logical volume of the distributed storage system includes multiple logical partitions, in which each of the logical partitions has a size of Z and they do not overlap, and the first data segment is located within one of the logical partitions.
[24] With reference to the first form of implantation of the fourth aspect of the modalities of the present invention, in a second form of
Petition 870160025486, of 6/3/2016, p. 11/184
11/51 implementation of the fourth aspect of the modalities of the present invention, the data reception module is additionally configured to receive a second data segment, in which the second data segment and the first data segment are located within the same logical partition, and the logical addresses of the second data segment overlap the logical addresses of the first data segment;
the data coding module is additionally configured to perform erasure coding according to the second data segment to obtain a second EC strip, wherein the second EC strip includes m second data blocks and k second parity blocks ; and the data distribution module is specifically configured to determine a serial distribution sequence from the first EC range and the second EC range, and distribute the first EC range and the second EC range to the m + k nodes in series, according to the serial distribution sequence.
[25] With reference to the second way of implementing the fourth aspect of the modalities of the present invention, in a third way of implementing the fourth aspect of the modalities of the present invention, the data distribution module is configured, specifically, for:
if the reception time of the first data segment is earlier than the reception time of the second data segment, first distribute the first EC band to the m + k storage nodes to perform storage, and after receiving a response message which indicates that the m + k storage nodes successfully store the first range of EC, distribute the second range of EC for the m + k storage nodes to perform storage; or if a reception time for the first data segment is later than a reception time for the second data segment, first distribute the second EC strip to the m + k storage nodes to perform storage, and, after receiving a response message which indicates that the m + k storage nodes successfully store the second range of EC, distribute the first range of EC to the m + k storage nodes to perform storage.
Petition 870160025486, of 6/3/2016, p. 11/194
12/51 [26] With reference to the first way of implementing the fourth aspect of the modalities of the present invention, in a fourth way of implementing the fourth aspect of the modalities of the present invention, the data reception module is additionally configured for:
receiving a third data segment, where the third data segment and the first data segment are located within the same logical partition, and the logical addresses of the third data segment do not overlap the logical addresses of the first data segment; and the data encoding module is additionally configured to: after the data receiving module determines that the third data segment has been received, combine the third data segment into the first data segment, and trigger the step of performing the erase coding according to the first data segment to obtain a first EC range.
[27] With reference to the fourth aspect or any one of the first to the fourth forms of implantation of the fourth aspect of the modalities of the present invention, in a fifth form of implantation of the fourth aspect of the modalities of the present invention, the data coding module is additionally configured to:
when the size of the first data segment is equal to Z, divide the first data segment into first m data blocks, and perform parity coding on the first m data blocks to obtain the first k parity blocks; or when the size of the first data segment is less than Z, use the stored data to complement the first data segment so that the size is equal to Z, then divide the first complemented data segment into the first m data blocks , and perform parity coding on the first m data blocks to obtain the first k parity blocks.
[28] A fifth aspect of the modalities of the present invention provides a data distribution apparatus, in which the data distribution apparatus includes an input device, an output device, a processor and a memory, in which, invoking itself an operating instruction stored in memory, the processor is configured to:
Petition 870160025486, of 6/3/2016, p. 11/20
13/51 receive a storage instruction, in which the storage instruction carries the data to be stored;
divide the data to be stored in P data segments, where each data segment corresponds to a range of EC, a size of each data segment is not greater than Z, where Z is a size of m blocks data and P is a positive integer;
determine a group of storage nodes that correspond to each data segment; and distributing the data segment to a primary storage node in the given storage node group that corresponds to the data segment.
[29] With reference to the fifth aspect of the modalities of the present invention, in a first way of implementing the fifth aspect of the modalities of the present invention, a logical volume of the distributed storage system includes multiple logical partitions, and each of the logical partitions has a size of Z and they do not overlap; and the processor is additionally configured to:
divide the data to be stored in the P data segments according to the logical addresses of the data to be stored, where each data segment is located within one of the logical partitions.
[30] With reference to the first way of deploying the fifth aspect of the modalities of the present invention, in a second way of deploying the fifth aspect of the modalities of the present invention, an initial address of a first data segment in the P data segments is a starting address of the data to be stored, and starting address of a pth data segment in the P data segments is a starting address of a logical partition within which the th data segment is located, where * 2 <p < P.
[31] With reference to the first form of implantation of the fifth aspect of the modalities of the present invention, in a third form of implantation of the fifth aspect of the modalities of the present invention, the processor is additionally configured to:
preset and record a match between each partition
Petition 870160025486, of 6/3/2016, p. 11/214
14/51 logic and each group of storage nodes; and determining, according to the correspondence between each logical partition and each group of storage nodes, the group of storage nodes that corresponds to a logical partition within which the data segment is located.
[32] With reference to the third way of implementing the fifth aspect of the modalities of the present invention, in a fourth way of implementing the fifth aspect of the modalities of the present invention, each logical partition corresponds exclusively to a key value, and the processor is additionally configured to:
predefine and register a key value that corresponds to each group of storage nodes;
determining a key value of the data segment according to the logical partition within which the data segment is located; and determining, according to the key value of the data segment, the group of storage nodes that corresponds to the data segment.
[33] A sixth aspect of the modalities of the present invention provides a data storage device, wherein the data storage device includes an input device, an output device, a processor, and a memory, on which to invoke an operating instruction stored in memory, the processor is configured to:
receiving a first data segment, where a size of the first data segment is not greater than Z, where Z is a size of m data blocks;
perform erasure coding according to the first data segment to obtain a first EC strip, wherein the first EC strip includes the first m data blocks and the first k parity blocks; and distribute the first EC range to m + k storage nodes to perform storage, where each storage node on the m + k storage nodes is responsible for storing any of the first m data blocks or the first k parity blocks of the first EC range.
Petition 870160025486, of 6/3/2016, p. 11/22
15/51 [34] With reference to the sixth aspect of the modalities of the present invention, in a first way of implanting the sixth aspect of the modalities of the present invention, a logical volume of the distributed storage system includes multiple logical partitions, each of the logical partitions it has a size of Z and they do not overlap, and the first data segment is located within one of the logical partitions.
[35] With reference to the first form of implantation of the sixth aspect of the modalities of the present invention, in a second form of implantation of the sixth aspect of the modalities of the present invention, the processor is additionally configured to:
receiving a second data segment, where the second data segment and the first data segment are located within the same logical partition, and the logical addresses of the second data segment overlap the logical addresses of the first data segment;
performing erasure coding according to the second data segment to obtain a second EC strip, wherein the second EC strip includes m second data blocks and k second parity blocks; and determine a serial distribution sequence of the first EC strip and the second EC strip, and distribute the first EC strip and the second EC strip to the m + k storage nodes, in series, according to the sequence of serial distribution.
[36] With reference to the second way of implementing the sixth aspect of the modalities of the present invention, in a third way of implementing the sixth aspect of the modalities of the present invention, the processor is additionally configured to:
if the reception time of the first data segment is earlier than the reception time of the second data segment, first distribute the first EC band to the m + k storage nodes to perform storage, and after receiving a response message which indicates that the m + k storage nodes successfully store the first range of EC, distribute the second range of EC for the m + k storage nodes to perform storage; or
Petition 870160025486, of 6/3/2016, p. 11/23
16/51 if a reception time for the first data segment is later than a reception time for the second data segment, first distribute the second EC strip to the m + k storage nodes to perform storage, and after receiving a message of answer that indicates that the m + k storage nodes successfully store the second range of EC, distribute the first range of EC to the m + k storage nodes to perform storage.
[37] With reference to the first form of implantation of the sixth aspect of the modalities of the present invention, in a fourth form of implantation of the sixth aspect of the modalities of the present invention, the processor is additionally configured to:
receiving a third data segment, where the third data segment and the first data segment are located within the same logical partition, and the logical addresses of the third data segment do not overlap the logical addresses of the first data segment; and combining the third data segment into the first data segment, and triggering the step of performing erasure coding according to the first data segment to obtain a first EC range.
[38] With reference to the sixth aspect or any of the first to the fourth form of implantations of the sixth aspect of the modalities of the present invention, in a fifth form of implantation of the sixth aspect of the modalities of the present invention, the processor is additionally configured for :
when the size of the first data segment is equal to Z, divide the first data segment into first m data blocks, and perform parity coding on the first m data blocks to obtain the first k parity blocks; or when the size of the first data segment is less than Z, use stored data to complement the first data segment so that the size is equal to Z, then divide the first complemented data segment into the first m data blocks, and perform parity coding on the first m data blocks to obtain the first k parity blocks.
[39] A seventh aspect of the modalities of the present invention
Petition 870160025486, of 6/3/2016, p. 11/24
17/51 provides computer-readable media, which includes a computer-executable instruction, in which when a computer's processor executes the computer-executable instruction, the computer performs the following method:
receiving a storage instruction, in which the storage instruction carries the data to be stored;
divide the data to be stored in P data segments, where each data segment corresponds to a range of EC, a size of each data segment is not greater than Z, where Z is a size of m blocks data and P is a positive integer;
determine a group of storage nodes that correspond to each data segment; and distributing the data segment to a primary storage node in the given storage node group that corresponds to the data segment.
[40] With reference to the seventh aspect of the modalities of the present invention, in a first way of implanting the seventh aspect of the modalities of the present invention, a logical volume of the distributed storage system includes multiple logical partitions, and each of the logical partitions has a size of Z and they do not overlap; and the computer additionally performs the following method: dividing the data to be stored in the P data segments according to logical addresses of the data to be stored, where each data segment is located within one of the logical partitions.
[41] With reference to the first way of deploying the seventh aspect of the modalities of the present invention, in a second way of deploying the seventh aspect of the modalities of the present invention, a starting address of a first data segment in the P data segments is a starting address of the data to be stored, and starting address of a pth data segment in the P data segments is a starting address of a logical partition within which the th data segment is located, where * 2 <p < P.
[42] With reference to the first form of implantation of the seventh
Petition 870160025486, of 6/3/2016, p. 11/25
18/51 aspect of the modalities of the present invention, in a third way of implanting the seventh aspect of the modalities of the present invention, the computer additionally performs the following method:
predefine and record a match between each logical partition and each group of storage nodes; and determining, according to the correspondence between each logical partition and each group of storage nodes, the group of storage nodes that corresponds to a logical partition within which the data segment is located.
[43] With reference to the third way of deploying the seventh aspect of the modalities of the present invention, in a fourth way of deploying the seventh aspect of the modalities of the present invention, each logical partition corresponds exclusively to a key value, and the computer additionally performs the following method:
predefine and register a key value that corresponds to each group of storage nodes;
determining a key value of the data segment according to the logical partition within which the data segment is located; and determining, according to the key value of the data segment, the group of storage nodes that corresponds to the data segment.
[44] An eighth aspect of the modalities of the present invention provides a computer-readable medium, which includes a computer-executable instruction, in which, when a computer processor executes the computer-executable instruction, the computer performs the following method:
receiving a first data segment, where a size of the first data segment is not greater than Z, where Z is a size of m data blocks;
perform erasure coding according to the first data segment to obtain a first EC strip, wherein the first EC strip includes the first m data blocks and the first k parity blocks; and distribute the first EC range to m + k storage nodes
Petition 870160025486, of 6/3/2016, p. 11/26
19/51 perform storage, where each storage node in the m + k storage nodes is responsible for storing any of the first m data blocks or the first k parity blocks of the first EC range.
[45] With reference to the eighth aspect of the modalities of the present invention, in a first way of implementing the eighth aspect of the modalities of the present invention, a logical volume of the distributed storage system includes multiple logical partitions, where each logical partition has a size of Z and they do not overlap, and the first data segment is located within one of the logical partitions.
[46] With reference to the first form of implantation of the eighth aspect of the modalities of the present invention, in a second form of implantation of the eighth aspect of the modalities of the present invention, the computer additionally performs the following method:
receiving a second data segment, where the second data segment and the first data segment are located within the same logical partition, and the logical addresses of the second data segment overlap the logical addresses of the first data segment;
performing erasure coding according to the second data segment to obtain a second EC strip, wherein the second EC strip includes m second data blocks and k second parity blocks; and determine a serial distribution sequence of the first EC strip and the second EC strip, and distribute the first EC strip and the second EC strip to the m + k storage nodes, in series, according to the sequence of serial distribution.
[47] With reference to the second way of implanting the eighth aspect of the modalities of the present invention, in a third way of implanting the eighth aspect of the modalities of the present invention, the computer additionally performs the following method:
if a reception time for the first data segment is earlier than a reception time for the second data segment, first distribute the first EC range to the m + k storage nodes
Petition 870160025486, of 6/3/2016, p. 11/274
20/51 perform storage, and after receiving a response message that indicates that the m + k storage nodes successfully store the first EC range, distribute the second EC range to the m + k storage nodes perform storage; or if a reception time for the first data segment is later than a reception time for the second data segment, first distribute the second EC strip to the m + k storage nodes to perform storage, and after receiving a response message that indicates that the m + k storage nodes successfully store the second EC range, distribute the first EC range to the m + k storage nodes to perform storage.
[48] With reference to the first form of implantation of the eighth aspect of the modalities of the present invention, in a fourth form of implantation of the eighth aspect of the modalities of the present invention, the computer additionally performs the following method:
receiving a third data segment, where the third data segment and the first data segment are located within the same logical partition, and the logical addresses of the third data segment do not overlap the logical addresses of the first data segment; and combining the third data segment into the first data segment, and triggering the step of performing erasure coding according to the first data segment to obtain a first EC range.
[49] With reference to the eighth aspect or any of the first to the fourth form of implantation of the eighth aspect of the modalities of the present invention, in a fifth form of implantation of the eighth aspect of the modalities of the present invention, the computer additionally performs the following method:
when the size of the first data segment is equal to Z, divide the first data segment into first m data blocks, and perform parity coding on the first m data blocks to obtain the first k parity blocks; or when the size of the first data segment is less than Z, use stored data to complement the first data segment so that the size is equal to Z, then divide the first data segment
Petition 870160025486, of 6/3/2016, p. 11/28
21/51 complemented data in the first m data blocks, and perform parity coding in the first m data blocks to obtain the first k parity blocks.
[50] A ninth aspect of the modalities of the present invention provides a distributed storage system, which includes multiple storage nodes, in which the distributed storage system stores data using the EC erase coding strips, each stripe being EC includes a data part and a parity part, where the data part of each EC band includes m data blocks, the par part of each EC band includes k parity blocks that are obtained after encoding parity is performed on m data blocks, with multiple storage nodes constituting multiple groups of storage nodes, in which the number of storage nodes included in each group of storage nodes is not less than m + k, and a primary storage node is specified in each group of storage nodes, where both om and ok are positive integers; and the storage system additionally includes the data storage device provided by the fourth aspect or any of the first to fifth ways of implanting the fourth aspect of the embodiments of the present invention, or the data storage device provided by the sixth aspect or any of the first to fifth forms of implantation of the sixth aspect of the modalities of the present invention, or the computer-readable media provided by the eighth aspect or any of the first to fifth forms of implantation of the eighth aspect of the modalities of the present invention.
[51] In the data distribution method provided by the modalities of the present invention, a data distribution device determines data to be stored, divides the data to be stored, according to a logical partition, to obtain P data segments, determines a node group that corresponds to each data segment and distributes the data segment to a primary node in the corresponding node group. In the embodiments of the present invention, the data distribution apparatus divides the data to be stored in predefined sizes and then distributes the data to primary nodes in groups of nodes
Petition 870160025486, of 6/3/2016, p. 11/294
22/51 of a distributed storage system, and the primary nodes perform erase coding; the data distribution device itself does not perform data erasure coding. Thus, a problem in which multiple data distribution devices concurrently read or write data in the same EC range does not exist, and there is no conflict between data distribution devices. Therefore, in the embodiments of the present invention, a distributed block does not need to be used between data distribution devices to resolve the read / write conflict between data distribution devices, and a read / write delay is prevented from being caused by blockade confrontation; the read / write performance of the distributed storage system is improved, and an actual application requirement can be met.
BRIEF DESCRIPTION OF THE DRAWINGS [52] Figure 1 (a) is a structural diagram of an EC band;
[53] Figure 1 (b) is a schematic structural diagram of a distributed storage system according to an embodiment of the present invention;
[54] Figure 2 is a flow chart of an embodiment of a data distribution method according to an embodiment of the present invention;
[55] Figure 3 is a flow chart of an embodiment of a data storage method according to an embodiment of the present invention;
[56] Figure 4 is a structural diagram of an embodiment of a data distribution apparatus according to an embodiment of the present invention;
[57] Figure 5 is a structural diagram of an embodiment of a data storage apparatus according to an embodiment of the present invention;
[58] Figure 6 is a structural diagram of another embodiment of a data distribution apparatus according to one embodiment of the present invention; and [59] Figure 7 is a structural diagram of another modality
Petition 870160025486, of 6/3/2016, p. 11/30
23/51 of a data storage device according to an embodiment of the present invention.
DESCRIPTION OF THE MODALITIES [60] The modalities of the present invention provide a method of data distribution to improve the performance of a distributed storage system. The embodiments of the present invention further provide a related data storage method, a related apparatus and a system, which are described separately hereinafter.
[61] The distributed storage system includes multiple storage nodes. An upper layer data distribution device divides data to be stored in multiple data blocks, and then distributes the data blocks to multiple storage nodes of the distributed storage system for storage. Distributed data storage can reduce the risk of data loss and improve the reliability of the distributed storage system. The data distribution device can be a client, a server or another device.
[62] A principle of an EC technology is to divide the data to be stored in m data blocks, and to perform parity coding in the m data blocks using a redundancy algorithm to generate k parity blocks, where the m data blocks and the k parity blocks constitute an EC range. For a basic structure of the EC range, see Figure 1 (a). Each data block or parity block can also be called an EC block. Each EC range can tolerate a loss of k EC blocks. When the distributed storage system uses EC technology to perform data storage, the data distribution device generates multiple EC ranges according to data to be stored, and distributes m + k EC blocks from each EC range to m + k storage nodes of the distributed storage system perform storage. Thus, when a node in the distributed storage system is defective, an EC block stored in the defective node can be recovered according to EC blocks in non-defective nodes. Unless more than k storage nodes in the m + k nodes
Petition 870160025486, of 6/3/2016, p. 11/314
24/51 of storage that store an EC range is defective, the EC range data can be read or written by the data distribution device completely. Therefore, the distributed storage system that uses EC technology to store data is highly reliable.
[63] An EC track stored in the distributed storage system can be read or overwritten by the data distribution device. It should be noted that, when reading an EC strip, the data distribution device only needs to read, from the EC strip, a data block to store data. However, when an EC range is overwritten, for example, when an i th data block in m data blocks of an EC range is overwritten, other data blocks in the EC range do not need to be changed, but it is necessary regenerate the parity blocks of the EC range.
[64] If multiple data distribution devices concurrently read or write data on an EC track, a read / write conflict can be caused, and a problem with a conflict between multiple data distribution devices needs to be resolved with the use of a distributed lock. However, if distributed blocking is used to resolve a conflict between multiple data distribution devices, when an upper layer application distributes IO requests using different data distribution devices, blocking confrontation and switching frequent blockages are caused between data distribution devices. Because lock switching is a relatively time-consuming operation, frequent lock switching causes a relatively long read / write delay, and may not meet an actual application requirement.
[65] The modalities of the present invention are dedicated to providing a reliable data distribution method and data storage method to solve the above problem. First, the storage nodes in the distributed storage system are grouped in the embodiments of the present invention. Specifically, storage nodes are grouped into multiple groups of storage nodes, where the number of storage nodes included in each group of storage nodes is not less than m + k, and one node
Petition 870160025486, of 6/3/2016, p. 11/32
25/51 primary storage (hereafter abbreviated as a primary node) is specified in each group of storage nodes. The primary storage node is responsible for interacting with the data distribution device, and is responsible, in addition, for controlling a node in a group of storage nodes to which the primary storage node belongs, to perform data storage. Other storage nodes in the storage node group are secondary storage nodes (hereinafter abbreviated as a secondary node). Both m and k are positive integers. Groups of different storage nodes can have duplicate storage nodes. For example, a primary node in a group of storage nodes 1 can be a secondary node in a group of storage nodes 2. Figure 1 (b) is a schematic structural diagram of a storage system distributed according to a modality of the present invention. Figure 1 (b) is used to indicate only one composition structure of the distributed storage system, and there may also be other storage node structures in the distributed storage system.
[66] A storage node group can be maintained in multiple ways, for example, an additional node group management module can be arranged. The node group management module can be implanted in a data distribution device, or it can be implanted in a storage node, which is not limited in the modalities of the present invention. The node group management module is responsible for maintaining each storage node group. Specifically, the node group management module may be responsible for collecting information about the storage nodes included in each storage node group and determining a primary node, and monitoring a situation for each storage node in real time using of a heartbeat connection. When a storage node situation changes or a primary node in a node group changes, the node group management module needs to update the collected information. The node group management module notifies the information collected for each data distribution device and storage node for registration, so that each data distribution device
Petition 870160025486, of 6/3/2016, p. 11/33
26/51 and storage node can learn information about each group of nodes. The information collected by the node group management module can be represented in a simple way in the form of a Table (1):
TABLE (1)
Node group ID At the Node identity Node situation Knot group 1 THE Primary node Normal B Secondary node Normal Ç Secondary node Defective
[67] Table (1) is just a way of representing the information collected by the node group management module. The node group management module can use other ways to represent the information collected, which is not limited in the modalities of the present invention.
[68] In one embodiment of the present invention, a data distribution apparatus divides the data to be stored into multiple data segments, and distributes each data segment to a primary node in a corresponding group of nodes. For a procedure, see Figure 2.
[69] 201. Receive a storage instruction.
[70] A data distribution device receives a storage instruction from a user, where the storage instruction is used to instruct to store data to be stored.
[71] 202. Divide data to be stored in P data segments.
[72] The data distribution device divides the data to be stored in the P data segments, where each data segment corresponds to a range of EC, one size of each data segment is not greater than Z, and Z it is a size of m data blocks.
[73] 203. Determine a group of storage nodes that
Petition 870160025486, of 6/3/2016, p. 11/34
27/51 corresponds to each data segment.
[74] The data distribution device determines the group of storage nodes that corresponds to each data segment. There are multiple methods of determination, which are described in detail in subsequent modalities, and are not limited in this document.
[75] 204. Distribute the data segment to a primary storage node in the corresponding storage node group.
[76] After determining the group of storage nodes that corresponds to each data segment, the data distribution device distributes each data segment to the primary node in the group of storage nodes that corresponds to the data segment.
[77] This modality provides a method of data distribution, in which a data distribution device receives a storage instruction from a user, divides the data to be stored, which the storage instruction instructs to store, in P segments of data, determines a group of storage nodes that corresponds to each data segment, and finally distributes the data segment to a primary node in the corresponding storage node group. In this modality, due to the fact that the data distribution device directly distributes a segment of data obtained after the division to a storage node, and no longer performs erasure coding on the data to be stored, there is no need to use a distributed lock to solve a conflict problem, frequent lock switching is not triggered and the performance of a distributed storage system is improved.
[78] In step 201, the data distribution device analyzes the storage instruction, and can obtain a start address, a size or other information from the data to be stored. In step 202, there are multiple methods for dividing to obtain data segments by the data distribution apparatus. For example, a logical volume (Logical Volume) of the distributed storage system can include multiple logical partitions, and each of the logical partitions has a size of Z and they do not overlap. When performing division to obtain
Petition 870160025486, of 6/3/2016, p. 11/35
28/51 data segments, the data distribution device can perform division to obtain, according to the logical addresses of the data to be stored, data that are located within a logical partition as a data segment. For example, if Z is 4 M, the logical volume of the distributed storage system can be divided into multiple logical partitions, where the logical addresses of a first logical partition are 0 M to 4 M, the logical addresses of a second logical partition are 4 M to 8 M, the logical addresses of a third logical partition are 8 M to 12 M ... Assuming that the logical addresses of the data to be stored are 5 M to 12 M, due to the fact that the 5 M to 8 M are within a range of the logical addresses of the second logical partition and 8 M to 12 M are within a range of the logical addresses of the third logical partition, the data distribution device divides the data to be stored 5 M to 12 M in two data segments, where the logical addresses of the two data segments are 5 M to 8 M and 8 M to 12 M, respectively.
[79] From the example in the above paragraph, it can be concluded that, in the P data segments obtained by dividing the data to be stored, a starting address of the first data segment is a starting address of the data to be stored, and a starting address of a p th data segment is a starting address of a logical partition within which the p th data segment is located, where 2 <p <P.
[80] If the logical volume of the distributed storage system is divided into multiple logical partitions, a correspondence between each logical partition and each group of storage nodes can be predefined. Thus, in step 203, the data distribution device can determine that a group of storage nodes that corresponds to a logical partition within which each data segment is located is a group of storage nodes that corresponds to the data segment . More specifically, a unique key value can be defined for each logical partition of the distributed storage system, and the data distribution device predefines and records a key value that corresponds to each group of storage nodes. In this way, the data distribution device can determine that a key value that corresponds to the partition
Petition 870160025486, of 6/3/2016, p. 36/114
29/51 logic within which each data segment is located is a key value that corresponds to the data segment, and then determine, according to the correspondence between a key value of each data segment and a group of storage nodes, the group of storage nodes that corresponds to each data segment.
[81] It can be understood that if multiple upper layer data distribution devices in the distributed storage system all use the method from step 201 to step 204 to perform data distribution, even if the data segments are distributed by multiple data distribution devices, as long as the distributed data segments are located within the same logical partition of the distributed storage system, the data segments are distributed to a primary node of the same group of nodes.
[82] After receiving a data segment distributed by the data distribution device, the primary node performs erase coding on the received data segment to obtain an EC range, and allocates m + k EC blocks in the EC range to the m + k storage nodes in the node group to which the primary node belongs, to perform storage. In the modality shown in Figure 2, the data distribution device distributes P data segments in total, and therefore needs to distribute P data segments to primary nodes of P storage node groups. Referring to Figure 3, a primary node in a storage node group is used below as an example for the description.
[83] 301. Receive a first segment of data.
[84] A primary node receives a first data segment distributed by a data distribution device, where the first data segment is a data segment in P data segments distributed by the data distribution device. A size of the first data segment is not greater than Z.
[85] Particularly, when the data distribution device divides to obtain data segments according to a logical partition of a distributed storage system, if the first data segment is the first or the last data segment in the P
Petition 870160025486, of 6/3/2016, p. 37/114
30/51 data segments, the size of the first data segment can be smaller than Z; if the first data segment is any data segment from the second to (P-1) th data segment in the P data segments, the size of the first data segment must be equal to Z.
[86] 302. Perform erasure coding according to the first data segment to obtain a first EC range.
[87] The primary node performs erasure coding according to the first data segment to obtain the first EC range. The first EC range includes m + k first EC blocks, which are separately the first m data blocks and the first k parity blocks.
[88] 303. Distribute the first m + k blocks of EC, respectively, to the m + k nodes in a group of nodes to which a primary node belongs, to perform storage.
[89] The primary node distributes the first m data blocks and the first k parity blocks of the first EC range, respectively, to the m + k storage nodes in the storage node group to which the primary node belongs, for execute storage, where each storage node in the m + k storage nodes is responsible for storing a first block of EC. The primary node itself can store an EC block (which can be a first data block or it can be a first parity block), and then send the first remaining m + k-1 EC blocks to m + k-1 node secondary, for storage. Each secondary node stores a first EC block. The primary node may also not store the first EC block, and directly send the first m + k EC blocks to the secondary m + k nodes for storage, which is not limited in this mode.
[90] The modality shown in Figure 2 provides a method for distributing data through a data distribution device. Correspondingly, this modality provides a data storage method, which is used to store data distributed by a data distribution device. A primary node receives a first segment of data distributed by a data distribution device,
Petition 870160025486, of 6/3/2016, p. 11/38
31/51 performs erasure coding according to the first data segment to obtain a first EC range and allocates m + k data blocks from the first EC range for m + k storage nodes to perform storage. In comparison with the prior art, in the method provided by this modality, an erase coding operation, which is originally performed by the data distribution apparatus, is performed, instead, by the primary node. In this way, a conflict problem caused by the erasure coding performed by the data distribution device does not exist, and, in addition, there is no need to use a distributed lock to solve the conflict problem. Therefore, frequent lockout switching is not triggered, and the performance of a distributed storage system is improved.
[91] It can be understood that a primary node of any group of storage nodes in the distributed storage system can execute the data storage method provided by the modality shown in Figure 3.
[92] It is mentioned in the aforementioned modality that the logical volume of the distributed storage system may include multiple logical partitions that do not overlap, and the data distribution device performs division to obtain data that are located within a logical partition as a data segment. In addition, there is a correspondence between a group of storage nodes and a logical partition. It can be understood that, in this modality, the logical addresses of the first data segment received by the primary node must be located within a logical partition that corresponds to the group of storage nodes to which the primary node belongs. It can be understood that the logical addresses of data segments received by primary nodes from different groups of storage nodes do not overlap.
[93] During the generation of an EC band, the primary node can additionally generate a version number that corresponds to the EC band.
[94] In a real application, when the primary node receives the first data segment, it can also receive a data segment distributed by the same data distribution device or by another
Petition 870160025486, of 6/3/2016, p. 39/114
32/51 data distribution device. In this case, the erasure coding performed by the primary node can be classified in the following cases:
[95] Case 1: The primary node receives a second data segment, where the second data segment and the first data segment are located within the same logical partition and the logical addresses of the second data segment overlap the logical addresses the first data segment; the primary node performs erasure coding according to the second data segment to obtain a second EC strip, wherein the second EC strip includes m second data blocks and k second parity blocks; the primary node determines a serial distribution sequence of the first EC range and the second EC range, and distributes the first EC range and the second EC range to the m + k storage nodes, in series, according to serial distribution sequence. Specifically, if the primary node first receives the first data segment, and then receives the second data segment, the primary node first distributes the first m + k EC blocks of the first EC range, respectively, to the m + k storage nodes perform storage, and then, after receiving a response message sent by the m + k storage nodes, it distributes the m + k second EC blocks of the second EC range, respectively, to the m + k nodes of storage perform storage, where the response message is used to indicate that the first data segment is successfully stored on the m + k storage nodes, and a version number of the second EC range is greater than a version number of the first EC range. In case the primary node receives the second data segment first, and then receives the first data segment, the primary node first distributes the m + k second EC blocks of the second EC range, respectively, to the m + k storage nodes perform storage, and then distribute the m + k first EC blocks of the first EC range, respectively, to the m + k storage nodes perform storage, where a version number of the first EC range is greater than a version number of the second EC track.
[96] Case 2: The primary node receives a third data segment, where the first data segment and the third data segment
Petition 870160025486, of 6/3/2016, p. 40/114
33/51 are located within the same logical partition, and the logical addresses of the first data segment do not overlap the logical addresses of the third data segment. In this case, the primary node combines the third data segment into the first data segment, then uses the first combined data segment as a first data segment to be encoded in step 302, and then continues to perform step 302 and step 303. For example, a range of logical addresses for the first data segment is 4 M to 5 M, a range of logical addresses for the third data segment is 5 M to 8 M, and the logical addresses for the first data segment data and the third data segment are all located within a 4 M to 8 M logical partition. The primary node combines the third data segment into the first data segment, and then performs erase coding according to the first segment combined data to obtain a first EC range.
[97] Case 3: The primary node receives a fourth data segment, where the first data segment and the fourth data segment are located within different logical partitions. This case usually occurs on a storage node that is a primary node in two or more groups of storage nodes, or on a primary node in the same group of storage nodes that corresponds to multiple logical partitions. In this case, the first data segment and the fourth data segment are located within different logical partitions. Therefore, the logical addresses of the first data segment do not overlap the logical addresses of the fourth data segment, and the primary node only needs to encode and distribute data normally in the first data segment and the fourth data segment, which it does not cause a storage conflict. The above three cases can be used not only in a scenario in which the primary node receives two data segments concurrently, but also in a scenario in which the primary node receives three or more data segments concurrently, which is not limited in this modality.
[98] As mentioned in the modality shown in Figure 2, each logical partition of the distributed storage system corresponds to a unique key value, and a key value of a logical partition within which each data segment is located is a value of data segment key. In case 1, due to the fact that the first data segment and the
Petition 870160025486, of 6/3/2016, p. 41/114
34/51 second data segment are located within the same logical partition, the first data segment and the second data segment have the same key value; in case 2, due to the fact that the first data segment and the third data segment are located within the same logical partition, the first data segment and the third data segment have the same key value; in case 3, due to the fact that the first data segment and the fourth data segment are located within different logical partitions, the first data segment and the fourth data segment have different key values. The primary node can determine, according to key values of data segments, whether two data segments are located within the same logical partition.
[99] In particular, a key value of a data segment can be used as a key value of a corresponding EC range to identify a range of logical addresses for a data part of the EC range. A key value of an EC range can be carried in each EC block of the EC range.
[100] In step 302, the primary node performs erase coding according to the first data segment to obtain the first EC range. Specifically, if the size of the first data segment is Z, the first data segment can be divided only into the first m blocks of data. Then the primary node performs parity coding on the first m data blocks to obtain the first k parity blocks. In this way, the first EC range is obtained.
[101] If the size of the first data segment is less than Z, the first data segment is not sufficient for dividing it into m data blocks. The primary node needs to use stored data to complement the first data segment so that the size is equal to Z, then uses the first complemented data segment as the first data segment to be coded in step 302, and performs a divide the first data segment into first m data blocks, and perform parity coding on the first m data blocks to obtain the first k parity blocks. To ensure that the first data segment can be located within a logical partition, the primary node must select, from the stored data, data that is located within the
Petition 870160025486, of 6/3/2016, p. 42/114
35/51 same logical partition as the first data segment to complement the first data segment. For example, if the logical addresses of the first data segment are 5 M to 8 M and are located within the logical partition 4 M to 8 M, the primary node must select stored data whose logical addresses are 4 M to 5 M to complement the first data segment so that logical addresses are 4 M to 8 M.
[102] A method for performing erase coding according to the second, third, and fourth data segments by the primary node is similar to the method for erasing encoding according to the first data segment, and is not repeated at present document.
[103] To facilitate understanding of the aforementioned modality, the following uses an application scenario specific to the aforementioned modality for the description.
[104] An upper layer of the distributed storage system includes a data distribution device A and a data distribution device B. The distributed storage system stores data using EC ranges, and each EC range includes four data blocks and two parity blocks. The distributed storage system includes 100 groups of storage nodes, and a primary node is specified in each group of storage nodes. A storage node group 1 includes six storage nodes, where a primary node is an A node, and the secondary nodes are a B node, a C node, a D node, an E node, and an F node. The volume logic of the distributed storage system is divided into multiple logical partitions, where a size of each logical partition is 4 M, that is, a range of logical addresses from a first logical partition is 0 M to 4 M, a range of logical addresses of a second logical partition is 4 M to 8 M, a range of logical addresses of a third logical partition is 8 M to 12 M ... Each logical partition corresponds to a unique key value, where a key value that corresponds the first logical partition is 1, a key value that corresponds to the second logical partition is 2, and a key value that corresponds to the third logical partition is 3 ...
[105] The data distribution device A receives a storage instruction from a user, where the storage instruction
Petition 870160025486, of 6/3/2016, p. 43/114
36/51 storage carries the data to be stored, and the logical addresses of the data to be stored are 5 M to 12 M.
[106] Due to the fact that the 5 M to 8 M are located within the second logical partition and the 8 M to 12 M are located within the third logical partition, the data distribution device A divides the data to be stored in two data segments, where the logical addresses of a data segment 1 are 5 M to 8 M, and a key value is 2; and the logical addresses of a data segment 2 are 8 M to 12 M, and a key value is 3.
[107] The data distribution device A predefines and records a correspondence between a key value and a group of storage nodes, and determines that key value 2 corresponds to storage node group 1 and that the key value 3 corresponds to a group of storage nodes 6.
[108] Data distribution device A distributes data segment 1 to a primary node (namely, node A) of the storage node group 1 and distributes data segment 2 to a primary node of the node group storage 6.
[109] Node A receives data segment 1, and then additionally receives a data segment 3 distributed by data distribution device B, where the logical addresses of data segment 3 are 4 M to 7 M. Node A complements the data from part 4 M to 5 M of data segment 1, and then performs erase coding on the complemented data segment 1 to obtain a range of EC 1, where a key value of the range of EC 1 is 2 and a version number is 1; node A additionally complements the data from part 7 M to 8 M of data segment 3, and then performs erase coding on the complemented data segment 3 to obtain a range of EC 3, where a key value of the EC range 3 is also 2 and a version number is 2. Each EC range includes six EC blocks, which are four data blocks and two parity blocks.
[110] For the EC 1 strip, node A itself stores one EC block from the EC 1 strip, and distributes five remaining EC blocks to nodes B to F for storage, so that nodes B to F, respectively ,
Petition 870160025486, of 6/3/2016, p. 44/114
37/51 store an EC block in the EC 1 range. After successfully storing the EC blocks in the EC 1 range, nodes B to F send a response message to node A; after receiving the response message, node A itself stores one EC block from the EC 2 range, and distributes five remaining EC blocks to nodes B to F for storage, so that nodes B to F, respectively, store one EC block of EC range 2.
[111] The above describes in detail the data distribution and data storage methods in the modalities of the present invention. The following modalities provide corresponding devices to implement the aforementioned methods.
[112] With reference to Figure 4, a basic structure of a data distribution apparatus provided by one embodiment of the present invention includes:
an instruction receiving module 401, configured to receive a storage instruction, wherein the storage instruction carries the data to be stored;
a data segment division module 402, configured to divide the data to be stored in P data segments, where each data segment corresponds to a range of EC, one size of each data segment is not greater than Z , where Z is a size of m data blocks and P is a positive integer;
a 403 node group determination module, configured to determine a storage node group that corresponds to each data segment; and a 404 data segment distribution module, configured to distribute the data segment to a primary storage node in the storage node group determined by the node group determination module 403 and corresponding to each data segment.
[113] In a data distribution device provided by this modality, an instruction instruction module 401 receives a storage instruction from a user; a data segment division module 402 divides the data to be stored that the storage instruction instructs to store, in P data segments; a module
Petition 870160025486, of 6/3/2016, p. 45/114
38/51 node group determination 403 determines a storage node group that corresponds to each data segment; and a data segment distribution module 404 distributes the data segment to a primary node in the corresponding storage node group. In this modality, due to the fact that the data distribution device directly distributes a segment of data obtained after the division to a storage node, and no longer performs erasure coding on the data to be stored, there is no need to use a distributed lock to solve a conflict problem, frequent lock switching is not triggered, and the performance of a distributed storage system is improved.
[114] Preferably, in yet another embodiment of the present invention, a logical volume of the distributed storage system is divided into multiple logical partitions, and each of the logical partitions has a size of Z and they do not overlap; and the data segment division module 402 is specifically configured to:
divide the data to be stored in P data segments according to the logical addresses of the data to be stored, where each data segment is located within a logical partition.
[115] Preferably, in yet another embodiment of the present invention, in the P data segments obtained by dividing by the data segment division module 402, an initial address of a first data segment is an initial address of the data to be stored, and a starting address of a pth data segment is a starting address of a logical partition within which the pth data segment is located, where * 2 <p <P.
[116] Preferably, in yet another embodiment of the present invention, the data distribution apparatus additionally includes: a 405 correspondence module, configured to preset and record a correspondence between each logical partition and each group of storage nodes; and the 403 node group determination module is specifically configured to determine, according to the correspondence between each logical partition and each storage node group, the storage node group that corresponds to a
Petition 870160025486, of 6/3/2016, p. 46/114
39/51 logical partition within which the data segment is located. The 405 correspondence module is an optional module, and the data distribution device may not have the 405 correspondence module.
[117] Preferably, in yet another embodiment of the present invention, each logical partition of the distributed storage system corresponds exclusively to a key value, and the 405 correspondence module is specifically configured to preset and register a key value. which corresponds to a group of storage nodes; and the node group determination module 403 is specifically configured to determine a key value of the data segment according to the logical partition within which the data segment is located, and to determine according to the value of data segment key, the group of storage nodes that corresponds to the data segment.
[118] One embodiment of the present invention additionally provides a related data storage device, applied to a primary node in any group of storage nodes in a distributed storage system. With reference to Figure 5, a basic structure of the data storage device includes:
a data receiving module 501, configured to receive a first data segment, where a size of the first data segment is not greater than Z, where Z is a size of m data blocks;
a data coding module 502, configured to perform erase coding according to the first data segment to obtain a first EC strip, where the first EC strip includes the first m data blocks and the first k blocks of data. parity; and a 503 data distribution module, configured to distribute the first EC range to m + k storage nodes to perform storage, where each storage node on the m + k storage nodes is responsible for storing any of the first m blocks of data or the first k parity blocks of the first EC range.
[119] The modality shown in Figure 4 provides a data distribution device; correspondingly, this modality provides a data storage device, configured to store data
Petition 870160025486, of 6/3/2016, p. 47/114
40/51 distributed by the data distribution device. In a primary node data storage device, a data receiving module 501 receives a first data segment distributed by a data distribution device; a data coding module 502 performs erasure coding on the first data segment to obtain a first EC band; and a data distribution module 503 distributes the m + k data blocks from the first EC range to the m + k storage nodes for storage. In comparison with the prior art, in this embodiment, an erase encryption operation that is originally performed by the data distribution device is performed, instead, by the data storage device. In this way, a conflict problem caused by the erasure coding performed by the data distribution device does not exist, and, in addition, there is no need to use a distributed lock to solve the conflict problem. Therefore, frequent lockout switching is not triggered, and the performance of a distributed storage system is improved.
[120] Preferably, in yet another embodiment of the present invention, a logical volume of the distributed storage system includes multiple logical partitions, and each of the logical partitions has a size of Z and they do not overlap. The first data segment is located within a logical partition.
[121] Preferably, in yet another embodiment of the present invention, the data reception module 501 is additionally configured to:
receiving a second data segment, where the second data segment and the first data segment are located within the same logical partition, and the logical addresses of the second data segment overlap the logical addresses of the first data segment;
the data coding module 502 is additionally configured to perform erase coding according to the second data segment to obtain a second EC strip, wherein the second EC strip includes m second data blocks and k second data blocks parity; and
Petition 870160025486, of 6/3/2016, p. 48/114
41/51 the data distribution module 503 is specifically configured to determine a serial distribution sequence for the first EC range and the second EC range, and to distribute the first EC range and the second EC range, in series , according to the serial distribution sequence for the m + k storage nodes to perform storage.
One purpose of the serial distribution operation is to avoid a storage conflict between the first EC range and the second EC range on the m + k storage nodes. Each storage node in the m + k storage nodes is responsible for storing any of the first m data blocks or the first k parity blocks of the first EC strip, and any of the m second data blocks or the k second blocks parity of the second EC band. Preferably, in yet another embodiment of the present invention, the data distribution module 503 distributes, using the following method, the first EC strip and the second EC strip, in series, according to the serial distribution sequence for the m + k storage nodes, in series, to perform storage:
in case a reception time of the first data segment is prior to a reception time of the second data segment, first distribute the first EC strip to the m + k storage nodes to perform storage, and, after receiving a message of response that indicates that the m + k storage nodes successfully store the first EC range, distribute the second EC range for the m + k storage nodes to perform storage; or if a reception time for the first data segment is later than a reception time for the second data segment, first distribute the second EC strip to the m + k storage nodes to perform storage, and, after receiving a response message which indicates that the m + k storage nodes successfully store the second range of EC, distribute the first range of EC to the m + k storage nodes to perform storage.
[122] Preferably, in yet another embodiment of the present invention, the data receiving module 501 is additionally configured to receive a third data segment, wherein the third data segment
Petition 870160025486, of 6/3/2016, p. 11/4
42/51 data and the first data segment are located within the same logical partition, and the logical addresses of the third data segment do not overlap the logical addresses of the first data segment; and the data encoding module 502 is additionally configured to combine the third data segment into the first data segment, and to trigger the step of performing erase coding according to the first data segment to obtain a first EC range .
[123] Preferably, in yet another embodiment of the present invention, the data encryption module 502 performs encryption by erasing the first data segment using the following method: if the size of the first data segment is equal to Z, divide the first data segment in the first m data blocks, and perform parity coding on the first m data blocks to obtain the first k parity blocks; or, if the size of the first data segment is less than Z, use stored data to complement the first data segment so that the size is equal to Z, then divide the first complemented data segment into the first m data blocks , and perform parity coding on the first m data blocks to obtain the first k parity blocks.
[124] To facilitate the understanding of the aforementioned modality, the following uses a specific application scenario of the aforementioned modality for description.
[125] An upper layer of the distributed storage system includes a data distribution device A and a data distribution device B. The distributed storage system stores data using EC ranges, and each EC range includes four data blocks and two parity blocks. The distributed storage system includes 100 groups of storage nodes, and a primary node is specified in each group of storage nodes. A storage node group 1 includes six storage nodes, where a primary node is an A node, and the secondary nodes are a B node, a C node, a D node, an E node, and an F node. The volume logic of the distributed storage system is divided into multiple logical partitions, where one size of each logical partition is 4 M,
Petition 870160025486, of 6/3/2016, p. 50/114
43/51 that is, a range of logical addresses for a first logical partition is 0 M to 4 M, a range of logical addresses for a second logical partition is 4 M to 8 M, a range of logical addresses for a third logical partition is 8 M to 12 M ... Each logical partition corresponds to a unique key value, where a key value corresponding to the first logical partition is 1, a key value corresponding to the second logical partition is 2, and one key value corresponding to the third logical partition is 3 ...
[126] A 401 instruction receiving module from data distribution device A receives a storage instruction from a user, where the storage instruction carries the data to be stored, and the logical addresses of the data to be stored are 5 M to 12 M.
[127] Due to the fact that the 5 M to 8 M are located within the second logical partition, and the 8 M to 12 M are located within the third logical partition, a data segment splitting module 402 divides the data be stored in two data segments, where the logical addresses of a data segment 1 are 5 M to 8 M, and a key value is 2; and the logical addresses of a data segment 2 are 8 M to 12 M, and a key value is 3.
[128] A 403 node group determination module predefines and records a correspondence between a key value and a group of storage nodes, determines that key value 2 corresponds to storage node group 1 and that the value of key 3 corresponds to a group of storage nodes 6.
[129] A 404 data segment distribution module distributes data segment 1 to a primary node (namely, node A) of the storage node group 1 and distributes data segment 2 to a primary node of the group storage nodes 6.
[130] A data reception module 501 from node A receives data segment 1, and then, additionally, receives a data segment 3 distributed by data distribution device B, where the logical addresses of data segment 3 are 4 M to 7 M. A 502 data encoding module complements data from part 4 M to 5 M of the segment
Petition 870160025486, of 6/3/2016, p. 51/114
44/51 of data 1, and then performs erase coding on the supplemented data segment 1 to obtain a range of EC 1, where a key value of the range of EC 1 is 2 and a version number is 1; the data encoding module 502 additionally complements the data from part 7 M to 8 M of data segment 3, and then performs erase coding on the complemented data segment 3 to obtain a range of EC 3, where a key value for the EC range 3 is also 2 and a version number is 2. Each EC range includes six EC blocks, which are four data blocks and two parity blocks.
[131] A data distribution module 503 allocates one EC block from the EC 1 range to node A, and distributes five remaining EC blocks to nodes B to F for storage, so that nodes B to F, respectively , store an EC block in the EC 1 range. After successfully storing the EC blocks in the EC 1 range, nodes B to F send a response message to node A; after node A receives the response message and stores an EC block from the EC 2 range, and data distribution module 503 distributes the remaining five EC blocks to nodes B to F for storage, so that nodes B to F, respectively, store an EC block from the EC 2 range.
[132] A data distribution apparatus in one embodiment of the present invention is described above from the perspective of a modular functional entity. In the following, a data distribution apparatus in an embodiment of the present invention is described from a hardware processing perspective. Referring to Figure 6, another embodiment of a data distribution apparatus 600 in an embodiment of the present invention includes:
an input device 601, an output device 602, a processor 603 and a memory 604 (there may be one or more processors 603 in the data distribution device 600, and in Figure 6, a processor 603 is used as an example). In some embodiments of the present invention, the input device 601, the output device 602, the processor 603 and the memory 604 can be connected by a bus or in other ways, in which connection is used with the use of a bus as a example in Figure 6.
Petition 870160025486, of 6/3/2016, p. 11/114
45/51 [133] By invoking an operating instruction stored in memory 604, processor 603 is configured to perform the following steps:
receiving a storage instruction, in which the storage instruction carries the data to be stored;
divide the data to be stored in P data segments, where each data segment corresponds to a range of EC, with a size of each data segment not greater than Z, where Z is a size of m data blocks and P is a positive integer;
determine a group of storage nodes that correspond to each data segment; and distributing the data segment to a primary storage node in the given storage node group that corresponds to the data segment.
[134] In some embodiments of the present invention, a logical volume of a distributed storage system includes multiple logical partitions, and each of the logical partitions has a size of Z and they do not overlap; and processor 603 is additionally configured to perform the following step:
divide the data to be stored in the P data segments, where each data segment is located within one of the logical partitions.
[135] In some embodiments of the present invention, a starting address of a first data segment in the P data segments is a starting address of the data to be stored, and a starting address of a pth data segment in the P data segments it is a starting address of a logical partition within which the th data segment is located, where * 2 <p <P.
[136] In some embodiments of the present invention, the 603 processor is additionally configured to perform the following steps:
preset and record a match between each partition
Petition 870160025486, of 6/3/2016, p. 53/114
46/51 logic and each group of storage nodes; and determining, according to the correspondence between each logical partition and each group of storage nodes, the group of storage nodes that corresponds to a logical partition within which the data segment is located.
[137] In some embodiments of the present invention, each logical partition corresponds exclusively to a key value, and processor 603 is additionally configured to perform the following steps:
predefine and register a key value that corresponds to each group of storage nodes;
determining a key value of the data segment according to the logical partition within which the data segment is located; and determining, according to the key value of the data segment, the group of storage nodes that corresponds to the data segment.
[138] The following describes a data storage device in an embodiment of the present invention from a hardware processing perspective. Referring to Figure 7, another embodiment of a data storage device 700 in an embodiment of the present invention includes:
an input device 701, an output device 702, a processor 703 and a memory 704 (there may be one or more processors 703 in the data storage device 700, and in Figure 7, a processor 703 is used as an example). In some embodiments of the present invention, the input device 701, the output device 702, the processor 703 and the memory 704 can be connected by a bus or in other ways, in which connection is used with the use of a bus as a example in Figure 7.
[139] By invoking an operating instruction stored in memory 704, processor 703 is configured to perform the following steps:
receive a first data segment, where a size of the first data segment is not greater than Z, where Z is a size of
Petition 870160025486, of 6/3/2016, p. 54/114
47/51 m data blocks;
perform erasure coding according to the first data segment to obtain a first EC strip, wherein the first EC strip includes the first m data blocks and the first k parity blocks; and distribute the first EC range to m + k storage nodes to perform storage, where each storage node on the m + k storage nodes is responsible for storing any of the first m data blocks or the first k parity blocks of the first EC range.
[140] In some embodiments of the present invention, a logical volume of the distributed storage system includes multiple logical partitions, and each of the logical partitions has a size of Z and they do not overlap; and the first data segment is located within one of the logical partitions.
[141] In some embodiments of the present invention, the 703 processor is additionally configured to perform the following steps:
receiving a second data segment, where the second data segment and the first data segment are located within the same logical partition, and the logical addresses of the second data segment overlap the logical addresses of the first data segment;
performing erasure coding according to the second data segment to obtain a second EC strip, wherein the second EC strip includes m second data blocks and k second parity blocks; and determine a serial distribution sequence of the first EC strip and the second EC strip, and distribute the first EC strip and the second EC strip to the m + k storage nodes, in series, according to the sequence of serial distribution.
[142] In some embodiments of the present invention, the 703 processor is additionally configured to perform the following steps:
in the case of a reception time for the first segment of
Petition 870160025486, of 6/3/2016, p. 55/114
48/51 data be before a reception time of the second data segment, first distribute the first EC range to the m + k storage nodes to perform storage, and after receiving a response message that indicates that the m + k nodes of storage successfully store the first range of EC, distribute the second range of EC to the m + k storage nodes to perform storage; or if a reception time for the first data segment is later than a reception time for the second data segment, first distribute the second EC strip to the m + k storage nodes to perform storage, and after receiving a response message that indicates that the m + k storage nodes successfully store the second EC range, distribute the first EC range to the m + k storage nodes to perform storage.
[143] In some embodiments of the present invention, the 703 processor is additionally configured to perform the following steps:
receiving a third data segment, where the third data segment and the first data segment are located within the same logical partition, and the logical addresses of the third data segment do not overlap the logical addresses of the first data segment; and combining the third data segment into the first data segment, and triggering the step of performing erase coding according to the first data segment to obtain a first EC range.
[144] In some embodiments of the present invention, the 703 processor is additionally configured to perform the following steps:
when the size of the first data segment is equal to Z, divide the first data segment into first m data blocks, and perform parity coding on the first m data blocks to obtain the first k parity blocks; or when the size of the first data segment is less than Z, use stored data to complement the first data segment
Petition 870160025486, of 6/3/2016, p. 56/114
49/51 data so that the size is equal to Z, then divide the first complemented data segment into first m data blocks, and perform parity encoding on the first m data blocks to obtain the first k parity blocks.
[145] An embodiment of the present invention additionally provides a distributed storage system. The distributed storage system includes multiple storage nodes. The distributed storage system stores data using EC ranges, with each EC range including a data part and a parity part, where the data part of each EC range includes m data blocks, the parity part of each EC range includes k parity blocks that are obtained after parity coding is performed on the m data blocks, with multiple storage nodes constituting multiple groups of storage nodes, where a number of nodes of storage included in each group of storage nodes is not less than m + k, and a primary storage node is specified in each group of storage nodes, where both om and ok are positive integers.
[146] The distributed storage system additionally includes the data storage device shown in Figure 5 or Figure 7.
[147] It can be clearly understood by a person skilled in the art that, for the purpose of a brief and convenient description, for a detailed work process of the aforementioned system, apparatus and unit, reference can be made to a corresponding process in the modalities of methods mentioned above, and the details are not described again in this document.
[148] In the various modalities provided in this application, it should be understood that the revealed system, apparatus and method can be implemented in other ways. For example, the type of apparatus described is merely exemplary. For example, a unit division is merely a logical function division and can be another division in an actual deployment. For example, a plurality of units or components
Petition 870160025486, of 6/3/2016, p. 57/114
50/51 can be combined or integrated into another system, or some features can be ignored or not realized. In addition, mutual couplings or direct couplings or communication connections displayed or discussed can be implemented using some interfaces. Couplings or indirect communication connections between devices or units can be implanted in electronic, mechanical, or other forms.
[149] The units described as separate parts can be physically separated, or not, and the parts displayed as units can be physical units, or not, they can be located in one position, or they can be distributed in a plurality of network units. Some or all of the units can be selected according to real needs to achieve the objectives of the modalities solutions.
[150] In addition, functional units in the embodiments of the present invention can be integrated into a processing unit, or each of the units can physically exist alone, or two or more units can be integrated into one unit. The integrated unit can be deployed in a form of hardware, or it can be deployed in the form of a functional software unit.
[151] When the integrated unit is deployed as a functional software unit and sold or used as a standalone product, the integrated unit can be stored on a computer-readable storage medium. Based on this understanding, the technical solutions of the present invention, essentially, or the part that contributes to the prior art, or all or some technical solutions can be implemented in the form of a software product. The computer software product is stored on a storage medium and includes several instructions for instructing a computer device (which may be a personal computer, a server, or a network device, or the like) to perform all or some of the steps in the methods described in the embodiments of the present invention. The aforementioned storage media includes: any media that can store a program code, such as a USB flash drive, a removable hard drive, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM) , Random Access Memory), a magnetic disk or an optical disk.
Petition 870160025486, of 6/3/2016, p. 11/114
51/51 [152] The aforementioned modalities are intended merely to describe the technical solutions of the present invention, but not to limit the present invention. Although the present invention is described in detail with reference to the aforementioned modalities, people of ordinary skill in the art must understand that it is still possible to make modifications to the technical solutions described in the aforementioned modalities or to make equivalent substitutions in some technical resources thereof, without depart from the scope of technical solutions of the modalities of the present invention.
权利要求:
Claims (4)
[1]
1. Data distribution method, applied to a distributed storage system, CHARACTERIZED by the fact that the distributed storage system stores data using the erasure coding ranges, EC, with each EC range comprising a part of data and a parity part, where the data part of each EC range comprises m data blocks, the parity part of each EC range comprises k parity blocks which are obtained after parity coding is performed on the m blocks of data, the distributed storage system comprising multiple storage nodes, where the multiple storage nodes constitute multiple groups of storage nodes, the number of storage nodes comprised in each group of storage nodes is not less than that m + k, and a primary storage node is specified in each storage node group, where both om and ok are integers the positive ones; and the method comprises:
receiving (201) a storage instruction, wherein the storage instruction carries the data to be stored;
divide (202) the data to be stored in P data segments, where each data segment corresponds to a range of EC, a size of each data segment is not greater than Z, where Z is a size of m blocks of data, and P is a positive integer;
determining (203) a group of storage nodes that correspond to each data segment; and distributing (204) the data segment to a primary storage node in the given storage node group that corresponds to the data segment;
wherein a logical volume of the distributed storage system comprises multiple logical partitions, and each of the logical partitions has a size of Z and they do not overlap; and the division (202) of the data to be stored in P data segments comprises:
divide the data to be stored in the P data segments according to logical addresses of the data to be stored, where
Petition 870190015896, of 02/15/2019, p. 16/20
[2]
2/4 each data segment is located within one of the logical partitions; wherein the method further comprises: predefining and recording a correspondence between each logical partition and each group of storage nodes; and determining (203) a group of storage nodes corresponding to each data segment comprises:
determine, according to the correspondence between each logical partition and each group of storage nodes, the group of storage nodes corresponding to a logical partition in which the data segment is located;
where a starting address of a first data segment in
P data segments is a starting address of the data to be stored, and a starting address of a p th data segment in P segment data is a starting address of a logical partition within which the p th data segment is located in that 2 <p <P.
2. Method of data distribution, according to the claim
1, CHARACTERIZED by the fact that, where each logical partition corresponds only to a key value, and the preset and record of a correspondence between each logical partition and each group of storage nodes comprises:
predefine and register a key value that corresponds to each group of storage nodes; and the determination, according to the correspondence between each logical partition and each group of storage nodes, of the group of storage nodes that corresponds to a logical partition within which the data segment is located, comprises:
determining a key value of the data segment according to the logical partition within which the data segment is located; and determining, according to the key value of the data segment, the group of storage nodes that corresponds to the data segment.
[3]
3. Data distribution device, applied to a distributed storage system, CHARACTERIZED by the fact that the distributed storage system stores data using coding bands
Petition 870190015896, of 02/15/2019, p. 17/20
3/4 by EC erasure, with each EC band comprising a data part and a parity part, where the data part of each EC band comprises m data blocks, the par part of each EC band comprises k parity blocks that are obtained after parity coding is performed on the m data blocks, the distributed storage system comprising multiple storage nodes, where the multiple storage nodes constitute multiple groups of storage nodes, in that a number of storage nodes comprised in each group of storage nodes is not less than m + k, and a primary storage node is specified in each group of storage nodes, where both om and ok are positive integers ; and the data distribution apparatus comprises:
an instruction receiving module (401), configured to receive a storage instruction, wherein the storage instruction carries the data to be stored;
a data segment division module (402), configured to divide the data to be stored in P data segments, where each data segment corresponds to a range of EC, one size of each data segment is not greater than that Z, Z is a size of m data blocks, and P is a positive integer;
a node group determination module (403), configured to determine a group of storage nodes that correspond to each data segment; and a data segment distribution module (404), configured to distribute the data segment to a primary storage node in the given storage node group that corresponds to the data segment;
wherein a logical volume of the distributed storage system comprises multiple logical partitions, and each of the logical partitions has a size of Z and they do not overlap; and the data segment division module (402) is specifically configured for:
divide the data to be stored in the P data segments
Petition 870190015896, of 02/15/2019, p. 18/20
ΑΙΑ according to the logical addresses of the data to be stored, where each data segment is located within one of the logical partitions;
wherein the data distribution apparatus additionally comprises:
a matching module (405), configured to predefine and record a match between each logical partition and each group of storage nodes; and the node group determination module (403) is specifically configured to:
determine, according to the correspondence between each logical partition and each group of storage nodes, the group of storage nodes corresponding to a logical partition in which the data segment is located;
where a start address of a first data segment in the P data segments is a start address of the data to be stored, and a start address of a pth data segment in the P data segments is a start address of a logical partition within which the th data segment lies, where 2 <p <P.
[4]
4. Data distribution apparatus, according to claim 3, CHARACTERIZED by the fact that each logical partition corresponds only to a key value, and the correspondence module (405) is specifically configured for:
predefine and register a key value that corresponds to each group of storage nodes; and the node group determination module (403) is specifically configured to:
determining a key value of the data segment according to the logical partition within which the data segment is located; and determining, according to the key value of the data segment, the group of storage nodes that corresponds to the data segment.
类似技术:
公开号 | 公开日 | 专利标题
BR102016012739B1|2019-04-24|DATA DISTRIBUTION METHOD AND APPARATUS
BR122018006762B1|2020-11-24|DATA STORAGE METHOD, DATA RECOVERY METHOD, RELATED DEVICE AND SYSTEM
US10713120B2|2020-07-14|Unique identifiers for data replication, migration, failover operations and failback operations
US9195392B2|2015-11-24|Distributed storage method, apparatus, and system
ES2648972T3|2018-01-09|Procedure, device and distributed storage system
WO2018000812A1|2018-01-04|Data storage method and apparatus
US9547605B2|2017-01-17|Method for data backup, device and system
CN105893188B|2018-12-14|Method and apparatus for accelerating the data reconstruction of disk array
TW200843410A|2008-11-01|Method and apparatus for storing data in a peer to peer network
TW200809866A|2008-02-16|Format transformation of test data
Platania et al.2014|Towards a practical survivable intrusion tolerant replication system
US9009431B2|2015-04-14|Virtual snapshot system and method
BR112018076689B1|2021-09-08|METHOD FOR DATA PROCESSING
CN104714756A|2015-06-17|Local locking in bi-directional synchronous mirroring environment and computer system
CN109542518A|2019-03-29|The method of chip and bootrom
CN107729536B|2020-09-08|Data storage method and device
US10592138B1|2020-03-17|Avoiding storage device overlap in raid extent sub group and keeping relationship balance on mapped raid system and method
CN104052799B|2018-01-26|A kind of method that High Availabitity storage is realized using resource ring
WO2015085802A1|2015-06-18|Data storage method and storage apparatus
CN111314393A|2020-06-19|Data processing method, device and equipment based on block chain
US20180113747A1|2018-04-26|Overdrive mode for distributed storage networks
WO2021012164A1|2021-01-28|Data reconstruction method and apparatus, computer device, storage medium, and system
BR102016012742A2|2017-09-12|STORAGE METHOD, DATA RECOVERY METHOD, RELATED APPARATUS, AND SYSTEM
Rosenfeld et al.2013|Using disk add-ons to withstand simultaneous disk failures with fewer replicas
CN113268374A|2021-08-17|Method for storing data, storage device and data storage system
同族专利:
公开号 | 公开日
CN107844268A|2018-03-27|
US20160357440A1|2016-12-08|
EP3101530A1|2016-12-07|
CN104932953A|2015-09-23|
EP3101530B1|2019-05-08|
CN104932953B|2017-11-21|
JP2017004513A|2017-01-05|
BR102016012739A2|2017-09-19|
CN107844268B|2021-09-14|
JP6259868B2|2018-01-10|
引用文献:
公开号 | 申请日 | 公开日 | 申请人 | 专利标题

JP3367510B2|1993-04-28|2003-01-14|株式会社日立製作所|Database management method and system|
US8103789B1|2001-03-01|2012-01-24|Juniper Networks, Inc.|Method and apparatus for computing a backup path using fate sharing information|
US7644136B2|2001-11-28|2010-01-05|Interactive Content Engines, Llc.|Virtual file system|
US7266716B2|2003-10-23|2007-09-04|Hewlett-Packard Development Company, L.P.|Method and recovery of data using erasure coded data from stripe blocks|
JP4244319B2|2003-12-17|2009-03-25|株式会社日立製作所|Computer system management program, recording medium, computer system management system, management device and storage device therefor|
US7546342B2|2004-05-14|2009-06-09|Microsoft Corporation|Distributed hosting of web content using partial replication|
US7783600B1|2006-02-27|2010-08-24|Symantec Operating Corporation|Redundancy management service for peer-to-peer networks|
CN101188569B|2006-11-16|2011-05-04|饶大平|Method for constructing data quanta space in network and distributed file storage system|
US8255739B1|2008-06-30|2012-08-28|American Megatrends, Inc.|Achieving data consistency in a node failover with a degraded RAID array|
US7992037B2|2008-09-11|2011-08-02|Nec Laboratories America, Inc.|Scalable secondary storage systems and methods|
US8051205B2|2008-10-13|2011-11-01|Applied Micro Circuits Corporation|Peer-to-peer distributed storage|
US8458287B2|2009-07-31|2013-06-04|Microsoft Corporation|Erasure coded storage aggregation in data centers|
US20110078343A1|2009-09-29|2011-03-31|Cleversafe, Inc.|Distributed storage network including memory diversity|
US20110229104A1|2009-10-22|2011-09-22|Hundemer Hank J|System And Method For Recording and Playback Of Multimedia Content|
AU2012398211B2|2012-12-28|2016-12-08|Huawei Technologies Co., Ltd.|Caching method for distributed storage system, a lock server node, and a lock client node|
CN103152395B|2013-02-05|2015-12-09|北京奇虎科技有限公司|A kind of storage means of distributed file system and device|
CN103984607A|2013-02-08|2014-08-13|华为技术有限公司|Distributed storage method, device and system|
CN103699494B|2013-12-06|2017-03-15|北京奇虎科技有限公司|A kind of date storage method, data storage device and distributed memory system|
CN103944981B|2014-04-14|2017-03-22|中国科学院计算技术研究所|Cloud storage system and implement method based on erasure code technological improvement|
CN104639661A|2015-03-13|2015-05-20|华存数据信息技术有限公司|Distributed storage system and storing and reading method for files|CN108141228A|2015-10-09|2018-06-08|华为技术有限公司|The coding of distributed memory system|
CN105426483B|2015-11-19|2019-01-11|华为技术有限公司|A kind of file reading and device based on distributed system|
CN105404561B|2015-11-19|2019-04-12|浙江宇视科技有限公司|A kind of the correcting and eleting codes implementation method and device of distributed storage|
CN107203559B|2016-03-17|2021-01-01|华为技术有限公司|Method and device for dividing data strips|
CN106202271A|2016-06-30|2016-12-07|携程计算机技术(上海)有限公司|The read method of the product database of OTA|
CN107590019B|2016-07-07|2021-03-16|北京金山云网络技术有限公司|Data storage method and device|
CN106201354A|2016-07-12|2016-12-07|乐视控股(北京)有限公司|Date storage method and system|
JP6526235B2|2016-11-25|2019-06-05|華為技術有限公司Huawei Technologies Co.,Ltd.|Data check method and storage system|
CN108614670B|2016-12-13|2020-07-03|杭州海康威视数字技术股份有限公司|Information processing method and device|
CN106776146A|2016-12-29|2017-05-31|华为技术有限公司|A kind of data verification method, apparatus and system|
US10713117B2|2017-06-15|2020-07-14|Hitachi, Ltd.|Storage system and method for controlling storage system|
US11188258B2|2017-06-19|2021-11-30|Hitachi, Ltd.|Distributed storage system|
CN109213420A|2017-06-29|2019-01-15|杭州海康威视数字技术股份有限公司|Date storage method, apparatus and system|
CN107609161A|2017-09-26|2018-01-19|北京思特奇信息技术股份有限公司|A kind of data write-in, read method and system|
CN107707395B|2017-09-28|2021-02-02|浙江大华技术股份有限公司|Data transmission method, device and system|
WO2019080015A1|2017-10-25|2019-05-02|华为技术有限公司|Data reading and writing method and device, and storage server|
CN108780386B|2017-12-20|2020-09-04|华为技术有限公司|Data storage method, device and system|
CN108897497B|2018-06-29|2021-10-08|吴俊杰|Centerless data management method and device|
CN109194566B|2018-08-27|2022-01-04|惠州Tcl移动通信有限公司|Method for retransmitting information, storage medium and terminal equipment|
CN109032536B|2018-08-31|2021-08-10|郑州云海信息技术有限公司|Data storage method, device, system and equipment based on distributed cluster system|
CN109491968B|2018-11-13|2021-01-22|恒生电子股份有限公司|File processing method, device, equipment and computer readable storage medium|
EP3889778A4|2018-12-22|2021-12-29|Huawei Technologies Co., Ltd.|Distributed storage system and computer program product|
EP3889752A4|2018-12-25|2021-12-01|Huawei Technologies Co., Ltd.|Data storage method and apparatus in distributed storage system, and computer program product|
CN110046160B|2019-03-15|2021-07-20|中国科学院计算技术研究所|Stripe-based consistent hash storage system construction method|
CN109977077B|2019-03-25|2021-09-24|腾讯科技(深圳)有限公司|Model file storage method and device, readable storage medium and computer equipment|
CN112083892B|2020-09-25|2021-05-18|上海依图网络科技有限公司|Data storage method, device, equipment and medium|
法律状态:
2017-09-19| B03B| Publication of an application: publication anticipated [chapter 3.2 patent gazette]|
2018-08-07| B07A| Technical examination (opinion): publication of technical examination (opinion) [chapter 7.1 patent gazette]|
2018-08-07| B15K| Others concerning applications: alteration of classification|Ipc: G06F 3/06 (2006.01), G06F 11/10 (2006.01) |
2018-11-21| B06A| Notification to applicant to reply to the report for non-patentability or inadequacy of the application [chapter 6.1 patent gazette]|
2019-02-26| B09A| Decision: intention to grant [chapter 9.1 patent gazette]|
2019-04-24| B16A| Patent or certificate of addition of invention granted|Free format text: PRAZO DE VALIDADE: 20 (VINTE) ANOS CONTADOS A PARTIR DE 03/06/2016, OBSERVADAS AS CONDICOES LEGAIS. (CO) 20 (VINTE) ANOS CONTADOS A PARTIR DE 03/06/2016, OBSERVADAS AS CONDICOES LEGAIS |
优先权:
申请号 | 申请日 | 专利标题
CN201510304621.3|2015-06-04|
CN201510304621.3A|CN104932953B|2015-06-04|2015-06-04|A kind of data distributing method, date storage method, relevant apparatus and system|
[返回顶部]