Patent abstract:
An illustrative device for decoding video data includes a memory configured to store video data, and a video decoder implemented in circuitry and configured to determine that motion information of a current block of video data is to be derived using decoder-side motion vector derivation (DMVD), determine a pixel clue for the current block, the pixel clue comprising pixel data obtained from one or more groups of previously decoded pixels, derive the motion information for the current block according to DMVD from the pixel clue, and decode the current block using the motion information. The video decoder may generate the pixel clue using multi-hypothesis prediction from multiple motion compensated blocks. The video decoder may determine an inter-prediction direction for the motion information according to matching costs between different prediction directions. The video decoder may refine the motion information using a matching cost calculated for the pixel clue.
Publication number: BR112019017252A2
Application number: R112019017252
Filing date: 2018-02-21
Publication date: 2020-04-14
Inventors: Chuang Hsiao-Chiang; Chen Jianle; Karczewicz Marta; Chien Wei-Jung; Li Xiang; Chen Yi-Wen; Sun Yu-Chen
Applicant: Qualcomm Inc.
IPC primary classification:
Patent description:

DERIVING MOTION VECTOR INFORMATION IN A VIDEO DECODER [0001] This Patent Application claims the benefit of U.S. Provisional Patent Application No. 62/461,729, filed February 21, 2017; U.S. Provisional Patent Application No. 62/463,266, filed February 24, 2017; and U.S. Provisional Patent Application No. 62/472,919, filed March 17, 2017, the entire contents of which are incorporated herein by reference.
TECHNICAL FIELD [0002] This disclosure relates to video coding.
BACKGROUND [0003] Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, tablet computers, e-book readers, digital cameras, digital recording devices, digital media players, video gaming devices, video game consoles, cellular or satellite radio telephones, so-called smartphones, video teleconferencing devices, video streaming devices, and the like. Digital video devices implement video coding techniques, such as those described in the standards defined by ITU-T H.261, ISO/IEC MPEG-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, ITU-T H.263, ITU-T H.264/MPEG-4, Part 10,
Advanced Video Coding (AVC), ITU-T H.265, High Efficiency Video Coding (HEVC), and extensions of such standards, such as the Scalable Video Coding (SVC) and Multiview Video Coding (MVC) extensions. Video devices can transmit, receive, encode, decode, and/or store digital video information more efficiently by implementing such video coding techniques.
[0004] Video coding techniques include spatial (intra-picture) and/or temporal (inter-picture) prediction to reduce or remove redundancy inherent in video sequences. For block-based video coding, a video slice (for example, a video picture or a portion of a video picture) can be partitioned into video blocks, which may also be referred to as coding tree units (CTUs), coding units (CUs), and/or coding nodes. Video blocks in an intra-coded (I) slice of a picture are encoded using spatial prediction with respect to reference samples in neighboring blocks in the same picture. Video blocks in an inter-coded (P or B) slice of a picture may use spatial prediction with respect to reference samples in neighboring blocks in the same picture, or temporal prediction with respect to reference samples in other reference pictures. Pictures may be referred to as frames, and reference pictures may be referred to as reference frames.
[0005] The spatial or temporal prediction results in a predictive block for a block to be coded. The
residual data represents pixel differences between the original block to be encoded and the predictive block. An inter-coded block is encoded according to a motion vector that points to a block of reference samples forming the predictive block, and the residual data indicates the difference between the encoded block and the predictive block. An intra-coded block is encoded according to an intra-coding mode and the residual data. For further compression, the residual data may be transformed from the pixel domain to a transform domain, resulting in residual transform coefficients, which may then be quantized. The quantized transform coefficients, initially arranged in a two-dimensional array, may be scanned to produce a one-dimensional vector of transform coefficients, and entropy coding may be applied to achieve even more compression.
SUMMARY [0006] In general, this disclosure describes techniques related to decoder-side motion vector derivation (DMVD). These techniques may be applied to any existing video codec, such as HEVC (High Efficiency Video Coding), and/or may be an efficient coding tool in any future video coding standards.
[0007] In one example, a method of decoding video data includes determining that motion information of a current block of video data is to be derived using decoder-side motion vector derivation (DMVD), determining a
pixel clue for the current block, the pixel clue comprising pixel data obtained from one or more groups of previously decoded pixels, deriving the motion information for the current block according to DMVD from the pixel clue, and decoding the current block using the motion information.
[0008] In another example, a device for decoding video data includes a memory configured to store video data, and a video decoder implemented in circuitry and configured to determine that motion information of a current block of video data is to be derived using decoder-side motion vector derivation (DMVD), determine a pixel clue for the current block, the pixel clue comprising pixel data obtained from one or more groups of previously decoded pixels, derive the motion information for the current block according to DMVD from the pixel clue, and decode the current block using the motion information.
[0009] In another example, a device for decoding video data includes means for determining that motion information of a current block of video data is to be derived using decoder-side motion vector derivation (DMVD), means for determining a pixel clue for the current block, the pixel clue comprising pixel data obtained from one or more groups of previously decoded pixels, means for deriving the motion information for the current block according to DMVD from the pixel clue, and means for decoding the current block using the motion information.
[0010] In another example, a computer-readable storage medium has stored thereon instructions that, when executed, cause a processor to determine that motion information of a current block of video data is to be derived using decoder-side motion vector derivation (DMVD), determine a pixel clue for the current block, the pixel clue comprising pixel data obtained from one or more groups of previously decoded pixels, derive the motion information for the current block according to DMVD from the pixel clue, and decode the current block using the motion information.
[0011] The details of one or more examples are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description and drawings, and from the claims.
BRIEF DESCRIPTION OF THE DRAWINGS [0012] FIG. 1 is a block diagram illustrating an illustrative video encoding and decoding system that may use techniques for performing decoder-side motion vector derivation (DMVD) of this disclosure.
[0013] FIG. 2 is a block diagram illustrating an example of a video encoder that may implement techniques for performing decoder-side motion vector derivation (DMVD) of this disclosure.
[0014] FIG. 3 is a block diagram illustrating an example of a video decoder that may implement techniques for performing decoder-side motion vector derivation (DMVD) of this disclosure.
[0015] FIGS. 4A and 4B are conceptual diagrams illustrating spatial motion vector candidates derived from neighboring blocks.
[0016] FIGS. 5A and 5B are conceptual diagrams illustrating a main block location for a temporal motion vector predictor (TMVP) candidate.
[0017] FIG. 6 is a conceptual diagram illustrating concepts related to bilateral matching for deriving motion information of a current block.
[0018] FIG. 7 is a conceptual diagram illustrating concepts related to template matching for deriving motion information of a current block.
[0019] FIG. 8 is a flowchart illustrating an illustrative frame-rate up-conversion (FRUC) template matching process.
[0020] FIG. 9 is a flowchart illustrating proposed illustrative changes to the FRUC template matching process of FIG. 8.
[0021] FIG. 10 is a conceptual diagram illustrating concepts related to bidirectional optical flow in the Joint Exploration Model (JEM) for an upcoming proposed video coding standard.
[0022] FIG. 11 is a conceptual diagram
illustrating an example of gradient calculation for an 8x4 block.
[0023] FIG. 12 is a conceptual diagram illustrating concepts related to proposed decoder-side motion vector derivation (DMVD) based on bilateral template matching.
[0024] FIGS. 13A and 13B are conceptual diagrams illustrating concepts related to overlapped block motion compensation (OBMC) in JEM.
[0025] FIGS. 14A through 14D are conceptual diagrams illustrating the OBMC weights.
[0026] FIGS. 15A and 15B are conceptual diagrams illustrating illustrative extended areas for a pixel clue according to the techniques of this disclosure.
[0027] FIG. 16 is a conceptual diagram illustrating another illustrative extended area for a pixel clue, where the extended area is irregular, according to the techniques of this disclosure.
[0028] FIGS. 17A through 17C are conceptual diagrams illustrating illustrative weightings assigned to several pixels according to the techniques of this disclosure.
[0029] FIG. 18 is a conceptual diagram illustrating another example of weight values assigned to several pixels according to the techniques of this disclosure.
[0030] FIGS. 19A and 19B are conceptual diagrams illustrating an illustrative filter applied to pixels according to the techniques of this disclosure.
[0031] FIG. 20 is a flow chart illustrating an illustrative method for encoding video data according to
with the techniques of this disclosure.
[0032] FIG. 21 is a flow chart illustrating an illustrative method for decoding video data according to the techniques of this disclosure.
DETAILED DESCRIPTION [0033] In general, the techniques of this disclosure relate to decoder-side motion vector derivation (DMVD). That is, rather than explicitly signaling a motion vector or other motion information, a video decoder may derive the motion vector according to any or all of the techniques of this disclosure, alone or in any combination.
[0034] In general, a video decoder may derive motion information for a current block of video data, that is, a block currently being decoded. To derive the motion information, the video decoder may first determine a pixel clue for the current block. The pixel clue generally corresponds to pixel data obtained from one or more groups of previously decoded pixels. The pixel clue may be, for example, one or more blocks that have a high likelihood of being identified by a motion vector. The video decoder may determine such blocks according to bilateral template matching. Additionally or alternatively, the video decoder may determine such blocks from neighboring pixels to the current block and closest-matching corresponding neighboring pixels of reference blocks, such that the reference blocks
form the pixel clue.
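As a concrete illustration of the template-based flavor of this idea, the following is a minimal sketch, under assumed names and an integer-precision MV, of how a decoder might score one candidate motion vector by comparing the already-reconstructed L-shaped template above and to the left of the current block against the corresponding template in a reference picture; it is not the patent's normative algorithm.

```cpp
#include <cstdint>
#include <cstdlib>
#include <vector>

struct Plane {
    std::vector<uint8_t> samples;  // luma samples, row-major
    int width = 0;
    int height = 0;
    int at(int x, int y) const { return samples[y * width + x]; }
};

// Sum of absolute differences over the top rows and left columns that form the
// template. (blockX, blockY) is the top-left corner of the current block in the
// current picture; (mvX, mvY) is the integer candidate MV into 'ref'. Bounds
// checking is omitted for brevity.
int templateMatchingCost(const Plane& cur, const Plane& ref,
                         int blockX, int blockY, int blockW, int blockH,
                         int mvX, int mvY, int templateSize = 4) {
    int cost = 0;
    for (int y = -templateSize; y < 0; ++y)        // top template
        for (int x = 0; x < blockW; ++x)
            cost += std::abs(cur.at(blockX + x, blockY + y) -
                             ref.at(blockX + x + mvX, blockY + y + mvY));
    for (int y = 0; y < blockH; ++y)               // left template
        for (int x = -templateSize; x < 0; ++x)
            cost += std::abs(cur.at(blockX + x, blockY + y) -
                             ref.at(blockX + x + mvX, blockY + y + mvY));
    return cost;  // the decoder would keep the candidate MV minimizing this cost
}
```

A decoder-side search would evaluate this cost over a set of candidate motion vectors and select the candidate with the lowest cost as the derived motion information.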
[0035] In some examples, the video decoder may generate the pixel clue using multi-hypothesis prediction from multiple motion compensated blocks. For example, the video decoder may calculate a weighted average of the multiple motion compensated blocks. Additionally or alternatively, the video decoder may perform overlapped block motion compensation to generate the pixel clue. As yet another example, the video decoder may add offsets to one or more motion vectors of the current block and derive the multiple motion compensated blocks from the offset motion vectors (as well as the original motion vectors of the current block).
[0036] In some examples, the video decoder may calculate a matching cost between a first reference block and a second reference block identified by the derived motion information for the current block. The video decoder may calculate the matching cost by applying respective weighting values to cost measurements for corresponding pixels of the reference blocks, for example, a first weight to a first cost measurement and a second weight to a second cost measurement, where the weights and the cost measurements may differ from one another. The video decoder may then refine the motion information based on the matching cost. Furthermore, the video decoder may determine the weights based on distances between the corresponding pixels
and specific points of the current block, distances between the corresponding pixels and specific points of the pixel clue, a row or column including the corresponding pixels, and/or regions including the corresponding pixels.
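The following sketch illustrates one way such a weighted matching cost could be computed between two reference blocks, with weights that decay with a pixel's distance from the block center; the function name and the particular decay rule are illustrative assumptions, not the specific weighting defined by this disclosure.

```cpp
#include <cmath>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Per-pixel weighted matching cost between two equally sized reference blocks.
// Each absolute-difference cost measurement is scaled by a weight that shrinks
// as the pixel moves away from the block center (an assumed weighting rule).
int weightedMatchingCost(const std::vector<uint8_t>& ref0,
                         const std::vector<uint8_t>& ref1,
                         int width, int height) {
    double cost = 0.0;
    const double cx = (width - 1) / 2.0;
    const double cy = (height - 1) / 2.0;
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            int diff = std::abs(int(ref0[y * width + x]) - int(ref1[y * width + x]));
            double dist = std::hypot(x - cx, y - cy);   // distance to the block center
            double weight = 1.0 / (1.0 + dist);         // closer pixels count more
            cost += weight * diff;
        }
    }
    return static_cast<int>(cost + 0.5);
}
```

The refined motion information would be the candidate whose reference blocks minimize this cost.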
[0037] A new video coding standard, named High Efficiency Video Coding (HEVC) (also referred to as ITU-T H.265), including its range extension, multiview extension (MV-HEVC), and scalable extension (SHVC), was recently developed by the Joint Collaboration Team on Video Coding (JCT-VC) as well as the Joint Collaboration Team on 3D Video Coding Extension Development (JCT-3V) of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Motion Picture Experts Group (MPEG). The HEVC specification, referred to as HEVC WD hereinafter, is available at phenix.int-evry.fr/jct/doc_end_user/documents/14_Vienna/wg11/JCTVC-N1003-v1.zip.
[0038] ITU-T VCEG (Q6/16) and ISO/IEC MPEG (JTC 1/SC 29/WG 11) are studying the potential need for standardization of future video coding technology with a compression capability that significantly exceeds that of the current HEVC standard (including its current extensions and near-term extensions for screen content coding and high dynamic range coding). The groups are working together on this exploration activity in a joint collaboration effort known as the Joint Video Exploration Team (JVET) to evaluate compression technology designs proposed by their experts in this area. The JVET first met during 19-21 October 2015. A version of the reference software, i.e., Joint Exploration Model 5 (JEM 5), is available at jvet.hhi.fraunhofer.de/svn/svn_HMJEMSoftware/tags/HM-16.6-JEM-5.0. A description of the JEM 5 algorithm is available at phenix.it-sudparis.eu/jvet/doc_end_user/current_document.php?id=2714.
[0039] In HEVC, the largest coding unit in a slice is called a coding tree block (CTB) or a coding tree unit (CTU). A CTB contains a quadtree, the nodes of which are coding units. The size of a CTB can range from 16 x 16 to 64 x 64 in the HEVC main profile (although, technically, 8 x 8 CTB sizes can be supported). A coding unit (CU) may be the same size as a CTB, though it can be as small as 8 x 8. Each coding unit is coded with one mode. When a CU is inter-coded, it may be further partitioned into 2 or 4 prediction units (PUs) or become just one PU when further partitioning does not apply. When two PUs are present in one CU, they can be rectangles of half the size of the CU, or two rectangles of 1/4 or 3/4 the size of the CU. When the CU is inter-coded, one set of motion information is present for each PU. In addition, each PU is coded with a unique inter-prediction mode to derive the set of motion information.
[0040] In the HEVC standard, there are two inter-prediction modes, named merge mode (skip is considered a special case of merge) and advanced motion vector prediction (AMVP) mode, respectively, for a prediction unit (PU). In either AMVP or merge mode, a motion vector (MV) candidate list is maintained for multiple motion vector predictors. The motion vector(s), as well as the reference indices in merge mode, of the current PU are generated by taking one candidate from the MV candidate list.
[0041] The MV candidate list contains up to 5 candidates for merge mode and only two candidates for AMVP mode. A merge candidate may contain a set of motion information, for example, motion vectors corresponding to both reference picture lists (list 0 and list 1) and the reference indices. If a merge candidate is identified by a merge index, the reference pictures are used for the prediction of the current blocks, and the associated motion vectors are determined. However, in AMVP mode, for each potential prediction direction from either list 0 or list 1, a reference index needs to be explicitly signaled, together with an MV predictor (MVP) index to the MV candidate list, since the AMVP candidate contains only a motion vector. In AMVP mode, the predicted motion vectors can be further refined. As can be seen above, a merge candidate corresponds to a full set of motion information, while an AMVP
candidate contains only one motion vector for a specific prediction direction and a reference index. Candidates for both modes are derived similarly from the same spatial and temporal neighboring blocks.
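A minimal sketch of this distinction is given below; the type definitions are illustrative and are not the HEVC reference-software data structures. A merge candidate carries a complete motion information set for both reference picture lists, while an AMVP candidate carries only a single motion vector predictor, with the reference index and prediction direction signaled explicitly and an MV difference refining the predictor.

```cpp
#include <array>
#include <cstdint>

struct MotionVector { int16_t x = 0; int16_t y = 0; };

// A merge candidate: full motion information for list 0 and list 1.
struct MergeCandidate {
    std::array<MotionVector, 2> mv;                // MV per reference picture list
    std::array<int8_t, 2> refIdx{{-1, -1}};        // reference index per list (-1 = unused)
    std::array<bool, 2> usesList{{false, false}};  // inter-prediction direction
};

// An AMVP candidate: just one motion vector predictor for one prediction
// direction; the reference index and MV difference are signaled separately.
struct AmvpCandidate {
    MotionVector mvp;
};
```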
[0042] FIG. 1 is a block diagram illustrating an illustrative video encoding and decoding system 10 that may use techniques for performing decoder-side motion vector derivation (DMVD) of this disclosure. As shown in FIG. 1, system 10 includes a source device 12 that provides encoded video data to be decoded at a later time by a destination device 14. In particular, source device 12 provides the video data to destination device 14 via a computer-readable medium 16. Source device 12 and destination device 14 may comprise any of a wide range of devices, including desktop computers, notebook (i.e., laptop) computers, tablet computers, set-top boxes, telephone handsets such as so-called smartphones, so-called smartpads, televisions, cameras, display devices, digital media players, video game consoles, video streaming devices, or the like. In some cases, source device 12 and destination device 14 may be equipped for wireless communication.
[0043] Destination device 14 may receive the encoded video data to be decoded via computer-readable medium 16.
Computer-readable medium 16 may comprise any type of medium or device capable of moving the encoded video data from source device 12 to destination device 14. In one example, computer-readable medium 16 may comprise a communication medium to enable source device 12 to transmit encoded video data directly to destination device 14 in real time. The encoded video data may be modulated according to a communication standard, such as a wireless communication protocol, and transmitted to destination device 14. The communication medium may comprise any wireless or wired communication medium, such as a radio frequency (RF) spectrum or one or more physical transmission lines. The communication medium may form part of a packet-based network, such as a local area network, a wide-area network, or a global network such as the Internet. The communication medium may include routers, switches, base stations, or any other equipment that may be useful to facilitate communication from source device 12 to destination device 14.
[0044] In some examples, encoded data may be output from output interface 22 to a storage device. Similarly, encoded data may be accessed from the storage device by the input interface. The storage device may include any of a variety of distributed or locally accessed data storage media, such as a hard drive, Blu-ray discs, DVDs, CD-ROMs, flash memory, volatile or non-volatile memory, or any other suitable
digital storage media for storing encoded video data. In a further example, the storage device may correspond to a file server or another intermediate storage device that may store the encoded video generated by source device 12. Destination device 14 may access stored video data from the storage device via streaming or download. The file server may be any type of server capable of storing encoded video data and transmitting that encoded video data to destination device 14. Illustrative file servers include a web server (e.g., for a website), an FTP server, network attached storage (NAS) devices, or a local disk drive. Destination device 14 may access the encoded video data through any standard data connection, including an Internet connection. This may include a wireless channel (e.g., a Wi-Fi connection), a wired connection (e.g., DSL, cable modem, etc.), or a combination of both that is suitable for accessing encoded video data stored on a file server. The transmission of encoded video data from the storage device may be a streaming transmission, a download transmission, or a combination thereof.
[0045] The techniques of this disclosure are not necessarily limited to wireless applications or settings. The techniques may be applied to video coding in support of any of a variety of
multimedia applications, such as over-the-air television broadcasts, cable television transmissions, satellite television transmissions, Internet streaming video transmissions, such as dynamic adaptive streaming over HTTP (DASH), digital video that is encoded onto a data storage medium, decoding of digital video stored on a data storage medium, or other applications. In some examples, system 10 may be configured to support one-way or two-way video transmission to support applications such as video streaming, video playback, video broadcasting, and/or video telephony.
[0046] In the example of FIG. 1, source device 12 includes video source 18, video encoder 20, and output interface 22. Destination device 14 includes input interface 28, video decoder 30, and display device 32. In accordance with this disclosure, video encoder 20 of source device 12 may be configured to apply the techniques for performing decoder-side motion vector derivation (DMVD) of this disclosure. In other examples, a source device and a destination device may include other components or arrangements. For example, source device 12 may receive video data from an external video source 18, such as an external camera. Likewise, destination device 14 may interface with an external display device, rather than including an integrated display device.
[0047] The illustrated system 10 of FIG. 1 is merely one example. Techniques for performing decoder-side motion vector derivation (DMVD) of this
disclosure may be performed by any digital video encoding and/or decoding device. Although generally the techniques of this disclosure are performed by a video encoding device, the techniques may also be performed by a video encoder/decoder, typically referred to as a CODEC. Moreover, the techniques of this disclosure may also be performed by a video preprocessor. Source device 12 and destination device 14 are merely examples of such coding devices in which source device 12 generates coded video data for transmission to destination device 14. In some examples, devices 12, 14 may operate in a substantially symmetrical manner such that each of devices 12, 14 includes video encoding and decoding components. Hence, system 10 may support one-way or two-way video transmission between video devices 12, 14, e.g., for video streaming, video playback, video broadcasting, or video telephony.
[0048] Video source 18 of source device 12 may include a video capture device, such as a video camera, a video archive containing previously captured video, and/or a video feed interface to receive video from a video content provider. As a further alternative, video source 18 may generate computer graphics-based data as the source video, or a combination of live video, archived video, and computer-generated video. In some cases, if video source 18 is a video camera,
source device 12 and destination device 14 may form so-called camera phones or video phones. As mentioned above, however, the techniques described in this disclosure may be applicable to video coding in general, and may be applied to wireless and/or wired applications. In each case, the captured, pre-captured, or computer-generated video may be encoded by video encoder 20. The encoded video information may then be output by output interface 22 onto computer-readable medium 16.
[0049] Computer-readable medium 16 may include transient media, such as a wireless broadcast or wired network transmission, or storage media (that is, non-transitory storage media), such as a hard disk, flash drive, compact disc, digital video disc, Blu-ray disc, or other computer-readable media. In some examples, a network server (not shown) may receive encoded video data from source device 12 and provide the encoded video data to destination device 14, e.g., via network transmission. Similarly, a computing device of a medium production facility, such as a disc stamping facility, may receive encoded video data from source device 12 and produce a disc containing the encoded video data. Therefore, computer-readable medium 16 may be understood to include one or more computer-readable media of various forms, in various examples.
[0050] Input interface 28 of destination device 14 receives information from computer-readable medium 16. The information of computer-readable medium 16 may include syntax information defined by video encoder 20, which is also used by video decoder 30, that includes syntax elements that describe characteristics and/or processing of blocks and other coded units. Display device 32 displays the decoded video data to a user, and may comprise any of a variety of display devices, such as a cathode ray tube (CRT), a liquid crystal display (LCD), a plasma display, an organic light emitting diode (OLED) display, or another type of display device.
[0051] Video encoder 20 and video decoder 30 may operate according to a video coding standard, such as the High Efficiency Video Coding (HEVC) standard, also referred to as ITU-T H.265. Alternatively, video encoder 20 and video decoder 30 may operate according to other proprietary or industry standards, such as the ITU-T H.264 standard, alternatively referred to as MPEG-4, Part 10, Advanced Video Coding (AVC), or extensions of such standards. The techniques of this disclosure, however, are not limited to any particular coding standard. Other examples of video coding standards include MPEG-2 and ITU-T H.263. Although not shown in FIG. 1, in some aspects, video encoder 20 and video decoder 30 may each be integrated with an
audio encoder and decoder, and may include appropriate MUX-DEMUX units, or other hardware and software, to handle both audio and video encoding in a common data stream or separate data streams. If applicable, MUX-DEMUX units may conform to the ITU H.223 multiplexer protocol or to other protocols, such as the user datagram protocol (UDP).
[0052] Video encoder 20 and video decoder 30 each may be implemented as any of a variety of suitable encoder circuitry, such as one or more microprocessors, processing circuitry (including fixed-function circuitry and/or programmable processing circuitry), digital signal processors (DSPs), application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), discrete logic, software, hardware, firmware, or any combinations thereof. When the techniques are implemented partially in software, a device may store instructions for the software in a suitable, non-transitory computer-readable medium and execute the instructions in hardware using one or more processors to perform the techniques of this disclosure. Each of video encoder 20 and video decoder 30 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder/decoder (CODEC) in a respective device.
[0053] In general, according to ITU-T H.265,
a video picture may be partitioned into a sequence of coding tree units (CTUs) (or largest coding units (LCUs)) that may include both luma and chroma samples. Alternatively, CTUs may include monochrome data (that is, only luma samples). Syntax data within a bitstream may define a size for the CTU, which is the largest coding unit in terms of the number of pixels. A slice includes a number of consecutive CTUs in coding order. A video picture may be partitioned into one or more slices. Each CTU may be split into coding units (CUs) according to a quadtree. In general, a quadtree data structure includes one node per CU, with a root node corresponding to the CTU. If a CU is split into four sub-CUs, the node corresponding to the CU includes four leaf nodes, each of which corresponds to one of the sub-CUs.
[0054] Each node of the quadtree data structure may provide syntax data for the corresponding CU. For example, a node in the quadtree may include a split flag, indicating whether the CU corresponding to the node is split into sub-CUs. Syntax elements for a CU may be defined recursively, and may depend on whether the CU is split into sub-CUs. If a CU is not split further, it is referred to as a leaf CU. In this disclosure, four sub-CUs of a leaf CU will also be referred to as leaf CUs even if there is no explicit splitting of the original leaf CU. For example, if a CU of 16 x 16 size is not split further, the four 8 x 8 sub-CUs will also
be referred to as leaf CUs although the 16 x 16 CU was never split.
[0055] A CU has a similar purpose to a macroblock of the H.264 standard, except that a CU does not have a size distinction. For example, a CTU may be split into four child nodes (also referred to as sub-CUs), and each child node may in turn be a parent node and be split into another four child nodes. A final, unsplit child node, referred to as a leaf node of the quadtree, comprises a coding node, also referred to as a leaf CU. Syntax data associated with a coded bitstream may define a maximum number of times a CTU may be split, referred to as a maximum CU depth, and may also define a minimum size of the coding nodes. Accordingly, a bitstream may also define a smallest coding unit (SCU). This disclosure uses the term block to refer to any of a CU, prediction unit (PU), or transform unit (TU), in the context of HEVC, or similar data structures in the context of other standards (e.g., macroblocks and sub-blocks thereof in H.264/AVC).
[0056] A CU includes a coding node and prediction units (PUs) and transform units (TUs) associated with the coding node. A size of the CU corresponds to a size of the coding node and is generally square in shape. The size of the CU may range from 8 x 8 pixels up to the size of the CTU with a maximum size, e.g., 64 x 64 pixels or greater. Each CU may contain one or more PUs and one or more TUs. The
syntax data associated with a CU may describe, for example, partitioning of the CU into one or more PUs. Partitioning modes may differ between whether the CU is skip or direct mode encoded, intra-prediction mode encoded, or inter-prediction mode encoded. PUs may be partitioned to be non-square in shape. Syntax data associated with a CU may also describe, for example, partitioning of the CU into one or more TUs according to a quadtree. A TU can be square or non-square (e.g., rectangular) in shape.
[0057] The HEVC standard allows for transformations according to TUs, which may be different for different CUs. The TUs are typically sized based on the size of PUs (or partitions of a CU) within a given CU defined for a partitioned CTU, although this may not always be the case. The TUs are typically the same size or smaller than the PUs (or partitions of a CU, e.g., in the case of intra prediction). In some examples, residual samples corresponding to a CU may be subdivided into smaller units using a quadtree structure known as a residual quadtree (RQT). The leaf nodes of the RQT may be referred to as transform units (TUs). Pixel difference values associated with the TUs may be transformed to produce transform coefficients, which may be quantized.
[0058] A leaf CU may include one or more prediction units (PUs) when predicted using inter-prediction. In general, a PU represents a spatial area
corresponding to all or a portion of the corresponding CU, and may include data for retrieving and/or generating a reference sample for the PU. Moreover, a PU includes data related to prediction. When the CU is inter-mode encoded, one or more PUs of the CU may include data defining motion information, such as one or more motion vectors, or the PUs may be skip mode coded. Data defining the motion vector for a PU may describe, for example, a horizontal component of the motion vector, a vertical component of the motion vector, a resolution for the motion vector (e.g., one-quarter pixel precision or one-eighth pixel precision), a reference picture to which the motion vector points, and/or a reference picture list (e.g., List 0 or List 1) for the motion vector.
[0059] Leaf CUs may also be intra-mode predicted. In general, intra prediction involves predicting a leaf CU (or partitions thereof) using an intra mode. A video coder may select a set of previously coded pixels neighboring the leaf CU to use to predict the leaf CU (or partitions thereof).
[0060] A leaf CU may also include one or more transform units (TUs). The transform units may be specified using an RQT (also referred to as a TU quadtree structure), as discussed above. For example, a split flag may indicate whether a leaf CU is split into four transform units. Then, each TU may be split further into further sub-TUs. When a TU is not
split further, it may be referred to as a leaf TU. Generally, for intra coding, all the leaf TUs belonging to a leaf CU share the same intra prediction mode. That is, the same intra-prediction mode is generally applied to calculate predicted values for all TUs of a leaf CU. For intra coding, a video encoder may calculate a residual value for each leaf TU using the intra prediction mode, as a difference between the portion of the CU corresponding to the TU and the original block. A TU is not necessarily limited to the size of a PU. Thus, TUs may be larger or smaller than a PU. For intra coding, partitions of a CU, or the CU itself, may be collocated with a corresponding leaf TU for the CU. In some examples, the maximum size of a leaf TU may correspond to the size of the corresponding leaf CU.
[0061] Moreover, TUs of leaf CUs may also be associated with respective quadtree data structures, referred to as residual quadtrees (RQTs). That is, a leaf CU may include a quadtree indicating how the leaf CU is partitioned into TUs. The root node of a TU quadtree generally corresponds to a leaf CU, while the root node of a CU quadtree generally corresponds to a CTU (or LCU). TUs of the RQT that are not split are referred to as leaf TUs. In general, this disclosure uses the terms CU and TU to refer to a leaf CU and a leaf TU, respectively, unless noted otherwise.
[0062] Although some techniques are explained in relation to HEVC, it must be understood that the
techniques of this disclosure are not limited to HEVC. For example, rather than using quadtree partitioning according to HEVC, the techniques of this disclosure may be applied when CTUs are partitioned according to other partitioning schemes, such as quadtree binary tree (QTBT) partitioning, in which a tree data structure may include a region tree partitioned according to quadtree partitioning, and leaf nodes of the region tree may serve as root nodes of respective prediction trees that may be partitioned according to binary tree and/or center-side triple tree partitioning.
[0063] A video sequence typically includes a series of video frames or pictures, starting with a random access point (RAP) picture. A video sequence may include syntax data in a sequence parameter set (SPS) that characterizes the video sequence. Each slice of a picture may include slice syntax data that describes an encoding mode for the respective slice. Video encoder 20 typically operates on video blocks within individual video slices in order to encode the video data. A video block may correspond to a coding node within a CU. The video blocks may have fixed or varying sizes, and may differ in size according to a specified coding standard.
[0064] As an example, prediction may be performed for PUs of various sizes. Assuming that the size of a particular CU is 2N x 2N, intra-prediction
may be performed on PU sizes of 2N x 2N or N x N, and inter-prediction may be performed on symmetric PU sizes of 2N x 2N, 2N x N, N x 2N, or N x N. Asymmetric partitioning for inter-prediction may also be performed for PU sizes of 2N x nU, 2N x nD, nL x 2N, and nR x 2N. In asymmetric partitioning, one direction of a CU is not partitioned, while the other direction is partitioned into 25% and 75%. The portion of the CU corresponding to the 25% partition is indicated by an n followed by an indication of Up, Down, Left, or Right. Thus, for example, 2N x nU refers to a 2N x 2N CU that is partitioned horizontally with a 2N x 0.5N PU on top and a 2N x 1.5N PU on bottom.
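The following small helper makes the asymmetric PU geometries above concrete; the enum and function names are illustrative, not definitions from the HEVC specification.

```cpp
#include <cstdio>
#include <utility>

enum class PartMode { PART_2NxnU, PART_2NxnD, PART_nLx2N, PART_nRx2N };

// Returns {width, height} of the first (top or left) PU of a 2N x 2N CU; the
// second PU occupies the remainder of the CU.
std::pair<int, int> firstPuSize(PartMode mode, int cuSize /* = 2N */) {
    switch (mode) {
        case PartMode::PART_2NxnU: return {cuSize, cuSize / 4};      // 2N x 0.5N on top
        case PartMode::PART_2NxnD: return {cuSize, 3 * cuSize / 4};  // 2N x 1.5N on top
        case PartMode::PART_nLx2N: return {cuSize / 4, cuSize};      // 0.5N x 2N on left
        case PartMode::PART_nRx2N: return {3 * cuSize / 4, cuSize};  // 1.5N x 2N on left
    }
    return {cuSize, cuSize};
}

int main() {
    auto [w, h] = firstPuSize(PartMode::PART_2NxnU, 32);  // a 32 x 32 CU
    std::printf("first PU: %dx%d, second PU: %dx%d\n", w, h, w, 32 - h);
    return 0;
}
```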
[0065] In this disclosure, N x N and N by N can be used interchangeably to refer to the pixel dimensions of a video block in terms of vertical and horizontal dimensions, for example, 16 x 16 pixels or 16 by 16 pixels . In general, a 16 x 16 block will have 16 pixels in a vertical direction (y = 16) and 16 pixels in a horizontal direction (x = 16). Likewise, an N x N block usually has N pixels in a vertical direction and N pixels in a horizontal direction, where N represents a non-negative integer value. The pixels in a block can be arranged in rows and columns. In addition, blocks do not necessarily have to have the same number of pixels in the horizontal direction as in the vertical direction. For example, blocks can include N x M pixels, where M is not necessarily equal to N.
[0066] Following intra-predictive or inter-predictive coding using the PUs of
a CU, video encoder 20 may calculate residual data for the TUs of the CU. The PUs may comprise syntax data describing a method or mode of generating predictive pixel data in the spatial domain (also referred to as the pixel domain), and the TUs may comprise coefficients in the transform domain following application of a transform, e.g., a discrete cosine transform (DCT), an integer transform, a wavelet transform, or a conceptually similar transform, to residual video data. The residual data may correspond to pixel differences between pixels of the unencoded picture and prediction values corresponding to the PUs. Video encoder 20 may form the TUs to include quantized transform coefficients representative of the residual data for the CU. That is, video encoder 20 may calculate the residual data (in the form of a residual block), transform the residual block to produce a block of transform coefficients, and then quantize the transform coefficients to form quantized transform coefficients. Video encoder 20 may form a TU including the quantized transform coefficients, as well as other syntax information (e.g., splitting information for the TU).
[0067] As noted above, following any transforms to produce transform coefficients, video encoder 20 may perform quantization of the transform coefficients. Quantization generally refers to a process in which transform coefficients are quantized to possibly reduce the
amount of data used to represent the coefficients, providing additional compression. The quantization process can reduce the bit depth associated with some or all of the coefficients. For example, a value of n bits can be rounded down to a value of m bits during quantization, where n is greater than m.
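As a schematic illustration of this bit-depth reduction, the sketch below applies a generic round-to-nearest scalar quantizer to a list of transform coefficients; it is not the exact HEVC/JEM quantization formula, and the function names are assumptions.

```cpp
#include <cstdint>
#include <vector>

// Generic scalar quantization: dividing each coefficient by a step size means
// fewer bits are needed per value (roughly, an n-bit value becomes an m-bit value).
std::vector<int16_t> quantize(const std::vector<int32_t>& coeffs, int stepSize) {
    std::vector<int16_t> levels;
    levels.reserve(coeffs.size());
    for (int32_t c : coeffs) {
        int32_t q = (c >= 0) ? (c + stepSize / 2) / stepSize
                             : -((-c + stepSize / 2) / stepSize);  // round to nearest
        levels.push_back(static_cast<int16_t>(q));
    }
    return levels;
}

// Inverse quantization, as a decoder would apply it, reconstructs approximate
// coefficient values by rescaling the quantized levels.
std::vector<int32_t> dequantize(const std::vector<int16_t>& levels, int stepSize) {
    std::vector<int32_t> coeffs;
    coeffs.reserve(levels.size());
    for (int16_t l : levels) coeffs.push_back(int32_t(l) * stepSize);
    return coeffs;
}
```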
[0068] Following quantization, the video encoder may scan the transform coefficients, producing a one-dimensional vector from the two-dimensional matrix including the quantized transform coefficients. The scan may be designed to place higher energy (and therefore lower frequency) coefficients at the front of the array, and to place lower energy (and therefore higher frequency) coefficients at the back of the array. In some examples, video encoder 20 may use a predefined scan order to scan the quantized transform coefficients to produce a serialized vector that can be entropy encoded. In other examples, video encoder 20 may perform an adaptive scan. After scanning the quantized transform coefficients to form a one-dimensional vector, video encoder 20 may entropy encode the one-dimensional vector, e.g., according to context-adaptive variable length coding (CAVLC), context-adaptive binary arithmetic coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding, or another
entropy coding methodology. Video encoder 20 may also entropy encode the syntax elements associated with the encoded video data for use by video decoder 30 in decoding the video data.
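To illustrate the scanning step described above, the following sketch serializes a square block of quantized coefficients along anti-diagonals, so that low-frequency (typically higher-energy) coefficients near the top-left corner come first; HEVC defines its own specific scan orders (diagonal, horizontal, vertical), and this is only a generic example of the idea.

```cpp
#include <cstdint>
#include <vector>

// Serialize a size x size block (row-major) into a 1D vector by walking its
// anti-diagonals from the top-left (low frequency) to the bottom-right corner.
std::vector<int16_t> diagonalScan(const std::vector<int16_t>& block, int size) {
    std::vector<int16_t> out;
    out.reserve(block.size());
    for (int d = 0; d <= 2 * (size - 1); ++d) {   // index of the anti-diagonal
        for (int y = 0; y < size; ++y) {
            int x = d - y;
            if (x >= 0 && x < size)
                out.push_back(block[y * size + x]);
        }
    }
    return out;
}
```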
[0069] To perform CABAC, video encoder 20 may assign a context within a context model to a symbol to be transmitted. The context may relate to, for example, whether neighboring values of the symbol are non-zero or not. To perform CAVLC, video encoder 20 may select a variable length code for a symbol to be transmitted. Codewords in VLC may be constructed such that relatively shorter codes correspond to more probable symbols, while longer codes correspond to less probable symbols. In this way, the use of VLC may achieve a bit savings over, for example, using equal-length codewords for each symbol to be transmitted. The probability determination may be based on a context assigned to the symbol.
[0070] In general, video decoder 30 performs a substantially similar, albeit reciprocal, process to that performed by video encoder 20 to decode encoded data. For example, video decoder 30 inverse quantizes and inverse transforms coefficients of a received TU to reproduce a residual block. Video decoder 30 uses a signaled prediction mode (intra- or inter-prediction) to form a predicted block. Then
video decoder 30 combines the predicted block and the residual block (on a pixel-by-pixel basis) to reproduce the original block. Additional processing may be performed, such as performing a deblocking process to reduce visual artifacts along block boundaries. Furthermore, video decoder 30 may decode syntax elements using CABAC in a manner substantially similar to, albeit reciprocal to, the CABAC encoding process of video encoder 20.
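A minimal sketch of that reconstruction step, under the assumption of 8-bit samples and an illustrative function name, is shown below: the decoded residual is added to the predicted block pixel by pixel and the result is clipped to the valid sample range (deblocking and any other in-loop filtering would follow separately).

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Reconstruct a block by adding the residual to the prediction and clipping
// each sample to the 8-bit range [0, 255].
std::vector<uint8_t> reconstructBlock(const std::vector<uint8_t>& prediction,
                                      const std::vector<int16_t>& residual) {
    std::vector<uint8_t> recon(prediction.size());
    for (size_t i = 0; i < prediction.size(); ++i) {
        int value = int(prediction[i]) + int(residual[i]);
        recon[i] = static_cast<uint8_t>(std::clamp(value, 0, 255));
    }
    return recon;
}
```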
[0071] Video encoder 20 may further send syntax data, such as block-based syntax data, picture-based syntax data, and sequence-based syntax data, to video decoder 30, e.g., in a picture header, a block header, a slice header, or other syntax data, such as a sequence parameter set (SPS), picture parameter set (PPS), or video parameter set (VPS).
[0072] Video encoder 20 and video decoder 30 each may be implemented as any of a variety of suitable encoder or decoder circuitry, as applicable, such as one or more microprocessors, processing circuitry (including fixed-function circuitry and/or programmable processing circuitry), digital signal processors (DSPs), application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), discrete logic circuitry, software, hardware, firmware, or any combinations thereof. Each of
video encoder 20 and video decoder 30 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined video encoder/decoder (CODEC). A device including video encoder 20 and/or video decoder 30 may comprise an integrated circuit, a microprocessor, and/or a wireless communication device, such as a cellular telephone.
[0073] In accordance with the techniques of this disclosure, a video coder, such as video encoder 20 or video decoder 30, may perform DMVD to derive motion information for a current block of video data. In particular, these techniques may include any or all of the following, alone or in any combination.
[0074] One of the main concepts of this disclosure is to improve decoder-side motion vector derivation (DMVD). The techniques are elaborated in several different itemized aspects as discussed below. The following techniques to improve DMVD may be applied individually. Alternatively, any combination of them may be applied.
[0075] The concept of DMVD approaches is for a video coder (e.g., video encoder 20 or video decoder 30) to derive motion information, such as motion vectors and prediction directions, using previously decoded information. In current approaches, groups of pixels are first derived, and the motion information is then further derived using the groups of pixels.
This disclosure refers to these groups of pixels as the pixel clue. For example, in FRUC template matching, the templates of the current block and the reference blocks are the pixel clue; in FRUC bilateral matching, the mirror pairs along the motion trajectories of the reference blocks are the pixel clue; in bilateral template matching, the bi-prediction generated template and the reference blocks are the pixel clue; and in BIO, the reference blocks are the pixel clue.
[0076] A video coder may apply filters to the pixel clue. The filters can be any denoising filters, such as a guided filter, a bilateral filter, a median filter, and so on. The filters can also be any smoothing filters, such as an averaging filter. Moreover, whether the filtering process is applied (or not) to the pixel clue for the DMVD approaches may be signaled in the SPS/PPS/slice header.
[0077] The video coder (e.g., video encoder 20 or video decoder 30) may apply any motion refinement methods to the generation of the pixel clue. The motion refinement methods can contain, but are not limited to, existing methods including BIO, FRUC template matching, and FRUC bilateral matching. In one example, when the pixel clue is generated by multiple motion compensated (MC) blocks, BIO can be applied to the pixel clue to further improve the quality of the pixel clue. For example, in the approach
of bilateral template matching, the bilateral template is generated as the average of the L0 and L1 MC blocks. The video coder may simply apply BIO to refine the bilateral template and use the refined bilateral template to perform the MV refinement. A fast algorithm can be applied to avoid possible redundant operations. After the bilateral template matching MV refinement, if the MV is identical to the original MV, the video coder does not need to perform another BIO and MC. Since the bilateral template will be identical to the final predictor, the video coder can directly use the bilateral template as the final predictor.
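The following is a hedged sketch of that flow; the structure and function names are illustrative rather than JEM source code. The bilateral template is formed by averaging the L0 and L1 motion compensated blocks, and the fast path simply checks whether refinement changed the MVs, in which case the template can be reused directly as the final predictor.

```cpp
#include <cstdint>
#include <vector>

struct Mv {
    int x = 0;
    int y = 0;
    bool operator==(const Mv& o) const { return x == o.x && y == o.y; }
};

// Bilateral template: per-sample average of the L0 and L1 predictions.
std::vector<uint8_t> makeBilateralTemplate(const std::vector<uint8_t>& predL0,
                                           const std::vector<uint8_t>& predL1) {
    std::vector<uint8_t> tmpl(predL0.size());
    for (size_t i = 0; i < predL0.size(); ++i)
        tmpl[i] = static_cast<uint8_t>((int(predL0[i]) + int(predL1[i]) + 1) >> 1);
    return tmpl;
}

// Fast path: if refinement left both MVs unchanged, the bilateral template
// already equals the final bi-prediction, so further BIO/MC can be skipped.
bool canReuseTemplateAsPredictor(const Mv& origL0, const Mv& origL1,
                                 const Mv& refinedL0, const Mv& refinedL1) {
    return origL0 == refinedL0 && origL1 == refinedL1;
}
```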
[0078] When the pixel clue is generated by motion compensated (MC) blocks, the video coder can further improve the pixel clue using multi-hypothesis prediction, such as a weighted average of multiple MC blocks. For example, OBMC can be applied to generate the pixel clue. In another example, the multiple MC blocks can be derived using the motion vectors of the current block added by offsets (i.e., +1 or -1) to the X or Y component, or to both the X and Y components, of the current MV.
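A sketch of that idea is below; the offset pattern and the equal weighting are assumptions used for illustration, and the motion compensation step is abstracted behind a callback. The pixel clue is formed as the average of the MC blocks obtained from the original MV and from MVs offset by plus or minus one in the X or Y component.

```cpp
#include <cstdint>
#include <functional>
#include <vector>

struct MvInt { int x; int y; };

// Build a pixel clue as the equal-weight average of several motion compensated
// hypotheses: the original MV plus four MVs offset by +/-1 in X or Y.
// 'motionCompensate' abstracts fetching the MC block for a given MV.
std::vector<uint16_t> multiHypothesisClue(
    const MvInt& mv, size_t blockSamples,
    const std::function<std::vector<uint8_t>(MvInt)>& motionCompensate) {
    const MvInt hypotheses[] = {{mv.x, mv.y},
                                {mv.x + 1, mv.y}, {mv.x - 1, mv.y},
                                {mv.x, mv.y + 1}, {mv.x, mv.y - 1}};
    std::vector<uint32_t> acc(blockSamples, 0);
    for (const MvInt& h : hypotheses) {
        std::vector<uint8_t> pred = motionCompensate(h);
        for (size_t i = 0; i < blockSamples; ++i) acc[i] += pred[i];
    }
    const uint32_t n = sizeof(hypotheses) / sizeof(hypotheses[0]);
    std::vector<uint16_t> clue(blockSamples);
    for (size_t i = 0; i < blockSamples; ++i)
        clue[i] = static_cast<uint16_t>((acc[i] + n / 2) / n);  // rounded average
    return clue;
}
```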
[0079] The video coder may improve the pixel clue iteratively using the refined MVs. For example, in the bilateral template matching method, after the refined MV is derived, the bilateral template can be regenerated using the refined MV, and the video coder can perform another MV refinement; the MV refinement iteration can be repeated until some predefined criterion is
reached. In one example, the iteration number is fixed and predefined for both video encoder 20 and video decoder 30. For example, the MV derivation is iterated N times (N is fixed and predefined), and for each iteration the pixel clue is refined according to the results of the previous iteration, and the refined pixel clue is then used to perform the MV derivation. In another example, the iterations are terminated when the matching cost is smaller than (or equal to) a predefined threshold. In yet another example, the iterations are terminated when the matching cost is smaller than (or equal to) a predefined threshold or the number of iterations reaches a predefined number.
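The iteration control just described can be sketched as follows; the callable is a placeholder for the codec-specific step that rebuilds the pixel clue from the current MVs, refines them, and reports the resulting matching cost.

```cpp
#include <functional>
#include <utility>

struct MvPair { int l0x = 0, l0y = 0, l1x = 0, l1y = 0; };

// Repeat MV refinement until the matching cost falls to or below a predefined
// threshold, or a predefined maximum number of iterations is reached.
MvPair iterativeRefinement(
    MvPair mv,
    const std::function<std::pair<MvPair, int>(const MvPair&)>& refineOnce,
    int costThreshold, int maxIterations) {
    for (int iter = 0; iter < maxIterations; ++iter) {
        auto result = refineOnce(mv);  // rebuild pixel clue from 'mv', refine, return {MVs, cost}
        mv = result.first;
        if (result.second <= costThreshold)
            break;  // early-termination criterion
    }
    return mv;
}
```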
[0080] Besides the pixel clue of the same color component, the video coder can use the pixel clue of any or all of the other components to perform the MV derivation. Moreover, the pixel clue of the other components may be the reconstructed pixels, which are the prediction pixels plus the associated signaled residuals.
[0081] In order to generate reference data, video encoder 20 decodes encoded video data and stores the decoded video data in a decoded picture buffer (DPB), e.g., a portion of memory of video encoder 20. Thus, video encoder 20 may use the data of the DPB for reference when predictively encoding subsequent video data. Because video encoder 20 includes elements for decoding encoded video data, video encoder 20 may be said to
include a video decoder. [0082] Video encoder 20 and/or video decoder 30 may derive motion information for a current block of video data, that is, a block currently being decoded. To derive the motion information, video encoder 20 and/or video decoder 30 may first determine a pixel clue for the current block. The pixel clue generally corresponds to pixel data obtained from one or more groups of previously decoded pixels. The pixel clue may be, for example, one or more blocks that have a high likelihood of being identified by a motion vector. Video encoder 20 and/or video decoder 30 may determine such blocks according to bilateral template matching. Additionally or alternatively, video encoder 20 and/or video decoder 30 may determine such blocks from neighboring pixels to the current block and closest-matching corresponding neighboring pixels of reference blocks, such that the reference blocks form the pixel clue.
[0083] In some examples, video encoder 20 and/or video decoder 30 may generate the pixel clue using multi-hypothesis prediction from multiple motion compensated blocks. For example, video encoder 20 and/or video decoder 30 may calculate a weighted average of the multiple motion compensated blocks. Additionally or alternatively, video encoder 20 and/or video decoder 30 may perform overlapped
block motion compensation to generate the pixel clue. As yet another example, video encoder 20 and/or video decoder 30 may add offsets to one or more motion vectors of the current block, and derive the multiple motion compensated blocks from the offset motion vectors (as well as the original motion vectors of the current block).
[0084] In some examples, video encoder 20 and/or video decoder 30 may calculate a matching cost between a first reference block and a second reference block identified by the derived motion information for the current block. Video encoder 20 and/or video decoder 30 may calculate the matching cost by applying respective weighting values to cost measurements for corresponding pixels of the reference blocks, for example, a first weight to a first cost measurement and a second weight to a second cost measurement, where the weights and the cost measurements may differ from one another. Video encoder 20 and/or video decoder 30 may then refine the motion information based on the matching cost. Additionally, video encoder 20 and/or video decoder 30 may determine the weights based on distances between the corresponding pixels and specific points of the current block, distances between the corresponding pixels and specific points of the pixel clue, a row or column including the corresponding pixels, and/or regions including the corresponding pixels.
[0085] FIG. 2 is a block diagram illustrating an example of a video encoder 20 that may implement techniques for performing decoder-side motion vector derivation (DMVD) of this disclosure. In particular, video encoder 20 may perform the DMVD techniques of this disclosure during a decoding loop, which includes processes performed by inverse quantization unit 58, inverse transform unit 60, and adder 62. Furthermore, as discussed above, video encoder 20 may signal certain values that may assist a video decoder, such as video decoder 30, in performing DMVD.
[0086] Video encoder 20 may perform intra- and inter-coding of video blocks within video slices. Intra-coding relies on spatial prediction to reduce or remove spatial redundancy in video within a given video frame or picture. Inter-coding relies on temporal prediction to reduce or remove temporal redundancy in video within adjacent frames or pictures of a video sequence. Intra-mode (I mode) may refer to any of several spatial-based coding modes. Inter-modes, such as uni-directional prediction (P mode) or bi-prediction (B mode), may refer to any of several temporal-based coding modes.
[0087] As shown in FIG. 2, video encoder 20 receives a current video block within a video frame to be encoded. In the example of FIG. 2, video encoder 20 includes the
mode select unit 40, reference picture memory 64 (which may also be referred to as a decoded picture buffer (DPB)), adder 50, transform processing unit 52, quantization unit 54, and entropy encoding unit 56. Mode select unit 40, in turn, includes motion compensation unit 44, motion estimation unit 42, intra-prediction unit 46, and partition unit 48. For video block reconstruction, video encoder 20 also includes inverse quantization unit 58, inverse transform unit 60, and adder 62. A deblocking filter (not shown in FIG. 2) may also be included to filter block boundaries to remove blockiness artifacts from reconstructed video. If desired, the deblocking filter would typically filter the output of adder 62. Additional filters (in-loop or post-loop) may also be used in addition to the deblocking filter. Such filters are not shown for brevity, but if desired, may filter the output of adder 50 (as an in-loop filter).
[0088] During the encoding process, video encoder 20 receives a frame or video fraction to be encoded. The frame or fraction can be divided into several video blocks. The motion estimation unit 42 and the motion compensation unit 44 perform inter predictive encoding of the received video block in relation to one or more blocks in one or more reference frames to provide temporal prediction. The intra prediction unit 46 can alternatively perform intra predictive encoding of the received video block in relation to one or more neighboring blocks in the same frame or fraction as the block to be encoded, to provide spatial prediction. The video encoder 20 can perform several encoding passes, for example, to select an appropriate encoding mode for each block of video data.
[0089] Furthermore, the division unit 48 can divide blocks of video data into sub-blocks, based on the evaluation of previous division schemes in previous encoding passes. For example, the division unit 48 can initially divide a frame or fraction into CTUs and divide each of the CTUs into sub-CUs based on rate distortion analysis (for example, rate distortion optimization). The mode selection unit 40 can additionally produce a quaternary tree data structure indicative of the division of a CTU into sub-CUs. Leaf node CUs of the quaternary tree can include one or more PUs and one or more TUs.
[0090] The mode selection unit 40 can select one of the prediction modes, intra or inter, for example, based on the error results, and provide the resulting predicted block for adder 50 to generate residual data and adder 62 to reconstruct the coded block for use as a frame of reference. The mode selection unit 40 also provides syntax elements, such as motion vectors, intra mode indicators, division information and other such syntax information, for the entropy coding unit 56.
[0091] The motion estimation unit 42 and the motion compensation unit 44 can be highly integrated, but are illustrated separately for conceptual purposes. Motion estimation, performed by motion estimation unit 42, is the process of generating motion vectors, which estimate motion for video blocks. A motion vector, for example, can indicate the displacement of a PU of a video block within a current frame or video image relative to a predictive block within a reference frame (or other encoded unit) relative to the current block being encoded within the current frame (or other encoded unit). A predictive block is a block that closely matches the block to be encoded, in terms of pixel difference, which can be determined by the sum of absolute differences (SAD), sum of squared differences (SSD) or other difference metrics. In some examples, video encoder 20 can calculate values for sub-integer pixel positions of reference images stored in reference image memory 64. For example, video encoder 20 can interpolate values for quarter pixel positions, eighth pixel positions, or other fractional pixel positions of the reference image. Therefore, the motion estimation unit 42 can perform a motion search relative to full pixel positions and fractional pixel positions and output a motion vector with fractional pixel precision.
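To make the SAD-based search concrete, the following is a minimal sketch of a full search over integer pixel positions, under the assumption of a small hypothetical search range and block size; the function names are illustrative and are not part of this disclosure.

```python
import numpy as np

def sad(block_a, block_b):
    # Sum of absolute differences between two equally sized blocks.
    return int(np.abs(block_a.astype(np.int64) - block_b.astype(np.int64)).sum())

def integer_motion_search(cur, ref, bx, by, bw, bh, search_range=8):
    """Hypothetical full search over integer positions; returns the best (dx, dy) and its cost."""
    cur_block = cur[by:by + bh, bx:bx + bw]
    best_cost, best_mv = None, (0, 0)
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = bx + dx, by + dy
            if rx < 0 or ry < 0 or rx + bw > ref.shape[1] or ry + bh > ref.shape[0]:
                continue  # candidate block falls outside the reference picture
            cost = sad(cur_block, ref[ry:ry + bh, rx:rx + bw])
            if best_cost is None or cost < best_cost:
                best_cost, best_mv = cost, (dx, dy)
    return best_mv, best_cost

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref = rng.integers(0, 256, (64, 64), dtype=np.uint8)
    cur = np.roll(ref, shift=(2, -3), axis=(0, 1))  # synthetic global motion of the whole picture
    print(integer_motion_search(cur, ref, 16, 16, 8, 8))  # expected MV (3, -2) with cost 0
```

A real encoder would additionally refine the integer result to fractional pixel precision using interpolated reference samples, as described above.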
[0092] The motion estimation unit 42 calculates a motion vector for a PU of a video block in an inter coded fraction by comparing the position of the PU with the position of a predictive block of a reference image. The reference image can be selected from a first list of reference images (List 0) or a second list of reference images (List 1), each of which identifies one or more reference images stored in reference image memory 64. Motion estimation unit 42 sends the calculated motion vector to entropy coding unit 56 and to motion compensation unit 44.
[0093] Motion compensation, performed by the motion compensation unit 44, may involve fetching or generating the predictive block based on the motion vector determined by the motion estimation unit 42. Again, motion estimation unit 42 and motion compensation unit 44 can be functionally integrated, in some examples. Upon receiving the motion vector for the PU of the current video block, the motion compensation unit 44 can locate the predictive block to which the motion vector points in one of the reference image lists. The adder 50 forms a residual video block by subtracting the pixel values of the predictive block from the pixel values of the current video block being encoded, forming pixel difference values, as discussed below. In general, the motion estimation unit 42 performs the motion estimation in relation to the luma components, and the motion compensation unit 44 uses motion vectors calculated based on the luma components for both the chroma components and the luma components. The mode selection unit 40 can also generate syntax elements associated with the video blocks and the video fraction for use by the video decoder 30 in decoding the video blocks of the video fraction.
[0094] In accordance with the techniques of this disclosure, the mode selection unit 40 may determine that the motion compensation unit 44 must derive the motion information for a current block of video data using decoder-side motion vector derivation techniques. Consequently, the motion compensation unit 44 can use any or all of the techniques of this disclosure, as discussed in greater detail above and below, to generate motion information for the current block. Thus, instead of using only motion information determined by motion estimation unit 42, motion compensation unit 44 can derive motion information for blocks of video data, for example, using a pixel indication, as discussed in this document.
[0095] The motion compensation unit 44 can determine the pixel indication from pixel data of one or more groups of previously decoded pixels, for example, from previously decoded images stored in the reference image memory 64. In some instances, the motion compensation unit 44 can generate the pixel indication using various hypothesis predictions from several motion compensated blocks of previously decoded images.
[0096] In some examples, to derive the movement information, the movement compensation unit 44 can determine an inter prediction direction for the movement information according to the correspondence costs between different prediction directions. The inter prediction direction can generally correspond to whether the derived motion information refers to the reference images in list 0, list 1 or both list 0 and list 1 (i.e., bi-prediction).
[0097] In some examples, the movement compensation unit 44 can calculate a correspondence cost between a first reference block and a second reference block. To calculate the correspondence cost, the movement compensation unit 44 can calculate a weighted average of two or more cost measurement techniques. For example, the motion compensation unit 44 can perform a first cost measurement of the differences between the first and second reference blocks and then a second different cost measurement of these differences. The movement compensation unit 44 can then weight the cost measurements, for example, by applying the weighting values to the cost measurements. The movement compensation unit 44 can then accumulate (i.e., add) the weighted cost measurements to obtain a final correspondence cost and then refine the movement information using the correspondence cost.
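As a minimal sketch of the weighted cost combination described in the preceding paragraph, the following assumes SAD and SSD as the two cost measurements and illustrative weight values; neither the names nor the weights are specified by this disclosure.

```python
import numpy as np

def weighted_matching_cost(ref0, ref1, w_sad=0.5, w_ssd=0.5):
    """Hypothetical matching cost: weighted sum of SAD and SSD between two reference blocks."""
    diff = ref0.astype(np.int64) - ref1.astype(np.int64)
    sad = np.abs(diff).sum()
    ssd = (diff * diff).sum()
    # Each cost measurement gets its own weight; the weighted costs are accumulated.
    return w_sad * sad + w_ssd * ssd

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    r0 = rng.integers(0, 256, (8, 8))
    r1 = rng.integers(0, 256, (8, 8))
    print(weighted_matching_cost(r0, r1))
```

The resulting scalar would then drive the motion refinement, with smaller combined costs indicating better-matched reference blocks.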
[0098] In some examples, the motion compensation unit 44 can determine whether or not to use one or more motion vectors of the derived motion information, for example, based on whether the motion vectors are similar to other motion vector candidates in a motion vector candidate list for a current block. If one or more of the motion vectors are sufficiently similar to an existing motion vector candidate, the motion compensation unit 44 can discard the one or more motion vectors.
[0099] The intra prediction unit 46 can intra predict a current block, as an alternative to the inter prediction performed by the motion estimation unit 42 and the motion compensation unit 44, as described above. In particular, the intra prediction unit 46 can determine an intra prediction mode to use to encode a current block. In some instances, the intra prediction unit 46 may encode a current block using various intra prediction modes, for example, during separate encoding passes, and the intra prediction unit 46 (or mode selection unit 40, in some examples) can select an appropriate intra prediction mode to use from the tested modes.
[00100] For example, the intra prediction unit 46 can calculate rate distortion values using a rate distortion analysis for the various tested intra prediction modes, and select the intra prediction mode having the best rate distortion characteristics among the tested modes. Rate distortion analysis generally determines an amount of distortion (or error) between an encoded block and an original, unencoded block that was encoded to produce the encoded block, as well as a bit rate (i.e., a number of bits) used to produce the encoded block. The intra prediction unit 46 can calculate ratios from the distortions and rates for the various encoded blocks to determine which intra prediction mode has the best rate distortion value for the block.
[00101] After selecting an intra prediction mode for a block, the intra prediction unit 46 can provide information indicative of the intra prediction mode selected for the block to the entropy coding unit 56. The entropy coding unit 56 can encode the information indicating the selected intra prediction mode. The video encoder 20 may include, in the transmitted bit stream, configuration data, which may include several intra prediction mode index tables and several modified intra prediction mode index tables (also referred to as codeword mapping tables), definitions of encoding contexts for various blocks, and indications of a most probable intra prediction mode, an intra prediction mode index table and a modified intra prediction mode index table to use for each of the contexts.
[00102] The video encoder 20 forms a residual video block by subtracting the prediction data from the mode selection unit 40 from the original video block being encoded. The adder 50 represents the component or components that perform this subtraction operation. The transform processing unit 52 applies a transform, such as a discrete cosine transform (DCT) or a conceptually similar transform, to the residual block, producing a video block comprising transform coefficient values. Wavelet transforms, integer transforms, subband transforms, discrete sine transforms (DSTs) or other types of transforms could be used instead of a DCT. In any case, the transform processing unit 52 applies the transform to the residual block, producing a block of transform coefficients. The transform can convert residual information from a pixel domain to a transform domain, such as a frequency domain. The transform processing unit 52 can send the resulting transform coefficients to the quantization unit 54. The quantization unit 54 quantizes the transform coefficients to further reduce the bit rate. The quantization process can reduce the bit depth associated with some or all of the coefficients. The degree of quantization can be modified by adjusting a quantization parameter.
[00103] After quantization, the entropy coding unit 56 entropy codes the quantized transform coefficients. For example, the entropy coding unit 56 can perform context-adaptive variable-length coding (CAVLC), context-adaptive binary arithmetic coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding or another entropy coding technique. In the case of context-based entropy coding, the context can be based on neighboring blocks. Following entropy coding by entropy coding unit 56, the encoded bit stream can be transmitted to another device (e.g., video decoder 30) or archived for later transmission or retrieval.
[00104] The inverse quantization unit 58 and the inverse transform unit 60 apply inverse quantization and inverse transform, respectively, to reconstruct the residual block in the pixel domain. In particular, adder 62 adds the reconstructed residual block to the motion compensated prediction block previously produced by the motion compensation unit 44 or the intra prediction unit 46 to produce a reconstructed video block for storage in reference image memory 64. The reconstructed video block can be used by the motion estimation unit 42 and the motion compensation unit 44 as a reference block for inter coding a block in a subsequent video frame.
[00105] FIG. 3 is a block diagram illustrating an example of a video decoder 30 that can implement techniques for performing decoder-side motion vector derivation (DMVD) of this disclosure. In the example of FIG. 3, the video decoder 30 includes an entropy decoding unit 70, a motion compensation unit 72, an intra prediction unit 74, an inverse quantization unit 76, an inverse transform unit 78, a reference image memory 82 and an adder 80. The video decoder 30 may, in some instances, perform a decoding pass generally reciprocal to the encoding pass described in relation to video encoder 20 (FIG. 2). The motion compensation unit 72 can generate prediction data based on the motion vectors received from the entropy decoding unit 70, while the intra prediction unit 74 can generate prediction data based on intra prediction mode indicators received from the entropy decoding unit 70.
[00106] During the decoding process, the video decoder 30 receives an encoded video bit stream that represents video blocks from an encoded video fraction and associated syntax elements from the video encoder 20. The decoding unit entropy 70 of the video decoder 30 entropy decodes the bit stream to generate quantized coefficients, motion vectors or intra prediction mode indicators and other syntax elements. The entropy decoding unit 70 forwards the motion vectors and other syntax elements to the motion compensation unit 72. The video decoder 30 can receive the syntax elements at the video fraction level and / or at the block level of video.
[00107] When the video fraction is encoded as an intra coded (I) fraction, the intra prediction unit 74 can generate prediction data for a video block of the current video fraction based on a signaled intra prediction mode and data from previously decoded blocks of the current frame or image. When the video frame is encoded as an inter coded fraction (i.e., B or P), the motion compensation unit 72 produces predictive blocks for a video block of the current video fraction based on the motion vectors and other syntax elements received from the entropy decoding unit 70. Predictive blocks can be produced from one of the reference images within one of the reference image lists. The video decoder 30 can build the reference frame lists, List 0 and List 1, using predefined construction techniques based on the reference images stored in the reference image memory 82.
[00108] The motion compensation unit 72 determines the prediction information for a video block of the current video fraction by analyzing the motion vectors and other syntax elements, and uses the prediction information to produce the predictive blocks for the current video block being decoded. For example, the motion compensation unit 72 uses some of the received syntax elements to determine a prediction mode (for example, intra or inter prediction) used to encode the video blocks of the video fraction, an inter prediction fraction type (for example, fraction B or fraction P), construction information for one or more of the reference image lists for the fraction, motion vectors for each inter coded video block of the fraction, an inter prediction status for each inter coded video block of the fraction, and other information to decode the video blocks in the current video fraction.
[00109] The motion compensation unit 72 can also perform interpolation based on interpolation filters. The motion compensation unit 72 can use interpolation filters as used by the video encoder 20 during the encoding of the video blocks to calculate interpolated values for sub-integer pixels of reference blocks. In this case, the motion compensation unit 72 can determine the interpolation filters used by the video encoder 20 from the received syntax elements and use the interpolation filters to produce predictive blocks.
[00110] According to the techniques of this disclosure, the motion compensation unit 72 can determine to derive motion information for a current block of video data using motion vector derivation techniques on the decoder side. Consequently, the motion compensation unit 72 can use any or all of the techniques of this disclosure, as discussed in greater detail above and below, to generate motion information for the current block. Thus, instead of using only the motion information decoded by the entropy decoding unit 70, the motion compensation unit 72 can derive the motion information for blocks of video data, for example, using a pixel indication, such as discussed in this document.
[00111] The motion compensation unit 72 can determine the pixel indication from pixel data of one or more groups of previously decoded pixels, for example, from previously decoded images stored in the reference image memory 82. In some examples, the motion compensation unit 72 can generate the pixel indication using various hypothesis predictions from several motion compensated blocks of the previously decoded images.
[00112] In some examples, to derive the movement information, the movement compensation unit 72 can determine an inter prediction direction for the movement information according to the correspondence costs between the different prediction directions. The inter prediction direction can generally correspond to whether the derived motion information refers to reference images in list 0, list 1 or both list 0 and list 1 (i.e., bi-prediction).
[00113] In some examples, the movement compensation unit 72 can calculate a correspondence cost between a first reference block and a second reference block. To calculate the correspondence cost, the movement compensation unit 72 can calculate a weighted average of two or more cost measurement techniques. For example, the motion compensation unit 72 can perform a first cost measurement of the differences between the first and second reference blocks and then a second, different cost measurement of those differences. The motion compensation unit 72 can then weight the cost measurements, for example, by applying weight values to the cost measurements. The movement compensation unit 72 can then accumulate (i.e., add) the weighted cost measurements to obtain a final correspondence cost and then refine the movement information using the correspondence cost.
[00114] In some examples, the motion compensation unit 72 can determine whether or not to use one or more motion vectors of the derived motion information, for example, based on whether the motion vectors are similar to other motion vector candidates in a motion vector candidate list for a current block. If one or more of the motion vectors is sufficiently similar to an existing motion vector candidate, the motion compensation unit 72 can discard the one or more motion vectors.
[00115] The inverse quantization unit 76 inversely quantizes, that is, dequantizes, the quantized transform coefficients provided in the bit stream and decoded by the entropy decoding unit 70. The inverse quantization process may include the use of a parameter of quantization QP Y calculated by the video decoder 30 for each video block in the video fraction to determine a degree of quantization and, likewise, a degree of inverse quantization that must be applied.
[00116] The inverse transform unit 78 applies an inverse transform, for example, an inverse DCT, an inverse integer transform, or a conceptually similar inverse transform process, to the transform coefficients in order to produce residual blocks in the pixel domain.
[00117] After the motion compensation unit 72 generates the predictive block for the current video block based on the motion vectors and other syntax elements, the video decoder 30 forms a decoded video block by adding the residual blocks from the inverse transform unit 78 to the corresponding predictive blocks generated by the motion compensation unit 72. The adder 80 represents the component or components that perform this sum operation. If desired, a deblocking filter can also be applied to filter the decoded blocks, in order to remove blocking artifacts. Other loop filters (in the encoding loop or after the encoding loop) can also be used to smooth pixel transitions or otherwise improve video quality. The video blocks decoded in a given frame or image are then stored in the reference image memory 82, which stores reference images used for subsequent motion compensation. The reference image memory 82 also stores decoded video for later display on a video device, such as the video device 32 of FIG. 1.
[00118] FIGS. 4A and 4B are conceptual diagrams illustrating spatial motion vector candidates derived from neighboring blocks. The video encoder 20 and/or the video decoder 30 can derive spatial MV candidates from neighboring blocks, for example, as shown in FIGS. 4A and 4B, for a specific PU (for example, PU0 102 of FIG. 4A and PU0 122 of FIG. 4B, which are included in the same corresponding CUs as PU1 104 and PU1 124, respectively), although the methods for generating the candidates from the blocks differ for the merge and AMVP modes. In merge mode, video encoder 20 and/or video decoder 30 can derive up to four spatial MV candidates in the numbered order shown in FIG. 4A, as follows: left (block 112), above (block 106), above right (block 108), below left (block 114) and above left (block 110).
[00119] In AMVP mode, video encoder 20 and/or video decoder 30 divide the neighboring blocks into two groups: the left group (including blocks 132 and 134) and the above group (including blocks 130, 126 and 128), as shown in FIG. 4B. For each group, the video encoder 20 and/or the video decoder 30 determine the potential candidate in a neighboring block referring to the same reference image as that indicated by the signaled reference index as having the highest priority to form a final candidate of the group. It is possible that no neighboring block contains a motion vector pointing to the same reference image. Therefore, if such a candidate cannot be found, the video encoder 20 and/or the video decoder 30 can scale the first available candidate to form the final candidate, and thus the differences in temporal distance can be compensated.
[00120] FIGS. 5A and 5B are conceptual diagrams illustrating a primary block location for a temporal motion vector predictor (TMVP) candidate. The video encoder 20 and video decoder 30 can add temporal motion vector predictor (TMVP) candidates, if enabled and available, to the motion vector (MV) candidate list after the spatial motion vector candidates. The motion vector derivation process for TMVP candidates is the same for both merge and AMVP modes. However, the target reference index for the TMVP candidate in merge mode can be set to 0.
[00121] The primary block location for TMVP candidate derivation for a PU (for example, PU0 140) is the bottom right block outside the co-located PU. As shown in FIG. 5A, the primary location for TMVP candidate derivation is block T 142, to compensate for the bias toward the above and left blocks used to generate the spatial neighboring candidates. However, if that block is located outside the current CTB row, or the motion information is not available, the block is replaced with a central block of the PU, for example, block 144.
[00122] The video encoder 20 and the video decoder 30 can derive a motion vector for the TMVP candidate from the co-located PU of the co-located image, indicated at the fraction level. The motion vector for the co-located PU is called the co-located MV. Similar to the temporal direct mode in AVC, to derive the TMVP candidate motion vector, the co-located MV needs to be scaled to compensate for the differences in temporal distance, as shown in FIG. 5B.
[00123] In merge mode and/or AMVP, video encoder 20 and/or video decoder 30 can scale reference motion vectors. FIG. 5B illustrates temporal distances between images, for example, a current temporal distance between the current image 150 and the current reference image 152 and a co-located temporal distance between the co-located image 154 and the co-located reference image 156. It is assumed that the value of motion vectors is proportional to the distance between images in presentation time. A motion vector associates two images: the reference image and the image containing the motion vector (that is, the containing image). When the video encoder 20 or video decoder 30 uses a motion vector to predict another motion vector, the video encoder 20 or video decoder 30 calculates the distance between the containing image and the reference image based on Picture Order Count (POC) values for these images.
[00124] For a motion vector to be predicted, its containing image and the associated reference image may be different. Therefore, the video encoder 20 and/or the video decoder 30 can calculate a new distance (based on the POC). The video encoder 20 and/or the video decoder 30 can scale the motion vector based on these two POC distances. For a spatial neighboring candidate, the containing images for the two motion vectors are the same, while the reference images are different. In HEVC, motion vector scaling applies to both TMVP and AMVP for spatial and temporal neighboring candidates.
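The POC-based scaling described in the two preceding paragraphs can be sketched as follows; this is a simplified floating-point illustration (the HEVC specification defines an integer version with clipping), and the function name is hypothetical.

```python
def scale_mv(mv, poc_cur, poc_ref_cur, poc_containing, poc_ref_containing):
    """Scale a co-located (or neighboring) MV by the ratio of the two POC distances."""
    tb = poc_cur - poc_ref_cur                 # temporal distance of the current prediction
    td = poc_containing - poc_ref_containing   # temporal distance of the MV being borrowed
    if td == 0:
        return mv
    scale = tb / td
    return (mv[0] * scale, mv[1] * scale)

# Example: the borrowed MV spans twice the temporal distance of the current prediction.
print(scale_mv((8, -4), poc_cur=4, poc_ref_cur=2, poc_containing=8, poc_ref_containing=4))
# -> (4.0, -2.0)
```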
[00125] In some examples, video encoder 20 and video decoder 30 can perform artificial motion vector candidate generation. If a motion vector candidate list is not complete, video encoder 20 and video decoder 30 can generate artificial motion vector candidates and insert them at the end of the list, until the list has all the necessary candidates.
[00126] In merge mode, there are two types of artificial MV candidates: the combined candidate derived only for B fractions and zero candidates used only for AMVP if the first type does not provide enough artificial candidates.
[00127] For each pair of candidates that are already on the candidate list and have required motion information, the bidirectional combined motion vector candidates are derived by a combination of the motion vector of the first candidate referring to an image in the list 0 and the motion vector of a second candidate referring to an image in list 1.
[00128] In some cases, the video encoder 20 and the video decoder 30 may perform a removal process after inserting the candidates. Candidates from different blocks can be the same, which decreases the efficiency of a merge/AMVP candidate list. The removal process can resolve this issue. According to the removal process, video encoder 20 and video decoder 30 can compare one candidate with the others in the current candidate list to avoid, to some extent, inserting identical candidates. To reduce complexity, only limited numbers of removal processes are applied, rather than comparing each potential candidate with all other existing candidates.
[00129] In the JEM reference software, there are several inter coding tools that derive or refine the motion vector (MV) for a current block on the decoder side. These decoder-side MV derivation (DMVD) approaches are elaborated below.
[00130] FIGS. 6 and 7 are conceptual diagrams illustrating concepts for pattern matched motion vector derivation (PMMVD). PMMVD mode is a special merge mode based on Frame Rate Up-Conversion (FRUC) techniques. With this mode, the motion information of a block is not signaled, but derived on the decoder side (for example, during a decoding process performed by the video encoder 20 and the video decoder 30). This technology was included in JEM.
[00131] Video encoder 20 can signal a FRUC flag for a CU when the merge flag of the CU is true. When the FRUC flag is false, video encoder 20 signals a merge index and, in response to detecting the merge index after a false value for the FRUC flag, the video decoder 30 uses the normal merge mode. When the FRUC flag is true, video encoder 20 may signal an additional FRUC mode flag to indicate which method (bilateral matching or model matching) should be used to derive motion information for the block. Accordingly, the video decoder 30 can use the true value of the FRUC flag to determine that the FRUC mode flag will be present and determine the FRUC mode from the FRUC mode flag, for example, bilateral matching or model matching.
[00132] During the motion derivation process, the video encoder 20 and/or the video decoder 30 can first derive an initial motion vector for the entire CU based on bilateral matching or model matching (according to the FRUC mode flag). First, the video encoder 20 and/or the video decoder 30 can check the merge list of the CU (also called PMMVD seeds) and then select the candidate that leads to the minimum matching cost as a starting point. Then, video encoder 20 and/or video decoder 30 can conduct a local search based on bilateral matching or model matching around the starting point and select the MV resulting in the minimum matching cost as the MV for the entire CU. Subsequently, video encoder 20 and/or video decoder 30 can further refine the motion information at the sub-block level, with the derived CU motion vectors as the starting points.
[00133] FIG. 6 is a conceptual diagram illustrating concepts related to bilateral matching for deriving motion information of the current block 160 in the current image 180. As shown in FIG. 6, bilateral matching is used to derive motion information of the current block 160 by finding the best match between two reference blocks (for example, between block R0 162 and block R1 164 and/or between block R'0 166 and block R'1 168) along the motion path of the current block 160 in two different reference images 182 and 184. In particular, motion vector 170 defines a location of block R0 162 in reference image 182 in relation to the position of the current block 160, while motion vector 172 defines a location of block R1 164 in reference image 184 in relation to the position of the current block 160. Similarly, motion vector 174 defines the location of block R'0 166 in reference image 182 in relation to the position of the current block 160, while motion vector 176 defines the location of block R'1 168 in reference image 184 in relation to the position of the current block 160.
[00134] According to bilateral matching, the motion vectors 170 and 172 have magnitudes proportional to the POC distances between the current image 180 and the reference images 182 and 184, and the motion vector 170 is on a path opposite to the motion vector 172. Similarly, motion vectors 174 and 176 have magnitudes proportional to the POC distances between the current image 180 and the reference images 182 and 184, and the motion vector 174 has a path opposite to the motion vector 176. That is, if a motion vector has x- and y-components {x0, y0}, a motion vector having an equal magnitude in the opposite direction can be established with the x- and y-components {-x0, -y0}. In other words, under the assumption of a continuous motion path, the motion vectors 170 and 174 and the corresponding motion vectors 172 and 176 (respectively) pointing to the respective reference blocks must be proportional to the temporal distances between the current image 180 and the reference images 182 and 184. As a special case, when the current image 180 is temporally between the reference images 182 and 184 and the temporal distance from the current image 180 to the two reference images 182 and 184 is the same, the bilateral matching becomes mirror-based bidirectional MV derivation.
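As a sketch of the proportionality constraint just described, the following hypothetical helper derives the list 1 motion vector implied by a list 0 candidate, scaled by the two POC distances; the mirror case falls out when the two distances are equal. The names and sign conventions are illustrative assumptions, not definitions from this disclosure.

```python
def paired_bilateral_mv(mv0, poc_cur, poc_ref0, poc_ref1):
    """Given MV0 toward the list 0 reference, return the MV1 toward the list 1 reference
    that lies on the same continuous motion path (opposite direction, proportional magnitude)."""
    tau0 = poc_cur - poc_ref0   # signed temporal distance to the list 0 reference
    tau1 = poc_ref1 - poc_cur   # signed temporal distance to the list 1 reference
    if tau0 == 0:
        return (0.0, 0.0)
    scale = -tau1 / tau0        # opposite direction, proportional to the temporal distances
    return (mv0[0] * scale, mv0[1] * scale)

# Symmetric case: the current picture sits midway between the two references.
print(paired_bilateral_mv((6, -2), poc_cur=4, poc_ref0=2, poc_ref1=6))   # -> (-6.0, 2.0)
```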
[00135] FIG. 7 is a conceptual diagram illustrating concepts related to model matching for deriving motion information of the current block 190. As shown in FIG. 7, model matching is used to derive motion information of the current block 190 by finding the best match between a model 216 (including the upper and/or left neighboring blocks of the current block 190) in the current image 210 and one or more reference models 218, 220, 222 and 224 (each having the same size as the model 216) in one or more reference images 212 and 214. Video encoder 20 and video decoder 30 can use one of the reference blocks 192, 194, 196 and 198 to predict the current block 190 (according to the respective motion vectors 200, 202, 204 and 206), namely the block whose corresponding model among the reference models 218, 220, 222 and 224 most closely matches model 216. In several examples, video encoder 20 and video decoder 30 can use two of the motion vectors 200, 202, 204 and 206, for example, for bi-prediction.

[00136] The video encoder 20 can determine whether to use the FRUC merge mode for a CU according to a rate distortion (RD) cost selection, as done for the normal merge candidate. That is, the video encoder 20 can compare the two matching modes (bilateral matching and model matching), both checked for a CU, using RD cost selection. The video encoder 20 can select the mode leading to the minimum RD cost and additionally compare the RD cost of this mode with the costs of other CU modes. If a FRUC matching mode is the most efficient, video encoder 20 can set the FRUC flag to true for the CU and signal the related matching mode. Likewise, the video decoder 30 can determine a prediction mode according to whether the FRUC flag is set to true and, if so, which of bilateral matching or model matching is signaled.
[00137] FIG. 8 is a flow chart illustrating an illustrative frame rate up-conversion (FRUC) model matching process. At the 5th JVET meeting, JVET-E0035 was proposed to further improve FRUC model matching. A flow chart of the existing FRUC model matching mode is shown in FIG. 8. In the first step, a model T0 (and its corresponding motion information MV0) is found to match the current model TC of the current block from the reference images in list 0 (230). In the second step, a model T1 (and its corresponding motion information MV1) is found from the reference images in list 1 (232). The obtained motion information, MV0 and MV1, is used to perform bi-prediction to generate the predictor of the current block (234).
[00138] FIG. 9 is a flow chart illustrating proposed illustrative changes to the FRUC model matching process of FIG. 8. The video encoder 20 and/or the video decoder 30 can perform the steps of FIG. 9 as shown. The existing FRUC model matching mode can be improved by introducing bidirectional model matching and adaptive selection between unidirectional prediction and bi-prediction. The proposed modifications relative to FIG. 8 are highlighted with light gray shading in FIG. 9.

[00139] The proposed bidirectional model matching is implemented based on the existing unidirectional model matching. As shown in FIG. 9, a matched model T0 is first found in the first stage of model matching from the reference images of list 0 (note that list 0 here is taken only as an example) (240). In fact, whether list 0 or list 1 is used in the first stage is adaptive to the initial distortion cost between the current model and the initial model in the corresponding reference image. The initial model can be determined with the initial motion information of the current block that is available before performing the first model matching. The list of reference images corresponding to the minimum initial model distortion cost will be used in the first stage of model matching. For example, if the initial model distortion cost corresponding to list 0 is not greater than the cost corresponding to list 1, list 0 is used in the first stage of model matching and list 1 is used in the second stage; then, the current model TC of the current block is updated (242) as follows:
T'C = 2 * TC - T0

[00140] Instead of the current model TC, the updated model T'C is used to find another matched model T1 from the reference images in list 1 in the second model matching. As a result, the matched model T1 is found by jointly using the reference images of list 0 and list 1 (244). This matching process is called bidirectional model matching.
[00141] The proposed selection between unidirectional prediction and bi-prediction for motion compensation prediction (MCP) is based on the model matching distortion. As shown in FIG. 9, during model matching, the distortion between the model T0 and the model TC (the current model) can be calculated as cost0, and the distortion between the model T1 and the model T'C (the updated current model) can be calculated as cost1. If cost0 is less than 0.5 * cost1 (YES branch of 246), unidirectional prediction based on MV0 is applied to the FRUC model matching mode (250); otherwise (NO branch of 246), bi-prediction based on MV0 and MV1 is applied (248). Note that cost0 is compared to 0.5 * cost1 because cost1 indicates a difference between the model T1 and the model T'C (the updated current model), which is twice the difference between TC (the current model) and its prediction 0.5 * (T0 + T1). It is noted that the proposed methods are applied only to PU-level motion refinement. Sub-PU-level motion refinement is kept unchanged.
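A simplified sketch of the decision logic in paragraphs [00139] through [00141], assuming SAD as the distortion measure; the array names and the way the matched models are obtained are hypothetical and only illustrate the updated-template computation and the cost0 versus 0.5 * cost1 comparison.

```python
import numpy as np

def sad(a, b):
    return np.abs(a.astype(np.int64) - b.astype(np.int64)).sum()

def fruc_template_decision(tc, t0, t1):
    """tc: current model; t0: best list 0 match for tc; t1: best list 1 match found
    against the updated model T'C = 2 * TC - T0 (all assumed already searched)."""
    cost0 = sad(tc, t0)
    tc_updated = 2 * tc.astype(np.int64) - t0.astype(np.int64)   # T'C = 2*TC - T0
    cost1 = sad(tc_updated, t1)
    if cost0 < 0.5 * cost1:
        return "uni-prediction (MV0 only)", cost0, cost1
    return "bi-prediction (MV0 and MV1)", cost0, cost1

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    tc = rng.integers(0, 256, (4, 16))
    t0 = tc + rng.integers(-2, 3, tc.shape)     # close match in list 0
    t1 = rng.integers(0, 256, tc.shape)         # poor match in list 1
    print(fruc_template_decision(tc, t0, t1))
```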
[00142] FIG. 10 is a conceptual diagram illustrating concepts related to bi-directional optical flow in JEM. Bi-directional Optical Flow (BIO) is a pixel-wise motion refinement that is performed on top of block-wise motion compensation in the case of bi-prediction. Since it compensates the fine motion inside a block, enabling BIO can result in enlarging the block size for motion compensation. Sample-level motion refinement does not require exhaustive search or signaling, since there is an explicit equation that gives the refined motion vector for each sample.
[00143] Let $I^{(k)}$ be the luminance value from reference $k$ ($k = 0, 1$) after block motion compensation, and let $\partial I^{(k)}/\partial x$ and $\partial I^{(k)}/\partial y$ be the horizontal and vertical components of the gradient of $I^{(k)}$, respectively. Assuming that the optical flow is valid, the motion vector field $(v_x, v_y)$ is given by the equation

$\partial I^{(k)}/\partial t + v_x \, \partial I^{(k)}/\partial x + v_y \, \partial I^{(k)}/\partial y = 0$ (1)

[00144] Combining the optical flow equation with Hermite interpolation for the motion path of each sample, a single third-order polynomial is obtained that matches both the function values $I^{(k)}$ and the derivatives $\partial I^{(k)}/\partial x$, $\partial I^{(k)}/\partial y$ at the ends. The value of this polynomial at $t = 0$ is the BIO prediction:

$pred_{BIO} = \tfrac{1}{2} \cdot \big( I^{(0)} + I^{(1)} + \tfrac{v_x}{2} \cdot (\tau_1 \, \partial I^{(1)}/\partial x - \tau_0 \, \partial I^{(0)}/\partial x) + \tfrac{v_y}{2} \cdot (\tau_1 \, \partial I^{(1)}/\partial y - \tau_0 \, \partial I^{(0)}/\partial y) \big)$ (2)

[00145] Here $\tau_0$ and $\tau_1$ denote the distances to the reference frames, as shown in FIG. 10. The distances $\tau_0$ and $\tau_1$ are calculated using the POC values of the B image 260, the Ref0 image 262 and the Ref1 image 264: $\tau_0 = POC(\text{current}) - POC(\text{Ref0})$, $\tau_1 = POC(\text{Ref1}) - POC(\text{current})$. If both predictions come from the same time direction (both from the past or both from the future), then the signs are different, $\tau_0 \cdot \tau_1 < 0$. In this case, BIO is applied only if the predictions do not come from the same time point ($\tau_0 \neq \tau_1$), both referenced regions have non-zero motion ($MVx_0, MVy_0, MVx_1, MVy_1 \neq 0$), and the block motion vectors are proportional to the temporal distances ($MVx_0 / MVx_1 = MVy_0 / MVy_1 = -\tau_0 / \tau_1$).
[00146] The motion vector field $(v_x, v_y)$ is determined by minimizing the difference $\Delta$ between the values at points A and B (the intersections of the motion path with the reference planes in FIG. 6). The model uses only the first linear term of the local Taylor expansion for $\Delta$:

$\Delta = (I^{(0)} - I^{(1)}) + v_x (\tau_1 \, \partial I^{(1)}/\partial x + \tau_0 \, \partial I^{(0)}/\partial x) + v_y (\tau_1 \, \partial I^{(1)}/\partial y + \tau_0 \, \partial I^{(0)}/\partial y)$ (3)

[00147] All values in (1) depend on the location of the sample $(i', j')$, which has been omitted so far. Assuming that the motion is consistent in the local surrounding, $\Delta$ is minimized inside the $(2M+1) \times (2M+1)$ square window $\Omega$ centered on the currently predicted point $(i, j)$:

$(v_x, v_y) = \underset{v_x, v_y}{\arg\min} \sum_{[i', j'] \in \Omega} \Delta^2[i', j']$ (4)

[00148] For this optimization problem, a simplified solution is used, performing the first minimization in the vertical direction and then in the horizontal direction.
$v_x = (s_1 + r) > m \;?\; \mathrm{clip3}\big({-thBIO},\, thBIO,\, -\tfrac{s_3}{s_1 + r}\big) : 0$ (5)

$v_y = (s_5 + r) > m \;?\; \mathrm{clip3}\big({-thBIO},\, thBIO,\, -\tfrac{s_6 - v_x s_2 / 2}{s_5 + r}\big) : 0$ (6)

where

$s_1 = \sum_{[i',j'] \in \Omega} (\tau_1 \, \partial I^{(1)}/\partial x + \tau_0 \, \partial I^{(0)}/\partial x)^2$;
$s_3 = \sum_{[i',j'] \in \Omega} (I^{(1)} - I^{(0)}) (\tau_1 \, \partial I^{(1)}/\partial x + \tau_0 \, \partial I^{(0)}/\partial x)$;
$s_2 = \sum_{[i',j'] \in \Omega} (\tau_1 \, \partial I^{(1)}/\partial x + \tau_0 \, \partial I^{(0)}/\partial x)(\tau_1 \, \partial I^{(1)}/\partial y + \tau_0 \, \partial I^{(0)}/\partial y)$;
$s_5 = \sum_{[i',j'] \in \Omega} (\tau_1 \, \partial I^{(1)}/\partial y + \tau_0 \, \partial I^{(0)}/\partial y)^2$;
$s_6 = \sum_{[i',j'] \in \Omega} (I^{(1)} - I^{(0)}) (\tau_1 \, \partial I^{(1)}/\partial y + \tau_0 \, \partial I^{(0)}/\partial y)$ (7)

[00149] In order to avoid division by zero or by a very small value, the regularization parameters $r$ and $m$ are introduced in equations (5) and (6):

$r = 500 \cdot 4^{d-8}$ (8)

$m = 700 \cdot 4^{d-8}$ (9)

[00150] Here $d$ is the internal bit depth of the input video.
[00151] In some cases, the MV refinement of BIO may be unreliable due to noise or irregular motion. Therefore, in BIO, the magnitude of the MV refinement is clipped to a threshold thBIO. The threshold value is determined based on whether all the reference images of the current image are from one direction. If all the reference images of the current image are from one direction, the threshold value is set to $12 \times 2^{14-d}$; otherwise, it is set to $12 \times 2^{13-d}$.
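The following sketch assembles the window sums s1, s2, s3, s5, s6 and solves for (vx, vy) with the clipping described above. It is a plain floating-point illustration of equations (5) through (9), not the fixed-point JEM implementation; the variable names and the demo inputs are hypothetical.

```python
import numpy as np

def clip3(lo, hi, x):
    return max(lo, min(hi, x))

def bio_refinement(i0, i1, gx0, gy0, gx1, gy1, tau0, tau1, bit_depth=10, th_bio=None):
    """i0, i1: motion compensated predictions over the (2M+1)x(2M+1) window Omega;
    gx*/gy*: their horizontal/vertical gradients; returns the refinement (vx, vy)."""
    d = bit_depth
    r = 500 * 4 ** (d - 8)
    m = 700 * 4 ** (d - 8)
    if th_bio is None:
        th_bio = 12 * 2 ** (14 - d)          # single-direction reference case (see above)
    gx = tau1 * gx1 + tau0 * gx0
    gy = tau1 * gy1 + tau0 * gy0
    di = i1 - i0
    s1 = float((gx * gx).sum())
    s2 = float((gx * gy).sum())
    s3 = float((di * gx).sum())
    s5 = float((gy * gy).sum())
    s6 = float((di * gy).sum())
    vx = clip3(-th_bio, th_bio, -s3 / (s1 + r)) if (s1 + r) > m else 0.0
    vy = clip3(-th_bio, th_bio, -(s6 - vx * s2 / 2) / (s5 + r)) if (s5 + r) > m else 0.0
    return vx, vy

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    window = (5, 5)                           # 5x5 window, as used in JEM
    args = [rng.normal(size=window) * 32 for _ in range(6)]
    print(bio_refinement(*args, tau0=1, tau1=1))
```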
[00152] The gradients for BIO are calculated at the same time as the motion compensation interpolation, using operations consistent with the HEVC motion compensation process (2D separable FIR). The input for this 2D separable FIR is the same reference frame sample as for the motion compensation process, and the fractional position (fracX, fracY) according to the fractional part of the block motion vector. In the case of the horizontal gradient $\partial I/\partial x$, the signal is first interpolated vertically using BIOfilterS corresponding to the fractional position fracY with de-scaling shift d-8; then the gradient filter BIOfilterG is applied in the horizontal direction corresponding to the fractional position fracX with de-scaling shift 18-d. In the case of the vertical gradient, the gradient filter is first applied vertically using BIOfilterG corresponding to the fractional position fracY with de-scaling shift d-8; then the signal displacement is performed using BIOfilterS in the horizontal direction corresponding to the fractional position fracX with de-scaling shift 18-d. The length of the interpolation filter for gradient calculation (BIOfilterG) and for signal displacement (BIOfilterF) is shorter (6 taps) in order to maintain reasonable complexity. Table 1 shows the filters used to calculate the gradients for the different fractional positions of the block motion vector in BIO. Table 2 shows the interpolation filters used to generate the prediction signal in BIO.
[00153] FIG. 11 is a conceptual diagram illustrating an example of gradient calculation for an 8x4 block. For the current 8x4 block 270, a video encoder (for example, video encoder 20 or video decoder 30) fetches the motion compensated predictors and calculates the horizontal/vertical (HOR/VER) gradients of all the pixels within the current block 270, as well as the two outer lines of pixels, because solving vx and vy for each pixel requires the HOR/VER gradient values and the motion compensated predictors of the pixels within the window Ω centered on that pixel, as shown in equation (4). In JEM, the size of this window is set to 5 x 5. Therefore, the video encoder fetches the motion compensated predictors and calculates the gradients for the two outer lines of pixels around points A 272 and B 274.
Table 1: Filters for calculating gradients in BIO
Fractional pel position | Gradient interpolation filter (BIOfilterG)
0    | {8, -39, -3, 46, -17, 5}
1/16 | {8, -32, -13, 50, -18, 5}
1/8  | {7, -27, -20, 54, -19, 5}
3/16 | {6, -21, -29, 57, -18, 5}
1/4  | {4, -17, -36, 60, -15, 4}
5/16 | {3, -9, -44, 61, -15, 4}
3/8  | {1, -4, -48, 61, -13, 3}
7/16 | {0, 1, -54, 60, -9, 2}
1/2  | {1, 4, -57, 57, -4, 1}
Table 2: Interpolation filters for generating the prediction signal in the BIO
Fractional pel position | Interpolation filter for prediction signal (BIOfilterS)
0    | {0, 0, 64, 0, 0, 0}
1/16 | {1, -3, 64, 4, -2, 0}
1/8  | {1, -6, 62, 9, -3, 1}
3/16 | {2, -8, 60, 14, -5, 1}
1/4  | {2, -9, 57, 19, -7, 2}
5/16 | {3, -10, 53, 24, -8, 2}
3/8  | {3, -11, 50, 29, -9, 2}
7/16 | {3, -11, 44, 35, -10, 3}
1/2  | {1, -7, 38, 38, -7, 1}
[00154] In JEM, BIO is applied to all bidirectional predicted blocks when the two predictions are from different reference images. When the LIC is allowed for a CU, the BIO is disabled.
[00155] FIG. 12 is a conceptual diagram illustrating concepts related to the proposed decoder-side motion vector derivation (DMVD) based on bilateral model matching. A video encoder (such as video encoder 20 or video decoder 30) can generate the bilateral model 308 as the weighted combination of the two prediction blocks 292 and 298, from the initial MV 300 of list 0 and MV 302 of list 1, respectively, as shown in FIG. 12.
[00156] The video encoder (video encoder 20 or video decoder 30) can then perform the model matching operation, which includes calculating cost measurements between the generated model 308 and the sample region (around the initial prediction block) in the reference images 312 and 314. For each of the reference images 312 and 314, the video encoder can determine the MV that produces the minimum model cost as the updated MV of that list, replacing the original one. Finally, the video encoder uses the two new MVs, that is, MV 304 and MV 306, as shown in FIG. 12, for regular bi-prediction from the corresponding blocks 294 and 296. As is commonly used in block matching motion estimation, the sum of absolute differences (SAD) can be used as the cost measure.
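A simplified sketch of the bilateral model refinement of FIG. 12, assuming equal weights for the two prediction blocks and a small integer search window; all names and the search strategy are hypothetical assumptions, not the implementation of this disclosure.

```python
import numpy as np

def sad(a, b):
    return np.abs(a.astype(np.int64) - b.astype(np.int64)).sum()

def fetch(ref, x, y, w, h):
    return ref[y:y + h, x:x + w]

def refine_mv_against_template(template, ref, x0, y0, w, h, search_range=2):
    """Search around the initial position (x0, y0) for the block minimizing SAD to the template."""
    best = (0, 0, sad(template, fetch(ref, x0, y0, w, h)))
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            cost = sad(template, fetch(ref, x0 + dx, y0 + dy, w, h))
            if cost < best[2]:
                best = (dx, dy, cost)
    return best

def bilateral_template_dmvd(ref0, ref1, x0, y0, x1, y1, w, h):
    # Bilateral template: combination (here equal-weight) of the two initial predictions.
    p0 = fetch(ref0, x0, y0, w, h).astype(np.int64)
    p1 = fetch(ref1, x1, y1, w, h).astype(np.int64)
    template = (p0 + p1 + 1) // 2
    # Refine each list's MV independently against the bilateral template.
    return (refine_mv_against_template(template, ref0, x0, y0, w, h),
            refine_mv_against_template(template, ref1, x1, y1, w, h))

if __name__ == "__main__":
    rng = np.random.default_rng(4)
    ref0 = rng.integers(0, 256, (32, 32), dtype=np.uint8)
    ref1 = np.roll(ref0, shift=(0, 1), axis=(0, 1))   # ref1 is ref0 shifted right by one pixel
    print(bilateral_template_dmvd(ref0, ref1, 8, 8, 8, 8, 8, 8))
```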
[00157] Video encoder 20 and video decoder 30 can apply decoder-side motion vector derivation (DMVD) to the bi-prediction merge mode, with one MV from a reference image in the past and another from a reference image in the future, without transmission of an additional syntax element from video encoder 20 to video decoder 30. In JEM4.0, when an LIC, affine, sub-CU or FRUC merge candidate is selected for a CU, DMVD is not applied.
[00158] FIGS. 13A and 13B are conceptual diagrams illustrating concepts related to overlapped block motion compensation (OBMC) in JEM. OBMC has been used in earlier generations of video standards, for example, in H.263. In JEM, OBMC is performed for all motion compensated (MC) block boundaries, except for the right and bottom boundaries of a CU. In addition, it is applied to both the luma and chroma components. In JEM, an MC block corresponds to a coding block. When a CU is coded with a sub-CU mode (which includes the sub-CU merge, affine and FRUC modes), each sub-block of the CU is an MC block. To process the CU boundaries uniformly, OBMC is performed at the sub-block level for all MC block boundaries, where the size of the sub-block is set to 4 x 4, as shown in FIG. 13A. For example, a video encoder, such as video encoder 20 or video decoder 30, can perform OBMC on the current sub-block 322 of FIG. 13A using motion vectors from the above neighboring sub-block 324 and/or from the left neighboring sub-block 326.
[00159] When the OBMC is applied to the current sub-block, in addition to current motion vectors, the motion vectors of four connected neighboring sub-blocks, if available and not identical to the current vector of the sub-block, are also used to derive the prediction block for the current sub-block. These various prediction blocks based on various motion vectors are combined to generate the final prediction signal for the current subblock.
[00160] As shown in FIG. 13B, a prediction block based on the motion vectors of a neighboring sub-block is denoted as PN, with N indicating an index for the above neighboring sub-block 332, the below sub-block 338, the left sub-block 336 and the right sub-block 334, and a prediction block based on the motion vectors of the current sub-block 330 is denoted as PC. When PN is based on the motion information of a neighboring sub-block that contains the same motion information as the current sub-block, the OBMC is not performed from PN. Otherwise, each pixel of PN is added to the same pixel of PC, that is, four rows/columns of PN are added to PC. The weighting factors {1/4, 1/8, 1/16, 1/32} are used for PN and the weighting factors {3/4, 7/8, 15/16, 31/32} are used for PC. The exceptions are small MC blocks (that is, when the height or width of the coding block is equal to 4, or a CU is coded with a sub-CU mode), for which only two rows/columns of PN are added to PC. In this case, the weighting factors {1/4, 1/8} are used for PN and the weighting factors {3/4, 7/8} are used for PC. For PN generated based on motion vectors of a vertically (horizontally) neighboring sub-block, the pixels in the same row (column) of PN are added to PC with the same weighting factor. Note that BIO is also applied to the derivation of the prediction block PN.
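The row/column weighting just described can be sketched as follows for the above-neighbor case; the sketch assumes a 4x4 sub-block and the {1/4, 1/8, 1/16, 1/32} and {3/4, 7/8, 15/16, 31/32} weights quoted above, and the function name is hypothetical.

```python
import numpy as np

def obmc_blend_from_above(pc, pn):
    """Blend the prediction PN (built with the above neighbor's MV) into PC for a 4x4 sub-block.
    Row i of PN gets weight w_n[i]; the co-located row of PC keeps weight 1 - w_n[i]."""
    w_n = np.array([1 / 4, 1 / 8, 1 / 16, 1 / 32])      # weights for PN, top row first
    out = pc.astype(np.float64).copy()
    for i in range(4):
        out[i, :] = (1 - w_n[i]) * pc[i, :] + w_n[i] * pn[i, :]
    return np.rint(out).astype(pc.dtype)

if __name__ == "__main__":
    pc = np.full((4, 4), 100, dtype=np.int32)
    pn = np.full((4, 4), 132, dtype=np.int32)
    print(obmc_blend_from_above(pc, pn))
    # The top row moves 1/4 of the way toward PN (108); deeper rows move progressively less.
```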
[00161] FIGS. 14A through 14D are conceptual diagrams illustrating the OBMC weightings. In particular, FIG. 14A illustrates an above sub-block for extended prediction, FIG. 14B illustrates a left sub-block for extended prediction, FIG. 14C illustrates a below sub-block for extended prediction, and FIG. 14D illustrates a right sub-block for extended prediction.
[00162] In JEM, for a CU with a size less than or equal to 256 luma samples, a CU-level flag is signaled to indicate whether or not OBMC is applied for the current CU. For CUs larger than 256 luma samples or not encoded with AMVP mode, OBMC is applied by default. The video encoder 20 can take into account the impact of OBMC on a CU during the motion estimation stage, as discussed above. The video encoder 20 can use the prediction signal formed with the motion information of the above neighboring block and the left neighboring block to compensate the top and left boundaries of the original signal of the current CU, and then apply the normal motion estimation process.

[00163] Conventional methods related to DMVD (BIO, FRUC bilateral matching, FRUC model matching, bilateral model matching, among others) provide significant bit rate reductions. However, some information may not be used in the conventional approaches. This disclosure describes several techniques that can further improve DMVD and that can be performed by the video encoder 20 and/or the video decoder 30.
[00164] FIGS. 15A and 15B are conceptual diagrams illustrating examples of extended areas 352 and 356 for a pixel indication of the current blocks 350 and 354, respectively. When a video encoder (for example, video encoder 20 or video decoder 30) generates a pixel indication from motion compensated (MC) blocks, the video encoder can extend the size of the pixel indication by fetching and obtaining more reference pixels. For example, if the current block size is M x N, the video encoder can derive (M + I) x (N + J) MC blocks as the pixel indication. In addition, the extended area can have any shape. For example, as shown in FIG. 15A, the extended area 352 surrounds the current block 350. As another example, the extended area can be asymmetric. For example, as shown in FIG. 15B, the extended area 356 is asymmetric with respect to the current block 354.
[00165] In addition, the extended area can also be irregular, as shown in FIG. 16. FIG. 16 is a conceptual diagram illustrating another example of an extended area 360 for the current block 358.
[00166] The extended areas of FIGS. 15 and 16 can serve the purpose of the models used by the FRUC model matching method, as discussed above. Note that when the video encoder (for example, video encoder 20 or video decoder 30) derives the pixel indication from the current image, the pixels within the extended area can be the neighboring reconstructed pixels or the motion compensated prediction pixels.

[00167] In another example, the video encoder (for example, video encoder 20 or video decoder 30) may decide whether to include a specific extended area from the top/right/bottom/left. If object occlusion occurs in a specific direction, if the video encoder uses a QP value different from that of specific neighboring blocks, or when a neighboring block has an illumination offset without residual, the video encoder can detect these events by calculating a SAD value (or mean-removed SAD) with and without the specific model. If the accumulated SAD value when including the specific model exceeds a pre-established threshold, the video encoder may choose not to include the specific model in the pixel indication. Alternatively, the selection of the specific extended area can be signaled from the encoder side to provide a better trade-off between decoder complexity and coding performance.
[00168] In another example, due to the flexible shape of the prediction block, when the ratio between width and height or the ratio between height and width is greater than a pre-established threshold, the use of the model can be restricted to the longer side for a more stable prediction. This threshold can also be signaled in the bit stream (for example, by the video encoder 20 and retrieved/decoded by the video decoder 30).
[00169] The video encoder 20 can derive any additional information that may be useful for the video decoder 30 to improve the MV derivation using the pixel indication. For example, video encoder 20 can signal residual or pixel offsets to video decoder 30, and pixel indication can be enhanced by signaled residual or pixel offsets to perform better MV derivation.
[00170] In the existing DMVD approaches, the motion vectors and prediction directions (L0, L1 or bi-prediction) for a block or sub-block are derived with the same methods on both the encoder and decoder sides, so that this information does not need to be signaled in the bit stream. The techniques of this disclosure can further enhance these derivation approaches and extend the scope of the existing DMVD approaches to determine more prediction information on the decoder side (for example, in the video decoder 30 and/or during a decoding process performed by the video encoder 20).
[00171] In some DMVD approaches, video encoder 20 and video decoder 30 can determine the inter prediction directions (L0, L1 or bi-prediction) according to the correspondence costs between the different prediction directions. For example, assuming that the matching costs for L0, L1 and bi-prediction are CostL0, CostL1 and CostBi, respectively, the prediction direction can be determined by choosing the prediction direction with the minimum cost (based on the assumption that the lowest cost means the best matching result). As mentioned below, the correspondence cost can be the sum of absolute differences (SAD), the sum of squared differences (SSD), the sum of absolute transform differences (SATD) or any other cost measurement method.
[00172] Based on observations made during the development of the techniques of this disclosure, it has been found that bi-prediction generally provides more stable prediction results. Therefore, according to the techniques of this disclosure, the video encoder (for example, video encoder 20 or video decoder 30) can add a bias value to the correspondence costs so that bi-prediction is preferentially selected. In one example, the costs for unidirectional prediction from list 0 (L0) and list 1 (L1) are scaled upward by a scaling value (for example, equal to 1.25), and the scaled L0 and L1 costs are then compared with the bi-prediction cost to select the best prediction direction. In another example, the bi-prediction cost is scaled downward by a scaling value (for example, equal to 0.75), and the scaled bi-prediction cost is then compared with the L0 and L1 costs to select the best prediction direction. The scaling value can be pre-established in both the video encoder 20 and the video decoder 30 (for example, as configuration information) or, alternatively, the video encoder 20 can signal the scaling value in the bit stream (and the video decoder 30 can decode the scaling value).
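A short sketch of this biased selection, assuming the example scaling values of 1.25 and 0.75 quoted above and a simple floating-point cost; the enum and function names are illustrative only.

// Matching costs for list 0, list 1 and bi-prediction feed a simple
// selection rule; scaling the uni-directional costs up by 1.25 (or the
// bi-prediction cost down by 0.75) biases the decision toward bi-prediction.
enum class PredDir { kList0, kList1, kBi };

PredDir selectPredDir(double costL0, double costL1, double costBi,
                      bool scaleUniCosts = true) {
    if (scaleUniCosts) {
        costL0 *= 1.25;      // penalize uni-directional prediction
        costL1 *= 1.25;
    } else {
        costBi *= 0.75;      // or, equivalently, favor bi-prediction
    }
    if (costBi <= costL0 && costBi <= costL1) return PredDir::kBi;
    return (costL0 <= costL1) ? PredDir::kList0 : PredDir::kList1;
}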
[00173] The video encoder (for example, the video encoder 20 or the video decoder 30) can use the pixel indication to determine motion partitions (for example, 2N x 2N, 2N x N, N x 2N, N x N, among others) for a block. The current block is divided into sub-blocks according to the different motion partitions, and the cost of each sub-block is calculated using its associated pixel indication. All the costs of the different motion partitions are then compared to each other to determine the best motion partition for the current block. Different cost offsets can be added for the different motion partitions to adjust the accuracy of the motion partition determination.
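A minimal sketch of the partition decision in [00173]: each candidate partition carries the sum of its sub-blocks' pixel indication costs plus an offset, and the lowest total wins. The struct layout and the fixed set of four partitions are assumptions for illustration.

#include <array>
#include <limits>

struct PartitionCost { double subBlockCostSum; double offset; };

// Returns the index of the best partition among {2Nx2N, 2NxN, Nx2N, NxN}.
int selectMotionPartition(const std::array<PartitionCost, 4>& candidates) {
    int best = 0;
    double bestCost = std::numeric_limits<double>::max();
    for (int i = 0; i < 4; ++i) {
        const double c = candidates[i].subBlockCostSum + candidates[i].offset;
        if (c < bestCost) { bestCost = c; best = i; }
    }
    return best;
}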
[00174] FIGS. 17A through 17C are conceptual diagrams illustrating illustrative weights applied to various pixels according to the techniques of this disclosure. When calculating correspondence costs, the sum of absolute differences (SAD), the sum of squared differences (SSD), the sum of absolute transformed differences (SATD), the mean absolute difference (MAD), the mean squared difference (MSD) or any other cost measurement method may be used. According to the techniques of this disclosure, the video encoder can apply weights to the cost calculation for different pixels. Examples are shown in FIGS. 17A through 17C. In FIG. 17A, different weights are given to different rows and columns of the template 372 used for the current block 370 in FRUC template matching. In FIG. 17B, the lower right part 376 of the current block 374 and the remaining part 378 of the current block 374 can use different weights (W1 and W0, respectively). Note that the weighting patterns are not restricted to these two examples of FIGS. 17A and 17B.
[00175] In addition, the weights can be adaptive according to the coded information, such as the block size and the coding modes. In one example, the weighted SAD shown in FIG. 17A is applied to FRUC template matching. For blocks with a width and height equal to or greater than 32, the weighting factors w0 = 1/2, w1 = 1/4, w2 = 1/8 and w3 = 1/8 are used. For the other blocks, the weighting factors w0 = 1, w1 = 1/2, w2 = 1/4 and w3 = 1/8 are used.
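A minimal sketch of the size-adaptive row weights just described, assuming a four-row above-template; keeping the weights in eighths keeps the cost integer, and taking row 0 as the template row nearest the current block is an assumption, not something the text specifies.

#include <cstdint>
#include <cstdlib>

// Weighted SAD over a four-row template, with per-row weights switched on
// block size: {1/2, 1/4, 1/8, 1/8} for width and height >= 32, otherwise
// {1, 1/2, 1/4, 1/8}. The result is scaled by 8, which is fine for comparisons.
int weightedTemplateSad(const uint8_t* curTmpl, const uint8_t* refTmpl,
                        int stride, int width, int numRows,
                        int blockW, int blockH) {
    const bool large = (blockW >= 32 && blockH >= 32);
    const int w8[4] = { large ? 4 : 8, large ? 2 : 4, large ? 1 : 2, 1 };
    int cost = 0;
    for (int row = 0; row < numRows && row < 4; ++row) {
        int rowSad = 0;
        for (int x = 0; x < width; ++x)
            rowSad += std::abs(static_cast<int>(curTmpl[row * stride + x]) -
                               static_cast<int>(refTmpl[row * stride + x]));
        cost += w8[row] * rowSad;
    }
    return cost;
}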
[00176] Additionally, the weights can be adaptive according to the reconstructed pixels in the template. Depending on the variation of the pixel values or the edge structure in the template, the weights can be designed and applied accordingly.
[00177] In another example, the weighted SAD, as shown in FIG. 17B, is applied to FRUC bilateral matching or bilateral template matching with w0 = 1 and w1 = 2.
[00178] When a video encoder, such as video encoder 20 or video decoder 30, performs decoder-side motion vector derivation (DMVD) on more than one MV candidate, the video encoder can selectively apply the DMVD to partial sets of MV candidates according to the coded information, such as motion information, pixel information, block size, among others. In one example, when an MV candidate is similar or identical to previously derived MV candidates, the video encoder disables bilateral template matching for this MV candidate (or removes this MV candidate from the MV candidate list). More specifically, in some examples, when the MV difference between an MV candidate and any of the previously derived MV candidates is less than a pre-established threshold (for example, 1 pixel), the video encoder disables bilateral template matching for this MV candidate (or removes this MV candidate from the MV candidate list). Note that the video encoder can perform the MV difference check on the X and Y components of the L0 and L1 MVs.
[00179] In another example, when the MV difference between an MV candidate and any of the previously derived MV candidates is less than a pre-established threshold, the video encoder can disable bilateral template matching for this MV candidate (or can remove this MV candidate from the list of MV candidates). The thresholds for the MV difference may be different for different block sizes. For example, for blocks smaller than 64 pixel samples, the threshold can be set to 1/4 pixel; for blocks smaller than 256 pixel samples and greater than or equal to 64 pixel samples, the threshold can be set to 1/2 pixel; for blocks of other sizes, the threshold can be set to 1 pixel. Note that the video encoder can perform the MV difference check on the X and Y components of the L0 and L1 MVs.
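Below is a small sketch of this size-dependent pruning, assuming quarter-pel MV storage and showing only one reference list; a full check would be run on the L0 and L1 MVs separately, as the text notes. The struct and thresholds-in-quarter-pel mapping are assumptions for illustration.

#include <cstdlib>
#include <vector>

struct Mv { int x, y; };   // quarter-pel units; one list shown for brevity

// Threshold in quarter-pel units picked from the block size (in samples):
// 1/4 pel below 64 samples, 1/2 pel below 256 samples, 1 pel otherwise.
static int mvDiffThresholdQpel(int blockSamples) {
    if (blockSamples < 64)  return 1;
    if (blockSamples < 256) return 2;
    return 4;
}

// True when the candidate is close enough to an earlier candidate (in both
// X and Y) that bilateral template matching can be skipped for it.
bool shouldSkipCandidate(const Mv& cand, const std::vector<Mv>& earlier,
                         int blockSamples) {
    const int thr = mvDiffThresholdQpel(blockSamples);
    for (const Mv& e : earlier)
        if (std::abs(cand.x - e.x) < thr && std::abs(cand.y - e.y) < thr)
            return true;
    return false;
}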
[00180] In some examples, when a video encoder, such as video encoder 20 or video decoder 30, calculates correspondence costs, the video encoder can calculate, for example, any or all of: the sum of absolute differences (SAD), the sum of squared differences (SSD), the sum of absolute transformed differences (SATD), the mean-removed SAD, the mean-removed SSD or any other cost measurement method. According to the techniques of this disclosure, the video encoder can apply weights to the cost calculation for different pixels. In this way, the video encoder can be configured with an associated weight for each pixel in the pixel indication to calculate a weighted cost (the per-pixel cost can be the absolute difference, the squared difference, the absolute transformed difference, the mean-removed absolute difference or the mean-removed squared difference, for example). The video encoder then uses the sum of all the weighted costs of the pixels within the pixel indication to determine the motion information, such as motion vectors, reference images, among others. There are several ways to determine the weights, as illustrated below. The video encoder can apply any or all of the following illustrative techniques, alone or in any combination:
1. The video encoder can determine associated weights according to the distance between the pixels and any specific point (for example, centroid or corner points) in a current block of video data. In one example, the video encoder assigns relatively lower weights to pixels that have greater distances from the specified point, or vice versa. The video encoder can classify pixels into groups and assign specific points to each group. Thus, the video encoder can determine associated weights for the pixels in each group according to the distance between the pixels in the group and the specific point of the group in the current block.
2. The video encoder can determine associated weights according to the distance between the pixels and any specific point (for example, centroid or corner points) of the pixel indication, such as the template used in FRUC template matching. In one example, the video encoder assigns greater weights to greater distances, or vice versa. In some examples, the video encoder can classify pixels into groups and assign specific points to each group. Thus, the video encoder can determine associated weights for the pixels in each group according to the distance between the pixels in the group and the specific point of the group in the pixel indication.
3. The video encoder can use line-based weights, as shown in FIGS. 17A and 17C, for simplification. FIG. 17C shows the current block 380 and the pixel indication 382. The video encoder can be configured with a weight (W0 to W3 in FIG. 17A; W0 to W7 in FIG. 17C) for each vertical or horizontal line. For additional simplicity, the video encoder can be configured with the same weight for several neighboring lines. The current block size can be defined as M x N, where M and N are integer values that can be, but are not necessarily, the same (in the example in FIG. 17C, the current block is 8 x 8). In one example, every (M / O) lines share the same weight along the horizontal side, and every (N / O) lines share the same weight along the vertical side, where M, N and O are any positive integers. In one example, if O = 4, as in the example in FIG. 17C, W0 and W1 are the same; W2 and W3 are the same; W4 and W5 are the same; and W6 and W7 are the same. Note that several line-based weights can be applied together. For example, the video encoder can determine the associated weight for each pixel by applying both of the line-based weights shown in FIGS. 17A and 17C.
4. For line-based weights, the weights of neighboring lines can be in monotonic (increasing or decreasing) order. For example, for the example of FIG. 17C, the weights can be constrained as W0 <= W1 <= W2 <= W3 <= W4 <= W5 <= W6 <= W7 or W0 >= W1 >= W2 >= W3 >= W4 >= W5 >= W6 >= W7.
5. To achieve additional simplification, region-based weights can be used, as shown in FIG. 17B. The video encoder can divide the pixels in a pixel indication for the current block 374 into several regions. The video encoder can be configured with an associated weight for each region. In the example of FIG. 17B, a first region 378 (region 0) is assigned a first weight (W0), where the first region includes the white pixels, while a second region 376 (region 1) is assigned a second weight (W1), where the second region includes the pixels shaded in gray and outlined with a dashed line.
6. Weights can be adaptive depending on coding information, such as block size, coding modes and reconstructed pixels. For example, blocks of different sizes can have different weight sets. Thus, the video encoder can determine the adaptive weights based on any or all of these factors for a current block and/or for reference blocks used in the DMVD.
[00181] FIG. 18 is a conceptual diagram illustrating another example of weight values applied to the pixel indication 386 for the current block 384. A specific example using a combination of some of the above techniques is as follows. The current block 384 is an M x N block in this example. Both horizontal-line-based weights and vertical-line-based weights are used. In addition, every (M / 4) vertical lines share the same weight along the horizontal side, and every (N / 4) lines share the same weight along the vertical side, in this example. As shown in FIG. 18, for blocks with a width and height equal to or greater than 8, the weighting factors w0 = 1, w1 = 1, w2 = 1/2 and w3 = 1/4 are used, while w'0 = 1, w'1 = 1/2, w'2 = 1/2 and w'3 = 0. For the other blocks, the weighting factors w0 = 1, w1 = 1/2, w2 = 1/4 and w3 = 0 are used, while w'0 = 1, w'1 = 1, w'2 = 1 and w'3 = 1.
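A minimal sketch of the combined line-based weighting of FIG. 18. Each pixel's weight is modeled as the product of a horizontal-line weight w[k] (shared by a group of N/4 rows) and a vertical-line weight w'[k] (shared by a group of M/4 columns), using the factors quoted above; whether the two line weights are combined by product, and how the groups are ordered relative to the block, are assumptions for illustration.

#include <cstdint>
#include <cstdlib>

double weightedIndicationCost(const uint8_t* clue, const uint8_t* ref,
                              int stride, int M, int N, bool largeBlock) {
    const double wLarge[4]  = {1.0, 1.0, 0.5, 0.25};  // w0..w3 for W, H >= 8
    const double wpLarge[4] = {1.0, 0.5, 0.5, 0.0};   // w'0..w'3
    const double wSmall[4]  = {1.0, 0.5, 0.25, 0.0};  // other blocks
    const double wpSmall[4] = {1.0, 1.0, 1.0, 1.0};
    const double* w  = largeBlock ? wLarge  : wSmall;
    const double* wp = largeBlock ? wpLarge : wpSmall;
    double cost = 0.0;
    for (int y = 0; y < N; ++y)
        for (int x = 0; x < M; ++x) {
            // Row group index (y*4)/N and column group index (x*4)/M both
            // stay in [0, 3] for any positive M and N.
            const double weight = w[(y * 4) / N] * wp[(x * 4) / M];
            cost += weight * std::abs(static_cast<int>(clue[y * stride + x]) -
                                      static_cast<int>(ref[y * stride + x]));
        }
    return cost;
}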
[00182] The video encoder (for example, the video encoder 20 or the video decoder 30) can perform a filtering process (for example, a low-pass filter) on the pixel indication 386 and/or on a prediction for the pixel indication 386, to improve derivation stability. FIGS. 19A and 19B are conceptual diagrams illustrating an example of a filter for such a filtering process. FIG. 19A illustrates an example of a 3-by-3 filter 392. Before applying filter 392, the video encoder (for example, video encoder 20 or video decoder 30) can fill in pixel values outside the pixel indication 394, as shown in FIG. 19B. That is, in the example of FIG. 19B, the video encoder can determine fill values for the pixels shaded in gray of the current block 390 (which are outside the pixel indication 394), for the purpose of applying the filter to the values of the pixel indication. As shown in FIG. 19B, filter 392 is applied to the values of the pixel indication 394 and to the fill values outside the pixel indication 394 in order to filter the values of the pixel indication 394. The video encoder can combine the filter with the weighted cost. The weights can be in monotonically increasing or decreasing order. For example, the weights can be constrained as W1 <= W2 <= W3, W4 <= W5 <= W6, W7 <= W8 <= W9, W1 <= W4 <= W7, W2 <= W5 <= W8, W3 <= W6 <= W9, or as W1 >= W2 >= W3, W4 >= W5 >= W6, W7 >= W8 >= W9, W1 >= W4 >= W7, W2 >= W5 >= W8, or W3 >= W6 >= W9.
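A minimal sketch of the 3-by-3 filtering with border padding; edge replication stands in for whatever fill rule FIG. 19B intends, and the box-filter taps at the end are one possible choice of W1..W9, not a prescribed one.

#include <cstdint>
#include <vector>

std::vector<double> filterPixelIndication(const std::vector<uint8_t>& clue,
                                          int w, int h,
                                          const double (&tap)[3][3]) {
    auto at = [&](int x, int y) {       // pad by replicating the nearest edge
        if (x < 0) x = 0;
        if (x >= w) x = w - 1;
        if (y < 0) y = 0;
        if (y >= h) y = h - 1;
        return static_cast<double>(clue[static_cast<size_t>(y) * w + x]);
    };
    std::vector<double> out(static_cast<size_t>(w) * h);
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            double s = 0.0;
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx)
                    s += tap[dy + 1][dx + 1] * at(x + dx, y + dy);
            out[static_cast<size_t>(y) * w + x] = s;
        }
    return out;
}

// Example taps: a plain 3x3 box filter (W1..W9 all equal to 1/9).
// const double kBox[3][3] = {{1/9.0, 1/9.0, 1/9.0},
//                            {1/9.0, 1/9.0, 1/9.0},
//                            {1/9.0, 1/9.0, 1/9.0}};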
[00183] FIG. 20 is a flow chart illustrating an illustrative method for encoding video data according to the techniques of this disclosure. The video encoder 20 is described as performing the techniques of FIG. 20, for example purposes, although it should be understood that in other examples, other video encoding devices may perform this or a similar method.
[00184] In this method, it is assumed that the video encoder 20 has previously encoded one or more images and has received a current block of a current image to be encoded. The current image can be a P image, a B image or another image for which inter prediction is enabled. The mode selection unit 40 of the video encoder 20 can calculate the rate-distortion (RD) costs of performing various prediction modes for the current block (400). The mode selection unit 40 can then determine that decoder-side motion vector derivation (DMVD) yields the best RD cost among the modes tested and therefore determine that DMVD is to be used for the current block (402). In this way, the video encoder 20 can determine that the motion information of the current block of video data is to be derived using DMVD.
[00185] The mode selection unit 40 can then signal to the motion compensation unit 44 that DMVD is to be used to predict the current block. In response, the motion compensation unit 44 can determine a pixel indication for the current block (404). For example, the motion compensation unit 44 can determine the pixel indication using one of template matching, bilateral matching, bilateral template matching, FRUC template matching, among others, as discussed above. In some examples, the motion compensation unit 44 can generate the pixel indication using multiple hypothesis predictions from several motion compensated blocks. For example, the motion compensation unit 44 can calculate a weighted average of the several motion compensated blocks, apply overlapped block motion compensation (OBMC) to the pixel indication and/or add offsets to the motion vectors for the current block and derive the several motion compensated blocks from the offset motion vectors. The motion vectors for the current block can be, for example, MV candidates determined according to the merge and/or AMVP modes (for example, from neighboring blocks that are predicted using inter prediction). In this way, the video encoder 20 can determine a pixel indication for the current block, the pixel indication comprising pixel data obtained from one or more groups of previously decoded pixels.
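A minimal sketch of the multiple-hypothesis generation just described: several motion compensated blocks are blended into one pixel indication by a weighted average. The function name, the normalized weight vector and the rounding are assumptions; OBMC and MV-offset handling are left outside the sketch.

#include <cstdint>
#include <vector>

std::vector<uint8_t> multiHypothesisIndication(
    const std::vector<std::vector<uint8_t>>& mcBlocks,  // same size each
    const std::vector<double>& weights) {               // assumed to sum to 1.0
    if (mcBlocks.empty()) return {};
    const size_t n = mcBlocks.front().size();
    std::vector<double> acc(n, 0.0);
    for (size_t h = 0; h < mcBlocks.size(); ++h)
        for (size_t i = 0; i < n; ++i)
            acc[i] += weights[h] * mcBlocks[h][i];
    std::vector<uint8_t> indication(n);
    for (size_t i = 0; i < n; ++i)
        indication[i] = static_cast<uint8_t>(acc[i] + 0.5);  // round to nearest
    return indication;
}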
[00186] The motion compensation unit 44 can ultimately derive motion information using the pixel indication (406). In general, the motion information can include one or more motion vectors that refer to one or more reference blocks corresponding to the pixel indication. In some examples, the motion compensation unit 44 may determine an inter prediction direction (for example, list 0 prediction, list 1 prediction or bi-prediction) for the derived motion information according to the correspondence costs between the different prediction directions. For example, the motion compensation unit 44 can select an inter prediction direction having the lowest tested correspondence cost. In some examples, as discussed above, the motion compensation unit 44 can weight the resulting correspondence costs so that the costs are biased in favor of bi-prediction, for example, by using a weight that reduces the bi-prediction correspondence cost and/or by using weights that increase the unidirectional prediction correspondence costs. Additionally or alternatively, the motion compensation unit 44 can calculate correspondence costs between two or more reference blocks using two or more different cost measurement processes, and then refine the derived motion information based on an aggregate correspondence cost calculated from the correspondence costs calculated using the various cost measurement processes. In this way, the video encoder 20 can derive the motion information for the current block according to the DMVD from the pixel indication.
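The following sketch shows one way two cost measurement processes (SAD and mean-removed SAD) could be blended into a single aggregate correspondence cost; the equal 0.5/0.5 blend is an assumed choice, since the text only requires that several measures feed one aggregated cost.

#include <cmath>
#include <cstdint>
#include <cstdlib>
#include <vector>

static double sadOf(const std::vector<uint8_t>& a, const std::vector<uint8_t>& b) {
    double s = 0.0;
    for (size_t i = 0; i < a.size() && i < b.size(); ++i)
        s += std::abs(static_cast<int>(a[i]) - static_cast<int>(b[i]));
    return s;
}

static double meanOf(const std::vector<uint8_t>& v) {
    double m = 0.0;
    for (uint8_t s : v) m += s;
    return v.empty() ? 0.0 : m / v.size();
}

// Mean-removed SAD: subtract each block's mean before taking the differences.
static double meanRemovedSad(const std::vector<uint8_t>& a,
                             const std::vector<uint8_t>& b) {
    const double ma = meanOf(a), mb = meanOf(b);
    double s = 0.0;
    for (size_t i = 0; i < a.size() && i < b.size(); ++i)
        s += std::fabs((a[i] - ma) - (b[i] - mb));
    return s;
}

double aggregatedCost(const std::vector<uint8_t>& blk0,
                      const std::vector<uint8_t>& blk1) {
    return 0.5 * sadOf(blk0, blk1) + 0.5 * meanRemovedSad(blk0, blk1);
}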
[00187] Finally, the motion compensation unit 44 can predict the current block using the derived motion information (408), to form a prediction block for the current block. The motion compensation unit 44 can pass this prediction block to the adder 50, which subtracts the prediction block from the original, uncoded version of the current block (on a pixel-by-pixel basis), to calculate a residual block including residual data for the current block (410). The transform processing unit 52 can then transform the residual block into a transform domain (for example, a frequency domain), forming transform coefficients, and the quantization unit 54 can quantize the transform coefficients, thereby transforming and quantizing the residual data (412). Ultimately, the entropy coding unit 56 can entropy encode data representative of the prediction mode (for example, a FRUC flag and a matching mode flag), as well as the quantized transform coefficients (414).
[00188] It should be understood that, although described as part of a video encoding process, video encoder 20 also performs a decoding process. That is, after transforming and quantizing the residual data, the inverse quantization unit 58 inversely quantizes the quantized transform coefficients to reproduce the transform coefficients. Then, the inverse transform unit 60 inversely transforms the transform coefficients to reproduce the residual block. The adder 62 then combines the residual block with the prediction block, forming a decoded block that can be stored in a decoded image buffer of the reference image memory 64. Consequently, the process performed by video encoder 20 can be said to include decoding video data. Likewise, in this way, the video encoder 20 can decode the current block using the motion information derived in accordance with the DMVD.
[00189] In this way, the method of FIG. 20 represents an example of a method of decoding video data, the method including determining that the motion information of a current block of video data is to be derived using decoder-side motion vector derivation (DMVD), determining a pixel indication for the current block, the pixel indication comprising pixel data obtained from one or more groups of previously decoded pixels, deriving the motion information for the current block according to the DMVD from the pixel indication, and decoding the current block using the motion information.
[00190] FIG. 21 is a flow chart illustrating an illustrative method for decoding video data according to the techniques of this disclosure. The video decoder 30 is described as performing the techniques of FIG. 21, for example purposes, although it should be understood that in other examples, other video coding devices may perform this or a similar method.
[00191] In this method, it is assumed that the video decoder 30 has previously decoded one or more images and has received a current block of a current image to be decoded. The current image can be a P image, a B image or another image for which inter prediction is enabled. The entropy decoding unit 70 can entropy decode a prediction mode indication for the current block, as well as the quantized transform coefficients for the current block (420). The entropy decoding unit 70 can pass the prediction mode indication to the motion compensation unit 72 and the quantized transform coefficients to the inverse quantization unit 76. The motion compensation unit 72 can then determine, from the prediction mode indication, that decoder-side motion vector derivation (DMVD) is to be used for the current block (422). In this way, the video decoder 30 can determine that the motion information of the current block of video data is to be derived using the DMVD.
[00192] The motion compensation unit 72 can then determine a pixel indication for the current block (424). For example, the motion compensation unit 72 can determine the pixel indication using one of template matching, bilateral matching, bilateral template matching, FRUC template matching, among others, as discussed above. In some examples, the motion compensation unit 72 can generate the pixel indication using multiple hypothesis predictions from several motion compensated blocks. For example, the motion compensation unit 72 can calculate a weighted average of the several motion compensated blocks, apply overlapped block motion compensation (OBMC) to the pixel indication and/or add offsets to the motion vectors for the current block and derive the several motion compensated blocks from the offset motion vectors. The motion vectors for the current block can be, for example, MV candidates determined according to the merge and/or AMVP modes (for example, from neighboring blocks that are predicted using inter prediction). In this way, the video decoder 30 can determine a pixel indication for the current block, the pixel indication comprising pixel data obtained from one or more groups of previously decoded pixels.
[00193] The motion compensation unit 72 can ultimately derive motion information using the pixel indication (426). In general, the motion information can include one or more motion vectors that refer to one or more reference blocks corresponding to the pixel indication. In some examples, the motion compensation unit 72 may determine an inter prediction direction (for example, list 0 prediction, list 1 prediction or bi-prediction) for the derived motion information according to the correspondence costs between the different prediction directions. For example, the motion compensation unit 72 can select an inter prediction direction having the lowest tested correspondence cost. In some examples, as discussed above, the motion compensation unit 72 can weight the resulting correspondence costs so that the costs are biased in favor of bi-prediction, for example, by using a weight that reduces the bi-prediction correspondence cost and/or by using weights that increase the unidirectional prediction correspondence costs. Additionally or alternatively, the motion compensation unit 72 can calculate the correspondence costs between two or more reference blocks using two or more different cost measurement processes and then refine the derived motion information based on an aggregate correspondence cost calculated from the correspondence costs calculated using the various cost measurement processes. In this way, the video decoder 30 can derive the motion information for the current block according to the DMVD from the pixel indication.
[00194] Finally, the motion compensation unit 72 can predict the current block using the derived motion information (428), to form a prediction block for the current block. The motion compensation unit 72 can pass this prediction block to the adder 80. Meanwhile, the inverse quantization unit 76 inversely quantizes the quantized transform coefficients to reproduce transform coefficients for the current block, and the inverse transform unit 78 inversely transforms the transform coefficients to reproduce a residual block for the current block (430). The inverse transform unit 78 passes the residual block to the adder 80, which adds the prediction block to the residual block (on a pixel-by-pixel basis) (432), to decode the current block.
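A minimal sketch of the final addition of steps (430)-(432): the residual block is added to the DMVD-predicted block sample by sample with clipping to the 8-bit range. Inverse quantization and the inverse transform are assumed to have produced the residual already, and the function name is illustrative only.

#include <algorithm>
#include <cstdint>
#include <vector>

std::vector<uint8_t> reconstructBlock(const std::vector<uint8_t>& prediction,
                                      const std::vector<int>& residual) {
    std::vector<uint8_t> rec(prediction.size());
    for (size_t i = 0; i < prediction.size() && i < residual.size(); ++i)
        rec[i] = static_cast<uint8_t>(
            std::min(255, std::max(0, static_cast<int>(prediction[i]) + residual[i])));
    return rec;
}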
[00195] In this way, the method of FIG. 21 represents an example of a method of decoding video data, the method including determining that the motion information of a current block of video data is to be derived using decoder-side motion vector derivation (DMVD), determining a pixel indication for the current block, the pixel indication comprising pixel data obtained from one or more groups of previously decoded pixels, deriving the motion information for the current block according to the DMVD from the pixel indication, and decoding the current block using the motion information.
[00196] It is to be recognized that, depending on the example, certain acts or events of any of the techniques described in this document can be performed in a different sequence, can be added, merged or left out altogether (for example, not all described acts or events are necessary for the practice of the techniques). In addition, in certain examples, acts or events can be performed concurrently, for example, through multitasking processing, interrupt processing or multiple processors, rather than sequentially.
[00197] In one or more examples, the functions described can be implemented in hardware, software, firmware or any combination thereof. If implemented in software, the functions can be stored on, or transmitted as, one or more instructions or code on a computer-readable medium and executed by a hardware-based processing unit. Computer-readable media can include computer-readable storage media, which corresponds to a tangible medium such as data storage media, or communication media, including any medium that facilitates the transfer of a computer program from one place to another, for example, according to a communication protocol. In this way, computer-readable media can generally correspond to (1) tangible computer-readable storage media that are non-transitory or (2) a communication medium, such as a signal or carrier wave. The data storage medium can be any available medium that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementing the techniques described in this disclosure. A computer program product may include a computer-readable medium.
[00198] By way of example, and not limitation, such computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory or any other medium that can be used to store the desired program code in the form of instructions or data structures and that can be accessed by a computer. In addition, any connection is properly termed a computer-readable medium. For example, if instructions are transmitted from a website, server or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio and microwave are included in the definition of medium. However, it should be understood that computer-readable storage media and data storage media do not include connections, carrier waves, signals or other transitory media, but are instead directed to non-transitory, tangible storage media. Disk and disc, as used in this document, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
[00199] The instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general purpose microprocessors, processing circuitry (including fixed function circuitry and/or programmable processing circuitry), application-specific integrated circuits (ASICs), field programmable gate arrays (FPGAs) or other equivalent integrated or discrete logic circuitry. Therefore, the term processor, as used in this document, can refer to any of the foregoing structures or any other structure suitable for implementing the described techniques. In addition, in some aspects, the functionality described in this document can be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated into a combined codec. In addition, the techniques could be fully implemented in one or more circuits or logic elements.
[00200] The techniques of this disclosure can be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC) or a set of ICs (for example, a chip set). Various components, modules or units are described in this disclosure to emphasize the functional aspects of devices configured to perform the disclosed techniques, but they do not necessarily require realization by different hardware units. Rather, as described above, the various units can be combined into a codec hardware unit or provided by a collection of interoperable hardware units, including one or more processors as described above, in conjunction with suitable software and/or firmware.
[00201] Several examples have been described. These and other examples are within the scope of the following claims.
Claims:
1. A method of decoding video data, the method comprising:
determining that motion information of a current block of video data is to be derived using decoder-side motion vector derivation (DMVD);
determining a pixel indication for the current block, the pixel indication comprising pixel data obtained from one or more groups of previously decoded pixels;
deriving the motion information for the current block according to the DMVD from the pixel indication; and
decoding the current block using the motion information.
2. The method of claim 1, wherein determining the pixel indication comprises generating the pixel indication using multiple hypothesis predictions from several motion compensated blocks.
3. The method of claim 2, wherein generating the pixel indication comprises calculating a weighted average of the several motion compensated blocks.
4. The method of claim 2, wherein generating the pixel indication comprises applying overlapped block motion compensation to generate the pixel indication.
5. The method of claim 2, further comprising:
adding offsets to the motion vectors of the current block; and
deriving the several motion compensated blocks from the offset motion vectors.
6. The method of claim 1, further comprising determining an inter prediction direction for the motion information according to the correspondence costs between different prediction directions, wherein deriving the motion information comprises deriving the motion information according to the DMVD such that the motion information has the determined prediction direction.
7. Method according to claim 6, in which determining the inter prediction direction comprises selecting the inter prediction direction having the lowest correspondence cost.
8. The method of claim 6, wherein determining the inter prediction direction comprises selecting one of list 0 prediction, list 1 prediction, or bi-prediction.
9. The method of claim 6, further comprising biasing the correspondence costs in favor of bi-prediction.
10. The method of claim 9, wherein biasing comprises scaling correspondence costs associated with unidirectional prediction upward by a scaling value.
11. The method of claim 10, wherein the scaling value comprises 1.25.
12. The method of claim 9, wherein biasing comprises scaling a correspondence cost associated with bi-prediction downward by a scaling value.
13. The method of claim 12, wherein the scaling value comprises 0.75.
14. The method of claim 1, wherein the pixel indication comprises pixel data obtained from a first block of a first reference image and a second block of a second reference image, wherein the first reference image is different from the second reference image, and wherein deriving the motion information comprises:
calculating a correspondence cost between the first block and the second block, wherein calculating the correspondence cost comprises:
applying a first weight to a first cost measurement for a first set of corresponding pixels of the first block and the second block; and
applying a second weight, different from the first weight, to a second cost measurement for a second set of corresponding pixels of the first block and the second block; and
refining the motion information based on the correspondence cost.
15. The method of claim 14, further comprising:
determining the first weight based on a distance between the first set of corresponding pixels and a first specific point of the current block; and
determining the second weight based on a distance between the second set of corresponding pixels and a second specific point of the current block.
16. The method of claim 15, wherein the specific point comprises one of a centroid of the current block or a corner of the current block.
17. The method of claim 14, further comprising:
determining the first weight based on a distance between the first set of corresponding pixels and a first specific point of the pixel indication; and
determining the second weight based on a distance between the second set of corresponding pixels and a second specific point of the pixel indication.
18. The method of claim 17, wherein the specific point comprises one of a centroid of the current block or a corner of the current block.
19. The method of claim 14, further comprising:
determining the first weight based on at least one of a first row that includes the first set of corresponding pixels or a first column that includes the first set of corresponding pixels; and
determining the second weight based on at least one of a second row that includes the second set of corresponding pixels or a second column that includes the second set of corresponding pixels.
20. The method of claim 14, further comprising:
determining the first weight based on a first region that includes the first set of corresponding pixels; and
determining the second weight based on a second region that includes the second set of corresponding pixels.
21. The method of claim 1, wherein deriving the motion information comprises selectively applying the DMVD to a partial set of motion vector candidates in a motion vector candidate list for the current block, the method further comprising determining that a motion vector of the motion information derived using the DMVD differs from at least one motion vector candidate in the motion vector candidate list for the current block by a threshold before decoding the current block using the motion information.
22. The method of claim 1, further comprising iteratively refining the pixel indication using refined motion vectors, wherein iteratively refining comprises:
after deriving a refined motion vector, regenerating a bilateral template using the refined motion vector; and
performing a further refinement using the regenerated bilateral template.
23. The method of claim 1, further comprising applying one or more filters to the pixel indication before deriving the motion information from the pixel indication, wherein the one or more filters comprise one or more of a guided filter, a bilateral filter, a median filter, a smoothing filter or an averaging filter.
24. The method of claim 1, wherein determining the pixel indication comprises generating the pixel indication using motion refinement, wherein the motion refinement comprises one or more of bi-directional optical flow (BIO), frame rate up-conversion (FRUC) template matching or FRUC bilateral matching.
25. The method of claim 1, wherein deriving the motion information comprises deriving the motion information for a first color component of the current block, and wherein determining the pixel indication comprises generating the pixel indication using the first color component and a second color component.
26. The method of claim 1, wherein determining the pixel indication comprises generating the pixel indication to have a size larger than a size of the current block, wherein when the size of the current block is M x N, where M and N are integer values, generating the pixel indication comprises generating the pixel indication from an (M + I) x (N + J) motion compensated block, where I and J are integer values.
27. The method of claim 1, wherein decoding the current block comprises:
predicting the current block using the motion information to form a predicted block;
decoding quantized transform coefficients of the current block;
inverse quantizing the quantized transform coefficients to produce transform coefficients;
inverse transforming the transform coefficients to produce a block of residual data; and
combining the predicted block and the block of residual data to form a decoded block.
28. The method of claim 1, further comprising encoding the current block before decoding the current block.
29. A device for decoding video data, the device comprising:
a memory configured to store video data; and
a video decoder implemented in circuitry and configured to:
determine that motion information of a current block of the video data is to be derived using decoder-side motion vector derivation (DMVD);
determine a pixel indication for the current block, the pixel indication comprising pixel data obtained from one or more groups of previously decoded pixels;
derive the motion information for the current block according to the DMVD from the pixel indication; and
decode the current block using the motion information.
30. The device of claim 29, wherein the video decoder is configured to generate the pixel indication using multiple hypothesis predictions from several motion compensated blocks.
31. The device of claim 29, wherein the video decoder is further configured to determine an inter prediction direction for the motion information according to the correspondence costs between different prediction directions, and to derive the motion information according to the DMVD such that the motion information has the determined prediction direction.
32. The device of claim 29, wherein the pixel indication comprises pixel data obtained from a first block of a first reference image and a second block of a second reference image, wherein the first reference image is different from the second reference image, and wherein, to derive the motion information, the video decoder is configured to:
calculate a correspondence cost between the first block and the second block, wherein, to calculate the correspondence cost, the video decoder is configured to:
apply a first weight to a first cost measurement for a first set of corresponding pixels of the first block and the second block; and
apply a second weight, different from the first weight, to a second cost measurement for a second set of corresponding pixels of the first block and the second block; and
refine the motion information based on the correspondence cost.
33. The device of claim 29, wherein the video decoder is further configured to determine that a motion vector of the motion information derived using the DMVD differs from other motion vector candidates in a motion vector candidate list for the current block by a threshold before decoding the current block using the motion information.
34. The device of claim 29, further comprising a video encoder including the video decoder, the video encoder implemented in circuitry.
35. The device of claim 29, further comprising a display configured to display the decoded video data.
36. The device of claim 29, wherein the device comprises one or more of a camera, a computer, a mobile device, a broadcast receiver device or a set-top box.
37. A device for decoding video data, the device comprising:
means for determining that motion information of a current block of video data is to be derived using decoder-side motion vector derivation (DMVD);
means for determining a pixel indication for the current block, the pixel indication comprising pixel data obtained from one or more groups of previously decoded pixels;
means for deriving the motion information for the current block according to the DMVD from the pixel indication; and
means for decoding the current block using the motion information.
38. The device of claim 37, wherein the means for determining the pixel indication comprises means for generating the pixel indication using multiple hypothesis predictions from several motion compensated blocks.
39. The device of claim 37, further comprising means for determining an inter prediction direction for the motion information according to the correspondence costs between the different prediction directions, wherein the means for deriving the motion information comprises means for deriving the motion information according to the DMVD such that the motion information has the determined prediction direction.
40. The device of claim 37, wherein the pixel indication comprises pixel data obtained from a first block of a first reference image and a second block of a second reference image, wherein the first reference image is different from the second reference image, and wherein the means for deriving the motion information comprises:
means for calculating a correspondence cost between the first block and the second block, wherein the means for calculating the correspondence cost comprises:
means for applying a first weight to a first cost measurement for a first set of corresponding pixels of the first block and the second block; and
means for applying a second weight, different from the first weight, to a second cost measurement for a second set of corresponding pixels of the first block and the second block; and
means for refining the motion information based on the correspondence cost.
41. A computer-readable storage medium having instructions stored thereon that, when executed, cause a processor to:
determine that motion information of a current block of video data is to be derived using decoder-side motion vector derivation (DMVD);
determine a pixel indication for the current block, the pixel indication comprising pixel data obtained from one or more groups of previously decoded pixels;
derive the motion information for the current block according to the DMVD from the pixel indication; and
decode the current block using the motion information.
42. The computer-readable storage medium of claim 41, wherein the instructions that cause the processor to determine the pixel indication comprise instructions that cause the processor to generate the pixel indication using multiple hypothesis predictions from several motion compensated blocks.
43. The computer-readable storage medium of claim 41, further comprising instructions that cause the processor to determine an inter prediction direction for the motion information according to the correspondence costs between different prediction directions, wherein the instructions that cause the processor to derive the motion information comprise instructions that cause the processor to derive the motion information according to the DMVD such that the motion information has the determined prediction direction.
44. The computer-readable storage medium of claim 41, wherein the pixel indication comprises pixel data obtained from a first block of a first reference image and a second block of a second reference image, wherein the first reference image is different from the second reference image, and wherein the instructions that cause the processor to derive the motion information comprise instructions that cause the processor to:
calculate a correspondence cost between the first block and the second block, wherein the instructions that cause the processor to calculate the correspondence cost comprise instructions that cause the processor to:
apply a first weight to a first cost measurement for a first set of corresponding pixels of the first block and the second block; and
apply a second weight, different from the first weight, to a second cost measurement for a second set of corresponding pixels of the first block and the second block; and
refine the motion information based on the correspondence cost.