INTRA-BLOCK COPY FOR VIDEO CODING
Patent abstract:
This disclosure describes examples of techniques that a video coder (such as a video encoder or video decoder) can use to determine a block vector for a chroma block in which the partition trees for the chroma component and the luma component are different (such as decoupled partition trees).
Publication number: BR112020016133A2
Application number: R112020016133-0
Filing date: 2019-02-07
Publication date: 2020-12-08
Inventors: Li Zhang; Kai Zhang; Wei-Jung Chien; Marta Karczewicz
Applicant: Qualcomm Incorporated
Patent description:
[0001] This application claims priority to U.S. Application No. 16/269,349, filed February 6, 2019, and claims the benefit of U.S. Provisional Application No. 62/628,101, filed February 8, 2018, the entire contents of both of which are incorporated herein by reference. TECHNICAL FIELD [0002] This disclosure relates to video encoding and video decoding. BACKGROUND [0003] Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, tablet computers, e-book readers, digital cameras, digital recording devices, digital media players, video game devices, video game consoles, cellular or satellite radio telephones, so-called "smart phones", video teleconferencing devices, video streaming devices and the like. Digital video devices implement video compression techniques, such as those described in the standards defined by MPEG-2, MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4, Part 10, Advanced Video Coding (AVC), ITU-T H.265 High Efficiency Video Coding (HEVC) and extensions of such standards. Video devices can transmit, receive, encode, decode and/or store digital video information more efficiently by implementing such video coding techniques. [0004] Video coding techniques include spatial (intra-picture) prediction and/or temporal (inter-picture) prediction to reduce or remove the redundancy inherent in video sequences. For block-based video coding, a video slice (such as a video frame or part of a video frame) can be partitioned into video blocks, which may also be referred to as coding tree units (CTUs), coding units (CUs) and/or coding nodes. Video blocks in an intra-coded (I) slice of a picture are encoded using spatial prediction with respect to reference samples in neighboring blocks in the same picture. Video blocks in an inter-coded (P or B) slice of a picture can use spatial prediction with respect to reference samples in neighboring blocks in the same picture, or temporal prediction with respect to reference samples in other reference pictures. Pictures can be referred to as frames, and reference pictures can be referred to as reference frames. SUMMARY [0005] In general, this disclosure describes techniques for intra-block copy (IBC) coding. The exemplary techniques can be applied to existing video codecs, such as High Efficiency Video Coding (HEVC), or can be an efficient video coding tool for future video coding standards. The exemplary techniques can relate to the use of IBC with various motion-related tools, as well as the use of IBC in decoupled partition trees. [0006] For example, in IBC, a block is coded (e.g., encoded or decoded) with a block vector that points to a reference block in the same picture as the block being coded. In cases where the luma components and chroma components are partitioned in the same way (as, for example, where the partition trees are coupled), the luma block and the corresponding chroma blocks can use the same block vector, with potential block vector scaling for the chroma blocks. However, in cases where the chroma components are partitioned differently than the luma components (such as where the partition trees are decoupled), there can be technical problems with video coding, because the chroma blocks are formed differently than the luma blocks, which results in little correspondence between chroma blocks and luma blocks.
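For the coupled-tree case just mentioned, the following is a minimal C++ sketch of how a chroma block vector might be derived from a luma block vector by scaling. It is illustrative only and not code from this disclosure; the names are hypothetical, and a 4:2:0 chroma format (subsampling factors of 2 horizontally and vertically) is assumed.

```cpp
// Illustrative sketch: derive a chroma block vector from a luma block
// vector when the partition trees are coupled. For the assumed 4:2:0
// format, subWidth = subHeight = 2; other chroma formats would use
// different subsampling factors.
struct BlockVector {
    int x; // horizontal displacement
    int y; // vertical displacement
};

BlockVector scaleBvForChroma(BlockVector lumaBv, int subWidth, int subHeight) {
    // The chroma vector, in chroma sample units, is the luma vector
    // divided by the subsampling factor in each dimension.
    return BlockVector{ lumaBv.x / subWidth, lumaBv.y / subHeight };
}
```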
[0007] This disclosure describes examples of techniques that a video coder (such as a video encoder or video decoder) can use to determine a block vector for a chroma block where the partition trees for the chroma component and the luma component are different (such as decoupled partition trees). For example, the video coder can further partition the chroma block into a plurality of sub-blocks based on the partition tree used to partition the luma component. In this example, there can be a one-to-one correspondence between each of the sub-blocks of the chroma block and a luma block of the plurality of luma blocks. [0008] The video coder can determine block vectors for one or more of the sub-blocks of the chroma block based on block vectors of one or more of the plurality of luma blocks that are predicted in IBC mode. Thus, although there is decoupling of the partition trees between the luma and chroma components, the exemplary techniques provide ways in which a chroma block can inherit block vectors from a luma block. By allowing block vectors of luma blocks to be inherited by chroma blocks, the exemplary techniques can reduce the signaling bandwidth that would otherwise be necessary if the block vectors for the chroma blocks were explicitly signaled. [0009] In one example, the disclosure describes a method of coding video data, the method comprising: determining a plurality of blocks of a first color component that correspond to a block of a second color component, wherein the plurality of blocks of the first color component is generated from the partitioning of samples of the first color component according to a first partition tree, and the block of the second color component is generated from the partitioning of samples of the second color component according to a second partition tree; partitioning the block of the second color component based on the first partition tree to generate sub-blocks of the second color component that each correspond to a block of the plurality of blocks of the first color component; determining one or more block vectors for one or more of the sub-blocks of the second color component that are predicted in intra-block copy (IBC) prediction mode based on one or more block vectors of one or more corresponding blocks of the plurality of blocks of the first color component; and coding the block of the second color component based on the one or more determined block vectors. [0010] In one example, the disclosure describes a device for coding video data, the device comprising a memory configured to store samples of a first color component and samples of a second color component of the video data, and a video coder comprising at least one of fixed-function or programmable circuitry.
The video coder is configured to: determine a plurality of blocks of a first color component that correspond to a block of a second color component, wherein the plurality of blocks of the first color component is generated from the partitioning of samples of the first color component according to a first partition tree, and the block of the second color component is generated from the partitioning of samples of the second color component according to a second partition tree; partition the block of the second color component based on the first partition tree to generate sub-blocks of the second color component that each correspond to a block of the plurality of blocks of the first color component; determine one or more block vectors for one or more of the sub-blocks of the second color component that are predicted in intra-block copy (IBC) prediction mode based on one or more block vectors of one or more corresponding blocks of the plurality of blocks of the first color component; and code the block of the second color component based on the one or more determined block vectors. [0011] In one example, the disclosure describes a computer-readable storage medium storing instructions that, when executed, cause one or more processors to: determine a plurality of blocks of a first color component that correspond to a block of a second color component, wherein the plurality of blocks of the first color component is generated from the partitioning of samples of the first color component according to a first partition tree, and the block of the second color component is generated from the partitioning of samples of the second color component according to a second partition tree; partition the block of the second color component based on the first partition tree to generate sub-blocks of the second color component that each correspond to a block of the plurality of blocks of the first color component; determine one or more block vectors for one or more of the sub-blocks of the second color component that are predicted in intra-block copy (IBC) prediction mode based on one or more block vectors of one or more corresponding blocks of the plurality of blocks of the first color component; and code the block of the second color component based on the one or more determined block vectors. [0012] In one example, the disclosure describes a device for coding video data, the device comprising: means for determining a plurality of blocks of a first color component that correspond to a block of a second color component, wherein the plurality of blocks of the first color component is generated from the partitioning of samples of the first color component according to a first partition tree, and the block of the second color component is generated from the partitioning of samples of the second color component according to a second partition tree; means for partitioning the block of the second color component based on the first partition tree to generate sub-blocks of the second color component that each correspond to a block of the plurality of blocks of the first color component; means for determining one or more block vectors for one or more of the sub-blocks of the second color component that are predicted in intra-block copy (IBC) prediction mode based on one or more block vectors of one or more corresponding blocks of the plurality of blocks of the first color component; and means for coding the block of the second color component based on the one or more determined block vectors.
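As an illustration of the "determining a plurality of blocks" step recited above, the following hedged C++ sketch collects the luma-tree leaves whose region overlaps the co-located luma region of a given chroma block. Rect, lumaTreeLeaves and the helper names are hypothetical and not part of the disclosure; 4:2:0 subsampling is the assumed example.

```cpp
#include <vector>

// Illustrative sketch: find the luma blocks corresponding to one chroma
// block. The chroma block's position/size (in chroma samples) is projected
// onto the luma sample grid, and every luma-tree leaf overlapping that
// region is collected.
struct Rect { int x, y, w, h; };

static bool overlaps(const Rect& a, const Rect& b) {
    return a.x < b.x + b.w && b.x < a.x + a.w &&
           a.y < b.y + b.h && b.y < a.y + a.h;
}

std::vector<Rect> correspondingLumaBlocks(const Rect& chromaBlock,
                                          const std::vector<Rect>& lumaTreeLeaves,
                                          int subW, int subH) { // e.g., 2, 2 for 4:2:0
    // Co-located luma region of the chroma block.
    Rect coLocated{ chromaBlock.x * subW, chromaBlock.y * subH,
                    chromaBlock.w * subW, chromaBlock.h * subH };
    std::vector<Rect> result;
    for (const Rect& luma : lumaTreeLeaves)
        if (overlaps(coLocated, luma))
            result.push_back(luma);
    return result;
}
```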
[0013] The details of one or more aspects of the techniques are set forth in the accompanying drawings and the description below. Other features, objects and advantages of these techniques will be apparent from the description, the drawings and the claims. BRIEF DESCRIPTION OF THE DRAWINGS [0014] Figure 1 is a block diagram showing an example of a video encoding and decoding system that can perform the techniques of this disclosure. [0015] Figures 2A and 2B are conceptual diagrams showing an example of a quadtree binary tree (QTBT) structure and a corresponding coding tree unit (CTU). [0016] Figure 3 is a block diagram showing an example of a video encoder that can perform the techniques of this disclosure. [0017] Figure 4 is a block diagram showing an example of a video decoder that can perform the techniques of this disclosure. [0018] Figure 5 is a conceptual diagram showing spatial neighboring motion vector (MV) candidates for merge and advanced motion vector prediction (AMVP) modes. [0019] Figure 6A is a conceptual diagram showing a temporal motion vector predictor (TMVP) candidate. [0020] Figure 6B is a conceptual diagram showing motion vector (MV) scaling for the TMVP candidate. [0021] Figure 7 is a conceptual diagram showing an example of intra-block copy (IBC) coding. [0022] Figure 8 is a conceptual diagram showing an example of alternative temporal motion vector prediction (ATMVP) for a coding unit (CU). [0023] Figure 9 is a conceptual diagram showing an example of frame rate up-conversion (FRUC) bilateral matching. [0024] Figure 10 is a conceptual diagram showing an example of FRUC template matching. [0025] Figures 11A and 11B are flowcharts showing examples of the FRUC template matching mode. [0026] Figure 12 is a conceptual diagram showing decoder-side motion vector derivation (DMVD) based on bilateral template matching. [0027] Figure 13 is a conceptual diagram showing the optical flow trajectory. [0028] Figure 14 is a conceptual diagram of bi-directional optical flow (BIO) for an 8x4 block. [0029] Figures 15A and 15B are conceptual diagrams showing sub-blocks where overlapped block motion compensation (OBMC) is applied. [0030] Figures 16A-16D show examples of OBMC weighting. [0031] Figure 17A is an example of a CTU partition structure for a luma QTBT structure. [0032] Figure 17B is an example of a CTU partition structure for a chroma QTBT structure. [0033] Figures 18A and 18B show an example of sub-block partitioning and mode inheritance for the luma QTBT structure and the chroma QTBT structure. [0034] Figure 19 is a flowchart showing an exemplary method of coding video data. DETAILED DESCRIPTION [0035] Video coding standards include ITU-T H.261, ISO/IEC MPEG-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, ITU-T H.263, ISO/IEC MPEG-4 Visual and ITU-T H.264 (also known as ISO/IEC MPEG-4 AVC), including its Scalable Video Coding (SVC) and Multi-view Video Coding (MVC) extensions. [0036] The High Efficiency Video Coding (HEVC) standard was finalized by the Joint Collaborative Team on Video Coding (JCT-VC) of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Motion Picture Experts Group (MPEG). The latest HEVC specification, ITU-T H.265, Series H: Audiovisual and Multimedia Systems, Infrastructure of audiovisual services - Coding of moving video, High efficiency video coding, International Telecommunication Union,
December 2016, referred to as HEVC WD below, is available from http://phenix.int-evry.fr/jct/doc_end_user/documents/23_San%20Diego/wg11/JCTVC-W1005-v4.zip. [0037] ITU-T VCEG (Q6/16) and ISO/IEC MPEG (JTC 1/SC 29/WG 11) are now studying the potential need for standardization of future video coding technology with a compression capability that significantly exceeds that of the current HEVC standard (including its current extensions and near-term extensions for screen content coding and high dynamic range coding). The groups are working together on this exploration activity in a joint collaboration effort known as the Joint Video Exploration Team (JVET) to evaluate the compression technology designs proposed by their experts in this area. [0038] An early draft of the new video coding standard based on JEM7, referred to as the H.266/Versatile Video Coding (VVC) standard, is available in document JVET-J1001, "Versatile Video Coding (Draft 1)", by Benjamin Bross, and its algorithm description is available in document JVET-J1002, "Algorithm description for Versatile Video Coding and Test Model 1 (VTM 1)", by Jianle Chen and Elena Alshina. The techniques of this disclosure, however, are not limited to any specific coding standard. [0039] VVC provides an intra-block copy (IBC) prediction mode similar, but not identical, to the IBC prediction mode used as part of HEVC screen content coding (SCC). In intra-block copy (IBC), a block vector for a block that is coded (e.g., encoded or decoded) in IBC mode points to a reference block in the same picture as the block being coded. There are different types of blocks. For example, a picture of the video data includes a luma component and chroma components. The luma component is partitioned to form a plurality of luma blocks, and the chroma components are partitioned to form a plurality of chroma blocks. [0040] In VVC, the luma and chroma components can be partitioned in different ways. For example, the luma component can be partitioned according to a first partition tree, and the chroma components can be partitioned according to a second partition tree. In some examples, to reduce the amount of information required to encode a chroma block, it may be possible for a chroma block predicted in IBC prediction mode to inherit block vector information from a corresponding luma block, instead of a video encoder explicitly signaling the block vector information for the chroma block. A luma block corresponds to a chroma block, and vice versa, if the luma block and the chroma block are part of the same coding unit (CU), as one example. [0041] However, if the luma blocks and the chroma blocks are partitioned differently, then, for a chroma block, there may be a plurality of luma blocks, partitioned differently than the chroma block, that would be the corresponding blocks. Due to the different partitioning of the luma blocks and chroma blocks, and there being a plurality of luma blocks that correspond to one chroma block, it may not be clear from which luma block a chroma block should inherit the block vector. [0042] According to the techniques described in this disclosure, a video coder (such as, for example, a video encoder or video decoder) can be configured to partition the chroma block into a plurality of chroma block sub-blocks, based on the way the luma component was partitioned.
In this way, there is a one-to-one correspondence between each sub-block of the chroma block and a luma block of the plurality of luma blocks. The video coder can be configured to assign the block vectors of the luma blocks that were coded in IBC prediction mode to the corresponding sub-blocks of the chroma block. [0043] Figure 1 is a block diagram showing an example of a video encoding and decoding system 100 that can perform the techniques of this disclosure. The techniques of this disclosure are generally directed to the coding (encoding and/or decoding) of video data. In general, video data includes any data for processing a video. Thus, video data can include raw, unencoded video, encoded video, decoded (i.e., reconstructed) video, and video metadata, such as signaling data. [0044] As shown in Figure 1, in this example, system 100 includes a source device 102 that provides encoded video data to be decoded and displayed by a destination device 116. In particular, source device 102 provides the video data to destination device 116 via computer-readable medium 110. Source device 102 and destination device 116 can comprise any of a wide range of devices, including desktop computers, notebook (i.e., laptop) computers, tablet computers, set-top boxes, telephone handsets such as smart phones, and the like. [0045] In the example of Figure 1, source device 102 includes video source 104, memory 106, video encoder 200 and output interface 108. Destination device 116 includes input interface 122, video decoder 300, memory 120 and display device 118. In accordance with this disclosure, video encoder 200 of source device 102 and video decoder 300 of destination device 116 can be configured to apply the techniques for intra-block copy. Thus, source device 102 represents an example of a video encoding device, while destination device 116 represents an example of a video decoding device. In other examples, a source device and a destination device can include other components or arrangements. For example, source device 102 can receive video data from an external video source, such as an external camera. Similarly, destination device 116 can interface with an external display device, instead of including an integrated display device. [0046] As shown in Figure 1, system 100 is merely one example. In general, any digital video encoding and/or decoding device can perform the techniques for intra-block copy (IBC). Source device 102 and destination device 116 are merely examples of such coding devices, in which source device 102 generates encoded video data for transmission to destination device 116. This disclosure refers to a "coding" device as a device that performs coding (encoding and/or decoding) of data. Thus, video encoder 200 and video decoder 300 represent examples of coding devices, in particular, a video encoder and a video decoder, respectively. In some examples, devices 102, 116 can operate in a substantially symmetrical manner, such that each of devices 102, 116 includes video encoding and decoding components. Therefore, system 100 can support one-way or two-way video transmission between devices 102, 116, such as, for example, for video streaming, video playback, video broadcasting or video telephony.
[0047] In general, video source 104 represents a source of video data (i.e., raw, unencoded video data) and provides a sequential series of pictures (also referred to as "frames") of the video data to video encoder 200, which encodes the data for the pictures. Video source 104 of source device 102 can include a video capture device, such as a video camera, a video archive containing previously captured raw video and/or a video feed interface for receiving video from a video content provider. As a further alternative, video source 104 can generate computer graphics-based data as the source video, or a combination of live video, archived video and computer-generated video. In each case, video encoder 200 encodes the captured, pre-captured or computer-generated video data. Video encoder 200 can rearrange the pictures from the received order (sometimes referred to as "display order") into a coding order for coding. Video encoder 200 can generate a bit stream that includes the encoded video data. Source device 102 can then output the encoded video data via output interface 108 onto computer-readable medium 110 for reception and/or retrieval by, for example, input interface 122 of destination device 116. [0048] Memory 106 of source device 102 and memory 120 of destination device 116 represent general-purpose memories. In some examples, memories 106, 120 can store raw video data, such as raw video from video source 104 and raw, decoded video data from video decoder 300. Additionally or alternatively, memories 106, 120 can store software instructions executable by, for example, video encoder 200 and video decoder 300, respectively. Although shown separately from video encoder 200 and video decoder 300 in this example, it should be understood that video encoder 200 and video decoder 300 can also include internal memories for functionally similar or equivalent purposes. In addition, memories 106, 120 can store encoded video data, such as output from video encoder 200 and input to video decoder 300. In some examples, portions of memories 106, 120 can be allocated as one or more video buffers, such as for storing raw, decoded and/or encoded video data. [0049] Computer-readable medium 110 can represent any type of medium or device capable of transporting the encoded video data from source device 102 to destination device 116. In one example, computer-readable medium 110 represents a communication medium that enables source device 102 to transmit encoded video data directly to destination device 116 in real time, such as via a radio frequency network or a computer-based network. Output interface 108 can modulate a transmission signal that includes the encoded video data, and input interface 122 can demodulate the received transmission signal, according to a communication standard, such as a wireless communication protocol. The communication medium can comprise any wireless or wired communication medium, such as a radio frequency (RF) spectrum or one or more physical transmission lines. The communication medium can form part of a packet-based network, such as a local area network, a wide area network, or a global network, such as the Internet. The communication medium can include routers, switches, base stations or any other equipment that may be useful to facilitate communication from source device 102 to destination device 116.
[0050] In some examples, source device 102 can output encoded data from output interface 108 to storage device 112. Similarly, destination device 116 can access encoded data from storage device 112 via input interface 122. Storage device 112 can include any of a variety of distributed or locally accessed data storage media, such as a hard disk drive, Blu-ray discs, DVDs, CD-ROMs, flash memory, volatile or non-volatile memory, or any other suitable digital storage media for storing encoded video data. [0051] In some examples, source device 102 can output encoded video data to file server 114 or another intermediate storage device that can store the encoded video generated by source device 102. Destination device 116 can access stored video data from file server 114 via streaming or download. File server 114 can be any type of server device capable of storing encoded video data and transmitting that encoded video data to destination device 116. File server 114 can represent a web server (such as for a website), a File Transfer Protocol (FTP) server, a content delivery network device or a network attached storage (NAS) device. Destination device 116 can access encoded video data from file server 114 through any standard data connection, including an Internet connection. This can include a wireless channel (such as a Wi-Fi connection), a wired connection (such as DSL, cable modem, etc.) or a combination of both that is suitable for accessing encoded video data stored on file server 114. File server 114 and input interface 122 can be configured to operate according to a streaming transmission protocol, a download transmission protocol or a combination thereof. [0052] Output interface 108 and input interface 122 can represent wireless transmitters/receivers, modems, wired networking components (such as Ethernet cards), wireless communication components that operate according to any of the various IEEE 802.11 standards, or other physical components. In examples where output interface 108 and input interface 122 comprise wireless components, output interface 108 and input interface 122 can be configured to transfer data, such as encoded video data, according to a cellular communication standard, such as 4G, 4G-LTE (Long-Term Evolution), LTE Advanced, 5G or the like. In some examples where output interface 108 comprises a wireless transmitter, output interface 108 and input interface 122 can be configured to transfer data, such as encoded video data, according to other wireless standards, such as an IEEE 802.11 specification, an IEEE 802.15 specification, a Bluetooth standard, or the like. [0053] The techniques of this disclosure can be applied to video coding in support of any of a variety of multimedia applications, such as over-the-air television broadcasts, cable television transmissions, satellite television transmissions, Internet streaming video transmissions, such as dynamic adaptive streaming over HTTP (DASH), digital video that is encoded onto a data storage medium, decoding of digital video stored on a data storage medium, or other applications. [0054] Input interface 122 of destination device 116 receives an encoded video bit stream from computer-readable medium 110 (such as, for example, storage device 112, file server 114 or the like). The encoded video bit stream of computer-readable medium 110 can include signaling information defined by video encoder 200, which is also used by video decoder 300, such as syntax elements having values that describe characteristics and/or processing of video blocks or other coded units (such as slices, pictures, groups of pictures, sequences or the like). Display device 118 displays decoded pictures of the decoded video data to a user.
Display device 118 can represent any of a variety of display devices, such as a cathode ray tube (CRT) display, a liquid crystal display (LCD), a plasma display, an organic light-emitting diode (OLED) display or another type of display device. [0055] Although not shown in Figure 1, in some examples, video encoder 200 and video decoder 300 can each be integrated with an audio encoder and/or audio decoder, and can include appropriate MUX-DEMUX units, or other hardware and/or software, to handle multiplexed streams that include both audio and video in a common data stream. If applicable, MUX-DEMUX units can conform to the ITU H.223 multiplexer protocol, or to other protocols, such as the user datagram protocol (UDP). [0056] Video encoder 200 and video decoder 300 can each be implemented as any of a variety of suitable encoder and/or decoder circuits, such as one or more microprocessors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), discrete logic, software, hardware, firmware or any combinations thereof. [0057] Video encoder 200 and video decoder 300 can operate according to a video coding standard, such as ITU-T H.265, also known as High Efficiency Video Coding (HEVC), or extensions thereto, such as the multi-view and/or scalable video coding extensions. Alternatively, video encoder 200 and video decoder 300 can operate according to other proprietary or industry standards, such as the Joint Exploration Test Model (JEM7) for the Versatile Video Coding (VVC) standard currently under development. The techniques of this disclosure, however, are not limited to any specific coding standard and may be applicable to video coding standards under development. [0058] In general, video encoder 200 and video decoder 300 can perform block-based coding of pictures. The term "block" generally refers to a structure that includes data to be processed (such as, for example, encoded, decoded or otherwise used in the encoding and/or decoding process). For example, a block can include a two-dimensional array of samples of luminance and/or chrominance data. In general, video encoder 200 and video decoder 300 can code video data represented in a YUV format (such as, for example, Y, Cb, Cr). That is, instead of coding red, green and blue (RGB) data for the samples of a picture, video encoder 200 and video decoder 300 can code the luminance (luma) and chrominance (chroma) components, where the chrominance components can include both red-hue and blue-hue chrominance components. In some examples, video encoder 200 converts received RGB-formatted data into a YUV representation prior to encoding, and video decoder 300 converts the YUV representation to the RGB format. Alternatively, pre-processing and post-processing units (not shown) can perform these conversions. [0059] This disclosure can generally refer to the coding (such as, for example, encoding and decoding) of pictures to include the process of encoding or decoding data of the pictures. Similarly, this disclosure may refer to the coding of blocks of a picture to include the process of encoding or decoding data for the blocks, such as, for example, prediction and/or residual coding. An encoded video bit stream usually includes a series of values for syntax elements representative of coding decisions (such as coding modes) and the partitioning of pictures into blocks. Thus, references to coding a picture or a block should generally be understood as coding values for syntax elements that form the picture or block.
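As a purely illustrative aside on the RGB-to-YUV conversion mentioned above, the following sketch converts one 8-bit RGB sample to YCbCr using BT.601-style coefficients. The disclosure does not mandate any particular conversion matrix, and the function name is hypothetical.

```cpp
#include <algorithm>
#include <cstdint>

// Illustrative conversion of one 8-bit RGB sample to YCbCr (BT.601-style,
// full range). Real pre-/post-processing units may use other matrices.
struct YCbCr { uint8_t y, cb, cr; };

YCbCr rgbToYCbCr(uint8_t r, uint8_t g, uint8_t b) {
    double y  =  0.299 * r + 0.587 * g + 0.114 * b;
    double cb = -0.169 * r - 0.331 * g + 0.500 * b + 128.0; // blue-hue chroma
    double cr =  0.500 * r - 0.419 * g - 0.081 * b + 128.0; // red-hue chroma
    auto clip = [](double v) { // round to nearest and clip to 8 bits
        return (uint8_t)std::min(255.0, std::max(0.0, v + 0.5));
    };
    return YCbCr{ clip(y), clip(cb), clip(cr) };
}
```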
[0060] HEVC defines several blocks, including coding units (CUs), prediction units (PUs) and transform units (TUs). According to HEVC, a video coder (such as video encoder 200) partitions a coding tree unit (CTU) into CUs according to a quadtree structure. That is, the video coder partitions CTUs and CUs into four equal, non-overlapping squares, and each node of the quadtree has either zero or four child nodes. Nodes without child nodes can be referred to as "leaf nodes", and the CUs of such leaf nodes can include one or more PUs and/or one or more TUs. The video coder can further partition the PUs and TUs. For example, in HEVC, a residual quadtree (RQT) represents the partitioning of TUs. In HEVC, PUs represent inter-prediction data, while TUs represent residual data. CUs that are intra-predicted include intra-prediction information, such as an intra-mode indication. [0061] In HEVC, the largest coding unit in a slice is called a coding tree block (CTB). A CTB contains a quadtree whose nodes are coding units. The size of a CTB can range from 16x16 to 64x64 in the HEVC main profile (although, technically, 8x8 CTB sizes can be supported). A coding unit (CU) can be the same size as a CTB, though it can be as small as 8x8. Each coding unit is coded with one mode. When a CU is inter-coded, it can be further partitioned into two prediction units (PUs) or become just one PU when further partitioning does not apply. When two PUs are present in one CU, they can be half-size rectangles or two rectangles with ¼ or ¾ the size of the CU. [0062] When the CU is inter-coded, one set of motion information is present for each PU. In addition, each PU is coded with a unique inter-prediction mode to derive the set of motion information. In HEVC, the smallest PU sizes are 8x4 and 4x8. [0063] As another example, video encoder 200 and video decoder 300 can be configured to operate according to JEM. According to JEM, a video coder (such as video encoder 200) partitions a picture into a plurality of coding tree units (CTUs). Video encoder 200 can partition a CTU according to a tree structure, such as a quadtree binary tree (QTBT) structure. The QTBT structure of JEM removes the concepts of multiple partition types, such as the separation between CUs, PUs and TUs of HEVC. A QTBT structure of JEM includes two levels: a first level partitioned according to quadtree partitioning, and a second level partitioned according to binary tree partitioning. A root node of the QTBT structure corresponds to a CTU. Leaf nodes of the binary trees correspond to coding units (CUs). [0064] In some examples, video encoder 200 and video decoder 300 can use a single QTBT structure to represent each of the luminance and chrominance components, while in other examples, video encoder 200 and video decoder 300 can use two or more QTBT structures, such as one QTBT structure for the luminance component and another QTBT structure for both chrominance components (or two QTBT structures for the respective chrominance components). [0065] Video encoder 200 and video decoder 300 can be configured to use quadtree partitioning per HEVC, QTBT partitioning according to JEM (such as, for example, for VVC), or other partitioning structures. For explanatory purposes, the description of the techniques of this disclosure is presented with respect to QTBT partitioning.
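For illustration, the following C++ sketch enumerates leaf blocks under the two-level QTBT scheme described above: quadtree splits into four equal squares, followed by binary splits into two equal halves. The split-decision callbacks are hypothetical stand-ins for an encoder's rate-distortion logic; a real QTBT additionally forbids quadtree splits below a binary split, which the callbacks here are assumed to respect.

```cpp
#include <functional>
#include <vector>

struct Block { int x, y, w, h; };

enum class BtSplit { None, Horizontal, Vertical };

// Recursively partition a block: quadtree first, then binary tree. Leaves
// (blocks that are split no further) correspond to CUs.
void partition(const Block& b,
               const std::function<bool(const Block&)>& shouldQtSplit,
               const std::function<BtSplit(const Block&)>& binarySplitDir,
               std::vector<Block>& leaves) {
    if (shouldQtSplit(b)) { // four equal, non-overlapping squares
        int hw = b.w / 2, hh = b.h / 2;
        partition({b.x,      b.y,      hw, hh}, shouldQtSplit, binarySplitDir, leaves);
        partition({b.x + hw, b.y,      hw, hh}, shouldQtSplit, binarySplitDir, leaves);
        partition({b.x,      b.y + hh, hw, hh}, shouldQtSplit, binarySplitDir, leaves);
        partition({b.x + hw, b.y + hh, hw, hh}, shouldQtSplit, binarySplitDir, leaves);
        return;
    }
    switch (binarySplitDir(b)) { // binary tree: two equal halves
    case BtSplit::Horizontal:
        partition({b.x, b.y,           b.w, b.h / 2}, shouldQtSplit, binarySplitDir, leaves);
        partition({b.x, b.y + b.h / 2, b.w, b.h / 2}, shouldQtSplit, binarySplitDir, leaves);
        break;
    case BtSplit::Vertical:
        partition({b.x,           b.y, b.w / 2, b.h}, shouldQtSplit, binarySplitDir, leaves);
        partition({b.x + b.w / 2, b.y, b.w / 2, b.h}, shouldQtSplit, binarySplitDir, leaves);
        break;
    case BtSplit::None:
        leaves.push_back(b); // leaf node: no further splitting
        break;
    }
}
```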
However, it should be understood that the techniques of this disclosure can also be applied to video coders configured to use quadtree partitioning, as well as other types of partitioning. [0066] This disclosure may use "NxN" and "N by N" interchangeably to refer to the sample dimensions of a block (such as a CU or another video block) in terms of vertical and horizontal dimensions, such as, for example, 16x16 samples or 16 by 16 samples. In general, a 16x16 CU has 16 samples in the vertical direction (y = 16) and 16 samples in the horizontal direction (x = 16). Similarly, an NxN CU generally has N samples in the vertical direction and N samples in the horizontal direction, where N represents a non-negative integer value. The samples in a CU can be arranged in rows and columns. In addition, CUs do not necessarily have the same number of samples in the horizontal direction as in the vertical direction. For example, CUs can comprise NxM samples, where M is not necessarily equal to N. [0067] Video encoder 200 encodes video data for CUs representing prediction and/or residual information, and other information. The prediction information indicates how the CU is to be predicted in order to form a prediction block for the CU. The residual information generally represents sample-by-sample differences between the samples of the CU prior to encoding and the prediction block. [0068] To predict a CU, video encoder 200 can generally form a prediction block for the CU through inter-prediction or intra-prediction. Inter-prediction generally refers to predicting the CU from data of a previously coded picture, while intra-prediction generally refers to predicting the CU from previously coded data of the same picture. To perform inter-prediction, video encoder 200 can generate the prediction block using one or more motion vectors. Video encoder 200 can generally perform a motion search to identify a reference block that closely matches the CU, for example, in terms of differences between the CU and the reference block. Video encoder 200 can calculate a difference metric using a sum of absolute differences (SAD), a sum of squared differences (SSD), mean absolute difference (MAD), mean squared difference (MSD) or other such difference calculations to determine whether a reference block closely matches the current CU. In some examples, video encoder 200 can predict the current CU using uni-directional prediction or bi-directional prediction. [0069] For each block, a set of motion information can be available. A set of motion information contains motion information for the forward and backward prediction directions. Here, the forward and backward prediction directions are the two prediction directions of a bi-directional prediction mode, and the terms "forward" and "backward" do not necessarily have a geometric meaning; instead, they correspond to reference picture list 0 (RefPicList0) and reference picture list 1 (RefPicList1) of a current picture. When only one reference picture list is available for a picture or slice, only RefPicList0 is available, and the motion information of each block of a slice is always forward. [0070] For each prediction direction, the motion information contains a reference index and a motion vector. In some cases, for simplicity, a motion vector itself can be referred to in a way that it is assumed to have an associated reference index. A reference index is used to identify a reference picture in the current reference picture list (RefPicList0 or RefPicList1).
A motion vector has a horizontal and a vertical component. [0071] Picture order count (POC) is widely used in video coding standards to identify the display order of a picture. Although there can be cases where two pictures within one coded video sequence have the same POC value, this typically does not happen within a coded video sequence. When multiple coded video sequences are present in a bit stream, pictures with the same POC value can be closer to each other in terms of decoding order. POC values of pictures are typically used for reference picture list construction, derivation of reference picture sets as in HEVC, and motion vector scaling. [0072] JEM also provides an affine motion compensation mode, which can be considered an inter-prediction mode. In the affine motion compensation mode, video encoder 200 can determine two or more motion vectors that represent non-translational motion, such as zoom in, zoom out, rotation, perspective motion or other irregular motion types. [0073] To perform intra-prediction, video encoder 200 can select an intra-prediction mode to generate the prediction block. In some examples, sixty-seven intra-prediction modes are available, including various directional modes, as well as planar mode and DC mode. In general, video encoder 200 selects an intra-prediction mode that describes neighboring samples to a current block (such as, for example, a block of a CU) from which to predict samples of the current block. Such samples can generally be above, above and to the left, or to the left of the current block in the same picture as the current block, assuming video encoder 200 codes CTUs and CUs in raster scan order (left to right, top to bottom). [0074] Video encoder 200 encodes data that represents the prediction mode for a current block. For example, for the inter-prediction modes, video encoder 200 can encode data that represents which of the various available inter-prediction modes is used, as well as motion information for the corresponding mode. For uni-directional or bi-directional inter-prediction, for example, video encoder 200 can encode motion vectors using advanced motion vector prediction (AMVP) or merge mode. Video encoder 200 can use similar modes to encode motion vectors for the affine motion compensation mode. [0075] In the HEVC standard, there are two inter-prediction modes, named merge (skip is considered a special case of merge) and advanced motion vector prediction (AMVP) modes, respectively, for a prediction unit (PU). In either AMVP or merge mode, a motion vector (MV) candidate list is maintained for multiple motion vector predictors. The motion vector(s), as well as the reference indices in merge mode, of the current PU are generated by taking one candidate from the MV candidate list. [0076] The MV candidate list contains up to 5 candidates for the merge mode and only two candidates for the AMVP mode. A merge candidate can contain a set of motion information, such as motion vectors corresponding to both reference picture lists (list 0 and list 1) and the reference indices. If a merge candidate is identified by a merge index, the reference pictures are used for the prediction of the current blocks, and the associated motion vectors are determined.
However, under AMVP mode, for each potential prediction direction from either list 0 or list 1, a reference index needs to be explicitly signaled, together with an MVP index to the MV candidate list, since the AMVP candidate contains only a motion vector. In AMVP mode, the predicted motion vectors can be further refined. [0077] As can be seen above, a merge candidate corresponds to a full set of motion information, while an AMVP candidate contains only one motion vector for a specific prediction direction and a reference index. The candidates for both modes are derived similarly from the same spatial and temporal neighboring blocks. [0078] For merge and AMVP, spatial MV candidates are derived from the neighboring blocks shown in Figure 5 for a specific PU (PU0), although the methods for generating the candidates from the blocks differ for the merge and AMVP modes. In merge mode, the positions of the five spatial MV candidates are shown in Figure 5. For each candidate position, availability is checked according to the order: {a1, b1, b0, a0, b2}. [0079] In AMVP mode, the neighboring blocks are divided into two groups: a left group, which includes blocks a0 and a1, and an above group, which includes blocks b0, b1 and b2, as shown in Figure 5. For the left group, availability is checked according to the order: {a0, a1}. For the above group, availability is checked according to the order: {b0, b1, b2}. For each group, the potential candidate in a neighboring block that refers to the same reference picture as that indicated by the signaled reference index has the highest priority to be chosen to form a final candidate for the group. It is possible that none of the neighboring blocks contains a motion vector that points to the same reference picture. Therefore, if such a candidate cannot be found, the first available candidate is scaled to form the final candidate; thus, differences in temporal distance can be compensated. [0080] There are other candidates besides the spatial neighboring candidates for the merge and AMVP modes. In merge mode, after the spatial candidates are validated, two kinds of redundancy are removed. If the candidate position for the current PU would refer to the first PU within the same CU, the position is excluded, as the same merge could be achieved by a CU without splitting into prediction partitions. In addition, any redundant entries where candidates have exactly the same motion information are also excluded. After the spatial neighboring candidates are verified, the temporal candidates are validated. For the temporal candidate, the bottom-right position just outside the co-located PU of the reference picture is used, if available. Otherwise, the central position is used instead. The way to choose the co-located PU is similar to that of previous standards, but HEVC allows more flexibility by transmitting an index to specify which reference picture list is used for the co-located reference picture. One issue related to the use of the temporal candidate is the amount of memory needed to store the motion information of the reference picture. This is solved by restricting the granularity of storing the temporal motion candidates to the resolution of only a 16x16 luma grid, even when smaller PB (or possibly PU) structures are used at the corresponding location in the reference picture. [0081] In addition, a picture parameter set (PPS) level flag allows video encoder 200 to disable the use of the temporal candidate, which is useful for applications with error-prone transmission.
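For illustration, the following sketch mirrors the spatial part of the merge list construction described above, checking the five neighbor positions in the order {a1, b1, b0, a0, b2}. getNeighbor is a hypothetical accessor, not an actual codec API, and the full HEVC process also includes the pruning and temporal candidates discussed above, which are omitted here.

```cpp
#include <cstddef>
#include <functional>
#include <optional>
#include <vector>

struct MotionInfo { int mvx, mvy, refIdx, refList; };

enum class Pos { A1, B1, B0, A0, B2 };

// Append each available spatial neighbor, in the prescribed checking order,
// up to the merge list limit (5 in HEVC). getNeighbor returns motion info
// only when the position is available and inter-coded.
std::vector<MotionInfo> buildSpatialMergeCandidates(
        const std::function<std::optional<MotionInfo>(Pos)>& getNeighbor,
        std::size_t maxCandidates = 5) {
    std::vector<MotionInfo> list;
    for (Pos p : {Pos::A1, Pos::B1, Pos::B0, Pos::A0, Pos::B2}) {
        if (list.size() >= maxCandidates) break;
        if (auto mi = getNeighbor(p)) {
            // A real implementation also prunes redundant candidates here.
            list.push_back(*mi);
        }
    }
    return list;
}
```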
[0082] In AMVP mode, HEVC allows only a much smaller number of candidates to be used in the motion vector prediction process, since video encoder 200 can send a coded difference of the change in motion vector (such as a motion vector difference (MVD)). In addition, video encoder 200 can perform motion estimation, which is one of the most computationally expensive operations in video encoder 200, and complexity is reduced by allowing a small number of candidates. When the reference index of the neighboring PU is not equal to that of the current PU, a scaled version of the motion vector is used. The neighboring motion vector is scaled according to the temporal distances between the current picture and the reference pictures indicated by the reference indices of the neighboring PU and the current PU, respectively. When two spatial candidates have the same motion vector components, one redundant spatial candidate is excluded. When the number of motion vector predictors is not equal to two and the use of temporal MV prediction is not explicitly disabled, the temporal MV prediction candidate is included. This means that the temporal candidate is not used at all when two spatial candidates are available. Finally, the default motion vector, which is the zero motion vector, is included repeatedly until the number of motion vector prediction candidates is equal to two, which guarantees that the number of motion vector predictors is two. Thus, only one coded flag is needed to identify which motion vector prediction is used in the case of AMVP mode. [0083] A temporal motion vector predictor (TMVP) candidate, if enabled and available, is added to the MV candidate list after the spatial motion vector candidates. The process of motion vector derivation for the TMVP candidate is the same for both the merge and AMVP modes; however, the target reference index for the TMVP candidate in merge mode is always set to 0. [0084] The primary block location for TMVP candidate derivation is the bottom-right block outside the co-located PU, shown in Figure 6A as block "T", to compensate for the bias toward the above and left blocks used to generate the spatial neighboring candidates. However, if that block is located outside the current CTB row, or if the motion information is not available, the block is replaced by a central block of the PU. [0085] The motion vector for the TMVP candidate is derived from the co-located PU of the co-located picture, indicated at the slice level. The motion vector for the co-located PU is called the co-located MV. Similar to the temporal direct mode in AVC, to derive the TMVP candidate motion vector, the co-located MV may be scaled to compensate for differences in temporal distance, as shown in Figure 6B. [0086] Several aspects of the merge and AMVP modes are worth mentioning, as follows: [0087] Motion vector scaling: It is assumed that the value of motion vectors is proportional to the distance between pictures in presentation time. A motion vector associates two pictures: the reference picture and the picture containing the motion vector (namely, the containing picture). When a motion vector is used to predict another motion vector, the distance between the containing picture and the reference picture is calculated based on Picture Order Count (POC) values. [0088] For a motion vector to be predicted, both its associated containing picture and its reference picture may be different. Therefore, a new distance (based on POC) is calculated, and the motion vector is scaled based on these two POC distances. For a spatial neighboring candidate, the containing pictures of the two motion vectors are the same, while the reference pictures are different. In HEVC, motion vector scaling applies to both TMVP and AMVP for spatial and temporal neighboring candidates.
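As a worked illustration of the POC-based scaling just described, the following sketch follows the fixed-point form used by HEVC, where tb is the POC distance for the current block and td is the POC distance associated with the candidate motion vector. It is a simplified sketch, not a normative implementation; in particular, HEVC also clips td and tb to a limited range, which is omitted here.

```cpp
#include <algorithm>
#include <cstdlib>

// Clip x to the range [lo, hi].
static int clip3(int lo, int hi, int x) { return std::min(std::max(x, lo), hi); }

// Scale one motion vector component by the ratio of two POC distances,
// mirroring HEVC's fixed-point computation (tb: POC distance of the current
// block, td: POC distance of the candidate).
int scaleMvComponent(int mv, int tb, int td) {
    if (td == 0 || tb == td) return mv;          // no scaling needed
    int tx = (16384 + (std::abs(td) >> 1)) / td; // rounded 2^14 / td
    int distScale = clip3(-4096, 4095, (tb * tx + 32) >> 6);
    int scaled = distScale * mv;
    // Rounded shift with sign handling, clipped to the MV range.
    return clip3(-32768, 32767,
                 scaled >= 0 ? (scaled + 127) >> 8 : -((-scaled + 127) >> 8));
}
```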
[0089] Artificial motion vector candidate generation: If a motion vector candidate list is not complete, artificial motion vector candidates are generated and inserted at the end of the list until the list is full. [0090] In merge mode, there are two types of artificial MV candidates: the combined candidate, derived only for B slices, and the zero candidates, used only for AMVP if the first type does not provide enough artificial candidates. [0091] For each pair of candidates that are already in the candidate list and have the necessary motion information, bi-directional combined motion vector candidates are derived by a combination of the motion vector of a first candidate referring to a picture in list 0 and the motion vector of a second candidate referring to a picture in list 1. [0092] Pruning process for candidate insertion: Candidates from different blocks may happen to be the same, which decreases the efficiency of a merge/AMVP candidate list. A pruning process is applied to solve this problem. It compares one candidate against the others in the current candidate list to avoid inserting identical candidates, to a certain extent. To reduce complexity, only a limited number of pruning comparisons are applied instead of comparing each potential candidate with all the others. [0093] Parallel merge/skip processing in HEVC: In HEVC, an LCU can be divided into parallel motion estimation regions (MERs), allowing only those neighboring PUs that belong to MERs different from that of the current PU to be included in the merge/skip MVP list construction process. The MER size is signaled in the picture parameter set as log2_parallel_merge_level_minus2. When the MER size is larger than NxN, where 2Nx2N is the smallest CU size, the MER takes effect in such a way that a spatial neighboring block, if it is inside the same MER as the current PU, is considered unavailable. [0094] The motion vector used for chroma coding can be scaled from the motion vector used for luma. The motion vector is derived for the luma component of a current PU/CU. Before it is used for chroma motion compensation, the motion vector is scaled based on the chroma sampling format. [0095] In addition to intra-prediction or inter-prediction of a block, another coding mode includes the intra-block copy (IBC) prediction mode, which is included in the HEVC screen content coding (SCC) extension. In the IBC prediction mode, a current CU/PU is predicted from an already-decoded block of the current picture/slice, referred to by a block vector of the current CU/PU, as shown in Figure 7. Note that the prediction signal is reconstructed, but without in-loop filtering, which includes deblocking and Sample Adaptive Offset (SAO).
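For illustration, the following sketch forms an IBC prediction by copying already-reconstructed samples of the same picture at the displacement given by the block vector, with integer precision and no interpolation, as described above. The flat row-major sample layout is an assumption for brevity, and no validity checking of the reference area (which must lie in the already-decoded region) is shown.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative IBC prediction: copy a blockW x blockH region of the current
// picture's reconstructed samples, displaced by the block vector (bvx, bvy).
void predictIbcBlock(const std::vector<uint8_t>& reconPicture, int picWidth,
                     int blockX, int blockY, int blockW, int blockH,
                     int bvx, int bvy, std::vector<uint8_t>& prediction) {
    prediction.resize((std::size_t)blockW * blockH);
    for (int y = 0; y < blockH; ++y) {
        for (int x = 0; x < blockW; ++x) {
            int refX = blockX + bvx + x; // reference position in the same
            int refY = blockY + bvy + y; // picture, at integer precision
            prediction[(std::size_t)y * blockW + x] =
                reconPicture[(std::size_t)refY * picWidth + refX];
        }
    }
}
```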
[0096] In inter-prediction, a current CU/PU is predicted from an already-decoded block, but in a different picture; in intra-block copy, the already-decoded block and the current block are in the same picture. In intra-prediction, a current CU/PU is predicted from samples in the current picture/slice, but is not referred to by a block vector as in intra-block copy. In some examples, intra-block copy can be considered a form of inter-prediction in which the current picture is included in the reference picture list. [0097] For block compensation in intra-block copy (intra BC), for the luma component or the chroma components that are coded with intra BC, the block compensation is done with integer block compensation; therefore, no interpolation is needed. In addition, the block vector is predicted and signaled at integer precision. In the current screen content coding (SCC), the block vector predictor is set to (-w, 0) at the beginning of each CTB, where w is the width of the CU. Such a block vector predictor is updated to be that of the latest coded CU/PU if that CU/PU is coded with the intra BC mode. If a CU/PU is not coded with intra BC, the block vector predictor remains unchanged. After block vector prediction, the block vector difference is encoded using the MV difference (MVD) coding method of HEVC. Currently, intra BC is enabled at both the CU and PU level. For PU-level intra BC, 2NxN and Nx2N PU partitions are supported for all CU sizes. In addition, when the CU is the smallest CU, the NxN PU partition is supported. [0098] The following describes ways in which intra BC is treated similarly to inter-prediction. In U.S. Patent Publication No. 2015/0271487, the current picture is used as a reference picture and added to the reference list, so intra BC is treated as inter mode. In Li B et al., "Non-SCCE1: Unification of intra BC and inter modes", 18th JCT-VC Meeting, 30-6-2014 - 7-7-2014, Sapporo (Joint Collaborative Team on Video Coding of ISO/IEC JTC1/SC29/WG11 and ITU-T SG.16), URL: http://wftp3.itu.int/av-arch/jctvc-site/, No. JCTVC-R0100-v2, 29 June 2014 (29-06-2014), XP030116357 (hereinafter "JCTVC-R0100"), the unification of the inter and intra BC modes is described. The current picture is added to the reference list. It is marked as long-term before decoding and marked as short-term after the decoding of the current picture. When intra BC is enabled, the syntax parsing process and the decoding process of a P slice are followed for an I slice. [0099] In U.S. Patent Publication No. 2016/0057420, some solutions are proposed to solve the problems related to temporal motion vector prediction (TMVP) derivation, interaction with constrained intra-prediction, reference list construction and so on. In U.S. Patent Publication No. 2016/0100189, when intra BC is treated as inter mode, some solutions are proposed to avoid extra condition checking and to solve the problems that exist for the interaction between TMVP, constrained intra-prediction, intra BC MV precision and so on. [0100] In the final HEVC SCC, information about which can be found in "Intra Block Copy in HEVC Screen Content Coding Extensions", IEEE Journal on Emerging and Selected Topics in Circuits and Systems (Volume 6, Issue 4), Dec. 2016, by Xiaozhong Xu, Shan Liu, Tzu-Der Chuang, Yu-Wen Huang, Shaw-Min Lei, Krishnakanth Rapaka, Chao Pang, Vadim Seregin, Ye-Kui Wang and Marta Karczewicz, when the IBC mode is enabled at the picture level, the current reconstructed picture is also a reference picture for decoding the current slice.
To avoid possible motion vector scaling for temporal motion vector prediction, this reference picture is marked as "used for long-term reference" during the decoding of the current picture. This reference picture is placed in the reference picture list (RPL) of list 0, and also in list 1 for B slices. It is located in the last position of the initial RPL, after the long-term reference pictures (when applicable). The NumPicTotalCurr variable is increased by 1 accordingly when the current reconstructed picture is added to the initial RPL. [0101] Following prediction, such as intra-prediction, inter-prediction or IBC prediction of a block, video encoder 200 can calculate residual data for the block. The residual data, such as a residual block, represents sample-by-sample differences between the block and a prediction block for the block, formed using the corresponding prediction mode. Video encoder 200 can apply one or more transforms to the residual block to produce transformed data in a transform domain instead of the sample domain. For example, video encoder 200 can apply a discrete cosine transform (DCT), an integer transform, a wavelet transform or a conceptually similar transform to the residual video data. In addition, video encoder 200 can apply a secondary transform following the first transform, such as a mode-dependent non-separable secondary transform (MDNSST), a signal-dependent transform, a Karhunen-Loeve transform (KLT), or the like. Video encoder 200 produces transform coefficients following application of the one or more transforms. [0102] As noted above, following any transforms to produce transform coefficients, video encoder 200 can perform quantization of the transform coefficients. Quantization generally refers to a process in which transform coefficients are quantized to possibly reduce the amount of data used to represent the coefficients, providing further compression. By performing the quantization process, video encoder 200 can reduce the bit depth associated with some or all of the coefficients. For example, video encoder 200 can round an n-bit value down to an m-bit value during quantization, where n is greater than m. In some examples, to perform quantization, video encoder 200 can perform a bitwise right shift of the value to be quantized. [0103] Following quantization, video encoder 200 can scan the transform coefficients, producing a one-dimensional vector from the two-dimensional matrix that includes the quantized transform coefficients. The scan can be designed to place higher-energy (and therefore lower-frequency) coefficients at the front of the vector and lower-energy (and therefore higher-frequency) transform coefficients at the back of the vector. In some examples, video encoder 200 can use a predefined scan order to scan the quantized transform coefficients to produce a serialized vector, and then entropy encode the quantized transform coefficients of the vector. In other examples, video encoder 200 can perform an adaptive scan. After scanning the quantized transform coefficients to form the one-dimensional vector, video encoder 200 can entropy encode the one-dimensional vector, for example, according to context-adaptive binary arithmetic coding (CABAC). Video encoder 200 can also entropy encode values for syntax elements that describe metadata associated with the encoded video data, for use by video decoder 300 in decoding the video data.
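As a simplified illustration of the bitwise right-shift quantization mentioned above, the following sketch reduces the precision of each coefficient with a rounded right shift. Actual HEVC/VVC quantization also involves a quantization parameter and scaling lists, which are omitted here; the function name is illustrative.

```cpp
#include <cstdint>
#include <vector>

// Reduce the bit depth of each transform coefficient with a rounded right
// shift (shift must be >= 1). An n-bit value becomes roughly an m-bit value
// for shift = n - m; the sign is handled explicitly so negative values
// round symmetrically.
void quantize(std::vector<int32_t>& coeffs, int shift) {
    const int32_t rounding = 1 << (shift - 1);
    for (int32_t& c : coeffs) {
        c = (c >= 0) ? ((c + rounding) >> shift)
                     : -((-c + rounding) >> shift);
    }
}
```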
[0104] [0104] To perform CABAC, video encoder 200 can assign a context within a context model to a symbol to be transmitted. The context can relate to, for example, whether neighboring values of the symbol are zero-valued or not. The probability determination can be based on the context assigned to the symbol. [0105] [0105] Video encoder 200 can additionally generate syntax data, such as block-based syntax data, image-based syntax data and sequence-based syntax data, for video decoder 300, such as, for example, in an image header, a block header, a slice header, or other syntax data, such as a sequence parameter set (SPS), image parameter set (PPS) or video parameter set (VPS). Video decoder 300 can likewise decode such syntax data to determine how to decode the corresponding video data. [0106] [0106] In this way, video encoder 200 can generate a bit stream that includes encoded video data, such as syntax elements that describe the partitioning of an image into blocks (such as, for example, CUs) and prediction and/or residual information for the blocks. Ultimately, video decoder 300 can receive the bit stream and decode the encoded video data. [0107] [0107] In general, video decoder 300 performs a process reciprocal to that performed by video encoder 200 to decode the encoded video data of the bit stream. For example, video decoder 300 can decode values for syntax elements of the bit stream using CABAC in a manner substantially similar, albeit reciprocal, to the CABAC encoding process of video encoder 200. The syntax elements can define the partitioning of an image into CTUs, and the partitioning of each CTU according to a corresponding partition structure, such as a QTBT structure, to define the CUs of the CTU. The syntax elements can additionally define prediction and residual information for blocks (such as CUs) of the video data. [0108] [0108] The residual information can be represented by, for example, quantized transform coefficients. Video decoder 300 can perform inverse quantization and inverse transformation on the quantized transform coefficients of a block to reproduce a residual block for the block. Video decoder 300 uses a signaled prediction mode (intra- or inter-prediction) and related prediction information (such as, for example, motion information for inter-prediction) to form a prediction block for the block. Video decoder 300 can then combine the prediction block and the residual block (on a sample-by-sample basis) to reproduce the original block. Video decoder 300 can perform additional processing, such as performing a deblocking process to reduce visual artifacts along block boundaries. [0109] [0109] This disclosure may generally refer to the "signaling" of certain information, such as syntax elements. The term "signaling" can generally refer to the communication of values of syntax elements and/or other data used to decode encoded video data. That is, video encoder 200 can signal values for syntax elements in the bit stream. In general, signaling refers to the generation of a value in the bit stream. As noted above, source device 102 can transport the bit stream to destination device 116 substantially in real time, or in non-real time, as can happen when storing syntax elements in storage device 112 for later retrieval by destination device 116. [0110] [0110] As described above, in some examples, the luma and chroma components can be partitioned in different ways.
For example, a video coder (such as video encoder 200 or video decoder 300) can be configured to partition samples of a first color component (such as a luma component) according to a first partition tree and to partition samples of a second color component (such as a chroma component) according to a second partition tree. The partitioning result can be multiple luma blocks and corresponding chroma blocks. A chroma block and a luma block are corresponding blocks if the chroma block and the luma block belong to the same CU. For example, as described above, the video data is represented in the Y, Cb, Cr format. A CU is an image block, and a CU includes a luma block and its corresponding chroma blocks that together represent the color in the image for the block. [0111] [0111] In some examples, the chroma components are sub-sampled relative to the luma components. For example, in 4:2:0 sub-sampling, 8x8 samples of the luma component correspond to 4x4 samples of each of the chroma components. Due to the sub-sampling, each chroma block can correspond to a plurality of luma blocks. For example, it is assumed that the 8x8 samples of the luma component are divided into four 4x4 luma blocks and the 4x4 samples of a chroma component form one 4x4 chroma block. In this example, the four luma blocks correspond to one chroma block. [0112] [0112] For the IBC prediction mode, to reduce the amount of information that video encoder 200 may need to signal to video decoder 300, instead of signaling information indicative of the block vector for a chroma block, it may be possible for the chroma block to inherit the block vector of its corresponding luma block, with some scaling to account for the sub-sampling. In examples where the luma and chroma components are partitioned in the same way, some technical problems associated with block vector inheritance can be minimized, because the chroma blocks and the luma blocks have similar size and shape (as, for example, both are square blocks of size 4x4). [0113] [0113] However, in examples where the luma and chroma components are partitioned in different ways, there may be technical problems with video coding. For example, due to the different partitioning schemes for the luma and chroma components, a chroma block can correspond to a plurality of luma blocks, where two or more luma blocks have a different size and shape than the chroma block. In such examples, it may not be known from which luma block or luma blocks the chroma block should inherit block vector information. [0114] [0114] In one or more examples, the video coder can be configured to determine a plurality of blocks of the first color component (such as, for example, the luma component) that correspond to a block of the second color component (such as a chroma block). The video coder can be configured to partition the block of the second color component based on the first partition tree (such as, based on the way in which the first color component is partitioned) to generate sub-blocks of the block of the second color component. In this example, since the chroma block is partitioned based on the first partition tree, there can be a corresponding luma block for each chroma sub-block. For example, the plurality of luma blocks together corresponds to the chroma block. After partitioning the chroma block into sub-blocks, each of the sub-blocks corresponds to one of the plurality of luma blocks.
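To make the sub-sampling correspondence concrete, the following minimal sketch maps a chroma block to the luma region to which it corresponds under 4:2:0, where a chroma sample at (x, y) maps to the luma sample at (2x, 2y); the function and parameter names are illustrative assumptions.

```python
def luma_region_for_chroma_block(cx, cy, cw, ch, subsample=(2, 2)):
    """Return the luma region covered by a chroma block at (cx, cy) of
    size cw x ch; sub-sampling factors (2, 2) correspond to 4:2:0."""
    sx, sy = subsample
    return (cx * sx, cy * sy, cw * sx, ch * sy)

# A 4x4 chroma block at (2, 2) covers the 8x8 luma region at (4, 4);
# under a decoupled partition tree, that luma region may itself be
# split into several luma blocks.
print(luma_region_for_chroma_block(2, 2, 4, 4))  # -> (4, 4, 8, 8)
```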
[0115] [0115] The video coder can determine one or more block vectors for one or more sub-blocks of the block of the second color component based on one or more block vectors of the plurality of blocks of the first color component that are predicted in the IBC prediction mode. For example, of the plurality of luma blocks, a subset of these luma blocks can be predicted in the IBC prediction mode. Each luma block in the subset corresponds to a sub-block of the chroma block, and the sub-blocks of the chroma block can inherit the block vector of the luma block in the subset to which each sub-block corresponds. [0116] [0116] Thus, for a chroma block that is predicted in the IBC mode and where different partition trees are used to partition the luma and chroma components, video encoder 200 may not need to signal to video decoder 300 information indicative of the block vector for the chroma block. Instead, video encoder 200 and video decoder 300 can partition the chroma block into sub-blocks in a manner similar to the partitioning used for the luma component, so that there is a one-to-one correspondence between the luma blocks and the sub-blocks of the chroma block. In some examples, not all luma blocks may be predicted in the IBC prediction mode, and in such examples, video encoder 200 and video decoder 300 can assign the block vectors of the luma blocks that are predicted in the IBC prediction mode to their corresponding chroma block sub-blocks, as a way for the chroma block sub-blocks to inherit the block vectors of the luma blocks. [0117] [0117] In addition to assigning block vectors to the chroma block sub-blocks based on the block vectors of the plurality of luma blocks, video encoder 200 and video decoder 300 may need to scale the block vectors based on the sub-sampling format. For example, if the 4:2:0 sub-sampling format is used, then video encoder 200 and video decoder 300 can divide both the x and y components of the block vectors by two. [0118] [0118] In the examples above, the chroma sub-blocks are described as inheriting block vectors from luma blocks. In some instances, the opposite can occur. For example, video encoder 200 or video decoder 300 can determine a block vector for a chroma block and then partition samples of a luma component based on the way in which the chroma component samples were partitioned. Video encoder 200 and video decoder 300 can then assign block vectors to the luma blocks generated from the partitioning. To facilitate the description, the exemplary techniques are described in relation to chroma sub-blocks that inherit (with potential scaling) block vectors from the corresponding luma blocks. In some examples, the corresponding luma blocks can have the same shape (such as, for example, horizontal rectangles, vertical rectangles or squares) as the sub-blocks of the chroma block. [0119] [0119] In addition to describing techniques for determining block vectors for the sub-blocks of a chroma block to allow the sub-blocks to inherit block vectors, this disclosure describes techniques related to other problems that may be present with the IBC prediction mode in VVC. These techniques can be used in combination with the techniques for sub-blocks of a chroma block inheriting block vectors, or they can be separate from those techniques. [0120] [0120] Figures 2A and 2B are conceptual diagrams showing an example of a quadtree binary tree (QTBT) structure 130 and a corresponding coding tree unit (CTU) 132.
The solid lines represent the quadtree division and the dotted lines indicate the binary tree division. At each division (that is, non-leaf) node of the binary tree, a flag is signaled to indicate which division type (that is, horizontal or vertical) is used, where 0 indicates horizontal division and 1 indicates vertical division in this example. For the quadtree division, there is no need to indicate the division type, since the quadtree nodes divide a block horizontally and vertically into four sub-blocks of equal size. Therefore, video encoder 200 can encode, and video decoder 300 can decode, syntax elements (such as division information) for a region tree level of the QTBT structure 130 (that is, the solid lines) and syntax elements (such as division information) for a prediction tree level of the QTBT structure 130 (that is, the dotted lines). [0121] [0121] In general, CTU 132 of Figure 2B can be associated with parameters that define the block sizes corresponding to the nodes of the QTBT structure 130 at the first and second levels. These parameters can include a CTU size (representing the size of CTU 132 in samples), a minimum quadtree size (MinQTSize, representing a minimum allowed quadtree leaf node size), a maximum binary tree size (MaxBTSize, representing a maximum allowed binary tree root node size), a maximum binary tree depth (MaxBTDepth, representing a maximum allowed binary tree depth) and a minimum binary tree size (MinBTSize, representing the minimum allowed binary tree leaf node size). [0122] [0122] The root node of a QTBT structure that corresponds to a CTU can have four child nodes at the first level of the QTBT structure, each of which can be partitioned according to the quadtree partitioning. That is, the first-level nodes are either leaf nodes (without child nodes) or have four child nodes. The example of QTBT structure 130 represents such nodes as including the parent node and the child nodes with solid lines for branches. If the first-level nodes are not larger than the maximum allowed binary tree root node size (MaxBTSize), they can be further partitioned by the respective binary trees. The binary tree division of a node can be iterated until the nodes resulting from the division reach the minimum allowed binary tree leaf node size (MinBTSize) or the maximum allowed binary tree depth (MaxBTDepth). The example of QTBT structure 130 represents such nodes with dashed lines for branches. The binary tree leaf node is referred to as a coding unit (CU), which is used for prediction (such as, for example, intra-image or inter-image prediction) and transform without any further partitioning. As discussed above, CUs can also be referred to as "video blocks" or "blocks". [0123] [0123] In an example of the QTBT partitioning structure, the CTU size is set to 128x128 (luma samples and two corresponding 64x64 chroma samples), MinQTSize is set to 16x16, MaxBTSize is set to 64x64, MinBTSize (for both width and height) is set to 4, and MaxBTDepth is set to 4. The quadtree partitioning is first applied to the CTU to generate quadtree leaf nodes. The quadtree leaf nodes can have a size from 16x16 (that is, MinQTSize) to 128x128 (that is, the CTU size). If the quadtree leaf node is 128x128, it will not be further divided by the binary tree, since its size exceeds MaxBTSize (that is, 64x64, in this example). Otherwise, the quadtree leaf node is further partitioned by the binary tree. Therefore, the quadtree leaf node is also the root node for the binary tree and has a binary tree depth of 0. When the binary tree depth reaches MaxBTDepth (4, in this example), no further division is allowed.
When the binary tree node has a width equal to MinBTSize (4, in this example), it implies that no further horizontal division is allowed. Similarly, a binary tree node that has a height equal to MinBTSize implies that no further vertical division is allowed for that binary tree node. As noted above, the leaf nodes of the binary tree are referred to as CUs and are further processed according to prediction and transform without further partitioning. [0124] [0124] In the VCEG proposal COM16-C966: J. An, Y.-W. Chen, K. Zhang, H. Huang, Y.-W. Huang and S. Lei, "Block partitioning structure for next generation video coding", International Telecommunication Union, COM16-C966, September 2015, a quadtree binary tree (QTBT) structure was described for video coding beyond HEVC. Simulations showed that the proposed QTBT structure is more efficient than the quadtree structure used in HEVC. [0125] [0125] In the QTBT structure, as described above, a CTB is first partitioned by a quadtree, where the quadtree division of a node can be iterated until the node reaches the minimum allowed quadtree leaf node size (MinQTSize). If the quadtree leaf node size is not greater than the maximum allowed binary tree root node size (MaxBTSize), it can be further partitioned by a binary tree. The binary tree division of a node can be iterated until the node reaches the minimum allowed binary tree leaf node size (MinBTSize) or the maximum allowed binary tree depth (MaxBTDepth). The binary tree leaf node is, namely, the CU that will be used for prediction (such as, for example, intra-image or inter-image prediction) and transform without any further partitioning. [0126] [0126] There are two division types in the binary tree division: horizontal symmetric division and vertical symmetric division. In an example of the QTBT partitioning structure, the CTU size is set to 128x128 (luma samples and two corresponding 64x64 chroma samples), MinQTSize is set to 16x16, MaxBTSize is set to 64x64, MinBTSize (for both width and height) is set to 4, and MaxBTDepth is set to 4. The quadtree partitioning is first applied to the CTU to generate quadtree leaf nodes. [0127] [0127] The quadtree leaf nodes can have a size from 16x16 (that is, MinQTSize) to 128x128 (that is, the CTU size). If the quadtree leaf node is 128x128, it will not be further divided by the binary tree, since its size exceeds MaxBTSize (that is, 64x64). Otherwise, the quadtree leaf node is further partitioned by the binary tree. Therefore, the quadtree leaf node is also the root node for the binary tree and has a binary tree depth of 0. When the binary tree depth reaches MaxBTDepth (that is, 4), it implies that there is no further division. When the binary tree node has a width equal to MinBTSize (that is, 4), it implies that no further horizontal division is allowed. Similarly, when the binary tree node has a height equal to MinBTSize, no further vertical division is allowed. [0128] [0128] Figure 2B shows an example of block partitioning using QTBT, and Figure 2A shows the corresponding tree structure. The solid lines indicate the quadtree division and the dotted lines indicate the binary tree division. At each division (that is, non-leaf) node of the binary tree, a flag is signaled to indicate which division type (that is, horizontal or vertical) is used, where 0 indicates horizontal division and 1 indicates vertical division. For the quadtree division, there is no need to indicate the division type, as it always divides a block horizontally and vertically into 4 sub-blocks of equal size.
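The size and depth constraints just described can be summarized by a small helper that reports which divisions are still permitted for a node; this is a simplified, illustrative sketch under the stated parameter values, not a normative decision process.

```python
def allowed_splits(width, height, bt_depth, in_quadtree_phase,
                   min_qt=16, max_bt=64, min_bt=4, max_bt_depth=4):
    """Which divisions are still permitted for a node under the QTBT
    constraints above (MinQTSize=16, MaxBTSize=64, MinBTSize=4,
    MaxBTDepth=4). Simplified sketch only."""
    splits = []
    # Quadtree division: only while still in the quadtree phase and
    # above the minimum quadtree leaf node size.
    if in_quadtree_phase and width > min_qt:
        splits.append("QT")
    # Binary divisions: the node must not exceed MaxBTSize, the binary
    # tree depth is limited, and the divided direction needs a side
    # length above MinBTSize.
    if width <= max_bt and height <= max_bt and bt_depth < max_bt_depth:
        if width > min_bt:
            splits.append("BT_VER")  # vertical division halves the width
        if height > min_bt:
            splits.append("BT_HOR")  # horizontal division halves the height
    return splits

# A 64x64 quadtree leaf can start a binary tree (depth 0); a 128x128
# leaf could not, since it exceeds MaxBTSize.
print(allowed_splits(64, 64, bt_depth=0, in_quadtree_phase=False))
# -> ['BT_VER', 'BT_HOR']
```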
[0129] [0129] In addition, the QTBT block structure supports the feature that the luma and chroma components have separate QTBT structures. Currently, for P and B slices, the luma and chroma CTBs in a CTU share the same QTBT structure. For an I slice, the luma CTB is partitioned into CUs by one QTBT structure (such as, for example, a first partition tree) and the chroma CTB is partitioned into chroma CUs by another QTBT structure (such as, for example, a different second partition tree). This means that a CU in an I slice includes a coding block of the luma component or coding blocks of two chroma components, and a CU in a P or B slice includes coding blocks of all three color components. [0130] [0130] Next, the sub-block motion vector candidates are described, as shown in Figure 8. In JEM with QTBT, each CU can have at most one set of motion parameters for each prediction direction. Two sub-CU-level motion vector prediction methods are considered in video encoder 200 by dividing a large CU into sub-CUs and deriving motion information for all the sub-CUs of the large CU. The alternative temporal motion vector prediction (ATMVP) method allows each CU to fetch multiple sets of motion information from multiple blocks smaller than the current CU in the collocated reference image. In the spatial-temporal motion vector prediction (STMVP) method, the motion vectors of the sub-CUs are derived recursively using the temporal motion vector predictor and the spatially neighboring motion vectors. To preserve a more accurate motion field for sub-CU motion prediction, motion compression for the reference frames is currently disabled. [0131] [0131] For example, in Figure 8, block 400 refers to the block for which ATMVP is to be performed and is shown divided into NxN sub-PUs. Block 400 is collocated with block 401 in a motion source image. Block 401 is divided into blocks corresponding to the divided blocks of block 400. Block 401 includes divided blocks 402A-402D. Block 402D can be a representative central block in ATMVP. Each of blocks 402A through 402D can include one motion vector (such as MV0, which is a motion vector that points to a block of an image in a first reference image list, or MV1, which is a motion vector that points to a block of an image in a second reference image list) or two motion vectors (such as MV0 and MV1). In Figure 8, motion vectors 404A, 404B and 404C are all motion vectors that point to blocks of images in a first reference image list (as, for example, they are all MV0s), and motion vectors 406A, 406B and 406C are all motion vectors that point to blocks of images in a second reference image list (as, for example, they are all MV1s). [0132] [0132] For adaptive motion vector resolution: sub-pixel motion compensation is usually much more efficient than integer-pixel motion compensation. However, for some content, such as texture with very high frequency or screen content, sub-pixel motion compensation shows no better, or even worse, performance. In such cases, it is better to have only MVs with integer-pixel precision. In U.S. Patent Publication No. 2015/0195562, it is proposed that the MV precision information (either integer-pel or ¼-pel) be signaled for a block. In U.S. Application Serial No. 15/724,044, filed on October 3, 2017 (published as U.S. Patent Publication 2018/0098089), it is proposed to allow more MV precisions, such as 4-pel or 8-pel.
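A short sketch of what signaling a coarser MV precision implies: the motion vector, stored here in quarter-pel units, is rounded to the selected precision. The rounding convention is an assumption made for illustration.

```python
def round_mv_to_precision(mv, shift):
    """Round a motion vector stored in 1/4-pel units to a coarser
    precision: shift=2 gives integer-pel, shift=4 gives 4-pel."""
    def round_component(v):
        offset = (1 << (shift - 1)) - (1 if v < 0 else 0)
        return ((v + offset) >> shift) << shift
    return (round_component(mv[0]), round_component(mv[1]))

# (13, -7) in quarter-pel units rounded to integer-pel precision
# (multiples of 4 in quarter-pel units):
print(round_mv_to_precision((13, -7), 2))  # -> (12, -8)
```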
[0133] [0133] For decoder-side motion vector derivation (DMVD) in JEM: in the JEM reference software, there are several coding tools that derive or refine the motion vector (MV) for a current block at the decoder side (at video decoder 300). These decoder-side MV derivation (DMVD) approaches are described in more detail below. [0134] [0134] The pattern matched motion vector derivation (PMMVD) mode is a special merge mode based on Frame Rate Up-Conversion (FRUC) techniques. With the PMMVD mode, the motion information of a block is not signaled, but is derived at the decoder side. This technology was included in JEM. [0135] [0135] In the PMMVD mode, a FRUC flag is signaled for a CU when its merge flag is true. When the FRUC flag is false, a merge index is signaled and the regular merge mode is used. When the FRUC flag is true, an additional FRUC mode flag is signaled to indicate which method (bilateral matching or template matching) should be used to derive the motion information for the block. [0136] [0136] During the motion derivation process, an initial motion vector is first derived for the entire CU based on bilateral matching or template matching. First, the merge list of the CU (or a generated PMMVD candidate list) is checked and the candidate leading to the minimum matching cost is selected as the starting point. Then, a local search based on bilateral matching or template matching around the starting point is performed, and the MV that results in the minimum matching cost is taken as the MV for the entire CU. Subsequently, the motion information is further refined at the sub-block level, with the derived CU motion vectors as starting points. [0137] [0137] As shown in Figure 9, bilateral matching is used to derive the motion information of the current block by finding the best match between two reference blocks along the motion trajectory of the current block in two different reference images. Under the assumption of a continuous motion trajectory, the motion vectors MV0 506 and MV1 508 for the current block in the current image 500, which point to the two reference blocks (as, for example, MV0 506 points to reference block R0 502 and MV1 508 points to reference block R1 504), can be proportional to the temporal distances between the current image and the two reference images. As a special case, when the current image is temporally between the two reference images and the temporal distance from the current image to the two reference images is the same, the bilateral matching becomes mirror-based bidirectional MV derivation. [0138] [0138] As shown in Figure 10, template matching is used to derive the motion information of the current block by finding the best match between a template (upper and/or left neighboring blocks of the current block) in the current image and a block (of the same size as the template) in a reference image. [0139] [0139] At the encoder side (such as video encoder 200), the decision on whether to use the FRUC merge mode for a CU is based on rate-distortion (RD) cost selection, as done for a normal merge candidate. That is, the two matching modes (bilateral matching and template matching) are both checked for a CU using RD cost selection. The one leading to the minimum cost is further compared with the other CU modes. If a FRUC matching mode is the most efficient one, the FRUC flag is set to true for the CU and the related matching mode is used. [0140] [0140] At the fifth JVET meeting, JVET-E0035 further described techniques for FRUC template matching.
A flow chart of the existing FRUC template matching mode is shown in Figure 11A. In the first step, a template T0 (and its corresponding motion information MV0) is found to match the current template Tc of the current block from the reference images in list0. In the second step, template T1 (and its corresponding motion information MV1) is found from the reference images in list1. The obtained motion information MV0 and MV1 is used to perform bi-prediction to generate a predictor for the current block. [0141] [0141] The existing FRUC template matching mode can be improved by introducing bidirectional template matching and adaptive selection between uni-prediction and bi-prediction. The modifications in Figure 11B relative to Figure 11A are highlighted in italics and underlined. [0142] [0142] Bidirectional template matching is implemented based on the existing unidirectional template matching. As shown in Figure 11B, a matched template T0 is first found in the first step of template matching from the reference images of list0 (note that list0 is taken here only as an example). In fact, whether list0 or list1 is used in the first step is adaptive to the initial distortion cost between the current template and the initial template in the corresponding reference image. The initial template can be determined with the initial motion information of the current block that is available before the first template matching is performed. The reference image list corresponding to the minimum initial template distortion cost will be used in the first step of template matching. For example, if the initial template distortion cost corresponding to list0 is not greater than the cost corresponding to list1, list0 is used in the first step of template matching and list1 is used in the second step; then, the current template TC of the current block is updated as follows: T'C = 2 * TC - T0. [0143] [0143] The updated current template T'C, instead of the current template TC, is used to find another matched template T1 from the reference images in list1 in the second template matching. As a result, the matched template T1 is found by jointly using the reference images of list0 and list1. This matching process is called bidirectional template matching. [0144] [0144] The proposed selection between uni-prediction and bi-prediction for motion compensation prediction (MCP) is based on the template matching distortion. As shown in Figure 11B, during template matching, the distortion between template T0 and TC (the current template) can be calculated as cost0, and the distortion between template T1 and T'C (the updated current template) can be calculated as cost1. If cost0 is less than 0.5 * cost1, uni-prediction based on MV0 is applied to the FRUC template matching mode; otherwise, bi-prediction based on MV0 and MV1 is applied. Note that cost0 is compared to 0.5 * cost1 because cost1 indicates a difference between template T1 and T'C (the updated current template), which is 2 times the difference between TC (the current template) and its prediction 0.5 * (T0 + T1). Note that the proposed methods can be applied only to PU-level motion refinement; sub-PU-level motion refinement remains unchanged.
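A short sketch of the selection rule in the preceding paragraph, assuming SAD as the distortion measure (consistent with the cost measure used elsewhere in this document); the array names are illustrative.

```python
import numpy as np

def fruc_uni_bi_decision(tc, t0, t1):
    """Bidirectional template matching decision: tc is the current
    template, t0/t1 are the matched templates from list0 and list1."""
    tc32 = tc.astype(np.int32)
    t_upd = 2 * tc32 - t0.astype(np.int32)             # T'C = 2*TC - T0
    cost0 = np.abs(tc32 - t0.astype(np.int32)).sum()   # dist(TC, T0)
    cost1 = np.abs(t_upd - t1.astype(np.int32)).sum()  # dist(T'C, T1)
    if cost0 < 0.5 * cost1:
        return "uni-prediction based on MV0"
    return "bi-prediction based on MV0 and MV1"
```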
[0145] [0145] Bilateral template matching is described below. A bilateral template is generated as the weighted combination of the two prediction blocks, from the initial MV0 of list0 and MV1 of list1, respectively, as shown in Figure 12. The template matching operation includes calculating cost measures between the generated template and the sample region (around the initial prediction block) in the reference image. For each of the two reference images, the MV that yields the minimum template cost is considered as the updated MV of that list to replace the original one. Finally, the two new MVs, that is, MV0' and MV1', as shown in Figure 12, are used for the regular bi-prediction. As commonly used in block matching motion estimation, the sum of absolute differences (SAD) is used as the cost measure. The proposed decoder-side motion vector derivation (DMVD) is applied to the merge mode of bi-prediction with one reference image in the past and the other reference image in the future, without the transmission of an additional syntax element. In JEM4.0, when the LIC, affine, ATMVP, STMVP or FRUC merge candidate is selected for a CU, DMVD is not applied. [0146] [0146] Bidirectional optical flow in JEM is described below. Bidirectional optical flow (BIO) is a pixel-wise motion refinement that is performed on top of block-wise motion compensation in the case of bi-prediction. Since the fine motion is compensated inside the block, enabling BIO can result in an enlarged block size for motion compensation. Sample-level motion refinement does not require exhaustive search or signaling, since there is an explicit equation that gives the fine motion vector for each sample. To facilitate the explanation, the example is described with reference to Figure 13. [0147] [0147] Let I^{(k)} be the luminance value from reference k (k = 0, 1) after block motion compensation, and let \partial I^{(k)}/\partial x and \partial I^{(k)}/\partial y be the horizontal and vertical components of the gradient of I^{(k)}, respectively. Assuming that the optical flow is valid, the motion vector field (v_x, v_y), shown in Figure 13, is given by the equation

\partial I^{(k)}/\partial t + v_x \partial I^{(k)}/\partial x + v_y \partial I^{(k)}/\partial y = 0.   (1)

[0148] [0148] Combining the optical flow equation with the Hermite interpolation for the motion trajectory of each sample, a unique third-order polynomial is obtained that matches both the function values I^{(k)} and the derivatives at the ends. The value of this polynomial at t = 0 is the BIO prediction:

pred_{BIO} = (1/2) (I^{(0)} + I^{(1)} + (v_x/2)(\tau_1 \partial I^{(1)}/\partial x - \tau_0 \partial I^{(0)}/\partial x) + (v_y/2)(\tau_1 \partial I^{(1)}/\partial y - \tau_0 \partial I^{(0)}/\partial y)).   (2)

[0149] [0149] Here, \tau_0 (T0 in Figure 13) and \tau_1 (T1 in Figure 13) denote the distances to the reference frames, as shown in Figure 13. The distances \tau_0 and \tau_1 are calculated based on the picture order count (POC) values of Ref0 and Ref1: \tau_0 = POC(current) - POC(Ref0), \tau_1 = POC(Ref1) - POC(current). If both predictions come from the same temporal direction (both from the past or both from the future), the signs are different, \tau_0 \cdot \tau_1 < 0. In this case, BIO is applied only if the predictions do not come from the same time instant (\tau_0 \neq \tau_1), both referenced regions have non-zero motion (MVx_0, MVy_0, MVx_1, MVy_1 \neq 0) and the block motion vectors are proportional to the temporal distances (MVx_0 / MVx_1 = MVy_0 / MVy_1 = -\tau_0 / \tau_1). [0150] [0150] The motion vector field (v_x, v_y) is determined by minimizing the difference \Delta between the values at points A and B (the intersections of the motion trajectory with the reference frame planes in Figure 13). The model uses only the first linear term of the local Taylor expansion for \Delta:

\Delta = (I^{(0)} - I^{(1)}) + v_x (\tau_1 \partial I^{(1)}/\partial x + \tau_0 \partial I^{(0)}/\partial x) + v_y (\tau_1 \partial I^{(1)}/\partial y + \tau_0 \partial I^{(0)}/\partial y).   (3)

[0151] [0151] All values in equation (3) depend on the sample location (i', j'), which has been omitted so far.
Assuming that the motion is consistent in the local surrounding area, \Delta is minimized within a (2M + 1) x (2M + 1) square window \Omega centered on the currently predicted point (i, j):

(v_x, v_y) = argmin_{(v_x, v_y)} \sum_{(i', j') \in \Omega} \Delta^2(i', j').   (4)

[0152] [0152] For this optimization problem, a simplified solution is used, minimizing first in the vertical direction and then in the horizontal direction. This results in:

[0153] [0153] v_x = (s_1 + r) > m ? clip3(-thBIO, thBIO, -s_3 / (s_1 + r)) : 0   (5)

[0154] [0154] v_y = (s_5 + r) > m ? clip3(-thBIO, thBIO, -(s_6 - v_x s_2 / 2) / (s_5 + r)) : 0   (6)

[0155] [0155] where

[0156] [0156] s_1 = \sum_{\Omega} (\tau_1 \partial I^{(1)}/\partial x + \tau_0 \partial I^{(0)}/\partial x)^2; s_2 = \sum_{\Omega} (\tau_1 \partial I^{(1)}/\partial x + \tau_0 \partial I^{(0)}/\partial x)(\tau_1 \partial I^{(1)}/\partial y + \tau_0 \partial I^{(0)}/\partial y); s_3 = \sum_{\Omega} (I^{(1)} - I^{(0)})(\tau_1 \partial I^{(1)}/\partial x + \tau_0 \partial I^{(0)}/\partial x); s_5 = \sum_{\Omega} (\tau_1 \partial I^{(1)}/\partial y + \tau_0 \partial I^{(0)}/\partial y)^2; s_6 = \sum_{\Omega} (I^{(1)} - I^{(0)})(\tau_1 \partial I^{(1)}/\partial y + \tau_0 \partial I^{(0)}/\partial y)   (7)

[0157] [0157] In order to avoid division by zero or by a very small value, the regularization parameters r and m are introduced in equations (5) and (6):

[0158] [0158] r = 500 \cdot 4^{d-8}   (8)

[0159] [0159] m = 700 \cdot 4^{d-8}   (9)

[0160] [0160] Here, d is the internal bit depth of the input video. [0161] [0161] In some cases, the MV refinement of BIO may be unreliable due to noise or irregular motion. Therefore, in BIO, the magnitude of the MV refinement is clipped to a certain threshold thBIO. The threshold value is determined based on whether the reference images of the current image are all from one direction. If all the reference images of the current image are from one direction, the threshold value is set to 12 x 2^{14-d}; otherwise, it is set to 12 x 2^{13-d}. [0162] [0162] The gradients for BIO are calculated at the same time as the motion compensation interpolation, using operations consistent with the HEVC motion compensation process (2D separable FIR). The input for this 2D separable FIR is the same reference frame sample as for the motion compensation process, and the fractional position (fracX, fracY) according to the fractional part of the block motion vector. In the case of the horizontal gradient \partial I/\partial x, the signal is first interpolated vertically using BIOfilterS corresponding to the fractional position fracY with de-scaling shift d - 8; then, the gradient filter BIOfilterG is applied in the horizontal direction corresponding to the fractional position fracX with de-scaling shift 18 - d. In the case of the vertical gradient \partial I/\partial y, the gradient filter is first applied vertically using BIOfilterG corresponding to the fractional position fracY with de-scaling shift d - 8; then, signal displacement is performed using BIOfilterS in the horizontal direction corresponding to the fractional position fracX with de-scaling shift 18 - d. The length of the interpolation filter for the gradient calculation (BIOfilterG) and for the signal displacement (BIOfilterF) is shorter (6 taps) in order to maintain reasonable complexity. Table 1 shows the filters used to calculate the gradients for the different fractional positions of the block motion vector in BIO. Table 2 shows the interpolation filters used to generate the prediction signal in BIO. [0163] [0163] Figure 14 shows an example of the gradient calculation for an 8x4 block. For an 8x4 block, a coder (such as video encoder 200 or video decoder 300) needs to fetch the motion compensated predictors and calculate the HOR/VER gradients of all the pixels within the current block, as well as of the two outer lines of pixels, because solving v_x and v_y for each pixel requires the HOR/VER gradient values and the motion compensated predictors of the pixels within the window \Omega centered on each pixel, as shown in equation (4). In JEM, the size of this window is set to 5x5. The coder therefore needs to fetch the motion compensated predictors and calculate the gradients for the two outer lines of pixels.
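Following the reconstruction of equations (5) to (7) above, a per-window computation of (v_x, v_y) can be sketched as below; the inputs are arrays over the (2M + 1) x (2M + 1) window, and the code is illustrative rather than a normative BIO implementation.

```python
import numpy as np

def bio_refinement(gx0, gy0, gx1, gy1, i0, i1, tau0, tau1, r, m, th_bio):
    """Derive the BIO refinement (vx, vy) for one sample window Omega.
    gx*/gy* are gradient arrays and i0/i1 sample arrays of the two
    references; follows equations (5)-(7) as reconstructed above."""
    psi_x = tau1 * gx1 + tau0 * gx0
    psi_y = tau1 * gy1 + tau0 * gy0
    diff = i1 - i0
    s1 = np.sum(psi_x * psi_x)
    s2 = np.sum(psi_x * psi_y)
    s3 = np.sum(diff * psi_x)
    s5 = np.sum(psi_y * psi_y)
    s6 = np.sum(diff * psi_y)
    vx = float(np.clip(-s3 / (s1 + r), -th_bio, th_bio)) if (s1 + r) > m else 0.0
    vy = (float(np.clip(-(s6 - vx * s2 / 2) / (s5 + r), -th_bio, th_bio))
          if (s5 + r) > m else 0.0)
    return vx, vy
```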
[0164] [0164] In JEM, BIO is applied to all bi-directionally predicted blocks when the two predictions are from different reference images. When LIC (local illumination compensation) is enabled for a CU, BIO is disabled. [0165] [0165] Next, overlapped block motion compensation (OBMC) in JEM is described. Overlapped block motion compensation (OBMC) was used in early generations of video standards, for example, as in H.263. In JEM, OBMC is performed for all motion compensated (MC) block boundaries, except the right and bottom boundaries of a CU. Furthermore, OBMC is applied to both the luma and chroma components. In JEM, an MC block corresponds to a coding block. When a CU is coded with a sub-CU mode (which includes the sub-CU merge, affine and FRUC modes described above), each sub-block of the CU is an MC block. To process CU boundaries in a uniform way, OBMC is performed at the sub-block level for all MC block boundaries, where the sub-block size is set to 4x4, as shown in Figures 15A and 15B. [0166] [0166] When OBMC applies to the current sub-block, in addition to the current motion vector, the motion vectors of four connected neighboring sub-blocks, if available and not identical to the current motion vector, are also used to derive a prediction block for the current sub-block. These multiple prediction blocks based on multiple motion vectors are combined to generate the final prediction signal of the current sub-block (see the sketch following paragraph [0169]). [0167] [0167] As shown in Figures 16A-16D, a prediction block based on the motion vectors of a neighboring sub-block is denoted as PN, with N indicating an index for the neighboring sub-blocks above, below, to the left and to the right, and a prediction block based on the motion vectors of the current sub-block is denoted as PC. When PN is based on the motion information of a neighboring sub-block that contains the same motion information as the current sub-block, OBMC is not performed from PN. Otherwise, every pixel of PN is added to the same pixel in PC, that is, four rows/columns of PN are added to PC. [0168] [0168] In JEM, for a CU with size less than or equal to 256 luma samples, a CU-level flag is signaled to indicate whether or not OBMC is applied for the current CU. For CUs with size greater than 256 luma samples or not coded with the AMVP mode, OBMC is applied by default. At video encoder 200, when OBMC is applied for a CU, its impact is taken into account during the motion estimation stage. The prediction signal formed using the motion information of the top neighboring block and the left neighboring block is used to compensate the top and left boundaries of the original signal of the current CU, and then the normal motion estimation process is applied. [0169] [0169] There may be some technical problems with all the new coding tools in JEM, such as when they are used with IBC. Addressing these technical problems with technical solutions tailored to the IBC operation can result in better video encoding and decoding, such that the operation of video encoder 200 and video decoder 300 can be improved. As one example, IBC in HEVC SCC is used for one luma block and two corresponding chroma blocks together, with the same motion vector (with potential scaling for the chroma components). How to indicate/signal motion vectors for IBC under decoupled partition trees is a technical problem related to video coding, and specifically to IBC video coding. As another example, several new motion-related tools have been studied. How to combine them with IBC is a technical problem related to video coding, and specifically to IBC video coding.
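As a concrete illustration of the OBMC combination of paragraphs [0166] and [0167], the following sketch blends the current prediction PC with neighboring predictions PN along the top and left boundaries. The weighting factors {1/4, 1/8, 1/16, 1/32} for the four rows/columns of PN are an assumption made for illustration (they are not stated in the text above), and only two neighbor directions are shown for brevity.

```python
import numpy as np

def obmc_blend(pc, pn_list):
    """Blend the current prediction PC with neighboring predictions PN;
    pn_list holds (PN array, side) pairs, side in {"above", "left"}."""
    out = pc.astype(np.float64)
    weights = [1 / 4, 1 / 8, 1 / 16, 1 / 32]  # assumed PN weights
    for pn, side in pn_list:
        for k, w in enumerate(weights):  # four rows/columns of PN
            if side == "above":
                out[k, :] = (1 - w) * out[k, :] + w * pn[k, :]
            elif side == "left":
                out[:, k] = (1 - w) * out[:, k] + w * pn[:, k]
    return out
```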
[0170] [0170] The following are examples of technical solutions to the technical problems. These solutions can be used individually, together, or in any combination. [0171] [0171] When the partition tree is decoupled for the different color components, the IBC mode signaling and/or the IBC motion vectors for a block can be applied to only one color component (such as, for example, the luma component). Alternatively, in addition, the block of a component (such as, for example, Cb or Cr) that is encoded after a pre-coded component (such as, for example, luma) always inherits the use of the IBC mode from a corresponding block of that pre-coded component. [0172] [0172] In one example, the DM (derived mode) flag or index can be used, as described in U.S. Patent Application No. 15/676,314 (U.S. Patent Publication 2018/0048889) and No. 15/676,345 (U.S. Patent Publication 2018/0063553). If the corresponding luma block identified by the DM flag or index is encoded with IBC, the current chroma block is set to be encoded with IBC. In one example, the "corresponding" block refers to any predefined mapping. For example, the corresponding block is the 4x4 block in the center of the current block, such as C3 in Figure 17A. Alternatively, blocks located at other positions (such as, for example, C0, C1, C2, BR, BL, TL, TR), as shown in Figure 17A, can be used. [0173] [0173] When the inherited IBC mode is activated for a block, the motion information of the entire current block can be inherited or derived from a corresponding block of the region of the corresponding pre-coded component (such as the luma component), or derived from predefined motion vectors. [0174] [0174] When the inherited IBC mode is activated for a block, the motion information of the sub-blocks within the entire current block can be inherited or derived from multiple corresponding blocks within the region of the pre-coded component, or derived from predefined motion vectors. For example, as described in more detail below, assuming the pre-coded component is the luma component, the Cb block indicated as the shaded block 706 in Figure 17B can inherit motion and mode information from the corresponding luma partitions. If the corresponding luma block (such as the partition covering TL and the partition covering C2) is encoded with IBC, the associated motion vectors can further be scaled accordingly. An example is shown in Figures 18A and 18B. [0175] [0175] For example, Figure 17A shows luma samples 700 of a luma component, which are partitioned according to a luma QTBT structure similar to the partition tree shown in Figure 2A. For example, the partitioning of the luma samples 700 in Figure 17A is the same as the partitioning shown in Figure 2B. Partitioning the luma samples 700 based on a first partition tree generates the luma partition 701 (shown in the bold box), which includes a plurality of blocks, including luma block 704A and luma block 704B as two examples. Luma block 704A can be predicted in the IBC prediction mode and has a block vector of <-8, -6>, and luma block 704B can be predicted in the IBC prediction mode and has a block vector of <-12, -4>. [0176] [0176] Figure 17B shows chroma samples 702A of a chroma component that are partitioned according to a chroma QTBT structure. The chroma samples 702A and the luma samples 700 correspond to each other and are part of the same CU. In the example shown, the chroma samples 702A are partitioned based on a second partition tree that is different from the first partition tree used to partition the luma samples 700.
Partitioning the chroma samples 702A according to the second partition tree results in the chroma block 706. [0177] [0177] In the example of Figures 17A and 17B, the chroma block 706 corresponds to the luma partition 701. However, the luma partition 701 includes two blocks predicted using the IBC prediction mode: block 704A and block 704B. Therefore, it may not be clear which block vector (such as, for example, the block vector of block 704A or the block vector of block 704B) the chroma block 706 should inherit. [0178] [0178] Figure 18A is a reproduction of Figure 17A. According to the techniques described in this disclosure, and as shown in Figure 18B, video encoder 200 and video decoder 300 can partition the chroma block 706 based on the partition tree used for the luma samples 700, as shown with the chroma samples 702B of a chroma component. For example, the chroma block 706 of Figure 17B is the chroma partition 710 in Figure 18B, where the chroma partition 710 is partitioned in the same way as the luma partition 701. [0179] [0179] Video encoder 200 and video decoder 300 can partition the chroma block 706 to generate sub-blocks 708A and 708B. In the example shown in Figure 18B, sub-block 708A corresponds to luma block 704A and sub-block 708B corresponds to luma block 704B. For example, the chroma sub-block 708A has the same shape as the luma block 704A, and the chroma sub-block 708A is in the same relative position within the chroma partition 710 as the luma block 704A is within the luma partition 701. In this case, there is a one-to-one correspondence between the chroma sub-block 708A and the luma block 704A. In addition, the chroma sub-block 708B has the same shape as the luma block 704B, and the chroma sub-block 708B is in the same relative position within the chroma partition 710 as the luma block 704B is within the luma partition 701. In this case, there is a one-to-one correspondence between the chroma sub-block 708B and the luma block 704B. [0180] [0180] In one or more examples, video encoder 200 and video decoder 300 can assign the block vector of luma block 704A to chroma sub-block 708A (such that, for example, sub-block 708A inherits the block vector of luma block 704A). Video encoder 200 and video decoder 300 can assign the block vector of luma block 704B to sub-block 708B (such that, for example, sub-block 708B inherits the block vector of luma block 704B). In addition, video encoder 200 and video decoder 300 can scale the block vectors. For example, the block vector of luma block 704A is <-8, -6> and the block vector of luma block 704B is <-12, -4>. In this example, video encoder 200 and video decoder 300 can divide the x component and the y component by two, because the luma samples 700 are 4x the size of the chroma samples 702A or 702B (such as in the 4:2:0 sub-sampling format). As shown, the block vector of chroma sub-block 708A is <-4, -3>, where -4 is -8 divided by 2 and -3 is -6 divided by 2. The block vector of sub-block 708B is <-6, -2>, where -6 is -12 divided by 2 and -2 is -4 divided by 2. [0181] [0181] In this way, a video coder (such as video encoder 200 or video decoder 300) can be configured to partition samples of a first color component (such as the luma samples 700 of a luma component) according to a first partition tree (such as the first partition tree of Figure 18A) and to partition samples of a second color component according to a second partition tree.
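The inheritance and scaling just described can be sketched directly with the numbers of this example; the block rectangles below are illustrative stand-ins for the shapes of luma blocks 704A and 704B.

```python
def derive_chroma_subblock_bvs(luma_blocks, subsample=(2, 2)):
    """For each IBC-coded luma block (rect, mode, bv), return the
    co-located chroma sub-block and its inherited, scaled block vector;
    (2, 2) corresponds to the 4:2:0 sub-sampling format."""
    sx, sy = subsample
    sub_bvs = {}
    for (x, y, w, h), mode, bv in luma_blocks:
        if mode != "IBC":
            continue  # only IBC-coded luma blocks contribute a vector
        chroma_rect = (x // sx, y // sy, w // sx, h // sy)
        sub_bvs[chroma_rect] = (bv[0] // sx, bv[1] // sy)
    return sub_bvs

# Luma blocks 704A and 704B with block vectors <-8, -6> and <-12, -4>
# (rectangle coordinates are illustrative):
luma = [((0, 0, 8, 4), "IBC", (-8, -6)), ((0, 4, 8, 4), "IBC", (-12, -4))]
print(derive_chroma_subblock_bvs(luma))
# -> {(0, 0, 4, 2): (-4, -3), (0, 2, 4, 2): (-6, -2)}
```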
[0182] [0182] The video coder can determine a plurality of blocks of the first color component that correspond to a block of the second color component (as, for example, the blocks of partition 701 correspond to the chroma block 706). The plurality of blocks of the first color component is generated from the partitioning of the samples of the first color component according to the first partition tree (as, for example, the luma samples 700 of a luma component are partitioned to generate the luma partition 701 including the plurality of blocks), and the block of the second color component (such as, for example, block 706) is generated from the partitioning of the samples of the second color component according to the second partition tree. [0183] [0183] In one or more examples described in this disclosure, the video coder can partition the block of the second color component based on the first partition tree to generate sub-blocks of the second color component, each of which corresponds to a block of the plurality of blocks of the first color component. For example, as shown in Figure 18B, the chroma partition 710, which corresponds to the chroma block 706, is partitioned based on the first partition tree used for the luma samples 700. [0184] [0184] The video coder can determine one or more block vectors for one or more of the sub-blocks of the second color component that are predicted in the intra-block copy (IBC) prediction mode based on one or more block vectors of one or more corresponding blocks of the plurality of blocks of the first color component. For example, the chroma sub-blocks 708A and 708B are predicted in the IBC prediction mode, and the video coder can determine the block vectors for the chroma sub-blocks 708A and 708B based on the block vectors of the luma blocks 704A and 704B. [0185] [0185] The video coder can code the block of the second color component based on the one or more determined block vectors. For example, where the coding is encoding, for chroma sub-block 708A, video encoder 200 can determine a prediction block based on the block vector of sub-block 708A, subtract the prediction block from sub-block 708A to generate a residual block, and signal information indicative of the residual block. Where the coding is decoding, for chroma sub-block 708A, video decoder 300 can determine a prediction block based on the block vector of sub-block 708A, determine a residual block (for example, based on signaled information) for sub-block 708A, and add the residual block to the prediction block to reconstruct sub-block 708A of the chroma block 706. [0186] [0186] Alternatively, for some sub-blocks (set 0), the motion information can be inherited, while for other sub-blocks (set 1), the motion information can be derived from set 0 or derived from predefined motion vectors. In one example, in addition, a mode index is signaled to indicate the use of the inherited IBC mode for each of, or all of, the remaining color components. For each color component, or for the luma and chroma components, respectively, a flag can be signaled in the SPS/VPS/PPS/slice header/tile header to indicate whether IBC can be enabled or disabled.
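As a concrete illustration of the decoding flow of paragraph [0185], the following sketch reconstructs an IBC-coded (sub-)block by fetching its prediction from the already reconstructed area of the same image at the block vector offset and adding the residual; validity checks and clipping of the block vector are omitted.

```python
import numpy as np

def ibc_reconstruct(picture, block_rect, bv, residual):
    """Reconstruct an IBC-coded block: the prediction comes from the
    same picture at the block vector offset, then the residual is added."""
    x, y, w, h = block_rect
    rx, ry = x + bv[0], y + bv[1]            # reference position
    pred = picture[ry:ry + h, rx:rx + w]     # prediction block
    picture[y:y + h, x:x + w] = pred + residual
    return picture
```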
[0187] [0187] The following describes interactions of intra-block copy with OBMC. In one example, OBMC can always be disabled. Therefore, there is no need to signal an OBMC flag when the block is encoded with IBC. Alternatively, the OBMC flag can be signaled before the IBC mode indication (such as, for example, a reference index pointing to the current image, or a mode index representing the IBC mode); in this case, the IBC indication signaling is skipped. [0188] [0188] In one example, OBMC can also be activated, but with the restriction that the neighboring block be encoded in the same encoding mode. For example, if a current block is encoded with the IBC mode and one of the neighboring blocks of the current block exists but is not encoded with the IBC mode, that neighboring block is set as unavailable. That is, its motion parameters are not used in the OBMC process for the current block. Similarly, if the current block is encoded with a non-IBC mode and one of the neighboring blocks exists but is encoded with the IBC mode, that neighboring block is set as unavailable. That is, its motion parameters are not used in the OBMC process for the current block. [0189] [0189] In one example, OBMC can also be activated and all the motion information of the existing neighboring blocks can be used, regardless of the coded mode of the neighboring blocks. Alternatively, in addition, the weighting factors may further depend on the coded mode of the current block. Alternatively, in addition, the weighting factors may further depend on whether the current block and the neighboring block share the same coded mode. [0190] [0190] Sub-blocks with intra-block copy are described below. Sub-block IBC can be applied when a block is encoded using the IBC mode; however, for each sub-block within the block, the motion vectors (such as block vectors) may be different. For example, the block vector for sub-block 708A and the block vector for sub-block 708B may be different. [0191] [0191] In one example, a base motion vector can be signaled. The motion vector of a sub-block can depend on the base motion vector (see the sketch following paragraph [0193]). In one example, the base motion vector can be signaled as the difference between the optimal motion vector and a motion vector predictor. The motion vector predictor can be derived from the motion vectors of spatially and/or temporally neighboring blocks. In one example, the base motion vector can be a predictor for signaling the actual motion vector associated with a sub-block. In one example, the base motion vector can be the motion vector associated with one of the multiple sub-blocks; in that case, there is no longer a need to signal the motion vector for that sub-block. In one example, the base motion vector can be used as a starting point, and the motion vector of the sub-block can be derived (or refined) accordingly (as, for example, using the template matching method). [0192] [0192] In some examples, slice-level indices can be signaled to indicate the precision of the motion vectors for the IBC-coded blocks. Alternatively, the index can be signaled at the block level/region level/tile/PPS/SPS/VPS. In one example, the MV precision candidates may include, for example, integer-pel, quarter-pel, half-pel, two-pel and four-pel. In one example, the set of MV precision candidates may depend on coded information, such as the slice type, the temporal layer index or the motion vector range. [0193] [0193] Next, the interaction of intra-block copy with affine motion is described. In one example, the IBC mode is not signaled when the block is coded with the affine mode. In one example, the affine mode is not signaled when the block is encoded with the IBC mode. For example, the video coder can determine that a block is encoded in the IBC prediction mode. The video coder can then refrain from signaling or parsing information indicating whether the affine mode is enabled for the block, based on the determination that the block is encoded in the IBC prediction mode. For example, video encoder 200 may not signal information indicating whether the affine mode is enabled for the block, and video decoder 300 may not parse information indicating whether the affine mode is enabled for the block.
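Returning to the base motion vector scheme of paragraph [0191], one of the variants can be sketched as follows; the per-sub-block delta signaling is an assumption made purely for illustration.

```python
def derive_subblock_mvs(base_mv, signaled_deltas):
    """Each sub-block motion vector is coded relative to a signaled
    base motion vector; deltas are (dx, dy) pairs."""
    return [(base_mv[0] + dx, base_mv[1] + dy) for dx, dy in signaled_deltas]

# Two sub-blocks refining a shared base vector:
print(derive_subblock_mvs((-8, -4), [(0, 0), (-2, 1)]))
# -> [(-8, -4), (-10, -3)]
```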
[0194] [0194] The motion associated with IBC-coded blocks can be treated as conventional translational motion. Therefore, the exemplary techniques described in U.S. Application Serial No. 62/586,117, filed on November 14, 2017, and U.S. Application Serial No. 16/188,774, filed on November 13, 2018, may still work. Alternatively, a third category of motion is defined (in addition to the existing conventional translational motion and affine motion). And, for the merge mode, if the decoded merge index indicates the use of the IBC mode, only motion information belonging to the third category from neighboring blocks can be added to the merge candidate list. [0195] [0195] In some examples, it may be required to prohibit taking an IBC motion vector as the starting point for the ATMVP derivation process (see the sketch following paragraph [0199]). In one example, the image from which the motion is obtained cannot be the current image. In one example, as in the current ATMVP design, the starting point for fetching motion information is from the first available spatial merge candidate. With the proposed method, the first available non-IBC motion is used as the starting point. For example, if the first available spatial merge candidate is associated with IBC motion information and the second is associated with conventional translational motion, the second available spatial merge candidate is used as the starting point. If one of the sub-blocks from which the motion is fetched is coded with the IBC mode, it can be treated as an intra mode in the ATMVP process. In this case, for example, a predefined motion can be treated as the motion of that sub-block. Alternatively, the motion of a neighboring sub-block can be used as the motion of that sub-block. [0196] [0196] As an example, a video coder can determine that a block of the first color component or of the second color component is predicted using ATMVP, determine one or more blocks in a reference image used to perform ATMVP for the block, determine that at least one block of the one or more blocks in the reference image is predicted in the IBC prediction mode, and perform an ATMVP operation for the block without using a block vector used for the at least one block in the reference image predicted in the IBC prediction mode. [0197] [0197] The interaction of intra-block copy with illumination compensation (IC) is described below. In one example, the IBC mode is not signaled when the block is coded with the IC mode. In one example, the IC mode is not signaled when the block is coded with the IBC mode. [0198] [0198] For example, the video coder can determine that a block is encoded in the IBC prediction mode and refrain from signaling or parsing information indicating whether the illumination compensation (IC) mode is enabled for the block, based on the determination that the block is encoded in the IBC prediction mode. For example, video encoder 200 may not signal information indicative of whether the IC mode is enabled for the block, and video decoder 300 may not parse information indicative of whether the IC mode is enabled for the block. [0199] [0199] In one example, the IC mode can be applied to an IBC-encoded block. In this case, the neighboring samples of the reference block are in the same image as the current block.
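Returning to the ATMVP starting point rule of paragraph [0195], a minimal sketch of scanning for the first non-IBC spatial merge candidate might look as follows; the candidate structure is an illustrative assumption.

```python
def atmvp_starting_motion(spatial_merge_candidates):
    """Use the first spatial merge candidate whose motion is not IBC
    (does not refer to the current image) as the ATMVP starting point."""
    for cand in spatial_merge_candidates:
        if not cand.get("is_ibc", False):
            return cand["mv"], cand["ref_idx"]
    return None  # no usable non-IBC candidate
```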
[0200] [0200] In the following, the interaction of intra-block copy with adaptive motion vector precision is described. In one example, the IBC mode is signaled only when the block is encoded with a predefined subset of MV precisions, such as motion precisions that are at one or multiple integer-pixel scales. In one example, a subset of MV precisions is signaled when the block is coded with the IBC mode. In one example, the MV precision is not signaled when the block is coded with the IBC mode. The precision can be set at a higher level, such as in a slice header. [0201] [0201] For example, the video coder can determine that a first block of the first color component or of the second color component is not predicted in the IBC prediction mode and determine a first set of motion vector precisions for the first block. The video coder can determine that a second block of the first color component or of the second color component is predicted in the IBC prediction mode and determine a second set of motion vector precisions for the second block. The second set of motion vector precisions is a subset of the first set of motion vector precisions. [0202] [0202] The interaction of intra-block copy with bidirectional optical flow (BIO) is described below. In one example, BIO is not conducted if at least one of the two motion vectors refers to the current image, that is, if at least one of the two motion vectors is a block vector in IBC. For example, the video coder can determine that a block is encoded with a vector that refers to the image that includes the block. In this example, the video coder can refrain from performing BIO on the block, based on the block being encoded with a vector that refers to the image that includes the block. [0203] [0203] In another example, BIO is conducted even if at least one of the two motion vectors refers to the current image, that is, at least one of the two motion vectors is a block vector in IBC. In this case, the POC difference between the current frame and the reference frame for IBC, which is in fact also the current frame, is equal to zero. A fixed number not equal to 0 can be used to replace the POC difference in the derivation, such as in equations (3) to (7). [0204] [0204] The interaction of intra-block copy with Frame Rate Up-Conversion (FRUC) is described below. In one example, template matching cannot be conducted if the generated motion vector refers to the current image. In one example, template matching is conducted with the reference image identical to the current image if the generated motion vector refers to the current image. In one example, bilateral matching cannot be conducted if at least one of the two motion vectors refers to the current image, that is, if at least one of the two motion vectors is a block vector in IBC. For example, the video coder can determine that a block is encoded with a vector that refers to the image that includes the block. The video coder can refrain from bilateral matching on the block, based on the block being encoded with a vector that refers to the image that includes the block.
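The gating of BIO and bilateral matching on IBC described in the preceding paragraphs can be captured by a simple check; this sketch assumes that an IBC block vector is recognizable by its reference image having the same POC as the current image.

```python
def allow_bio_and_bilateral_matching(ref_pocs, current_poc):
    """Return False if either motion vector of a bi-predicted block
    refers to the current image (i.e., is an IBC block vector), in
    which case BIO and FRUC bilateral matching are not conducted."""
    uses_ibc = any(poc == current_poc for poc in ref_pocs)
    return not uses_ibc
```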
For explanatory purposes, this disclosure describes video encoder 200 in the context of video encoding standards, such as the HEVC video encoding standard and the H.266 video encoding standard under development (such as VVC). However, the techniques of this disclosure are not limited to these video encoding standards and are generally applicable to video encoding and decoding. [0206] [0206] In the example of Figure 3, video encoder 200 includes video data memory 230, mode selection unit 202, residual generation unit 204, transform processing unit 206, quantization unit 208, inverse quantization unit 210, inverse transform processing unit 212, reconstruction unit 214, filter unit 216, decoded picture buffer (DPB) 218 and entropy coding unit 220. [0207] [0207] Video data memory 230 can store video data to be encoded by the components of video encoder 200. Video encoder 200 can receive the video data stored in video data memory 230 from, for example, video source 104 (Figure 1). DPB 218 can act as a reference image memory that stores reference video data for use in predicting subsequent video data by video encoder 200. Video data memory 230 and DPB 218 can be formed by any of a variety of memory devices, such as dynamic random access memory (DRAM), which includes synchronous DRAM (SDRAM), magneto-resistive RAM (MRAM), resistive RAM (RRAM) or other types of memory devices. Video data memory 230 and DPB 218 can be provided by the same memory device or separate memory devices. In several examples, video data memory 230 may be on-chip with other components of video encoder 200, as shown, or off-chip relative to those components. [0208] [0208] In this disclosure, the reference to video data memory 230 should not be interpreted as being limited to memory internal to video encoder 200, unless specifically described as such, or to memory external to video encoder 200, unless specifically described as such. Instead, the reference to video data memory 230 should be understood as a reference to memory that stores the video data that video encoder 200 receives for encoding (such as, for example, video data for a current block that is to be encoded). Memory 106 of Figure 1 can also provide temporary storage of the outputs of the various units of video encoder 200. [0209] [0209] The various units of Figure 3 are shown to assist with understanding the operations performed by video encoder 200. The units can be implemented as fixed-function circuits, programmable circuits or a combination of them. Fixed-function circuits refer to circuits that provide specific functionality and are preconfigured in the operations that can be performed. Programmable circuits refer to circuits that can be programmed to perform various tasks and provide flexible functionality in the operations that can be performed. For example, programmable circuits can run software or firmware that makes the programmable circuits operate in the manner defined by the instructions of the software or firmware. Fixed-function circuits can execute software instructions (such as, for example, to receive parameters or output parameters), but the types of operations that fixed-function circuits perform are generally immutable. In some examples, one or more of the units may be separate circuit blocks (fixed-function or programmable) and, in some examples, one or more of the units may be integrated circuits.
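To make the data flow among these units concrete, the following is a minimal sketch of the encoding path of Figure 3. Every stage is a trivial stub, and all names are illustrative stand-ins for the units described above rather than an actual encoder API; only the call order, including the encoder-side reconstruction path, is meaningful.

```cpp
#include <cstdint>
#include <vector>

using Block = std::vector<int16_t>;

Block predict(const Block& cur) { return Block(cur.size(), 0); }   // unit 202
Block subtract(const Block& c, const Block& p) {                   // unit 204
    Block r(c.size());
    for (size_t i = 0; i < c.size(); ++i) r[i] = int16_t(c[i] - p[i]);
    return r;
}
Block transformAndQuantize(const Block& r) { return r; }        // units 206/208
Block inverseQuantizeAndTransform(const Block& q) { return q; } // units 210/212
Block add(const Block& p, const Block& r) {                     // unit 214
    Block out(p.size());
    for (size_t i = 0; i < p.size(); ++i) out[i] = int16_t(p[i] + r[i]);
    return out;
}

void encodeBlock(const Block& current, Block& reconstructedForDpb) {
    Block pred = predict(current);
    Block resid = subtract(current, pred);
    Block coeffs = transformAndQuantize(resid);   // entropy coded into the
                                                  // bit stream by unit 220
    // Reconstruction path: the encoder mirrors the decoder so that both
    // use identical reference samples for later prediction (DPB 218).
    reconstructedForDpb = add(pred, inverseQuantizeAndTransform(coeffs));
}
```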
[0210] [0210] Video encoder 200 may include arithmetic logic units (ALUs), elementary function units (EFUs), digital circuits, analog circuits and/or programmable cores formed from programmable circuits. In examples where the operations of video encoder 200 are performed using software executed by the programmable circuits, memory 106 (Figure 1) can store the object code of the software that video encoder 200 receives and executes, or other memory within video encoder 200 (not shown) can store such instructions. [0211] [0211] Video data memory 230 is configured to store received video data. Video encoder 200 can retrieve an image of the video data from video data memory 230 and supply the video data to residual generation unit 204 and mode selection unit 202. The video data in video data memory 230 can be raw video data that is to be encoded. [0212] [0212] Mode selection unit 202 includes a motion estimation unit 222, a motion compensation unit 224 and an intra-prediction unit 226. [0213] [0213] Mode selection unit 202 generally coordinates multiple encoding passes to test combinations of encoding parameters and the resulting rate-distortion values for such combinations. The encoding parameters can include partitioning of CTUs into CUs, prediction modes for the CUs, transform types for the residual data of the CUs, quantization parameters for the residual data of the CUs, and so on. Mode selection unit 202 can finally select the combination of encoding parameters that has rate-distortion values that are better than the other tested combinations. [0214] [0214] Video encoder 200 can partition an image retrieved from video data memory 230 into a series of CTUs and encapsulate one or more CTUs within a slice. Mode selection unit 202 can partition a CTU of the image according to a tree structure, such as the QTBT structure or the HEVC quad-tree structure described above. As described above, video encoder 200 can form one or more CUs from the partitioning of a CTU according to the tree structure. Such a CU can also be referred to generally as a “video block” or “block”. [0215] [0215] In general, mode selection unit 202 also controls its components (such as motion estimation unit 222, motion compensation unit 224 and intra-prediction unit 226) to generate a prediction block for a current block (such as a current CU or, in HEVC, the overlapping portion of a PU and a TU). For inter-prediction of a current block, motion estimation unit 222 can perform a motion search to identify one or more closely matching reference blocks in one or more reference images (such as one or more previously encoded images stored in DPB 218). In particular, motion estimation unit 222 can calculate a value representative of how similar a potential reference block is to the current block, for example, according to the sum of absolute differences (SAD), the sum of squared differences (SSD), the mean absolute difference (MAD), the mean squared difference (MSD) or the like. Motion estimation unit 222 can generally perform these calculations using sample-by-sample differences between the current block and the reference block being considered. Motion estimation unit 222 can identify the reference block that has the lowest value resulting from these calculations, indicating the reference block that most closely matches the current block.
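The sum of absolute differences mentioned in [0215] can be computed as in the following sketch; the buffer layout (row-major samples with an explicit stride) is an assumption made for the example.

```cpp
#include <cstdint>
#include <cstdlib>

// Sum-of-absolute-differences cost between the current block and a
// candidate reference block. A smaller cost means a closer match.
int sad(const uint8_t* cur, const uint8_t* ref,
        int width, int height, int curStride, int refStride) {
    int cost = 0;
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            cost += std::abs(int(cur[y * curStride + x]) -
                             int(ref[y * refStride + x]));
    return cost;
}
```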
[0216] [0216] Motion estimation unit 222 can form one or more motion vectors (MVs) that define the positions of the reference blocks in the reference images relative to the position of the current block in a current image. Motion estimation unit 222 can then supply the motion vectors to motion compensation unit 224. For example, for unidirectional inter-prediction, motion estimation unit 222 can provide a single motion vector, while for bidirectional inter-prediction, motion estimation unit 222 can provide two motion vectors. Motion compensation unit 224 can then generate a prediction block using the motion vectors. For example, motion compensation unit 224 can retrieve data of the reference block using the motion vector. As another example, if the motion vector has fractional sample precision, motion compensation unit 224 can interpolate values for the prediction block according to one or more interpolation filters. In addition, for bidirectional inter-prediction, motion compensation unit 224 can retrieve data for two reference blocks identified by the respective motion vectors and combine the retrieved data, for example, through sample-by-sample averaging or weighted averaging. [0217] [0217] As another example, for intra-prediction, or intra-prediction coding, intra-prediction unit 226 can generate the prediction block from samples neighboring the current block. For example, for directional modes, intra-prediction unit 226 can generally mathematically combine values of neighboring samples and fill in those calculated values along the direction defined across the current block to produce the prediction block. As another example, for DC mode, intra-prediction unit 226 can calculate an average of the samples neighboring the current block and generate the prediction block to include that resulting average for each sample of the prediction block. [0218] [0218] Mode selection unit 202 provides the prediction block to residual generation unit 204. Residual generation unit 204 receives a raw, unencoded version of the current block from video data memory 230 and the prediction block from mode selection unit 202. Residual generation unit 204 calculates sample-by-sample differences between the current block and the prediction block. The resulting sample-by-sample differences define a residual block for the current block. In some examples, residual generation unit 204 can also determine differences between the sample values in the residual block to generate a residual block using residual differential pulse code modulation (RDPCM). In some examples, residual generation unit 204 can be formed using one or more subtractor circuits that perform binary subtraction. [0219] [0219] In examples where mode selection unit 202 partitions CUs into PUs, each PU can be associated with a luma prediction unit and corresponding chroma prediction units. Video encoder 200 and video decoder 300 can support PUs that have different sizes. As indicated above, the size of a CU can refer to the size of the luma coding block of the CU and the size of a PU can refer to the size of a luma prediction unit of the PU. Assuming that the size of a specific CU is 2Nx2N, video encoder 200 can support PU sizes of 2Nx2N or NxN for intra-prediction and symmetric PU sizes of 2Nx2N, 2NxN, Nx2N, NxN or similar for inter-prediction. Video encoder 200 and video decoder 300 can also support asymmetric partitioning for PU sizes of 2NxnU, 2NxnD, nLx2N and nRx2N for inter-prediction.
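As a simplified illustration of the interpolation for fractional sample precision mentioned in [0216], the sketch below uses bilinear filtering with quarter-pel fractional offsets. Actual codecs such as HEVC use longer separable filters (for example, 8-tap for luma), so this is not the filter any particular standard specifies.

```cpp
#include <cstdint>

// Bilinear interpolation of one prediction sample. (x, y) is the integer
// sample position in the reference image and (fx, fy) the fractional part
// in quarter-pel units (0..3). The caller guarantees that (x + 1, y + 1)
// stays inside the reference buffer.
uint8_t interpolate(const uint8_t* ref, int stride,
                    int x, int y, int fx, int fy) {
    int a = ref[y * stride + x];
    int b = ref[y * stride + x + 1];
    int c = ref[(y + 1) * stride + x];
    int d = ref[(y + 1) * stride + x + 1];
    int top = a * (4 - fx) + b * fx;   // horizontal blend, division deferred
    int bot = c * (4 - fx) + d * fx;
    return uint8_t((top * (4 - fy) + bot * fy + 8) >> 4);  // /16 with rounding
}
```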
[0220] [0220] In examples where mode selection unit 202 does not further partition a CU into PUs, each CU can be associated with a luma coding block and corresponding chroma coding blocks. As above, the size of a CU can refer to the size of the luma coding block of the CU. Video encoder 200 and video decoder 300 can support CU sizes of 2Nx2N, 2NxN or Nx2N. [0221] [0221] For other video encoding techniques, such as encoding in intra-block copy mode, encoding in affine mode and encoding in linear model (LM) mode, as some examples, mode selection unit 202, through the respective units associated with those coding techniques, generates a prediction block for the current block being encoded. In some examples, such as encoding in palette mode, mode selection unit 202 may not generate a prediction block and, instead, generates syntax elements that indicate the way in which the block is to be reconstructed based on a selected palette. In such modes, mode selection unit 202 can provide these syntax elements to entropy coding unit 220 to be encoded. [0222] [0222] As described above, residual generation unit 204 receives the video data for the current block and the corresponding prediction block. Residual generation unit 204 then generates a residual block for the current block. To generate the residual block, residual generation unit 204 calculates sample-by-sample differences between the prediction block and the current block. [0223] [0223] Transform processing unit 206 applies one or more transforms to the residual block to generate a block of transform coefficients (hereinafter referred to as a "transform coefficient block"). Transform processing unit 206 can apply various transforms to a residual block to form the transform coefficient block. For example, transform processing unit 206 can apply a discrete cosine transform (DCT), a directional transform, a Karhunen-Loeve transform (KLT) or a conceptually similar transform to a residual block. In some examples, transform processing unit 206 can perform multiple transforms on a residual block, such as, for example, a primary transform and a secondary transform, such as a rotational transform. In some examples, transform processing unit 206 does not apply transforms to a residual block. [0224] [0224] Quantization unit 208 can quantize the transform coefficients in a transform coefficient block to produce a quantized transform coefficient block. Quantization unit 208 can quantize the transform coefficients of a transform coefficient block according to a quantization parameter (QP) value associated with the current block. Video encoder 200 (such as, for example, through mode selection unit 202) can adjust the degree of quantization applied to the coefficient blocks associated with the current block by adjusting the QP value associated with the CU. Quantization can introduce loss of information and, thus, the quantized transform coefficients may have lower precision than the original transform coefficients produced by transform processing unit 206. [0225] [0225] Inverse quantization unit 210 and inverse transform processing unit 212 can apply inverse quantization and inverse transforms, respectively, to a quantized transform coefficient block to reconstruct a residual block from the transform coefficient block.
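A minimal sketch of the scalar quantization of [0224] and its inverse follows. It assumes an HEVC-style relationship in which the quantization step size roughly doubles for every increase of 6 in QP; real codecs use integer scaling tables and more elaborate, rate-distortion-aware rounding, all omitted here for clarity.

```cpp
#include <cmath>
#include <cstdlib>

// Approximate step size for a given QP (step doubles every 6 QP units).
static double stepSize(int qp) { return std::pow(2.0, (qp - 4) / 6.0); }

// Quantize one transform coefficient to a level (round to nearest).
int quantize(int coeff, int qp) {
    int level = int(std::abs(coeff) / stepSize(qp) + 0.5);
    return coeff < 0 ? -level : level;
}

// Inverse quantization: scale the level back to a reconstructed coefficient.
int dequantize(int level, int qp) {
    return int(level * stepSize(qp));
}
```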
Reconstruction unit 214 can produce a reconstructed block corresponding to the current block (albeit potentially with some degree of distortion) based on the reconstructed residual block and a prediction block generated by mode selection unit 202. For example, reconstruction unit 214 can add samples of the reconstructed residual block to corresponding samples of the prediction block generated by mode selection unit 202 to produce the reconstructed block. [0226] [0226] Filter unit 216 can perform one or more filter operations on reconstructed blocks. For example, filter unit 216 can perform deblocking operations to reduce blocking artifacts along the edges of the CUs. As shown by dashed lines, the operations of filter unit 216 can be skipped in some examples. [0227] [0227] Video encoder 200 stores reconstructed blocks in DPB 218. For example, in examples where the operations of filter unit 216 are not required, reconstruction unit 214 can store the reconstructed blocks in DPB 218. In examples where the operations of filter unit 216 are required, filter unit 216 can store the filtered reconstructed blocks in DPB 218. Motion estimation unit 222 and motion compensation unit 224 can retrieve a reference image from DPB 218, formed from the reconstructed (and potentially filtered) blocks, to inter-predict blocks of subsequently encoded images. In addition, intra-prediction unit 226 can use reconstructed blocks in DPB 218 of a current image to intra-predict other blocks in the current image. [0228] [0228] In general, entropy coding unit 220 can entropy encode syntax elements received from other functional components of video encoder 200. For example, entropy coding unit 220 can entropy encode blocks of quantized transform coefficients from quantization unit 208. As another example, entropy coding unit 220 can entropy encode prediction syntax elements (such as, for example, motion information for inter-prediction or intra-mode information for intra-prediction) from mode selection unit 202. Entropy coding unit 220 can perform one or more entropy coding operations on the syntax elements, which are another example of video data, to generate entropy-encoded data. For example, entropy coding unit 220 can perform a CABAC operation, a context-adaptive variable-length coding (CAVLC) operation, a variable-to-variable length coding (V2V) operation, a syntax-based context-adaptive binary arithmetic coding (SBAC) operation, a probability interval partitioning entropy (PIPE) coding operation, an Exponential Golomb coding operation or another type of entropy coding operation on the data. In some examples, entropy coding unit 220 may operate in bypass mode, where the syntax elements are not entropy encoded. [0229] [0229] Video encoder 200 can output a bit stream that includes the entropy-encoded syntax elements necessary to reconstruct the blocks of a slice or image. In particular, entropy coding unit 220 can output the bit stream. [0230] [0230] The operations described above are described with respect to a block. Such description should be understood as being operations for a luma coding block and/or chroma coding blocks. As described above, in some examples, the luma coding block and the chroma coding blocks are the luma and chroma components of a CU. In some examples, the luma coding block and the chroma coding blocks are the luma and chroma components of a PU.
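Of the entropy coding operations listed in [0228], Exponential Golomb coding is simple enough to sketch in full. The version below emits the unsigned (order-0) codeword as a character string for readability; a real entropy coding unit writes bits into a bit stream buffer instead.

```cpp
#include <cstdint>
#include <string>

// Unsigned order-0 Exp-Golomb codeword for `value`: a prefix of
// floor(log2(value + 1)) zero bits followed by (value + 1) in binary.
std::string expGolombEncode(uint32_t value) {
    uint32_t codeNum = value + 1;
    int bits = 0;
    for (uint32_t v = codeNum; v > 1; v >>= 1) ++bits;  // floor(log2(codeNum))
    std::string out(bits, '0');                         // leading-zero prefix
    for (int i = bits; i >= 0; --i)                     // codeNum in binary
        out += ((codeNum >> i) & 1) ? '1' : '0';
    return out;  // value 0 -> "1", 1 -> "010", 2 -> "011", 3 -> "00100"
}
```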
[0231] [0231] In some examples, operations performed with respect to a luma coding block do not need to be repeated for the chroma coding blocks. As an example, the operations to identify a motion vector (MV) and a reference image for a luma coding block do not need to be repeated to identify an MV and a reference image for the chroma blocks. Instead, the MV for the luma coding block can be scaled to determine the MV for the chroma blocks, and the reference image can be the same. As another example, the intra-prediction process can be the same for the luma coding blocks and the chroma coding blocks. [0232] [0232] Video encoder 200 represents an example of a device configured to encode video data, which includes a memory configured to store video data and one or more processing units implemented in circuits and configured to perform the exemplary encoding operations described in this disclosure, including the operations for the various interactions between intra-block copying and the different encoding modes described above. [0233] [0233] Figure 4 is a block diagram showing an example of a video decoder 300 that can perform the techniques of this disclosure. Figure 4 is presented for explanatory purposes and is not limiting of the techniques as broadly exemplified and described in this disclosure. For purposes of explanation, this disclosure describes video decoder 300 according to the techniques of JEM and HEVC. However, the techniques of this disclosure can be performed by video coding devices configured for other video encoding standards. [0234] [0234] In the example of Figure 4, video decoder 300 includes coded picture buffer (CPB) memory 320, entropy decoding unit 302, prediction processing unit 304, inverse quantization unit 306, inverse transform processing unit 308, reconstruction unit 310, filter unit 312 and decoded picture buffer (DPB) 314. Prediction processing unit 304 includes motion compensation unit 316 and intra-prediction unit 318. Prediction processing unit 304 may include additional units to perform prediction according to other prediction modes. As examples, prediction processing unit 304 may include a palette unit, an intra-block copy unit (which may be part of motion compensation unit 316), an affine unit, a linear model (LM) unit or the like. In other examples, video decoder 300 may include more, fewer or different functional components. [0235] [0235] CPB memory 320 can store video data, such as an encoded video bit stream, to be decoded by the components of video decoder 300. The video data stored in CPB memory 320 can be obtained, for example, from the computer-readable medium 110 (Figure 1). CPB memory 320 may include a CPB that stores encoded video data (such as syntax elements) of an encoded video bit stream. In addition, CPB memory 320 can store video data other than syntax elements of an encoded image, such as temporary data representing outputs of the various units of video decoder 300. DPB 314 generally stores decoded images, which video decoder 300 can output and/or use as reference video data when decoding subsequent data or images of the encoded video bit stream. CPB memory 320 and DPB 314 can be formed by any of a variety of memory devices, such as dynamic random access memory (DRAM), which includes synchronous DRAM (SDRAM), magneto-resistive RAM (MRAM), resistive RAM (RRAM) or other types of memory devices.
CPB memory 320 and DPB 314 can be provided by the same memory device or separate memory devices. In several examples, CPB memory 320 may be on-chip with other components of video decoder 300 or off-chip relative to those components. [0236] [0236] Additionally or alternatively, in some examples, video decoder 300 can retrieve encoded video data from memory 120 (Figure 1). That is, memory 120 can store data as discussed above with respect to CPB memory 320. Similarly, memory 120 can store instructions to be executed by video decoder 300, when some or all of the functionality of video decoder 300 is implemented in software to be executed by the processing circuitry of video decoder 300. [0237] [0237] The various units shown in Figure 4 are shown to assist with understanding the operations performed by video decoder 300. The units can be implemented as fixed-function circuits, programmable circuits or a combination of them. Similar to Figure 3, fixed-function circuits refer to circuits that provide specific functionality and are preconfigured in the operations that can be performed. Programmable circuits refer to circuits that can be programmed to perform various tasks and provide flexible functionality in the operations that can be performed. For example, programmable circuits can run software or firmware that makes the programmable circuits operate in the manner defined by the instructions of the software or firmware. Fixed-function circuits can execute software instructions (such as, for example, to receive parameters or output parameters), but the types of operations that fixed-function circuits perform are generally immutable. In some examples, one or more of the units may be separate circuit blocks (fixed-function or programmable) and, in some examples, one or more of the units may be integrated circuits. [0238] [0238] Video decoder 300 may include ALUs, EFUs, digital circuits, analog circuits and/or programmable cores formed from programmable circuits. In examples where the operations of video decoder 300 are performed by software running on the programmable circuits, on-chip or off-chip memory can store the instructions (such as object code) of the software that video decoder 300 receives and executes. [0239] [0239] Entropy decoding unit 302 can receive encoded video data from the CPB and entropy decode the video data to reproduce syntax elements. Prediction processing unit 304, inverse quantization unit 306, inverse transform processing unit 308, reconstruction unit 310 and filter unit 312 can generate decoded video data based on the syntax elements extracted from the bit stream. [0240] [0240] In general, video decoder 300 reconstructs an image on a block-by-block basis. Video decoder 300 can perform a reconstruction operation on each block individually (where the block currently being reconstructed, that is, decoded, can be referred to as a "current block"). [0241] [0241] Entropy decoding unit 302 can entropy decode syntax elements that define quantized transform coefficients of a quantized transform coefficient block, as well as transform information, such as a quantization parameter (QP) and/or transform mode indications. Inverse quantization unit 306 can use the QP associated with the quantized transform coefficient block to determine a degree of quantization and, similarly, a degree of inverse quantization for inverse quantization unit 306 to apply.
Inverse quantization unit 306 can, for example, perform a bitwise left-shift operation to inverse quantize the quantized transform coefficients. Inverse quantization unit 306 can thus form a transform coefficient block that includes transform coefficients. [0242] [0242] After inverse quantization unit 306 forms the transform coefficient block, inverse transform processing unit 308 can apply one or more inverse transforms to the transform coefficient block to generate a residual block associated with the current block. For example, inverse transform processing unit 308 can apply an inverse DCT, an inverse integer transform, an inverse Karhunen-Loeve transform (KLT), an inverse rotational transform, an inverse directional transform or another inverse transform to the coefficient block. [0243] [0243] In addition, prediction processing unit 304 generates a prediction block according to the prediction information syntax elements that were entropy decoded by entropy decoding unit 302. For example, if the prediction information syntax elements indicate that the current block is inter-predicted, motion compensation unit 316 can generate the prediction block. In this case, the prediction information syntax elements can indicate a reference image in DPB 314 from which to retrieve a reference block, as well as a motion vector that identifies a location of the reference block in the reference image relative to the location of the current block in the current image. Motion compensation unit 316 can generally perform the inter-prediction process in a manner that is substantially similar to that described with respect to motion compensation unit 224 (Figure 3). [0244] [0244] As another example, if the prediction information syntax elements indicate that the current block is intra-predicted, intra-prediction unit 318 can generate the prediction block according to an intra-prediction mode indicated by the prediction information syntax elements. Again, intra-prediction unit 318 can generally perform the intra-prediction process in a manner substantially similar to that described with respect to intra-prediction unit 226 (Figure 3). Intra-prediction unit 318 can retrieve data of neighboring samples for the current block from DPB 314. [0245] [0245] Reconstruction unit 310 can reconstruct the current block using the prediction block and the residual block. For example, reconstruction unit 310 can add samples of the residual block to the corresponding samples of the prediction block to reconstruct the current block. [0246] [0246] Filter unit 312 can perform one or more filter operations on reconstructed blocks. For example, filter unit 312 can perform deblocking operations to reduce blocking artifacts along the edges of the reconstructed blocks. As shown by the dashed lines in Figure 4, the operations of filter unit 312 are not necessarily performed in all examples. [0247] [0247] Video decoder 300 can store the reconstructed blocks in DPB 314. As discussed above, DPB 314 can provide reference information, such as samples of a current image for intra-prediction and previously decoded images for subsequent motion compensation, to prediction processing unit 304.
In addition, video decoder 300 can output decoded images from the DPB for subsequent display on a display device, such as display device 118 of Figure 1. [0248] [0248] In this way, video decoder 300 represents an example of a video decoding device that includes a memory configured to store video data and one or more processing units implemented in circuits and configured to perform the exemplary decoding operations described in this disclosure, including the operations for the various interactions between intra-block copying and the different encoding modes described above. [0249] [0249] Figure 19 is a flowchart showing an example of a method for coding video data. For ease of description, the exemplary techniques are described with respect to a video coder (such as, for example, video encoder 200 or video decoder 300) configured to code (such as, for example, encode or decode). In the example shown in Figure 19, the video coder can retrieve video data, such as samples of a first color component and a second color component, from memory (such as video data memory 230, DPB 218 or some other video data memory for video encoder 200, or CPB memory 320, DPB 314 or some other video data memory for video decoder 300). [0250] [0250] Video encoder 200 can partition the samples of a first color component according to a first partition tree and partition the samples of a second color component according to a second partition tree. Video encoder 200 can signal information indicative of the partitioning to video decoder 300, so that video decoder 300 can determine the blocks for which video decoder 300 is receiving information. [0251] [0251] In the examples described in this disclosure, the second partition tree is different from the first partition tree and the second color component is different from the first color component. The first color component is a luma component and the second color component is a chroma component, or vice versa. As an example, the video coder can partition the samples of the first color component according to the luma QTBT structure, as shown in Figures 2B, 17A and 18A. The video coder can partition the samples of the second color component according to the chroma QTBT structure, as shown in Figure 17B. [0252] [0252] The video coder can determine a plurality of blocks of the first color component that corresponds to a block of the second color component (800). As described above, the plurality of blocks of the first color component is generated from the partitioning of samples of the first color component according to a first partition tree (such as, for example, the luma QTBT structure of Figures 2B, 17A and 18A) and the block of the second color component is generated from the partitioning of samples of the second color component according to a second partition tree (such as, for example, the chroma QTBT structure of Figure 17B). The plurality of blocks of the first color component and the block of the second color component can each be part of the same coding block of an image of the video data.
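A minimal sketch of locating the luma region that corresponds to a chroma block under decoupled partition trees follows; the rectangle type and the explicit subsampling factors are assumptions made for the example. The luma blocks that overlap the returned region are the plurality of blocks of the first color component of step (800).

```cpp
struct Rect { int x, y, width, height; };

// Map a chroma block to its collocated luma region by scaling position
// and size up by the subsampling factors (e.g., 2 and 2 for 4:2:0).
Rect collocatedLumaRegion(const Rect& chromaBlock,
                          int subSampleX, int subSampleY) {
    return { chromaBlock.x * subSampleX,
             chromaBlock.y * subSampleY,
             chromaBlock.width * subSampleX,
             chromaBlock.height * subSampleY };
}
```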
[0253] [0253] For example, to determine the plurality of blocks of the first color component that corresponds to a block of the second color component, the video coder can determine the location of the plurality of blocks of the first color component and of the block of the second color component, whether the sample values of the blocks of the first color component and the sample values of the block of the second color component together form sample values of samples of a coding block, and the like. As an example, luma partition 701, which includes a plurality of luma blocks, corresponds to chroma block 706. The sample values of luma partition 701 and chroma block 706 together form sample values of the samples of a CU (such as, for example, a first sample of luma partition 701 and a first sample of chroma block 706 together form a first sample of the CU, a second sample of luma partition 701 and a second sample of chroma block 706 together form a second sample of the CU, and so on). [0254] [0254] The video coder can partition the block of the second color component based on the first partition tree to generate sub-blocks of the second color component, each corresponding to a block of the plurality of blocks of the first color component (802). [0255] [0255] In one or more examples, the video coder can determine one or more block vectors for one or more of the sub-blocks of the second color component that are predicted in the intra-block copy (IBC) prediction mode based on one or more block vectors of one or more corresponding blocks of the plurality of blocks of the first color component (804). For example, the video coder may determine that the block of the second color component is to inherit one or more block vectors of one or more corresponding blocks of the plurality of blocks of the first color component (such as, for example, based on signaled information, such as a reference index into a list of reference images). [0256] [0256] In response to the determination that the block of the second color component inherits one or more block vectors of one or more corresponding blocks of the plurality of blocks of the first color component, the video coder can determine one or more block vectors for one or more sub-blocks that are predicted in the IBC prediction mode based on one or more block vectors of one or more corresponding blocks of the plurality of blocks of the first color component. For example, the video coder can determine that chroma sub-blocks 708A and 708B are to inherit the block vectors of luma blocks 704A and 704B and, in response, the video coder can determine the block vectors for chroma sub-blocks 708A and 708B based on the block vectors of luma blocks 704A and 704B. [0257] [0257] In some examples, to determine the one or more block vectors for the one or more sub-blocks, the video coder can be configured to scale one or more block vectors of one or more corresponding blocks of the plurality of blocks of the first color component based on a subsampling format of the first color component and the second color component. For example, in Figures 18A and 18B, the 4:2:2 subsampling format is used and, therefore, the video coder divides the x and y components of the block vectors of luma blocks 704A and 704B to determine the block vectors for sub-blocks 708A and 708B, respectively.
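The sketch below combines two of the steps just described, under illustrative names: scaling a corresponding luma block vector according to the subsampling format ([0257]) to obtain a chroma sub-block's vector, and forming the IBC prediction by copying previously reconstructed samples of the same image at the position given by that vector. Validity of the referenced area (already reconstructed and inside the image) is assumed to be checked elsewhere.

```cpp
#include <cstdint>
#include <vector>

struct BlockVector { int x, y; };

// Scale a luma block vector for use by a chroma sub-block. The divisors
// follow the subsampling format; integer division is used here, and real
// designs define the rounding of negative components explicitly.
BlockVector scaleLumaBvForChroma(BlockVector lumaBv,
                                 int subSampleX, int subSampleY) {
    return { lumaBv.x / subSampleX, lumaBv.y / subSampleY };
}

// Form an IBC prediction block by copying samples from a previously
// reconstructed region of the *same* image, offset by the block vector.
void ibcPredict(const std::vector<uint8_t>& picture, int picStride,
                int blockX, int blockY, int width, int height,
                BlockVector bv, std::vector<uint8_t>& prediction) {
    prediction.resize(size_t(width) * height);
    const int refX = blockX + bv.x;
    const int refY = blockY + bv.y;
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            prediction[size_t(y) * width + x] =
                picture[size_t(refY + y) * picStride + (refX + x)];
}
```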
[0258] [0258] Furthermore, in some examples, at least one of the plurality of block vectors for one of the chroma sub-blocks is different from another of the plurality of block vectors for another of the chroma sub-blocks. For example, the block vector for chroma sub-block 708A is different from the block vector for chroma sub-block 708B. [0259] [0259] The video coder can be configured to code the block of the second color component based on the one or more determined block vectors (806). In the examples where the video coder is video encoder 200, video encoder 200 can be configured to encode the block of the second color component based on the one or more determined block vectors. For example, video encoder 200 can be configured to determine one or more prediction blocks based on the one or more block vectors determined for the one or more sub-blocks, subtract the one or more prediction blocks from the respective one or more sub-blocks to generate one or more residual blocks, and signal information indicative of the one or more residual blocks. [0260] [0260] In the examples where the video coder is video decoder 300, video decoder 300 can be configured to decode the block of the second color component based on the one or more determined block vectors. For example, video decoder 300 can be configured to determine one or more prediction blocks based on the one or more block vectors determined for the one or more sub-blocks, determine one or more residual blocks (such as based on signaled information) for the one or more sub-blocks, and add the one or more residual blocks to the respective one or more prediction blocks to reconstruct the sub-blocks of the block of the second color component. [0261] [0261] As described above, this disclosure describes techniques for applying the IBC prediction mode with video coding techniques. For example, assume that the block of the second color component of Figure 19 is a first block in a first image. In some examples, the video coder can be configured to determine that a second block, in a second image, of the first color component or the second color component is not predicted in the IBC prediction mode, determine a first set of motion vector precision for the second block, determine that a third block of the first color component or the second color component is predicted in the IBC prediction mode, and determine a second set of motion vector precision for the third block, where the second set of motion vector precision is a subset of the first set of motion vector precision. [0262] [0262] As another example, assume that the block of the second color component of Figure 19 is a first block in a first image. In some examples, the video coder can be configured to determine that a second block, in a second image, of the first color component or the second color component is predicted using alternative temporal motion vector prediction (ATMVP), determine one or more blocks in a reference image used to perform ATMVP on the second block, determine that at least one block of the one or more blocks in the reference image is predicted in the IBC prediction mode, and perform an ATMVP operation for the second block without using a block vector used for the at least one block in the reference image that is predicted in the IBC prediction mode. [0263] [0263] As another example, assume that the block of the second color component of Figure 19 is a first block in a first image.
In some examples, the video coder can be configured to determine that a second block in a second image is encoded in the IBC prediction mode and at least avoid signaling or parsing information indicating whether the affine mode is enabled for the second block, based on the determination that the second block is encoded in the IBC prediction mode. For example, video encoder 200 may not signal information indicating whether the affine mode is enabled for the second block, and video decoder 300 may not parse information indicating whether the affine mode is enabled for the second block. [0264] [0264] As another example, assume that the block of the second color component of Figure 19 is a first block in a first image. In some examples, the video coder can be configured to determine that a second block in a second image is encoded in the IBC prediction mode and at least avoid signaling or parsing information indicating whether the illumination compensation (IC) mode is enabled for the second block, based on the determination that the second block is encoded in the IBC prediction mode. For example, video encoder 200 may not signal information indicating whether IC mode is enabled for the second block, and video decoder 300 may not parse information indicating whether IC mode is enabled for the second block. [0265] [0265] As another example, assume that the block of the second color component of Figure 19 is a first block in a first image. In some examples, the video coder can be configured to determine that a second block in a second image is encoded with a vector that refers to the second image and can avoid performing bidirectional optical flow (BIO) on the second block based on the second block being encoded with the vector that refers to the second image. [0266] [0266] As another example, assume that the block of the second color component of Figure 19 is a first block in a first image. In some examples, the video coder can be configured to determine that a second block in a second image is encoded with a vector that refers to the second image and can avoid bilateral matching on the second block based on the second block being encoded with the vector that refers to the second image. [0267] [0267] It should be recognized that, depending on the example, certain acts or events of any of the techniques described here can be performed in a different sequence, can be added, merged or left out altogether (as, for example, not all the described acts or events are necessary for the practice of the techniques). Furthermore, in certain examples, acts or events can be performed concurrently, such as, for example, through multi-threaded processing, interrupt processing or multiple processors, rather than sequentially. [0268] [0268] In one or more examples, the functions described can be implemented in hardware, software, firmware or any combination of them. If implemented in software, the functions can be stored on or transmitted over a computer-readable medium, as one or more instructions or code, and executed by a hardware-based processing unit. [0269] [0269] By way of example, and not by way of limitation, computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. In addition, any connection is properly called a computer-readable medium.
For example, if instructions are transmitted from a website, server or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (“DSL”) or wireless technologies such as infrared, radio and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL or wireless technologies such as infrared, radio and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals or other transient media, but are instead directed to tangible, non-transitory storage media. Disk and disc, as used here, include compact disc (“CD”), laser disc, optical disc, digital versatile disc (“DVD”), floppy disk and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media. [0270] [0270] The instructions can be executed by one or more processors, such as one or more digital signal processors (DSPs), general-purpose microprocessors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs) or other equivalent integrated or discrete logic circuits. Therefore, the term "processor", as used here, can refer to any of the preceding structures or to any other structure suitable for implementing the techniques described here. In addition, in some aspects, the functionality described here can be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated into a combined CODEC. In addition, the techniques can be fully implemented in one or more circuits or logic elements. [0271] [0271] The techniques of this disclosure can be implemented in a wide variety of devices or apparatuses, including a wireless telephone handset, an integrated circuit (IC) or a set of ICs (a chip set, for example). Various components, modules or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but do not necessarily require realization by different hardware units. Instead, as described above, various units can be combined into a codec hardware unit or provided by a collection of interoperating hardware units, which include one or more processors as described above, together with appropriate software and/or firmware. [0272] [0272] Several examples have been described. These and other examples are within the scope of the following claims.
Claims (30) [1] 1. Method for coding video data, comprising: determining a plurality of blocks of a first color component that corresponds to a block of a second color component, wherein the plurality of blocks of the first color component is generated from the partitioning of samples of the first color component according to a first partition tree, and the block of the second color component is generated from the partitioning of samples of the second color component according to a second partition tree; partitioning the block of the second color component based on the first partition tree to generate sub-blocks of the second color component that each correspond to a block of the plurality of blocks of the first color component; determining one or more block vectors for one or more of the sub-blocks of the second color component that are predicted in the intra-block copy (IBC) prediction mode based on one or more block vectors of one or more corresponding blocks of the plurality of blocks of the first color component; and coding the block of the second color component based on the one or more determined block vectors. [2] A method according to claim 1, wherein determining one or more block vectors for one or more sub-blocks comprises determining a plurality of block vectors for a plurality of sub-blocks and wherein at least one of the plurality of block vectors for one of the sub-blocks is different from another of the plurality of block vectors for another of the sub-blocks. [3] A method according to claim 1, wherein determining one or more block vectors for one or more sub-blocks comprises scaling one or more block vectors of one or more corresponding blocks of the plurality of blocks of the first color component based on a subsampling format of the first color component and the second color component. [4] A method according to claim 1, wherein the first color component comprises a luma component and the second color component comprises a chroma component. [5] A method according to claim 1, which further comprises: determining that the block of the second color component inherits one or more block vectors of one or more corresponding blocks of the plurality of blocks of the first color component, wherein determining one or more block vectors for one or more of the sub-blocks of the second color component comprises, in response to the determination that the block of the second color component is to inherit one or more block vectors of one or more corresponding blocks of the plurality of blocks of the first color component, determining one or more block vectors for one or more sub-blocks that are predicted in the IBC prediction mode based on one or more block vectors of one or more corresponding blocks of the plurality of blocks of the first color component. [6] A method according to claim 1, wherein the plurality of blocks of the first color component and the block of the second color component are each part of the same coding block of an image of the video data.
[7] Method according to claim 1, wherein the block of the second color component comprises a first block in a first image, the method further comprising: determining that a second block, in a second image, of the first color component or the second color component is not predicted in the IBC prediction mode; determining a first set of motion vector precision for the second block; determining that a third block of the first color component or the second color component is predicted in the IBC prediction mode; and determining a second set of motion vector precision for the third block, wherein the second set of motion vector precision is a subset of the first set of motion vector precision. [8] A method according to claim 1, wherein the block of the second color component comprises a first block in a first image, the method further comprising: determining that a second block, in a second image, of the first color component or the second color component is predicted using alternative temporal motion vector prediction (ATMVP); determining one or more blocks in a reference image used to perform ATMVP on the second block; determining that at least one block of the one or more blocks in the reference image is predicted in the IBC prediction mode; and performing an ATMVP operation for the second block without using a block vector used for the at least one block in the reference image that is predicted in the IBC prediction mode. [9] A method according to claim 1, wherein the block of the second color component comprises a first block in a first image, the method further comprising: determining that a second block in a second image is encoded in the IBC prediction mode; and at least avoiding signaling or parsing information indicating whether the affine mode is enabled for the second block based on the determination that the second block is encoded in the IBC prediction mode. [10] A method according to claim 1, wherein the block of the second color component comprises a first block in a first image, the method further comprising: determining that a second block in a second image is encoded in the IBC prediction mode; and at least avoiding signaling or parsing information indicating whether the illumination compensation (IC) mode is enabled for the second block based on the determination that the second block is encoded in the IBC prediction mode. [11] A method according to claim 1, wherein the block of the second color component comprises a first block in a first image, the method further comprising: determining that a second block in a second image is encoded with a vector that refers to the second image; and avoiding performing bidirectional optical flow (BIO) on the second block based on the second block being encoded with the vector that refers to the second image. [12] A method according to claim 1, wherein the block of the second color component comprises a first block in a first image, the method further comprising: determining that a second block in a second image is encoded with a vector that refers to the second image; and avoiding bilateral matching on the second block based on the second block being encoded with the vector that refers to the second image. [13] A method according to claim 1, wherein coding the block of the second color component comprises decoding the block of the second color component and wherein decoding the block of the second color component comprises: determining one or more
prediction blocks based on the one or more block vectors determined for the one or more sub-blocks; determining one or more residual blocks for the one or more sub-blocks; and adding the one or more residual blocks to the one or more respective prediction blocks to reconstruct the sub-blocks of the block of the second color component. [14] A method according to claim 1, wherein coding the block of the second color component comprises encoding the block of the second color component and wherein encoding the block of the second color component comprises: determining one or more prediction blocks based on the one or more block vectors determined for the one or more sub-blocks; subtracting the one or more prediction blocks from the respective one or more sub-blocks to generate one or more residual blocks; and signaling information indicative of the one or more residual blocks. [15] Device for coding video data, comprising: a memory configured to store samples of a first color component and samples of a second color component of the video data; and a video coder comprising at least one of fixed-function or programmable circuitry, wherein the video coder is configured to: determine a plurality of blocks of a first color component that corresponds to a block of a second color component, wherein the plurality of blocks of the first color component is generated from the partitioning of samples of the first color component according to a first partition tree, and the block of the second color component is generated from the partitioning of samples of the second color component according to a second partition tree; partition the block of the second color component based on the first partition tree to generate sub-blocks of the second color component that each correspond to a block of the plurality of blocks of the first color component; determine one or more block vectors for one or more of the sub-blocks of the second color component that are predicted in the intra-block copy (IBC) prediction mode based on one or more block vectors of one or more corresponding blocks of the plurality of blocks of the first color component; and code the block of the second color component based on the one or more determined block vectors. [16] A device according to claim 15, wherein to determine one or more block vectors for one or more sub-blocks, the video coder is configured to determine a plurality of block vectors for a plurality of sub-blocks and wherein at least one of the plurality of block vectors for one of the sub-blocks is different from another of the plurality of block vectors for another of the sub-blocks. [17] A device according to claim 15, wherein to determine one or more block vectors for one or more sub-blocks, the video coder is configured to scale one or more block vectors of one or more corresponding blocks of the plurality of blocks of the first color component based on a subsampling format of the first color component and the second color component. [18] A device according to claim 15, wherein the first color component comprises a luma component and the second color component comprises a chroma component.
[19] A device according to claim 15, wherein the video coder is configured to: determine that the block of the second color component inherits one or more block vectors of one or more corresponding blocks of the plurality of blocks of the first color component, wherein determining one or more block vectors for one or more of the sub-blocks of the second color component comprises, in response to the determination that the block of the second color component is to inherit one or more block vectors of one or more corresponding blocks of the plurality of blocks of the first color component, determining one or more block vectors for one or more sub-blocks that are predicted in the IBC prediction mode based on one or more block vectors of one or more corresponding blocks of the plurality of blocks of the first color component. [20] A device according to claim 15, wherein the plurality of blocks of the first color component and the block of the second color component are each part of the same coding block of an image of the video data. [21] A device according to claim 15, wherein the block of the second color component comprises a first block and wherein the video coder is configured to: determine that a second block, in a second image, of the first color component or the second color component is not predicted in the IBC prediction mode; determine a first set of motion vector precision for the second block; determine that a third block of the first color component or the second color component is predicted in the IBC prediction mode; and determine a second set of motion vector precision for the third block, wherein the second set of motion vector precision is a subset of the first set of motion vector precision. [22] A device according to claim 15, wherein the block of the second color component comprises a first block and wherein the video coder is configured to: determine that a second block, in a second image, of the first color component or the second color component is predicted using alternative temporal motion vector prediction (ATMVP); determine one or more blocks in a reference image used to perform ATMVP on the second block; determine that at least one block of the one or more blocks in the reference image is predicted in the IBC prediction mode; and perform an ATMVP operation for the second block without using a block vector used for the at least one block in the reference image that is predicted in the IBC prediction mode. [23] A device according to claim 15, wherein the block of the second color component comprises a first block in a first image and wherein the video coder is configured to: determine that a second block in a second image is encoded in the IBC prediction mode; and at least avoid signaling or parsing information indicating whether the affine mode is enabled for the second block based on the determination that the second block is encoded in the IBC prediction mode. [24] A device according to claim 15, wherein the block of the second color component comprises a first block in a first image and wherein the video coder is configured to: determine that a second block in a second image is encoded in the IBC prediction mode; and at least avoid signaling or parsing information indicating whether the illumination compensation (IC) mode is enabled for the second block based on the determination that the second block is encoded in the IBC prediction mode.
[25] A device according to claim 15, wherein the block of the second color component comprises a first block in a first image and wherein the video coder is configured to: determine that a second block in a second image is encoded with a vector that refers to the second image; and avoid performing bidirectional optical flow (BIO) on the second block based on the second block being encoded with the vector that refers to the second image. [26] A device according to claim 15, wherein the block of the second color component comprises a first block in a first image and wherein the video coder is configured to: determine that a second block in a second image is encoded with a vector that refers to the second image; and avoid bilateral matching on the second block based on the second block being encoded with the vector that refers to the second image. [27] A device according to claim 15, wherein the video coder comprises a video decoder, wherein to code the block of the second color component, the video decoder is configured to decode the block of the second color component and wherein to decode the block of the second color component, the video decoder is configured to: determine one or more prediction blocks based on the one or more block vectors determined for the one or more sub-blocks; determine one or more residual blocks for the one or more sub-blocks; and add the one or more residual blocks to the one or more respective prediction blocks to reconstruct the sub-blocks of the block of the second color component. [28] A device according to claim 15, wherein the video coder comprises a video encoder, wherein to code the block of the second color component, the video encoder is configured to encode the block of the second color component and wherein to encode the block of the second color component, the video encoder is configured to: determine one or more prediction blocks based on the one or more block vectors determined for the one or more sub-blocks; subtract the one or more prediction blocks from the respective one or more sub-blocks to generate one or more residual blocks; and signal information indicative of the one or more residual blocks. [29] Computer-readable storage medium that stores instructions that, when executed, cause one or more processors of a device for coding video data to: determine a plurality of blocks of a first color component that corresponds to a block of a second color component, wherein the plurality of blocks of the first color component is generated from the partitioning of samples of the first color component according to a first partition tree, and the block of the second color component is generated from the partitioning of samples of the second color component according to a second partition tree; partition the block of the second color component based on the first partition tree to generate sub-blocks of the second color component that each correspond to a block of the plurality of blocks of the first color component; determine one or more block vectors for one or more of the sub-blocks of the second color component that are predicted in the intra-block copy (IBC) prediction mode based on one or more block vectors of one or more corresponding blocks of the plurality of blocks of the first color component; and code the block of the second color component based on the one or more determined block vectors. [30] 30.
[30] A device for encoding video data, comprising: means for determining a plurality of blocks of a first color component that corresponds to a block of a second color component, wherein the plurality of blocks of the first color component is generated from partitioning samples of the first color component according to a first partition tree, and the block of the second color component is generated from partitioning samples of the second color component according to a second partition tree; means for partitioning the block of the second color component based on the first partition tree to generate sub-blocks of the second color component that each correspond to a block of the plurality of blocks of the first color component; means for determining one or more block vectors for one or more of the sub-blocks of the second color component that are predicted in the intra-block copy (IBC) prediction mode based on one or more block vectors of one or more corresponding blocks of the plurality of blocks of the first color component; and means for encoding the block of the second color component based on the one or more determined block vectors.
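As an illustration of the derivation recited in claims 19, 29, and 30 (partition the chroma block following the luma partition tree, then let each chroma sub-block whose corresponding luma block is IBC-predicted inherit that luma block vector), the following Python sketch shows one possible reading. The names (`LumaBlock`, `derive_chroma_sub_block_bvs`) and the 4:2:0 subsampling shifts are illustrative assumptions, not taken from the patent text.

```python
# Minimal sketch (hypothetical names) of chroma sub-blocks inheriting block
# vectors from corresponding luma blocks under decoupled partition trees.
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class LumaBlock:
    x: int                                # top-left position, in luma samples
    y: int
    w: int                                # width, in luma samples
    h: int                                # height, in luma samples
    is_ibc: bool                          # predicted in the IBC prediction mode?
    bv: Optional[Tuple[int, int]] = None  # (bvx, bvy) block vector, if IBC

def derive_chroma_sub_block_bvs(
        luma_blocks: List[LumaBlock],
        shift_x: int = 1,
        shift_y: int = 1) -> List[Optional[Tuple[int, int]]]:
    """Each chroma sub-block corresponds to one luma block of the luma
    partition tree. Sub-blocks whose luma block is IBC-predicted inherit its
    block vector, scaled to chroma resolution (shift_x = shift_y = 1 assumes
    4:2:0 subsampling; a real codec may round differently)."""
    bvs: List[Optional[Tuple[int, int]]] = []
    for lb in luma_blocks:
        if lb.is_ibc and lb.bv is not None:
            bvx, bvy = lb.bv
            bvs.append((bvx >> shift_x, bvy >> shift_y))
        else:
            bvs.append(None)  # this sub-block is not coded with IBC
    return bvs
```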
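Claim 21 constrains the set of motion vector precisions available to IBC-coded blocks to be a subset of the set available to non-IBC blocks. The concrete precision values in the sketch below are illustrative assumptions only; the claim itself requires nothing beyond the subset relation.

```python
# Hypothetical precision sets; only the subset relation comes from claim 21.
NON_IBC_MV_PRECISIONS = {"1/4-sample", "1-sample", "4-sample"}
IBC_BV_PRECISIONS = {"1-sample", "4-sample"}

assert IBC_BV_PRECISIONS.issubset(NON_IBC_MV_PRECISIONS)
```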
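Claims 27 and 28 recite the decoder-side and encoder-side use of the derived block vectors: the prediction block is taken from already-decoded samples of the same image at the position displaced by the block vector, and a residual is then added (decoding) or generated by subtraction (encoding). A minimal sketch, assuming a single-channel numpy picture with a signed integer dtype and block vectors that point into the already-reconstructed area; the function names are hypothetical:

```python
import numpy as np

def ibc_decode_sub_block(picture, x, y, w, h, bv, residual):
    """Decoder side (claim 27): copy the prediction block from the same,
    already-reconstructed picture at the block-vector-displaced position,
    then add the residual to reconstruct the sub-block in place."""
    bvx, bvy = bv  # block vector; assumed to reference decoded samples only
    pred = picture[y + bvy : y + bvy + h, x + bvx : x + bvx + w]
    picture[y : y + h, x : x + w] = pred + residual

def ibc_encode_sub_block(original, reconstructed, x, y, w, h, bv):
    """Encoder side (claim 28): subtract the prediction block (taken from
    the reconstructed picture) from the original sub-block; the returned
    residual is what would be signaled in the bitstream."""
    bvx, bvy = bv
    pred = reconstructed[y + bvy : y + bvy + h, x + bvx : x + bvx + w]
    return original[y : y + h, x : x + w] - pred
```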
Similar technologies:

Publication number | Publication date | Patent title
BR112020016133A2 | 2020-12-08 | INTRA-BLOCK COPY FOR VIDEO ENCODING
KR102261696B1 | 2021-06-04 | Low-complexity design for FRUC
US20210352312A1 | 2021-11-11 | Partial/full pruning when adding a HMVP candidate to merge/AMVP
BR112019013684A2 | 2020-01-28 | Motion vector reconstructions for bi-directional optical flow
WO2018200960A1 | 2018-11-01 | Gradient based matching for motion search and derivation
KR20210024503A | 2021-03-05 | Reset the lookup table per slice/tile/LCU row
US11245892B2 | 2022-02-08 | Checking order of motion candidates in LUT
BR112021005357A2 | 2021-06-15 | Improvements to history-based motion vector predictor
US11153557B2 | 2021-10-19 | Which LUT to be updated or no updating
US10873756B2 | 2020-12-22 | Interaction between LUT and AMVP
BR112021009558A2 | 2021-08-17 | Simplification of history-based motion vector prediction
US20210377545A1 | 2021-12-02 | Interaction between LUT and shared merge list
WO2019147826A1 | 2019-08-01 | Advanced motion vector prediction speedups for video coding
BR112020006588A2 | 2020-10-06 | Affine prediction in video encoding
WO2020003267A1 | 2020-01-02 | Restriction of merge candidates derivation
WO2020065518A1 | 2020-04-02 | Bi-prediction with weights in video coding and decoding
BR112021002967A2 | 2021-05-11 | Affine motion prediction
BR112020014522A2 | 2020-12-08 | IMPROVED DERIVATION OF MOTION VECTOR ON THE DECODER SIDE
WO2019164674A1 | 2019-08-29 | Simplified local illumination compensation
CN110719463A | 2020-01-21 | Extending look-up table based motion vector prediction with temporal information
WO2019136657A1 | 2019-07-18 | Video coding using local illumination compensation
BR112021009732A2 | 2021-08-17 | Spatiotemporal motion vector prediction patterns for video encoding
BR112020025982A2 | 2021-03-23 | Sub-prediction unit motion vector predictor signaling
CN110719465A | 2020-01-21 | Extending look-up table based motion vector prediction with temporal information
WO2020003271A1 | 2020-01-02 | Interaction between LUT and merge: insert HMVP as a merge candidate, position of HMVP
Family patents:

Publication number | Publication date
CN111684806A | 2020-09-18
EP3750317A1 | 2020-12-16
WO2019157186A1 | 2019-08-15
TW201941605A | 2019-10-16
SG11202006301YA | 2020-08-28
KR20200116462A | 2020-10-12
US11012715B2 | 2021-05-18
US20190246143A1 | 2019-08-08
Cited references:

Publication number | Filing date | Publication date | Applicant | Patent title
US7724827B2 | 2003-09-07 | 2010-05-25 | Microsoft Corporation | Multi-layer run level encoding and decoding
CA2924501C | 2013-11-27 | 2021-06-22 | Mediatek Singapore Pte. Ltd. | Method of video coding using prediction based on intra picture block copy
US9883197B2 | 2014-01-09 | 2018-01-30 | Qualcomm Incorporated | Intra prediction of chroma blocks using the same vector
US10531116B2 | 2014-01-09 | 2020-01-07 | Qualcomm Incorporated | Adaptive motion vector resolution signaling for video coding
US9860559B2 | 2014-03-17 | 2018-01-02 | Mediatek Singapore Pte. Ltd. | Method of video coding using symmetric intra block copy
US10432928B2 | 2014-03-21 | 2019-10-01 | Qualcomm Incorporated | Using a current picture as a reference for video coding
CN106464905B | 2014-05-06 | 2019-06-07 | HFI Innovation Inc. | Block vector prediction method for intra block copy mode coding
US10412387B2 | 2014-08-22 | 2019-09-10 | Qualcomm Incorporated | Unified intra-block copy and inter-prediction
JP2017532885A | 2014-09-26 | 2017-11-02 | Vid Scale, Inc. | Intra-block copy coding using temporal block vector prediction
US9918105B2 | 2014-10-07 | 2018-03-13 | Qualcomm Incorporated | Intra BC and inter unification
US9591325B2 | 2015-01-27 | 2017-03-07 | Microsoft Technology Licensing, LLC | Special case handling for merged chroma blocks in intra block copy prediction mode
US20160360205A1 | 2015-06-08 | 2016-12-08 | Industrial Technology Research Institute | Video encoding methods and systems using adaptive color transform
JP6722701B2 | 2015-06-08 | 2020-07-15 | Vid Scale, Inc. | Intra block copy mode for screen content encoding
US10326986B2 | 2016-08-15 | 2019-06-18 | Qualcomm Incorporated | Intra video coding using a decoupled tree structure
US10368107B2 | 2016-08-15 | 2019-07-30 | Qualcomm Incorporated | Intra video coding using a decoupled tree structure
US10979732B2 | 2016-10-04 | 2021-04-13 | Qualcomm Incorporated | Adaptive motion vector precision for video coding
EP3813375A1 | 2017-01-31 | 2021-04-28 | Sharp Kabushiki Kaisha | Systems and methods for partitioning a picture into video blocks for video coding
US10820017B2 | 2017-03-15 | 2020-10-27 | Mediatek Inc. | Method and apparatus of video coding
KR20190137806A | 2017-04-13 | 2019-12-11 | Panasonic Intellectual Property Corporation of America | Coding device, decoding device, coding method and decoding method
US10687071B2 | 2018-02-05 | 2020-06-16 | Tencent America LLC | Method and apparatus for video coding
EP3435673A4 | 2016-03-24 | 2019-12-25 | Intellectual Discovery Co., Ltd. | Method and apparatus for encoding/decoding video signal
US10687071B2 | 2018-02-05 | 2020-06-16 | Tencent America LLC | Method and apparatus for video coding
US20190273946A1 | 2018-03-05 | 2019-09-05 | Markus Helmut Flierl | Methods and Arrangements for Sub-Pel Motion-Adaptive Image Processing
US10873748B2 | 2018-05-12 | 2020-12-22 | Qualcomm Incorporated | Storage of high precision motion vectors in video coding
WO2019234600A1 | 2018-06-05 | 2019-12-12 | Beijing Bytedance Network Technology Co., Ltd. | Interaction between pairwise average merging candidates and intra-block copy
TWI739120B | 2018-06-21 | 2021-09-11 | Beijing Bytedance Network Technology Co., Ltd. | Unified constrains for the merge affine mode and the non-merge affine mode
GB2589223A | 2018-06-21 | 2021-05-26 | Beijing Bytedance Network Tech Co Ltd | Component-dependent sub-block dividing
JP2022500909A | 2018-09-19 | 2022-01-04 | Beijing Bytedance Network Technology Co., Ltd. | Use of syntax for affine mode with adaptive motion vector resolution
WO2020065518A1 | 2018-09-24 | 2020-04-02 | Beijing Bytedance Network Technology Co., Ltd. | Bi-prediction with weights in video coding and decoding
US11140404B2 | 2018-10-11 | 2021-10-05 | Tencent America LLC | Method and apparatus for video coding
WO2020089823A1 | 2018-10-31 | 2020-05-07 | Beijing Bytedance Network Technology Co., Ltd. | Overlapped block motion compensation with adaptive sub-block size
JP2022507131A | 2018-11-13 | 2022-01-18 | Beijing Bytedance Network Technology Co., Ltd. | Build history-based motion candidate list for intra-block copy
US11153590B2 | 2019-01-11 | 2021-10-19 | Tencent America LLC | Method and apparatus for video coding
KR20210121014A | 2019-02-02 | 2021-10-07 | Beijing Bytedance Network Technology Co., Ltd. | Buffer initialization for intra block copying in video coding
WO2020164630A1 | 2019-02-17 | 2020-08-20 | Beijing Bytedance Network Technology Co., Ltd. | Signaling of intra block copy merge candidates
WO2020173483A1 | 2019-02-27 | 2020-09-03 | Beijing Bytedance Network Technology Co., Ltd. | Improvement on adaptive motion vector difference resolution in intra block copy mode
US11218718B2 | 2019-08-26 | 2022-01-04 | Tencent America LLC | Adaptive motion vector resolution signaling
CN113875256A | 2019-12-23 | 2021-12-31 | Tencent America LLC | Method and apparatus for video encoding and decoding
WO2021155862A1 | 2020-02-07 | 2021-08-12 | Beijing Bytedance Network Technology Co., Ltd. | BV list construction process of IBC blocks under merge estimation region
Legal status:

2021-12-07 | B350 | Update of information on the portal [chapter 15.35 patent gazette]
Priority:
Application number | Filing date | Patent title
US201862628101P | true | 2018-02-08 | 2018-02-08
US62/628,101 | 2018-02-08
US16/269,349 | US11012715B2 | 2018-02-08 | 2019-02-06 | Intra block copy for video coding
US16/269,349 | 2019-02-06
PCT/US2019/017055 | WO2019157186A1 | 2018-02-08 | 2019-02-07 | Intra block copy for video coding