Robotic arm claw deep machine learning methods and apparatus
Patent abstract:
Deep machine learning methods and apparatus related to the manipulation of an object by an end effector of a robot. Some implementations relate to training a deep neural network to predict the extent to which candidate motion data for an end effector of a robot will result in a successful grasp of one or more objects by the end effector. Some implementations are directed to utilizing the trained deep neural network to servo a grasping end effector of a robot to achieve a successful grasp of an object by the grasping end effector. For example, the trained deep neural network can be used to iteratively update motion control commands for one or more actuators of a robot that control the pose of a grasping end effector of the robot, and to determine when to generate grasp control commands to effect an attempted grasp by the grasping end effector.
Publication number: BR112018067482A2
Application number: R112018067482
Filing date: 2016-12-13
Publication date: 2019-01-02
Inventors: Krizhevsky Alex; Pastor Sampedro Peter; Levine Sergey
Applicant: Google LLC
IPC main classification:
Patent description:
DEEP MACHINE LEARNING METHODS AND APPARATUS FOR ROBOTIC GRASPING
BACKGROUND
[001] Many robots are programmed to use one or more end effectors to grasp one or more objects. For example, a robot can use a grasping end effector such as an impactive gripper or an ingressive gripper (for example, one that physically penetrates an object using pins, needles, etc.) to pick up an object from a first location, move the object to a second location, and drop off the object at the second location. Some additional examples of robot end effectors that can grasp objects include astrictive end effectors (for example, using suction or vacuum to pick up an object) and contigutive end effectors (for example, using surface tension, freezing, or adhesive to pick up an object), to name just a few.
SUMMARY
[002] This specification is generally directed to deep machine learning methods and apparatus related to the manipulation of an object by an end effector of a robot. Some implementations are directed to training a deep neural network, such as a convolutional neural network (also referred to herein as a CNN), to predict the probability that candidate motion data for a grasping end effector of a robot will result in a successful grasp of one or more objects by the end effector. For example, some implementations enable applying, as input to a trained deep neural network, at least: (1) a candidate motion vector that defines a candidate motion (if any) of a grasping end effector of a robot and (2) an image that captures at least a portion of the workspace of the robot; and generating, based on the application, at least one measure that directly or indirectly indicates the probability that the candidate motion vector will result in a successful grasp. The predicted probability can then be utilized in servoing the performance of grasp attempts by a robot having a grasping end effector, thereby improving the ability of the robot to successfully grasp objects in its environment.
[003] Some implementations are directed to utilization of the trained deep neural network to servo a grasping end effector of a robot to achieve a successful grasp of an object by the grasping end effector. For example, the trained deep neural network can be used to iteratively update motion control commands for one or more actuators of the robot that control the pose of the grasping end effector of the robot, and to determine when to generate grasp control commands to effect an attempted grasp by the grasping end effector. In various implementations, utilizing the trained deep neural network to servo the grasping end effector can enable fast feedback to robotic perturbations and/or motion of environmental object(s), and/or robustness to inaccurate robotic actuation(s).
[004] In some implementations, a method is provided that includes generating a candidate end effector motion vector that defines motion to move a grasping end effector of a robot from a current pose to an additional pose. The method further includes identifying a current image that is captured by a vision sensor associated with the robot and that captures the grasping end effector and at least one object in an environment of the robot. The method further includes applying the current image and the candidate end effector motion vector as inputs to a trained convolutional neural network and generating, over the trained convolutional neural network, a measure of successful grasp of the object with application of the motion.
The measure is generated based on application of the image and of the end effector motion vector to the trained convolutional neural network. The method further optionally includes generating an end effector command based on the measure and providing the end effector command to one or more actuators of the robot. The end effector command can be a grasp command or an end effector motion command.
[005] This method and other implementations of technology disclosed herein may each optionally include one or more of the following features.
[006] In some implementations, the method further includes determining a current measure of successful grasp of the object without application of the motion, and generating the end effector command based on the measure and the current measure. In some versions of those implementations, the end effector command is the grasp command, and generating the grasp command is in response to determining that comparison of the measure with the current measure satisfies a threshold. In some other versions of those implementations, the end effector command is the end effector motion command, and generating the end effector motion command includes generating the end effector motion command to conform to the candidate end effector motion vector. In yet other versions of those implementations, the end effector command is the end effector motion command, and generating the end effector motion command includes generating the end effector motion command to effect a trajectory correction for the end effector. In some implementations, determining the current measure of successful grasp of the object without application of the motion includes: applying the image and a null end effector motion vector as inputs to the trained convolutional neural network; and generating, over the trained convolutional neural network, the current measure of successful grasp of the object without application of the motion.
[007] In some implementations, the end effector command is the end effector motion command and conforms to the candidate end effector motion vector. In some of those implementations, providing the end effector motion command to the one or more actuators moves the end effector to a new pose, and the method further includes: generating an additional candidate end effector motion vector defining new motion to move the grasping end effector from the new pose to a further additional pose; identifying a new image captured by a vision sensor associated with the robot, the new image capturing the end effector at the new pose and capturing the objects in the environment; applying the new image and the additional candidate end effector motion vector as inputs to the trained convolutional neural network; generating, over the trained convolutional neural network, a new measure of successful grasp of the object with application of the new motion, the new measure being generated based on application of the new image and of the additional end effector motion vector to the trained convolutional neural network; generating a new end effector command based on the new measure, the new end effector command being the grasp command or a new end effector motion command; and providing the new end effector command to one or more actuators of the robot.
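Purely as a non-limiting illustration of the decision logic summarized in paragraphs [006] and [007], the following Python sketch (using the PyTorch library, which this specification does not mandate) queries the same trained network with a candidate motion vector and with a null motion vector, and compares the two measures to decide between a grasp command and a motion command. The network interface, the 0.9 ratio threshold, and the helper names are assumptions made only for the sketch.

import torch

def servo_step(trained_cnn, current_image, candidate_motion, threshold=0.9):
    """Hypothetical single servoing step sketching paragraphs [006]-[007].

    trained_cnn(image, motion_vector) is assumed to return a value in [0, 1]:
    the predicted measure of successful grasp if motion_vector is applied
    and a grasp is then attempted.
    """
    with torch.no_grad():
        # Measure of successful grasp with application of the candidate motion.
        measure = trained_cnn(current_image, candidate_motion)
        # Current measure: successful grasp without application of the motion,
        # obtained by applying a null (all-zeros) end effector motion vector.
        null_motion = torch.zeros_like(candidate_motion)
        current_measure = trained_cnn(current_image, null_motion)

    # If grasping now is already nearly as good as moving first, grasp.
    if current_measure >= threshold * measure:
        return ("grasp_command", None)
    # Otherwise issue a motion command conforming to the candidate vector.
    return ("motion_command", candidate_motion)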
[008] In some implementations, applying the image and the candidate end effector motion vector as inputs to the trained convolutional neural network includes: applying the image as input to an initial layer of the trained convolutional neural network; and applying the candidate end effector motion vector to an additional layer of the trained convolutional neural network. The additional layer may be downstream of the initial layer. In some of those implementations, applying the candidate end effector motion vector to the additional layer includes: passing the end effector motion vector through a fully connected layer of the convolutional neural network to generate end effector motion vector output; and concatenating the end effector motion vector output with upstream output. The upstream output is from an immediately upstream layer of the convolutional neural network that is immediately upstream of the additional layer and that is downstream of the initial layer and of one or more intermediate layers of the convolutional neural network. The initial layer can be a convolutional layer and the immediately upstream layer can be a pooling layer.
[009] In some implementations, the method further includes identifying an additional image captured by the vision sensor and applying the additional image as additional input to the trained convolutional neural network. The additional image can capture the one or more environmental objects and omit the robotic end effector, or include the robotic end effector in a different pose than that of the robotic end effector in the image. In some of those implementations, applying the image and the additional image to the convolutional neural network includes concatenating the image and the additional image to generate a concatenated image, and applying the concatenated image as input to an initial layer of the convolutional neural network.
[010] In some implementations, generating the candidate end effector motion vector includes generating a plurality of candidate end effector motion vectors and performing one or more iterations of cross-entropy optimization over the plurality of candidate end effector motion vectors to select the candidate end effector motion vector from the plurality of candidate end effector motion vectors.
[011] In some implementations, a method is provided that includes identifying a plurality of training examples generated based on sensor output from one or more robots during a plurality of grasp attempts by the robots, each of the training examples including training example input and training example output. The training example input for each of the training examples includes: an image for a corresponding time instance of a corresponding grasp attempt of the grasp attempts, the image capturing a robotic end effector and one or more environmental objects at the corresponding time instance; and an end effector motion vector defining motion of the end effector to move from a pose of the end effector at the corresponding time instance to a final pose of the end effector for the corresponding grasp attempt. The training example output for each of the training examples includes a grasp success label indicative of success of the corresponding grasp attempt. The method further includes training the convolutional neural network based on the training examples.
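As a concrete, non-limiting illustration of the training example structure of paragraph [011], and of the five-value task-space motion vector described later in this document, the following Python sketch defines a hypothetical record type and a helper that builds the motion vector from a current pose and the final pose of a grasp attempt. The field names, the (x, y, z, theta) pose representation, and the function names are assumptions made only for the sketch.

import math
from dataclasses import dataclass

import numpy as np


@dataclass
class GraspTrainingExample:
    """One training example: image at a time instance, motion to final pose, grasp label."""
    image: np.ndarray             # e.g., H x W x 3 image at the time instance
    additional_image: np.ndarray  # optional image that at least partially omits the end effector
    motion_vector: np.ndarray     # 5-value task-space end effector motion vector
    grasp_success_label: float    # e.g., 0, 0.25, 0.75, or 1


def end_effector_motion_vector(current_pose, final_pose):
    """Build the 5-value vector: 3D translation plus a sine-cosine encoding
    of the change in orientation about an axis of the end effector.

    Poses are assumed to be (x, y, z, theta) tuples for this sketch.
    """
    translation = np.array(final_pose[:3]) - np.array(current_pose[:3])
    delta_theta = final_pose[3] - current_pose[3]
    return np.concatenate([translation, [math.sin(delta_theta), math.cos(delta_theta)]])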
[012] This method and other implementations of technology disclosed herein may each optionally include one or more of the following features.
[013] In some implementations, the training example input for each of the training examples further includes an additional image for the corresponding grasp attempt. The additional image can capture the one or more environmental objects and omit the robotic end effector, or include the robotic end effector in a different pose than that of the robotic end effector in the image. In some implementations, training the convolutional neural network includes applying, to the convolutional neural network, the training example input of a given training example of the training examples. In some of those implementations, applying the training example input of the given training example includes: concatenating the image and the additional image of the given training example to generate a concatenated image; and applying the concatenated image as input to an initial layer of the convolutional neural network.
[014] In some implementations, training the convolutional neural network includes applying, to the convolutional neural network, the training example input of a given training example of the training examples. In some of those implementations, applying the training example input of the given training example includes: applying the image of the given training example as input to an initial layer of the convolutional neural network; and applying the end effector motion vector of the given training example to an additional layer of the convolutional neural network. The additional layer may be downstream of the initial layer. In some of those implementations, applying the end effector motion vector to the additional layer includes: passing the end effector motion vector through a fully connected layer to generate end effector motion vector output, and concatenating the end effector motion vector output with upstream output. The upstream output can be from an immediately upstream layer of the convolutional neural network that is immediately upstream of the additional layer and that is downstream of the initial layer and of one or more intermediate layers of the convolutional neural network. The initial layer can be a convolutional layer and the immediately upstream layer can be a pooling layer.
[015] In some implementations, the end effector motion vector defines motion of the end effector in task space.
[016] In some implementations, the training examples include: a first group of the training examples generated based on output from a plurality of first robot sensors of a first robot during a plurality of grasp attempts by the first robot; and a second group of the training examples generated based on output from a plurality of second robot sensors of a second robot during a plurality of grasp attempts by the second robot. In some of those implementations: the first robot sensors include a first vision sensor generating the images for the training examples of the first group; the second robot sensors include a second vision sensor generating the images for the training examples of the second group; and a first pose of the first vision sensor relative to a first base of the first robot is distinct from a second pose of the second vision sensor relative to a second base of the second robot.
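The concatenation described in paragraph [013] (and revisited for block 462 and figure 6A later in this document) can be pictured as stacking the current image and the additional image along the channel dimension. A minimal Python sketch follows, assuming both images are H x W x 3 arrays; the shapes are assumptions made only for illustration.

import numpy as np

def concatenate_images(current_image: np.ndarray, additional_image: np.ndarray) -> np.ndarray:
    """Stack two 3-channel images into one 6-channel input for the initial layer.

    For example, two 472 x 472 x 3 images become one 472 x 472 x 6 array.
    """
    assert current_image.shape == additional_image.shape
    return np.concatenate([current_image, additional_image], axis=-1)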
[017] In some implementations, each of the grasp attempts on which the plurality of training examples is based includes a plurality of random actuator commands that randomly move the end effector from a starting pose of the end effector to a final pose of the end effector, with the end effector then grasping at the final pose. In some of those implementations, the method further includes: generating additional grasp attempts based on the trained convolutional neural network; identifying a plurality of additional training examples based on the additional grasp attempts; and updating the convolutional neural network by further training of the convolutional neural network based on the additional training examples.
[018] In some implementations, the grasp success label for each of the training examples is either a first value indicating a successful grasp or a second value indicating an unsuccessful grasp.
[019] In some implementations, the training comprises performing backpropagation on the convolutional neural network based on the training example output of the plurality of training examples.
[020] Other implementations may include a non-transitory computer-readable storage medium storing instructions executable by a processor (for example, a central processing unit (CPU) or graphics processing unit (GPU)) to perform a method such as one or more of the methods described above. Yet another implementation may include a system of one or more computers and/or one or more robots that include one or more processors operable to execute stored instructions to perform a method such as one or more of the methods described above.
[021] It should be noted that all combinations of the foregoing concepts and of additional concepts described in greater detail herein are contemplated as being part of the subject matter disclosed herein. For example, all combinations of claimed subject matter appearing at the end of this disclosure are contemplated as being part of the subject matter disclosed herein.
BRIEF DESCRIPTION OF THE DRAWINGS
[022] Figure 1 illustrates an example environment in which grasp attempts can be performed by robots, data associated with the grasp attempts can be used to generate training examples, and/or the training examples can be used to train a convolutional neural network.
[023] Figure 2 illustrates one of the robots of figure 1 and an example of movement of a grasping end effector of the robot along a path.
[024] Figure 3 is a flowchart illustrating an example method of performing grasp attempts and storing data associated with the grasp attempts.
[025] Figure 4 is a flowchart illustrating an example method of generating training examples based on data associated with grasp attempts of robots.
[026] Figure 5 is a flowchart illustrating an example method of training a convolutional neural network based on training examples.
[027] Figures 6A and 6B illustrate an architecture of an example convolutional neural network.
[028] Figure 7A is a flowchart illustrating an example method of utilizing a trained convolutional neural network to servo a grasping end effector.
[029] Figure 7B is a flowchart illustrating some implementations of certain blocks of the flowchart of figure 7A.
[030] Figure 8 schematically depicts an example architecture of a robot.
[031] Figure 9 schematically depicts an example architecture of a computer system.
DETAILED DESCRIPTION
[032] Some implementations of the technology described herein are directed to training a deep neural network, such as a CNN, to enable utilization of the trained deep neural network to predict a measure indicating the probability that candidate motion data for a grasping end effector of a robot will result in a successful grasp of one or more objects by the end effector. In some implementations, the trained deep neural network accepts an image (I_t) generated by a vision sensor and accepts an end effector motion vector (v_t), such as a task-space motion vector. Application of the image (I_t) and of the end effector motion vector (v_t) to the trained deep neural network can be used to generate, over the deep neural network, a predicted measure that executing command(s) to implement the motion defined by the motion vector (v_t), and subsequently grasping, will produce a successful grasp. Some implementations are directed to utilization of the trained deep neural network to servo a grasping end effector of a robot to achieve a successful grasp of an object by the grasping end effector. Additional description of these and other implementations of the technology is provided below.
[033] With reference to figures 1-6B, various implementations of training a CNN are described. Figure 1 illustrates an example environment in which grasp attempts can be performed by robots (for example, by robots 180A, 180B, and/or other robots), data associated with the grasp attempts can be used to generate training examples, and/or the training examples can be used to train a CNN.
[034] Example robots 180A and 180B are illustrated in figure 1. Robots 180A and 180B are robot arms having multiple degrees of freedom to enable traversal of the grasping end effectors 182A and 182B along any of a plurality of potential paths to position the grasping end effectors 182A and 182B in desired locations. For example, with reference to figure 2, an example of robot 180A traversing its end effector along a path 201 is illustrated. Figure 2 includes a phantom and a non-phantom image of the robot 180A, showing two different poses of a set of poses reached by the robot 180A and its end effector in traversing along the path 201. Referring again to figure 1, each of the robots 180A and 180B further controls the two opposing jaws of its corresponding grasping end effector 182A, 182B to actuate the jaws between at least an open position and a closed position (and/or optionally a plurality of partially closed positions).
[035] Example vision sensors 184A and 184B are also illustrated in figure 1. In figure 1, vision sensor 184A is mounted at a fixed pose relative to the base or other stationary reference point of robot 180A. Vision sensor 184B is also mounted at a fixed pose relative to the base or other stationary reference point of robot 180B. As illustrated in figure 1, the pose of the vision sensor 184A relative to the robot 180A is different from the pose of the vision sensor 184B relative to the robot 180B. As described herein, in some implementations this can be beneficial in enabling generation of varied training examples that can be used to train a neural network that is robust to and/or independent of camera calibration.
The vision sensors 184A and 184B are sensors that can generate images related to shape, color, depth, and/or other features of object(s) that are in the line of sight of the sensors. The vision sensors 184A and 184B can be, for example, monographic cameras, stereographic cameras, and/or 3D laser scanners. A 3D laser scanner includes one or more lasers that emit light and one or more sensors that collect data related to reflections of the emitted light. A 3D laser scanner can be, for example, a time-of-flight 3D laser scanner or a triangulation-based 3D laser scanner and can include a position sensitive detector (PSD) or other optical position sensor.
[036] The vision sensor 184A has a field of view of at least a portion of the workspace of the robot 180A, such as the portion of the workspace that includes example objects 191A. Although resting surface(s) for the objects 191A are not illustrated in figure 1, those objects can rest on a table, a tray, and/or other surface(s). The objects 191A include a spatula, a stapler, and a pencil. In other implementations more objects, fewer objects, additional objects, and/or alternative objects can be provided during all or portions of the grasp attempts of robot 180A as described herein. The vision sensor 184B has a field of view of at least a portion of the workspace of the robot 180B, such as the portion of the workspace that includes example objects 191B. Although resting surface(s) for the objects 191B are not illustrated in figure 1, they can rest on a table, a tray, and/or other surface(s). The objects 191B include a pencil, a stapler, and glasses. In other implementations more objects, fewer objects, additional objects, and/or alternative objects can be provided during all or portions of the grasp attempts of robot 180B as described herein.
[037] Although particular robots 180A and 180B are illustrated in figure 1, additional and/or alternative robots can be used, including additional robot arms that are similar to robots 180A and 180B, robots having other robot arm forms, robots having a humanoid form, robots having an animal form, robots that move via one or more wheels (for example, self-balancing robots), submersible vehicle robots, an unmanned aerial vehicle (UAV), and so forth. Also, although particular grasping end effectors are illustrated in figure 1, additional and/or alternative end effectors can be used, such as alternative impactive grasping end effectors (for example, those with grasping plates, those with more or fewer digits/claws), ingressive grasping end effectors, astrictive grasping end effectors, contigutive grasping end effectors, or non-grasping end effectors. Additionally, although particular mountings of the vision sensors 184A and 184B are illustrated in figure 1, additional and/or alternative mountings can be used. For example, in some implementations vision sensors can be mounted directly on robots, such as on non-actuable components of the robots or on actuable components of the robots (for example, on the end effector or on a component close to the end effector). Also, for example, in some implementations a vision sensor can be mounted on a non-stationary structure that is separate from its associated robot and/or can be mounted in a non-stationary manner on a structure that is separate from its associated robot.
[038] Robots 180A, 180B, and/or other robots can be used to perform a large quantity of grasp attempts, and data associated with the grasp attempts can be used by the training example generation system 110 to generate training examples. In some implementations, all or aspects of the training example generation system 110 can be implemented on robot 180A and/or robot 180B (for example, via one or more processors of robots 180A and 180B). For example, each of the robots 180A and 180B can include an instance of the training example generation system 110. In some implementations, all or aspects of the training example generation system 110 can be implemented on one or more computer systems that are separate from, but in network communication with, robots 180A and 180B.
[039] Each grasp attempt by robot 180A, 180B, and/or other robots consists of T separate time steps or instances. At each time step, a current image (I_t^i) captured by the vision sensor of the robot performing the grasp attempt is stored, the current pose (p_t^i) of the end effector is also stored, and the robot chooses a path (translational and/or rotational) along which to next move the gripper. At the final time step T, the robot actuates (for example, closes) the gripper and stores additional data and/or performs one or more additional actions to enable evaluation of the success of the grasp. The grasp success engine 116 of the training example generation system 110 evaluates the success of the grasp, generating a grasp success label (l_i).
[040] Each grasp attempt results in T training examples, represented by (I_t^i, p_T^i - p_t^i, l_i). That is, each training example includes at least the image observed at that time step (I_t^i), the end effector motion vector (p_T^i - p_t^i) from the pose at that time step to the pose that is eventually reached (the final pose of the grasp attempt), and the grasp success label (l_i) of the grasp attempt. Each end effector motion vector can be determined by the end effector motion vector engine 114 of the training example generation system 110. For example, the end effector motion vector engine 114 can determine a transformation between the current pose and the final pose of the grasp attempt and use the transformation as the end effector motion vector. The training examples for the plurality of grasp attempts of a plurality of robots are stored by the training example generation system 110 in the training examples database 117.
[041] The data generated by sensor(s) associated with a robot and/or the data derived from the generated data can be stored on one or more non-transitory computer-readable media local to the robot and/or remote from the robot. In some implementations, the current image can include multiple channels, such as a red channel, a blue channel, a green channel, and/or a depth channel. Each channel of an image defines a value for each of a plurality of pixels of the image, such as a value from 0 to 255 for each of the pixels of the image.
In some implementations, each of the training examples can include the current image and an additional image for the corresponding grasp attempt, where the additional image does not include the grasping end effector or includes the end effector in a different pose (for example, one that does not overlap with its pose in the current image). For example, the additional image can be captured after any preceding grasp attempt, but before end effector movement for the grasp attempt begins and while the grasping end effector is moved out of the field of view of the vision sensor. The current pose and the end effector motion vector from the current pose to the final pose of the grasp attempt can be represented in task space, in joint space, or in another space. For example, the end effector motion vector can be represented by five values in task space: three values defining the three-dimensional (3D) translation vector, and two values representing a sine-cosine encoding of the change in orientation of the end effector about an axis of the end effector. In some implementations, the grasp success label is a binary label, such as a "0/not successful" or "1/successful" label. In some implementations, the grasp success label can be selected from more than two options, such as 0, 1, and one or more values between 0 and 1. For example, 0 can indicate a confirmed unsuccessful grasp, 1 can indicate a confirmed successful grasp, 0.25 can indicate a most likely unsuccessful grasp, and 0.75 can indicate a most likely successful grasp.
[042] The training engine 120 trains a CNN 125, or other neural network, based on the training examples of the training examples database 117. Training the CNN 125 can include iteratively updating the CNN 125 based on application of the training examples to the CNN 125. For example, the current image, the additional image, and the vector from the current pose to the final pose of the grasp attempt of the training examples can be utilized as training example input; and the grasp success label can be utilized as training example output. The CNN 125 is trained to predict a measure indicating the probability that, in view of the current image (and optionally an additional image, such as one that at least partially omits the end effector), moving the gripper in accordance with a given end effector motion vector, and subsequently grasping, will produce a successful grasp.
[043] Figure 3 is a flowchart illustrating an example method 300 of performing grasp attempts and storing data associated with the grasp attempts. For convenience, the operations of the flowchart are described with reference to a system that performs the operations. This system can include one or more components of a robot, such as a processor and/or robot control system of robot 180A, 180B, 840, and/or other robot. Moreover, although the operations of method 300 are shown in a particular order, this is not meant to be limiting. One or more operations can be reordered, omitted, or added.
[044] At block 352, the system starts a grasp attempt. At block 354, the system stores an image of an environment without an end effector present in the image. For example, the system can move the grasping end effector out of the field of view of the vision sensor (i.e., so that it does not occlude the view of the environment) and capture an image at an instance when the grasping end effector is out of the field of view.
The image can then be stored and associated with the grasp attempt.
[045] At block 356, the system determines and implements an end effector movement. For example, the system can generate one or more motion commands to cause one or more of the actuators that control the pose of the end effector to actuate, thereby changing the pose of the end effector.
[046] In some implementations and/or iterations of block 356, the motion command(s) can be random within a given space, such as the workspace reachable by the end effector, a restricted space within which the end effector is confined for the grasp attempts, and/or a space defined by position and/or torque limits of the actuator(s) that control the pose of the end effector. For example, before initial training of a neural network is completed, the motion command(s) generated by the system at block 356 to implement end effector movement can be random within a given space. Random as used herein can include truly random or pseudo-random.
[047] In some implementations, the motion command(s) generated by the system at block 356 to implement end effector movement can be based at least in part on a current version of a trained neural network and/or on other criteria. In some implementations, in the first iteration of block 356 for each grasp attempt, the end effector can be "out of position" because it was moved out of view at block 354. In some of those implementations, prior to the first iteration of block 356 the end effector can be randomly or otherwise moved "back into position". For example, the end effector can be moved back to a set starting position and/or moved to a randomly selected position within a given space.
[048] At block 358, the system stores: (1) an image that captures the end effector and the environment at the current instance of the grasp attempt and (2) the pose of the end effector at the current instance. For example, the system can store a current image generated by a vision sensor associated with the robot and associate the image with the current instance (for example, with a timestamp). Also, for example, the system can determine the current pose of the end effector based on data from one or more joint position sensors of joints of the robot whose positions affect the pose of the robot, and the system can store that pose. The system can determine and store the pose of the end effector in task space, joint space, or another space.
[049] At block 360, the system determines whether the current instance is the final instance for the grasp attempt. In some implementations, the system can increment an instance counter at block 352, 354, 356, or 358 and/or increment a time counter as time passes, and determine whether the current instance is the final instance based on comparison of a value of the counter to a threshold. For example, the counter can be a time counter and the threshold can be 3 seconds, 4 seconds, 5 seconds, and/or another value. In some implementations, the threshold can vary between one or more iterations of method 300.
[050] If the system determines at block 360 that the current instance is not the final instance for the grasp attempt, the system returns to block 356, where it determines and implements another end effector movement, then proceeds to block 358, where it stores an image and the pose at the current instance.
Through multiple iterations of blocks 356, 358, and 360 for a given grasp attempt, the pose of the end effector will be altered by the multiple iterations of block 356, and an image and the pose stored at each of those instances. In many implementations, blocks 356, 358, 360, and/or other blocks can be performed at a relatively high frequency, thereby storing a relatively large quantity of data for each grasp attempt.
[051] If the system determines at block 360 that the current instance is the final instance for the grasp attempt, the system proceeds to block 362, where it actuates the gripper of the end effector. For example, for an impactive grasping end effector, the system can cause one or more plates, digits, and/or other members to close. For example, the system can cause the members to close until they are either at a fully closed position or a torque reading measured by torque sensor(s) associated with the members satisfies a threshold.
[052] At block 364, the system stores additional data and optionally performs one or more additional actions to enable determination of the success of the grasp of block 360. In some implementations, the additional data is a position reading, a torque reading, and/or other reading from the grasping end effector. For example, a position reading that is greater than some threshold (for example, 1 cm) can indicate a successful grasp.
[053] In some implementations, at block 364 the system additionally and/or alternatively: (1) maintains the end effector in the actuated (for example, closed) position and moves (for example, vertically and/or laterally) the end effector and any object that may be grasped by the end effector; (2) stores an image that captures the original grasping position after the end effector is moved; (3) causes the end effector to release any object that is being grasped by the end effector (optionally after moving the gripper back close to the original grasping position); and (4) stores an image that captures the original grasping position after the object (if any) has been released. The system can store the image that captures the original grasping position after the end effector and the object (if any) are moved and can store the image that captures the original grasping position after the object (if any) has been released, and associate the images with the grasp attempt. Comparing the image after the end effector and the object (if any) are moved with the image after the object (if any) has been released can indicate whether a grasp was successful. For example, an object that appears in one image and does not appear in the other can indicate a successful grasp.
[054] At block 366, the system resets the counter (for example, the instance counter and/or the time counter), and proceeds back to block 352 to start another grasp attempt.
[055] In some implementations, the method 300 of figure 3 can be implemented on each of a plurality of robots, optionally operating in parallel during one or more (for example, all) of their respective iterations of method 300. This can enable more grasp attempts to be achieved in a given period of time than if only one robot were operating method 300.
Moreover, in implementations in which one or more of the plurality of robots includes an associated vision sensor with a pose relative to the robot that is unique from the pose(s) of vision sensor(s) associated with other of the robots, training examples generated based on the grasp attempts of the plurality of robots can provide robustness to vision sensor pose in a neural network trained based on those training examples. Moreover, in implementations in which grasping end effectors and/or other hardware components of the plurality of robots vary and/or wear differently, and/or in which different robots (for example, of the same make and/or model and/or of different make(s) and/or model(s)) interact with different objects (for example, objects of different sizes, different weights, different shapes, different translucencies, different materials) and/or in different environments (for example, different surfaces, different lighting, different environmental obstacles), training examples generated based on the grasp attempts of the plurality of robots can provide robustness to various robotic and/or environmental configurations.
[056] In some implementations, the objects that are reachable by a given robot and on which grasp attempts can be made can be different during different iterations of method 300. For example, a human operator and/or another robot can add and/or remove objects to the workspace of a robot between one or more grasp attempts of the robot. Also, for example, the robot itself can drop one or more objects out of its workspace following successful grasps of those objects. This can increase the diversity of the training data. In some implementations, environmental factors such as lighting, surface(s), obstacles, etc. can additionally and/or alternatively be different during different iterations of method 300, which can also increase the diversity of the training data.
[057] Figure 4 is a flowchart illustrating an example method 400 of generating training examples based on data associated with grasp attempts of robots. For convenience, the operations of the flowchart are described with reference to a system that performs the operations. This system can include one or more components of a robot and/or of another computer system, such as a processor and/or robot control system of robot 180A, 180B, 1220, and/or a processor of the training example generation system 110 and/or other system that can optionally be implemented separately from a robot. Moreover, although the operations of method 400 are shown in a particular order, this is not meant to be limiting. One or more operations can be reordered, omitted, or added.
[058] At block 452, the system starts training example generation. At block 454, the system selects a grasp attempt. For example, the system can access a database that includes data associated with a plurality of stored grasp attempts, and select one of the stored grasp attempts. The selected grasp attempt can be, for example, a grasp attempt generated based on method 300 of figure 3.
[059] At block 456, the system determines a grasp success label for the selected grasp attempt based on stored data for the selected grasp attempt. For example, as described with respect to block 364 of method 300, additional data can be stored for the grasp attempt to enable determination of a grasp success label for the grasp attempt.
The stored data can include data from one or more sensors, where the data is generated during and/or after the grasp attempt.
[060] As one example, the additional data can be a position reading, a torque reading, and/or other reading from the grasping end effector. In such an example, the system can determine a grasp success label based on the reading(s). For example, where the reading is a position reading, the system can determine a "successful grasp" label if the reading is greater than some threshold (for example, 1 cm), and can determine an "unsuccessful grasp" label if the reading is less than some threshold (for example, 1 cm).
[061] As another example, the additional data can be an image that captures the original grasping position after the end effector and the object (if any) are moved and an image that captures the original grasping position after the object (if any) has been released. To determine the grasp success label, the system can compare (1) the image after the end effector and the object (if any) are moved to (2) the image after the object (if any) has been released. For example, the system can compare pixels of the two images and, if more than a threshold number of pixels differ between the two images, the system can determine a "successful grasp" label. Also, for example, the system can perform object detection in each of the two images, and determine a "successful grasp" label if an object is detected in the image captured after the object (if any) has been released but is not detected in the image captured after the end effector and the object (if any) are moved.
[062] As yet another example, the additional data can be an image that captures the original grasping position after the end effector and the object (if any) are moved. To determine the grasp success label, the system can compare (1) the image after the end effector and the object (if any) are moved to (2) an additional image of the environment obtained before the grasp attempt began (for example, an additional image that omits the end effector).
[063] In some implementations, the grasp success label is a binary label, such as a "successful"/"not successful" label. In some implementations, the grasp success label can be selected from more than two options, such as 0, 1, and one or more values between 0 and 1. For example, in a pixel comparison approach, 0 can indicate a confirmed unsuccessful grasp and can be selected by the system when fewer than a first threshold number of pixels differ between the two images; 0.25 can indicate a most likely unsuccessful grasp and can be selected when the number of differing pixels is from the first threshold to a second, greater threshold; 0.75 can indicate a most likely successful grasp and can be selected when the number of differing pixels is greater than the second threshold (or other threshold) but less than a third threshold; and 1 can indicate a confirmed successful grasp and can be selected when the number of differing pixels is equal to or greater than the third threshold.
[064] At block 458, the system selects an instance for the grasp attempt. For example, the system can select data associated with the instance based on a timestamp and/or other demarcation associated with the data that differentiates it from other instances of the grasp attempt.
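A minimal Python sketch of the pixel-comparison labeling described in paragraph [063] follows, purely by way of non-limiting illustration; the particular threshold values and the per-pixel difference test are assumptions chosen only to illustrate the four-level labeling scheme.

import numpy as np

def grasp_success_label(image_after_move, image_after_release,
                        thresholds=(500, 2000, 8000)):
    """Assign a grasp success label from the number of differing pixels.

    0    -> confirmed not successful (fewer than the first threshold differ)
    0.25 -> most likely not successful
    0.75 -> most likely successful
    1    -> confirmed successful (at least the third threshold differ)
    """
    t1, t2, t3 = thresholds
    # Count pixels whose values differ between the two images (any channel).
    differing = np.sum(np.any(image_after_move != image_after_release, axis=-1))
    if differing < t1:
        return 0.0
    if differing < t2:
        return 0.25
    if differing < t3:
        return 0.75
    return 1.0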
[065] At block 460, the system generates an end effector motion vector for the instance based on the pose of the end effector at the instance and the pose of the end effector at the final instance of the grasp attempt. For example, the system can determine a transformation between the current pose and the final pose of the grasp attempt and use the transformation as the end effector motion vector. The current pose and the end effector motion vector from the current pose to the final pose of the grasp attempt can be represented in task space, in joint space, or in another space. For example, the end effector motion vector can be represented by five values in task space: three values defining the three-dimensional (3D) translation vector, and two values representing a sine-cosine encoding of the change in orientation of the end effector about an axis of the end effector.
[066] At block 462, the system generates a training example for the instance that includes: (1) the stored image for the instance, (2) the end effector motion vector generated for the instance at block 460, and (3) the grasp success label determined at block 456. In some implementations, the system generates a training example that also includes a stored additional image for the grasp attempt, such as one that at least partially omits the end effector and that was captured before the grasp attempt. In some of those implementations, the system concatenates the stored image for the instance and the stored additional image for the grasp attempt to generate a concatenated image for the training example. The concatenated image includes both the stored image for the instance and the stored additional image. For example, where both images include X by Y pixels and three channels (for example, red, blue, green), the concatenated image can include X by Y pixels and six channels (three from each image). As described herein, the current image, the additional image, and the vector from the current pose to the final pose of the grasp attempt of the training examples can be utilized as training example input; and the grasp success label can be utilized as training example output.
[067] In some implementations, at block 462 the system can optionally process the image(s). For example, the system can optionally resize the image to fit a defined size of an input layer of the CNN, remove one or more channels from the image, and/or normalize the values for depth channel(s) (in implementations in which the images include a depth channel).
[068] At block 464, the system determines whether the selected instance is the final instance of the grasp attempt. If the system determines that the selected instance is not the final instance of the grasp attempt, the system returns to block 458 and selects another instance.
[069] If the system determines that the selected instance is the final instance of the grasp attempt, the system proceeds to block 466 and determines whether there are additional grasp attempts to process. If the system determines that there are additional grasp attempts to process, the system returns to block 454 and selects another grasp attempt. In some implementations, determining whether there are additional grasp attempts to process can include determining whether there are any remaining unprocessed grasp attempts.
In some implementations, determining whether there are additional grasp attempts to process can additionally and/or alternatively include determining whether a threshold number of training examples has already been generated and/or whether other criteria have been satisfied.
[070] If the system determines that there are no additional grasp attempts to process, the system proceeds to block 466 and method 400 ends. Another iteration of method 400 can be performed again. For example, method 400 can be performed again in response to at least a threshold number of additional grasp attempts being performed.
[071] Although method 300 and method 400 are illustrated in separate figures herein for the sake of clarity, it is understood that one or more blocks of method 400 can be performed by the same component(s) that perform one or more blocks of method 300. For example, one or more (for example, all) of the blocks of method 300 and of method 400 can be performed by processor(s) of a robot. Also, it is understood that one or more blocks of method 400 can be performed in combination with, or preceding or following, one or more blocks of method 300.
[072] Figure 5 is a flowchart illustrating an example method 500 of training a convolutional neural network based on training examples. For convenience, the operations of the flowchart are described with reference to a system that performs the operations. This system can include one or more components of a computer system, such as a processor (for example, a GPU) of the training engine 120 and/or other computer system operating over the convolutional neural network (for example, the CNN 125). Moreover, although the operations of method 500 are shown in a particular order, this is not meant to be limiting. One or more operations can be reordered, omitted, or added.
[073] At block 552, the system starts training. At block 554, the system selects a training example. For example, the system can select a training example generated based on method 400 of figure 4.
[074] At block 556, the system applies the image for the instance of the selected training example and the additional image of the selected training example to an initial layer of a CNN. For example, the system can apply the images to an initial convolutional layer of the CNN. As described herein, the additional image can at least partially omit the end effector. In some implementations, the system concatenates the image and the additional image and applies the concatenated image to the initial layer. In some other implementations, the image and the additional image are already concatenated in the training example.
[075] At block 558, the system applies the end effector motion vector of the selected training example to an additional layer of the CNN. For example, the system can apply the end effector motion vector to an additional layer of the CNN that is downstream of the initial layer to which the images are applied at block 556. In some implementations, to apply the end effector motion vector to the additional layer, the system passes the end effector motion vector through a fully connected layer to generate end effector motion vector output, and concatenates the end effector motion vector output with output of an immediately upstream layer of the CNN.
The immediately upstream layer is immediately upstream of the additional layer to which the end effector motion vector is applied and can optionally be one or more layers downstream of the initial layer to which the images are applied at block 556. In some implementations, the initial layer is a convolutional layer, the immediately upstream layer is a pooling layer, and the additional layer is a convolutional layer.
[076] At block 560, the system performs backpropagation on the CNN based on the grasp success label of the training example. At block 562, the system determines whether there are additional training examples. If the system determines that there are additional training examples, the system returns to block 554 and selects another training example. In some implementations, determining whether there are additional training examples can include determining whether there are any remaining training examples that have not been utilized to train the CNN. In some implementations, determining whether there are additional training examples can additionally and/or alternatively include determining whether a threshold number of training examples has been utilized and/or whether other criteria have been satisfied.
[077] If the system determines that there are no additional training examples and/or that some other criteria have been met, the system proceeds to block 564 or block 566.
[078] At block 564, the training of the CNN can end. The trained CNN can then be provided for use by one or more robots in servoing a grasping end effector to achieve a successful grasp of an object by the grasping end effector. For example, a robot can utilize the trained CNN in performing method 700 of figure 7A.
[079] At block 566, the system can additionally and/or alternatively provide the trained CNN to generate additional training examples based on the trained CNN. For example, one or more robots can utilize the trained CNN in performing grasp attempts, and data from those grasp attempts can be utilized to generate additional training examples. For example, one or more robots can utilize the trained CNN in performing grasp attempts based on method 700 of figure 7A, and data from those grasp attempts can be utilized to generate additional training examples based on method 400 of figure 4. The robots whose data is utilized to generate additional training examples can be robots in a laboratory/training setup and/or robots in actual use by one or more consumers.
[080] At block 568, the system can update the CNN based on the additional training examples generated in response to providing the trained CNN at block 566. For example, the system can update the CNN by performing additional iterations of blocks 554, 556, 558, and 560 based on the additional training examples.
[081] As indicated by the arrow extending between blocks 566 and 568, the updated CNN can be provided again at block 566 to generate additional training examples, and those training examples can be utilized at block 568 to further update the CNN. In some implementations, grasp attempts performed in association with future iterations of block 566 can be temporally longer grasp attempts than those performed in earlier iterations and/or those performed without utilization of a trained CNN. For example, implementations of method 300 of figure 3 that are performed without utilization of a trained CNN can have the temporally shortest grasp attempts, those performed with an initially trained CNN can have temporally longer grasp attempts, those performed with a next iteration of a trained CNN yet temporally longer grasp attempts, etc. This can optionally be implemented via the optional instance counter and/or time counter of method 300.
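Purely as a non-limiting illustration of the backpropagation referred to in block 560 and paragraph [076], the following Python sketch (using PyTorch as an assumed framework) shows one plausible training step: a forward pass over a batch of training examples, a binary cross-entropy loss against the grasp success labels, and a backward pass. The batched data layout and the optimizer choice are assumptions made only for the sketch.

import torch
import torch.nn.functional as F

def train_step(cnn, optimizer, concatenated_images, motion_vectors, grasp_labels):
    """One backpropagation step over a batch of training examples.

    concatenated_images: (B, 6, H, W) current + additional images.
    motion_vectors:      (B, 5) end effector motion vectors.
    grasp_labels:        (B,) grasp success labels in [0, 1].
    """
    optimizer.zero_grad()
    predicted_measure = cnn(concatenated_images, motion_vectors)  # (B,), values in [0, 1]
    loss = F.binary_cross_entropy(predicted_measure, grasp_labels)
    loss.backward()   # backpropagation based on the grasp success labels
    optimizer.step()
    return loss.item()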
[082] Figures 6A and 6B illustrate an example architecture of a CNN 600 of various implementations. The CNN 600 of figures 6A and 6B is an example of a CNN that can be trained based on the method 500 of figure 5. The CNN 600 of figures 6A and 6B is further an example of a CNN that, once trained, can be utilized in servoing a grasping end effector based on the method 700 of figure 7A. Generally, a convolutional neural network is a multilayer learning framework that includes an input layer, one or more convolutional layers, optional weight and/or other layers, and an output layer. During training, a convolutional neural network is trained to learn a hierarchy of feature representations. Convolutional layers of the network are convolved with filters and optionally downsampled by pooling layers. Generally, the pooling layers aggregate values in a smaller region by one or more downsampling functions such as max, min, and/or normalization sampling.
[083] The CNN 600 includes an initial input layer 663 that is a convolutional layer. In some implementations, the initial input layer 663 is a 6x6 convolutional layer with stride 2 and 64 filters. An image with an end effector 661A and an image without an end effector 661B are also illustrated in figure 6A. The images 661A and 661B are further illustrated being concatenated (represented by the merging lines extending from each), with the concatenated image being fed to the initial input layer 663. In some implementations, each of the images 661A and 661B can be 472 pixels by 472 pixels by 3 channels (for example, the 3 channels can be selected from a depth channel, a first color channel, a second color channel, and a third color channel). Accordingly, the concatenated image can be 472 pixels by 472 pixels by 6 channels. Other sizes can be used, such as different pixel sizes or more or fewer channels. The images 661A and 661B are provided to the initial input layer 663. The weights of the features of the initial input layer and of the other layers of the CNN 600 are learned during training of the CNN 600 based on multiple training examples.
[084] The initial input layer 663 is followed by a max pooling layer 664. In some implementations, the max pooling layer 664 is a 3x3 max pooling layer with 64 filters. The max pooling layer 664 is followed by six convolutional layers, two of which are represented in figure 6A by 665 and 666. In some implementations, each of the six convolutional layers is a 5x5 convolutional layer with 64 filters. The convolutional layer 666 is followed by a max pooling layer 667. In some implementations, the max pooling layer 667 is a 3x3 max pooling layer with 64 filters.
[085] An end effector motion vector 662 is also illustrated in figure 6A. The end effector motion vector 662 is concatenated with the output of the max pooling layer 667 (as indicated by the "+" of figure 6A) and the concatenated output is applied to a convolutional layer 670 (figure 6B).
In some implementations, concatenating the end actuator motion vector 662 with the output of the max pooling layer 667 includes processing the end actuator motion vector 662 through a fully connected layer 668, the output of which is then added pointwise to each point of the response map of the max pooling layer 667 by tiling the output over the spatial dimensions via a tiled vector 669. In other words, the end actuator motion vector 662 is passed through the fully connected layer 668 and replicated, via the tiled vector 669, over the spatial dimensions of the response map of the max pooling layer 667.

[086] Turning now to figure 6B, the concatenation of the end actuator motion vector 662 and the output of the max pooling layer 667 is provided to a convolutional layer 670, which is followed by five more convolutional layers (the last convolutional layer 671 of these five is illustrated in figure 6B, but the four intervening layers are not). In some implementations, each of the convolutional layers 670 and 671, and of the four intervening convolutional layers, is a 3x3 convolutional layer with 64 filters.

[087] The convolutional layer 671 is followed by a max pooling layer 672. In some implementations, the max pooling layer 672 is a 2x2 max pooling layer with 64 filters. The max pooling layer 672 is followed by three convolutional layers, two of which are represented in figure 6B by 673 and 674.

[088] The final convolutional layer 674 of the CNN 600 is fully connected to a first fully connected layer 675 which, in turn, is fully connected to a second fully connected layer 676. The fully connected layers 675 and 676 can be vectors, such as vectors of size 64. The output of the second fully connected layer 676 is used to generate the measure 677 of a successful grasp. For example, a sigmoid can be used to generate and output the measure 677. In some implementations of training the CNN 600, various values for epochs, learning rate, weight decay, dropout probability and/or other parameters can be used. In some implementations, one or more GPUs can be used to train and/or use the CNN 600. Although a particular convolutional neural network 600 is illustrated in figures 6A and 6B, variations are possible. For example, more or fewer convolutional layers may be provided, one or more layers may differ in size from those provided as examples, and so on.
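By way of illustration only, the example architecture of the CNN 600 can be sketched in Python using PyTorch as follows. The kernel sizes, strides, filter counts and the 472x472x6 concatenated input follow the example values of paragraphs [083] to [088]; the padding choices, the ReLU activations, the dimensionality of the end actuator motion vector and the way the 64-unit fully connected layers are reduced to a single sigmoid output are assumptions made so that the sketch runs end to end, and the class and variable names are hypothetical.

import torch
from torch import nn


class GraspCNN(nn.Module):
    def __init__(self, motion_vector_dim: int = 5):
        super().__init__()
        # Initial input layer 663: 6x6 convolution, stride 2, 64 filters, applied to
        # the 6-channel concatenation of images 661A and 661B.
        self.initial = nn.Sequential(nn.Conv2d(6, 64, kernel_size=6, stride=2), nn.ReLU())
        self.pool_664 = nn.MaxPool2d(3)          # max pooling layer 664
        # Six 5x5 convolutional layers with 64 filters (665 ... 666).
        self.convs_665_666 = nn.Sequential(
            *[nn.Sequential(nn.Conv2d(64, 64, 5, padding=2), nn.ReLU()) for _ in range(6)])
        self.pool_667 = nn.MaxPool2d(3)          # max pooling layer 667
        self.fc_668 = nn.Linear(motion_vector_dim, 64)   # fully connected layer 668
        # Six 3x3 convolutional layers with 64 filters (670 ... 671).
        self.convs_670_671 = nn.Sequential(
            *[nn.Sequential(nn.Conv2d(64, 64, 3, padding=1), nn.ReLU()) for _ in range(6)])
        self.pool_672 = nn.MaxPool2d(2)          # max pooling layer 672
        # Three further 3x3 convolutional layers (ending in 673 and 674).
        self.convs_673_674 = nn.Sequential(
            *[nn.Sequential(nn.Conv2d(64, 64, 3, padding=1), nn.ReLU()) for _ in range(3)])
        self.fc_675 = nn.LazyLinear(64)          # fully connected layer 675
        self.fc_676 = nn.Linear(64, 64)          # fully connected layer 676
        self.out_677 = nn.Linear(64, 1)          # reduction to the scalar measure 677 (assumed)

    def forward(self, images: torch.Tensor, motion_vec: torch.Tensor) -> torch.Tensor:
        x = self.pool_667(self.convs_665_666(self.pool_664(self.initial(images))))
        # Tile the transformed motion vector (669) over the spatial dimensions of the
        # response map of pooling layer 667 and add it pointwise.
        tiled = self.fc_668(motion_vec)[:, :, None, None].expand_as(x)
        x = self.convs_673_674(self.pool_672(self.convs_670_671(x + tiled)))
        x = torch.relu(self.fc_676(torch.relu(self.fc_675(x.flatten(1)))))
        return torch.sigmoid(self.out_677(x)).squeeze(1)   # measure of a successful grasp


# Example: a batch of one 472x472 image pair concatenated along the channel dimension.
model = GraspCNN()
images = torch.randn(1, 6, 472, 472)
motion = torch.randn(1, 5)
print(model(images, motion))   # tensor with one value in (0, 1)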
[089] Once the CNN 600 or another neural network is trained according to the techniques described in this document, it can be used to drive a handle end actuator. Referring to figure 7A, a flowchart illustrating an example method 700 of using a trained convolutional neural network to drive a handle end actuator is illustrated. For convenience, the operations of the flowchart are described with reference to a system that performs the operations. This system may include one or more components of a robot, such as a processor (for example, a CPU and/or GPU) and/or a robot control system of the robot 180A, 180B, 840 and/or another robot. In implementing one or more blocks of method 700, the system can operate over a trained CNN that can, for example, be stored locally on a robot and/or stored remotely from the robot. In addition, although the operations of method 700 are shown in a particular order, this is not intended to be limiting. One or more operations can be reordered, omitted or added.

[090] In block 752, the system generates a candidate end actuator motion vector. The candidate end actuator motion vector can be defined in task space, joint space or another space, depending on the input parameters of the trained CNN to be used in further blocks.

[091] In some implementations, the system generates a candidate end actuator motion vector that is random within a given space, such as the workspace reachable by the end actuator, a restricted space within which the end actuator is confined for grasp attempts, and/or a space defined by position and/or torque limits of the actuator(s) that control the position of the end actuator.

[092] In some implementations, the system may use one or more techniques to sample a group of candidate end actuator motion vectors and to select a subgroup from the sampled group. For example, the system can use an optimization technique such as the cross-entropy method (CEM). CEM is a derivative-free optimization algorithm that samples a group of N values in each iteration, fits a Gaussian distribution to M < N of these samples, and then samples a new group of N values from this Gaussian. For example, the system can use CEM with values of M = 64 and N = 6, and perform three iterations of CEM to determine (according to CEM) the best available candidate end actuator motion vector.

[093] In some implementations, one or more constraints may be imposed on the candidate end actuator motion vectors that can be generated in block 752. For example, the candidate end actuator motions evaluated via CEM or another technique can be restricted based on the constraints. One example of constraints are human-entered constraints (for example, via a user interface input device of a computer system) that impose constraints on the area(s) in which grasps may be attempted, constraints on the particular object(s) and/or particular object classification(s) for which grasps may be attempted, and so on. Another example of constraints are computer-generated constraints that impose constraints on the area(s) in which grasps may be attempted, constraints on the particular object(s) and/or particular object classification(s) for which grasps may be attempted, and so on. For example, an object classifier can classify one or more objects based on captured images and impose constraints that restrict grasps to objects of certain classifications. Other examples of constraints include, for example, constraints based on a robot workspace, robot joint limits, robot torque limits, constraints provided by a collision avoidance system that restricts the motion of the robot to prevent collision with one or more objects, and so on.
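By way of illustration only, the sampling of candidate end actuator motion vectors with CEM described in paragraph [092], together with a simple clipping of the samples to constraints as in paragraph [093], can be sketched in Python as follows. The sketch uses 64 samples and 6 elite samples per iteration, which is the reading consistent with the requirement that M < N; the scoring function, the motion vector dimensionality, the bounds and the function names are assumptions made for the sketch.

import numpy as np


def cem_select_motion(score_fn, dim=5, iterations=3, samples=64, elites=6,
                      lower=-0.1, upper=0.1, seed=0):
    """score_fn maps an array of candidate motion vectors of shape (samples, dim)
    to predicted grasp-success measures of shape (samples,)."""
    rng = np.random.default_rng(seed)
    mean = np.zeros(dim)
    std = np.full(dim, (upper - lower) / 2.0)
    for _ in range(iterations):
        candidates = rng.normal(mean, std, size=(samples, dim))
        candidates = np.clip(candidates, lower, upper)        # workspace / actuator constraints
        scores = score_fn(candidates)
        elite = candidates[np.argsort(scores)[-elites:]]      # best M of the N samples
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return candidates[np.argmax(scores)]                      # best candidate of the last iteration


# Example with a dummy scorer that prefers small motions; in method 700 the scorer
# would query the trained CNN with the current image and each candidate vector.
best_motion = cem_select_motion(lambda c: -np.linalg.norm(c, axis=1))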
[094] In block 754, the system identifies a current image that captures the end actuator and one or more environmental objects. In some implementations, the system also identifies an additional image that at least partially omits the end actuator, such as an additional image of the environmental objects that was captured by a vision sensor when the end actuator was at least partially out of the view of the vision sensor. In some implementations, the system concatenates the image and the additional image to generate a concatenated image. In some implementations, the system optionally performs processing of the image(s) and/or of the concatenated image (for example, to fit it to the input of the CNN).

[095] In block 756, the system applies the current image and the candidate end actuator motion vector to a trained CNN. For example, the system can apply the concatenated image, which includes the current image and the additional image, to an initial layer of the trained CNN. The system can also apply the candidate end actuator motion vector to an additional layer of the trained CNN that is downstream of the initial layer. In some implementations, when applying the candidate end actuator motion vector to the additional layer, the system passes the end actuator motion vector through a fully connected layer of the CNN to generate end actuator motion vector output and concatenates the end actuator motion vector output with upstream output of the CNN. The upstream output is from an immediately upstream layer of the CNN that is immediately upstream of the additional layer and that is downstream of the initial layer and of one or more intermediate layers of the CNN.

[096] In block 758, the system generates, over the trained CNN, a measure of a successful grasp based on the end actuator motion vector. The measure is generated based on applying the current image (and optionally the additional image) and the candidate end actuator motion vector to the trained CNN in block 756 and determining the measure based on the learned weights of the trained CNN.

[097] In block 760, the system generates an end actuator command based on the measure of a successful grasp. For example, in some implementations, the system may generate one or more additional candidate end actuator motion vectors in block 752, and generate measures of a successful grasp for those additional candidate end actuator motion vectors in additional iterations of block 758 by applying them and the current image (and optionally the additional image) to the trained CNN in additional iterations of block 756. The additional iterations of blocks 756 and 758 can optionally be performed in parallel by the system. In some of these implementations, the system can generate the end actuator command based on the measure for the candidate end actuator motion vector and on the measures for the additional candidate end actuator motion vectors. For example, the system can generate the end actuator command to fully or substantially conform to the candidate end actuator motion vector with the measure most indicative of a successful grasp. For example, a robot control system of the system can generate motion command(s) to actuate one or more actuators of the robot to move the end actuator based on the end actuator motion vector.

[098] In some implementations, the system can also generate the end actuator command based on a current measure of a successful grasp if no candidate end actuator motion vector is used to generate new motion commands (that is, the current measure of a successful grasp). For example, if one or more comparisons of the current measure with the measure of the candidate end actuator motion vector that is most indicative of a successful grasp fail to satisfy a threshold, then the end actuator command may be a grasp command that causes the end actuator to attempt a grasp (for example, closing the fingers of an impactive handle end actuator).
For example, if the result of dividing the current measure by the measure of the candidate end actuator motion vector that is most indicative of a successful grasp is greater than or equal to a first threshold (for example, 0.9), the grasp command can be generated (per the rationale of stopping the grasp early if closing the gripper is nearly as likely to produce a successful grasp as moving it would be). Also, for example, if the result is less than or equal to a second threshold (for example, 0.5), the end actuator command can be a motion command to effect a trajectory correction (for example, raising the handle end actuator by at least X meters), per the rationale that the handle end actuator is most likely not positioned in a good configuration and a relatively large motion is required. Also, for example, if the result is between the first and second thresholds, a motion command can be generated that substantially or fully conforms to the candidate end actuator motion vector with the measure that is most indicative of a successful grasp. The end actuator command generated by the system can be a single group of one or more commands, or a sequence of groups of one or more commands.

[099] The measure of a successful grasp if no candidate end actuator motion vector is used to generate new motion commands can be based on the measure for the candidate end actuator motion vector used in a previous iteration of method 700 and/or based on applying a null motion vector and the current image (and optionally the additional image) to the trained CNN in an additional iteration of block 756, and generating the measure based on an additional iteration of block 758.

[0100] In block 762, the system determines whether the end actuator command is a grasp command. If the system determines in block 762 that the end actuator command is a grasp command, the system proceeds to block 764 and implements the grasp command. In some implementations, the system can optionally determine whether the grasp command results in a successful grasp (for example, using techniques described in this document) and, if it is not successful, the system can optionally adjust the position of the end actuator and return to block 752. Even where the grasp is successful, the system can return to block 752 at a later time to grasp another object.

[0101] If the system determines in block 762 that the end actuator command is not a grasp command (for example, it is a motion command), the system proceeds to block 766, implements the end actuator command and then returns to block 752, where it generates another candidate end actuator motion vector. For example, in block 766 the system can implement an end actuator motion command that substantially or fully conforms to the candidate end actuator motion vector with the measure that is most indicative of a successful grasp.

[0102] In many implementations, the blocks of method 700 can be executed at a relatively high frequency, thereby enabling iterative updating of end actuator commands and enabling actuation of the end actuator along a trajectory that is informed by the trained CNN to result in a relatively high probability of a successful grasp.

[0103] Figure 7B is a flowchart illustrating some implementations of certain blocks of the flowchart of figure 7A.
In particular, figure 7B is a flowchart illustrating some implementations of blocks 758 and 760 of figure 7A.

[0104] In block 758A, the system generates, over the CNN, a measure of a successful grasp based on the candidate end actuator motion vector of block 752.

[0105] In block 758B, the system determines a current measure of a successful grasp based on the current position of the end actuator. For example, the system can determine the current measure of a successful grasp if no candidate end actuator motion vector is used to generate new motion commands based on the measure for the candidate end actuator motion vector used in an immediately preceding iteration of method 700. Also, for example, the system can determine the current measure based on applying a null motion vector and the current image (and optionally the additional image) to the trained CNN in an additional iteration of block 756, and generating the measure based on an additional iteration of block 758.

[0106] In block 760A, the system compares the measures of blocks 758A and 758B. For example, the system can make the comparison by dividing the measures, subtracting the measures and/or applying the measures to one or more functions.

[0107] In block 760B, the system generates an end actuator command based on the comparison of block 760A. For example, if dividing the measure of block 758B by the measure of block 758A yields a quotient that is greater than or equal to a first threshold (for example, 0.9), then the end actuator command can be a grasp command that causes the end actuator to attempt a grasp. Also, for example, if the quotient is less than or equal to a second threshold (for example, 0.5), the end actuator command can be a motion command to effect a trajectory correction. Also, for example, if the quotient is between the second threshold and the first threshold, a motion command can be generated that substantially or fully conforms to the candidate end actuator motion vector.
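By way of illustration only, the comparison of block 760A and the command generation of block 760B can be sketched in Python as follows, using the example thresholds of 0.9 and 0.5 and the quotient comparison described above. The command representation and all names are assumptions made for the sketch; the two measures themselves would come from the trained CNN as in blocks 758A and 758B.

from dataclasses import dataclass


@dataclass
class EndActuatorCommand:
    kind: str                     # "grasp", "correct_trajectory" or "move"
    motion_vector: tuple = ()


def generate_command(current_measure: float,
                     best_candidate_measure: float,
                     best_candidate_motion: tuple,
                     grasp_threshold: float = 0.9,
                     correction_threshold: float = 0.5) -> EndActuatorCommand:
    # Block 760A: compare the "no motion" measure with the measure for the most
    # promising candidate motion vector, here by taking their quotient.
    quotient = current_measure / max(best_candidate_measure, 1e-6)
    if quotient >= grasp_threshold:
        # Closing the gripper now is almost as likely to succeed as moving first.
        return EndActuatorCommand("grasp")
    if quotient <= correction_threshold:
        # The end actuator is probably poorly positioned; make a larger correction.
        return EndActuatorCommand("correct_trajectory")
    # Otherwise move in conformance with the best candidate motion vector.
    return EndActuatorCommand("move", best_candidate_motion)

A servoing loop as in method 700 would call such a function after each candidate selection and either execute the grasp, correct the trajectory, or move along the selected vector and repeat.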
[0108] Particular examples are given in this document of training a CNN and/or using a CNN to drive an end actuator. However, some implementations may include additional and/or alternative features that vary from the particular examples. For example, in some implementations a CNN can be trained to predict a measure indicating the probability that candidate motion data for a robot end actuator will result in a successful grasp of one or more particular objects, such as objects of a particular classification (for example, pencils, engraving tools, spatulas, kitchen utensils, objects having a generally rectangular shape, soft objects, objects whose smallest bound is between X and Y, and so on).

[0109] For example, in some implementations objects of a particular classification can be included along with other objects for robots to grasp during various grasp attempts. Training examples can be generated in which a "successful grasp" label is applied only if: (1) the grasp was successful; and (2) the grasp was of an object that conforms to that particular classification. Determining whether an object conforms to a particular classification can be done, for example, based on the robot turning the handle end actuator towards the vision sensor following a grasp attempt and using the vision sensor to capture an image of the object (if any) grasped by the handle end actuator. A human reviewer and/or an image classification neural network (or another image classification system) can then determine whether the object grasped by the end actuator is of the particular classification, and this determination can be used to apply an appropriate grasp label. Such training examples can be used to train a CNN as described in this document and, as a result of training with such training examples, the trained CNN can be used to drive a robot handle end actuator to achieve a successful grasp, by the handle end actuator, of an object that is of the particular classification.

[0110] Figure 8 schematically represents an example architecture of a robot 840. The robot 840 includes a robot control system 860, one or more operational components 840a-840n and one or more sensors 842a-842m. The sensors 842a-842m can include, for example, vision sensors, light sensors, pressure sensors, pressure wave sensors (for example, microphones), proximity sensors, accelerometers, gyroscopes, thermometers, barometers and so on. Although the sensors 842a-m are represented as being integral to the robot 840, this is not intended to be limiting. In some implementations, the sensors 842a-m can be located external to the robot 840, for example, as standalone units.

[0111] The operational components 840a-840n can include, for example, one or more end actuators and/or one or more servomotors or other actuators to effect movement of one or more components of the robot. For example, the robot 840 can have multiple degrees of freedom, and each of the actuators can control actuation of the robot 840 within one or more of the degrees of freedom responsive to control commands. As used in this document, the term actuator encompasses a mechanical or electrical device that creates motion (for example, a motor), in addition to any driver that may be associated with the actuator and that translates received control commands into one or more signals for driving the actuator. Accordingly, providing a control command to an actuator may comprise providing the control command to a driver that translates the control command into appropriate signals for driving an electrical or mechanical device to create the desired motion.

[0112] The robot control system 860 can be implemented in one or more processors, such as a CPU, a GPU and/or other controller(s) of the robot 840. In some implementations, the robot 840 can comprise a "brain box" that can include all or aspects of the control system 860. For example, the brain box can provide real-time bursts of data to the operational components 840a-n, with each of the real-time bursts comprising a set of one or more control commands that dictate, among other things, the motion parameters (if any) for each of one or more of the operational components 840a-n. In some implementations, the robot control system 860 can perform one or more aspects of methods 300, 400, 500 and/or 700 described in this document.
[0113] As described in this document, in some implementations all or aspects of the control commands generated by the control system 860 in positioning an end actuator to grasp an object can be based on end actuator commands generated based on use of a trained neural network, such as a trained CNN. For example, a vision sensor of the sensors 842a-m can capture a current image and an additional image, and the robot control system 860 can generate a candidate motion vector. The robot control system 860 can provide the current image, the additional image and the candidate motion vector to a trained CNN and use a measure generated based on that application to generate one or more end actuator control commands for controlling the movement and/or grasping of an end actuator of the robot. Although the control system 860 is illustrated in figure 8 as an integral part of the robot 840, in some implementations all or aspects of the control system 860 can be implemented in a component that is separate from, but in communication with, the robot 840. For example, all or aspects of the control system 860 can be implemented in one or more computing devices that are in wired and/or wireless communication with the robot 840, such as the computing device 910.

[0114] Figure 9 is a block diagram of an example computing device 910 that can optionally be used to perform one or more aspects of the techniques described in this document. The computing device 910 typically includes at least one processor 914 that communicates with various peripheral devices via a bus subsystem 912. These peripheral devices can include a storage subsystem 924, including, for example, a memory subsystem 925 and a file storage subsystem 926, user interface output devices 920, user interface input devices 922 and a network interface subsystem 916. The input and output devices allow user interaction with the computing device 910. The network interface subsystem 916 provides an interface to external networks and is coupled to corresponding interface devices in other computing devices.

[0115] The user interface input devices 922 can include a keyboard, pointing devices such as a mouse, trackball, touchpad or graphics tablet, a scanner, a touchscreen incorporated into the display, audio input devices such as speech recognition systems, microphones and/or other types of input devices. In general, use of the term "input device" is intended to include all possible types of devices and ways to input information into the computing device 910 or onto a communication network.

[0116] The user interface output devices 920 can include a display subsystem, a printer, a fax machine or non-visual displays such as audio output devices. The display subsystem can include a cathode ray tube (CRT), a flat panel device such as a liquid crystal display (LCD), a projection device, or some other mechanism for creating a visible image. The display subsystem can also provide a non-visual display, such as via audio output devices. In general, use of the term "output device" is intended to include all possible types of devices and ways to output information from the computing device 910 to the user or to another machine or computing device.

[0117] The storage subsystem 924 stores programming and data constructs that provide the functionality of some or all of the modules described in this document.
For example, the storage subsystem 924 can include the logic for performing selected aspects of the methods of figures 3, 4, 5 and/or 7A and 7B.

[0118] These software modules are generally executed by the processor 914 alone or in combination with other processors. The memory 925 used in the storage subsystem 924 can include a number of memories, including a main random access memory (RAM) 930 for storage of instructions and data during program execution and a read-only memory (ROM) 932 in which fixed instructions are stored. A file storage subsystem 926 can provide persistent storage for program and data files, and can include a hard disk drive, a floppy disk drive along with associated removable media, a CD-ROM drive, an optical drive, or removable media cartridges. The modules implementing the functionality of certain implementations can be stored by the file storage subsystem 926 in the storage subsystem 924, or in other machines accessible by the processor(s) 914.

[0119] The bus subsystem 912 provides a mechanism for allowing the various components and subsystems of the computing device 910 to communicate with one another as intended. Although the bus subsystem 912 is shown schematically as a single bus, alternative implementations of the bus subsystem can use multiple buses.

[0120] The computing device 910 can be of various types, including a workstation, server, computing cluster, blade server, server farm or any other data processing system or computing device. Because of the ever-changing nature of computers and networks, the description of the computing device 910 represented in figure 9 is intended only as a specific example for the purpose of illustrating some implementations. Many other configurations of the computing device 910 are possible, having more or fewer components than the computing device shown in figure 9.

[0121] Although several implementations have been described and illustrated in this document, a variety of other means and/or structures for performing the function and/or obtaining the results and/or one or more of the advantages described in this document may be used, and each such variation and/or modification is considered to be within the scope of the implementations described in this document. More generally, all parameters, dimensions, materials and configurations described in this document are intended to be exemplary, and the actual parameters, dimensions, materials and/or configurations will depend on the specific application or applications for which the teachings are used. Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents to the specific implementations described in this document. It is therefore to be understood that the implementations set forth above are presented by way of example only and that, within the scope of the appended claims and equivalents thereto, implementations may be practiced otherwise than as specifically described and claimed. Implementations of the present disclosure are directed to each individual feature, system, article, material, kit and/or method described in this document. In addition, any combination of two or more such features, systems, articles, materials, kits and/or methods, if such features, systems, articles, materials, kits and/or methods are not mutually inconsistent, is included within the scope of the present disclosure.
Claims

1. Method, characterized by the fact that it comprises: generating, by one or more processors, a candidate end actuator motion vector defining motion to move a handle end actuator of a robot from a current position to an additional position; identifying, by one or more of the processors, a current image captured by a vision sensor associated with the robot, the image capturing the handle end actuator and at least one object in an environment of the robot; applying, by one or more of the processors, the current image and the candidate end actuator motion vector as inputs to a trained convolutional neural network; generating, in the trained convolutional neural network, a measure of successful grasping of the object with application of the motion, the measure being generated based on the application of the image and the end actuator motion vector to the trained convolutional neural network; generating an end actuator command based on the measure, the end actuator command being a grasp command or an end actuator motion command; and providing the end actuator command to one or more actuators of the robot.

2. Method, according to claim 1, characterized by the fact that it additionally comprises: determining, by one or more of the processors, a current measure of successful grasping of the object without application of the motion; wherein generating the end actuator command is based on the measure and the current measure.

3. Method, according to claim 2, characterized by the fact that the end actuator command is the grasp command and wherein generating the grasp command is in response to determining that a comparison of the measure with the current measure satisfies a threshold.

4. Method, according to claim 2, characterized by the fact that the end actuator command is the end actuator motion command and wherein generating the end actuator motion command comprises generating the end actuator motion command to conform to the candidate end actuator motion vector.

5. Method, according to claim 2, characterized by the fact that the end actuator command is the end actuator motion command and wherein generating the end actuator motion command comprises generating the end actuator motion command to effect a trajectory correction of the end actuator.

6. Method, according to any one of claims 2 to 5, characterized by the fact that determining the current measure of successful grasping of the object without application of the motion comprises: applying, by one or more of the processors, the image and a null end actuator motion vector as inputs to the trained convolutional neural network; and generating, in the trained convolutional neural network, the current measure of successful grasping of the object without application of the motion, the current measure being generated based on the application of the image and the null end actuator motion vector to the trained convolutional neural network.
7. Method, according to any one of claims 1, 2 or 4, characterized by the fact that the end actuator command is the end actuator motion command and conforms to the candidate end actuator motion vector, wherein providing the end actuator motion command to the one or more actuators moves the end actuator to a new position, and further comprising: generating, by one or more of the processors, an additional candidate end actuator motion vector defining new motion to move the handle end actuator from the new position to a further additional position; identifying, by one or more of the processors, a new image captured by a vision sensor associated with the robot, the new image capturing the end actuator in the new position and capturing the objects in the environment; applying, by one or more of the processors, the new image and the additional candidate end actuator motion vector as inputs to the trained convolutional neural network; generating, in the trained convolutional neural network, a new measure of successful grasping of the object with application of the new motion, the new measure being generated based on the application of the new image and the additional end actuator motion vector to the trained convolutional neural network; generating a new end actuator command based on the new measure, the new end actuator command being the grasp command or a new end actuator motion command; and providing the new end actuator command to one or more actuators of the robot.

8. Method, according to any one of the preceding claims, characterized by the fact that applying the image and the candidate end actuator motion vector as inputs to the trained convolutional neural network comprises: applying the image as input to an initial layer of the trained convolutional neural network; and applying the candidate end actuator motion vector to an additional layer of the trained convolutional neural network, the additional layer being downstream of the initial layer.

9. Method, according to claim 8, characterized by the fact that applying the candidate end actuator motion vector to the additional layer comprises: passing the end actuator motion vector through a fully connected layer of the convolutional neural network to generate end actuator motion vector output; and concatenating the end actuator motion vector output with upstream output, the upstream output being from an immediately upstream layer of the convolutional neural network that is immediately upstream of the additional layer and that is downstream of the initial layer and of one or more intermediate layers of the convolutional neural network.

10. Method, according to claim 9, characterized by the fact that the initial layer is a convolutional layer and the immediately upstream layer is a pooling layer.

11. Method, according to any one of the preceding claims, characterized by the fact that it additionally comprises: identifying, by one or more of the processors, an additional image captured by the vision sensor, the additional image capturing the one or more environmental objects and omitting the robotic end actuator or including the robotic end actuator in a position different from that of the robotic end actuator in the image; and applying the additional image as an additional input to the trained convolutional neural network.
12. Method, according to claim 11, characterized by the fact that applying the image and the additional image to the convolutional neural network comprises: concatenating the image and the additional image to generate a concatenated image; and applying the concatenated image as input to an initial layer of the convolutional neural network.

13. Method, according to any one of the preceding claims, characterized by the fact that generating the candidate end actuator motion vector comprises: generating a plurality of candidate end actuator motion vectors; and performing one or more iterations of cross-entropy optimization on the plurality of candidate end actuator motion vectors to select the candidate end actuator motion vector from the plurality of candidate end actuator motion vectors.

14. System, characterized by the fact that it comprises: a vision sensor observing an environment; a trained convolutional neural network stored in one or more non-transitory computer-readable media; at least one processor configured to: generate a candidate end actuator motion vector defining motion to move a robotic end actuator from a current position to an additional position; apply the candidate end actuator motion vector and an image captured by the vision sensor as inputs to the trained convolutional neural network, the image capturing the end actuator and at least one object in the environment; generate, in the trained convolutional neural network, a measure of successful grasping of the object with application of the motion, the measure being generated based on the application of the image and the end actuator motion vector to the trained convolutional neural network; generate an end actuator command based on the measure, the end actuator command being a grasp command or an end actuator motion command; and provide the end actuator command to one or more actuators of the robot.

15. Method of training a convolutional neural network, characterized by the fact that it comprises: identifying, by one or more processors, a plurality of training examples generated based on sensor output from one or more robots during a plurality of grasp attempts by the robots, each of the training examples including training example input comprising: an image for a corresponding time instance of a corresponding grasp attempt of the grasp attempts, the image capturing a robotic end actuator and one or more environmental objects at the corresponding time instance, and an end actuator motion vector defining motion of the end actuator to move from a position of the end actuator at the corresponding time instance to a final position of the end actuator for the corresponding grasp attempt; each of the training examples including training example output comprising: a grasp success label indicative of the success of the corresponding grasp attempt; and training, by one or more of the processors, the convolutional neural network based on the training examples.

16. Method, according to claim 15, characterized by the fact that the training example input for each of the training examples further comprises:
an additional image for the corresponding grasp attempt, the additional image capturing the one or more environmental objects and omitting the robotic end actuator or including the robotic end actuator in a position different from that of the robotic end actuator in the image.

17. Method, according to claim 16, characterized by the fact that the additional image completely omits the robotic end actuator.

18. Method, according to claim 16 or 17, characterized by the fact that training the convolutional neural network comprises applying, to the convolutional neural network, the training example input of a given training example of the training examples, wherein applying the training example input of the given training example comprises: concatenating the image and the additional image of the given training example to generate a concatenated image; and applying the concatenated image as input to an initial layer of the convolutional neural network.

19. Method, according to any one of claims 15 to 18, characterized by the fact that training the convolutional neural network comprises applying, to the convolutional neural network, the training example input of a given training example of the training examples, wherein applying the training example input of the given training example comprises: applying the image of the given training example as input to an initial layer of the convolutional neural network; and applying the end actuator motion vector of the given training example to an additional layer of the convolutional neural network, the additional layer being downstream of the initial layer.

20. Method, according to claim 19, characterized by the fact that applying the end actuator motion vector to the additional layer comprises: passing the end actuator motion vector through a fully connected layer to generate end actuator motion vector output and concatenating the end actuator motion vector output with upstream output, the upstream output being from an immediately upstream layer of the convolutional neural network that is immediately upstream of the additional layer and that is downstream of the initial layer and of one or more intermediate layers of the convolutional neural network.

21. Method, according to claim 20, characterized by the fact that the initial layer is a convolutional layer and the immediately upstream layer is a pooling layer.

22. Method, according to any one of claims 15 to 21, characterized by the fact that the end actuator motion vector defines motion of the end actuator in task space.

23. Method, according to any one of claims 15 to 22, characterized by the fact that the training examples comprise: a first group of training examples generated based on output from a plurality of first robot sensors of a first robot during a plurality of grasp attempts by the first robot; and a second group of training examples generated based on output from a plurality of second robot sensors of a second robot during a plurality of grasp attempts by the second robot.
24. Method, according to claim 23, characterized by the fact that the first robot sensors comprise a first vision sensor generating the images for the training examples of the first group, wherein the second robot sensors comprise a second vision sensor generating the images for the training examples of the second group, and wherein a first position of the first vision sensor relative to a first base of the first robot is distinct from a second position of the second vision sensor relative to a second base of the second robot.

25. Method, according to any one of claims 15 to 24, characterized by the fact that the grasp attempts on which each of a plurality of the training examples is based comprise a plurality of random actuator commands that randomly move the end actuator from a starting position of the end actuator to the final position of the end actuator, and then grasp with the end actuator in the final position.

26. Method, according to claim 25, characterized by the fact that it additionally comprises: generating additional grasp attempts based on the trained convolutional neural network; identifying a plurality of additional training examples based on the additional grasp attempts; and updating the convolutional neural network by additional training of the convolutional network based on the additional training examples.

27. Method, according to any one of claims 15 to 26, characterized by the fact that the grasp success label for each of the training examples is either a first value indicating success or a second value indicating failure.

28. Method, according to any one of claims 15 to 27, characterized by the fact that the training comprises performing backpropagation in the convolutional neural network based on the training example output of the plurality of training examples.

29. Computer-readable instructions, characterized by the fact that, when executed by at least one processor, they cause the at least one processor to perform the method as defined in any one of claims 1 to 13 and 15 to 28.