INFORMATION PROVIDING DEVICE AND NON-TRANSITORY COMPUTER-READABLE MEDIUM STORING INFORMATION PROVIDING PROGRAM
Patent abstract:
An information providing device includes an agent ECU (100) that defines a reward function by using historical data of a driver's responses to operation proposals for a component installed in the vehicle, and that calculates, through reinforcement learning based on the reward function, a probability distribution over the performance of each of the actions that construct an action space in each of the states that construct a state space. The agent ECU (100) calculates a degree of dispersion of the probability distribution. The agent ECU (100) produces a trial-and-error operation proposal, selecting a target action from a plurality of candidates and issuing that target action, when the degree of dispersion of the probability distribution is equal to or greater than a threshold, and produces a definitive operation proposal, establishing and issuing a target action, when the degree of dispersion of the probability distribution is less than the threshold.
Publication number: BR102017004763A2
Application number: BR102017004763-6
Filing date: 2017-03-09
Publication date: 2018-03-20
Inventor: Koga Ko
Applicant: Toyota Jidosha Kabushiki Kaisha
IPC main class:
Patent description:
(54) Title: INFORMATION PROVIDING DEVICE AND NON-TRANSITORY COMPUTER-READABLE MEDIUM STORING INFORMATION PROVIDING PROGRAM
(51) Int. Cl.: B60R 16/037; B60W 40/09; B60W 50/10; G05B 13/02; G06N 5/00; (...)
(52) CPC: B60R 16/0373, B60W 40/09, B60W 50/10, G05B 13/0265, G06N 5/00, G06N 5/04
(30) Priority: 03/11/2016 JP 2016-048580
(73) Holder(s): TOYOTA JIDOSHA KABUSHIKI KAISHA
(72) Inventor(s): KO KOGA
(74) Attorney(s): DANIEL ADVOGADOS (ALT. DE DANIEL & CIA)
(57) Summary: An information providing device includes an agent ECU (100) that defines a reward function by using historical data of a driver's responses to operation proposals for a component installed in the vehicle, and that calculates, through reinforcement learning based on the reward function, a probability distribution over the performance of each of the actions that construct an action space in each of the states that construct a state space. The agent ECU (100) calculates a degree of dispersion of the probability distribution. The agent ECU (100) produces a trial-and-error operation proposal, selecting a target action from a plurality of candidates and issuing that target action, when the degree of dispersion of the probability distribution is equal to or greater than a threshold, and produces a definitive operation proposal, establishing and issuing a target action, when the degree of dispersion of the probability distribution is less than the threshold.
"INFORMATION PROVIDING DEVICE AND NON-TRANSITORY COMPUTER-READABLE MEDIUM STORING INFORMATION PROVIDING PROGRAM"
BACKGROUND OF THE INVENTION
FIELD OF THE INVENTION
[001] The invention relates to an information providing device, and to a non-transitory computer-readable medium storing an information providing program, that accumulate and learn historical data of the responses of a user (a driver) to provided information, and that provide information corresponding to the user's (the driver's) intention based on a learning result.
DESCRIPTION OF RELATED ART
[002] As this type of information providing device, a device (a user interface system) described, for example, in WO 2015/162638 is known. In this device, when a sound input function is performed, candidates for a sound operation to be performed by a user (a driver) are first estimated using information on the situation of the vehicle at the current time, and three of the estimated candidates for the sound operation are displayed as options on a touch screen, in descending order of probability. It is then determined which of these options has been selected by the driver through manual input, and a target of the sound operation is decided. In addition, guidance that leads the user to input a sound is generated according to the decided target of the sound operation, and is output. The driver then inputs the sound according to this guidance, thereby deciding and executing a target vehicle function. An input of the sound operation that corresponds to the user's intention is thus provided according to the situation of the vehicle at the present moment, so that the troublesomeness imposed on the user in inputting the sound is reduced.
[003] However, in the device described in the aforementioned document, when the vehicle function is executed, the user interface is changed from an operation mode based on manual input for the options displayed on the touch screen to an operation mode based on sound input.
Therefore, the discomfort given to the driver inevitably increases. [004] Furthermore, in the device described in the aforementioned document, the input of the sound operation is simplified, but the next operation performs nothing but a function similar to that of an existing dialogue system. Consequently, it is desired that the discomfort given to the driver is further reduced. SUMMARY OF THE INVENTION [005] The invention provides an information delivery device and a non-transitory computer-readable medium storing an information delivery program that can produce a more suitable operation proposal for a component installed in the vehicle in order to correspond to an intention of the driver as the provision of information while preventing the driver from being disturbed, consistently using a simple user interface. [006] An information delivery device according to a first aspect of the invention is equipped with an electronic agent control unit. The agent ECU has a state space building unit, an action space building unit, a reinforced learning unit, a dispersion-grade computing unit and an information provisioning unit. The state space construction unit is configured to define a vehicle state by associating a plurality of vehicle data types with each other, and to construct a state space as a set of a plurality of states. The stock space building unit Petition 870170015443, of 03/09/2017, p. 12/109 3/52 is configured to define, as an action, data indicating contents of an operation of a component installed in the vehicle that is performed through a response, by a driver, to an operation proposal for the component installed in the vehicle, and to build a space of actions as a set of a plurality of actions. The enhanced learning unit is configured to accumulate a history of the driver's response to the proposed operation for the component installed in the vehicle, to define a reward function as an index representing a degree of adequacy of the proposed operation for the component installed in the vehicle. vehicle while using the accumulated history, and calculating a performance probability distribution of each of the actions that build the action space in each of the states that build the state space, through reinforced learning based on the reward function. The dispersion degree computation unit is configured to compute a dispersion degree of the probability distribution that is calculated by the reinforced learning unit. The information supply unit is configured to make a definitive operation proposal to establish a target action as a target of the operation proposal and issue that target action when the degree of dispersion of the probability distribution, which is computed by the computing unit of degree of dispersion, is less than a limit, and make a trial and error operation proposal to select the target action as the target of the proposed operation of a plurality of candidates and issue that target action when the degree of dispersion of the distribution of probability, which is computed by the dispersion grade computing unit, is equal to or greater than the limit. [007] In addition, in a non-transitory computer-readable medium storing an information delivery program in accordance with a second aspect of the invention, the information delivery program is programmed to cause a computer to perform a function of constructing information. space Petition 870170015443, of 03/09/2017, p. 
12/119 4/52 of states, an action space construction function, an enhanced learning function, a dispersion-grade computing function, and an information provision function. The state space construction function is designed to define a vehicle state by associating a plurality of vehicle data types with each other, and to construct a state space as a set of a plurality of states. The action space construction function is designed to define, as an action, data indicating contents of an operation of a component installed in the vehicle that is performed through a response, from a driver, to an operation proposal for the installed component in the vehicle, and build a space of actions as a set of a plurality of actions. The enhanced learning function is designed to accumulate a history of the driver's response to the proposed operation for the component installed in the vehicle, to define a reward function as an index representing a degree of adequacy of the proposed operation for the component installed in the vehicle. vehicle while using the accumulated history, and calculating a performance probability distribution of each of the actions that build the action space in each of the states that build the state space, through reinforced learning based on the reward function. The dispersion degree computation function is designed to compute a dispersion degree of the probability distribution that is calculated using the enhanced learning function. The information provision function is designed to make a definitive operation proposal to establish a target action as a target of the operation proposal and issue that target action when the degree of dispersion of the probability distribution, which is computed through the computation function degree of dispersion, is less than a limit, and to make a trial and error operation proposal to select the target action as the target of the proposed operation of a plurality of candidates and issue that target action when the degree of dispersion gives Petition 870170015443, of 03/09/2017, p. 12/129 5/52 probability distribution, which is computed through the dispersion degree computation function, is equal to or greater than the limit. [008] In each of the first and second aspects mentioned above of the invention, the reward function is defined as the index that represents the degree of adequacy of the proposed operation for the component installed in the vehicle, while using the response history, the driver, to the proposed operation for the component installed in the vehicle. Then, a driver decision-making model regarding the proposed operation for the component installed in the vehicle in each of the states is structured through reinforced learning based on this reward function. In addition, the probability distribution of the contents of the operation of the component installed in the vehicle that is performed through the driver's response to the operation proposal for the component installed in the vehicle in each of the states is calculated, while using this structured model . It should be noted here that the degree of dispersion of the probability distribution of the contents of the operation of the component installed in the vehicle generally differs depending on the target of the proposed operation for the component installed in the vehicle. 
For example, in the case where the target of the proposed operation for the component installed in the vehicle is sound reproduction, that target is generally susceptible to the state of mind of the driver at that time, and the like, as well as the state of the vehicle, and there is a variety of options. Therefore, the degree of dispersion of the probability distribution of the contents of the operation of the component installed in the vehicle is probably great. On the other hand, in the case where the target of the proposed operation for the component installed in the vehicle is the configuration of a destination, it is generally easier to limit the number of options from the state of the vehicle on each occasion than in the case of sound reproduction. Therefore, the degree of dispersion of the probability distribution of the contents of the operation of the component installed in the vehicle is probably small. In this regard, according to Petition 870170015443, of 03/09/2017, p. 12/13 6/52 the configuration mentioned above, when the degree of dispersion of the probability distribution is less than the limit, the definitive operation proposal is made to establish the target action as the target of the operation proposal and issue that target action. Thus, the operation proposal for the component installed in the vehicle that corresponds to the driver's intention is made without disturbing the driver to select the contents of the operation of the component installed in the vehicle. On the other hand, in the configuration mentioned above, when the degree of dispersion of the probability distribution is equal to or greater than the limit, the trial and error operation proposal is made to select the target action as the target of the operation proposal to from the plurality of candidates and issue this target action. In this way, the proposed operation for the component installed in the vehicle that corresponds to the driver's intention is more adequately produced. That is, in the configuration mentioned above, only a single content of the operation of the component installed in the vehicle is emitted at once as the target of the proposed operation, regardless of whether the degree of dispersion of the probability distribution is large or small. Therefore, the driver only has to express his will, that is, if he agrees with the content of the operation of the component installed in the vehicle that is proposed on each occasion. Therefore, responses to different types of operating proposals for the component installed in the vehicle with different degrees of probability distribution dispersion, such as destination configuration and sound reproduction, can be consistently made while using the same user interface. simple. Thus, the proposed operation for the component installed in the vehicle that corresponds to the driver's intention can be made at the same time that it prevents the driver from being disturbed. [009] An information delivery device according to a third aspect of the invention is equipped with an electronic agent control unit. The agent ECU has a state space building unit, Petition 870170015443, of 03/09/2017, p. 12/149 7/52 an action space construction unit, a reinforced learning unit, a dispersion-grade computing unit and an information supply unit. The state space construction unit is configured to define a vehicle state by associating a plurality of vehicle data types with each other, and to construct a state space as a set of a plurality of states. 
The action space construction unit is configured to define, as an action, data indicating contents of an operation of a component installed in the vehicle that is performed through a response, from a driver, to an operation proposal for the installed component in the vehicle, and build a space of actions as a set of a plurality of actions. The enhanced learning unit is configured to accumulate a history of the driver's response to the proposed operation for the component installed in the vehicle, to define a reward function as an index representing a degree of adequacy of the proposed operation for the component installed in the vehicle. vehicle while using the accumulated history, and calculating a performance probability distribution of each of the actions that build the action space in each of the states that build the state space, through reinforced learning based on the reward function. The dispersion degree computing unit is configured to compute a degree of dispersion of the state space totaling the degree of dispersion of the probability distribution that is calculated by the reinforced learning unit regarding the plurality of states that build the state space. The information supply unit is configured to make a definitive operation proposal to establish a target action as a target of the operation proposal and to issue that target action when the state space dispersion degree, which is computed by the computing unit of degree of dispersion, is less than a limit, and making a trial and error operation proposal to select the target action as the target of the proposed Petition 870170015443, of 03/09/2017, p. 12/15 8/52 operation of a plurality of candidates and issue that target action when the degree of dispersion of the state space, which is computed by the unit of computation of degree of dispersion, is equal to or greater than the limit. [010] In a non-transitory, computer-readable medium storing an information delivery program according to a fourth aspect of the invention, the information delivery program is programmed to cause a computer to perform a state space construction function , an action space construction function, an enhanced learning function, a dispersion-grade computing function, and an information provision function. The state space construction function is designed to define a vehicle state by associating a plurality of vehicle data types with each other, and to construct a state space as a set of a plurality of states. The action space construction function is designed to define, as an action, data indicating contents of an operation of a component installed in the vehicle that is performed through a response, from a driver, to an operation proposal for the installed component in the vehicle, and build a space of actions as a set of a plurality of actions. The enhanced learning function is designed to accumulate a history of the driver's response to the proposed operation for the component installed in the vehicle, to define a reward function as an index representing a degree of adequacy of the proposed operation for the component installed in the vehicle. vehicle while using the accumulated history, and calculating a performance probability distribution of each of the actions that build the action space in each of the states that build the state space, through reinforced learning based on the reward function. 
The dispersion degree computation function is designed to compute a degree of dispersion of the state space totaling the degree of dispersion of the probability distribution that is calculated through the function of Petition 870170015443, of 03/09/2017, p. 12/169 9/52 reinforced learning regarding the plurality of states that build the state space. The information provision function is designed to make a definitive operation proposal to establish a target action as a target of the operation proposal and to issue that target action when the state space dispersion degree, which is computed through the computation function degree of dispersion, is less than a limit, and to make a trial and error operation proposal to select the target action as the target of the proposed operation of a plurality of candidates and issue that target action when the degree of dispersion of the state space, which is computed through the dispersion degree computation function, is equal to or greater than the limit. [011] According to each of the above mentioned third and fourth aspects of the invention, the reward function is defined as the index that represents the degree of suitability of the proposed operation for the component installed in the vehicle, while using the response history , from the driver, to the proposed operation for the component installed in the vehicle. Then, a driver decision-making model regarding the proposed operation for the component installed in the vehicle in each of the states is structured through reinforced learning based on this reward function. In addition, the probability distribution of the contents of the operation of the component installed in the vehicle, which is performed through the driver's response to the operation proposal for the component installed in the vehicle in each of the states, is calculated while using this model structured. It should be noted here that the degree of dispersion of the probability distribution of the contents of the operation of the component installed in the vehicle generally differs depending on the target of the proposed operation for the component installed in the vehicle. For example, in the case where the target of the proposed operation for the component installed in the vehicle is sound reproduction, this target is generally susceptible to the driver's state of mind at that moment, and similarityPetition 870170015443, of 03/09/2017, pg . 12/179 10/52 tes, as well as the condition of the vehicle, and there are a variety of options. Therefore, the degree of dispersion of the probability distribution of the contents of the operation of the component installed in the vehicle is probably great. On the other hand, in the case where the target of the proposed operation for the component installed in the vehicle is the configuration of a destination, it is generally easier to limit the number of options from the state of the vehicle on each occasion than in the case of sound reproduction. Therefore, the degree of dispersion of the probability distribution of the contents of the operation of the component installed in the vehicle is probably small. In this regard, according to the configuration mentioned above, when the degree of dispersion of the state space that was obtained from the total value of degrees of dispersion of the probability distribution is less than the limit, the proposal of definitive operation is made for establish the target action as the target of the proposed operation and issue that target action. 
Thus, the operation proposal for the component installed in the vehicle that corresponds to the driver's intention is made without disturbing the driver to select the contents of the operation of the component installed in the vehicle. On the other hand, in the configuration mentioned above, when the degree of dispersion of the state space that was obtained from the total value of degrees of dispersion of the probability distribution is equal to or greater than the limit, the proposed trial and error operation is made to select the target action as the target of the proposed operation from the plurality of candidates and issue that target action. In this way, the proposed operation for the component installed in the vehicle that corresponds to the driver's intention is more adequately produced. That is, in the configuration mentioned above, only a single content of the operation of the component installed in the vehicle is emitted at once as the target of the proposed operation regardless of whether the degree of dispersion of the space state is large or small. Therefore, the driver only has to express his will, that is, if he agrees with the content of the operation of the component installed in the vehicle that is Petition 870170015443, of 03/09/2017, p. 12/189 11/52 proposed on each occasion. Therefore, responses to different types of operating proposals for the component installed in the vehicle with varying degrees of state space dispersion, such as setting a destination and reproducing sound, can be consistently made while using the same user interface. simple. Thus, the proposed operation for the component installed in the vehicle that corresponds to the driver's intention can be made at the same time that it prevents the driver from being disturbed. [012] In the second aspect mentioned above of the invention, the reinforced learning unit can adopt, as a policy, the mapping of each of the states that build the state space for each of the actions that build the action space, define, as a state value function, an estimated value of a cumulative reward that is obtained when policy is followed in each of the states, estimate, as an ideal action value function, an estimated value of a cumulative reward that is always obtained when an ideal policy is followed after a predetermined action is selected from the action space in each of the states that build the state space, assuming that the ideal policy is the policy that maximizes the state value function in all states that construct the state space, and calculate the probability distribution based on the estimated ideal stock value function. The information supply unit can produce the definitive operation proposal aiming at an action that maximizes the ideal action value function in a present state, when the degree of dispersion of the state space, which is computed by the computing unit of degree of dispersion, is less than the limit. [013] In the configuration mentioned above, when the degree of dispersion of the state space is less than the limit, the definitive operation proposal is made aiming at the action that maximizes the ideal stock value function in the present state, or that is, the action that has the most value and is supposed to be the most likely Petition 870170015443, of 03/09/2017, p. 12/199 12/52 to be taken by the driver in this state. Thus, the operation proposal for the component installed in the vehicle that corresponds to the driver's intention can be performed with greater reliability. 
[014] In the aforementioned information provisioning device, the information provisioning unit can be configured to make the proposed trial and error operation with such a tendency in order to improve a frequency of selecting an action as a target according to probability density of the probability distribution of the action in the present state increases when the degree of dispersion of the state space, which is computed by the unit of computation of degree of dispersion, is equal to or greater than the limit. [015] In the configuration mentioned above, when the degree of dispersion of the state space is equal to or greater than the limit, the trial and error operation proposal is made with such a tendency to select, as the target of the operation proposal for the component installed in the vehicle, an action with a high probability density of the probability distribution in the present state, that is, an action that is likely to be taken by the driver in the present state. Thus, even under circumstances where it is difficult to specify the driver's action beforehand regarding the proposed operation for the component installed in the vehicle as a target, the proposed operation for the component installed in the vehicle that corresponds to the driver's intention can be performed with greater reliability. [016] In the aforementioned information provisioning device, the dispersion degree computing unit can be configured to define, as an entropy, the degree of dispersion of the performance probability distribution of each of the actions that build the actions in each of the states that build the state space, and define the degree of dispersion of the state space as an average entropy. The information supply unit Petition 870170015443, of 03/09/2017, p. 12/20 13/52 can be configured to select the definitive operation proposal or the trial and error operation proposal with such a trend in order to improve a production frequency of the trial and error operation proposal as an ε-value increases, while uses a greedy-ε method in which an average entropy value is defined as the ε-value. [017] In the configuration mentioned above, the selection frequency of the trial and error operation proposal is improved as the ε-value as the average entropy value that defines the degree of dispersion of the state space increases, that is, as the degree of dispersion of the state space increases. Thus, even under circumstances where it is difficult to specify the driver's action regarding the proposed operation for the component installed in the vehicle as a target, the proposed operation for the component installed in the vehicle that corresponds to the driver's intention can be performed with greater reliability. [018] In the aforementioned information delivery device, the enhanced learning unit can be configured to define, as the reward function, a frequency of execution of the operation of the component installed in the vehicle through the driver's response to the operation proposal for the component installed in the vehicle, and update the reward function according to a change in the operating history of the component installed in the vehicle when the component installed in the vehicle is operated according to the proposed operation for the component installed in the vehicle . 
[019] In the configuration mentioned above, the reward function is defined by applying the frequency of the action that is performed through the driver's response to the operation proposal for the component installed in the vehicle, such as the index of the degree of adequacy of the operation proposal of the component installed in the vehicle as to the driver's intention. The reward function is updated every time the response history is changed. Thus, the distribution of probability of deposition 870170015443, of 03/09/2017, p. 12/21 14/52 performance of each of the actions that build the action space in each of the states that build the state space can be calculated in a way that corresponds to the driver's intention. Also, the accuracy of the probability distribution is improved to adapt to the actual response made by the driver as an individual, as the frequency of the driver's response increases. [020] In the aforementioned information provisioning device, the state space construction unit can be configured to construct the state space as a set of states as a group of data that associate an operating situation of the component installed in the vehicle , characteristics of a passenger or passengers of the vehicle and a situation of operation of the vehicle with each other. [021] In the configuration mentioned above, each of the states that build the state space is defined considering elements that influence the operation proposal for the component installed in the vehicle that is made to the driver, such as the operation situation of the component installed in the vehicle, the characteristics of the passenger (s) of the vehicle, the operating status of the vehicle and the like, from a variety of points of view. In this way, the proposed operation for the component installed in the vehicle that corresponds to the driver's intention can be made in order to adapt more precisely to the real circumstances. Eventually, in the configuration mentioned above, it is also estimated that the number of states that build the state space is enormous, as a result of considering several elements as described above. However, through the use of the enhanced learning method in which historical data is accumulated and accuracy is improved, the proposed operation for the component installed in the vehicle that corresponds to the driver's intention can be performed even when a huge number of learning data has not been previously prepared as in the case where, for example, learning assisted by Petition 870170015443, of 03/09/2017, p. 12/22 15/52 teacher is used. 
BRIEF DESCRIPTION OF THE DRAWINGS
[022] Features, advantages, and technical and industrial significance of exemplary embodiments of the invention will be described below with reference to the accompanying drawings, in which like numerals denote like elements, and in which:
[023] Figure 1 is a block diagram showing the general configuration of an information providing device according to the first embodiment of the invention;
[024] Figure 2 is a view showing an example of vehicle data attributes that define a state space;
[025] Figure 3 is a view showing an example of the defined contents of a state space table;
[026] Figure 4 is a view showing another example of vehicle data attributes that define a state space;
[027] Figure 5 is a view showing another example of the defined contents of a state space table;
[028] Figure 6 is a view showing an example of the defined contents of an action space table;
[029] Figure 7 is a view showing another example of the defined contents of an action space table;
[030] Figure 8 is a view showing an example of a transition probability matrix for taking each of the actions that construct an action space in each of the states that construct a state space;
[031] Figure 9 is a graph showing an example of a cumulative distribution function that is used to produce a trial-and-error operation proposal;
[032] Figure 10A is a view showing an example of vehicle data attributes that define a present state;
[033] Figure 10B is a view illustrating a process of selecting an action that is used for a definitive operation proposal in the state shown in Figure 10A;
[034] Figure 11A is a view showing another example of vehicle data attributes that define a present state;
[035] Figure 11B is a view illustrating a process of selecting an action that is used for a trial-and-error operation proposal in the state shown in Figure 11A;
[036] Figure 12 is a view showing an example of a steering wheel command;
[037] Figure 13 is a flowchart showing the processing contents of an operation proposal process for components installed in the vehicle, as an example of an information providing process;
[038] Figure 14 is a view showing an example of the contents of a conversation that is held between an agent ECU and a driver in such a way as to include a definitive operation proposal;
[039] Figure 15 is a view showing an example of the contents of a conversation that is held between the agent ECU and the driver in such a way as to include a trial-and-error operation proposal;
[040] Figure 16 is a view illustrating a process of selecting between a definitive operation proposal and a trial-and-error operation proposal in an information providing device according to the second embodiment of the invention;
[041] Figure 17 is a view showing another example of a steering wheel command; and
[042] Figure 18 is a view showing yet another example of a steering wheel command.
DETAILED DESCRIPTION OF EMBODIMENTS
[043] (First Embodiment) An information providing device according to the first embodiment of the invention will be described below. The information providing device according to the present embodiment of the invention consists of an agent ECU (an electronic control unit) that is mounted on a vehicle and that produces an operation proposal for components installed in the vehicle as the provision of information to a driver.
It should be noted here that the functions of the agent ECU are broadly classified into those of a learning system, those of an information acquisition system, and those of a user interface system. In addition, the agent ECU performs enhanced learning as a learning mode in the learning system while classifying an operating history of the components installed in the vehicle according to a state of the vehicle on each occasion, based on various parts of information acquired through the information acquisition system, and produces an operation proposal for the components installed in the vehicle through the user interface system based on a learning result obtained through enhanced learning. It should be noted here that enhanced learning is a learning method in which the agent ECU is adapted to an environment through trial and error giving some reward to the agent ECU as the environment changes based on a given action, when the ECU of agent selects this action based on the environment. Eventually, in the present embodiment of the invention, the agent ECU defines a state by associating various vehicle data, for example, an operating situation of the components installed in the vehicle, characteristics of a passenger or passenger (s) of the vehicle, a situation vehicle operation and the like with each other, and builds a space of states as a competition 870170015443, from 03/09/2017, p. 12/25 18/52 with a plurality of states. In addition, the agent ECU defines, as an action, a type of an operation of the components installed in the vehicle that can be performed by the agent ECU instead of the driver as the driver responds to an operation proposal, and builds a action space as a set of a plurality of actions. In addition, a history of the operation of the components installed in the vehicle that was performed in response to the proposed operation for the components installed in the vehicle in each of the states that build the state space is equivalent to a reward in enhanced learning. In addition, the agent ECU calculates a performance probability distribution for each of the actions that build the action space in each of the states that build the state space, performing the aforementioned reinforced learning. In addition, the agent ECU provides for an action that is likely to be taken by the driver from a state of the vehicle on each occasion, based on the probability distribution thus calculated, and produces an operation proposal for the components installed in the vehicle taking taking into account a forecast state. [044] Firstly, the configuration of the device according to the present embodiment of the invention will be described with reference to the drawings. As shown in Figure 1, an agent ECU 100 has a control unit 110 that controls an operating proposal for components installed in the vehicle, and a storage unit 120 that stores an information delivery program that is run by the control unit 110 in the production of the operation proposal for the components installed in the vehicle and various data that are read and written by the control unit 110 in the execution of the information supply program. It should be noted here that the various data that are stored in the storage unit 120 include state space tables T1 and T1a which define a state space, action space tables T2 and T2a Petition 870170015443, of 03/09/2017, p. 12/26 19/52 that define a space of actions, and a history of RA operation of the components installed in the vehicle. 
Each of the state space tables functions as a state space construction unit, and each of the action space tables works as an action space construction unit. Eventually, in the present embodiment of the invention, a plurality of types of services, for example, sound reproduction, the configuration of a destination, the configuration of an air conditioner, the configuration of a seat position, the configuration of mirrors, the configuration of cleaners and the like are available as targets of the proposed operation. In addition, the individual state space tables T1 and T1a and the individual action space tables T2 and T2a are stored in the storage unit 120 of agent ECU 100 for each of these types of services. [045] Figure 2 shows an example of vehicle data attributes that are used to define a state in the configuration of a destination as an example of an operation proposal. It should be noted here that the attributes of the vehicle data are previously registered as elements that contribute to the shape of the destination configuration, and include vehicle data in a DA operation situation of the components installed in the vehicle, DB characteristics of a passenger or passengers of the vehicle and a DC operating condition of the vehicle in the example shown in the drawing. Eventually, a destination DA1, a time DA2, a day of the week DA3, and a current location DA4 are mentioned as an example of the vehicle data in the DA operating situation of the components installed in the vehicle. In addition, the presence or absence of a DB1 spouse, the presence or absence of a DB2 child or children, the number of DB3 travel companions, the presence or absence of a DB4 hobby, and a DB5 purpose are mentioned as an example of vehicle data in the DB characteristics of the passenger (s) of the vehicle. In addition, a traffic situation (degree of competition 870170015443, dated 03/09/2017, p. 27/129 20/52 handling) DC1 and DC2 weather conditions are mentioned as an example of vehicle data in the vehicle's DC operating situation. [046] Then, as shown in Figure 3, the state space table T1 defines a state by combining the attributes of the vehicle data shown in Figure 2 with each other using the round-robin method, and constructs a state space as a set of a plurality of states. It should be noted here that the number of states m included in the state space table T1 (for example, about four million) increases with the number of types of elements that make up the attributes of the vehicle data (11 types including “destination” and “weather conditions” as mentioned sequentially from the left, in the example shown in Figure 2) or the number of parameters for each of the elements (for example, 8 according to the number of “destination” parameters in the example shown in Figure 2) increases . [047] On the other hand, Figure 4 shows an example of the attributes of the vehicle data that are used to define a state in the reproduction of a sound as an example of an operation proposal. It should be noted here that the attributes of the vehicle data are previously recorded as elements that contribute to the form of sound reproduction, and include vehicle data in an operating situation DAa of the components installed in the vehicle, DBa characteristics of a passenger or vehicle passengers, and a DCa operating status of the vehicle. 
Eventually, a DA1a sound source, the DA2a repeat setting, a DA3a sound volume, a DA4a schedule, a DA5a weekday, and a current DA6a location are mentioned as an example of the vehicle data in the DAa operating situation of components installed in the vehicle. In addition, the presence or absence of a DB1 a spouse, the presence or absence of a DB2a child or children, the number of DB3a travel companions, and a degree of drowsiness DB4a of the driver are mentioned as an example of vehicle data Petition 870170015443, of 03/09/2017, p. 12/28 21/52 on the DBa characteristics of the passenger (s) of the vehicle. In addition, a DC1 environment including a degree of urbanization or suburbanization around the vehicle and a road environment is mentioned as an example of the vehicle data in the vehicle's DCa operating situation. [048] Then, as shown in Figure 5, the state space table T1a defines a state by combining the attributes of the vehicle data shown in Figure 4 with each other using the round-robin method, and constructs a state space as a set of a plurality of states. Also in this case, the number n of states included in the state space table T1a (for example, about 1.5 billion) increases according to the number of types of elements that constitute the attributes of the vehicle data or the number of parameters of each of the elements increases. [049] Figure 6 shows an example of the T2 action space table that defines an action at the time when agent ECU 100 defines a destination instead of the driver as an example of an operation proposal, and that constructs a space of shares as a set of a plurality of shares. In the example shown in the drawing, a list of destination place names to be defined is mentioned as types of actions included in the action space. It should be noted here that places as destinations to be defined are previously registered, for example, place names especially in general defined by the driver himself in the past. In the example shown in the drawing, a total of 8 place names, that is, "place 1" to "place 6", as well as "home" and "parent's home" are recorded. [050] In addition, Figure 7 shows an example of the action space table T2a that defines an action at the moment when the agent ECU 100 plays a sound instead of the conductor as an example of an operation proposal, and that builds a space of actions as a set of a plurality of actions. Petition 870170015443, of 03/09/2017, p. 12/29 22/52 In the example shown in the drawing, a list of sound sources to be reproduced is mentioned as types of actions included in the action space. It should be noted here that the sources of sounds to be reproduced are previously registered, for example, sources of sounds especially generally reproduced by the driver in the past. In the example shown in the drawing, a total of 100 sound sources including the names of radio stations and song titles saved in the storage medium, such as a portable terminal, compact discs (CDs) and the like are registered. [051] In addition, as shown in Figure 1, agent ECU 100 is connected to an additional ECU group 130, sensor group 131 and command group 132 through an NW vehicle network that is configured as, for example, an area control network (a CAN) or the like. [052] Additional ECU group 130 consists of vehicle-mounted ECUs that control the operation of the various components installed in the vehicle. 
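To make the table construction of paragraphs [046] to [050] concrete, the sketch below builds a heavily reduced state space by enumerating every combination of a few hypothetical attribute values, and an action space as a plain list. This is only an illustration of the data structure: the attribute values, action names and counts are assumptions, not the contents of the actual tables T1, T1a, T2 and T2a.

```python
from itertools import product

# Hypothetical, heavily reduced attribute values; the real tables (Figures 2
# and 4) combine many more attributes and reach millions of states.
ATTRIBUTES = {
    "destination": ["home", "parents' home", "place 1"],
    "time_slot":   ["morning", "evening"],
    "weather":     ["fine", "rain"],
}

# State space: one state per combination of attribute values, in the spirit of
# the exhaustive combination described for tables T1 and T1a.
STATE_SPACE = [dict(zip(ATTRIBUTES, values))
               for values in product(*ATTRIBUTES.values())]

# Action space: the list of operations the agent ECU may perform for the
# driver, analogous to the destination list of table T2.
ACTION_SPACE = ["home", "parents' home", "place 1", "place 2"]

print(len(STATE_SPACE))   # 3 * 2 * 2 = 12 states in this reduced example
```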
Additional ECU group 130 includes an ECU installed in the vehicle of a vehicle drive system that controls an engine, a brake, a steering wheel and the like, an ECU installed in the vehicle of a body system that controls an air conditioner, a meter and the like, and an ECU installed in the vehicle for an information system that controls a car navigation system, an audio system and the like. [053] Sensor group 131 is a sensor group for acquiring various vehicle data. Sensor group 131 includes a global positioning system (GPS) sensor, a laser radar, an infrared sensor, an ultrasonic sensor, a rain sensor, an external air temperature sensor, an internal vehicle temperature sensor , a seat sensor, a seat belt fastening status sensor, an internal vehicle camera, a smart key sensor, a tamper monitoring sensor, a DePetition sensor 870170015443, from 09 / 03/2017, p. 12/30 23/52 protection of fine particles such as pollen and the like, an acceleration sensor, an electric field strength sensor, a driver monitor, a vehicle speed sensor, a steering angle sensor, a rate sensor yaw and a biological body sensor. [054] Command group 132 is a command group for changing the operation of the various components installed in the vehicle. Command group 132 includes an arrow command, a wiper operation command, a headlight operation command, a steering wheel command, an audio / navigation operation command, a window operation command, a locking command / door opening / trunk, an air conditioner operating command, a ventilation command / seat heater, a pre-configuration memory command / seat position adjustment, a tampering system command, a command operation of the mirrors, an adaptive cruise control command (ACC) and an engine control. [055] Then, when multiple vehicle data is entered to control unit 110 of agent ECU 100 of that additional ECU group 130, that sensor group 131 and that control group 132 via the NW vehicle network, the vehicle unit control 110 of agent ECU 100 looks for a relevant vehicle state with reference to state space tables T1 and T1a that are stored in storage unit 120. In addition, control unit 110 of agent ECU 100 adds a value cumulatively of counting an operation history corresponding to the relevant state as an RA operation history of the components installed in the vehicle that are stored in the storage unit 120, whenever a predetermined action is selected from the actions included in the action space through the response of the driver to an operation proposal for the components installed in the vehicle and the operation of the components installed in the vehicle is performed. In this regard, the control unit 110 of the Petition 870170015443, of 03/09/2017, p. 12/31 24/52 Agent ECU 100 accumulates historical data in the driver's response to the proposed operation for the components installed in the vehicle in each of the states that build the state space. [056] In addition, control unit 110 of agent ECU 100 functions as an enhanced learning unit 111 that performs Q-learning as a type of enhanced learning through the following procedure (step 1) to (step 7), while while setting, as a reward function, the value of counting the operating history of the components installed in the vehicle when accepting an operation proposal, for each of the learned states as described above. [057] In (step 1), when a policy π is defined as the mapping of each of the states that build the state space for each of the actions that build the action space, the arbitrary policy π is initially configured. 
In (step 2), a present state ts is observed (t represents a time step). In (step 3), an at action is performed according to an arbitrary action selection method (t represents a time step). In (step 4), a reward rt is received (t represents a time step). In (step 5), a state s (t + 1) after a state transition is observed (assuming that (called Markov's property) a transition to state s (t + 1) depends only on the state st and the action on that moment and is not susceptible to a previous state or a previous action). In (step 6), an action value function Q (st, at) is updated. In (step 7), time step t advances to (t + 1) to return to (step 1). [058] Eventually, it is possible to use a greedy method in which an action that maximizes the value of the action value function Q (st, at), which will be described later, is invariably selected, or conversely, a random method in which all actions are selected with the same probability, as an action selection method in the procedure (step 3). In addition, it is also possible Petition 870170015443, of 03/09/2017, p. 12/29 25/52 use a greedy-ε method in which a stock is selected according to the random method with a probability ε and a stock is selected according to the greedy method with a probability (1 -ε), a method of selecting Boltzmann in which a stock whose share value function Q (st, at) is high is selected with a high probability and a stock whose share value function Q (st, at) is low is selected with a low probability, or similar. [059] In addition, the action value function Q (st, at) is updated in the procedure of (step 6), based on an expression (1) shown below. Q (st, at) = (1-a) Q (st, at) + a (rt + ymaxat + ieAQ (st + 1, at + 1)) ... (1) [060] Eventually, in the expression ( 1), a learning rate α is defined within a numerical range of 0 <α <1. This is for the purpose of producing the value of the action value function Q (st, at) likely to converge by gradually reducing the amount increase in the share value function Q (st, at) that is updated over time. Furthermore, for the same reason, in expression (1), Q (st, at) represents the aforementioned share value function, and represents an estimated value of a discounted cumulative reward Rt that is obtained in the case where the policy π it is followed after taking action in the state st assuming that the reinforced learning unit 111 adopts the given policy π regardless of the time interval. It should be noted here that the discounted cumulative reward Rt is the sum of rewards that are obtained as a state transition is repeated. The discounted cumulative reward Rt is obtained from an expression (2) shown below. Rt = ç +1 + γ „ 2 + γ r t + 3 + ... = (2) k = 0 [061] Eventually, in the expression (2) (as well as in the expression (1)), a discount rate γ is defined within a numerical range of 0 <γ <1. This is for the purpose of producing the discounted cumulative reward value Rt likely to converge by gradually reducing the reward value that is obtained with the Petition 870170015443, of 03/09/2017, p. 12/33 26/52 time. [062] Then, after that, the reinforced learning unit 111 calculates an ideal action value function Q * (st, at) that maximizes (optimizes) the action value function Q (st, at), repeatedly performing the procedure from (step 1) to (step 7) mentioned above. 
[062] Thereafter, the reinforcement learning unit 111 calculates an ideal action value function Q*(s_t, a_t) that maximizes (optimizes) the action value function Q(s_t, a_t), by repeatedly performing the procedure of (step 1) to (step 7) mentioned above. It should be noted here that the ideal action value function Q*(s_t, a_t) represents an estimated value of the discounted cumulative reward R_t that is obtained in the case where an ideal policy π* is followed after the action a_t is selected in the state s_t, when a state value function V(s_t) is defined as a function representing an estimated value of the discounted cumulative reward R_t that is obtained in the case where the policy π is followed in the state s_t, and the ideal policy π* is defined as the policy π that satisfies V(s_t) ≥ V′(s_t) in all states s_t.
[063] Then, the reinforcement learning unit 111 substitutes the ideal action value function Q*(s_t, a_t) obtained as described above into an expression (3) shown below. Thus, among the transition probability matrices from each of the states that construct the state space to each of the actions that construct the action space, the transition probability matrix that maximizes the discounted cumulative reward R_t, that is, a transition probability matrix P(s_t, a_t) that corresponds to the driver's intention while taking into account the count value of the operation history RA for each of the states, is calculated.

P(s_t, a_t) = e^{Q*(s_t, a_t)/T} / Σ_{a_t′ ∈ A} e^{Q*(s_t, a_t′)/T} ... (3)

[064] Figure 8 shows an example of the transition probability matrix P(s_t, a_t) that is calculated as described above. Each row of the transition probability matrix P(s_t, a_t) corresponds to one of the states that construct the state space, and each column corresponds to one of the actions that construct the action space. In the example shown in the drawing, for example, the probability of taking an action a1 in a state s1 is "0.01". Likewise, the probability of taking an action a2 in the state s1 is "0.10", and the probability of taking an action a100 in the state s1 is "0.03".
[065] Then, the control unit 110 of the agent ECU 100 calculates an information entropy H(s) while using the expression shown in Figure 8, in which these probabilities are denoted by p. Incidentally, the information entropy H(s) is a parameter that serves as an index of the degree of dispersion of a probability distribution. In this respect, the control unit 110 of the agent ECU 100 also functions as a dispersion degree computing unit 112 that computes a degree of dispersion of the probability distribution that is calculated by the reinforcement learning unit 111. It means that, as the value of the information entropy H(s) increases, the degree of dispersion of the probability distribution increases, that is, the degree of homogeneity of the probabilities of taking the respective actions that construct the action space in the state s_t increases. Therefore, in the case where the value of the information entropy H(s) is large, it is difficult to predict which of the actions that construct the action space may be taken by the driver.
[066] In addition, the dispersion degree computing unit 112 calculates an average entropy H(Ω) by totaling the information entropies H(s) calculated for the respective states that construct the state space, as indicated by an expression (4) shown below.

H(Ω) = (Σ_{i=1}^{N} H(s_i)) / N ... (4)
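The sketch below shows how expressions (3) and (4) could be evaluated from a table of ideal action values, assuming the per-state entropy referred to in Figure 8 is the usual Shannon entropy H(s) = −Σ p·log p; the temperature T and the table contents are illustrative assumptions.

```python
import math

T = 1.0   # temperature of expression (3), an illustrative value

def transition_probabilities(q_star_row):
    """Expression (3): softmax of the ideal action values Q*(s, a) of one state."""
    exps = [math.exp(q / T) for q in q_star_row]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Information entropy H(s) of one row of P(s, a) (assumed Shannon form)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def average_entropy(q_star_matrix):
    """Expression (4): average of H(s) over all states of the state space."""
    per_state = [entropy(transition_probabilities(row)) for row in q_star_matrix]
    return sum(per_state) / len(per_state)
```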
[067] Incidentally, the average entropy H(Ω) is a parameter indicating the degree of dispersion of the state space. It means that, as the value of the average entropy H(Ω) increases, the degree of dispersion of the state space increases, that is, the degree of homogeneity of the probabilities of taking the respective actions that construct the action space, when the state space is viewed as a whole, increases. Therefore, the value of the average entropy H(Ω) is an index indicating whether or not it is possible to predict, for the service targeted by the operation proposal, which of the actions that construct the action space may be taken by the driver.
[068] The control unit 110 of the agent ECU 100 thus also functions as a proposal information generation unit 113 that generates information on the operation proposal for the components installed in the vehicle while using an ε-greedy method in which the average entropy H(Ω) obtained by the reinforcement learning unit 111 is used as the ε-value, according to the algorithm shown below. The proposal information generation unit also functions as an information providing unit.

ε = H(Ω)
δ = rand(1)
if δ > ε:
    π(s) = argmax_{a ∈ A} Q*(s_t, a_t) ... (5)
else if δ < ε:
    τ = rand(2)
    π(s) = a such that F(s) = Σ_{a ∈ A} P(s_t, a_t) = τ ... (6)

[069] Incidentally, in the aforementioned algorithm, the proposal information generation unit 113 defines a random number δ (a threshold) that takes a numerical range from 0 to 1, and applies the expression (5) when a condition "δ > ε" is satisfied. That is, the proposal information generation unit 113 increases the frequency of application of the expression (5) as the value of the average entropy H(Ω) obtained by the reinforcement learning unit 111 decreases. Then, by applying the expression (5), the proposal information generation unit 113 issues, as the target of an operation proposal, the action that maximizes the ideal action value function Q*(s_t, a_t) obtained by the reinforcement learning unit 111 as described above, that is, the action of greatest value in the state s, and thereby produces a definitive operation proposal.
[070] On the other hand, in the aforementioned algorithm, the proposal information generation unit 113 applies the expression (6) when a condition "δ < ε" is satisfied. That is, the proposal information generation unit 113 increases the frequency of application of the expression (6) as the value of the average entropy H(Ω) obtained by the reinforcement learning unit 111 increases. In applying the expression (6), the proposal information generation unit 113 first obtains a cumulative distribution function F(s) by adding up the probabilities of taking the respective actions that construct the action space in a given state s. Then, with a random number τ that takes a numerical range from 0 to 1 defined as a variable separate from the aforementioned random number δ, the proposal information generation unit 113 produces a trial-and-error operation proposal by issuing, as the target of the operation proposal, an action that satisfies a condition "F(s) = τ".
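A sketch of the selection algorithm of expressions (5) and (6), assuming the transition probabilities of expression (3) are available for the present state; the inverse draw on the cumulative distribution F(s) realizes the trial-and-error branch described in paragraph [070].

```python
import random

def propose_action(actions, q_star_row, probs, avg_entropy):
    """Pick the target action of an operation proposal (expressions (5) and (6)).

    actions     -- candidate actions of the action space
    q_star_row  -- ideal action values Q*(s, a) for the present state
    probs       -- transition probabilities P(s, a) from expression (3)
    avg_entropy -- average entropy H(Omega), used as the epsilon-value
    """
    epsilon = avg_entropy
    delta = random.random()                  # random number delta in [0, 1]
    if delta > epsilon:
        # Expression (5): definitive proposal, the action of greatest value.
        return max(zip(actions, q_star_row), key=lambda pair: pair[1])[0]
    # Expression (6): trial-and-error proposal by drawing tau and walking the
    # cumulative distribution F(s), so likely actions are proposed more often.
    tau = random.random()                    # random number tau in [0, 1]
    cumulative = 0.0
    for action, p in zip(actions, probs):
        cumulative += p
        if cumulative >= tau:
            return action
    return actions[-1]                       # guard against rounding error
```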
[071] As is also evident from the cumulative distribution function F(s) shown in Figure 9 as an example, the amount by which the cumulative distribution function F(s) increases fluctuates according to the probability of taking each of the actions that construct the action space. In concrete terms, the cumulative distribution function F(s) rises sharply in the sections along the abscissa axis that correspond to actions with a relatively high probability, while it rises only gently in the sections that correspond to actions with a relatively low probability. Therefore, when the random number τ is varied within the range 0 to 1, actions with a relatively high probability are more likely to satisfy the condition "F(s) = τ", and actions with a relatively low probability are less likely to satisfy the condition "F(s) = τ". Accordingly, when the action that satisfies the condition "F(s) = τ" is issued as the target of the operation proposal as described above, each action is issued with a tendency for its selection frequency to improve as its probability increases. In the example shown in the drawing, the action that corresponds to the point at which the condition F(s) = τ is satisfied is an action a3′. Therefore, the action a3′ is selected from the plurality of actions that construct the action space as the target action of the operation proposal, and is issued.
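As a quick illustration of this tendency (with made-up numbers, not those of Figure 9), repeatedly sampling an action through the cumulative distribution reproduces the underlying probabilities:

```python
import numpy as np

probs = np.array([0.05, 0.10, 0.60, 0.25])   # illustrative action probabilities
rng = np.random.default_rng(0)
picks = [int(np.searchsorted(np.cumsum(probs), rng.random())) for _ in range(10_000)]
print(np.bincount(picks, minlength=len(probs)) / len(picks))
# roughly [0.05, 0.10, 0.60, 0.25]: high-probability actions are proposed more often
```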
[072] Figures 10A and 10B show a concrete example to illustrate the selection of a definitive operation proposal or a trial and error operation proposal using the ε-greedy method when a destination is configured as the operation proposal. [073] In this example, as shown in Figure 10A, the agent ECU 100 first extracts, from the respective states that construct the state space in the state space table T1, the state that is relevant to the present state (that state is extracted as a state si in the drawing), based on the various vehicle data that are acquired through the vehicle network NW. In this example, there is a situation where the average entropy H(Ω) that is obtained from the transition probability matrix P(st, at) is relatively low, so the frequency of production of a definitive operation proposal, to which the aforementioned expression (5) is applied, is high. In this case, as shown in Figure 10B, the agent ECU 100 issues, as the target of the operation proposal, the action of greatest value in the present state ("home" in the example shown in the drawing) among the respective actions that construct the action space. [074] In addition, Figures 11A and 11B show a concrete example to illustrate the selection of a definitive operation proposal or a trial and error operation proposal using the ε-greedy method when the reproduction of a sound is the operation proposal. [075] Also in this example, as shown in Figure 11A, the agent ECU 100 first extracts, from the respective states that construct the state space in the state space table T1a, the state that is relevant to the present state (that state is extracted as a state sj in the drawing), based on the various vehicle data that are acquired through the vehicle network NW. In this example, there is a situation where the average entropy H(Ω) that is obtained from the transition probability matrix P(st, at) is relatively high, so the frequency of production of a trial and error operation proposal, to which the aforementioned expression (6) is applied, is high. In this case, as shown in Figure 11B, the agent ECU 100 randomly issues one of the actions that construct the action space as the target of the operation proposal, with a tendency for the selection frequency of an action to improve as the probability density of the transition probability of that action from the present state increases ("FM D" in the example shown in the drawing). [076] The agent ECU 100 then produces the operation proposal for the components installed in the vehicle through a sound or an image, by transmitting information about the action thus issued as the target of the operation proposal, over the vehicle network NW, to a sound emission unit 140 such as a loudspeaker or the like, or to an image emission unit 141 such as a liquid crystal display (LCD), a head-up display (HUD) or the like. [077] In addition, the agent ECU 100 also functions as an operation detection unit 114 that detects a response, from the driver, to an operation proposal by receiving, through the vehicle network NW, an operation signal of an operation input or a sound input via an operation input unit 142 such as a steering wheel control, a microphone or the like. [078] Figure 12 is a view illustrating an example of an operation input via the steering wheel control. In the example shown in the drawing, a steering wheel control 142A has four operation buttons BA1 to BA4. Among these operation buttons, the first operation button BA1, which is located at the top, and the second operation button BA2, which is located at the bottom, are allocated as operation buttons that are operated in response to an operation proposal of the agent ECU 100. The first operation button BA1 is operated to accept the operation proposal, and the second operation button BA2 is operated, conversely, to reject the operation proposal. In addition, the third operation button BA3, which is located on the left, and the fourth operation button BA4, which is located on the right, are allocated as operation buttons that are operated to operate the components installed in the vehicle regardless of an operation proposal of the agent ECU 100. The third operation button BA3 is operated when the driver himself operates the components installed in the vehicle through manual input, and the fourth operation button BA4 is operated when the driver himself operates the components installed in the vehicle that he uses with high frequency regardless of the condition of the vehicle on each occasion. The fourth operation button BA4 may also be allocated as an operation button that is operated when information about an operation of the components installed in the vehicle performed in the past by another driver in the same situation as the present one is acquired from an external server and provided to the driver himself. [079] Upon detecting an operation signal via the operation detection unit 114, the control unit 110 of the agent ECU 100 promotes the transmission of an activation signal from a learning update activation unit 115 to the reinforced learning unit 111. In the present embodiment of the invention, as described above, the count value of the operation history of the components installed in the vehicle at the time of acceptance of an operation proposal is defined as the reward function in the reinforced learning.
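A sketch of this update path, in which the count value added on acceptance serves as the reward function and triggers re-learning, might look as follows; `relearn` and `softmax` merely stand in for the processing of the reinforced learning unit 111 and are not names used in the description.

```python
import numpy as np

def softmax(q, temperature=1.0):
    """Illustrative stand-in for expression (3)."""
    z = (np.asarray(q) - np.max(q)) / temperature
    e = np.exp(z)
    return e / e.sum()

def on_operation_accepted(history, state_id, action_id, relearn):
    """Add the accepted operation to the count history used as the reward
    function, then recompute Q* and P(s, a). `relearn` maps the updated
    history to a dict of per-state action values (a hypothetical helper)."""
    key = (state_id, action_id)
    history[key] = history.get(key, 0) + 1            # cumulative count value
    q_table = relearn(history)
    prob_matrix = {s: softmax(q) for s, q in q_table.items()}
    return q_table, prob_matrix
```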
Therefore, taking the steering wheel control 142A shown in Figure 12 as an example, the transmission of an activation signal from the learning update activation unit 115 to the reinforced learning unit 111 is promoted when the first operation button BA1 is operated to accept an operation proposal. [080] Upon receiving the activation signal from the learning update activation unit 115, the reinforced learning unit 111 determines which of the states that construct the state space in each of the state space tables T1 and T1a is relevant to the present state, based on the various vehicle data that are acquired through the vehicle network NW at that point in time. The reinforced learning unit 111 then cumulatively adds the count value of the operation history corresponding to the relevant state to the operation history RA of the components installed in the vehicle that is stored in the storage unit 120. [081] In addition, when the operation history RA of the components installed in the vehicle is updated, the reinforced learning unit 111 again calculates the ideal action value function Q*(st, at), and the transition probability matrix P(st, at) based on the ideal action value function Q*(st, at), while using the post-update reward function that reflects the update of the operation history RA. The proposed information generation unit 113 then produces an operation proposal for the components installed in the vehicle that corresponds to the driver's intention, based on the transition probability matrix P(st, at) newly calculated by the reinforced learning unit 111. [082] Next, a concrete processing procedure of the operation proposal process for the components installed in the vehicle, which is executed by the agent ECU 100 according to the present embodiment of the invention after reading the information supply program stored in the storage unit 120, will be described. It should be noted here that the agent ECU 100 starts the operation proposal process for the components installed in the vehicle shown in Figure 13 on the condition that a vehicle ignition switch is turned on. [083] As shown in Figure 13, in this operation proposal process for the components installed in the vehicle, the agent ECU 100 first determines whether the operation history RA that is stored in the storage unit 120 has been updated, that is, determines whether an activation signal has been transmitted from the learning update activation unit 115 to the reinforced learning unit 111 (step S10). [084] If the operation history RA has been updated (YES in step S10), the reward function has also been updated, so the agent ECU 100 calculates the ideal action value function Q*(st, at) through the reinforced learning unit 111 while using the post-update reward function (step S11). [085] In addition, the agent ECU 100 calculates the transition probability matrix P(st, at) for each of the states that construct the state space, for each of the actions that construct the action space, through the reinforced learning unit 111, based on the ideal action value function Q*(st, at) thus calculated (step S12). [086] Furthermore, the agent ECU 100 calculates the information entropy H(s) for each of the states that construct the state space through the dispersion degree computing unit 112, based on the transition probability matrix P(st, at) thus calculated (step S13).
In addition, the agent ECU 100 calculates the average entropy H(Ω), which is obtained by totaling the information entropies H(s) for the respective states, through the dispersion degree computing unit 112 (step S14). [087] If the average entropy H(Ω) thus calculated is less than the random number δ defined as a threshold (YES in step S15), the agent ECU 100 produces a definitive operation proposal to establish, as a target of automatic setting, the action a that maximizes the ideal action value function Q*(st, at) calculated in the previous step S11, and to issue the action a from the proposed information generation unit 113 to the sound emission unit 140 or the image emission unit 141 (step S16). [088] On the other hand, if the average entropy H(Ω) calculated in the previous step S14 is equal to or greater than the random number δ (NO in step S15), the agent ECU 100 produces a trial and error operation proposal to randomly issue an action as a target of automatic setting, with a tendency for the selection frequency of an action to improve as the probability of performing that action in the present state st increases, based on the transition probability matrix P(st, at) calculated in the previous step S12 (step S17). [089] Subsequently, when there is a response, from the driver, to the operation proposal of the previous step S16 or the previous step S17, the agent ECU 100 acquires information about the response through the operation input unit 142 (step S18). The agent ECU 100 then determines whether the driver's response thus acquired accepts the operation proposal (step S19). This determination is made depending, for example, on whether a decision button (the first operation button BA1 in the example shown in Figure 12) was pressed in the case of an operation input via the steering wheel control, or on whether a word meaning an affirmative answer (for example, "Yes" or the like) was input in the case of a sound input through the microphone. [090] If the driver's response accepts the operation proposal (YES in step S19), the agent ECU 100 executes the action issued as the target of automatic setting in the previous step S16 or step S17 (step S20). In addition, as the action issued as the target of automatic setting is executed, the agent ECU 100 transmits an activation signal from the learning update activation unit 115 to the reinforced learning unit 111, updates the operation history RA of the components installed in the vehicle through the reinforced learning unit 111 (step S21), and transfers the process to step S22. [091] On the other hand, if the driver's response does not accept the operation proposal (NO in step S19), the agent ECU 100 transfers the process to step S22 without going through the processing of the previous step S20 and step S21. [092] Then, while the vehicle ignition switch is on (NO in step S22), the agent ECU 100 returns the process to step S10 and repeats the processing from step S10 to step S22 in a predetermined cycle. At this point, if the operation history RA of the components installed in the vehicle has been updated in the previous step S21, the agent ECU 100 recalculates the ideal action value function Q*(st, at), and the transition probability matrix P(st, at) based on the ideal action value function Q*(st, at), while using the post-update reward function that reflects the update of the operation history RA (step S11 and step S12).
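The flow of Figure 13 can be summarized by the following sketch; every method of the illustrative `ecu` object is a placeholder standing in for the processing described above, not an interface defined by the description.

```python
def operation_proposal_loop(ecu):
    """Sketch of the operation proposal process of Figure 13 (steps S10 to S22)."""
    while ecu.ignition_on():                                      # S22
        if ecu.history_updated():                                 # S10
            ecu.q_table = ecu.learn(ecu.history)                  # S11
            ecu.probs = ecu.to_probabilities(ecu.q_table)         # S12
            ecu.entropies = ecu.per_state_entropies(ecu.probs)    # S13
            ecu.avg_entropy = ecu.average(ecu.entropies)          # S14
        state = ecu.current_state()
        if ecu.avg_entropy < ecu.random_threshold():              # S15: delta
            action = ecu.best_action(state)                       # S16: definitive
        else:
            action = ecu.sample_action(state)                     # S17: trial and error
        response = ecu.wait_for_response(action)                  # S18, S19
        if response.accepted:
            ecu.execute(action)                                   # S20
            ecu.update_history(state, action)                     # S21
```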
Then, the agent ECU 100 produces the aforementioned definitive operation proposal or the aforementioned trial and error operation proposal as the operation proposal for the components installed in the vehicle, based on the newly calculated transition probability matrix P(st, at) (step S16 and step S17). [093] After that, whenever the operation input unit 142 is operated to accept the operation proposal as a response to the operation proposal, the agent ECU 100 updates the operation history RA of the components installed in the vehicle and repeats the reinforced learning by the reinforced learning unit 111 according to the update. Thus, as the frequency of the driver's responses to the operation proposals for the components installed in the vehicle increases, the precision of the transition probability matrix P(st, at) improves so as to adapt to the actual actions performed by the driver as an individual. [094] In the following, the operation of the agent ECU 100 according to the present embodiment of the invention will be described, focusing especially on the operation of producing an operation proposal for the components installed in the vehicle. In producing an operation proposal for the components installed in the vehicle, the difficulty of predicting an action that may be taken by the driver according to the state of the vehicle on each occasion generally differs depending on the type of operation proposal that is the target. For example, the reproduction of a sound while the vehicle is running, for example turning on the radio, playing a song or the like, is generally susceptible to the driver's state of mind at that time and the like, as well as to the state of the vehicle, and there is also a wide variety of options. Therefore, it is estimated that it is difficult to predict an action that may be taken by the driver. On the other hand, for example, the configuration of a destination or the like generally makes it easier to narrow down the options from the state of the vehicle on each occasion than the reproduction of a sound, and it is estimated that it is easier to predict an action that may be taken by the driver. [095] Thus, in the present embodiment of the invention, the agent ECU 100 records, as a log, the operation history RA of the components installed in the vehicle as responses to the operation proposals, individually for each type of operation proposal, and performs reinforced learning in which the recorded operation history RA is defined as the reward function. In this way, the agent ECU 100 calculates the transition probability matrix P(st, at) of each of the states that construct the state space for each of the actions that construct the action space so as to suit the actions performed by the driver as an individual. [096] In this case, as described above, in the transition probability matrix P(st, at) that is calculated based on the operation history RA of the components installed in the vehicle corresponding to the reproduction of a sound, the probabilities of taking each of the actions that construct the action space in each of the states that construct the state space are relatively likely to be dispersed.
On the other hand, in the transition probability matrix P(st, at) that is calculated based on the operation history RA of the components installed in the vehicle corresponding to the configuration of a destination, the probabilities of taking each of the actions that construct the action space in each of the states that construct the state space are relatively unlikely to be dispersed. [097] Thus, in the present embodiment of the invention, the agent ECU 100 evaluates the degree of dispersion of the state space based on the average entropy value H(Ω), which is obtained by totaling the information entropy values H(s) for the respective states that construct the state space. [098] When the average entropy H(Ω) is less than the random number δ, the agent ECU 100 produces a definitive operation proposal to establish the action of greatest value in the present state as the target of the operation proposal and issue that action. In this case, the agent ECU 100 increases the frequency of production of a definitive operation proposal as the average entropy value H(Ω) decreases. [099] Figure 14 shows an example of the contents of a conversation that is held between the agent ECU 100 and the driver so as to include a definitive operation proposal. In the example shown in the drawing, the agent ECU 100 confirms, as a definitive operation proposal, that the destination to be set automatically is "home". Then, when a sound command indicating acceptance of the definitive operation proposal ("Yes" in the example shown in the drawing) is input by the driver, the agent ECU 100 automatically sets "home" as the destination. In this way, the agent ECU 100 produces an operation proposal for the components installed in the vehicle that corresponds to the driver's intention, without troubling the driver to select an action, in a situation where it is easy to specify which of the actions that construct the action space will be taken by the driver in the present state, as in the case, for example, of the configuration of a destination. [0100] On the other hand, when the average entropy H(Ω) is equal to or greater than the random number δ, the agent ECU 100 produces a trial and error operation proposal to issue, as the target of the operation proposal, a randomly selected action, with a tendency for the selection frequency of an action to improve as the probability density of the transition probability of that action from the present state increases. In this case, the agent ECU 100 increases the frequency of production of a trial and error operation proposal as the average entropy value H(Ω) increases. [0101] Figure 15 shows an example of the contents of a conversation that is held between the agent ECU 100 and the driver so as to include a trial and error operation proposal. In the example shown in the drawing, the agent ECU 100 first asks the driver to confirm whether or not to start a trial and error operation proposal. Then, when a sound command indicating acceptance of the trial and error operation proposal ("Yes" in the example shown in the drawing) is input by the driver, the agent ECU 100 proposes that the driver select "FM A", a randomly selected action from among the actions whose probability density of the transition probability from the present state is relatively high.
Then, when a sound command indicating acceptance of the proposed sound is input to the agent ECU 100 by the driver, the agent ECU 100 automatically sets "FM A" as the sound. In addition, when a sound command indicating rejection of the proposed sound ("No" in the example shown in the drawing) is input to the agent ECU 100 after the sound is played, the agent ECU 100 proposes that the driver select "music n on CD" as another randomly selected action, with a tendency for the selection frequency of an action to improve as the probability density of the aforementioned transition probability of that action increases. Until a sound command indicating acceptance of a proposed sound is input to the agent ECU 100 by the driver, the agent ECU 100 sequentially proposes another randomly selected action to the driver, with a tendency for the selection frequency of an action to improve as the probability density of its transition probability increases. Then, when the proposal to select "music 2 on CD" is accepted, the agent ECU 100 automatically sets "music 2 on CD" as the sound. Thus, in a situation where it is difficult to specify which of the actions that construct the action space will be taken by the driver in the present state, as in the case, for example, of the setting of a sound, the agent ECU 100 more adequately produces an operation proposal for the components installed in the vehicle that corresponds to the driver's intention, by selecting a target action from a plurality of candidates and issuing that target action. [0102] As described above, according to the present embodiment of the invention, the following effects can be obtained. (1) When the average entropy H(Ω), obtained by totaling the information entropies H(s) for the respective states in the transition probability matrix P(st, at) calculated through the reinforced learning, is less than the random number δ, the agent ECU 100 produces a definitive operation proposal to establish a target action as the target of the operation proposal and issue that target action. Thus, the operation proposal for the components installed in the vehicle that corresponds to the driver's intention is made without troubling the driver to select an action. On the other hand, when the average entropy H(Ω), obtained by totaling the information entropies H(s) for the respective states in the transition probability matrix P(st, at) calculated through the reinforced learning, is equal to or greater than the random number δ, the agent ECU 100 produces a trial and error operation proposal to select a target action as the target of the operation proposal from a plurality of candidates and issue that target action. Thus, the operation proposal for the components installed in the vehicle that corresponds to the driver's intention is more adequately produced. That is, only a single content of operation of the components installed in the vehicle is issued at a time as the target of the operation proposal, regardless of whether the average entropy H(Ω) is large or small. Therefore, the driver only has to express his will, that is, whether or not he agrees with the content of operation of the components installed in the vehicle that is proposed on each occasion.
Therefore, responses to different types of operation proposals for the components installed in the vehicle whose degrees of dispersion of the average entropy H(Ω) differ from each other, such as the configuration of a destination and the reproduction of a sound, can be made consistently while using the operation input unit 142 as the same simple user interface. In this way, the operation proposal for the components installed in the vehicle that corresponds to the driver's intention can be made while the driver is prevented from being troubled. [0103] (2) When the average entropy value H(Ω) is less than the random number δ, the agent ECU 100 produces a definitive operation proposal that targets the action that maximizes the ideal action value function Q*(st, at) in the present state, that is, the action of greatest value in the present state, which is presumably the most likely to be taken by the driver. Thus, the operation proposal that corresponds to the driver's intention can be made with greater reliability. [0104] (3) When the average entropy value H(Ω) is equal to or greater than the random number δ, the agent ECU 100 produces a trial and error operation proposal with a tendency to improve the frequency of selecting, as a target, an action whose probability density in the probability distribution of the present state is high, that is, an action that is likely to be taken by the driver in the present state. Thus, even under circumstances in which it is difficult to specify in advance the operation of the components installed in the vehicle as a target, an operation proposal that corresponds to the driver's intention can be made with greater reliability. [0105] (4) The agent ECU 100 selects a definitive operation proposal or a trial and error operation proposal with a tendency to improve the frequency of production of a trial and error operation proposal as the ε-value increases, while using the ε-greedy method in which the average entropy value H(Ω) is defined as the ε-value. Therefore, in the agent ECU 100, the selection frequency of a trial and error operation proposal increases as the ε-value increases with the average entropy value, that is, as the degree of dispersion of the state space increases. Thus, under circumstances in which it is difficult to specify the driver's action with respect to the provision of information as a target, an operation proposal that corresponds to the driver's intention can be made with greater reliability. [0106] (5) The agent ECU 100 defines the reward function by applying, as an index of the degree of adequacy of the operation proposal for the components installed in the vehicle with respect to the driver's intention, the frequency with which an action is performed after being selected from the actions that construct the action space through a response to an operation proposal, and also updates the reward function every time the response history (the operation history RA of the components installed in the vehicle) is updated.
In this way, the transition probability matrix P(st, at), in which each of the actions that construct the action space is performed in each of the states that construct the state space so as to correspond to the driver's intention, can be calculated, and the precision of the transition probability matrix P(st, at) can be improved so as to adapt to the actual responses of the driver as an individual as the frequency of the driver's responses increases. [0107] (6) The agent ECU 100 defines each of the states that construct the state space while considering a variety of elements that influence the operation proposal for the components installed in the vehicle, such as the operation situations DA and DAa of the components installed in the vehicle, the characteristics DB and DBa of the passenger or passengers of the vehicle, the operation situations DC and DCa of the vehicle, and the like. In this way, the operation proposal that corresponds to the driver's intention can be made so as to better suit the actual circumstances. It is also expected that the number of states that construct the state space becomes enormous as a result of considering several elements as described above. In this regard, according to the aforementioned embodiment of the invention, the operation proposal that corresponds to the driver's intention can be made, even when an enormous amount of learning data has not been prepared in advance as would be required with, for example, supervised learning, by using the reinforced learning method, in which the accuracy is improved as the operation history RA is accumulated. [0108] (Second Embodiment) In the following, an information provision device according to the second embodiment of the invention will be described with reference to the drawings. The second embodiment of the invention differs from the first embodiment of the invention in that a definitive operation proposal or a trial and error operation proposal is selected based on the information entropy value corresponding to the current state, instead of obtaining an average entropy value as the sum of the information entropy values for the respective states. Therefore, in the following description, configuration details that differ from those of the first embodiment of the invention will mainly be described, and redundant description of configuration details that are identical or equivalent to those of the first embodiment of the invention will be omitted. [0109] Figure 16 shows an example of the transition probability matrix P(st, at) that is used to select a definitive operation proposal or a trial and error operation proposal in the present embodiment of the invention. In the example shown in the drawing, the probability of taking an action a1 in a state si is "0.03". Likewise, the probability of taking an action a2 in the state si is "0.04", and the probability of taking an action a100 in the state si is "0.02". The agent ECU 100 calculates the information entropy value H(s) while using the expression shown in Figure 8, with these probabilities represented by p. In this case, these probabilities are homogeneously dispersed, so the value of the information entropy H(s) is relatively large. [0110] Furthermore, in the example shown in the drawing, the probability of taking the action a1 in a state sj is "0.6".
Likewise, the probability of taking the action a2 in the state sj is "0.02", and the probability of taking the action a100 in the state sj is "0.04". The agent ECU 100 calculates the information entropy value H(s) while using the expression shown in Figure 8, with these probabilities represented by p. In this case, these probabilities are locally concentrated (on the action a1), so the value of the information entropy H(s) is relatively small. [0111] The agent ECU 100 then generates information about an operation proposal for the components installed in the vehicle while using the ε-greedy method, in which the information entropy value H(s) corresponding to the current state is used as the ε-value, generally according to the algorithm used in the aforementioned first embodiment of the invention. Thus, when the information entropy value H(s) corresponding to the current state is relatively large, as in the case where the current state is the state si shown in Figure 16, the agent ECU 100 increases the frequency of production of a trial and error operation proposal by applying the aforementioned expression (6). On the other hand, when the information entropy value H(s) corresponding to the current state is relatively small, as in the case where the current state is the state sj shown in Figure 16, the agent ECU 100 increases the frequency of production of a definitive operation proposal by applying the aforementioned expression (5). That is, even in a case where the average entropy value H(Ω) is relatively small when the state space is seen as a whole, as in the case, for example, of the configuration of a destination, the agent ECU 100 determines that there is a situation in which it is difficult to specify which of the actions that construct the action space will be taken by the driver exclusively in the present state, and produces a trial and error operation proposal, if the information entropy value H(s) corresponding to the current state is equal to or greater than the random number δ. Conversely, even in a case where the average entropy value H(Ω) is relatively large when the state space is seen as a whole, as in the case, for example, of the setting of a sound, the agent ECU 100 determines that there is a situation in which it is easy to specify which of the actions that construct the action space will be taken by the driver exclusively in the present state, and produces a definitive operation proposal, if the information entropy value H(s) corresponding to the current state is less than the random number δ. As described above, the agent ECU 100 produces an operation proposal for the components installed in the vehicle that corresponds to the driver's intention so as to better suit the actual circumstances, by individually and concretely considering the ease with which the driver's action can be specified in the present state.
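A sketch of this per-state variant is given below; as in the earlier sketches, it assumes an entropy normalized to the range 0 to 1 so that it can be compared directly with a random number in that range, which the description does not state explicitly.

```python
import numpy as np

def propose_action_per_state(q_values, probs, rng=np.random):
    """Second-embodiment variant: the epsilon-value is the information entropy
    H(s) of the current state rather than the average entropy H(Omega)."""
    p = np.asarray(probs)
    p_nonzero = p[p > 0.0]
    epsilon = float(-(p_nonzero * np.log(p_nonzero)).sum()) / np.log(len(p))
    if rng.random() > epsilon:
        return int(np.argmax(q_values)), "definitive"                   # expression (5)
    tau = rng.random()
    return int(np.searchsorted(np.cumsum(p), tau)), "trial_and_error"   # expression (6)
```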
[0112] As described above, according to the aforementioned second embodiment of the invention, the following effect can be obtained in addition to the aforementioned effect (1) of the first embodiment of the invention. (1A) When the information entropy H(s) corresponding to the current state is equal to or greater than the random number δ in the transition probability matrix P(st, at) calculated through the reinforced learning, the agent ECU 100 produces a trial and error operation proposal to select a target action from a plurality of candidates and issue that target action, as the operation proposal for the components installed in the vehicle. Thus, the operation proposal for the components installed in the vehicle that corresponds to the driver's intention is more adequately produced. On the other hand, when the information entropy H(s) corresponding to the current state is less than the random number δ in the transition probability matrix P(st, at) calculated through the reinforced learning, the agent ECU 100 produces a definitive operation proposal to establish and issue a target action as the operation proposal for the components installed in the vehicle. Thus, the operation proposal for the components installed in the vehicle that corresponds to the driver's intention is made without troubling the driver to select an action. That is, only a single content of operation of the components installed in the vehicle is issued at a time as the target of the operation proposal, regardless of whether the degree of dispersion of the information entropy H(s) for each of the states is large or small. Therefore, the driver only has to express his will, that is, whether or not he agrees with the content of operation of the components installed in the vehicle that is proposed on each occasion. Therefore, responses to different types of operation proposals for the components installed in the vehicle whose degrees of dispersion of the information entropy H(s) for each of the states differ from each other, such as the configuration of a destination and the reproduction of a sound, can be made while using the operation input unit 142 as the same simple user interface. In this way, the operation proposal for the components installed in the vehicle that corresponds to the driver's intention can be made while the driver is prevented from being troubled. In addition, the agent ECU 100 selects a trial and error operation proposal or a definitive operation proposal based on the information entropy value H(s) corresponding to the current state, regardless of the value of the average entropy H(Ω) that defines the degree of dispersion of the state space when the state space is seen as a whole. In this way, the agent ECU 100 can make an operation proposal for the components installed in the vehicle that corresponds to the driver's intention so as to better suit the actual circumstances, by individually and concretely considering the ease with which the driver's action can be specified in the present state. [0113] (Other Embodiments) Each of the aforementioned embodiments of the invention can also be carried out in the following forms. In the aforementioned first embodiment of the invention, the average entropy H(Ω) that defines the degree of dispersion of the state space is calculated by totaling the information entropies H(s) for all the states that construct the state space. Instead, the average entropy H(Ω) may be calculated by totaling the information entropies H(s) for only some of the states that construct the state space.
[0114] In the aforementioned first embodiment of the invention, the random number δ is used as the threshold to be compared with the average entropy H(Ω). In this way, a wider range of variation becomes possible. Instead, however, in order to reduce the processing load, a fixed value may be used as the threshold to be compared with the average entropy H(Ω). In this case, a definitive operation proposal may be made by applying the aforementioned expression (5) when the average entropy H(Ω) is less than the fixed value, whereas a trial and error operation proposal may be made by applying the aforementioned expression (6) when the average entropy H(Ω) is equal to or greater than the fixed value. [0115] Likewise, in the aforementioned second embodiment of the invention, the random number δ is used as the threshold to be compared with the information entropy H(s) corresponding to the current state. Instead, a fixed value may be used as the threshold to be compared with the information entropy H(s) corresponding to the current state. In this case, a definitive operation proposal may be made by applying the aforementioned expression (5) when the information entropy H(s) is less than the fixed value, whereas a trial and error operation proposal may be made by applying the aforementioned expression (6) when the information entropy H(s) corresponding to the current state is equal to or greater than the fixed value. [0116] In the aforementioned first embodiment of the invention, the degree of dispersion of the state space is evaluated based on the average entropy H(Ω) obtained by totaling the information entropies H(s) corresponding to the respective states that construct the state space. Instead, the degree of dispersion of the state space may be evaluated based on a value obtained by totaling standard deviations or variances of the probability distributions for the respective states that construct the state space. [0117] Likewise, in the aforementioned second embodiment of the invention, the degree of dispersion of the probability distribution in the present state is evaluated based on the information entropy H(s) corresponding to the current state. Instead, however, the degree of dispersion of the probability distribution in the present state may be evaluated based on the variance or standard deviation of the probability distribution in the present state.
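A sketch of this alternative dispersion measure, under the assumption that the per-state values are simply summed, is given below; note that its sense is inverted relative to the entropy, so the comparison with the threshold would be reversed.

```python
import numpy as np

def dispersion_by_std(prob_matrix):
    """Total of the per-state standard deviations of the action probabilities.
    A nearly uniform (widely dispersed) row has a small standard deviation,
    so small totals correspond to a strongly dispersed state space."""
    return float(sum(np.std(row) for row in prob_matrix))
```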
[0118] In each of the aforementioned embodiments of the invention, the vehicle data attributes that define the states include the operation situations DA and DAa of the components installed in the vehicle, the characteristics DB and DBa of the passenger or passengers of the vehicle, and the operation situations DC and DCa of the vehicle. The invention is not limited to this. Other elements may be adopted as the vehicle data attributes that define the states, as long as those elements contribute to the way the driver operates the components installed in the vehicle. [0119] In each of the aforementioned embodiments of the invention, as a definitive operation proposal, the action that maximizes the ideal action value function Q*(st, at) in the present state among the respective actions that construct the action space, that is, the action of greatest value in the present state, is issued as the target of the operation proposal. Instead, for example, the action that maximizes the transition probability in the present state may be issued as the target of the operation proposal. In short, it is sufficient to make a definitive operation proposal targeting the action that is presumably the most likely to be taken by the driver. [0120] In each of the aforementioned embodiments of the invention, as a trial and error operation proposal, the action that satisfies the condition "F(s) = τ" is issued as the target of the operation proposal. Instead, when the cumulative distribution function F(s) is obtained by sorting the probabilities of taking the respective actions that construct the action space in the given state s in ascending order and adding those probabilities, the action that satisfies a condition "F(s) > τ" may be issued as the target of the operation proposal. In addition, when the cumulative distribution function F(s) is obtained by sorting the probabilities of taking the respective actions that construct the action space in the given state s in descending order and adding those probabilities, the action that satisfies a condition "F(s) < τ" may be issued as the target of the operation proposal. In short, it is sufficient to make a trial and error operation proposal with a tendency for the selection frequency of an action to improve as the probability density of the probability distribution of that action in the present state increases. [0121] In each of the aforementioned embodiments of the invention, the number of times the first operation button BA1 on the steering wheel control 142A shown in Figure 12 is operated as a response to an operation proposal is defined as the reward function in the reinforced learning. Instead, a value obtained by subtracting the number of times of operation of the second operation button BA2 from the number of times of operation of the first operation button BA1 on the steering wheel control shown in Figure 12 may be defined as the reward function in the reinforced learning. In addition, a value obtained by further subtracting the number of times of operation of the third operation button BA3 or the number of times of operation of the fourth operation button BA4 from the number of times of operation of the first operation button BA1 may also be defined as the reward function in the reinforced learning. In addition, a value obtained by recording, as a log, the number of times the driver gave no response to an operation proposal for the components installed in the vehicle and subtracting that logged number from the number of times of operation of the first operation button BA1 may also be defined as the reward function in the reinforced learning. In addition, the number of times the driver feels comfortable and the number of times the driver feels uncomfortable with an operation proposal for the components installed in the vehicle may be measured based on a biological signal or the like of the driver, and the number of times the driver feels comfortable may be defined as the reward function in the reinforced learning. Furthermore, a value obtained by subtracting the number of times the driver feels uncomfortable from the number of times the driver feels comfortable may also be defined as the reward function in the reinforced learning. In short, any index representing the degree of adequacy of an operation proposal for the components installed in the vehicle with respect to the driver's intention may be defined as the reward function in the reinforced learning.
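The reward-function candidates listed in this paragraph can be summarized by the following sketch; all argument names are assumptions, each standing for a count accumulated from the driver's responses.

```python
def reward_indices(accept, reject, ignored, comfort=0, discomfort=0):
    """Illustrative reward-function candidates built from response counts."""
    return {
        "acceptances_only": accept,
        "acceptances_minus_rejections": accept - reject,
        "acceptances_minus_ignored": accept - ignored,
        "comfort_only": comfort,
        "comfort_minus_discomfort": comfort - discomfort,
    }
```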
[0122] In each of the aforementioned embodiments of the invention, the configuration in which the steering wheel control has the third operation button BA3 and the fourth operation button BA4, which are operated to operate the components installed in the vehicle regardless of an operation proposal of the agent ECU 100, in addition to the first operation button BA1 and the second operation button BA2, which are operated in response to an operation proposal of the agent ECU 100, was described as an example. However, a configuration in which a steering wheel control 142B having only the first operation button BA1 and the second operation button BA2, which are operated in response to an operation proposal of the agent ECU 100, is used as another example of the steering wheel control, as shown in Figure 17, may also be adopted. In addition, a configuration in which a steering wheel control 142C having a third operation button BA3a, which is operated to activate a concierge service, instead of the third operation button BA3, which is operated to operate the components installed in the vehicle through manual input by the driver himself as shown in Figure 12, is used as another example of the steering wheel control, as shown in Figure 18, may also be adopted. Furthermore, in the configuration of the steering wheel control 142B or 142C, a response, from the driver, to an operation proposal may be detected through the actuation of the steering wheel control 142B or 142C and used as the reward function in the reinforced learning. [0123] In each of the aforementioned embodiments of the invention, Q-learning is performed as the reinforced learning method. Alternatively, other methods, for example a SARSA method, an actor-critic method and the like, may also be used as the reinforced learning method.
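For reference, a single update step of the SARSA method mentioned above could be sketched as follows; the learning rate and discount factor are illustrative values, not parameters given in this description.

```python
def sarsa_update(q, s, a, reward, s_next, a_next, alpha=0.1, gamma=0.9):
    """One on-policy SARSA update of the action value q[(s, a)]."""
    current = q.get((s, a), 0.0)
    target = reward + gamma * q.get((s_next, a_next), 0.0)
    q[(s, a)] = current + alpha * (target - current)
```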
Claims (16)
1. Information provision device, CHARACTERIZED by the fact that it comprises: an electronic agent control unit (100) including a state space construction unit (T1, T1a) that is configured to define a state of a vehicle by associating a plurality of types of vehicle data with each other, and to construct a state space as a set of a plurality of states; an action space construction unit (T2, T2a) that is configured to define, as an action, data indicating contents of an operation of a component installed in the vehicle that is executed through a response, from a driver, to an operation proposal for the component installed in the vehicle, and to construct an action space as a set of a plurality of actions; a reinforced learning unit (111) that is configured to accumulate a history of the driver's response to the operation proposal for the component installed in the vehicle, define a reward function as an index representing a degree of adequacy of the operation proposal for the component installed in the vehicle while using the accumulated history, and calculate a probability distribution of performance of each of the actions that construct the action space in each of the states that construct the state space, through reinforced learning based on the reward function; a dispersion degree computing unit (112) that is configured to compute a degree of dispersion of the probability distribution that is calculated by the reinforced learning unit; and an information supply unit (113) that is configured to make a definitive operation proposal to establish a target action as a target of the operation proposal and issue the target action when the degree of dispersion of the probability distribution that is computed by the dispersion degree computing unit is less than a threshold, and to make a trial and error operation proposal to select the target action as the target of the operation proposal from a plurality of candidates and issue the target action when the degree of dispersion of the probability distribution that is computed by the dispersion degree computing unit is equal to or greater than the threshold.
2. Information provision device, CHARACTERIZED by the fact that it comprises: an electronic agent control unit (100) including a state space construction unit (T1, T1a) that is configured to define a state of a vehicle by associating a plurality of types of vehicle data with each other, and to construct a state space as a set of a plurality of states; an action space construction unit (T2, T2a) that is configured to define, as an action, data indicating contents of an operation of a component installed in the vehicle that is executed through a response, from a driver, to an operation proposal for the component installed in the vehicle, and to construct an action space as a set of a plurality of actions; a reinforced learning unit (111) that is configured to accumulate a history of the driver's response to the operation proposal for the component installed in the vehicle, define a reward function as an index representing a degree of adequacy of the operation proposal for the component installed in the vehicle while using the accumulated history, and calculate a probability distribution of performance of each of the actions that construct the action space in each of the states that construct the state space, through reinforced learning based on the reward function; a dispersion degree computing unit (112) that is configured to compute a degree of dispersion of the state space by totaling the degrees of dispersion of the probability distributions that are calculated by the reinforced learning unit with respect to the plurality of states that construct the state space; and an information supply unit (113) that is configured to make a definitive operation proposal to establish a target action as a target of the operation proposal and issue the target action when the degree of dispersion of the state space, which is computed by the dispersion degree computing unit, is less than a threshold, and to make a trial and error operation proposal to select the target action as the target of the operation proposal from a plurality of candidates and issue the target action when the degree of dispersion of the state space, which is computed by the dispersion degree computing unit, is equal to or greater than the threshold.
3. Information provision device, according to claim 2, CHARACTERIZED by the fact that the reinforced learning unit (111) is configured to adopt, as a policy, a mapping from each of the states that construct the state space to each of the actions that construct the action space, define, as a state value function (V(st)), an estimated value of a cumulative reward that is obtained when the policy is followed in each of the states, estimate, as an ideal action value function (Q*(st, at)), an estimated value of a cumulative reward that is always obtained when an ideal policy is followed after a predetermined action is selected from the action space in each of the states that construct the state space, assuming that the ideal policy is the policy that maximizes the state value function (V(st)) in all the states that construct the state space, and calculate the probability distribution based on the estimated ideal action value function, and
the information supply unit (113) is configured to make the definitive operation proposal targeting an action that maximizes the ideal action value function in a present state, when the degree of dispersion of the state space that is computed by the dispersion degree computing unit (112) is less than the threshold.
4. Information provision device, according to claim 3, CHARACTERIZED by the fact that the information supply unit (113) is configured to make the trial and error operation proposal with a tendency for the frequency of selection of an action as a target to improve as the probability density of the probability distribution of the action in the present state increases, when the degree of dispersion of the state space that is computed by the dispersion degree computing unit (112) is equal to or greater than the threshold.
5. Information provision device, according to claim 3 or 4, CHARACTERIZED by the fact that the dispersion degree computing unit (112) is configured to define, as an entropy (H(s)), the degree of dispersion of the probability distribution of performance of each of the actions that construct the action space in each of the states that construct the state space, and to define the degree of dispersion of the state space as an average entropy (H(Ω)), and the information supply unit (113) is configured to select the definitive operation proposal or the trial and error operation proposal with a tendency for the frequency of production of the trial and error operation proposal to improve as an ε-value increases, while using an ε-greedy method in which the value of the average entropy (H(Ω)) is defined as the ε-value.
6. Information provision device, according to any one of claims 1 to 5, CHARACTERIZED by the fact that the reinforced learning unit (111) is configured to define, as the reward function, a frequency of execution of the operation of the component installed in the vehicle through the driver's response to the operation proposal for the component installed in the vehicle, and to update the reward function according to a change in an operation history of the operation of the component installed in the vehicle when the component installed in the vehicle is operated according to the operation proposal for the component installed in the vehicle.
7. Information provision device, according to any one of claims 1 to 6, CHARACTERIZED by the fact that the state space construction unit (T1, T1a) is configured to construct the state space as a set of states as a group of data associating an operating situation (DA, DAa) of the component installed in the vehicle, characteristics (DB, DBa) of a passenger or passengers of the vehicle and an operating situation (DC, DCa) of the vehicle with each other.
8. Non-transitory computer-readable medium that stores an information supply program, CHARACTERIZED by the fact that it comprises: the information supply program being programmed to make a computer perform a state space construction function of defining a state of a vehicle by associating a plurality of types of vehicle data with each other, and constructing a state space as a set of a plurality of states; an action space construction function of defining, as an action, data indicating contents of an operation of a component installed in the vehicle that is executed through a response, from a driver, to an operation proposal for the component installed in the vehicle, and constructing an action space as a set of a plurality of actions; a reinforced learning function of accumulating a history of the driver's response to the operation proposal for the component installed in the vehicle, defining a reward function as an index representing a degree of adequacy of the operation proposal for the component installed in the vehicle while using the accumulated history, and calculating a probability distribution of performance of each of the actions that construct the action space in each of the states that construct the state space, through reinforced learning based on the reward function; a dispersion degree computation function of computing a degree of dispersion of the probability distribution that is calculated through the reinforced learning function; and an information supply function of producing a definitive operation proposal to establish a target action as a target of the operation proposal and issue the target action when the degree of dispersion of the probability distribution that is computed through the dispersion degree computation function is less than a threshold, and producing a trial and error operation proposal to select the target action as the target of the operation proposal from a plurality of candidates and issue the target action when the degree of dispersion of the probability distribution that is computed through the dispersion degree computation function is equal to or greater than the threshold.
9. Non-transitory computer-readable medium that stores an information supply program, CHARACTERIZED by the fact that it comprises: the information supply program being programmed to make a computer perform a state space construction function of defining a state of a vehicle by associating a plurality of types of vehicle data with each other, and constructing a state space as a set of a plurality of states;
[9] 9. Non-transitory computer-readable medium storing an information providing program, CHARACTERIZED by the fact that it comprises: the information providing program, which is programmed to cause a computer to perform a state space construction function of defining a state of a vehicle by associating a plurality of types of vehicle data with each other and constructing a state space as a set of a plurality of states, an action space construction function of defining, as an action, data indicating contents of an operation of a component installed in the vehicle that is performed through a response, from a driver, to an operation proposal for the component installed in the vehicle, and constructing an action space as a set of a plurality of actions, a reinforcement learning function of accumulating a history of the driver's response to the operation proposal for the component installed in the vehicle, defining a reward function as an index representing a degree of adequacy of the operation proposal for the component installed in the vehicle while using the accumulated history, and calculating a probability distribution of performance of each of the actions that construct the action space in each of the states that construct the state space, through reinforcement learning based on the reward function, a dispersion degree computation function of computing a degree of dispersion of the state space by totaling the degrees of dispersion of the probability distributions that are calculated through the reinforcement learning function for the plurality of states that construct the state space, and an information providing function of producing a definitive operation proposal to establish a target action as a target of the operation proposal and issue the target action when the degree of dispersion of the state space that is computed through the dispersion degree computation function is less than a limit, and of producing a trial and error operation proposal to select the target action as the target of the operation proposal from a plurality of candidates and issue the target action when the degree of dispersion of the state space that is computed through the dispersion degree computation function is equal to or greater than the limit.
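Claim 9 differs from claim 8 in that the proposal type is switched by a dispersion degree of the whole state space, obtained by totaling the per-state dispersion degrees. The short sketch below illustrates that branch under assumptions made here: the totaling is done by averaging, and the per-state dispersion values and the limit are placeholders, none of which come from the claim itself.

```python
def state_space_dispersion(per_state_dispersion):
    """Total the per-state dispersion degrees into one state-space value
    (averaged here; claim 9 only states that the degrees are totaled)."""
    values = list(per_state_dispersion.values())
    return sum(values) / len(values)

def proposal_mode(per_state_dispersion, limit):
    """Definitive proposal below the limit, trial and error at or above it."""
    if state_space_dispersion(per_state_dispersion) < limit:
        return "definitive"        # establish and issue a fixed target action
    return "trial-and-error"       # select the target action from several candidates

# Placeholder per-state dispersion degrees (e.g. entropies) and placeholder limit.
dispersions = {"s1": 0.2, "s2": 0.9, "s3": 0.4}
print(proposal_mode(dispersions, limit=0.6))   # -> "definitive" (average 0.5 < 0.6)
```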
DRAWINGS (sheets 1/16 to 16/16; only captions, tables, and dialogue text are recoverable from the extraction)
FIGURE 2
FIGURE 4
FIGURE 6 (action space table T2)
ACTION No.   DEFINITION OF DESTINATION
a1           Home
a2           Parents' house
a3           Location 1
a4           Location 2
a5           Location 3
a6           Location 4
a7           Location 5
a8           Location 6
FIGURE 7 (action space table T2a)
ACTION No.   DEFINITION OF MUSIC
a1           None
a2           FM A
a3           FM B
a4           FM C
a5           FM D
a6           Music 1 on the portable terminal
a7           Music 2 on the portable terminal
a8           Music 1 on CD
...
a100         Music n on CD
FIGURE 9
FIGURE 10A (fix and issue the action that maximizes the ideal action value function)
(way to increase the frequency of selection of an action as its probability density increases)
FIGURE 12
FIGURE 13
FIGURE 14 (definitive operation proposal)
AGENT: "IS THE DESTINATION YOUR HOME?"
DRIVER: "Yes. (SOUND COMMAND)"
AGENT: "DESTINATION SET TO YOUR HOME, WHICH IS YOUR USUAL DESTINATION."
FIGURE 15 (trial and error operation proposal)
AGENT: "SHOULD I SEARCH FOR SOMETHING YOU WOULD LIKE TO HEAR?"
DRIVER: "Yes. (SOUND COMMAND)"
AGENT: "WHAT ABOUT FM A?"
DRIVER: "(OH, THIS AGENT KNOWS MY TASTE WELL. I SEE PROGRAM 1 IS ON THE AIR, SO I WOULD LIKE TO LISTEN!) Yes."
AGENT: "I'LL SELECT IT." ... THE BASEBALL GAME BROADCAST STARTS ...
DRIVER: "UH-OH, THEY ARE BROADCASTING THE BASEBALL GAME TODAY... (DISAPPOINTED)"
AGENT: "SHOULD I CONTINUE WITH FM A?"
DRIVER: "No."
AGENT: "WHAT ABOUT MUSIC n ON THE CD?"
DRIVER: "HMM, I LIKE MUSIC n, BUT I DON'T WANT TO HEAR IT TODAY... No."
AGENT: "THEN WHAT ABOUT MUSIC 2 ON THE CD?"
DRIVER: "GREAT, THAT'S WHAT I WANT! Yes."
FIGURE 17
FIGURE 18