IMPROVED AUDIO HEADSET DEVICE
Patent abstract:
The invention relates to data processing for sound reproduction on a sound reproduction device (DIS), of the headphones or earphones type, wearable by a user in an environment (ENV). The device comprises at least one loudspeaker (HP), at least one microphone (MIC), and a connection to a processing circuit comprising: an input interface (IN) for receiving signals from at least the microphone, a processing unit (PROC, MEM) for reading at least one audio content to be reproduced on the loudspeaker, and an output interface (OUT) for delivering at least audio signals to be reproduced by the loudspeaker. The processing unit is arranged to: a) analyze the signals from the microphone to identify sounds emitted by the environment and corresponding to predetermined target sound classes, b) select at least one identified sound, according to a user preference criterion, and c) construct said audio signals to be reproduced by the loudspeaker, by a chosen mix between the audio content and the selected sound.
Publication number: FR3059191A1
Application number: FR1661324
Filing date: 2016-11-21
Publication date: 2018-05-25
Inventors: Slim ESSID; Raphaël BLOUET
Applicants: Telecom ParisTech; Institut Mines Telecom IMT
Patent description:
Improved audio headset device

The invention relates to a portable sound listening device. It can be a headset with left and right earpieces, or portable left and right earphones.

Noise-canceling headphones are known, based on the pickup of the user's sound environment by an array of microphones. In general, these devices seek to construct, in real time, the optimal filter minimizing the contribution of the sound environment to the sound signal perceived by the user. A filter for surrounding noise has recently been proposed which can be a function of the type of environment, entered by the user himself, who can then select different noise-cancellation modes (office, outdoor, etc.). The "outdoor" mode in this case re-injects the surrounding signal (but at a much lower level than without a filter, in order to allow the user to remain aware of his environment).

Also known are selective headsets and earphones, allowing personalized listening to the environment. Recently introduced, these products make it possible to modify the perception of the environment along two axes: increased perception (speech intelligibility), and hearing protection in a noisy environment. These can be audio headphones, configurable via a smartphone application. Speech amplification is possible in a noisy environment, speech being generally located in front of the user. They can also be audio headphones connected to a smartphone, allowing the user to configure his perception of the sound environment: adjusting the volume, adding an equalizer or sound effects.
We can also cite interactive headsets and earphones, for augmented reality, allowing the sound environment to be enriched (games, historical reconstruction) or a user activity to be accompanied (virtual coach). Finally, the processing implemented by certain hearing aids to improve the experience of the hearing-impaired user offers areas of innovation such as improved spatial selectivity (following the direction of the user's eyes, for example).

However, these different existing achievements do not make it possible to:
- analyze and interpret the user's activity, the content he consumes, or the environment (in particular the sound scene) in which he is immersed;
- automatically modify the audio rendering according to the results of these analyses.

Typically, noise-canceling headsets are based on an exclusively audio multi-channel recording of the user's environment. They seek to globally reduce the environment's contribution to the signal perceived by the user regardless of its nature, even if it contains potentially interesting information. These devices therefore tend to isolate the user from his environment. Protective headset prototypes allow the user to configure his sound environment, for example by applying equalization filters or by increasing speech intelligibility. These devices improve the perception of the user's environment but do not actually modify the content broadcast according to the user's state or the classes of sounds present in the environment. In this configuration, a user listening to music at high volume remains isolated from his environment, and the need for a device allowing the user to capture the relevant information in his environment remains.

Of course, interactive headsets and earphones can be equipped with sensors to load and broadcast content associated with a place (as part of a tourist visit, for example) or an activity (game, sports training). Even if some devices have inertial or physiological sensors to monitor user activity, and even if the broadcasting of certain content may then depend on the results of the analysis of the signals from these sensors, the broadcast content does not result from an automatic generation process taking into account the analysis of the sound scene surrounding the user, and the components of this environment relevant to the user are not automatically selected. Furthermore, the operating modes are static: they do not automatically follow the evolution over time of the sound environment, let alone other evolving parameters such as, for example, a physiological state of the user.

The present invention improves the situation.

To this end, it proposes a method implemented by computer means, for processing data for sound reproduction on a sound reproduction device, of the headphones or earphones type, wearable by a user in an environment, the device comprising:
- at least one loudspeaker,
- at least one microphone,
- a connection to a processing circuit,
the processing circuit comprising:
- an input interface for receiving signals from at least the microphone,
- a processing unit for reading at least one audio content to be reproduced on the loudspeaker, and
- an output interface for delivering at least audio signals to be reproduced by the loudspeaker.
In particular, the processing unit is further arranged to implement the steps:
a) analyze the signals from the microphone to identify sounds emitted by the environment and corresponding to predetermined target sound classes,
b) select at least one identified sound, according to a user preference criterion, and
c) construct said audio signals to be reproduced by the loudspeaker, by a chosen mix between the audio content and the selected sound.

In one possible embodiment, the device comprises a plurality of microphones, and the analysis of the signals from the microphones further comprises a processing for the separation of sound sources in the environment, applied to the signals from the microphones. For example, in step c), the selected sound can be:
- analyzed at least in frequency and duration,
- enhanced by filtering after source separation processing, and
- mixed with the audio content.

In an embodiment where the device comprises at least two loudspeakers and the reproduction of the signals on the loudspeakers applies a 3D sound effect, the position of a sound source, detected in the environment and emitting a selected sound, can be taken into account to apply a sound spatialization effect of the source in the mix.

In one embodiment, the device may also include a connection to a man-machine interface available to a user for entering preferences for selecting environmental sounds (in the general sense, as will be seen below), and the user preference criterion is then determined by learning from a history of preferences entered by the user and stored in memory. In an alternative or complementary embodiment, the device may further comprise a connection to a database of user preferences, and the user preference criterion is then determined by analysis of the content of said database.

The device may further comprise a connection to one or more state sensors of a user of the device, so that the user preference criterion takes account of a current state of the user, thus contributing to a definition of the "environment" of the user in the general sense. In such an embodiment, the device may include a connection to a mobile terminal available to the user of the device, this terminal advantageously comprising one or more user state sensors. The processing unit can also be arranged to select a content to be read from a plurality of contents, according to the state received from the user.

In one embodiment, the predetermined target sound classes may include at least speech sounds, the voiceprints of which are prerecorded.
In addition, by way of example, step a) can optionally comprise at least one of the following operations:
- construction and application of a dynamic filter for noise cancellation in the signals from the microphone;
- localization and isolation of sound sources of the environment by applying a source separation processing to the signals from several microphones, exploiting for example beamforming, to identify sources of interest (for the user of the device);
- extraction of parameters specific to these sources of interest, with a view to a subsequent restitution, in a spatialized audio mix, of the sounds captured from these sources;
- identification of the different sound classes corresponding to the sources (in different spatial directions) by a classification system (for example deep neural networks) over known sound classes (speech, music, noise, etc.);
- and possible identification of the sound scene by other classification techniques (for example, recognition of the sound of an office, an outdoor street, transport, etc.).

In addition, by way of example, step c) can optionally comprise at least one of the following operations:
- temporal, spectral and/or spatial filtering (for example Wiener filtering and/or the DUET algorithm), to enhance a given sound source from one or more audio streams picked up by a plurality of microphones (based on the parameters extracted by the aforementioned source separation module);
- 3D audio rendering, for example using HRTF (Head Related Transfer Functions) filtering techniques.

The present invention also relates to a computer program comprising instructions for implementing the above method when this program is executed by a processor.

The invention also relates to a sound reproduction device, of the headphones or earphones type, wearable by a user in an environment, the device comprising:
- at least one loudspeaker,
- at least one microphone,
- a connection to a processing circuit,
the processing circuit comprising:
- an input interface for receiving signals from at least the microphone,
- a processing unit for reading at least one audio content to be reproduced on the loudspeaker, and
- an output interface for delivering at least audio signals to be reproduced by the loudspeaker.
The processing unit is further arranged to:
- analyze the signals from the microphone to identify sounds emitted by the environment and corresponding to predetermined target sound classes,
- select at least one identified sound, according to a user preference criterion, and
- construct said audio signals to be reproduced by the loudspeaker, by a mix chosen between the audio content and the selected sound.

The invention thus proposes a system including an intelligent audio device, integrating for example a network of sensors, at least one loudspeaker and a terminal (e.g. a smartphone). The originality of this system is its ability to automatically generate, in real time, the optimal soundtrack for the user, that is to say the multimedia content best suited to his environment and his own state. The personal state of a user can be defined by: i) a set of preferences (type of music, sound classes of interest, etc.); ii) his activity (at rest, at the office, in sports training, etc.); iii) his physiological (stress, fatigue, effort, etc.) and/or socio-emotional (personality, mood, emotions, etc.) states.
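By way of illustration of step a), the following sketch frames a mono microphone signal, computes a log-spectral feature per frame, and averages the outputs of a classifier over the predetermined target classes. It is a minimal sketch under stated assumptions: the class list, the detection threshold and the `classifier` callable (standing in for the pretrained deep neural network evoked above) are hypothetical placeholders, not the patented implementation.

```python
import numpy as np

TARGET_CLASSES = ["speech", "alarm", "doorbell"]  # hypothetical target classes

def frame_signal(x, frame_len=1024, hop=512):
    """Slice a mono signal (assumed longer than one frame) into overlapping frames."""
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])

def log_spectra(frames):
    """Log-magnitude spectrum per windowed frame, a common classifier input."""
    windowed = frames * np.hanning(frames.shape[1])
    return np.log1p(np.abs(np.fft.rfft(windowed, axis=1)))

def detect_target_classes(mic_signal, classifier, threshold=0.5):
    """Step a): report which predetermined target classes appear present.

    `classifier` is assumed to map one feature vector to per-class
    probabilities (e.g. a pretrained deep neural network); training it
    is outside the scope of this sketch.
    """
    features = log_spectra(frame_signal(mic_signal))
    probs = np.mean([classifier(f) for f in features], axis=0)  # average over frames
    return {c: p for c, p in zip(TARGET_CLASSES, probs) if p >= threshold}
```

The sounds retained by step b) would then be those detected classes that match the user preference criterion.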
The multimedia content generated can include main audio content (to be broadcast in the headset) and possibly secondary multimedia content (text, images, video) which can be broadcast via the smartphone-type terminal. The different content elements include the elements of the user's content base (music, video, etc., hosted on the terminal or in the cloud), the result of captures carried out by the system's network of sensors, and synthetic elements generated by the system (notifications, sound or text "jingles", comfort noise, etc.).

Thus, the system can automatically analyze the user's environment and predict the components potentially of interest to the user in order to restore them in an augmented and controlled manner, by optimally superimposing them on the content consumed by him (typically the music he is listening to). The actual restitution of content takes into account the nature of the content and the components extracted from the environment (as well as the user's own state, in a more sophisticated embodiment). The sound stream broadcast in the headphones no longer comes from two competing sources:
- a main source (music or radio program or other), and
- a disturbing source (ambient noise),
but from a set of information flows whose relative contributions are adjusted according to their relevance. Thus, a message broadcast inside a station will be reproduced so that it is well perceived by the user even when the latter is listening to music at a high level, while the ambient noise irrelevant to the user is reduced. This possibility is offered by the addition of an intelligent processing module integrating in particular algorithms for source separation and classification of sound scenes. The direct advantage in application is, on the one hand, to reconnect the user with his environment or to warn him if a class of targeted sounds is detected, and on the other hand, to automatically generate content adapted at each moment to the expectations of the user, thanks to a recommendation engine taking into account the various content elements mentioned above.

It should be recalled that state-of-the-art devices do not automatically identify each class of sound present in the user's environment in order to associate with each of them a treatment that meets the user's expectations (for example highlighting a sound or, on the contrary, reducing it, or generating an alert), depending on its identification in the environment. The state of the art uses neither sound scene analysis nor the state of the user or his activity to compute the sound rendering.

Other advantages and characteristics of the invention will appear on reading the detailed description of exemplary embodiments below and on examining the appended drawings, in which:
- Figure 1 illustrates a device according to the invention, in a first embodiment,
- Figure 2 illustrates a device according to the invention, in a second embodiment, here connected to a mobile terminal,
- Figure 3 illustrates the steps of a method according to an embodiment of the invention, and
- Figure 4 specifies steps of the method of Figure 3, according to a particular embodiment.

With reference to Figure 1, a device DIS for sound reproduction (of the headphones or earphones type), worn for example by a user in an environment ENV, comprises at least:
- one (or two, in the example shown) loudspeakers HP,
- at least one sensor, for example a microphone MIC (or an array of microphones, in the example shown, to pick up the directivity of the sounds coming from the environment), and
- a connection to a processing circuit.

The processing circuit can be integrated directly into the headset and housed in the enclosure of a loudspeaker (as illustrated in Figure 1), or can, in the variant illustrated in Figure 2, be implemented in a terminal TER of the user, for example a mobile terminal of the smartphone type, or else be distributed between several terminals of the user (a smartphone and a connected object possibly including other sensors). In this variant, the connection between the headset (or the earphones) and the dedicated processing circuit of the terminal is carried out by a USB or short-range radio-frequency connection (for example Bluetooth or other), and the headset (or the earphones) is equipped with a transmitter/receiver BT1 communicating with a transmitter/receiver BT2 included in the terminal TER. A hybrid solution in which the processing circuit is distributed between the headset enclosure and a terminal is also possible.

In one or other of the above embodiments, the processing circuit comprises:
- an input interface IN, for receiving signals from at least the microphone MIC,
- a processing unit, typically comprising a processor PROC and a memory MEM, for interpreting, relative to the environment ENV, the signals from the microphone by learning (for example by classification, or again by "matching" of the "fingerprinting" type, for example),
- an output interface OUT, for delivering at least audio signals which are functions of the environment and are to be reproduced by the loudspeaker.

The memory MEM can store instructions of a computer program within the meaning of the present invention, and possibly temporary data (of calculation or other), as well as durable data, such as for example the preferences of the user, or even model definition data or others, as will be seen below. The input interface IN is, in a sophisticated embodiment, connected to an array of microphones, as well as to an inertial sensor (provided on the headset or in the terminal) and to data defining the user's preferences. User preference data can be stored locally in the memory MEM, as indicated above. As a variant, they can be stored, possibly with other data, in a remote database DB accessible via a communication over a local or wide area network NW. A communication module LP with such a network, suitable for this purpose, can be provided in the headset or in the terminal TER.

Advantageously, a man/machine interface can allow the user to define and update his preferences. In the embodiment of Figure 2, where the device DIS is paired with the terminal TER, the man/machine interface can simply correspond to a touch screen of the smartphone TER, for example. Otherwise, such an interface can be provided directly on the headset. In the embodiment of Figure 2, however, it is advantageously possible to take advantage of the presence of additional sensors in the terminal TER to enrich the definition of the user's environment, in the general sense. These additional sensors can be physiological sensors specific to the user (electroencephalogram measurement, heart rate measurement, pedometer, etc.)
or any other sensor making it possible to improve the knowledge of the environment/current-state pair of the user. In addition, this definition may directly include notification by the user himself of his activity, his own state and his environment. The definition of the environment can also take into account:
- all the accessible contents and a history of the contents consulted (music, videos, radio broadcasts, etc.); metadata (for example genre, listening occurrences per song) associated with the user's music library can also be included;
- the browsing and application history of his smartphone;
- the history of his consumption of content in streaming (via a service provider) or locally;
- the preferences and current activity of his connections on social networks.

Thus, the input interface can, in a general sense, be connected to a set of sensors, and also include connection modules (in particular the LP interface) for characterizing the environment of the user, but also his habits and preferences (history of content consumption, streaming activities and/or social networks).

A description is given below, with reference to Figure 3, of the processing carried out by the aforementioned processing unit, monitoring the environment and possibly the state of the user in order to characterize the relevant information which is likely to be restored in the output multimedia stream. In one embodiment, this monitoring is implemented by the automatic extraction, via signal processing and artificial intelligence modules, in particular of machine learning (represented by step S7 in Figure 3), of parameters important for creating the output media stream. These parameters, noted P1, P2, ..., in the figures, can typically be environment parameters which must be taken into account for the reproduction on the loudspeakers. For example, if a sound picked up in the environment is identified as a speech signal to be reproduced:
- a first set of parameters can be the coefficients of an optimal filter (of the Wiener filter type) making it possible to enhance the speech signal so as to increase its intelligibility;
- a second parameter is the directivity of the sound captured in the environment and to be restored, for example using binaural rendering (a rendering technique using HRTF-type transfer functions);
- etc.

It will thus be understood that these parameters P1, P2, ... are to be interpreted as "descriptors" of the environment and of the user's own state in the general sense, which feed a program for generating the "optimal soundtrack" for this user. This soundtrack is obtained by composing his contents, elements of the environment and synthetic elements.

During the first step S1, the processing unit requests the input interface to collect the signals coming from the microphone or from the microphone array MIC carried by the device DIS. Of course, other sensors (inertial or others) in the terminal TER in step S2, or elsewhere in step S3 (connected sensors of heart rate, EEG, etc.), can communicate their signals to the processing unit. Furthermore, information data other than captured signals (user preferences in step S5, and/or the history of content consumption and connections to social networks in step S6) can be transmitted by the memory MEM and/or by the database DB to the processing unit.
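As a concrete reading of the first parameter set above, the sketch below computes per-bin Wiener filter coefficients from power spectral density (PSD) estimates and applies them to one analysis frame. This is the textbook spectral form of Wiener filtering, given under assumptions: the noise PSD is supposed to be estimated elsewhere (for example from frames without speech), and all names are illustrative.

```python
import numpy as np

def wiener_gains(noisy_psd, noise_psd, gain_floor=0.05):
    """Per-bin coefficients G = PSD_speech / PSD_noisy, with the speech PSD
    estimated by spectral subtraction; the floor avoids muting bins entirely."""
    speech_psd = np.maximum(noisy_psd - noise_psd, 0.0)
    return np.maximum(speech_psd / np.maximum(noisy_psd, 1e-12), gain_floor)

def enhance_frame(frame, noise_psd):
    """Enhance one time-domain frame; `noise_psd` has len(frame)//2 + 1 bins."""
    spec = np.fft.rfft(frame)
    gains = wiener_gains(np.abs(spec) ** 2, noise_psd)
    return np.fft.irfft(gains * spec, n=len(frame))
```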
In step S4, all of these data and signals specific to the environment and the state of the user (hereinafter generically called "environment") are collected and interpreted by the implementation, in step S7, of a computer module for decoding the environment using artificial intelligence. For this purpose, this decoding module can use a learning base, which can for example be remote and requested in step S8 via the network NW (and the communication interface LP), in order to extract relevant parameters P1, P2, P3, ..., in step S9, which model the environment in general. As detailed below with reference to Figure 4, on the basis of these parameters in particular, the sound scene to be reproduced is generated in step S10 and transmitted in the form of audio signals to the loudspeakers HP in step S11. This sound scene may possibly be accompanied by graphical information, for example metadata, to be displayed on the screen of the terminal TER in step S12.

Thus, an analysis of the environmental signals is carried out, with:
- an identification of the environment, in order to estimate prediction models making it possible to characterize the user's environment and his own state (these models being used with a recommendation engine, as will be seen below with reference to Figure 4), and
- a fine acoustic analysis allowing the generation of more precise parameters, used for the manipulation of the audio content to be restored (separation/enhancement of particular sound sources, sound effects, mixing, spatialization, or others).

The identification of the environment makes it possible to characterize, by automatic learning, the environment/own-state pair of the user. It mainly consists in:
- detecting whether certain target sound classes, among several prerecorded classes, are present in the user's environment and determining, where appropriate, their direction of origin; initially, the target sound classes can be defined, one by one, by the user via his terminal, or by using predefined operating modes;
- determining the activity of the user: at rest, at the office, active in a gym, or others;
- determining the emotional and physiological state of the user (for example "fit", according to a pedometer, or "stressed", according to his EEG);
- describing the content he consumes by means of content analysis techniques (machine listening and computer vision techniques, and natural language processing).

The fine acoustic analysis makes it possible to calculate the acoustic parameters which are used for the audio reproduction (for example in 3D reproduction).

Referring now to Figure 4: in step S17, a recommendation engine is used to receive the descriptors of the "environment", in particular the classes of identified sound events (parameters P1, P2, etc.), and to provide from them a recommendation model (or a combination of models) in step S19. For this purpose, the recommendation engine can use the characterization of the user's contents and their similarity to external contents, as well as the preferences of the user, which were recorded in a learning base in step S15, and/or standard preferences of other users in step S18. The user can also intervene at this stage with his terminal to enter a preference in step S24, for example with respect to a content or a list of contents to be played. From all of these recommendations, a relevant recommendation model is chosen according to the environment and the user's state (for example, in the group of rhythmic music, in a situation where the user is moving, apparently in a gym).
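A rule-based stand-in for this model choice could look as follows; in the invention the mapping would be learned, so the descriptor keys and the rules below are purely illustrative assumptions.

```python
def choose_recommendation_model(descriptors, models):
    """Pick a recommendation model (step S19) from the decoded environment
    descriptors (step S9). `models` maps hypothetical context names to
    model objects supplied by the application."""
    if descriptors.get("activity") == "gym" and descriptors.get("fatigue") == "low":
        return models["rhythmic_music"]   # moving user, apparently in a gym
    if "speech" in descriptors.get("detected_classes", []):
        return models["speech_priority"]  # favor restitution of nearby speech
    return models["default"]              # neutral fallback
```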
A composition engine is then implemented in step S20, which combines the parameters P1, P2, ..., with the recommendation model to develop a composition program in step S21. This is a routine that suggests, for example:
- a specific type of content to search for in the user's contents,
- taking into account his own state (for example his activity) and certain types of sounds of the external environment identified in the parameters P1, P2, ...,
- to be mixed with the content, according to a sound level and a spatial rendering (3D audio) defined by the composition engine.

The synthesis engine of the sound signal, strictly speaking, intervenes in step S22, to develop the signals to be restored in steps S11 and S12, from:
- the user's contents (from step S25, as a sub-step of step S6), one of the contents having of course been selected in step S21 by the composition engine,
- sound signals picked up in the environment (S1, possibly with parameters P1, P2, ... in the case of a synthesis of the sounds of the environment to be restored), and
- other sounds, possibly synthetic, of notifications (beep, bell, or other), which can announce an external event and be mixed with the content to be restored (selected in step S21 from step S16), possibly with a 3D rendering defined in step S23.

Thus, the generated stream is adapted to the expectations of the user and optimized according to the context of its distribution, according to three main stages in a particular embodiment:
- the use of a recommendation engine to filter and select in real time the content elements to be mixed for the sound (and possibly also visual) reproduction of a multimedia stream (called "controlled reality");
- the use of a media composition engine which programs the temporal, frequency and spatial arrangement of the content elements, with respective sound levels also defined;
- the use of a synthesis engine generating the sound (and possibly visual) rendering signals, possibly with sound spatialization, according to the program established by the composition engine.

The multimedia stream generated comprises at least audio signals, but potentially also textual, haptic and/or visual notifications. The audio signals include a mix of:
- content selected from the user's content base (music, video, etc.), entered as a preference by the user in step S24, or recommended directly by the recommendation engine according to the state of the user and of the environment, with possibly
- sounds picked up by the network of sensors MIC, selected in the sound environment (therefore filtered), enhanced (for example by source separation techniques) and processed so that their frequency texture, intensity and spatialization are adjusted to be injected into the mix in a timely manner, and
- synthetic elements recovered from a base in step S16 (for example notification sounds, sound/text jingles, comfort noise, etc.).

The recommendation engine is jointly based on:
- user preferences, obtained explicitly through a form of questioning, or implicitly by exploiting the result of the decoding of his own state,
- collaborative filtering techniques and social graphs, exploiting the models of several users at once (step S18),
- the description of the user's contents and their similarity,
in order to build models making it possible to decide which content elements should be played to the user. The models are updated continuously over time, to adapt to the evolution of the user.
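The following sketch assembles such a mix: the selected content is ducked while an enhanced captured sound or a synthetic notification plays. The gains and the ducking factor stand in for the levels that the composition engine would plan; they are assumptions, not prescribed values.

```python
import numpy as np

def assemble_stream(content, extras, duck=0.3):
    """Mix the output stream (step S22) from the selected user content and a
    list of (signal, gain) pairs: enhanced captured sounds and/or synthetic
    notifications, assumed already aligned in time with the content."""
    out = np.asarray(content, dtype=float).copy()
    if extras:
        out *= duck                            # lower the main content
    for signal, gain in extras:
        n = min(len(out), len(signal))
        out[:n] += gain * np.asarray(signal, dtype=float)[:n]
    return np.clip(out, -1.0, 1.0)             # stay within full scale
```

For the station-message example above, one would call `assemble_stream(music, [(enhanced_message, 1.0)])` so that the message remains intelligible over the music.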
The composition engine plans:
- the time at which each content element should be played, including the order in which the user's contents are presented (for example, the order of songs in a playlist), and the times at which external sounds or notifications are broadcast: in real time, or delayed (for example between two songs of a playlist) so as not to disturb the listening or the activity in progress of the user at an inappropriate time;
- the spatial position (for 3D rendering) of each content element;
- the different audio effects (gain, filtering, equalization, dynamic compression, echo or reverberation (reverb), time deceleration/acceleration, transposition, ...) which must be applied to each content element.

Planning is based on models and rules built from the decoding of the user's environment and his own state. For example, the spatial position of a sound event captured by the microphones, and the gain level associated with it, depend on the result of the localization of sound sources achieved by the decoding of the environment in step S7 of Figure 3.

The synthesis engine relies on signal, natural-language and image processing techniques, respectively for the synthesis of audio, textual and visual outputs (images or videos), and jointly for the generation of multimedia outputs, for example video. In the case of audio output synthesis, temporal, spectral and/or spatial filtering techniques can be used. For example, the synthesis is first performed locally on short time windows and the signal is reconstructed by overlap-add before being transmitted to at least two loudspeakers (one for each ear); see the sketch after this paragraph. Different gains (power levels) and audio effects are applied to the different content elements, as planned by the composition engine. In a particular embodiment, the processing applied by windows can include filtering (for example Wiener filtering) making it possible to enhance, from one or more of the captured audio streams, a particular sound source (as provided by the composition engine). In a particular embodiment, the processing can include 3D audio rendering, possibly using HRTF filtering techniques (HRTF for "Head Related Transfer Functions").

In a first example illustrating a minimal implementation:
- the description of the user's environment is limited to his sound environment;
- the user's own state is limited to his preferences: target sound classes, notifications he wishes to receive, these preferences being defined by the user using his terminal;
- the device (possibly in cooperation with the terminal) is equipped with inertial sensors (accelerometer, gyroscope and magnetometer);
- the playback parameters are automatically modified when a target sound class is detected in the user's environment;
- short messages can be recorded;
- notifications can be sent to the user to notify him of the detection of an event of interest.

The signals received are analyzed to determine:
- the classes of sounds present in the user's environment and the directions from which they come, with, for this purpose:
- detection of the directions of highest sound energy, by analyzing the contents in each of these directions independently,
- a global determination, for each direction, of the contribution of each of the sound classes (for example by using a source separation technique),
- the parameters of models describing the environment of the user, and those of the parameters feeding the recommendation engine.
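A minimal sketch of that windowed synthesis, assuming a Hann window at 50% overlap (whose shifted copies sum to an approximately constant value, so an identity `process` returns the input nearly unchanged):

```python
import numpy as np

def overlap_add(x, process, frame_len=1024):
    """Process a signal on short windows (e.g. with the Wiener enhancement
    sketched earlier) and rebuild it by overlap-add."""
    hop = frame_len // 2
    window = np.hanning(frame_len)
    out = np.zeros(len(x))
    for start in range(0, len(x) - frame_len + 1, hop):
        out[start : start + frame_len] += process(window * x[start : start + frame_len])
    return out
```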
In a second example illustrating a more sophisticated implementation, a set of sensors comprising an array of microphones, a video camera, a pedometer, inertial sensors (accelerometers, gyroscopes, magnetometers) and physiological sensors can capture the visual and sound environment of the user (microphones and camera), the data characterizing his movement (inertial sensors, pedometer) and his physiological parameters (EEG, ECG, EMG, electrodermal activity), as well as all of the content he is viewing (music, radio shows, videos, browsing history and smartphone apps). Then, the different flows are analyzed to extract information related to the activity of the user, his mood, his state of fatigue and his environment (for example, running on a treadmill in a gym, in a good mood and in a state of low fatigue). A musical stream adapted to the environment and to the user's own state can be generated (for example a playlist, each song of which is selected according to his musical tastes, his stride and his state of fatigue). While all the sound sources are canceled in the user's headset, the voice of a trainer ("sports coach") near the user, when identified (from a previously recorded voiceprint), is mixed with the stream and spatially rendered using binaural rendering techniques (by HRTF, for example).
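The binaural step of this second example can be pictured as a convolution of the enhanced voice with a left/right pair of head-related impulse responses (HRIRs, the time-domain counterparts of HRTFs). The sketch below assumes the HRIRs for the coach's detected direction are provided by a measured database, which is outside its scope.

```python
import numpy as np

def binauralize(mono, hrir_left, hrir_right):
    """Render a mono source at the position encoded by an HRIR pair."""
    left = np.convolve(mono, hrir_left)     # left-ear channel
    right = np.convolve(mono, hrir_right)   # right-ear channel
    out = np.zeros((max(len(left), len(right)), 2))
    out[: len(left), 0] = left
    out[: len(right), 1] = right
    return out                              # (samples, 2) stereo stream
```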
Claims:
1. Method implemented by computer means, for processing data for sound reproduction on a sound reproduction device, of the headphones or earphones type, wearable by a user in an environment, the device comprising:
- at least one loudspeaker,
- at least one microphone,
- a connection to a processing circuit,
the processing circuit comprising:
- an input interface for receiving signals from at least the microphone,
- a processing unit for reading at least one audio content to be reproduced on the loudspeaker, and
- an output interface for delivering at least audio signals to be reproduced by the loudspeaker,
characterized in that the processing unit is also arranged to implement the steps:
a) analyze the signals from the microphone to identify sounds emitted by the environment and corresponding to predetermined target sound classes,
b) select at least one identified sound, according to a user preference criterion, and
c) construct said audio signals to be reproduced by the loudspeaker, by a chosen mix between the audio content and the selected sound.

2. Method according to claim 1, characterized in that, the device comprising a plurality of microphones, the analysis of the signals from the microphones further comprises a processing for the separation of sound sources in the environment, applied to the signals from the microphones.

3. Method according to claim 2, characterized in that, in step c), the selected sound is:
- analyzed at least in frequency and duration,
- enhanced by filtering after source separation processing, and
- mixed with the audio content.

4. Method according to one of claims 2 and 3, characterized in that, the device comprising at least two loudspeakers and the reproduction of the signals on the loudspeakers applying a 3D sound effect, the position of a sound source, detected in the environment and emitting a selected sound, is taken into account to apply a sound spatialization effect of the source in the mix.

5. Method according to one of the preceding claims, characterized in that the device also comprises a connection to a man-machine interface available to a user for entering preferences for selecting environmental sounds, and in that the user preference criterion is determined by learning from a history of the preferences entered by the user and stored in memory.

6. Method according to one of the preceding claims, characterized in that the device further comprises a connection to a user preference database, and the user preference criterion is determined by analysis of the content of said database.

7. Method according to one of the preceding claims, characterized in that the device further comprises a connection to one or more state sensors of a user of the device, and in that the user preference criterion takes account of a current state of the user.

8. Method according to claim 7, characterized in that the device comprises a connection to a mobile terminal available to the user of the device, the terminal comprising one or more user state sensors.

9. Method according to one of claims 7 and 8, characterized in that the processing unit is further arranged to select a content to be read from a plurality of contents, depending on the state of the user.
10. Method according to one of the preceding claims, characterized in that the predetermined target sound classes include at least speech sounds, the voiceprints of which are prerecorded.

11. Computer program, characterized in that it includes instructions for implementing the method according to one of claims 1 to 10 when this program is executed by a processor.

12. Sound reproduction device, of the headphones or earphones type, wearable by a user in an environment, the device comprising:
- at least one loudspeaker,
- at least one microphone,
- a connection to a processing circuit,
the processing circuit comprising:
- an input interface for receiving signals from at least the microphone,
- a processing unit for reading at least one audio content to be reproduced on the loudspeaker, and
- an output interface for delivering at least audio signals to be reproduced by the loudspeaker,
characterized in that the processing unit is also arranged to:
- analyze the signals from the microphone to identify sounds emitted by the environment and corresponding to predetermined target sound classes,
- select at least one identified sound, according to a user preference criterion, and
- construct said audio signals to be reproduced by the loudspeaker, by a mix chosen between the audio content and the selected sound.