Patent abstract:
A correlation estimation device (100) comprising: - a first receiving interface (101) receiving a first data series (201); - a second receiving interface (102) receiving a second data series (202); - a ratio determining unit (103) determining a primary ratio (301) and a secondary ratio (302), comprising dividing said second data series (202) by said first data series (201) and said first data series (201) by said second data series (202); - a median calculation engine (104) determining median factors (401; 402) for said primary ratio (301) and for said secondary ratio (302); - a correlation coefficient determining engine (105) which determines the following: - a product of medians (500) comprising multiplying said median factors (401; 402); and - said correlation coefficient (1) comprising determining the square root of said product of medians (500); and - an output interface (106) which outputs said correlation coefficient (1).
Publication number: BE1023099B1
Application number: E2016/5177
Filing date: 2016-03-10
Publication date: 2016-11-21
Inventor: Jan Holvoet
Applicant: Aphilion Bvba
IPC main class:
Patent description:

A DEVICE FOR CORRELATION ESTIMATION AND RELATED
METHOD
Field of the Invention
The present invention relates generally to the field of statistics, and more specifically to the statistical analysis of relationships or correlations between data sets.
Background of the Invention
A correlation coefficient is a coefficient that quantifies a certain type of correlation and dependency, namely statistical relationships between two or more random variables or observed data samples. A correlation coefficient between two data series X and Y is classically defined as:
$$\rho_{X,Y} = \frac{E[(X - \mu_X)(Y - \mu_Y)]}{\sigma_X \sigma_Y} \qquad (1)$$
where $\mu_X$ and $\sigma_X$ are respectively the expected value and the standard deviation of the data series X, where $\mu_Y$ and $\sigma_Y$ are respectively the expected value and the standard deviation of the data series Y, and where E denotes the expectation. In the specific case where the data series X and Y are standardized, i.e. when their expected values are equal to 0 and their standard deviations are equal to 1, equation (1) reduces to:
$$\rho_{X,Y} = E[XY] \qquad (2)$$
[03] For standardized data series X and Y, the correlation coefficient between X and Y is therefore the expected value of the data series formed by the product of X and Y. The expected value of a data series is generally estimated as the arithmetic mean of the data series. For example, assuming equation (2), Pearson's product-moment correlation coefficient can be written as:
$$\rho_{X,Y} = \frac{1}{n} \sum_{i=1}^{n} x_i y_i \qquad (3)$$
where n is the size of the data series XY and where $x_i$ and $y_i$ are the values of X and Y for the i-th sample of the data series XY.
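As a minimal sketch of equation (3), assuming both series are already standardized (mean 0, population standard deviation 1), the arithmetic mean of the element-wise products can be computed as follows. The function name `pearson_standardized` and the sample values are illustrative only and do not come from the patent.

```python
def pearson_standardized(x, y):
    """Arithmetic mean of the element-wise products, as in equation (3).

    Assumes x and y are already standardized and have the same length.
    """
    if len(x) != len(y) or not x:
        raise ValueError("series must be non-empty and of equal length")
    return sum(xi * yi for xi, yi in zip(x, y)) / len(x)


# Hand-made standardized series: each has mean 0 and standard deviation 1.
x = [-1.0, 1.0, -1.0, 1.0]
identical = [-1.0, 1.0, -1.0, 1.0]
unrelated = [-1.0, 1.0, 1.0, -1.0]

print(pearson_standardized(x, identical))
print(pearson_standardized(x, unrelated))
```

Because every sample enters the sum with equal weight, a single extreme product can shift the result noticeably, which is the sensitivity to outliers discussed next.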
[04] A correlation coefficient between two data series X and Y is ideally calculated on the basis of good samples representing the joint distribution of X and Y. Due to the existence of outliers, i.e. observations that lie outside the general pattern of a distribution, the correlation coefficient estimated by a classical method such as Pearson's is often incorrect. The estimated correlation coefficient is very sensitive to gross errors in the data: even a few outliers in a data series may, for example, completely destroy the correlation coefficient determined by classical methods, or even change its sign. To prevent outliers from affecting the correlation coefficient, the data series must be pre-processed manually before the correlation coefficient is determined, in order to identify and remove such outliers. This subjective and arbitrary selection makes estimating the correlation between the two data series complex, time-consuming and very unreliable.
[05] Spearman's rank correlation coefficient is an example of a correlation coefficient designed to withstand the presence of outliers in the data series. It assesses the correlation between two data series using the rank order of the data samples within each data series rather than their actual numerical values. In this way, outliers weigh less and do not dominate the determination of the correlation coefficient. Another estimator of the correlation coefficient that can withstand the presence of outliers is the minimum covariance determinant estimator, also called the MCD estimator.
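The rank-based idea can be illustrated with a short sketch: each value is replaced by its rank and the classical Pearson formula is applied to the ranks. This is a generic textbook formulation, not code from the patent; the helper names `ranks`, `pearson` and `spearman` and the sample data are assumptions made for illustration.

```python
import statistics


def ranks(series):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(series)), key=lambda i: series[i])
    result = [0.0] * len(series)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and series[order[j + 1]] == series[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Classical Pearson correlation of two equal-length series."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Pearson correlation applied to the ranks instead of the raw values."""
    return pearson(ranks(x), ranks(y))


x = [1.0, 2.0, 3.0, 4.0, 100.0]   # the last value is an extreme observation
y = [1.0, 4.0, 9.0, 16.0, 25.0]
print(pearson(x, y), spearman(x, y))
```

In this example the extreme value 100.0 only counts as "the largest" once ranks are taken, so the monotonic relationship between the two series is still fully recognised.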
[06] Determining robust correlation coefficients with Spearman's method or with the MCD estimator is very time-consuming. Estimation by the MCD involves performing an iterative process on each data series, omitting some values in order to obtain the covariance matrix with the smallest determinant. That covariance matrix is then used as the estimator of the covariance matrix of the entire data series. However, this limits the accuracy that can be obtained with the MCD estimator.
WO2007/064860 describes an alternative method for calculating a correlation coefficient that is resistant to outliers. A weight is assigned to each data sample of the data series to indicate the probability that the data sample in question is an outlier. Each weight is proportional to the inverse of a distance between the data sample and a sample average. In other words, a data sample that follows the common distribution of the data series is given a high weight; otherwise it is given a low weight. The correlation coefficient between the data series is then estimated taking into account the weights of the data samples.
The method described in WO2007/064860 is complex. Assigning weights to all data samples of the data series is very time-consuming and jeopardizes the efficiency of the method. As with Pearson's product-moment correlation, the method described in WO2007/064860 relies on the calculation of a sample mean, which limits the insensitivity of the correlation coefficient to the presence of outliers and thus renders its estimation unreliable.
[09] In the context of statistics, a correlation coefficient can illustrate the correlation between the presence of one or more external factors and the behavior of one or more systems that are subjected to those external factors. For example, a correlation coefficient may be useful in the field of machine condition monitoring to prevent damage to the machine, in the medical domain where data from medical sensors and/or medical data about one or more patients are analyzed, etc. A large number of outliers are present in financial markets. In the context of the management of collective investment funds, a correlation coefficient estimate that is reliable and resistant to the presence of outliers is therefore required to guarantee an efficient selection of shares and securities. A time-consuming method as described above implies that collecting recently published financial results and having them interpreted by financial analysts takes a few hours, or even a few days, which causes a considerable delay in taking adequate financial decisions.
[10] It is an object of the present invention to describe a device that overcomes the shortcomings of existing solutions identified above. More specifically, it is an objective to describe a device that provides a fast, robust, and efficient determination of the relationships or correlations between data sets.
Summary of the Invention
[11] According to a first aspect of the present invention, the objectives defined above are achieved by a correlation estimation device for determining a correlation coefficient between data series, the correlation estimation device comprising:
- a first receiving interface adapted to receive a first data series;
- a second receiving interface adapted to receive a second data series;
- a ratio determining unit, operatively linked to the first receiving interface and to the second receiving interface, configured to determine the following:
o a primary ratio comprising dividing the second data series by the first data series; and
o a secondary ratio comprising dividing the first data series by the second data series;
- a median calculation engine, operatively linked to the ratio determining unit, configured to determine the following:
o a primary median factor for the primary ratio; and
o a secondary median factor for the secondary ratio;
- a correlation coefficient determining engine, operatively linked to the median calculation engine, configured to determine the following:
o a product of medians comprising multiplying the primary median factor by the secondary median factor; and
o the correlation coefficient between the first data series and the second data series comprising determining the square root of the product of medians; and
- an output interface adapted to output the correlation coefficient.
[12] According to the present invention, the correlation estimation device determines a correlation coefficient between a first data series and a second data series in a fast and efficient manner, and ensures that the correlation coefficient that is output is robust both in the presence and in the absence of outliers in the first data series and/or the second data series. The determination of the correlation coefficient is based on the determination of a first median factor and a second median factor. Using median factors instead of average factors ensures that the correlation coefficient determined by the correlation estimation device is robust when outliers are present in the data series, and also guarantees that it is reliable when no outliers are present in the data series. The use of a median further guarantees that the correlation estimation device performs a simple determination of a correlation coefficient, so that the complexity of the determination is greatly reduced. This reduces the costs associated with the implementation of the correlation estimation device and its operation. In addition, for example in the context of investment fund management, the correlation estimation device makes it possible to quickly and efficiently collect recently published financial results and have them interpreted by financial analysts. The correlation estimation device according to the present invention determines a correlation coefficient at a speed 135 times higher than the speed at which a correlation coefficient is determined with the MCD estimator, and is therefore scalable to larger data sets. The correlation estimation device according to the present invention can, for example, perform 8,000,000 correlations with a calculation time of about 40 minutes. This makes it possible to determine a correlation coefficient in real time, i.e. as soon as data samples from the data series are available. This greatly reduces the time needed to analyze the correlation between data series, and thus makes it possible to quantify the impact of new publications and respond to them much faster than is now possible with existing state-of-the-art methods and systems.
[13] According to the present invention, the correlation estimation device comprises a first receiving interface and a second receiving interface. Alternatively, the second receiving interface may correspond to the first receiving interface. The median calculation engine determines a primary median factor for the primary ratio and a secondary median factor for the secondary ratio. In other words, the median calculation engine determines the median of the primary ratio and the median of the secondary ratio.
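The chain of units described in paragraphs [11] to [13] can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `median_ratio_correlation` and the sample data are hypothetical, no value in either series is exactly zero, and both medians share the same sign so that the square root is defined.

```python
import math
import statistics


def median_ratio_correlation(x, y):
    """sqrt(median(y/x) * median(x/y)) for two equal-length series.

    Assumes no value in either series is exactly zero and that the two
    medians have the same sign.
    """
    primary_ratio = [yi / xi for xi, yi in zip(x, y)]      # second / first
    secondary_ratio = [xi / yi for xi, yi in zip(x, y)]    # first / second
    primary_median = statistics.median(primary_ratio)
    secondary_median = statistics.median(secondary_ratio)
    product_of_medians = primary_median * secondary_median
    return math.sqrt(product_of_medians)


x = [1.0, 2.0, 4.0, 5.0, 10.0]
y = [2.0, 4.0, 8.0, 10.0, 20.0]            # y = 2x exactly
y_outlier = [2.0, 4.0, 8.0, 10.0, -500.0]  # one sample replaced by an outlier

print(median_ratio_correlation(x, y))
print(median_ratio_correlation(x, y_outlier))
```

In this toy example the single corrupted sample changes only one entry of each ratio series, so both medians, and therefore the result, stay the same.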
[14] According to the present invention, the first receiving interface receives a first data series, referred to as data series X, and the second receiving interface receives a second data series, referred to as data series Y. When a linear regression of the second data series on the first data series is performed, a factor $\beta$ is determined by:
$$Y = \beta X \qquad (4)$$
[15] The best estimate of the factor $\beta$ is:
$$\beta = \frac{\mathrm{cov}_{XY}}{\sigma_X^2} \qquad (5)$$
where $\mathrm{cov}_{XY}$ is the covariance between the first data series and the second data series and where $\sigma_X$ is the standard deviation of the first data series. By definition, the correlation coefficient between the first data series and the second data series is:
$$\rho_{X,Y} = \frac{\mathrm{cov}_{XY}}{\sigma_X \sigma_Y} \qquad (6)$$
[16] Equation (4) can now be reduced to:
$$Y = \rho_{X,Y} \frac{\sigma_Y}{\sigma_X} X \qquad (7)$$
[17] and equation (7) can be rearranged as:
$$\frac{Y}{X} = \rho_{X,Y} \frac{\sigma_Y}{\sigma_X} \qquad (8)$$
[18] According to the present invention, the ratio determining unit determines a primary ratio by dividing the second data series by the first data series, and further determines a secondary ratio by dividing the first data series by the second data series. According to the present invention, the median calculation engine determines a primary median factor of the resulting data series $\frac{Y}{X}$ as an estimator of $\rho_{X,Y} \frac{\sigma_Y}{\sigma_X}$. In a similar way, the median calculation engine determines a secondary median factor of the resulting data series $\frac{X}{Y}$ as an estimator of $\rho_{X,Y} \frac{\sigma_X}{\sigma_Y}$. The correlation coefficient determining engine determines a product of the primary median factor and of the secondary median factor to obtain equation (9) and then equation (10):
$$\mathrm{median}\left(\frac{Y}{X}\right) \cdot \mathrm{median}\left(\frac{X}{Y}\right) = \rho_{X,Y} \frac{\sigma_Y}{\sigma_X} \cdot \rho_{X,Y} \frac{\sigma_X}{\sigma_Y} \qquad (9)$$
$$\mathrm{median}\left(\frac{Y}{X}\right) \cdot \mathrm{median}\left(\frac{X}{Y}\right) = \rho_{X,Y}^2 \qquad (10)$$
[19] The correlation coefficient determining engine then determines the correlation coefficient $\rho_{X,Y}$ by taking the square root of equation (10), which yields the correlation coefficient of equation (11) that the output interface of the correlation estimation device outputs:
$$\rho_{X,Y} = \sqrt{\mathrm{median}\left(\frac{Y}{X}\right) \cdot \mathrm{median}\left(\frac{X}{Y}\right)} \qquad (11)$$
[20] According to an optional embodiment, the correlation estimation device further comprises an expectation value determining unit adapted to determine an expectation value of the first data series and/or the second data series.
[21] According to an optional embodiment, the correlation estimation device further comprises a standard deviation determining unit adapted to determine a standard deviation for the first data series and / or the second data series.
[22] According to an optional embodiment, the correlation estimation device further comprises a standardization module adapted to standardize the first data series and the second data series.
[23] The standardization module of the correlation estimation device is operatively linked to the expectation value determining unit and the standard deviation determining unit of the correlation estimation device. In this way, the standardization module sets the expectation value of the first data series and of the second data series, as determined by the expectation value determining unit, to 0 and sets the standard deviation of the first data series and of the second data series, as determined by the standard deviation determining unit, to 1, and thereby standardizes both the first data series and the second data series. This makes determining the correlation coefficient even easier, since the covariance between two standardized data series is the same as the correlation between them.
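A minimal sketch of such a standardization step is given below, together with a check that the covariance of the standardized series equals the correlation of the original series. The helper names `standardize` and `covariance`, the use of the population standard deviation, and the sample data are assumptions made for illustration.

```python
import statistics


def standardize(series):
    """Return the series shifted to mean 0 and scaled to standard deviation 1."""
    mean = statistics.fmean(series)      # expectation value estimate
    std = statistics.pstdev(series)      # population standard deviation
    if std == 0:
        raise ValueError("a constant series cannot be standardized")
    return [(value - mean) / std for value in series]


def covariance(x, y):
    """Population covariance of two equal-length series."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)


first = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
second = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 8.0, 7.0]

zx, zy = standardize(first), standardize(second)

# Classical correlation of the original series, for comparison.
correlation = covariance(first, second) / (
    statistics.pstdev(first) * statistics.pstdev(second)
)
print(covariance(zx, zy), correlation)
```

The two printed numbers agree up to floating-point rounding, which is the property the paragraph above relies on.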
[24] According to an optional embodiment, the correlation estimation device further comprises the following: - a sign identification unit, operatively linked to the correlation coefficient determining engine, adapted to detect a primary mathematical sign of the primary median factor and a secondary mathematical sign of the secondary median factor; and wherein the correlation coefficient determining engine is further configured to determine the correlation coefficient as being equal to 0 when the primary mathematical sign differs from the secondary mathematical sign.
[25] In this way, the correlation coefficient determining engine can always determine the square root of equation (10), ensuring that equation (11) is always valid and correct. This ensures that a correlation coefficient can always be determined by the correlation estimation device, even if the primary median factor and the secondary median factor differ in sign, i.e. even if the first data series and the second data series are hardly correlated.
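The sign rule of paragraphs [24] and [25] can be sketched as a small guard in front of the square root. The function name `correlation_from_medians` is hypothetical. The text only specifies the result 0 for differing signs; it does not state whether a negative sign is restored when both medians are negative, so this sketch simply returns the non-negative square root in that case.

```python
import math


def correlation_from_medians(primary_median, secondary_median):
    """Apply the zero rule, then take the square root of the product of medians."""
    signs_differ = (primary_median < 0) != (secondary_median < 0)
    if signs_differ:
        # Product would be negative: report "no correlation" instead of failing.
        return 0.0
    return math.sqrt(primary_median * secondary_median)


print(correlation_from_medians(0.5, 0.5))
print(correlation_from_medians(0.4, -0.1))
```

With this guard the square root is never applied to a negative product, so the function returns a value for any pair of median factors.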
According to a second aspect of the present invention, there is provided a computer-implemented method for estimating a correlation coefficient between data series, comprising the steps of:
- receiving a first data series;
- receiving a second data series;
- determining a primary ratio comprising dividing the second data series by the first data series;
- determining a secondary ratio comprising dividing the first data series by the second data series;
- determining a primary median factor for the primary ratio;
- determining a secondary median factor for the secondary ratio;
- determining a product of medians comprising multiplying the primary median factor by the secondary median factor;
- determining the correlation coefficient between the first data series and the second data series comprising determining the square root of the product of medians; and
- outputting the correlation coefficient.
[27] According to the present invention, a correlation coefficient between a first data series and a second data series is determined in a fast and efficient manner. The method according to the present invention guarantees that the correlation coefficient that is output is robust both in the presence and in the absence of outliers in the first data series and/or the second data series. The determination of the correlation coefficient is based on the determination of a first median factor and a second median factor. Using median factors instead of average factors ensures that the correlation coefficient is robust when outliers are present in the data series, and also guarantees that the correlation coefficient is reliable when no outliers are present in the data series. The use of a median further guarantees that the method according to the present invention is simple, so that the complexity of the determination is greatly reduced. This reduces the costs associated with the implementation of the method of the present invention. In addition, in the context of investment fund management, for example, recently published financial results can be collected and interpreted by financial analysts quickly and efficiently. The method according to the present invention determines a correlation coefficient at a speed 135 times higher than the speed at which a correlation coefficient is determined with the MCD estimator, and is therefore scalable to large data sets. For example, the method of the present invention can perform 8,000,000 correlations with a calculation time of about 40 minutes. This makes it possible to determine a correlation coefficient in real time, i.e. as soon as data samples from the data series are available. This greatly reduces the time needed to analyze the correlation between data series, and thus makes it possible to quantify the impact of new publications and respond to them much faster than is now possible with the existing methods and systems described in the prior art.
In addition, the present invention also relates to a computer program comprising software code adapted to perform the computer-implemented method of the present invention when executed by a computer system.
The invention further relates to a computer-readable storage medium comprising the computer program according to the present invention.
Brief Description of the Drawings
FIG. 1 is a schematic representation of an embodiment of the correlation estimation apparatus of the present invention.
FIG. 2 is a schematic representation of the steps of an embodiment of the computer-implemented method according to the present invention.
FIG. 3 is a table summarizing examples of values of correlation coefficients determined on the basis of different methods and the time within which the values of the correlation coefficients were obtained.
FIG. 4 is a schematic representation of the deviation from the value 0.5 of the correlation coefficients calculated with an MCD estimator corresponding to the example illustrated in FIG. 3 and of the correlation coefficients calculated with the correlation estimation apparatus of the present invention corresponding to the example illustrated in FIG. 3.
FIG. 5 is a schematic representation of a suitable computer system to host the correlation estimation apparatus of FIG. 1.
Detailed Description of the Embodiment(s)
[35] According to an embodiment shown in FIG. 1, a correlation estimation device 100 according to the present invention comprises a first receiving interface 101, a second receiving interface 102, a ratio determining unit 103, a median calculation engine 104, a correlation coefficient determining engine 105 and an output interface 106. The first receiving interface 101 receives a first data series 201. The second receiving interface 102 receives a second data series 202. According to an alternative embodiment, the second receiving interface 102 may correspond to the first receiving interface 101. The ratio determining unit 103 receives the first data series 201 from the first receiving interface 101 and the second data series 202 from the second receiving interface 102. The ratio determining unit 103 then calculates a primary ratio 301 comprising dividing the second data series 202 by the first data series 201. The ratio determining unit 103 also calculates a secondary ratio 302 comprising dividing the first data series 201 by the second data series 202. The median calculation engine 104 receives the primary ratio 301 and the secondary ratio 302 from the ratio determining unit 103. The median calculation engine 104 then determines a primary median factor 401 for the primary ratio 301 and further determines a secondary median factor 402 for the secondary ratio 302. The correlation coefficient determining engine 105 receives the primary median factor 401 and the secondary median factor 402 from the median calculation engine 104. The correlation coefficient determining engine 105 determines a product of medians 500 comprising multiplying the primary median factor 401 by the secondary median factor 402. The correlation coefficient determining engine 105 further determines a correlation coefficient 1 between the first data series 201 and the second data series 202 comprising determining the square root of the product of medians 500. The output interface 106 receives the correlation coefficient 1 and outputs the correlation coefficient 1.
[36] In FIG. 1, the embodiment of the correlation estimation apparatus 100 according to the present invention further comprises an expectation value determining unit 107, a standard deviation determining unit 108, and a standardization module 109. The expectation value determining unit 107 receives the first data series 201 and the second data series 202 from the first receiving interface 101 and the second receiving interface 102. The expectation value determining unit 107 determines the expectation values 601, 602 of the first data series 201 and the second data series 202, respectively. The standard deviation determining unit 108 receives the first data series 201 and the second data series 202 from the first receiving interface 101 and the second receiving interface 102. The standard deviation determining unit 108 determines the standard deviations 701, 702 of the first data series 201 and the second data series 202, respectively. The standardization module 109 receives the first data series 201 and the second data series 202 from the first receiving interface 101 and the second receiving interface 102, respectively. The standardization module 109 sets the expectation values 601, 602 of the first data series 201 and the second data series 202 to 0 and sets the standard deviations 701, 702 of the first data series 201 and the second data series 202 to 1 in order to standardize the first data series 201 and the second data series 202. The ratio determining unit 103 then receives the standardized first data series 201 and the standardized second data series 202 from the standardization module 109, and further determines a primary ratio 301 comprising dividing the standardized second data series 202 by the standardized first data series 201 and a secondary ratio 302 comprising dividing the standardized first data series 201 by the standardized second data series 202.
[37] In FIG. 1, the embodiment of the correlation estimation apparatus 100 according to the present invention comprises a sign identification unit 110. The sign identification unit 110 receives the primary median factor 401 and the secondary median factor 402 from the median calculation engine 104. The sign identification unit 110 detects a primary mathematical sign 801 of the primary median factor 401 and a secondary mathematical sign 802 of the secondary median factor 402. The correlation coefficient determining engine 105 receives the primary mathematical sign 801 and the secondary mathematical sign 802 from the sign identification unit 110 and determines the correlation coefficient 1 as equal to 0 when the primary mathematical sign 801 is different from the secondary mathematical sign 802.
An embodiment of the steps of the method for estimating a correlation coefficient 1 according to the present invention is illustrated in FIG. 2. In step 10, a first data series 201 is received. In step 11, after or simultaneously with step 10, a second data series 202 is received. In step 12, after steps 10 and 11, a primary ratio 301 is determined, comprising dividing the second data series 202 by the first data series 201. In step 13, after or simultaneously with step 12, a secondary ratio 302 is determined, comprising dividing the first data series 201 by the second data series 202. In step 14, after steps 12 and 13, a primary median factor 401 is determined for the primary ratio 301, including determining the median of the primary ratio 301. In step 15, after or simultaneously with step 14, a secondary median factor 402 is determined for the secondary ratio 302, including determining the median of the secondary ratio 302. In step 16, a product of medians 500 is determined, comprising multiplying the primary median factor 401 by the secondary median factor 402. A correlation coefficient 1 between the first data series 201 and the second data series 202 is then determined in step 17, comprising determining the square root of the product of medians 500. In step 18, the correlation coefficient 1 is output.
In FIG. 3, examples of values of correlation coefficients, determined by different methods/devices, and the time in seconds within which the correlation coefficients are obtained are compared with each other. Two standardized and normally distributed data series are generated with a predetermined correlation of 0.5. Each data series comprises 300 data values. The two data series are then corrupted with outliers: four data values in each data series are randomly selected and intentionally replaced by an extreme value of 5 or -5. A correlation coefficient between the two data series is then calculated at each iteration according to different methods known from the prior art and according to the method of the present invention. An average deviation of the obtained correlation coefficients from the predetermined correlation of 0.5 is then listed in columns 22 and 32 of Table 2 shown in FIG. 3. Table 2 summarizes the results of 100,000 simulations in the presence of outliers. In other words, two random data series were generated 100,000 times with a predetermined correlation of 0.5, the data series were corrupted each time as described above and their correlation coefficient was determined each time. In addition, the behavior of the estimators in the absence of outliers is also studied to test their reliability. Table 2 also summarizes the results of 100,000 simulations in the absence of outliers. In other words, two random data series were generated 100,000 times with a predetermined correlation of 0.5, the two data series were not corrupted at any time and their correlation coefficient was determined each time.
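One iteration of the data generation and corruption described above might be set up as follows. The patent does not state how the correlated series were generated; the construction below (mixing two independent standard normal draws) is a common approach and is an assumption, as are the function names `correlated_pair` and `corrupt` and the fixed random seed.

```python
import math
import random


def correlated_pair(n, rho, rng):
    """Draw n samples from two standard normal series with correlation rho."""
    x, y = [], []
    for _ in range(n):
        a, b = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x.append(a)
        # Mixing a and b gives y unit variance and correlation rho with x.
        y.append(rho * a + math.sqrt(1.0 - rho * rho) * b)
    return x, y


def corrupt(series, count, rng, magnitude=5.0):
    """Replace `count` randomly chosen values with +magnitude or -magnitude."""
    corrupted = list(series)
    for index in rng.sample(range(len(series)), count):
        corrupted[index] = rng.choice([magnitude, -magnitude])
    return corrupted


rng = random.Random(0)   # fixed seed, chosen only for repeatability
x, y = correlated_pair(300, 0.5, rng)
x_bad, y_bad = corrupt(x, 4, rng), corrupt(y, 4, rng)
```

Repeating this 100,000 times and applying each estimator to both the clean and the corrupted pairs would reproduce the structure of the experiment, though not necessarily the exact figures of Table 2.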
Five estimators are compared: Pearson's product moment 40; Spearman's rank 41; a combination of Pearson's product moment 40 with a median estimate 42, i.e. taking the median instead of the average of the data series formed by the product of a first data series and a second data series; the MCD estimator 43; and the correlation estimation apparatus 100 of the present invention. Column 20 covers the absence of outliers: it lists the average calculated values 21 of the correlation coefficients, the average deviation 22 from 0.5 and the time 23 within which the values 21 were estimated. Column 30 covers the presence of outliers: it lists the average calculated values 31 of the correlation coefficients, the average deviation 32 from 0.5 and the time 33 within which the values 31 were estimated.
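The two summary figures reported per estimator can be computed as sketched below. The text does not define "average deviation" precisely; the mean absolute deviation from 0.5 is assumed here, and the function name `summarize` and the sample estimates are illustrative only.

```python
import statistics


def summarize(estimates, target=0.5):
    """Return (average value, average absolute deviation from target)."""
    average_value = statistics.fmean(estimates)
    average_deviation = statistics.fmean(abs(e - target) for e in estimates)
    return average_value, average_deviation


# Four made-up estimates standing in for the 100,000 simulation results.
estimates = [0.25, 0.5, 0.75, 0.5]
average_value, average_deviation = summarize(estimates)
print(average_value, average_deviation)
```

Reporting both figures matters because an estimator can have an average value close to 0.5 while its individual estimates still scatter widely around that value.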
[40] In the presence of outliers in column 30, Pearson's product moment estimator 40 clearly fails. In column 31, corruption of the data series with outliers inevitably causes the average value of the correlation coefficients determined with Pearson's product moment estimator 40 to shift towards 0; the average estimate of the correlation coefficient with Pearson's product moment 40 is therefore too low, at 0.368. Spearman's rank estimator 41, believed to be more robust than Pearson's product moment 40 in the presence of outliers, is indeed more robust, but still partially shows a similar shift of the correlation towards 0 and reaches an average value of 0.456. The median estimate 42 fails completely and reaches an average correlation coefficient of 0.1657. The MCD estimator 43 manages to estimate a correlation coefficient value that is close to 0.5 and reaches an average value of 0.4998. The very good estimate of the MCD estimator 43 can be explained by the fact that its method excludes some data values from the data series before the correlation coefficient is estimated, for example the extreme data values that usually correspond to outliers. Finally, the correlation estimation device 100 according to the present invention shows its accuracy, relevance and efficiency in the presence of outliers in column 30. The correlation estimation device 100 determines an average value 31 of the correlation coefficient of 0.4798, which is a much better result than that of Spearman's rank estimator 41. In addition, the correlation estimation device 100 determines an average value 31 of the correlation coefficient of 0.4798 more quickly than Spearman's rank estimator 41 determines an average value 31 of the correlation coefficient of 0.4984.
[41] In the absence of outliers in column 20, Pearson's product-moment estimator 40 is remarkable in accuracy and speed and estimates an average correlation coefficient in column 21 equal to 0.4991 in 0.0001 seconds. The median estimate 42 fails completely and reaches an average correlation coefficient of 0.1664. The MCD estimator 43 and the correlation estimation device 100 of the present invention show excellent results in estimating the correlation coefficients and provide an average value 21 of 0.4979 and 0.4977, respectively. However, one advantage of the correlation estimation device 100 according to the present invention is that it exhibits a lower average deviation from 0.5 than the MCD estimator 43.
[42] The main advantage of the correlation estimation device 100 in the absence of outliers in column 20 is the time 23 in which the correlation coefficients are determined. Indeed, the correlation estimation device 100 of the present invention determines correlation coefficients at a rate 135 times higher than the more robust correlation estimator, i.e., the MCD estimator 43, while still providing values 21 of the correlation coefficients that are comparable to the values supplied by the MCD estimator 43. This gain in time 23 makes it possible to determine correlation coefficients in real time. For example, 8,000,000 correlations are determined in 40 minutes by the correlation estimation device 100 of the present invention. If these 8,000,000 correlations were to be determined 135 times more slowly, almost 4 full days would be needed to perform the determinations. Moreover, the method according to the present invention is fully adaptable and can be fully implemented in simple architectures and software programs such as Excel, which is impossible with estimators such as the MCD estimator 43.
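The time comparison in paragraph [42] can be checked with a short calculation. The following is a minimal sketch: the variable names are illustrative, and only the two figures quoted in the text (40 minutes and a factor of 135) are taken from the description.

```python
# Figures quoted in the description.
minutes_device = 40      # time for 8,000,000 correlations with device 100
slowdown_factor = 135    # the MCD estimator 43 is stated to be 135 times slower

minutes_mcd = minutes_device * slowdown_factor  # total minutes at the slower rate
days_mcd = minutes_mcd / (60 * 24)              # convert minutes to days

print(f"{minutes_mcd} minutes = {days_mcd:.2f} days")
# prints "5400 minutes = 3.75 days"
```

The result, 3.75 days, corresponds to the roughly 4 days mentioned in the text.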
[43] Combining Pearson's product-moment estimator 40 with a determination of the medians 42 instead of determining the average values of the data series corresponding to the product of a first data series and a second data series does not yield relevant results as is visible in line 42 of Table 2 in FIG. 3. In the presence of outliers in column 30, this combined estimator of the median 42 completely fails and reaches an average correlation coefficient of 0.1657. In the absence of outliers in column 20, this combined estimator of the median 42 completely fails and reaches an average correlation coefficient of 0.1664.
The results of the 100,000 simulations summarized in FIG. 3 are plotted in FIG. 4 for two correlation coefficient estimators. In the absence of outliers in column 20 of the table in FIG. 3, the results of the 100,000 determinations of the correlation coefficients estimated by the MCD estimator 43 and the correlation estimation device 100 are plotted. The results of the MCD estimator 43 show more extreme deviations from 0.5 than the results of the correlation estimation device 100, which emphasizes the advantageous accuracy and reliability of the correlation estimation device 100 according to the present invention.
FIG. 5 shows a suitable computer system 800 for hosting the correlation estimation device 100 of FIG. 1. Computer system 800 can generally be configured as a suitable general purpose computer and comprise a bus 510, a processor 502, a local memory 504, one or more optional input interfaces 514, one or more optional output interfaces 516, a communication interface 512, a storage element interface 506 and one or more storage elements 508. Bus 510 can include one or more conductors that allow communication between the components of the computer system. Processor 502 can include any type of conventional processor or microprocessor that interprets and executes programming instructions. Local memory 504 may include a Random Access Memory (RAM) or any other type of dynamic storage device that stores information and instructions to be executed by processor 502 and/or a Read Only Memory (ROM) or any other type of static storage device that stores static information and instructions for use by processor 502. Input interface 514 may include one or more conventional mechanisms that allow an operator to input information into computer system 800, such as a keyboard 520, a mouse 530, a pen, voice recognition and/or biometric mechanisms, etc. Output interface 516 may include one or more conventional mechanisms that output information to the operator, such as a display 540, a printer 550, a speaker, etc. Communication interface 512 may include any transceiver-like mechanism, such as two 1 Gb Ethernet interfaces, enabling computer system 800 to communicate with other devices and/or systems, for example mechanisms for communicating with one or more other computer systems 900. The communication interface 512 of computer system 800 can be connected to such another computer system by means of a LAN (Local Area Network) or a WAN (Wide Area Network), such as the Internet, in which case the other computer system 900 may comprise, for example, a suitable Web server.
Storage element interface 506 may include a storage interface such as a Serial Advanced Technology Attachment (SATA) interface or a Small Computer System Interface (SCSI) to connect bus 510 to one or more storage elements 508, such as one or more local disks, for example 1 TB SATA disk drives, and to control the reading and writing of data to and/or from these storage elements 508. Although the storage elements 508 are described above as a local disk, generally any other computer-readable medium can also be used, such as a removable magnetic disk, optical storage media such as a CD or DVD-ROM, SSDs, flash memory cards, etc. The system 800 described above can also be run as a Virtual Machine on top of the physical hardware.
The correlation estimation device 100 of FIG. 1 can be implemented as programming instructions stored in local memory 504 of the computer system 800 to be executed by its processor 502. Alternatively, the correlation estimation device 100 of FIG. 1 can be stored on the storage element 508 or be accessible from another computer system 900 via the communication interface 512.
Although the present invention has been illustrated with reference to specific embodiments, it will be apparent to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be practiced with various changes and modifications without leaving the scope of the invention. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being described by the appended claims and not by the foregoing description, and all modifications falling within the meaning and scope of the claims are therefore included here. In other words, it is assumed that this covers all changes, variations or equivalents that fall within the scope of the underlying basic principles and whose essential attributes are claimed in this patent application. In addition, the reader of this patent application will understand that the words "comprising" or "include" do not exclude other elements or steps, that the word "a" does not exclude a plural, and that a single element, such as a computer system, a processor or other integrated unit, can fulfill the functions of different means mentioned in the claims. Any reference signs in the claims should not be construed as limiting the claims in question. The terms "first", "second", "third", "a", "b", "c" and the like, when used in the description or in the claims, are used to distinguish between similar elements or steps and do not necessarily describe a sequential or chronological order. Similarly, the terms "top", "bottom", "over", "under" and the like are used for the purposes of the description and do not necessarily refer to relative positions.
It is to be understood that those terms are interchangeable under proper conditions and that embodiments of the invention are capable of functioning in accordance with the present invention in sequences or orientations other than described or illustrated above.
Claims:
Claims (8)
[1]
CLAIMS
A correlation estimation device (100) for determining a correlation coefficient (1) between data series (201; 202), said correlation estimation device (100) comprising: - a first receiving interface (101) adapted to receive a first data series (201); - a second receiving interface (102) adapted to receive a second data series (202); - a ratio determining unit (103) operatively coupled to said first receiving interface (101) and to said second receiving interface (102), configured to determine the following: o a primary ratio (301) comprising dividing said second data series (202) by said first data series (201); and o a secondary ratio (302) comprising dividing said first data series (201) by said second data series (202); - a median calculation engine (104) operatively coupled to said ratio determining unit (103), configured to determine the following: o a primary median factor (401) for said primary ratio (301); and o a secondary median factor (402) for said secondary ratio (302); - a correlation coefficient determining engine (105) operatively coupled to said median calculation engine (104), configured to determine the following: o a product of medians (500) comprising multiplying said primary median factor (401) by said secondary median factor (402); and o said correlation coefficient (1) between said first data series (201) and said second data series (202) comprising determining the square root of said product of medians (500); and - an output interface (106) adapted to output said correlation coefficient (1).
[2]
A correlation estimation device (100) according to claim 1, wherein said correlation estimation device (100) further comprises an expectation value determination unit (107) adapted to determine an expectation value (601; 602) of said first data series (201) and/or said second data series (202).
[3]
A correlation estimation device (100) according to claim 1, wherein said correlation estimation device (100) further comprises a standard deviation determining unit (108) adapted to determine a standard deviation (701; 702) for said first data series (201) and/or said second data series (202).
[4]
A correlation estimation device (100) according to claims 2 and 3, wherein said correlation estimation device (100) further comprises a standardization module (109) adapted to standardize said first data series (201) and said second data series (202).
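The standardization of claim 4 can be illustrated as follows. This is a minimal sketch under stated assumptions: the function name `standardize` is hypothetical, the expectation value is estimated by the arithmetic mean, and the population form of the standard deviation is used. The claims do not specify either choice.

```python
from statistics import fmean, pstdev

def standardize(series):
    """Shift a data series to zero mean and scale it to unit standard deviation."""
    mu = fmean(series)      # expectation value (claim 2), estimated here by the mean
    sigma = pstdev(series)  # standard deviation (claim 3); population form is an assumption
    if sigma == 0:
        raise ValueError("a constant series cannot be standardized")
    return [(value - mu) / sigma for value in series]

# Arbitrary illustrative input.
z = standardize([2.0, 4.0, 6.0, 8.0])
```

After this step the series has mean 0 and standard deviation 1, so both data series are on the same scale before the ratios are formed.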
[5]
A correlation estimation device (100) according to any one of the preceding claims, wherein said correlation estimation device (100) further comprises: - a sign identification unit (110) operatively coupled to said correlation coefficient determining engine (105), adapted to detect a primary mathematical sign (801) of said primary median factor (401) and a secondary mathematical sign (802) of said secondary median factor (402); and wherein said correlation coefficient determining engine (105) is further configured to determine said correlation coefficient (1) as being equal to 0 when said primary mathematical sign (801) is different from said secondary mathematical sign (802).
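The sign rule of claim 5 can be sketched as a small function operating on the two median factors. The function name `coefficient_from_medians` is hypothetical. The claim only states that the coefficient equals 0 when the signs differ, so the handling of two negative factors (returning the non-negative root) is an assumption of this sketch.

```python
import math

def coefficient_from_medians(primary_median, secondary_median):
    """Combine two median factors, returning 0 when their signs differ."""
    product_of_medians = primary_median * secondary_median
    # A negative product means the two factors have opposite signs.
    if product_of_medians < 0:
        return 0.0
    # Same sign (or a zero factor): square root of the product of medians.
    return math.sqrt(product_of_medians)

print(coefficient_from_medians(0.5, 0.5))   # prints 0.5
print(coefficient_from_medians(0.5, -0.5))  # prints 0.0
```

Checking the sign before taking the root also avoids a square root of a negative number.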
[6]
A computer-implemented method for estimating a correlation coefficient (1) between data series (201; 202), said method comprising the steps of: - receiving a first data series (201); - receiving a second data series (202); - determining a primary ratio (301) comprising dividing said second data series (202) by said first data series (201); - determining a secondary ratio (302) comprising dividing said first data series (201) by said second data series (202); - determining a primary median factor (401) for said primary ratio (301); - determining a secondary median factor (402) for said secondary ratio (302); - determining a product of medians (500) comprising multiplying said primary median factor (401) by said secondary median factor (402); - determining said correlation coefficient (1) between said first data series (201) and said second data series (202) comprising determining the square root of said product of medians (500); and - outputting said correlation coefficient (1).
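The steps of the method of claim 6 can be sketched in a few lines. This is an illustrative sketch, not the applicant's implementation: the function name `estimate_correlation` is hypothetical, skipping pairs that contain a zero is an assumption made to avoid division by zero, and the optional standardization of claims 2 to 4 is omitted.

```python
import math
from statistics import median

def estimate_correlation(first, second):
    """Square root of the product of the medians of the two element-wise ratios."""
    if len(first) != len(second):
        raise ValueError("both data series must have the same length")
    # Assumption: pairs containing a zero are skipped to avoid division by zero.
    pairs = [(a, b) for a, b in zip(first, second) if a != 0 and b != 0]
    if not pairs:
        raise ValueError("no usable pairs of values")
    primary_ratio = [b / a for a, b in pairs]    # second series / first series
    secondary_ratio = [a / b for a, b in pairs]  # first series / second series
    primary_median = median(primary_ratio)
    secondary_median = median(secondary_ratio)
    product_of_medians = primary_median * secondary_median
    # math.sqrt raises ValueError for a negative product; claim 6 does not cover that case.
    return math.sqrt(product_of_medians)

# Arbitrary illustrative values, not taken from the description.
x = [-2.0, -1.0, 1.0, 2.0, 1.0]
y = [1.0, -1.0, 2.0, 1.0, -1.0]
print(estimate_correlation(x, y))  # prints 0.5
```

In this toy input both ratios have a median of 0.5, so the product of medians is 0.25 and its square root is 0.5. When all ratios are positive and the number of pairs is odd, the two medians are reciprocals of each other and the result is always 1, which is why the example uses values of mixed sign.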
[7]
A computer program comprising software code adapted to perform the computer-implemented method according to claim 6 when executed by a computer system.
[8]
A computer-readable storage medium comprising computer-executable instructions that, if executed by a computer system, perform the method of claim 6.
Priority:
Application number | Filing date | Patent title
EP15191429.8|2015-10-26|