Australian Patent AU2013217310A1: Interactive content search using comparisons

Abstract:
In interactive content search through comparisons, a search for a target object in a database is performed by finding the object most similar to the target from a small list of objects. A new object list is then presented based on the earlier selections. This process is repeated until the target is included in the list presented, at which point the search terminates. A solution to the interactive content search problem is provided under the scenario of
Publication number: AU2013217310A1
Application number: U2013217310
Filing date: 2013-02-06
Publication date: 2014-08-14
Inventors: Efstratios Ioannidis; Laurent Massoulie
Applicant: Thomson Licensing
Main IPC class: G06F17-30

Description:
WO 2013/119626 PCT/US2013/024881 PU120027

INTERACTIVE CONTENT SEARCH USING COMPARISONS

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of U.S. Provisional Application Serial No. 61/595502, filed February 6, 2012, which is incorporated by reference herein in its entirety.

TECHNICAL FIELD

The present principles relate to interactive content search through comparisons.

BACKGROUND OF THE INVENTION

Content search through comparisons is a special case of nearest neighbor search (NNS). The principles described herein extend earlier work by considering the NNS problem for objects embedded in a metric space. It is also assumed that the embedding has a small intrinsic dimension, an assumption that is supported by many practical studies. Prior works consider navigating nets, a deterministic data structure for supporting NNS in doubling metric spaces. A similar technique has also been considered for objects embedded in a space satisfying a certain sphere packing property, while other work has relied on growth restricted metrics. All of the above assumptions have connections to the doubling constant considered herein. In all of the previous work, the demand over the target objects is assumed to be homogeneous.

NNS with access to a comparison oracle has been studied previously. A considerable advantage of previous studies is that the assumption that objects are a-priori embedded in a metric space is removed; rather than requiring that similarity between objects is captured by a distance metric, the prior works only assume that any two objects can be ranked in terms of their similarity to any target by the comparison oracle. Nevertheless, these works also assume homogeneous demand, so the principles herein are an extension of searching with comparisons to heterogeneity. In this respect, a heterogeneous demand distribution is a starting point for the principles herein.
Under the assumptions that a metric space exists and the search algorithm is aware of it, the present principles improve average search cost. The main problem with some prior works is that their approach is memoryless, i.e., it does not make use of previous comparisons, whereas the present principles solve this problem by deploying an ε-net data structure.

Pairwise comparisons between images have been previously proposed, and were then extended to the context of content search. The use of a comparison oracle is not limited to content retrieval and search. An individual's rating scale tends to fluctuate considerably, and rating scales may also vary between people. For these reasons it is more natural to use pairwise comparisons as the basis for recommendation systems. The advantages of this approach and the challenges of making such a system operational have been well described.

SUMMARY OF THE INVENTION

These and other drawbacks and disadvantages of the prior art are addressed by the present principles, which are directed to a method for interactive content search through comparisons.

According to an aspect of the present principles, there is provided a method for searching content within a database. The method is comprised of steps for constructing a net having a size containing a target, choosing a plurality of exemplars, comparing each exemplar with every other exemplar, and determining the exemplar closest to the target. The method is further comprised of steps of reducing the size of the net to a smaller size that contains the target. The method is further comprised of a step of repeating the choosing, comparing, determining, and reducing steps until the size of the net is small enough to locate the target.

According to another aspect of the present principles, there is provided an apparatus for searching content within a database. The apparatus is comprised of a computer that performs the steps comprising the method described herein.
The computer can be comprised of circuitry to construct a net having a size that contains a target. The computer can also be comprised of circuitry to choose a plurality of exemplars, and comparator circuitry that operates on the exemplars. The computer also comprises a determining circuit that finds the exemplar closest to the target and circuitry to reduce the size of the net to a smaller size that contains the target. The computer also comprises control circuitry to cause the circuitry to construct a net, the circuitry to choose exemplars, the comparator circuitry, the determining circuitry, and the circuitry to reduce the size of the net to repeat their operation if a terminal condition has not been reached.

These and other aspects, features and advantages of the present principles will become apparent from the following detailed description of exemplary embodiments, which are to be read in connection with the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 shows one embodiment of a method for performing a content search under the present principles.

Figure 2 shows an apparatus for performing a content search under the present principles.

Figure 3 shows an exemplary embodiment of elements comprising the apparatus of Figure 2.

DETAILED DESCRIPTION OF THE INVENTION

The present principles are directed to a method and apparatus for interactive content search through comparisons. The method is termed "interactive" because there are repeated stages of interacting with the results of a previous stage. The method navigates through a database of objects (e.g., pictures, movies, articles, etc.) having certain measurable characteristics using comparisons. In particular, the method determines, from two objects at a time, the one closest to the target (e.g., a picture, movie, or article). Closeness to the target, i.e.,
distance, can be measured in a number of ways, such as absolute difference, sum of absolute differences, etc. Based on the selection, the method selects a new pair of objects, and the process is repeated in similar stages until the pair of objects contains the desired target.

In each stage, a small list of objects is presented for comparison. One object among the list is selected as the object closest to the target; a new object list is then presented based on earlier selections. This process continues until the target is included in the list presented, at which point the target is found and the search terminates. In an alternative embodiment, the process can be repeated for a certain number of iterations, or until the selected object is within a threshold distance of the desired target. Also, an alternative method can be used to locate the target within the net after the net has been reduced so that all of its objects are within a threshold distance of the target.

The method requires:
1) A metric embedding of the objects, i.e., a representation of the objects in a metric space describing their features. For example, this could be the pixel values of the image objects. The distance in this metric space captures how "similar" or "close" objects are.
2) The results of the comparisons at each stage, indicating which objects are closest to the target.

At each stage, the method generates a new pair of objects to propose as target possibilities. The proposed objects can be used in a next iteration of the method, or, if they contain the target or are close enough to a desired target, the search can be stopped.

In simple terms, the method constructs a tree that organizes objects in a hierarchy. Nodes in this tree that lie at the same level "cover" roughly equal-sized regions of the metric space in which objects are represented.
The method proceeds by proposing pairs of objects in the first layer of the tree: identifying which of the objects in this level of the tree is closest to the target narrows down the selection to the objects that lie below this object in the hierarchy. The method then proceeds recursively by proposing pairs of objects among the children of this node.

The proposed method has the following properties:
1) It finds the sought object quickly, within a few proposed pairs.
2) Its guarantees hold for non-homogeneous demand: that is, it works even if some objects are more likely to be chosen than others.

Compared to earlier work in this area, the present method has better guarantees, so that it finds objects faster. The present method requires knowledge of the entire metric space, whereas earlier methods required knowledge of the order of distances between objects and a target, although not the exact numerical values of these distances. The present method does not require knowledge of the likelihood that an object may be chosen, while earlier methods do. The present method also implements a fundamentally different algorithm than earlier work in this area.

This kind of interactive navigation, also known as exploratory search, has numerous real-life applications. One example is navigating through a database of pictures of people photographed in an uncontrolled environment, such as the databases Flickr or Picasa. Automated methods may fail to extract meaningful features from such photos. Moreover, in many practical cases, images that present similar low-level descriptors (such as SIFT features) may have very different semantic content and high-level descriptions, and thus be perceived differently by users. On the other hand, a human searching for a particular person can easily select from a list of pictures the subject most similar to the person she has in mind.
Formally, the behavior of a human user can be modeled by a so-called comparison oracle. In particular, assume that the database of pictures is represented by a set $\mathcal{N}$ endowed with a distance metric $d$. This metric captures the "distance" or "dissimilarity" between pictures of different people. The oracle/human has a specific target $t \in \mathcal{N}$ in mind, and can answer questions of the following kind:

"Between two objects $x$ and $y$ in $\mathcal{N}$, which one is closest to $t$ under the metric $d$?"

The goal of interactive content search through comparisons is thus to find a sequence of proposed pairs of objects to the oracle/human that leads to the target object with as few queries as possible.

The principles described herein consider the problem under the scenario of heterogeneous demand, where the target object $t \in \mathcal{N}$ is sampled from a probability distribution $p$. In this setting, interactive content search through comparisons has a strong relationship to the classic "twenty-questions game" problem. In particular, a membership oracle is an oracle that can answer queries of the following form:

"Given a subset $A \subseteq \mathcal{N}$, does $t$ belong to $A$?"

It is well known that to find a target $t$ one needs to submit at least $H(p)$ queries, on average, to a membership oracle, where $H(p)$ is the entropy of $p$. Moreover, there exists an algorithm (Huffman coding) that finds the target with only $H(p) + 1$ queries on average.

Content search through comparisons departs from the above setup in assuming that the database $\mathcal{N}$ is endowed with the metric $d$. A membership oracle is stronger than a comparison oracle because, if the distance metric $d$ is known, comparison queries can be simulated through membership queries. On the other hand, a membership oracle is harder to implement in practice: unless $A$ can be expressed in a concise fashion, a user will answer a membership query in time linear in $|A|$. This is in contrast to a comparison oracle, for which answers can be given in constant time.
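The claim that comparison queries can be simulated through membership queries can be illustrated with a minimal Python sketch. The function names (`make_membership_oracle`, `comparison_via_membership`) and the toy line metric are illustrative assumptions, not part of the specification, and the tie-breaking rule is an arbitrary choice.

```python
from typing import Callable, Hashable, Iterable

Obj = Hashable
Metric = Callable[[Obj, Obj], float]


def make_membership_oracle(target: Obj) -> Callable[[set], bool]:
    """Return a membership oracle for a hidden target: it answers 'is t in A?'."""
    def member(subset: set) -> bool:
        return target in subset
    return member


def comparison_via_membership(x: Obj, y: Obj, objects: Iterable[Obj],
                              d: Metric, member: Callable[[set], bool]) -> Obj:
    """Answer 'which of x, y is closer to t?' using a single membership query.

    A is the set of objects strictly closer to x than to y, so t in A means
    that x is the closer object. Ties resolve to y (an arbitrary choice).
    """
    closer_to_x = {z for z in objects if d(x, z) < d(y, z)}
    return x if member(closer_to_x) else y


# Toy database: ten points on a line, with the hidden target at 7.
objects = list(range(10))
line_metric = lambda a, b: abs(a - b)
member = make_membership_oracle(target=7)

print(comparison_via_membership(2, 9, objects, line_metric, member))  # 9
print(comparison_via_membership(6, 9, objects, line_metric, member))  # 6
```

Note that building the set `closer_to_x` takes time linear in the database size, which mirrors the remark that membership queries are costly to pose and answer unless the subset has a concise description.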
In short, the study of search through comparisons seeks similar performance bounds to the classic setup (a) for an oracle that is easier to implement and (b) under an additional assumption on the structure of the database (namely, that it is endowed with a distance metric).

Intuitively, the performance of searching for an object through comparisons will depend not only on the entropy of the target distribution, but also on the topology of the target set $\mathcal{N}$, as described by the metric $d$. In particular, it has been established that $\Omega(c\,H(p))$ queries are necessary, in expectation, to locate a target using a comparison oracle, where $c$ is the so-called doubling constant of the metric $d$. Moreover, a scheme exists that locates the target in $O(c^3 H(p) \log(1/p^*))$ queries, in expectation, where $p^* = \min_{x \in \mathcal{N}} p(x)$. Under the principles herein, an improvement on the previous bound is achieved by proposing an algorithm that locates the target with $O(c^5 H(p))$ queries, in expectation.

DEFINITIONS AND NOTATION

Consider a set of objects $\mathcal{N}$, where $|\mathcal{N}| = n$. We assume that there exists a metric space $(\mathcal{M}, d)$, where $d(x,y)$ denotes the distance between $x, y \in \mathcal{M}$, such that objects in $\mathcal{N}$ are embedded in $(\mathcal{M}, d)$: i.e., there exists a one-to-one mapping from $\mathcal{N}$ to a subset of $\mathcal{M}$.

The objects in $\mathcal{N}$ may represent, for example, pictures in a database. The metric embedding can be thought of as a mapping of the database entries to a set of features (e.g., the age of the person depicted, her hair and eye color, etc.). The distance between two objects would then capture how "similar" two objects are with respect to these features. In what follows, the notation $\mathcal{N} \subseteq \mathcal{M}$ will be used, keeping in mind that there might be a difference between the physical objects (the pictures) and their embedding (the attributes that characterize them).

A. Comparison Oracle

A comparison oracle is an oracle that, given two objects $x, y$ and a target $t$, returns the closest object to $t$.
More formally,

$$\mathrm{Oracle}(x,y,t) = \begin{cases} x & \text{if } d(x,t) < d(y,t), \\ y & \text{if } d(x,t) > d(y,t), \end{cases} \qquad (1)$$

with either object returned when $d(x,t) = d(y,t)$. Observe that if $x = \mathrm{Oracle}(x,y,t)$ then $d(x,t) \le d(y,t)$; this does not necessarily imply, however, that $d(x,t) < d(y,t)$. It is important to note here that although it is written $\mathrm{Oracle}(x,y,t)$ to stress that a query always takes place with respect to some target $t$, in practice the target is hidden and only known by the oracle. Alternatively, following the "oracle as human" analogy, the human user has a target in mind and uses it to compare the two objects, but never discloses it until actually being presented with it.

B. Demand, Entropy and Doubling Constant

Consider a probability distribution $p$ over the set of objects in $\mathcal{N}$, which is called the demand. In other words, $p$ is a non-negative function such that $\sum_{t \in \mathcal{N}} p(t) = 1$. In general, the demand can be heterogeneous, as $p(t)$ may vary across different targets. The target distribution $p$ plays an important role in the following analysis. In particular, two quantities that affect the performance of searching in the described scheme are the entropy and the doubling constant of the target distribution. These two notions are defined formally below.

The entropy of $p$ is defined as

$$H(p) = \sum_{x \in \mathrm{supp}(p)} p(x) \log \frac{1}{p(x)}, \qquad (2)$$

where $\mathrm{supp}(p)$ is the support of $p$. The max-entropy of $p$ is defined as

$$H_{\max}(p) = \max_{x \in \mathrm{supp}(p)} \log \frac{1}{p(x)}. \qquad (3)$$

Given an object $x \in \mathcal{N}$, the closed ball of radius $R \ge 0$ around $x$ is denoted by

$$B_x(R) = \{ y \in \mathcal{M} : d(x,y) \le R \}. \qquad (4)$$

Given a set $A \subseteq \mathcal{N}$, let $p(A) = \sum_{x \in A} p(x)$. The doubling constant $c(p)$ of a distribution $p$ is defined to be the minimum $c > 0$ for which

$$p(B_x(2R)) \le c \cdot p(B_x(R)), \qquad (5)$$

for any $x \in \mathrm{supp}(p)$ and any $R \ge 0$. Moreover, $p$ is said to be $c$-doubling if $c(p) = c$. Note that, contrary to the entropy $H(p)$, the doubling constant $c(p)$ depends on the topology of $\mathrm{supp}(p)$, determined by the embedding of $\mathcal{N}$ in the metric space $(\mathcal{M}, d)$.
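As a rough illustration of definitions (2) to (5), the following Python sketch computes the entropy, the max-entropy, ball masses, and a brute-force doubling constant for a demand given as a dictionary over a finite object set. The function names and the choice of base-2 logarithms are assumptions made for the example; the specification does not fix the base.

```python
import math
from typing import Callable, Dict, Hashable

Obj = Hashable
Metric = Callable[[Obj, Obj], float]


def entropy(p: Dict[Obj, float]) -> float:
    """H(p): sum over the support of p(x) * log2(1 / p(x))."""
    return sum(q * math.log2(1.0 / q) for q in p.values() if q > 0)


def max_entropy(p: Dict[Obj, float]) -> float:
    """H_max(p): max over the support of log2(1 / p(x))."""
    return max(math.log2(1.0 / q) for q in p.values() if q > 0)


def ball_mass(p: Dict[Obj, float], d: Metric, x: Obj, r: float) -> float:
    """p(B_x(r)): total demand on objects within distance r of x (closed ball)."""
    return sum(q for y, q in p.items() if d(x, y) <= r)


def doubling_constant(p: Dict[Obj, float], d: Metric) -> float:
    """Smallest c with p(B_x(2R)) <= c * p(B_x(R)) for all x in supp(p), R >= 0.

    On a finite set the ratio is piecewise constant in R and can only change
    at a pairwise distance or at half of one, so checking those radii suffices.
    """
    support = [x for x, q in p.items() if q > 0]
    c = 1.0
    for x in support:
        dists = {d(x, y) for y in support}
        for r in dists | {r / 2 for r in dists}:
            c = max(c, ball_mass(p, d, x, 2 * r) / ball_mass(p, d, x, r))
    return c


# Uniform demand over four points on a line.
p = {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}
d = lambda a, b: abs(a - b)
print(entropy(p), max_entropy(p), doubling_constant(p, d))
```

In this toy case the entropy and max-entropy coincide because the demand is uniform; a skewed demand lowers $H(p)$ while raising $H_{\max}(p)$, which is the gap the later bounds are concerned with.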
TABLE I
SUMMARY OF NOTATION

$\mathcal{N}$          Set of objects
$(\mathcal{M}, d)$     Metric space
$d(x,y)$               Distance between $x, y \in \mathcal{M}$
$p$                    The demand distribution
$H(p)$                 The entropy of $p$
$H_{\max}(p)$          The max-entropy of $p$
$B_x(r)$               The ball of radius $r$ centered at $x$
$c(p)$                 The doubling constant of $p$

In formulating the problem, the notation of prior works in this area is followed. Given access to a comparison oracle, it is desired to navigate through $\mathcal{N}$ until a target object is found. In particular, a greedy content search is defined as follows. Let $t$ be the target object and $s$ some object that serves as a starting point. The greedy content search algorithm proposes an object $w$ and asks the oracle to select, between $s$ and $w$, the object closest to the target $t$, i.e., it invokes $\mathrm{Oracle}(s,w,t)$. This process is repeated until the oracle returns something other than $s$, i.e., the proposed object is "more similar" to the target $t$. Once this happens, say at the proposal of some $w'$, if $w' \ne t$, the greedy content search repeats the same process now from $w'$. If at any point the proposed object is $t$, the process terminates.

More formally, let $x_k, y_k$ be the $k$-th pair of objects submitted to the oracle: $x_k$ is the current object, which greedy content search is trying to improve upon, and $y_k$ is the proposed object, submitted to the oracle for comparison with $x_k$. Let $o_k = \mathrm{Oracle}(x_k, y_k, t)$ be the oracle's response, and define $\mathcal{H}_k = \{(x_i, y_i, o_i)\}_{i=1}^{k}$ to be the sequence of the first $k$ inputs given to the oracle, as well as the responses obtained. $\mathcal{H}_k$ is the "history" of the content search up to and including the $k$-th access to the oracle.

The starting object is always one of the first two objects submitted to the oracle, i.e., $x_1 = s$. Moreover, in greedy content search, $x_{k+1} = o_k$, i.e., the current object is always the closest to the target among the ones submitted so far. On the other hand, the selection of the proposed object $y_{k+1}$ will be determined by the history $\mathcal{H}_k$ and the object $x_k$.
In particular, given $\mathcal{H}_k$ and the current object $x_k$ there exists a mapping $(\mathcal{H}_k, x_k) \mapsto \mathcal{F}(\mathcal{H}_k, x_k) \in \mathcal{N}$ such that $y_{k+1} = \mathcal{F}(\mathcal{H}_k, x_k)$, $k = 0, 1, \ldots$, where here $x_0 = s \in \mathcal{N}$ (the starting object) and $\mathcal{H}_0 = \emptyset$ (i.e., before any comparison takes place, there is no history). The mapping $\mathcal{F}$ is called the selection policy of the greedy content search. In general, the selection policy is allowed to be randomized; in this case, the object returned by $\mathcal{F}(\mathcal{H}_k, x_k)$ will be a random variable, whose distribution

$$\Pr(\mathcal{F}(\mathcal{H}_k, x_k) = w), \quad w \in \mathcal{N}, \qquad (6)$$

is fully determined by $(\mathcal{H}_k, x_k)$. Observe that $\mathcal{F}$ depends on the target $t$ only indirectly, through $\mathcal{H}_k$ and $x_k$; this is consistent with the assumption that $t$ is only "revealed" when it is eventually located. A selection policy is said to be memoryless if it depends on $x_k$ but not on the history $\mathcal{H}_k$. In other words, the distribution (6) is the same whenever $x_k = x \in \mathcal{N}$, irrespective of the comparisons performed prior to reaching $x_k$.

Assuming that when $x_k = t$ the search effectively terminates (i.e., the human reveals that this is indeed the target), the desired goal is to select $\mathcal{F}$ so that the number of accesses to the oracle is minimized. In particular, given a target $t$ and a selection policy $\mathcal{F}$, the search cost $C_{\mathcal{F}}(t)$ is defined to be the number of proposals to the oracle until $t$ is found. This is a random variable, as $\mathcal{F}$ is randomized; let $E[C_{\mathcal{F}}(t)]$ be its expectation. The Content Search Through Comparisons problem is then defined as follows:

CONTENT SEARCH THROUGH COMPARISONS (CSTC): Given an embedding of $\mathcal{N}$ into $(\mathcal{M}, d)$ and a demand distribution $p(t)$, select $\mathcal{F}$ that minimizes the expected search cost

$$\bar{C}_{\mathcal{F}} = \sum_{t \in \mathcal{N}} p(t)\, E[C_{\mathcal{F}}(t)].$$

Note that, as $\mathcal{F}$ is randomized, the free variable in the above optimization problem is the distribution (6).

A LOWER BOUND AND A MEMORYLESS ALGORITHM

A lower bound on the expected number of queries that one needs to submit to a comparison oracle to locate a target $t$ has been established previously by the inventors.

Theorem 1.
For any integers $K$ and $D$, there exists a metric space $(\mathcal{M}, d)$ and a target measure $p$ with entropy $H(p) = K \log(D)$ and doubling constant $c(p) = D$ such that the average search cost of any selection policy satisfies the lower bound

[equation (7) missing in source]

Interestingly, a simple memoryless selection policy satisfies an upper bound that is within an $O(c^2(p) H_{\max}(p))$ factor of this bound.

Algorithm 1. Memoryless Content Search
Input: Oracle(·,·,t), demand distribution $p$, starting object $s$.
Output: target $t$.
  x ← s
  while x ≠ t do
    Sample y ∈ $\mathcal{N}$ from the probability distribution (8) [equation (8) missing in source]
    x ← Oracle(x, y, t)
  end while

Theorem 2. The expected search cost of Algorithm 1 is bounded by $\bar{C}_{\mathcal{F}} \le 6 c^3(p) H(p) H_{\max}(p)$.

There are several interesting observations to be made about Algorithm 1. To begin, the memoryless selection policy has the following appealing properties. For two objects $y, z$ that have the same distance from $x$, if $p(y) > p(z)$ then $y$ has a higher probability of being proposed. When two objects $y, z$ are equally likely to be targets, if $d(y,x) < d(z,x)$ then $y$ has a higher chance of being proposed. The distribution (8) is thus biased both towards objects close to $x$ and towards objects that are likely to be targets.

Moreover, in implementing the policy outlined in Algorithm 1, it is assumed that, at each $x$, a random $y$ can be sampled from distribution (8). This assumes that the distribution $p$ and the embedding in $\mathcal{M}$ (or the distance metric $d$) are a-priori known. However, Algorithm 1 can in fact be implemented even if only the ordering relationships between objects, rather than the actual distances between them, are known. This is important, as the former can be obtained by only accessing a comparison oracle. In particular, all such ordering relationships can be revealed by asking $|\mathcal{N}| \log |\mathcal{N}|$ oracle queries offline (e.g., during a training phase).
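The loop of Algorithm 1 can be sketched in Python as follows. Because the sampling distribution (8) is not legible in this text, the sketch ASSUMES proposal weights proportional to $p(y) / p(B_x(d(x,y)))$, a form chosen only because it reproduces the two properties stated above (bias towards objects near $x$ and towards likely targets); it should not be read as the specification's formula. All function names, the `is_target` callback standing in for the user revealing the target, and the `max_queries` guard are hypothetical.

```python
import random
from typing import Callable, Dict, Hashable, Tuple

Obj = Hashable
Metric = Callable[[Obj, Obj], float]


def make_oracle(d: Metric, target: Obj):
    """Comparison oracle with a hidden target, plus a 'this is it' check."""
    def oracle(x: Obj, y: Obj) -> Obj:
        # Ties keep the current object x (an arbitrary tie-breaking choice).
        return x if d(x, target) <= d(y, target) else y
    return oracle, (lambda x: x == target)


def ball_mass(p: Dict[Obj, float], d: Metric, x: Obj, r: float) -> float:
    """Total demand on objects within distance r of x."""
    return sum(q for y, q in p.items() if d(x, y) <= r)


def sample_proposal(p, d, x, rng: random.Random) -> Obj:
    """ASSUMED stand-in for distribution (8): weight p(y) / p(B_x(d(x, y)))."""
    candidates = [y for y, q in p.items() if q > 0 and y != x]
    weights = [p[y] / ball_mass(p, d, x, d(x, y)) for y in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


def memoryless_search(p, d, start, oracle, is_target, rng,
                      max_queries: int = 10_000) -> Tuple[Obj, int]:
    """Greedy search whose proposals depend only on the current object x.

    Assumes the hidden target lies in the support of p.
    """
    x, queries = start, 0
    while not is_target(x):
        if queries >= max_queries:
            raise RuntimeError("query budget exhausted")
        y = sample_proposal(p, d, x, rng)
        x = oracle(x, y)  # keep whichever object is closer to the target
        queries += 1
    return x, queries


# Toy run: uniform demand over eight points on a line, hidden target 5.
p = {i: 1 / 8 for i in range(8)}
d = lambda a, b: abs(a - b)
oracle, is_target = make_oracle(d, target=5)
found, cost = memoryless_search(p, d, 0, oracle, is_target, random.Random(0))
print(found, cost)
```

The policy is memoryless in the sense defined above: `sample_proposal` receives only the current object `x`, never the history of earlier comparisons.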
As noted, the main discrepancy factor between the upper bound in Theorem 2 and the lower bound in Theorem 1 is of the order of $c^3 H_{\max}$. The result appearing in the next section eliminates the $H_{\max}$ term at the expense of a dependence on the doubling dimension through an $O(c^5)$ term.

AN ALGORITHM BASED ON ε-NETS

The objective in this section is to establish that comparison-based search can identify a target object $t \in \mathcal{N}$, initially sampled according to probability distribution $p$, in a number of steps $C$ whose average value $\bar{C}$ verifies $\bar{C} = O(c^k(p) H(p))$ for some fixed exponent $k$ to be identified. To this end, a number of intermediate results are established.

A. ε-Nets

ε-nets are defined as follows:

Definition 1. An ε-net of a subset $A \subseteq \mathcal{N}$ is a maximal collection of points $\{x_1, \ldots, x_k\}$ of $A$ such that for $i \ne j$, $d(x_i, x_j) > \epsilon$.

In order to construct an ε-net, one needs to have access to the underlying metric space and the distance $d$ between any two points. The construction of the net can happen in a greedy fashion in $O(K|A|)$ time, where $K$ is the size of the ε-net. There are in fact efficient algorithms that can construct such nets.

Lemma 1. Given a ball $B_x(R) \subseteq \mathcal{N}$ and an integer $\ell > 0$, any $(R/2^\ell)$-net $\{x_1, \ldots, x_k\}$ of $B_x(R)$ is such that

$$B_x(R) \subseteq \bigcup_{i=1}^{k} B_{x_i}(R/2^{\ell}), \qquad (9)$$

and for all $i \ne j$,

$$B_{x_i}(R/2^{\ell+1}) \cap B_{x_j}(R/2^{\ell+1}) = \emptyset. \qquad (10)$$

Moreover, the cardinality $k$ of any such $(R/2^\ell)$-net is at most $c^{\ell+3}$.

Proof: If (9) does not hold, then there exists $y$ in $B_x(R)$ such that $d(y, x_i) > R/2^\ell$ for all $i = 1, \ldots, k$. This contradicts the maximality of $\{x_1, \ldots, x_k\}$. For all $i \ne j$, any point $z$ in the intersection $B_{x_i}(R/2^{\ell+1}) \cap B_{x_j}(R/2^{\ell+1})$ is such that

$$d(x_i, x_j) \le d(x_i, z) + d(z, x_j) \le R/2^{\ell}.$$

This contradicts the property that $d(x_i, x_j) > R/2^\ell$, hence the intersection $B_{x_i}(R/2^{\ell+1}) \cap B_{x_j}(R/2^{\ell+1})$ is necessarily empty. Finally, property (10) implies

$$p\Big(\bigcup_{i=1}^{k} B_{x_i}(R/2^{\ell+1})\Big) = \sum_{i=1}^{k} p(B_{x_i}(R/2^{\ell+1})).$$

On the other hand, applying $\ell + 2$ times the fact that $p$ is $c$-doubling, for all $i = 1, \ldots, k$,

$$p(B_{x_i}(R/2^{\ell+1})) \ge c^{-(\ell+2)}\, p(B_{x_i}(2R)) \ge c^{-(\ell+2)}\, p(B_x(R)),$$

because of the fact that $B_x(R) \subseteq B_{x_i}(2R)$, which follows from $x_i \in B_x(R)$.
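To conclude, note that $\bigcup_{i=1}^{k} B_{x_i}(R/2^{\ell+1}) \subseteq B_x(2R)$. Then:

$$k\, c^{-(\ell+2)}\, p(B_x(R)) \le \sum_{i=1}^{k} p(B_{x_i}(R/2^{\ell+1})) \le p(B_x(2R)) \le c\, p(B_x(R)).$$

The upper bound $k \le c^{\ell+3}$ follows immediately.

The following lemma is also needed:

Lemma 2. Let $\delta \in (0,1)$ verify $\delta > 1/3$. Let the ball $B_x(R)$ be such that there exists a $y \in \mathcal{N}$ for which $d(x,y) = R$ and $p(\{y\}) > 0$. Then the following holds. Let $\rho > 0$ be such that $\rho < \min(\delta, (1-\delta)/2)\, R$, and let $\ell > 0$ be a positive integer such that

$$2^{\ell}\Big(\frac{R}{2} - \frac{\rho}{1-\delta}\Big) > R + \frac{R}{1-\delta}. \qquad (11)$$

Then for any $z \in B_x(R)$, one has

$$p\big(B_z(\rho/(1-\delta))\big) \le (1 - c^{-\ell})\, p\big(B_x(R/(1-\delta))\big). \qquad (12)$$

Proof: Let $z \in B_x(R)$ be fixed. Let $B' = B_z(\rho/(1-\delta))$. Note that by the assumption that $\rho \le \delta R$, it follows that $B'$ is included in the ball $B = B_x(R/(1-\delta))$. By assumption, there exists $y \in \mathcal{N}$ such that $d(x,y) = R$ and $p(\{y\}) > 0$. Thus either $d(x,z)$ or $d(y,z)$ is lower-bounded by $R/2$: indeed, by the triangle inequality, $R = d(x,y) \le d(x,z) + d(z,y)$.

Assume first that $d(x,z) \ge R/2$. By the triangle inequality again, for any $z' \in B'$, one has $d(x,z) \le d(x,z') + d(z',z)$, so that

$$d(x,z') \ge \frac{R}{2} - \frac{\rho}{1-\delta}.$$

Note that the lower bound $R/2 - \rho/(1-\delta)$ is positive under the assumption $\rho < ((1-\delta)/2)\, R$. In other words, for any $\alpha > 0$, the ball $B'$ is disjoint from the ball $B''$ defined as $B'' = B_x(R/2 - \rho/(1-\delta) - \alpha)$. This entails that

$$p(B') \le p(B) - p(B''). \qquad (13)$$

Let now $\ell$ be an integer verifying (11). A fortiori, $\ell$ is such that, for some small enough positive $\alpha$,

$$2^{\ell}\Big(\frac{R}{2} - \frac{\rho}{1-\delta} - \alpha\Big) \ge \frac{R}{1-\delta}.$$

This entails that $B \subseteq B_x\big(2^{\ell}(R/2 - \rho/(1-\delta) - \alpha)\big)$. Applying $\ell$ times the $c$-doubling property of $p$, this inequality further implies $p(B) \le c^{\ell}\, p(B'')$. Combined with (13), this last inequality leads to

$$p(B') \le (1 - c^{-\ell})\, p(B),$$

which is the desired bound (12).

Assume next that $d(x,z) < R/2$, so that necessarily $d(y,z) \ge R/2$. Now for any $z' \in B'$, by the triangle inequality one has $d(y,z') \ge d(y,z) - d(z,z') \ge R/2 - \rho/(1-\delta)$, so that, defining now $B'''$ to be $B_y(R/2 - \rho/(1-\delta) - \alpha)$ for some arbitrarily small $\alpha > 0$, the two balls $B'$ and $B'''$ are disjoint. Note further that $B'''$ is contained in $B$, since for any $z' \in B'''$, one has $d(x,z') \le d(x,y) + d(y,z') \le R + R/2 = (3/2)R$, and the assumption $\delta > 1/3$ ensures that $(3/2)R \le R/(1-\delta)$, which is the radius of $B$. Similar to (13) we thus have

$$p(B') \le p(B) - p(B''').$$

Let now $\ell$ be a positive integer verifying (11). An application of the triangle inequality implies that the inclusion $B \subseteq B_y\big(2^{\ell}(R/2 - \rho/(1-\delta) - \alpha)\big)$ must hold for small enough $\alpha > 0$.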
Indeed, for any point $x' \in B$, one has $d(x',y) \le d(x',x) + d(x,y) \le R/(1-\delta) + R$, and property (11) guarantees that $x'$ is in the corresponding ball $B_y\big(2^{\ell}(R/2 - \rho/(1-\delta) - \alpha)\big)$. Finally, using $\ell$ times the $c$-doubling property of $p$ allows to establish that $p(B) \le c^{\ell}\, p(B''')$; combined with the analogue of (13), this leads as in the previous case to the desired property (12).

Remark 1. For a given $R > 0$, the assumptions of Lemma 2 are verified if one takes $\rho = R/4$, $\delta = 1/3 + \epsilon$ for small enough $\epsilon > 0$, and $\ell = 5$. Indeed, the condition $\rho < \min(\delta, (1-\delta)/2)\, R$ holds because $1/4 < 1/3$. Writing $(1-\delta)^{-1} = (3/2) + \epsilon'$ for some arbitrarily small positive $\epsilon'$, condition (11) reads, after simplification by $R$:

$$2^{\ell}\Big(\frac{1}{2} - \frac{1}{4}\Big(\frac{3}{2} + \epsilon'\Big)\Big) > \frac{5}{2} + \epsilon',$$

which is clearly verified for $\ell = 5$ and $\epsilon' > 0$ small enough, since the left-hand side then equals $4 - 8\epsilon'$.

B. Algorithm and Upper Bound

Algorithm 2. ε-Net Content Search
Input: Oracle(·,·,t), demand distribution $p$, starting object $s$, embedding $(\mathcal{M}, d)$.
Output: target $t$.
  Initialize $x_0$ ← s.
  Initialize the search radius $R_0$ according to $R_0 := \sup_{y \in \mathrm{supp}(p)} d(x_0, y)$.
  j ← 0.
  while $x_j \ne t$ do
    Construct an $(R_j/4)$-net of $B_{x_j}(R_j)$.
    By using the comparison oracle, find the closest object $x_{j+1}$ to the target $t$ among the points in the $(R_j/4)$-net and $x_j$.
    Update the search radius $R_{j+1}$.
    j ← j + 1.
  end while

The algorithm proposed under the present principles based on ε-nets can be found in Algorithm 2. In short, the search strategy considered proceeds in stages. These stages are denoted as $j = 1, \ldots, S$. At the beginning of a stage $j$, the current best exemplar is given, denoted $x_j$, together with the current radius of the search, $R_j$, which is such that, in view of the selections made in previous stages, the search target is necessarily within the ball $B_j := B_{x_j}(R_j)$. It is further imposed that at each stage, the search radius $R_j$ is such that there exists a point $y \in \mathcal{N}$ such that $p(\{y\}) > 0$ and $d(x_j, y) = R_j$, i.e., the demand distribution $p$ puts some mass on the boundary of $B_j$.

The first stage is initialized by picking an arbitrary initial candidate $x_1 \in \mathcal{N}$.
The corresponding initial search radius is then defined as $R_1 := \sup_{y \in \mathrm{supp}(p)} d(x_1, y)$. Hence, by construction, this initial ball $B_1$ indeed has non-zero mass at its boundary.

The search during an arbitrary stage $j$ proceeds as follows. The current search center $x_j$ is completed by additional points of $B_j$ to form a $\rho_j$-net of $B_j$, where $\rho_j = R_j/4$. Then one comparison is performed between the last choice and each of the points of the net that are distinct from $x_j$. By the end of these comparisons, let $x'_j$ be the last selection of the user. Clearly, this selection is, among the points of the net, the one closest to the target of the search. Since (in view of Lemma 1) the union of balls centered at the points of the net, and with radius $\rho_j$, entirely covers the current search ground $B_j$, it follows that the target must necessarily lie in the ball $B_{x'_j}(\rho_j)$.

One last operation is needed to specify how the next stage $j + 1$ is initialized. The center of search at stage $j + 1$ will be set to $x_{j+1} := x'_j$. It is known that the target lies within $B_{x_{j+1}}(\rho_j)$. Then, specify the search radius $R_{j+1}$ to be the smallest $R$ such that $p(B_{x_{j+1}}(R)) = p(B_{x_{j+1}}(\rho_j))$. Thus necessarily, $R_{j+1} \le \rho_j$, and moreover the minimality of $R_{j+1}$ implies that measure $p$ puts some mass on the boundary of the resulting search ball $B_{j+1}$.

As such, this method has indeed ensured by construction that at any stage (a) the target lies in the current ball $B_j$ and (b) the ball contains an object of non-zero mass at its boundary. The number of queries that Algorithm 2 submits to the oracle can be bounded. Algorithm 2 is a greedy algorithm that uses the history of the search to propose new objects.

One embodiment of a method 100 under the present principles is shown in Figure 1. The method comprises a step 110 of constructing a net of a certain size. This net is constructed in a way that ensures that it contains the target (think of it as a ball containing a point inside). The method is further comprised of a step 120 of choosing a few exemplars and also comprised of a step 130 for comparing the exemplars with one another. The exemplar that is closest to the target is chosen in step 140, and then another net with a smaller size (i.e., a smaller ball) is again constructed in step 150 around this object. The method must ensure that the target is contained in the net. This process is repeated until a terminal condition is reached in step 160, such as locating the target. If the terminal condition has been reached, the target is locatable within the net and the method stops. If the terminal condition has not been reached, the method reverts back to step 120 and chooses exemplars with the smaller net size.

One embodiment of an apparatus 200 to perform a content search is shown in Figure 2. The apparatus is comprised of a computer that executes the method 100.

One embodiment of the details of apparatus 200 for searching content is shown in Figure 3. The apparatus comprises Net Construction Circuitry 210. This net is constructed in a way that ensures that it contains the target.
The apparatus further comprises Exemplar Selection Circuitry 220. The apparatus also comprises Comparator Circuitry 230. Comparator Circuitry 230 can compare exemplars in pairs, or all at once, depending upon resource and/or time availability. The apparatus also comprises Determining Circuitry 240. Determining Circuitry 240 determines which of the exemplars is closest to the target. Determination can be performed in any of a variety of ways, such as absolute difference, etc. The apparatus further comprises Net Reduction Circuitry 250. Net Reduction Circuitry 250 must ensure that the target is still contained in the net, while reducing the size of the net. This process is repeated until a terminal condition is reached. The apparatus also comprises Control Circuitry 260, which is used to control the operation of the various elements and, in particular, controls the number of iterations that the elements perform in order to reduce the net to the terminal condition, which is monitored by the control circuitry.

The terminal condition can be one condition or a combination of conditions. For example, one possible condition is that the net is small enough to locate the target. Another possible condition is that the size of the net is within a threshold value. Another possible condition is that the loop in method 100 is performed a predetermined number of times. Another possible condition is that the target itself is chosen when determining the exemplar closest to the target.

In a further embodiment, the size of the net can be reduced by carrying out repeated operations of the loop until the net is reduced, and then an alternative method can be used to actually locate the target within the reduced size net. This embodiment may be used, for example, when it is more computationally efficient to do the final selection with the alternative method rather than performing more iterations of the loop.

Theorem 3.
The expected search cost of Algorithm 2 can be bounded by
(14)

At each stage j, one comparison is performed between the last choice and each of the points of the ρ_j-net that are distinct from x_j. The size of this ρ_j-net is, by Lemma , at most c^5. Thus, at most c^5 − 1 binary comparisons are needed at each stage. Denote again by x'_j the last selection at stage j. Also denote by π_j := p(B_{x_j}(R/(1 − δ))) the mass put by measure p on the search ground B_j, after enlarging its radius by a factor 1/(1 − δ), where δ = 1/ + ε, for some small ε chosen as in Remark 1. It now follows by Lemma 2 and Remark 1 that necessarily,

Note also that, critically, by Lemma 2 and an induction argument, it is guaranteed that at each stage j of the search

Then place the condition on the target element z ∈ N. Considering its probability p({z}) and the previous bound on the probability of the search range after j stages, clearly the search will have completed after j stages provided

or equivalently, provided

The average number of stages, S, is then upper-bounded by

Noting that, within a stage, at most c^5 − 1 comparisons are performed, the upper bound (14) follows.

It is noted that Theorem 3 gives an upper bound that matches the lower bound (7), up to a discrepancy in the exponent of the doubling constant c. In contrast to Algorithm 1, which could be implemented using only ordering relationships between objects rather than exact distances, Algorithm 2 requires full knowledge of the underlying metric space. Interestingly, Algorithm 2 does not require knowledge of the target distribution p. All steps in the algorithm (and, in particular, the shrinking of the ball B to ensure it has non-zero mass at the boundary) can be implemented as long as the support supp(p) is known.
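The loop of method 100 (construct a net, choose exemplars, compare them, keep the closest, shrink the ball) can be illustrated with a minimal Python sketch. It is not Algorithm 2 itself: the names `dist`, `greedy_net`, `make_oracle`, and `search` are hypothetical, the objects are assumed to be points in the Euclidean plane, the target is assumed to be one of the objects, the comparison oracle is simulated from a known target, and the ball radius is simply halved at each stage rather than following the schedule analyzed above. The three stopping tests in the loop correspond to terminal conditions mentioned in the description (net small enough to locate the target, size within a threshold, predetermined number of iterations).

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
Oracle = Callable[[Point, Point], Point]


def dist(a: Point, b: Point) -> float:
    """Euclidean distance, standing in for the metric d (an assumption)."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def greedy_net(points: Sequence[Point], radius: float) -> List[Point]:
    """Pick centers greedily so every point lies within `radius` of a center."""
    centers: List[Point] = []
    for p in points:
        if all(dist(p, c) > radius for c in centers):
            centers.append(p)
    return centers


def make_oracle(target: Point) -> Oracle:
    """Simulated comparison oracle: returns the object closer to the target."""
    def closer(x: Point, y: Point) -> Point:
        return x if dist(x, target) <= dist(y, target) else y
    return closer


def search(objects: Sequence[Point], oracle: Oracle,
           min_radius: float = 1e-9,
           max_stages: int = 64) -> Tuple[Point, int, int]:
    """Return (selected object, stages used, oracle comparisons made)."""
    center = objects[0]
    # Initial ball around the first object contains every object.
    radius = max(dist(center, o) for o in objects)
    candidates = list(objects)
    stages = comparisons = 0
    # Stop when one candidate is left, the ball is tiny, or the cap is hit.
    while len(candidates) > 1 and radius > min_radius and stages < max_stages:
        radius /= 2.0  # assumed shrink schedule, chosen for simplicity
        exemplars = greedy_net(candidates, radius)
        best = exemplars[0]
        for e in exemplars[1:]:  # one comparison per remaining exemplar
            best = oracle(best, e)
            comparisons += 1
        # Some exemplar is within `radius` of the target, and `best` is at
        # least as close, so the ball around `best` still holds the target.
        center = best
        candidates = [o for o in candidates if dist(center, o) <= radius]
        stages += 1
    return center, stages, comparisons


# Example: search a 5 x 5 grid for the object at (3, 1).
grid = [(float(x), float(y)) for x in range(5) for y in range(5)]
found, stages, comparisons = search(grid, make_oracle((3.0, 1.0)))
print(found, stages, comparisons)
```

Like Algorithm 2, this sketch needs the distances between objects to build each net, but it never consults the demand distribution p. Only the oracle's answers depend on the target.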
CONCLUSIONS

The principles described herein provide a solution to the problem of content search through comparisons (CSTC) under heterogeneous demands, tying performance to the topology and the entropy of the target distribution. The search strategy considered in Algorithm 2 relies on the construction of ε-nets at different stages of the search, which necessitates access to detailed information about the geometry of the search space (M,d), but no information about the demand distribution p.

One or more implementations having particular features and aspects of the presently preferred embodiments of the invention have been provided. However, features and aspects of described implementations can also be adapted for other implementations. For example, these implementations and features can be used in the context of other video devices or systems. The implementations and features need not be used in a standard.

Reference in the specification to "one embodiment" or "an embodiment" or "one implementation" or "an implementation" of the present principles, as well as other variations thereof, means that a particular feature, structure, characteristic, and so forth described in connection with the embodiment is included in at least one embodiment of the present principles. Thus, the appearances of the phrase "in one embodiment" or "in an embodiment" or "in one implementation" or "in an implementation", as well as any other variations, appearing in various places throughout the specification are not necessarily all referring to the same embodiment.

The implementations described herein can be implemented in, for example, a method or a process, an apparatus, a software program, a data stream, or a signal. Even if only discussed in the context of a single form of implementation (for example, discussed only as a method), the implementation of features discussed can also be implemented in other forms (for example, an apparatus or computer software program).
An apparatus can be implemented in, for example, appropriate hardware, software, and firmware. The methods can be implemented in, for example, an apparatus such as, for example, a processor, which refers to processing devices in general, including, for example, a computer, a microprocessor, an integrated circuit, or a programmable logic device. Processors also include communication devices, such as, for example, computers, cell phones, portable/personal digital assistants ("PDAs"), and other devices that facilitate communication of information between end-users. Implementations of the various processes and features described herein can be embodied in a variety of different equipment or applications. Examples of such equipment include a web server, a laptop, a personal computer, a cell phone, a PDA, and other communication devices. As should be clear, the equipment can be mobile and even installed in a mobile vehicle. Additionally, the methods can be implemented by instructions being performed by a processor, and such instructions (and/or data values produced by an implementation) can be stored on a processor-readable medium such as, for example, an integrated circuit, a software carrier or other storage device such as, for example, a hard disk, a compact disc, a random access memory ("RAM"), or a read-only memory ("ROM"). The instructions can form an application program tangibly embodied on a processor-readable medium. Instructions can be, for example, in hardware, firmware, software, or a combination. Instructions can be found in, for example, an operating system, a separate application, or a combination of the two. A processor can be characterized, therefore, as, for example, both a device configured to carry out a process and a device that includes a processor-readable medium (such as a storage device) having instructions for carrying out a process.
Further, a processor-readable medium can store, in addition to or in lieu of instructions, data values produced by an implementation. As will be evident to one of skill in the art, implementations can use all or part of the approaches described herein. The implementations can include, for example, instructions for performing a method, or data produced by one of the described embodiments. A number of implementations have been described. Nevertheless, it will be understood that various modifications can be made. For example, elements of different implementations can be combined, supplemented, modified, or removed to produce other implementations. Additionally, one of ordinary skill will understand that other structures and processes can be substituted for those disclosed and the resulting implementations will perform at least substantially the same function(s), in at least substantially the same way(s), to achieve at least substantially the same result(s) as the implementations disclosed. Accordingly, these and other implementations are contemplated by this disclosure and are within the scope of these principles.

Claims:
Claims (10)
[1] 1. A method for searching content within a data base, comprising the steps of: constructing a net having a size that contains a target; choosing a plurality of exemplars; comparing each exemplar with every other exemplar; determining the exemplar closest to the target; reducing the size of the net to a smaller size that contains the target; repeating said choosing, comparing, determining, and reducing steps until the size of the net is small enough to locate the target.
[2] 2. The method of Claim 1, wherein said repeating step is performed for at least two iterations.
[3] 3. The method of Claim 1, wherein said repeating step is performed until the size of the last net is within a threshold value.
[4] 4. The method of Claim 1, wherein said repeating step is performed for a predetermined number of iterations.
[5] 5. The method of Claim 1, wherein the target is located by an alternative search method after the net becomes small enough.
[6] 6. A computer for searching content within a data base, comprising: circuitry to construct a net having a size that contains a target; circuitry to choose a plurality of exemplars; comparator circuitry that operates on the exemplars; a determining circuit that finds the exemplar closest to the target; circuitry to reduce the size of the net to a smaller size that contains the target; and control circuitry to cause said circuitry to construct, said circuitry to choose, said comparator, said determining circuit, and said circuitry to reduce to repeat their operation until the size of the net is small enough to locate the target.
[7] 7. The apparatus of Claim 6, wherein said control circuitry causes said circuitry to construct, said circuitry to choose, said comparator circuitry, said determining circuit, and said circuitry to reduce to repeat their operation for at least two iterations.
[8] 8. The apparatus of Claim 6, wherein said control circuitry causes said circuitry to construct, said circuitry to choose, said comparator circuitry, said determining circuit, and said circuitry to reduce to repeat their operation until the size of the last net is within a threshold value.
[9] 9. The apparatus of Claim 6, wherein said control circuitry causes said circuitry to construct, said circuitry to choose, said comparator circuitry, said determining circuit, and said circuitry to reduce to repeat their operation until the size of the last net is within a threshold value.
[10] 10. The apparatus of Claim 6, wherein said control circuitry causes the target to be located by an alternative search method after the net becomes small enough.

Similar technologies:

Publication No. | Publication Date | Patent Title

Fang et al.2020|A survey of community search over big graphs

Feng et al.2015|Fast localization in large-scale environments using supervised indexing of binary features

KR101472452B1|2014-12-17|Method and Apparatus for Multimedia Search and method for pattern recognition

US10621755B1|2020-04-14|Image file compression using dummy data for non-salient portions of images

AU2009357597A1|2012-07-05|Methods and apparatuses for facilitating content-based image retrieval

CN103678661A|2014-03-26|Image searching method and terminal

CN108288208B|2020-08-28|Display object determination method, device, medium and equipment based on image content

JP5175724B2|2013-04-03|Method and apparatus for generating a sequence of elements

EP3210133A1|2017-08-30|Tagging personal photos with deep networks

AU2018204876A1|2018-07-19|Interactive content search using comparisons

Wang et al.2020|Relation embedding for personalised translation-based poi recommendation

CN110738577B|2022-02-22|Community discovery method, device, computer equipment and storage medium

JP2016173780A|2016-09-29|Data categorization program, data categorization method and data categorization device

Song et al.2019|Hybrid recommendation algorithm based on weighted bipartite graph and logistic regression

KR101937987B1|2019-01-11|Apparatus and method for matching user with similar preferences

CN110366100A|2019-10-22|Localization method, positioning device, readable storage medium storing program for executing and the terminal device of terminal

CN108764324A|2018-11-06|A kind of text data immediate processing method based on K-Means algorithms and co-occurrence word

Drosou et al.2013|Poikilo: a tool for evaluating the results of diversification models and algorithms

JP5845818B2|2016-01-20|Region search method, region search program, and information processing apparatus

JP2017211846A|2017-11-30|Retrieval data management device, retrieval data management method, and retrieval data management program

Lin et al.2014|Location-based personalized mobile search

KR102343848B1|2021-12-27|Method and operating device for searching conversion strategy using user status vector

Karbasi et al.2012|Hot or not: Interactive content search using comparisons

Yadamjav et al.2017|Boosting Point-Based Trajectory Search with Quad-Tree

CN105512156A|2016-04-20|Method and device for generation of click models

Patent family:

Publication No. | Publication Date

KR102032008B1|2019-10-14|

JP2015510639A|2015-04-09|

BR112014018810A2|2021-05-25|

WO2013119626A1|2013-08-15|

EP2812816A1|2014-12-17|

KR20140129099A|2014-11-06|

US20140372480A1|2014-12-18|

JP6278903B2|2018-02-14|

AU2018204876A1|2018-07-19|

CN104508661A|2015-04-08|

HK1205304A1|2015-12-11|

Cited documents:

Publication No. | Filing Date | Publication Date | Applicant | Patent Title

US6636849B1|1999-11-23|2003-10-21|Genmetrics, Inc.|Data search employing metric spaces, multigrid indexes, and B-grid trees|

JP2002169810A|2000-12-04|2002-06-14|Minolta Co Ltd|Computer-readable recording medium with recorded image retrieval program, and method and device for image retrieval|

US6748398B2|2001-03-30|2004-06-08|Microsoft Corporation|Relevance maximizing, iteration minimizing, relevance-feedback, content-based image retrieval |

US20030120630A1|2001-12-20|2003-06-26|Daniel Tunkelang|Method and system for similarity search and clustering|

CA2388358A1|2002-05-31|2003-11-30|Voiceage Corporation|A method and device for multi-rate lattice vector quantization|

CA2467985C|2003-05-22|2011-07-12|At&T Corp.|Apparatus and method for providing near-optimal representations over redundant dictionaries|

US7668867B2|2006-03-17|2010-02-23|Microsoft Corporation|Array-based discovery of media items|

CN101583028A|2008-05-14|2009-11-18|深圳市融合视讯科技有限公司|Video compression coding search algorithm|

US9171077B2|2009-02-27|2015-10-27|International Business Machines Corporation|Scaling dynamic authority-based search using materialized subgraphs|

US20120158784A1|2009-08-06|2012-06-21|Zigmund Bluvband|Method and system for image search|

CN101710988B|2009-12-08|2011-10-05|深圳大学|Neighborhood particle pair optimization method applied to image vector quantization of image compression|

US8374386B2|2011-01-27|2013-02-12|Polytechnic Institute Of New York University|Sensor fingerprint matching in large image and video databases|

US8706711B2|2011-06-22|2014-04-22|Qualcomm Incorporated|Descriptor storage and searches of k-dimensional trees|

US9916187B2|2014-10-27|2018-03-13|Oracle International Corporation|Graph database system that dynamically compiles and executes custom graph analytic programs written in high-level, imperative programming language|

KR101960218B1|2018-01-30|2019-03-27|김영호|System for providing interactive information using database structure|

CN109033372A|2018-07-27|2018-12-18|北京未来媒体科技股份有限公司|A kind of content information retrieval method and system based on artificial intelligence|

Legal status:
2018-08-02| MK5| Application lapsed section 142(2)(e) - patent request and compl. specification not accepted|

优先权:

申请号 | 申请日 | 专利标题

US201261595502P| true| 2012-02-06|2012-02-06||

US61/595,502||2012-02-06||

PCT/US2013/024881|WO2013119626A1|2012-02-06|2013-02-06|Interactive content search using comparisons|

AU2018204876A| AU2018204876A1|2012-02-06|2018-07-04|Interactive content search using comparisons|
