Method for using profiling to determine application-specific values for an application, method for p
Abstract:
METHOD FOR USING PROFILING TO OBTAIN APPLICATION-SPECIFIC PREFERRED PARAMETER VALUES FOR AN APPLICATION AND METHODS FOR USER-DIRECTED PROFILING
The present invention relates to a method of using profiling to obtain application-specific preferred parameter values for an application. First, a parameter for which to obtain an application-specific value is identified (301). The code is then augmented for application-specific profiling of the parameter (302). The parameter is profiled and profile data is collected (303). The profile data is then analyzed to determine the application's preferred parameter value for the profiled parameter (304, 305).
Publication number: BR112015024334B1
Application number: R112015024334-7
Filing date: 2014-04-01
Publication date: 2022-01-25
Inventors: Teresa Louise Johnson; Xinliang David Li
Applicant: Google Llc
Main IPC:
Description:
BACKGROUND
[1] Feedback-directed optimization (FDO) is a technique used to tune application executions based on application runtime behavior in order to improve performance. FDO is also known as Profile-Guided Optimization (PGO) and Profile-Based Optimization (PBO). In order to tune applications, FDO profiles them. Profiling is the process of gathering information about how an application behaves at runtime. This profile information is used to drive decisions regarding various application optimizations.
[2] As illustrated in Figure 1, conventional feedback-directed optimization is a dual-build model technique that uses static instrumentation to collect edge and value profiles. An instrumentation build (101) allows the compiler to insert code into an application binary to create an instrumented version of the binary (102). This inserted code typically counts edges or collects value profiles. The instrumented binary (102) is run on a representative set of training data (104) in a training run (103). At the end of the training run, all collected edge counts and value information are written and aggregated into a profile database or gcov data file (GCDA) (105). An optimization build (106) then takes place, in which the compiler uses the generated profile to make optimization decisions such as inlining decisions, instruction scheduling, basic block reordering, function splitting, and register allocation.
[3] A problem with conventional FDO is that the technique relies on compilers that do not understand the high-level details of the applications on which FDO is run. For example, compilers do not know application-specific parameters or algorithms. Conventional feedback-directed optimization can therefore only handle a very limited set of low-level profiles, such as control flow edge profiles or a predefined set of value profiles that includes indirect function call targets, string function call sizes, and alignment profiles.
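The edge counting described above for conventional FDO can be pictured with a small hand-written sketch. In a real instrumentation build the compiler inserts the counter updates itself; here they are written out by hand, and the names `g_edge_counts` and `clamp_to_zero` are hypothetical and chosen only for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical edge counters for one function: index 0 counts the taken
// branch, index 1 counts the fall-through path.
static std::array<std::uint64_t, 2> g_edge_counts{};

// A function with a single branch. The increments stand in for the code
// an instrumentation build would insert on each control flow edge.
int clamp_to_zero(int x) {
    if (x < 0) {
        ++g_edge_counts[0];  // edge: condition true
        return 0;
    }
    ++g_edge_counts[1];      // edge: condition false
    return x;
}
```

After a training run, counts like these tell the optimizer which path is hot, but they carry no knowledge of application-level parameters, which is the limitation the background points out.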
[4] In many cases, software developers have in-depth knowledge of the high-level details of their applications. Developers often know which aspects of their code are important and which parameters should be tuned in order to improve application performance. However, in common runtime libraries, these parameters are left untuned or are tuned for average applications, because conventional FDO processes do not allow for user-defined profile optimizations. While some developers choose to manually tune their code for specific applications, manual processes can be cumbersome. As recognized by the inventors, there should be a framework that allows application performance tuning to be done easily and automatically during compilation.
SUMMARY
[5] This specification generally describes technologies related to application performance enhancement, and specifically methods and systems for automatically tuning performance parameters based on application runtime behavior.
[6] In general, one aspect of the subject matter described in this specification can be embodied in a method for using profiling to obtain application-specific preferred parameter values for an application. The method may include: receiving identification of a parameter for which to obtain an application-specific value; receiving code augmented for profiling the parameter; profiling the parameter and collecting profile data; analyzing the profile data; and determining the application's preferred parameter value for the profiled parameter based on the analyzed profile data.
A second aspect of the subject matter can be embodied in a method for user-directed per-class global value profiling, which may include: receiving a user-defined profile instrumentation initialization routine; within the profile initialization routine, initializing a counter and registering a user-defined analysis callback routine; executing a profile update function at a code location where the counter value should be updated; and executing a profile handler method to process the counter data and record the counter value. A third aspect of the subject matter described in this specification may be embodied in a method for user-directed per-site value profiling or user-directed per-site object value profiling, which may include: allocating space in a compiler's static counter array for a counter; executing a user-directed value profile instrumentation support interface to instruct the compiler which type of counter to use and which value to profile; and executing a user-directed value profile transformation support interface to perform a specific value profile transformation on the parameter during an optimization build.
[7] These and other embodiments may optionally include one or more of the following features: the profiling may be user-directed per-class global value profiling; the profiling may be user-directed per-site value profiling; the profiling may be user-directed per-site object value profiling; profiling the parameter and collecting profile data can include generating an instrumented binary from an instrumentation build, executing a training run with one or more representative workloads using the instrumented binary to generate profile data, and storing the generated profile data; determining the application's preferred parameter value may include using a specified callback method to perform custom processing of the profile data to select the preferred parameter value and record the preferred value; the recorded preferred value can be used in an optimization build to initialize the parameter; determining the application's preferred parameter value may include running an optimization build that consumes the profile data and uses a set of standard value profile transformations, which transform a parameter value into a preferred value based on the profile data; a profile counter can be defined for a parameter to be profiled; an entry in a static counter array may be allocated for a counter; counter allocation can be done by calling a compiler extension for counter allocation; a user-directed value profile instrumentation support interface can be a GCC built-in language extension; a user-directed value profile transformation support interface can be a GCC built-in language extension; a profile counter address can be tracked by applying a special-purpose declaration attribute to the profile counter; profile initialization can be provided by a special-purpose declaration attribute that designates a profile initialization function; a user-defined analysis callback routine can be registered using a GCC interface; and a parameter value can be recorded using a GCC interface.
[8] Details of one or more embodiments of the invention are set forth in the accompanying drawings, which are given by way of illustration only, and in the description below. Other features, aspects, and advantages of the invention will become apparent from the description, the drawings, and the claims. Like reference numbers and designations in the various drawings indicate like elements.
BRIEF DESCRIPTION OF THE DRAWINGS
[9] Figure 1 is a block diagram illustrating a conventional feedback-directed optimization loop.
[10] Figure 2 is a block diagram illustrating an exemplary user-directed feedback-directed optimization loop.
[11] Figure 3 is a flow diagram of an example method for obtaining application-specific profile-driven optimizations for an application.
[12] Figure 4 is a flow diagram of an example method for user-directed per-class global value profiling.
[13] Figure 5 is example code illustrating an embodiment of user-directed per-class global value profiling.
[14] Figure 6 is a flow diagram of an example method for user-directed per-site value profiling or user-directed per-site object value profiling.
[15] Figure 7 is example code illustrating an embodiment of user-directed per-site value profiling.
[16] Figure 8a is conventional code depicting a vector class that pre-allocates a vector size.
[17] Figure 8b is example code illustrating an embodiment of user-directed per-site object value profiling.
[18] Figure 9a is conventional code depicting a vector class that pre-allocates a vector size and an element size.
[19] Figure 9b is example code illustrating an embodiment of user-directed per-site object value profiling.
[20] Figure 10 is a block diagram illustrating an exemplary computing device.
DETAILED DESCRIPTION
[21] According to an exemplary embodiment, a framework can facilitate user-directed profile-driven optimizations by allowing a software developer to profile selected parts of an application and direct a compiler to perform value profile transformations based on the profiling results, as shown in Figure 2. User-defined information can be included in both the instrumentation build and the optimization build of a feedback loop.
[22] An example method for obtaining application-specific preferred parameter values for an application using profiling begins with identifying a parameter for which to obtain an application-specific value (301), as shown in Figure 3. The code may then be augmented for application-specific profiling of the parameter (302). In some embodiments, the code to be augmented may be application code. In other embodiments, the code to be augmented may be a library that is used by the application code. The parameter is then profiled specifically for the application (303). Profiling can include two steps: (a) an instrumentation build and (b) a training run.
[23] The instrumentation build can invoke the compiler to generate a binary that contains the instrumentation code needed to profile the parameter or parameters when the binary is executed. This step can use user-specified annotations to insert special instrumentation for profiling the annotated parameters.
[24] The training run takes the binary produced by the instrumentation build and executes it with one or more representative workloads. This training run produces profile data, which is collected and stored in a database or data file.
[25] After the profile data is collected, it can be analyzed to determine the application's preferred parameter value for the profiled parameter (304, 305).
[26] In some cases, the user can specify a callback method to perform custom processing of the profile data to select the preferred parameter value.
This value is then recorded in the profile database or data file and used in an optimization build to initialize the parameter. This process allows for more complex, special-case handling of the profile data to select the best parameter value. In this case, the optimization build can blindly apply the parameter value that the user callback selected at the end of the training run.
[27] In other cases, the optimization build can consume the profile data directly and, using a set of standard value profile transformations that can be provided in compiler support libraries such as gcov, transform the code or initialize a parameter value based on the profile data.
[28] An example framework might include support for user-directed profile counter allocation, user-directed value profile instrumentation, user-directed value profile transformations, profile counter address tracking, user-directed profile initializations, runtime integration of user-defined profile analysis callback routines, and recording of user-defined transformation decisions.
[29] As illustrated in Figure 2, these pieces of functionality can work together to enhance feedback-directed optimization. Source code can be instrumented by a user to profile a specific parameter or parameters through new interfaces (200). The source code can then be compiled in an instrumentation build using compiler support for the new interfaces (201). The instrumented binary can contain user callbacks and new counter types and methods (202) that can then be invoked in a training run (203). A profile database or data file can collect value profiles for the instrumented parameters (205). As discussed above, a user callback method can be invoked after profiling. If a user callback method is invoked, the preferred parameter value is recorded in the profile database or data file and used in the optimization build (206) to create the optimized binary (207).
In other cases, the optimization build (206) can analyze the profile data directly to transform the profiled parameter or parameters and create the optimized binary (207).
[30] There may be several ways to implement this functionality in an example framework. In one embodiment, user-directed profile counter allocation, user-directed value profile instrumentation, and user-directed value profile transformations may be GNU Compiler Collection (GCC) built-in language extensions. Profile counter address tracking and user-directed profile initializations can be implemented using special-purpose declaration attributes. Runtime integration of user-defined profile analysis callback routines and recording of user-defined transformation decisions can be implemented using, for example, GCC application programming interfaces. The methods and interfaces of the framework can use standard objects from GCOV, a test coverage library built for GCC. More details regarding each piece of exemplary framework functionality are provided in the following paragraphs.
USER-DIRECTED PROFILE COUNTER ALLOCATION
[31] A user-directed profile counter allocation interface can be used to instruct a compiler to allocate an entry in the static counter array for the specified counter during instrumentation. An example user-directed profile counter allocation method takes four parameters: (1) the type of counter that should be allocated, (2) a pointer to the object being profiled, (3) an optional parameter giving the name of the counter's base address pointer field, and (4) an optional parameter that is a sequence id. The optional parameter giving the name of the counter's base address pointer field can be used when there are multiple values profiled at the same site. This parameter is discussed in more detail below. The optional sequence id parameter can be used when multiple values of the same type are profiled at the same site.
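As a rough illustration of the allocation idea, the following sketch reserves entries in a fixed-size static array at run time. This is only an analogy: in the framework described here the compiler would assign the entries statically. Every name in the sketch (`CounterType`, `CounterSlot`, `alloc_counter`, `g_counter_array`) is hypothetical and not part of GCC or gcov, and the optional field-name and sequence-id parameters are left out for brevity.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical counter kinds; not the real gcov counter types.
enum class CounterType { TopN, Average };

// One entry of the (hypothetical) static counter array.
struct CounterSlot {
    CounterType type;
    const void* object;   // object being profiled, may be null
    std::int64_t value;   // accumulated profile value
};

constexpr std::size_t kMaxCounters = 16;
static std::array<CounterSlot, kMaxCounters> g_counter_array{};
static std::size_t g_next_site_index = 0;

// Reserve the next free entry and return its address, which plays the
// role of the counter's base address.
CounterSlot* alloc_counter(CounterType type, const void* object = nullptr) {
    assert(g_next_site_index < kMaxCounters && "counter array exhausted");
    CounterSlot& slot = g_counter_array[g_next_site_index++];
    slot = CounterSlot{type, object, 0};
    return &slot;
}
```

Two consecutive calls return adjacent entries, so each instrumented site ends up with its own slot in the array.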
[32] An example profile counter can be allocated using a new GCC built-in language extension as below, based on aspects of the inventive concepts: “__builtin_vpi_alloc(GCOV_COUNTER_TYPE gt, void *this, const char *fname = 0, gcov_unsigned seq_id = 0);”
[33] In this generic declaration, “gt” represents the type of counter. The parameter “this” is an optional parameter that represents a pointer to the object being profiled, “fname” is the optional parameter that represents the name of the base address pointer field, and “seq_id” is the optional sequence id used when multiple values of the same type are profiled at the same site.
[34] Profile counter allocation can cause a counter to be allocated in an array that may be referred to as “counter_array[site_index]”. GCC allocates counter arrays to hold the counter values of whatever is profiled. The GCC built-in language extensions can cause the compiler to allocate space in the appropriate array for the counters required by the specified built-in. For example, the compiler might initialize the base address field as follows: “counter_base_ = &counter_array[site_index];”
[35] In this generic assignment, the counter base receives the address of the entry allocated to the counter in the compiler's counter array. “site_index” refers to the index allocated to this counter.
USER-DIRECTED VALUE PROFILE INSTRUMENTATION
[36] User-directed value profile instrumentation instructs the compiler to inject code that profiles specified values during profile training runs. An exemplary user-directed value profile instrumentation interface can accept four parameters: (1) a counter type, (2) a parameter, or value, to be profiled, (3) an optional pointer to the object being profiled, and (4) an optional parameter giving the name of the counter's base address pointer field, which can be used when there are multiple values profiled at the same site.
For example, an instrumentation interface can be instantiated using a new GCC built-in language extension as below, based on aspects of the inventive concepts: “void __builtin_vpi(GCOV_COUNTER_TYPE gt, gcov_type v, void *this = NULL, const char *fname = 0);”
[37] In this generic declaration, “gt” represents the counter type and “v” is the parameter, or value, to profile. The parameter “this” is an optional pointer to the object being profiled, and “fname” is the optional name of the base address pointer field.
USER-DIRECTED VALUE PROFILE TRANSFORMATION
[38] User-directed value profile transformation instructs a compiler to perform a transformation on the profiled parameter, or value, based on the profile data for the parameter. An example user-directed value profile transformation interface can accept four parameters: (1) a counter type, (2) the transformation to be performed, (3) the parameter or value on which the transformation is performed, and (4) an optional sequence id. For example, a transformation interface can be instantiated using a new GCC built-in language extension as below, based on aspects of the inventive concepts: “__builtin_vpt(GCOV_COUNTER_TYPE gt, GCOV_VPT_TYPE vptt, gcov_type a, gcov_unsigned seq_id = 0);” This interface can be used to instruct the compiler to perform a value profile transformation “vptt” on the value “a” using the counter of type “gt”.
[39] In this generic declaration, “gt” represents the type of counter to use and “vptt” is the type of transformation to perform. The parameter “a” is the parameter, or value, to transform, and “seq_id” is an optional sequence id.
PROFILE COUNTER ADDRESS TRACKING
[40] A compiler annotation can be used to specify a new base address field. This attribute should be applied to a non-static field declaration.
For example, a base address field attribute can be specified using a special-purpose declaration attribute like the following: “__attribute__((gcov_counter_base));”
[41] A counter declaration might be as follows: “gcov_type *counter_base __attribute__((gcov_counter_base));”
[42] Attaching the counter base attribute to a counter declaration allows the compiler to generate code that initializes the base address field corresponding to the counter right after the counter is allocated. For example, the compiler can initialize the base address field using the following assignment, which was described above: “counter_base_ = &counter_array[site_index];”
USER-DIRECTED PROFILE INITIALIZATIONS
[43] In some cases, a software developer may not want to use the predefined counters and update sequences provided by a compiler and compiler runtime. A software developer can initialize a profile instrumentation routine by implementing a profile initialization function. This function can be treated as a static initialization routine invoked by an .init function in the instrumentation build. In the optimization build, the function will be parsed and discarded. In an exemplary embodiment, a profile initialization function can be specified using a special-purpose declaration attribute like the following: “__attribute__((profile_init))”
[44] An example profile initialization routine can be declared as follows: “static void profile_init(void) __attribute__((profile_init));”
[45] Attaching the profile initialization attribute to a function declaration tells the compiler that the declared function defines profile initialization. When this attribute is applied to a function declaration, the function is treated as a static initialization routine invoked by an .init function in an instrumentation build (201). In an optimization build (206), the function will be parsed and discarded.
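The “profile_init” attribute described above is specific to this framework. As a loose analogy only, the existing GCC and Clang `constructor` function attribute also causes a function to run during program startup, before `main`. The sketch below uses that existing attribute in place of the proposed one; the variables `g_profile_ready` and `g_counter` are hypothetical, and the sketch does not reproduce the "parsed and discarded in the optimization build" behavior.

```cpp
#include <cassert>

// Hypothetical profiling state. Both variables are constant-initialized,
// so they hold these values before any constructor function runs.
static bool g_profile_ready = false;
static long g_counter = -1;

// Declared in the same shape as the patent's example, but with the
// existing constructor attribute substituted for profile_init.
static void profile_init() __attribute__((constructor));

static void profile_init() {
    g_counter = 0;           // initialize the user-defined counter
    g_profile_ready = true;  // a real routine would also register a handler
}

// Reports whether the startup routine has already run.
bool profile_is_ready() { return g_profile_ready; }
```

Because the routine runs before `main`, application code never has to call it explicitly, which matches the static initialization behavior the text describes for the instrumentation build.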
USER-DIRECTED PROFILE UPDATES
[46] When a software developer does not use the predefined counters and update sequences provided by a compiler and compiler runtime, the developer may need to update the profiled value of a parameter as the parameter is profiled in the training run. A software developer can invoke an update function at the appropriate place in the code where the parameter value changes. An invocation function can accept three parameters: (1) a pointer to an update function, (2) a pointer to the user-defined counter variable, and (3) the parameter, or value, to be profiled. An example method can be defined as follows: “void __builtin_invoke(void (*updater)(void *), void *profile_counter, gcov_type data);”
[47] In this generic declaration, “updater” points to the user-defined update function, “profile_counter” is a pointer to the user-defined counter variable, and “data” is the value, or parameter, to be profiled. In an instrumentation build, this built-in can be expanded into a call: “updater(profile_counter, data)”. In an optimization build, the built-in may do nothing.
RUNTIME INTEGRATION OF USER-DEFINED PROFILE ANALYSIS CALLBACK ROUTINES
[48] If a software developer is not using predefined counters and update sequences to profile and update parameter values, the developer needs a way to process the collected profile data and record the preferred parameter value for each profiled parameter. The software developer can define a callback routine to process the collected profile data and record the preferred parameter value to use during an optimization build. The callback routine should be registered by the user in the profile initialization function discussed above. To register the callback routine, the developer can call a registration interface that accepts a pointer to the callback routine as a parameter. This call should be made from the profile initialization function discussed above.
In some embodiments, the callback routine registration can be declared as follows: “void __gcov_register_profile_handler(void (*handler)(void));”
[49] In this generic registration function declaration, “*handler” represents a pointer to the user-defined handler function, or callback routine.
RECORDING OF USER-DEFINED TRANSFORMATION DECISIONS
[50] As noted above, if a software developer is not using predefined counters and update sequences to profile and update parameter values, the developer needs a way to process the collected profile data and record the preferred parameter value for each profiled parameter. Recording values can be done with an interface that accepts two parameters: (1) a macro name and (2) the preferred value. For example, a recording interface can be instantiated using the following API:
[51] “void gcov_record_parameter_value(const char *macro_name, gcov_type optimal_value);”
[52] In this generic declaration, “macro_name” represents the name of the source macro to be updated with the profile-optimized value “optimal_value” in the feedback-directed optimization build (206). The parameter “optimal_value” is the preferred value to which the designated macro should be set. This interface records the mapping between “macro_name” and “optimal_value” for use in the optimization build.
USE OF THE FRAMEWORK
[53] The pieces of an example framework can be used together to allow a software developer to define and use several different classes of value profiling, which include: user-directed per-class global value profiling, user-directed per-site value profiling, and user-directed per-site object value profiling.
USER-DIRECTED PER-CLASS GLOBAL VALUE PROFILING
[54] User-directed per-class global value profiling is used to profile and optimize specific parameter values for an application.
This type of profiling can be particularly beneficial when an application uses a library that is shared by multiple otherwise unrelated programs. A library is a collection of code, commonly shared by different programs, that provides certain behaviors through a well-defined interface. A typical library usually has many parameters that control runtime performance. However, most libraries either do not tune these parameters or tune them for some, but not all, of the applications that use the library. Keeping a different set of parameters for each application that uses the library is not practical because parameter values can become obsolete over time.
[55] This type of profiling may require a software developer to keep track of parameter counters independently of the compiler. A compiler may not understand which critical values of an application should be tracked or how to track them. Therefore, the user must keep track of the parameter counters.
[56] In order to use user-directed per-class global value profiling to obtain application-specific parameter values for a particular application, a software developer may use several pieces of the example framework, including a user-directed profile initialization function, a registered user-defined callback routine, a user-defined update function, and a method for recording user-defined transformation decisions. By using these pieces of the framework, a software developer can embed the decision process for choosing preferred parameters in the compiler-based feedback-directed optimization loop.
[57] An example method for user-directed per-class global value profiling starts with a user specifying value profile counter types and defining value profile instrumentation routines rather than relying on conventional predefined counters and update sequences, as shown in Figure 4 (401, 402).
[58] To profile a parameter, a software developer can first construct a counter.
Then, the user can create a profile initialization function. This function can be designated by adding a profile initialization attribute to the function declaration as discussed above. When a profile initialization attribute is applied to a function declaration, the function can be treated as a static initialization routine invoked by an .init function in the instrumentation build. In the optimization build, the function can be parsed and discarded. Within the profile initialization function, the developer can initialize the counter to profile a particular parameter.
[59] In order to update the profiled parameter value as it changes in an instrumentation build, a software developer invokes a profile update function at a location where the counter value should be updated (404). A software developer may use a GCC built-in language extension or some other code to invoke a user-defined profile update function at the designated location. As discussed above, the invocation function can expand into a call to the update function. This update function records the profiled parameter values in the instrumentation build.
[60] A profile handler function can also be written by the software developer to process the counter profile data and record the preferred parameter value determined for the profiled parameter, for use during the optimization build. The profile handler function can be registered as a callback routine in the profile initialization function (403). The software developer can also call a recording method as discussed above to record user-defined transformation decisions (405).
[61] For example, as depicted in Figure 5, a software developer may wish to determine the preferred buffer size in library code for a specific application. The buffer size parameter value is first set in a macro to be 32 (501).
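The flow just described (initialize a counter, register a handler, update the counter as values are observed, then let the handler record a preferred value) can be sketched in ordinary C++. This is a minimal simulation under stated assumptions, not the code of Figure 5: the runtime interfaces are replaced by hypothetical stand-ins (`register_profile_handler`, `record_parameter_value`), the training run is reduced to a loop over observed sizes, and the handler simply averages them. The macro name "BUF_SIZE" and the default of 32 follow the example in the text.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// User-defined counter: running sum and number of profiled buffer sizes.
struct ProfCounter {
    std::int64_t sum = 0;
    std::int64_t count = 0;
};

static ProfCounter g_counter;
static void (*g_handler)() = nullptr;                   // registered callback
static std::map<std::string, std::int64_t> g_recorded;  // macro name -> value

// Hypothetical stand-ins for the registration and recording interfaces.
void register_profile_handler(void (*handler)()) { g_handler = handler; }
void record_parameter_value(const std::string& macro_name, std::int64_t value) {
    g_recorded[macro_name] = value;
}

// User-defined update function, called wherever the buffer is resized.
void profile_update(ProfCounter* counter, std::int64_t size) {
    assert(counter != nullptr);
    counter->sum += size;
    ++counter->count;
}

// User-defined handler: choose the average observed size as preferred value.
void profile_handler() {
    if (g_counter.count == 0) return;  // nothing profiled, record nothing
    record_parameter_value("BUF_SIZE", g_counter.sum / g_counter.count);
}

// Profile initialization: reset state and register the handler.
void profile_init() {
    g_counter = ProfCounter{};
    g_recorded.clear();
    register_profile_handler(profile_handler);
}

// Simulated training run followed by the handler callback. Returns the
// recorded value, or the default of 32 when no value was recorded.
std::int64_t train_and_pick_buffer_size(const std::vector<std::int64_t>& sizes) {
    profile_init();
    for (std::int64_t s : sizes) profile_update(&g_counter, s);
    if (g_handler != nullptr) g_handler();
    auto it = g_recorded.find("BUF_SIZE");
    return it == g_recorded.end() ? 32 : it->second;
}
```

Averaging is only one possible policy; because the handler is user code, a developer could just as well pick a maximum or a percentile.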
[62] In order to determine and set a preferred buffer size for a specific application, a software developer can use aspects of an example framework to augment the code and profile the buffer size.
[63] As shown in Figure 5, a software developer constructs a counter, ProfCounter (502). Then, a user-defined profile instrumentation routine can be designated by including a profile initialization attribute in the function definition. In Figure 5, “profile_init” is designated as the profile instrumentation routine using “__attribute__((profile_init))” (503). The software developer then initializes the counter within the profile initialization function.
[64] The software developer can then define an update method to update the profiled parameter value when the parameter value changes in an instrumentation build. As shown in the buffer example in Figure 5, a software developer invokes the profile update function within a resize method that resizes the buffer (509). In this invocation, “profile_update” is the user-defined update function, “counter_” is the user-defined counter variable, and “s” is the data value to be profiled. In an instrumentation build, this invocation can expand into the call: “profile_update(counter_, s)”
[65] A software developer can then define a profile handler function that processes the profile data from the instrumentation build and determines a preferred buffer size (507). This profile handler function records the preferred buffer size in a profile database and uses a mechanism to pass the macro to a parser for use during the optimization build. The compiler may read the gcov data file (GCDA) early in the optimization build process. This reading code can be enhanced to understand the new macro parameter types and communicate the macro parameters to the part of the compiler that parses the source code.
In this case, the “BUF_SIZE” macro receives the preferred parameter value “optimal_buffer_size” when the profile handler function is executed during the optimization build. The “optimal_buffer_size” is determined by processing all recorded buffer sizes. For example, processing might include averaging the recorded buffer sizes, and the preferred buffer size might be the computed average.
[66] The profile handler can then be registered as the callback routine to be executed during the optimization build (505). In this example, the software developer passes the address of the profile handler function, “profile_handler”, to the registration method.
USER-DIRECTED PER-SITE VALUE PROFILING
[67] User-directed per-site value profiling can, for example, determine preferred values for function call parameters, operands of expensive operations, or loop bounds. For per-site value profiling, the instrumentation site and the transformation site are at the same location, which means there is no need for analysis callbacks.
[68] In order to support user-directed per-site value profiling, a software developer may use several pieces of the example framework, including: user-directed profile counter allocation, user-directed value profile instrumentation, and user-directed value profile transformation. As illustrated in Figure 6, an exemplary user-directed profile counter allocation interface can be used to instruct a compiler to allocate an entry in the static counter array for the specified counter during instrumentation (601).
[69] An example user-directed value profile instrumentation support interface can be used to instruct the compiler which type of counter to use and which value to profile when value profiling parameters during an instrumentation build (602).
[70] An example framework can also provide a user-directed value profile transformation support interface that can be used in optimization builds (604).
This interface can be used to instruct a compiler to perform a value profile transformation on a given parameter using the counter value with the specified counter type.
[71] Figure 7 illustrates an example of value profiling and multiversion transformation for the first parameter of the call to a function, “foo”. The code is first augmented for parameter profiling. A call is inserted for profile counter allocation. The profile counter allocated in this example is a counter that finds the N most frequent values, “_GCOV_TOPN_VAL” (702). The user-directed value profile instrumentation support interface specifies that the compiler should instrument the code to find the most frequent values of “a” using the “_GCOV_TOPN_VAL” counter (704). The user-directed value profile transformation support interface specifies that the compiler should perform a multiversion transformation, “_GCOV_MULTI_VER”, on the value “a” using the “_GCOV_TOPN_VAL” counter (706).
USER-DIRECTED PER-SITE OBJECT VALUE PROFILING
[72] User-directed per-site object value profiling can be used to perform value profiling and transformation for objects instantiated at the same static site. To support user-directed per-site object value profiling, a software developer may use several pieces of the example framework: user-directed profile counter allocation, user-directed value profiling instrumentation, profile counter address tracking, and user-directed value profiling transformation.
[73] An example method for this type of profiling starts with augmenting the code for an instrumentation compilation. A compiler annotation can be used to specify a new base address field that tracks the address of a profile counter. This attribute should be applied to a non-static field declaration.
For example, a base address field attribute can be specified using a special-purpose declaration attribute such as “__attribute__((gcov_counter_base));”.
[74] Code can also be inserted for profile counter allocation. Instrumentation code can then be inserted to profile the desired parameter or value. Finally, code can be inserted to perform the profile transformation.
[75] For example, Figure 8a depicts a vector class that pre-allocates a vector size. The new vector is created with a size of 10. However, 10 may not be the preferred vector size. To determine the amount of space to reserve for the vector in the “foo” function and allocate the object appropriately, a software developer can use user-directed per-site object value profiling.
[76] Figure 8b illustrates the process for conducting user-directed per-site object value profiling to determine the preferred length of a vector. First, the class should be augmented in order to profile the length parameter.
[77] In this example, a macro is defined for the instrumentation compilation, “PROFILE_GENERATE” (802). A compiler annotation is used to specify a new base address field, shown in Figure 8b as the counter base field “counter_base_” (802).
[78] Code is then inserted for the profile counter allocation. The profile counter allocated in this example is a counter that keeps a running sum and count of the profiled value, using the standard counter “_GCOV_COUNTER_AVERAGE” (804). In this example, the allocation uses “this”, which is a pointer to the vector object being profiled.
[79] The next step is to insert instrumentation code to record the final length of the vector. The final length can be recorded in the vector destructor since, at the point of destruction, the code knows the final length that was needed for the vector.
As illustrated in Figure 8b, the user-directed value profile instrumentation support interface specifies that the compiler should instrument the code to keep a running sum and count of the length using a standard counter of type “_GCOV_COUNTER_AVERAGE” (808).
[80] After the instrumentation code that profiles the final vector length is inserted, code should be inserted to perform a transformation on the vector length parameter to determine the preferred vector length. The transformation takes place during the optimization build using the profile data produced by the instrumentation build. As illustrated in Figure 8b, the user-directed value profile transformation support interface specifies that the compiler should average the lengths using the standard GCOV average transformation, “_GCOV_AVERAGE_VAL”, and the values of “n” received from the standard counter “_GCOV_COUNTER_AVERAGE” during the training run (806).
[81] This average value is then sent to the profile database, or GCDA file, and used as the preferred length parameter for vector instances created at the instrumentation site. Every time the compiler generates code for a new vector, a built-in can tell the compiler to set the vector's size to the average value from the profile data. This type of profiling and optimization takes advantage of the fact that profiling sites, the places in the code where the transformations are performed, are inlined in many places. As a result, a context-sensitive, inline-site-specific profile may be collected and generated at each inline site. In this case, vectors instantiated in different places throughout the application code can get length values based on different profiles rather than the library default.
[82] A further example of user-directed per-site object value profiling is illustrated in Figures 9a and 9b. This example is a continuation of the example discussed above in connection with Figures 8a and 8b.
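
The Figure 8b flow (allocate a per-site counter, record the final length in the destructor, then use the average as the initial size) can be imitated in ordinary C++. This is a minimal sketch under stated assumptions rather than the patent's code: AverageCounter, ProfileDb, ProfiledVector, and the string site names are hypothetical, the instrumentation and optimization behavior are folded into one class for brevity, and a std::map stands in for the profile database. Only the running sum and count, the destructor as the recording point, and the fallback size of 10 come from the text.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Running sum and count, which is what the text says the average counter keeps.
struct AverageCounter {
    std::size_t sum = 0;
    std::size_t count = 0;

    void Record(std::size_t v) {
        sum += v;
        ++count;
    }

    // Integer average of the recorded values, or `fallback` with no data.
    std::size_t Average(std::size_t fallback) const {
        return count == 0 ? fallback : sum / count;
    }
};

// Hypothetical profile store: one counter per instantiation site.
std::map<std::string, AverageCounter>& ProfileDb() {
    static std::map<std::string, AverageCounter> db;
    return db;
}

// Hypothetical vector wrapper standing in for the augmented class.
class ProfiledVector {
public:
    explicit ProfiledVector(const std::string& site)
        : counter_base_(&ProfileDb()[site]) {
        assert(!site.empty() && "each instantiation site needs a name");
        // Optimization-build behavior: reserve the profiled average, or 10
        // (the hard-coded size of Figure 8a) when no profile exists yet.
        items_.reserve(counter_base_->Average(10));
    }

    ~ProfiledVector() {
        // Instrumentation-build behavior: the final length is known here.
        counter_base_->Record(items_.size());
    }

    void PushBack(int v) { items_.push_back(v); }
    std::size_t Capacity() const { return items_.capacity(); }

private:
    AverageCounter* counter_base_;  // plays the role of the base address field
    std::vector<int> items_;
};
```

Because each site name owns its own counter, two call sites that build vectors of very different lengths end up with different reserved sizes, which is the per-site effect the text describes.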
In addition to determining the length of the vector, the amount of space to pre-allocate at the beginning of an insert operation can be determined. Typically, an insert operation increases the length of a vector one element at a time, but this amount of increase may not be enough. Figure 9a illustrates conventional vector code.
[83] Figure 9b illustrates augmented code that allows the automated process to conduct user-directed per-site object value profiling to determine the preferred length of a vector and the amount of space that should be pre-allocated at the beginning of an insert operation.
[84] In this example, a macro is defined for the instrumentation compilation, “PROFILE_GENERATE”. Two counters are defined, one for the vector length and one for the insert size. Each counter is defined with a compiler annotation that specifies a new base address field. Figure 9b shows the compiler annotation “__attribute__((gcov_counter_base))” on “counter_base_constr_” and “counter_base_insert_” (902).
[85] Code is then inserted for the profile counter allocation. The counter for the vector length is allocated first, followed by the counter for the insert size. Both counters are of the standard gcov counter type that keeps a running sum and count, “_GCOV_COUNTER_AVERAGE” (904). In this example, both allocations use “this”, a pointer to the vector object being profiled. The optional parameter “fname” distinguishes the names of the base address fields defined in the macro, “counter_base_constr_” and “counter_base_insert_”. The field name is used to initialize the appropriate counter, but it is passed as a string to avoid parsing errors when not in the instrumentation build. This optional parameter is required when multiple values are profiled at the same site. The additional optional parameter “seq_id” is used when multiple values of the same type are profiled at the same site, as is the case in this example.
In this case, the sequence ids are set to 1 and 2, respectively.
[86] The next step is to insert user-directed value profile instrumentation code to profile the final vector length. The final length can be recorded in the vector destructor since, at the point of destruction, the code knows the final length that was needed for the vector (906). As illustrated in Figure 9b, the instrumentation support interface specifies that the compiler should instrument the code to record the current vector length.
[87] Instrumentation code should also be inserted to profile the number of elements inserted into the vector (908). The inserted length can be recorded in the vector insert function. As illustrated in Figure 9b, the instrumentation support interface specifies that the compiler should instrument the code to record the insert length. The field name is passed in both uses of the user-directed value profile instrumentation support interface so that updates go to the correct counters. Field names are strings to avoid parsing errors when an instrumentation compilation is not performed.
[88] After the user-directed value profile instrumentation code for the final vector length and the inserted length is in place, code should be inserted to perform the value profile transformations (910). As illustrated in Figure 9b, a user-directed value profile transformation support interface specifies that the compiler should average the lengths using the standard gcov averaging transformation, “_GCOV_AVERAGE_VAL”, and the values of “n” from the counter type “_GCOV_COUNTER_AVERAGE” with a sequence_id of 1. A second transformation support interface specifies that the compiler should average the inserted lengths using the inserted length values from the counter type “_GCOV_COUNTER_AVERAGE” with a sequence_id of 2.
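
The role of the sequence id when two counters of the same type share one site can be shown with a small lookup table. This is a minimal sketch under stated assumptions: CounterKey, Counters, RecordValue, and AverageValue are hypothetical helpers, a std::map stands in for the profile database, and only the idea of pairing a site with sequence ids 1 (vector length) and 2 (insert length) comes from the text.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Running sum and count for one profiled value.
struct AverageCounter {
    std::size_t sum = 0;
    std::size_t count = 0;
};

// Hypothetical key: instantiation site plus sequence id.
using CounterKey = std::pair<std::string, int>;

std::map<CounterKey, AverageCounter>& Counters() {
    static std::map<CounterKey, AverageCounter> counters;
    return counters;
}

// Instrumentation side: add one observation to the counter that the
// (site, seq_id) pair selects.
void RecordValue(const std::string& site, int seq_id, std::size_t value) {
    assert(seq_id >= 1 && "sequence ids in this sketch start at 1");
    const CounterKey key(site, seq_id);
    AverageCounter& c = Counters()[key];
    c.sum += value;
    ++c.count;
}

// Transformation side: read back the integer average for the same pair,
// or `fallback` when that counter holds no data.
std::size_t AverageValue(const std::string& site, int seq_id,
                         std::size_t fallback) {
    const CounterKey key(site, seq_id);
    const auto it = Counters().find(key);
    if (it == Counters().end() || it->second.count == 0) return fallback;
    return it->second.sum / it->second.count;
}
```

With this layout, a transformation that asks for sequence id 2 can never pick up the vector-length counter by mistake, even though both counters have the same type and live at the same site.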
The sequence_id values are used to associate each transformation site with the correct allocation site so that the proper counter value from the profile database, or GCDA file, is used in the correct optimization decision.
[89] These average values are then sent to the profile database, or GCDA file, and used as the preferred length parameter for the vector and the preferred pre-allocation size for inserting elements. This type of profiling and optimization takes advantage of the fact that profiling sites, the places in the code where the transformations are performed, are inlined in many places. As a result, a context-sensitive, inline-site-specific profile may be collected and generated at each inline site. In that case, vectors instantiated in different places throughout the application code can get different profile-based length values instead of the library default. Additionally, the pre-allocation size can be determined on a per-site basis.
[90] Other examples of aspects of the inventive concepts include value profiling for costly operations such as divisions or multiplications, and value profiling for loop iteration bounds.
[91] An example framework can be used to tune libraries, such as the STL and memory allocators, on a per-application basis using application-specific profile information. This performance tuning can translate into very large performance improvements and machine savings. The framework integrates directly with the FDO framework, which makes application tuning automatic and removes the possibility of stale parameters. Furthermore, an example framework can be used to tune code behavior for non-performance purposes such as memory consumption and memory fragmentation.
[92] Figure 10 is a high-level block diagram of an example computer (1000) that is arranged to create user-directed profile-driven optimizations. In a very basic configuration (1001), the computing device (1000) typically includes one or more processors (1010) and system memory (1020).
A memory bus (1030) may be used for communication between the processor (1010) and system memory (1020).
[93] Depending on the desired configuration, the processor (1010) can be of any type including, without limitation, a microprocessor (μP), a microcontroller (μC), a digital signal processor (DSP), or any combination thereof. The processor (1010) can include one or more levels of caching, such as a level 1 cache (1011) and a level 2 cache (1012), a processor core (1013), and registers (1014). The processor core (1013) may include an arithmetic logic unit (ALU), a floating point unit (FPU), a digital signal processing core (DSP core), or any combination thereof. A memory controller (1016) may also be used with the processor (1010), or in some implementations the memory controller (1015) may be an internal part of the processor (1010).
[94] Depending on the desired configuration, system memory (1020) can be of any type including, without limitation, volatile memory (such as RAM), non-volatile memory (such as ROM, flash memory, etc.), or any combination thereof. System memory (1020) typically includes an operating system (1021), one or more applications (1022), and program data (1024). The application (1022) may include a method for creating and obtaining application-specific profile-driven optimizations for the application. Program data (1024) includes stored instructions that, when executed by the one or more processing devices, implement a method for code optimizations (1023). In some embodiments, the application (1022) may be arranged to operate with program data (1024) on an operating system (1021).
[95] The computing device (1000) may have additional features or functionality, and additional interfaces to facilitate communications between the basic configuration (1001) and any required devices and interfaces.
[96] System memory (1020) is an example of a computer storage medium.
Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by the computing device (1000). Any such computer storage media may be part of the device (1000).
[97] The computing device (1000) may be implemented as a portion of a small form factor portable (or mobile) electronic device such as a cell phone, a smart phone, a personal data assistant (PDA), a personal media player, a tablet computer, a wireless web-watch device, a personal microphone headset device, an application-specific device, or a hybrid device that includes any of the above functions. The computing device (1000) can also be implemented as a personal computer, including both laptop and non-laptop configurations.
[98] The preceding detailed description has set forth various embodiments of devices and/or processes through the use of block diagrams, flowcharts, and/or examples. Insofar as these block diagrams, flowcharts, and/or examples contain one or more functions and/or operations, it will be understood by those skilled in the art that each function and/or operation in such block diagrams, flowcharts, or examples may be implemented, individually and/or collectively, by a wide range of hardware, software, firmware, or virtually any combination thereof. In one embodiment, various portions of the subject matter described in the present invention may be implemented via application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), digital signal processors (DSPs), other integrated formats, or as a web service.
However, those skilled in the art will recognize that some aspects of the embodiments described in the present invention, in whole or in part, may be equivalently implemented in integrated circuits, as one or more computer programs running on one or more computers, as one or more programs running on one or more processors, as firmware, or as virtually any combination thereof, and that designing the circuitry and/or writing the code for the software and/or firmware would be well within the skill of a person skilled in the art in light of this description. Furthermore, those skilled in the art will understand that the mechanisms of the subject matter described in the present invention are capable of being distributed as a program product in a variety of forms, and that an illustrative embodiment of the subject matter described in the present invention applies regardless of the particular type of non-transient signal-bearing medium used to actually carry out the distribution. Examples of a non-transient signal-bearing medium include, without limitation, the following: a recordable-type medium such as a floppy disk, a hard disk drive, a Compact Disc (CD), a Digital Video Disc (DVD), a digital tape, a computer memory, etc.; and a transmission-type medium such as a digital and/or analog communication medium (e.g., a fiber optic cable, a waveguide, a wired communications link, a wireless communications link, etc.).
[99] With respect to the use of substantially any plural and/or singular terms in the present invention, those skilled in the art may translate from the plural to the singular and/or from the singular to the plural as appropriate to the context and/or application. The various singular/plural permutations may be expressly set forth in the present invention for the sake of clarity.
[100] Thus, particular embodiments of the subject matter have been described.
Other embodiments are within the scope of the claims presented below. In some cases, the actions cited in the claims may be performed in a different order and still achieve the desired results. Furthermore, the processes described in the accompanying figures do not necessarily require the specific order depicted or the sequential order to obtain desirable results. In certain implementations, multitasking and parallel processing can be advantageous.
Claims:
Claims (17)
[0001] 1. Method for using profiling to determine application-specific values for an application, the method being characterized in that it comprises: profiling, by a computer, one or more application-specific parameters for which to determine at least one application-specific value (301) and collecting profile data (303) by performing operations of: generating, by the computer, an instrumentation binary from an instrumentation compilation, the instrumentation binary containing at least one user-defined callback routine registered by the user in a profile initialization function; executing, by the computer, a training run with one or more representative workloads using the instrumentation binary, the training run invoking the at least one user-defined callback routine (403) to record the at least one application-specific value in the collected profile data; analyzing, by the computer, the collected profile data (304) using a set of standard value profile transformations; and generating, by the computer using the collected profile data, an optimized binary using the at least one application-specific value for the profiled application-specific parameter recorded in the profile data collected by the invoked callback routine.
[0002] 2. Method according to claim 1, characterized in that the profile initialization function is specified using a special-purpose declaration attribute.
[0003] 3. Method according to claim 1, characterized in that the profiling is a per-class global value profiling.
[0004] 4. Method according to claim 1, characterized in that the profiling is a per-site value profiling.
[0005] 5. Method according to claim 1, characterized in that the profiling is a per-site object value profiling.
[0006] 6. Method for per-class global parameter value profiling, the method being characterized in that it comprises: initializing, by a computer, within a user-defined profile instrumentation initialization routine (402), a counter to profile one or more parameter values and registering a profile handler function as a user-defined analysis callback routine (403); generating, by the computer, an instrumentation binary from an instrumentation build, the instrumentation binary containing at least one user-defined callback routine registered by the user in a profile initialization function; executing, by the computer, a training run with one or more representative workloads using the instrumentation binary, the training run performing operations of: executing, by the computer, a profile update function to update the profiled parameter value at a code location where the counter value should be updated (404); and executing the user-defined analysis callback routine to record a preferred parameter value of the profiled parameter values; and generating, by the computer using profile data collected during the training run, an optimized binary.
[0007] 7. Method according to claim 6, characterized in that the counter is allocated an entry in a static counter array.
[0008] 8. Method according to claim 7, characterized in that the counter is allocated in the static counter array using a compiler extension.
[0009] 9. Method according to claim 6, characterized in that the address of the profile counter is specified using a special-purpose declaration attribute.
[0010] 10. Method according to claim 6, characterized in that the profile initialization function is specified using a special-purpose declaration attribute.
[0011] 11. Method according to claim 6, characterized in that the profile handler function is registered using a GCC (GNU Compiler Collection) interface.
[0012] 12. Method according to claim 6, characterized in that the preferred parameter value of the profiled parameter values is recorded using a GCC (GNU Compiler Collection) interface.
[0013] 13. Non-transient computer-readable media storing instructions, characterized in that the instructions, when executed by one or more processors, cause the one or more processors to: profile one or more application-specific parameters for which to determine at least one application-specific value and collect profile data by performing the operations of: generating an instrumentation binary from an instrumentation build, the instrumentation binary containing at least one user-defined callback routine registered by the user in a profile initialization function; executing, by the computer, a training run with one or more representative workloads using the instrumentation binary, the training run invoking the at least one user-defined callback routine (403) to record the at least one application-specific value in the collected profile data; analyzing the collected profile data (304) using a set of standard value profile transformations; and generating an optimized binary using the collected profile data.
[0014] 14. Non-transient computer-readable media according to claim 13, characterized in that the profiling is a per-class global value profiling.
[0015] 15. Non-transient computer-readable media according to claim 13, characterized in that the profiling is a per-site value profiling.
[0016] 16. Non-transient computer-readable media according to claim 13, characterized in that the profiling is a per-site object value profiling.
[0017] 17. Non-transient computer-readable media according to claim 13, characterized in that the profile initialization function is specified using a special-purpose declaration attribute.
Family patents:
Publication number | Publication date
US20180107464A1 | 2018-04-19
KR101759256B1 | 2017-07-31
CN105637480A | 2016-06-01
BR112015024334A2 | 2017-07-18
US9760351B2 | 2017-09-12
WO2014165515A1 | 2014-10-09
AU2014248296A1 | 2015-09-03
US20140298307A1 | 2014-10-02
CN105637480B | 2020-03-03
US20200019390A1 | 2020-01-16
EP2981891B1 | 2018-03-28
US10365903B2 | 2019-07-30
AU2014248296B2 | 2017-01-19
EP2981891A1 | 2016-02-10
JP6275819B2 | 2018-02-07
KR20150138290A | 2015-12-09
JP2016517109A | 2016-06-09
DE202014010942U1 | 2017-01-26
Legal status:
Date | Code | Event | Details
2018-01-02 | B25D | Requested change of name of applicant approved | Owner name: GOOGLE LLC (US)
2018-03-27 | B15K | Others concerning applications: alteration of classification | Ipc: G06F 9/00 (2018.01)
2018-11-13 | B06F | Objections, documents and/or translations needed after an examination request according [chapter 6.6 patent gazette] |
2020-03-17 | B06U | Preliminary requirement: requests with searches performed by other patent offices: procedure suspended [chapter 6.21 patent gazette] |
2021-11-16 | B09A | Decision: intention to grant [chapter 9.1 patent gazette] |
2022-01-25 | B16A | Patent or certificate of addition of invention granted [chapter 16.1 patent gazette] | Free format text: TERM OF VALIDITY: 20 (TWENTY) YEARS COUNTED FROM 01/04/2014, SUBJECT TO THE LEGAL CONDITIONS.
Priority:
Application number | Filing date | Title
US13/855,557|US9760351B2|2013-04-02|2013-04-02|Framework for user-directed profile-driven optimizations
US13/855,557|2013-04-02
PCT/US2014/032530|WO2014165515A1|2013-04-02|2014-04-01|A framework for user-directed profile-driven optimizations