Abstract:
The invention relates to a method for detecting abnormal states, in particular those caused by manipulation, in a computer network (1) comprising a plurality of computers (1a, 1b, 1c), wherein the computers (1a, 1b, 1c) create log records in the form of log lines (3a, 3b, 3c) on the occurrence of predetermined events, wherein the log lines (3a, 3b, 3c) from the individual log files (4a, 4b, 4c) are homogenized and written to a central log file (4), wherein a recoded log file (5) of the central log file (4) is created by converting consecutive characters or character strings of the central log file (4) into characters or character strings of the recoded log file (5) on the basis of a coding rule (f), wherein the individual lines (5a, 5b, 5c) of the recoded log file (5) are analyzed for their similarity and combined into groups (6a, 6b, 6c), and wherein a search is made for groups (6a, 6b, 6c) with a small number of lines (5a, 5b, 5c), in particular with only a single line (5a, 5b, 5c).
Publication number: AT518805A1
Application number: T50601/2016
Filing date: 2016-07-07
Publication date: 2018-01-15
Inventors: Fiedler Roman; Skopik Florian; Wurzenberger Markus
Applicant: Ait Austrian Institute Tech Gmbh
Main IPC class:
Description:

The invention relates to a method for the detection of abnormal conditions in a computer network according to claims 1 and 2.
It is known from the prior art to examine log files created by different processes in order to determine whether the processes described in the log files represent an abnormal state of the processes or of the computer network in which these processes take place.
In the above-mentioned methods, log files, which are usually human-readable, are analyzed for particular patterns in order to detect operating states that are unusual or unique and that indicate abnormal operating conditions. Specifically, individual methods are known from the prior art that link different, mutually associated lines of log files and thereby detect typical patterns. In particular, such a procedure is known from Austrian patent 514215.
Such approaches generally allow anomalous states in a computer network to be discovered, but they are relatively complex and require, in particular, the combination of several, sometimes widely spaced, lines, which increases the overall resources needed for analyzing the log data.
It is an object of the present invention to provide a method for detecting abnormal states in a computer network that quickly and easily finds critical or abnormal states in the computer network or in the flow of individual processes running in the computer network. The invention solves this problem with the methods defined in claims 1 and 2.
According to the invention, the following method is used to detect anomalous states in computer networks comprising several computers:
a) logs are created by the computers of the computer network or by processes running on these computers,
b) on the occurrence of predetermined events, the computers or processes create, for each of these events, a log record in the form of a log line consisting of a time stamp and a description record of the respective logged event,
c) the log lines created by the computers or processes are stored in a log file associated with the respective computer or process,
d) the log lines from the individual log files are homogenized by being written to a central log file in a uniform format, in particular with a uniform time stamp format, line by line and ordered in time on the basis of the time stamps,
e) a recoded log file of the central log file is created by converting, line by line, successive characters or character strings of the central log file into successive characters or character strings of the recoded log file on the basis of the same predetermined, in particular lossy, coding rule,
f) in particular, the recoding retains the order of the information contained in the individual characters within the description record of the individual log lines, and preferably reduces the number of symbols used to describe the content,
g) the individual lines of the recoded log file are analyzed for their similarity and combined into groups on the basis of their similarity,
h) a search is made for groups with a small number of lines, in particular with only a single line, and
i) if such lines exist, an anomalous state is identified in the computer network.
The invention also achieves the object with a method for detecting abnormal states, in particular those caused by manipulation, in a computer network comprising a plurality of computers, wherein:
a) logs are created by the computers of the computer network or by processes running on these computers,
b) on the occurrence of predetermined events, the computers or processes create, for each of these events, a log record in the form of a log line consisting of a time stamp and a description record of the respective logged event,
c) the log lines created by the computers or processes are stored in a log file associated with the respective computer or process,
d) the log lines from the individual log files are homogenized by being written to a central log file in a uniform format, in particular with a uniform time stamp format, line by line and ordered in time on the basis of the time stamps,
e) different central log files are created for different predetermined, in particular equally long, time ranges on the basis of the time stamps,
f) for each central log file, a recoded log file is created by converting, line by line, successive characters or character strings of the central log file into successive characters or character strings of the recoded log file on the basis of the same predetermined, in particular lossy, coding rule,
g) in particular, the recoding retains the order of the information contained in the individual characters within the individual log line, and preferably reduces the number of symbols used to describe the content,
h) the individual lines of the recoded log files are analyzed, in particular separately, for their similarity and combined into groups on the basis of their similarity,
i) for each group of a recoded log file, a group of the preceding recoded log file that contains similar lines is determined, and the groups are assigned to one another in this way,
j) the mutually assigned groups of two consecutive recoded log files are then compared with each other, and
k) an abnormal state is detected
- if individual groups occur only for one of the log files, or
- if the difference in the number of lines contained in mutually assigned groups exceeds a predetermined threshold.
With both methods it is easily possible to detect abnormal events or states in the respective network.
This concrete procedure makes it possible to find individual log lines that occur only rarely compared with the other log lines and therefore represent an anomalous state in the computer network.
This is ensured by the two steps according to the invention: on the one hand, the creation of a recoded log file, in which the individual log lines created by the processes are converted according to a prescribed rule, and, on the other hand, a grouping method, in which the converted log lines are grouped according to their structure.
With this method according to the invention, individual lines that do not correspond to the usual patterns can easily be found, and abnormal states can be detected in this way. Alternatively, the invention also makes it possible to create individual log files for different time periods, to create a separate recoded log file for each of them, and to examine each recoded log file separately with a grouping method. In this case, the investigation is directed at changes in the groups.
A preferred embodiment of the invention enables, after a first grouping has been carried out, a real-time analysis of the individual recorded log lines and a rapid detection of abnormal states during operation. It provides that, after the log lines have been grouped on the basis of a given central log file, new log lines are obtained from the computers or processes during operation of the computer network, - for each of the log lines thus obtained, successive characters or character strings of the log line are converted into successive characters or character strings of a recoded log line on the basis of the coding rule used, - it is examined whether the recoded log line can be assigned to one of the groups already created, and - if this is not the case, an abnormal state is detected.
A particularly simple implementation of the grouping method is obtained if the grouping method is based on sequence alignments, wherein, to determine how similar or dissimilar two recoded log lines of the recoded log file are, the sequence alignment of these recoded log lines is calculated, wherein, for the numerical determination of the similarity, the Levenshtein distance of the two recoded log lines under consideration is calculated, in particular on the basis of the sequence alignment, and is then normalized by the length of the sequence alignment, so that the similarity measures of the recoded log lines preferably lie on a scale of 0 to 1, and wherein, in particular, two recoded log lines are considered similar if their similarity measure exceeds a predetermined threshold.
To comply with data protection requirements, the method provides that, during the recoding of the log lines, individual substrings of the log lines that contain information related to the computer or the user of the computer, in particular IP addresses and user names, are selected according to predetermined criteria and written to the recoded log file in anonymized or pseudonymized form, while the remaining substrings are written to the recoded log file unchanged or on the basis of the coding rule.
An advantageous preselection of relevant information, which simplifies anomaly detection, provides that the coding rule is selected such that certain substrings that meet predetermined criteria, in particular substrings that occur with a frequency below a threshold, are transformed by the coding rule without loss of information, while the remaining parts of the text are transformed lossily.
A further data reduction provides that character strings or substrings that occur in the log files with a frequency exceeding a threshold are identified and are discarded or lossily transformed by the coding rule, while the remaining parts of the text are transformed without loss.
An advantageous recoding of log lines provides that only the description records of the individual log lines are recoded, each recoded log line being created by indexing the respective log line with a number that contains, among other things, the time stamp and, in particular, allows an unambiguous conversion back to the log line from which the recoded log line was created, and only the recoded description record of a recoded log line being used for grouping.
An advantageous data reduction provides that the coding rule is selected such that individual characters or character strings, which are in particular written in plain text using preferably 96 characters, are each transformed into a character of a target alphabet, preferably with the exceptions given in claims 6 and 7, the target alphabet preferably having no more than 20 different symbols and/or fewer than 1/4 of the different symbols occurring in the central log file; this in particular results in a data reduction that accelerates the grouping method.
When the target alphabet is reduced to a total of 20 symbols, grouping algorithms developed in bioinformatics, which routinely work with 20 input symbols corresponding to the individual canonical amino acids, can be used effectively.
An advantageous marking of log lines in which anomalous states occur provides that each line of the log file is assigned the respective recoded line of the recoded log file that was created by applying the coding rule to that line, that those recoded lines of the recoded log file are identified which - are contained in groups with only one recoded line or a small number of recoded lines, and/or - are contained in groups to which no group of the preceding recoded log file could be assigned, and/or - are contained in groups in which the absolute difference between the number of recoded lines contained in the respective group and the number of recoded lines contained in an associated group exceeds a predetermined threshold, and that those lines of the log file or of the central log file to which the recoded lines thus identified have been assigned are marked as indicators of an anomalous state.
This allows a simple check of the abnormal state or a simple error handling.
A preferred embodiment of the invention will be described in more detail with reference to the following drawing figures.
Fig. 1 shows a computer network 1 consisting of computers 1a, 1b, 1c, on which a plurality of processes 2a, 2b, 2c run. At different times, the processes 2a, 2b, 2c create log messages in the form of log lines 3a, 3b, 3c, each of which is stored in a log file 4a, 4b, 4c. The individual log lines 3a, 3b, 3c are usually written to the log files 4a, 4b, 4c in the order of their arrival.
The log lines 3a, 3b, 3c from the individual log files, each consisting of a time stamp 31a, 31b, 31c and a description data set 32a, 32b, 32c, are homogenized, i.e. converted into a uniform format. During homogenization, the time stamps 31a, 31b, 31c of log lines 3a, 3b, 3c from different sources are converted into a uniform format; among other things, the format of the time stamps 31a, 31b, 31c is standardized so that they can be compared. The individual log lines 3a, 3b, 3c of the log files 4a, 4b, 4c are combined to form a single central log file 4, the individual log lines 3a, 3b, 3c being sorted according to the time encoded in the time stamps 31a, 31b, 31c.
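The homogenization step can be sketched as follows. This is only an illustrative sketch under stated assumptions: the function name `homogenize`, the representation of each source as a pair of time stamp format and parsed lines, and the choice of ISO 8601 as the uniform time stamp format are all hypothetical and are not prescribed by the description.

```python
from datetime import datetime

def homogenize(sources):
    """Merge log lines from several sources into one time-ordered list.

    sources: list of (timestamp_format, [(raw_timestamp, description), ...]).
    The per-source strptime format is an assumption of this sketch.
    """
    merged = []
    for fmt, lines in sources:
        for raw_ts, description in lines:
            # Parse the source-specific time stamp into a comparable value.
            merged.append((datetime.strptime(raw_ts, fmt), description))
    # Order all lines by time; Python's sort is stable for equal time stamps.
    merged.sort(key=lambda entry: entry[0])
    # Emit every line with the same (ISO 8601) time stamp format.
    return [f"{ts.isoformat()} {desc}" for ts, desc in merged]

central = homogenize([
    ("%Y-%m-%d %H:%M:%S", [("2016-07-07 10:00:02", "sshd: session opened"),
                           ("2016-07-07 10:00:05", "sshd: session closed")]),
    ("%d/%m/%Y %H:%M:%S", [("07/07/2016 10:00:03", "httpd: GET /index.html")]),
])
```

Here `central` plays the role of the central log file 4: the line from the second source is interleaved between the two lines of the first source.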
A first embodiment of the invention operates only on a single central log file 4, in which the individual log lines 3a, 3b, 3c, which were recorded during a predetermined period of time, are stored. From the central log file 4, a recoded log file 5 is created in accordance with the procedure described below.
The following shows the creation of a recoded log file 5 based on a coding rule f(x). When the recoded log file 5 is created on the basis of a central log file 4, the central log file 4 is split into its individual log lines 3a, 3b, 3c, each of which is separately transformed into one of the lines 5a, 5b, 5c of the recoded log file 5. In each case, only the description data set 32a, 32b, 32c of the log lines 3a, 3b, 3c is recoded.
The recoding of the central log file 4 takes place line by line. Successive characters or character strings of the central log file 4 are converted into successive characters or character strings of the recoded log file 5 on the basis of the coding rule f(x). The internal structure of the lines is maintained, so that exactly one recoded line 5a, 5b, 5c of the recoded log file 5 results from each line 3a, 3b, 3c of a log file.
In the embodiment of the invention presented below, each individual character of a log line 3a, 3b, 3c of the central log file 4 is converted into a character of a line 5a, 5b, 5c of the recoded log file 5. In the present embodiment, the source alphabet of the central log file 4 comprises 256 characters, each character corresponding to a number between 0 and 255 in ASCII code. According to the coding rule used in the present embodiment, the number x associated with the character is converted into a number y between 0 and 19 according to the formula y = f(x) = x mod 20, in order to speed up the subsequent steps of the method and to achieve a higher data reduction.
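The coding rule y = f(x) = x mod 20 can be illustrated with a short sketch. The mapping of the numbers 0 to 19 onto the one-letter codes of the 20 canonical amino acids, the Latin-1 encoding used to obtain byte values between 0 and 255, and the name `recode` are assumptions made for this example only.

```python
# Hypothetical 20-symbol target alphabet (one-letter amino acid codes).
TARGET_ALPHABET = "ACDEFGHIKLMNPQRSTVWY"

def recode(description):
    """Recode a description record character by character with f(x) = x mod 20."""
    # Latin-1 yields exactly one byte (0..255) per character; characters
    # outside Latin-1 are replaced by '?', which is a choice of this sketch.
    data = description.encode("latin-1", errors="replace")
    return "".join(TARGET_ALPHABET[x % 20] for x in data)

print(recode("A"))   # 65 mod 20 = 5, i.e. the sixth symbol, "G"
print(recode("AU"))  # 85 mod 20 = 5 as well, so the result is "GG"
```

The second call shows why the rule is lossy: different source characters can map to the same target symbol, while the length and order of the line are preserved.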
In a further step, the individual lines 5a, 5b, 5c of the recoded log file 5 are analyzed for their similarity and, on the basis of their similarity, combined into groups 6a, 6b, 6c (FIG. 2). The grouping methods used in the invention are preferably based on the application of a distance function that indicates the similarity between two lines 5a, 5b, 5c of the recoded log file 5. In an advantageous embodiment of the invention, the Levenshtein distance between two character strings can be selected as the distance function. This distance function is suitable for comparing character strings, i.e. recoded log lines 5a, 5b, 5c, of different lengths with one another and thus allows a simple quantification of the difference or similarity of two lines 5a, 5b, 5c of the recoded log file 5.
Particularly preferably, the grouping method is based on a sequence alignment, wherein, to determine how similar or dissimilar two lines 5a, 5b, 5c of the recoded log file 5 are, the sequence alignment of these recoded log lines 5a, 5b, 5c is calculated. For the numerical determination of the similarity, the Levenshtein distance of the two recoded log lines 5a, 5b, 5c under consideration is calculated on the basis of their sequence alignment. The value thus determined is then normalized by the length of the sequence alignment, which yields a similarity measure of the recoded log lines 5a, 5b, 5c that, owing to the normalization, takes values on a scale of 0 to 1. Two recoded log lines are then considered similar and assigned to the same group if their similarity measure exceeds a predetermined threshold. Depending on how precise the result is required to be, the threshold can be selected higher or lower.
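A normalized similarity measure of this kind might be sketched as follows. Note the simplification: instead of the length of an explicit sequence alignment, this sketch normalizes by the length of the longer of the two lines, which is an assumption of the example and not the normalization stated in the description. The function names are hypothetical.

```python
def levenshtein(a, b):
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def similarity(a, b):
    """Similarity on a scale of 0 to 1 (1 means identical lines)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty lines are treated as identical
    # Assumption: normalize by the longer line, not by the alignment length.
    return 1.0 - levenshtein(a, b) / longest
```

With a threshold of, say, 0.7, the recoded lines "ACDE" and "ACDF" (similarity 0.75) would be considered similar, whereas "ACD" and "WYV" (similarity 0.0) would not.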
A variety of grouping methods are known in the art that rely on the application of a distance function and allow individual lines to be merged into groups 6a, 6b, 6c of mutually similar lines. Examples of such clustering methods are CLIQUE (Agrawal, R., Gehrke, J., Gunopulos, D., & Raghavan, P. (1998). Automatic subspace clustering of high dimensional data for data mining applications. ACM SIGMOD Record, Vol. 27, No. 2, pp. 94-105. ACM.), MAFIA (Goil, S., Nagesh, H., & Choudhary, A. (1999, June). MAFIA: Efficient and scalable subspace clustering for very large data sets. In Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 443-452). ACM.), CACTUS (Ganti, V., Gehrke, J., & Ramakrishnan, R. (1999, August). CACTUS: clustering categorical data using summaries. In Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 73-83). ACM.), PROCLUS (Aggarwal, C. C., Wolf, J. L., Yu, P. S., Procopiuc, C., & Park, J. S. (1999, June). Fast algorithms for projected clustering. In ACM SIGMOD Record (Vol. 28, No. 2, pp. 61-72). ACM.) and SLCT (Vaarandi, R. (2003, October). A data clustering algorithm for mining patterns from event logs. In Proceedings of the 2003 IEEE Workshop on IP Operations and Management (IPOM) (pp. 119-126).).
As a result of the grouping process, a plurality of groups 6a, 6b, 6c are obtained, each containing similar or identical rows.
If some of the recoded log lines 5c differ substantially from the other lines 5a, 5b of the recoded log file 5, their determined distance to the other lines is very large, so that they are not placed in the same group 6a, 6b as the remaining lines but are detected as outliers in the course of the grouping process. Groups 6c that contain only a single line 5c or very few lines may indicate the presence of critical or anomalous states, since they do not correspond to an event that occurs regularly in the network or in the individual processes.
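The grouping and the subsequent search for small groups can be sketched as below. This is a deliberately simple greedy grouping, not one of the clustering methods cited above, and it uses `difflib.SequenceMatcher.ratio()` from the Python standard library as a stand-in similarity measure rather than the normalized Levenshtein distance; both choices, like the names `group_lines` and `rare_groups`, are assumptions of this example.

```python
from difflib import SequenceMatcher

def ratio(a, b):
    # Stand-in similarity on a 0..1 scale (not the Levenshtein-based measure).
    return SequenceMatcher(None, a, b).ratio()

def group_lines(lines, threshold=0.8, similarity=ratio):
    """Greedily assign each recoded line to the first sufficiently similar group."""
    groups = []  # each group is a list of lines; its first line is the representative
    for line in lines:
        for group in groups:
            if similarity(line, group[0]) > threshold:
                group.append(line)
                break
        else:
            groups.append([line])  # no similar group found: open a new one
    return groups

def rare_groups(groups, max_size=1):
    """Return the groups with at most max_size lines (candidate anomalies)."""
    return [group for group in groups if len(group) <= max_size]

groups = group_lines(["AAAAAAAAAA", "AAAAAAAAAC", "AAAAAAAAAA", "WWWWYYYYVV"])
outliers = rare_groups(groups)
```

In this example the three similar lines form one group, while "WWWWYYYYVV" ends up alone in its own group and is reported in `outliers`. Because the similarity function is passed as a parameter, a Levenshtein-based measure could be substituted without changing the grouping logic.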
In order to further reduce the data to be analyzed, it is possible not to transfer individual log lines 3a, 3b, 3c to the recoded log file 5 and not to subject them to the grouping method. For this purpose, different criteria can be specified that individual log lines 3a, 3b, 3c must meet in order to be excluded from the recoding or the grouping method. Such a criterion can consist, for example, in excluding from further processing those log lines 3a, 3b, 3c that contain particular substrings or a specific pattern. Such substrings or patterns can be defined, for example, in the form of regular expressions.
In addition, it is also possible that individual lines 3a, 3b, 3c are subjected to the recoding and the grouping method, but that individual substrings of the lines 3a, 3b, 3c are either deleted entirely or recoded with loss of information. This means that the concrete content of the line 3a, 3b, 3c can no longer be recovered from the line 5a, 5b, 5c of the recoded log file 5. Parts of lines that satisfy certain criteria, such as numerical values, can be transformed without loss of information.
In a preferred embodiment, the coding rule may be chosen such that particular character strings that satisfy predetermined criteria and occur infrequently are transformed without loss of information, while the remaining parts of the text are transformed lossily. This has the advantage that recurring parts of the text, which have only little information content, require only few resources in the similarity comparison.
In addition, it is also possible that, as part of the recoding, each recoded log line 5a, 5b, 5c is created by indexing the log line 3a, 3b, 3c on which it is based with a number that contains, among other things, the time stamp 31a, 31b, 31c and, in particular, allows an unambiguous conversion back into the log line 3a, 3b, 3c from which the recoded log line 5a, 5b, 5c was created. Only the recoded description record of a recoded log line 5a, 5b, 5c is used for grouping.
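Excluding log lines by means of regular expressions might look like the following sketch. The two patterns and the function names are purely hypothetical examples of exclusion criteria; the description does not specify any concrete patterns.

```python
import re

# Hypothetical exclusion criteria, defined as regular expressions.
EXCLUDE_PATTERNS = [
    re.compile(r"\bDEBUG\b"),        # lines containing a particular substring
    re.compile(r"heartbeat ok$"),    # lines ending with a specific pattern
]

def keep_line(line):
    """A line is kept only if none of the exclusion patterns matches it."""
    return not any(pattern.search(line) for pattern in EXCLUDE_PATTERNS)

def filter_lines(lines):
    """Drop excluded lines before recoding and grouping."""
    return [line for line in lines if keep_line(line)]

remaining = filter_lines([
    "app: DEBUG cache refreshed",
    "monitor: heartbeat ok",
    "sshd: login failed",
])
```

Only "sshd: login failed" remains and would be passed on to the recoding and the grouping method.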
Another preferred option for designing the coding rule, which can be applied in parallel or as an alternative, is that individual substrings of the log lines 3a, 3b, 3c that point to the computers 1a, 1b, 1c concerned or to the users of the computers 1a, 1b, 1c, in particular IP addresses or user names, are stored in anonymized or pseudonymized form in the recoded log lines. During the recoding of the log lines, the substrings containing the relevant personal information are selected according to predetermined criteria, in particular by means of a regular expression search or a database comparison, and are written to the recoded log file 5 in anonymized or pseudonymized form. The remaining substrings can be written to the recoded log file 5 unchanged or changed according to another coding rule.
In addition, the coding rule can specify that individual characters or character strings that are written in plain text using 96 mutually distinguishable characters of a character set are transformed into a target alphabet having 20 letters. Advantageously, the number of characters of the target alphabet is less than 1/4 of the number of symbols of the alphabet in which the central log file 4 is coded. If required, individual character strings can be excluded from this and either discarded entirely or transformed without loss of information. In particular, it is also possible to represent a character of a log line losslessly by two characters of the recoded log line.
Thus, for example, frequently occurring character strings of the log lines 3a, 3b, 3c, which indicate, for example, the type of a process 2a, 2b, 2c and are therefore only of minor importance, can be recoded with a few characters, which in particular means a data reduction and a loss of information, while rarely occurring character strings of the log lines 3a, 3b, 3c, which indicate, for example, parameter values and are therefore of greater importance, are recoded without loss of information.
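The pseudonymization of IP addresses by a regular expression search can be sketched as follows. The use of a keyed hash (HMAC-SHA256), the key value, the simplified IPv4 pattern and the `ip-` prefix are all assumptions of this example; the description leaves the concrete pseudonymization technique open.

```python
import hashlib
import hmac
import re

SECRET_KEY = b"example-key"  # hypothetical key; it would be kept secret in practice
# Simplified IPv4 pattern; it does not validate the range of each octet.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def pseudonym(value):
    """Derive a stable pseudonym: the same input always yields the same token."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "ip-" + digest[:8]

def pseudonymize(line):
    """Replace every IPv4 address in a log line by its pseudonym."""
    return IPV4.sub(lambda match: pseudonym(match.group(0)), line)

masked = pseudonymize("login from 10.0.0.5 rejected, retry from 10.0.0.5")
```

Because the pseudonym is deterministic, lines that refer to the same address remain comparable in the grouping step, while the address itself no longer appears in the recoded log file.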
The methods described above can easily be continued in real time after a grouping has been carried out for the first time. In this case, the individual processes 2a, 2b, 2c continuously create new log lines 3a, 3b, 3c, which are converted into recoded log lines 5a, 5b, 5c in accordance with the previously used coding rule f(x). It is then examined whether the recoded log line thus obtained can be assigned to an already created group 6. If this is not the case, an abnormal state is detected.
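The real-time check of a newly recoded log line against existing groups might be sketched like this. Comparing only against the first line of each group and using `difflib.SequenceMatcher.ratio()` as the similarity measure are simplifying assumptions of this example, as is the function name `is_anomalous`.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    # Stand-in similarity on a 0..1 scale (not the Levenshtein-based measure).
    return SequenceMatcher(None, a, b).ratio()

def is_anomalous(recoded_line, groups, threshold=0.8):
    """Return True if the new recoded line fits none of the existing groups."""
    for group in groups:
        # Assumption: the first line of a group serves as its representative.
        if similarity(recoded_line, group[0]) > threshold:
            return False
    return True

existing_groups = [["AAAAAAAAAA", "AAAAAAAAAC"], ["CCCCDDDDEE"]]
known = is_anomalous("AAAAAAAAAC", existing_groups)    # fits the first group
unknown = is_anomalous("WWWWYYYYVV", existing_groups)  # fits no group
```

Each incoming line therefore costs at most one comparison per existing group, which is what makes the check suitable for continuous operation.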
Where separate time ranges are used, the method can likewise be continued in real time after a grouping has been carried out for the first time. The individual processes 2a, 2b, 2c again continuously create new log lines 3a, 3b, 3c, which are converted into recoded log lines 5a, 5b, 5c in accordance with the previously used coding rule f(x). In this case, the grouping method is applied again to the specified time range and, as before, the groups thus obtained are compared with the groups from the past time ranges. If a group cannot be assigned to a group of the past time range, or if the size of a group changes too much, an abnormal state is detected.
In a further preferred embodiment of the invention, shown in FIG. 3, groups are created for different, mostly equally long, periods of time in the same way as in the first exemplary embodiment of the invention. Thus, for example, a first central log file 41 can be created during a first day, from which a first recoded log file 51 is derived; during a second day, a second central log file 42 is created according to the same rules and is converted into a second recoded log file 52 according to the same coding rule f(x). A grouping procedure is carried out separately for each of the recoded log files 51, 52, so that individual groups 61a, 61b, 62a, 62b, 62c of lines of the recoded log files 51, 52 are available separately for each central log file 41, 42.
The individual groups 61a, 61b, 62a, 62b, 62c of the individual log files can be assigned to one another, for example by determining, for each group 61a, 61b created from the first recoded log file 51, that group 62a, 62b of the second recoded log file 52 for which the distances between the lines assigned to the respective groups 61a, 61b, 62a, 62b are the lowest. Subsequently, it is examined whether the mutually assigned groups 61a, 62a differ significantly between the respective log files. If this is the case, for example if the difference in the number of lines contained in the mutually assigned groups exceeds a predetermined threshold, an abnormal state is detected. In addition, an abnormal state may also be detected if individual groups 62c occur only in one log file but not in the other.
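The comparison of the groups of two consecutive time ranges can be sketched as follows. The sketch matches groups via the similarity of their first lines, uses `difflib.SequenceMatcher.ratio()` as a stand-in similarity measure, and introduces the hypothetical names `compare_windows`, `sim_threshold` and `size_threshold`; none of these details is fixed by the description.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    # Stand-in similarity on a 0..1 scale (not the Levenshtein-based measure).
    return SequenceMatcher(None, a, b).ratio()

def compare_windows(previous, current, sim_threshold=0.8, size_threshold=5):
    """Compare the groups of two time ranges and list anomaly indications.

    previous, current: lists of groups, each group a list of recoded lines.
    """
    findings = []
    matched_previous = set()
    for group in current:
        # Find the most similar group of the preceding time range.
        best_index, best_score = None, 0.0
        for index, candidate in enumerate(previous):
            score = similarity(group[0], candidate[0])
            if score > best_score:
                best_index, best_score = index, score
        if best_index is None or best_score <= sim_threshold:
            findings.append(("new_group", group[0]))      # no counterpart found
            continue
        matched_previous.add(best_index)
        if abs(len(group) - len(previous[best_index])) > size_threshold:
            findings.append(("size_change", group[0]))    # group size changed too much
    for index, candidate in enumerate(previous):
        if index not in matched_previous:
            findings.append(("vanished_group", candidate[0]))
    return findings

day1 = [["AAAAAAAAAA"] * 10, ["CCCCCCCCCC"] * 3]
day2 = [["AAAAAAAAAA"] * 2, ["WWWWYYYYVV"]]
findings = compare_windows(day1, day2)
```

For these two example days the sketch reports a size change for the "AAAAAAAAAA" group, a new group "WWWWYYYYVV" without counterpart, and a group "CCCCCCCCCC" that occurs only in the first time range.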
Alternatively, it is also possible to apply the grouping method to all log lines created by the processes. In this case, the groups contain lines from different recoded log files, and the groups can then be subdivided according to the recoded log files. A subsequent search for the assignment of the groups is then not required.
If a comparison of the groups shows that individual groups consist only of lines from individual log files, while no similar lines can be assigned to this group from the other recoded log files, an abnormal state is detected. The same applies if the numbers of lines contained in the mutually assigned groups differ greatly, i.e. if the difference between the numbers of lines contained in the mutually assigned groups exceeds a predetermined threshold value.
The procedure according to the invention also makes it possible to locate, in a simple manner, the individual lines 3a, 3b, 3c of the log files 3 in which states of the processes that indicate an anomalous state are documented. To enable such an assignment, each of the lines 3a, 3b, 3c of the log file 3 is assigned the respective line 5a, 5b, 5c of the recoded log file 5 that was created by applying the coding rule f(x) to the respective line 3a, 3b, 3c of the log file 3. This is done by an unambiguous indexing of the log lines, which contains, among other things, the time stamp 31a, 31b, 31c. Subsequently, those lines 5a, 5b, 5c of the recoded log file 5 are sought that are contained in groups 6 with only one line or a small number of lines 5a, 5b, 5c. Alternatively, it is also possible to search for the lines 5a, 5b, 5c of the recoded log file 5 that are contained in groups 62c (FIG. 4) to which no group 61 of the other log file 61 could be assigned, and/or that are contained in groups 61a, 62a in which the absolute difference between the number of lines contained in the respective group 61a, 62a and the number of lines contained in an associated group 6 exceeds a predetermined threshold. After the lines of the recoded log file 5 have been identified, a search is made for those lines 3a, 3b, 3c of the log file 3, 31, 32 to which the identified lines of the recoded log file 5, 51, 52 have been assigned. These lines 3a, 3b, 3c of the log file 3 are marked as indicators of an abnormal state and are displayed to the user or made available for further processing. Alternatively, it is also possible to search only for the log lines 3a, 3b, 3c that correspond to the recoded log lines 5a, 5b, 5c describing a potentially abnormal state.
Claims:
Claims (11)
[1]
1. A method for detecting abnormal states, in particular those caused by manipulation, in a computer network (1) comprising a plurality of computers (1a, 1b, 1c),
a) wherein logs are created by the computers (1a, 1b, 1c) of the computer network (1) or by processes (2a, 2b, 2c) running on these computers (1a, 1b, 1c),
b) wherein, on the occurrence of predetermined events, the computers (1a, 1b, 1c) or the processes (2a, 2b, 2c) create, for each of these events, a log record in the form of a log line (3a, 3b, 3c) consisting of a time stamp (31a, 31b, 31c) and a description record (32a, 32b, 32c) of the respective logged event,
c) wherein the log lines (3a, 3b, 3c) created by the computers (1a, 1b, 1c) or processes (2a, 2b, 2c) are stored in a log file (4a, 4b, 4c) associated with the respective computer (1a, 1b, 1c) or process (2a, 2b, 2c),
d) wherein the log lines (3a, 3b, 3c) from the individual log files (4a, 4b, 4c) are homogenized by being written to a central log file (4) in a uniform format, in particular with a uniform time stamp format, line by line and ordered in time on the basis of the time stamps (31a, 31b, 31c),
e) wherein a recoded log file (5) of the central log file (4) is created by converting, line by line, successive characters or character strings of the central log file (4) into successive characters or character strings of the recoded log file (5) on the basis of the same predetermined, in particular lossy, coding rule (f),
f) wherein, in particular, the recoding retains the order of the information contained in the individual characters within the description record of the individual log lines (3a, 3b, 3c), and preferably the number of symbols used to describe the content is reduced,
g) wherein the individual lines (5a, 5b, 5c) of the recoded log file (5) are analyzed for their similarity and are combined into groups (6a, 6b, 6c) on the basis of their similarity,
h) wherein a search is made for groups (6a, 6b, 6c) with a small number of lines (5a, 5b, 5c), in particular with only a single line (5a, 5b, 5c), and
i) wherein, if such lines (5a, 5b, 5c) are present, an anomalous state in the computer network (1) is identified.
[2]
2. A method for detecting abnormal conditions, in particular those caused by manipulation, in a computer network (1) comprising a plurality of computers (1a, 1b, 1c),
a) wherein log records are created by the computers (1a, 1b, 1c) of the computer network (1) or by processes (2a, 2b, 2c) running on these computers (1a, 1b, 1c),
b) wherein, upon the occurrence of predetermined events, the computers (1a, 1b, 1c) or the processes (2a, 2b, 2c) create for each of these events a log record in the form of a log line (3a, 3b, 3c), consisting of a timestamp (31a, 31b, 31c) and a description record (32a, 32b, 32c) of the respective logged event,
c) wherein the log lines (3a, 3b, 3c) created by the computers (1a, 1b, 1c) or processes (2a, 2b, 2c) are stored in a log file (4a, 4b, 4c) associated with the respective computer (1a, 1b, 1c) or process (2a, 2b, 2c),
d) wherein the log lines (3a, 3b, 3c) from the individual log files (4a, 4b, 4c) are homogenized by being written in a uniform format, in particular with a uniform timestamp format, line by line and on the basis of the timestamp (31a, 31b, 31c), into a central log file (4),
e) wherein different central log files (41, 42) are created for different predetermined time ranges, in particular of equal length, on the basis of the timestamps (31a, 31b, 31c),
f) wherein for each central log file (41, 42) a recoded log file (51, 52, 53) is created by converting consecutive characters or character strings of the central log file (41, 42) into consecutive characters or character strings of the recoded log file (51, 52) on the basis of the same predetermined, in particular lossy, coding rule (f),
g) wherein, in particular during the recoding, the order of the individual characters contained within the individual log lines (3a, 3b, 3c) is preserved and, preferably, the number of symbols used to describe the content is reduced,
h) wherein the individual lines (5a, 5b, 5c) of the recoded log files (51, 52) are analyzed, in particular separately, with regard to their similarity and are combined into groups (61a, 61b, 62a, 62b, 62c) on the basis of their similarity,
i) wherein, for each of the individual groups (61a, 61b, 62a, 62b, 62c) of a recoded log file (51, 52), a group (61a, 61b, 62a, 62b, 62c) of the temporally preceding recoded log file (51, 52) that contains similar lines is determined, and the groups (61a, 61b, 62a, 62b, 62c) found in this way are assigned to one another,
j) wherein the mutually associated groups (61a, 61b, 62a, 62b, 62c) of two temporally successive recoded log files (51, 52) are subsequently compared with each other, and
k) wherein an abnormal state is detected
- if individual groups (61a, 61b, 62a, 62b, 62c) occur in only one of the log files, or
- if the difference in the number of lines contained in mutually associated groups (61a, 61b, 62a, 62b, 62c) exceeds a predetermined threshold.
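Steps i) to k) of claim 2 can be illustrated with a short Python sketch. It assumes that grouping has already taken place and that each group is summarized by one representative recoded line and a line count. The names `Group`, `match_groups` and `detect_anomalies`, the use of `difflib.SequenceMatcher` as a stand-in similarity measure, and both threshold values are illustrative assumptions and are not taken from the claim.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Group:
    representative: str  # one recoded line standing for the group (assumption)
    size: int            # number of recoded lines in the group


def similarity(a: str, b: str) -> float:
    # Stand-in similarity on a 0..1 scale; claim 2 does not fix the measure.
    return SequenceMatcher(None, a, b).ratio()


def match_groups(previous, current, min_similarity=0.8):
    """Step i): for each current group, find the most similar previous group."""
    pairs, unmatched_current = [], []
    used = set()
    for cur in current:
        best_index, best_score = None, min_similarity
        for i, prev in enumerate(previous):
            score = similarity(cur.representative, prev.representative)
            if i not in used and score >= best_score:
                best_index, best_score = i, score
        if best_index is None:
            unmatched_current.append(cur)
        else:
            used.add(best_index)
            pairs.append((previous[best_index], cur))
    unmatched_previous = [p for i, p in enumerate(previous) if i not in used]
    return pairs, unmatched_previous, unmatched_current


def detect_anomalies(previous, current, max_size_delta=10):
    """Steps j) and k): compare associated groups and report anomalies."""
    pairs, only_previous, only_current = match_groups(previous, current)
    findings = [("only in previous window", g) for g in only_previous]
    findings += [("only in current window", g) for g in only_current]
    for prev, cur in pairs:
        if abs(prev.size - cur.size) > max_size_delta:
            findings.append(("group size changed", cur))
    return findings
```

In this sketch a group that has no counterpart in the other time window, or whose line count changes by more than `max_size_delta`, is reported; how the thresholds are chosen in practice is left open here.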
[3]
3. The method according to claim 1 or 2, characterized in that, after the log lines have been grouped on the basis of a predetermined central log file (4; 41, 42), new log lines (3a, 3b, 3c) are received from the computers (1a, 1b, 1c) or processes (2a, 2b, 2c) during operation of the computer network, wherein for each of the log lines (3a, 3b, 3c) thus obtained consecutive characters or character strings of the log line are converted, on the basis of the coding rule used, into consecutive characters or character strings of a recoded log line (5a, 5b, 5c), and it is examined whether the recoded log line (5a, 5b, 5c) can be assigned to one of the already created groups (6a, 6b, 6c; 61a, 61b; 62a, 62b, 62c), and, if not, an abnormal condition is detected.
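The online check described in claim 3 might look as follows in Python. This is only a sketch: the `recode` function is a placeholder coding rule (digits become "0", letters become "a"), the groups are represented by one recoded line each, and `difflib.SequenceMatcher` with a threshold of 0.9 stands in for whatever similarity test was used during grouping. All of these are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def recode(line: str) -> str:
    # Placeholder coding rule (assumption): digits -> "0", letters -> "a",
    # every other character is kept as it is.
    return "".join("0" if c.isdigit() else "a" if c.isalpha() else c for c in line)


def check_new_line(line, group_representatives, min_similarity=0.9):
    """Return (is_abnormal, recoded_line) for one freshly received log line."""
    recoded = recode(line)
    for representative in group_representatives:
        if SequenceMatcher(None, recoded, representative).ratio() >= min_similarity:
            return False, recoded  # fits an existing group
    return True, recoded  # no existing group matches: flag as abnormal
```

A line whose recoded form fits none of the existing groups is flagged; lines that only differ in user names or numbers collapse to the same recoded form and are accepted.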
[4]
4. The method according to any one of the preceding claims, characterized in that the grouping method is based on sequence alignment, wherein, to determine how similar or dissimilar two recoded log lines (5a, 5b, 5c) of the recoded log files (5; 51, 52) are, the sequence alignment of these recoded log lines is calculated, wherein, for the numerical determination of the similarity, the Levenshtein distance of the two recoded log lines (5a, 5b, 5c) under consideration is calculated, in particular on the basis of the sequence alignment, and is then normalized by the length of the sequence alignment, and the similarity measures of the recoded log lines (5a, 5b, 5c) preferably lie on a scale of 0 to 1, wherein in particular two recoded log lines (5a, 5b, 5c) are considered similar if the similarity measure of the two recoded log lines (5a, 5b, 5c) exceeds a predetermined threshold.
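The similarity measure of claim 4 can be sketched in Python as below. The dynamic program computes the Levenshtein distance together with the number of columns of an optimal alignment (ties are broken in favour of the shorter alignment, which is an assumption), and the similarity is taken as one minus the distance divided by the alignment length. The function names and the threshold of 0.8 are illustrative and not taken from the claim.

```python
def levenshtein_alignment(a: str, b: str):
    """Return (distance, alignment_length) of an optimal alignment of a and b."""
    # Each cell holds (edit distance, number of alignment columns) for a prefix pair.
    previous = [(j, j) for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        current = [(i, i)]
        for j, cb in enumerate(b, start=1):
            d_sub, l_sub = previous[j - 1]
            d_del, l_del = previous[j]
            d_ins, l_ins = current[j - 1]
            current.append(min(
                (d_sub + (ca != cb), l_sub + 1),  # match or substitution
                (d_del + 1, l_del + 1),           # gap in b
                (d_ins + 1, l_ins + 1),           # gap in a
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Similarity on a 0..1 scale: 1 - distance / alignment length."""
    distance, length = levenshtein_alignment(a, b)
    return 1.0 if length == 0 else 1.0 - distance / length


def are_similar(a: str, b: str, threshold: float = 0.8) -> bool:
    return similarity(a, b) > threshold
```

Because every alignment column contributes at most one edit, the distance never exceeds the alignment length, so the result always lies between 0 and 1.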
[5]
5. The method according to any one of the preceding claims, characterized in that, during the recoding of the log lines (3a, 3b, 3c), individual substrings of the log lines (3a, 3b, 3c) which contain information relating to the respective computer (1a, 1b, 1c) or to the user of the computer (1a, 1b, 1c), in particular IP addresses and usernames, are selected according to predetermined criteria and written in anonymized and pseudonymized form into the recoded log file (5; 51, 52), while the remaining substrings are written unchanged, on the basis of the coding rule, into the recoded log file (5; 51, 52).
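One possible way to pseudonymize IP addresses and usernames as described in claim 5 is sketched below in Python. The regular expressions, the `user=` log convention, the salt and the truncated SHA-256 digest are all assumptions chosen for illustration; the claim only requires that such substrings are selected by predetermined criteria and replaced.

```python
import hashlib
import re

IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
USER = re.compile(r"(?<=user=)\w+")  # hypothetical log convention "user=<name>"


def pseudonym(value: str, salt: str = "example-salt") -> str:
    # Same input always yields the same short token, so grouping still works.
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return digest[:8]


def pseudonymize(line: str) -> str:
    """Replace IPv4 addresses and user names; leave the rest of the line alone."""
    line = IPV4.sub(lambda m: "ip_" + pseudonym(m.group()), line)
    return USER.sub(lambda m: "u_" + pseudonym(m.group()), line)
```

Because the replacement is deterministic, two lines that mention the same address or user still share the same token after recoding, while the original value no longer appears in the recoded log file.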
[6]
6. The method according to any one of the preceding claims, characterized in that the coding rule (f) is selected such that certain substrings that meet predetermined criteria, in particular substrings that occur with a frequency below a threshold, are transformed by the coding rule (f) without loss of information, while the remaining parts of the text are transformed lossily.
[7]
7. The method according to any one of the preceding claims, characterized in that character strings or substrings which occur in the log files (4a, 4b, 4c) with a frequency exceeding a threshold are identified and are discarded by the coding rule (f) or transformed lossily, while the remaining parts of the text are not transformed lossily.
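The frequency-dependent treatment in claims 6 and 7 can be sketched in Python as follows. The sketch assumes whitespace-separated tokens as the "substrings", a single count threshold, and a simple character-class mapping as the lossy transformation; none of these choices is prescribed by the claims.

```python
from collections import Counter


def lossy(token: str) -> str:
    # Hypothetical lossy rule: digits -> "0", letters -> "a", rest unchanged.
    return "".join("0" if c.isdigit() else "a" if c.isalpha() else c for c in token)


def recode_by_frequency(lines, max_rare_count=1):
    """Keep rare tokens verbatim; transform all more frequent tokens lossily."""
    counts = Counter(token for line in lines for token in line.split())
    recoded = []
    for line in lines:
        tokens = [t if counts[t] <= max_rare_count else lossy(t) for t in line.split()]
        recoded.append(" ".join(tokens))
    return recoded
```

With this split, frequently repeated material shrinks to a coarse pattern, while a token that appears only once, such as an unusual path, survives unchanged and keeps its line distinguishable during grouping.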
[8]
8. The method according to any one of the preceding claims, characterized in that only the description records of the individual log lines (3a, 3b, 3c) are recoded, wherein each recoded log line (5a, 5b, 5c) is created by indexing each log line (3a, 3b, 3c) with a number to which, among other things, the timestamp (31a, 31b, 31c) is assigned and which in particular allows the unique back-conversion into the log line (3a, 3b, 3c) from which the recoded log line (5a, 5b, 5c) was created, and wherein only the recoded description record of a recoded log line (5a, 5b, 5c) is used for grouping.
[9]
9. The method according to any one of the preceding claims, characterized in that the coding rule (f) is selected such that individual characters or character strings, which are represented in particular by plain-text characters, preferably 96 of them, are each transformed into a character of a target alphabet, preferably with the exception of the substrings mentioned in claims 6 and 7, wherein the target alphabet preferably has no more than 20 different symbols and/or fewer than 1/4 of the different symbols appearing in the central log file (4; 41, 42), which results in a data reduction that accelerates the grouping process.
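A character-wise coding rule of the kind described in claim 9 might be sketched in Python as follows. The 20-symbol target alphabet and the modulo mapping are arbitrary assumptions made only to show a many-to-one, order-preserving recoding; the claim does not specify which source characters share a target symbol.

```python
TARGET_ALPHABET = "ABCDEFGHIJKLMNOPQRST"  # 20 symbols (illustrative choice)


def recode(line: str) -> str:
    """Map every character to one of 20 target symbols, keeping the order."""
    size = len(TARGET_ALPHABET)
    return "".join(TARGET_ALPHABET[ord(c) % size] for c in line)
```

The recoded line has the same length and character order as the input, but only 20 distinct symbols can occur, so different source characters inevitably collapse onto the same symbol and the transformation is lossy.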
[10]
10. The method according to any one of the preceding claims, characterized in that each of the lines (3a, 3b, 3c) of the log file (4a, 4b, 4c) is assigned the respective recoded line (5a, 5b, 5c) of the recoded log file (5; 51, 52) which was created by applying the coding rule to the respective line (3a, 3b, 3c) of the log file (4a, 4b, 4c), and that those recoded lines (5a, 5b, 5c) of the recoded log file (5; 51, 52) are identified
- which are contained in groups (6a, 6b, 6c; 61a, 61b, 62a, 62b, 62c) with only one recoded line (5a, 5b, 5c) or a small number of recoded lines (5a, 5b, 5c), and/or
- which are contained in groups (62c) to which no group (61a, 61b) of the respective preceding recoded log file (51, 52) could be assigned, and/or
- which are contained in groups (61a, 61b, 62a, 62b, 62c) for which the absolute difference between the number of recoded lines (5a, 5b, 5c) contained in the respective group (61a, 61b, 62a, 62b, 62c) and the number of recoded lines contained in an associated group (61a, 61b; 62a, 62b, 62c) exceeds a predetermined threshold,
and that the lines (3a, 3b, 3c) of the log file (4a, 4b, 4c) or of the central log file (4; 41, 42) to which the recoded lines (5a, 5b, 5c) of the recoded log file (5; 51, 52) thus identified have been assigned are marked as indicators of an abnormal state.
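The back-reference from suspicious recoded lines to the original log lines (claim 10, first alternative) can be sketched in Python as follows. For brevity the sketch groups lines whose recoded forms are identical rather than merely similar, and uses a digit-masking function as the coding rule; both are simplifying assumptions, as are the function names.

```python
from collections import defaultdict


def mask_digits(line: str) -> str:
    # Illustrative coding rule: every digit becomes "0".
    return "".join("0" if c.isdigit() else c for c in line)


def flag_rare_lines(original_lines, recode=mask_digits, max_group_size=1):
    """Return indices of original lines whose recoded form lies in a small group."""
    groups = defaultdict(list)  # recoded line -> indices of the original lines
    for index, line in enumerate(original_lines):
        groups[recode(line)].append(index)
    flagged = [
        index
        for members in groups.values()
        if len(members) <= max_group_size
        for index in members
    ]
    return sorted(flagged)
```

Keeping the index of each original line next to its recoded form is what allows the marked result to point at readable log lines rather than at recoded strings.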
[11]
11. A data carrier on which a program for carrying out a method according to any one of claims 1 to 10 is stored.
Similar technologies:
Publication number | Publication date | Patent title
EP3267625B1|2018-11-14|Method for detection of abnormal conditions in a computer network
DE102014204830A1|2014-09-18|Computer-implemented systems and methods for comparing and associating objects
DE102014204834A1|2014-09-18|Computer-implemented systems and methods for comparing and associating objects
DE60121231T2|2007-06-06|DATA PROCESSING
DE19627472A1|1998-01-15|Database system
EP2940924A1|2015-11-04|PUF based Derivation of a device-specific value
EP2323083A1|2011-05-18|Technical classification system
EP3049965B1|2020-06-03|Automatic data harmonisation
WO2009149926A2|2009-12-17|System and method for the computer-based analysis of large quantities of data
EP3528162B1|2020-06-10|Method for recognizing abnormal operational states
EP3719651A1|2020-10-07|Method for characterizing the operating state of a computer system
DE60007633T2|2004-11-18|CONTENT-BASED PLAYBACK OF SERIAL DATA
DE112016004924T5|2018-08-02|System for excavating a user cycle mode and its method
WO2003094093A2|2003-11-13|Comparison of processing protocols
DE102005008844B4|2009-09-17|Method for computer-aided classification of data and apparatus for carrying it out
EP2927818B1|2019-05-29|Method for automatically processing a number of protocol files of an automation system
WO2012017056A1|2012-02-09|Method and apparatus for automatically processing data in a cell format
DE102009016588A1|2010-10-14|Method for determination of text information from portable document format documents, involves reading portable document format document, and analyzing structure of portable document format document
EP3961447A1|2022-03-02|Method for detecting abnormal operating states of a computer system
AT523829B1|2021-12-15|Method for detecting abnormal operating states of a computer system
EP3787229A1|2021-03-03|Method and device for automatically selecting analysis strings for feature extraction
EP2530604B1|2013-09-18|Computer-implemented method and device for producing a structure tree
DE112017006528T5|2019-09-26|ATTACK / ABNORMALITY DETECTION DEVICE, ATTACK / ABNORMALITY DETECTION PROCEDURE AND ATTACK / ABNORMALITY DETECTION PROGRAM
EP0563077B1|1997-07-16|Method of detecting, by computing machine, identical data elements in two data sequences
EP1318462A1|2003-06-11|Data administration and search method
Patent family:
Publication number | Publication date
EP3267625B1|2018-11-14|
AT518805B1|2018-05-15|
EP3267625A1|2018-01-10|
Cited documents:
Publication number | Filing date | Publication date | Applicant | Patent title
EP2299650A1|2009-09-21|2011-03-23|Siemens Aktiengesellschaft|Method for recognising anomalies in a control network
EP2800307A1|2013-04-29|2014-11-05|AIT Austrian Institute of Technology GmbH|Method for detecting deviations from a given standard state
AT520746B1|2018-02-20|2019-07-15|Ait Austrian Institute Tech Gmbh|Method for detecting abnormal operating conditions
US20070300300A1|2006-06-27|2007-12-27|Matsushita Electric Industrial Co., Ltd.|Statistical instrusion detection using log files
AT521665B1|2018-06-11|2021-05-15|Ait Austrian Inst Tech Gmbh|Grammar recognition
AT522281A1|2019-04-02|2020-10-15|Ait Austrian Institute Tech Gmbh|Method for characterizing the operating status of a computer system
AT523829B1|2020-07-28|2021-12-15|Ait Austrian Inst Tech Gmbh|Method for detecting abnormal operating states of a computer system
AT523948A1|2020-09-01|2022-01-15|Ait Austrian Inst Tech Gmbh|Method for detecting abnormal operating states of a computer system
Legal status:
Priority:
Application number | Filing date | Patent title
ATA50601/2016A|AT518805B1|2016-07-07|2016-07-07|A method for detecting abnormal conditions in a computer network
EP17179531.3A|EP3267625B1|2016-07-07|2017-07-04|Method for detection of abnormal conditions in a computer network