# CORRELATION ANALYSIS

CORRELATION ANALYSIS is a set of methods for assessing the relationship between random phenomena and events, based on the mathematical theory of correlation. It uses the simplest characteristics, which require a minimum of computation. The term "correlation" is often equated with the concepts of "relationship" and "interdependence", but they are not equivalent. Correlation is only one type of relationship between attributes: one that manifests itself on average and is linear in character. If there is a one-to-one relationship between two quantities, it is called functional, and from one of the quantities (the cause) the value of the other (the effect) can be determined unambiguously. Functional dependence is a special case of random (probabilistic, stochastic) dependence, in which the relationship appears not for every pair of values of the two quantities but only on average.

C. a. is applied in the study of two or more random variables in order to determine two main quantitative characteristics: the mathematical equation of the relationship between these variables and estimates of the strength (closeness) of that relationship. The input data for determining these characteristics are synchronous results of observation (measurement, experiment), i.e., statistical data obtained simultaneously from experience on the attributes whose relationship is being studied. The input data can be given as tables of observation results or their equivalent representations on magnetic tape, punched tape or punched cards.

C. a. has found broad application in medicine and biology for determining the strength and the equations of relationships between various attributes, e.g., results of clinical analyses, clinical signs, or special examinations of healthy or sick people (see Correlation of functions of an organism). The results of C. a. are used to make objective prognoses of diseases and to assess the condition of the patient and the course of the disease (see Forecasting). It is difficult or even impossible to predict a priori, from the results of theoretical biological and medical research alone, how the attributes under study are related to one another. To answer this question, observations or a special experiment are carried out.

Two-dimensional (bivariate) correlation analysis is applied to the processing of experimental data on the manifestation of any two attributes.

CORRELATION TABLE. Note. The table gives the intervals of attributes X and Y and the frequencies of their occurrence (in the body of the table), obtained from a morphometric analysis of the microcirculatory bed of the bulbar conjunctiva, where Y is the diameter of a venule and X is the diameter of an arteriole (in μm).

Each experimental result is a random variable, and objective regularities appear only in the whole set of measurement results. Conclusions are therefore drawn from processing the entire set of experimental data rather than from individual values, which are random. To reduce the influence of chance, the input data are combined into groups by compiling a correlation table (see the table). Such a table contains the intervals (or their midpoints) of the values of the two attributes, Y and X, together with the frequencies $m_{ij}(x, y)$ with which X and Y values occur in the corresponding intervals. These experimentally obtained frequencies are a practical estimate of the probability that the values of X and Y fall jointly in a given pair of intervals. Compiling the correlation table is the first stage of processing the initial information. Correlation tables can be compiled and fully processed quickly on general-purpose or specialized computers (see Electronic computer). From the grouped data of the correlation table, empirical characteristics of the equation and of the strength of the relationship are calculated. To find the equation of the relationship between Y and X, the mean value of attribute Y is calculated within each interval of attribute X. This gives, for each i-th interval, a value $\bar Y_{x_i}$; joining these values over all intervals yields the empirical regression line, which characterizes the form of the relationship of Y with X on average, i.e., the graph of the function $Y_x = f(x)$. If there were a one-to-one relationship between Y and X, the equation of the relationship would suffice for solving practical and theoretical problems, since it would always allow the value of Y to be determined from a given value of X. In practice the relationship between Y and X is not one-to-one: it is random, and a single value of X corresponds to a range of values of Y.
A further characteristic is therefore needed to measure the strength, or closeness, of the relationship between Y and X. Such characteristics are the correlation ratio (also called the dispersion ratio) $\eta_{yx}$ and the correlation coefficient $r_{yx}$. The first characterizes the strength of the relationship between Y and X for an arbitrary function f, whereas $r_{yx}$ is used only when f is a linear function.
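The first two processing steps, compiling the correlation table and computing the empirical regression line, can be sketched in Python with NumPy. This is a minimal illustration, not the authors' procedure: the function names `correlation_table` and `empirical_regression` are hypothetical, the interval boundaries are chosen by the caller, and the mean of Y in each X interval is computed from the Y interval midpoints, as one would when only the grouped table is available.

```python
import numpy as np

def correlation_table(x, y, x_edges, y_edges):
    """Group paired observations into a correlation table.

    Returns the frequency matrix m, where m[i, j] counts observations whose X
    falls in the i-th X interval and whose Y falls in the j-th Y interval,
    together with the interval midpoints of X and Y.
    """
    x_edges = np.asarray(x_edges, dtype=float)
    y_edges = np.asarray(y_edges, dtype=float)
    m, _, _ = np.histogram2d(x, y, bins=[x_edges, y_edges])
    x_mid = (x_edges[:-1] + x_edges[1:]) / 2
    y_mid = (y_edges[:-1] + y_edges[1:]) / 2
    return m, x_mid, y_mid

def empirical_regression(m, y_mid):
    """Mean of Y (from interval midpoints) within each X interval.

    Empty X intervals have no defined mean and are returned as NaN.
    """
    m_i = m.sum(axis=1)  # m_i(x): number of observations in each X interval
    with np.errstate(invalid="ignore", divide="ignore"):
        return (m @ y_mid) / m_i

# Hypothetical paired measurements, for illustration only.
x = [1, 1, 2, 2, 3, 3]
y = [1, 2, 2, 3, 3, 4]
m, x_mid, y_mid = correlation_table(x, y, [0.5, 1.5, 2.5, 3.5], [0.5, 1.5, 2.5, 3.5, 4.5])
print(m)
print(empirical_regression(m, y_mid))  # → [1.5 2.5 3.5]
```

Plotting the points (`x_mid`, `empirical_regression(m, y_mid)`) and joining them gives the empirical regression line; here the group means happen to lie on a straight line.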

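The quantities $\eta_{yx}$ and $r_{yx}$ are also easily determined from the correlation table. The calculation usually proceeds as follows: first the mean values $\bar X$ and $\bar Y$ of both attributes and their standard (root-mean-square) deviations $\sigma_x$ and $\sigma_y$ are determined, and then $\eta_{yx}$ is found by the formula

$$\eta_{yx} = \frac{1}{\sigma_y}\sqrt{\frac{1}{n}\sum_{i=1}^{k} m_i(x)\,\bigl(\bar Y_{x_i} - \bar Y\bigr)^2}$$

and $r_{yx}$ by the formula

$$r_{yx} = \frac{1}{n\,\sigma_x\,\sigma_y}\sum_{i=1}^{k}\sum_{j=1}^{l} m_{ij}\,\bigl(X_i - \bar X\bigr)\bigl(Y_j - \bar Y\bigr),$$

where n is the total number of observations, $X_i$ is the midpoint (mean value) of the i-th interval of X, $Y_j$ is the midpoint (mean value) of the j-th interval of Y, k and l are the numbers of intervals of X and Y respectively, $m_i(x)$ is the frequency (number) of values falling in the i-th interval of X, $m_{ij}$ is the frequency in cell (i, j) of the table, and $\bar Y_{x_i}$ is the mean of Y within the i-th interval of X. The accuracy with which $\eta_{yx}$ and $r_{yx}$ are determined is characterized quantitatively by their standard deviations, which are approximately

$$\sigma_{\eta} = \frac{1 - \eta_{yx}^2}{\sqrt{n}}, \qquad \sigma_{r} = \frac{1 - r_{yx}^2}{\sqrt{n}}.$$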

The value of the coefficient $\eta_{yx}$ lies between zero and one ($0 \le \eta_{yx} \le 1$). If $\eta_{yx} = 0$ (fig., a), the mean of Y does not change from one interval of X to another, i.e., the regression $Y_x = f(x)$ reveals no relationship between Y and X; if $\eta_{yx} = 1$, there is a one-to-one relationship between Y and X (fig., g). For $0 < \eta_{yx} < 1$, attribute Y is only partially determined by attribute X, and additional attributes must be studied to determine Y more reliably (fig., d, e, i).
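The calculation of $\eta_{yx}$, $r_{yx}$ and their approximate standard errors from a correlation table can be sketched as follows. The sketch assumes the table is given as a frequency matrix with rows for X intervals and columns for Y intervals, plus the interval midpoints; the function name `eta_and_r` is hypothetical. Standard deviations use the divisor n, and the standard errors use the large-sample approximations $(1-\eta^2)/\sqrt{n}$ and $(1-r^2)/\sqrt{n}$.

```python
import numpy as np

def eta_and_r(m, x_mid, y_mid):
    """Correlation ratio eta_yx, correlation coefficient r_yx and their
    approximate standard errors, computed from a correlation table.

    m[i, j] is the frequency of the (i-th X interval, j-th Y interval) cell;
    x_mid and y_mid are the interval midpoints.
    """
    m = np.asarray(m, dtype=float)
    x_mid = np.asarray(x_mid, dtype=float)
    y_mid = np.asarray(y_mid, dtype=float)
    n = m.sum()                      # total number of observations
    m_x = m.sum(axis=1)              # m_i(x): frequencies of X intervals
    m_y = m.sum(axis=0)              # frequencies of Y intervals
    x_bar = m_x @ x_mid / n
    y_bar = m_y @ y_mid / n
    sigma_x = np.sqrt(m_x @ (x_mid - x_bar) ** 2 / n)
    sigma_y = np.sqrt(m_y @ (y_mid - y_bar) ** 2 / n)

    # Mean of Y within each non-empty X interval (the empirical regression line).
    filled = m_x > 0
    y_group = m[filled] @ y_mid / m_x[filled]
    eta = np.sqrt(m_x[filled] @ (y_group - y_bar) ** 2 / n) / sigma_y

    # Covariance from the double sum over all cells of the table.
    cov = (x_mid - x_bar) @ m @ (y_mid - y_bar) / n
    r = cov / (sigma_x * sigma_y)

    se_eta = (1 - eta ** 2) / np.sqrt(n)
    se_r = (1 - r ** 2) / np.sqrt(n)
    return eta, r, se_eta, se_r

# Hypothetical 3 x 4 table, for illustration only.
table = [[1, 1, 0, 0],
         [0, 1, 1, 0],
         [0, 0, 1, 1]]
eta, r, se_eta, se_r = eta_and_r(table, [1, 2, 3], [1, 2, 3, 4])
print(f"eta = {eta:.3f}, r = {r:.3f}")  # → eta = 0.853, r = 0.853
```

In this toy table the group means of Y lie exactly on a straight line, so the correlation ratio and the correlation coefficient coincide; with a curved empirical regression line, $\eta_{yx}$ would exceed $|r_{yx}|$.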

Figure. Correlation fields and regression lines for different values of the correlation ratio $\eta_{yx}$ and the correlation coefficient $r_{yx}$. The correlation field is shown as a set of points, each representing the result of a single measurement (observation); the regression line is a graphic representation of the functional dependence of attribute Y on attribute X. At $\eta_{yx} = 0$ (a) there is no definite relationship between them; at $\eta_{yx} = 1$ (b, g) there is an exact, one-to-one dependence; at $0 < \eta_{yx} < 1$ (d, e, i) attribute Y is only partially determined by attribute X (a large scatter of individual measurements is visible). Similarly, at $r_{yx} = 0$ (a, g) there is no linear relationship between Y and X; at $r_{yx} = +1$ (b) or $r_{yx} = -1$ the dependence between the attributes is linear (increasing for positive and decreasing for negative $r_{yx}$); at $0 < |r_{yx}| < 1$ (d, e) attribute Y is only partially determined by the values of X.

The value of the coefficient $r_{yx}$ lies between −1 and +1 ($-1 \le r_{yx} \le 1$). When $r_{yx} = 0$ (fig., a, g), there is no linear relationship between Y and X, i.e., they are uncorrelated. If $r_{yx} = +1$ or $r_{yx} = -1$ (fig., c), there is a one-to-one linear relationship between Y and X, and Y can be determined from a given value of X. For $r_{yx} = +1$ the relationship is direct (Y increases as X increases), and for $r_{yx} = -1$ it is inverse (Y decreases as X increases). When $0 < |r_{yx}| < 1$ (fig., d, e), attribute Y is only partially determined by attribute X, and Y can be determined more reliably only by taking into account additional attributes related to Y. Calculating $r_{yx}$ and $\eta_{yx}$ for the attributes X and Y summarized in the correlation table above shows that both the correlation coefficient and the correlation ratio are approximately 0.5, which corresponds to the graph in fig., d, i.e., a linear relationship between X and Y can be traced. Clarifying the nature of the relationship between these attributes, however, requires additional observations (examinations). In the case of nonlinear regression [$Y_x = f(x)$], the correlation coefficient underestimates the true strength of the relationship between Y and X (fig., i), and the correlation ratio $\eta_{yx}$ should be used instead.
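The remark that the correlation coefficient underestimates the strength of a nonlinear relationship can be checked numerically. The sketch below uses a hypothetical helper, `eta_r_raw`, that computes both measures from ungrouped data in which X takes a few discrete values, so each distinct X value plays the role of an interval. For Y = X² on a symmetric set of X values, Y is fully determined by X ($\eta_{yx} = 1$), yet the linear correlation vanishes ($r_{yx} = 0$); for an exact linear dependence both measures equal one.

```python
import numpy as np

def eta_r_raw(x, y):
    """Correlation ratio eta_yx and Pearson r_yx from ungrouped data.

    Each distinct value of x is treated as one group (interval).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    y_bar = y.mean()
    total = np.sum((y - y_bar) ** 2)  # total sum of squares of Y
    between = sum(                    # spread of group means around y_bar
        np.sum(x == g) * (y[x == g].mean() - y_bar) ** 2 for g in np.unique(x)
    )
    eta = np.sqrt(between / total)
    r = np.corrcoef(x, y)[0, 1]
    return eta, r

x = np.array([-2, -1, 0, 1, 2] * 4)
eta_quad, r_quad = eta_r_raw(x, x ** 2)    # exact but nonlinear dependence
eta_lin, r_lin = eta_r_raw(x, 3 * x + 1)   # exact linear dependence
print(f"Y = X^2:    eta = {eta_quad:.2f}, r = {r_quad:.2f}")
print(f"Y = 3X + 1: eta = {eta_lin:.2f}, r = {r_lin:.2f}")
```

For the same grouping, $\eta_{yx} \ge |r_{yx}|$ always holds, with equality when the group means of Y lie on a straight line.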

Multidimensional correlation analysis determines the equation and the strength of the relationship when more than two attributes are studied. Thus, if Y is a complex attribute whose outcome depends on the occurrence of a set of attributes X1, X2, ..., Xn, the experimental data must be used to determine: a) the equation of the relationship between Y and the set X1, X2, ..., Xn, i.e., $Y_{x_1x_2\ldots x_n} = F(x_1, x_2, \ldots, x_n)$; b) the strength of the relationship between Y and the set X1, X2, ..., Xn.

Preliminary processing of observation results in multidimensional C. a. consists in determining, for each pair of attributes, the correlation ratios $\eta_{yx_i}$ (i = 1, 2, ..., n) and $\eta_{x_ix_j}$ (i ≠ j), the correlation coefficients $r_{yx_i}$ and $r_{x_ix_j}$, and the pairwise regressions $Y_{x_i} = f_i(x_i)$. From these data the multiple regression equation $Y_{x_1x_2\ldots x_n} = F(x_1, x_2, \ldots, x_n)$, the multiple correlation ratio $\eta_{yx_1x_2\ldots x_n}$ and the multiple correlation coefficient $R_{yx_1x_2\ldots x_n}$ are then determined. The multiple regression equation makes it possible to determine the value of attribute Y from a set of values X1, X2, ..., Xn; i.e., given this equation, Y can be predicted from specific values of that set (e.g., results of analyses for attributes X1, X2, ..., Xn). The quantity $\eta_{yx_1x_2\ldots x_n}$ characterizes the strength of the relationship between Y and the set X1, X2, ..., Xn for an arbitrary function F, and $R_{yx_1x_2\ldots x_n}$ does so for the case when F is linear. Both coefficients take values between zero and one. Including additional attributes in multidimensional C. a. can bring $\eta_{yx_1x_2\ldots x_n}$ and $R_{yx_1x_2\ldots x_n}$ closer to one and thus increase the accuracy with which Y is predicted by the multiple regression equation.
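For the linear case, the multiple regression equation and the multiple correlation coefficient R can be sketched with NumPy's least-squares solver. The sketch includes an intercept term, which is an assumption; the function names are hypothetical and the data are synthetic. It also computes R a second way, from the pairwise coefficients $r_{yx_i}$ and $r_{x_ix_j}$ alone via $R^2 = \mathbf r_{yx}^{\mathsf T} \mathbf R_{xx}^{-1} \mathbf r_{yx}$, which mirrors the idea of first computing pairwise characteristics and then deriving the multiple ones from them.

```python
import numpy as np

def multiple_linear_regression(X, y):
    """Least-squares fit of y = b0 + b1*x1 + ... + bn*xn.

    Returns the coefficients [b0, b1, ..., bn] and the multiple correlation
    coefficient R (the correlation between y and its fitted values).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])  # prepend an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ coef
    R = np.sqrt(1 - residuals @ residuals / np.sum((y - y.mean()) ** 2))
    return coef, R

def multiple_r_from_pairs(X, y):
    """Multiple correlation coefficient from pairwise correlation coefficients."""
    C = np.corrcoef(np.column_stack([y, X]), rowvar=False)
    r_yx = C[0, 1:]   # r_{y x_i}
    R_xx = C[1:, 1:]  # r_{x_i x_j}
    return np.sqrt(r_yx @ np.linalg.solve(R_xx, r_yx))

# Synthetic data: three attributes plus noise (illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 1.0 + 0.6 * X[:, 0] + 0.8 * X[:, 1] - 0.2 * X[:, 2] + rng.normal(scale=0.5, size=200)

coef, R = multiple_linear_regression(X, y)
print("coefficients:", np.round(coef, 2))
print("R =", round(R, 3), "; from pairwise r:", round(multiple_r_from_pairs(X, y), 3))
```

Adding a column to `X` never lowers R computed on the same data; whether the extra attribute actually improves prediction for new cases has to be checked on data not used for fitting.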

As an example, consider the results of pairwise C. a., together with the multiple regression equation and the multiple correlation coefficient, for the following attributes: Y, persistent pseudoparesis; X1, lateralization of the motor defect to the right limbs; X2, the same for the left limbs; X3, vegetative crises. The correlation ratios and the pairwise correlation coefficients for them are, respectively, $\eta_{yx_1} = 0.429$, $\eta_{yx_2} = 0.616$, $\eta_{yx_3} = -0.334$ and $r_{yx_1} = 0.320$, $r_{yx_2} = 0.586$, $r_{yx_3} = -0.325$. The multiple linear regression equation is $Y_{x_1x_2x_3} = 0.638x_1 + 0.839x_2 - 0.195x_3$, and the multiple correlation coefficient is $R_{yx_1x_2x_3} = 0.721$. The example shows that persistent pseudoparesis can be predicted from X1, X2 and X3 with an accuracy sufficient for practice.

Methods of C. a. also make it possible to obtain dynamic characteristics of relationships. In this case the attributes studied (e.g., an ECG, EEG, etc.) are treated as random functions Y(t) and X(t). From observations of these functions, two main characteristics are again determined: a) an estimate of the operator (the mathematical equation) relating Y(t) and X(t); b) an estimate of the strength of the relationship between them. The dispersion and correlation functions of the random functions Y(t) and X(t) are taken as the characteristics of the strength of the relationship; these functions generalize the correlation ratio and the correlation coefficient. Thus, for each fixed value of t, the normalized mutual dispersion function $\eta_{yx}(t)$ is the correlation ratio between the values of Y(t) and X(t). Similarly, for each fixed value of t, the normalized mutual correlation function $R_{yx}(t)$ is the correlation coefficient between Y(t) and X(t). The characteristic of the linear relationship (dependence) between values of one and the same quantity at different moments of time is called autocorrelation.
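For stationary signals, a common way to estimate a normalized cross-correlation function from a single pair of recordings is to replace averages over realizations with averages over time, which assumes ergodicity. The sketch below uses that substitution and takes a time lag τ as its argument rather than an absolute moment t, so it is a related estimator, not a direct implementation of the ensemble functions described above. The function names are hypothetical, the two series are assumed to have equal length and sampling interval, and autocorrelation is obtained as the special case X = Y.

```python
import numpy as np

def cross_correlation(y, x, max_lag):
    """Normalized cross-correlation R_yx(tau), tau = 0..max_lag.

    Estimates corr(Y(t + tau), X(t)) from time averages, assuming stationary,
    ergodic signals of equal length sampled at equal intervals.
    """
    y = np.asarray(y, dtype=float) - np.mean(y)
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(y)
    scale = n * y.std() * x.std()
    return np.array([y[tau:] @ x[:n - tau] / scale for tau in range(max_lag + 1)])

def autocorrelation(x, max_lag):
    """Linear dependence of a signal on its own values at earlier moments."""
    return cross_correlation(x, x, max_lag)

# Synthetic example: y repeats x with a delay of 3 samples.
rng = np.random.default_rng(1)
x = rng.normal(size=500)
y = np.roll(x, 3)  # y[t] = x[t - 3] (circular shift)
R = cross_correlation(y, x, max_lag=10)
print("lag of maximum R_yx:", int(np.argmax(R)))  # → 3
print("autocorrelation at lags 0..3:", np.round(autocorrelation(x, 3), 2))
```

The peak of the cross-correlation at lag 3 recovers the built-in delay, and the autocorrelation equals one at lag 0 by construction.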

C. a. is one of the methods for solving the identification problem, which is widely used in building mathematical models and in automating medical and biological research and treatment.


V. N. Raybman, N. S. Raybman.