One Continuous Response to the Past
Growth Modeling with Binary Responses
Bengt O. Muthén , in Categorical Variables in Developmental Research, 1996
3.4 Implementation in Latent Variable Modeling Software
In the case of continuous response variables, Meredith and Tisak (1984, 1990) have shown that the random coefficient model of the previous section can be formulated as a latent variable model. For applications in psychology, see McArdle and Epstein (1987); for applications in education, see Muthén (1993) and Willett and Sayer (1993); and for applications in mental health, see Muthén (1983, 1991). For a pedagogical introduction to the continuous case, see Muthén, Khoo, and Nelson Goff (1994) and Willett and Sayer (1993). Muthén (1983, 1993) pointed out that this idea could be carried over to the binary and ordered categorical case. The basic idea is easy to describe. In Equation 1, α_i is unobserved and varies randomly across individuals. Hence, it is a latent variable. Furthermore, in the product term β_i t_k, β_i is a latent variable multiplied by a term t_k that is constant over individuals and can therefore be treated as a parameter. The t_k's may be fixed as in Equation 6, but with three or more time points they may be estimated for the third and later time points to represent nonlinear growth. More than one growth factor may also be used.
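The random-coefficient structure described here — a latent intercept α_i and latent slope β_i per individual, multiplied by fixed time scores t_k — can be sketched in a few lines. The distributions and numeric values below are purely illustrative, not taken from the chapter:

```python
import random

# Sketch of the random-coefficient growth model: each individual's trajectory
# is alpha_i + beta_i * t_k, where alpha_i and beta_i are latent (random)
# coefficients and the time scores t_k are constants shared by all individuals.
rng = random.Random(0)

t = [0.0, 1.0, 2.0, 3.0]  # fixed time scores; later ones could be estimated

def simulate_individual():
    alpha_i = rng.gauss(10.0, 1.0)   # latent intercept (hypothetical mean/sd)
    beta_i = rng.gauss(0.5, 0.2)     # latent slope (hypothetical mean/sd)
    return [alpha_i + beta_i * tk + rng.gauss(0.0, 0.3) for tk in t]

curves = [simulate_individual() for _ in range(500)]

# The average trajectory is roughly linear in t with slope near 0.5.
means = [sum(y[k] for y in curves) / len(curves) for k in range(len(t))]
```

Freeing the later t_k's (instead of fixing them at 2.0 and 3.0) is what lets the model represent nonlinear growth.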
URL: https://www.sciencedirect.com/science/article/pii/B9780127249650500055
Numerical Prediction
Robert Nisbet Ph.D. , ... Ken Yale D.D.S., J.D. , in Handbook of Statistical Analysis and Data Mining Applications (Second Edition), 2018
Data Mining and Machine Learning Algorithms Used in Numerical Prediction
The most common machine-learning algorithms used for predicting continuous response variables are
- CART,
- neural nets,
- decision trees,
- SVMs and other kernel techniques.

These algorithms were also introduced in Chapter 9 in their classification forms.
The following sections may revisit some characteristics of these machine-learning algorithms, but the discussion here focuses on their application in numerical prediction.
Numerical Prediction With CART
CART can be used for both regression problems and classification problems. In prediction, the continuous dependent (or response) variable Y is treated much as in regression: predicted values are continuous numbers rather than categories. The continuous predictor variables are "binned"; that is, their ranges are divided into subranges using calculated split points. Each bin can participate in the formation of a number of if-then logical conditions. As was shown in Chapter 9, these if-then statements can be combined to form a tree structure. The tree is grown along a particular branch using a split criterion — the Gini score for classification, or variance reduction for a continuous response — until one of the stopping criteria halts splitting along that path.
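The split-point search can be sketched in a few lines using variance reduction, the standard CART criterion for a continuous response (the data here are invented, and this is a minimal illustration, not any tool's implementation):

```python
# Minimal sketch of how a CART-style regression split point can be chosen:
# scan candidate split points and keep the one that most reduces the sum of
# squared deviations (within-node variance) of the response.

def sse(ys):
    """Sum of squared deviations from the node mean."""
    if not ys:
        return 0.0
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)

def best_split(xs, ys):
    """Return (split_point, total_sse) minimizing SSE over the two bins."""
    pairs = sorted(zip(xs, ys))
    best = (None, sse(ys))
    for i in range(1, len(pairs)):
        split = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint candidate
        left = [y for x, y in pairs if x <= split]
        right = [y for x, y in pairs if x > split]
        total = sse(left) + sse(right)
        if best[0] is None or total < best[1]:
            best = (split, total)
    return best

# Toy data with an obvious break at x = 5: low responses below, high above.
xs = [1, 2, 3, 4, 6, 7, 8, 9]
ys = [1.0, 1.1, 0.9, 1.0, 5.0, 5.2, 4.9, 5.1]
split, _ = best_split(xs, ys)   # split lands midway between 4 and 6, at 5.0
```

Repeating this search recursively on each resulting bin, until a stopping criterion is met, grows the full regression tree.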
The Tree Structure
A tree was built in STATISTICA Data Miner on an industrial failures data set, the first few nodes of which are shown in Fig. 10.11.
Fig. 10.11. First three nodes of a decision tree, showing one terminal node.
The first variable (with the highest ranking) is split to separate cases with values less than or equal to 9.86 from those with values greater than 9.86. Node 3 does not split any further because one of the stopping criteria has been met (the 15 cases in this node fall below the 1060-case minimum set in the tool). The SQL for the node is as follows:
/* Selecting cases related to Node 3 */
SELECT * FROM < TABLE >
WHERE ( ("RESP_DEF" > 9.86)
);
/* Assigning values related to Node 3 */
UPDATE < TABLE >
SET NODEID = 3, PREDVAL = 2.36, VARIVAL = 3.42
WHERE ( ("RESP_DEF" > 9.86)
);
From this SQL statement, a business rule can be induced: "If the value of the variable RESP_DEF is > 9.86, then assign the predicted value 2.36" (with an associated variance value, VARIVAL, of 3.42). A similar (but more complex) business rule could be induced from the statement for terminal node 46.
The SQL assignment statement at node 46 (not shown in Fig. 10.11) is as follows:
/* Assigning values related to Node 46 */
UPDATE < TABLE >
SET NODEID = 46, PREDVAL = -1.58286355732380e-001, VARIVAL = 1.86424905201287e-003
WHERE ( ("RESP_DEF" <= 9.85776807826836e+000)
And ("DR3" <= 9.29624349632859e+000)
And ("PRE_L_DS1" > -7.37056366674857e-001)
And ("RESP_DEF" <= 6.67860053006884e-002)
And ("PRE_L_DS1" > 3.83034634488327e-002)
And ("RESP_DEF" <= -4.28291478552872e-002)
And ("PRE_L_DS1" > 3.63470956904821e-001)
And ("RESP_DEF" > -6.63181092458534e-002)
);
(Note, values are not rounded in this example.)
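A terminal node's WHERE clause is simply a conjunction of threshold tests, so the induced business rule can be applied to a record programmatically. A minimal Python sketch reusing the node-3 values from the SQL above (the rule representation itself is hypothetical, not a tool export):

```python
# Sketch: a terminal node's WHERE clause is a conjunction of threshold tests,
# so the induced business rule can be applied directly to a record.
# Field names follow the SQL above; the rule structure is illustrative.
node3_rule = [("RESP_DEF", ">", 9.86)]

def matches(record, conditions):
    """True if the record satisfies every (field, operator, value) test."""
    ops = {">": lambda a, b: a > b, "<=": lambda a, b: a <= b}
    return all(ops[op](record[name], value) for name, op, value in conditions)

record = {"RESP_DEF": 10.2}
if matches(record, node3_rule):
    record["NODEID"], record["PREDVAL"] = 3, 2.36  # values from the node-3 SQL
```

The longer node-46 rule is the same pattern with eight conditions in the list instead of one.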
Model Results Available in CART
Most data mining tool packages that provide CART make a standard set of report tables and charts available. These features are described next.
Variable Importance Tables
The variable importance table gives an overall expression of each variable's importance across all the splits in the tree (Table 10.1). Note that the variable with the highest importance value (DR3) is not the variable used for the first split; it draws its importance from its participation in many splits throughout the tree.
Table 10.1. Variable Importance Table Generated by CART
Predictor Importance 1 (fail_tsf.STA); Dependent Variable: TOT_DEFS; Options: Continuous Response, Tree Number 1

| Variable | Rank | Importance |
|---|---|---|
| DR3 | 100 | 1.000000 |
| PF_DS | 93 | 0.930011 |
| PF_AOL | 93 | 0.928640 |
| RESP_DEF | 90 | 0.899691 |
| PRE_L_DS1 | 89 | 0.886784 |
| RESP_AVE | 84 | 0.842368 |
| PF_SR | 78 | 0.783173 |
| DR2 | 49 | 0.493977 |
| PF_IC | 44 | 0.438273 |
| PF_PRE | 41 | 0.409875 |
Observed Versus Predicted Plots
Much information about the performance of the prediction algorithm is available by plotting the observed and predicted values, as shown in Fig. 10.12.
Fig. 10.12. Observed vs predicted values.
Normal Probability Plots of the Residuals
In Fig. 10.13, the normal probability plot of the residuals shows that the large majority of the residuals are well behaved; that is, they fall near a straight line. Some cases shown on the lower left and upper right of the plot are anomalous, but in general, the plot suggests that the model is valid.
Fig. 10.13. Normal probability plot.
URL: https://www.sciencedirect.com/science/article/pii/B9780124166325000104
Residualized Categorical Phenotypes and Behavioral Genetic Modeling
Scott L. Hershberger , in Categorical Variables in Developmental Research, 1996
2 WEIGHTED LEAST-SQUARES ESTIMATION
The covariance structure of the observed variables is frequently analyzed using Pearson product-moment correlations or covariances. One justification for treating observed categorical variables (y) as if they were continuous rests on the oftentimes implicit assumption that a latent continuous response variable (y*) underlies each y. For a dichotomous y:

y = 1 if y* > τ, and y = 0 otherwise,

where τ is a threshold value on the latent continuous variable beyond which the individual is placed in the affirmative category. Several consequences follow from this implicit assumption. First, consider the usual measurement model for the p × 1 vector y*:
y* = Λη + ε,

where η is a vector of latent factors, Λ the matrix of loadings, and ε a vector of measurement errors. In general, y ≠ y*, so y ≠ Λη + ε.
Thus, the measurement model for the latent continuous y* does not hold for the observed categorical y. Additionally, the distribution of y* almost certainly differs from that of y; even if y* is multivariate normal, the distribution of a dichotomous y will still be very different. Moreover, Σ*, the population covariance matrix of y*, will generally not equal Σ, the population covariance matrix of the observed y. Thus, if (6) represents the covariance structure hypothesis, Σ* = Σ(θ) but Σ ≠ Σ(θ). If S is a consistent estimator of Σ and S* is a consistent estimator of Σ*, parameters based on S will be inconsistent estimators of the parameters in θ. More simply phrased, the sample values of the parameters will not equal their true population counterparts.
One may question how robust the results of normal-theory structural equation modeling are when categorical variables are treated as if they were continuous. Pearson correlation coefficients between categorized variables are generally smaller than those between the corresponding continuous variables (Bollen & Barb, 1981; Henry, 1982; J. A. Martin, 1982; Olsson, Drasgow, & Dorans, 1982; Wylie, 1976). Boomsma (1982) and Babakus, Ferguson, and Jöreskog (1987) found that with ML estimation, when the observed variables were highly skewed, asymptotic chi-square goodness-of-fit values were inflated. Muthén and Kaplan (1985) also found that extreme kurtoses inflated the value of chi-square. Anderson and Amemiya (1988) and Amemiya and Anderson (1990) report, however, that chi-square was relatively robust against departures from normality if the latent variables were independent. Unfortunately, procedures for determining the independence of latent variables are not available. The number of categories of a variable does not seem to affect chi-square, but when the number of categories is very small it does induce correlated errors (Johnson & Creech, 1983) and inaccurate asymptotic standard errors (Boomsma, 1983).
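The attenuation reported by Bollen and Barb (1981) is easy to reproduce by simulation: dichotomizing one member of a correlated bivariate-normal pair shrinks the Pearson correlation. A sketch with illustrative parameters:

```python
import math
import random

# Illustration of correlation attenuation: dichotomize one member of a
# correlated bivariate-normal pair and compare the Pearson correlations.
rng = random.Random(42)
rho = 0.6
n = 20000

def pearson(a, b):
    """Pearson product-moment correlation of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

z1 = [rng.gauss(0, 1) for _ in range(n)]
z2 = [rho * x + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for x in z1]
d2 = [1.0 if z > 0 else 0.0 for z in z2]   # dichotomize at threshold tau = 0

r_continuous = pearson(z1, z2)    # close to the true rho of 0.6
r_dichotomized = pearson(z1, d2)  # noticeably attenuated
```

With a threshold at the median, the point-biserial correlation is attenuated to roughly 0.8 of rho; more extreme thresholds (the skewed case studied by Boomsma) attenuate it further.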
Given the severe distortions that may occur when categorical variables are analyzed under the assumption of multivariate normality, WLS (Jöreskog & Sörbom, 1988) and asymptotically distribution-free (Bentler, 1989) methods have been developed that do not require that assumption. Specifically, the recommended procedure (Jöreskog & Sörbom, 1988) for analyzing ordered, observed categorical variables is first to invoke a threshold model, as previously described, and compute the tetrachoric or polychoric correlations between the variables, using as an estimate of the threshold

τ̂_k = Φ⁻¹( Σ_{j=1}^{k} N_j / N ),  k = 1, …, c − 1,

where Φ⁻¹(·) is the inverse of the standardized normal distribution function, N_k is the number of cases in the kth category, N is the total number of cases, and c is the total number of categories for y. With a dichotomous variable, only one threshold is computed. Actually, behavioral geneticists have used the threshold model for a number of years, under the term "liability model" (Falconer, 1981), to avoid the biases inherent in Pearson correlations computed between categorical variables. The tetrachoric correlations (or polychoric, if there are more than two categories) are analyzed with a WLS (distribution-free) fitting function, thereby avoiding the normal-theory-based ML or generalized least-squares fitting functions and obtaining asymptotically correct chi-square values and standard errors.
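The threshold estimate is computable directly from category counts with the inverse standard-normal function. A minimal sketch (the `thresholds` helper is hypothetical, written for illustration):

```python
from statistics import NormalDist

# Threshold estimate described above: tau_k is the inverse standard-normal
# value of the cumulative proportion of cases at or below category k.
def thresholds(counts):
    """counts[k] = number of cases in category k; returns c - 1 thresholds."""
    n = sum(counts)
    cum = 0
    taus = []
    for nk in counts[:-1]:          # the last category needs no threshold
        cum += nk
        taus.append(NormalDist().inv_cdf(cum / n))
    return taus

# Dichotomous example: 70% of cases below the threshold, 30% above.
tau = thresholds([70, 30])[0]       # inverse-normal of 0.70, about 0.524
```

For a polychotomous variable the same function returns c − 1 increasing thresholds, one per category boundary.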
What at first appears to be a panacea for the analysis of categorical variables is compromised by several difficulties. WLS fitting functions computed on more than a few variables are computationally demanding, requiring the fourth-order moments of the measured variables. When sample size is small or model degrees of freedom are large, WLS performs inaccurately as an estimator, producing biased chi-square values (Chou, Bentler, & Satorra, 1991). In particular, Hu, Bentler, and Kano (1992) found distribution-free estimation to perform "spectacularly badly" in every case of their Monte Carlo study (rejecting true models 93 to 99.9% of the time) when sample size was below 5000. More fundamentally, the phenotype is analyzed as if a continuous normal distribution for it exists in the population. For some phenotypes, such as schizophrenia, this is a reasonable assumption; for others, such as gender (chromosomally defined), it is not.
Few behavioral genetic studies have sample sizes even one-tenth of 5000. The problem, then, is to identify a method of estimation appropriate to categorical data without resorting to a distribution-free method such as WLS. A satisfactory solution would entail the use of normal-theory methods of estimation, given their well-known desirable asymptotic properties. The answer may lie in exploiting a procedure that is vital to behavioral genetic analysis: the twin study. Twin correlations are typically corrected for age and gender effects. Because identical and same-sex fraternal twins share the same age and gender, age and gender, if significantly related to the phenotype, will inflate the twin correlations. McGue and Bouchard (1984) found that, without the correction, twin correlations were consistently overestimated, which in a majority of cases distorted the estimated magnitude of the genetic effect. Note what occurs if a polychotomous variable is residualized for the effects of these two variables: the polychotomous variable itself becomes continuous. The continuous residual distribution results from fitting a regression line to the c horizontal lines in the regression plane, c equaling the number of categories of the variable. For instance, in the case of a dichotomous variable scored 0, 1, if the regression line is not parallel to the two horizontal lines y = 0 and y = 1, a continuous residual distribution is created. With the subsequent application of a nonlinear transformation to the residuals, the distribution of the previously categorical variable may well approximate a normal distribution, allowing greater freedom for the use of normal-theory methods of estimation.
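The residualization argument can be checked with a toy simulation: regress a 0/1 variable on a continuous covariate and count the distinct residual values. A sketch with invented data (the age-dependence of the dichotomous variable is purely illustrative):

```python
import random

# Sketch of the residualization argument: regressing a 0/1 variable on a
# continuous covariate (age, say) leaves residuals that are no longer
# confined to two values. Data and coefficients are purely illustrative.
rng = random.Random(7)
age = [rng.uniform(20, 80) for _ in range(200)]
y = [1.0 if rng.random() < (a - 20) / 60 * 0.8 else 0.0 for a in age]

# Ordinary least squares for y = b0 + b1 * age.
n = len(age)
mx, my = sum(age) / n, sum(y) / n
b1 = sum((a - mx) * (v - my) for a, v in zip(age, y)) / sum((a - mx) ** 2 for a in age)
b0 = my - b1 * mx
residuals = [v - (b0 + b1 * a) for a, v in zip(age, y)]

# Far more than the original two values: the residuals form two tilted bands,
# one per category, each spread continuously along the age axis.
distinct = len(set(round(r, 6) for r in residuals))
```

A nonlinear transformation of these residuals (as proposed above) can then pull the two bands toward an approximately normal shape.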
In this study, behavioral genetic model-fitting was performed on 14 medication variables, each indicating whether an individual was presently taking the medication. Two-category variables were selected rather than multiple-category variables because of the greater frequency of the former in behavioral genetics and for greater simplicity of presentation. The method is equally applicable to polychotomous variables. The correlations for four twins groups served as the data: monozygotic (identical) twins reared together (MZT) and apart (MZA), and dizygotic (fraternal) twins reared together (DZT) and apart (DZA). Structural equation modeling was performed under two conditions. In the first condition, a WLS analysis was performed on the variables, directly incorporating the covariates of age and gender into the model. In the second condition, ML estimation was performed on the nonlinearly transformed age and gender residualized variables. Two questions are of interest: how well does the nonlinearly transformed age and gender residualized variable approximate normality, and how comparable are the WLS and ML solutions?
URL: https://www.sciencedirect.com/science/article/pii/B978012724965050016X
Restricted maximum likelihood and inference of random effects in linear mixed models
Xian Liu , in Methods and Applications of Longitudinal Data Analysis, 2016
4.7 Summary
In longitudinal data analysis, one of the most remarkable developments of the past three decades is the widespread application of Bayes-type techniques. Bayes' theorem and Bayesian inference provide a strong theoretical foundation for approximating unobservable parameters in mixed-effects models. The REML approach, which corrects the downward bias in the ML variance estimates, is an empirical Bayes method that models the marginal posterior predictive density for the variance components while formally integrating out the regression coefficient vector β. Therefore, in this chapter I first described the basic specifications of Bayesian inference prior to introducing the REML estimator. The REML method is arguably the more reliable estimator for finding parameter estimates in linear mixed models; nevertheless, for large samples the ML and REML estimators usually yield very close or even identical parameter estimates, as empirically evidenced in Section 4.6.
In this chapter, the statistical techniques for predicting the random effects were also delineated and discussed. In longitudinal data analysis, linear predictions are often required to generate individual trajectories of the continuous response variable. Population-averaged growth curves can also be predicted by averaging over the random effects. In linear predictions, the BLUP and the shrinkage approach are regularly applied to approximate the random effects and predict the outcomes for each subject. In Section 4.6, a technique was delineated to adjust for potential confounding effects when creating population-averaged trajectories. In particular, a scoring dataset was constructed by retaining the variables of interest and creating others to represent a hypothetical population.
Given the iid assumption for random errors in linear mixed models, linear predictions with adjustments for confounding effects can also be conducted by using least squares means, as will be described and illustrated in the next chapter. Briefly, least squares means are obtained by using the estimated regression coefficients, the selected covariates' values, and averages over the distribution of the random effects. Whereas the scoring data approach directly computes the mean of the BLUPs with shrinkage, the model-based least squares means approach assumes the longitudinal data to be balanced, and it can thus generate different predictions for population groups. Under that condition, however, the two approaches are expected to yield exactly the same predicted values of the response for the entire population. At the same time, because the BLUP of the random effects is shrunk toward the population average, the least squares means are associated with greater standard error estimates than the scoring data approach. These issues will be further discussed in Chapter 7.
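The shrinkage idea can be sketched for a random-intercept model with the textbook BLUP form — the subject-level deviation from the fixed prediction is multiplied by λ_i = σ_b² / (σ_b² + σ_e²/n_i) — which is a standard result, not necessarily this chapter's exact notation; all numbers below are hypothetical:

```python
# Sketch of BLUP shrinkage for a random-intercept model (textbook form):
# the subject-level deviation from the fixed prediction is shrunk by
# lambda_i = var_b / (var_b + var_e / n_i), so subjects with fewer
# observations are pulled harder toward the population average.
def blup_intercept(subject_mean, fixed_pred, var_b, var_e, n_i):
    lam = var_b / (var_b + var_e / n_i)
    return lam * (subject_mean - fixed_pred)

# A subject observed 4 times, with a raw deviation of 2.0 from the
# fixed-effects prediction: half the deviation is shrunk away.
b_hat_small = blup_intercept(12.0, 10.0, var_b=1.0, var_e=4.0, n_i=4)

# The same deviation observed over 100 occasions is barely shrunk.
b_hat_large = blup_intercept(12.0, 10.0, var_b=1.0, var_e=4.0, n_i=100)
```

This is why the scoring-data approach, which averages shrunken BLUPs, carries smaller standard errors than the unshrunken least squares means.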
URL: https://www.sciencedirect.com/science/article/pii/B9780128013427000046
Linear mixed-effects models
Xian Liu , in Methods and Applications of Longitudinal Data Analysis, 2016
3.4.1 Polynomial time functions
In longitudinal data analysis, the continuous time factor can be partitioned into a set of polynomial terms. I start with a combination of the time and time × time terms, referred to as the quadratic, or second-order, polynomial time function. In this approach, time is partitioned into two components: continuous time T and the interaction of time with itself, denoted by T × T. In linear mixed models, the estimated regression coefficients of these two time components reflect a nonlinear time trend in the continuous response variable. This simple polynomial function is very flexible and can capture a number of time functions in longitudinal data. If the regression coefficients of T and T × T take the same sign and both are statistically meaningful, the associated time trend resembles an exponential curve, increasing or decreasing. If the regression coefficient of T is negative but the regression coefficient of T × T is positive, the time trend in Y approximates a U-shaped pattern. Likewise, if the regression coefficient of T is positive and the regression coefficient of T × T is negative, the repeated measurements of Y take an inverse U-shaped pattern of change over time.
Given the assumption that the continuous measurement Y at time T, denoted by Y(T), is a function of T and T × T, the quadratic polynomial time function can be written as a linear regression:

Y(T) = β0 + β1 T + β2 (T × T) + ɛ,

where β0 is the intercept, β1 and β2 are the regression coefficients of T and T × T, respectively, and ɛ, the residual term, is assumed to be normally distributed with zero expectation.
To illustrate the flexibility of the quadratic polynomial time function, four cases, denoted A to D, are constructed by arbitrarily assigning different values to β0, β1, and β2. These coefficients are then used to predict the value of Y at six time points, T = 0, 1, 2, 3, 4, and 5. Given zero expectation, the residual term is not involved in the linear prediction. Figure 3.1 graphically displays the results.
Figure 3.1. Predicted Time Trend for Four Cases
(a) Case A, (b) Case B, (c) Case C, and (d) Case D.
Figure 3.1 includes four plots, each corresponding to a specific case. The first plot, Fig. 3.1a, displays an exponentially increasing time trend in Y, given positive values of both β1 and β2. In this case β2, the regression coefficient of the quadratic term, behaves as an accelerating factor that steadily increases the rate of change over time. Figure 3.1b displays the opposite time trend to Case A: the predicted value of Y decreases exponentially over time, with the rate of decrease governed by the negative value of β2. Figure 3.1c presents an inverse U-shaped pattern of change over time, given a positive β1 and a negative β2. In this case, the negative β2 behaves as an offsetting effect that steadily reduces the rate of increase in Y. This offsetting effect on the positive main effect of time grows stronger over time, and beyond a time threshold the combined effect of the two time components ushers in a decrease in Y. In Fig. 3.1d, β2 functions in the opposite direction from Case C: the negative main effect of time on Y is increasingly compensated by the positive effect of squared time, eventually leading to a U-shaped time trend. With different combinations of β1 and β2, the quadratic polynomial function can capture still more curvilinear patterns.
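The inverse-U case can be sketched numerically with hypothetical coefficients (not the chapter's Case values); the time threshold beyond which Y starts to decrease is the parabola's vertex, T = −β1 / (2 β2):

```python
# Quadratic time-trend sketch with hypothetical coefficients: a positive
# beta1 with a negative beta2 yields an inverse U-shape, peaking at the
# vertex T = -beta1 / (2 * beta2).
def predict(b0, b1, b2, t):
    return b0 + b1 * t + b2 * t * t

b0, b1, b2 = 2.0, 3.0, -0.5                      # illustrative inverse-U case
ys = [predict(b0, b1, b2, t) for t in range(6)]  # T = 0, 1, ..., 5

peak_t = -b1 / (2 * b2)   # the time threshold where the trend turns downward
```

Flipping the signs of β1 and β2 produces the U-shaped Case D pattern instead.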
For more complex patterns of change over time, higher-order polynomial functions occasionally need to be used. For example, the time trend shown in Fig. 2.2 cannot be generated by the quadratic polynomial time function. Instead, the cubic polynomial function may be applied by adding a third-order polynomial term. In this case, a linear model is specified with three time polynomials, given by

Y(T) = β0 + β1 T + β2 T² + β3 T³ + ɛ,

where T³ is the cubic polynomial term of time. Using a set of arbitrarily assigned coefficient values, a plot of the pattern of change over time is displayed at the six time points (T = 0, 1, 2, 3, 4, 5).
Figure 3.2 displays a pattern of change over time in Y that bears a strong resemblance to the predicted pattern of change in the PCL score among those receiving acupuncture treatment (Fig. 2.2), though with two additional time points.
Figure 3.2. An Inverted Flat J-Shaped Time Trend from the Cubic Polynomial Function
URL: https://www.sciencedirect.com/science/article/pii/B9780128013427000034
Source: https://www.sciencedirect.com/topics/mathematics/continuous-response-variable