# Bootstrap-based standard error for DETECT

## Objectives of Inquiry

The purpose of this research is to develop and evaluate a method to estimate standard errors for the statistics provided by the DETECT nonparametric dimensionality estimation procedure. The current version of DETECT does not provide standard errors for the DETECT estimator, IDN index, or Ratio *r*, which limits the usefulness of these statistics, especially for applications with small to moderate sample sizes. Specifically, a bootstrap resampling approach is applied, and a simulation study is carried out to evaluate the resulting standard error estimator. The estimated standard error is compared to the standard deviation of the DETECT statistics over multiple independent simulated trials, for both unidimensional and multidimensional simulation models.

## Source of information

A simulation study will be conducted to evaluate the proposed standard error estimator. The simulation models (item parameters, ability distributions, and dimensionality structures) will be based on those used by Roussos and Ozbek (2006), who calculated the DETECT population parameter and compared it to the empirically determined (through very large sample sizes) DETECT estimator for a variety of unidimensional and multidimensional models. By using this particular choice of models (including a variety of test lengths, correlations, dimensionality structures, and items per dimension), we also know from the results of Roussos and Ozbek the value of the true DETECT population parameter, which we can compare with the mean of the DETECT estimator over the multiple simulation trials, adding another dimension (pardon the pun) to our study.

## Method

*DETECT Theoretical Background*

DETECT, a nonparametric dimensionality assessment procedure, is based on the idea that items measuring the same dimension on a multidimensional test exhibit positive conditional covariances, and items measuring different dimensions exhibit negative conditional covariances. The conditional covariance parameter estimated by DETECT is the covariance between two items, conditional on the unidimensional latent trait best estimated by the test, and averaged over the distribution of that latent trait. In a multidimensional setting, this latent trait would be some fixed linear composite of the multiple latent traits. See Zhang and Stout (1999a, 1999b) for more details on the theory underlying DETECT and on how the DETECT conditional covariance estimation is carried out.

After estimating conditional covariances among all the item pairs, DETECT divides the test items into a set of mutually exclusive and collectively exhaustive clusters in a search for the set of clusters for which all the within-cluster item pairs have positive conditional covariance while all the between-cluster pairs have negative conditional covariances. A genetic algorithm is used to accomplish this search of all possible clusterings. The DETECT dimensionality estimator is the sum of the conditional covariances for the within-cluster item pairs and the negative of the sum of the conditional covariances for the between-cluster item pairs. It is actually the DETECT dimensionality estimator that the genetic algorithm search is trying to maximize. To avoid capitalization on chance, the data are split into a training sample and a cross-validation sample. DETECT uses the training sample to find the optimal clusters, and then it uses those clusters with the cross-validation sample to calculate the DETECT dimensionality estimator.
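As a minimal sketch of the estimator described above, the following Python function computes the signed sum of conditional covariances for a given partition of the items. The function name `detect_index`, the matrix `ccov`, and the numeric values are hypothetical and chosen only for illustration; the sketch follows the verbal description in this section and does not reproduce any normalization or rescaling that the DETECT software may apply, nor the genetic algorithm search or the cross-validation split.

```python
from itertools import combinations


def detect_index(ccov, clusters):
    """Signed sum of conditional covariances for one item partition.

    ccov     -- symmetric matrix (list of lists) of estimated conditional
                covariances between item pairs (hypothetical input)
    clusters -- cluster label of each item, in item order
    """
    total = 0.0
    for i, j in combinations(range(len(clusters)), 2):
        # Within-cluster pairs are added, between-cluster pairs subtracted.
        sign = 1.0 if clusters[i] == clusters[j] else -1.0
        total += sign * ccov[i][j]
    return total


# Hypothetical 4-item example: items 0-1 and items 2-3 measure two dimensions.
ccov = [
    [0.00, 0.03, -0.02, -0.01],
    [0.03, 0.00, -0.01, -0.02],
    [-0.02, -0.01, 0.00, 0.04],
    [-0.01, -0.02, 0.04, 0.00],
]
matching = detect_index(ccov, [0, 0, 1, 1])     # partition that matches the signs
mismatched = detect_index(ccov, [0, 1, 0, 1])   # partition that cuts across them
```

In this toy example the partition that groups the positively covarying items together yields the larger value, which is the quantity the cluster search tries to maximize.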

*Bootstrap Resampling Method*

The bootstrap method estimates a standard error by randomly and independently resampling with replacement from a dataset. Standard error estimation using bootstrap resampling is not a formula-based method but rather a computer-based method of statistical inference (Efron & Tibshirani, 1998). Fundamentally, the bootstrap standard error is the standard deviation of the statistic of interest across the bootstrap resamples. The bootstrap resampling procedure is as follows.

1. Resample with replacement from the original data to obtain a dataset with the same sample size as the original data.
2. Calculate the statistic (or statistics) of interest on the bootstrap sample.
3. Repeat steps 1 and 2 to obtain a large number of bootstrap samples and corresponding sample statistics.
4. Calculate the mean of the statistic of interest over all the bootstrap samples.
5. Calculate the standard error using Equation (1).

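$$\widehat{SE}_{B} = \sqrt{\frac{1}{B-1}\sum_{b=1}^{B}\left(\hat{\theta}^{*}_{b} - \bar{\theta}^{*}\right)^{2}} \tag{1}$$

where $\hat{\theta}^{*}_{b}$ is the value of the statistic of interest for the $b$th bootstrap sample, $\bar{\theta}^{*}$ is the mean of the $\hat{\theta}^{*}_{b}$, and $B$ is the number of bootstrap replications.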
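The resampling procedure above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the code used in the study: the names `bootstrap_se` and `sample_mean` are hypothetical, the sample mean stands in for a DETECT statistic (which would require the DETECT software itself), and the divisor $B - 1$ follows the usual Efron and Tibshirani convention. For DETECT, each element of `data` would be one examinee's full response vector, so that examinees are resampled as whole rows.

```python
import math
import random


def bootstrap_se(data, statistic, n_boot=100, seed=0):
    """Bootstrap standard error of `statistic`, centered on the bootstrap mean."""
    rng = random.Random(seed)
    n = len(data)
    values = []
    for _ in range(n_boot):
        # Step 1: resample n observations with replacement.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        # Step 2: compute the statistic on the bootstrap sample.
        values.append(statistic(resample))
    # Step 4: mean of the bootstrap statistics.
    mean = sum(values) / n_boot
    # Step 5: standard deviation of the bootstrap statistics (divisor B - 1
    # is an assumption of this sketch).
    variance = sum((v - mean) ** 2 for v in values) / (n_boot - 1)
    return math.sqrt(variance), values


def sample_mean(xs):
    """Stand-in statistic; a DETECT statistic would be computed here instead."""
    return sum(xs) / len(xs)


# Hypothetical data: 500 draws from a standard normal distribution.
data_rng = random.Random(1)
data = [data_rng.gauss(0.0, 1.0) for _ in range(500)]
se, values = bootstrap_se(data, sample_mean, n_boot=400)
```

For the sample mean the result can be checked against the analytic value, roughly $1/\sqrt{500} \approx 0.045$ here; no such closed form is available for the DETECT statistics, which is the motivation for the resampling approach.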

*DETECT v.2.1*

DETECT v.2.1 is the newest version of DETECT. It provides three DETECT statistics (the DETECT estimator, the IDN index, and Ratio *r*), together with the number of clusters and the list of the items in each cluster. When the DETECT estimator, IDN index, and Ratio *r* indicate multidimensionality, the number of clusters is interpreted as the number of dimensions, and inspecting the list of items in each cluster should indicate the distinct substantive nature of each dimension. If the DETECT estimator, IDN, and *r* indicate unidimensionality, the number of clusters and the items in each cluster should be ignored.

Preliminary simulation studies have suggested that DETECT estimator values of 0 to 0.2 indicate essential unidimensionality to weak multidimensionality, values of 0.2 to 0.4 indicate weak to moderate multidimensionality, values of 0.4 to 1.0 indicate moderate to strong multidimensionality, and values greater than 1.0 indicate strong multidimensionality. Also, when the values of Ratio *r* and IDN are both greater than 0.7, the test is typically taken to have approximate simple structure.
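The interpretation guidelines above amount to a simple lookup, sketched below. The function names are hypothetical, and two details are assumptions of the sketch because the guidelines do not specify them: each boundary value is assigned to the higher category, and estimates below 0 are placed in the lowest category.

```python
def classify_detect(value):
    """Map a DETECT estimator value to the guideline category."""
    if value < 0.2:
        return "essential unidimensionality to weak multidimensionality"
    if value < 0.4:
        return "weak to moderate multidimensionality"
    if value <= 1.0:
        return "moderate to strong multidimensionality"
    return "strong multidimensionality"


def has_approximate_simple_structure(ratio_r, idn, cutoff=0.7):
    """Guideline check: both Ratio r and IDN exceed the cutoff."""
    return ratio_r > cutoff and idn > cutoff


labels = [classify_detect(v) for v in (0.05, 0.25, 0.60, 1.30)]
```

A point classification of this kind ignores sampling variability, which is why a standard error is needed when an estimate falls close to 0.2, 0.4, or 1.0.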

*Simulation Study*

The purpose of this research is to develop a bootstrap method for estimating the standard errors of the DETECT statistics and to evaluate that method. To evaluate the bootstrap standard errors, we performed a Monte Carlo simulation study in which the means and standard errors of the DETECT statistics were estimated from 400 randomly generated samples from a fixed unidimensional or multidimensional IRT model. The means and standard errors over the 400 trials served as the criteria against which those of the bootstrap method were evaluated.

For the bootstrap method, we used two different equations to estimate the standard error, which differ only in the value around which the deviations are taken: 1) the mean of the bootstrap values (Efron’s bootstrap method) and 2) the DETECT statistic of the first generated data set, which we treated as the original data (the modified method). The procedure for the modified bootstrap method is exactly the same as for Efron’s method, except that when the standard error was estimated, the DETECT statistics of the original data were used instead of the mean of the bootstrap samples (Equation (2)).

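$$\widehat{SE}_{B}^{\,\mathrm{mod}} = \sqrt{\frac{1}{B-1}\sum_{b=1}^{B}\left(\hat{\theta}^{*}_{b} - \hat{\theta}\right)^{2}} \tag{2}$$

where $\hat{\theta}^{*}_{b}$ is the value of the statistic for the $b$th bootstrap resample, $\hat{\theta}$ is the DETECT estimator of the original data, and $B$ is the number of bootstrap replications. Using the DETECT estimator of the original data as the center for the standard error calculation is a reasonable alternative to using the mean of the bootstrap samples.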
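The two centering choices can be compared directly once the bootstrap values are available. The sketch below is illustrative only: the helper `se_about`, the bootstrap values, and the original estimate are all hypothetical, and the divisor $B - 1$ is an assumption applied to both versions so that they differ only in the center.

```python
import math


def se_about(values, center):
    """Root mean squared deviation of `values` about `center` (divisor B - 1)."""
    b = len(values)
    return math.sqrt(sum((v - center) ** 2 for v in values) / (b - 1))


# Hypothetical DETECT estimates from B = 6 bootstrap resamples.
boot_values = [0.31, 0.28, 0.35, 0.30, 0.26, 0.33]
original_estimate = 0.29  # hypothetical estimate from the original data

boot_mean = sum(boot_values) / len(boot_values)
se_efron = se_about(boot_values, boot_mean)             # centered on bootstrap mean
se_modified = se_about(boot_values, original_estimate)  # centered on original estimate
```

Because the sum of squared deviations is smallest about the mean, the modified standard error can never be smaller than Efron's; the two agree exactly when the bootstrap mean equals the original estimate, and the gap grows with the squared distance between them.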

To generate the simulated data sets, we used the Reckase and McKinley (1991) multidimensional item response function (MIRF). Item parameters were taken from the simulation study of Roussos and Ozbek (2006).
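The data generation step can be sketched as follows, assuming the compensatory logistic form $P(X = 1 \mid \boldsymbol{\theta}) = 1 / (1 + \exp[-(\mathbf{a}^{\top}\boldsymbol{\theta} + d)])$ for the MIRF and bivariate standard normal abilities with correlation $\rho$. The function names and the item parameters below are hypothetical placeholders; they are not the Roussos and Ozbek (2006) parameters.

```python
import math
import random


def mirf(theta, a, d):
    """Compensatory logistic MIRF (assumed form): P(correct | theta)."""
    z = sum(a_k * t_k for a_k, t_k in zip(a, theta)) + d
    return 1.0 / (1.0 + math.exp(-z))


def draw_abilities(rng, rho):
    """Draw two standard normal abilities with correlation rho."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    return (z1, rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)


def simulate_responses(items, n_examinees, rho, seed=0):
    """Generate a 0/1 response matrix, one row per examinee."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_examinees):
        theta = draw_abilities(rng, rho)
        data.append([1 if rng.random() < mirf(theta, a, d) else 0
                     for a, d in items])
    return data


# Hypothetical 20-item simple-structure test: 10 items load on each dimension.
items = [((1.0, 0.0), 0.0)] * 10 + [((0.0, 1.0), 0.0)] * 10
responses = simulate_responses(items, n_examinees=1000, rho=0.5)
```

Each call with a different `seed` yields one independent trial; repeating the call 400 times and taking the standard deviation of the resulting statistic corresponds to the Monte Carlo criterion described above.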

For these simulation studies, we will vary six factors: 1) the number of dimensions (one, two, and three dimensions, with both simple structure and approximate simple structure); 2) test length (20 and 40 items); 3) the number of examinees (1000 and 4000); 4) the correlation between the dimensions (0.5 and 0.7); 5) the number of items in each cluster (evenly distributed, or with one dimension having twice the length of the others); and 6) the number of bootstrap replications. Table 1 shows the components of the simulation study *that have already been completed* (see Roussos & Ozbek, 2006, for a description of the entire simulation scheme). We expect to finish the remaining components in the fall.

## Table 1 Structure of the completed components of the simulation study

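| No. of dimensions | No. of examinees | No. of items | Clusters | Correlation between dimensions | No. of bootstrap replications |
|---|---|---|---|---|---|
| 1D | N = 1000 or 4000 | 20, 40 | N.A. | N.A. | 100 or 400 |
| 2D | N = 1000 or 4000 | 20 | 10/10 | 0.5 or 0.7 | 100 or 400 |
| 2D | N = 1000 or 4000 | 40 | 20/20, 15/25 | 0.5 or 0.7 | 100 or 400 |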

## Results

Because of space limitations, this proposal reports results only for the DETECT estimator analyses.

*Unidimensional.*

Figure 1 shows the DETECT estimators for the unidimensional conditions. The blue line is the mean of the 400-trial simulation study, the yellow line is the estimate from the original data, and the pink line is the mean of the bootstrap values. The dotted lines show two standard errors around the DETECT estimators. These results indicate close agreement between the standard error from the Monte Carlo simulations and the two bootstrap standard error estimators. The results also indicate that 100 bootstrap replications perform as well as 400, which may be an important consideration in situations where calculation time is critical (in the current study the time savings were inconsequential). As expected, the standard error decreased with sample size. The standard error also decreased with test length, perhaps because of the decrease in statistical bias that is known to occur with DETECT as test length increases (see Roussos & Ozbek, 2006).

## Figure 1 Standard errors and DETECT estimators for unidimensional test structure

*Two-dimensional.*

Figure 2 shows the DETECT estimators, with bands of two standard errors around them, for two-dimensional 20-item tests. These results again demonstrate that the bootstrap standard errors are accurate estimates of the true standard error as determined from the Monte Carlo simulations. As in the unidimensional case above, there was no substantial difference in the results between 100 and 400 bootstrap samples, and, as expected, the standard error decreased with increasing sample size.

## Figure 2 Standard errors and DETECT estimators for two-dimensional 20-item tests

Figure 3 shows the DETECT estimators, with bands of two standard errors around them, for two-dimensional 40-item tests. As in the 20-item case above, there was again no substantial difference in the results between 100 and 400 bootstrap samples, and again the standard error decreased as expected with increasing sample size. Moreover, comparing Figures 2 and 3, there was no substantial effect of test length on the standard error, unlike in the unidimensional case.

## Figure 3 Standard errors and DETECT estimators for two-dimensional 40-item tests

The results for the IDN index and Ratio *r* also supported the DETECT estimator results, although we do not include them in this proposal. For both statistics, the standard errors from the two bootstrap methods were very similar to those from the 400-trial simulation study.

One interesting preliminary finding from our analyses has been that the DETECT estimator has displayed an unexpected statistical bias related to sample size. So far as we are aware, this has not been previously recognized, and we will pay close attention to monitoring this effect as we continue our simulation study. (Note that Roussos and Ozbek studied statistical bias as a function of test length using an extremely large sample size in order to drive the random error component to zero.)

Currently we have finished the unidimensional and two-dimensional simple structure tests. We also plan to perform the study with two-dimensional approximate simple structure and three-dimensional structure tests. We expect to be done with all simulations by the end of December 2007.

## Educational importance of the study

DETECT is a nonparametric dimensionality assessment procedure that provides an estimate of the amount of multidimensionality in a dataset as well as an estimate of the number of dimensions and the items best measuring each dimension. DETECT provides three numerical indices (DETECT estimate, IDN, and *r*), but does not provide an estimate of the standard error for any of them. DETECT does provide guidelines for interpreting the indices, with especially detailed advice for the DETECT estimator. The usefulness of these guidelines is severely hindered, however, by the lack of a standard error for the DETECT estimator. The values 0.2, 0.4, and 1.0 are boundary values between weak, moderate, and strong multidimensionality; thus, when an estimate is near a boundary, an estimate of the standard error can be critical for determining the strength of evidence for the classification decision. Indeed, for small samples, the standard error might be so large that an estimate could even be fairly far away from a boundary and still not be significantly different from it (see the unidimensional results above for the case of 1000 examinees, for example). The results from the current study have so far indicated that the bootstrap method yields a sufficiently accurate measure of the DETECT standard error for dimensionality decision making. If the remaining analyses continue to support this conclusion, this standard error estimator will make a substantial contribution to the DETECT dimensionality estimation procedure.

## References

Efron, B., & Tibshirani, R. (1998). *An Introduction to the Bootstrap*.

Reckase, M. D., & McKinley, R. L. (1991). The discriminating power of items that measure more than one dimension. *Applied Psychological Measurement, 15*, 361-373.

Roussos, L., & Ozbek, O. (2006). Formulation of the DETECT population parameter and evaluation of DETECT estimator bias. *Journal of Educational Measurement, 43*, 215-243.

Stout, W. (1990). A new item response theory modeling approach with applications to unidimensional assessment and ability estimation. *Psychometrika, 55*, 293-326.

Zhang, J., & Stout, W. (1999a). Conditional covariance structure of generalized compensatory multidimensional items. *Psychometrika, 64*, 129-152.

Zhang, J., & Stout, W. (1999b). The theoretical DETECT index of dimensionality and its application to approximate simple structure. *Psychometrika, 64*, 213-249.
