Educ Psychol Meas. 2017 Apr; 77(2): 204–219.
Performing Inferential Statistics Prior to Data Collection
David Trafimow
1New Mexico State University, Las Cruces, NM, USA
Justin A. MacDonald
1New Mexico State University, Las Cruces, NM, USA
Abstract
Typically, in education and psychology research, the investigator collects data and subsequently performs descriptive and inferential statistics. For example, a researcher might compute group means and use the null hypothesis significance testing procedure to draw conclusions about the populations from which the groups were drawn. We propose an alternative inferential statistical procedure that is performed prior to data collection rather than afterwards. To use this procedure, the researcher specifies how close she or he desires the group means to be to their corresponding population means and how confident she or he wishes to be that this actually is so. We derive an equation that provides researchers with a way to determine the sample size needed to meet the specifications concerning closeness and confidence, regardless of the number of groups.
Keywords: a priori procedure, confidence, precision, complexity
In the context of structural equation modeling, Wolf, Harrington, Clark, and Miller (2013; also see Marcoulides & Saunders, 2006) have distinguished between inferential statistical approaches featuring operations performed prior to data collection and those performed after data collection. Although the present focus is on sample means rather than on structural equation modeling, the distinction nevertheless is useful to keep in mind and it will feature strongly in the present article.
It is possible to identify different inferential statistics camps. Two of these camps fall within the "frequentist" way of thinking, where the basic assumption is that hypotheses are correct or incorrect but cannot take on probabilities other than 0 or 1. One camp, comprising those who wish to use data to reject or fail to reject null hypotheses, favors the null hypothesis significance testing procedure. A second camp, comprising those who wish to use frequentist thinking for parameter estimation, favors confidence intervals. Yet a third camp, comprising those who believe that hypotheses can take on probabilities other than 0 or 1, favors the use of the famous theorem by Bayes. 1 It also is possible, perhaps, to identify a fourth camp that features equivalence testing, where the idea is to control the size of misspecification at a prespecified value so as to assess the goodness of the model at a desired level of confidence (Wellek, 2010). 2 Equivalence testing has been advocated, for example, when the goal is to show that a treatment is equivalent to another treatment (Walker & Nowacki, 2011), though there are many other uses as well (see Yuan, Chan, Marcoulides, & Bentler, 2016, for a list). Arguably, particularly in Yuan et al. (2016), the procedure used to determine the desired sample size for equivalence testing occurs prior to data collection, which distinguishes it from the other three camps.
There has been much disagreement across the camps, with equivalence testing perhaps being the exception (e.g., Bakan, 1966; Berkson, 1938; Cohen, 1994; Fidler & Loftus, 2009; Fisher, 1973; Gigerenzer, 1993; Hoekstra, Morey, Rouder, & Wagenmakers, 2014; Hogben, 1957; Loftus, 1996; Lykken, 1968; Mayo, 1996; Meehl, 1967, 1978; Morey, Hoekstra, Rouder, Lee, & Wagenmakers, 2016; Popper, 1983; Rozeboom, 1960; Schmidt, 1996; Schmidt & Hunter, 1997; Suppes, 1994; Thompson, 1992; Trafimow, 2003, 2006; Trafimow & Rice, 2009). For present purposes, it is not necessary to discuss these controversies nor to advocate for any one camp or set of camps over any others. 3 Instead, our goal is to follow up on a particular type of inferential procedure recently suggested by Trafimow (2016), which is not equivalent to any of the foregoing camps, and to address what we consider to be a serious limitation of that procedure. We emphasize that the procedure we propose can be used in conjunction with other procedures, so there is no necessary implication of a competition between procedures.
Based, in part, on a recent article by Trafimow (2016), our proposal differs from those of the other camps, with the possible exception of equivalence testing, in the following ways. First, our a priori procedure (APP) pertains to a different question, as we explain in the subsequent section. 4 Second, as the label suggests, there is no need to collect any data whatsoever to answer that question. The APP is compatible with the other approaches but also works as a stand-alone procedure. Again, we emphasize that there is no necessity here for a competition between procedures.
Being Confident of Being Close
Suppose we pointed out to researchers that it would be much easier to obtain a single participant than to obtain a larger sample of participants. Based on this, we might ask, "Why collect a sample of participants?" After asking questions to get beyond issues such as the importance of getting statistical significance to publish, and publishing to get tenure, and so on, we believe that eventually researchers would converge on the notion that a sample of participants aids researchers in feeling confident that the sample statistic is close to the population parameter, or at a minimum, the larger the sample, the more likely it is to resemble the population. Few researchers would be interested in sample statistics if they felt that these were completely unrepresentative of population parameters. Because the sample statistic that researchers use most widely is the sample mean, as an estimate of the population mean, our focus will be on means. 5
Well, then, if researchers wish to collect samples of participants so that they can be confident that their sample means are close to the corresponding population means, this implies two important issues: defining close and defining confident. In the case where only a single mean is at issue, there is a formula presented as Equation 1 that gives the number of participants needed (n) as a function of the fraction of a standard deviation defined as close (f) and the z-score (z_c) corresponding to the desired probability of obtaining a sample mean within the prescribed fraction of a standard deviation of the population mean. 6
$$n = \left(\frac{z_c}{f}\right)^{2} \qquad (1)$$
To use the formula, the researcher decides, prior to data collection, how close he or she wishes the sample mean to be to the population mean and the confidence desired of actually being within that distance. For example, the researcher might desire to be within three tenths of a standard deviation of the mean and to have a probability of .95 of actually being within that distance. The z-score that corresponds to .95 is 1.96, so the number of participants needed is n = (1.96/0.3)^2 = 42.68, or 43 rounded to the nearest whole number. We emphasize that the analysis is conducted a priori.
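The worked example can be checked with a few lines of Python. This is a minimal sketch of the single-mean formula n = (z_c / f)^2; the function name `sample_size_one_mean` is hypothetical, and the two-sided z-score is obtained from the standard library's `NormalDist` rather than from a table.

```python
from statistics import NormalDist

def sample_size_one_mean(f, confidence):
    """Unrounded n so that one sample mean falls within f standard
    deviations of the population mean with the given probability."""
    # Two-sided z-score: .95 confidence corresponds to the .975 quantile.
    z_c = NormalDist().inv_cdf((1 + confidence) / 2)
    return (z_c / f) ** 2

n = sample_size_one_mean(0.3, 0.95)
print(round(n, 2), round(n))  # unrounded value, then nearest whole number
```

The text rounds to the nearest whole number; a researcher who wants the specification met exactly might instead round up.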
Equation 1 assumes a normally distributed population. To test the importance of adhering to this assumption, Trafimow (2016) performed computer simulations with decidedly nonnormal distributions such as a rectangular distribution, a right triangular distribution, and even an exponential distribution. He found that using distributions that differed substantially from the assumed normal distribution made very little difference in the results. We accept these simulations and the implication that the normality assumption is not an important problem for Equation 1. Nevertheless, we believe that there is an important limitation with respect to Equation 1, specifically, and the Trafimow (2016) article, generally. That is, Equation 1 is fine if there is only a single mean to be considered, but few researchers are concerned with only a single mean. Much more often, there is an experimental group and a control group, so that there are two means, and neither Equation 1 nor anything in the Trafimow (2016) article accommodates two or more means. Fairly often, researchers use complex factorial designs with any number of means. Thus, our main goal is to expand the APP to include multiple means.
We also investigate closely related issues. For instance, how does the APP differ from power analysis, which is sometimes conducted a priori? Finally, we address additional issues suggested by the foregoing ones, including the implications the APP has for complex experimental designs.
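The robustness claim can be illustrated with a small Monte Carlo sketch. This is not a reproduction of the Trafimow (2016) simulations; the function name, the choice of a rate-1 exponential population, the number of replications, and the seed are all assumptions made for illustration.

```python
import random
from statistics import fmean

def coverage_exponential(n, f, reps=5000, seed=1):
    """Fraction of simulated samples whose mean lies within f standard
    deviations of the population mean, for an exponential population."""
    rng = random.Random(seed)
    mu = sigma = 1.0  # a rate-1 exponential has mean 1 and SD 1
    hits = 0
    for _ in range(reps):
        sample_mean = fmean(rng.expovariate(1.0) for _ in range(n))
        hits += abs(sample_mean - mu) <= f * sigma
    return hits / reps

# n = 43 and f = .3 come from the worked example for a .95 probability.
print(coverage_exponential(43, 0.3))
```

If the normality assumption matters little, the printed proportion should land close to .95 even though the population is strongly skewed.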
Multiple Means
It is possible to derive Equation 2, which gives the probability of obtaining one sample mean within a specified fraction of a standard deviation of the population mean, or p(1 Mean).
$$p(1\ \text{Mean}) = 2\Phi\left(f\sqrt{n}\right) - 1 \qquad (2)$$
In Equation 2, Φ is the cumulative distribution function (cdf) of the standard normal distribution, f is the precision, and n is the sample size. 7 Equation 1 implies Equation 2, but we present an independent derivation in the appendix. To our knowledge, this is the first presentation of Equation 2.
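As a minimal sketch, the single-mean probability can be computed in Python in the same form as the spreadsheet formula given in Note 7, that is, 2Φ(f√n) − 1. The function name `p_one_mean` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def p_one_mean(f, n):
    """Probability that one sample mean lies within f standard deviations
    of its population mean, for a sample of size n."""
    return 2 * NormalDist().cdf(f * sqrt(n)) - 1

# With n = 43 and f = .3, the probability should be at least .95.
print(round(p_one_mean(0.3, 43), 4))
```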
Suppose that we now imagine that there are two samples, with sample sizes n_1 and n_2, and that we also are concerned with the distance that each sample mean is from its corresponding population mean, that is, f_1 and f_2. If we assume that the sampling errors (M_1 − μ_1) and (M_2 − μ_2) are independent, then the probability that both means are within their specified distances, denoted as p(2 Means), is the product of the probabilities of each being within its specified distance, as Equation 3 shows. Note that in Equation 3, there are two conditions, so there are two cell sample sizes (n_1 and n_2), and both sample means can be within specified distances of the corresponding population means (f_1 and f_2).
$$p(2\ \text{Means}) = \left[2\Phi\left(f_1\sqrt{n_1}\right) - 1\right]\left[2\Phi\left(f_2\sqrt{n_2}\right) - 1\right] \qquad (3)$$
More generally, Equation 4 renders the probability of k means all being within their desired distances of the corresponding population means, where n_k denotes the sample size of the kth mean and f_k denotes the desired distance of the kth mean from its respective population mean.
$$p(k\ \text{Means}) = \prod_{i=1}^{k}\left[2\Phi\left(f_i\sqrt{n_i}\right) - 1\right] \qquad (4)$$
In those cases where there is an equal sample size in each condition (n_1 = n_2 = ⋯ = n_k = n) and where there is an equal desired distance of each sample mean from its corresponding population mean (f_1 = f_2 = ⋯ = f_k = f), this can be simplified to the following. 8
$$p(k\ \text{Means}) = \left[2\Phi\left(f\sqrt{n}\right) - 1\right]^{k} \qquad (5)$$
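The product rule for independent means, and its simplification when all groups share the same n and f, can be sketched as follows. The function names are hypothetical, and independence of the sampling errors is assumed, as in the text.

```python
from math import prod, sqrt
from statistics import NormalDist

def p_all_means(fs, ns):
    """General case: product over groups of 2*Phi(f_i * sqrt(n_i)) - 1."""
    phi = NormalDist().cdf
    return prod(2 * phi(f * sqrt(n)) - 1 for f, n in zip(fs, ns))

def p_k_means(f, n, k):
    """Equal-n, equal-f case: the single-mean probability raised to k."""
    return (2 * NormalDist().cdf(f * sqrt(n)) - 1) ** k

# Two groups with different sizes and precisions (illustrative values).
print(round(p_all_means([0.2, 0.3], [100, 50]), 3))
# The simplified form agrees with the general form when groups are identical.
print(round(p_k_means(0.2, 100, 2), 3), round(p_all_means([0.2, 0.2], [100, 100]), 3))
```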
Figure 1 explores the implications of Equation 5, where the probability that all of the means will be within f, that is, p(k Means), is given along the vertical axis as a function of the size of f (range is .01 to .5 along the horizontal axis) and the number of means (there are five curves representing 1, 2, 3, 4, or 5 means). Finally, there are four panels where the total sample size (N) = 100, 200, 300, or 400. Consider the top curve (1 mean) in the panel where N = 100. As one moves rightward from f = .01, the curve rises sharply before shifting toward asymptote. Thus, a small increase in imprecision along the horizontal axis implies a much larger probability that the mean is within the specified interval. In contrast, once asymptote is reached, even large increases in imprecision fail to substantially increase the probability that the mean is within the specified interval. As one peruses the other curves, where there are increasingly more means to be considered, asymptote is reached at increasingly larger levels of imprecision along the horizontal axis. As there are more means, the level of imprecision must increase to ensure that all of the means have a reasonable probability of being within the specified interval. Also, as there are more means, the total sample size is divided among more groups, so that n decreases keeping N constant.
However, a look at the other three panels qualifies these conclusions. As N increases, asymptote is reached at lower levels of f along the horizontal axis. That is, increasing N allows the researcher to specify more stringent intervals in which there is a reasonable probability that all of the means are likely to be within them. More generally, the four panels of Figure 1 allow the researcher to see precisely how the desired precision, the number of means, and the total sample size interact to influence the probability that all of the means are within the specified interval. However, a limitation of Figure 1 is that it is difficult to discern the number of participants that actually are needed for the researcher to be able to be confident that the sample means will be close to their corresponding population means.
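The pattern described for Figure 1 can be tabulated rather than plotted. The sketch below holds the total sample size N fixed and splits it evenly across k groups; treating n = N/k as a real number when N is not divisible by k is an assumption made here for simplicity, and the function name and the choice f = .2 are illustrative.

```python
from math import sqrt
from statistics import NormalDist

def p_k_means_fixed_total(f, k, total_n):
    """Probability that all k means are within f, with N split evenly."""
    n = total_n / k  # per-group size, treated as a real number (assumption)
    return (2 * NormalDist().cdf(f * sqrt(n)) - 1) ** k

# One row per panel (N = 100, 200, 300, 400); columns are k = 1..5 at f = .2.
for total_n in (100, 200, 300, 400):
    row = [round(p_k_means_fixed_total(0.2, k, total_n), 3) for k in range(1, 6)]
    print(total_n, row)
```

Each row should fall as k grows, and each column should rise as N grows, which is the interaction the four panels display.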
Reversing Equation 5 and Its Implications
With the aid of Figure 1, we have seen how f, k, and number of participants interact to influence the probability that all of the sample means of interest will be within the desired distance of the respective population means. However, we have not yet explored how f, k, and the desired probability that the sample means are within the specified distance (p(k Means)) influence the number of participants needed per sample. To aid in this exploration, solving Equation 5 for n renders Equation 6.
$$n = \left[\frac{\Phi^{-1}\left(\dfrac{p(k\ \text{Means})^{1/k} + 1}{2}\right)}{f}\right]^{2} \qquad (6)$$
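Solving for n can be sketched in Python by inverting the equal-n, equal-f probability [2Φ(f√n) − 1]^k. The function name `n_per_group` is hypothetical; the inverse cdf comes from the standard library's `NormalDist`.

```python
from statistics import NormalDist

def n_per_group(f, k, confidence):
    """Unrounded per-group n so that all k sample means are within f
    standard deviations of their population means with the given probability."""
    # Each mean must individually meet confidence ** (1 / k).
    per_mean = confidence ** (1 / k)
    z = NormalDist().inv_cdf((per_mean + 1) / 2)
    return (z / f) ** 2

# With k = 1 this reduces to the single-mean formula.
print(round(n_per_group(0.3, 1, 0.95), 2))
# Per-group n at f = .1 for one versus five means at confidence .65.
print(round(n_per_group(0.1, 1, 0.65)), round(n_per_group(0.1, 5, 0.65)))
```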
Figure 2 provides a visual rendition of Equation 6, where n is expressed along the vertical axis as a function of f along the horizontal axis and the number of means (1, 2, 3, 4, or 5, as in Figure 1). In Figure 1, f ranged from .01 to .5, but using this range caused a problem with respect to Figure 2. To understand the problem, consider the case where one wishes to have a .95 probability that five sample means are within .01 standard deviations of their corresponding population means. The value of n, in this case, according to Equation 6, is 65,985. Not only is this practically unrealistic but it also stretches the vertical axis to the point of making it difficult to see the effects to be described later. To avoid the problem, we used a range of .1 to .5 for f along the horizontal axis. Finally, as in Figure 1, there were four panels, where the probability that all means are within f was set at .65, .75, .85, and .95. Put another way, each panel specifies a different level of confidence that the researcher can have that all of the sample means will be within the specified distance of the population means.
Let us start with the first panel, where the confidence was set at .65, and with the lowest curve denoting the case where there is only a single mean to be considered. When f is at the minimum value of .1, thereby indicating impressive precision, the number of participants needed to have confidence at .65 is 87 (rounded to the nearest whole number). As one moves toward increasingly less precision, there is a steep drop in the number of participants needed at first, but the drop becomes increasingly less steep as one becomes increasingly less precise. Perhaps another way to look at it is to go from right to left, where it can be seen that, at low levels of precision, substantial improvements in precision can be made at very little cost in n. In contrast, as one continues to move leftward along the horizontal axis, if one moves sufficiently far in that direction, even small improvements in precision have a substantial cost in n. This fact suggests that researchers might consider the concept of a "best buy," that is, how much improvement in f is worth how much cost in n?
Two other effects are worth mentioning. Most obviously, as there are more means to be considered, n must increase accordingly. But this effect of the number of means is qualified by f. Consider, for example, the difference between one mean and five means when f is .1. In this case, when there is one mean, the needed n is 87, but when there are five means, the needed n is 301, for a difference of 214. In contrast, as the level of imprecision increases, differences in the level of n needed to reach f attenuate dramatically. For example, when f = .5, the levels of n needed for one or five means are 3 and 12, respectively, for a difference of 9.
All of the foregoing effects for when confidence is set at .65 become increasingly more dramatic as the level of confidence increases to .75, .85, and .95. This is because, although the panels do not look very different from each other in terms of the shapes of the curves, the extent of the values along the vertical axis increases as the level of confidence increases. For instance, we saw earlier that when there are five means and confidence is set at .65, n = 301. But when confidence is set at .95, this value becomes 660. More generally, as confidence increases, so does (a) the effect of the number of means, (b) the effect of precision, and (c) the interaction between the number of means and precision. Figure 3 provides a "blown-up" version of Figure 2 where the vertical axis is restricted to a maximum of 100 participants so as to permit the reader to better view the implications of Equation 6 at sample sizes typically used in research.
As dramatic as the foregoing effects might seem to be, they arguably provide underestimates because Figures 2 and 3 represent n along the vertical axis rather than N. To see why this might matter, consider that increasing the number of means necessitates that more participants are needed in two ways. First, as Figures 2 and 3 illustrate, increasing the number of means necessitates an increase in the number of participants in each condition. But, as Figures 2 and 3 fail to illustrate, increasing the number of means also indicates an increase in the number of conditions. Thus, although Figures 2 and 3 are fine for understanding how the number of conditions increases the number of participants needed in each of them, it is necessary to multiply n by the number of conditions (k) to understand the total effects. Figure 4 includes this change: N (rather than n) is represented along the vertical axis as a function of f, k, and the four levels of confidence used in Figure 2 (.65, .75, .85, and .95). As can be seen by attending to the actual values along the vertical axis, all of the effects described with respect to n are much more dramatic with respect to N.
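The step from per-group n to total N can be sketched by multiplying the rounded per-group value by the number of conditions. The helper names are hypothetical, and rounding n to the nearest whole number before multiplying is an assumption that follows the rounding convention used for the per-group values in the text.

```python
from statistics import NormalDist

def n_per_group(f, k, confidence):
    """Unrounded per-group n for k means, precision f, and joint confidence."""
    z = NormalDist().inv_cdf((confidence ** (1 / k) + 1) / 2)
    return (z / f) ** 2

def total_n(f, k, confidence):
    """Total participants: rounded per-group n times the number of groups."""
    return k * round(n_per_group(f, k, confidence))

# Per-group n and total N at f = .1 and confidence .65 for k = 1..5.
for k in range(1, 6):
    print(k, round(n_per_group(0.1, k, 0.65)), total_n(0.1, k, 0.65))
```

The total column grows faster than the per-group column because k enters twice: once through the per-group requirement and once as the multiplier.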
The APP and Power Analysis
The APP is similar to a priori power analysis in the sense that both are prior to data collection. However, there is a difference. The purpose of power analysis is to determine the N needed to have a reasonable chance of obtaining a statistically significant finding, given that there is an effect to be found. For those who favor confidence intervals over the null hypothesis significance testing procedure, the purpose of power analysis is to determine the N needed to obtain a confidence interval of a desired width. Either way, the purpose of the power analysis is to aid the researcher in what eventually will be an a posteriori analysis. In contrast, the APP can be used in isolation or in conjunction with a posteriori procedures. One starts by specifying how close one wishes the sample means to be to their corresponding population means, and the desired confidence that this actually will be so. From there, the APP provides the number of participants needed. Another way to think about the APP is that the closeness and confidence specifications indicate the conditions needed for the researcher to trust the data, so that as long as the researcher collects the required sample sizes to fulfill these conditions, the researcher can simply trust the resulting means without further inferential analyses, at least to the extent of the specified degree of confidence. The APP recognizes that the extent to which sample means can be trusted as estimates of population means has nothing to do with what the findings actually are but rather depends solely on the size of the samples. We hasten to add, however, that because the APP pertains to drawing conclusions about samples rather than about populations, the researcher is not justified in concluding that the population mean is within the specified distance (f) of the obtained sample mean, with the specified degree of confidence. 9
There is an additional difference that can be illustrated with an example. Suppose that Experimenter A and Experimenter B perform experiments with an experimental group and a control group. Based on previous research, Experimenter A can count on a very large effect size and Experimenter B anticipates a very small effect size. Because p is, to an important degree, influenced by the effect size, Experimenter A needs only a small sample size whereas Experimenter B needs a large sample size to have a reasonable chance of being able to reject the null hypothesis. In contrast, using the APP, the issue is not about obtaining a particular value for p but about having sample means that can be trusted to accurately estimate population means. From this point of view, it does not follow that Experimenter B needs to have a larger sample size than Experimenter A. From the point of view of trusting whether the sample means accurately estimate the population means, both researchers could use the same sample size and be able to trust their sample means equally. 10
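The contrast between the two experimenters can be made concrete with a sketch. The power calculation below uses the common normal-approximation formula for a two-sided, two-sample test, n = 2[(z_{1−α/2} + z_{power})/d]^2 per group, which is a textbook approximation and not something taken from this article. The effect sizes (d = 0.8 and d = 0.2), α = .05, power = .80, f = .2, and confidence = .95 are all hypothetical values chosen for illustration.

```python
from statistics import NormalDist

_inv = NormalDist().inv_cdf

def power_n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sided two-sample z test
    to detect a standardized mean difference d."""
    return 2 * ((_inv(1 - alpha / 2) + _inv(power)) / d) ** 2

def app_n_per_group(f, k, confidence):
    """Per-group n from the APP; note there is no effect-size argument."""
    return (_inv((confidence ** (1 / k) + 1) / 2) / f) ** 2

for label, d in (("Experimenter A", 0.8), ("Experimenter B", 0.2)):
    print(label, round(power_n_per_group(d)), round(app_n_per_group(0.2, 2, 0.95)))
```

The power-based n differs sharply between the two experimenters, while the APP-based n is identical for both because it depends only on f, k, and the desired confidence.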
Complexity
Many researchers use complex factorial designs. Nevertheless, the APP suggests that there is an important problem with this approach that few researchers appreciate. Specifically, as more conditions are added, there are more means, and the probability that all of the means are close to the corresponding population means decreases. Put simply, as the design becomes increasingly complex, the researcher can place less trust in the cell means. But there is more than one reason for this.
Most obviously, keeping the overall sample size (N) constant, increasing the number of conditions necessitates that each cell mean will be based on fewer participants. In turn, fewer participants indicates that less trust can be placed in the resulting cell means. But beyond that, even if more participants are added, so that the sample size per condition remains the same, the foregoing equations imply that the probability that all of the sample means are within the specified distance of the corresponding population means decreases as the number of conditions increases. It is interesting to consider this mathematical fact in the context of the complex designs that are common in psychology. For example, psychologists often use 2 × 2 × 2 designs, and even 2 × 2 × 2 × 2 designs are not especially uncommon.
A few quick calculations are illuminating. When there is a single condition (k = 1) with 100 participants, there is a better than 95% chance that the sample mean will be within .2 standard deviations of the population mean. But suppose that the design is a 2 × 2 × 2 design, so that there are eight conditions (k = 8). Let us even make an extremely favorable stipulation that we use N = 800 participants so that the number of participants in each condition remains constant at n = 100. Still, the probability that all of the means will be within .2 standard deviations of the corresponding population means is only .69. And matters become even worse if we consider a 2 × 2 × 2 × 2 design, so that k = 16. Even keeping n at 100 (so that N = 1,600) implies that the probability that all of the means will be within .2 standard deviations of the respective population means reduces to .47. Thus, the APP makes salient that there is an important problem with using complex designs, the nature of which otherwise would not be apparent. To reiterate, as the number of conditions increases, less trust can be placed in the obtained sample means.
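These quick calculations can be reproduced with a short sketch that raises the single-mean probability 2Φ(f√n) − 1 to the power k, holding n = 100 and f = .2 fixed. The function name is hypothetical, and independence across conditions is assumed, as in the text.

```python
from math import sqrt
from statistics import NormalDist

def p_k_means(f, n, k):
    """Probability that all k independent sample means are within f
    standard deviations of their population means (equal n per condition)."""
    return (2 * NormalDist().cdf(f * sqrt(n)) - 1) ** k

# Single condition, 2 x 2 x 2 design, and 2 x 2 x 2 x 2 design.
for k in (1, 8, 16):
    print(k, round(p_k_means(0.2, 100, k), 2))
```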
Contrasting the Proposed Procedure With Other Procedures
Different procedures entail different issues. Our goal in the present section is to make clear how the APP asks a different question, and comes to a different sort of conclusion, than other procedures. Let us commence by considering the most common statistical procedure, namely, that concerned with testing null hypotheses. The idea of this procedure is to compute p and then reject or fail to reject the null hypothesis. As we pointed out earlier, many researchers have criticized this procedure, with much of the criticism based on the logical fact that one cannot validly draw conclusions about the probabilities of hypotheses given findings from the probabilities of findings given hypotheses. It would take us too far afield to discuss inverse inferences in detail and it is sufficient merely to note that there is an increasing tendency for researchers to be concerned with this issue (see Trafimow, 2016, for such a discussion). Confidence intervals are closely related. In fact, it is possible to use confidence intervals as an alternative way to test null hypotheses. In addition, some researchers have suggested that confidence intervals can be used for parameter estimation. Critics of confidence intervals have argued that confidence intervals cannot validly be used for parameter estimation because there is no way to know the probability that the population mean (or difference between population means) is within the constructed interval. Both null hypothesis significance tests and confidence intervals have in common that the goal is to make inverse inferences about probabilities concerning hypotheses or population parameters given sample data. Critics have argued that this is not logically valid.
Bayesians are among the most vociferous claimants that null hypothesis significance tests and confidence intervals cannot validly be used to draw conclusions about populations from sample data. In contrast, the famous theorem by Bayes can be used to draw logically valid inferences about populations from sample data. However, frequentists are critical of Bayesian procedures on the grounds of unsound premises. They ask how one can know the prior probability of a hypothesis, what a prior distribution looks like, and how to handle the catch-all hypothesis of "not the null" (e.g., Mayo, 1996). Thus, critics of null hypothesis significance tests and confidence intervals tend to be dissatisfied based on the logical validity issues involved with inverse inferences, and critics of Bayesian methods tend to be dissatisfied based on what they consider to be questionable premises.
In contrast to null hypothesis significance tests, confidence intervals, and Bayesian procedures, which involve inverse inferences, the APP does not involve inverse inferences. Thus, the question of concern is not, "Can I reject the null hypothesis?" or "Can I conclude that the population mean is likely to be within a constructed interval?" Rather, the question is, "How can I be confident that the sample means are likely to be close to their respective population means?" As we suggested earlier, researchers can use the APP to address this last question and still use one of the available a posteriori procedures to address questions about hypotheses or about population means. Alternatively, those researchers who believe that available a posteriori procedures are problematic can use the APP to answer the question about confidence and closeness and leave questions about hypotheses or population means unaddressed.
Conclusion
Although Trafimow (2016), with his emphasis on a priori inferential statistics, provided an advance, we noted an important limitation: his procedure is applicable only when there is a single mean at issue. Our expansion of the procedure to work with multiple means provides researchers with the opportunity to perform a priori inferential statistical analyses with a variety of designs, such as when there are experimental and control conditions, complex factorial designs, and so on.
In turn, however, our expansion suggests additional implications, such as the fact that complex research designs can be quite problematic from the point of view of precision. We also showed how the APP differs from other a priori procedures, such as power analyses: the latter depend importantly on the effect size one expects to obtain, whereas our APP is uninfluenced by the expected effect size. There also is a necessity to follow up power analyses with traditional a posteriori analyses, whereas this is not true of the APP. More generally, and in agreement with Wolf et al. (2013), we believe that researchers have much to gain by performing a priori inferential statistics, and we hope and expect that our expansion of the interaction of the concepts of closeness and confidence to apply to any number of conditions will aid in the future development of the areas of education and psychology.
Appendix
Assume:
Then:
where
Continuing,
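As a sketch only, and not necessarily the authors' exact steps, one standard route to the single-mean probability 2Φ(f√n) − 1 under the normality assumption runs as follows. The symbols X, σ, M, and Z are notation assumed here for illustration.

```latex
\begin{align*}
&\text{Assume: } X \sim N(\mu, \sigma^{2}), \text{ so that the sample mean satisfies }
  M \sim N\!\left(\mu, \sigma^{2}/n\right). \\
&\text{Then: } p(1\ \text{Mean}) = P\left(|M - \mu| \le f\sigma\right)
  = P\left(-f\sqrt{n} \le Z \le f\sqrt{n}\right), \\
&\text{where } Z = \frac{M - \mu}{\sigma/\sqrt{n}} \sim N(0, 1). \\
&\text{Continuing, } p(1\ \text{Mean}) = \Phi\left(f\sqrt{n}\right) - \Phi\left(-f\sqrt{n}\right)
  = 2\Phi\left(f\sqrt{n}\right) - 1 .
\end{align*}
```

Setting this probability equal to the desired confidence and solving for n gives n = (z_c / f)^2, with z_c the corresponding two-sided z-score, which matches the worked example of n = (1.96/0.3)^2.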
Notes
1. One of the underlying disagreements between frequentists and Bayesians concerns the definition of probability. Frequentists often, but not always, define probability as a relative frequency, whereas Bayesians often, but not always, define it as a belief state.
2. We thank an anonymous reviewer for suggesting this.
3. This is not to say that we lack strong convictions about these issues but that there is no point in bogging down the present discussion with them.
4. Following Wolf et al. (2013), we could have used the term proactive. But as our procedure is based on Trafimow (2016), we decided to use the term a priori, as he did.
5. Some researchers have argued that perhaps the mean is too strongly emphasized, at least at particular times (e.g., Speelman & McGann, 2013). Nevertheless, whether justified or not, researchers favor means and so that is our focus here.
6. The formula is listed in many sources (e.g., Hays, 1994; also see Harris & Quade, 1992). Trafimow (2016) provided an accessible derivation.
7. Many statistics textbooks include cdf tables. In addition, there are many software- and Internet-based sources for this information. For instance, Equation 2 can be implemented in Microsoft Excel using the formula =2*NORM.S.DIST(f*SQRT(N), TRUE)-1, where f and N are replaced with their desired numeric values.
8. Equations 3 to 5 assume independence. We thank an anonymous reviewer for suggesting that when there is dependence, there is a lower bound set by one minus the sum of the nonconditioned probabilities that each sample mean is not within the prescribed distance of its respective population mean (see Kutner, Nachtsheim, Neter, & Li, 2005, for a relevant discussion).
9. Confidence intervals do not justify this conclusion either. Although, for example, performing an experiment many times and constructing a 95% confidence interval each time will result in 95% of the constructed intervals enclosing the population mean, it is not the case that the population mean has a 95% probability of being enclosed by a particular obtained confidence interval. There is no way to know this probability.
10. Of course, if a researcher wishes to capitalize on a difference between means, the expected size of the difference likely will influence the imprecision the researcher is willing to tolerate. But this is a different issue than the issue of how much trust a researcher is willing to place in sample means as indications of corresponding population means.
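The lower bound mentioned in Note 8 can be sketched numerically. With equal n and f in every condition, one minus the sum of the k individual miss probabilities is 1 − k(1 − p), where p = 2Φ(f√n) − 1 is the single-mean probability. The function names are hypothetical, and n = 100, f = .2, and k = 8 are taken from the Complexity section purely for illustration.

```python
from math import sqrt
from statistics import NormalDist

def p_one_mean(f, n):
    """Single-mean probability 2*Phi(f*sqrt(n)) - 1."""
    return 2 * NormalDist().cdf(f * sqrt(n)) - 1

def lower_bound_dependent(f, n, k):
    """One minus the sum of the k miss probabilities; holds under dependence."""
    return max(0.0, 1 - k * (1 - p_one_mean(f, n)))

def p_independent(f, n, k):
    """Joint probability when the sampling errors are independent."""
    return p_one_mean(f, n) ** k

print(round(lower_bound_dependent(0.2, 100, 8), 3), round(p_independent(0.2, 100, 8), 3))
```

The bound never exceeds the independence value, so it gives a conservative figure when the conditions cannot be assumed independent.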
Footnotes
Declaration of Conflicting Interests: The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding: The author(s) received no financial support for the research, authorship, and/or publication of this article.
References
- Bakan D. (1966). The test of significance in psychological research. Psychological Bulletin, 66, 423-437. doi:10.1037/h0020412
- Berkson J. (1938). Some difficulties of interpretation encountered in the application of the chi-square test. Journal of the American Statistical Association, 33, 526-542. doi:10.2307/2279690
- Cohen J. (1994). The earth is round (p < .05). American Psychologist, 49, 997-1003.
- Fidler F., Loftus G. R. (2009). Why figures with error bars should replace p values: Some conceptual arguments and empirical demonstrations. Journal of Psychology, 217, 27-37. doi:10.1027/0044-3409.217.1.27
- Fisher R. A. (1973). Statistical methods and scientific inference (3rd ed.). London, England: Collier Macmillan.
- Gigerenzer G. (1993). The superego, the ego, and the id in statistical reasoning. In Keren G., Lewis C. (Eds.), A handbook for data analysis in the behavioral sciences: Methodological issues (pp. 311-339). Hillsdale, NJ: Erlbaum.
- Harris R. J., Quade D. (1992). The minimally important difference significant criterion for sample size. Journal of Educational Statistics, 17(1), 27-49. doi:10.3102/10769986017001027
- Hays W. L. (1994). Statistics (5th ed.). Fort Worth, TX: Harcourt Brace.
- Hoekstra R., Morey R. D., Rouder J. N., Wagenmakers E.-J. (2014). Robust misinterpretation of confidence intervals. Psychonomic Bulletin & Review, 21, 1157-1164. doi:10.3758/s13423-013-0572-3
- Hogben L. (1957). Statistical theory. London, England: Allen & Unwin.
- Kutner M. H., Nachtsheim C. J., Neter J., Li W. (2005). Applied linear statistical models (5th ed.). Boston, MA: WCB/McGraw-Hill.
- Loftus G. R. (1996). Psychology will be a much better science when we change the way we analyze data. Current Directions in Psychological Science, 5(6), 161-171. doi:10.1111/1467-8721.ep11512376
- Lykken D. T. (1968). Statistical significance in psychological research. Psychological Bulletin, 70(3, Pt 1), 151-159. doi:10.1037/h0026141
- Marcoulides G. A., Saunders C. (2006). PLS: A silver bullet? MIS Quarterly, 30, iii-ix.
- Mayo D. (1996). Error and the growth of experimental knowledge. Chicago, IL: University of Chicago Press.
- Meehl P. E. (1967). Theory testing in psychology and physics: A methodological paradox. Philosophy of Science, 34(2), 103-115. Retrieved from http://www.jstor.org/stable/186099
- Meehl P. E. (1978). Theoretical risks and tabular asterisks: Sir Karl, Sir Ronald, and the slow progress of soft psychology. Journal of Consulting and Clinical Psychology, 46, 806-834.
- Morey R. D., Hoekstra R., Rouder J. N., Lee M. D., Wagenmakers E.-J. (2016). The fallacy of placing confidence in confidence intervals. Psychonomic Bulletin & Review, 23, 103-123. doi:10.3758/s13423-015-0947-8
- Popper K. R. (1983). Realism and the aim of science. London, England: Routledge.
- Rozeboom W. W. (1960). The fallacy of the null hypothesis significance test. Psychological Bulletin, 57, 416-428. doi:10.1037/h0042040
- Schmidt F. L. (1996). Statistical significance testing and cumulative knowledge in psychology: Implications for the training of researchers. Psychological Methods, 1, 115-129.
- Schmidt F. L., Hunter J. E. (1997). Eight objections to the discontinuation of significance testing in the analysis of research data. In Harlow L., Mulaik S. A., Steiger J. H. (Eds.), What if there were no significance tests? (pp. 37-64). Mahwah, NJ: Erlbaum.
- Speelman C. P., McGann M. (2013). How mean is the mean? Frontiers in Psychology, 4, 451. doi:10.3389/fpsyg.2013.00451
- Suppes P. (1994). Qualitative theory of subjective probability. In Wright G., Ayton P. (Eds.), Subjective probability (pp. 17-38). Chichester, England: Wiley.
- Thompson B. (1992). Two and one-half decades of leadership in measurement and evaluation. Journal of Counseling and Development, 70, 434-438. doi:10.1002/j.1556-6676.1992.tb01631.x
- Trafimow D. (2003). Hypothesis testing and theory evaluation at the boundaries: Surprising insights from Bayes's theorem. Psychological Review, 110, 526-535. doi:10.1037/0033-295X.110.3.526
- Trafimow D. (2006). Using epistemic ratios to evaluate hypotheses: An imprecision penalty for imprecise hypotheses. Genetic, Social, and General Psychology Monographs, 132, 431-462. doi:10.3200/MONO.132.4.431-462
- Trafimow D. (2016). Using the coefficient of confidence to make the philosophical switch from a posteriori to a priori inferential statistics. Educational and Psychological Measurement. Advance online publication. doi:10.1177/0013164416667977
- Trafimow D., Rice S. (2009). A test of the null hypothesis significance testing procedure correlation argument. Journal of General Psychology, 136, 261-269. doi:10.3200/GENP.136.3.261-270
- Walker E., Nowacki A. S. (2011). Understanding equivalence and noninferiority testing. Journal of General Internal Medicine, 26, 192-196. doi:10.1007/s11606-010-1513-8
- Wellek S. (2010). Testing statistical hypotheses of equivalence and noninferiority (2nd ed.). Boca Raton, FL: Chapman & Hall/CRC.
- Wolf E. J., Harrington K. M., Clark S. L., Miller M. W. (2013). Sample size requirements for structural equation models: An evaluation of power, bias, and solution propriety. Educational and Psychological Measurement, 73, 913-934. doi:10.1177/0013164413495237
- Yuan K.-H., Chan W., Marcoulides G. A., Bentler P. M. (2016). Assessing structural equation models by equivalence testing with adjusted fit indexes. Structural Equation Modeling, 23, 319-330. doi:10.1080/10705511.2015.1065414
Articles from Educational and Psychological Measurement are provided here courtesy of SAGE Publications
Source: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5965545/