IMPORTANCE OF SAMPLE SIZE IN PLS-SEM AND CB-SEM: DIFFERENCES AND PRACTICAL APPLICATIONS
As we enter the field of research and data analysis, one of the key challenges is deciding on the sample size. An adequate minimum sample size helps ensure that the results of the analysis are reliable and valid. In this article, we explore how the minimum sample size is determined in two popular SEM approaches: CB-SEM (covariance-based structural equation modeling) and PLS-SEM (partial least squares structural equation modeling).
Sample size is an important concept in statistics and scientific research, and thinking about it has evolved under the influence of many scientists and researchers. During the 19th century, statistics and scientific research became more widespread, but awareness of the importance of sample size had not yet clearly developed. It was common to conduct experiments and surveys with small samples, and the sample size was often chosen on the basis of convenience and accessibility. During the 1920s and 1930s, statisticians such as Ronald Fisher and Jerzy Neyman developed modern statistical methods and emphasized the importance of choosing an appropriate sample size. In particular, Neyman and Egon Pearson introduced the concepts of “Type I error” and “Type II error” to describe the mistakes that can occur in hypothesis testing.
During the 1960s and 1970s, awareness of sample size continued to grow, especially in clinical trials and medicine. Sample size calculation methods based on controlling Type I and Type II errors were developed and became widely used. In recent years, advances in computing and information technology have made sample size calculation easier and more effective. Many statistical software programs now assist in calculating and estimating sample size from factors such as the degree of variability, the desired precision, and the confidence level.
It can be said that Jacob Cohen laid the foundation for modern research on minimum sample size. Cohen (1923–1998) was a prominent statistician and psychologist who played a key role in raising awareness of the importance of sample size and statistical power in research. He is best known for his work on the concept of “effect size” and on calculating the sample size needed in hypothesis testing. He proposed a number of measures of effect size for statistical experiments, and he raised awareness of the need for a sample large enough to detect a true effect, if there is one. One of Cohen’s most important contributions was the idea of a “standardized effect size,” which expresses the magnitude of an effect in standard deviation units. He proposed indices such as d (the standardized difference between two means), r (the correlation coefficient), and phi (for the association between two binary variables) to measure effect size. Cohen also proposed conventional thresholds that help researchers judge whether an effect is small, medium, or large. In particular, his commonly used benchmarks for d (small 0.2, medium 0.5, and large 0.8) serve as a general guide for interpreting the magnitude of results.
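To make the link between effect size and sample size concrete, the following is a minimal sketch rather than Cohen's own procedure. It computes the standardized difference between two means (commonly written d) with a pooled standard deviation, then uses a normal approximation to estimate how many observations per group a two-sided, two-sample comparison would need. The function names, the example scores, and the defaults of a 5% significance level and 80% power are assumptions chosen for illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist, mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided comparison of two means.

    Normal approximation: n = 2 * (z_(1 - alpha/2) + z_power)^2 / d^2.
    Exact t-based calculations give slightly larger values.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value controlling Type I error
    z_power = z.inv_cdf(power)          # quantile controlling Type II error
    return ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)


# Hypothetical scores, for illustration only.
treatment = [2, 4, 6, 8]
control = [1, 3, 5, 7]
print("d =", round(cohens_d(treatment, control), 3))

# Required n per group for the conventional small, medium, and large benchmarks.
for label, d in [("small", 0.2), ("medium", 0.5), ("large", 0.8)]:
    print(label, d, n_per_group(d))
```

Under this approximation, the smaller the effect a study aims to detect, the larger the sample it needs: a small effect requires several hundred observations per group, while a large effect requires only a few dozen.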
Among later researchers, Joseph F. Hair Jr. is renowned in marketing and business administration, especially for his work on data analysis and research methods. He has made many important contributions to the development and dissemination of PLS-SEM (Partial Least Squares Structural Equation Modeling), a non-parametric approach to structural equation modeling.
From the 2000s onward, Hair and his colleagues helped bring PLS-SEM into widespread use. (PLS-Graph, which is sometimes confused with the method itself, is one of the early software packages for running PLS path models.) The method is non-parametric in the sense that it does not require the data to follow a specific distribution, such as the multivariate normal distribution.
PLS-SEM offers a flexible and less constrained approach to structural model analysis. Rather than trying to reproduce the observed covariance matrix as closely as possible, it emphasizes the explained variance and predictive power of the model. This allows researchers to test and refine models more easily, without requiring the data to satisfy strict distributional assumptions. That is why PLS-SEM is also regarded as a method suited to exploratory research.
With this approach, Hair and colleagues have helped reshape the way researchers build and test structural models, especially in marketing and management. PLS-SEM has opened up a new approach that lets researchers analyze models more flexibly and in a wider range of situations.
For these reasons, most researchers now accept that the minimum sample size needed for PLS-SEM analysis is considerably lower than the minimum needed for CB-SEM analysis. The two methods differ in their goals and approaches. CB-SEM focuses on testing specific hypotheses and pre-specified models, so it needs a sample large enough to test the relationships between variables accurately. PLS-SEM focuses on prediction, and building and testing a predictive model may require less data to reach the desired reliability.
The two methods also differ in flexibility. PLS-SEM is generally more flexible in model building and imposes fewer requirements on how the relationships between variables are specified than CB-SEM does. This allows PLS-SEM to perform well with smaller samples while still predicting well. In some fields, such as marketing and management, data are often imperfect and complex, which makes it difficult to build CB-SEM models with high reliability. PLS-SEM can work better in these cases because it focuses mainly on prediction rather than on testing specific hypotheses.
However, the minimum sample size still depends on many factors, such as the complexity of the model, the degree of variability in the data, the degree of correlation between variables, and the desired level of reliability of the test results. A sample large enough to ensure the validity and reliability of the results remains an important goal in any research analysis.
In CB-SEM, the minimum sample size is usually determined either by a rule of thumb or by Monte Carlo simulation, with the aim of ensuring that the estimates are reliable. A common rule of thumb is the “10:1” or “5:1” ratio, meaning that at least 10 or 5 observations are needed for each variable being estimated. This rule is now widely considered inadequate, because the complexity of the model, the degree of variability in the data, the degree of interaction between variables, and the desired level of confidence in the results can all influence the minimum sample size. Monte Carlo simulation is a good way to test and verify the minimum sample size for a particular case, but it is expensive and complicated to implement, so many researchers have proposed additional rules for determining the minimum sample size.
For example, Hair et al. (2019, p. 132) suggest that the minimum sample size for CB-SEM analysis is n = max(50, 5X), where X is the number of measured variables in the research model, and preferably n = max(50, 10X). According to Green (1991), the minimum sample size is n = 50 + 8m, where m is the number of predictor (independent) variables in the model.
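The arithmetic behind these two rules of thumb can be sketched in a few lines. This is only an illustration: the function names and the example model (20 measured variables, 6 predictors) are hypothetical, and the first rule is read here as max(50, 5X), with the stricter variant using 10 observations per measured variable.

```python
def hair_cbsem_min_n(num_measured_vars, per_variable=5):
    """Rule of thumb attributed to Hair et al. (2019): n = max(50, per_variable * X).

    Use per_variable=10 for the preferred, stricter variant.
    """
    if num_measured_vars < 1:
        raise ValueError("the model needs at least one measured variable")
    return max(50, per_variable * num_measured_vars)


def green_min_n(num_predictors):
    """Rule of thumb attributed to Green (1991): n = 50 + 8m."""
    if num_predictors < 1:
        raise ValueError("the model needs at least one predictor")
    return 50 + 8 * num_predictors


# Hypothetical model: 20 measured variables and 6 predictors.
print(hair_cbsem_min_n(20))                   # 5 observations per variable
print(hair_cbsem_min_n(20, per_variable=10))  # 10 observations per variable
print(green_min_n(6))
```

For this hypothetical model the rules give 100, 200, and 98 respondents. They are starting points only, since none of them accounts for effect size, data variability, or model complexity.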
PLS-SEM provides an alternative approach to structural model analysis, focusing on building predictive models rather than on testing specific hypotheses. Its main concern is the flexibility and predictive ability of the model. The minimum sample size in PLS-SEM therefore does not depend as strongly on the number of parameters to be estimated as it does in CB-SEM. Instead, it is usually related to the degree of variability in the data and the desired predictive ability. In the book Multivariate Data Analysis, Hair et al. (2019, p. 770) propose that the minimum sample size for PLS-SEM is 10 times the number of arrows pointing to a construct. However, Hair et al. also suggest that the minimum sample size should be 100, although in many cases a sample of 100 or fewer is still acceptable, depending on the specific research context.
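The “10 times” rule can be sketched as follows. This is an illustrative reading, not a definitive implementation: it assumes the rule is applied to the construct with the largest number of incoming arrows, and the construct names, arrow counts, and the optional floor parameter (set to 100 to reflect the suggested minimum) are hypothetical.

```python
def ten_times_rule(arrows_into_construct, floor=None):
    """Return 10 times the largest number of arrows pointing at any construct.

    arrows_into_construct maps each construct name to the number of arrows
    (paths or indicators) pointing at it. If floor is given, the result is
    never smaller than floor.
    """
    if not arrows_into_construct:
        raise ValueError("at least one construct is required")
    n = 10 * max(arrows_into_construct.values())
    return n if floor is None else max(n, floor)


# Hypothetical model: construct name -> number of incoming arrows.
model = {"Trust": 2, "Satisfaction": 3, "Loyalty": 4}

print(ten_times_rule(model))             # driven by "Loyalty" with 4 arrows
print(ten_times_rule(model, floor=100))  # with a suggested minimum of 100
```

In this example the rule alone yields 40 respondents, well below the suggested minimum of 100, which shows why the rule is best treated as a lower bound rather than a target.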
An empirical finding reported by Hair et al. (2021, Exhibit 1.8) is that, with a sample size of 250 or more, PLS-SEM and CB-SEM analyses produce similar results.
In summary, determining the minimum sample size is not a simple task in either method. It depends on many factors, such as the complexity of the model, the amount of variability in the data, the expected strength of the relationships between variables, and the desired level of confidence in the results. Determining the minimum sample size is an important part of research and data analysis with CB-SEM and PLS-SEM. Although the two methods take different approaches, both require a careful balance between the quantity and the quality of the sample to ensure that the results are reliable and valid.
References
Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Academic Press.
Cohen, J. (1992). A power primer. Psychological Bulletin, 112(1), 155–159.
Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2013). Applied multiple regression/correlation analysis for the behavioral sciences. Routledge.
Green, S. B. (1991). How many subjects does it take to do a regression analysis? Multivariate Behavioral Research, 26(3), 499–510.
Hair, J. F., Anderson, R. E., Tatham, R. L., & Black, W. C. (2019). Multivariate data analysis (8th ed.). Cengage Learning.
Hair Jr, J. F., Hult, G. T. M., Ringle, C. M., & Sarstedt, M. (2021). A primer on partial least squares structural equation modeling (PLS-SEM). Sage Publications.
