One of the most important but often neglected processes in creating any type of measurement or index is validating its scale values against an external criterion. In this section, we will show that the new index, NPI, satisfies the following three types of scale validation:
- It meets the external validation test
- It can be translated into credible winning probabilities in a bilateral competition
- It satisfies indirect tests of validity by exhibiting the expected pattern of relationships with several other competitive outcome variables
Although many existing measures of national power may show excellent face validity (that is, they seem reasonable at the conceptual level), their scale-values usually fail at least one of these external validation tests. We will show one of the key weaknesses of the existing indices of power.
A Test of External Validity
Here we use the factor-analytic framework to find some external criterion variables. Finding external criterion variables is easier said than done. We were lucky to be able to find some credible external criterion variables for the index of basic national power:
- FIFA ratings (used until 2006)
- ELO ratings
- SPI ratings
These ratings qualify as external criterion variables because they measure something quite different from basic national capacity, they were created for different purposes, and they are based on different data and methodology. Nevertheless, they are expected to have some linear relationship with NPI, on the expectation that basic national capacity (or power) must have some bearing on the more specific sports performance of a nation. This type of theoretical expectation comes from a factor-analytic framework. See Kim et al. (2013) for a fuller explication. As can be seen from the following graph, the relationships between NPI and these three indices are linear.
The FIFA index used here is based on the old ranking scheme used until 2006. The relationship between NPI and the FIFA index is linear, and the minor deviation shown on the right is not statistically significant. The new FIFA index, introduced in 2006 and in use today, is not a credible soccer power index. For that reason, we must rely on the two other currently available soccer power indices: SPI and ELO.
Those who are new to the intricacies of index construction and the concept of external validity may wonder how such a trivial index as the Soccer Power Index can be used as the criterion of scale validation for a much more comprehensive and important index of national capacity. Note, first, that in the graph above there is a great deal of zig-zag around the linear trend: the correlation between the two indices is only about .6. These two indices are clearly measuring something quite different. This linear pattern is compatible with the factor-analytic assumption that a specific factor, such as soccer power, is affected by both a more general, or common, factor (e.g., basic national capacity) and an area-specific (e.g., soccer-specific) factor. The important point is that two indices, built for different purposes, by different persons, using different conceptual and methodological schemes and different sets of data, reveal such a systematic relationship: in this case, a linear pattern.
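The factor-analytic reasoning above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the model used by Kim et al. (2013): the general index is treated as a perfect measure of the common factor, both factors are standard normal, and the loading of 0.6 is a hypothetical value chosen to match the correlation quoted above. The function names are illustrative.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate(loading=0.6, n=20000, seed=0):
    """Simulate a general factor and a specific index that loads on it.

    specific_index = loading * general + sqrt(1 - loading^2) * area_specific
    The loading is a hypothetical value, not an estimate from real data.
    """
    rng = random.Random(seed)
    general = [rng.gauss(0.0, 1.0) for _ in range(n)]
    area_specific = [rng.gauss(0.0, 1.0) for _ in range(n)]
    unique_weight = math.sqrt(1.0 - loading ** 2)
    specific_index = [
        loading * g + unique_weight * s
        for g, s in zip(general, area_specific)
    ]
    return general, specific_index


general, soccer = simulate()
print(f"sample correlation: {pearson(general, soccer):.2f}")
```

Under these assumptions the relationship is linear on average, yet individual cases scatter widely around the line, which is the kind of zig-zag pattern described above.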
Credible Winning Probabilities
It is not an exaggeration to argue that the relative scale values of any index of national power or capacity have meaning only through their systematic connection to the probability of winning or losing when two nations compete. The main entry, Winning Probability, contains statistical models and a few illustrative uses of the model.
In this section, we will show how to use the winning probability model as a means of testing the “minimum” admissibility of any index of national capacity. The first step is to convert the index values to z-scores, that is, standardized values with mean=0 and standard deviation=1. Then, using the z-score difference between any two nations, one may read off the winning probability of the stronger nation from the standard normal distribution function. If the power difference between the two nations is, for instance, 1.23, then the table shows that the probability associated with z=1.23 is .8907. (See, for instance, the first table in Appendix B of Green, 2000.) This probability (=.8907) is the winning probability of the stronger nation over the weaker one, provided that it is reasonable to assume the saliency ratio is 2. The accompanying graph represents the competitive results of two nations with a power difference of 1.23, under the assumption that the saliency ratio is 2.
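The read-off procedure described above can be sketched in Python. This is a minimal illustration, not the full winning probability model: it covers only the direct lookup in the standard normal distribution (the case the text associates with a saliency ratio of 2), and the use of the population standard deviation in `z_scores` is an assumption. The function names are illustrative.

```python
import math


def z_scores(values):
    """Standardize values to mean 0 and standard deviation 1.

    Uses the population standard deviation (an assumption; the text
    does not say which variant is intended).
    """
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


def standard_normal_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def winning_probability(z_stronger, z_weaker):
    """Winning probability of the stronger nation, read off the
    standard normal CDF at the z-score difference."""
    return standard_normal_cdf(z_stronger - z_weaker)


# The worked example from the text: a power difference of 1.23.
print(round(winning_probability(1.23, 0.0), 4))  # 0.8907
```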
We can easily show why GNI, CINC, or any other index that contains “raw-scaled” components (such as population size, GDP, GDP per capita, total energy consumption, total military budget, total military personnel) without proper scale transformations is likely to fail this scale admissibility test.
An illustration with the simple index based on gross national income (GNI) will suffice. The power difference between the United States (the strongest nation) and Germany (the fourth strongest nation) is 7.863 on GNI (when transformed into z-scores), which implies that the U.S. is virtually certain to win over Germany in almost all areas of competition, even under quite different assumptions about the saliency ratio. Such a winning probability is not credible. It is a clear sign that the GNI scale values do not have a theoretically meaningful relationship to the winning probabilities in a bilateral competition and therefore cannot serve as a credible index of national capacity. See the two graphs, one where saliency ratio=2 and the other where saliency ratio=1.
The unit-free transformations used in CINC, where the raw scales are simply repackaged without scale transformations as proportions of the total sum of the raw component, suffer from the same problem as the raw-scale values. For example, in constructing the CINC, the population value of a nation is transformed into (population size of a nation)/(total population in the world), thereby leaving the relative distances in population size intact, exactly proportional to the differences in the original untransformed raw scale. Many “raw” variables have such skewed distributions that the relative distances in raw values cannot be assumed to be roughly proportional to the winning probabilities. So, one of the necessary steps in constructing an index of national power is to examine the distributional properties of the components that are included in the composite index.
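To see how extreme a z-score difference of 7.863 is, one can compute the implied probability that the weaker nation wins. The sketch below again uses only the direct standard normal read-off and does not model the saliency ratio; the function name is illustrative. It uses the complementary error function so that the very small tail probability is not lost to rounding.

```python
import math


def upset_probability(z_difference):
    """Probability that the weaker nation wins: 1 - Phi(z_difference).

    Computed with erfc to stay accurate far out in the tail.
    """
    return 0.5 * math.erfc(z_difference / math.sqrt(2.0))


# The moderate difference from the earlier example, for comparison.
print(f"z = 1.23:  {upset_probability(1.23):.4f}")
# The GNI-based difference between the United States and Germany.
print(f"z = 7.863: {upset_probability(7.863):.3e}")
```

For the GNI-based difference, the implied chance of an upset is far below one in a trillion, which is why the text treats the result as not credible.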
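A short check makes the point about proportions concrete. Dividing every value by the world total is a rescaling by a constant, so the ratios between nations and the resulting z-scores are unchanged. The values below are hypothetical round numbers chosen only to be skewed; they are not CINC data, and the function names are illustrative.

```python
import math


def z_scores(values):
    """Standardize values using the population standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


def to_shares(values):
    """Express each value as a proportion of the total."""
    total = sum(values)
    return [v / total for v in values]


# Hypothetical, deliberately skewed raw values (not real data).
raw = [1400.0, 330.0, 80.0, 10.0, 1.0]
shares = to_shares(raw)

# The ratio between the two largest cases is the same in both scales.
print(raw[0] / raw[1], shares[0] / shares[1])

# The z-scores are identical up to floating-point error.
for z_raw, z_share in zip(z_scores(raw), z_scores(shares)):
    print(f"{z_raw:.6f}  {z_share:.6f}")
```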
Skewed Distributions and Necessary Transformations
As examples of components with highly skewed distributions, we show the distribution patterns of two widely used components: GNI and Population Size.
There is no way such highly skewed values can be proportional to winning probabilities. Before such components can be used as a part of some composite index, they have to be transformed. For example, our new NPI index transforms the population size in such a way that the raw population values are transformed twice–once by taking the log values, and a second time by taking the square-root of the normed index (that ranges between 0 and 1). The following graph shows the relationship between the original values and the final transformation used in constructing the NPI index. The horizontal values (x-axis) and the line that connects the bottom left corner and the top-right corner represent the final scale used in the un-normed NPI. The bottom curve with purple dots shows how the original population size is related to the final transformed values. The middle line with red dots shows the relative values after the log-transformation. In short, the final sub-index is based on two successive transformations–log and square-root–thereby discounting the impact of raw-values twice in order to accommodate the diminishing returns to size.
Note that the difference in population size between China (the most populous nation) and the United States (the third most populous state) is huge in the original raw scale, but the relative distance between the two in the final sub-index, although still substantial, is much smaller.
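The two successive transformations can be sketched as follows. This is an illustration under assumptions rather than the exact NPI procedure: the text says only that the normed index ranges between 0 and 1, so min-max norming of the log values is assumed here, and the population figures are hypothetical round numbers. The function names are illustrative.

```python
import math


def min_max(values):
    """Rescale values linearly to the range [0, 1] (assumed norming)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def log_sqrt_subindex(values):
    """Apply log, norm to [0, 1], then take the square root."""
    logs = [math.log(v) for v in values]
    return [math.sqrt(u) for u in min_max(logs)]


# Hypothetical population sizes, largest first (not real data).
populations = [1_400_000_000, 330_000_000, 80_000_000, 5_000_000, 100_000]

raw_normed = min_max(populations)
sub_index = log_sqrt_subindex(populations)

# Distance between the two largest cases before and after transformation.
gap_raw = raw_normed[0] - raw_normed[1]
gap_final = sub_index[0] - sub_index[1]
print(f"gap on raw scale:   {gap_raw:.3f}")
print(f"gap on final scale: {gap_final:.3f}")
```

The size of the final gap depends on the assumed norming and on the smallest value in the sample, so the numbers printed here should not be read as NPI values.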
The WTA Distortions and Scale-Transformations
An interesting question emerges: can some relevant transformations cure the scale problems of some components? The answer is yes and no. An appropriate transformation will help reduce the distortions and the possible exaggeration of the relative power of the more powerful. In some instances, however, no transformation can cure all possible distortion problems. Let us assume that one considers the Global Fortune 500 a useful indicator of national power and wants to use it as one of the component indicators of national power.
A careful scholar will examine the distribution pattern of the Global Fortune 500, note a severely skewed distribution, and make a log-transformation (usually, log (x +1)) before including it in the construction of a composite index. Such a transformation will reduce the distortion problem substantially, but cannot eliminate the severe truncation problem. That is, when the effect of winner-take-all distortion is very strong, which is the case with the Fortune 500 index, there will be many cases with zero entries, thereby introducing unintended truncation on the left. Not all the countries without a Fortune 500 company have the same underlying capacity.
The same problems do not exist when we use the log-transformation of GNI or Population Size. But whenever we use variables that are based on competitive outcomes (such as Fortune 500, Nobel Prizes, Scientific Patents, and so on) in which the winner-take-all (WTA) distortion effect is strong, we may introduce inadvertent distortions into the index, even with the best possible transformations. See the section on WTA on this website.
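The truncation problem can be seen in a few lines. In the sketch below, the company counts are hypothetical and chosen only to mimic a winner-take-all pattern; the function name is illustrative. The log (x + 1) transformation compresses the large counts, but every country with zero companies still receives exactly the same score.

```python
import math


def log1p_transform(counts):
    """Apply the log(x + 1) transformation to a list of counts."""
    return [math.log1p(c) for c in counts]


# Hypothetical counts of listed companies per country (not real data).
counts = [130, 120, 50, 10, 1, 0, 0, 0, 0]
transformed = log1p_transform(counts)

for count, value in zip(counts, transformed):
    print(f"{count:4d} -> {value:.3f}")

# All zero-count countries collapse onto a single value.
print("countries tied at zero:", transformed.count(0.0))
```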
NPI and Other Indices of National Performance and Extent of Distortions
The relationships between the new index and some other performance-related outcome measures, such as “Medals won in the Olympic Games,” “Global Fortune 500 Companies,” or “Nobel Prize Winners by Nations,” all show the expected non-linear pattern. Non-linear relationships are expected because of the operation of WTA (winner-take-all), and the differences in the degree of non-linearity among these graphs also follow an expected pattern. Both facts can be taken as additional, indirect validation of the NPI scale. For a fuller explication, see Kim et al. (2013).
In short, the new index (NPI) passes the test of external validity and its scale values produce acceptable winning probabilities, while GNI and CINC not only fail the external validity test, but also produce clearly unacceptable winning probabilities.