Reliability and Validity | 16Personalities

In this article, we discuss various reliability and validity metrics of our assessment, NERIS Type Explorer®. All metrics refer to the English version of the assessment – we monitor and improve all international sections as well, but covering them all in the same article would be overwhelming.

Internal Consistency

A coefficient called Cronbach’s alpha measures internal consistency: how closely the answers to questions belonging to the same scale are related. For instance, if you agree with “I like cookies”, you’d also be likely to agree with “I’ve eaten lots of cookies in the past” and disagree with “The smell of cookies annoys me.”

Alpha values are generally expected to be between 0.70 and 0.90. Lower values suggest that the questions being evaluated may not measure the same construct; higher values may indicate redundancy, with several questions asking essentially the same thing. As you can see from the table below, all our scales fall within this range, which indicates that each of them is internally consistent.

Scale	Alpha
Introverted vs. Extraverted	0.87
Observant vs. Intuitive	0.78
Thinking vs. Feeling	0.75
Judging vs. Prospecting	0.82
Assertive vs. Turbulent	0.86

Sample size: 10,000 respondents.
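To make the coefficient more concrete, here is a minimal Python sketch of the standard formula, alpha = k / (k − 1) × (1 − sum of item variances / variance of total scores), where k is the number of questions on a scale. It assumes answers on a 1–7 agreement scale from a handful of made-up respondents to the three cookie questions; the helpers cronbach_alpha, reverse_score and interpret_alpha are hypothetical, for illustration only. Reverse-keyed questions, such as the one about the smell of cookies, are flipped before the calculation.

```python
from statistics import pvariance


def reverse_score(answer, low=1, high=7):
    """Flip a reverse-keyed answer so that higher always means more of the trait."""
    return low + high - answer


def cronbach_alpha(responses):
    """Cronbach's alpha for one scale.

    responses: one list of item scores per respondent (all the same length).
    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*responses))              # regroup answers by question
    item_var_sum = sum(pvariance(item) for item in items)
    totals = [sum(row) for row in responses]   # each respondent's scale score
    return k / (k - 1) * (1 - item_var_sum / pvariance(totals))


def interpret_alpha(alpha, low=0.70, high=0.90):
    """Map alpha onto the rule-of-thumb 0.70-0.90 range."""
    if alpha < low:
        return "items may not measure the same construct"
    if alpha > high:
        return "items may be redundant"
    return "acceptable internal consistency"


# Made-up answers (1 = strongly disagree, 7 = strongly agree) to:
# "I like cookies", "I've eaten lots of cookies in the past",
# "The smell of cookies annoys me" (reverse-keyed)
raw = [
    [7, 6, 1],
    [5, 5, 3],
    [2, 3, 6],
    [6, 7, 2],
    [3, 2, 5],
]
scored = [[a, b, reverse_score(c)] for a, b, c in raw]
alpha = cronbach_alpha(scored)
print(round(alpha, 2), interpret_alpha(alpha))  # → 0.97 items may be redundant
```

With these made-up answers, the three cookie questions move almost in lockstep, so alpha comes out at about 0.97. That is above 0.90, which is the redundancy case rather than the ideal range.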

Test-Retest Reliability

Test-retest reliability shows how strongly people’s original test results correlate with their results on a retake (usually after a longer time period). The higher the reliability coefficient, the more stable the scores on a particular scale are over time.

It’s important to keep in mind that measuring test-retest reliability in personality psychology is quite different from, say, near-static physical measurements (such as vision). At the very least, a perfect test-retest experiment would require the same environment and the same mindset, which is nearly impossible to achieve when it comes to personality testing.

For instance, our data strongly suggests that certain personality types reacted differently to the 2016 U.S. presidential election, with noticeable changes on scales such as Assertive vs. Turbulent, potentially indicating increased or decreased anxiety. Our personality traits also tend to shift slightly as we grow and mature. Therefore, some variation over time should definitely be expected.

That said, our assessment performs well on test-retest reliability too. As with Cronbach’s alpha, coefficients are generally expected to be 0.70 or higher, and all our scales meet that threshold.

Scale	Coefficient
Introverted vs. Extraverted	0.83
Observant vs. Intuitive	0.74
Thinking vs. Feeling	0.80
Judging vs. Prospecting	0.79
Assertive vs. Turbulent	0.78

Sample size: 2,900 respondents, who took our assessment after a break of 5-7 months. p < 0.001.
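As a rough illustration, a test-retest coefficient can be computed as the correlation between each person’s first score and their retake score on the same scale. The sketch below assumes Pearson’s r, the most common choice for this kind of analysis, and uses made-up percentage scores for five hypothetical respondents; pearson_r is a hypothetical helper. It then checks the coefficients from the table above against the 0.70 guideline. Significance testing (the p-value) is not shown.

```python
from math import sqrt


def pearson_r(first, retake):
    """Pearson correlation between first-take and retake scores for the same people."""
    if len(first) != len(retake) or len(first) < 2:
        raise ValueError("need two equally long lists with at least two scores")
    n = len(first)
    mean_x = sum(first) / n
    mean_y = sum(retake) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(first, retake))
    var_x = sum((x - mean_x) ** 2 for x in first)
    var_y = sum((y - mean_y) ** 2 for y in retake)
    return cov / sqrt(var_x * var_y)


# Made-up percentage scores on one scale for five people, months apart.
first_take = [62, 45, 80, 30, 55]
retake = [58, 50, 76, 35, 60]
r = pearson_r(first_take, retake)
print(f"test-retest r = {r:.2f}")

# Coefficients from the test-retest table, checked against the 0.70 guideline.
reported = {
    "Introverted vs. Extraverted": 0.83,
    "Observant vs. Intuitive": 0.74,
    "Thinking vs. Feeling": 0.80,
    "Judging vs. Prospecting": 0.79,
    "Assertive vs. Turbulent": 0.78,
}
below = [scale for scale, coef in reported.items() if coef < 0.70]
print(below if below else "all scales meet the 0.70 guideline")
```

Because every person’s retake score here stays close to their first score, r comes out high; real data with more month-to-month drift would give a lower value.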

Discriminant Validity

The third step is discriminant validity analysis. It checks whether scales that should not be related are really unrelated. In other words, are we actually measuring five distinct scales, or do they overlap in some way? Are we certain that when we ask you questions related to the Introverted vs. Extraverted scale, we are not inadvertently measuring half of the Assertive vs. Turbulent scale as well?

Yep. Let’s take a look at the table below, which shows the correlation coefficient between scores on each pair of scales. The maximum acceptable absolute value for this coefficient is usually considered to be around 0.70–0.80; anything above that means the two scales overlap enough to call into question whether they measure distinct traits.

As you can see, the absolute values of all coefficients are well below that threshold. The Observant vs. Intuitive and Judging vs. Prospecting scales have the highest coefficient, at 0.37: a modest positive relationship that has been mirrored by other instruments measuring similar concepts. The greater tolerance of ambiguity associated with the Intuitive side of the first scale fits well with the desire for flexibility that Prospecting individuals are known for. Even so, their correlation is far too low for either scale to have an unacceptable impact on the other.

So, this third check shows that all five scales are distinct and don’t influence each other in a way that would make us question their integrity.

	Introverted vs. Extraverted	Observant vs. Intuitive	Thinking vs. Feeling	Judging vs. Prospecting	Assertive vs. Turbulent
Introverted vs. Extraverted		-0.09	0.02	-0.01	-0.29
Observant vs. Intuitive	-0.09		0.09	0.37	0.22
Thinking vs. Feeling	0.02	0.09		0.08	0.25
Judging vs. Prospecting	-0.01	0.37	0.08		0.16
Assertive vs. Turbulent	-0.29	0.22	0.25	0.16

Sample size: 10,000 respondents.
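The threshold check itself is simple enough to sketch. The Python snippet below takes the upper triangle of the correlation matrix from the table above, finds the pair of scales with the largest absolute correlation, and flags any pair at or above 0.70, the conservative end of the 0.70–0.80 range. The helper flag_overlaps is hypothetical, and computing the matrix from raw respondent scores is not shown.

```python
SCALES = [
    "Introverted vs. Extraverted",
    "Observant vs. Intuitive",
    "Thinking vs. Feeling",
    "Judging vs. Prospecting",
    "Assertive vs. Turbulent",
]

# Upper triangle of the correlation table, keyed by (row, column) indices into SCALES.
CORR = {
    (0, 1): -0.09, (0, 2): 0.02, (0, 3): -0.01, (0, 4): -0.29,
    (1, 2): 0.09, (1, 3): 0.37, (1, 4): 0.22,
    (2, 3): 0.08, (2, 4): 0.25,
    (3, 4): 0.16,
}


def flag_overlaps(corr, threshold=0.70):
    """Return the scale pairs whose absolute correlation reaches the threshold."""
    return [pair for pair, r in corr.items() if abs(r) >= threshold]


# The sign only shows direction; overlap is judged on the absolute value.
strongest = max(CORR, key=lambda pair: abs(CORR[pair]))
i, j = strongest
print(f"strongest pair: {SCALES[i]} / {SCALES[j]} (r = {CORR[strongest]})")
# → strongest pair: Observant vs. Intuitive / Judging vs. Prospecting (r = 0.37)
print("pairs at or above 0.70:", flag_overlaps(CORR))  # → pairs at or above 0.70: []
```

Only the upper triangle is stored because a correlation matrix is symmetric, so each pair of scales is checked exactly once.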

Conclusion

To summarize, statistical analysis confirms that:

Our assessment is based on five distinct scales that overlap only weakly;
All scales are internally consistent;
People who retake our assessment are very likely to get similar scores on all scales, even after a break of ~6 months.

We always welcome feedback, questions, and criticism, so if you have any comments regarding the above metrics, please feel free to drop us a message!