I am assuming that Joel compiled his list of questions based on his extensive experience with software teams, so I have some expectations. First, there should be a common theme running through these questions insofar as they relate to the quality of a software development team or organization. One way to check this is to run the responses through factor analysis, where such a common theme, if present, should manifest itself as a single dominant factor. Second, if a common theme is indeed present, one would expect this more scientific "team quality factor" to correlate positively with other researched factors, such as “extended experience” and “skill”.
Now, I am guessing that some people here are familiar with factor analysis, but that many may not know what it is at all. I’m going to try to describe the technical analysis steps I did, while at the same time explaining what the steps and the results mean. I only did a rough analysis using a fairly standard approach.
First, the analysis was conducted using principal component analysis (PCA). Although PCA is, in a strict sense, not factor analysis (it is a data reduction technique), it is a very common way to proceed, especially in the initial stages.
Second, the scree plot indicates that there is a clearly discernible theme (represented by principal component 1) present in the data. Each point in the plot is a component; where the curve flattens toward horizontal, the remaining components are too weak to be considered. There is a clear break (or "elbow") around component number 2 or 3.
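To make the scree plot less abstract, here is a minimal sketch of where its numbers come from: the eigenvalues of the correlation matrix between the questions, sorted from largest to smallest. The correlation matrix below is made up for illustration and is not the survey data, and `jacobi_eigenvalues` and `explained_variance` are hypothetical helper names. A statistics package would normally do this step for you.

```python
import math

def jacobi_eigenvalues(matrix, sweeps=50, tol=1e-12):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(sweeps):
        # Stop once the off-diagonal part has (numerically) vanished.
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Angle chosen so that the rotated a[p][q] becomes zero.
                theta = 0.5 * math.atan2(2 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)

def explained_variance(corr):
    """Share of the total variance carried by each principal component."""
    eigenvalues = jacobi_eigenvalues(corr)
    total = sum(eigenvalues)  # equals the number of variables for a correlation matrix
    return [value / total for value in eigenvalues]

# Hypothetical correlations between four survey items (illustration only).
example_corr = [
    [1.00, 0.50, 0.40, 0.10],
    [0.50, 1.00, 0.45, 0.05],
    [0.40, 0.45, 1.00, 0.10],
    [0.10, 0.05, 0.10, 1.00],
]

# A crude text version of a scree plot: one bar per component.
for number, share in enumerate(explained_variance(example_corr), start=1):
    print(f"component {number}: {share:.1%} of total variance " + "#" * round(share * 40))
```

The "elbow" is the point where the bars stop shrinking quickly; components after it add little.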
Third, using a Varimax rotated solution with two factors extracted, one can see that only 31.5% of the total variance is explained by the two factors. The first factor explains only about 20%, which is somewhat low (about 40-45% would have been better). Still, this means that there are two reasonably strong signals in the data. All data has a lot of noise, and the whole point of factor analysis is to find signals in that noise and to cluster those signals into entities that might have a meaning; here “team quality”, or similar. The reason one “rotates” the factors is that the mathematics simply extracts two factors (signals) from the data and places them a bit haphazardly in the data. Think of the two factors as a coordinate system that suddenly appears in a two-dimensional representation of the data. Rotating this coordinate system means turning the axes so that the data points lie as close to them as possible, so that the axes represent the signals in the data optimally.
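The rotation idea can be sketched in a few lines for the two-factor case. This is a brute-force illustration under my own assumptions, not what a statistics package does internally: it tries many rotation angles and keeps the one that maximizes the raw varimax criterion (the variance of the squared loadings within each factor), without the Kaiser normalization that packages often apply. The loadings are invented, and `varimax_criterion`, `rotate` and `varimax_two_factors` are hypothetical names.

```python
import math

def varimax_criterion(loadings):
    """Raw varimax criterion: sum over factors of the variance of squared loadings."""
    p = len(loadings)
    total = 0.0
    for j in range(len(loadings[0])):
        squared = [row[j] ** 2 for row in loadings]
        mean_squared = sum(squared) / p
        total += sum((s - mean_squared) ** 2 for s in squared) / p
    return total

def rotate(loadings, angle):
    """Express two-factor loadings in axes turned counterclockwise by `angle` (radians)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[x * c + y * s, -x * s + y * c] for x, y in loadings]

def varimax_two_factors(loadings, steps=900):
    """Grid search over 0..90 degrees for the angle that maximizes the criterion."""
    angles = (k * (math.pi / 2) / steps for k in range(steps))
    best = max(angles, key=lambda a: varimax_criterion(rotate(loadings, a)))
    return rotate(loadings, best), best

# Invented loadings: two clean clusters of items, deliberately turned by
# 30 degrees so that the axes start out "haphazardly" placed.
clean = [[0.8, 0.0], [0.7, 0.0], [0.0, 0.6], [0.0, 0.9]]
unrotated = rotate(clean, math.radians(30))

rotated, angle = varimax_two_factors(unrotated)
print(f"chosen rotation: {math.degrees(angle):.1f} degrees")
for before, after in zip(unrotated, rotated):
    print([round(v, 2) for v in before], "->", [round(v, 2) for v in after])
```

Note that the rotation only moves the axes. Each item's communality (the sum of its squared loadings) stays the same, so the total variance explained by the two factors together does not change.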
Fourth, it is possible to plot the loadings of the questions in a two-dimensional space. Principal component 1 should be analogous to the quality of a software development team. Both FixBeforeCode and QuietWorkCond have problematic relations to this factor. These questions form a small cluster by themselves with a medium loading on the second principal component. There is also a small cluster composed of HaveSchedule, HaveSpec, BestTools and UsabilityTesting that has a medium-strong relation to principal component 2 and a weak relation to principal component 1. Finally, the remaining questions form a cluster that is clearly associated with the team quality factor, and is marginally or not at all associated with the second (unknown) factor.
Why is this so? I would love to hear your views on this. What should the second factor stand for, do you think? Further, it might be useful to do factor analysis on subsamples based on location. For example, U.S. and European respondents might display quite different results when the two populations are analyzed independently.
Fifth, to be fair to Joel, I should mention that even though some of the questions had a negative loading on the team quality factor, he has some empirical support for his claim that the twelve questions as a whole give some meaningful results. It is possible to specify that only one principal component should be extracted, thereby reducing the responses for all 12 questions to a single factor. The component matrix then shows that all questions have shared variance with this common factor. (Variance is noise, but shared variance means a signal is detected.) Although only about 20% of the total variance is explained (roughly two thirds of the 31.5% explained by two factors), all questions have positive loadings (FixBeforeCode and QuietWorkCond still have low loadings, though).
Specifying that one wants only one factor gives a “one-factor” solution. One can do this if one is sure (from a theory or past experience perspective) that only one factor should be present and the empirical analysis supports this.
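As a rough sketch of what a one-factor extraction computes, the loadings on the first principal component are the entries of the leading eigenvector of the correlation matrix, scaled by the square root of its eigenvalue. The example below uses power iteration on an invented correlation matrix (not the survey data); `first_component_loadings` is a hypothetical name, and the fixed iteration count is an assumption that is fine for small, well-behaved matrices like this one.

```python
import math

def first_component_loadings(corr, iterations=500):
    """Loadings on the first principal component, plus its share of total variance."""
    n = len(corr)
    v = [1.0 / math.sqrt(n)] * n  # starting guess for the leading eigenvector
    for _ in range(iterations):
        # Power iteration: multiply by the matrix, then renormalize.
        w = [sum(corr[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the eigenvalue that belongs to v.
    eigenvalue = sum(v[i] * sum(corr[i][j] * v[j] for j in range(n)) for i in range(n))
    if sum(v) < 0:  # the sign of an eigenvector is arbitrary; prefer positive loadings
        v = [-x for x in v]
    loadings = [x * math.sqrt(eigenvalue) for x in v]
    return loadings, eigenvalue / n

# Invented correlations: three items that hang together and one that barely does.
example_corr = [
    [1.00, 0.50, 0.40, 0.10],
    [0.50, 1.00, 0.45, 0.05],
    [0.40, 0.45, 1.00, 0.10],
    [0.10, 0.05, 0.10, 1.00],
]

loadings, share = first_component_loadings(example_corr)
print("loadings:", [round(l, 2) for l in loadings])
print(f"variance explained by the single component: {share:.1%}")
```

In this toy case the fourth item plays the role of a question with a low but still positive loading.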
The first PCs of the single-factor and two-factor solutions are highly correlated (.87), with nearly 76% shared variance (shared variance is the squared correlation). The first PC of the single-factor solution is also highly correlated with the second PC of the two-factor solution (.65), with over 42% shared variance.
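The shared-variance figures are simply the squared correlation coefficients. A small check, using the correlations reported in this post (`shared_variance` is just an illustrative helper name):

```python
def shared_variance(r):
    """Proportion of variance two variables share, given their correlation r."""
    return r ** 2

reported = [
    ("first PC (single factor) vs first PC (two-factor)", 0.871),
    ("first PC (single factor) vs second PC (two-factor)", 0.649),
]

for label, r in reported:
    print(f"{label}: r = {r:.3f}, shared variance = {shared_variance(r):.1%}")
```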
Finally, it is possible to cross-check whether the extracted factors are positively correlated with other factors one would expect them to be positively correlated with. I have used non-parametric correlations for the variables (Spearman's rho) and parametric correlations for the factors (Pearson's r) in the table below. For both first principal components (i.e., for the one-factor and two-factor solutions), the correlations with general programming skill, total months of experience, and time spent per week on paid work are all positive and at least .158. Further, both principal components are negatively correlated with time spent per week on education or courses. Moreover, there is a slight tendency for both principal components to be positively correlated with the percentage of paid time actually spent programming.
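For readers unfamiliar with the two coefficients, here is a minimal sketch of how they differ. Pearson's r measures linear association between the raw values, while Spearman's rho is Pearson's r computed on the ranks, so it only cares about order. The data are invented, and the function names are my own; a statistics package provides both out of the box.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r applied to the ranks of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Invented example: a self-rated skill level (ordinal, with ties) against a score.
skill = [1, 2, 2, 3, 4, 5, 5]
score = [0.1, 0.3, 0.2, 0.9, 1.5, 4.0, 9.0]

print(f"Pearson r    = {pearson_r(skill, score):.3f}")
print(f"Spearman rho = {spearman_rho(skill, score):.3f}")
```

Rank-based correlation is the safer choice for ordinal survey answers, which is why it is used here for the variables but not for the factor scores.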
Overall, there seems to be a tendency for Joel's 12 questions to capture one or (most likely) two factors that are relatively consistent with expectations. But more work is needed.
| | | first PC (single factor solution) | first PC (two-factor solution) | second PC (two-factor solution) |
|---|---|---|---|---|
| first PC (single factor solution) | r | 1.000 | .871 | .649 |
| | Sig. (2-tailed) | . | .000 | .000 |
| | N | 1234 | 1234 | 1234 |
| first PC (two-factor solution) | r | | 1.000 | .207 |
| | Sig. (2-tailed) | | . | .000 |
| | N | | 1234 | 1234 |
| second PC (two-factor solution) | r | | | 1.000 |
| | Sig. (2-tailed) | | | . |
| | N | | | 1234 |
| general programming skill | rho | .230 | .229 | .100 |
| | Sig. (2-tailed) | .000 | .000 | .000 |
| | N | 1234 | 1234 | 1234 |
| total months of programming experience | rho | .158 | .168 | .043 |
| | Sig. (2-tailed) | .000 | .000 | .132 |
| | N | 1227 | 1227 | 1227 |
| time spent per week as paid work | rho | .256 | .291 | .075 |
| | Sig. (2-tailed) | .000 | .000 | .009 |
| | N | 1234 | 1234 | 1234 |
| time spent per week as education or courses | rho | -.148 | -.196 | .002 |
| | Sig. (2-tailed) | .000 | .000 | .942 |
| | N | 1125 | 1125 | 1125 |
| time spent per week as unpaid work (e.g., OS) | rho | .019 | -.011 | .057 |
| | Sig. (2-tailed) | .529 | .704 | .055 |
| | N | 1147 | 1147 | 1147 |
| percentage of paid time actually programming | rho | .098 | .073 | .058 |
| | Sig. (2-tailed) | .001 | .016 | .056 |
| | N | 1081 | 1081 | 1081 |