Tuesday 13 March 2012

A more detailed look at language popularity

In November 2011, Reddit's r/programming community was asked to indicate which programming languages they had used during the last 12 months. The preliminary analyses (here and here) indicated that JavaScript was the most frequently used programming language. A limitation, however, was that the "used box" does not consider how much a language is used relative to the other languages a respondent had also used.

The figure on the right shows a plot where the frequency with which a language was put into the "used box" is shown together with the mean rank for the same programming language. A mean rank of 3 indicates that, on average, respondents ranked a programming language as their third most used. Languages that were reported as used by fewer than 100 respondents were omitted from this figure.

Even though JavaScript has the highest frequency of use, its mean rank is somewhat higher (that is, further from first place) than for languages such as C, C++, Java or C#. This indicates that, on average, other programming languages are more of a "main" programming language for the respondents. C# had the lowest mean rank, followed by Java and C++.

One way to look at this further is to count the number of times a language is ranked first, second, or third and divide by the total number of respondents who reported using that language. This has been done in the table below. Of those who reported using C#, 44% ranked C# as their most used language, whereas of those who used JavaScript, only 8% ranked it as their main (1st) language.

Rank     C++     C       Java    C#      PHP     JavaScript
1st      28 %    25 %    34 %    44 %    22 %     8 %
2nd      25 %    23 %    21 %    19 %    20 %    31 %
3rd      16 %    17 %    14 %    13 %    18 %    21 %
Total    69 %    65 %    70 %    76 %    60 %    60 %
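
The calculation behind a table like this can be sketched in a few lines of Python. This is only an illustration: the survey's raw data format is not given here, so the sketch assumes each response is a list of languages ordered from most used to least used, and both the function name `rank_shares` and the toy responses are hypothetical.

```python
from collections import Counter, defaultdict


def rank_shares(responses, top=3):
    """For each language, return the share of its users who ranked it 1st..top-th.

    `responses` is assumed to be a list of rankings, each ordered from
    most used to least used (a hypothetical format, not the survey's own).
    """
    used = Counter()                # how many respondents used each language
    at_rank = defaultdict(Counter)  # language -> rank position -> count
    for ranking in responses:
        for position, language in enumerate(ranking, start=1):
            used[language] += 1
            if position <= top:
                at_rank[language][position] += 1
    return {
        language: [at_rank[language][r] / used[language] for r in range(1, top + 1)]
        for language in used
    }


# Made-up toy responses, NOT the survey data.
responses = [
    ["C#", "JavaScript", "SQL"],
    ["Java", "JavaScript"],
    ["C#", "Python", "JavaScript"],
    ["Python", "C#", "C", "JavaScript"],
]
shares = rank_shares(responses)
```

In this toy data, `shares["JavaScript"]` has a zero first-place share even though every respondent uses the language, which is the same pattern the table shows for JavaScript on a much larger scale.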

The figure below shows how many respondents use each language compared with the mean rank of that language within its "use" group. For the languages that very few respondents use (those on the left), the mean rank is very uncertain because it is based on so few responses.

Percent who placed a language into the "use" category versus the mean rank (popularity) of each language

Another way to illustrate language popularity is to use some form of weight function that incorporates both the mean rank and the number of times a language appears in the used category. In the figure below, the frequency of a language was multiplied by 1/mean rank and plotted on the y-axis. TIOBE's December 2011 ranking is shown on the x-axis.
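
As a rough sketch of this weighting, the Python below computes frequency × 1/mean rank for each language. It assumes that "frequency" is a raw count of respondents and that each response is a list ordered from most used to least used; the function name `weighted_usage` and the toy responses are hypothetical, and any scaling applied for the figure is not reproduced.

```python
from collections import defaultdict


def weighted_usage(responses):
    """Return frequency * (1 / mean rank) for every language.

    `responses` is assumed to be a list of rankings, each ordered from
    most used to least used (a hypothetical format, not the survey's own).
    """
    positions = defaultdict(list)  # language -> rank positions it received
    for ranking in responses:
        for position, language in enumerate(ranking, start=1):
            positions[language].append(position)

    scores = {}
    for language, ranks in positions.items():
        frequency = len(ranks)            # number of respondents using it
        mean_rank = sum(ranks) / frequency
        scores[language] = frequency * (1 / mean_rank)
    return scores


# Made-up toy responses, NOT the survey data.
responses = [
    ["C#", "JavaScript", "SQL"],
    ["Java", "JavaScript"],
    ["C#", "Python", "JavaScript"],
    ["Python", "C#", "C", "JavaScript"],
]
scores = weighted_usage(responses)
```

With these toy responses, JavaScript is used by more respondents than C# but ends up with a lower score, because its mean rank is further from first place. A widely used "secondary" language is discounted in the same way by the weighting described above.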

Deviations from the identity line (where x = y) show how the respondents of r/programming differ from TIOBE's index of language popularity. For example, users of JavaScript, Python and, to some extent, C++ and C# seem to be overrepresented in the survey. Users of Objective-C and Visual Basic seem to be underrepresented.

Results for r/programming using a weighted usage function together with TIOBE's December results