
Tuesday, 13 December 2011

Introduction to and data from the 2011 Reddit r/programming study

In November 2011, a survey was posted on r/programming, a subreddit of Reddit. At the time, r/programming had about 345 000 subscribers. A total of 2502 individuals (n = 2502), or 0.73% of the subscribers, clicked the link to the survey.

The raw data from the study can be downloaded from here (thanks TinyUpload!).

Here are some details about the survey, which consisted of the following questions:

  1. Where do you live? (indicated by clicking on a map of the world)
  2. What is your age? (0-100 using a slider bar)
  3. What is your gender? (Male/female, using radio buttons)
  4. How many hours per regular week do you do programming? (all as sliders ranging from 0-70 hours)
    1. as paid work, 
    2. during education or courses, and
    3. as unpaid work (e.g., open source)
  5. For the hours spent programming as paid work (question 4.1), what percentage of hours do you actually write code? (0-100 % using a slider bar)
  6. How do you rate your programming skills? (5 category Likert response option using "novice" = 1 and "expert" = 5)
  7. How much professional programming experience do you have in
    1. years and
    2. months (number entry)?
  8. Please click the career(s) that you think is most similar to professional software development
  9. The Joel test, consisting of 12 questions with yes/no radio button response options:
    1. Do you use source control?
    2. Can you make a build in one step?
    3. Do you make daily builds?
    4. Do you have a bug database?
    5. Do you fix bugs before writing new code?
    6. Do you have an up-to-date schedule?
    7. Do you have a spec?
    8. Do programmers have quiet working conditions?
    9. Do you use the best tools money can buy?
    10. Do you have testers?
    11. Do new candidates write code during their interview?
    12. Do you do hallway usability testing?
  10. Which of TIOBE's top 100 programming languages plus Clojure (101 in total) have you
    1. used during last 12 months
    2. plan to start using during the next 12 months
    3. have used, but do not intend to use in the future
    4. have used but hate?
  11. Please indicate what kind of programmer type you are along three dimensions: intelligent, socially inept, and obsessed. Using a Venn diagram, this yielded eight different types (using radio buttons):
    1. Socially inept
    2. Intelligent
    3. Obsessed
    4. Dweeb (intelligent and socially inept)
    5. Dork (obsessed and socially inept)
    6. Geek (intelligent and obsessed)
    7. Nerd (intelligent, socially inept and obsessed)
    8. None of the above
Surveys are great for collecting masses of responses, but usually, surveys suffer from incomplete responses. This is normal, and there are ways of analyzing the data in spite of this.

Here are the response rates: A total of 68% of the survey respondents answered all 11 questions (n = 1698). The first question (location) had the highest response rate, 92.6% (n = 2317). Questions toward the end of the survey had lower response rates: programmer type (question 11, n = 1800, 71.9%), programming languages (question 10, n = 1724, 68.9%), and the Joel test (question 9, n = 1273, 50.9%). Question 5 was added after the survey had been live for a couple of hours and therefore also had fewer respondents (n = 1490, 59.6%).
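If you want to double-check the percentages, here is a minimal Python sketch that recomputes them from the counts reported in this post. The dictionary labels and the `response_rate` helper are just illustrative names, not something taken from the raw data file.

```python
TOTAL_RESPONDENTS = 2502   # individuals who clicked the link to the survey
SUBSCRIBERS = 345_000      # approximate r/programming subscribers at the time

# Respondent counts as reported in this post (the labels are made up here).
answered = {
    "all questions": 1698,
    "question 1 (location)": 2317,
    "question 5 (share of paid hours writing code)": 1490,
    "question 9 (Joel test)": 1273,
    "question 10 (programming languages)": 1724,
    "question 11 (programmer type)": 1800,
}


def response_rate(n, total=TOTAL_RESPONDENTS):
    """Return n as a percentage of total, rounded to one decimal."""
    return round(100 * n / total, 1)


rates = {label: response_rate(n) for label, n in answered.items()}
subscriber_share = round(100 * TOTAL_RESPONDENTS / SUBSCRIBERS, 2)

for label, rate in rates.items():
    print(f"{label}: {rate}%")
print(f"share of subscribers who clicked: {subscriber_share}%")
```

All rates use the 2502 individuals who clicked the link as the denominator, which is how the percentages above were calculated.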

Presenting answer alternatives in random order is important because it helps counteract unwanted effects that may result from the order in which answer choices are given. All answer choices (e.g., 11.1-11.8) were presented in randomized order, except for those of question 10, where the programming languages were presented in alphabetical order. For questions 8 to 11, the questions themselves were also presented in randomized order. By the way, respondents who reported fewer than 8 hours per week of programming as paid work (question 4.1) were not presented with the Joel test (question 9).
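To make the presentation rules concrete, here is a rough Python sketch of the logic described above. This is not the code the survey actually ran; the function names (`presented_order`, `show_joel_test`) are hypothetical and only illustrate the idea.

```python
import random

# Answer choices of question 11, in the order listed in this post.
PROGRAMMER_TYPES = [
    "Socially inept", "Intelligent", "Obsessed", "Dweeb",
    "Dork", "Geek", "Nerd", "None of the above",
]

JOEL_TEST_MIN_PAID_HOURS = 8  # threshold on question 4.1


def presented_order(choices, rng=random):
    """Return a shuffled copy of the choices for one respondent."""
    shuffled = list(choices)  # copy, so the master list keeps its order
    rng.shuffle(shuffled)
    return shuffled


def show_joel_test(paid_hours_per_week):
    """The Joel test was skipped for respondents below the threshold."""
    return paid_hours_per_week >= JOEL_TEST_MIN_PAID_HOURS


# One respondent's view of question 11 (the order differs per respondent).
print(presented_order(PROGRAMMER_TYPES))
# Question 10 was the exception: languages were shown alphabetically.
print(sorted(["Python", "Clojure", "Haskell"]))
```

Shuffling a copy per respondent keeps the canonical order intact for analysis, so answers can still be reported as 11.1-11.8.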

Other stuff: For question 10 (programming languages), the survey used click-and-drag which, admittedly, was cumbersome to use. Each programming language could be placed in only one of the answer choices (10.1-10.4). Additionally, for the first 100 respondents (or so), only answer choices 1, 2 and 3 were available. Within each answer choice, respondents could rank the programming languages, where rank 1 indicated the strongest association between a programming language and that answer choice.
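One way to picture a single respondent's answer to question 10 is as a mapping from answer choice to a ranked list of languages. The sketch below is only an assumption about how one might represent it in Python; the raw data file may be laid out differently, and the category keys, helper names and example answer are invented for illustration.

```python
# Hypothetical short keys for answer choices 10.1-10.4.
CATEGORIES = ("used_last_12_months", "plan_to_use", "used_will_not_use", "used_but_hate")


def validate_response(response):
    """Check that each language appears in at most one answer choice."""
    seen = set()
    for category, ranked_languages in response.items():
        if category not in CATEGORIES:
            raise ValueError(f"unknown answer choice: {category}")
        for language in ranked_languages:
            if language in seen:
                raise ValueError(f"{language} appears more than once")
            seen.add(language)
    return True


def rank_of(response, category, language):
    """Return the 1-based rank of a language within an answer choice."""
    return response[category].index(language) + 1


# Made-up example respondent; list position encodes the rank (first = rank 1).
example = {
    "used_last_12_months": ["Python", "C", "JavaScript"],
    "plan_to_use": ["Clojure"],
    "used_but_hate": ["PHP"],
}

print(validate_response(example))
print(rank_of(example, "used_last_12_months", "C"))
```

Storing each answer choice as an ordered list captures both rules at once: the position gives the rank, and the validation step enforces that a language belongs to only one answer choice.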

More other stuff: The figures for questions 8 and 11 were used without knowing whether the pictures are copyrighted. If you own the rights to any of these figures, let us know.
