Last data update: Wed Apr 13 14:57:11 +0200 2011
We have attempted to collect a variety of data about the relative popularity of programming languages, mostly out of curiosity. To some degree, popularity does matter; however, it is clearly not the only thing to take into account when choosing a programming language. Most experienced programmers should be able to learn the basics of a new language in a week, and be productive with it in a few more weeks, although it will likely take much longer to truly master it.
Browser requirements: sadly, yes, this site requires a browser that supports the
canvas tag. I wasn't happy with other options for creating
downloaded once. Firefox, IE and Safari ought to work, although I haven't tested it with the
latter. Konqueror apparently does not work. The charts are created with Chartr and Flotr.
Note: these results are not scientific. They are interesting nonetheless, and are an attempt to glean as much data as possible notwithstanding the fact that gathering precise data is impossible. We hope you find them interesting as well. Constructive suggestions on improving them are welcome. Contact information is provided at the bottom of the page.
This is a chart showing combined results from all data sets, listed individually below.
It is possible to recalculate the normalized results with different "weights" for the different data sources. For instance, if you want to place more importance on Craigslist data, and less on Yahoo Search, you could set Craigslist to '2', and Yahoo Search to '0.5'.
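As a rough illustration of how such a weighted recalculation could work, here is a minimal sketch in Python. It assumes each source is first scaled so that its counts sum to 1, and that the combined score is a weighted average of those scaled values. The exact normalization used for the chart is not described here, and the function names, source names and numbers below are made up for the example.

```python
def normalize(counts):
    """Scale one source's raw counts so that they sum to 1.0."""
    total = sum(counts.values())
    if total == 0:
        return {lang: 0.0 for lang in counts}
    return {lang: n / total for lang, n in counts.items()}


def combine(sources, weights=None):
    """Weighted average of the per-source normalized scores.

    Sources without an explicit weight get a weight of 1.0.
    """
    weights = weights or {}
    combined = {}
    total_weight = 0.0
    for name, counts in sources.items():
        w = weights.get(name, 1.0)
        total_weight += w
        for lang, score in normalize(counts).items():
            combined[lang] = combined.get(lang, 0.0) + w * score
    if total_weight == 0:
        return {lang: 0.0 for lang in combined}
    return {lang: s / total_weight for lang, s in combined.items()}


# Made-up counts, purely for illustration.
sources = {
    "craigslist": {"Java": 60, "Python": 40},
    "yahoo_search": {"Java": 30, "Python": 70},
}

equal = combine(sources)
reweighted = combine(sources, {"craigslist": 2, "yahoo_search": 0.5})
print(equal)
print(reweighted)
```

With equal weights the two sources count the same; raising the Craigslist weight to 2 and lowering Yahoo Search to 0.5 pulls the combined score toward the Craigslist ordering.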
Yahoo provides an API to its search engine. Previous versions of these
statistics used numbers from Google, but since Google has deprecated its own API, we utilized
Yahoo's. Searches took the form
This is a fairly crude approximation of popularity. However, it's worth including because, all other things being equal, the more popular a language is, the more pages will exist mentioning it.
We used Yahoo's search API for this too, with queries like this:
programmer -"job wanted" site:craigslist.org
Popular languages are used more in industry, and consequently, people post job listings that seek individuals with experience in those languages. This is probably something of a lagging indicator, because a language is likely to gain popularity prior to companies utilizing it and consequently seeking more people with experience in it.
Note: Until recently, we used data from Amazon.com for book statistics, but due to several problems with Amazon's web service, we have switched to data from Powell's Books. They're a large, independent book store based in Portland, Oregon. A visit to the physical store is highly recommended if you're ever in the area.
Since these results are new, we will probably be tweaking them in order to determine which queries work "best". Currently, we're searching for language names in titles in several sections that are relevant (Software Engineering and Computer Programming, to be precise). Expect the results to change some over the coming months.
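To show why matching language names in titles needs some care, here is a small Python sketch that counts titles mentioning a language as a standalone token. It is only an illustration under stated assumptions: it works on a list of title strings already in hand, which is not necessarily how the book data is queried, and the helper names and sample titles are hypothetical. Matching is case-sensitive.

```python
import re


def title_pattern(language):
    """Match the language name only when it stands alone.

    A neighbouring letter, digit, '+' or '#' (or a leading '-') means the
    name is part of a different token, e.g. the "C" in "C++" or "Objective-C".
    """
    return re.compile(r"(?<![\w+#-])" + re.escape(language) + r"(?![\w+#])")


def count_titles(titles, language):
    """Number of titles that mention the language at least once."""
    pattern = title_pattern(language)
    return sum(1 for title in titles if pattern.search(title))


# Sample strings for illustration only.
titles = [
    "The C Programming Language",
    "C++ Primer",
    "C# in Depth",
    "Objective-C Basics",
    "Learning Python",
]

for language in ("C", "C++", "C#", "Python"):
    print(language, count_titles(titles, language))
```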
Books are a lagging indicator, but a good way to eliminate languages that aren't "established" at all. There are hundreds of languages out there, but if there's a book, it's generally something more than a toy or research project. That's not to say that languages without a book aren't "serious", but we do need to draw the line somewhere. In any case, it's interesting to compare what languages people are talking about with the number of available books.
The data from Freshmeat were obtained via their new API: http://help.freshmeat.net/faqs/api-7/data-api-intro.
Freshmeat is a good place to get data on open source projects that have passed the early stages and actually released something and announced it. These results most likely reflect differences between what people are paid to work with and what they choose to work with when they can choose. There were no Freshmeat projects utilizing Cobol, for example, although it seems to fare decently in the other results.
Data from Google Code Search was obtained using the API to search here: http://www.google.com/codesearch
This is similar to Freshmeat in that it favors open source projects with code that is visible on the internet. Due to some issues with the API, I am currently (as of October 2010) using a dump of data handed directly to me by Google.
Data from Del.icio.us was obtained with the Yahoo Search API, because the del.icio.us API
really isn't up to the job yet. We did site: searches like
This is an interesting bit of data for a couple of reasons. First of all, it seems more linear than the others. It ought to reflect what people genuinely find interesting or useful themselves, rather than what they put out there at random, which means they have an incentive to be 'honest'. The order of the languages also seems to change significantly compared to the other data sets.
Ohloh provides a lot of information and statistics about various open source projects. We decided to use the number of people committing code in a particular language, rather than something like lines of code, as languages like C will always have more lines than, say, shell scripts.
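The difference between counting people and counting lines can be sketched in a few lines of Python. The record format here, a list of (author, language) pairs, is a hypothetical stand-in and not Ohloh's actual data format; the names and values are invented for the example.

```python
from collections import defaultdict


def committers_per_language(commits):
    """Count distinct authors per language, ignoring how much each one wrote."""
    people = defaultdict(set)
    for author, language in commits:
        people[language].add(author)
    return {language: len(authors) for language, authors in people.items()}


# Invented records: one (author, language) pair per commit.
commits = [
    ("alice", "C"),
    ("alice", "C"),
    ("bob", "C"),
    ("alice", "Shell"),
]

print(committers_per_language(commits))
```

Because each author is counted once per language, a verbose language does not score higher simply because its programs are longer.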
For fun (well, this whole site is "for fun"... let's just say it's extra data we don't include in the main results), we also gathered some data from sites programmers often visit to talk about programming languages. Because of how this industry functions, what people are experimenting with, what they want to use, and what they're paid to use every day are often different things. For the moment, we use three sites:
Normalized results from the discussion site data sets - these results are not included with the 'normalized results' above. It's interesting to note how languages like Haskell and Erlang are talked about a lot, despite scoring fairly low on the normalized popularity chart above. People are interested in them, but haven't begun to use them on a large scale yet.
It is possible to recalculate the normalized results with different "weights" for the different data sources. For instance, if you want to place more importance on Craigslist data, and less on Yahoo Search, you could set Craigslist to '2', and Yahoo Search to '0.5', and then redraw the chart.
The data were obtained using Yahoo's search API on the Lambda The Ultimate web site, utilizing the
title: query option in an attempt to eliminate false positives due to the
presence of these terms on every page: Erlang, Lisp, Haskell, Tcl, Python.
This site is firmly grounded in academia, and many participants are associated with programming language research, so more "experimental" or innovative languages are commonly discussed and well regarded. What's interesting about the numbers is that there seems to be a cap, with several languages equal to the maximum. Perhaps it's an error with Yahoo's data - we'll keep an eye on it for future versions of this report.
The data were obtained with a bit of screen scraping and reddit's own search feature.
This site has gained in popularity recently, and often has decent discussions of programming languages and their relative merits. The community is generally curious about up-and-coming languages like Haskell and Erlang. Of course, there are also many people working in industry with languages like Java and PHP.
The data were obtained using Yahoo's search API with the Slashdot web site. We use the
title: query option here too, to be fair.
Slashdot reaches a very wide audience, and while it hasn't been quite as popular as more recent arrivals like reddit, it's still a very popular site, and has been around for a while, so is worth including.
IRC is still, for many people, the place to go to get real-time help with various technologies, or simply to discuss them.
With the proper infrastructure in place for gathering and saving data, we intend to update this data on a regular basis, as well as showing historical trends.
Past versions of these statistics used data on prices of keywords in programs like Google's AdSense. We have applied for access to this data from Google, and are waiting on approval.
"C" named languages are something of a problem. Queries for "C" tend to return results
for C# and C++ as well. One way of dealing with this would be to run queries like
C -C# -C++; however, that unfairly penalizes pages that contain discussions
of both C and C++. The D programming language suffers from a similar problem (it tends to
be confused with "3-d programming"), so we tweaked some of the searches to account for
this, and use "D programming language" where appropriate.
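The two approaches mentioned above, excluding similar names and substituting a longer phrase, can be sketched as a small query-term builder in Python. This is only an illustration: the function name and the override table are hypothetical, and it just builds the search string without calling any search API.

```python
# Hypothetical overrides: names that need a longer phrase to be unambiguous.
PHRASE_OVERRIDES = {"D": '"D programming language"'}


def search_term(language, exclude=()):
    """Build a search term, optionally excluding similarly named languages."""
    term = PHRASE_OVERRIDES.get(language, language)
    exclusions = [f"-{name}" for name in exclude]
    return " ".join([term, *exclusions])


print(search_term("C", exclude=("C#", "C++")))
print(search_term("D"))
print(search_term("Python"))
```

Note that the exclusion form carries the drawback described above: a page that discusses both C and C++ is dropped from the C count.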
More sources of data are always welcome.
We're willing to add other languages, but they should register in all of our existing data sources.
Check out the Google Group for announcements / updates, or if you want to make a suggestion about the survey. We also welcome email to suggestions --- at --- langpop.com. When submitting a request, please check and see if your language registers hits with the data source used in this survey, and send me the links. Thanks!