The Top 50 Code Bases
Anthony Cangiano had a post I liked that estimated programming language popularity by the Amazon sales rank of each language’s bestselling book. OK, everybody who thinks of that as unscientific rather than fun, leave now.
Ranking by Amazon Bestseller Rank is biased toward new, hot languages and toward languages popular among people just learning to program. This is fine; I appreciated that and took the post on its own terms. You can also get fancy and make up a popularity index based on the number of programmers, courses, and vendors, like these guys. But recently, I took a high-level look at my own skill set. Inspired by Anthony, I wanted to come up with my own hard-knocks, generalist view of the actual code base out there, in a fun way: what you might run into and have to maintain. So, this is dedicated to the apologetic project manager who introduces your new assignment with, “Remember the aerodynamics library Elmer did in Ada before he retired…”
Presenting: The Top 50 Code Bases, ranked by the number of Google hits for the phrase “written in language”, with each language’s name substituted in. The term language here is ridiculously broad, encompassing any programming, scripting, query, or markup language, and even meta-applications; anything you’re likely to be faced with updating something in, and thus might have to understand. (This is my rationale for including markup languages.) If you might have to fix it, I counted it.
| Rank | Language | Hits (thousands) | Cangiano Rank |
| --- | --- | --- | --- |
| 1 | C++ | 1,290 | 6 |
| 2 | Perl | 1,120 | 11 |
| 3 | Python | 748 | 9 |
| 4 | PHP | 667 | 10 |
| 5 | C | 523 | 7 |
| 6 | C#¹ | 381 | 5 |
| 7 | ASP | 292 | |
| 8 | HTML | 282 | |
| 9 | Assembly | 277 | |
| 10 | Visual Basic | 225 | 8 |
| 11 | XML | 188 | |
| 12 | JavaScript | 177 | 1 |
| 13 | Flash or ActionScript² | 162 | |
| 14 | BASIC | 160 | |
| 15 | SQL | 141 | 4 |
| 16 | Java | 137 | 2 |
| 17 | Delphi | 117 | 19 |
| 18 | FORTRAN | 112 | |
| 19 | Excel | 107 | |
| 20 | SQL Server or Microsoft SQL Server² | 101 | |
| 21 | .NET | 97.7 | |
| 22 | Ruby | 94.4 | 3 |
| 23 | Access | 82.9 | |
| 24 | Scheme | 76 | 17 |
| 25 | Visual C++ | 70.7 | |
| 26 | MySQL | 67.3 | |
| 27 | D¹ | 65.2 | |
| 28 | Tcl | 58.0 | |
| 29 | Pascal | 57.1 | |
| 30 | Oracle or PL/SQL | 55.3 | |
| 31 | COBOL | 49.2 | |
| 32 | AJAX | 47.2 | |
| 33 | LISP³ | 46.2 | 20 |
| 34 | MATLAB | 44.0 | |
| 35 | Ada | 36.5 | |
| 36 | Prolog | 34.7 | |
| 37 | VBScript | 33.9 | |
| 38 | Haskell | 33.6 | 18 |
| 39 | bash | 33.3 | |
| 40 | Smalltalk | 32.7 | 22 |
| 41 | CSS | 31.5 | |
| 42 | PostScript | 28.2 | |
| 43 | sh or Bourne shell¹,² | 22.1 | |
| 44 | Turbo PASCAL | 21.4 | |
| 45 | Common LISP | 20.2 | |
| 46 | ColdFusion or Cold Fusion | 20.1 | |
| 47 | Erlang | 19.6 | 12 |
| 48 | Objective-C | 18.9 | 13 |
| 49 | Lua | 18.4 | 16 |
| 50 | UML | 17.6 | |
¹ These were corrected for false positives such as “written in C# minor”.
² Both search terms’ totals were added together.
³ Sorry, Paul.
For comparison, I have included Anthony’s rankings, so you can see how his hot languages compare to the spectres who will haunt your nightmares in years to come. I decided to rank dialects separately from their parent languages: you might be told only that a program is in LISP and then have to work out for yourself that Common LISP library calls are present, which is a harder problem than being told up front that it is in Common LISP.
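If you want to redo the tally yourself, here is a minimal sketch of the bookkeeping described in the footnotes: add up the hit counts for every search term that belongs to one entry, subtract known false positives, and sort. It assumes the counts were already collected by hand. The function name `rank_code_bases`, the made-up language names, and all of the numbers are hypothetical placeholders, not the data behind the table.

```python
from typing import Dict, List, Tuple


def rank_code_bases(
    hits: Dict[str, int],
    search_terms: Dict[str, List[str]],
    false_positives: Dict[str, int],
) -> List[Tuple[int, str, int]]:
    """Return (rank, entry, total) rows sorted by descending total."""
    totals = {}
    for entry, terms in search_terms.items():
        # Add the counts for every search term of this entry
        # (the "both totals were added together" footnote).
        total = sum(hits.get(term, 0) for term in terms)
        # Subtract hits known to be noise
        # (the "written in C# minor" kind of false positive).
        total -= false_positives.get(entry, 0)
        totals[entry] = total
    ordered = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [(rank, entry, total)
            for rank, (entry, total) in enumerate(ordered, start=1)]


# Hypothetical hit counts (in thousands) for invented languages.
example_hits = {
    "written in Alpha": 900,
    "written in Beta": 300,
    "written in Beta script": 700,
    "written in Gamma": 950,
}
# Each table entry maps to one or more search phrases.
example_terms = {
    "Alpha": ["written in Alpha"],
    "Beta": ["written in Beta", "written in Beta script"],
    "Gamma": ["written in Gamma"],
}
# Hypothetical correction for phrases that are not about code.
example_false_positives = {"Gamma": 100}

example_rows = rank_code_bases(example_hits, example_terms, example_false_positives)
for rank, entry, total in example_rows:
    print(f"| {rank} | {entry} | {total} |")
```

Keeping the search phrases separate from the table entries is what lets an entry like “Flash or ActionScript” be counted once while still drawing on two searches.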
I was surprised at what didn’t make the cut. If you’re wondering if I forgot:
- awk
- sed
- XHTML
- SOAP
- XSLT
- ksh, csh, and their variant names
- OpenGL
- Mathematica
- Forth
- ALGOL, PL/I and those other 60s hits
and about three dozen other also-rans, um, no.
Don’t take this all too seriously: when I checked back a week later, some of the totals had changed in implausible ways. Google’s count estimation is apparently very rough. These numbers were retrieved on December 14, 2007, in all but a few instances.