Python For Scientific Engineering
In this article we look at the advantages and disadvantages of using Python, which won the Linux Journal 2009 Readers' Choice Award for Favorite Programming Language, in scientific and engineering applications as an alternative to the traditional C, C++, Fortran and, above all, MATLAB, Octave and other mathematical packages. It was written after a workshop held on 2009-02-05 at the Institute of Cybernetics, NAS Ukraine.
The article was written at the request of developers.org.ua (DOU) and, accordingly, follows the style of other articles on that site.
Note: there were lots of objections after the article was published on DOU. Readers said: "You should have mentioned this and that, not that and this! These facts are well known and obvious, there is no need to teach them to us! As for your expectations about FOSS, they are absurd!" I guess the DOU website just wasn't the best place for the article: first of all, it is aimed at students, schoolchildren and IT teachers who are not skilled in Python at all and/or don't know that it is commonly used for scientific/engineering purposes. Note also that I decided to omit any copy-paste examples from the Python/numpy/scipy cookbooks; I guess people can easily find them by themselves.
Why should anyone (such as these school students) spend time studying Python (and, indeed, use it to write scientific software) when there are C/C++, Fortran and MATLAB/Octave, with a sufficient amount of suitable software? Why has MIT (a leading US technical institute) moved to teaching Python instead of Scheme (and even our leading Kiev National University of Ukraine has started teaching Python and NumPy), while Python still seems quite slow (although you can speed up your Python code with PyPy or numba)?
First of all, low-level languages like C, C++, Fortran and Assembler do not allow RAD (Rapid Application Development). We have to spend a lot of time on compilation and linking, and use a debugger, which is rather hard for students and schoolchildren. In addition, both C and Fortran are "write-only languages" (this is especially true of Perl, where the code is understood only by its author, and usually for no longer than 15 minutes). As a rule, header files (h, hpp etc.) and source files (c, cpp etc.) have to be kept in sync, automatically or manually. Very often such code produces the runtime error "out of array bounds", which is hard for inexperienced users to find or avoid. The site mloss.org, which hosts more than 170 packages from the new scientific and technical field of machine learning, does not have a single package written in Fortran. Also, low-level languages are not cross-platform.
One of the RAD features of Python is the lack of strict typing. Consider, for example, the following function:
    def myFunc(a, b):
        return 2*a + b
Here we do not need to specify the types of the arguments a and b. They may be anything that supports addition and multiplication by 2 without a problem: numbers, a number and a numpy array, strings, lists, matrices of equal dimension, or objects of a class, provided the class defines the operators for addition and multiplication by a number.
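A minimal sketch of what this means in practice, using only the standard library. The class Vec2 is hypothetical and exists only for this illustration; note that for strings and lists "2*a" means repetition rather than arithmetic.

```python
def myFunc(a, b):
    return 2*a + b


class Vec2:
    """Hypothetical 2-D vector, defined only for this illustration."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __rmul__(self, k):
        # Called for `k * vec` when k is a plain number.
        return Vec2(k * self.x, k * self.y)

    def __add__(self, other):
        return Vec2(self.x + other.x, self.y + other.y)

    def __eq__(self, other):
        return (self.x, self.y) == (other.x, other.y)


print(myFunc(3, 4))            # 10
print(myFunc("ab", "c"))       # ababc   (string repetition, then concatenation)
print(myFunc([1], [2, 3]))     # [1, 1, 2, 3]   (list repetition, then concatenation)
v = myFunc(Vec2(1, 2), Vec2(10, 20))
print(v.x, v.y)                # 12 24
```

The same function body serves all four argument types; nothing about myFunc had to change.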
For some more Python examples visit our Python introduction.
Even schoolchildren can learn Python in a couple of weeks, which is not true of C/C++, Fortran, OCaml or Erlang. According to experts, developing applications in Python is about 2 times faster than in Fortran; in addition, the resulting programs contain far fewer lines of code and are more readable, which makes the code easier to change. And this is a matter not only of programmers' salaries, but also of rent, other staff salaries, staying ahead of competitors and delivering the required work on time. My experience of working for commercial firms shows that the following dialogue very often arises between a software development organization and a potential client: "Your software performs the task in 30 seconds. For $5000 we can write software capable of doing it in 3 seconds." - "But we don't need it sped up to 3 seconds: we run it once a week, and in a month we will buy a new processor for $500 and it will take just 15 seconds."
OK, let's now consider the high-level languages. MATLAB, Maple, Mathcad and Mathematica are quite expensive. Of course, right now one can easily get an unlicensed version, but:
- There is no guarantee that this situation will last forever
- On top of the considerable purchase price, one has to pay ~10% per year for library updates
- Abroad (China, Brazil, etc., and especially Europe, where unlicensed software is policed more strictly), governments have realized the downside of depending on commercial software, and a powerful movement has been organized to make educational, municipal and other public institutions migrate to free software. Over time this will improve the quality of free software, so the migration of programmers, customers and users will accelerate more and more. Therefore, those who have relied on these commercial packages may regret it in the future, because rewriting thousands of lines of code (scientific and technical code at that), particularly if the organization has no professionals qualified in both languages, is not an easy task either technically or financially.
Note also that even the undoubted leader, MATLAB (vs Mathcad, Maple, Mathematica), has several drawbacks:
- The need to end each line of code with a semicolon
- Absence of a TRUE compiler to machine code (I don't regard those ctf-files with the ~270 MB MCR as real compilation)
- Inconvenient interfacing with other programming languages (C, Fortran) via mex-functions
- Inconvenient handling of strings and a number of other types, including OOP classes
- Arguments are passed by copy (using "global" is obsolete, unstable and not recommended; using "evalin" or "assignin" makes the code unclear, especially for other programmers)
- A very high price (today, for a company without any rebates, it's $3000 for MATLAB plus the prices of toolboxes, for example $1562 for the Optimization toolbox). Of course, like almost any other self-respecting commercial software product, MATLAB spins a web to catch future users; MathWorks usually calls it "student discounts".
As was very precisely noted in one message on the MATLAB mailing list: Octave, SciLab etc. are attempts at implementing the MATLAB language, while MATLAB is an attempt at implementing anything except matrices.
With regard to Octave and especially SciLab, the license problem is also worth mentioning. Octave is under the GPL, which contains copyleft (it prohibits licensing your product under restrictions more severe than those of the copyleft library it uses); the SciLab license also has copyleft. This considerably limits their distribution and development, because a number of organizations that produce commercial software do not use them, preferring software without copyleft (i.e. under licenses such as BSD, MIT, Apache).
According to my observations, the general trend in software development (including research software) is this: free software gradually pushes out commercial software, while free software without copyleft gradually replaces free software with copyleft (because software organizations support it financially more willingly). A typical example is the well-known libraries BLAS and LAPACK. For a number of other, more complex kinds of scientific software, such as numerical optimization, proprietary software still holds its position, but I think this is just a matter of time; BTW there are already some free solvers, such as DSDP, IPOPT, PSwarm (and many more), that are comparable to commercial solvers priced at thousands of dollars. It should be noted that the main financial basis for the development of free scientific software is research grants from foundations, universities and several organizations (IBM, Sun Microsystems, etc.).
As for using other high-level languages for scientific and technical purposes, they have the following disadvantages:
- OCaml - a license with copyleft, hard to learn
- Ruby - low speed (2-4 times slower than Python); its popularity is mainly due to RoR (a library for developing web applications); problems with multiple inheritance; uncertainty about which programming constructs give the fastest code execution
- Java - the language is lower-level than Python, Ruby or MATLAB, so developing applications takes longer
- Groovy, Cobra (not to be confused with CORBA) and other Python clones (or languages similar to Python, like Lua) - only a small amount of software exists for them. Who will translate all those megatons of code only because a language (probably) uses brackets, commas and colons somewhat differently? For Python vs MATLAB/Fortran the reasons are obvious: license issues and RAD features. It's not so difficult to take an existing language and correct some of its drawbacks but, as Java programmers are wont to say, nowadays nobody needs a language without batteries. BTW, one of Python's slogans is "Batteries included", that is, many software modules ship with the language
- F# - the mere fact that it comes from Microsoft deters many users (and hence reduces the audience and the spread of the language). Some years ago I read the F# FAQ, where the author of the language unconvincingly tried to persuade readers that there is no need to expect licensing problems from Microsoft. In addition, to my mind, F# has not completely got rid of OCaml's drawbacks
- R - the license (GPL), a narrow main focus (statistics), and its syntax isn't excellent
- As for Pascal, I believe it is studied in schools and some universities of Ukraine by inertia; it has no special advantages over its competitors.
In addition, almost all the languages listed here (except R), as well as PHP and Tcl (where you also have to type the tiresome "$" sign, and in the case of PHP the ";" as well), have only a small amount of scientific software (compared to Python, including software with a Python interface).
Main Python drawbacks
Like any other programming language, Python of course has some disadvantages:
- Python, like C/C++, was not originally intended for scientific and technical tasks, and its code execution speed is not great (which is partially offset by numpy and by the ease of connecting code written in other languages, see below)
- Unlike MATLAB, Octave and a number of other packages, Python has no standard library for sparse matrices: some use scipy.sparse, some PySparse, some (like CVXOPT) use their own library and/or BLAS, and some just use 3 columns (two for the indices and one for the value). SciPy developers refused the request of the scipy.sparse author to include it in NumPy; I think this was a big mistake, as it would probably be a unified standard by now. Still, I hope that in future numpy versions the difference in API for handling sparse and dense arrays will be removed.
- Python is currently going through a painful migration from version 2.5 to 2.6 and 3.0, which introduce a lot of changes. There are programs that can do the conversion automatically, but not for numpy and scipy, a significant part of whose code is C and Fortran. For more information, see here. This seems to be linked to the recent dip in Python's popularity in the TIOBE index
- Delimiting code blocks by indentation. Some users like it (so do I); others, on the contrary, hate it
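To illustrate the "3 columns" storage mentioned in the sparse-matrix item above, here is a minimal pure-Python sketch of a matrix-vector product over (row, column, value) triplets. The function name coo_matvec and the sample matrix are made up for this illustration; this is not the API of scipy.sparse, PySparse or CVXOPT.

```python
def coo_matvec(rows, cols, vals, x, n_rows):
    """Multiply a sparse matrix stored as (row, col, value) triplets by a dense vector x."""
    y = [0.0] * n_rows
    for i, j, v in zip(rows, cols, vals):
        # Only the stored (non-zero) entries contribute to the product.
        y[i] += v * x[j]
    return y


# The 3x3 matrix [[2, 0, 0], [0, 0, 5], [1, 0, 3]] stored as three columns:
rows = [0, 1, 2, 2]
cols = [0, 2, 0, 2]
vals = [2.0, 5.0, 1.0, 3.0]

print(coo_matvec(rows, cols, vals, [1.0, 2.0, 3.0], 3))  # [2.0, 15.0, 10.0]
```

Every package that invents its own variant of this layout needs its own conversion code, which is the cost of having no unified standard.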
Main Python language implementations
- CPython - the Python implementation in C by Python's author Guido van Rossum (of course, many more programmers maintain it now). AFAIK it accounts for more than 90% of Python usage
- Jython - a Python implementation in Java. Currently it's nothing special: it's no faster than CPython and it's not compatible with many libraries written for CPython
- IronPython - a Python implementation for Microsoft .NET. It has its contingent of users, but Microsoft is primarily interested in promoting C#, which is aimed at similar purposes and hence is a Python competitor. You may also be interested in the Ironclad project, whose aim is to make CPython C extensions usable from IronPython (so it allows using numpy)
- PyPy (Python in Python) - an experimental implementation by a team funded by an FP7 grant. It is unlikely to have a future (because of its incompatibility with CPython libraries), but some of its ideas (dynamic translation of some pieces of code to C and then compiling them "on the fly") may be used in CPython in the future
- PyMite - implementation of Python for a number of microprocessors (especially Atmel)
Main Python scientific libraries
First of all, we should pay attention to NumPy (numeric python) and SciPy (scientific python). They (and their mailing lists - numpy-user, scipy-user, scipy-dev, see here) are the meeting points for all users of Python for scientific and technical purposes (however, the SAGE Google groups should be mentioned as well).
- NumPy - a low-level library, written mostly in C and Fortran (mostly matrix operations), based on the code of BLAS + ATLAS and LAPACK. Additional speedup can be obtained by linking NumPy with Intel MKL / AMD ACML (currently this requires some effort in modifying makefiles; I guess the operation will be simplified in the future). See also NumPy for MATLAB users. To those who have problems with English, I can recommend pasting the page address into http://translate.google.com or a similar service.
- SciPy - numerical integration, splines, optimization, solving systems of differential equations, etc. If your requirements go beyond scipy's capabilities, take a look at specialized packages that sometimes provide more functionality and convenience (e.g. OpenOpt, CVXOPT, NLPy, Pyomo vs scipy.optimize). Some examples of numpy/scipy usage can be found here
- matplotlib - the most widespread equivalent of the MATLAB plotting tools
- For parallel calculations, python-multiprocessing should be considered first of all, see examples here or here. It is included in CPython v >= 2.6; for 2.4 and 2.5 you should install it yourself, via easy_install or some other way.
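As a rough illustration of the multiprocessing item above, the sketch below splits a sum of squares across worker processes with the standard multiprocessing.Pool. The helper names partial_sum and split_range are hypothetical, and the syntax assumes a recent Python 3 rather than the 2.6 discussed in the text.

```python
from multiprocessing import Pool


def partial_sum(bounds):
    """Sum of squares over the half-open range [start, stop)."""
    start, stop = bounds
    return sum(i * i for i in range(start, stop))


def split_range(n, parts):
    """Split range(n), n > 0, into at most `parts` contiguous (start, stop) chunks."""
    step = -(-n // parts)  # ceiling division
    return [(s, min(s + step, n)) for s in range(0, n, step)]


if __name__ == "__main__":
    chunks = split_range(1_000_000, 4)
    # Each chunk is processed in a separate worker process.
    with Pool(processes=4) as pool:
        total = sum(pool.map(partial_sum, chunks))
    print(total)
```

The `if __name__ == "__main__"` guard matters on platforms that start workers by re-importing the script; without it the pool would be created again in every worker.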
You should also take a look at the following lists of Python scientific software:
- List of Topical Software (classified by categories: optimization, visualization etc.)
- PyPI - list of scientific/engineering software
- And the poll on the use of scientific packages
Main scientific Python distributions
You can manually install Python, one of its IDEs and all the libraries required for your purposes (this is especially popular in the Linux community, via apt-get and PyPI); alternatively, you can use one of the scientific-oriented distributions:
- Sage - "free viable alternative to MATLAB, Maple, Mathcad, Mathematica". A Mozilla-based IDE (or another browser) is recommended, with some features similar to MATLAB cells, but I would prefer a more convenient IDE. Another pro is the convenient interfaces to MATLAB, Octave, R, etc. Sage is quite popular in the educational/academic community, because it needs to be installed only on a server and it implements some features that make scientific/engineering software development even faster than in pure Python; but those changes make your code less portable - you won't be able to run it anywhere except in Sage itself (although you may write pure Python there). Another drawback: Sage is very heavyweight, ~1 GB on the hard drive plus VMware Player if running under Windows, and it takes several hours to build, and again for every update (~quarterly), on an ordinary computer (~AMD 3800).
- PythonXY - a distribution based on Eclipse (a widespread free cross-platform IDE that has plugins for Python). Additional modules can be installed as exe-files (PythonXY is also available for Debian-based Linux OSes). Drawbacks: Eclipse is quite heavyweight - several hundred MB of hard drive and RAM, and a long start time on weak hardware.
- (Least recommended) EPD (Enthought Python Distribution) - contains numpy, scipy, matplotlib and much more (however, IIRC most of the other included software has suspicious non-OSI-approved licenses). Disadvantages: it is free only for non-commercial use and only for 32-bit platforms; the only supported OSes are the commercial Windows and RHEL; the functionality of the bundled IDEs (IDLE and SciTE) can compete at most with Microsoft Notepad. The single known pro: it is easy to install (several mouse clicks).
Using at least one of these distributions is recommended to avoid potential problems when installing numpy, scipy, matplotlib, etc. (this is especially important if you cannot use apt-get, yum or other tools that automatically download and install Linux software, and if some packages are written in C/Fortran). After that, you can install and use another Python IDE (if those from the distribution don't satisfy you).
Recommended development environments for Python
First of all, I would recommend paying attention to Eric, NetBeans, Eclipse and SPE. AFAIK only the Eric IDE has a Python command shell similar to the MATLAB command shell, so I would recommend it first (the commercial Wing IDE has 2 windows at once, including one for the debugger, but in my experience that is inconvenient). It should be noted that Eric is developed by a single person (Detlev Offenbach), and he keeps Eric's functionality and (German) quality at a level comparable to NetBeans and Eclipse (which are written by teams of developers and get financial support from a number of large corporations), as well as to the commercial Komodo and Wing IDE. In addition, it consumes only 41 MB of RAM (Eclipse - 137, NetBeans 6.1/6.5 - 229/412). Start times in seconds, in the same order: 4, 21 and 16/25. The figures are for an AMD Athlon 3800+ X2 running Kubuntu Linux. Eric has some localization plugins (i.e. languages other than English).
Eric's main disadvantage is trouble with installation, especially with its dependencies (it is best to install all the dependencies via apt-get, yum or other Linux update channels, and then put the latest version of Eric on top; when asked about the Qt data directory, simply pressing Enter usually helps. If Eric itself was installed together with the dependencies, check which version you are running (Help -> About Eric) and, if it is the older one, change the path to the new one). It is recommended to install a version no older than 4.3.0 (2009-02-08), because a couple of shortcomings have finally been corrected there (at the request of the author of this article): file tabs can be scrolled with the mouse, and the debugger window immediately switches to locals. Another shortcoming, the inability to execute commands in the stack frame of the current function, Detlev has promised to fix in 4.4.0, along with adding a visual system of errors and warnings similar to those in MATLAB and NetBeans. Installing plugins for Eric is not very convenient: Plugins -> Plugin Repository -> update; download; install; Plugins -> Plugin Infos -> activate, autoactivate. First and foremost, installing pylint is recommended. Plugin installation in Eric is less convenient than in NetBeans but more convenient than in Eclipse, where plug-ins are for some reason installed via the Help menu.
After installing Eric, it is recommended to make some initial adjustments right away: turn off highlighting of the current word (if it has not already been removed in the new version), change the red color used to highlight search results to a better one, set the default file type for save/load to "py" (Python), delete or move the project-viewer window so that it does not pop up when you switch debugging on/off, remove unnecessary buttons from the toolbar (for example, plugins - they are available in the menu and are not used very often), and make the "save files" button visible. However, Eric's hot keys are rather convenient, so using them is handier than the toolbar.
A screenshot of Eric configured this way is here.
All 3 IDEs have conditional breakpoints, integration with VC systems (CVS, SVN, etc.) and almost the full range of standard services. A full list of Python IDEs is here.
Connecting other languages to Python
First of all, note that you could use middleware like CORBA (Ada, C, C++, Lisp, Ruby, Smalltalk, Java, COBOL, PL/I, Perl, Python, Visual Basic, Erlang, Tcl) or Ice (C++, Java, .NET languages (such as C# or Visual Basic), Objective-C, Python, PHP, and Ruby).
To glue Python code to other languages you should also consider:
- NumPy_for_Matlab_Users guide
- Migration from Matlab to Python-based systems using NumPy and SciPy (doc)
- mlabwrap - a high-level Python to MATLAB bridge
- pythoncall - runs a Python interpreter inside MATLAB, and allows transferring data (matrices etc.) between the Python and Matlab workspaces
- libermate - MATLAB to Python code converter
- ompc - another MATLAB to Python code converter
- pym - another MATLAB to Python code converter
- oct2py - Octave <-> Python tool
- NumPy and SciPy for IronPython / .NET (my experiments from Sep 8 show it is still premature)
- Ironclad - tool to run CPython modules in .NET IronPython
- python-excel.org software list
- PyCel - Compiling Excel spreadsheets to Python (OpenOpt usage is available)
- jpype - an effort to allow python programs full access to java class libraries
- jepp - embeds CPython in Java
- jpe - Java-Python extension
- CFFI (for C)
- CPPYY (for C++)
- shed-skin - Python to C++ compiler
- cython (Pyrex descendant) - most recommended tool for connecting C/C++ code to Python, included into NumPy
- RPy - Python interface to the R Programming Language
- R-numpy - doc entry about using NumPy for R (and S+) users
- f2py - Fortran to Python interface generator, included into NumPy
- fwrap - a new, recently created tool, wraps Fortran code in C, Cython and Python
- IntegratingPythonWithOtherLanguages page from python.org
- numba - speeds up Python code by compiling some functions with LLVM
NASA and some other organizations typically practice the following approach:
- Anything that can be is written in Python + numpy (scipy and other libraries are used too, of course). Python can also be used just for quick prototyping: the slowest parts of the code (bottlenecks) are then found with a profiler or some other way and translated into another language
- Code that requires greater speed is written in Pyrex (now in the more modern Cython)
- Code that requires even greater speed is written in C/C++ or Fortran (and connected via Cython, f2py)
- Code that requires maximum speed is written in Assembler
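The "prototype first, then find the bottlenecks with a profiler" step from the list above can be tried with the standard cProfile and pstats modules. This is only a sketch: the functions slow_norm and pipeline are hypothetical stand-ins for real prototype code, not anything taken from NASA's practice.

```python
import cProfile
import io
import pstats


def slow_norm(values):
    """Deliberately naive pure-Python loop: a candidate for rewriting in Cython or C."""
    total = 0.0
    for v in values:
        total += v * v
    return total ** 0.5


def pipeline(n):
    """Stand-in for a prototype: builds the data, then calls the slow routine."""
    data = [float(i) for i in range(n)]
    return slow_norm(data)


profiler = cProfile.Profile()
profiler.enable()
result = pipeline(200_000)
profiler.disable()

# Collect the report into a string, sorted by cumulative time, top 5 entries only.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

The functions that dominate the cumulative-time column are the ones worth moving to Cython, C/C++ or Fortran; the rest can stay in Python.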
As noted by some mailing list subscribers (and by my own observations: 3 years on the MATLAB mailing list and 2 years on the numpy/scipy mailing lists), people are wont to answer at greater length in free software forums than in commercial software ones. Indeed, why should someone help MathWorks earn money? And why would MathWorks provide free support to users when it prefers to earn money from commercial support?
In our countries everything arrives with a delay (compared with the West), and apparently this is true for the use of Python for scientific and technical purposes. The lack of strict control over software licensing slows down the adoption of free software. BTW I sent a letter to our Ministry of Science and Education with similar arguments for moving the educational process from MATLAB to Python, but of course there was no answer - as expected from a ministry whose website mentions a staff member who distributes MATLAB and receives his percentage.
Python consulting in Ukrainian can be found on the Ukrainian site of Python programmers, http://python.su. Unfortunately, the owners of the site refused to create a subforum as a meeting point for Ukrainian users of Python for scientific and technical purposes. They justified this by the small number of such users.
If you have no problems with English, you'd better ask questions about Python here: the response will be faster and will come from more skilled people. The Python-announce Google group is also worth noting; it publishes major events (mostly software releases). Releases of scientific and technical libraries for Python (or at least with a Python API) are usually announced on the scipy-user mailing list.
One can't deny that there is a probability (some consider it high, some think it's low, but it's certainly non-zero) that in the future Python (as well as C, Fortran, MATLAB) will be supplanted by another language. But for now its use for scientific and technical purposes is on the rise (especially in the West), as shown at least by the sharp increase in numpy-user and scipy-user mailing list traffic over the last 2 years. In any case, that language will certainly be high-level (like Python and Ruby) and use pass-by-reference, so Python will be one of the best languages to migrate from (in my experience of migrating code from MATLAB to Python, the two largest problems were pass-by-copy and array indexing starting from one).
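The two migration problems named above can be seen in a few lines of plain Python. This is a minimal sketch; the function names scale_in_place and scaled_copy are hypothetical. Strictly speaking, Python passes references to objects, so a function can modify a mutable argument that the caller still holds.

```python
def scale_in_place(values, factor):
    """Modifies the caller's list: the function receives a reference, not a copy."""
    for i in range(len(values)):
        values[i] *= factor


def scaled_copy(values, factor):
    """MATLAB-like behaviour: builds a new list and leaves the caller's data untouched."""
    return [v * factor for v in values]


data = [1.0, 2.0, 3.0]
scale_in_place(data, 10)
print(data)                  # [10.0, 20.0, 30.0]  (the original list has changed)

doubled = scaled_copy(data, 2)
print(doubled)               # [20.0, 40.0, 60.0]
print(data)                  # [10.0, 20.0, 30.0]  (unchanged this time)

# Indexing starts from zero: data[0] is what MATLAB would call data(1).
print(data[0], data[len(data) - 1])   # 10.0 30.0
```

Code ported from MATLAB that relies on a function silently working on a private copy of its arguments has to make that copy explicitly, as scaled_copy does.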
- FuncDesigner - a tool that essentially enhances RAD abilities of Python language for developing scientific software
- Parallel calculations in Python
- Some opinions about Python by mature industrial companies
- Python Advocacy In Scientific Computation
- Python tools for parallel calculations
- CorePy Assembly speedup for NumPy - A GSoC student will try to accelerate NumPy's ufuncs with SSE and multi-core execution
- pylab-works - Simulink equivalent for Python (as of 2009-04-04 it is still premature)
- PEPs (Python Enhancement Proposals) - a list of proposals to change the language. Some are accepted, some rejected, some under discussion, including possible alternatives
- google trends - one more argument that MATLAB and Fortran popularity is going down, which is in accordance with my 2 years of observing the TIOBE index (for Python it's better to search for "Python language" to exclude searches for the nasty reptiles)
- shed-skin - python to C++ compiler
|Made by Dmitrey|