Tuesday, August 31, 2010

BLAS, LAPACK, and Parrot-Linear-Algebra

Last night I got back to work on Parrot-Linear-Algebra, my matrix package for Parrot. I'm really starting to get happy with the functionality it provides and am looking to start moving the project to the next level of functionality and usability.

I hadn't touched it in a while, for a variety of reasons. Austing Hastings has been very busy with other projects and hasn't cut an "official" release of Kakapo that works with the last supported Parrot release, 2.3. I've applied some hotfixes that make Kakapo build and pass tests on 2.3, but that's not quite good enough. When I make a PLA release I don't want "clone the Kakapo repository from Gitorious" in the instructions, because then the instructions get out of date when Kakapo is updated to be incompatible. What I want is to say "Download Kakapo version X from this URL", which will be invariant.

Last night I added some prototype new features to the NumMatrix2D PMC, the basic numeric matrix type. I'm going to mirror the majority of those changes to the two other general-purpose types; ComplexMatrix2D and PMCMatrix2D. I added methods to do basic elementary row operations (swap two rows, add a multiple of one row to another, and multiply the contents of a row by a non-zero constant), and used those operations to put together an example program to calculate both the Row Echelon and Reduced Row Echelon forms of a matrix. Those methods, combined with block manipulation methods I added previously and the new GEMM method I added last night as well, create a basis for us to create a comprehensive library of linear algebra routines.

But that brings me to my next concern: How to implement the remainder of the fundamental algorithms a linear algebra pack is going to want to provide? Obviously I would love to wrap the LAPACK library up and call the routines it provides directly. LAPACK provides a huge number of routines for doing things like singular value decomposition, QR, LU, and other types of decomposition, solving the eigen probem, solving systems of equations in two variables, etc. In fact, LAPACK provides almost all of the functionality I would want PLA to have in the next few months.

The problem, however, is that LAPACK is not nearly as accessible from C code as I would like. In fact, there is no "standard" for C-bindings to the library, and several lame attempts are available that tend to be incompatible with each other. The reference version, and only reliably-available version, of LAPACK is written in FORTRAN. The standard CLAPACK library is just a machine-lead translation of the FORTRAN sourcecode, with a few points in the code needing to be manually tweaked after conversion. It has a few problems, including the fact that every single function parameter (even basic ints and floats) must be passed by reference. The ATLAS library, which PLA currently prefers to provide BLAS bindings, provides some LAPACK C bindings of it's own, but only a very very small subset of all LAPACK functions are provided, and the ones it does have hardly support all the operations PLA is going to want to provide.

CLAPACK, being more or less the standard could be made to work, except that it doesn't come as any sort of user-friendly package. There are no CLAPACK packages for Debian (Ubuntu) or RedHat that I have seen, and that raises the barrier to entry significantly, something that I want to avoid.

I could use the LAPACK library directly, and twist my code to match the FORTRAN calling conventions and data alignments. That's not unthinkable, though it would require some more work on the part of PLA developers than any other solution.

I could skip LAPACK entirely, and instead rely on something like GSL with proper C bindings built-in. GSL does provide several important matrix operations and decompositions, though I would need to do more research into the capabilities of that library.. What I don't want is to lose focus and have this project grow to try and become a general wrapper for all of GSL. I want PLA to stay focused on Linear Algebra only. We could maybe create sister projects to encapsulate more of the GSL functionality and use PLA under the hood to implement the basic data structures, of course.

Maybe we will get lucky, and the new NCI system will have support for calling FORTRAN routines from shared libraries. I don't think we can just anticipate this kind of functionality, at least not during the GSoC program.

A "nuclear" option, perhaps, would be to not rely on any external library for these things and instead brew up all the basics myself. I'm not against such work per se, but it would be a huge investment in time and the case cannot be made that it's the best use of my limited development time. I did put together a Gauss-Jordan elimination routine last night, it wouldn't be too too much effort to put together a generalized QR decomposition algorithm and a singular-value decomposition algorithm, followed by routines to calculate eigenvalues, eigenvectors, and matrix inverses from those things. If PLA had a larger team of active developers who wanted to participate in this kind of work it would be an easier decision to make, but if it's primarily me and a handful of occasional-contributors, this really isn't a doable task.

My plan for PLA in the near future is this: After the 2.6 release I want to push for a stable release of Kakapo, and then using that I want to cut a release of PLA. From that point forward, PLA will target the 2.6 version of Parrot at least until 2.9 and maybe later. The first release of PLA is going to provide three basic matrix types: A 2D matrix of floats, a 2D matrix of complex numbers, and a 2D matrix of PMCs. These three matrix types will have a large base of common functionality and each type will have some additional functionality too, as required by that type. Everything provided will be well-tested (including error cases, which I haven't exercised nearly enough so far) and some example programs will be provided in both PIR and NQP.

There are a few projects I am envisioning to start in the future that will rely on PLA, so I really am hoping to create a nice, stable, functional release within the next few months. I'll post more information about any other projects as they arise.

elementary linear algebra

No comments:

Post a Comment