Thursday, March 27, 2008

Schemelab Concept

My PLT Scheme project for this year is to create a Schemelab collection that provides better analysis capabilities in PLT Scheme. It will be something of a mini-Matlab, with capabilities similar to those NumPy, SciPy, and matplotlib provide in Python. I plan to create a new numeric collection that provides homogeneous, n-dimensional arrays (and, as a subset, matrices) as the primary underlying representation. Next, the science collection will be updated to use the new numeric collection. Finally, a new plot collection will be developed to provide better visualization functionality. [After these are done, I'll also update the simulation and inference collections to use the new Schemelab packages.]

I've already started work on the numeric collection. Its basic representational element is a homogeneous, n-dimensional array (ndarray). An ndarray's element type and shape are specified when it is created. The type may be any of the SRFI 4 homogeneous vector element types (u8, u16, u32, u64, s8, s16, s32, s64, f32, or f64), a complex type (c64 or c128), or object, which holds arbitrary Scheme objects. The shape is a list of natural numbers: its length is the number of dimensions of the ndarray, and each element is the size of the corresponding dimension, so a shape of (3 4) describes a 3 x 4 matrix. Element references will support slicing in any dimension, for both access (array-ref) and mutation (array-set!). Slicing creates a new view of the referenced array rather than copying part of it, so changes made through a slice are visible in the original array. I will start posting entries on different aspects of the numeric collection in the next few days.
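To make the view-versus-copy distinction concrete, here is a minimal sketch of the common strided representation, written in Python for brevity rather than in the Scheme of the actual collection. Everything in it is hypothetical and only illustrative: the NDArray class, its ref, assign, and view methods (loosely standing in for array-ref, array-set!, and slicing), and the mapping from SRFI 4 type names to Python array typecodes are assumptions, not the numeric collection's API. The assumption is that an ndarray is a flat backing store plus a shape, per-dimension strides, and an offset, so a slice only needs a new shape, strides, and offset over the same store.

```python
from array import array

# Hypothetical mapping from SRFI 4 element-type names to Python array
# typecodes (assumes a typical platform where 'I' is 32 bits; 'Q'/'q'
# are at least 64 bits).
TYPECODES = {
    "u8": "B", "u16": "H", "u32": "I", "u64": "Q",
    "s8": "b", "s16": "h", "s32": "i", "s64": "q",
    "f32": "f", "f64": "d",
}


class NDArray:
    """Illustrative strided n-dimensional array (not the collection's API)."""

    def __init__(self, etype, shape, data=None, strides=None, offset=0):
        self.etype = etype
        self.shape = list(shape)
        if data is None:
            size = 1
            for n in self.shape:
                size *= n
            if etype in TYPECODES:
                data = array(TYPECODES[etype], [0] * size)
            else:
                # c64, c128, and object element types use a plain list here
                data = [0] * size
        self.data = data
        if strides is None:
            # Row-major strides: the last index varies fastest.
            strides, step = [], 1
            for n in reversed(self.shape):
                strides.append(step)
                step *= n
            strides.reverse()
        self.strides = list(strides)
        self.offset = offset

    def _position(self, indices):
        # Map an index tuple to a position in the shared flat store.
        if len(indices) != len(self.shape):
            raise IndexError("expected one index per dimension")
        pos = self.offset
        for i, n, s in zip(indices, self.shape, self.strides):
            if not 0 <= i < n:
                raise IndexError("index out of range")
            pos += i * s
        return pos

    def ref(self, *indices):
        return self.data[self._position(indices)]

    def assign(self, value, *indices):
        self.data[self._position(indices)] = value

    def view(self, *specs):
        """Slice without copying: each spec is an int (fix that index and
        drop the dimension) or a slice object (keep the dimension)."""
        if len(specs) != len(self.shape):
            raise IndexError("expected one spec per dimension")
        offset, shape, strides = self.offset, [], []
        for spec, n, s in zip(specs, self.shape, self.strides):
            if isinstance(spec, slice):
                start, stop, step = spec.indices(n)
                offset += start * s
                shape.append(len(range(start, stop, step)))
                strides.append(step * s)
            else:
                if not 0 <= spec < n:
                    raise IndexError("index out of range")
                offset += spec * s
        # The new array shares self.data; only shape/strides/offset differ.
        return NDArray(self.etype, shape, self.data, strides, offset)


m = NDArray("f64", [3, 4])          # a 3 x 4 matrix of f64 zeros
for i in range(3):
    for j in range(4):
        m.assign(10 * i + j, i, j)

col = m.view(slice(None), 1)        # column 1 as a 1-D view
print(col.shape, [col.ref(i) for i in range(3)])   # [3] [1.0, 11.0, 21.0]

col.assign(-1.0, 2)                 # mutate through the view...
print(m.ref(2, 1))                  # ...and the original sees it: -1.0

rev = m.view(1, slice(None, None, -1))   # row 1, reversed
print([rev.ref(j) for j in range(4)])    # [13.0, 12.0, 11.0, 10.0]
```

Because view shares the backing store with its source, the assign through the column changes m itself, and even a reversed slice is just a negative stride; a copying slice would instead need a fresh backing store for every reference.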

The updates to the science collection to support (or to make internal use of) the new numeric collection should be relatively straightforward. The main issue will be retaining compatibility with the existing data types (i.e., vectors). Most of the numerically complex code (e.g., special functions and random-number distributions) will not be affected, since it generally doesn't provide vector or array operations. The main changes will be to the statistics and histogram modules, which will be modified to work with ndarrays as well as vectors. In addition, some modules (e.g., histograms and ordinary differential equations) will benefit from being reimplemented using ndarrays internally.

I've also prototyped code for a new plot package that provides much more functionality than the current PLoT package. The initial capability will be closely patterned after the functionality provided by Matlab (or, more precisely, after matplotlib for Python, which is itself modeled on Matlab). It provides precise control over the elements of a graph (or of multiple subgraphs), as well as interactive graphics for more dynamic analysis.

Note that all of these new (or updated) collections require PLT Scheme Version 4 (currently 3.99) and are not compatible with earlier versions. For that reason, I won't release them to PLaneT until either PLT Scheme Version 4 is officially released or PLaneT has a separate repository for Version 4 code. I will put the code on the Schematics web site at some point.



crasch said...

Hi Doug,

Just wanted to thank you for working on the development of Schemelab. I look forward to its release!


3:28 PM  
Substantia Nigra said...

Hi, Schemelab sounds exciting! Is there an update on how it's been going?

8:04 PM  
Doug Williams said...

My agent-based simulation work has kept me busy with the simulation and inference collections, so I haven't done much with the Schemelab idea yet.

9:58 PM  
Andrew Whaley said...

Hi Doug,

I applaud what you are doing, as I think Scheme could make a fantastic free and open alternative to Matlab.

Given that performance is extremely important for this application, wouldn't you be better off starting with Gambit rather than PLT? Gambit is generally 2 to 3 times faster than PLT on most benchmarks.

I've been doing some serious number crunching and ended up having to abandon even Gambit and revert to C because performance was an issue. You need to be able to plug the core into the NVIDIA CUDA SDK so you can take advantage of their massively parallel hardware (Matlab already has this). Nobody is going to wait weeks for their answers just so they can code in Scheme when they could have their answers in hours in another language.

Good luck with it,


7:57 AM  
Doug Williams said...

I'm more interested in the flexibility that Scheme affords me than in efficiency. I chose PLT Scheme because I can run applications unchanged on Windows, Linux, and Mac and get exactly the same numerical answers and graphics. I'm not aware of any other Scheme implementation that gives me that flexibility.

Since I started porting my applications to PLT Scheme, I've seen an improvement of at least an order of magnitude in the speed of my numeric applications (with similar gains in my simulations, thanks to faster continuation handling). I'm seeing roughly another 2x improvement with the latest SVN versions.

If speed were my main concern, I'd never have gone to Lisp or Scheme in the first place. But for the knowledge-based simulations I'm doing, it's great. And being able to do my analysis in the same environment is a plus for me.

6:01 PM  
