Sunday, September 20, 2009

Science Collection Updated

I released a new version of the Science Collection with reimplemented statistics and Chebyshev approximation routines.

Statistics

The statistics routines have been rewritten to take sequences (for example, lists and vectors) as arguments instead of just vectors - except for the median and quantile routines that require sorted vectors. Unfortunately, the sequence versions, which are generally implemented using for/fold, were more than twice as slow as the previous vector versions. So, I wrote dispatching for and for/fold macros (see below) that take a general sequence and generate a cond expression with special cases for vectors and lists that the complier can better optimize. These perform about the same as the old code for vectors and lists perform about the same (as vectors). And, we still get the benefit of general sequences.

I also added the correlation function that I had somehow missed in the original implementation. [Unfortunately, the first version had a bug (#203) in it - actually, there were two since the contract was also wrong. It has since been fixed.]

I also added running statistics to the statistics routines. If you only need simple statistics (e.g., n, min, max, mean, variance, and standard deviation), you don't need to maintain a list (or vector) of the values. Instead, you create a statistics object and tally successive values. The statistics object can be queried at any time for the values of its statistics at that point.

I also implemented a mean-variance function that computes both the mean and variance for a sequence in a single pass through the data. [The previous functions requires two passes - one to compute the mean and one to compute the variance.] This also reduces the number of passes required for computing skew and kurtosis when the mean and standard deviation aren't given.

I also implemented contracts for sequence-of-real? and nonempty-sequence-of-real? that are used in the contracts for the statistics routines. As usual in the Science Collection, there are unchecked versions of the statistics routines that bypass the contract check.

Chebyshev Approximations

The Chebyshev approximation routines were reimplemented for clarity and efficiency. I changed the make-chebyshev-series function to be more general. It replaces the functionality of make-chebyshev-series-order and chebyshev-series-init, which are still available as legacy routines. I also added make-chebyshev-series-derivative and make-chebyshev-series-integral that generate new series to compute the derivative and integral, respectively, of a given Chebyshev series.

Dispatching for and for/fold

In reimplementing the statistics routines to allow general sequences as arguments, I also implemented dispatch-for and dispatch-for/fold macros, as described above. These macro are not exported from the statistics collection and are implemented using syntax-rules - so error messages are pretty much limited to 'bad-syntax'. With that caveat, they do produce code that outperforms (due to compiler optimizations) a straight for or for/fold for vectors and lists.

The implementation of the macros and examples their usage can be seen in the statistics.ss file in the Science Collection.

Labels: , ,

0 Comments:

Post a Comment

<< Home