Signal processing often requires moving between logarithmic and linear domains. In audio, for example, we talk about octaves (a logarithmic representation) and frequencies (a linear representation). The octave-to-frequency conversion corresponds to the transformation **y = 2 ^ x**.
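For instance, shifting a pitch up by *x* octaves multiplies its frequency by 2 ^ x. A one-line illustration (the names baseFreq and octaves are made up for the example):

[csharp]// e.g. baseFreq = 440 Hz, octaves = 1 -> 880 Hz; octaves = -1 -> 220 Hz
double freq = baseFreq * Math.Pow(2, octaves);[/csharp]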

Most programming languages provide built-in support for such a calculation, for example Math.Pow(a, b) in C#.

These functions, however, tend to be time-consuming. We can take advantage of the fact that in our case *a = const = 2* and optimize the calculation.

Let’s start with the identity:

###### a ^ ( b + c ) = a ^ b * a ^ c

Assuming:

- a = 2
- b = Int(x); // integer part
- c = Frac(x); // fractional part
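For example, for x = 3.7:

###### 2 ^ 3.7 = 2 ^ 3 * 2 ^ 0.7 = 8 * 1.6245… ≈ 12.996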

2 ^ Int(x) can easily be computed with a bit-shift operation. Now all it takes is to buffer the *2 ^ Frac(x)* values. The buffer size depends on the precision you want to maintain: e.g. for 0.001 precision you need to cache 1000 values. In that case we end up with something like:

###### 2 ^ x = ( 1 << Int(x) ) * BufferPowFrac[ (int) ( Frac(x) * 1000 ) ];

which happens to be faster than Math.Pow(2, x).

Note that the function above still needs to be adjusted to handle x < 0. Also, for a better memory/precision balance, we can break the *BufferPowFrac* cache down further; both refinements are shown in the full code below.
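First, though, a minimal sketch of the single-buffer variant for non-negative x (the names here are illustrative, not the ones used in the full version):

[csharp]// Minimal single-buffer sketch; handles non-negative x only
const int SIZE = 1024;                     // fractional precision of ~0.001
static readonly double[] powFrac = new double[SIZE];

static void InitSimple()
{
    // cache 2 ^ (n / SIZE) for every possible fractional index
    for (int n = 0; n < SIZE; n++)
        powFrac[n] = Math.Pow(2, (double)n / SIZE);
}

static double Pow2Simple(double x)         // assumes 0 <= x < 31
{
    int integer = (int)x;                  // Int(x)
    double frac = x - integer;             // Frac(x)
    // 2 ^ Int(x) via bit shift, 2 ^ Frac(x) from the cache
    return (1 << integer) * powFrac[(int)(frac * SIZE)];
}[/csharp]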

#### Code (C#)

[csharp]// the bit precision of each buffer
// better precision -> bigger cache needed
const int PRECISIONBIT1 = 10;
const int PRECISIONBIT2 = PRECISIONBIT1 * 2;
const int PRECISIONBIT3 = PRECISIONBIT1 * 3;

// precision values calculated from the bit counts
const int PRECISION_1 = 1 << PRECISIONBIT1;
const int PRECISION_2 = 1 << PRECISIONBIT2;
const int PRECISION_3 = 1 << PRECISIONBIT3;

// bit mask for obtaining a buffer index
const int PRECISION_MASK = PRECISION_1 - 1;

// buffers for storing cached fractional powers
// more buffers -> better precision but slower calculation
double[] powFrac_1 = new double[PRECISION_1];
double[] powFrac_2 = new double[PRECISION_1];
double[] powFrac_3 = new double[PRECISION_1];

// Pre-calculate the fractional powers
void Init()
{
    for (int n = 0; n < PRECISION_1; n++)
    {
        powFrac_1[n] = Math.Pow(2, (double)n / PRECISION_1);
        powFrac_2[n] = Math.Pow(2, (double)n / PRECISION_2);
        // The later (int) conversion truncates the fractional part (as opposed to rounding),
        // so we sample the best value on [+0..+1) => +0.5 in the final approximation.
        // Note: this adds ugly fractional parts to 'simple' values (e.g. 2^-1),
        // but overall gives 2x better precision :)
        powFrac_3[n] = Math.Pow(2, ((double)n + 0.5) / PRECISION_3);
    }
}

// Raise 2 to the given power - optimized
double Pow2(double x)
{
    // Integer part of x
    int integer = (int)x;
    // Fractional part of x
    double frac = x - integer;
    // 2 ^ Int(x) value
    double powInt;
    if (x >= 0)
    {
        // 2 ^ Int(x)
        powInt = 1 << integer;
    }
    else
    {
        // conversion of the negative fractional part,
        // so a positive index can be used for the cached fractional powers
        if (frac < 0)
        {
            integer--;
            frac++;
        }
        // 2 ^ Int(x) for x < 0,
        // taking advantage of the identity: 2 ^ (-x) = 1 / (2 ^ x)
        powInt = 1d / (1 << -integer);
    }
    // converting Frac(x) to a cache index (at the highest precision);
    // below, bitwise operators break it down for each of the sub-buffers
    int index = (int)(frac * PRECISION_3);
    // Calculating the final result: 2 ^ Int(x) * 2 ^ Frac(x),
    // with the fractional part broken down across the 3 buffers
    return powInt
        * powFrac_1[index >> PRECISIONBIT2]
        * powFrac_2[PRECISION_MASK & (index >> PRECISIONBIT1)]
        * powFrac_3[PRECISION_MASK & index];
}[/csharp]
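Usage is then straightforward (the values match the benchmark table below):

[csharp]Init();                  // fill the fractional-power caches once
double a = Pow2(1.7);    // ≈ 3.249010
double b = Pow2(-1.3);   // ≈ 0.406126[/csharp]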

#### Precision benchmark

Using three 10-bit buffers (as in the example above), the standard Math.Pow(2, x) function vs. the optimized Pow2(x) for the given inputs:

x | Math.Pow(2, x) | Optimized Pow2(x) |

-10 | 0.0009765625 | 0.0009765625 |

-2 | 0.25 | 0.25 |

-1.7 | 0.307786103336229 | 0.307786103296491 |

-1.3 | 0.406126198178118 | 0.40612619796838 |

-1 | 0.5 | 0.5 |

-0.7 | 0.615572206672458 | 0.615572206592982 |

-0.3 | 0.812252396356236 | 0.81225239593676 |

0 | 1 | 1 |

0.3 | 1.23114441334492 | 1.23114441318596 |

0.7 | 1.62450479271247 | 1.62450479187352 |

1 | 2 | 2 |

1.3 | 2.46228882668983 | 2.46228882637193 |

1.7 | 3.24900958542494 | 3.24900958374704 |

2 | 4 | 4 |

10 | 1024 | 1024 |

The precision can be improved either by extending the buffer size (mostly a memory cost) or by introducing more buffers (mostly a time cost); e.g. the three 10-bit buffers above hold 3 × 1024 doubles = 24 KB and give 30 bits of fractional resolution. The overall tuning is about balancing memory, time, and precision.

#### Time efficiency benchmark

For the example above: 5 series of 10,000,000 iterations each, with inputs distributed equally below and above 0:

Standard [ms] | Optimized [ms] | Saved % |

1389.0794 | 785.0449 | 43.48 |

1363.078 | 784.0448 | 42.48 |

1384.0792 | 787.045 | 43.14 |

1360.0778 | 787.045 | 42.13 |

1361.0779 | 788.0451 | 42.10 |

We can observe a >40% gain in time efficiency :)

It’s even better with only positive input values, and can be boosted further at the cost of memory and/or precision.
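For reference, a minimal sketch of the kind of timing loop used (the exact harness ships with the zipped project below; the input spread here is an assumption for illustration):

[csharp]var sw = System.Diagnostics.Stopwatch.StartNew();
double sum = 0;
for (int i = 0; i < 10000000; i++)
{
    // inputs spread evenly over [-10, 10)
    double x = ((i % 2000) - 1000) / 100.0;
    sum += Pow2(x);      // swap in Math.Pow(2, x) for the baseline run
}
sw.Stop();
Console.WriteLine("{0} ms (checksum {1})", sw.Elapsed.TotalMilliseconds, sum);[/csharp]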

#### Afterword

Surprisingly, I wasn’t able to find this algorithm anywhere on the internet (please let me know if you do!).

It’s the result of my own investigation, and that’s the main reason I’m sharing it.

Don’t hesitate to ask if you have any questions or ideas for further improvement.

Zipped C# project (with benchmark included).

Of course this will be one of many performance improvements in the next release of Falasol ;)

#### Comments

Tables? Yuck. Look at celt_exp in http://git.xiph.org/?p=celt.git;a=blob;f=libcelt/mathops.h;h=31778a13df7d894155cee00650a817d37c59696b;hb=HEAD

More terms could be added if you needed more efficiency.

gah. s/efficiency/precision/

Thanks a lot Gregory!

I just found a comparison of both methods (tables described above vs polynomial from your link):

http://jrfonseca.blogspot.com/2008/09/fast-sse2-pow-tables-or-polynomials.html

Also, according to that article, the polynomial approximation is better.
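For readers unfamiliar with that approach: it replaces the table lookup for 2 ^ Frac(x) with a short polynomial. A rough sketch using the plain cubic Taylor expansion of 2^f around 0 (celt_exp uses tuned fixed-point coefficients, so treat this as illustrative only):

[csharp]// Illustrative table-free variant: cubic Taylor expansion of 2^f, f in [0, 1).
// Coefficients are ln2, ln2^2/2, ln2^3/6; tuned minimax coefficients would be
// noticeably more accurate near f = 1.
double Pow2Poly(double x)
{
    int integer = (int)Math.Floor(x);
    double f = x - integer;
    double powFrac = 1 + f * (0.6931472 + f * (0.2402265 + f * 0.0555041));
    return powFrac * Math.Pow(2, integer);  // or a bit shift when integer is in range
}[/csharp]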

When investigating the subject I found this interesting implementation:

http://martin.ankerl.com/2007/10/04/optimized-pow-approximation-for-java-and-c-c/

I’m going to test those 3 approaches (and publish the results around the weekend).

I felt that someone might have already come up with a similar idea…

But hey, all those docs are quite fresh, so I’m not that much delayed after all ;)

Have you looked at this one?

http://www.hxa.name/articles/content/fast-pow-adjustable_hxa7241_2007.html

The claim was a 9x speed improvement over the CRT version of pow, which sounds pretty nice.

That is based on the same idea :)

The author even thought about the fractional buffer breakdown (in the supplement part). I was surprised no one mentioned that in the other articles when struggling with caches of over 1 MB.

Regarding the efficiency: there’s nothing like a thorough, reproducible benchmark…

“And, on one machine, it was over 9 times faster than standard pow.”

Hard to argue with that. We’re also comparing C++ vs. C# code: there can be different system function implementations, code optimizations, and many other variables.

You’re always rounding down to the nearest entry in the table. Maybe I’m missing something, but it seems like you could cut your error in half by using ((double)n + 0.5) in the Init() function.

(Or perhaps it would be more correct to use the average of pow(2,n/precision) and pow(2,(n+1)/precision).)

That is a really good suggestion!

I like the first option more since it doesn’t introduce any performance overhead.

The second could be slightly more accurate, but would require twice the computation.
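In code, the two options look roughly like this (using the buffer names from the post):

[csharp]// Option 1: sample the interval midpoint (no extra runtime cost)
powFrac_3[n] = Math.Pow(2, ((double)n + 0.5) / PRECISION_3);

// Option 2: average the interval endpoints (twice the init-time work)
powFrac_3[n] = 0.5 * (Math.Pow(2, (double)n / PRECISION_3)
                    + Math.Pow(2, ((double)n + 1) / PRECISION_3));[/csharp]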

The only drawback I see is an ugly fractional part for some ‘simple’ outputs, e.g.:

2^-1 => 0.500000000161386

This is only a matter of human aesthetics, since the precision comparison speaks for itself.

Values in [-10…10] with step 0.001:

Old version:

- Max absolute error: 6.60890577819373E-07 (at x = 9.99999999999979)
- Max relative error: 6.45538647088066E-10 (at x = -1.87500000000001)

Updated version:

- Max absolute error: 3.3037224511645E-07 (at x = 9.99999999999979)
- Max relative error: 3.22766880154427E-10 (at x = -1.87500000000001)

I’m updating the post.

Thank you!