2^x optimization for double type

Signal processing often requires moving between the logarithmic and linear domains. In audio, for example, we work with octaves (logarithmic representation) and frequencies (linear representation), and the octave -> frequency conversion corresponds to the transformation y = 2 ^ x.

Most programming languages provide support for such a calculation, for example Math.Pow(a, b) in C#.
Those functions, however, tend to be time consuming. We can take advantage of the fact that in our case a = const = 2 and optimize the calculation.
Let's start with the identity:

a ^ ( b + c ) = a ^ b * a ^ c


Assuming:

  • a = 2
  • b = Int(x)    // integer part of x
  • c = Frac(x)   // fractional part of x
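
For example, splitting x = 3.7 gives: 2 ^ 3.7 = 2 ^ 3 * 2 ^ 0.7 ≈ 8 * 1.6245 ≈ 12.996.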

2 ^ Int(x) can be easily computed with a bit-shift operation. Now all it takes is to cache the 2 ^ Frac(x) values. The buffer size depends on the precision you want to maintain, e.g. for 0.001 precision you need to cache 1000 values. In that case we end up with something like:

x = ( 1 << Int(x) ) * BufferPowerFractional[ (int) ( Frac(x) * 1000 ) ];

Which happens to be faster than Math.Pow(2, x).
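
As an illustration, a minimal single-buffer sketch of that formula could look like this (the pow2Frac, InitSimple and Pow2Simple names are only for this example; it handles x >= 0 only and uses the 1000-entry cache mentioned above):

[csharp]// 1000 pre-calculated values of 2 ^ Frac(x), one per 0.001 step
double[] pow2Frac = new double[1000];

void InitSimple()
{
    for (int n = 0; n < 1000; n++)
        pow2Frac[n] = Math.Pow(2, n / 1000.0);
}

// Simple variant, valid for x >= 0 only
double Pow2Simple(double x)
{
    int integer = (int)x;          // Int(x)
    double frac = x - integer;     // Frac(x)

    // 2 ^ Int(x) via bit shift, 2 ^ Frac(x) from the cache
    return (1 << integer) * pow2Frac[(int)(frac * 1000)];
}[/csharp]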

Note that the above still needs to be adjusted to handle x < 0. Also, for a better memory/precision balance, we can break the BufferPowerFractional cache down further.

Code (C#)

[csharp]// the bit precision of each buffer
// better precision -> bigger cache needed
const int PRECISIONBIT1 = 10;
const int PRECISIONBIT2 = PRECISIONBIT1 * 2;
const int PRECISIONBIT3 = PRECISIONBIT1 * 3;

// calculated precision values (bit count -> value range)
const int PRECISION_1 = 1 << PRECISIONBIT1;
const int PRECISION_2 = 1 << PRECISIONBIT2;
const int PRECISION_3 = 1 << PRECISIONBIT3;

// bit mask for obtaining the buffer index
const int PRECISION_MASK = PRECISION_1 - 1;

// buffers for storing cached fractional powers
// more buffers -> better precision but slower calculation
double[] powFrac_1 = new double[PRECISION_1];
double[] powFrac_2 = new double[PRECISION_1];
double[] powFrac_3 = new double[PRECISION_1];

// Pre-calculate fractional powers
void Init()
{
    for (int n = 0; n < PRECISION_1; n++)
    {
        powFrac_1[n] = Math.Pow(2, (double)n / PRECISION_1);
        powFrac_2[n] = Math.Pow(2, (double)n / PRECISION_2);

        // The later (int) conversion truncates the fractional part (as opposed to rounding).
        // Thus we pick the best value on [+0..+1) => +0.5 in the final approximation.
        // Note: this adds ugly fractional parts to 'simple' values (e.g. 2^-1),
        // but overall gives 2x better precision :)
        powFrac_3[n] = Math.Pow(2, ((double)n + 0.5) / PRECISION_3);
    }
}

// Raise 2 to the given power - optimized
double Pow2(double x)
{
    // Integer part of x
    int integer = (int)x;

    // Fractional part of x
    double frac = x - integer;

    // 2 ^ Int(x) value
    double powInt;
    if (x >= 0)
    {
        // 2 ^ Int(x)
        powInt = 1 << integer;
    }
    else
    {
        // conversion of the negative fractional part
        // so that we can use a positive index into the cached fractional powers
        if (frac < 0)
        {
            integer--;
            frac++;
        }

        // 2 ^ Int(x) for x < 0
        // taking advantage of the identity: 2 ^ (-x) = 1 / (2 ^ x)
        powInt = 1d / (1 << -integer);
    }

    // converting Frac(x) to a cache index (with the highest precision)
    // next we use bitwise operators to break it down for each of the sub-buffers
    int index = (int)(frac * PRECISION_3);

    // Calculating the final result: 2 ^ Int(x) * 2 ^ Frac(x),
    // with the fractional part broken down further into 3 buffers
    return powInt
        * powFrac_1[index >> PRECISIONBIT2]
        * powFrac_2[PRECISION_MASK & (index >> PRECISIONBIT1)]
        * powFrac_3[PRECISION_MASK & index];
}[/csharp]
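
Typical usage (assuming Init and Pow2 above are members of the same class) might look like:

[csharp]Init();                  // fill the fractional-power caches once

double a = Pow2(3.7);    // ≈ 12.996, instead of Math.Pow(2, 3.7)
double b = Pow2(-1.7);   // ≈ 0.3078[/csharp]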

Precision benchmark

Using three 10-bit buffers (as in the example above), standard Math.Pow(2, x) vs the optimized function for the given inputs:

x      Standard            Optimized
-10    0,0009765625        0,0009765625
-2     0,25                0,25
-1,7   0,307786103336229   0,307786103296491
-1,3   0,406126198178118   0,40612619796838
-1     0,5                 0,5
-0,7   0,615572206672458   0,615572206592982
-0,3   0,812252396356236   0,81225239593676
0      1                   1
0,3    1,23114441334492    1,23114441318596
0,7    1,62450479271247    1,62450479187352
1      2                   2
1,3    2,46228882668983    2,46228882637193
1,7    3,24900958542494    3,24900958374704
2      4                   4
10     1024                1024

The precision can be improved by either extending the buffer size (mostly a memory cost) or introducing more buffers (mostly a time cost), so the overall tuning is about balancing memory, time and precision.
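
As a side note, the precision figures can be reproduced with a simple comparison loop like the sketch below (the MeasurePrecision name and the [-10, 10] range with a 0.001 step are just illustrative; they match the values quoted in the comments):

[csharp]// Compare the optimized Pow2 against Math.Pow over [-10, 10] with step 0.001
void MeasurePrecision()
{
    double maxAbsError = 0, maxRelError = 0;

    for (double x = -10; x <= 10; x += 0.001)
    {
        double exact = Math.Pow(2, x);
        double approx = Pow2(x);

        double absError = Math.Abs(exact - approx);
        double relError = absError / exact;

        if (absError > maxAbsError) maxAbsError = absError;
        if (relError > maxRelError) maxRelError = relError;
    }

    Console.WriteLine("Max absolute error: " + maxAbsError);
    Console.WriteLine("Max relative error: " + maxRelError);
}[/csharp]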

Time efficiency benchmark

For the example above: 5 series of 10000000 iterations, with inputs distributed equally below/above 0:

Standard [ms] Optimized [ms] Saved %
1389,0794 785,0449 43,48
1363,078 784,0448 42,48
1384,0792 787,045 43,14
1360,0778 787,045 42,13
1361,0779 788,0451 42,10

We can observe a >40% gain in time efficiency :)
It's even better with only positive values, and can be boosted further at the cost of memory and/or precision.
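
For reference, a minimal sketch of such a timing run, assuming a System.Diagnostics.Stopwatch based harness (the actual benchmark in the zipped project may differ):

[csharp]const int Iterations = 10000000;
double sum = 0;   // accumulate results so the calls are not optimized away

// Standard Math.Pow(2, x), inputs spread evenly over [-10, 10)
var sw = System.Diagnostics.Stopwatch.StartNew();
for (int n = 0; n < Iterations; n++)
    sum += Math.Pow(2, (double)n / Iterations * 20 - 10);
sw.Stop();
Console.WriteLine("Standard  [ms]: " + sw.Elapsed.TotalMilliseconds);

// Optimized Pow2 over the same inputs
sw.Restart();
for (int n = 0; n < Iterations; n++)
    sum += Pow2((double)n / Iterations * 20 - 10);
sw.Stop();
Console.WriteLine("Optimized [ms]: " + sw.Elapsed.TotalMilliseconds + " (checksum: " + sum + ")");[/csharp]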

Afterword

What's surprising is that I wasn't able to find this algorithm anywhere on the internet (please let me know if you do!).
It's the result of my own investigation and the main reason why I'm sharing it.
Don’t hesitate to ask if you have any questions or ideas for further improvement.

Zipped C# project (with benchmark included).

Of course this will be one of many performance improvements in the next release of Falasol ;)


Reader's Comments »

  1. By Gregory Maxwell on July 12, 2010 at 1:06 pm

    Tables? Yuck. Look at celt_exp in http://git.xiph.org/?p=celt.git;a=blob;f=libcelt/mathops.h;h=31778a13df7d894155cee00650a817d37c59696b;hb=HEAD More terms could be added if you needed more efficiency.

  2. By Gregory Maxwell on July 12, 2010 at 1:06 pm

    gah. s/efficiency/precision/

  3. By koshik on July 13, 2010 at 12:00 am

    Thanks a lot Gregory!
    I just found a comparison of both methods (tables described above vs polynomial from your link):
    http://jrfonseca.blogspot.com/2008/09/fast-sse2-pow-tables-or-polynomials.html
    Also according to that article polynomial approximation is better.

    When investigating the subject I found this interesting implementation:
    http://martin.ankerl.com/2007/10/04/optimized-pow-approximation-for-java-and-c-c/

    I’m going to test those 3 approaches (and publish the results around the weekend).

    I felt that someone might have already come up with a similar idea…
    But hey, all those docs are quite fresh so I’m not that much delayed after all ;)

  4. By Dan on January 28, 2011 at 8:42 pm

    Have you looked at this one?

    http://www.hxa.name/articles/content/fast-pow-adjustable_hxa7241_2007.html

    The claim was 9x speed improvement over the CRT version of pow, which sounds pretty nice.

    By koshik on January 30th, 2011 at 4:48 pm

    That is based on the same idea :)
    The guy thought even about the fractional buffer breakdown (in supplement part). I was surprised no one mentioned that in other articles when struggling with over 1MB buffers.

    Regarding the efficiency – there’s nothing like a thorough, reproducible benchmark…
    “And, on one machine, it was over 9 times faster than standard pow.”

    Hard to discuss with that. We're also dealing with C++ vs. C# code. There can be different system function implementations, code optimizations, many other things.

  5. By Slime on October 14, 2011 at 10:54 am

    You’re always rounding down to the nearest entry in the table. Maybe I’m missing something, but it seems like you could cut your error in half by using ((double)n + 0.5) in the Init() function.

    (Or perhaps it would be more correct to use the average of pow(2,n/precision) and pow(2,(n+1)/precision).)

    By koshik on October 15th, 2011 at 4:01 pm

    That is a really good suggestion!
    I like the first option more since it doesn’t introduce any performance overhead.
    Second could be slightly more accurate, but would require 2x code execution.

    The only drawback I see is an ugly fractional part for some ‘simple’ outputs, e.g.:
    2^-1 => 0,500000000161386
    This is only a question of human aesthetic sense, since the precision comparison speaks for itself.
    Values [-10…10] with step 0.001:

    Old version:
    Max absolute error: 6,60890577819373E-07
    Encountered at: 9,99999999999979
    Max relative error: 6,45538647088066E-10
    Encountered at: -1,87500000000001

    Updated version:
    Max absolute error: 3,3037224511645E-07
    Encountered at: 9,99999999999979
    Max relative error: 3,22766880154427E-10
    Encountered at: -1,87500000000001

    I’m updating the post.
    Thank you!