Is VecLib buggy?

Discussion in 'Mac Programming' started by ASLMAC, Nov 21, 2013.

  1. ASLMAC, Nov 21, 2013
    Last edited by a moderator: Nov 21, 2013

    ASLMAC macrumors newbie

    Nov 23, 2010

    I am using VecLib in the Accelerate framework to implement a C++ class for large integer calculations.

    After some testing, I discovered that division functions were not always calculating the correct result.

    The following code shows what happens:

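    vU512 A,B,C,Q,R;
    memset(&A, 0, sizeof(A));
    memset(&B, 0, sizeof(B));
    A.s.LSW = 1;
    B.s.LSW = 3;
    for (int64_t k=1; k<129; k++)
    {
         vU512HalfMultiply(&A, &B, &C);   // C = A * B, low 512 bits
         A = C;
         vU512Divide(&C, &B, &Q, &R);     // Q = C / B, R = C % B
         assert(R.s.LSW == 0);
    }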

    When the counter k reaches 81 the assert fails, indicating that the division did not calculate correctly.

    (the preceding instruction multiplies by B and the next divides by B, and this should give a remainder of 0)

    I have noticed that the error occurs when the number C becomes larger than 2^128.

    I am not sure what to think about this:

    - Did I make a mistake and code it wrong?

    - Is the problem related to an older Mac?

    - Are Apple's frameworks doing wrong calculations?

    Hope that someone can clarify this.

    Kind regards

  2. danwilliams macrumors member

    Sep 15, 2008
  3. chown33 macrumors 604

    Aug 9, 2009
    Which OS version is this happening on?

    Which architecture? (ppc? i386? x86_64?)

    Which Xcode version? Compiler?
  4. ASLMAC thread starter macrumors newbie

    Nov 23, 2010
    OS, Architecture, etc.

    Answer to your question:

    iMac 2.4 GHz, Aluminium, 21", EMC No: 2133, Mac OS X (10.6.4), Xcode 5.0.2

    So latest software versions and old mac.
  5. ASLMAC thread starter macrumors newbie

    Nov 23, 2010
    memset is OK

    I have checked by single-stepping with the debugger; the A and B structures are correctly set to zero.
  6. chown33 macrumors 604

    Aug 9, 2009
    1) Your software versions are not the latest ones. 10.6.8 is the latest version of the 10.6 (Snow Leopard) releases.

    2) You didn't identify what CPU architecture your code was compiled and run for. In my test case below, I compiled and ran both 32-bit (i386) and 64-bit (x86_64) versions.

    I suggest that you update to 10.6.8, for reasons which should become apparent below.

    First, I suggest making a well-isolated and more informative (verbose) test case. It can be C or C++, but it should be well-isolated so others can compile and run it without needing to add code, and it should be more verbose so its output can be collected into a file and diff'ed.

    Here's my test case, in C:
    File: vecTest.c:
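    #include <stdio.h>
    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>
    #include <assert.h>
    #include <vecLib/vecLib.h>
    typedef  vU512  MYVEC_t;
    #define MYVEC_name				"vU512"
    #define MYVEC_HalfMultiply		vU512HalfMultiply
    #define MYVEC_Divide				vU512Divide
    int main( int argc, const char * argv[] )
    {
    	MYVEC_t A,B,C,Q,R;
    	// Zero every operand so only the low word carries a starting value.
    	memset( &A, 0, sizeof(A) );
    	memset( &B, 0, sizeof(B) );
    	memset( &C, 0, sizeof(C) );
    	memset( &Q, 0, sizeof(Q) );
    	memset( &R, 0, sizeof(R) );
    	printf( " vec type: %s\n", MYVEC_name );
    	A.s.LSW = 1;
    	B.s.LSW = 3;
    	for (int64_t k=1; k<129; k++)
    	{
    		fprintf( stderr, "k:%02u  ", (unsigned) k );
    		MYVEC_HalfMultiply( &A, &B, &C );
    		fprintf( stderr, "A:%u B:%u C:%u ", A.s.LSW, B.s.LSW, C.s.LSW );
    		 A = C;
    		// Iteration-specific markers: if they survive the call, nothing was stored.
    		Q.s.LSW = k;  R.s.LSW = k;  
    		MYVEC_Divide( &C, &B, &Q, &R );
    		fprintf( stderr, " Q:%u R:%u \n", Q.s.LSW, R.s.LSW );
    //		 assert(R.s.LSW == 0);
    	}
    	return 0;
    }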
    My reasons for using the MYVEC family of types and defines will become apparent in the results presented below.

    I compiled and ran both 32-bit and 64-bit versions on a 10.4 Tiger machine. This machine has Xcode and gcc tools on it. My other working machines don't. I have a currently-dead 10.6.8 machine with a Core 2 Duo CPU, but I haven't needed it for my main work since it died, so I've delayed repairing it.

    When I run both versions on 10.4.11, I get identical output. So both architectures are consistent.

    However, inspecting the output, it's apparent that both architectures are also wrong. Here's a section of the 10.4 output:
    k:01  A:1 B:3 C:3  Q:1 R:1 
    k:02  A:3 B:3 C:9  Q:2 R:2 
    k:03  A:9 B:3 C:27  Q:3 R:3 
    k:04  A:27 B:3 C:81  Q:4 R:4 
    k:05  A:81 B:3 C:243  Q:5 R:5 
    k:06  A:243 B:3 C:729  Q:6 R:6 
    k:07  A:729 B:3 C:2187  Q:7 R:7 
    k:08  A:2187 B:3 C:6561  Q:8 R:8 
    k:09  A:6561 B:3 C:19683  Q:9 R:9 
    k:10  A:19683 B:3 C:59049  Q:10 R:10 
    k:11  A:59049 B:3 C:177147  Q:11 R:11 
    That's right, the quotient (Q) and remainder (R) values are not being returned in the output variables.

    Furthermore, when I change the types to vU256, vU1024, or any other referential type (i.e. not vU128), the results are the same. The results of the division are not stored into the output variables.

    The assert() must be disabled here, otherwise the test won't run to completion. Also, the Q & R variables must be initialized, otherwise they will contain garbage values. Finally, I chose to store an iteration-specific value into the parts of Q & R that produce output, to distinguish between a possible "zeroes the output" vs. a "doesn't change the output" failure modes.
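
The sentinel idea can be shown without vecLib at all. The following is only a minimal sketch in portable C, under the assumption that the routine under test returns its results through output pointers; every name in it (divide_fn, classify, the three stand-in routines) is hypothetical and none of it is Apple's code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical signature: a divide routine that returns results via pointers. */
typedef void (*divide_fn)(uint32_t n, uint32_t d, uint32_t *q, uint32_t *r);

enum outcome { STORED_RESULT, LEFT_UNCHANGED, ZEROED_OUTPUT, STORED_OTHER };

/* Three stand-in routines, one per behaviour we want to tell apart. */
static void divide_ok(uint32_t n, uint32_t d, uint32_t *q, uint32_t *r)
{ *q = n / d; *r = n % d; }
static void divide_noop(uint32_t n, uint32_t d, uint32_t *q, uint32_t *r)
{ (void)n; (void)d; (void)q; (void)r; }
static void divide_zero(uint32_t n, uint32_t d, uint32_t *q, uint32_t *r)
{ (void)n; (void)d; *q = 0; *r = 0; }

/* Preload both outputs with a sentinel, call the routine, then see what is
   left. The sentinel should differ from zero and from the expected values,
   otherwise two outcomes look identical. */
static enum outcome classify(divide_fn divide, uint32_t n, uint32_t d,
                             uint32_t sentinel)
{
    uint32_t q = sentinel, r = sentinel;
    divide(n, d, &q, &r);
    if (q == n / d && r == n % d)     return STORED_RESULT;
    if (q == sentinel && r == sentinel) return LEFT_UNCHANGED;
    if (q == 0 && r == 0)             return ZEROED_OUTPUT;
    return STORED_OTHER;
}

/* Small self-check: 81 / 3 = 27 remainder 0, sentinel 5 collides with neither. */
static void sentinel_demo(void)
{
    assert(classify(divide_ok,   81, 3, 5) == STORED_RESULT);
    assert(classify(divide_noop, 81, 3, 5) == LEFT_UNCHANGED);
    assert(classify(divide_zero, 81, 3, 5) == ZEROED_OUTPUT);
}
```

Checking the quotient as well as the remainder matters here: the expected remainder is 0, so a remainder-only check could not tell a correct result from a zeroed output.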

    Next, I copied the executables and collected outputs (./v32 &>out32.txt) to my 10.6.8 (Snow Leopard) machine. Unfortunately, this machine only has a Core Duo CPU, which is not 64-bit. Here's a sample of its output:
    k:01  A:1 B:3 C:3  Q:1 R:0 
    k:02  A:3 B:3 C:9  Q:3 R:0 
    k:03  A:9 B:3 C:27  Q:9 R:0 
    k:04  A:27 B:3 C:81  Q:27 R:0 
    k:05  A:81 B:3 C:243  Q:81 R:0 
    k:06  A:243 B:3 C:729  Q:243 R:0 
    k:07  A:729 B:3 C:2187  Q:729 R:0 
    k:08  A:2187 B:3 C:6561  Q:2187 R:0 
    k:09  A:6561 B:3 C:19683  Q:6561 R:0 
    k:10  A:19683 B:3 C:59049  Q:19683 R:0 
    k:11  A:59049 B:3 C:177147  Q:59049 R:0 
    k:12  A:177147 B:3 C:531441  Q:177147 R:0 
    k:13  A:531441 B:3 C:1594323  Q:531441 R:0 
    k:14  A:1594323 B:3 C:4782969  Q:1594323 R:0 
    k:15  A:4782969 B:3 C:14348907  Q:4782969 R:0 
    k:16  A:14348907 B:3 C:43046721  Q:14348907 R:0 
    k:17  A:43046721 B:3 C:129140163  Q:43046721 R:0 
    k:18  A:129140163 B:3 C:387420489  Q:129140163 R:0 
    k:19  A:387420489 B:3 C:1162261467  Q:387420489 R:0 
    k:20  A:1162261467 B:3 C:3486784401  Q:1162261467 R:0 
    k:21  A:3486784401 B:3 C:1870418611  Q:3486784401 R:0 
    k:22  A:1870418611 B:3 C:1316288537  Q:1870418611 R:0 
    k:23  A:1316288537 B:3 C:3948865611  Q:1316288537 R:0 
    k:24  A:3948865611 B:3 C:3256662241  Q:3948865611 R:0 
    k:25  A:3256662241 B:3 C:1180052131  Q:3256662241 R:0 
    k:26  A:1180052131 B:3 C:3540156393  Q:1180052131 R:0 
    k:27  A:3540156393 B:3 C:2030534587  Q:3540156393 R:0 
    k:28  A:2030534587 B:3 C:1796636465  Q:2030534587 R:0 
    k:29  A:1796636465 B:3 C:1094942099  Q:1796636465 R:0 
    k:30  A:1094942099 B:3 C:3284826297  Q:1094942099 R:0 
    k:31  A:3284826297 B:3 C:1264544299  Q:3284826297 R:0 
    k:32  A:1264544299 B:3 C:3793632897  Q:1264544299 R:0 
    k:33  A:3793632897 B:3 C:2790964099  Q:3793632897 R:0 
    k:34  A:2790964099 B:3 C:4077925001  Q:2790964099 R:0 
    k:35  A:4077925001 B:3 C:3643840411  Q:4077925001 R:0 
    k:120  A:989468363 B:3 C:2968405089  Q:989468363 R:0 
    k:121  A:2968405089 B:3 C:315280675  Q:2968405089 R:0 
    k:122  A:315280675 B:3 C:945842025  Q:315280675 R:0 
    k:123  A:945842025 B:3 C:2837526075  Q:945842025 R:0 
    k:124  A:2837526075 B:3 C:4217610929  Q:2837526075 R:0 
    k:125  A:4217610929 B:3 C:4062898195  Q:4217610929 R:0 
    k:126  A:4062898195 B:3 C:3598759993  Q:4062898195 R:0 
    k:127  A:3598759993 B:3 C:2206345387  Q:3598759993 R:0 
    k:128  A:2206345387 B:3 C:2324068865  Q:2206345387 R:0 
    Here, the results of the division are definitely being calculated and stored into the output variables.

    The displayed values are the unsigned low 32-bits of the larger number, which can be cross-checked in vecLib by adding A to itself a total of 3 times, i.e. A+A+A = A*3. (Left as an exercise for the reader.)
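
For anyone who wants a reference that does not depend on vecLib, the same A+A+A cross-check can be sketched in portable C. This is a rough sketch, not a replacement for the library: the ref512 type and the ref_ functions are hypothetical names, the number is held as sixteen 32-bit limbs (512 bits), and division is only by a small 32-bit divisor.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LIMBS 16  /* 16 x 32 bits = 512 bits */

/* Hypothetical reference type: limb[0] is the least significant word. */
typedef struct { uint32_t limb[LIMBS]; } ref512;

/* r = a + b, truncated to 512 bits. */
static ref512 ref_add(ref512 a, ref512 b)
{
    ref512 r;
    uint64_t carry = 0;
    for (int i = 0; i < LIMBS; i++) {
        uint64_t sum = (uint64_t)a.limb[i] + b.limb[i] + carry;
        r.limb[i] = (uint32_t)sum;
        carry = sum >> 32;
    }
    return r;
}

/* Long division by a 32-bit divisor: quotient in *q, remainder returned. */
static uint32_t ref_div_small(ref512 a, uint32_t d, ref512 *q)
{
    uint64_t rem = 0;
    for (int i = LIMBS - 1; i >= 0; i--) {
        uint64_t cur = (rem << 32) | a.limb[i];
        q->limb[i] = (uint32_t)(cur / d);
        rem = cur % d;
    }
    return (uint32_t)rem;
}

/* Builds 3^k by repeated A+A+A and returns its low 32 bits. On every step it
   asserts that dividing by 3 gives remainder 0 and the previous value back. */
static uint32_t ref_pow3_low_word(int k)
{
    ref512 a, c, q;
    memset(&a, 0, sizeof a);
    a.limb[0] = 1;
    for (int i = 0; i < k; i++) {
        c = ref_add(ref_add(a, a), a);
        uint32_t r = ref_div_small(c, 3, &q);
        assert(r == 0);
        assert(memcmp(&q, &a, sizeof q) == 0);
        a = c;
    }
    return a.limb[0];
}
```

Calling ref_pow3_low_word(21) should reproduce the C value printed at k:21 in the listing above (1870418611), since both are the low 32 bits of 3^21. The loop stays exact for every k in the test because 3^128 is well below 2^512.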

    Results up to around 2^56 can also be checked using "", because its floating-point operands carry 56-bits of precision. These checks require the Programmer mode under its View menu.

    I also get the same 32-bit and 64-bit outputs when I run the executables on a 10.8.4 Mountain Lion machine. Whatever bug existed in 10.4.11 and disappeared by 10.6.8, it seems to remain at bay in 10.8.4.

    The fact that I got R:0 in all cases under 10.6.8 suggests that 10.6.8 does not have whatever problem you're seeing.

    Before updating to 10.6.8, I strongly recommend compiling and running both 32-bit and 64-bit versions of my test case above, and collecting the output. You can post the output as evidence for or against correct behavior. I won't hazard a guess what output you'll see, mainly because of the surprising results I got when running under 10.4.11.

    The short answer to your question is: "Yes, vecLib has bugs", but that answer must be qualified by a "depending on OS version, arithmetic operation, and possibly other factors".

    Finally, debugging is primarily a process of confirming expectations by gathering evidence. Your expectations were seemingly disconfirmed by the use of assert(). Your next step should always be to make a well-isolated test that gathers informative evidence.

    You can then present both the test (code) and the evidence to support your position. It would have saved me some time if you'd made a stand-alone compilable program that produced useful output. It may also have enlisted the help of more people with more data-points (OS versions, architectures).

    I just realized you wrote "Xcode 5.0.2", which tends to invalidate the "10.6.4" version of OS X. I.e. that version of Xcode is incapable of running on that OS version.

    Accuracy is important in programming.
  7. ASLMAC, Nov 24, 2013
    Last edited: Nov 24, 2013

    ASLMAC thread starter macrumors newbie

    Nov 23, 2010
    Sorry, I made a copy and paste error; the OS X version is 10.9. The compiled architecture is 32-bit (i386).

    For the division routine, I am sure that the calculations are correct until the numbers become larger than 2^128.

    To check my result, I used an online calculator for big integers:

    This site also contains a C++ class for large integers, which I found suitable as a test reference.


    Just discovered that the multiply operation also calculates incorrectly when one of the factors is larger than 2^128.

    But the following is ok:

    A = 11059408994828288;
    B = 2126117475393;
    C = 382537302016;

    D = A*B*C;


    Which seems to indicate that calculations are wrong when an operand is larger than 2^128, but not if the result is larger than 2^128.
  8. chown33 macrumors 604

    Aug 9, 2009
    Since you seem to have a consistent fail-case, please submit a bug-report to Apple and include all the relevant code and output data supporting the bug(s).

    If your bug is repeatable at Apple, then they will typically add a regression-test to their automated test procedures. Frankly, given what I found in 10.4.11 vs. 10.6.8 and later, I'm a little surprised there isn't already an automated test for vecLib.

    If you have access to the 10.9.1 pre-releases, you can try your tests there. You should make a full backup BEFORE updating, however, because pre-releases that fix bug A might introduce bugs B and C, and they could have worse consequences. So even if the 10.9.1 preview fixes vecLib, you should be fully prepared to revert to the public 10.9.0 release at any time.

    If you want to investigate the cause further, and you're familiar with assembly language, you should be able to disassemble the vecLib functions under test, and see what the actual code is. For vU256 operands, I don't expect that half-multiplication is too complex to comprehend in assembler. It's basically school-book 2-digit by 2-digit multiplication, where each digit is 128-bits long (i.e. base 2^128).
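
To make the school-book picture concrete, here is a scaled-down sketch in portable C. It assumes 32-bit digits (base 2^32) instead of the 128-bit digits described above, so the result can be checked against native 64-bit multiplication. The names two_digit, half_multiply, to_u64 and from_u64 are hypothetical, and this is not a disassembly or reconstruction of what vecLib actually does.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 2-digit number in base 2^32: value = hi * 2^32 + lo. */
typedef struct { uint32_t hi, lo; } two_digit;

static uint64_t to_u64(two_digit x)
{
    return ((uint64_t)x.hi << 32) | x.lo;
}

static two_digit from_u64(uint64_t v)
{
    two_digit x;
    x.hi = (uint32_t)(v >> 32);
    x.lo = (uint32_t)v;
    return x;
}

/* Half multiply: the low two digits of a * b, i.e. (a * b) mod 2^64. */
static two_digit half_multiply(two_digit a, two_digit b)
{
    /* lo * lo has weight 1 and feeds both result digits. */
    uint64_t low = (uint64_t)a.lo * b.lo;

    /* The cross terms have weight 2^32, so only their low digit survives.
       The hi * hi term has weight 2^64 and drops out of a half multiply. */
    uint32_t cross = (uint32_t)((uint64_t)a.hi * b.lo)
                   + (uint32_t)((uint64_t)a.lo * b.hi);

    two_digit r;
    r.lo = (uint32_t)low;
    r.hi = (uint32_t)(low >> 32) + cross;   /* wraps modulo 2^32 by design */
    return r;
}

/* Compare against the machine's own wrapping 64-bit multiply. */
static void check_half_multiply(uint64_t x, uint64_t y)
{
    assert(to_u64(half_multiply(from_u64(x), from_u64(y))) == x * y);
}
```

The same three partial products appear at any digit width. With 128-bit digits each of them is itself a multi-word multiplication, which is where carry handling between words could plausibly go wrong.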
