C++: What's so special about 13.34, 14.34, and 15.34?

Discussion in 'Mac Programming' started by rayonix, Jan 4, 2014.

  1. rayonix, Jan 4, 2014
    Last edited: Jan 4, 2014

    rayonix macrumors newbie

    Joined:
    Jan 4, 2014
    #1
    Hello,

    I am running Xcode 5.0.2 on OS X 10.9.1.

    So today is my first day using the forum, and this is already my second post. At first I wasn't going to post this, but this occurrence is so strange that I wanted to check whether other people have the same thing happen.

    I was writing a C++ program that sums numbers entered at the keyboard, returns the average, and counts how many of the input values are greater than the average. While trying different inputs, I got a result I was not expecting. Here is my code (not from the program I was working on, but a separate file I created to reproduce the anomaly):

    Code:
    // strange.cpp
    
    #include <iostream>
    
    int main()
    {
        using namespace std;
        
        cout << "Enter three numbers: ";
        double n1, n2, n3;
        cin >> n1 >> n2 >> n3;
        double total = 3.0;
        double average = (n1 + n2 + n3) / total;
        cout << "Average: " << average << endl;
        
        int aboveAverage = 0;
        
        if (n1 > average)
            aboveAverage++;
        cout << "After first comparison, aboveAverage = " << aboveAverage << endl;
        if (n2 > average)
            aboveAverage++;
        cout << "After second comparison, aboveAverage = " << aboveAverage << endl;
        if (n3 > average)
            aboveAverage++;
        cout << "After third comparison, aboveAverage = " << aboveAverage << endl;
        
        cout << "Above average: " << aboveAverage << endl;
            
        
        return 0;
    }
    I know this is not the cleanest code (e.g. I could use an array and a loop) and that I am not following standard coding practices (variable names, etc.); I just wanted to get the rough idea out.

    When I input the three numbers 13.34, 14.34, and 15.34, I get the following output:

    Code:
    Enter three numbers: 13.34
    14.34
    15.34
    Average: 14.34
    After first comparison, aboveAverage = 0
    After second comparison, aboveAverage = 1
    After third comparison, aboveAverage = 2
    Above average: 2
    Program ended with exit code: 0
    I get the correct average, but the output suggests that two of the three entries are above the average, which is not true in this case! The weird thing is that in the second comparison it seems that (14.34 > 14.34) evaluates to true!

    Changing the input numbers to {10.34, 11.34, 12.34}, {11.34, 12.34, 13.34}, {12.34, 13.34, 14.34}, or {14.34, 15.34, 16.34} gives the same result. And changing the order of the comparisons doesn't seem to change anything either.

    I have tried this both in Xcode and in Terminal.

    I have no clue why this happens. Does anyone have any ideas?
     
  2. lee1210 macrumors 68040

    lee1210

    Joined:
    Jan 10, 2005
    Location:
    Dallas, TX
    #2
    IEEE-754 floating point is a binary approximation of decimal values. It's not exact. I personally wish >, <, >=, <= and == weren't defined for floats and doubles. For what you're wanting to do, you actually want to either do fixed-point math (i.e. store as a long scaled to two decimal places) or come up with some machine-delta rigmarole to handle the approximation. It may be interesting to view the doubles in their binary form to see the very small difference.
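
    For example, here's a rough sketch of both ideas using your numbers (the epsilon value here is arbitrary; you'd have to pick one that suits your data):

    Code:
    #include <cstdio>
    
    int main()
    {
        // Machine-delta comparison: only count a value as above the
        // average if it clears it by more than a small tolerance.
        double average = (13.34 + 14.34 + 15.34) / 3.0;
        double n2 = 14.34;
        const double eps = 1e-9;    // arbitrary; tune for your data
        std::printf("above: %d\n", (n2 - average) > eps ? 1 : 0);
    
        // Fixed point: store hundredths in integers, so 14.34 becomes
        // 1434 and all the arithmetic is exact.
        long sum = 1334 + 1434 + 1534;
        std::printf("above (fixed point): %d\n", 3 * 1434 > sum ? 1 : 0);
        return 0;
    }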

    Floating point math is hard. We do it really poorly. Try to avoid it.

    -Lee
     
  3. chown33 macrumors 604

    Joined:
    Aug 9, 2009
    #3
    The short answer is to display the values with more decimal places. (Basic debugging principles: break it down, collect detailed evidence.)

    I suggest displaying 18 significant decimal digits, to ensure every significant digit (and then some) is displayed. So for the values given (around 14.0), there are 2 digits to the left of the decimal place, which means 16 to the right.
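
    A minimal sketch of that (in the default output format, setprecision counts significant digits):

    Code:
    #include <iostream>
    #include <iomanip>
    
    int main()
    {
        double average = (13.34 + 14.34 + 15.34) / 3.0;
        // 18 significant digits is more than a double actually holds
        // (about 17), so every stored digit gets displayed.
        std::cout << std::setprecision(18) << average << std::endl;
        return 0;
    }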

    And what lee1210 said is correct. However, rather than "Floating point math is hard" , I would say "Floating point math is not what you learned in grade school (or high school, or college, unless you took Computer Science or Numerical Representation courses)". That's not nearly as succinct, so it loses a lot of its memorability. Floating point is not hard, and actually makes perfect sense in binary: it's just not what you learned. The main hard part is unlearning what you "know". As Will Rogers once said, "It's not what we don't know that gives us trouble, it's what we know that ain't so."

    IEEE-754 floating-point is an approximation (as are all floating-point representations, even base-10 ones). For further reading:
    http://en.wikipedia.org/wiki/IEEE_floating_point
    What Every Computer Scientist Should Know About Floating-Point Arithmetic

    As with every Wikipedia article, it's not just the article, but what's under the "See also", "References", and "Further reading" headings. It's not that it's hard or complex, it's that if you've never seen it before, or don't know the fundamentals, there's a lot of foundation to be laid first.
     
  4. lee1210 macrumors 68040

    lee1210

    Joined:
    Jan 10, 2005
    Location:
    Dallas, TX
    #4
    I'll stand by it being hard, even if the interpretation is a bit different. If it were not hard, I don't think I'd have seen it done wrong so many times. People really, really want to do decimal math. They really don't want to do binary math. Most folks type their literals in decimal. They accept their input and display their output in decimal. The in-between, where the machine approximates in binary, potentially accumulating errors, is not something a lot of folks want to concern themselves with. They ought to, but I don't run into it much.

    In any event, we've made it this far with folks figuring out when they can approximate, when they're really doing fixed point math, etc. It just takes most people a few serious bugs to actually learn it.

    -Lee
     
  5. LPZ, Jan 4, 2014
    Last edited: Jan 5, 2014

    LPZ macrumors 65816

    Joined:
    Jul 11, 2006
    #5
    You could avoid the divisions when comparing to the average, like this:

    Code:
    // strange.cpp
    
    #include <iostream>
    
    int main()
    {
        using namespace std;
    
        int n = 3;
        
        cout << "Enter three numbers: ";
        double n1, n2, n3;
        cin >> n1 >> n2 >> n3;
        
        double sum = n1 + n2 + n3;
        double average = sum / n;
        cout << "Average: " << average << endl;
        
        int aboveAverage = 0;
        
        if (n * n1 > sum)
            aboveAverage++;
        cout << "After first comparison, aboveAverage = " << aboveAverage << endl;
        if (n * n2 > sum)
            aboveAverage++;
        cout << "After second comparison, aboveAverage = " << aboveAverage << endl;
        if (n * n3 > sum)
            aboveAverage++;
        cout << "After third comparison, aboveAverage = " << aboveAverage << endl;
        
        cout << "Above average: " << aboveAverage << endl;
            
        
        return 0;
    }
    
    Or use long doubles in your original code:

    Code:
    // strange.cpp
    
    #include <iostream>
    
    int main()
    {
        using namespace std;
        
        cout << "Enter three numbers: ";
        long double n1, n2, n3;
        cin >> n1 >> n2 >> n3;
        int total = 3;
        long double average = (n1 + n2 + n3) / total;
        cout << "Average: " << average << endl;
        
        int aboveAverage = 0;
        
        if (n1 > average)
            aboveAverage++;
        cout << "After first comparison, aboveAverage = " << aboveAverage << endl;
        if (n2 > average)
            aboveAverage++;
        cout << "After second comparison, aboveAverage = " << aboveAverage << endl;
        if (n3 > average)
            aboveAverage++;
        cout << "After third comparison, aboveAverage = " << aboveAverage << endl;
        
        cout << "Above average: " << aboveAverage << endl;
            
        
        return 0;
    }
     
  6. gnasher729, Jan 5, 2014
    Last edited: Jan 5, 2014

    gnasher729 macrumors P6

    gnasher729

    Joined:
    Nov 25, 2005
    #6
    Every time you do a floating point operation, the result is some number very close to the exact result. Sometimes you get the exact result, sometimes that's impossible (for example, there is no floating point number exactly equal to the mathematical real number 14.34), and when you can't get the exact result, you get the number closest to it.

    So your first number is some number very close to 13.34.
    Your second number is some number very close to 14.34.
    Your third number is some number very close to 15.34.

    The sum that you calculate is some number very close to 43.02, and your second number times 3 is also some number very close to 43.02. So what is the result if you compare one number that is very close to 43.02 with another number that is very close to 43.02? You don't know. It could be less, the same, or greater.

    That doesn't help in a case like this. Using long double, instead of getting two numbers that are close to 43.02 you get two numbers that are even closer to 43.02. You still have the same problem: if you compare two numbers that are very, very close to 43.02 instead of just very close, then they can be equal, or the first one is larger, or the second one is larger. (Your result may be different, but that's just coincidence; sometimes you will get the result you expect by luck. Use different numbers, and you'll get different results.)

    Well, if you add (2/3) + (2/3) + (2/3), you would expect the result to be 2, right?

    With two decimals: 0.67 + 0.67 + 0.67 = 2.01.
    With ten decimals: 0.6666666667 + 0.6666666667 + 0.6666666667 = 2.0000000001.

    Obviously more decimals aren't going to help! You get closer and closer to 2, but you'll never get 2. So the same problem that you get with binary numbers will also happen with decimal numbers.

    The first thing to learn is: unless you can prove that the exact result is representable, you'll get the result that is closest to the correct one. So just assume that you always get almost, but not quite, the correct result, and live with that.

    Later on you'll learn how to cope with harder problems, when the errors in your calculations can add up to a degree that you may not get results anywhere near the correct one.
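
    Here's a tiny preview of errors adding up, which you can try yourself (assuming IEEE-754 doubles; the exact digits may differ on other formats):

    Code:
    #include <cstdio>
    
    int main()
    {
        double sum = 0.0;
        for (int i = 0; i < 10; ++i)
            sum += 0.1;    // each addition rounds a little
        // The rounding errors don't cancel out here, so sum ends up
        // close to, but not exactly, 1.0.
        std::printf("%.17f\n", sum);
        std::printf("sum == 1.0: %d\n", sum == 1.0 ? 1 : 0);
        return 0;
    }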
     
  7. chown33 macrumors 604

    Joined:
    Aug 9, 2009
    #7
    I changed your code to this (added lines are marked with "// ADDED"):
    Code:
        std::cout.precision( 22 );                                  // ADDED
        std::cout.setf( std::ios::fixed, std::ios::floatfield );    // ADDED
       cout << "Enter three numbers: ";
       double n1, n2, n3;
       cin >> n1 >> n2 >> n3;
       double total = 3.0;
       double average = (n1 + n2 + n3) / total;
       cout << "Average: " << average << endl;
    
       int aboveAverage = 0;
    
        cout << "n1: " << n1 << " n2: " << n2 << " n3: " << n3 << endl;   // ADDED
    (Reference: http://www.cplusplus.com/reference/ostream/ostream/
    Code pretty much copy-pasted from precision description.)

    When I run it, with input values of 13.34, 14.34, 15.34, the output is this:
    Code:
    Average: 14.3399999999999980815346
    n1: 13.3399999999999998578915 n2: 14.3399999999999998578915 n3: 15.3399999999999998578915
    After first comparison, aboveAverage = 0
    After second comparison, aboveAverage = 1
    After third comparison, aboveAverage = 2
    
    I deliberately chose a precision of 22 because the next change was LPZ's suggestion of using long double instead of double.
    The output from that version was this:
    Code:
    Average: 14.3400000000000000001388
    n1: 13.3400000000000000001388 n2: 14.3400000000000000001388 n3: 15.3400000000000000001388
    After first comparison, aboveAverage = 0
    After second comparison, aboveAverage = 0
    After third comparison, aboveAverage = 1
    Above average: 1
    
    Notice how the extended precision of long double has two effects:
    1. There are more digits of precision, so the floating-point approximation is closer to the exact value (but is still inexact).
    2. The more precise value is now slightly above the exact value, rather than being slightly below it.

    The first effect is predictable; the second is not. That is, going to long double will always give more precise values (closer to the exact value), but the more precise value may be higher or lower than the exact value. In other words, the precision will be better, but it's still only an approximation, so you still have to understand and accommodate that fact.

    In this particular case, if the comparison used less-than rather than greater-than, the long double version would be "wrong".


    I probably came into programming with a different background, which helped me understand some things more readily.

    When I was a kid, long before I became interested in electronics or computers, I used to do simple woodworking projects with my father and grandfather. Thus, the idea that some things only come in certain sizes, with discrete steps between the sizes, was a concept I was familiar with. For example, if you wanted a hole that was 1/6 of an inch in diameter, you had to find the drill bit closest to 0.166666..., because there's no such thing as a 1/6" drill-bit.

    Also, I learned to take into account things like the preferred direction of the error term. For example, with the 1/6" hole, you might want the smaller-size bit if you were doweling a joint, to get a tight fit, but you'd want the larger-size bit if you intend to put a bolt or axle through the hole. If you used the smaller bit for drilling an axle hole, your toy cart wouldn't roll.

    http://en.wikipedia.org/wiki/Drill_bit_sizes
     
  8. LPZ macrumors 65816

    Joined:
    Jul 11, 2006
    #8
    My suggestion to use long double was based on my analysis of the binary representation of 1/50 = 0.02. I didn't intend to suggest that switching to long double would work correctly for arbitrary x, x+1, x+2. I wanted the OP (who seems to be gone) to wonder why the switch worked. Sorry for not being clear.
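
    If anyone wants to repeat that kind of analysis, the hex-float output format shows the stored binary value directly (a minimal sketch):

    Code:
    #include <cstdio>
    
    int main()
    {
        // %a / %La print the exact stored value as a hexadecimal
        // significand and a power-of-two exponent, which makes the
        // repeating binary pattern of 1/50 easy to see.
        std::printf("%a\n", 0.02);
        std::printf("%La\n", 0.02L);
        return 0;
    }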
     
  9. chown33 macrumors 604

    Joined:
    Aug 9, 2009
    #9
    I understand that, and my use of your suggestion of long double was just to show a worked-out example and the actual results. The fact that long double produced the expected answer was just an opportunity to point to a different pitfall of floating-point, which is that changes in representational precision can be misleading. That is, one can get the "right" answer for the wrong reason.

    Here's the results from using the float type instead of double:
    Code:
    Average: 14.3400001525878906250000
    n1: 13.3400001525878906250000 n2: 14.3400001525878906250000 n3: 15.3400001525878906250000
    After first comparison, aboveAverage = 0
    After second comparison, aboveAverage = 0
    After third comparison, aboveAverage = 1
    Above average: 1
    Here, we see:
    1. The actual values are less precise (compare with double values from earlier post).
    2. The less precise value is slightly above the exact value.

    In other words, if the OP had used float instead of double, the answer would have been "right" (i.e. matching the naive expectation). However, in the overall scheme of things, this would have only postponed the confrontation with the realities of floating-point approximations.
     
  10. subsonix macrumors 68040

    Joined:
    Feb 2, 2008
    #10
    In this case, when you only have two decimals, you can multiply the numbers by 100, do the addition, then divide by 100 before showing the result. This has some limitations, though: for example, you cannot use numbers so large that they don't fit within the precision of the type you use (53 significand bits for a double), or you will have the same problem.
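
    A rough sketch of that idea (note the rounding step when scaling, since the product of an approximated value and 100 may not land exactly on an integer; the +0.5 trick assumes non-negative input):

    Code:
    #include <cstdio>
    
    int main()
    {
        double n1 = 13.34, n2 = 14.34, n3 = 15.34;
        // Scale to integer hundredths, rounding away the tiny binary
        // error left over from the multiplication by 100.
        long long c1 = (long long)(n1 * 100.0 + 0.5);
        long long c2 = (long long)(n2 * 100.0 + 0.5);
        long long c3 = (long long)(n3 * 100.0 + 0.5);
        long long sum = c1 + c2 + c3;    // exact integer math from here on
        std::printf("Average: %.2f\n", sum / 300.0);
        std::printf("Above average: %d\n",
                    (3 * c1 > sum ? 1 : 0) + (3 * c2 > sum ? 1 : 0) + (3 * c3 > sum ? 1 : 0));
        return 0;
    }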
     
  11. gnasher729 macrumors P6

    gnasher729

    Joined:
    Nov 25, 2005
    #11
    But what do you get if you take a number very close to 13.34 and multiply it by 100? You get a number very close to 1334, but not necessarily exactly 1334. (As it happens, here you _will_ get exactly 1334.) Try this code:

    Code:
    #include <stdio.h>
    
    int main()
    {
        for (double x = 0; x < 10000; ++x)
        {
            double y = x / 100.0;
            double z = y * 100.0;
            if (z != x) printf ("%f\n", x);   // prints every round-trip failure
        }
    }
    and then you can try to explain the result.
     
  12. subsonix macrumors 68040

    Joined:
    Feb 2, 2008
    #12
    Closer than 0.01? If so, then the assumption about two decimals is out the door.
     
  13. ArtOfWarfare macrumors 604

    ArtOfWarfare

    Joined:
    Nov 26, 2007
    #13
    I would think the compiler would see that y is only used once, inline it to this:

    Code:
    double z = x / 100.0 * 100.0;
    Which then gets simplified down to:

    Code:
    double z = x;
    Possibly at that point it throws out all the shown code and just leaves a no-op or something.
     
  14. subsonix macrumors 68040

    Joined:
    Feb 2, 2008
    #14
    The idea here is to treat the number as a fixed point number, where in this case the precision is two decimals. Need more? Scale the number accordingly, but at the cost of lowering the upper bound by the same orders of magnitude. http://en.wikipedia.org/wiki/Fixed-point_arithmetic

    It's still not a representation with infinite precision, but at least within the precision we pick, the values are exact.
     
  15. Catfish_Man macrumors 68030

    Catfish_Man

    Joined:
    Sep 13, 2001
    Location:
    Portland, OR
    #15
    It can't do that, because it wouldn't get the same result, due to precision and such.

    Code:
    ...
    	divsd	%xmm2, %xmm1
    	mulsd	%xmm2, %xmm1
    ...
    
     
  16. gnasher729 macrumors P6

    gnasher729

    Joined:
    Nov 25, 2005
    #16
    That would mean the compiler is broken. In FORTRAN, the compiler is allowed to do optimisations that are mathematically equivalent. In C, the compiler is only allowed to do optimisations according to the "as if" rule, which means the optimised code must produce the same result as the original code.

    There are very rare cases, like x * 2.0 * 3.141592653589, where the compiler can prove that replacing this with x * (2.0 * 3.141592653589) will always give the identical result (but x * 2.0 * 0.1 couldn't be optimised because x * 2.0 could overflow).
     
  17. ArtOfWarfare macrumors 604

    ArtOfWarfare

    Joined:
    Nov 26, 2007
    #17
    This would suggest that you should always use fully simplified equations in your programs, even if it means reducing their readability... I've always heard that the compiler takes care of all the arithmetic it can... but I suppose I've never heard this from an authoritative source.
     
  18. chown33 macrumors 604

    Joined:
    Aug 9, 2009
    #18
    The C compiler is allowed to do constant subexpression optimization. The subexpression 2.0 * 0.1 is a constant, therefore the compiler can calculate that constant value, and replace the subexpression with that constant.

    This can all be easily verified by writing a test program. I suggest trying it to see what happens.
     
  19. gnasher729 macrumors P6

    gnasher729

    Joined:
    Nov 25, 2005
    #19
    Yes if you write x * (2.0 * 0.1).
    No if you write (x * 2.0) * 0.1.
    No if you write x * 2.0 * 0.1.

    Unoptimised, x * 2.0 * 0.1 will give x * 0.2 except when x is very large, where x * 2.0 gives infinity, and infinity * 0.1 is still infinity. The compiler can only optimise if it can guarantee the same result in _all_ cases.

    If you have a variable "long x" and calculate x * 2.0 * 0.1, the compiler _could_ combine 2.0 * 0.1 because x converted to double is never so large that x * 2.0 will overflow, but I doubt any compiler will be that clever.

    A compiler can't even replace (x == x) with 1 if x is double, because x == x is supposed to give 0 if x is Not-a-number.
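
    You can watch that last one happen (a tiny sketch):

    Code:
    #include <cstdio>
    #include <limits>
    
    int main()
    {
        double x = std::numeric_limits<double>::quiet_NaN();
        // Every ordered comparison involving NaN is false, including
        // x == x, so the compiler can't fold (x == x) to 1 for doubles.
        std::printf("x == x: %d\n", x == x ? 1 : 0);    // prints 0
        return 0;
    }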
     
  20. iSee, Jan 7, 2014
    Last edited: Jan 7, 2014

    iSee macrumors 68040

    iSee

    Joined:
    Oct 25, 2004
    #20
    But the subexpression 2.0 * 0.1 doesn't appear in the example you quoted... the expression was x * 2.0 * 0.1, which, from an evaluation standpoint, is like ((x * 2.0) * 0.1) -- there aren't really any constant subexpressions to be optimized... right?

    The other example, x * 2.0 * 3.14... is an interesting optimization, but I wouldn't call it a constant subexpression optimization. Not sure what I would call it... something like an "equivalent constant expression optimization" which would change x * 2.0 * 3.14... to x * (2.0 * 3.14...) which could then have a constant subexpression optimization applied. Interesting.

    edit: oh, took too long to reply... it's redundant now
    I'll try to contribute this instead
    As a rule of thumb, when writing code prefer readability over a potential optimization. A potential optimization is just another word for a premature optimization, which we know is the root of all programming evil. :) Later, in the (unlikely) case that profiling reveals that the evaluation of your readable expressions is a bottleneck, you can modify them.
     
  21. Qaanol, Jan 7, 2014
    Last edited: Jan 8, 2014

    Qaanol macrumors 6502a

    Joined:
    Jun 21, 2010
    #21
    You can display the bits of a value with the following macro (you'll either need to #include <limits.h> or replace CHAR_BIT with 8):

    Code:
    #define printbits(x) do { \
      unsigned char const *__macro_printbits_p = (unsigned char const *)&(x); \
      for(int __macro_printbits_ii = (int)(CHAR_BIT * sizeof(x)); __macro_printbits_ii --> 0;) \
        putchar( (__macro_printbits_p[__macro_printbits_ii / CHAR_BIT] \
                   >> (__macro_printbits_ii % CHAR_BIT)) & 1 ? '1' : '0' ); \
      }while(0)
    Use it like this: printbits(average);

    Also note, if you pass in a pointer you will get the bitrep of the pointer, not whatever it points to. And finally, I just wrote this off the top of my head, so it might have typos.

    Of course, you can always output the hex rep of a floating-point value with printf("%a", aDouble); and of an int value with printf("%x", anInt);

    Edit: updated macro to work with any size simple data type, including 80-bit long doubles.
     
  22. chown33 macrumors 604

    Joined:
    Aug 9, 2009
    #22
    I was thinking this was wrong, but after writing a test program, I realized you're right. I also remembered the reason behind a macro-writing rule of thumb.

    First, the code:
    fconst1.c
    Code:
    #include <stdio.h>
    #include <float.h>
    
    extern float fconst2( void );
    
    // Smaller than __FLT_MAX__, but big enough so doubling will overflow.
    #define BIG_F		3.2e+38F
    
    int main( int argc, char ** argv )
    {
    	float a = BIG_F;
    
    	float b = a * 2.0F * 0.1F;
    	float c = fconst2() * 2.0F * 0.1F;
    
    	float d = a * 0.1F * 2.0F;
    
    	float e = a * ( 2.0F * 0.1F );
    	float f = fconst2() * ( 2.0F * 0.1F );
    
    	printf( "a: %g  b: %g  c: %g\n", a, b, c );
    	printf( "d: %g  e: %g  f: %g \n", d, e, f );
    }
    fconst2.c
    Code:
    #include <stdio.h>
    #include <float.h>
    
    // Smaller than __FLT_MAX__, but big enough so doubling will overflow.
    // Also, different from BIG_F in fconst1.c
    #define BIG_2		3.1e+38F
    
    float fconst2( void )
    {  return BIG_2;  }
    
    The reason for a separate compilation-unit for fconst2() is to definitively prevent some optimizations.

    To compile:
    Code:
    gcc -std=c99  fconst1.c  fconst2.c
    
    The output:
    Code:
    a: 3.2e+38  b: inf  c: inf
    d: 6.4e+37  e: 6.4e+37  f: 6.2e+37 
    
    The results are exactly as you had predicted. Thanks.

    What I remembered was the reason behind writing macros that enclose the expansion in parentheses. That reason is to force expression evaluation order. Specifically, to force the expression within the parens to be evaluated before being combined with whatever expression the macro-name appears in.

    For example, if 2.0 * 0.1 were the expansion of a macro, and it's NOT in parens, it evaluates under the precedence rules of whatever operators are adjacent to it. When it IS in parens, i.e. (2.0 * 0.1), the sub-expression is evaluated first, and the result is then inserted into the larger expression.
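
    A tiny example of the difference (the macro names are hypothetical, purely for illustration):

    Code:
    #include <cstdio>
    
    #define SCALE_BAD   2.0F * 0.1F      // expansion not parenthesized
    #define SCALE_GOOD  (2.0F * 0.1F)    // parens make it a unit
    
    int main()
    {
        float x = 3.2e+38F;             // doubling this overflows a float
        float bad  = x * SCALE_BAD;     // x * 2.0F * 0.1F: (x * 2.0F) is inf
        float good = x * SCALE_GOOD;    // x * (2.0F * 0.1F): just x * 0.2F
        std::printf("bad: %g  good: %g\n", bad, good);
        return 0;
    }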
     
  23. Qaanol, Jan 7, 2014
    Last edited: Jan 7, 2014

    Qaanol macrumors 6502a

    Joined:
    Jun 21, 2010
    #23
    Yep, parens are important in macros, both around individual parameters (because someone might pass in arbitrary code) and the whole expansion (so it doesn't get combined with preceding/trailing code).

    Edit: the following refers to a previous version of my above macro, and is irrelevant now.

    Note that I put my whole for-loop in parens in the post before yours, because if I had omitted them and someone did something like this:

    #define showbits(x) (printbits(x), putchar('\n'))

    then the newline would (I think) be printed on every pass through the for-loop. But the way I wrote it with the parens, that showbits macro will work as expected, printing the bits then a newline.
     
  24. jtara macrumors 65816

    Joined:
    Mar 23, 2009
  25. ArtOfWarfare macrumors 604

    ArtOfWarfare

    Joined:
    Nov 26, 2007
    #25
    Why do you think a macro is necessary? Outside of things that can only be determined at compilation time, you shouldn't be using #define.
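
    For instance, a plain function template can do the same job (just a sketch of the idea, not a drop-in replacement):

    Code:
    #include <climits>
    #include <cstdio>
    #include <cstring>
    
    // Prints the bytes of any object, highest-addressed byte first,
    // which on little-endian machines (Intel Macs) means the most
    // significant byte comes out first.
    template <typename T>
    void printbits( const T & value )
    {
        unsigned char bytes[ sizeof(T) ];
        std::memcpy( bytes, &value, sizeof(T) );    // well-defined way to view the bits
        for ( int i = (int)(CHAR_BIT * sizeof(T)); i-- > 0; )
            std::putchar( (bytes[ i / CHAR_BIT ] >> (i % CHAR_BIT)) & 1 ? '1' : '0' );
        std::putchar( '\n' );
    }
    
    int main()
    {
        printbits( 14.34 );
        return 0;
    }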

    (We have gotten way off topic.)
     
