Trying to store a huge number while maintaining exact digits

Discussion in 'Mac Programming' started by jsmwoolf, Sep 20, 2011.

  1. jsmwoolf macrumors regular

    Joined:
    Aug 17, 2011
    #1
    In Project Euler, I've been trying to do two problems that appear to have numbers beyond the storing capacity of an int. The thing is that they require exact accuracy in order to correctly solve a problem. I've been thinking of using a string to store digit by digit in order to solve this. However, the part that I'm stumped at is how could I store/calculate such a number in the first place?

    I was thinking of using the modulo of each digit and store it as a char to convert to a string.

    I suppose you could do something like this in order to grab a digit:

    Code:
    digit = number % 10^x; //Java
    or
    Code:
    digit = number % pow(10,x); //C, C++, Objective-C
    Where x is the power or the where you would grab the digit, for example, in the hundred or thousand. However, it appears that when trying to solve a problem that involved grabbing digits from the thousands, I couldn't correctly formulate such.

    So the question is what's the best approach to this? I don't want a posted code that can actually do it or an external library, as I wish to figure this out without using a 3rd party library. Procedures and advice is what I'm looking for.
     
  2. lee1210 macrumors 68040

    lee1210

    Joined:
    Jan 10, 2005
    Location:
    Dallas, TX
    #2
    My question is what int are you extracting the digits from? If it fits in an int, you don't need a different representation. Traditionally this might be done with % 10, then / 10 to shift, rinse, repeat. Will a long int still fail to store the size needed?

    -Lee
     
  3. basher macrumors 6502

    basher

    Joined:
    May 27, 2011
    Location:
    Glendale, AZ USA
    #3
    In java you use Math.Pow(x, y) to raise to an exponent.

    ^ is used for bitwise exclusive OR operations.
     
  4. firestarter macrumors 603

    firestarter

    Joined:
    Dec 31, 2002
    Location:
    Green and pleasant land
    #4
    If you're using Java, why don't you use a BigInteger class to store this?

    The beauty of OO languages with operator overriding is that you don't have to rely on base types for storage - you can create a class that behaves similarly but has the additional properties you need (in this case, an integer of any length).
     
  5. jsmwoolf thread starter macrumors regular

    Joined:
    Aug 17, 2011
    #5
    The first number is from a power(2^1000) and the second from a factorial(100!). And yes, it'll still fail with a long int. While a double can store a number, you can't solve the problem because a double rounds after a certain number and the numbers that I need to work on are way above that limit.

    My bad. However, can Java even support bitwise?

    I'm using C or C++. Besides, I prefer the practice.
     
  6. chown33, Sep 20, 2011
    Last edited: Sep 20, 2011

    chown33 macrumors 604

    Joined:
    Aug 9, 2009
    #6
    Google is your friend: java bitwise operators

    It has all the same bitwise operators as C: & | ^ ~ << >>.
    It also adds an unsigned shift operator: >>>


    I suggest reading up on arbitrary-precision arithmetic.
     
  7. jsmwoolf thread starter macrumors regular

    Joined:
    Aug 17, 2011
    #7
    Code:
    /*Constant Limit = 1000;           %Sufficient digits.
    Constant Base = 10;              %The base of the simulated arithmetic.
    Constant FactorialLimit = 365;   %Target number to solve, 365!
    Array digit[1:Limit] of integer; %The big number.
    Integer carry,d;                 %Assistants during multiplication.
    Integer last,i;                  %Indices to the big number's digits.
    Array text[1:Limit] of character;%Scratchpad for the output.
    Constant tdigit[0:9] of character = ["0","1","2","3","4","5","6","7","8","9"];
    BEGIN
    digit:=0;                       %Clear the whole array.
    digit[1]:=1;                    %The big number starts with 1,
    last:=1;                        %Its highest-order digit is number 1.
    for n:=1 to FactorialLimit do   %Step through producing 1!, 2!, 3!, 4!, etc. 
    carry:=0;                      %Start a multiply by n.
    for i:=1 to last do            %Step along every digit.
    d:=digit[i]*n + carry;        %The classic multiply.
    digit[i]:=d mod Base;         %The low-order digit of the result.
    carry:=d div Base;            %The carry to the next digit.
    next i;
    while carry > 0                %Store the carry in the big number.            
    if last >= Limit then croak('Overflow!'); %If possible!
    last:=last + 1;               %One more digit.
    digit[last]:=carry mod Base;  %Placed.
    carry:=carry div Base;        %The carry reduced.
    Wend                           %With n > Base, maybe > 1 digit extra.
    text:=" ";                     %Now prepare the output.
    for i:=1 to last do            %Translate from binary to text.
    text[Limit - i + 1]:=tdigit[digit[i]]; %Reversing the order.
    next i;                        %Arabic numerals put the low order last.
    Print text," = ",n,"!";        %Print the result!
    next n;                         %On to the next factorial up.
    END;*/
    
    #include <stdio.h>
    
    int main (int argc, const char * argv[])
    {
        int limit = 1000;
        int base = 10;
        int FactorialLimit = 365;
        int digit[limit];
        int carry,d;
        int last,i;
        char text[limit];
        char tdigit[10] = { "1", "2", "3", "4", "5", "6", "7", "8", "9" };
        
        //Now the procedure
        for(int x = 0; x < limit; x++)
        {
        digit[x]=0;
        }
        digit[1]=1;
        last=1;
        for(int n = 1; n <= FactorialLimit; n++)
        {
            carry = 0;
            for(i = 1; i <= last;i++)
            {
                d=digit[i]*n + carry;
                digit[i] = d % base;
                carry = d/base;
                while(carry > 0)
                {
                    if(last >= limit)
                    {
                        printf("Overflow!\n");
                        last++;
                        digit[last] = carry % base;
                    }
                }
                text[limit - i + 1] = tdigit[digit[i]];
            }
            printf("%d!\n",n);
        }
        return 0;
    }
    I copied the example on how to do this. However, it appears to show up to 3! and then pretty much stops displaying results. First off, did I correctly interoperate this?
     
  8. chown33, Sep 20, 2011
    Last edited: Sep 20, 2011

    chown33 macrumors 604

    Joined:
    Aug 9, 2009
    #8
    What errors did you get? Post the actual error message.


    When I compile your code with this command:
    Code:
    gcc arith.c
    
    I get a series of errors like this:
    Code:
    arith.c: In function 'main':
    arith.c:45: error: excess elements in char array initializer
    arith.c:45: error: (near initialization for 'tdigit')
    
    So I can't even get this code to compile, much less run for 3!.


    Looking at line 45, we see this:
    Code:
        char tdigit[10] = { "1", "2", "3", "4", "5", "6", "7", "8", "9" };
    
    Your initializers are strings, not chars. In C, double-quotes means a string literal. If you want a character literal, you need single quotes.

    Also, that array doesn't correspond at all to the array in the algorithm, which is this:
    Code:
    Constant tdigit[0:9] of character = ["0","1","2","3","4","5","6","7","8","9"];
    
    There's an important difference, in what the first item in the array is.

    There's a more serious problem here:
    Code:
                    if(last >= limit)
                    {
                        printf("Overflow!\n");
                        last++;
                        digit[last] = carry % base;
                    }
    
    It seems you don't understand what the algorithm's croak() function does, nor the scope of the conditional.

    That's not the only place you've misplaced the scope of a block. You have errors in at least one of the for loops, too.


    I think you should revisit the basics of C. You can't expect to type in algorithms using the syntax of two different languages. You have to understand what the algorithm is telling you, in the language it's expressed in, then translate that into the correct C expressions in the C language.

    Frankly, I see no reason why you can't do this exercise using Java. There's a minor difference in how arrays are allocated or initialized, but Java has the same looping constructs, the same integer types, and the same arithmetic operators as C does. You might get more informative error messages, too.

    Finally, I think you should practice by writing simpler programs, and then using the debugger on them. Learning to use the debugger is a crucial skill. It's a lot easier to do that on simple programs than on complex ones.
     
  9. subsonix macrumors 68040

    Joined:
    Feb 2, 2008
    #9
    Have you given up on the pow function? Anyway, 2 ^ 1000 would require a 1000 bit int. An int on OS X is 2 ^ 32.
     
  10. foidulus macrumors 6502a

    Joined:
    Jan 15, 2007
    #10
    No, sizeof(int) tends not to be OS-specific, it is specific to the architecture you are running. On OS X ints are 32 bits(plus even for 32 bit ints, unless you are using unsigned the max value is (2^31-1). But on x86-64 the 64 bits(2^63-1)
     
  11. subsonix macrumors 68040

    Joined:
    Feb 2, 2008
    #11
    Sure but what 16 bit hardware runs OS X? All ints are 32 bits even signed, it's just that half of it's capacity is used for negative numbers.

    ints are 32 bits even on 64 bit hardware, at least on OS X if I'm not mistaken.
     
  12. ghellquist macrumors regular

    Joined:
    Aug 21, 2011
    Location:
    Stockholm Sweden
    #12
    Don´t assume, make sure

    Never assume that int is any specific size. The definition in C is very loose, saying something like "the most natural size on the machine". It could be 16, 18, 24, 32, 36, 48 or 64 bits. It might differ between different compilers on the same hardware. There are even hardware where you can run it in different modes and hence the size becomes different.
    // gunnar
     
  13. jiminaus, Sep 21, 2011
    Last edited: Sep 21, 2011

    jiminaus macrumors 65816

    jiminaus

    Joined:
    Dec 16, 2010
    Location:
    Sydney
    #13
    The size of an int is a function of operating system, hardware, and compiler.

    Consider x86_64 when compiling with gcc. Under Mac OS X the LP64 model is used, so ints are 32bits. On the same hardware, under Windows the ILP64 model is used, so ints are 64bit.

    Edit: Sorry the paragraph is wrong in that Windows is LLP64, so an int still remains 32bit. But if there was an ILP64 OS on x86_64, then the paragraph above would still be true.
     
  14. subsonix macrumors 68040

    Joined:
    Feb 2, 2008
    #14
    Very good point, if the size is important I often use stdint.h which have types like: int32_t, int16_t and so on.

    ----------

    The reason is most likely that a change for a native int would lead to all sorts of breakage in old code, due to the assumption mentioned in the post above yours.
     
  15. gnasher729 macrumors P6

    gnasher729

    Joined:
    Nov 25, 2005
    #15
    Advice: When you can't solve a problem, don't ask how to solve a subtask that you think is needed to solve the problem; ask how to solve the problem itself. Now with Project Euler the whole point is of course that _you_ solve the problem, but you can probably get some hints here. At least for the first 200 or so problems. And most of the time there are some obvious but very difficult subtasks that you just avoid using a bit of mathematics. For example, what are the last 100 digits of 500! (500 factorial), without using a calculator?
     
  16. tachoknight, Sep 21, 2011
    Last edited: Sep 21, 2011

    tachoknight macrumors newbie

    Joined:
    Sep 21, 2011
    #16
    Gmp

    Hi-

    Wanted to mention that I've done this several times in C and C++ programs using GNU MP library. Internally the numbers are stored as arrays of strings. Some of my programs have numbers that are a thousand digits or larger and never had a problem.

    That said, working with those kinds of numbers leads to other issues, mostly in the calculation phase; using GCD was a huge win for me in cutting down programs that took 12+ hours (single threaded) to around 45-50 minutes (spread out over a 16 procs on a Mac Pro).

    Hope this helps anyone who comes across this thread; this library is one more hidden gems of the GNU software library.
     
  17. gasman23 macrumors newbie

    Joined:
    Sep 21, 2011
    #17
    I also came across this problem with some of the questions in project Euler. I found a couple of big integer libraries online that looked like they would work, but I thought that was almost cheating and kind of defeated the purpose of project Euler.

    The solution I came up with was to place the numbers into a basic c-array of integers. For example, the number 165 would be represented as x[5,6,1]. 5 goes in the 0th position, 6 in the 1st, and 1 in the 2nd. Any integer can be represented by breaking it down into its n * 10^x, with x being it's position in the array. I then did arithmetic (+, -, *, /) on each element between the two arrays carrying digits to the next element as needed. I am sure there are much more elegant (and memory friendly!) solutions to this problem, but using this methodology I was able to calculate very large integers (e.g. 300!) in a matter of a few seconds.

    This solution worked well enough that I made my own Obj-C BigInteger class out of it (and rudimentary iPhone big integer calculator), passing it NSStrings representing the numbers, then converting them to c-arrays to do the calculations, then returning a NSString as the result.
     
  18. Sydde macrumors 68020

    Sydde

    Joined:
    Aug 17, 2009
    #18
    Binary math is just as extensible as decimal math, the numbers are more compact (than string arrays) and the math is a bit faster. In another distant century, I wrote a simple 32bit integer calculator on a true 8 bit CPU (6502) using extended binary math that I wrote myself. The same processes you use to do arithmetic on paper (if they still teach that) work for bits and bytes, in the same way.

    The difficult and time consuming part of handling really large numbers is converting them to a string, but for integers, the process is actually quite straightforward. Once you have your abitrarliyHugeInteger division working, just use it to take the modulus of 10 repeatedly to build the decimal representation from right to left. If you use NSMutableString, building the decimal output can be as easy as inserting each decimal digit at index 0 as you go. Otherwise, just estimate the string size by dividing the bit count (bit position of the highest order one) by 3 and adding 1 (plus the zero terminator) — then, when you get to the last digit character, pass that location as the pointer to your output string.
     
  19. chown33 macrumors 604

    Joined:
    Aug 9, 2009
    #19
    Also see this algorithm:
    http://en.wikipedia.org/wiki/Double_dabble

    And in 6502 assembler (or other CPUs), it's even easier, due to the DAA instruction. You just implement a "BCD shift register", such as here:
    http://www.doulos.com/knowhow/vhdl_designers_guide/models/binary_bcd/

    I remember writing BCD-shift-register code in everything I got my hands on: 8080, Z80, 6800, 6502, 8051, 68k, etc. The 68k was interesting, because it has actual BCD-arithmetic opcodes.
     
  20. jsmwoolf thread starter macrumors regular

    Joined:
    Aug 17, 2011
    #20
    Thanks for the correction. I usually don't work with chars and I was working in Java for a while so I forgot.

    I got stumped on this part when trying to rewrite it.
    Code:
    carry:=d div Base;            %The carry to the next digit.
    next i;
    while carry > 0                %Store the carry in the big number.            
    if last >= Limit then croak('Overflow!'); %If possible!
    last:=last + 1;               %One more digit.
    digit[last]:=carry mod Base;  %Placed.
    carry:=carry div Base;        %The carry reduced.
    Wend                           %With n > Base, maybe > 1 digit extra.
    text:=" ";                     %Now prepare the output.
    for i:=1 to last do            %Translate from binary to text.
    text[Limit - i + 1]:=tdigit[digit[i]]; %Reversing the order.
    next i;
    I didn't know where to put the while statement and I didn't understand why there's a second for(i = 1; i <= last; i++). I also didn't understand the Wend part as well. I thought crock was equaled to printf because it contained a string in it. So, I was quite confused when writing the rest of it.
     
  21. chown33, Sep 21, 2011
    Last edited: Sep 21, 2011

    chown33 macrumors 604

    Joined:
    Aug 9, 2009
    #21
    Odd, because Java has exactly the same syntax as C: double-quotes is a string literal, single-quotes is a char literal (always UTF-16 Unicode in Java).


    Here's the original algorithm's code, verbatim from the Wikipedia article:
    Code:
    Constant Limit = 1000;           %Sufficient digits.
    Constant Base = 10;              %The base of the simulated arithmetic.
    Constant FactorialLimit = 365;   %Target number to solve, 365!
    Array digit[1:Limit] of integer; %The big number.
    Integer carry,d;                 %Assistants during multiplication.
    Integer last,i;                  %Indices to the big number's digits.
    Array text[1:Limit] of character;%Scratchpad for the output.
    Constant tdigit[0:9] of character = ["0","1","2","3","4","5","6","7","8","9"];
    BEGIN
     digit:=0;                       %Clear the whole array.
     digit[1]:=1;                    %The big number starts with 1,
     last:=1;                        %Its highest-order digit is number 1.
     for n:=1 to FactorialLimit do   %Step through producing 1!, 2!, 3!, 4!, etc. 
      carry:=0;                      %Start a multiply by n.
      for i:=1 to last do            %Step along every digit.
       d:=digit[i]*n + carry;        %The classic multiply.
       digit[i]:=d mod Base;         %The low-order digit of the result.
       carry:=d div Base;            %The carry to the next digit.
      next i;
      while carry > 0                %Store the carry in the big number.            
       if last >= Limit then croak('Overflow!'); %If possible!
       last:=last + 1;               %One more digit.
       digit[last]:=carry mod Base;  %Placed.
       carry:=carry div Base;        %The carry reduced.
      Wend                           %With n > Base, maybe > 1 digit extra.
      text:=" ";                     %Now prepare the output.
      for i:=1 to last do            %Translate from binary to text.
       text[Limit - i + 1]:=tdigit[digit[i]]; %Reversing the order.
      next i;                        %Arabic numerals put the low order last.
      Print text," = ",n,"!";        %Print the result!
     next n;                         %On to the next factorial up.
    END;
    
    I've omitted the bold hilite on keywords like for, while, wend, because it doesn't copy/paste over. However, it doesn't change the problem.

    Step 1 to solving it is to find the start and end of each block. A "block" means the body of a loop or the body of an if conditional. I suggest simply adding blank lines:
    Code:
    Constant Limit = 1000;           %Sufficient digits.
    Constant Base = 10;              %The base of the simulated arithmetic.
    Constant FactorialLimit = 365;   %Target number to solve, 365!
    Array digit[1:Limit] of integer; %The big number.
    Integer carry,d;                 %Assistants during multiplication.
    Integer last,i;                  %Indices to the big number's digits.
    Array text[1:Limit] of character;%Scratchpad for the output.
    Constant tdigit[0:9] of character = ["0","1","2","3","4","5","6","7","8","9"];
    BEGIN
     digit:=0;                       %Clear the whole array.
     digit[1]:=1;                    %The big number starts with 1,
     last:=1;                        %Its highest-order digit is number 1.
    
     for n:=1 to FactorialLimit do   %Step through producing 1!, 2!, 3!, 4!, etc. 
      carry:=0;                      %Start a multiply by n.
    
      for i:=1 to last do            %Step along every digit.
       d:=digit[i]*n + carry;        %The classic multiply.
       digit[i]:=d mod Base;         %The low-order digit of the result.
       carry:=d div Base;            %The carry to the next digit.
      next i;
    
      while carry > 0                %Store the carry in the big number.  
              
       if last >= Limit then croak('Overflow!'); %If possible!
    
       last:=last + 1;               %One more digit.
       digit[last]:=carry mod Base;  %Placed.
       carry:=carry div Base;        %The carry reduced.
      Wend                           %With n > Base, maybe > 1 digit extra.
    
    [COLOR="Blue"]  text:=" ";                     %Now prepare the output.
    
      for i:=1 to last do            %Translate from binary to text.
       text[Limit - i + 1]:=tdigit[digit[i]]; %Reversing the order.
      next i;                        %Arabic numerals put the low order last.
    
      Print text," = ",n,"!";        %Print the result!
    [/COLOR] next n;                         %On to the next factorial up.
    
    END;
    
    Note that next marks the end of a for loop, equivalent to the closing } of a for loop in C.

    If you want to take the next step, surround each subsection with { }, so it's very clear where the ranges of the loops and conditionals are.

    I will also tell you that the C code for printing the output isn't going to look anything at all like the algorithm's code for output, which I've hilited in blue. Yes, you can do the "digits in reverse" thing, but the code itself will look very different. In Java, you'd probably just use a StringBuffer and its reverse() method, but in C you get to make it yourself, or try finding a library function.


    As far as understanding what croak() means, you have to know it's a colloquialism for "die" or "death", and strongly suggests termination. If something terminates, the usual implication is that it's no longer doing anything, or is certainly no longer doing what it did before. So croak() implies terminating the process with a failure condition. See the C function abort(), which doesn't have a message parameter, so try writing croak(msg) yourself using, say, fprintf() and abort().

    The abort() function is defined by a prototype in <stdlib.h>. You might want to look at that header for other useful functions, just as I would suggest looking in stdio.h for standard I/O functions. It's basic C, and it's important to know how basic C works, and what functions are available. For one thing, it helps avoid reinventing the wheel.
     
  22. jsmwoolf thread starter macrumors regular

    Joined:
    Aug 17, 2011
    #22
    I was working in strings, not chars(although while writing my calculator app, I used char to display capital letters with numbers in a for loop).

    I guess the Wend is the end of the while loop.

    I did remember the abort() command from reading a book, but I never used it because I don't write code that's really complicated to the point that I would need it. Now, I would use the exit command.

    Code:
    #include <stdio.h>
    #include <stdlib.h>
    
    int main (int argc, const char * argv[])
    {
        int limit = 1000;
        int base = 10;
        int FactorialLimit = 365;
        int digit[limit];
        int carry,d;
        int last,i;
        char text[limit];
        char tdigit[10] = { '1', '2', '3', '4', '5', '6', '7', '8', '9' };
        
        //Now the procedure
        for(int x = 0; x < limit; x++)
        {
        digit[x]=0;
        }
        digit[1]=1;
        last=1;
        for(int n = 1; n <= FactorialLimit; n++)
        {
            carry = 0;
            for(i = 1; i <= last;i++)
            {
                d=digit[i]*n + carry;
                digit[i] = d % base;
                carry = d/base;
            }
            while(carry > 0)
            {
                if(last >= limit)
                {
                    abort();
                    last++;
                    digit[last] = carry % base;
                }
            }
            for(i = 1; i <= last; i++)
            {
                            text[limit - i + 1] = tdigit[digit[i]];
            }
            printf("%d!\n",n);
        }
        return 0;
    }
    Now would this be more exact from the pseudo code?
     
  23. chown33, Sep 21, 2011
    Last edited: Sep 21, 2011

    chown33 macrumors 604

    Joined:
    Aug 9, 2009
    #23
    Does it compile?

    Does it run?

    Does it produce the correct results?

    If it doesn't do any of those things, then you have a problem. So look at what I wrote before, and see if you can find the problem(s). You may have to use a debugger.

    The first thing that strikes me is you still don't have tdigit's initializers right. The first element of the array should be what?
    Code:
    Constant tdigit[0:9] of character = [[COLOR="Red"]"0"[/COLOR],"1","2","3","4","5","6","7","8","9"];
    
    That's not the only problem. Several other problems I wrote about before are still present. For example:
    Code:
            while(carry > 0)
            {
                if(last >= limit)
                {
                    abort();
                    last++;
                    digit[last] = carry % base;
                }
            }
    
    Under what conditions will this loop end?

    If you replace abort() with exit(1), as you suggested, does it make sense to have any code after an exit()? Does abort() return? Does exit() return?

    It's written like you're trying to translate the code without having to understand what it's doing. That's backwards. You need to study it, and understand what it's doing first, or you won't be able to translate it correctly.
     
  24. gnasher729 macrumors P6

    gnasher729

    Joined:
    Nov 25, 2005
    #24
    Try a simple case by hand.
    Start with the number 1, and multiply it by 123456, which should give a result of 123456.

    The first part of the loop should write the digit '6' and be left with a carry of 12345.

    Your code should then write the digit '5' and change the carry to 1234, write the digit '4' and change the carry to 123, write the digit '3' and change the carry to 12, write the digit '2' and change the carry to 1, and finally write the digit '1'. Is that what it does?
     
  25. firestarter, Sep 22, 2011
    Last edited: Sep 22, 2011

    firestarter macrumors 603

    firestarter

    Joined:
    Dec 31, 2002
    Location:
    Green and pleasant land
    #25
    jsmwoolf - as a newbie coder you might want to try improving your coding style a bit.

    While everything you've written is clearly formatted and not bad, you don't have any comments, and you're keen on single character loop variable names. ('now the procedure' is a wasted comment which says nothing about what you're doing or how you're doing it).

    You would be doing yourself a big favour if you swapped out all those single characters for meaningful names. Adding single line comments like 'reset the array to 000...0001', 'percolate carry up through the result, until zero' etc. not only make code more readable, they help you spot bugs.

    This may be a small piece of code that is easy to take in in one go - but as you write more complex stuff, and find yourself coming back to stuff you wrote a while back, you'll appreciate how much easier it is to read better written code.

    The secret of good coding; the reason for using procedures, existing libraries and objects is to try and hide complexity from our puny brains in order to remove distraction and allow us to be more effective. Bear that in mind.
     

Share This Page