PDA

View Full Version : A self-programming program?




bguy
Jul 21, 2009, 06:51 PM
Is it possible to create a program that will update itself?

For an example of a use for such a program, here is a c++ function that will return Fibonacci number n:
unsigned long Fibonacci(unsigned long x)
{
unsigned long y;
switch (x)
{
case 0:
y = 0;
break;
case 1:
y = 1;
break;
case 2:
y = 1;
break;
default:
y = Fibonacci(x-1) + Fibonacci(x-2);
break;
}
return y;
}

It ends up being very slow as it is recursive, so I was wondering if it is possible for the program to check for a case x, and if it isn't there, add it, so the computer can speed up. So if you did this function with 15, it would add:

case 15:
y = 610;
break;

if it isn't already there.

If it is possible, how would it be done?



gnasher729
Jul 21, 2009, 07:19 PM
The function is not slow because it uses recursion, it is slow because it uses recursion in a very inefficient way.

Write a function to calculate f (n, a, b) = a * fib (n) + b * fib (n-1):

If n ≥ 2 then fib (n) = fib (n-1) + fib (n-2), so

a * fib (n) + b * fib (n-1) =
a * fib (n-1) + a * fib (n-2) + b * fib (n-1) =
(a + b) * fib (n-1) + a * fib (n-2) =
f (n-1, a+b, a).

unsigned long f (unsigned long n, unsigned long a, unsigned long b)
{
if (n == 0)
return 0;
else if (n == 1)
return a;
else
return f (n-1, a+b, a);
}

and then

unsigned long fib (unsigned long n)
{
return f (n, 1, 0);
}


Change the types except for n to long double, and you can easily calculate fib (5000).

kpua
Jul 21, 2009, 07:20 PM
The concept of self-modifying code exists, but it is an extremely hairy subject, at best.

The general idea of your optimization has merit though. There's no need to resort to code modification to implement it. There's a common technique for doing this called 'memoization', which is a way to cheaply recall the result of a function for a particular set of input values.

Google 'memoization' (together with 'fibonacci' if you want a concrete example) for more information.

@gnasher

You're right that his method of recursion was inefficient, but so is using plain recursion at all calling fib() multiple times. Your recursive implementation is O(n), where an implementation using memoization is amortized O(1) (assuming fib() is called approximately n times).

sushi
Jul 21, 2009, 07:27 PM
The concept of self-modifying code exists, but it is an extremely hairy subject, at best.
Agree.

When a program can modify itself, it can actually create it's own path which may not be the one that you want it to take. This is especially true if you allow the program to modify your algorithm during runtime.

Flynnstone
Jul 21, 2009, 11:58 PM
I've done this for speed up. Might be a bit large in this case, but fast.
Pre-compute the series and store in array.
ie unsigned long Fib[] = {0,1,1,2 etc}
Wise to bounds check.
So calculating Fib is just a lookup.

Cinder6
Jul 22, 2009, 12:33 AM
I've done this for speed up. Might be a bit large in this case, but fast.
Pre-compute the series and store in array.
ie unsigned long Fib[] = {0,1,1,2 etc}
Wise to bounds check.
So calculating Fib is just a lookup.

But you still have to go through the slow method to get those values.

Try this:

Make a large, empty array of unsigned longs. Set all the values to -1. Then set the three initial values of the sequence (0, 1, 1 for elements 0-2). After that, get rid of the switch statement. If the value that you're looking at in the array is != -1, then return that. Otherwise, calculate it and save it to the array, then return it.

This is considerably faster than the traditional recursion example most people get taught, and the program "learns" the values of the sequence, thus removing the requirement to recalculate them.

hydan
Jul 22, 2009, 06:29 AM
This is what your looking for using memoization


#!/usr/bin/perl

my %table;
$table{0} = 1;

sub fibonacci {
my $number = shift;
my $fibonacci;
if ($number < 2)
{ return $number; }
else {
if ( exists $table{$number} ) {
return $table{$number};
}
else {
$fibonacci = fibonacci($number - 1) + fibonacci($number - 2);
$table{$number} = $fibonacci;
}
}
return $fibonacci;
}


$var = fibonacci(15);
print $var;

Flynnstone
Jul 22, 2009, 06:52 AM
But you still have to go through the slow method to get those values.

memoization is good too.

Yes, but pre compute once.
What is the largest "n" to have the fib number fit in 32 bits?, 64 bits?

yeroen
Jul 22, 2009, 07:59 AM
Or simply avoid recursion altogether and use the fact that the Fibonacci difference equation F_n = F_n-1 + F_n-2 has a closed form solution for the nth term:

F_n = 1/sqrt(5) * [((( 1 + sqrt(5))/2)^n - (((1 - sqrt(5))/2)^n]

sqrt(5) ( == square root of 5), (1 + sqrt(5)) / 2, (1 - sqrt(5)) / 2 are all constants known beforehand, so all you have to compute are two exponentiations by n, 1 subtraction, and 1 multiplication. .

kpua
Jul 22, 2009, 09:55 AM
F_n = 1/sqrt(5) * [((( 1 + sqrt(5))/2)^n - (((1 - sqrt(5))/2)^n]


I was wondering when someone was going to bring that up. :-)

savar
Jul 22, 2009, 11:42 AM
Or simply avoid recursion altogether and use the fact that the Fibonacci difference equation F_n = F_n-1 + F_n-2 has a closed form solution for the nth term:

F_n = 1/sqrt(5) * [((( 1 + sqrt(5))/2)^n - (((1 - sqrt(5))/2)^n]

sqrt(5) ( == square root of 5), (1 + sqrt(5)) / 2, (1 - sqrt(5)) / 2 are all constants known beforehand, so all you have to compute are two exponentiations by n, 1 subtraction, and 1 multiplication. .

Beautiful.

Back to the original question: self-modifying code has been around for a long time, but on modern systems it's extremely rare.

One reason has to do with security. If a program can modify code in its memory-space, then it has greater potential to be exploited if there are any security vulnerabilities. So the modern OS's now make sure that code pages are read only, and data pages are marked as non-executable.

But even aside from the security (which can probably be worked around) -- self-modifying code is just incredibly difficult to maintain. Sure, you might right some crazy awesome self-modifying algorithm now, but 6 months from now when you sit down to tweak it, you'll have forgotten how it works. And heaven forbid you give the code to somebody else for them to try to understand.

Flynnstone
Jul 22, 2009, 12:12 PM
Or simply avoid recursion altogether and use the fact that the Fibonacci difference equation F_n = F_n-1 + F_n-2 has a closed form solution for the nth term:

F_n = 1/sqrt(5) * [((( 1 + sqrt(5))/2)^n - (((1 - sqrt(5))/2)^n]

sqrt(5) ( == square root of 5), (1 + sqrt(5)) / 2, (1 - sqrt(5)) / 2 are all constants known beforehand, so all you have to compute are two exponentiations by n, 1 subtraction, and 1 multiplication. .

And there folks is a great reason why you don't drop into using assembler to speed things up. Find a better algorithm first.

jared_kipe
Jul 22, 2009, 12:39 PM
Or simply avoid recursion altogether and use the fact that the Fibonacci difference equation F_n = F_n-1 + F_n-2 has a closed form solution for the nth term:

F_n = 1/sqrt(5) * [((( 1 + sqrt(5))/2)^n - (((1 - sqrt(5))/2)^n]

sqrt(5) ( == square root of 5), (1 + sqrt(5)) / 2, (1 - sqrt(5)) / 2 are all constants known beforehand, so all you have to compute are two exponentiations by n, 1 subtraction, and 1 multiplication. .

Several things spring to mind.
1, your equation has a couple extra ('s in it.
2, This is less accurate at lower n values.
for example, F_4 = 3.065 .... F_6 = 12.985 ... F_10 = 55.004

Your assumption that this would be faster is possibly not true.
IF someone wrote the poster's iteratively then worst case run time would be just the time to make n additions.

IF someone wrote yours there is only one function, but you raise a float ~1.618 to the n, so it would be bound by how fast your pow() function is.
defined in math.h
double pow( double base, double exp );

If you didn't have this function, Naively you would write a function like this.

double myPow( double base, int exp ) {
register int i; // you don't need the register part, but it 'might' make it faster
register double result = 1.0;

for ( i=0; i < exp; i++ ) {
result = result * base;
}

return (result);
}


This would be like N multiplications, and N assignments. , but you did it twice, once for (1+sqrt(5))/2 and (1-sqrt(5))/2

SO assuming that a computer can do N multiplications as fast as N additions (probably not true), your program would be twice as "slow" as the iterative addition program. ( I think, I've never had a formal programming or algorithm class, but I have watched some MIT OCW on the subject)

jared_kipe
Jul 22, 2009, 01:09 PM
iterative version of an addition based Fibonacci program

int fibonacci( int n ) {
register int i;
register int result1 = 0;
register int result2 = 1;
register int temp;

for ( i=0; i < n ; i++ ) {
temp = result2;
result2 = result1 + result2;
result1 = temp;
}

return (result2);
}


N additions, and 3N assignments

jared_kipe
Jul 22, 2009, 04:03 PM
I just wrote a program that illustrates it.

Source included. Here is a screen grab.


Basically you can specify the Maximum N for Fib_n, but remember that they are all ints in here so 200 is about as high as you could want to go.

And the number of random times it tries. Since the numbers are random, you have to do a lot to get it to be "fair" between the two. (we'll call this X)

It then makes two objects, one calculates X Fib_n numbers with my additive algorithm. And once with my naive Pow function and those darn floats specified by the above sqrt(5) functions.



You'll see that the Pow function takes a little over twice as long in my simulations... with 50million random Fib numbers.

EDIT: I should have made them unsigned, but oh well
EDIT2: the ratio is pretty close to 2.3x I have run it at 500mil and gotten 97s and 232s

lee1210
Jul 22, 2009, 04:49 PM
I think bguy got something quite different than he was expecting. The real answer to his question is that self-modifying code is not the right answer here. Memorization is a fine approach that approximates what he was wanting to achieve. Ultimately, though... generating Fibonacci numbers like this is being done as an exercise.

If you just wanted to be able to display any particular element of the series, or list all of them up to a certain point, you could just store all of them (you can easily find lists online) in an array literal in your program (either const char *s or long ints, you can certainly display more with the prior) and get them in constant time with no more computation than that of your array offset (base + sizeof(element)*offset).

-Lee

jared_kipe
Jul 22, 2009, 05:06 PM
I have rewritten it to include a new class named FibPowC, guess what? it replaces my naive pow function with the math.h pow function.

It is much faster than my bad power function. In fact it is even faster than my add function.

Goes to show, your algorithm is only as good as its weakest link.

Flynnstone
Jul 22, 2009, 05:32 PM
I've heard (and believe) that programmers aren't very good at predicting host spots in code. So this is good.
We need to measure. Like what Jared is doing.
What to add a index lookup very to your code?

jared_kipe
Jul 22, 2009, 11:27 PM
I was WAY off on how high F_n you could go before overflowing unsigned ints. And the C pow function.
Using this simple C, I printf the numbers. To verify accuracy, and where they over flow. FibAdd and FibPow overflow at 45 or 46, FibPowC goes one further and then gives infinity.


#include <stdio.h>
#include <math.h>

unsigned int FibAdd ( int n ) {
int i;
unsigned int result1 = 0;
unsigned int result2 = 1;
unsigned int temp;

for ( i=0; i < n ; i++ ){
temp = result2;
result2 = result2 + result1;
result1 = temp;
}
printf("\n %u", result2);
return(result2);
}
unsigned int myPow (double base, int exp){
int i;
double result = 1.0;
for ( i=0; i <exp; i++ ) {
result = result * base;
}

return((unsigned int)result);
}
unsigned int FibPow( int n ) {
unsigned int result;
result = (unsigned int)round(0.4472135955 * ( myPow( 1.61803398875, n ) - myPow( -0.61803398875, n ) ));
printf("\n %u", result);
return(result);
}

unsigned int FibPowC( int n ) {
unsigned int result;
result = (unsigned int)round(0.4472135955 * ( pow( 1.61803398875, n ) - pow( -0.61803398875, n ) ));
printf("\n %u", result);
return(result);
}

int main (int argc, const char * argv[]) {
register int i;

//
//Just change the function in here between FibAdd, FibPow, and FibPowC
//FibAdd( i ); or FibPow( i+1 ); or FibPowC( i+1 );
for ( i=0; i<46; i++ ) {
FibPow( i+1 );
}
printf("\n");
return 0;
}


Here are some plots of the data points I've taken. Each n is the degree of Fibonacci(n), and the plots are in seconds. 500 million random F_n's are calculated using each method.

The order is FibAdd, FibPow, FibPowC. It looks like FibPowC will certainly be a better function in the long haul, but has much more function overhead. Over the range of that is relevant using unsigned ints (N<46) FibAdd wins hands down.

FibAdd = 0.83n + 12.31
FibPow = 2.02n + 37
FibPowC = 0.37n + 47.78

EDIT: I'd just like to take a moment and say that my hypothesis of Nadditions being about twice as fast as 2Nmutiplications seems correct, that was I guess was the point of all my nonsense today.

jared_kipe
Jul 23, 2009, 12:36 AM
Added table lookup class. Basically the table lookup class populates an array of unsigned int's with the sequential values from an instance of FibAdd (before the timer is started for the record). Then the method fib_n: (int) n, just returns the value from the n'th array.

Unsurprisingly it is by far the fastest. But doesn't make sense for anything larger than F_45, as you couldn't hold it in an unsigned int. Thus you'd have to make arrays of larger memory foot print things.

gnasher729
Jul 23, 2009, 02:46 AM
You can save about half the time by changing the calculation to

result = (unsigned int)round(0.4472135955 * pow( 1.61803398875, n));


Here is why: The second power 0.61803398875 ^ n is never greater than 1; it is largest when n = 0 where it equals 1. This is multiplied by 0.4472135955, so by omitting it you change the result by at most 0.447.

But we _know_ that the correct result is an integer, and you round the floating point number to the nearest integer anyway. If you change to floating point result by 0.447, you will still round to the same integer result. For example, when n = 5, you end up calculating round (4.9596747752) instead of say round (4.99999999999999999). The result is still the same, 5 in each case.

jared_kipe
Jul 23, 2009, 10:46 AM
That is certainly true, I was just taking the algorithm proposed verbatim.

I'm inclined to prefer "perfect" algorithms, or algorithms that calculate it exactly, over those that are approximations at high n and so forth. They are also usually easier to adjust for higher n values.

Lets say we wanted to write a new fixed precision number to account for much higher N values. We would have to write a new Pow function, and a new round function, in addition to the primatives +*. FibAdd would only need some kind of addition primitive, and if the numbers were objects would be trivial to rewrite with message sending. (for instance NSDecimalNumber has addition but no exponential, pow or sqrt functions -- so you would have to add them with a category)

I changed the algorithm in this test program to a single pow call for FibPow and FibPowC.

lee1210
Jul 23, 2009, 10:59 AM
Added table lookup class. Basically the table lookup class populates an array of unsigned int's with the sequential values from an instance of FibAdd (before the timer is started for the record). Then the method fib_n: (int) n, just returns the value from the n'th array.

Unsurprisingly it is by far the fastest. But doesn't make sense for anything larger than F_45, as you couldn't hold it in an unsigned int. Thus you'd have to make arrays of larger memory foot print things.

It would take more memory (a lot more) but if we're going for speed and the ability to go very high, you can store results in char *s instead of unsigned long int (switching to that from the current int would buy you something without a huge increase in memory). If you're propagating programatically for testing, you'd need fixed widths you can snprintf into, so you'd pick a maximum number of digits and use that for the widths. If you were going to just have a table of char * literals you generated at compile time, this wouldn't be an issue.

-Lee

jared_kipe
Jul 23, 2009, 11:39 AM
On a side note, a while ago I wrote a program that takes an NSString and does the math inside it. I never wanted it to be so accurate, but check out the highest it can go without hitting infinity.

Screen grab of that program, and of Wolfram Alpha doing the same calculation.

jared_kipe
Jul 23, 2009, 12:39 PM
using unsigned long long int in FibAdd( int n ) function in my simple c program showed me that it overflows at 12200160415121876738 which is either F_93 or F_94 depending on which table you use.

I cannot think of a way to extend this accuracy to FibPow or FibPowC using standard data types and retaining accuracy.

Treating the data type as double the whole way of course gets us higher "looking" numbers. But at the same depth returns ...
12200160415196712960
off by 75million.

Mac Player
Jul 23, 2009, 05:54 PM
@gnasher
Your recursive implementation is O(n), where an implementation using memoization is amortized O(1) (assuming fib() is called approximately n times).

Assuming you know the largest m for which fib() will be called.