Variance Method Question

Discussion in 'Mac Programming' started by soccersquirt82, Sep 1, 2008.

  1. macrumors 6502

    Joined:
    Mar 11, 2008
    #1
    I am making a method that finds the variance of three numbers. The variance of a list of numbers is the average of the squares of the differences of the numbers from the mean. When I debug, the program crashes and has an error after the first loop. Why?

    Code:
    	static double variance (double [] a){
    		
    		int index = 0; //the # we are processing
    		double total = 0; //running total
    		double mean;
    		double difference;
    		double square;
    		double sum = 0;
    		double variance;
    
    		// loop to find sum of values
    		while (index <= a[index]) {
    			total = total + a[index];
    			index = index + 1;	//advance to next
    			}
    						
    		//calculates mean
    			mean = total / a.length;
    			
    		//loop that calculates difference from mean, square of difference, sums of squares
    		while (index <= a[index])  {
    			difference = mean - a[index];
    			square = difference * difference;
    			sum = sum + square;
    			}
    						
    		//calculates and returns variance
    		variance = sum / a.length;
    		return variance;
    	}
    
     
  2. macrumors 6502a

    yeroen

    Joined:
    Mar 8, 2007
    Location:
    Cambridge, MA
    #2
    Putting aside for the moment that the while loop condition doesn't properly express what you're trying to compute, and even assuming 'a' points to a valid segment of memory, the while loop may iterate off the end of your array 8 bytes (the size of a double) at a time until disaster strikes in the form of a segmentation fault and program crash, or, more likely, it may leave elements out of the sum (e.g. the array {5,4,3,2,1} would sum to 12 given your logic).


    You need to explicitly pass the size of your array into the variance function, in an integer variable called, say, numElements. Then rewrite the while loop as a for loop, for example:

    Code:
    for (index = 0; index < numElements; ++index)
    {
      total += a[index];
    }
    
    and a similar prescription for the second while loop.
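
    To make the {5,4,3,2,1} example concrete, here is a minimal sketch in Java (the language the posted method appears to be written in). The class and method names are made up for illustration, and the first method adds a bounds check that the posted loop does not have, purely so the demo cannot run off the end of the array.

```java
class LoopConditionDemo {
    // Sums using the posted condition, which compares the index to the element's VALUE.
    static double sumWithOriginalCondition(double[] a) {
        double total = 0;
        int index = 0;
        // "index < a.length" is added only to keep this demo inside the array.
        while (index < a.length && index <= a[index]) {
            total += a[index];
            index++;
        }
        return total;
    }

    // Sums every element by bounding the loop with the number of elements.
    static double sumAll(double[] a) {
        double total = 0;
        for (int index = 0; index < a.length; ++index) {
            total += a[index];
        }
        return total;
    }

    public static void main(String[] args) {
        double[] a = {5, 4, 3, 2, 1};
        System.out.println(sumWithOriginalCondition(a)); // 12.0 (stops when 3 <= 2 fails)
        System.out.println(sumAll(a));                   // 15.0
    }
}
```

    The first loop stops as soon as the index exceeds the value stored at that index, which has nothing to do with how many elements there are.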
     
  3. macrumors G5

    gnasher729

    Joined:
    Nov 25, 2005
    #3
    1. What is " while (index <= a[index])" supposed to do? It looks very, very unusual to me.
    2. After fixing that error, you should have a look at for loops. They make your code more readable, and if you use a for loop, your second mistake will be really, really obvious.
     
  4. thread starter macrumors 6502

    Joined:
    Mar 11, 2008
    #4
    I now have two errors that say, "The local variable difference may not have been initialized." These errors are for square = difference * difference. I thought they knew the value of difference from difference = mean - a[index].


    Code:
    	static double variance (double [] a){
    		
    		int index = 0; //the # we are processing
    		double total = 0; //running total
    		double mean;
    		double difference;
    		double square;
    		double sum = 0;
    		double variance;
    
    		// loop to find sum of values
    		while (index <= a.length) {
    			total = total + a[index];
    			index = index + 1;	//advance to next
    			}
    						
    		//calculates mean
    			mean = total / a.length;
    			
    		//loop that calculates difference of a[index] from mean
    		while (index <= a.length)  {
    			difference = mean - a[index];
    			index = index + 1;
    			}
    		
    		// calculates square and sum
    		square = difference * difference;
    		sum = sum + square;
    						
    		//calculates and returns variance
    		variance = sum / a.length;
    		
    		return variance;
    	}
    
     
  5. macrumors 6502a

    yeroen

    Joined:
    Mar 8, 2007
    Location:
    Cambridge, MA
    #5
    The second while loop never gets executed, and so the difference variable is never set, because the index variable is never reset after the end of the first loop.

    You need to reset index back to zero before executing the second loop.
     
  6. macrumors 68040

    lee1210

    Joined:
    Jan 10, 2005
    Location:
    Dallas, TX
    #6
    People are already giving tips, etc. but...

    This is java, so an array is an object that has a property, length, that is public. So you don't need to pass length in. You didn't mention you're working in Java, so only those of us that have followed your other threads know this. You should mention it.

    Second, while most people who would be responding here know you're working on homework, you need to mention that, too. I'll write someone some working code if they need an example, etc., but won't if people are doing homework because it strips the learning and experience out of doing the work yourself.

    As for difference not being initialized... in the second loop, if index is greater than a.length, the loop will iterate 0 times. As such, the assignment to difference may never run, hence the error says "may not have been initialized": it's possible that it will be, but the compiler can't guarantee it. Always initialize at declaration if at all possible; it makes your code perform more predictably even in the face of unexpected inputs.

    I don't know the math involved, but having just looked it up, variance is the sum of the squares of the differences from the mean, divided by the number of values. As such, you are not summing in difference, which is a problem. In this case, difference will be:
    mean - a[a.length]

    when you exit the loop. This will lead to a runtime exception being thrown, ArrayIndexOutOfBoundsException. The valid indices for an array in Java and many other languages (though Fortran, by default, is 1-based) are 0 to (length - 1). Looping from 0 to length is no good.
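
    As a small illustration of the valid index range, the sketch below probes a three-element array. The class name and the helper throwsOutOfBounds are hypothetical, made up for this example.

```java
class IndexRangeDemo {
    // Returns true if reading a[index] throws ArrayIndexOutOfBoundsException.
    static boolean throwsOutOfBounds(double[] a, int index) {
        try {
            double unused = a[index]; // the read itself is what may throw
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        double[] a = {1.0, 2.0, 3.0};                   // a.length is 3
        System.out.println(throwsOutOfBounds(a, 0));        // false: first valid index
        System.out.println(throwsOutOfBounds(a, 2));        // false: last valid index, length - 1
        System.out.println(throwsOutOfBounds(a, a.length)); // true: one past the end
    }
}
```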

    You then square this and assign the value to square. You then set sum to 0 + square, so you are essentially assigning square to sum. You then divide sum (which is the same as square) by a.length, assign to variance, then return.

    So your algorithm is:
    Take the element after the last element of the array, get its difference from the mean (computed from all of the numbers plus the one "off the end"), then square this value, then divide by a.length.

    This varies greatly from the correct algorithm, which is:
    For each value in the list (0 to length-1) get the difference from the mean.
    Square this value.
    Accumulate this in a variable, say, sum.
    Divide sum by length to get the variance.
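
    The steps above can be sketched in Java roughly as follows. This is one possible shape, not the only one; the class name VarianceSketch is made up for the example, and it assumes a non-null array with at least one element.

```java
class VarianceSketch {
    // Population variance: the average of the squared differences from the mean.
    // Assumes a is non-null and has at least one element.
    static double variance(double[] a) {
        // First pass: total of all values, used to compute the mean.
        double total = 0;
        for (int index = 0; index < a.length; index++) {
            total += a[index];
        }
        double mean = total / a.length;

        // Second pass: accumulate the squared differences in a single variable.
        double sum = 0;
        for (int index = 0; index < a.length; index++) {
            double difference = a[index] - mean;
            sum += difference * difference;
        }
        return sum / a.length;
    }

    public static void main(String[] args) {
        double[] a = {2, 4, 4, 4, 5, 5, 7, 9}; // mean is 5, squared differences total 32
        System.out.println(variance(a));       // 4.0
    }
}
```

    Because each for loop declares its own index starting at 0, there is no stale counter to carry over from the first loop to the second.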

    Your error with your loop conditionals is referred to as an "Off-by-One" error or ObO. This is one of the most common errors when building a loop. As yeroen stated earlier, in most languages going off the end of an array won't fail immediately, but yield incorrect results, and eventually lead to memory corruption, segmentation faults, etc. In Java, as I said, you'll just get an ArrayIndexOutOfBoundsException thrown during runtime, so at least you know you have a problem and where, instead of other languages which may well fail inexplicably at some later point in time.

    Above I mentioned unexpected inputs, and I don't know if your instructor tests for this, but what if null was passed in to your variance function? Right now you'd get a NullPointerException. What if a is 0 items long? index is set to 0, so your first loop will execute once since 0 <= 0, and you'll get an ArrayIndexOutOfBoundsException when it reads a[0]. Your code might work with a 0 item array once the loops are fixed, but might it be easier to just return 0 immediately in this case?
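
    One way to handle those two cases is sketched below. Returning 0 for an empty array follows the suggestion above; throwing IllegalArgumentException for null is an assumption of this sketch, since the assignment may expect something else. The class name is hypothetical.

```java
class GuardedVariance {
    static double variance(double[] a) {
        // Assumption: reject null with a clear message instead of a bare NullPointerException.
        if (a == null) {
            throw new IllegalArgumentException("array must not be null");
        }
        // With no values there is nothing to average, so return 0 immediately.
        if (a.length == 0) {
            return 0;
        }

        double total = 0;
        for (int index = 0; index < a.length; index++) {
            total += a[index];
        }
        double mean = total / a.length;

        double sum = 0;
        for (int index = 0; index < a.length; index++) {
            sum += (a[index] - mean) * (a[index] - mean);
        }
        return sum / a.length;
    }

    public static void main(String[] args) {
        System.out.println(variance(new double[0]));        // 0.0
        System.out.println(variance(new double[]{1, 3}));   // 1.0
    }
}
```

    Without the empty-array check, the corrected loops would simply not run and the division would be 0.0 / 0, which in Java yields NaN rather than an exception.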

    -Lee
     
  7. thread starter macrumors 6502

    Joined:
    Mar 11, 2008
    #7
    OK, I did that, but I still have the same errors.

    Code:
    static double variance (double [] a){
    		
    		int index = 0; //the # we are processing
    		double total = 0; //running total
    		double mean;
    		double difference;
    		double square;
    		double sum = 0;
    		double variance;
    
    		// loop to find sum of values
    		while (index <= a.length) {
    			total = total + a[index];
    			index = index + 1;	//advance to next
    			}
    						
    		//calculates mean
    			mean = total / a.length;
    			
    		//loop that calculates difference of a[index] from mean
    		index = 0;
    		while (index <= a.length)  {
    			difference = mean - a[index];
    			index = index + 1;
    			}
    		
    		// calculates square and sum
    		square = difference * difference;
    		sum = sum + square;
    						
    		//calculates and returns variance
    		variance = sum / a.length;
    		
    		return variance;
    	}
    
     
  8. macrumors 68040

    lee1210

    Joined:
    Jan 10, 2005
    Location:
    Dallas, TX
    #8
    Since I posted a few seconds before you, you probably haven't seen it yet, but go through my last post and see if that helps you clear more of the problems up.

    -Lee
     
  9. thread starter macrumors 6502

    Joined:
    Mar 11, 2008
    #9
    OK, I understand what you're saying about how I should calculate the squares of the differences separately and then take those and average them. I know I could do that by having different variables for a[0], a[1], a[2] and doing each one separately, but that wouldn't be good if there were more than three numbers. I also don't understand when you talk about not having to pass length in, and I don't understand when you say that difference should equal mean - a[a.length]. I thought it would be difference = mean - a[1] or a[2] or a[3].
     
  10. macrumors 68040

    lee1210

    Joined:
    Jan 10, 2005
    Location:
    Dallas, TX
    #10
    Not that it SHOULD be, but that's what you have it coded as now. Here's what you have:
    Code:
    while (index <= a.length)  {
    	difference = mean - a[index];
    	index = index + 1;
    }
    
    Since you set difference each iteration, difference would be set to whatever was assigned to it in the last iteration. Since the last iteration will have index equal to a.length, that yields:
    difference = mean - a[a.length]
    This is not valid because a.length is off the end of a, but I was just stating what your code does (not what it should do).

    You don't need separate variables, because you just need the sum of the squares of the differences. So you would have some variable, and in your loop you would accumulate the squares of the differences in there. Only one variable for the sum would be needed.

    -Lee
     
  11. macrumors 603

    whooleytoo

    Joined:
    Aug 2, 2002
    Location:
    Cork, Ireland.
    #11
    Apart from questions over your algorithm (;)), you only change the value of difference in a loop, which may never be run (the compiler has no way of knowing).

    If you initialise difference to zero, that specific error/warning will disappear.
     
  12. macrumors G5

    gnasher729

    Joined:
    Nov 25, 2005
    #12
    If you initialise difference to zero, the warning will disappear; it won't make the code work though. The variance reported will always be zero. That is why many programmers recommend _not_ to just blindly initialise all variables; it prevents the compiler from telling you about logic errors in the program.
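
    A minimal sketch of that trade-off, using a made-up helper rather than the full variance method: with the initialiser the method compiles, but an input that skips the loop quietly yields 0. Removing the "= 0" makes the compiler reject the return statement (javac words it as "variable difference might not have been initialized").

```java
class DefiniteAssignmentDemo {
    // Returns the difference computed in the LAST loop iteration, if there was one.
    static double lastDifference(double[] a, double mean) {
        double difference = 0; // silences the compiler, but also hides a skipped loop
        int index = 0;
        while (index < a.length) {
            difference = mean - a[index]; // overwritten on every pass
            index = index + 1;
        }
        return difference;
    }

    public static void main(String[] args) {
        System.out.println(lastDifference(new double[]{1, 2, 3}, 2)); // -1.0 (only 2 - 3 survives)
        System.out.println(lastDifference(new double[0], 2));         // 0.0 (loop never ran)
    }
}
```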
     
  13. macrumors 603

    whooleytoo

    Joined:
    Aug 2, 2002
    Location:
    Cork, Ireland.
    #13
    That's a very valid point. I'm not sure I'd want to rely on compiler errors to catch logic errors though! ;)

    I would still recommend initialising variables, if you don't then the random uninitialised value in the variable might just as easily give a "false pass" when in fact the logic has failed.

    Initialising variables to an unlikely value (0, -1, whatever suits the algorithm) which can then be tested later on for changes is a useful debugging method. If you don't initialise, you can't do that.
     
  14. thread starter macrumors 6502

    Joined:
    Mar 11, 2008
    #14
    I now must find variance, but with any amount of numbers. I need to prompt the user to tell me how many numbers they want to enter and then find the variance using those numbers. How would I do this?

    Code:
    package arrays;
    import java.util.Scanner;
    public class Variance {
    
    	static double variance (double [] a){
    
    		int index; //the # we are processing
    		double total; //running total
    		double mean;
    		double sum = 0;
    		double variance;
    
    		// loop to find sum of values
    		index = 0;
    		total = 0;
    		while (index < a.length) {
    			total = total + a[index];
    			index = index + 1; //advance to next
    			}
    
    		//calculates mean
    		mean = total / a.length;
    
    		// loop to get the sum of the squares of the differences
    		index = 0;
    
    		while (index < a.length) {
    			sum = sum + (a[index] - mean) * (a[index] - mean);
    			index = index + 1; //advance to next
    			}
    
    		//calculates and returns variance
    		variance = sum / a.length;
    		return variance;
    	}
    
    public static void main (String [] args){
    
    	Scanner sc = new Scanner (System.in);
    	
    	double [] a = new double [3];
    
    	//user input
    	System.out.println("How many numbers would you like to enter?");
    	
    	System.out.print("x = ");
    	a[0] = sc.nextDouble();
    	System.out.print("y = ");
    	a[1] = sc.nextDouble();
    	System.out.print("z = ");
    	a[2] = sc.nextDouble();
    	System.out.println("The variance is: " + variance(a));
    	}
    }
    
    Once again, this is for school and I'm using Java.
     
  15. macrumors 68040

    lee1210

    Joined:
    Jan 10, 2005
    Location:
    Dallas, TX
    #15
    Should be a pretty simple tweak. The first input you'd take from the user would be an integer representing the number of numbers they wish to enter. You'd then have to set up your array that you've called a to be the appropriate size. Then you'd need to read in from the user the number of times the user chose initially. What sort of construct would you use to do something a certain number of times? Once you've done that, your variance routine is already set up to handle an arbitrary length array, so that should be it.
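
    A rough sketch of that flow, assuming a for loop as the "certain number of times" construct. The helper readNumbers and the class name are hypothetical; main feeds the Scanner a fixed string instead of System.in so the example runs without typing, and pins the locale so "1.5" parses regardless of system settings.

```java
import java.util.Locale;
import java.util.Scanner;

class ReadNumbersSketch {
    // Reads a count, then exactly that many doubles, into an array of exactly that size.
    static double[] readNumbers(Scanner sc) {
        System.out.print("How many numbers would you like to enter? ");
        int count = sc.nextInt();

        double[] a = new double[count];
        // Indices run from 0 to count - 1; the label shown to the user is index + 1.
        for (int index = 0; index < count; index++) {
            System.out.print("Number " + (index + 1) + ": ");
            a[index] = sc.nextDouble();
        }
        return a;
    }

    public static void main(String[] args) {
        // Stand-in for typed input: a count of 3 followed by three values.
        Scanner sc = new Scanner("3 1.5 2.5 4.0").useLocale(Locale.US);
        double[] a = readNumbers(sc);
        System.out.println();
        System.out.println("Read " + a.length + " numbers");
    }
}
```

    In a real program the Scanner would wrap System.in, and the returned array could be passed straight to variance.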

    -Lee
     
  16. thread starter macrumors 6502

    Joined:
    Mar 11, 2008
    #16
    OK, it's almost working. The only problem: it crashes during the last loop. What is wrong?

    Code:
    public static void main (String [] args){
    
    	Scanner sc = new Scanner (System.in);
    	int numbers;
    	int index = 1;
    	
    	//user input
    	System.out.print("How many numbers would you like to enter? ");
    	numbers = sc.nextInt();
    	
    	double [] a = new double [numbers];
    
    		while (index <= numbers ){
    			System.out.print("Number " + index + ": ");
    			a[index] = sc.nextDouble();
    			index += 1; //advance to next
    			}
    		
    	System.out.print("The variance is: " + variance(a));
    	}
    
     
  17. macrumors 68040

    lee1210

    Joined:
    Jan 10, 2005
    Location:
    Dallas, TX
    #17
    You should say what the exception is when it crashes, which in this case is an ArrayIndexOutOfBoundsException thrown on the line:
    Code:
    a[index] = sc.nextDouble();
    What is the range of index in your loop? With numbers being the size of a, what are the valid indices for a? Or to break that last question down further, where do array indexes start?

    -Lee
     
  18. thread starter macrumors 6502

    Joined:
    Mar 11, 2008
    #18
    Ok, so I just had the array handle more by putting in "number + 1". I don't really understand why it needs to handle one more than the amount. It also gives the wrong answer. What is wrong?

    Code:
    package arrays;
    import java.util.Scanner;
    public class Variance {
    
    	static double variance (double [] a){
    
    		int index; //the # we are processing
    		double total; //running total
    		double mean;
    		double sum = 0;
    		double variance;
    
    		// loop to find sum of values
    		index = 0;
    		total = 0;
    		while (index < a.length) {
    			total = total + a[index];
    			index = index + 1; //advance to next
    			}
    
    		//calculates mean
    		mean = total / a.length;
    
    		// loop to get the sum of the squares of the differences
    		index = 0;
    
    		while (index < a.length) {
    			sum = sum + (a[index] - mean) * (a[index] - mean);
    			index = index + 1; //advance to next
    			}
    
    		//calculates and returns variance
    		variance = sum / a.length;
    		return variance;
    	}
    
    public static void main (String [] args){
    
    	Scanner sc = new Scanner (System.in);
    	int numbers;
    	int index = 0;
    	
    	//user input
    	System.out.print("How many numbers would you like to enter? ");
    	numbers = sc.nextInt();
    	
    	double [] a = new double [numbers + 1];
    
    		while (index < numbers ){
    			System.out.print("Number " + index + ": ");
    			a[index] = sc.nextDouble();
    			index += 1; //advance to next
    			}
    		System.out.print("The variance is: " + variance(a));
    	}
    }
    
     
  19. macrumors 68040

    lee1210

    Joined:
    Jan 10, 2005
    Location:
    Dallas, TX
    #19
    You did everything right except increasing the capacity of the array. You want to loop from 0 to numbers - 1. That's what you're doing now, so that's great. There's just an extra element at the end of your array a now that you never assign, so Java leaves it at its default of 0.0. So when the variance is being calculated, that extra 0.0 is being included as if it were one of the numbers entered.

    Fix that and you should be good.
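
    To see the effect of the extra slot, the sketch below compares an array sized exactly for three values with one sized numbers + 1. In Java a new double array starts out filled with 0.0, so the unassigned last element is counted as a real 0. The class name is made up; variance mirrors the method posted earlier in the thread.

```java
class ExtraElementDemo {
    // Same calculation as the posted variance method, written with for loops.
    static double variance(double[] a) {
        double total = 0;
        for (int index = 0; index < a.length; index++) {
            total += a[index];
        }
        double mean = total / a.length;

        double sum = 0;
        for (int index = 0; index < a.length; index++) {
            sum += (a[index] - mean) * (a[index] - mean);
        }
        return sum / a.length;
    }

    public static void main(String[] args) {
        int numbers = 3;

        double[] exact = new double[numbers];         // slots 0..2
        double[] oversized = new double[numbers + 1]; // slots 0..3, slot 3 stays 0.0
        for (int index = 0; index < numbers; index++) {
            exact[index] = index + 1;                 // 1, 2, 3
            oversized[index] = index + 1;             // 1, 2, 3, then the default 0.0
        }

        System.out.println(variance(exact));     // about 0.667
        System.out.println(variance(oversized)); // 1.25, skewed by the extra 0.0
    }
}
```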

    -Lee
     
