
soccersquirt82

macrumors 6502
Original poster
Mar 11, 2008
For my Java homework, I must count the words in a file. I know how to count whitespace and non-whitespace characters, but not words. The number of words is not necessarily the number of spaces + 1. If there were three spaces, there could be two, three, or four words, so that approach won't work.


Code:
package files;
import java.io.*;
public class Files2 {

	/** This class demonstrates file I/O using streams and readers. */
	// says if character is whitespace
	static boolean iswhite(int x) {
		return((x == 32) || (x == 9 ) ||
				(x == 13) || (x == 10));
	}

	public static void main(String[] args) {
		// Declare the filename we will be looking for

		String filename = "T:/Lorem.txt";

		FileInputStream stream;

		// the stream we will read from
		try {
			// Open the file. Reading right to left, we
			// (1) create a FileInput Stream, pointing to the file;
			// (2) assign that stream to the name 'stream'
			stream = new FileInputStream(filename);
		}catch (IOException e) {
			System.out.println("File not found: " + filename);
			return; // abort on error
		}

		// Here the file was opened. Read in all the characters
		// Connect a stream reader to the stream.

		InputStreamReader reader = new InputStreamReader(stream);
		// read in all the characters until we hit end-of-file, which is
		// indicated by a negative return value

		try{
			int answer = 0; // total number of characters read
			int answer2 = 0; // number of whitespace characters
			int answer3 = 0; // number of non-whitespace characters (meant to become the word count)
			int answer4 = 0; // length of the longest word (not computed yet)
			boolean lastchar = true;
			int charread; // the character we read. Note it is returned as an int
			while((charread = reader.read()) >= 0){ //looks at every character
				if (iswhite(charread)) { // whitespace character
					answer2 = answer2 + 1;
				}

				if (!iswhite(charread)) { // non-whitespace character (this is not yet a word count)
					answer3 = answer3 + 1;
				}

				answer = answer + 1;

				lastchar = iswhite(charread);

				// here we can look at the character read
			}

			System.out.println("Total number of characters: " + answer);
			System.out.println("Total number of whitespace: " + answer2);
			System.out.println("Total number of non-whitespace: " + answer3);
			
		} catch (IOException e) {
			System.out.println("Error reading file: " + filename);
			return; // abort on error
		}

		// close the input file
		try {
			stream.close();

		} catch (IOException e) {
			System.out.println("Error closing file: " + filename);
			return; // abort on error
		}

	}

}
 
If you're not going to use built-in split functions, regular expressions, etc., the best hint I can give is not to count whitespace, but to count changes from whitespace to non-whitespace. It should be fairly straightforward to keep a boolean recording that you're "in whitespace": whenever it is true and you read a non-whitespace character, increment your word count and set the boolean to false. Set it back to true whenever you read a whitespace character. Make sure to handle the border cases at the start and end of the file.
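A minimal sketch of that idea, in case it helps to see it written out. It reuses the same four whitespace codes as iswhite in the original post; the class name WordCounter and the method countWords are made up for illustration, and it reads from any Reader so it can be tried on a string before being wired to the file. Treating the start of input as "in whitespace" covers a file that begins with a word, and nothing special is needed at end-of-file because words are counted when they start, not when they end.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical helper class, not part of the original homework code.
public class WordCounter {

    // Same whitespace test as iswhite in the original post:
    // space (32), tab (9), carriage return (13), line feed (10).
    static boolean isWhite(int c) {
        return c == 32 || c == 9 || c == 13 || c == 10;
    }

    // Counts the transitions from whitespace to non-whitespace.
    static int countWords(Reader reader) throws IOException {
        int words = 0;
        boolean inWhitespace = true; // the start of input counts as whitespace
        int c;
        while ((c = reader.read()) >= 0) {
            if (isWhite(c)) {
                inWhitespace = true;
            } else if (inWhitespace) {
                // first non-whitespace character after whitespace: a new word starts
                words++;
                inWhitespace = false;
            }
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countWords(new StringReader("  two  words \n"))); // prints 2
        System.out.println(countWords(new StringReader("")));                // prints 0
    }
}
```

In the posted program, the same if/else would go inside the existing while loop, with the InputStreamReader passed in place of the StringReader.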

-Lee

Edit: Some additional things to consider:
Does anything that falls between two runs of whitespace count as a word? In this string, how many words are there:
" . , test @ 1"
?

Is it 5? If not, which of those things don't count?
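To make the question concrete, here is a rough sketch that counts the same string under two rules: every whitespace-delimited token counts, or only tokens containing at least one letter or digit count. The second rule is just one possible reading, picked for illustration; the class name TokenRules and the requireLetterOrDigit flag are invented here, and which rule applies is up to the assignment.

```java
// Hypothetical illustration of two possible definitions of "word".
public class TokenRules {

    // Space, tab, carriage return, line feed (same set as iswhite in the original post).
    static boolean isWhite(int c) {
        return c == 32 || c == 9 || c == 13 || c == 10;
    }

    // Counts whitespace-delimited tokens. If requireLetterOrDigit is true,
    // a token is only counted once a letter or digit has been seen in it.
    static int countTokens(String text, boolean requireLetterOrDigit) {
        int count = 0;
        boolean counted = false; // has the current token already been counted?
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isWhite(c)) {
                counted = false; // token ended; the next one starts fresh
            } else if (!counted && (!requireLetterOrDigit || Character.isLetterOrDigit(c))) {
                count++;
                counted = true;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String sample = " . , test @ 1";
        System.out.println(countTokens(sample, false)); // prints 5
        System.out.println(countTokens(sample, true));  // prints 2
    }
}
```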

If you CAN use String.split (which is practically cheating), this becomes completely trivial. You can look at:
http://java.sun.com/javase/6/docs/api/java/lang/String.html#split(java.lang.String)

The regular expression needed is pretty straightforward. I wrote a 6 line program to do this on args[0] because I didn't feel like doing file I/O, but this way is much better.
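For reference, a split-based version could look something like the sketch below. This is not the 6 line program mentioned above, just one way to write it; the class name SplitWordCount and the fallback sample string are assumptions. Two caveats: the input is trimmed first because a leading run of whitespace would otherwise produce an empty first element, and the regex class \s matches a slightly larger set of characters (it also includes form feed and vertical tab) than the iswhite method in the original post.

```java
// Hypothetical split-based word count; works on a string rather than a file.
public class SplitWordCount {

    static int countWords(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return 0; // splitting an empty string would still yield one element
        }
        // One or more whitespace characters separate the words.
        return trimmed.split("\\s+").length;
    }

    public static void main(String[] args) {
        // Use the first command-line argument if given, else a sample string.
        String text = args.length > 0 ? args[0] : " . , test @ 1";
        System.out.println(countWords(text));
    }
}
```

Run without arguments, this counts the sample string from earlier in the thread and reports 5, since every whitespace-delimited token is treated as a word.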

Since it's homework, they probably want you to do a bit more legwork, but I wanted to point out the shortcuts in case you are allowed to use them.
 