Java HW- counting words in a file




soccersquirt82
Oct 1, 2008, 09:37 PM
For my Java homework, I must count words in a file. I know how to count whitespace and non-whitespace, but not words. The number of words is not necessarily the number of spaces + 1. If there were three spaces, there could be two, three, or four words, so doing that won't work.



package files;
import java.io.*;
/** This class demonstrates file I/O using streams and readers. */
public class Files2 {

// says if character is whitespace (space, tab, carriage return, line feed)
static boolean iswhite(int x) {
return((x == 32) || (x == 9 ) ||
(x == 13) || (x == 10));
}

public static void main(String[] args) {
// Declare the filename we will be looking for

String filename = "T:/Lorem.txt";

FileInputStream stream;

// the stream we will read from
try {
// Open the file. Reading right to left, we
// (1) create a FileInputStream, pointing to the file;
// (2) assign that stream to the name 'stream'
stream = new FileInputStream(filename);
} catch (IOException e) {
System.out.println("File not found: " + filename);
return; // abort on error
}

// Here the file was opened. Read in all the characters
// Connect a stream reader to the stream.

InputStreamReader reader = new InputStreamReader(stream);
// read in all the characters until we hit end-of-file, which is
// indicated by a negative return value

try{
int answer = 0; // number of characters
int answer2 = 0; // number of whitespace characters
int answer3 = 0; // number of non-whitespace characters (not yet a word count)
int answer4 = 0; // length of longest word (not yet computed)
boolean lastchar = true; // whether the previous character was whitespace
int charread; // the character we read. Note it is returned as an int
while((charread = reader.read()) >= 0){ //looks at every character
if (iswhite(charread)) { // count whitespace characters
answer2 = answer2 + 1;
}

if (!iswhite(charread)) { // count non-whitespace characters
answer3 = answer3 + 1;
}

answer = answer + 1;

lastchar = iswhite(charread);

// here we can look at the character read
}

System.out.println("Total number of characters: " + answer);
System.out.println("Total number of whitespace: " + answer2);
System.out.println("Total number of non-whitespace: " + answer3);

} catch (IOException e) {
System.out.println("Error reading file: " + filename);
return; // abort on error
}

// close the input file
try {
stream.close();

} catch (IOException e) {
System.out.println("Error closing file: " + filename);
return; // abort on error
}

}

}



lee1210
Oct 1, 2008, 09:41 PM
If you're not going to use built-in split functions, regular expressions, etc., the best hint I can give is not to count whitespace, but to count changes from whitespace to non-whitespace. It should be fairly straightforward to keep some sort of boolean stating you're "in whitespace", and whenever that is true and you get a non-whitespace character, increment your word count and set that boolean to false. Make sure to keep track of the border cases at the start and end of the file.
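
To make the hint concrete, here is a rough sketch of the transition-counting idea. It is only one way to do it, not the required solution. The class name TransitionWordCount, the method countWords, and the flag inWhitespace are made up for illustration, and it works on a String so it can run on its own. The body of the loop should drop into the `while((charread = reader.read()) >= 0)` loop from the first post more or less unchanged.

```java
class TransitionWordCount {

    // Same whitespace test as in the first post: space, tab, CR, LF.
    static boolean iswhite(int x) {
        return (x == 32) || (x == 9) || (x == 13) || (x == 10);
    }

    // Counts changes from whitespace to non-whitespace.
    // (Hypothetical helper, written for this example.)
    static int countWords(String text) {
        int words = 0;
        // Treat the start of the input as whitespace so that a word
        // beginning at the very first character is still counted.
        boolean inWhitespace = true;
        for (int i = 0; i < text.length(); i++) {
            int c = text.charAt(i);
            if (iswhite(c)) {
                inWhitespace = true;
            } else if (inWhitespace) {
                // First non-whitespace character after whitespace: a new word starts here.
                words++;
                inWhitespace = false;
            }
        }
        // Nothing special is needed at the end of the input:
        // trailing whitespace never starts a new word.
        return words;
    }

    public static void main(String[] args) {
        System.out.println(countWords("  two   words \n")); // prints 2
    }
}
```

Starting with the flag set to true covers the start-of-file border case, and because only the start of a word is counted, runs of several spaces and trailing whitespace need no extra handling.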

-Lee

Edit: Some additional things to consider:
Does anything that falls between two sets of whitespaces count as a word? In this string, how many words are there:
" . , test @ 1"
?

Is it 5? If not, which of those things don't count?
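
To show how much the answer depends on the definition, here is a small sketch that counts the example string two ways. Both definitions are assumptions for illustration, not what the assignment necessarily wants: a "word" is either any run of non-whitespace characters, or only a run that contains at least one letter or digit. The names WordDefinitions, countTokens, and requireAlnum are made up.

```java
class WordDefinitions {

    // Counts whitespace-delimited tokens. If requireAlnum is true, a token
    // only counts when it contains at least one letter or digit.
    // (Hypothetical helper, written for this example.)
    static int countTokens(String text, boolean requireAlnum) {
        int count = 0;
        boolean counted = false; // has the current token been counted already?
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isWhitespace(c)) {
                counted = false; // token ended; the next one starts uncounted
            } else if (!counted && (!requireAlnum || Character.isLetterOrDigit(c))) {
                count++;
                counted = true;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String sample = " . , test @ 1";
        System.out.println(countTokens(sample, false)); // prints 5
        System.out.println(countTokens(sample, true));  // prints 2
    }
}
```

Under the stricter definition only "test" and "1" count, so it is worth checking which one your instructor expects.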

If you CAN use string.split (which is practically cheating), this becomes completely trivial. You can look at:
http://java.sun.com/javase/6/docs/api/java/lang/String.html#split(java.lang.String)

The regular expression needed is pretty straightforward. I wrote a 6 line program to do this on args[0] because I didn't feel like doing file I/O, but this way is much better.
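
For reference, a split-based version might look roughly like the sketch below. This is not the exact program mentioned above, just one plausible shape for it, assuming a word is any run of non-whitespace characters. The class name SplitWordCount and the method countWords are made up for the example.

```java
class SplitWordCount {

    // Counts whitespace-delimited words using String.split.
    // (Hypothetical helper, written for this example.)
    static int countWords(String text) {
        // Trim first: otherwise leading whitespace yields an empty first
        // element, and an all-whitespace input would not report zero.
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return 0;
        }
        // "\\s+" matches one or more whitespace characters in a row.
        return trimmed.split("\\s+").length;
    }

    public static void main(String[] args) {
        String text = args.length > 0 ? args[0] : "";
        System.out.println(countWords(text));
    }
}
```

Running it with the quoted argument " . , test @ 1" should print 5, since every run of non-whitespace is treated as a word here.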

Since it's homework, they probably want you to do a bit more legwork, but I wanted to point out shortcuts in case you are allowed to use them.