Java HW- counting words in a file

Discussion in 'Mac Programming' started by soccersquirt82, Oct 1, 2008.

  1. soccersquirt82 macrumors 6502

    Joined:
    Mar 11, 2008
    #1
    For my Java homework, I must count the words in a file. I know how to count whitespace and non-whitespace characters, but not words. The number of words is not necessarily the number of spaces + 1. If there were three spaces, there could be two, three, or four words, so that approach won't work.


    Code:
    package files;
    import java.io.*;
    public class Files2 {
    
    	/** * This class demonstrates file I/O using streams and readers */
    	// says if character is whitespace
    	static boolean iswhite(int x) {
    		return((x == 32) || (x == 9 ) ||
    				(x == 13) || (x == 10));
    	}
    
    	public static void main(String[] args) {
    		// Declare the filename we will be looking for
    
    		String filename = "T:/Lorem.txt";
    
    		FileInputStream stream;
    
    		// the stream we will read from
    		try {
    			// Open the file. Reading right to left, we
    			// (1) create a FileInput Stream, pointing to the file;
    			// (2) assign that stream to the name 'stream'
    			stream = new FileInputStream(filename);
    		}catch (IOException e) {
    			System.out.println("File not found: " + filename);
    			return; // abort on error
    		}
    
    		// Here the file was opened. Read in all the characters
    		// Connect a stream reader to the stream.
    
    		InputStreamReader reader = new InputStreamReader(stream);
    		// read in all the characters until we hit end-of-file, which is
    		// indicated by a negative return value
    
    		try{
    			int answer = 0; // total number of characters
    			int answer2 = 0; // number of whitespace characters
    			int answer3 = 0; // number of non-whitespace characters (not yet a word count)
    			int answer4 = 0; // length of the longest word (not yet computed)
    			boolean lastchar = true; // whether the previous character was whitespace
    			int charread; // the character we read. Note it is returned as an int
    			while((charread = reader.read()) >= 0){ // looks at every character
    				if (iswhite(charread)) { // count whitespace characters
    					answer2 = answer2 + 1;
    				}
    
    				if (!iswhite(charread)) { // count non-whitespace characters
    					answer3 = answer3 + 1;
    				}
    
    				answer = answer + 1;
    
    				lastchar = iswhite(charread);
    
    				// here we can look at the character read
    			}
    
    			System.out.println("Total number of characters: " + answer);
    			System.out.println("Total number of whitespace: " + answer2);
    			System.out.println("Total number of non-whitespace: " + answer3);
    			
    		} catch (IOException e) {
    			System.out.println("Error reading file: " + filename);
    			return; // abort on error
    		}
    
    		// close the input file
    		try {
    			stream.close();
    
    		} catch (IOException e) {
    			System.out.println("Error closing file: " + filename);
    			return; // abort on error
    		}
    
    	}
    
    }
    
     
  2. lee1210 macrumors 68040


    Joined:
    Jan 10, 2005
    Location:
    Dallas, TX
    #2
    If you're not going to use built-in split functions, regular expressions, etc., the best hint I can give is not to count whitespace but to count changes from whitespace to non-whitespace. It should be fairly straightforward to keep a boolean recording that you're "in whitespace"; whenever it is true and you read a non-whitespace character, increment your word count and set the boolean to false. Set it back to true whenever you read a whitespace character. Make sure to handle the border cases at the start and end of the file.
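
    A minimal sketch of the transition-counting idea described above. The class and method names (WordCountSketch, countWords, countWordsInString) are made up for illustration, and Character.isWhitespace is swapped in for the original iswhite helper, so it treats a few more characters as whitespace. Starting the flag at true handles a file that begins with a word; no special handling is needed at the end of the file.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class WordCountSketch {

    // Counts transitions from whitespace to non-whitespace.
    // Each such transition marks the start of one word.
    static int countWords(Reader reader) throws IOException {
        int words = 0;
        boolean inWhitespace = true; // treat the start of input as whitespace
        int c;
        while ((c = reader.read()) >= 0) {
            boolean white = Character.isWhitespace(c);
            if (inWhitespace && !white) {
                words++; // first character of a new word
            }
            inWhitespace = white;
        }
        return words;
    }

    // Convenience wrapper (hypothetical) for trying the logic on a String.
    static int countWordsInString(String text) {
        try {
            return countWords(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countWordsInString("one  two\tthree\n"));
    }
}
```

    To use it with the original program, the existing InputStreamReader could be passed to countWords in place of the StringReader.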

    -Lee

    Edit: Some additional things to consider:
    Does anything that falls between two runs of whitespace count as a word? In this string, how many words are there:
    " . , test @ 1"
    ?

    Is it 5? If not, which of those things don't count?

    If you CAN use String.split (which is practically cheating), this becomes completely trivial. You can look at:
    http://java.sun.com/javase/6/docs/api/java/lang/String.html#split(java.lang.String)

    The regular expression needed is pretty straightforward. I wrote a 6-line program to do this on args[0] because I didn't feel like doing file I/O, but this way is much better.
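
    For illustration only, here is one way a split-based count might look. This is not the program mentioned above; the class name SplitWordCount and the choice of "\\s+" as the pattern are assumptions. It counts every run of non-whitespace as a word, so punctuation on its own counts too.

```java
public class SplitWordCount {

    // Counts whitespace-separated tokens using String.split.
    static int countWords(String text) {
        String trimmed = text.trim(); // avoid an empty leading token
        if (trimmed.isEmpty()) {
            return 0; // split would otherwise return one empty string
        }
        return trimmed.split("\\s+").length;
    }

    public static void main(String[] args) {
        // Falls back to a sample string when no argument is given.
        String text = args.length > 0 ? args[0] : "a quick sample line";
        System.out.println(countWords(text));
    }
}
```

    The trim and empty check matter because split keeps a leading empty string when the input starts with a delimiter, and returns a single empty string for empty input.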

    Since it's homework, they probably want you to do a bit more legwork, but I wanted to point out shortcuts in case you are allowed to use them.
     
