Help creating script to sort through text files

Discussion in 'Mac Programming' started by Kirkman, Sep 25, 2005.

  1. Kirkman macrumors member

    Joined:
    Dec 27, 2002
    #1
    Okay... I've got about 100 individual text files that each contain a list of song titles that somebody likes.

    I want to create a script that will scan through the files and count the song titles. It would work like this:

    Open text file
    Get song title from first line
    Search through the song title array. If the script hasn't encountered the song, add a new entry to the array.
    Add "1" to that song's counter in the array
    Go to next line and repeat this process to the end of file
    Go to next file and repeat this process until end of all files

    At the end of the process, it would output a list of all the songs it encountered in the files, and how many times each song appeared.

    That's the process. But I have no idea how to turn this into a shell script (or AppleScript). Can anyone help me do this?

    --Josh
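
The steps listed in the post above can be sketched in Python roughly as follows. This is only an illustration, assuming Python 3 and one title per line; the names `count_songs` and `demo`, the file names and the sample titles are all made up for the example and do not come from the thread.

```python
import os
import tempfile


def count_songs(paths):
    """Count how often each non-blank line (song title) appears across files."""
    counts = {}
    for path in paths:
        with open(path) as handle:
            for line in handle:
                title = line.strip()
                if not title:
                    continue  # ignore blank lines
                # A title not seen before starts at 0; every sighting adds 1.
                counts[title] = counts.get(title, 0) + 1
    return counts


def demo():
    """Write two throwaway files with made-up titles and count them."""
    folder = tempfile.mkdtemp()
    samples = {
        "alice.txt": "Song A\nSong B\n",
        "bob.txt": "Song B\nSong C\n\n",
    }
    paths = []
    for name, text in samples.items():
        path = os.path.join(folder, name)
        with open(path, "w") as handle:
            handle.write(text)
        paths.append(path)
    return count_songs(paths)


if __name__ == "__main__":
    for title, count in sorted(demo().items()):
        print("%3d %s" % (count, title))
```

A dictionary keyed by title stands in for the "array with a counter" in the description, so no explicit search step is needed.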
     
  2. robbieduncan Moderator emeritus

    Joined:
    Jul 24, 2002
    Location:
    London
    #2
    Can we assume that each song is on a separate line? It'd make things a lot easier if they were.

    Do you want to treat "A SONG TITLE" as the same song as "A Song Title"?
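
If the answer to the second question is yes, one possible approach is to count on a normalized key while keeping the first spelling seen for display. The sketch below assumes Python 3; `normalize_title` and `count_titles` are hypothetical helper names, and folding case and collapsing whitespace is just one reasonable definition of "the same song".

```python
def normalize_title(raw):
    """Return a comparison key: trimmed, single-spaced, case-folded."""
    return " ".join(raw.split()).casefold()


def count_titles(lines):
    """Count titles case-insensitively, keeping the first spelling seen."""
    counts = {}   # normalized key -> number of times seen
    display = {}  # normalized key -> first spelling seen, used for output
    for raw in lines:
        key = normalize_title(raw)
        if not key:
            continue  # blank line
        display.setdefault(key, raw.strip())
        counts[key] = counts.get(key, 0) + 1
    return {display[key]: n for key, n in counts.items()}
```

With this, "A SONG TITLE" and "A Song Title" land in the same bucket, reported under whichever spelling appeared first.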
     
  3. HiRez macrumors 603

    Joined:
    Jan 6, 2004
    Location:
    Western US
    #3
    Try this:
    Code:
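    #! /usr/bin/python
    
    import os, sys
    
    # Directory to scan: first command-line argument, else the current directory
    if len(sys.argv) > 1:
        rootPath = sys.argv[1]
    else:
        rootPath = os.getcwd()
    
    songFiles = []
    
    def parseDir(dirPath):
        # Recursively collect every .txt file under dirPath
        for name in os.listdir(dirPath):
            # listdir() returns bare names, so join them with the directory
            path = os.path.join(dirPath, name)
            if os.path.isfile(path):
                if os.path.splitext(path)[1] == ".txt":
                    songFiles.append(path)
            elif os.path.isdir(path):
                parseDir(path)
    
    parseDir(rootPath)
    
    print("Scanning %d files..." % len(songFiles))
    
    # Maps song title -> number of times it has been seen
    songList = {}
    
    for file in songFiles:
        f = open(file)
        for line in f.readlines():
            # strip() returns a new string, so the result must be kept
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line in songList:
                songList[line] = songList[line] + 1
            else:
                songList[line] = 1
        f.close()
    
    # One line per song: count, then title
    for key in songList.keys():
        print("%3d %s" % (songList[key], key))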
    Save as songlist.py and run it in Terminal by typing python songlist.py <directory-to-search>, or leave off the directory to search files in the current working directory. Only files with a .txt extension will be searched.

    Disclaimer: This is totally untested! Use at your own risk! It may not even work at all!
     
  4. WebMongol macrumors member

    Joined:
    Sep 19, 2004
    Location:
    Bay Area, CA
    #4
    Well, it's trivial to do with standard Unix tools:

    $ cat *.txt | sort | uniq -c | sort -nr

    Output is a list of songs sorted by frequency.
    You can put this sequence of commands into a file, make it executable with chmod +x, and invoke it by name from Terminal (the file needs to be somewhere on your PATH).
    File: most
    Content:
    #! /bin/bash

    # "$@" passes each file name through intact, even if it contains spaces
    cat "$@" | sort | uniq -c | sort -nr

    Usage:
    $ most filenames
    Example:
    $ most SongDir/song*.txt
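
For comparison, here is a rough Python counterpart of the pipeline above, in case a single script is preferred over chained tools. It is a sketch assuming Python 3; the function name `most` is borrowed from the script name, and ties are broken alphabetically, which may not match the tie order `sort -nr` produces.

```python
import sys
from collections import Counter


def most(lines):
    """Return (count, title) pairs, most frequent first."""
    counts = Counter(line.rstrip("\n") for line in lines)
    # Sort by descending count; break ties alphabetically for stable output.
    return sorted(((n, title) for title, n in counts.items()),
                  key=lambda pair: (-pair[0], pair[1]))


def main(paths):
    # Gather the lines of every file named on the command line.
    lines = []
    for path in paths:
        with open(path) as handle:
            lines.extend(handle)
    for n, title in most(lines):
        print("%4d %s" % (n, title))


if __name__ == "__main__":
    main(sys.argv[1:])
```

Like the shell version, this counts lines exactly as written, so differences in case or trailing spaces produce separate entries.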
     
