Trying to separate items into categories, but can't.

Discussion in 'Mac Programming' started by moonman239, Jan 12, 2013.

  1. macrumors 65816

    Joined:
    Mar 27, 2009
    #1
    I have a string of items that I need to separate according to their respective categories.

    Here's my code so far:

    Code:
    
    on searchForSubstring(theString, theSubstring)
    	try
    		set oldDelims to AppleScript's text item delimiters
    		set AppleScript's text item delimiters to theSubstring
    		set itemsOfString to every text item of theString
    		set indexes to {}
    		set theIndex to 0
    		repeat with X from 1 to ((count of itemsOfString) - 1)
    			set theIndex to theIndex + (length of item X of itemsOfString) + 1
    			copy theIndex to end of indexes
    		end repeat
    		set AppleScript's text item delimiters to oldDelims
    		return indexes
    	on error errMsg
    		log errMsg
    	end try
    end searchForSubstring
    
    to switchText from t to r instead of s
    	set d to text item delimiters
    	set text item delimiters to s
    	set t to t's text items
    	set text item delimiters to r
    	tell t to set t to item 1 & ({""} & rest)
    	set text item delimiters to d
    	t
    end switchText
    
    on separateItemsIntoCategories(theContent, theCategories, categoryID)
    	set newContent to theCategories
    	log (count of theCategories)
    	repeat with X from 1 to (count of theCategories)
    		log X
    		set categoryDelimiters to (item X of theCategories) & categoryID
    		set categoryIndex to item 1 of searchForSubstring(theContent, categoryDelimiters)
    		set newContent to switchText from newContent to "ijkl" instead of categoryIndex
    	end repeat
    	set oldDelims to AppleScript's text item delimiters
    	set AppleScript's text item delimiters to "ijkl"
    	return every text item of newContent
    end separateItemsIntoCategories
    try
    	set theCourses to {"Appetizers", "Breakfast", "Entrees", "Soups and Salads", "Desserts"}
    	set theCuisines to {"American", "Mediterranean", "Mexican", "Asian"}
    	set thefile to POSIX file "/Users/Montana/Desktop/recipes.txt"
    	set fref to (open for access thefile)
    	set theContent to read fref
    	close access fref
    	set contentByCourses to separateItemsIntoCategories(theContent, theCourses, ":")
    on error msg
    	log msg
    end try
    
    It's a huge block of code, but I figured I'd post everything that might be relevant.
    Here's the problem I'm having. After calling "separateItemsIntoCategories", AppleScript tells me it "can't get item 1 of {}." I can't figure out why there's a blank list. Nothing's wrong with the file, and nothing's wrong with the searchForSubstring handler, as I have tested that with different parameters.
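    One way to get exactly this error is when at least one name in theCourses never appears in the file followed by the colon (for example, a course with no recipes in recipes.txt). When the delimiter is absent, text items returns a one-item list, so the repeat loop in searchForSubstring runs from 1 to 0, never executes, and the handler returns {}. This is an assumption about the data, not a confirmed diagnosis. The sketch below reuses searchForSubstring unchanged on a hypothetical sample string and guards the item 1 access. The guard only illustrates the failure and does not fix the rest of separateItemsIntoCategories.

```applescript
-- searchForSubstring, copied unchanged from the post above.
on searchForSubstring(theString, theSubstring)
	try
		set oldDelims to AppleScript's text item delimiters
		set AppleScript's text item delimiters to theSubstring
		set itemsOfString to every text item of theString
		set indexes to {}
		set theIndex to 0
		repeat with X from 1 to ((count of itemsOfString) - 1)
			set theIndex to theIndex + (length of item X of itemsOfString) + 1
			copy theIndex to end of indexes
		end repeat
		set AppleScript's text item delimiters to oldDelims
		return indexes
	on error errMsg
		log errMsg
	end try
end searchForSubstring

-- Hypothetical sample text: "Desserts:" does not occur anywhere in it.
set sample to "Appetizers:" & linefeed & "crunchy frog" & linefeed
set found to searchForSubstring(sample, "Desserts:")

-- Splitting on an absent delimiter yields one text item, so found is {}.
if found is {} then
	log "Desserts: not found, so item 1 of found would fail"
else
	log item 1 of found
end if
```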
     
  2. macrumors 603

    Joined:
    Aug 9, 2009
    #2
    Post a test file of sample data for "recipes.txt". It should show the error, i.e. it should not parse successfully.

    I'd run the posted code here, but without data, or any comments describing what the data should look like, it's too time-consuming to invent sample data.
     
  3. thread starter macrumors 65816

    Joined:
    Mar 27, 2009
    #3
    The file looks something like this:

    Appetizers:

    (A bunch of recipes here)

    Breakfast:

    (Another set of recipes here)

    etc.

    The script should give me something like this:
    {"(All the appetizer recipes)","(All the breakfast recipes)",...}
     
  4. chown33, Jan 14, 2013
    Last edited: Jan 14, 2013

    macrumors 603

    Joined:
    Aug 9, 2009
    #4
    I did the following on OS versions 10.6.8 and 10.8.0. You didn't give your OS version, so your results may differ.


    I started by making this test data file:

    Filename: "vikings.txt"
    Code:
    # Before the first category.
    
    Appetizers:
    crunchy frog
    crackers and spam
    
    Breakfast:
    egg and spam
    egg, bacon, and spam
    spam, egg, spam, spam, bacon and spam
    
    Entrees:
    barbecued spam with pineapple and spam
    Lobster Thermidor aux crevettes and spam
    lutefisk
    
    Next, I changed the file's path in your script and ran it. It failed.

    I then tried to figure out what your code was doing, by adding log statements at step-wise points. The resulting program was this (see separateItemsIntoCategories in particular):
    Code:
    on searchForSubstring(theString, theSubstring)
    	try
    		set oldDelims to AppleScript's text item delimiters
    		set AppleScript's text item delimiters to theSubstring
    		set itemsOfString to every text item of theString
    		set indexes to {}
    		set theIndex to 0
    		repeat with X from 1 to ((count of itemsOfString) - 1)
    			set theIndex to theIndex + (length of item X of itemsOfString) + 1
    			copy theIndex to end of indexes
    		end repeat
    		set AppleScript's text item delimiters to oldDelims
    		return indexes
    	on error errMsg
    		log errMsg
    	end try
    end searchForSubstring
    
    to switchText from t to r instead of s
    	set d to text item delimiters
    	set text item delimiters to s
    	set t to t's text items
    	set text item delimiters to r
    	tell t to set t to item 1 & ({""} & rest)
    	set text item delimiters to d
    	t
    end switchText
    
    on separateItemsIntoCategories(theContent, theCategories, categoryID)
    	set newContent to theCategories
    	log (count of theCategories)
    	repeat with X from 1 to (count of theCategories)
    		log " ----- item " & X & " -----"
    		set categoryDelimiters to (item X of theCategories) & categoryID
    		log categoryDelimiters
    		set categoryIndex to item 1 of searchForSubstring(theContent, categoryDelimiters)
    		log categoryIndex
    		set newContent to switchText from newContent to "ijkl" instead of categoryIndex
    		log newContent
    	end repeat
    	--set oldDelims to AppleScript's text item delimiters
    	--set AppleScript's text item delimiters to "ijkl"
    	--return every text item of newContent
    end separateItemsIntoCategories
    
    try
    	set theCourses to {"Appetizers", "Breakfast", "Entrees", "Soups and Salads", "Desserts"}
    	set theCuisines to {"American", "Mediterranean", "Mexican", "Asian"}
    	
    	set myPath to "/Volumes/TWork/Trials/a-script/moon/vikings.txt"
    	
    	set thefile to POSIX file myPath
    	set fref to (open for access thefile)
    	set theContent to read fref
    	close access fref
    	
    	set contentByCourses to separateItemsIntoCategories(theContent, theCourses, ":")
    	
    on error msg
    	log msg
    end try
    
    When run, it generates a fair amount of output in the Event Log subview of AppleScript Editor.app. I won't paste it here, because you can generate the output yourself, after editing the myPath variable's pathname string to your actual file pathname.

    The logged output made no sense to me. It doesn't seem to have any relationship to what it should be doing, which is splitting text at delimiter words. Instead of trying to figure it out and make it work as given, I more or less started over from the requirements and the sample data file.


    Firstly, I don't understand why you're going to the trouble of writing parsing and separating handlers when AppleScript's builtin text item delimiters should be able to do the job.

    AppleScript's text item delimiters property has always been a list of delimiters. Before 10.6, only the first string in that list was used, meaning there could be only a single delimiter. As of 10.6, the full list is used. This means you can give it a complete list of delimiters, and text items will split the text at every occurrence of any item in that list.

    Your variable theCourses is already a list, and each string in that list is almost a delimiter. Simply append a colon to each string and the result is the desired delimiter. Put those in a list and boom.

    Based on this simple analysis of the requirements and data, I wrote this:
    Code:
    try
    	set theCourses to {"Appetizers", "Breakfast", "Entrees", "Soups and Salads", "Desserts"}
    	
    	-- Build filePath using Posix form; show its info.
    	set _home to POSIX path of (path to home folder)
    	set _path to "/TWork/Trials/a-script/moon/"
    	set filePath to POSIX file (_home & _path & "vikings.txt")
    	log filePath & " -- " & (info for filePath)
    	
    	-- Assemble delimiter list, giving each string a colon at end.
    	set delimList to {}
    	repeat with course in theCourses
    		set end of delimList to (course & ":")
    	end repeat
    	log delimList
    	
    	-- Read entire file at once into content.
    	set contentFile to (open for access filePath)
    	set content to read contentFile
    	close access contentFile
    	log content
    	
    	-- Split content into substrings.
    	set _was to AppleScript's text item delimiters
    	set AppleScript's text item delimiters to delimList
    	set contentList to text items of content
    	set AppleScript's text item delimiters to _was
    	
    	-- Log individual parts.
    	repeat with part in contentList
    		log part
    	end repeat
    	
    	-- Result is in contentList, a list of strings.
    	return contentList
    	
    on error msg
    	log msg
    	
    end try
    
    When run, this produces the desired output, with the text before the first delimiter ("Appetizers:") appearing in the first element of the resulting list.

    To eliminate the first item, assuming it's not wanted, you could write additional code to remove it, or make a new list containing every item except the 1st.
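    A minimal sketch of the second option, using a hypothetical list shaped like the result described above (the first element holds the text that preceded "Appetizers:"). AppleScript's rest property returns every item of a list except the first; an explicit range, items 2 thru -1, gives the same list here.

```applescript
-- Hypothetical result list: item 1 is the text that preceded "Appetizers:".
set contentList to {"# Before the first category.", "crunchy frog", "egg and spam"}

-- rest returns a new list containing every item except the first.
set withoutPreamble to rest of contentList

-- Equivalent here: an explicit range from the 2nd item to the last.
set alsoWithoutPreamble to items 2 thru -1 of contentList

log withoutPreamble
```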


    I am certainly no expert at AppleScript. "Somewhat competent" would be a better description. I simply applied basic programming principles to solving the problem, starting with basic AppleScript reference docs:
    https://developer.apple.com/library...ptlangguide/conceptual/ASLR_fundamentals.html

    That reference says that only the first element of text item delimiters is used, but I was pretty sure that's no longer true. I can't recall where I read that, nor in which OS version it changed, so I can't cite an article or URL for it [see EDIT below]. So to test whether it was true or not, I wrote this:
    Code:
    try
    	-- Test list-of-delims vs. single delim
    	set delims to {":", ",", ";"}
    	
    	set content to "abc: def, ghi; jkl; mno, pqr, stu: vwx: yz"
    	log "content -- " & content
    	
    	-- Split content into substrings.
    	set _was to AppleScript's text item delimiters
    	set AppleScript's text item delimiters to delims
    	set contentList to text items of content
    	set AppleScript's text item delimiters to _was
    	log contentList
    	
    	-- Log individual parts.
    	repeat with part in contentList
    		log part
    	end repeat
    	
    on error msg
    	log msg
    end try
    Since this produced the expected results, I was fairly sure I could apply the same approach to data from a file, using longer strings as delimiters.

    If it hadn't worked, I would have used a different approach, but that wasn't necessary so I won't expand on it further.


    In the course of researching and writing the above scripts, I came across Smile, which is like an extended version of AppleScript. I did not test it, because I have little interest in using AppleScript for solving problems like this. I'll just point you to it, since you said before that you like doing things like this in AppleScript.
    http://www.satimage.fr/software/en/smile/index.html

    Among other things, it appears to have better interactivity than AppleScript Editor, which may prove more useful than other features. Debugging by log statements alone is primitive, at best. At worst, it's completely impossible.
    http://www.satimage.fr/software/en/smile/interface/as_shell.html

    Smile also has XML parsing features, and a DOM document model. I mention this in regard to your earlier XML parsing adventures.


    Finally, you also previously wrote that you had web-dev experience, so you might look at JavaScript OSA:
    http://www.latenightsw.com/freeware/JavaScriptOSA/index.html

    It's basically JavaScript with the ability to send inter-process events (i.e. AppleEvents, i.e. AppleScript events). Since JavaScript has somewhat better split/join/search/replace capabilities when compared to AppleScript, you may find it more useful than reinventing all those foundational functions.

    Again, I have not tried this, because I have no interest in using JavaScript for these kinds of text-parsing tasks.


    If I were doing this, I might use 'awk' and/or bash, or a combination thereof. But since you already said you didn't want to learn another language, I mention this only as a potential future reference.


    EDIT

    I found a reference for the change to the text item delimiters list. See the 10.6 AppleScript Release Notes:
    http://developer.apple.com/library/mac/#releasenotes/AppleScript/RN-AppleScript/RN-10_6/RN-10_6.html

    Under the heading Other Enhancements:
    When getting the text items of a string, all the values in text item delimiters are considered. Previous versions only considered the first item in the list.
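    Because text item delimiters is a single global setting, the scripts in this post save it and restore it afterwards, but an error between the set and the restore would leave it changed. A minimal sketch of a helper that restores it on both the success and the error path, assuming 10.6 or later so that the whole delimiter list is honoured. The handler name splitAtAny is hypothetical.

```applescript
-- Hypothetical helper: split theText at every occurrence of any string
-- in theDelims (10.6+), restoring the previous delimiters on success or error.
on splitAtAny(theText, theDelims)
	set oldDelims to AppleScript's text item delimiters
	try
		set AppleScript's text item delimiters to theDelims
		set theItems to text items of theText
		set AppleScript's text item delimiters to oldDelims
		return theItems
	on error errMsg number errNum
		-- Restore first, then re-raise the original error unchanged.
		set AppleScript's text item delimiters to oldDelims
		error errMsg number errNum
	end try
end splitAtAny

set parts to splitAtAny("abc: def, ghi; jkl", {":", ",", ";"})
log parts
```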
     
  5. thread starter macrumors 65816

    Joined:
    Mar 27, 2009
    #5
    Well, just about every AppleScript page on the Internet is from like a few years ago. As soon as I finish this post, I'm going to make a new thread where members can come and correct outdated information from the Web.
     
  6. thread starter macrumors 65816

    Joined:
    Mar 27, 2009
    #6
    Hallelujah, it works! Thanks a lot!

    Just for reference, here's my new code:

    Code:
    
    on separateItemsIntoCategories(theContent, theCategories, categoryID)
    	set oldDelims to AppleScript's text item delimiters
    	set theDelims to {}
    	repeat with X from 1 to count of theCategories
    		set delim to (item X of theCategories) & categoryID
    		copy delim to end of theDelims
    	end repeat
    	set AppleScript's text item delimiters to theDelims
    	set categorizedContent to rest of (every text item of theContent)
    	set AppleScript's text item delimiters to oldDelims
    	return categorizedContent
    end separateItemsIntoCategories
    try
    	set theCourses to {"Appetizers", "Breakfast", "Entrees", "Soups and Salads", "Desserts"}
    	set theCuisines to {"American", "Mediterranean", "Mexican", "Asian"}
    	set thefile to POSIX file "/Users/Montana/Desktop/recipes.txt"
    	set fref to (open for access thefile)
    	set theContent to read fref
    	close access fref
    	set contentByCourses to separateItemsIntoCategories(theContent, theCourses, ":")
    	set item1 to item 1 of contentByCourses
    	display alert item1
    on error msg
    	log msg
    end try
    
    For those who haven't already figured it out, categoryID denotes a character that follows the category name, so the script knows when it should make a new item in the list.
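    A minimal usage sketch of the handler above, run on a hypothetical inline sample instead of recipes.txt. Two things follow from how text items works and are worth knowing: a category that never appears in the text simply produces no item, so the result can be shorter than theCategories; and the items follow the order of the headings in the text, not the order of theCategories. A category name followed by categoryID inside a recipe would also start a new item.

```applescript
-- Handler from the post above (needs 10.6+ so every delimiter in the list is used).
on separateItemsIntoCategories(theContent, theCategories, categoryID)
	set oldDelims to AppleScript's text item delimiters
	set theDelims to {}
	repeat with X from 1 to count of theCategories
		set delim to (item X of theCategories) & categoryID
		copy delim to end of theDelims
	end repeat
	set AppleScript's text item delimiters to theDelims
	set categorizedContent to rest of (every text item of theContent)
	set AppleScript's text item delimiters to oldDelims
	return categorizedContent
end separateItemsIntoCategories

-- Hypothetical sample text; note there is no "Desserts:" heading.
set sample to "Appetizers:" & linefeed & "crunchy frog" & linefeed
set sample to sample & "Breakfast:" & linefeed & "egg and spam" & linefeed
set parts to separateItemsIntoCategories(sample, {"Appetizers", "Breakfast", "Desserts"}, ":")

-- Two items, not three: the missing "Desserts:" heading contributes nothing.
log (count of parts)
```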
     
