#1 |
|
Unable to strip excess newline characters from a string or an array :/
Hi guys,
I have information I pulled from a webpage. I stripped the HTML from it, so now it is just a string that looks like this (it's long, so this is only part of it):
Code:
Beginner
Upper Mambo Alley
Yes
Yes
Lower Mambo Alley
Yes
Yes
Snow Drop - Beginner's Area
Yes
Yes
Code:
Upper Mambo Alley
Yes
Yes
Lower Mambo Alley
Yes
Yes
Snow Drop - Beginner's Area
Yes
Yes
1. I tried using NSScanner to scan for two consecutive newline characters. No luck.
2. I tried using stringByReplacingOccurrencesOfString:@"\n\n" withString:@"".
3. I tried reading the string into an array using NSArray *testContents = [strippedSiteData componentsSeparatedByString:@"\n"];, converting it to an NSMutableArray, and then comparing the contents and removing any array member that was a newline character.
Nothing seems to be working. I am guessing one of two things: either my comparison statement is wrong (where the code says "This doesn't work"), or it's something other than newline characters in these strings or in the array created from them. If anyone can give me a heads up on what is wrong, it would be greatly appreciated.
Here is my code:
Code:
#import <Foundation/Foundation.h>
//Function Prototypes
NSString *stripHTML(NSString *html);
NSString *removeRandomTags(NSString *html);
int main (int argc, const char * argv[])
{
@autoreleasepool {
//Create URL
NSURL *url = [NSURL URLWithString:@"http://www.blueknob.com/winter/conditions.php"];
//Request information from website
NSURLRequest *request = [NSURLRequest requestWithURL:url];
NSError *error = nil;
NSData *data = [NSURLConnection sendSynchronousRequest:request returningResponse:NULL error:&error];
//Check if data was read
if(!data)
{
NSLog(@"Request failed %@", [error localizedDescription]);
return 1;
}
//Convert NSData object to an NSString
NSString *siteData = [[NSString alloc] initWithData:data encoding:NSUTF8StringEncoding];
//Strip HTML tags and random HTML tags
NSString *strippedSiteData = [[NSString alloc] initWithString:stripHTML(siteData)];
strippedSiteData = removeRandomTags(strippedSiteData);
//Check output
NSLog(@"%@", strippedSiteData);
//Create an array based on data
NSArray *testContents = [strippedSiteData componentsSeparatedByString:@"\n"];
NSMutableArray *contents = [NSMutableArray arrayWithArray:testContents];
//Attempt to remove any objects that are only newline characters
for(int i = 0; i < [contents count]; ++i)
{
if([contents objectAtIndex:i] == @"\n") //This doesn't work
[contents removeObjectAtIndex:i];
}
//Print contents
for(NSString *s in contents)
{
NSLog(@"%@", s);
}
}
return 0;
}
NSString *stripHTML(NSString *html)
{
//Scan the string and strip out the HTML from it
NSScanner *scanner = [NSScanner scannerWithString:html];
NSString *text = nil;
while([scanner isAtEnd] == NO)
{
//Beginning of a tag
[scanner scanUpToString:@"<" intoString:nil];
//End of a tag
[scanner scanUpToString:@">" intoString:&text];
//Replace the found tag with a space
html = [html stringByReplacingOccurrencesOfString:[NSString stringWithFormat:@"%@>", text] withString:@" "];
}
return html;
}
NSString *removeRandomTags(NSString *html)
{
NSScanner *scanner = [NSScanner scannerWithString:html];
NSString *text = nil;
while([scanner isAtEnd] == NO)
{
//Beginning of a tag
[scanner scanUpToString:@"&" intoString:nil];
//End of a tag
[scanner scanUpToString:@";" intoString:&text];
//Replace the found tag with nothing
html = [html stringByReplacingOccurrencesOfString:[NSString stringWithFormat:@"%@;", text] withString:@""];
}
return html;
}
__________________
Macbook Air 13inch Ultimate
Hexcore MacPro 3.33ghz - 24 gigs ram - ATI 5870 - Dual 27inch ACD's |
#2 |
|
Are you sure they are \n characters? There are other return characters (\r for example).
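One way to check this is to dump the code points of every whitespace or newline character that actually occurs in the string. This is only a minimal sketch: `describeControlCharacters` is a hypothetical helper name, not something from the original code.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of the original code): returns the distinct
// whitespace/newline characters found in text, in order of first appearance,
// formatted as "U+XXXX" code points separated by spaces.
static NSString *describeControlCharacters(NSString *text)
{
    NSMutableOrderedSet *seen = [NSMutableOrderedSet orderedSet];
    NSCharacterSet *suspects = [NSCharacterSet whitespaceAndNewlineCharacterSet];

    for (NSUInteger i = 0; i < [text length]; i++) {
        unichar c = [text characterAtIndex:i];
        if ([suspects characterIsMember:c]) {
            [seen addObject:[NSString stringWithFormat:@"U+%04X", c]];
        }
    }
    return [[seen array] componentsJoinedByString:@" "];
}
```

Calling `NSLog(@"%@", describeControlCharacters(strippedSiteData));` would then show whether the string holds U+000A (\n), U+000D (\r), tabs, or something else.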
__________________
Sponsor me to cycle 100Km round London in the dark |
#3 | |
|
Quote:
I know there is a way, and I used to do it long ago in C and C++. I just haven't come across it yet (and I forget how to do it).
#4 |
|
You see this line:
Code:
if([contents objectAtIndex:i] == @"\n") //This doesn't work
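The rest of this post did not survive in the text here, so the following is a hedged reading of the problem rather than the poster's own words. In Objective-C, `==` on two NSString pointers compares addresses, not contents; `-isEqualToString:` compares contents. Separately, `-componentsSeparatedByString:@"\n"` never returns the separator itself as an element, so a blank line shows up as an empty (or whitespace-only) string, and removing elements while looping forward by index skips the element after each removal. A minimal sketch that sidesteps all three, with `nonBlankLines` as a hypothetical helper name:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of the original code): splits text on any
// newline character and returns only the lines that still contain something
// after surrounding spaces and tabs are trimmed. The trimmed form is stored.
static NSArray *nonBlankLines(NSString *text)
{
    NSCharacterSet *newlines = [NSCharacterSet newlineCharacterSet];
    NSCharacterSet *whitespace = [NSCharacterSet whitespaceCharacterSet];
    NSMutableArray *kept = [NSMutableArray array];

    for (NSString *line in [text componentsSeparatedByCharactersInSet:newlines]) {
        NSString *trimmed = [line stringByTrimmingCharactersInSet:whitespace];
        // Compare by content (length), never by pointer with ==.
        if ([trimmed length] > 0) {
            [kept addObject:trimmed];
        }
    }
    return kept;
}
```

Building a new array avoids mutating `contents` during iteration; `[nonBlankLines(strippedSiteData) componentsJoinedByString:@"\n"]` would give back a single string if that is what is needed.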
#5 |
|
https://developer.apple.com/library/...Reference.html
Code:
NSError *error = nil;
NSRegularExpression * regexp = [[NSRegularExpression alloc] initWithPattern: @"^\s+$" options: 0 error: &error ];
NSString * strippedandpurifiedsitedata = [regexp stringByReplacingMatchesInString: strippedSiteData
options: 0
range: NSMakeRange(0, [strippedSiteData length])
withTemplate: @""];
The regular expression might not be up to snuff, but that's as easy as it gets. BTW, you shouldn't capitalize any characters in your variable names, that only applies to methods following the guidelines.
__________________
"What you leave behind is not what is engraved in stone monuments, but what is woven into the lives of others." -- Pericles Last edited by KnightWRX; Feb 14, 2012 at 10:05 AM. Reason: Changed regular expression to use \s for whitespaces which includes carriage returns, line feeds and tabs and spaces |
#6 | ||
|
Quote:
Thanks for the clarification.
Quote:
I never even heard of NSRegularExpression, so I will take a look at it and see how to work with it.
#7 | ||
|
Ok, I think I got a good pattern down with "^\s+$"
That one should do it. Should being the key word. Regexps are powerful but a pain sometimes.
Last edited by KnightWRX; Feb 14, 2012 at 10:18 AM.
#8 | |
|
Quote:
I am still pretty confused reading through the documentation. Am I using this correctly? My string still looks the same :/
I was getting a warning from the compiler that \s was an unrecognized escape sequence (even though the page you referred me to shows it), so I figured it needed an extra backslash and added one. I don't know whether that screwed it up or not.
Here is my revamped code (I didn't get a chance to change the variable names yet):
Code:
#import <Foundation/Foundation.h>
//Function Prototypes
NSString *stripHTML(NSString *html);
NSString *removeRandomTags(NSString *html);
int main (int argc, const char * argv[])
{
@autoreleasepool {
//Create URL
NSURL *url = [NSURL URLWithString:@"http://www.blueknob.com/winter/conditions.php"];
//Request information from website
NSURLRequest *request = [NSURLRequest requestWithURL:url];
NSError *error = nil;
NSData *data = [NSURLConnection sendSynchronousRequest:request returningResponse:NULL error:&error];
//Check if data was read
if(!data)
{
NSLog(@"Request failed %@", [error localizedDescription]);
return 1;
}
//Convert NSData object to an NSString
NSString *siteData = [[NSString alloc] initWithData:data encoding:NSUTF8StringEncoding];
//Strip HTML tags and random HTML tags
NSString *strippedSiteData = [[NSString alloc] initWithString:stripHTML(siteData)];
strippedSiteData = removeRandomTags(strippedSiteData);
//Check output
NSLog(@"%@", strippedSiteData);
//Create an array based on data
NSArray *testContents = [strippedSiteData componentsSeparatedByString:@"\n"];
__unused NSMutableArray *contents = [NSMutableArray arrayWithArray:testContents];
NSRegularExpression *regexp = [[NSRegularExpression alloc] initWithPattern: @"^\\s+$" options: 0 error: &error ];
NSString * strippedandpurifiedsitedata = [regexp stringByReplacingMatchesInString: strippedSiteData
options: 0
range: NSMakeRange(0, [strippedSiteData length])
withTemplate: @""];
NSLog(@"%@", strippedandpurifiedsitedata);
}
return 0;
}
NSString *stripHTML(NSString *html)
{
//Scan the string and strip out the HTML from it
NSScanner *scanner = [NSScanner scannerWithString:html];
NSString *text = nil;
while([scanner isAtEnd] == NO)
{
//Beginning of a tag
[scanner scanUpToString:@"<" intoString:nil];
//End of a tag
[scanner scanUpToString:@">" intoString:&text];
//Replace the found tag with a space
html = [html stringByReplacingOccurrencesOfString:[NSString stringWithFormat:@"%@>", text] withString:@" "];
}
return html;
}
NSString *removeRandomTags(NSString *html)
{
NSScanner *scanner = [NSScanner scannerWithString:html];
NSString *text = nil;
while([scanner isAtEnd] == NO)
{
//Beginning of a tag
[scanner scanUpToString:@"&" intoString:nil];
//End of a tag
[scanner scanUpToString:@";" intoString:&text];
//Replace the found tag with nothing
html = [html stringByReplacingOccurrencesOfString:[NSString stringWithFormat:@"%@;", text] withString:@""];
}
return html;
}
#9 |
|
I know I used it recently in some code that I got working. Let me try to dig it up instead of just writing it by hand.
EDIT : found it, unfortunately, mine is used to validate that a string is a representation of a hex number (0xAFF2 for example) :
Code:
NSError * error;
NSRange range = { 0, [self length] };
NSRegularExpression * regex = [[NSRegularExpression alloc] initWithPattern:@"\\A0x[0-9a-f]+\\z" options: NSRegularExpressionCaseInsensitive error: &error];
if([regex numberOfMatchesInString: self options:0 range: range])
{
...
}
You could start with NSLog()ing the result from [regex numberOfMatchesInString:options:range:] though, that could give you a big clue to see if it's finding anything. Then once you're matching stuff, go with the replace function.
#10 | |
|
Quote:
Code:
NSString *hostName;
#11 | |
|
Quote:
https://developer.apple.com/library/...01281-BBCHBFAH
I used to capitalize the letters on the 2nd and subsequent words and had stopped because of something I had read. I guess it was wrong information.
#12 | |
|
Quote:
Code:
NSString * strippedandpurifiedsitedata;
Code:
NSString * strippedAndPurifiedSiteData;
NSString * stripped_and_purified_site_data;
Short names don't necessarily need the same rules. Example:
Code:
NSRegularExpression * regex;
Code:
NSRegularExpression * regularexpression;
NSRegularExpression * regularimpression;
NSRegularExpression * regularexpanssion;
#13 | |
|
Quote:
Code:
int main (int argc, const char * argv[])
{
@autoreleasepool {
NSError * error = nil;
NSString * stringData = [[NSString alloc] initWithContentsOfFile: @"/path/to/file/named/taggeddata.html" encoding: NSUTF8StringEncoding error: &error];
if(!stringData) // check the return value; error is only meaningful on failure
{
NSLog(@"%@", [error description]);
exit(EXIT_FAILURE);
}
NSString * strippedStringData = [[NSString alloc] initWithString: stripHTML(stringData)];
strippedStringData = removeRandomTags(strippedStringData);
NSLog(@"String contains : \n%@", strippedStringData);
NSRegularExpression * regexp = [[NSRegularExpression alloc] initWithPattern: @"^\\s+$" options: NSRegularExpressionAnchorsMatchLines error: &error];
NSLog(@"Found %lu matches", [[regexp matchesInString: strippedStringData options: 0 range: NSMakeRange(0, [strippedStringData length])] count]);
if([[regexp matchesInString: strippedStringData options: 0 range: NSMakeRange(0, [strippedStringData length])] count])
{
NSMutableString * strippedAndPurifiedString = [[NSMutableString alloc] initWithString: strippedStringData];
[regexp replaceMatchesInString: strippedAndPurifiedString options: 0 range: NSMakeRange(0, [strippedStringData length]) withTemplate: @""];
NSLog(@"New String contains :\n %@", strippedAndPurifiedString);
}
}
return EXIT_SUCCESS;
}
I love these things. So powerful, once you got them sorted out.
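The visible difference from the earlier attempt in this thread is the NSRegularExpressionAnchorsMatchLines option: without it, ^ and $ anchor only at the start and end of the whole string, so a blank line in the middle is never matched. A small sketch to see that for yourself, assuming a hypothetical helper named `blankLineMatches`:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of the original code): counts how many
// times the "^\s+$" pattern matches in text under the given options.
// The pattern is written with a doubled backslash because "\s" alone is
// not a valid escape sequence in an Objective-C string literal.
static NSUInteger blankLineMatches(NSString *text, NSRegularExpressionOptions options)
{
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"^\\s+$"
                                                  options:options
                                                    error:&error];
    if (!regex) {
        NSLog(@"Bad pattern: %@", [error localizedDescription]);
        return 0;
    }
    return [regex numberOfMatchesInString:text
                                  options:0
                                    range:NSMakeRange(0, [text length])];
}
```

Logging `blankLineMatches(strippedStringData, 0)` next to `blankLineMatches(strippedStringData, NSRegularExpressionAnchorsMatchLines)` is a quick way to confirm the pattern is finding something before moving on to the replace call.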
Last edited by KnightWRX; Feb 14, 2012 at 07:28 PM. Reason: fixed "withTemplate" argument and removed file path.
#14 |
|
Rewrote the code without stripHTML() and removeblahblahtags() :
Code:
int main (int argc, const char * argv[])
{
@autoreleasepool {
NSError * error = nil;
NSMutableString * stringData = [[NSMutableString alloc] initWithContentsOfFile: @"/path/to/taggeddata.html" encoding: NSUTF8StringEncoding error: &error];
NSUInteger matches;
if(!stringData) // check the return value; error is only meaningful on failure
{
NSLog(@"%@", [error description]);
exit(EXIT_FAILURE);
}
NSRegularExpression * regexp = [[NSRegularExpression alloc] initWithPattern: @"</*[a-z]+\\s*[a-z]*=*\"*[0-9a-z]*\"*>" options: NSRegularExpressionCaseInsensitive error: &error];
matches = [[regexp matchesInString: stringData options: 0 range: NSMakeRange(0, [stringData length])] count];
if(matches)
{
NSLog(@"Matched %lu tags, let's strip 'em", matches);
[regexp replaceMatchesInString: stringData options: 0 range: NSMakeRange(0, [stringData length]) withTemplate: @""];
}
regexp = [[NSRegularExpression alloc] initWithPattern: @"^\\s*$" options: NSRegularExpressionAnchorsMatchLines error: &error];
matches = [[regexp matchesInString: stringData options: 0 range: NSMakeRange(0, [stringData length])] count];
if(matches)
{
NSLog(@"Found %lu matches", matches);
[regexp replaceMatchesInString: stringData options: 0 range: NSMakeRange(0, [stringData length]) withTemplate: @""];
NSLog(@"New String contains :\n %@", stringData);
}
}
return EXIT_SUCCESS;
}
Code:
2012-02-14 20:40:24.163 regexptest[13592:707] Matched 26 tags, let's strip 'em
2012-02-14 20:40:24.167 regexptest[13592:707] Found 9 matches
2012-02-14 20:40:24.168 regexptest[13592:707] New String contains : Beginner Upper Mambo Alley Yes Yes Lower Mambo Alley Yes Yes Snow Drop - Beginner's Area Yes Yes
Program ended with exit code: 0
I basically replaced all your NSScanner stuff and multitude of strings with 1 NSMutableString and 2 regular expressions.
And people say Lion sucks.
EDIT : Works perfectly this morning, after removing some "over-thinking" and using NSMutableString's delete function rather than replacing with an empty string (which is more akin to what you want) :
Code:
#import <Foundation/Foundation.h>
void removeMatchesFromString(NSArray *, NSMutableString *);
int main (int argc, const char * argv[])
{
@autoreleasepool {
NSError * error = nil;
NSMutableString * stringData = [[NSMutableString alloc] initWithContentsOfFile: @"/path/to/taggeddata.html" encoding: NSUTF8StringEncoding error: &error];
NSArray * matches;
if(!stringData) // check the return value; error is only meaningful on failure
{
NSLog(@"%@", [error description]);
exit(EXIT_FAILURE);
}
NSRegularExpression * regexp = [[NSRegularExpression alloc] initWithPattern: @"<[\\w\\d\\s\"=/]+>" options: NSRegularExpressionCaseInsensitive error: &error];
matches = [regexp matchesInString: stringData options: 0 range: NSMakeRange(0, [stringData length])];
NSLog(@"Matched %lu tags, let's strip 'em", [matches count]);
removeMatchesFromString(matches, stringData);
NSLog(@"After HTML strip : %@", stringData);
regexp = [[NSRegularExpression alloc] initWithPattern: @"^\\s*$" options: NSRegularExpressionAnchorsMatchLines error: &error];
matches = [regexp matchesInString: stringData options: 0 range: NSMakeRange(0, [stringData length])];
NSLog(@"Matched %lu garbage, let's strip 'em", [matches count]);
removeMatchesFromString(matches, stringData);
NSLog(@"After purification : \n%@", stringData);
}
return EXIT_SUCCESS;
}
void removeMatchesFromString(NSArray * matches, NSMutableString * string)
{
NSInteger rangeoffset = 0;
NSRange range;
for(int i = 0; i < [matches count]; i++)
{
range = [[matches objectAtIndex: i] range];
range.location -= rangeoffset;
if(!range.length)
range.length++;
rangeoffset += range.length;
[string deleteCharactersInRange: range];
}
}
Code:
2012-02-15 06:33:34.323 regexptest[14110:707] After purification : Beginner Upper Mambo Alley Yes Yes Lower Mambo Alley Yes Yes Snow Drop - Beginner's Area Yes Yes
Program ended with exit code: 0
Last edited by KnightWRX; Feb 15, 2012 at 05:38 AM.
#15 |
|
I am wondering, if you use a scanner and a character set from +newlineCharacterSet, would that not easily capture any type of returns? Just scan each line (-scanUpToCharactersFromSet:intoString:), append the string into a mutable string, then -scanCharactersFromSet:intoString: into nil to get to the next line.
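A rough sketch of that scanner idea, in case it helps. `collapseBlankLines` is a hypothetical name, and the scanner's default skipping is turned off so the newline handling is explicit:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of the original code): copies each line of
// text into a new string, treating any run of newline characters
// (\n, \r, \r\n, ...) as a single line break.
static NSString *collapseBlankLines(NSString *text)
{
    NSCharacterSet *newlines = [NSCharacterSet newlineCharacterSet];
    NSScanner *scanner = [NSScanner scannerWithString:text];
    // By default a scanner skips whitespace and newlines on its own;
    // disable that so the newline runs are consumed explicitly below.
    [scanner setCharactersToBeSkipped:nil];

    NSMutableString *result = [NSMutableString string];
    while (![scanner isAtEnd]) {
        NSString *line = nil;
        // Grab everything up to the next newline character, if anything.
        if ([scanner scanUpToCharactersFromSet:newlines intoString:&line]) {
            [result appendString:line];
            [result appendString:@"\n"];
        }
        // Swallow the whole run of newline characters that follows.
        [scanner scanCharactersFromSet:newlines intoString:NULL];
    }
    return result;
}
```

Note that a line holding only spaces or tabs would still be kept by this version; trimming each `line` before appending would be needed to drop those too.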
__________________
Mr. Paul, sir, I thought you should be advised, there seems to be a zombie tribble clinging to your head, for it is scarfing your brain