
Rower_CPU

Moderator emeritus
Original poster
Oct 5, 2001
11,219
3
San Diego, CA
OK, I'm trying to do something that I thought would be simple, but apparently isn't.

I'm checking user input via a form to make sure that there are only English characters (no numbers, punctuation, foreign characters or accented Latin). So far everything I've tried just doesn't work.

I think it basically comes down to me not being familiar enough with regex, but I could be tripping up on some obscure thing that includes accented Latin in the typical "a-z" or ":alpha:" ranges.

Any help is much appreciated.
 
Just out of curiosity, what is "eregi"? I assume it's a PHP script or something. What does it do?
 
hotwire-
eregi is a function for pattern matching. eregi manual

zimv20-
I haven't gone that route yet - I try to avoid explicitly stating values like that. If no one else has a suggestion, that'll be my next attempt.
 
Looking at the eregi manual you supplied and tracking down a man page for POSIX regex (it's a bit dense...), I think that I have a regex that will match all non-English characters, leaving only the 52 upper- and lowercase "regular" characters.

Try using "[^a-z]" (the double quotes are not part of the expression itself). The caret (^) inside the brackets should tell eregi to match any character that is NOT in the following list (a-z). You don't need A-Z since you are using the case-insensitive version of ereg; in other words, the expression above behaves like "[^a-zA-Z]" in strict regex terms. You can then use the result of eregi for your filtering: if it gets a match, then an "illegal" character has been entered. The entity in "[]" stands for a single character, and should trigger a match if such a character exists anywhere in the input string. Note that "^[a-z]", with the caret outside the brackets, is a different pattern: there the caret anchors the match to the start of the string, so it only checks that the first character is a letter and would not catch bad characters later on.

Note: I don't do PHP. I can cobble together a regex if needed, but I'm no expert - so what I gave you has a chance of not working, but the only thing to do is try it. :) I hope it works, but if not I can give it another go, or maybe what's above will be enough to get you started.

Edit: [a-z] should not match any accented characters as far as I know (but this should be easy to double check by testing your code), but [:alpha:] may, depending on the system locale.
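For anyone trying this on a current PHP install: eregi was removed in PHP 7, so here is a minimal sketch of the same check using preg_match with the i flag instead. This is a swapped-in function, not the eregi call discussed above, and the helper name hasIllegalChars is made up for illustration. The last line shows what the caret does when it sits outside the brackets.

```php
<?php
// Hypothetical helper; the name is not from the thread.
// preg_match() with the "i" flag stands in for the removed eregi().
function hasIllegalChars(string $input): bool
{
    // [^a-z] plus the i flag matches any single character outside a-z / A-Z.
    // preg_match() returns 0 only when nothing matched, so a match (1) or an
    // error (false) both count as "illegal" here.
    return preg_match('/[^a-z]/i', $input) !== 0;
}

var_dump(hasIllegalChars('Hello'));   // bool(false)
var_dump(hasIllegalChars('Hello1'));  // bool(true), digit
var_dump(hasIllegalChars('café'));    // bool(true), accented character

// Caret outside the brackets: only the first character is checked,
// so this still matches even though "1" and "!" follow the letter.
var_dump(preg_match('/^[a-z]/i', 'a1!') === 1); // bool(true)
```

One thing to watch: an empty string contains no illegal characters, so it passes this check. If the form field is required, that needs a separate test.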
 
Ah, thanks Eminence. I've been using !eregi and then trying to match the "bad stuff", but this is much simpler. Too many double-negatives and I was getting all turned around.

I added a space ("[^a-z ]") to allow spaces between words in the string and I'm set.
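A rough sketch of the version with the space added, again using preg_match in place of eregi (which no longer exists in PHP 7 and later). Both function names are hypothetical. The second function is an alternative whitelist form that was not part of the thread; it is included only to show how the "only these characters" rule could be stated without a negation.

```php
<?php
// Hypothetical helper names; preg_match() stands in for the removed eregi().
function isLettersAndSpaces(string $input): bool
{
    // Valid when no character outside a-z, A-Z, or space appears.
    // An empty string also passes this check.
    return preg_match('/[^a-z ]/i', $input) === 0;
}

function isLettersAndSpacesStrict(string $input): bool
{
    // Whitelist form: one or more allowed characters from the start (\A)
    // to the very end (\z) of the string, so empty input is rejected.
    return preg_match('/\A[a-z ]+\z/i', $input) === 1;
}

var_dump(isLettersAndSpaces('San Diego'));      // bool(true)
var_dump(isLettersAndSpaces('San Diego, CA'));  // bool(false), comma
var_dump(isLettersAndSpacesStrict(''));         // bool(false)
```

The negated form is closer to what was suggested above; the whitelist form is handy if empty input should fail too.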

One fun thing this mess produced was a form for testing it out so that I wasn't bombarding my database with inserts as I tested my regex. Here's the page:
http://larcx.sdsu.edu/dma/contributor/test/regexp.php
 
Excellent! Glad I could help. Good catch with the space, I'd forgotten about that one.
 