[java] replacing all non alpha chars in a String with no char

Discussion in 'Mac Programming' started by Mac Player, Dec 5, 2006.

  1. Mac Player macrumors regular

    Joined:
    Jan 19, 2006
    #1
    For example :

    amajjjfgy, torsrfewf

    to


    amajjjfgytorsrfewf

    TIA
     
  2. gekko513 macrumors 603

    gekko513

    Joined:
    Oct 16, 2003
    #2
    There are many ways to do it. They'll vary in simplicity, flexibility, correctness and performance. Here's a straightforward way that also works with non-English letters. If you only want [A-Za-z] you'll have to do something different.
    Code:
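        // Returns a copy of str containing only the chars for which Character.isLetter is true.
        public static String stripNonAlphaCharsFromString(String str) {
            StringBuffer sb = new StringBuffer();
            for (int i = 0; i < str.length(); i++) {
                char ch = str.charAt(i);
                // isLetter also accepts non-English letters such as accented ones
                if (Character.isLetter(ch)) {
                    sb.append(ch);
                }
            }
            return sb.toString();
        }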
    
     
  3. robbieduncan Moderator emeritus

    robbieduncan

    Joined:
    Jul 24, 2002
    Location:
    London
    #3
    If this is homework or an assignment for a class etc it's probably a good idea to come clean now...

    As a starter, you can loop over the characters in the string, using charAt to get the character at each index, then test it with either Character.isLetter or Character.isLetterOrDigit (depending on whether you want to allow digits). Use a StringBuffer to build the new result string and convert it to a String at the end.

    Edit: Doh! Beaten by Gekko who also gave you the solution so you don't even need to learn!
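
The loop described in this post can be sketched as follows. This is only an illustration: the class name, method name and the keepDigits flag are made up here, and StringBuilder is used in place of StringBuffer since no thread safety is needed. Only the Character and StringBuilder calls are standard API.

```java
// Hypothetical helper; not code from the thread.
class LetterFilter {
    // Keeps letters only, or letters and digits when keepDigits is true.
    static String filter(String str, boolean keepDigits) {
        StringBuilder sb = new StringBuilder(str.length());
        for (int i = 0; i < str.length(); i++) {
            char ch = str.charAt(i);
            boolean keep = keepDigits
                    ? Character.isLetterOrDigit(ch)
                    : Character.isLetter(ch);
            if (keep) {
                sb.append(ch);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(filter("abc 123, def", false)); // abcdef
        System.out.println(filter("abc 123, def", true));  // abc123def
    }
}
```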
     
  4. Mac Player thread starter macrumors regular

    Joined:
    Jan 19, 2006
    #4

    I tried a similar solution using two loops: one to select a char from the string and another to compare the char with all the 24 possible chars. It doesn't look very pretty or efficient; that's why I asked here.

    New solution: since I'll put all chars in the string to lower case, I can just convert each char in the string to an int and check if it is between 97 and 122 (inclusive).
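
A minimal sketch of the range check described in this post, assuming the string is lowercased first. The class and method names are invented for illustration; 97 and 122 are the code values of 'a' and 'z'. Locale.ROOT is an added assumption so that lowercasing does not depend on the machine's default locale.

```java
import java.util.Locale;

// Hypothetical helper; not code from the thread.
class AsciiLowercaseFilter {
    static String keepAsciiLowercase(String str) {
        String lower = str.toLowerCase(Locale.ROOT);
        StringBuilder sb = new StringBuilder(lower.length());
        for (int i = 0; i < lower.length(); i++) {
            int code = lower.charAt(i);      // char widens to int
            if (code >= 97 && code <= 122) { // 'a' .. 'z'
                sb.append((char) code);
            }
        }
        return sb.toString();
    }
}
```

With this check, any character outside a to z is dropped, including digits, punctuation and accented letters.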
     
  5. plinden macrumors 68040

    plinden

    Joined:
    Apr 8, 2004
    #5
    Not really relevant here, but you could remember it in the future.

    When manipulating characters from Strings in Java, if the String is longer than about 300 characters, getting an array using toCharArray and iterating over the array is faster than using charAt. charAt is slightly faster than toCharArray with shorter Strings. (That's since JDK1.4 - with earlier versions the limit was about 15 characters)
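
For reference, the toCharArray variant mentioned here might look like the sketch below. The class and method names are made up, and the sketch makes no claim about where the speed crossover lies; that would need measuring on the JVM in question.

```java
// Hypothetical helper; not code from the thread.
class CharArrayFilter {
    static String stripNonLetters(String str) {
        char[] chars = str.toCharArray(); // copies the string's chars once
        StringBuilder sb = new StringBuilder(chars.length);
        for (char ch : chars) {           // iterate the array instead of calling charAt
            if (Character.isLetter(ch)) {
                sb.append(ch);
            }
        }
        return sb.toString();
    }
}
```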
     
  6. gekko513 macrumors 603

    gekko513

    Joined:
    Oct 16, 2003
    #6
    If (s)he's old enough to learn programming I figure (s)he's old enough to take responsibility for his/her own learning.
     
  7. Mac Player thread starter macrumors regular

    Joined:
    Jan 19, 2006
    #7
    The average string length here is about 50 chars, so I don't think it will be a huge hit in performance. But thanks for the tip
    :)
     
  8. robbieduncan Moderator emeritus

    robbieduncan

    Joined:
    Jul 24, 2002
    Location:
    London
    #8
    Don't do that! You are assuming that the string is only using English characters in ASCII or one of the Unicode encodings that is ASCII compatible. The isLetter method should allow for accented letters and so on and is the "correct" way of doing this...
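
A small sketch of the difference, using an accented letter. The class and method names are hypothetical; Character.isLetter is the standard method referred to in this post.

```java
// Hypothetical comparison; not code from the thread.
class LetterCheckComparison {
    // The range check from the earlier post: true only for 'a' .. 'z'.
    static boolean inAsciiLowercaseRange(char ch) {
        return ch >= 97 && ch <= 122;
    }

    public static void main(String[] args) {
        char accented = '\u00e9'; // e with acute accent
        System.out.println(inAsciiLowercaseRange(accented)); // false
        System.out.println(Character.isLetter(accented));    // true
    }
}
```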
     
  9. robbieduncan Moderator emeritus

    robbieduncan

    Joined:
    Jul 24, 2002
    Location:
    London
    #9
    Fair enough. My parents are both teachers and I think I must have inherited the annoying teacher's habit of giving just enough information that I think you should be able to solve the problem, without actually solving the problem for you :D
     
  10. Mac Player thread starter macrumors regular

    Joined:
    Jan 19, 2006
    #10


    I just need English characters, so don't worry. :)
     
  11. jeremy.king macrumors 603

    jeremy.king

    Joined:
    Jul 23, 2002
    Location:
    Fuquay Varina, NC
    #11
    Alternatively, you can use the String class's replaceAll method with a regular expression. Obviously, it's more restrictive than gekko513's solution, but simpler.

    Code:
        String someString = "abc jalskdf ,.m/;ppodif";
        String newString = someString.replaceAll("[^A-Za-z]", "");
    
     
  12. Mac Player thread starter macrumors regular

    Joined:
    Jan 19, 2006
    #12
    Thanks a lot, I was looking for something like that. I'll just check to see how the "[^A-Za-z]" thing works.


    Anyway, my code doesn't work :confused:
    When it reaches the "." it replaces the whole string :eek: :
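
The code this post refers to is not shown, so the following is only a guess at one common cause of that symptom: passing "." to replaceAll, which treats its first argument as a regular expression in which an unescaped dot matches any character. The class and method names are made up for illustration.

```java
// Hypothetical illustration; the poster's actual code is unknown.
class DotPitfall {
    // "." as a regex matches every character, so everything is removed.
    static String regexDot(String s) {
        return s.replaceAll(".", "");
    }

    // Escaping the dot makes the regex match a literal '.' only.
    static String escapedDot(String s) {
        return s.replaceAll("\\.", "");
    }

    // String.replace takes literal text, not a regex.
    static String literalDot(String s) {
        return s.replace(".", "");
    }
}
```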
     
  13. jeremy.king macrumors 603

    jeremy.king

    Joined:
    Jul 23, 2002
    Location:
    Fuquay Varina, NC
    #14
    This is known as a regular expression and has many rules with respect to the syntax.

    The following is all you will need, no looping or tests for charAt or ASCII indexes - it's this simple.

    Code:
    public class TestString {
    
      public static void main(String[] args) {
        String someString = "Test  -- this, *&%you _(@*#$(*stinky beo!!!tch!";
        
        System.out.println(someString.replaceAll("[^A-Za-z]", ""));
      }
    }
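
To unpack the pattern used above: the square brackets form a character class, A-Z and a-z are ranges, and a leading ^ inside the brackets negates the class. The sketch below varies the pattern to show the effect; the class and method names are hypothetical.

```java
// Hypothetical illustration of the character-class syntax.
class PatternParts {
    static String removeMatches(String input, String regex) {
        return input.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        String s = "Abc, 123 xyz!";
        // Remove everything that is NOT an ASCII letter.
        System.out.println(removeMatches(s, "[^A-Za-z]"));    // Abcxyz
        // Without ^ the class matches the letters themselves.
        System.out.println(removeMatches(s, "[A-Za-z]"));     // , 123 !
        // Adding 0-9 to the negated class keeps digits too.
        System.out.println(removeMatches(s, "[^A-Za-z0-9]")); // Abc123xyz
    }
}
```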
    
     
  14. gnasher729 macrumors P6

    gnasher729

    Joined:
    Nov 25, 2005
    #15
    I think your attitude is extraordinarily naïve.
     
  15. Mac Player thread starter macrumors regular

    Joined:
    Jan 19, 2006
    #16
    "Don't use a cannon to kill a bee"

    You don't even know what the program is supposed to do, so why argue with me?
     
  16. gnasher729 macrumors P6

    gnasher729

    Joined:
    Nov 25, 2005
    #17
    You see, it only took you about twenty seconds of your time to convince me that I would never want you working as a software developer anywhere near me.

    And if your code only has to handle English characters, then it certainly doesn't. There are more characters in English than 24 as you said.
     
  17. gekko513 macrumors 603

    gekko513

    Joined:
    Oct 16, 2003
    #18
    That's right. You can also make that approach work for non-English letters by doing this...
    Code:
    someString.replaceAll("[^\\p{L}]", "");
    The code is shorter, but it requires more knowledge about regex patterns (see java.util.regex.Pattern) and the Unicode standard.
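
A short side-by-side sketch of the two patterns on a string with accented letters. The class and method names are made up; the accented characters are written as Unicode escapes so the source does not depend on file encoding.

```java
// Hypothetical comparison of the two patterns from the thread.
class UnicodeLetterPattern {
    static String asciiLettersOnly(String s) {
        return s.replaceAll("[^A-Za-z]", "");
    }

    // \p{L} matches any character in the Unicode "letter" category.
    static String anyLetters(String s) {
        return s.replaceAll("[^\\p{L}]", "");
    }

    public static void main(String[] args) {
        String s = "d\u00e9j\u00e0 vu, 2006";
        System.out.println(asciiLettersOnly(s)); // djvu
        System.out.println(anyLetters(s));       // the accented letters survive
    }
}
```

The pattern "\\P{L}" (capital P) is an equivalent way of writing "[^\\p{L}]".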
     
  18. jeremy.king macrumors 603

    jeremy.king

    Joined:
    Jul 23, 2002
    Location:
    Fuquay Varina, NC
    #19
    It took you only 20 seconds to convince me that you probably aren't a very good person to work with in the first place. A know-it-all who's brash and insulting. Wow! :rolleyes:

    Exactly why did you bother posting here?
     
  19. HiRez macrumors 603

    HiRez

    Joined:
    Jan 6, 2004
    Location:
    Western US
    #20
    My feeling is that if you cheat by copying someone else's code, while it may help you out on a single assignment, it's going to come back to bite you eventually. You're going to need to understand the concepts behind the code and if you don't you'll be in big trouble.
     
  20. gekko513 macrumors 603

    gekko513

    Joined:
    Oct 16, 2003
    #21
    You're right, so if you do copy code, you must make sure you understand what it's doing and why it's done that way. That's what I mean by taking responsibility for your own learning.

    Most of my own learning is done by starting out with code samples and tutorials that I find on the web and figuring out what they're doing and why. Once I get the basic picture I'm able to utilise the language and API I'm working with to do what I want.
     
  21. gnasher729 macrumors P6

    gnasher729

    Joined:
    Nov 25, 2005
    #22
    Let's summarize: A perfectly good solution was posted. The original poster wants to go with a far more complicated and inferior solution. He doesn't care that it only works for English letters. That's what he says. If you read my first post carefully you would have noticed a hint that it doesn't work for English letters either (apart from the outright stupid remark about 24 English letters - A to Z are 26 letters, and apart from uppercase letters, there are a few more letters used in the English language).

    And since this was a homework question, the best that can happen is that he fails the assignment and learns from the experience.
     
  22. MarkCollette macrumors 68000

    MarkCollette

    Joined:
    Mar 6, 2003
    Location:
    Calgary, Canada
    #23
    That's what I like about these forums. I would have only thought about using the StringBuffer and Character APIs, and here's a reminder of the Regex APIs.

    To the original poster: even if you don't use a Regex approach, I highly recommend you learn it regardless. It's kind of like learning SQL: an invaluable language that is relevant no matter what programming you end up doing.


    Mac Player, I can understand that your assignment works within certain bounds, and you might not feel it necessary to take a more complicated approach, but you're missing out on a few things.

    • A String object contains a char[]. char primitives are Unicode characters, encapsulating more than just 7 bit ASCII values. Part of the reason for using Java is that it solves issues like the ASCII to Unicode migration that various developers have had to deal with over the past decade. You can choose to spend the time breaking this advantage, hobbling your program artificially. I'm not sure why you would do that.
    • "English" does have accents in it. Many words, taken from other languages, and adopted into English, use accents. For example déjà vu.
    • You have to ask yourself: did your assignment mean to strip out characters that are not A-Z or a-z, or to strip out punctuation and numbers? There's the wording in the assignment spec, and then there's the intent of the teacher. Experience should teach you that when you hit ambiguities like this, the correct answer is not algorithm A or B, but rather to ask more questions of whoever told you to write the program.
    • plinden was talking about optimisations. A general rule of optimisations, that I hold dear, is never do it if you don't even understand or know the unoptimised form. An optimisation is a short-cut, but how can you know if a short-cut is valid if you don't know anything about the main road?
    • Related to the last point, you should probably focus, at this stage of learning, on discovering the java.lang.Character methods that are available to you. You can easily download the Java source and see what they're doing, and compare that code to yours, if you insist on rolling your own solution AKA reinventing the wheel.
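
One detail worth adding to the first point in the list above: a Java char is a 16-bit UTF-16 code unit, so characters outside the Basic Multilingual Plane occupy two chars (a surrogate pair), and Character.isLetter(char) returns false for each half. The sketch below, which assumes Java 8 or later for String.codePoints(), filters by code point instead. The class and method names are hypothetical.

```java
// Hypothetical code-point-aware variant; requires Java 8+.
class CodePointLetterFilter {
    static String stripNonLetters(String str) {
        StringBuilder sb = new StringBuilder(str.length());
        str.codePoints()                      // whole code points, not single chars
           .filter(Character::isLetter)       // resolves to Character.isLetter(int)
           .forEach(sb::appendCodePoint);     // re-encodes surrogate pairs as needed
        return sb.toString();
    }
}
```

For text limited to the Basic Multilingual Plane this behaves the same as a char-by-char loop with Character.isLetter.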
     
  23. ChrisA macrumors G4

    Joined:
    Jan 5, 2006
    Location:
    Redondo Beach, California
    #24
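    I'll write it in C as compactly as possible; you can translate it to whatever you like.

    /* needs <ctype.h> and <string.h> */
    char s[] = "amajjjfgy, torsrfew";
    size_t i = 0;
    while (s[i] != '\0') {
        if (!isalpha((unsigned char)s[i]))
            /* drop s[i]; memmove, unlike strcpy, allows overlapping ranges */
            memmove(&s[i], &s[i + 1], strlen(&s[i + 1]) + 1);
        else
            i++;
    }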

    Yes I actually do use code like the above in real life. It has uses for example to remove prohibited characters from strings before using them to build an SQL query.
     
  24. bousozoku Moderator emeritus

    Joined:
    Jun 25, 2002
    Location:
    Gone but not forgotten.
    #25
    I usually try to point someone to documents or give them enough information so that they can follow it to what should be its natural end. It may be expedient to give an answer but there is no lesson learned.

    You're right, of course, and the lesson becomes somewhat more bitter when it's learnt too late.
     
