Batch-formatting phone numbers in vCard file?

Discussion in 'Mac Programming' started by exi, Sep 11, 2016.

  1. exi macrumors 6502

    Joined: Oct 16, 2012
    #1
    Posting here looking for the more technical folks who might know how to do this.

    Have hundreds of contacts. Looking to transition away from Apple Contacts.

    Exporting vCard, version 3.0.

    Imported into any other software, some phone numbers are formatted (123) 456-7890; others, 1234567890.

    Opening the vCard file in TextEdit shows the same formatting for the same people, of course.

    Question is: how could I run through the vCard file and somehow automatically apply formatting such that every ten-digit string of numbers is formatted as (123) 456-7890?

    There are far too many to do this manually.
     
  2. Weaselboy Moderator, Staff Member

    Joined: Jan 23, 2005
    Location: California
    #2
    What I would do is convert the vcard file to a comma separated value (CSV) file. Then open the CSV file in a spreadsheet application and do the search and replace you want and save. Then convert the CSV back to vcard and import it to your new app.

    If you Google "convert vcard to csv" you can find apps and even web sites that will convert for you. Sorry I don't have a specific recommendation since I have not used any of the apps.
     
  3. exi thread starter macrumors 6502

    Joined: Oct 16, 2012
    #3
    That sounds great, but the search and replace would require manual entry of hundreds of phone numbers. Is there a way to search for ten-digit strings and then add formatting?
     
  4. Weaselboy Moderator, Staff Member

    Joined: Jan 23, 2005
    Location: California
    #4
    It would depend on the spreadsheet app of course, but yes, that is what I was thinking.
     
  5. chown33 macrumors 604

    Joined: Aug 9, 2009
    #5
    An 'awk' script might work. Or perl, which I don't know.

    Awk can be told to match a pattern. The pattern is given as a regular expression, and "10 digits" is pretty simple as regexes go.
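    To make "pretty simple" concrete, here is a minimal sketch using `grep -E` rather than awk (a swapped-in tool for illustration only; the function name `has_ten_digit_run` is hypothetical). A plain `[0-9]{10}` also matches inside longer runs such as an 11-digit number, so this version requires a non-digit, or the start or end of the string, on each side of the 10 digits.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): succeeds if the string in $1
# contains a run of exactly 10 digits, i.e. one with a non-digit or the
# start/end of the string on each side.
has_ten_digit_run() {
  printf '%s\n' "$1" | grep -Eq '(^|[^0-9])[0-9]{10}([^0-9]|$)'
}

# Usage: only the unformatted number prints "yes".
for value in '1234567890' '(123) 456-7890' '11234567890'; do
  if has_ten_digit_run "$value"; then
    echo "yes: $value"
  else
    echo "no:  $value"
  fi
done
```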

    If you're willing to post some sample data, I can take a shot at an 'awk' script.

    The data should be vCard text. If it's really long, post it as an attachment; otherwise, up to ~50 lines can go inline in a post if pasted within CODE tags. The CODE tags don't have an intrinsic length limit AFAIK; it's just a pain to copy and paste when a file download is an option.
     
  6. exi thread starter macrumors 6502

    Joined: Oct 16, 2012
    #6
    Thanks, both of you.

    @chown33: thanks for the offer. Have only cursory familiarity with regex but not with awk or any real implementation of such things.

    Here's some sample data with placeholder text by me, of course. It just repeats in one big wall of text, as you know, for following contacts -- and with additional information where relevant (type=WORK and whatnot).

    The actual vCard file in question has many hundreds of entries. Happy to do whatever I can to help make sense out of it.

    Code:
    BEGIN:VCARD
    VERSION:3.0
    PRODID:-//Apple Inc.//Mac OS X 10.11.6//EN
    N:lastname;firstname;;;
    FN:firstname lastname
    EMAIL;type=INTERNET;type=HOME;type=pref:email@domain.com
    TEL;type=CELL;type=VOICE;type=pref:(123) 456-7890
    ADR;type=HOME;type=pref:;;street;city;two-letter state;zip;country
    BDAY:yyyy-mm-dd
    X-ABOrder:FIRST
    CATEGORIES:category
    UID:[string]
    X-ABUID:[string]:ABPerson
    END:VCARD
    
    Some contact entries in the vCard file read as such:
    Code:
    ...
    TEL;type=CELL;type=VOICE;type=pref:1234567890
    ...
    
    Of course, I'm trying to make the number formatting consistent.
     
  7. chown33 macrumors 604, Sep 13, 2016 (last edited Sep 13, 2016)

    Joined: Aug 9, 2009
    #7
    Thanks for posting the example vcard data. It clarifies exactly what output format you want. I'll post the conversion commands a little later.

    First, I'm wondering how many entries this will need to deal with, and whether it might miss anything simply because you're not aware of them. So please do the following and post the results.

    1. Export all the Contacts records to "test.vcf" stored on your Desktop.

    2. Paste the following commands into a Terminal window, exactly as given:
    Code:
    grep 'TEL' ~/Desktop/test.vcf | grep -E -c '[0-9]{5,}'
    grep 'TEL' ~/Desktop/test.vcf | grep -E -c '[0-9]{10}' 
    3. The output should be two numbers. They should be the same. Please post them.
    --- Post Merged, Sep 13, 2016 ---

    Below is the 'awk' conversion script, and instructions on how to use it.

    1. Paste the following into a plain text file:
    Code:
    #!/usr/bin/awk -f
    
    ## Input is a vcard file or stream.
    ##
    ## Output on stdout is the input with all 10-digit TEL numbers
    ## converted to: (nnn) nnn-nnnn
    ##
    ## Only lines whose 1st field contains "TEL" are converted.
    ## Only the last field on a TEL line will be converted.
    ## Only numbers with exactly 10 digits are converted.
    ## All other lines, values, and numbers are copied verbatim.
    
    
    ## Set delimiters for breaking lines into fields and
    ## building back into lines to be a semicolon.
    BEGIN  {
      FS=";"
      OFS=";"
    }
    
    
    ## Only convert lines whose 1st field contains "TEL".
    $1 ~ /TEL/  {
      ## Only where last field has a run of exactly 10 digits.
      if ( match( $NF, /[0-9]{10}/ ) )  {
        prefix = substr( $NF, 1, RSTART - 1 )
        numstr = substr( $NF, RSTART, RLENGTH )
        suffix = substr( $NF, RSTART + RLENGTH )
    
        ## A longer run (e.g. 11 digits) also contains 10 digits in a row,
        ## so skip it if a digit sits right before or after the match.
        if ( prefix !~ /[0-9]$/ && suffix !~ /^[0-9]/ )  {
          num_A = substr( numstr, 1, 3 )
          num_B = substr( numstr, 4, 3 )
          num_C = substr( numstr, 7 )
    
          result = "(" num_A ") " num_B "-" num_C
          $NF = prefix result suffix
        }
      }
    
      print $0
      next
    }
    
    
    ## Any lines not matching a pattern above will print the line verbatim.
    {  print $0;  }
    
    2. Save this plain text file on your Desktop as "tele-10.txt".

    3. In Contacts.app, export all your data as vCard, storing it in a file on your Desktop named "all.vcf".

    4. In a Terminal window, paste this exact command line:
    Code:
    awk -f ~/Desktop/tele-10.txt ~/Desktop/all.vcf >~/Desktop/new.vcf
    5. If the command runs with no error messages in the Terminal window, the output should now be in "new.vcf" on your Desktop.

    6. You can drag "new.vcf" onto TextEdit.app and it will open it as a text file. Find one of the entries that you knew to be a 10-digit number, and confirm that the TEL line is now formatted as desired.

    7. Use "new.vcf" as your new full list of contacts.


    If there are any error messages from the command in #4, copy and paste the complete exact message and post it here.
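    Before running step 4 on the real export, the same kind of logic can be tried on a few made-up lines. A minimal sketch, assuming a POSIX shell; `convert_tel` is a hypothetical name, and the program is a condensed variant of tele-10.txt (it also skips digit runs longer than 10) inlined so the check is self-contained. Ten explicit `[0-9]` brackets stand in for `[0-9]{10}` in case a given awk lacks interval expressions.

```shell
#!/bin/sh
# Hypothetical dry-run helper: reads vCard text on stdin and writes it to
# stdout with exactly-10-digit numbers in the last field of TEL lines
# reformatted as (nnn) nnn-nnnn. All other lines pass through unchanged.
convert_tel() {
  awk '
    BEGIN { FS = ";"; OFS = ";" }
    $1 ~ /TEL/ {
      if (match($NF, /[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]/)) {
        prefix = substr($NF, 1, RSTART - 1)
        numstr = substr($NF, RSTART, RLENGTH)
        suffix = substr($NF, RSTART + RLENGTH)
        # Leave longer digit runs (e.g. 11 digits) alone.
        if (prefix !~ /[0-9]$/ && suffix !~ /^[0-9]/)
          $NF = prefix "(" substr(numstr, 1, 3) ") " substr(numstr, 4, 3) "-" substr(numstr, 7)
      }
    }
    { print }
  '
}

# Usage: only the first line should change.
printf '%s\n' \
  'TEL;type=CELL;type=VOICE;type=pref:1234567890' \
  'TEL;type=HOME:(123) 456-7890' \
  'UID:1234567890' |
  convert_tel
```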
     
  8. exi thread starter macrumors 6502, Sep 13, 2016 (last edited Sep 13, 2016)

    Joined: Oct 16, 2012
    #8
    Thanks again for your help!

    As for the two initial commands: the numbers are 80 and 78, respectively -- what's that all about / will your script still function as expected? Have not yet tried. For context, this will be running through a vCard file with about 700 entries, many hundreds of which have phone numbers and not just other contact information, and many of those have more than one number. Am looking up the details of the functions you're using. Good to learn.
     
  9. chown33 macrumors 604

    Joined: Aug 9, 2009
    #9
    The first number (80) is the count of "TEL" lines that have 5 or more digits in a row. Note that 10-digit "TEL" lines fall into this category.

    The second number (78) is the count of "TEL" lines that have 10 or more digits in a row. A run of exactly 10 digits is the category you wanted to be reformatted.

    I chose 5-or-more because the format you wanted "(nnn) nnn-nnnn" has at most 4 digits in a row. That is, no properly formatted phone number will have 5 or more digits in a row. Only improperly formatted phone numbers will.

    The conclusion drawn from these two numbers is that you have 2 "TEL" lines (80 minus 78) with 5 or more digits in a row but no run of 10 digits. As a result, those 2 "TEL" lines will not be converted to your desired format.

    This is only an approximation; it might be wrong. It's conceivable that the substring "TEL" appears in some non-telephone lines, along with a number of 5 or more digits. The 'awk' program will NOT convert such lines, as its search for "TEL" is more specific than the 'grep' commands shown.
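    The lines behind that difference of 2 can also be listed straight from the original export, before any conversion. A minimal sketch, assuming the export is ~/Desktop/all.vcf as in the earlier instructions; `leftover_tel_lines` is a hypothetical name. It chains the two greps from earlier and inverts the second one with `grep -v`, so lines with a 10-digit run are dropped.

```shell
#!/bin/sh
# Hypothetical helper: print TEL lines in file $1 that have a run of
# 5 or more digits but no run of 10, i.e. lines counted by the first
# grep command but not by the second.
leftover_tel_lines() {
  grep 'TEL' "$1" | grep -E '[0-9]{5,}' | grep -Ev '[0-9]{10}'
}

# Usage (path assumed):
# leftover_tel_lines ~/Desktop/all.vcf
```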

    I recommend that you test "new.vcf" in whatever your Contacts replacement will be. Don't commit to it until you're certain it works correctly.


    If you want to find those 5-or-more "TEL" lines, paste this command line into a Terminal window:
    Code:
    grep 'TEL' ~/Desktop/new.vcf | grep -E '[0-9]{5,}'
    The output will be only the "TEL" lines. You can then open "new.vcf" in TextEdit.app and search for the numbers, to see what the complete vcard entries are.

    Note that it uses the converted "new.vcf" file as input.
     
  10. exi thread starter macrumors 6502

    Joined: Oct 16, 2012
    #10
    Ah, I see. Some of the remaining numbers are pager numbers, which may not always follow convention and/or are nonstandard. Easily 98%+ of the file is the same case -- mostly (nnn) nnn-nnnn, some nnnnnnnnnn, all of which should be the former.

    Getting an error message using the script in the post above. It's pasted below.

    Code:
    awk: syntax error at source line 1 source file /Users/Exi/Desktop/tele-10.txt
    
    context is
    
        >>> {\ <<< rtf1\ansi\ansicpg1252\cocoartf1404\cocoasubrtf470
    
    awk: illegal statement at source line 2 source file /Users/Exi/Desktop/tele-10.txt
    
    awk: illegal statement at source line 2 source file /Users/Exi/Desktop/tele-10.txt
    
     
  11. chown33 macrumors 604

    Joined: Aug 9, 2009
    #11
    Yeah, I thought that might happen.

    You're not converting it to plain text first. Instead, it's being saved as an RTF file (I can tell by the "rtf1" in the error message).

    Make absolutely sure it's plain text before saving it. In TextEdit, choose Format > Make Plain Text, then save.
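    A quick way to check the file before running awk again: RTF files begin with the characters {\rtf, which is exactly what the error message shows. A minimal sketch; `is_rtf` is a hypothetical name, and it relies on `head -c`, which both macOS and GNU `head` support.

```shell
#!/bin/sh
# Hypothetical helper: succeed if file $1 starts with the RTF signature
# "{\rtf" (the same text awk choked on in the error message).
is_rtf() {
  [ "$(head -c 5 "$1")" = '{\rtf' ]
}

# Usage (path from the thread):
# if is_rtf ~/Desktop/tele-10.txt; then echo "still RTF: re-save as plain text"; fi
```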
     
  12. exi thread starter macrumors 6502

    Joined: Oct 16, 2012
    #12
    Sigh. I saw that too and thought I'd corrected it. Silly.

    Ran the script. Have the output file. Ran your additional line and found the two TEL lines in question -- one, a number formatted as nnnnnn-nnnn (!?); the other, an international number formatted as nnn nn nnnn nnnnnn. Fixed them in the output file by hand.

    Spot checked a few contacts with numbers I know are formatted incorrectly in the original file. They now appear as they should as (nnn) nnn-nnnn.

    Thanks again for your help. Obviously, you know far more about such things than I do. The one thing I wonder is whether there's any way, or any point, to run something that verifies the data in the new file. My very rudimentary understanding of grep and your script is that they shouldn't be destructive or modify anything beyond the formatting described in the script's comments, but I know what I don't know, so to speak.

    As an aside, just to give me an idea, what would be required if I wanted to prepend a country code -- +1 in my case -- to all numbers throughout the file? If it takes more than two minutes, no bother. You've gone way beyond as it is.
     
  13. chown33 macrumors 604

    Joined: Aug 9, 2009
    #13
    You're welcome. It was an interesting diversion.

    One thing I learned when looking into vCard is that there's really no guarantee of compatibility. It all depends on what app is producing or consuming it. For example:
    https://alessandrorossini.org/the-sad-story-of-the-vcard-format-and-its-lack-of-interoperability/

    Reading the vCard specs is just as disheartening. At least the conversion task in this case was simple and narrowly defined. I'd hate to have to write a more general "reformat phone numbers" conversion (or worse: street addresses).


    I intentionally made the 'awk' script very discriminating about what patterns to match, and what to modify when it found a fully qualified match. The 'grep' cmds were less discriminating, but useful for testing.

    If you want, I can modify the 'awk' script so instead of modifying the vcard data it simply outputs the lines that match. Then you can visually confirm that only those lines will be converted.

    Let me know if you want that, it's pretty easy to change the script for it.


    Well, it's definitely more than two minutes' work.

    The main reason is that everything non-trivial has to change:
    1. The pattern-matching is different.
    It has to match a "(nnn) nnn-nnnn" pattern, rather than 10 digits.

    2. The action taken on finding a match is different.
    The breaking apart and reassembly are completely different.

    So pretty much everything other than the script line with BEGIN and the last catch-all action will have to be changed. Plus there's the testing.
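    For a rough idea of the shape of that change, here is a minimal sketch; it is not chown33's script and has not been tested against real Contacts exports. It assumes the file has already been through tele-10.txt, that the desired result is "+1 (nnn) nnn-nnnn", and that any value already containing a "+" is international and should be left alone. `add_plus1` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: put "+1 " in front of (nnn) nnn-nnnn numbers in the
# last field of TEL lines. Reads vCard text on stdin, writes to stdout.
# Values that already contain a "+" are assumed to be international and
# are left alone, as are all non-TEL lines.
add_plus1() {
  awk '
    BEGIN { FS = ";"; OFS = ";" }
    $1 ~ /TEL/ && index($NF, "+") == 0 {
      # "&" in the replacement stands for the matched text.
      sub(/\([0-9][0-9][0-9]\) [0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]/, "+1 &", $NF)
    }
    { print }
  '
}

# Usage (file names assumed):
# add_plus1 < ~/Desktop/new.vcf > ~/Desktop/plus1.vcf
```

    As with the original script, diffing the input and output before using the result is a sensible check.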
     
  14. exi thread starter macrumors 6502

    Joined: Oct 16, 2012
    #14
    Ah, yes, I actually meant data integrity in the sense of something that could verify that the numbers before and after are the same -- that is, that nobody's phone number was somehow changed in the process. A discerning script almost certainly prevents that, of course, but I'm just curious.

    Thinking of exporting to Fastmail, actually, which I find to be very well-supported and coded as far as what they do.

    I would definitely be interested in that modified awk script -- for that and for my own edification. I enjoy these sorts of things and am a tech sort of guy in general, but it's not my profession, and so once things turn to code and the more nitty gritty...
     
  15. chown33 macrumors 604

    Joined: Aug 9, 2009
    #15
    Here's the modified awk script and instructions.

    1. Paste the following into a plain text file:
    Code:
    #!/usr/bin/awk -f
    
    ## Input is a vcard file or stream.
    ##
    ## Output on stdout is only lines with 10-digit TEL numbers.
    ##
    ## Only lines whose 1st field contains "TEL" are output.
    ## Only the last field on a TEL line will be tested for 10 digits.
    
    
    ## Set delimiters for breaking lines into fields.
    BEGIN  {
      FS=";"
      OFS=";"
    }
    
    
    ## Only for lines whose 1st field contains "TEL".
    $1 ~ /TEL/  {
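      ## Only where last field has a run of exactly 10 digits
      ## (a digit right before or after the match means a longer run).
      if ( match( $NF, /[0-9]{10}/ ) )  {
        prefix = substr( $NF, 1, RSTART - 1 )
        suffix = substr( $NF, RSTART + RLENGTH )
        if ( prefix !~ /[0-9]$/ && suffix !~ /^[0-9]/ )
          print $0
      }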
    }
    
    2. Save this plain text file on your Desktop as "teller.txt".

    3. Use the same "all.vcf" as before.

    4. In a Terminal window, paste this exact command line:
    Code:
    awk -f ~/Desktop/teller.txt ~/Desktop/all.vcf >~/Desktop/tens.txt
    5. If the command runs with no error messages in the Terminal window, the output should now be in "tens.txt" on your Desktop.

    6. Open it in TextEdit.app and you'll see the TEL lines with the 10-digit numbers.


    You can manually check every line in "tens.txt" against "new.vcf", but here's an automated way. It doesn't need "tens.txt" at all; that file is more for your own use.

    In a Terminal window, paste this exact command line:
    Code:
    diff ~/Desktop/all.vcf ~/Desktop/new.vcf >~/Desktop/diffs.txt
    Then open "diffs.txt" in TextEdit. It will be a list of the differences between the files.

    Lines starting with '<' show what's in the 1st file (all.vcf). Lines starting with '>' show what's in the 2nd file (new.vcf). Only "TEL" lines with 10 digits should appear as coming from "all.vcf", and the converted version of each should follow after a '---' separator line.

    A "magic code" before each < line looks like this:
    157c157

    This means line 157 in the 1st file changed to line 157 in the 2nd file, and the changes are shown as the < and > lines following it. If several adjacent lines changed, the code shows a range instead, such as 157,158c157,158.
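    To see that format without touching the real files, here is a small sketch with made-up data: two three-line files that differ only in the TEL line. `demo_diff` is a hypothetical name; the output shown is diff's default ("normal") format.

```shell
#!/bin/sh
# Sketch with made-up data: build two temporary files that differ only in
# line 2, print their diff, then clean up.
demo_diff() {
  before=$(mktemp)
  after=$(mktemp)
  printf '%s\n' 'BEGIN:VCARD' 'TEL;type=CELL:1234567890' 'END:VCARD' > "$before"
  printf '%s\n' 'BEGIN:VCARD' 'TEL;type=CELL:(123) 456-7890' 'END:VCARD' > "$after"
  # diff exits with status 1 when the files differ, which is expected here.
  diff "$before" "$after" || true
  rm -f "$before" "$after"
}

demo_diff
# Prints:
# 2c2
# < TEL;type=CELL:1234567890
# ---
# > TEL;type=CELL:(123) 456-7890
```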

    The 'diff' cmd is capable of a lot more, so you might want to play with it. Here's its man page:
    https://developer.apple.com/legacy/library/documentation/Darwin/Reference/ManPages/man1/diff.1.html
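    The diff output still needs a human eye to confirm that no digits changed. A minimal sketch of a fully automated check (not from the thread; `same_tel_digits` is a hypothetical name): take the lines containing "TEL" from both files, delete everything except digits and line breaks, and compare. Because the conversion only adds punctuation and never adds or removes lines, the result should be identical for all.vcf and new.vcf. A step that adds a +1 prefix would deliberately make it differ.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if the TEL lines of files $1 and $2
# contain the same digits in the same order, ignoring all punctuation.
same_tel_digits() {
  a=$(grep 'TEL' "$1" | tr -cd '0-9\n')
  b=$(grep 'TEL' "$2" | tr -cd '0-9\n')
  [ "$a" = "$b" ]
}

# Usage (paths from the thread):
# if same_tel_digits ~/Desktop/all.vcf ~/Desktop/new.vcf; then
#   echo "no digits changed"
# else
#   echo "digits differ: check diffs.txt"
# fi
```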
     
  16. exi thread starter macrumors 6502

    Joined: Oct 16, 2012
    #16
    This is fantastic stuff. Will play more later, but I think that should about do it. I had tried to figure out how to do this years ago when I migrated from something else to iCloud for mail/contacts/calendars and had no luck there either. Thanks again.
     
