Need help with a really basic shell script to echo weather from an XML

Discussion in 'Mac Programming' started by ideal.dreams, Jun 24, 2017.

  1. ideal.dreams macrumors 68020

    Joined:
    Jul 19, 2010
    Location:
    OH
    #1
    I am using Geektool to run the below shell script which, as written, uses curl to get an XML file containing weather information for the defined location and then echoes the current temperature followed by current conditions.

    Code:
    #!/bin/sh
    
    Location="33004"
    Unit="i"
    
    XML="$(curl -s "http://wxdata.weather.com/wxdata/weather/local/$Location?cc=*&unit=$Unit&dayf=0")"
    
    echo "$XML" | xpath 'weather/cc/t | weather/cc/tmp' 2>&1 |grep -E "<t>|<tmp>" |sed -e 's/-- NODE --//' | sed -e 's/<[^>]*>//g' | tr '\n' ' '
    printf "\n"
    I am relatively experienced in PHP and HTML so I can kind of read the code to see what's happening where, but otherwise I am new to shell as of an hour ago. I've spent the last few hours googling to figure out how to do what I want to do to no avail. As written, the script echoes the temperature followed by conditions, so it'll output something like "77 Mostly Cloudy."

    However, I want it to output "Mostly Cloudy, 77 °F" but I cannot for the life of me figure out how to swap around the order and add the °F to the mix without throwing a syntax error.

    Any help would be GREATLY appreciated -- I'm sure this is a SUPER simple adjustment to someone experienced in shell.
     
  2. chown33 macrumors 604

    Joined:
    Aug 9, 2009
    #2
    The basic principle I used below is Break It Down. This amounts to doing less in each step, at the cost of more steps. Simpler is better.

    Code:
    Location="33004"
    Unit="i"
    
    XML="$(curl -s "http://wxdata.weather.com/wxdata/weather/local/$Location?cc=*&unit=$Unit&dayf=0")"
    
    TMP=$( echo "$XML" | xpath 'weather/cc/tmp' 2>/dev/null | sed -e 's/<[^>]*>//g' )
      T=$( echo "$XML" | xpath 'weather/cc/t'   2>/dev/null | sed -e 's/<[^>]*>//g' )
    
    echo -e "$T, $TMP\xc2\xb0F"
    
    Since there's no value in the junk emitted on stderr (fd 2) when running xpath, I've sent it to the bit-bucket. The remaining text is simpler and easier to parse.

    I also broke it into two separate actions, so the results are placed in two separate variables. You can echo these separately, to see what's in them.

    Finally, I told echo to accept hex-data notation (\xXX), and provided the 2-byte UTF-8 codes for the degree symbol.

    I discovered what those hex bytes should be with this:
    Code:
    hexdump -C
    Then I typed the degree symbol on the keyboard, pressed return, then ctrl-D. The hex output was this:
    Code:
    00000000  c2 b0 0a                                          |...|
    The 0a is the newline, which means the c2 b0 is the sequence for the degree symbol. If that makes no sense to you, then read this:
    https://en.wikipedia.org/wiki/UTF-8
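The same check can be run without typing the symbol interactively, by going the other way: emit the two bytes and dump them. This is only a sketch. It swaps in `od` (specified by POSIX) for `hexdump -C`, writes the bytes as octal escapes (`\302\260` is `c2 b0`), and the helper name `dump_bytes` is made up for illustration.

```shell
# Hypothetical helper: show stdin as hex bytes, one byte per column.
dump_bytes() {
    od -An -tx1
}

# \302\260 are the octal forms of 0xc2 0xb0; \n adds the trailing 0a.
printf '\302\260\n' | dump_bytes
```

The dump should show the three bytes c2 b0 0a, matching the hexdump output above, though the exact spacing differs between platforms.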
     
  3. ideal.dreams thread starter macrumors 68020

    Joined:
    Jul 19, 2010
    Location:
    OH
    #3
    @chown33 you are awesome! Thanks so much, works perfectly. When I was tinkering with it I thought it would be better to separate the actions, just as you did, but could not get it to work without a syntax error. Also thank you for the information on the hex code -- I was not doing it that way when I was trying but it seems like that's the better route to take.

    It looks like there's no need for the -e after the echo though, is there? It actually echoes the -e in the output. I removed it and the output looks just like I wanted.

    Thanks again!!
     
  4. chown33 macrumors 604

    Joined:
    Aug 9, 2009
    #4
    I'm glad it worked for you. It was an interesting little diversion.

    I tried a literal ° at first, and it failed (as I suspected). So I went with overt UTF-8 bytes.

    The -e is the option for bash's builtin echo. I was assuming that was the shell that was interpreting the cmds.

    The external echo (/bin/echo) doesn't parse -e, nor does it decode \xXX. I'm guessing that Geektool somehow manages to make sense out of the escaped hex.
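Because echo's handling of -e and backslash escapes varies between shells and between builtin and external versions, one way to sidestep the question is printf. The sketch below is not the script from this thread. The values of T and TMP are hard-coded samples rather than fetched data, and `format_weather` is a name invented for illustration. It uses octal escapes because those are what POSIX specifies for printf format strings, while \xXX is an extension.

```shell
# Sample values standing in for what xpath + sed would extract (assumed, not fetched).
T="Mostly Cloudy"
TMP="77"

# Hypothetical helper: print "<conditions>, <temperature>°F".
# \302\260 is the octal spelling of the UTF-8 bytes c2 b0 (the degree sign).
format_weather() {
    printf '%s, %s\302\260F\n' "$1" "$2"
}

format_weather "$T" "$TMP"
```

With these sample values it prints "Mostly Cloudy, 77°F", and it should behave the same whether the script is run by sh or bash.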
     
  5. ideal.dreams thread starter macrumors 68020

    Joined:
    Jul 19, 2010
    Location:
    OH
    #5
    T=$( echo "$XML" | xpath 'weather/cc/t' 2>/dev/null | sed -e 's/<[^>]*>//g' )

    @chown33, can you explain what the 2>/dev/null does? And I know sed is for replacement, but can you explain what the expression after the command does? I know it removes the tags around the data, but I can't understand how.
     
  6. chown33 macrumors 604

    Joined:
    Aug 9, 2009
    #6
    The 'xpath' command emits some undesirable junk. Try pasting the following directly into a Terminal window:
    Code:
    Location="33004"
    Unit="i"
    
    XML="$(curl -s "http://wxdata.weather.com/wxdata/weather/local/$Location?cc=*&unit=$Unit&dayf=0")"
    
    echo "$XML" | xpath 'weather/cc/tmp'
    
    The output will be something like:
    Code:
    Found 1 nodes:
    -- NODE --
    <tmp>84</tmp>
    
    I didn't want to parse the first two lines, because it makes things more complex. So when I was first playing around breaking things down, I wondered if I'd have to parse that junk out, or whether it was coming out on a different stream. To find out, I ran this cmd:
    Code:
    echo "$XML" | xpath 'weather/cc/tmp' 2>/dev/null
    I was happy to see that redirecting stderr (which is what 2> does) eliminated the unwanted junk, leaving a nice clean XML piece to parse.
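The effect of the redirection can be tried without xpath or a network connection. In this sketch, `noisy` is a hypothetical stand-in that imitates the behavior described above by writing the chatter to stderr and the data to stdout. It is an assumption for illustration, not the real xpath tool.

```shell
# Hypothetical stand-in for xpath: chatter on stderr (fd 2), data on stdout (fd 1).
noisy() {
    echo "Found 1 nodes:" >&2
    echo "-- NODE --" >&2
    echo "<tmp>84</tmp>"
}

clean=$(noisy 2>/dev/null)   # stderr discarded, only the data is captured
merged=$(noisy 2>&1)         # stderr folded into stdout, all three lines captured

echo "clean:  $clean"
echo "merged: $merged"
```

The 2>&1 form is what the original script used, which is why it then needed grep and an extra sed to filter the junk back out.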

    The 'sed' command is what you posted, and I reused it.

    Basically it tells sed to find a <, then anything other than a >, and that in turn is followed by a >. It then tells sed to replace that entire sequence of < anything > with nothing at all (i.e. delete the sequence found). In other words, it removes all XML start tags (like <tt> or <aardvark>) along with all XML end tags (like </tiddlywinks> or </aardvark>) leaving just the text that was between the tags.

    So given text like <tmp>84</tmp>, it removes everything but the 84.

    To break down sed's pattern more specifically:
    < matches this character literally
    [^>] matches any character other than a literal >
    * means "zero or more repetitions" of the prior pattern (i.e. the [^>])
    > matches this character literally
    The pattern-matcher in sed is "eager" or "greedy", meaning it will try to match the longest sequence it can.

    Also, the given pattern will match a number of sequences that aren't even valid XML, but since we're feeding it output from xpath, we don't need to give sed a more complex matching pattern.
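A quick way to see the pattern at work is to feed sed a few literal strings. This is a small sketch with made-up sample input, and `strip_tags` is just an illustrative name for the sed command from the thread. The last line swaps [^>]* for .* to show why the bracket expression matters when matching is greedy.

```shell
# Hypothetical wrapper around the sed command discussed above.
strip_tags() {
    sed -e 's/<[^>]*>//g'
}

echo '<tmp>84</tmp>'        | strip_tags   # prints: 84
echo '<t>Mostly Cloudy</t>' | strip_tags   # prints: Mostly Cloudy

# Contrast: .* is greedy and may cross a >, so the match runs from the
# first < to the last > and the whole line is deleted (prints an empty line).
echo '<tmp>84</tmp>' | sed -e 's/<.*>//g'
```

Since [^>] cannot match a >, each match stops at the first closing bracket, so the start tag and end tag are removed separately and the text between them survives.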
     
  7. ideal.dreams thread starter macrumors 68020

    Joined:
    Jul 19, 2010
    Location:
    OH
    #7
    Thanks for the awesome explanation, it all makes total sense now. Really appreciate the breakdown! It's easiest for me to learn with application so this certainly helps a lot.
     
  8. chown33 macrumors 604

    Joined:
    Aug 9, 2009
    #8
    You're welcome. Ask again if you have other questions.
     
