Comparing adjacent lines of a file

Discussion in 'Mac Programming' started by wrathkeg, Mar 18, 2013.

  1. wrathkeg macrumors newbie

    Feb 1, 2010
    Hi all.

    Not sure if this is the right place to ask this, but here goes.

    Is there a way, preferably using terminal commands, to compare adjacent lines of a text file and see whether they share any words?

    So for a file like this:
    one two three
    three four five
    six seven eight
    the first and second lines get returned (since 'three' is repeated), but not the second and third lines, since they don't contain any of the same words.

  2. kryten2 macrumors 6502a

    Mar 17, 2012
    My first thought was to use grep but that's probably not what you want for this. I guess awk would be better suited for doing such a thing.
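    A minimal awk sketch of that idea, assuming words are separated by whitespace and compared case-sensitively. The function name adjacent_shared and the variables seen and prev are made up for illustration; it prints each matching pair followed by a "--" separator.

```shell
# Sketch: print each pair of adjacent lines that share at least one word.
# adjacent_shared is a hypothetical name; it reads files given as
# arguments, or standard input if there are none.
adjacent_shared() {
  awk '
    NR > 1 {
      # look up every word of the current line among the previous line words
      for (i = 1; i <= NF; i++) {
        if ($i in seen) {
          print prev
          print $0
          print "--"
          break
        }
      }
    }
    {
      # remember this line and its words for the next iteration
      split("", seen)                      # portable way to clear an array
      for (i = 1; i <= NF; i++) seen[$i] = 1
      prev = $0
    }
  ' "$@"
}
```

    Usage would be something like adjacent_shared myfile.txt, or piping the text in. If three lines in a row share words, the middle line is printed twice, once for each pair.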
  3. wrathkeg thread starter macrumors newbie

    Feb 1, 2010
    Thanks. To use grep I think I would need to know which string I am looking for in advance, which I don't. I'm also not sure that I could apply grep to particular lines. I'll have a look at awk.
  4. cqexbesd macrumors regular

    Jun 4, 2009
    Not sure of the exact semantics you are asking for (e.g. if 3 lines in a row have repeated words, does the middle line come out twice, once for each pair?), but something like this might get you started.

    perl -anE 'BEGIN { $prev = []; $, = " "; } foreach $p (@{$prev}) { if (grep { $_ eq $p } @F) { say("@{$prev}\n@F"); last; }}; $prev = [ @F ]'
    Just pipe in the data you want to process.
  5. wrathkeg thread starter macrumors newbie

    Feb 1, 2010
    Thanks for that. I don't know much about Perl, but that certainly looks like a possibility. I have just finished putting together a script which seems to work for my needs, so I am posting it here. I am sure that it is not the best way to do it, but it seems to do the job. Obviously, at a minimum, it could be changed to avoid creating all those temporary files (or at least to delete them afterwards).
    # drop the first line, so that line i of $1-short is line i+1 of $1
    tail -n +2 "$1" > "$1-short"
    # find out how many lines there are to look at
    a=$(wc -l < "$1-short")
    # start a loop to take place as many times as there are lines
    for (( i = 1; i <= a; i++ ))
    do
      # output specified line
      sed -n -e "${i}p" "$1" > "$1-single"
      sed -n -e "${i}p" "$1-short" > "$1-short-single"
      # split after every space to make columns
      tr ' ' '\n' < "$1-single" > "$1-single-col"
      tr ' ' '\n' < "$1-short-single" > "$1-short-single-col"
      # output shared words
      comm -12 <(sort "$1-single-col" | uniq) <(sort "$1-short-single-col" | uniq) > output-tmp
      # delete newlines so that empty files are really empty
      tr -d '\n' < output-tmp > output-tmp2
      # check if file is empty (no shared words) and, if not, send relevant lines to output
      if [[ -s output-tmp2 ]]
      then
        cat "$1-single" >> output
        cat "$1-short-single" >> output
        echo -- >> output
      fi
    done
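
    On the point about temporary files, here is one possible sketch that avoids them altogether. It is not the script above: it swaps comm for sort | uniq -d (a word present in both de-duplicated word lists shows up twice), and it writes to standard output instead of a file called output. The names words and shared_pairs are invented for the example, and it assumes words are separated by spaces.

```shell
# Sketch: report adjacent lines sharing a word, without temporary files.
# words and shared_pairs are hypothetical names; the file is passed as $1.
words() {
  # one word per line, duplicates within the line removed
  printf '%s\n' "$1" | tr -s ' ' '\n' | sort -u
}

shared_pairs() (
  # the function body runs in a subshell, so its variables stay private
  have_prev=""
  while IFS= read -r line || [ -n "$line" ]; do
    if [ -n "$have_prev" ]; then
      # a word found in both de-duplicated lists appears twice after sorting
      shared=$( { words "$prev"; words "$line"; } | sort | uniq -d )
      if [ -n "$shared" ]; then
        printf '%s\n%s\n--\n' "$prev" "$line"
      fi
    fi
    prev=$line
    have_prev=1
  done < "$1"
)
```

    Calling shared_pairs myfile.txt > output would give roughly the same result file as the script above, with nothing left behind to clean up.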
