#1
comparing adjacent lines of file
Hi all.
Not sure if this is the right place to ask this, but here goes. Is there a way, preferably using terminal commands, to compare adjacent lines of a text file to see whether they contain any of the same words? So for a file like this:
Code:
one two three
three four five
six seven eight
Thanks.
#2
My first thought was to use grep but that's probably not what you want for this. I guess awk would be better suited for doing such a thing.
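
As a rough sketch of what the awk route could look like (not a definitive answer): the function below remembers the words of the previous line in an array and prints both lines, followed by `--`, whenever the current line contains one of them. The function name `adjacent_shared` and the `--` separator are illustrative assumptions, and "word" is taken to mean a whitespace-separated field.

```shell
# adjacent_shared: print each pair of adjacent lines that share a word,
# followed by a "--" separator. Reads the files given as arguments, or stdin.
# (The function name and the output format are illustrative assumptions.)
adjacent_shared() {
    awk '
    NR > 1 {
        # look up each word of the current line among the previous line words
        for (i = 1; i <= NF; i++) {
            if ($i in prev) {
                print prevline
                print $0
                print "--"
                break
            }
        }
    }
    {
        # remember this line and its set of words for the next iteration
        split("", prev)          # portable way to empty an awk array
        for (i = 1; i <= NF; i++) prev[$i] = 1
        prevline = $0
    }
    ' "$@"
}
```

Called as `adjacent_shared file.txt`, this prints a line twice if it shares words with both of its neighbours, once for each pair.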
#3
Thanks. To use grep I think I would need to know which string I am looking for in advance, which I don't. I'm also not sure that I could apply grep to particular lines. I'll have a look at awk.
#4
I'm not sure of the exact semantics you are asking for (e.g. if three lines in a row have repeated words, does the middle line come out twice, once for each pair?), but something like this might get you started.
Code:
perl -anE 'BEGIN { $prev = []; } foreach $p (@{$prev}) { if (grep { $_ eq $p } @F) { say("@{$prev}\n@F"); last; }}; $prev = [ @F ]'
#5
Thanks for that. I don't know much about Perl, but that certainly looks like a possibility. I have just finished putting together a script which seems to work for my needs, so I am posting it here. I am sure it is not the best way to do it, but it seems to do the job. At a minimum, the commands could be reworked to avoid creating all those temporary files (or at least to delete them afterwards).
Code:
#!/bin/bash
# usage: pass the input file as $1; matching pairs are appended to ./output
# copy of the file without its first line, so line i here is line i+1 of the original
tail -n +2 "$1" > "$1-short"
# find out how many lines there are to look at
a=$(wc -l < "$1-short")
# start a loop to take place as many times as there are lines
for ((i = 1; i <= a; i++))
do
# output specified line
sed -n -e "${i}p" "$1" > "$1-single"
sed -n -e "${i}p" "$1-short" > "$1-short-single"
# split after every space to make columns
tr ' ' '\n' < "$1-single" > "$1-single-col"
tr ' ' '\n' < "$1-short-single" > "$1-short-single-col"
# output shared words
comm -12 <(sort -u "$1-single-col") <(sort -u "$1-short-single-col") > output-tmp
# delete newlines so that empty files are really empty
tr -d '\n' < output-tmp > output-tmp2
# check if file is empty (no shared words) and, if not, send relevant lines to output
if [[ -s output-tmp2 ]]
then
cat "$1-single" >> output
cat "$1-short-single" >> output
echo -- >> output
fi
done
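
On the point about temporary files: one way to avoid them altogether is to read the file once and keep only the previous line in a shell variable. The sketch below does that with a plain `while read` loop and a `case` pattern match in place of `sort` and `comm`, so it is a different technique from the script above, not a cleaned-up copy of it. The function name `shared_pairs` is hypothetical, and words are assumed to be separated by whitespace.

```shell
# shared_pairs: read lines from stdin; for every pair of adjacent lines that
# share at least one whitespace-separated word, print both lines and "--".
# The function name is hypothetical. The body runs in a subshell so that
# "set -f" does not leak into the calling shell.
shared_pairs() (
    set -f                      # no filename expansion when splitting words
    prev_line=
    prev_words=
    first=1
    while IFS= read -r line || [ -n "$line" ]; do
        # build a normalised " w1 w2 w3 " string for the current line
        words=" "
        for w in $line; do      # unquoted on purpose: split on whitespace
            words="$words$w "
        done
        if [ "$first" -eq 0 ]; then
            # does any word of this line occur in the previous line?
            for w in $line; do
                case "$prev_words" in
                    *" $w "*)
                        printf '%s\n%s\n--\n' "$prev_line" "$line"
                        break
                        ;;
                esac
            done
        fi
        first=0
        prev_line=$line
        prev_words=$words
    done
)
```

A call such as `shared_pairs < file.txt >> output` would produce the same kind of output file as the script above, without leaving any intermediate files behind.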