How do I extract all words which use a certain character from a text file?




XPcentric
Feb 12, 2013, 03:41 PM
I want to extract all words that contain a certain character (e.g. $) from a text file. Is there a simple command in Terminal, or even in a text editor? Thanks!



ConCat
Feb 12, 2013, 03:51 PM
Most text editors have a "find" function that lets you search for any string of characters.

XPcentric
Feb 12, 2013, 03:54 PM
Yes, but I want to extract all of the different words that include that character. When the file is long, it's hard to copy and paste the words one by one. There must be a Unix command for this; I'm searching for it now, in case I find it before anyone posts an answer.

robbieduncan
Feb 12, 2013, 03:56 PM
Assuming you are in the directory containing the file, the file is called file.txt, and you want the list of words in output.txt, this will do it (the match is case-sensitive). Here we are searching for c. If you want to search for $, you may well have to escape it:


cat file.txt | tr -d '[:punct:]' | tr ' ' '\n' | grep c | sort | uniq > output.txt
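One caveat when adapting this pipeline to search for $: the [:punct:] class includes $, so the first tr deletes it before grep ever sees it. A minimal sketch that strips only a handful of punctuation marks instead; the sample text and the choice of marks to strip are assumptions made for illustration:

```shell
# Sample input (made up for illustration).
printf 'pay $rent today, call $bank.\nbuy milk and $rent again\n' > file.txt

# Delete only selected punctuation (not $), put each word on its own line,
# keep the words containing a literal $, then sort and de-duplicate.
tr -d '.,;:!?' < file.txt | tr -s ' ' '\n' | grep -F '$' | sort -u > output.txt

cat output.txt
```

With this sample, output.txt should contain $bank and $rent, one per line. The -F flag tells grep to treat $ as a fixed string rather than a regular expression, so no escaping is needed.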

XPcentric
Feb 12, 2013, 04:17 PM
Thank you so much, you saved me a good amount of searching. In the end I replaced the special character with a different one to get the task done quicker. I tried to escape it with single and double quotes, but it didn't work.

robbieduncan
Feb 12, 2013, 04:25 PM
What is the special character? $ or something else?

balamw
Feb 12, 2013, 04:27 PM
$ is a special character in regular expressions (it matches the end of a line). I generally escape it like this: \[\$]. The backslashes are probably overkill.

B
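For reference, a short sketch of the usual ways to make grep match a literal $. When \[\$] is typed unquoted, the shell strips the backslashes and grep receives [$], which is the bracket-expression form below. The sample input is made up for illustration:

```shell
# Three equivalent ways to match a literal $ with grep.
# Single quotes stop the shell from treating $ specially.
printf 'cost $5\nno match here\n' | grep -c '\$'   # backslash-escaped
printf 'cost $5\nno match here\n' | grep -c '[$]'  # bracket expression
printf 'cost $5\nno match here\n' | grep -cF '$'   # -F: fixed string, no regex
```

Each command counts the matching lines and prints 1, since only the first input line contains a $.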

XPcentric
Feb 13, 2013, 06:04 AM
I don't know if 'special character' is the right expression, I just wanted to extract those.

robbieduncan
Feb 13, 2013, 07:10 AM
Which character should the words you want to extract contain?

XPcentric
Feb 13, 2013, 12:15 PM
This time I was looking for @ and $. I assign these characters to tasks; it has nothing to do with their usual meaning.
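For tags like these, one possible approach is to let grep print only the matching tokens with -o, which skips the word-splitting step entirely. This is a sketch under assumptions: the file name tasks.txt, the sample text, and the idea that a tag consists of letters, digits, and underscores around the @ or $ are all made up for illustration:

```shell
# Sample input (made up for illustration).
printf 'email @bob about $budget\n@bob: send $invoice, then $budget review\n' > tasks.txt

# -o prints each match on its own line instead of the whole line.
# Inside a bracket expression, $ and @ are ordinary characters.
grep -o '[[:alnum:]_]*[@$][[:alnum:]_]*' tasks.txt | sort -u > found.txt

cat found.txt
```

With this sample, found.txt should hold three distinct tokens: @bob, $budget, and $invoice. Trailing punctuation such as the colon after @bob is left out because it is not part of the pattern. The order of the lines depends on the locale used by sort.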

Partron22
Feb 14, 2013, 10:18 AM
Convert each word to a line, and use sed: http://unix.stackexchange.com/questions/13711/differences-between-sed-on-mac-osx-and-other-standard-sed
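A rough sketch of that suggestion, assuming a file called notes.txt with made-up contents. The splitting is done with tr rather than sed because writing \n in a sed replacement may not behave the same way in the macOS sed as in GNU sed:

```shell
# Sample input (made up for illustration).
printf 'file @alice the $report\nping @alice\n' > notes.txt

# One word per line via tr, then sed -n with /pattern/p prints
# only the lines (words) that contain @ or $.
tr -s ' ' '\n' < notes.txt | sed -n '/[@$]/p' | sort -u > tagged.txt

cat tagged.txt
```

With this sample, tagged.txt should contain @alice and $report, each listed once.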