File type using Terminal - Bash Script

Discussion in 'Mac Programming' started by SRossi, Oct 19, 2009.

  1. macrumors regular

    Hey all,

    I am currently working on a bash script that will back up and restore a file of the user's choosing. These files may be text documents, spreadsheets, or pictures, and each file type will be backed up to a different folder.

    My plan is to ask the user for the path to the file, work out what type of file it is, and then back it up to a particular folder depending on that type.

    What I do not know how to do is figure out the file type from the terminal. I had a look at the manual pages for the find and file commands, but they don't look like they will help me.

    Does anyone have any idea how I could go about doing this?

    Thanks in advance,

  2. macrumors 6502a


    I think the easiest way for you to do this is to use the path to your advantage. What I mean is when you ask for the path of the file, it'll have the extension with it. All you need to do is look at the extension.

    bash-3.2$ ./backup /Users/Anon/Documents/list.xlsx
    From here you can just look at the .xlsx and know that it's an Excel file.

    Hope this helps.
  3. macrumors 68040


    There is a UNIX program "file" that will give you all sorts of information about a file. Not sure if its output will be sufficient for your uses, but it's worth a try.

  4. macrumors 603

    I'm puzzled as to why backing up files by type is necessary, or even a good idea. If I back up a folder containing various types of files, I expect to recover them in their original arrangement, not rearranged by file type.

    That said, the mdls command can extract file-type and a host of other metadata, and give it to you in textual form. Read the man page first, but I recommend trying it on different file-types like image, text, audio, etc. to learn how it works. Its metadata key names are the same as the documented keys for Spotlight Metadata Attributes.

    Also, you can use mdfind to search for files based on metadata. It's the command-line equivalent to initiating a Spotlight search, or a smart folder.
  5. macrumors regular

    Yeah, I was going to use the file extension; it seemed the easiest way to determine what the file type was.

    I had a look at the file command, but I didn't know if I could use an if statement on the result of the query.

    I'm not backing up entire directories, only certain files. I know it seems a really stupid idea, but it's my university's idea, not mine.

    What I was thinking was doing a case statement like:

    case "$FILE" in
        *.txt)  ;;  # Do text document backup
        *.xlsx) ;;  # Do spreadsheet backup
        *.jpg)  ;;  # Do picture backup
        *)      ;;  # Give error about file type
    esac
    Do you think this would be a way to go about it? Or would there be an easier way? As Lee says, file gives the file type, but would I be able to feed the output of that command into an if statement?

    Thanks so far,

  6. macrumors 68040


    Here's an example I just whipped up:
    for X in `file * | grep text | awk -F: '{print $1}'`
    do
      echo "$X"
    done
    Obviously echoing isn't very interesting, but this should give you each filename matching the grep. You can obviously make this much more advanced. Any plain text file should contain "text" in the output from file. For JPEGs, I get:
    JPEG image data, EXIF standard

    This means you can look for, say "JPEG image data" instead of .jpg, .JPG, .jpeg, .JPEG, etc.

    I just saved an xlsx. I got:
    "Zip archive data, at least v2.0 to extract"
    which is not particularly helpful, so maybe extension is best after all (but OS X is not CP/M or DOS, so things don't need file extensions).
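To answer the earlier question about using file's output in a test: here is a rough sketch, assuming the descriptions look like the samples quoted above (the exact wording can differ between versions of file). The function names classify_description and classify_with_file are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: map a description string, as printed by `file -b`,
# to a keyword. The patterns are assumptions based on the sample output
# quoted in this thread; check them against your own system.
classify_description() {
    case "$1" in
        *"image data"*) echo "image" ;;
        *text*)         echo "text" ;;
        *)              echo "other" ;;
    esac
}

# Hypothetical wrapper: run `file -b` (brief mode, no filename prefix)
# on a path and classify the description it prints.
classify_with_file() {
    classify_description "$(file -b "$1")"
}
```

The result can then go straight into an if statement, for example: if [ "$(classify_with_file "$FILE")" = "text" ]; then ... fi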

  7. macrumors 603

    A lot of different files are zipped archives. The 'file' command can only look at the data fork of a file, not any metadata or xattrs. The 'mdls' command can look at metadata, so it often has more specific information than 'file' can muster. Try the kMDItemContentType attribute-name in particular.
  8. macrumors 603

    That can be interpreted several different ways, and my advice might change depending on which one it is.

    Is this a homework assignment? A term project?

    Or is this something your university's IT department intends to use for actual data backups?

    Which OS versions does it have to run on?

    What about text files that don't end in ".txt"? What about images that aren't JPEG, or JPEGs that don't end in ".jpg"? And is ".xlsx" really the only spreadsheet suffix? What about Numbers?

    I recommend breaking this into two sub-problems:
    1. Identify the type of file.
    2. Back up the file based on its type.

    If you modularize these two sub-problems, then you can change how files are identified without having to change how a file is backed up.

    You will need a consistent set of identifiers that represent the classifications of the file, which the backup module then uses to decide how to back up the file. I recommend NOT using suffixes or extensions, but a set of plain keywords would work fine, e.g. "text", "image", "spreadsheet", and "other". The "other" category would not be backed up.
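A minimal sketch of the backup half of that split, assuming one folder per keyword under a destination root. The function name backup_by_class and the folder layout are made up for illustration; the point is only that this step sees a keyword and never needs to know how the file was identified.

```shell
#!/bin/bash
# Hypothetical backup step: it receives a path, a class keyword, and a
# destination root, so the identification method can change independently.
backup_by_class() {
    local path="$1" class="$2" root="$3"
    case "$class" in
        text|image|spreadsheet)
            # One sub-folder per class; -p keeps the file's timestamps.
            mkdir -p "$root/$class" && cp -p "$path" "$root/$class/"
            ;;
        *)
            # "other" (or anything unrecognised) is not backed up.
            echo "Not backing up '$path' (class: $class)" >&2
            return 1
            ;;
    esac
}
```

Whatever produces the keyword (extension, file, or mdls) can be swapped out later without touching this function.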
  9. macrumors regular

    It is a homework assignment, and it has to run on any type of Linux system.

    I had broken the problem down into roughly the same sub-problems and was thinking along the same lines, but how would I set up the keywords like "text", etc.?

    I know it would be something like (using your mdls command):

    mdls $FILE
    Which would give me the metadata including the file type, but how would I go about using the kMDItemContentType in a validation?

    And lastly, Lee, thanks for the command. If I cannot get the script to run this way, I will change my design to incorporate your example.

    Thanks so far,

  10. macrumors 68020


    You can just parse the filename for everything after the last "." and get the extension. It makes the case statement a little easier to deal with.

    file_type=$(printf '%s\n' "$input_file" | sed -e 's/.*\.//')
    You can also use if statements and change the case on the fly. That's the beauty of the shell, lots of options!
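As a rough alternative sketch of the same idea, here is the extension pulled out with bash parameter expansion and lower-cased with tr, so that .JPG and .jpg hit the same case branch. The function name get_extension is made up, and it assumes "everything after the last dot in the filename" is what you want.

```shell
#!/bin/bash
# Hypothetical helper: print the lower-cased extension of a path, or an
# empty line if the filename itself contains no dot.
get_extension() {
    local name="${1##*/}"          # strip any leading directories first
    case "$name" in
        *.*) printf '%s\n' "${name##*.}" | tr '[:upper:]' '[:lower:]' ;;
        *)   printf '\n' ;;
    esac
}
```

Stripping the directory part first matters for a path like /tmp/my.dir/README, where a dot appears in a folder name but the file has no extension.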
  11. macrumors 603

    Mac OS X is not a Linux system.

    Linux doesn't have the mdls command.

    Linux has the file command, but its output may not be the same as the output from the Mac OS X file command.

    I don't think Linux has Excel, either, so ".xlsx" as the extension identifier for spreadsheets may not work.

    You really need to look at the available commands and their output on an actual Linux system. Look for a 'basename' command (which is on Mac OS X, and may be on Linux, because it's a POSIX command).

    Oh, and the:

    mdls $FILE
    would have to be:

    mdls "$FILE"
    in case there are spaces embedded in the pathname.
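A small sketch of why the quotes matter, using a made-up path and a throwaway helper (count_args is hypothetical) that just reports how many arguments it was given:

```shell
#!/bin/bash
# Hypothetical helper that reports how many arguments it received.
count_args() {
    echo "$#"
}

FILE="/Users/Anon/Documents/shopping list.txt"   # made-up path with a space

count_args $FILE      # unquoted: the shell splits it into 2 arguments
count_args "$FILE"    # quoted: passed through as 1 argument
```

Unquoted, a command would be handed two separate names, neither of which is the real file.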
  12. macrumors regular

    After I wrote that post I emailed a lecturer and asked if it must only run on a Linux machine. He said no; as long as it is not a Windows script, it will suffice. So sorry, it just has to run on a Mac, most likely 10.6.

    Ah I thought it would have quotes around the filename, but could I use mdls to validate it was a text file, or a spreadsheet etc? Like what I mean is:

    1: Ask for text file
    2: check if file exists - [ -f "$FILE" ]
    3: validate that file is text file - This is the part I am trying to do
    4: if validated backup
    4.1: give error and ask for another file.
    5: Quit back to main screen

    So how would I use the mdls command to validate it? Is there any way to take the output of the query and use it in an if statement or a case statement?

    And piloterror thanks for that command, just really confused the now.

    Thanks again,

  13. macrumors 603

    What you seem to be missing is the bash syntax for using the output of a command as a value for other commands, or assigning command output to a shell variable. This is called command substitution. You should read about it in the Command Substitution section of the bash man page.

    Basically, `command` or $(command) runs the command (with parameters if given) then strips off any trailing newlines and puts the output in place of the command-substitution expression. You really need to read this on the man page, rather than me paraphrasing it. Note the first form uses back-quotes (grave accents), not single-quotes/apostrophes.
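A quick sketch of both forms, using basename as a stand-in command and the example path from earlier in the thread. The variable names are made up for illustration.

```shell
#!/bin/bash
FILE="/Users/Anon/Documents/list.xlsx"   # example path from earlier in the thread

# Both forms run basename and substitute its output, minus the trailing newline.
name_backticks=`basename "$FILE"`
name_dollar=$(basename "$FILE")

# The captured text can then be tested like any other string.
if [ "$name_dollar" = "list.xlsx" ]; then
    echo "Got: $name_dollar"
fi
```

The $(...) form is generally easier to read and to nest than back-quotes.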

    You also need to experiment with this on a manual command-line before adding it to a script. You can build up command-lines in any text editor, then copy and paste them into Terminal. This avoids having to retype lines manually.

    For example, to assign the output of an mdls command to a variable:

    someVar=`mdls -name kMDItemContentType "$FILE"`
    If you echo "$someVar", it will contain the entire string output from the mdls command. You will have to parse this output further if you want only the value, because mdls generally outputs a distinctive header for the file-name, followed by name=value pairs for each metadata value.

    You could do some parsing in the backquoted expression:

    someVar=`mdls -name kMDItemContentType "$FILE" | grep kMDItemContentType `
    You could use 'awk' to parse the output instead of grep, and this would permit a more precise output that only contained the value of kMDItemContentType. You might also use the -raw and -nullMarker options of mdls to produce more precise output. You should play with those first.
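A rough sketch of that value-only parsing, done here with sed rather than awk. It assumes the mdls line looks like kMDItemContentType = "public.plain-text"; that format is an assumption you should check against real output on your Mac. The function name extract_mdls_value is made up.

```shell
#!/bin/bash
# Hypothetical helper: read mdls output on stdin and print only the quoted
# value from a line such as
#   kMDItemContentType = "public.plain-text"
# Lines that do not match (headers, "(null)" values) produce no output.
extract_mdls_value() {
    sed -n 's/^kMDItemContentType *= *"\(.*\)"$/\1/p'
}

# Intended use on a Mac (not executed here):
#   contentType=$(mdls -name kMDItemContentType "$FILE" | extract_mdls_value)
```

An empty result then means mdls had no content type for the file, which the script can treat as "other".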

    So now that the content-type value is in a variable, you can proceed to classify it as image, text, or spreadsheet. There are any number of ways to do that. A case statement (bash's equivalent of a switch) is one way, which is relatively straightforward to understand and code.

    Choosing a good classifying approach depends on how flexible the classifying has to be. Since it's a homework assignment, it probably doesn't need a lot of flexibility, so I'd go with 'case' unless or until it proves to be unwieldy. If this were an actual IT program, then separating the classification data (i.e. the mapping of mdls output values to file-class) from the shell-script might be useful, so it wouldn't be necessary to edit a shell-script every time a classification changes.
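A minimal sketch of that classification step. The content-type identifiers listed are examples I believe are typical, not a complete or verified list; confirm them by running mdls on your own sample files. The function name classify_content_type is made up.

```shell
#!/bin/bash
# Hypothetical mapping from a kMDItemContentType value to a class keyword.
# The identifiers below are illustrative assumptions; extend or correct
# them after checking real mdls output.
classify_content_type() {
    case "$1" in
        public.plain-text)                       echo "text" ;;
        public.jpeg|public.png|public.tiff)      echo "image" ;;
        org.openxmlformats.spreadsheetml.sheet)  echo "spreadsheet" ;;
        *)                                       echo "other" ;;
    esac
}
```

Because the mapping lives in one small function, adding another spreadsheet or image type later means touching only one branch.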

    You might also look at other useful mdls values, such as kMDItemContentTypeTree. You can see their values by applying mdls to sample classifiable files, and you can understand more about what each value means by reading the Spotlight Metadata docs. You can also try the output from the 'file' command and see if it's any easier to parse or classify.
