I find that a combination of Handbrake, Subler and Aegisub is the best way to extract subtitles. The subtitles on a DVD are stored as bitmap images (a format known as VobSub), and they do not look very good compared with text subtitles.
First, use Handbrake to convert the DVD into an MP4 file. During this step, pay attention to the "Subtitles" tab, where you choose which subtitle tracks to include.
This will include the bitmap subtitles in the video file. If your player can handle them and you can live with this format, you don't need to do anything more. The "Burned in" option means that the subtitle will be rendered into the video frames, so it can't be turned off during playback.
If you want the subtitle converted into text (SRT) format, Subler works really well: it turns the subtitle images into text through an OCR (optical character recognition) process.
For OCR of subtitles that are not in English, I found that it works best to download the latest training data for that language from the tesseract-ocr project on GitHub.
First open this folder: "~/Library/Application Support/Subler/"
Then create a new folder inside it called "tessdata" (if it doesn't already exist).
Now download your trained language file from GitHub and put it inside the "~/Library/Application Support/Subler/tessdata/" folder. For Danish, the file is called "dan.traineddata".
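If you set up several languages, the folder creation and download can be scripted. This is a minimal sketch, not part of Subler itself: it assumes the language files live in the tesseract-ocr/tessdata repository on GitHub under a "main" branch, and the download URL pattern below is an assumption you should check against the repository before relying on it. The function names are hypothetical.

```python
import urllib.request
from pathlib import Path

# Folder Subler reads extra OCR language files from (see the steps above).
TESSDATA_DIR = Path.home() / "Library" / "Application Support" / "Subler" / "tessdata"

# ASSUMPTION: raw-download URL pattern for the tesseract-ocr/tessdata repo.
URL_TEMPLATE = "https://github.com/tesseract-ocr/tessdata/raw/main/{lang}.traineddata"


def traineddata_url(lang: str) -> str:
    """Return the assumed download URL for a language code such as 'dan'."""
    return URL_TEMPLATE.format(lang=lang)


def target_path(lang: str, base: Path = TESSDATA_DIR) -> Path:
    """Return where the .traineddata file should be saved."""
    return base / f"{lang}.traineddata"


def install(lang: str, base: Path = TESSDATA_DIR) -> Path:
    """Create the tessdata folder if needed, then download the language file."""
    base.mkdir(parents=True, exist_ok=True)
    dest = target_path(lang, base)
    urllib.request.urlretrieve(traineddata_url(lang), str(dest))
    return dest
```

Calling `install("dan")` would create the folder and save "dan.traineddata" in it; use the three-letter code of whatever language your subtitles are in.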
To convert your subtitles that are contained (in VobSub format) inside an MP4 file, the process goes like this:
Open Subler
Select File -> New
Drag your MP4 file into the top window. A dialog will ask which tracks you want to include; select all of them or just the subtitle you want to convert, then click Add.
You are now back in the main window, and here it is important to set the correct language for the subtitle track, so that the OCR uses the matching language data. For a Danish subtitle, for example, set the language to Danish.
Now select File -> Save as
Choose a different name so that you don't overwrite your original, and let it save. During saving, which takes a few seconds, the OCR does its magic and converts the VobSub subtitle into text.
Once done, select your subtitle track in the upper window and select File -> Export. Type a name for your subtitle (e.g. "Jurassic Park.srt") and click Save.
You can delete the temporary MP4 that Subler created as it isn't really needed.
Subler does more than just OCR the subtitle, which is why the process seems a little roundabout. But I find that in almost all cases you will want to run a spell-checker against the subtitle, so I always export it to SRT.
To check for spelling errors (or to make other adjustments), I found that Aegisub works well.
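Because SRT is plain text (a cue number, a "start --> end" timing line, then the subtitle text, with blank lines between cues), you can also do a rough scripted pass to spot OCR mistakes before opening Aegisub. This is a minimal sketch, not an Aegisub feature: it assumes a UTF-8 SRT file and a plain word list for your language with one word per line. The function names and file paths are hypothetical.

```python
import re
from pathlib import Path

# Timing line of an SRT cue, e.g. "00:00:01,000 --> 00:00:02,500".
TIMING = re.compile(r"\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")
TAG = re.compile(r"<[^>]+>")                   # formatting tags such as <i>
WORD = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)?")  # Unicode letters, e.g. æøå


def parse_srt(text: str) -> list[tuple[str, str]]:
    """Return (timing, subtitle text) pairs from the contents of an SRT file."""
    cues = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\r?\n\s*\r?\n", text.strip()):
        lines = block.splitlines()
        if len(lines) >= 3 and TIMING.match(lines[1].strip()):
            cues.append((lines[1].strip(), "\n".join(lines[2:])))
    return cues


def unknown_words(cues: list[tuple[str, str]], known: set[str]) -> set[str]:
    """Collect words whose lower-cased form is not in the known word list."""
    found = set()
    for _, text in cues:
        for word in WORD.findall(TAG.sub("", text)):
            if word.lower() not in known:
                found.add(word)
    return found


def check_file(srt_path: str, wordlist_path: str) -> list[str]:
    """Read an SRT file and a word list; return suspicious words, sorted."""
    cues = parse_srt(Path(srt_path).read_text(encoding="utf-8-sig"))
    lines = Path(wordlist_path).read_text(encoding="utf-8").splitlines()
    known = {w.strip().lower() for w in lines if w.strip()}
    return sorted(unknown_words(cues, known))
```

For example, `check_file("Jurassic Park.srt", "danish-words.txt")` would list words worth looking up in Aegisub. A word list will also flag names and rare words, so treat the output as a shortlist rather than a list of errors.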
I hope this guide will help people, as it took me quite a while to discover how these different tools work.