kylera

macrumors 65816
Original poster
Dec 5, 2010
1,195
27
Seoul
I would like to archive my schoolwork or some of my work documents for years to come. What is the best way to future-proof the following:

- DOC/DOCX/Pages documents (mine do not have complex styling)
- PPT/Keynote files (some have animations and transitions, but they aren't vital to understanding the files)
- PDF papers

Should I just go all PDF?
 
My answer would be to convert them all to PDF. PDF is a standard format and it will keep any formatting a lot better.
 
Print on good-quality paper. Even bad paper will outlast just about any digital format.

If you insist on digital, my opinion would be PDF then ODT (Open Office).
 
...but it cannot "really" be edited.

I would like to know how institutions do this. They surely have archived Microsoft Office documents. Does, for example, an Office 2000 file open without changes to the formatting and document styles?
 
My understanding is that opening any word processor document (not just MS Word files) can be an issue as far as formatting is concerned.

If the document used fonts no longer available on the computer, the system will substitute another font. While it may look very similar, line breaks and page breaks could shift - causing formatting issues.

Often documents are formatted for a specific printer. Unless the system has access to the same printer, the document may format differently.

If the document uses complex formatting, the creator may have used undocumented (and subsequently discontinued) features, or features whose behaviour has changed in newer releases of the application. Building tables is one example. If it is just a plain grid, there shouldn't be any issues. But once you start merging and joining cells, the formatting can be difficult to maintain across software generations.

etc etc

Maintaining archives of old documents is a huge and complex field. National archives, for example, often maintain very old computers so that they can run very old applications, in order to access documents that are really not that old.

Meanwhile, in Timbuktu, they recently smuggled out over a quarter of a million manuscripts to keep them safe from the rebels. Books and parchments that are up to 900 years old. While we have issues keeping a 15 year old Word document safe. sigh....
 
I recently decided to go "paperless" and after some research went 100% PDF. It's been around for 20 years, is now an open standard (ISO 32000), anything can be printed to it, and it is a suitable format for scanning text and images as well.
 
I've run into this same issue. I used WordPerfect in the mid 90s, and getting some of those documents open recently has been a challenge. I've found apps that open a lot of them, but it's not a foolproof method.

PDF is great for documents that are finished and that you will never need to edit again.

If it's a text document, and formatting isn't an issue, you could also just save it as a plain text document. Plain text has been around practically since computers could type, and I don't think it's going anywhere. Very future proof.

As for PowerPoint/Keynote documents, I don't think there is a way to truly future-proof them outside of printing them to PDF and just having the slides visible.
 
PDF is great for static, styled content.
Just plain text is also great for anything that is text-based with the downside being that there is no formatting. You could also use Rich Text or explore Markdown.

However, I recently read/heard a discussion arguing that the Office format might not be such a big deal in the end. Obviously it is a format developed by a single company, but the spec has been published and it is easy to find programs that can open Office documents.

As long as you stick to simple styling (as you have done), most modern productivity software should be able to round-trip an Office document without issue. What I mean by that is that if you created a document in Word with simple formatting (bold, italics, lists, tabs & justification), you should be able to save it, open it in Pages, make changes, save it and open it back in Word with no loss of fidelity. Same thing with presentation slides.
 
-..........
Maintaining archives of old documents is a huge and complex field. National archives, for example, often maintain very old computers so that they can run very old applications, in order to access documents that are really not that old.

Meanwhile, in Timbuktu, they recently smuggled out over a quarter of a million manuscripts to keep them safe from the rebels. Books and parchments that are up to 900 years old. While we have issues keeping a 15 year old Word document safe. sigh....

Excellent post, and, as an historian, I hear you and echo what you have just written, fervently.

To me, it is incredible that we can store documents that are thousands of years old - or centuries old - that are as readable as the day they were written, while documents created in the then cutting-edge 1990s can be as inaccessible as hieroglyphics were before Jean-François Champollion cracked them with the use of the Rosetta Stone. Absolutely bizarre, and paradoxical.

I too store them as PDF, with the caveat of knowing you are very, very limited in editing said document.

True, alas; it is why I dislike PDF as a format.
 
Nothing wrong with keeping them as PDF as others have said, but you might consider keeping just a plain old text version too. Sure you lose formatting (I guess you could use LaTeX) but for the most part, a text file will always be readable and at worst a computer nerd should be able to convert it easily should the encoding change drastically over the years.
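
For what it's worth, the "convert it should the encoding change" step is usually only a few lines. Here is a minimal Python sketch, assuming the old file is in Latin-1 and you want UTF-8. The function name `reencode_text` and both default encodings are just assumptions for illustration, so swap in whatever your files actually use:

```python
def reencode_text(src, dst, src_encoding="latin-1", dst_encoding="utf-8"):
    """Copy a text file from src to dst, converting its character encoding.

    The default encodings are assumptions for this example only.
    """
    # newline="" keeps the original line endings untouched while reading
    with open(src, "r", encoding=src_encoding, newline="") as f:
        text = f.read()
    # newline="" writes the line endings back exactly as they were read
    with open(dst, "w", encoding=dst_encoding, newline="") as f:
        f.write(text)
```

If the source encoding is guessed wrong, the output will contain garbled characters rather than raising an error in many cases, so it is worth spot-checking a converted file before deleting the original.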
 
Yep, I was just about to post something like this... just keep them as txt files. Since layout and font preferences may change over time, plain text would be the most flexible and adaptable format to archive your files.

EDIT: I'd like to add that a lot of sequencing data is kept in plain txt format (either tab-delimited or comma-separated) because txt files can be read almost universally on any machine. For reference, google "FASTA file format" or "FASTQ file format" to see what I mean. In addition, a lot of the data that I work with is in txt format because it lends itself to being more easily loaded into analytical environments like R, etc. Like Panda said, you lose some of the encoding that you get from storing it in other formats, but for archival purposes there's a lot to like about plain text.
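
To give a sense of how little is needed to read a plain-text format like FASTA, here is a rough Python sketch. It assumes the basic layout only (a header line starting with ">", followed by one or more sequence lines); `parse_fasta` is a made-up helper for illustration, not part of any library:

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA-style text into a {header: sequence} dictionary."""
    records: dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A header line starts a new record
            name = line[1:].strip()
            records[name] = ""
        elif name is not None:
            # Sequence lines are joined onto the current record
            records[name] += line
    return records
```

Something like `parse_fasta(open("reads.fasta").read())` would then give you every sequence keyed by its header (the file name here is hypothetical). The point is that no special software is required, which is what makes plain text attractive for archiving.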
 
If you are not concerned about editing, use PDF.

I can recommend scanning all your documents with software like Prizmo (which I use myself). It has OCR, so it recognizes all text (even handwritten), which means you can search in it (and edit it, but that doesn't work so well).

Combine it with DEVONthink to categorize, organize etc., and you've got a really really great "document manager"!
 