Extract Embedded PDFs/Files from Word Documents
In Microsoft Word for Windows, you can embed PDFs and other files in a document. On every other platform, including OS X (which even has its own version of Microsoft Office), these embedded files cannot be extracted or viewed, and LibreOffice and OpenOffice cannot open them either.
To get at these files, I came up with a workaround that works on OS X (and possibly on Linux with a capable archive utility).
Get a good archive tool. I recommend The Unarchiver for OS X and 7-Zip for any other platform.
Install The Unarchiver
Extract your Word document by opening it with The Unarchiver (right-click the file and choose Open With)
Select The Unarchiver from the list
After extracting the file, you should have a folder with the same name as your document; this folder holds the contents of your document.
Inside this folder you'll see a number of sub-folders. The one we are interested in is called ObjectPool, which contains all embedded objects.
Within ObjectPool, each embedded object is represented by a folder.
Within each embedded object's folder we can find the actual object, in this example a PDF. The object is stored in a file named CONTENTS.
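With several embedded objects, finding each CONTENTS file by hand gets tedious. The following is a minimal Python sketch, assuming the document has already been extracted into a folder with the ObjectPool/<object folder>/CONTENTS layout described above; the function name find_embedded_objects is hypothetical.

```python
from pathlib import Path


def find_embedded_objects(extracted_dir):
    """Return the CONTENTS file of every embedded object under ObjectPool.

    Assumes the document was already extracted (e.g. with The Unarchiver
    or 7-Zip) into extracted_dir, with ObjectPool at its top level.
    """
    pool = Path(extracted_dir) / "ObjectPool"
    if not pool.is_dir():
        return []  # nothing extracted, or no embedded objects
    found = []
    # Each embedded object has its own sub-folder inside ObjectPool.
    for obj_dir in sorted(p for p in pool.iterdir() if p.is_dir()):
        contents = obj_dir / "CONTENTS"
        if contents.is_file():
            found.append(contents)
    return found
```

For example, `find_embedded_objects("MyReport")` would list the CONTENTS file of every object in a hypothetical extracted folder named MyReport. Object folders without a CONTENTS file are skipped.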
Simply copy CONTENTS and give the copy whatever file extension you are expecting. In this case we know we are looking for a PDF, so we rename the copy to CONTENTS.pdf.
We should now be able to view our PDF. That's it! The same process applies to any type of embedded object within a Word document.
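The copy-and-rename step can also be scripted. This minimal sketch (the copy_as helper is hypothetical) copies one CONTENTS file into an output folder and names the copy after the object's folder, because every object's data file is called CONTENTS and copies with the same name would overwrite each other.

```python
import shutil
from pathlib import Path


def copy_as(contents_path, out_dir, extension=".pdf"):
    """Copy one CONTENTS file into out_dir with the expected extension.

    The copy is named after the object's folder (e.g. "_42.pdf" for a
    hypothetical folder "_42") so several objects can share out_dir.
    """
    src = Path(contents_path)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    dest = out / (src.parent.name + extension)
    shutil.copyfile(src, dest)  # the extracted original is left untouched
    return dest
```

Copying rather than renaming in place keeps the extracted folder intact, so you can try a different extension if the first guess is wrong.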
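If you don't know what kind of file an object is, its first few bytes often give it away. This is a rough sketch, assuming the object's data is stored as a plain file in CONTENTS; the guess_extension helper and its small signature table are illustrative rather than exhaustive, and only the very start of the file is checked.

```python
# Leading "magic" bytes of a few common formats (an illustrative subset).
SIGNATURES = [
    (b"%PDF-", ".pdf"),
    (b"\x89PNG\r\n\x1a\n", ".png"),
    (b"\xff\xd8\xff", ".jpg"),
    (b"GIF87a", ".gif"),
    (b"GIF89a", ".gif"),
    (b"PK\x03\x04", ".zip"),  # .docx/.xlsx/.pptx are ZIP-based too
]


def guess_extension(path, default=".bin"):
    """Guess a file extension from the first bytes of the file at path."""
    with open(path, "rb") as f:
        head = f.read(16)
    for magic, ext in SIGNATURES:
        if head.startswith(magic):
            return ext
    return default  # no known signature matched
```

A result of .bin simply means none of the listed signatures matched; the object may still be a valid file of some other type.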
A note of caution: if, when you unzip the document, you do not see the list of files and instead see:
then your unzip tool has failed to extract the padded files at the beginning of the zip file, which are not contained in the zip index. The standard command-line zip tool cannot fix this, so try another tool.
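Word documents come in more than one container format: legacy .doc files are normally OLE compound files, while .docx files are ZIP archives. Before trying another tool, a quick check of which kind you have may help. This is a minimal Python sketch; the container_kind name is hypothetical.

```python
import zipfile

# Signature at the start of every OLE compound file (legacy .doc/.xls/.ppt).
OLE_SIGNATURE = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"


def container_kind(path):
    """Return "ole", "zip" or "unknown" for the document at path."""
    with open(path, "rb") as f:
        head = f.read(len(OLE_SIGNATURE))
    if head == OLE_SIGNATURE:
        return "ole"
    # is_zipfile looks for the end-of-central-directory record near the
    # end of the file, so extra bytes at the start do not hide a ZIP.
    if zipfile.is_zipfile(path):
        return "zip"
    return "unknown"
```

If it reports "ole", the document is not a ZIP archive at all, which is another possible reason a plain unzip tool rejects it; 7-Zip can open this format.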
Best of luck!