Friday, 16 July 2010

Convert Office Documents to PDF from the command line

I prepare training courses and often have to convert 90+ Word documents to PDF.
I have Adobe Acrobat Professional, but found no quick and easy way to do it with that. I used a VBScript before, and it was so slow that OpenOffice converts all the documents in less time than the script needed for 2 of them. I guess it would be even faster if we were using OpenOffice formats ... which I am investigating ...

So I use OpenOffice for this, with two nice macros I "stole" from Sun^H^H^HOracle.

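Here they are. They go into Module1 of the Standard library under My Macros (Tools > Macros > Organize Macros > OpenOffice.org Basic), because the scripts further down call them as Standard.Module1.ConvertWordToPDF: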


REM -------------8<-------------
REM ***** BASIC *****

' Convert one document to PDF. cFile is the full path of the input file;
' the PDF is written next to it with the same base name.
Sub ConvertWordToPDF(cFile)
    cURL = ConvertToURL(cFile)

    ' Open the document hidden.
    ' Just blindly assume that the document is of a type that OOo will
    ' correctly recognize and open -- without specifying an import filter.
    oDoc = StarDesktop.loadComponentFromURL(cURL, "_blank", 0, Array(MakePropertyValue("Hidden", True)))

    ' Replace the last 4 characters (".doc") with ".pdf".
    ' This assumes a three-letter extension such as .doc.
    cFile = Left(cFile, Len(cFile) - 4) + ".pdf"
    cURL = ConvertToURL(cFile)

    ' Save the document using the Writer PDF export filter.
    oDoc.storeToURL(cURL, Array(MakePropertyValue("FilterName", "writer_pdf_Export")))

    oDoc.close(True)
End Sub

' Helper: build a com.sun.star.beans.PropertyValue from a name and a value.
Function MakePropertyValue( Optional cName As String, Optional uValue ) As com.sun.star.beans.PropertyValue
    Dim oPropertyValue As New com.sun.star.beans.PropertyValue
    If Not IsMissing( cName ) Then
        oPropertyValue.Name = cName
    EndIf
    If Not IsMissing( uValue ) Then
        oPropertyValue.Value = uValue
    EndIf
    MakePropertyValue() = oPropertyValue
End Function
REM -------------8<-------------
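The macro names the output file by cutting the last four characters off the input name and appending ".pdf". That works for ".doc", but it assumes a three-letter extension: a ".docx" file would end up with a name like "intro..pdf". The Unix script below has to predict exactly the name the macro writes, so here is that rule as a small sh function. This is only a sketch; the name expected_pdf is my own invention, not part of OpenOffice.

```sh
# expected_pdf (hypothetical helper): mirror the macro's naming rule,
#   Left(cFile, Len(cFile) - 4) + ".pdf"
# i.e. drop the last four characters of the path and append ".pdf".
expected_pdf() {
    # ${1%????} removes exactly the last four characters of $1
    printf '%s\n' "${1%????}.pdf"
}

# expected_pdf /courses/intro.doc    prints /courses/intro.pdf
# expected_pdf /courses/intro.docx   prints /courses/intro..pdf  (the .docx pitfall)
```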


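Now here are my scripts, one for Windows and one for Unix. The Windows one is a batch file; if you type its for loop directly at the command prompt, write %i instead of %%i.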

Windows
REM -------------8<-------------
REM Convert every .doc file in the current folder and its subfolders.
REM Assumes swriter (in the OpenOffice program folder) is on the PATH.
for /r %%i in (*.doc) do swriter -invisible "macro:///Standard.Module1.ConvertWordToPDF(%%i)"
REM -------------8<-------------

Unix

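#-------------8<-------------
#!/bin/sh
# doc2pdf: convert one Word document to PDF with the OpenOffice macro above.
DIR=$(pwd)
# find passes absolute paths; only prefix the current directory to relative ones.
case "$1" in
    /*) DOC=$1 ;;
    *)  DOC=$DIR/$1 ;;
esac
# Same naming rule as the macro: drop ".doc" (the last 4 characters), add ".pdf".
PDF="${DOC%????}.pdf"
if [ ! -e "$PDF" ] ; then
    echo "Processing $DOC"
    soffice -invisible "macro:///Standard.Module1.ConvertWordToPDF($DOC)"
    # soffice returns before the macro has finished, so wait for the PDF
    # to appear before letting find start the next file.
    while [ ! -e "$PDF" ] ; do
        sleep 3
    done
fi
#-------------8<-------------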

On Unix, with the script above saved as an executable doc2pdf on the PATH, I run

find /some/path/to/documents -name '*.doc' -exec doc2pdf {} \;
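Because doc2pdf skips any document that already has a PDF next to it, an interrupted run can simply be restarted. To see beforehand what is still left to do, a sketch like the following lists the .doc files that have no matching PDF yet. The function name pending_docs is hypothetical, and it assumes the same "drop .doc, add .pdf" naming rule as the macro.

```sh
# pending_docs (hypothetical helper): print every .doc file under the
# given directory that does not yet have a matching .pdf next to it.
pending_docs() {
    find "$1" -name '*.doc' | while IFS= read -r doc; do
        # Same naming rule as the macro: drop ".doc", add ".pdf".
        if [ ! -e "${doc%????}.pdf" ]; then
            printf '%s\n' "$doc"
        fi
    done
}

# Example: count what remains
#   pending_docs /some/path/to/documents | wc -l
```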

I need the while loop on Unix because soffice returns before the macro has finished, so find would go straight on to converting the next file. On my system OpenOffice errors out when it has to process two files simultaneously ... sad, that!
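One drawback of that loop is that it never ends if a conversion fails and no PDF is ever written, which stalls the whole find run. Here is a sketch of a bounded wait, assuming a timeout of a few minutes is acceptable for your documents; wait_for_file and its timeout argument are my own names, not part of OpenOffice or of the script above.

```sh
# wait_for_file FILE TIMEOUT (hypothetical helper): poll once a second until
# FILE exists. Returns 0 as soon as it appears, 1 after TIMEOUT seconds.
wait_for_file() {
    wff_file=$1
    wff_remaining=$2
    while [ ! -e "$wff_file" ]; do
        if [ "$wff_remaining" -le 0 ]; then
            return 1
        fi
        sleep 1
        wff_remaining=$((wff_remaining - 1))
    done
    return 0
}

# In doc2pdf, the while loop could become, for example:
#   wait_for_file "$PDF" 300 || echo "Timed out waiting for $PDF" >&2
```

Polling every second instead of every three only makes the script notice a finished PDF sooner; it does not put any extra load on OpenOffice.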