Convert Microsoft Word to DocBook XML using Ruby and OpenOffice | August 10, 2006
The following script shows how to convert Microsoft Word files to DocBook XML using OpenOffice on Windows. The batch script drives OpenOffice through OLE (Object Linking and Embedding) and converts every Word file it finds, however many there are.
It is assumed that you have OpenOffice installed. You also need the Ruby programming language (the script was tested with Ruby 1.8.4, the most recent version at the time of writing).
require 'win32ole'

# Path to the directory with the Word files, written as an OpenOffice file URL.
PATH = "file:///c|/path/to/doc/files/"

# Converts a Word file to DocBook XML.
# The XML file is named after the original file,
# e.g.: ABC.doc -> ABC.xml
def convert_word_to_docbook(file, path)
  serviceManager = WIN32OLE.new("com.sun.star.ServiceManager")
  desktop = serviceManager.createInstance("com.sun.star.frame.Desktop")
  url = path + file
  document = desktop.loadComponentFromURL(url, "_blank", 0, [])
  # Replace only the trailing ".doc" extension.
  url_to = path + file.sub(/\.doc\z/, ".xml")
  # The export filter is selected through a PropertyValue struct.
  fprops = []
  property = serviceManager.Bridge_GetStruct("com.sun.star.beans.PropertyValue")
  property["Name"] = "FilterName"
  property["Value"] = "DocBook File"
  fprops << property
  begin
    document.storeToURL(url_to, fprops) # this line works!
  ensure
    document.close true
  end
end

# Convert all ".doc" files to DocBook XML.
# Dir.glob looks in the current directory, so run the script from the
# directory that PATH points to.
Dir.glob("*.doc").each do |file|
  print "converting #{file}...\n"
  $stdout.flush
  convert_word_to_docbook file, PATH
end
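The empty array passed as the last argument to loadComponentFromURL in the script is a list of load properties, built the same way as the "FilterName" property used for storing. The OpenOffice API documents a boolean load property named "Hidden" that asks for the document to be loaded without a visible window. The following is a minimal, untested sketch of how that could be wired in. The helper names make_property and load_hidden are hypothetical and not part of the original script, and the sketch assumes the OLE bridge accepts the property exactly as it accepts "FilterName".

```ruby
# Hypothetical helper: builds a com.sun.star.beans.PropertyValue struct
# through the OLE bridge, exactly as the script does for "FilterName".
def make_property(service_manager, name, value)
  property = service_manager.Bridge_GetStruct("com.sun.star.beans.PropertyValue")
  property["Name"] = name
  property["Value"] = value
  property
end

# Hypothetical helper: loads a document and passes the "Hidden" load
# property, so that OpenOffice is asked not to show a window for it.
def load_hidden(service_manager, url)
  desktop = service_manager.createInstance("com.sun.star.frame.Desktop")
  load_props = [make_property(service_manager, "Hidden", true)]
  desktop.loadComponentFromURL(url, "_blank", 0, load_props)
end

# Usage on Windows (assumption: same setup as in the script):
#   require 'win32ole'
#   manager  = WIN32OLE.new("com.sun.star.ServiceManager")
#   document = load_hidden(manager, "file:///c|/path/to/doc/files/ABC.doc")
```

Inside convert_word_to_docbook, the call to desktop.loadComponentFromURL could then be replaced by load_hidden(serviceManager, url). The rest of the function would stay unchanged.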
Original script by Julian Elve: http://www.synesthesia.co.uk/blog/.../openoffice-and-ruby/.

Tony says:
I had not realised it could be so easy to interface with OpenOffice in this way. A very helpful example of using OLE in Ruby. This opens up many possibilities. Thanks!
DavidPotter says:
I have a question about the program. When I use the code above, it opens the file in a window, which is very annoying for users. Hence I would like to know whether there is a way to convert the file without opening it.
BTW, could you do me a favor and tell me how to use the OpenOffice COM interface to search for a string in an OpenOffice file?
stefan says: juretta.com
Hi David,
You can try the “visible” property on either serviceManager or desktop:
desktop.visible = false
or
serviceManager.visible = false
Sorry, I can’t test it myself, as I have no Windows machine around…
Invincible says:
I am looking to convert doc files to PDF files. How can I use the above script to convert doc to PDF format on UNIX platforms?