juretta.com

Convert Microsoft Word to DocBook XML using Ruby and OpenOffice

August 10, 2006
Tags: XML

The following script converts Microsoft Word files to DocBook XML using OpenOffice on Windows. It drives OpenOffice through OLE (Object Linking and Embedding) automation and converts every ".doc" file in the current directory in a single batch run.

This assumes that OpenOffice is installed. You also need the Ruby programming language (the script was tested with Ruby 1.8.4, the most recent version at the time of writing).

require 'win32ole'

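# URL of the directory containing the Word files.
# Dir.glob below searches the current directory, so run the
# script from the directory this URL points to.
PATH = "file:///c|/path/to/doc/files/"

# Converts a Word file to DocBook XML.
# The XML file is named after the original file,
# e.g.: ABC.doc -> ABC.xml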
def convert_word_to_docbook(file, path)
  serviceManager = WIN32OLE.new("com.sun.star.ServiceManager")
  desktop = serviceManager.createInstance("com.sun.star.frame.Desktop")

  url = path + file
  document = desktop.loadComponentFromURL(url, "_blank", 0, [])
  url_to = path + file.sub(/\.doc\z/i, ".xml") # replace only the trailing extension
  # Select the export filter by passing a FilterName property.
  fprops = []
  property = serviceManager.Bridge_GetStruct("com.sun.star.beans.PropertyValue")
  property["Name"] = "FilterName"
  property["Value"] = "DocBook File"
  fprops << property
  begin
    document.storeToURL(url_to, fprops) # export through the DocBook filter
  ensure
    document.close true
  end
end

# convert all ".doc" files to DocBook XML
Dir.glob("*.doc").each do |file|
  puts "converting #{file}..."
  $stdout.flush
  convert_word_to_docbook file, PATH
end

Original script by Julian Elve: http://www.synesthesia.co.uk/blog/.../openoffice-and-ruby/.
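
The script hard-codes the directory URL and derives the output name from the input name. As a minimal sketch, the helpers below show one way to build those URLs from an ordinary Windows path and to preview which files would be written, without starting OpenOffice. The names to_file_url, docbook_name and conversion_plan are hypothetical and not part of the original script, and the sketch assumes that OpenOffice accepts the "file:///C:/..." form in addition to the "c|" form used above.

```ruby
# Hypothetical helpers; none of these names appear in the original script.

# Turn a Windows path such as 'C:\docs\ABC.doc' into a file URL.
# Only backslashes and spaces are handled; other special characters
# would need proper URL escaping.
def to_file_url(path)
  "file:///" + path.gsub("\\", "/").gsub(" ", "%20")
end

# Derive the DocBook file name. Only a trailing ".doc" is replaced,
# so "notes.doc.doc" becomes "notes.doc.xml".
def docbook_name(file)
  file.sub(/\.doc\z/i, ".xml")
end

# Pair each source URL with its target URL without touching OpenOffice.
def conversion_plan(dir, files)
  base = dir.gsub("\\", "/").sub(%r{/+\z}, "")
  files.map do |f|
    [to_file_url("#{base}/#{f}"), to_file_url("#{base}/#{docbook_name(f)}")]
  end
end

# Preview the conversions for two example file names.
conversion_plan('C:\docs\word', ["ABC.doc", "notes.doc.doc"]).each do |from, to|
  puts "#{from} -> #{to}"
end
```

Each pair returned by conversion_plan could be passed to loadComponentFromURL and storeToURL in place of the string concatenation used in convert_word_to_docbook. Printing the plan first makes it easy to spot a wrong directory or an unexpected output name before any document is opened.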



About

juretta.com is the personal workspace of Stefan Saasen. More about this site can be found in the „About“ section.
