juretta.com

Convert Microsoft Word to DocBook XML using Ruby and OpenOffice

August 10, 2006

The following script shows how to convert Microsoft Word files to DocBook XML using OpenOffice on Windows. It drives OpenOffice through OLE (Object Linking and Embedding) automation and converts every .doc file in the current directory in one batch.

It is assumed that you have OpenOffice installed. You also need the Ruby programming language; the script was tested with Ruby 1.8.4, the most recent version at the time of writing.

require 'win32ole'

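# URL of the directory that holds the Word files (OpenOffice expects a file URL).
# Dir.glob below scans the current directory, so run the script from this directory.
PATH = "file:///c|/path/to/doc/files/"

# Converts a Word file to DocBook XML.
# The XML file is named after the original file,
# e.g. ABC.doc -> ABC.xml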
def convert_word_to_docbook(file, path)
  serviceManager = WIN32OLE.new("com.sun.star.ServiceManager")
  desktop = serviceManager.createInstance("com.sun.star.frame.Desktop")

  url = path + file
  document = desktop.loadComponentFromURL(url, "_blank", 0, [])
  url_to = path + file.sub(/\.doc\z/i, ".xml") # replace only the trailing extension
  fprops = []
  property = serviceManager.Bridge_GetStruct("com.sun.star.beans.PropertyValue")
  property["Name"] = "FilterName"
  property["Value"] = "DocBook File"  
  fprops << property
  begin
    document.storeToURL(url_to, fprops) # export through the DocBook filter
  ensure
    document.close true
  end
end

# convert all ".doc" files to DocBook XML
Dir.glob("*.doc").each do |file|
  print "converting #{file}...\n"
  $stdout.flush
  convert_word_to_docbook file, PATH
end
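
The file-name and URL handling can also be tried out on its own, without OpenOffice. The following is a minimal sketch: docbook_name and dir_to_file_url are hypothetical helper names, not part of the original script, and the "file:///C:/..." URL form is an assumption (the script itself uses the "c|" form). Replacing only a trailing extension keeps names such as my.docs.doc intact.

```ruby
# Derive the output name: replace only a trailing ".doc" (case-insensitive).
def docbook_name(file)
  file.sub(/\.doc\z/i, ".xml")
end

# Turn a Windows directory path into a file URL with a trailing slash.
# Assumption: spaces are the only characters that need percent-encoding.
def dir_to_file_url(dir)
  path = dir.gsub("\\", "/").gsub(" ", "%20")
  path += "/" unless path.end_with?("/")
  "file:///" + path
end

puts docbook_name("ABC.doc")                       # => ABC.xml
puts docbook_name("my.docs.doc")                   # => my.docs.xml
puts dir_to_file_url("C:\\path\\to\\doc files")    # => file:///C:/path/to/doc%20files/
```

The result of dir_to_file_url could be passed as the path argument of convert_word_to_docbook in place of the hard-coded PATH constant.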

Original script by Julian Elve: http://www.synesthesia.co.uk/blog/.../openoffice-and-ruby/.

@18:11 | Comments: 4 | Tags: XML (7)

1
Tony says:
Sat Aug 12 10:09:42 +0200 2006

I had not realised it could be so easy to interface to OpenOffice in this way. A very helpful example of using OLE in Ruby. This opens up many possibilities. Thanks

2
DavidPotter says:
Tue Sep 26 10:04:22 +0200 2006

I have a question about the program. When I use the code above, it opens the file, which is very annoying for users. Hence I would like to know whether there is a way to convert the file without opening it.
By the way, could you tell me how to use the OpenOffice COM interface to search for a string in an OpenOffice file?

3
stefan says:
Mon Oct 02 21:49:49 +0200 2006

Hi David,
you can try to use the method “visible” on either serviceManager or desktop:
desktop.visible = false
or
serviceManager.visible = false
Sorry, can’t test it myself – no Windows around…
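
Another option that may be worth trying is the "Hidden" entry of OpenOffice's media descriptor, passed as a load property to loadComponentFromURL instead of the empty array. The sketch below has not been run against OpenOffice. build_properties and its factory argument are hypothetical names, and a plain Hash stands in for the OLE struct so that the helper runs on any platform.

```ruby
# Builds a list of PropertyValue-like structs from a plain Ruby hash.
# new_struct is a hypothetical factory: anything that returns an object
# supporting obj["Name"] = ... and obj["Value"] = ... will do.
def build_properties(settings, new_struct)
  settings.map do |name, value|
    prop = new_struct.call
    prop["Name"] = name
    prop["Value"] = value
    prop
  end
end

# Dry run: a plain Hash replaces the OLE struct (assumption for illustration).
stub_struct = lambda { {} }
load_props = build_properties({ "Hidden" => true }, stub_struct)
puts load_props.first["Name"]

# On Windows the factory would wrap the call already used in the script:
#   real_struct = lambda { serviceManager.Bridge_GetStruct("com.sun.star.beans.PropertyValue") }
#   props = build_properties({ "Hidden" => true }, real_struct)
#   document = desktop.loadComponentFromURL(url, "_blank", 0, props)
```

The same helper could build the "FilterName" property for the export step, which keeps the struct handling in one place.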

4
Invincible says:
Sun Feb 18 18:56:57 +0100 2007

I am looking to convert doc files to PDF files… how can I use the above script to convert doc to PDF format on UNIX platforms?

About

juretta.com is the personal workspace of Stefan Saasen. You can send him an email or read more about this site in the „About“ section.
