How To Install And Use Oood

News

  • March 11th, 2008

A bug in OOo 2.2.1 sometimes prevents oood from opening Microsoft Word files. We recommend using OOo 2.3.

  • June 28th, 2007

Multi-oood - a proxy dispatching requests to a number of oood's - is now available.

  • May 31st, 2007

Sample client code is now available in five languages: Python, PHP, Java, C++ and Ruby.

  • May 29th, 2007

Lots of improvements have been made: oood now supports nearly all file formats, can generate non-ODFs from other non-ODFs, has a more user-friendly configuration file, and the old implementation has been removed altogether.

  • May 8th, 2007

The oood passed the ultimate test - it converted 100,000 docs, working continuously for four days, under a heavy load. The test results can be viewed at http://erp5.pl/stats.txt. Hooray!!!

  • May 3rd, 2007

Implementation 2 of oood is basically finished; it uses a new protocol for client-server communication. We also eliminated the use of None, so the SimpleXMLRPCServer patch is no longer required. I also found out that oood was segfaulting because pyuno is not thread-safe. The way to run it safely now is to use a pool of only one worker (config.pool_size=1); with that setting it survives the worst treatment imaginable. In a few weeks I'll write a new version that spawns workers as separate processes.

  • April 21st, 2007

The Dispatcher is basically working - it passes testOoodBasicOperations.py and survives both a worker being destroyed manually and an OOo instance being killed. However, there is still a long way to go. To use the new implementation, set "implementation" in config.py to 2 (default is 1). Backward compatibility is maintained. If you use implementation 2, there is no need to run start.py --init, because OOo instances are started by the daemon.

  • April 18th, 2007

New method "printDocument" - prints the given document directly from the daemon's OOo instance, so if a printer is available to the server, the doc does not need to be converted and returned in order to be printed.

  • April 11th, 2007

The request handling part is going to be rewritten completely - the serw.Procesor class will be replaced by Dispatcher, which will be the only handler for timeouts and exceptions.

The program

The name

The name "oood" is an abbreviation - it stands for "OpenOffice.org Daemon". Some people feel uncomfortable with the name (saying "oh oh oh dee" makes them feel like stutterers), so we are still experimenting with other options (including "office-daemon", "oh-daemon" and "pokemon").

The idea

The oood is an XMLRPC server capable of converting office-type files between various formats (basically MSOffice and OpenOffice), generating PDF and HTML, and also editing metadata of documents. It is capable of doing almost anything that OpenOffice.org can do; one limitation is that it doesn't let you choose save options (such as encoding or CSV separators) and always uses the default values.

The way it works

An initialization script starts an OpenOffice process. It is started in the background and uses a virtual display, so you don't see it. Then the daemon creates a "worker" object, which communicates with OpenOffice through sockets. The server waits for an incoming request, and the pool dispatches it to the worker if it is available (or waits until one becomes available), and so on.
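
The dispatch step described above can be pictured with a minimal sketch. It is not oood's actual code: the names WorkerPool, acquire, release and handle_request are hypothetical, it is written for Python 3 rather than the Python 2.4 that oood targets, and any object can stand in for a worker connected to OpenOffice.

```python
import queue


class WorkerPool:
    """Hypothetical pool: hands out idle workers, blocks while none is free."""

    def __init__(self, workers):
        self._idle = queue.Queue()
        for worker in workers:
            self._idle.put(worker)

    def acquire(self, timeout=None):
        # Blocks until a worker is idle; raises queue.Empty if the timeout expires.
        return self._idle.get(timeout=timeout)

    def release(self, worker):
        # Put the worker back so the next waiting request can use it.
        self._idle.put(worker)


def handle_request(pool, request, convert):
    """Borrow a worker, run the conversion, and always give the worker back."""
    worker = pool.acquire()
    try:
        return convert(worker, request)
    finally:
        pool.release(worker)
```

With a pool of one worker, a second request simply waits in acquire until the first one has released the worker, which matches the "waits until one becomes available" behaviour.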

Installation

Requirements

Currently, oood works with OpenOffice 2.0.3 or 2.1 and Python 2.4. A bug in OOo 2.2.1 sometimes prevents oood from opening Microsoft Word files, so OOo 2.3 is recommended instead of 2.2.1.

It has been successfully tested with the following versions of OpenOffice:

  • 2.0.3-7
  • 2.1.0-6
  • 2.0_64
  • 2.2 with python2.5 (mdv 2007 spring)
  • 2.3

The combination of Python 2.5 and OpenOffice 2.1 does not work, because the pyuno library of that OpenOffice version is not compatible with Python 2.5.

You also need Xvfb (virtual frame buffer).

From RPMs

Download an openoffice.org-oood RPM from Nexedi's repository, install, done. But don't do it now - the RPMs contain an old version. Check out from svn and wait for new RPMs.

Installation on 64-bit architecture

Largely the same as on 32-bit, except that you need the 64-bit OpenOffice from the "backports" repository (e.g. ftp://fr2.rpmfind.net/linux/Mandrakelinux/official/2007.0/x86_64/media/main/backports).

From source

Check it out from svn at:

https://svn.erp5.org/repos/public/erp5/trunk/utils/oood

You can also patch the file:

  • [OpenOffice-installation-dir]/share/registry/data/org/openoffice/Setup.xcu

With this: openoffice.org-2.0.3-skip-registration.patch. Otherwise OpenOffice will expect you to register, and since it is started in the background it won't really start (unless you use the --top option and click through the registration).

Setting up

One problem with OpenOffice is that it takes very long to load if it does not have the Java path configured - this is why the instance load time in config.py is 120 seconds by default. The way to make it shorter is to run:

start.py --top

This starts OpenOffice in the foreground - in each instance go to options->java and set the path to your j2re installation. Then you can change 120 seconds into 10 seconds. You may also need to make sure that the owner of the oood_instance is oood, like this:

chown -R oood:oood /var/run/oood

Configuration

Edit oood.conf - most entries are self-explanatory or described in the file.

  • instance_load_time - see above
  • instance_timeout - defines how long oood waits for an OOo instance before deciding it is not going to return because it froze or crashed; too small a value makes processing big files impossible, too large a value slows down processing if many files are broken. Something like 30 seconds should be ok.
  • formats to use or to skip - oood supports 111 file formats, so if you fill a dropdown in your client application with all formats allowed for, say, a text file, it can be quite long; to make it shorter you can tell oood which formats to skip, or which to use (and skip all the rest)
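
The effect of the use/skip setting can be illustrated with a small sketch. This is only an assumption about the behaviour described above, not oood's own code: the function name visible_formats, its parameters and the short format identifiers are all made up for the example, and the real option names should be taken from oood.conf.

```python
def visible_formats(all_formats, use=None, skip=None):
    """Return the formats a client should offer, keeping the original order.

    If a "use" list is given, only those formats are kept (all the rest are
    skipped); otherwise everything except the "skip" list is kept.
    """
    if use:
        allowed = set(use)
        return [fmt for fmt in all_formats if fmt in allowed]
    skipped = set(skip or ())
    return [fmt for fmt in all_formats if fmt not in skipped]


# Illustrative identifiers only; oood's real format names may differ.
text_formats = ["odt", "doc", "rtf", "sxw", "txt", "pdf", "html"]
shorter = visible_formats(text_formats, skip=["sxw", "rtf"])
shortest = visible_formats(text_formats, use=["odt", "doc", "pdf"])
```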

Running

Starting up

The manual way to run it in the foreground is:

cd /var/lib/oood
./runserw.py --start

To start oood at specified port number offset, type:

cd /var/lib/oood
./runserw.py --start --offset=n

for instance:

./runserw.py --start --offset=1

would start oood on port 8009 if oood.conf specifies port 8008. This way many oood instances can be started easily without creating a separate config file for each one. The port offset can be negative.

To easily run many oood instances simultaneously, type:

./runserw.py --multiple=n

or

./runserw.py --start --multiple=m:n

The first form starts ooods at offsets 0,1,..,n-1.

The second form starts ooods at offsets m,m+1,..,n (m < n)

When started with the multiple parameter, runserw.py just forks child ooods with the proper offset parameter. When killed, it kills its child ooods as well.

The multiple parameter is mainly used by Multi-oood server.
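
The offset arithmetic described in this section can be written down as a short sketch. It only mirrors the rules stated above (offsets 0..n-1 for "n", offsets m..n for "m:n", port = configured port + offset); the function names parse_multiple and ports_for are hypothetical and are not taken from the oood sources.

```python
def parse_multiple(value):
    """Turn the value of --multiple into a list of port offsets."""
    if ":" in value:
        first, last = (int(part) for part in value.split(":", 1))
        if first >= last:
            raise ValueError("expected m < n in m:n")
        # Second form: offsets m, m+1, .., n (n included).
        return list(range(first, last + 1))
    # First form: offsets 0, 1, .., n-1.
    return list(range(int(value)))


def ports_for(base_port, offsets):
    """Each instance listens on the configured port plus its offset."""
    return [base_port + offset for offset in offsets]


# With port 8008 in oood.conf, as in the example above:
ports = ports_for(8008, parse_multiple("2:4"))
```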

Monitoring

Do:

watch "./start.py --threads"

to see threads at work,

watch "./start.py --status"

to have some current information about what the server is doing.

Testing

There are three test suites. Two of them test the basic functionality of oood and require a subdirectory with sample files: download test_documents.zip from svn and unzip it into your oood home directory.

The third suite (testOoodHighLoad) is meant to test its stability in production environment, so it does its best to kill the server. First it makes a one-time indexing of all ODF files available on the machine, then starts issuing many requests at a time, in random number, for random files, at random intervals, for conversion into random formats.

To run testOoodHighLoad, do the following:

  • configure and start oood
  • make sure the oood home directory is writeable for you
  • set max_batch_size and max_interval in testOoodHighLoad.py; max_interval divided by max_batch_size is roughly the average time you give your machine to process a document, so adjust the two values to your machine's processing power, and be so kind as to give your machine a few seconds per doc, since some are fairly large; max_batch_size/max_interval combinations like "5/15", "10/40" or "100/300" have been run successfully
  • if there is an old all_odf_docs file in the directory, remove it
  • start testOoodHighLoad; you can give it a desired number of conversions as a cmdline argument, the default number is 100

The test will first scan your entire "/" looking for ODF files and write their paths to an all_odf_docs file. Then the real work starts.
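
The load pattern (random batch sizes at random intervals, up to a desired number of conversions) can be sketched as follows. This is not the code of testOoodHighLoad.py: the function plan_batches is hypothetical, and drawing both values uniformly between 1 and their maximum is an assumption made for the example.

```python
import random


def plan_batches(total, max_batch_size, max_interval, seed=None):
    """Return (batch_size, interval_seconds) pairs covering `total` requests."""
    rng = random.Random(seed)
    plan = []
    remaining = total
    while remaining > 0:
        # Never schedule more requests than are still needed.
        batch = min(rng.randint(1, max_batch_size), remaining)
        interval = rng.randint(1, max_interval)
        plan.append((batch, interval))
        remaining -= batch
    return plan


# 100 conversions (the default mentioned above) with the "5/15" combination.
plan = plan_batches(100, max_batch_size=5, max_interval=15, seed=1)
```

A real run would fire each batch as parallel requests for random files and random target formats, then sleep for the interval before issuing the next batch.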

The expected output looks like this:

03, 17:52:02 : MainThread : batch 10 (total 36), interval 5
03, 17:52:04 : Thread-15 : --------------------- got result: 200
03, 17:52:07 : Thread-18 : /home/bartek/tmp/exqmifxh.odt --> HTML document
03, 17:52:07 : Thread-13 : --------------------- got result: 200
03, 17:52:07 : MainThread : batch 7 (total 43), interval 17
03, 17:52:17 : Thread-17 : got error code: 402 xxxxxxxxxx "the document could not be processed"
03, 17:52:17 : Thread-19 : /home/bartek/ERP/konfa/Wstep_do_ERP5_biznes.odp --> Powerpoint presentation

While the test is running, you can do:

watch "cat stats.txt"

to see detailed information about the test progress, and description of return codes.

The stdout of the test is also written to "test.log". There is a little script "logproc.py" which you can then use to analyse test.log to see which documents give you 402, and investigate the reason.
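
As an illustration of that kind of analysis, here is a minimal sketch that is independent of logproc.py (whose internals are not described here). It assumes the line layout shown in the sample output above and, as a further assumption, that each thread logs its "path --> format" request before logging the result for it; the name failed_documents is hypothetical.

```python
import re
from collections import Counter

# "<timestamp> : <thread name> : <message>", as in the sample output.
LINE = re.compile(r"^(?P<stamp>.+?) : (?P<thread>\S+) : (?P<message>.*)$")


def failed_documents(lines, code="402"):
    """Count (path, target format) pairs whose thread reported `code`."""
    last_request = {}  # thread name -> (path, target format)
    failures = Counter()
    for line in lines:
        match = LINE.match(line.strip())
        if not match:
            continue
        thread = match.group("thread")
        message = match.group("message")
        if " --> " in message:
            # Remember what this thread asked for most recently.
            path, target = message.split(" --> ", 1)
            last_request[thread] = (path, target)
        elif message.startswith("got error code: " + code):
            if thread in last_request:
                failures[last_request[thread]] += 1
    return failures
```

Calling failed_documents(open("test.log")) would then give a count per document and target format, which is a starting point for investigating the failures.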

Usage

The oood implements a custom protocol, which is best described in the attached design document.

The svn directory contains a "samples" subdirectory, where you can find sample client code in:

  • Python
  • PHP
  • Java
  • C++
  • Ruby
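
For orientation only, a generic XML-RPC call from Python 3 might look like the sketch below. It does not reproduce the oood protocol: the design document and the samples directory remain the reference for method names and arguments. The helpers call_daemon and connect are hypothetical, sending the document as base64 text is an assumption, and port 8008 is simply the value used in the examples on this page.

```python
import base64
import xmlrpc.client


def call_daemon(proxy, method_name, document_bytes, *extra_args):
    """Send a document to an XML-RPC method chosen by the caller."""
    # Assumption: the document travels as base64-encoded text.
    payload = base64.b64encode(document_bytes).decode("ascii")
    # The method name is passed in because this sketch does not know
    # which methods the daemon exposes or what arguments they take.
    method = getattr(proxy, method_name)
    return method(payload, *extra_args)


def connect(host="localhost", port=8008):
    # Creating the proxy opens no connection; that happens on the first call.
    return xmlrpc.client.ServerProxy("http://%s:%d" % (host, port))
```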

Known problems

TODO

  • support user-supplied flags
  • RPMs

HowToUseOood (last edited 2008-05-07 13:24:25 by ŁukaszNowak)
