How To Install And Use Oood

News

OOo 2.2.1 has a bug that sometimes prevents oood from opening Microsoft Word files. We recommend using OOo 2.3.

Multi-oood - a proxy dispatching requests to a number of oood's - is now available.

Sample client code is now available in five languages: Python, PHP, Java, C++ and Ruby.

Lots of improvements have been made: oood now supports nearly all file formats, can generate non-ODFs from other non-ODFs, has a more user-friendly configuration file, and the old implementation has been removed altogether.

The oood passed the ultimate test - it converted 100,000 docs, working continuously for four days, under a heavy load. The test results can be viewed at http://erp5.pl/stats.txt. Hooray!!!

Implementation 2 of oood is basically finished; it uses a new protocol for client-server communication. We also eliminated the use of None, so the SimpleXMLRPCServer patch is no longer required. Also, I found out that oood was segfaulting because pyuno is not thread-safe. The way to run it safely now is to use a pool of only one worker (config.pool_size=1); then it survives the worst treatment imaginable. I'll write a new version, which will spawn workers in separate processes, in a few weeks.

The Dispatcher is basically working - it passes testOoodBasicOperations.py and survives if a worker is destroyed manually or if an OOo instance is killed. However, there is still a long way to go. To use the new implementation, set "implementation" in config.py to 2 (default is 1). Backward compatibility is maintained. If you use implementation 2, there is no need to run start.py --init, because OOo instances are started by the daemon.

New method "printDocument" - prints the given document directly from the daemon's OOo instance, so if a printer is available to the server, the document does not need to be converted and returned in order to be printed.

The request handling part is going to be rewritten completely - the serw.Procesor class will be replaced by Dispatcher, which will be the only handler for timeouts and exceptions.

The program

The name

The name "oood" is an abbreviation - it stands for "OpenOffice.org Daemon". Some people feel uncomfortable with the name (saying "oh oh oh dee" makes them feel like stutterers), so we are still experimenting with other options (including "office-daemon", "oh-daemon" and "pokemon").

The idea

The oood is an XMLRPC server capable of converting office-type files between various formats (basically MSOffice and OpenOffice), generating PDF and HTML, and also editing metadata of documents. It is capable of doing almost anything that OpenOffice.org can do; one limitation is that it doesn't let you choose save options (like encoding, separators in csv etc), and always uses default values.

The way it works

An initialization script starts an OpenOffice process. It is started in the background and uses a virtual display, so you don't see it. Then the daemon creates a "worker" object, which communicates with OpenOffice through sockets. The server waits for an incoming request, and the pool dispatches it to the worker if it is available (or waits until one becomes available), and so on.
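The dispatching idea described above can be sketched roughly as follows. This is only an illustration, not oood's actual code: the Worker and Pool classes, their method names and the fake conversion result are all hypothetical, and a real worker would talk to OpenOffice over a socket.

```python
import queue
import threading


class Worker:
    """Hypothetical stand-in for an object bound to one OpenOffice instance."""

    def __init__(self, name):
        self.name = name

    def convert(self, document, target_format):
        # A real worker would send the document to OpenOffice here.
        return f"{document} -> {target_format} ({self.name})"


class Pool:
    """Hands each request to an idle worker, waiting if none is free."""

    def __init__(self, workers):
        self._idle = queue.Queue()
        for worker in workers:
            self._idle.put(worker)

    def dispatch(self, document, target_format):
        worker = self._idle.get()  # blocks until a worker becomes available
        try:
            return worker.convert(document, target_format)
        finally:
            self._idle.put(worker)  # return the worker to the pool


def run_demo():
    # A pool of one worker: concurrent requests are served one at a time.
    pool = Pool([Worker("worker-0")])
    results = []

    def handle(document):
        results.append(pool.dispatch(document, "pdf"))

    threads = [threading.Thread(target=handle, args=(name,))
               for name in ("a.odt", "b.doc")]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return sorted(results)


if __name__ == "__main__":
    print(run_demo())
```

With a single worker in the pool, the second request simply waits in dispatch() until the first one has finished.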

Installation - the easy way

The easiest way to install oood is to use Nexedi's RPM repository, which is described in DownloadRpm.

Install package openoffice.org-oood.

Configure /etc/oood/oood.conf as follows, using the pair of paths that matches your OpenOffice installation:

uno_path              = /usr/lib64/ooo/basis-link/program
prog_dir              = /usr/lib64/ooo/program

uno_path              = /usr/lib64/ooo-3.0.1_64/basis-link/program
prog_dir              = /usr/lib64/ooo-3.0.1_64/program

uno_path              = /usr/lib/ooo-3.0.1/basis-link/program
prog_dir              = /usr/lib/ooo-3.0.1/program

uno_path              = /usr/lib64/ooo-3.0_64/basis-link/program
prog_dir              = /usr/lib64/ooo-3.0_64/program

uno_path              = /usr/lib/ooo-3.0/basis-link/program
prog_dir              = /usr/lib/ooo-3.0/program

To start oood, run /etc/init.d/oood start

/!\ Python version note /!\

oood currently (as of 2009.1) uses the system-wide Python interpreter. If that is Python 2.4.x, an error will be raised and oood will not start. A system Python version of 2.6.x is required.

/!\ Mandriva 2009.1 note /!\

There is a known bug in the supplied version of uno.py: https://qa.mandriva.com/show_bug.cgi?id=49064

Download the patch: https://bugzillafiles.novell.org/attachment.cgi?id=257005 and apply it:

patch /usr/lib64/ooo-3.0.1_64/basis-link/program/uno.py /path/to/patch

patch /usr/lib/ooo-3.0.1/basis-link/program/uno.py /path/to/patch

Installation - the old manual way

Requirements

Currently, oood works with OpenOffice 2.0.3 or 2.1 and Python 2.4. It has been successfully tested with the following versions of OpenOffice:

OOo 2.2.1 has a bug that sometimes prevents oood from opening Microsoft Word files. We recommend using OOo 2.3.

The combination of Python 2.5 and OpenOffice 2.1 does not work, because the pyuno library in that OpenOffice version is not compatible with Python 2.5.

You also need Xvfb (virtual frame buffer).

From RPMs

Download an openoffice.org-oood RPM from Nexedi's repository, install, done. But don't do it now - the RPMs contain an old version. Check out from svn instead, and wait for new RPMs.

Installation on 64-bit architecture

Largely the same as on 32-bit, except that you need the 64-bit OpenOffice from the "backports" repository (e.g. ftp://fr2.rpmfind.net/linux/Mandrakelinux/official/2007.0/x86_64/media/main/backports).

From source

Check it out from svn at:

https://svn.erp5.org/repos/public/erp5/trunk/utils/oood

You can also patch the file:

With this: openoffice.org-2.0.3-skip-registration.patch. Otherwise OpenOffice will expect you to register, and since it is started in the background it won't really start (unless you use the --top option and click through the registration).

Setting up

One problem with OpenOffice is that it takes a very long time to load if it does not have a Java path configured - this is why the instance load time in config.py is 120 seconds by default. To make it shorter, run

start.py --top

This starts OpenOffice in the foreground - in each instance go to options->java and set the path to your j2re installation. Then you can change the 120 seconds to 10 seconds. You may also need to make sure that the owner of the oood_instance is oood, like this:

chown -R oood:oood /var/run/oood

Configuration

Edit oood.conf - most entries are self-explanatory or described in the file.

Running

Starting up

The manual way to run it in the foreground is:

cd /var/lib/oood
./runserw.py --start

To start oood at a specified port number offset, type:

cd /var/lib/oood
./runserw.py --start --offset=n

for instance:

./runserw.py --start --offset=1

would start oood on port 8009 if oood.conf specifies port 8008. This way many oood instances can easily be started without creating a separate config file for each instance. The port offset can be negative.
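The arithmetic is simply the configured port plus the offset. A minimal sketch, assuming a hypothetical helper named effective_port (not part of oood) and a range check that oood itself may or may not perform:

```python
def effective_port(base_port, offset=0):
    """Return the port an instance would listen on for a given offset.

    base_port is the port from oood.conf; offset may be negative.
    The range check is an assumption added for this illustration.
    """
    port = base_port + offset
    if not 0 < port < 65536:
        raise ValueError(f"port {port} is outside the valid TCP range")
    return port


if __name__ == "__main__":
    print(effective_port(8008, 1))   # 8009, as in the example above
    print(effective_port(8008, -1))  # 8007
```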

To easily run many oood instances simultaneously, type:

./runserw.py --multiple=n

or

./runserw.py --start --multiple=m:n

The first form starts ooods at offsets 0,1,..,n-1.

The second form starts ooods at offsets m,m+1,..,n (m < n).
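The two forms map onto offsets as in the sketch below. The function name offsets_for is hypothetical and the parsing is only an assumption about how such a value could be interpreted, based on the description above; it is not taken from the script itself.

```python
def offsets_for(spec):
    """Translate a --multiple value into the list of port offsets.

    "n"   -> offsets 0, 1, .., n-1
    "m:n" -> offsets m, m+1, .., n (requires m < n)
    """
    if ":" in spec:
        first, last = (int(part) for part in spec.split(":", 1))
        if first >= last:
            raise ValueError("in the m:n form, m must be smaller than n")
        return list(range(first, last + 1))
    return list(range(int(spec)))


if __name__ == "__main__":
    print(offsets_for("3"))    # [0, 1, 2]
    print(offsets_for("2:4"))  # [2, 3, 4]
```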

When started with the multiple parameter, runserw.py just forks child ooods with the proper offset parameter. When killed, it will kill its child ooods.

The multiple parameter is mainly used by Multi-oood server.

Monitoring

Do:

watch "./start.py --threads"

to see threads at work,

watch "./start.py --status"

to have some current information about what the server is doing.

Testing

There are three test suites - two of them test basic functionalities of oood, and require a subdirectory with files - download test_documents.zip from svn and unzip it into your oood home directory.

The third suite (testOoodHighLoad) is meant to test its stability in production environment, so it does its best to kill the server. First it makes a one-time indexing of all ODF files available on the machine, then starts issuing many requests at a time, in random number, for random files, at random intervals, for conversion into random formats.
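The randomised load pattern can be pictured with the sketch below. It is not the code of testOoodHighLoad: the function name plan_batches, its parameters and the upper bounds for batch size and interval are all assumptions made for illustration.

```python
import random


def plan_batches(files, formats, batches, max_batch=10, max_interval=20, seed=None):
    """Build a random request plan: a list of (requests, interval) pairs.

    Each batch holds a random number of (file, target format) requests and
    is followed by a random pause in seconds. All limits are illustrative.
    """
    rng = random.Random(seed)
    plan = []
    for _ in range(batches):
        size = rng.randint(1, max_batch)
        requests = [(rng.choice(files), rng.choice(formats)) for _ in range(size)]
        interval = rng.randint(1, max_interval)
        plan.append((requests, interval))
    return plan


if __name__ == "__main__":
    for requests, interval in plan_batches(["a.odt", "b.ods"], ["pdf", "html"], 3):
        print(f"batch {len(requests)}, interval {interval}")
```

A real test would then issue each batch of requests concurrently and sleep for the given interval before sending the next one.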

To run testOoodHighLoad, do the following:

The test will first scan your entire "/" looking for ODF files and write the list to an all_odf_docs file. Then it will start the real work.

The expected output looks like this:

03, 17:52:02 : MainThread : batch 10 (total 36), interval 5
03, 17:52:04 : Thread-15 : --------------------- got result: 200
03, 17:52:07 : Thread-18 : /home/bartek/tmp/exqmifxh.odt --> HTML document
03, 17:52:07 : Thread-13 : --------------------- got result: 200
03, 17:52:07 : MainThread : batch 7 (total 43), interval 17
03, 17:52:17 : Thread-17 : got error code: 402 xxxxxxxxxx "the document could not be processed"
03, 17:52:17 : Thread-19 : /home/bartek/ERP/konfa/Wstep_do_ERP5_biznes.odp --> Powerpoint presentation

While the test is running, you can do:

watch "cat stats.txt"

to see detailed information about the test progress, and description of return codes.

The stdout of the test is also written to "test.log". There is a little script "logproc.py" which you can then use to analyse test.log to see which documents give you 402, and investigate the reason.
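As a rough idea of this kind of analysis, the sketch below counts return codes in log lines of the format shown above. It is not logproc.py itself; the regular expression and the function name count_codes are assumptions based only on the sample output.

```python
import re
from collections import Counter

# Matches both "got result: 200" and "got error code: 402" lines.
CODE_RE = re.compile(r"got (?:result|error code): (\d+)")


def count_codes(lines):
    """Count how often each return code appears in the given log lines."""
    counts = Counter()
    for line in lines:
        match = CODE_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


SAMPLE = """\
03, 17:52:04 : Thread-15 : --------------------- got result: 200
03, 17:52:07 : Thread-18 : /home/bartek/tmp/exqmifxh.odt --> HTML document
03, 17:52:07 : Thread-13 : --------------------- got result: 200
03, 17:52:17 : Thread-17 : got error code: 402 xxxxxxxxxx "the document could not be processed"
"""

if __name__ == "__main__":
    for code, count in sorted(count_codes(SAMPLE.splitlines()).items()):
        print(code, count)
```

To analyse a real run, the lines of test.log could be passed in instead of SAMPLE.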

Usage

The oood implements a custom protocol, which is best described in the attached design document.

The svn directory contains a "samples" subdirectory, where you can find sample client code in Python, PHP, Java, C++ and Ruby.

Known problems

TODO

HowToUseOood (last edited 2010-02-05 15:41:10 by ChetanKumar)