How To Install And Use Oood
News
- March 11th, 2008
OOo 2.2.1 has a bug that sometimes makes oood fail to open Microsoft Word files. We recommend using OOo 2.3.
- June 28th, 2007
Multi-oood, a proxy that dispatches requests to a number of ooods, is now available.
- May 31st, 2007
Sample client code is now available in five languages: Python, PHP, Java, C++ and Ruby.
- May 29th, 2007
Lots of improvements have been made: oood now supports nearly all file formats, can generate non-ODFs from other non-ODFs, and has a more user-friendly configuration file. The old implementation has been removed altogether.
- May 8th, 2007
The oood passed the ultimate test - it converted 100,000 docs, working continuously for four days, under a heavy load. The test results can be viewed at http://erp5.pl/stats.txt. Hooray!!!
- May 3rd, 2007
Implementation 2 of oood is basically finished. It uses a new protocol for client-server communication. We also eliminated the use of None, so the SimpleXMLRPCServer patch is no longer required. I also found out that oood was segfaulting because pyuno is not thread-safe. The safe way to run it now is to use a pool of only one worker (config.pool_size=1); with that setting it survives the worst treatment imaginable. In a few weeks I'll write a new version that spawns workers in separate processes.
- April 21st, 2007
The Dispatcher is basically working. It passes testOoodBasicOperations.py and survives both a worker being destroyed manually and an OOo instance being killed. However, there is still a long way to go. To use the new implementation, set "implementation" in config.py to 2 (the default is 1). Backward compatibility is maintained. If you use implementation 2, there is no need to run start.py --init, because the daemon starts the OOo instances itself.
- April 18th, 2007
New method "printDocument": it prints the given document directly from the daemon's OOo instance. If a printer is available to the server, the document therefore does not need to be converted and returned before it can be printed.
- April 11th, 2007
The request handling part is going to be rewritten completely. The serw.Procesor class will be replaced by the Dispatcher, which will be the only handler for timeouts and exceptions.
The program
The name
The name "oood" is an abbreviation that stands for "OpenOffice.org Daemon". Some people feel uncomfortable with the name (saying "oh oh oh dee" makes them feel like stutterers), so we are still experimenting with other options (including "office-daemon", "oh-daemon" and "pokemon").
The idea
The oood is an XMLRPC server that converts office-type files between various formats (mainly MSOffice and OpenOffice), generates PDF and HTML, and edits document metadata. It can do almost anything that OpenOffice.org can do. One limitation is that it does not let you choose save options (such as encoding or CSV separators), so it always uses the default values.
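The XMLRPC transport can be illustrated with a minimal Python sketch. It covers the transport only. The method name `convert`, its arguments, and the (code, data) return value are hypothetical stand-ins and are not oood's actual protocol, which is described in the design document. A local stub server replaces oood so that the example runs on its own. The sketch shows that document bytes travel as XML-RPC `Binary` (base64) values.

```python
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer


def fake_convert(filename, data, target_format):
    """Hypothetical stand-in for an oood conversion method.

    No real conversion happens: the payload is upper-cased so the round
    trip is visible. `data` arrives as an xmlrpc.client.Binary object.
    """
    return 200, xmlrpc.client.Binary(data.data.upper())


def start_stub_server():
    # Port 0 lets the OS pick a free port. XML-RPC has no None type,
    # so allow_none stays at its default (False).
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(fake_convert, "convert")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def convert_via_xmlrpc(url, filename, payload, target_format):
    # Binary wraps raw bytes so they travel base64-encoded inside the XML.
    with xmlrpc.client.ServerProxy(url) as proxy:
        code, result = proxy.convert(filename, xmlrpc.client.Binary(payload), target_format)
    return code, result.data


server = start_stub_server()
host, port = server.server_address
try:
    code, data = convert_via_xmlrpc(f"http://{host}:{port}/", "report.odt", b"hello", "pdf")
finally:
    server.shutdown()
    server.server_close()
print(code, data)  # 200 b'HELLO'
```

For real requests, use the sample clients in the svn "samples" directory rather than this stub.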
The way it works
An initialization script starts an OpenOffice process in the background, using a virtual display, so you don't see it. The daemon then creates a "worker" object, which communicates with OpenOffice through sockets. The server waits for incoming requests, and the pool dispatches each request to a worker if one is available (or waits until one becomes available).
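The dispatch step can be pictured as a pool of idle workers held in a blocking queue. The Python sketch below assumes this design and is not oood's own code. Here the workers are plain callables; in oood they would talk to an OpenOffice instance over a socket. With a single worker, requests are processed one at a time, which is what the config.pool_size=1 setting mentioned in the news amounts to.

```python
import queue
import threading


class WorkerPool:
    """Sketch of a pool that hands each request to an idle worker.

    Workers are plain callables here; socket communication with an
    OpenOffice instance is not modelled.
    """

    def __init__(self, workers):
        self._idle = queue.Queue()
        for worker in workers:
            self._idle.put(worker)

    def dispatch(self, request, wait=None):
        # Block until some worker is idle; with a timeout, raise queue.Empty.
        worker = self._idle.get(timeout=wait)
        try:
            return worker(request)
        finally:
            # Hand the worker back even if the request failed.
            self._idle.put(worker)


def make_worker(name):
    # Hypothetical worker: a real one would run the conversion here.
    def work(request):
        return f"{name} handled {request}"
    return work


# A pool of one worker serializes all requests.
pool = WorkerPool([make_worker("worker-1")])
results = []
threads = [
    threading.Thread(target=lambda r=r: results.append(pool.dispatch(r)))
    for r in ("a.doc", "b.xls", "c.ppt")
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results))  # ['worker-1 handled a.doc', 'worker-1 handled b.xls', 'worker-1 handled c.ppt']
```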
Installation
Requirements
Currently, oood works with OpenOffice 2.0.3 or 2.1 and Python 2.4. It has been successfully tested with the following versions of OpenOffice:
- 2.0.3-7
- 2.1.0-6
- 2.0_64
- 2.2 with python2.5 (mdv 2007 spring)
- 2.3
OOo 2.2.1 has a bug that sometimes makes oood fail to open Microsoft Word files, so we recommend using OOo 2.3.
The combination of Python 2.5 and OpenOffice 2.1 does not work, because the pyuno library shipped with OpenOffice 2.1 is not compatible with Python 2.5.
You also need Xvfb (virtual frame buffer).
From RPMs
Download an openoffice.org-oood RPM from Nexedi's repository and install it. However, don't do this for now, because the RPM contains an old version. Check out the code from svn instead and wait for new RPMs.
Installation on 64-bit architecture
Largely the same as on 32-bit, except that you need 64-bit OpenOffice from the "backports" repository (e.g. ftp://fr2.rpmfind.net/linux/Mandrakelinux/official/2007.0/x86_64/media/main/backports).
From source
Check it out from svn at:
https://svn.erp5.org/repos/public/erp5/trunk/utils/oood
You can also patch the file:
[OpenOffice-installation-dir]/share/registry/data/org/openoffice/Setup.xcu
using openoffice.org-2.0.3-skip-registration.patch. Otherwise OpenOffice will expect you to register, and because it runs in the background it will never actually finish starting (unless you use the --top option and click through the registration).
Setting up
One problem with OpenOffice is that it takes a very long time to load if its Java path is not configured. This is why the instance load time in config.py defaults to 120 seconds. To make it shorter, run
start.py --top
This starts OpenOffice in the foreground. In each instance, go to Options -> Java and set the path to your j2re installation. You can then reduce the 120 seconds to 10 seconds. You may also need to make sure that oood_instance is owned by the oood user, like this:
chown -R oood:oood /var/run/oood
Configuration
Edit oood.conf. Most of the entries are self-explanatory or described in the file itself.
- instance_load_time - see above
- instance_timeout - defines how long oood waits for an OOo instance before deciding that it is not going to return because it froze or crashed. Too small a value makes processing big files impossible, and too large a value slows down processing if many files are broken. Something like 30 seconds should be fine.
- formats to use or to skip - oood supports 111 file formats, so a dropdown in your client application that lists all formats allowed for, say, a text file can be quite long. To make it shorter, you can tell oood which formats to skip, or which to use (skipping all the rest).
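The use/skip choice amounts to a simple filter over the format list. The helper below is hypothetical and does not show how oood reads these entries from oood.conf. The format names are illustrative, and giving "use" precedence over "skip" when both are set is an assumption of this sketch.

```python
def visible_formats(all_formats, use=None, skip=None):
    """Hypothetical helper mirroring the "use"/"skip" idea.

    If `use` is non-empty, only those formats are kept (everything else is
    skipped); otherwise every format not listed in `skip` is kept. Giving
    `use` precedence over `skip` is an assumption of this sketch.
    """
    if use:
        allowed = set(use)
        return [fmt for fmt in all_formats if fmt in allowed]
    skipped = set(skip or ())
    return [fmt for fmt in all_formats if fmt not in skipped]


text_formats = ["odt", "doc", "rtf", "txt", "pdf", "html"]  # illustrative list
print(visible_formats(text_formats, skip=["rtf", "txt"]))  # ['odt', 'doc', 'pdf', 'html']
print(visible_formats(text_formats, use=["pdf", "doc"]))   # ['doc', 'pdf']
```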
Running
Starting up
The manual way to run it in the foreground is:
cd /var/lib/oood
./runserw.py --start
To start oood at a specified port number offset, type:
cd /var/lib/oood
./runserw.py --start --offset=n
for instance:
./runserw.py --start --offset=1
would start oood on port 8009 if oood.conf specifies port 8008. This way, many oood instances can easily be started without creating a separate config file for each one. The port offset can be negative.
To easily run many oood instances simultaneously, type:
./runserw.py --multiple=n
or
./runserw.py --start --multiple=m:n
The first form starts ooods at offsets 0, 1, ..., n-1.
The second form starts ooods at offsets m, m+1, ..., n (m < n).
When started with the multiple parameter, runserw.py just forks child ooods with the proper offset parameters. When killed, it kills its child ooods.
The multiple parameter is mainly used by the Multi-oood server.
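The offset arithmetic can be checked with a small hypothetical Python sketch. It is not runserw.py's own argument parsing. It turns a --multiple value into offsets, ports, and per-child command lines. Note that the "n" form excludes n, while the "m:n" form includes it. The sketch does not spawn anything; a parent process would start each child (for example with subprocess.Popen) and terminate the children when it is killed.

```python
def offsets_from_multiple(value):
    """Hypothetical parser for a --multiple value, following the two forms
    described above: "n" gives offsets 0..n-1, "m:n" gives m..n (m < n)."""
    if ":" in value:
        first, last = (int(part) for part in value.split(":", 1))
        if first >= last:
            raise ValueError("--multiple=m:n requires m < n")
        return list(range(first, last + 1))  # n is included in this form
    return list(range(int(value)))


def child_commands(value, script="./runserw.py"):
    # One argv per child, as a parent process might build before starting it.
    return [[script, "--start", f"--offset={offset}"] for offset in offsets_from_multiple(value)]


def ports_for(base_port, value):
    # Each child listens on the configured port plus its offset.
    return [base_port + offset for offset in offsets_from_multiple(value)]


print(ports_for(8008, "3"))     # [8008, 8009, 8010]
print(ports_for(8008, "-1:1"))  # [8007, 8008, 8009]
print(child_commands("2")[1])   # ['./runserw.py', '--start', '--offset=1']
```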
Monitoring
Do:
watch "./start.py --threads"
to see threads at work,
watch "./start.py --status"
to get current information about what the server is doing.
Testing
There are three test suites. Two of them test the basic functionality of oood and require a subdirectory with test files: download test_documents.zip from svn and unzip it into your oood home directory.
The third suite (testOoodHighLoad) is meant to test oood's stability in a production environment, so it does its best to kill the server. First it indexes, once, all ODF files available on the machine. It then starts issuing many requests at a time: a random number of requests, for random files, at random intervals, for conversion into random formats.
To run testOoodHighLoad, do the following:
- configure and start oood
- make sure the oood home directory is writable for you
- set max_batch_size and max_interval in testOoodHighLoad.py. The ratio max_interval/max_batch_size is roughly the average time you give your machine to process a document, so adjust it to your machine's processing power. Please give your machine a few seconds per document, because documents are sometimes fairly large. Combinations (max_batch_size/max_interval) such as "5/15", "10/40" or "100/300" have been run successfully
- if there is an old all_odf_docs file in the directory, remove it
- start testOoodHighLoad; you can pass the desired number of conversions as a command-line argument (the default is 100)
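The random load pattern and the role of the two settings can be sketched in Python. This is a hypothetical sketch, not testOoodHighLoad.py itself. Batch sizes and pauses are drawn uniformly, so the pause per document averages out to roughly max_interval/max_batch_size seconds.

```python
import random


def load_schedule(total, max_batch_size, max_interval, seed=None):
    """Hypothetical sketch of a random load pattern: batches of random size,
    separated by random pauses, until `total` conversions are requested.

    Returns a list of (batch_size, interval_seconds) pairs. On average each
    document gets roughly max_interval / max_batch_size seconds.
    """
    rng = random.Random(seed)
    schedule = []
    sent = 0
    while sent < total:
        batch = min(rng.randint(1, max_batch_size), total - sent)
        interval = rng.randint(1, max_interval)
        schedule.append((batch, interval))
        sent += batch
    return schedule


schedule = load_schedule(100, max_batch_size=10, max_interval=40, seed=1)
documents = sum(batch for batch, _ in schedule)
seconds = sum(interval for _, interval in schedule)
print(documents, round(seconds / documents, 1))  # seconds per document, near 4
```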
The test first scans your entire "/" looking for ODF files and writes the list to an all_odf_docs file. Then it starts the real work.
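The indexing step can be sketched in Python with os.walk. The exact set of ODF extensions and the "one path per line" layout of all_odf_docs are assumptions of this sketch. The demo scans a throwaway directory instead of "/".

```python
import os
import tempfile

# Assumption: these extensions stand in for "ODF files"; the real test may differ.
ODF_EXTENSIONS = (".odt", ".ods", ".odp", ".odg")


def find_odf_files(root="/"):
    """Yield paths of ODF files under `root`.

    os.walk skips unreadable directories by default and does not follow
    symlinked directories, so a scan of "/" will not recurse through links.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(ODF_EXTENSIONS):
                yield os.path.join(dirpath, name)


def write_index(paths, index_path):
    # Assumed format: one path per line.
    count = 0
    with open(index_path, "w") as index:
        for path in paths:
            index.write(path + "\n")
            count += 1
    return count


# Demo on a throwaway directory instead of "/".
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "docs"))
    for name in ("a.odt", "b.ODS", "c.doc"):
        open(os.path.join(tmp, "docs", name), "w").close()
    index_path = os.path.join(tmp, "all_odf_docs")
    found = write_index(sorted(find_odf_files(tmp)), index_path)
    with open(index_path) as index:
        names = [os.path.basename(line.strip()) for line in index]
print(found, names)  # 2 ['a.odt', 'b.ODS']
```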
The expected output looks like this:
03, 17:52:02 : MainThread : batch 10 (total 36), interval 5
03, 17:52:04 : Thread-15 : --------------------- got result: 200
03, 17:52:07 : Thread-18 : /home/bartek/tmp/exqmifxh.odt --> HTML document
03, 17:52:07 : Thread-13 : --------------------- got result: 200
03, 17:52:07 : MainThread : batch 7 (total 43), interval 17
03, 17:52:17 : Thread-17 : got error code: 402 xxxxxxxxxx "the document could not be processed"
03, 17:52:17 : Thread-19 : /home/bartek/ERP/konfa/Wstep_do_ERP5_biznes.odp --> Powerpoint presentation
While the test is running, you can do:
watch "cat stats.txt"
to see detailed information about the test progress and a description of the return codes.
The stdout of the test is also written to "test.log". You can then use a little script, "logproc.py", to analyse test.log, see which documents returned a 402 error, and investigate why.
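The kind of analysis logproc.py does can be sketched in Python. The helper below is hypothetical and not logproc.py itself. It assumes the "day, time : thread : message" line format of the sample output and assumes that each worker thread logs "<path> --> <format>" before it logs its result.

```python
def documents_with_code(lines, code="402"):
    """Hypothetical logproc-like helper: return the documents whose
    conversion ended with the given error code.

    Assumes "day, time : thread : message" lines, and that each worker
    thread logs "<path> --> <format>" before its result line.
    """
    current = {}  # thread name -> document it is working on
    failed = []
    for line in lines:
        parts = line.rstrip("\n").split(" : ", 2)
        if len(parts) != 3:
            continue  # not a log line in the expected shape
        _stamp, thread, message = parts
        if " --> " in message:
            current[thread] = message.split(" --> ", 1)[0]
        elif "got result:" in message:
            current.pop(thread, None)
        elif message.startswith("got error code:"):
            fields = message[len("got error code:"):].split()
            document = current.pop(thread, None)
            if fields and fields[0] == code and document is not None:
                failed.append(document)
    return failed


sample = [
    "03, 17:52:07 : Thread-18 : /tmp/a.odt --> HTML document",
    "03, 17:52:09 : Thread-19 : /tmp/b.odp --> Powerpoint presentation",
    "03, 17:52:10 : Thread-18 : --------------------- got result: 200",
    '03, 17:52:17 : Thread-19 : got error code: 402 xxxxxxxxxx "the document could not be processed"',
]
print(documents_with_code(sample))  # ['/tmp/b.odp']
```

With a real log, pass the open file instead, e.g. `documents_with_code(open("test.log"))`.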
Usage
The oood implements a custom protocol, which is best described in the attached design document.
The svn directory contains a "samples" subdirectory, where you can find sample client code in:
- Python
- PHP
- Java
- C++
- Ruby
Known problems
TODO
- support user-supplied flags
- RPMs