Here is a quick review of the current design and implementation of ERP5SyncML. Overall, it is good. However, there are still some inconsistencies or shortcomings, some of which require immediate action.
First of all, let us review the parameters of a publication:
- The title of the publication
- Set to 1 to make synchronisation use activities. This is a bit strange since activities should always be used.
- The URL (http or email) of the publication
- The ID of the module from / to which documents will be synchronised. This is clearly a system design limitation. It would be interesting, for example, to be able to synchronise all of ERP5 with a single publication. In my opinion, the conduit should be in charge of finding where to create new documents (new nodes). The destination path parameter should not exist in the longer term. However, it cannot be removed currently because IDs are used in the synchronisation to locate documents in ERP5.
- Defines the name of the synchronisation for the SyncML protocol. Every publication needs a URI which may be different from the Title for more flexibility.
- The query has two purposes. First, it defines which documents in ERP5 should be synchronised. Second, it locates documents in ERP5 based on their ID or GID. In my opinion, a query should be able to return a list of brains rather than objects. I am not sure that the current design is compatible with this.
- Converts an ERP5 document into an XML file which will be used for the synchronisation process. The XML schema is common to all parts of the synchronisation process. The XML may contain less information than the original ERP5 document.
- Converts an XUpdate file into an ERP5 document. The XUpdate file is computed by comparing successive XML files. The conduit is also able to convert an XML file into a GID. I think this is not very nice.
GPG key name
- A key can be specified (more information needed here).
- Defines the next ID to use whenever a new node is created. I do not understand why this is necessary.
- Defines a unique identifier based on the content of the documents which are synchronised. This is very useful to change the semantics of synchronisation. Maybe this GID calculation should be based on the XML instead, to prevent code duplication.
- The type of media transported in the SyncML packet. The treatment differs between XML and vCard: we cannot parse vCard as XML, we cannot use XUpdate with anything other than XML, and we cannot send vCard as XML (we must specify in the SyncML packet, with CDATA markup, that the content is not XML). So we must know the type of media transported in the SyncML packet.
- Required by the SyncML protocol: the format value MUST be b64 when using the clear-text XML representation.
- The default type is syncml:auth-basic, for the SyncML "Basic" form of authentication. Other types are possible (such as syncml:auth-md5, which is not supported yet).
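To make the last two parameters more concrete, here is a minimal sketch of how a "Basic" credential can be built and decoded. It assumes the usual SyncML convention that the credential is the base64 encoding of userid:password. The function names are hypothetical and are not taken from ERP5SyncML.

```python
import base64
from typing import Tuple


def make_basic_credential(user_id: str, password: str) -> str:
    """Return the b64 payload for a syncml:auth-basic credential.

    Hypothetical helper: assumes the credential is base64("userid:password").
    """
    raw = f"{user_id}:{password}".encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def parse_basic_credential(data: str) -> Tuple[str, str]:
    """Decode a b64 credential back into (user_id, password)."""
    decoded = base64.b64decode(data).decode("utf-8")
    # Split on the first ':' only, so the password may itself contain ':'.
    user_id, _, password = decoded.partition(":")
    return user_id, password
```

The receiving side would call parse_basic_credential on the credential data and then check the user with the normal authentication machinery. An md5 variant would need a different helper.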
Signatures are used to keep a persistent mapping at the publication side between documents on the publication side and documents on the subscription side.
Signatures are stored in a persistent mapping (or BTree) keyed by the GID, which derives from their XML content. So, for each GID in the synchronisation process, there is one signature. The signature is made of:
- object_id : the ID of the document on the publication side (i.e. the ID of the document in the module from / to which documents are synchronised)
- rid : the ID of the document on the subscription side (e.g. the ID of a record in an E61 phone)
- XML : the XML representation of the synchronised content
In addition, signatures keep the following parameters:
- path : this seems to be redundant with object_id since path = Destination Path + '/' + object_id
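The structure described above can be sketched as follows. This is only an illustration under stated assumptions: the Signature class, the gid_from_xml rule (first name plus last name), the XML element names and the plain dict standing in for the persistent mapping are all hypothetical, not the actual ERP5SyncML code. It also shows that path can be derived on demand rather than stored, and that a GID can be computed from the common XML.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Signature:
    """Hypothetical, simplified signature record."""
    object_id: str               # ID of the document on the publication side
    xml: str                     # XML representation of the synchronised content
    rid: Optional[str] = None    # ID of the document on the subscription side

    def path(self, destination_path: str) -> str:
        # Derived instead of stored: Destination Path + '/' + object_id
        return destination_path + "/" + self.object_id


def gid_from_xml(xml: str) -> str:
    """Assumed GID rule for illustration: first name and last name."""
    root = ET.fromstring(xml)
    first = root.findtext("first_name", default="")
    last = root.findtext("last_name", default="")
    return f"{first} {last}".strip()


def register(store: Dict[str, Signature], object_id: str, xml: str,
             rid: Optional[str] = None) -> str:
    """Store one signature per GID and return the GID used as key."""
    gid = gid_from_xml(xml)
    store[gid] = Signature(object_id=object_id, xml=xml, rid=rid)
    return gid


# A plain dict stands in for the persistent mapping (or BTree).
signatures: Dict[str, Signature] = {}
```

With such a layout, changing the semantics of the synchronisation only means changing gid_from_xml, without touching the conduit.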
Actions (short term)
- Why don't we always use activities ? Either there is a good reason or we should always use them.
The "use activity" parameter is used by the publication (and by the subscription too) and makes the HTTP response asynchronous. Under the SyncML recommendation, the publication cannot open a new HTTP connection. This is why we use this parameter only to synchronise with other ERP5 instances.
- For more flexibility, it should be possible to define a script to convert common XML into GID rather than expect the conduit to do this (additional parameter).
- Explain how to use GPG keys.
There is no documentation and there are no unit tests on GPG keys, and the feature does not seem to work, so the source code must be checked from the beginning, and documentation and unit tests must be written.
- I do not understand why the ID generator is needed. Why not rely on newContent ? I would like some explanation. Is it because one may want to control the ID of new objects so that, for example, they are the same on the two sides of a synchronisation ?
Which user is the synchronisation run as ? When I see:
user = UnrestrictedUser('syncml', 'syncml', ['Manager', 'Member'], '')
I feel it is not a standard user, which is a mistake. Please make sure that the whole synchronisation process takes user security into account and does not create a security leak.
=> these lines (user = Unre....) are now removed.
Actions (longer term)
Is it possible to use a path to identify objects on the publication side rather than an ID ? This would make it possible to synchronise objects stored in different modules. If not, the use of uids is probably the best way. Is this compatible with SyncML (uids are long ints) ? Generally speaking, the combination of destination path + object_id to represent documents on the publication side is too restrictive and should be improved.
Make sure queries can return a list of brains (not objects).
Explain how to use email for synchronisation.
How should relations be synchronised between 2 ERP5 sites ?