Friday, September 14, 2012

Documentum Big Data import/export

I've been away from my blog for a while, busy on projects for clients.
I learned something on one of these projects that I thought worth sharing.

In a nutshell
Importing BIG files into or exporting them from Documentum is a challenge, but you can get around the out-of-the-box limits.

Here is what happened

I was asked by a client with an existing Documentum system to help them with document import/export. They were unhappy with the solution that the previous contractor had built using TaskSpace and UCF. They complained that import often failed. They also wanted to add the ability for external systems to automatically import and export documents.
I asked about the kinds of documents they were storing, and they turned out to be somewhat atypical for a Documentum system. In my experience, most Documentum systems are filled with documents of kilobytes to megabytes in size, with 1 GB being considered very big. For my customer, most files were between 10 and 50 GB, with some as big as 500 GB. That's BIG.

Documentum has no problem storing files of that size. The challenge is in getting the files from the client to the server and back.
Since they were asking for import/export functionality for interactive clients as well as back-end integration with other systems, I proposed to create a web service using DFS (Documentum Foundation Services, the Documentum web services framework).

Now DFS has several options for content transfer:
  • BASE64: This includes the content as part of the reply message to the web service client. It is the easiest option, but also the most restrictive; only advisable for very small content files.
  • UCF: This is Documentum's proprietary content transfer method. It has many cool features for XML files, virtual documents and such, but it had proven unreliable in my customer's environment with the BIG files they have.
  • MTOM: The Message Transmission Optimization Mechanism is a W3C standard specifically meant to reliably send binary data in SOAP web service calls (see the generic JAX-WS sketch after this list).
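
To make the MTOM option concrete: here is roughly what enabling it looks like on a plain JAX-WS endpoint. This is generic JAX-WS, not the DFS configuration, and the service and operation names are made up for illustration.

    import javax.jws.WebService;
    import javax.xml.bind.annotation.XmlMimeType;
    import javax.xml.ws.Endpoint;
    import javax.xml.ws.soap.MTOM;

    // Generic JAX-WS illustration (not the DFS configuration): with @MTOM,
    // the runtime ships byte[] results as binary MIME attachments instead of
    // inlining them in the SOAP body as BASE64 text.
    @MTOM
    @WebService
    public class ContentService {

        @XmlMimeType("application/octet-stream")
        public byte[] getContent(String objectId) {
            // Placeholder: a real service would read the file from the content store.
            return new byte[0];
        }

        public static void main(String[] args) {
            Endpoint.publish("http://localhost:8080/content", new ContentService());
        }
    }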
MTOM looked promising, but I had run into limits using MTOM for big files in a previous project. When exporting several big files simultaneously, the application server running the web services would run into Java memory issues. That previous project had considered 10 MB big, so we were sure to hit the same limits here.

I solved this by cutting the content transfer up into pieces.
Exporting a file now goes like this (a minimal Java sketch follows the list):
  • The web service client starts an export by specifying which file it wants to receive. The web service returns an export token (a unique ID for this export request).
  • The web service client then calls the web service again, supplying the token and the maximum number of bytes it wishes to receive (the default being 1 MB). The web service returns the next part of the content file using MTOM.
  • The web service client keeps calling until the full content file is transferred.
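
To make the protocol concrete, here is a minimal sketch of the client side in Java. The ExportService interface and its operation names are invented placeholders for stubs that would be generated from the real service's WSDL; it illustrates the download loop, not the actual service contract.

    import java.io.FileOutputStream;
    import java.io.IOException;

    // Invented placeholder for a stub generated from the service WSDL;
    // this is not the actual service contract.
    interface ExportService {
        // Registers an export request and returns a unique export token.
        String startExport(String objectId);
        // Returns the next chunk (at most maxBytes), empty when the file is done.
        byte[] getChunk(String exportToken, int maxBytes);
    }

    public class ChunkedExportClient {
        private static final int DEFAULT_CHUNK_SIZE = 1024 * 1024; // 1 MB

        public static void export(ExportService service, String objectId, String targetPath)
                throws IOException {
            // Step 1: ask the service to prepare the export and hand out a token.
            String token = service.startExport(objectId);
            // Step 2: keep fetching chunks until the service signals end-of-content.
            try (FileOutputStream out = new FileOutputStream(targetPath)) {
                byte[] chunk;
                while ((chunk = service.getChunk(token, DEFAULT_CHUNK_SIZE)).length > 0) {
                    out.write(chunk);
                }
            }
        }
    }

Because each call transfers at most one chunk, the application server never has to hold more than one chunk per request in memory, which is exactly what kept MTOM out of trouble at these file sizes.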
This very simple protocol turned out to work like a charm, even when simultaneously transferring files of many GB. We did advise the client to use a separate DFS server machine, so the Documentum Content Server is not congested with all the disk and network traffic the big files cause, and TaskSpace keeps running smoothly for the users.
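
On the service side, the bookkeeping behind the protocol can be as simple as remembering a file and a read offset per token. Again a minimal sketch in plain Java, with invented names rather than the actual DFS service code:

    import java.io.IOException;
    import java.io.RandomAccessFile;
    import java.util.Map;
    import java.util.UUID;
    import java.util.concurrent.ConcurrentHashMap;

    // Minimal sketch of the server-side chunking state; not the actual DFS code.
    public class ChunkedExportState {
        // Content file and read offset for each outstanding export token.
        private final Map<String, String> files = new ConcurrentHashMap<>();
        private final Map<String, Long> offsets = new ConcurrentHashMap<>();

        public String startExport(String contentFilePath) {
            String token = UUID.randomUUID().toString();
            files.put(token, contentFilePath);
            offsets.put(token, 0L);
            return token;
        }

        public byte[] getChunk(String token, int maxBytes) throws IOException {
            long offset = offsets.get(token);
            try (RandomAccessFile raf = new RandomAccessFile(files.get(token), "r")) {
                long remaining = raf.length() - offset;
                if (remaining <= 0) {
                    // Export finished: clean up and signal end-of-content.
                    files.remove(token);
                    offsets.remove(token);
                    return new byte[0];
                }
                byte[] chunk = new byte[(int) Math.min(maxBytes, remaining)];
                raf.seek(offset);
                raf.readFully(chunk);
                offsets.put(token, offset + chunk.length);
                return chunk;
            }
        }
    }

In a real deployment you would also want to expire tokens of abandoned exports, so their bookkeeping does not linger forever.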

TaskSpace
For the interactive clients we did one more trick so they can use the new import/export web service.
Normally you would have a component on the TaskSpace application server that acts as a web service client, but then the content would be sent to the application server first, and the application server would forward it to the user's browser. The big files would travel over the network twice, causing unnecessary delays.
Documentum has a feature called Accelerated Content Services (ACS), but we could not use that in this project.

We did find a way to get the content from the DFS server directly to the user's browser:
we added a little JavaScript to the export page that calls the export web service and combines all the parts of the file into one BIG file.

It works, it performs, and it is one solution for both interactive and integration use. I am happy!

Let me know what you think

3 comments:

  1. Hi,
    I am new to DFS and am stuck on the approach to use for content transfer in our application.
    Our customers are both internal and external (accessing the application over the internet). We thought of using UCF, but ruled it out as unstable/unreliable. The MTOM option is also ruled out, as we have very big files (>200 MB) and a very large user base.
    Can you please suggest the best approach?
    Your help is very much appreciated.

    Thnx
    Vish

  2. I am working on a project in a similar situation: large content files. Would you share some of your code for cutting the content transfer up into pieces? It would be really appreciated.

    Harry

  3. Hi, we are also facing the same issue. Can you please share the code snippet for chunking big files and returning the token?
