Friday, October 5, 2018

Getting rid of excess Documentum thumbnails


OpenText Documentum comes with a number of default settings that are not appropriate for most projects and can even cause problems when applied to a production system unchanged. One of those settings concerns thumbnails.

Documentum xCP is a great tool for creating Case Management applications using no code. It includes a viewer that can show your documents alongside their metadata and other relevant process information. Great stuff! The way that this viewer shows your documents can have some consequences however.
The xCP viewer can show documents of any file format by converting the pages of the document into images called thumbnails and then displaying the thumbnails in your browser. There are large thumbnails that are displayed in the main viewer window. There are also small thumbnails that are shown in the viewer’s page navigation bar on the left. All these thumbnail images that are generated are stored in Documentum with the document as renditions. Once the thumbnails have been generated, the next time you view the document the viewer only has to retrieve them.




An xCP application displaying a document

To make the xCP Viewer display your documents even faster, Documentum can be configured to generate the thumbnails as soon as you upload a new document.

Now here is the downside that can cause problems for your system. Documentum has a default setting that enables automatic thumbnail generation for all documents in your system. Though the idea sounds nice at first, it means that a number of extra renditions will be added to every version of every document in your Documentum system, even if no one is going to look at the document using the xCP Viewer. All these renditions take up extra disk space. A LOT of extra disk space. The generated thumbnails can take up more space than the original content files, so your total space requirement for content files will be 2x to 3x what the original content alone needs. If you have millions of documents with terabytes of content files, you will need a serious budget increase just to store all the thumbnails.
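To get a feel for the numbers, the calculation can be sketched in a few lines of Python. The function name and the ratio parameter are illustrative assumptions. The ratios of 1.0 and 2.0 simply restate the 2x to 3x total mentioned above and are not measurements.

```python
def total_storage_tb(original_tb: float, thumbnail_ratio: float) -> float:
    """Return total content storage in TB when thumbnails add
    `thumbnail_ratio` times the size of the original content.

    The ratio is an assumed input, not a value reported by Documentum.
    """
    if original_tb < 0 or thumbnail_ratio < 0:
        raise ValueError("sizes and ratios must be non-negative")
    return original_tb * (1.0 + thumbnail_ratio)


# Example: 10 TB of original content, with thumbnails taking as much space
# as the originals (ratio 1.0) or twice as much (ratio 2.0).
low = total_storage_tb(10.0, 1.0)
high = total_storage_tb(10.0, 2.0)
print(f"Total storage needed: {low:.0f} TB to {high:.0f} TB")
```

Measuring the real ratio on a sample of your own documents before sizing storage is the safer route, since it depends heavily on page counts and file formats.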

Fortunately there are several ways to improve this situation. To start with, you can disable the default configuration and enable automatic thumbnail generation just for the document types and content types that will be displayed in the xCP application. This will reduce the number of useless thumbnails that are generated.
You can also turn off the automatic thumbnail generation altogether. In this case the xCP Viewer will generate the thumbnails it needs when it displays a document. This will take a second or so, but will save enormously on the storage needed for the thumbnails. The only downside is that the page navigation on the left side of the viewer does not work in this configuration: the xCP Viewer does not generate the small thumbnails needed for the page navigation. It only generates the main thumbnails.

At Informed Consulting we created a solution that improves on this in 2 ways:
-        A small xCP custom module that generates the small thumbnails for the xCP viewer
-        A custom job that deletes old thumbnails that are no longer needed
With these improvements we’ve cut the disk space needed for thumbnails by 95% without losing any functionality in the xCP Viewer.
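To illustrate what a cleanup rule for such a job could look like, here is a minimal Python sketch. It is not the Informed Consulting implementation. The `Thumbnail` record, its fields and the 90 day idle period are all hypothetical, and a real job would read this information from the repository and delete the renditions through the Documentum API.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Thumbnail:
    rendition_id: str         # hypothetical identifier of the rendition
    is_current_version: bool  # does it belong to the current document version?
    last_viewed: date         # hypothetical "last shown in the viewer" date


def thumbnails_to_delete(thumbnails, today, max_idle_days=90):
    """Select thumbnails that belong to old versions or were not viewed
    within the last `max_idle_days` days."""
    cutoff = today - timedelta(days=max_idle_days)
    return [
        t.rendition_id
        for t in thumbnails
        if not t.is_current_version or t.last_viewed < cutoff
    ]


sample = [
    Thumbnail("thumb-a", True, date(2018, 10, 1)),   # current and recent: keep
    Thumbnail("thumb-b", False, date(2018, 10, 1)),  # old version: delete
    Thumbnail("thumb-c", True, date(2018, 1, 1)),    # not viewed lately: delete
]
print(thumbnails_to_delete(sample, today=date(2018, 10, 5)))
```

Because the viewer can regenerate a thumbnail on demand, deleting one too many only costs a short delay the next time the document is opened.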


If you’re struggling with a Documentum project, contact us. We’ll be glad to help.

Friday, April 29, 2016

Choosing Documentum xCP or D2 in 2016

For the past years Documentum has had 2 user applications:
  • D2, a configurable application originally bought from C6
  • xCP, an application composition platform made by EMC engineering
Rumours that both products will be merged into 1 have been going around for years (see my blog about that from 2013). This has not happened however. Instead both products have been expanded over the years. D2 has added a number of customization options, xCP has added some document management functions. Both have a great UI that can be laid out and styled as your application needs. This makes it ever harder to choose between them. Here is my take on when to choose which:

D2

The functionality that D2 offers is mostly geared toward Document Centric Applications. It has lots of features in that area. For things like mailroom or personnel file applications, D2 can quickly get results.

xCP

xCP has been created with Case Management Applications in mind. It really shines when used for loan applications, insurance claims, or court case management for instance.

Future direction

Though the best-fit for D2 and xCP is still pretty clear at the moment, I wonder what direction they will be taking. Both products will be actively developed for the near future. There are other things to consider however.
What about mobile Apps used on smartphones? They are showing up on wish lists more and more often. The official Documentum Mobile App has never gained much traction, since users don't want a big general application. On their smartphone they want small specialized applications to perform a specific task quickly and easily. Approving a document for instance. The good news is that EMC is investing in mobile application technology to enable businesses to easily create such apps.

Horizon

And what about Project Horizon? Will this replace D2, or xCP, or both? Even though more details will be released by EMC about the new platform at EMCWorld next week, we know that Horizon is positioned as a new modern cloud-first application platform. It is not necessarily aimed at the Enterprise Content Management market. Its aim is much wider, managing both structured and unstructured information at scale in the cloud. Horizon does not have a framework for building user applications. It has micro services Apps. Any of the numerous modern application building frameworks such as Cordova, Xamarin or Meteor can be used to build user applications for Horizon. Therefore Horizon will not replace D2 or xCP. I do expect to see integrations however; xCP or D2 applications leveraging Horizon services, or mobile Apps using Horizon as well as Documentum services.

Saturday, March 7, 2015

Documentum Dump and Load limitations

Lately I've been involved in a project where we used Documentum's dump/load feature to copy a lot of documents from one repository to another. We successfully copied millions of documents, folders and other objects, but this success did not come easy. In this blog I would like to share some of the issues we had for the benefit of others using dump and load.

A standard tool

Dump and load is a tool that can be used to extract a set of objects from a Documentum repository into a dump file and load them into a different repository. Dump and load is part of the Documentum Content Server. This means it can be used with any Documentum repository in the world. The tool is documented in the Documentum Content Server Administration and Configuration Guide (find it here on the EMC Support site). The admin guide describes the basic operation of dump and load, but does not discuss its limitations. There is also a good Blue Fish article about dump and load that provides a bit more background.

A fragile tool

Dump and load only works under certain circumstances. Most importantly, the repository must be 100% consistent, or a dump will most likely fail. So my first tip: always run dm_clean, dm_consistencychecker and dm_stateofdocbase jobs before dumping and fix any inconsistencies found.

Dump Limitations

The dump tool has limitations. Dump can be instructed to dump a set of objects using a DQL query. The dump tool will run the query and dump all selected objects. It will also dump all objects that the selected objects reference. That includes the objects' ACLs, folders, users, groups, formats, object types, etc. This is done in an effort to guarantee that the configuration in the target repository will be ok for the objects to land. This feature causes a lot of trouble, especially when the target repository has already been configured with all the needed object types, formats, etc. It causes a 100 object dump to grow into a dump of thousands of objects, slowing the dump and load process. Worse, the dump tool will dump any objects that are referenced from the original objects by object ID. This causes the folder structure for the selected documents to be included as well as the content objects, but it can also cause other documents to be included, including everything that these documents reference (it is a recursive process). This method can backfire: if you select audit trail objects, for instance, all objects that they reference will be included in the dump.
Now this would not have been so bad if the dump tool had not had size limitations, but it does. We found for instance that it is impossible to dump a folder that has more than 20,000 objects in it (though your mileage may vary). The dump tool just fails at some point in the process. We discussed it with EMC Support and their response was that the tool has limitations that you need to live with.
As another example we came across a repository where a certain group had many supergroups. This group was a member of more than 10,000 other groups. This was also too much for the dump tool. Since this group was given permissions in most ACLs, it became impossible to do any dumps in that repository. In the end we created a preparation script that removed this group from the other groups and a post-dump script to restore the group relations.
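The recursive expansion described in this section is essentially a graph traversal. The Python sketch below shows the idea on a toy reference graph. The object IDs and the `references` mapping are made up for illustration, and this is a model of the behaviour we observed, not the actual algorithm inside the dump tool.

```python
from collections import deque


def dump_closure(selected, references):
    """Return every object ID a dump would include: the selected objects
    plus everything reachable from them through ID references."""
    included = set(selected)
    queue = deque(selected)
    while queue:
        obj = queue.popleft()
        for ref in references.get(obj, ()):
            if ref not in included:
                included.add(ref)
                queue.append(ref)
    return included


# Toy reference graph (hypothetical IDs): one audit trail entry points at a
# document, which pulls in its folder, its ACL and a related document.
references = {
    "audit1": ["doc1"],
    "doc1": ["folder1", "acl1", "doc2"],
    "doc2": ["folder1", "acl2"],
    "acl1": ["group1"],
}
included = dump_closure(["audit1"], references)
print(f"Selected 1 object, dump would contain {len(included)}")
```

Running a traversal like this against exported reference data before dumping can give an early warning when a small selection is about to explode.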

Load Limitations

The load tool has its own limitations. Most importantly we found that the bigger the dump file, the slower the load. This means that a dump file with 200,000 objects will not load in twice the time it takes to load 100,000 objects; it will take longer than that. We found that in our client's environment we really needed to keep the total object count of the dumps well below 1 million, or the load time would go from hours to days. We learned the hard way when we had a load fail after 30 hours and we needed to revert it and retry.
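One way to keep dump files small is to plan the dumps per folder and group folders into batches that stay within an object budget. The Python sketch below is a simple greedy planner. The folder paths, counts and limit are hypothetical, and the counts cover only the selected objects, so leave a generous margin for the referenced objects that the dump tool adds on its own.

```python
def plan_dump_batches(folder_counts, max_objects):
    """Greedily group folders into batches whose total object count does
    not exceed `max_objects`.

    folder_counts: list of (folder_path, object_count) pairs.
    A single folder larger than the limit still gets a batch of its own.
    """
    batches, current, current_total = [], [], 0
    for path, count in folder_counts:
        # Close the running batch when the next folder would overflow it.
        if current and current_total + count > max_objects:
            batches.append(current)
            current, current_total = [], 0
        current.append(path)
        current_total += count
    if current:
        batches.append(current)
    return batches


folders = [("/A", 400_000), ("/B", 500_000), ("/C", 300_000), ("/D", 1_200_000)]
for number, batch in enumerate(plan_dump_batches(folders, 1_000_000), start=1):
    print(f"dump {number}: {batch}")
```

The oversized folder in the example ends up alone in its batch, which is a signal that it needs to be split further with a more selective query.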
Secondly, objects may be included in multiple dump files, for instance when there are inter-document relations. For objects like folders and types this is fine, the load tool will see that the object already exists and skip it. Unfortunately this works differently for documents. If a document is present in 3 dump files, the target folder will hold 3 identical documents after they have been loaded. Since you have no control over what is included in a dump file and you cannot load partial dump files, there is little you can do to prevent these duplications. We've had to create de-duplication scripts to resolve this for our client. We also found that having duplicates can mean that the target docbase can have more documents than the source and that the file storage location or database can run out of space. So for our production migration we temporarily increased the storage space to prevent problems.
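The core of a de-duplication script can be sketched in Python as follows. This is not the script we used. The dictionary fields and the grouping key (folder, name and version label) are assumptions for illustration, and a real script would need a stronger key, for example one that includes a content checksum, because two different documents can legitimately share a name.

```python
from collections import defaultdict


def find_duplicates(documents):
    """Group loaded documents by (folder, name, version) and return the IDs
    of all but one object in each group as deletion candidates."""
    groups = defaultdict(list)
    for doc in documents:
        key = (doc["folder"], doc["name"], doc["version"])
        groups[key].append(doc["id"])
    # Keep the lowest ID in every group and report the rest.
    return [dup for ids in groups.values() for dup in sorted(ids)[1:]]


loaded = [
    {"id": "doc-1", "folder": "/HR", "name": "contract.pdf", "version": "1.0"},
    {"id": "doc-2", "folder": "/HR", "name": "contract.pdf", "version": "1.0"},
    {"id": "doc-3", "folder": "/HR", "name": "contract.pdf", "version": "1.0"},
    {"id": "doc-4", "folder": "/HR", "name": "policy.pdf", "version": "1.0"},
]
print(find_duplicates(loaded))
```

Reviewing the candidate list before deleting anything is worth the extra step, since a wrong key silently removes real documents.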
Another limitation concerns restarting of loads. When a load stops half way through, it can be restarted. However we have not seen any load finish successfully after a restart in our project. Instead it is better to revert a partial load and start it all over. Revert is much quicker than loading.
Finally we found that after loading, some metadata of the objects in the target repository was not as expected. For instance some fields containing object IDs still had IDs of the source repository in them and some had NULL IDs where there should have been a value. Again we wrote scripts to deal with this.
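Spotting leftover source IDs is straightforward if you rely on the layout of a Documentum object ID. The Python sketch below assumes the usual layout: 16 hexadecimal characters, of which the first two are the type tag and the next six are the repository (docbase) ID. The function names and sample IDs are made up for illustration, and this is not the script from our project.

```python
NULL_ID = "0000000000000000"


def docbase_id_of(object_id: str) -> int:
    """Extract the numeric repository ID from a 16-character object ID."""
    if len(object_id) != 16:
        raise ValueError(f"not a 16-character object ID: {object_id!r}")
    # Characters 3 to 8 hold the repository ID in hexadecimal (assumed layout).
    return int(object_id[2:8], 16)


def classify_id(object_id: str, target_docbase_id: int) -> str:
    """Return 'null', 'foreign' or 'ok' for one ID-valued attribute."""
    if object_id == NULL_ID:
        return "null"
    if docbase_id_of(object_id) != target_docbase_id:
        return "foreign"
    return "ok"


# Hypothetical values read from ID attributes in a target repository
# whose repository ID is 12345 (hex 003039).
for value in ["0900303980001234", "0900303a80001234", NULL_ID]:
    print(value, classify_id(value, target_docbase_id=12345))
```

A 'null' result is only a problem for attributes that should hold a value, so the report still needs to be checked against the source data.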

As a final piece of advice, I would encourage you to run all the regular consistency and cleaning jobs after finishing the loading process. This includes dm_consistencychecker, dm_clean, dm_filescan, dm_logpurge etc. This will clean up any stuff left behind by deleting duplicate documents and will ensure that the docbase is in a healthy state before it goes back into regular use.

As you may guess from this post, we had an exciting time in this project. There was a tight deadline, we had to work long hours, but we had a successful migration and I am proud of everyone involved.

If you want to know more, or want to share your own experience with dump and load, feel free to leave a comment or send me an email or tweet (@SanderHendriks).

Thursday, October 16, 2014

Creating ECM integration products

I am so excited, I can no longer hold it in!

You may wonder what I've been up to these last few months.
I haven't written a blog in a while because I've been involved in a big project.
At InformedConsulting we're not just advising our clients and implementing ECM systems any more; we're building our own products. We feel this is a great way to showcase our expertise and to help our customers to get value from their information more quickly.

I've been handed the job of Product Manager for our first product. It will enable an easy way to integrate a Documentum system with Office365. We're the first in the world to do a trick like this and we're very proud of how it is turning out. The official release is near, so I can blog more details about what it does soon.

As a product manager I had the challenge of organizing the design, engineering, testing and marketing for the product, involving an ever changing team of almost everyone in the company, including both directors. I introduced an Agile way of working, which helped a lot. We use User Stories and Sprints as the base for our engineering, a Plan Board for our marketing and Retrospective sessions to improve our way of working. We managed to take a lot of chaos and red tape and turn that into focused activities that add value to our product.

I'm very proud of everyone involved!
I hope this is the first of many more products to follow.


Tuesday, May 13, 2014

Writing Bootstrap applications for Documentum 7


We all know that the modern user wants to use our applications on any device, from anywhere in the world. This used to be a real challenge for applications using EMC Documentum. The out of the box applications such as WebTop, Administrator and TaskSpace were all designed to be used on a PC or laptop with a big screen. If you try to use those on your smartphone, the usability goes down the drain. And I'm not even mentioning that they use a browser plugin for file transfers.

With the advent of Documentum 7 and xCP 2, things are looking better. The new xCP application UI works in all major browsers without using plugins. It also scales better to fit on smaller size screens.
However, there is still a way to go before the xCP UI can be called Responsive.

The xCP UI was designed for PC screens, not with 'Mobile first' in mind. It uses a lot of screen area by default and tries to shrink and rearrange things on screens that are smaller. For the designer it is difficult to control how things will look on smaller screens.

Fortunately there is another novelty in xCP 2: REST services.
The addition of REST services makes it possible to use any of the popular HTML5 / CSS3 / REST frameworks to write beautiful, responsive client applications.
Since the Bootstrap framework (by Twitter) is one of the most popular UI frameworks and it was highly recommended by the InformedConsulting UX experts, I decided to see how difficult it would be to write an application using Bootstrap.
(Last week at EMCWorld #MMTM14 it was mentioned that EMC is also looking into the use of Bootstrap for the future responsive UI; seems I am ahead of the game :-)

Setting up

Bootstrap consists of some CSS styling and some Javascript. You combine that with an HTML page and voila, you have a Bootstrap application. You don't need a Java application server to host the application, any old webserver will do. To the webserver it is just like serving some static content pages. All the dynamic behaviour is performed in the user's browser by Javascript calling REST services on the Documentum server.
This means that the initial setup consists of just downloading Bootstrap, an HTML page and maybe some images and javascript to your web server and you're good to go. In my case I chose to re-use the ROOT webapp on the server that also hosts the REST services. This also avoids the cross-origin request issues that may otherwise occur when the page and the REST services live on different servers.

UX Design

To start your application, you will need to do some user interface design. You need to decide where you want to have your header, footer, menus, tabs, panels, dialogs etc.
You can do this with an HTML editor, but you can also use one of a growing number of UX Design tools.
For instance you could try out Pingendo

Adding behaviour

Once you have the UX Design, you can add javascript to add the application behaviour. Again this could theoretically be done using a text editor, but tools will make your life easier.
In my case I used Aptana Studio.
This helps with syntax highlighting, syntax checking (have you closed all your braces?), code outline, etc.

Bootstrap uses jQuery for its dynamic behaviour, so calling a Documentum REST service is as easy as using $.ajax(). The Documentum REST services return JSON (Javascript Object Notation), so it is very easy to use the returned objects in your Javascript code. If you want to add some modern behaviour, such as lazy loading result sets, there are many Bootstrap and jQuery plug-ins that can be used. You should however be careful that the plug-ins are as responsive to screen size as Bootstrap itself is. Some older plug-ins aren't very responsive in design.
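To show the shape of such a REST exchange outside the browser, here is a minimal Python sketch that builds a DQL query URL and picks values out of a JSON response. The URL layout, the `entries` / `content` / `properties` field names and the sample payload are assumptions for illustration, so check them against the REST services of your own Documentum version. In the Bootstrap application the same parsing would live in the success handler of the $.ajax() call.

```python
import json
from urllib.parse import quote


def dql_url(base_url: str, repository: str, dql: str) -> str:
    """Build the URL for a DQL query (the URL layout is an assumption)."""
    return f"{base_url}/repositories/{quote(repository)}?dql={quote(dql)}"


def object_names(response_text: str) -> list:
    """Pull object_name out of each entry of an assumed JSON response."""
    payload = json.loads(response_text)
    return [
        entry["content"]["properties"]["object_name"]
        for entry in payload.get("entries", [])
    ]


# Hypothetical host, repository and response body.
print(dql_url("http://dctm-host/dctm-rest", "myrepo",
              "select object_name from dm_document"))
sample_response = json.dumps({
    "entries": [
        {"content": {"properties": {"object_name": "Invoice 1.pdf"}}},
        {"content": {"properties": {"object_name": "Invoice 2.pdf"}}},
    ]
})
print(object_names(sample_response))
```

Keeping the parsing in one small function makes it easy to adapt when the actual response differs from what is assumed here.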

Testing

Testing the application is easy in some ways and difficult in others.
Whenever you make a change, you just press Refresh in your browser and you can test. No need for lengthy deployments, server restarts and all that. Single-step debugging is also easy. You can use the F12 functionality of your browser (Firebug, or the equivalent functionality in IE). Apart from seeing exactly what code is being executed, this will also give you insight into the values of all variables, parameters and javascript objects the code is working with.

What makes it more difficult is that some bugs will just crash your javascript and debugging will give no clue where the error is, so those kinds of bugs can be hard to find.
I also wonder what happens when your codebase grows. Javascript is less structured than Java or .NET, which makes it quick to work with, but could also lead to spaghetti chaos if good design patterns are not used.

Results

For my project the easy setup and easy deployment and testing resulted in the first working application demo after only a day! Most of that day was spent on the UX Design. After this first day, things got a little more involved, with more work being spent on coding and testing.
Now, a week into the project, we have an application with a few tabs, a query builder, a browse tree, right-click menus, browser stored history and more. And it looks good too, on my laptop, iPad and smartphone!

Here is what it looks like:





I am really happy with the results so far.
Let me know what you think.

Sunday, October 6, 2013

The integration of D2 and xCP

Most people who follow what EMC has been doing with Documentum know that the product currently supports 2 separate application layers: D2 and xCP. They come from different backgrounds, with D2 being more aimed at document centric applications and xCP aimed at case management applications. As they evolve, they are starting to overlap more and more. With the next release, D2 will add more workflow capability and extensibility and xCP will add more document management functionality. If this continues, how will future customers be able to choose between the 2 products?
The answer is, you won't have to since they are gradually merging.

Even last year, when xCP2 first came out, there were already whispers that over time xCP and D2 will merge and become 1 product. This merge will take a few years, since the two products are built on different technology layers. D2 uses Google Web Toolkit (GXT), xCP uses ExtJS. D2 uses SOAP services, xCP uses REST services. These are major technology differences that make merging these products a long and difficult process.

xD

Let's look into the future for a bit. I wonder what a product that merges D2 with xCP would be like.
Let's call this new future product xD (excelerated Documentum).
What will this product be?
What will convince customers to upgrade to xD?

I think that EMC will create xD with the strongest parts from D2 and xCP:
It will have a run time configuration layer taken from D2.
It will also have the strong build time configuration taken from xCP.

Creating a merged runtime

To make that combination work, a new runtime will be needed that supports both the D2 and the xCP functionality.
This is where the major technology choices will need to be made. Will it use the Google Web Toolkit, or ExtJS? Will SOAP or REST services be used? Time will tell, but currently my money is on ExtJS with REST.
If my guess is right, that would mean that some of the great D2 functions will need to be ported to the ExtJS/REST runtime:
- auto naming, auto linking
- lifecycle functionality with state based authorization
- C2 annotation/markup and O2 Office integration

Configuring Applications

Once all of the functionality is available in one runtime engine, the configuration tools can be added on top.
There will be build time configuration with xCP Designer and run time configuration with the D2 configuration matrix. So what things will be configured using each tool?
A paradigm that would work for me, would be that the build time configuration tool could be used to create a sort of template app. A set of types, rules and processes that define an application.
The run time tool can then be used to fill in the run time details to create one or more xD applications using the template app.

What things would be configurable at run time?

- Users, roles, authorization for objects and functions
- Picklists
- Document content templates
- User interface pages and workspaces with widgets that have been configured build-time
- UI widgets that integrate external data
- Search experience
- Viewer and Office integrations
- Reports

What to configure build time?

- Object types, relations and aspects
- Lifecycles and processes
- UI widgets and page fragments
- Business events
- Full-text indexing
- Content transformation
- Back-end external system integrations


An application platform that could do all that would be very flexible and I would love to use it to build awesome applications for my customers.

EMC can you make xD happen please?

Friday, September 20, 2013

To D2 or to xCP, that is the Question

Now the Momentum Developer Conference is over, it's time to look back and see what we've learned.
I chose to follow the D2 track this time and since I've already done a project with xCP2, that gives me the opportunity to compare the two products.

The question many people are asking is which product to use for what. It seems to me that the long term answer to that may be very different from the short term, since both products are converging and are bound to be merged in some way in a couple of years. I have some thoughts on that, but I'll save that for a later post.

So, when you want to build a Documentum application today, what product do you use? D2 or xCP?
Given what I've learned this week, I would say the choice depends mostly on the core functionality that you need. We all know since last year that xCP 2 has a very powerful design tool that will let you build almost any application UI and gives you many options to integrate that with your current, or external applications. Now that I've seen what D2 4.1 can do, I realize that the same goes for D2. It also offers a composable UI that can be extended with custom functionality and integrated with other applications.

So the main differentiator between D2 and xCP is not the UI, but the underlying OOTB functionality.
The functionality that D2 offers is mostly geared toward Document Centric Applications (or old-school ECM if you will). It has lots of features in that area, such as Auto-naming, Auto-linking, Documentum lifecycles, Virtual documents, etc.
xCP on the other hand has been created with Case Management Applications in mind. It has features such as Business objects, Stateless processes and Discovered metadata.

So there it is. My simplification of the current situation: if your application is mainly about documents, you should consider D2; if it has a case or data object focus, consider xCP2. That will give you the most useful OOTB functionality.
Having said that, there are other things to take into account when selecting a product, such as the OS and database platforms you are using and other technical and organization details, so take my view as a pointer, but not as the whole truth and the only truth.

What do you think? Feel free to let me know.