Sep 01 2008

The Disadvantages of Microsoft SharePoint 2007 as a Document Management System

Published by Ken Stewart at 10:06 pm under EDM, Technology

In my recently completed series, SharePoint 2007: Friend or Foe, I outlined the impact Microsoft’s SharePoint offerings are having on the content and document management space in general, and strongly urged you and your company to consider how you cope with the $1 billion giant that is SharePoint.

Let’s look a little deeper at why SharePoint 2007 fails as a robust document management system by itself.

Remember, if you will, SharePoint has 2 offerings: Microsoft Office SharePoint Server (MOSS 2007) and Windows SharePoint Services (WSS 3.0). Read more on what SharePoint is here.

DMS/CMS Primer:

Most traditional document and content management systems share a fundamental architecture, chosen to optimize performance and build for scalability. What is termed “metadata” is stored inside a database and optimized for search, while the documents or content are typically housed on an external storage medium such as a file server, network attached storage (NAS), or storage area network (SAN).

Metadata can be just about anything you may want to use, but it is typically thought of as indexes, OCR text, and notes; that is, any textual information you may use to find the document later.

By keeping a pointer within the database of metadata, rather than the content itself, searches can be performed much more quickly and even cached for routine searches. In this fashion the image, document, or content really just becomes a visually-friendly and legally-viable original.
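To make the pointer idea concrete, here is a minimal sketch in Python. It assumes SQLite as the metadata database and a plain directory standing in for the file server, NAS, or SAN. The table layout and the function names (`open_index`, `add_document`, `search`, `fetch_content`) are hypothetical, chosen only for illustration, and do not reflect how any particular DMS/CMS product is built.

```python
import sqlite3
import uuid
from pathlib import Path

# Hypothetical schema: searchable text lives in the database,
# while file_path is only a pointer to the stored original.
SCHEMA = """
CREATE TABLE IF NOT EXISTS documents (
    id        INTEGER PRIMARY KEY,
    title     TEXT NOT NULL,
    ocr_text  TEXT,
    notes     TEXT,
    file_path TEXT NOT NULL
)
"""


def open_index(db_path=":memory:"):
    """Open (or create) the metadata database."""
    db = sqlite3.connect(db_path)
    db.execute(SCHEMA)
    return db


def add_document(db, storage_dir, title, ocr_text, notes, content):
    """Write the original to external storage, then index its metadata."""
    storage_dir = Path(storage_dir)
    storage_dir.mkdir(parents=True, exist_ok=True)
    file_path = storage_dir / f"{uuid.uuid4().hex}.bin"
    file_path.write_bytes(content)
    cur = db.execute(
        "INSERT INTO documents (title, ocr_text, notes, file_path) "
        "VALUES (?, ?, ?, ?)",
        (title, ocr_text, notes, str(file_path)),
    )
    db.commit()
    return cur.lastrowid


def search(db, term):
    """Search the text metadata only; no document files are opened."""
    pattern = f"%{term}%"
    rows = db.execute(
        "SELECT id, title FROM documents "
        "WHERE title LIKE ? OR ocr_text LIKE ? OR notes LIKE ? "
        "ORDER BY id",
        (pattern, pattern, pattern),
    )
    return rows.fetchall()


def fetch_content(db, doc_id):
    """Follow the stored pointer and read the original from storage."""
    row = db.execute(
        "SELECT file_path FROM documents WHERE id = ?", (doc_id,)
    ).fetchone()
    if row is None:
        raise KeyError(doc_id)
    return Path(row[0]).read_bytes()
```

In this sketch a search touches only the small text columns, and the larger original is read from storage only when someone actually asks to view it.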

The Disadvantages:

  1. Document level options are not available: One of the key features most DMS/CMS systems offer is document level options like redaction and document mark-ups. While SharePoint has wonderful collaboration tools, DMS/CMS systems are geared around preserving the original state of a document - implementing markups as layers on top of the existing content. SharePoint does not provide this functionality as it relies on the Microsoft Office suite for document manipulation.
  2. Documents housed within database: For the reasons discussed within the primer above, this is perhaps the greatest weakness of SharePoint to date. Look for this to be corrected in future releases.
  3. Offering is broad but not deep: Microsoft’s SharePoint was meant to address many different aspects of document collaboration and data management. This has led to an offering that can touch many different areas of a business, but has not been developed to be tightly integrated within the core just yet.
  4. Need for heavy customization: The need for heavy customization is evident almost from the start. Unless you are well versed in MOSS or WSS architecture, or have access to a top-shelf developer, SharePoint will likely not meet 100% of your needs.

With these 4 disadvantages so close to the surface, many organizations, including my own, are opting for a coexistence strategy. This creates another wrinkle in things as there is often a requirement for duplicate copies of the documents to be housed: 1 within SharePoint and 1 in another location such as our DMS.

Take heed, however: these vulnerabilities are well known. As Microsoft closes in on the content management market, and as CIOs and CTOs push to bring SharePoint to bear on their organizations’ information issues, these disadvantages are likely to be addressed.


Ken Stewart’s blog, ChangeForge.com, focuses on the collision between the constantly changing worlds of business and technology. Ken is also the Director of Technology at Kearns Business Solutions.



  6 Comments

      Hi there... in your article "The Disadvantages of Microsoft SharePoint 2007 as a Document ..." you state that "Documents housed within database" is one of the big disadvantages of SharePoint...

      I was wondering why you thought this... if the database houses metadata, index information etc, as well as the image/raw doc why is this a problem... does not the benefit of having a database supply the data integrity for all items better than having to worry about the joys of links/tags to an external data store...

      As you may see at DL, we have created an Online Document Archiving solution (Instant Intelligence Archiving) which we sell via a Channel using a SaaS model, and all the documents/images are stored within a database alongside (but in a separate DB from) the indexing data (OCR'd text, index information supplied by the user, etc)... we did this to help with our own DR process... Yes, there is a speed issue with getting Blob data out of the DB, but with the speed of processors that exist now this speed hit is becoming less and less of a problem.

      I would be very interested in hearing your thoughts....

      Kindest Regards
      Chris

      Chris, thank you for e-mailing me, and I would like to thank you for stopping by ChangeForge. I greatly appreciate your question.

      First, let me qualify that I am not a database engineer or DBA. That aside, I work in a position whereby I have been exposed to a small number of CMS/DMS solutions to include some big names like EMC (Legato) Application xTender, and some smaller ones you probably have never heard of.

      So here's my take:

      We have 2 differing formats for CMS/DMS presentations: 1) the unstructured and "crawl the sprawl" route (e.g. Google), and 2) the highly structured route as in traditional CMS/DMS offerings. I am focused more on the latter, just to clarify.

      Traditionally, metadata is stored within a structured format to increase the transactional return of information - and to increase overall transaction speed and efficiency. You even see this in Business Intelligence (BI) software, where data is cubed to speed up the return of large volumes of information. However, in most cases of document management we do not need as high a computational load as an operations group at a billion-dollar-plus organization would. Again, my article was focused more on SMBs - which I would think would be appropriate to your SaaS offering as well (not having looked in depth at the offering).

      To clarify, my statement was geared more towards what I consider maintainability of the infrastructure. As you know, text is smaller and can be compressed more so than binary image files (traditionally TIF, PDF, BMP). As such, it stands to reason that searches on raw text should be much faster than having to parse image files.

      Second, in maintaining the necessary archives (in an on-premise solution) keeping the image files outside of the database can make for much cleaner backups. Traditionally, backup agents handle backups of raw files (in an NTFS file format for instance) much more cleanly than in very large databases. Usually, the image repository of a CMS/DMS is the largest part of an installation - so making this as flexible as possible is to the benefit of the maintainer.

      Third, the ability of administrators of the CMS/DMS solution to access and maintain images is key. We have found it much easier to manage documents outside of the database in instances where an image file has gone corrupt (or is thought to be corrupt) and we can access the file directly. This matters most in situations where the originals are destroyed quickly once reliability of the system is established. You might argue security as a counterpoint to this, and this is a difficult challenge but one that can be answered generally.

      Last, and to harp on the backups, many solutions I've worked with support multiple DBs (e.g. SQL, MySQL, Oracle, DB2, etc.). I have worked with a MySQL version of a database where the images were stored within the database, and major backup software vendors did not (at the time of my research) make an agent that allows for differential and/or incremental backups, thus making restoration a very dangerous thing - especially in situations where documents are destroyed very soon after initial scan.

      I would submit that I am not familiar with IIA architecture or design - and have no doubt CMS/DMS development may one day overcome this. At this point, my experience over the last 3 years has led me to this conclusion. This is not completely scientific, but many ECM vendors and experts alike also share my opinion. SharePoint has some limitations outside of this as well, as I have learned in working with one of our Microsoft Gold Certified Partners that recently conducted an in-depth study for a worldwide automotive corporation.

      Again, this is not to say storing the documents within a database is a bad thing in a SaaS offering. I might enjoy taking a tour of your software as time permits over the next few weeks. I firmly believe both SharePoint and SaaS have a huge role to play in the CMS/DMS space, and I have on-going research to do in these areas.

      Obviously, you e-mailed me, so I was wondering if you would be agreeable to me posting this conversation thread in the Disqus comments? If not, I will abide by your wishes and look forward to continuing this conversation.

      Thanks for making me think about this,
      Ken

      Thank you for such a detailed reply and on the weekend as well…

      Firstly I am more than happy for you to publish this conversation, and also give you permission to edit it as you see fit. From your reply and the articles you have published (the ones that I have read) it would appear you have no vested interest in editing our conversation to change the context of my thoughts…

      I would agree with many of your points, if not all of them. At Data Liberation we have worked with images, within our DMS application, but also with our Data Capture (OMR) application, and we moved very quickly to using a SQL database for storing the images after enduring the pain of lost files etc within file systems.

      Just to cover a couple of points you raise: as I mentioned in my first email, we use two distinct databases, one database for meta/index data and one for images/documents. Our system takes whatever it is supplied and stores the file as a blob image in one database, thereby ensuring the integrity of the documents (users can look at this, but cannot update it, but can of course add new versions). If we recognise the file type (e.g. most image formats, Word, RTF, TXT, PDF etc) the system will either OCR the document or strip the text of the document out and store this in the other database. Additionally, users can add their own metadata to the file. By having all this text-based information in one database we are able to perform queries against the documents very quickly and then retrieve the document only if the user requests it.

      As we use MS SQL 2005 (with sights on SQL 2008 before the end of the year) we have the benefit of being able to do incremental backups of either database. In our case we do log backups every 15 minutes on both databases, giving us almost continuous backup protection; full backups happen overnight.

      The one advantage that you highlight is direct access to documents in the situation where the file has been corrupted. This, I would agree, is much easier with a file system and (in comparison) extremely difficult with a database. My response to this is that, given the additional integrity that a SQL database provides, it would be very unlikely that a single document would be corrupted, with a greater chance of the entire image database becoming corrupted.

      I think we can both agree that, no matter what approach is taken, backups and the backup strategy are vital to any CMS/DMS system. The systems become a hugely valuable resource to a company and the loss or even partial loss of any of the data contained within them can be potentially devastating.

      I am very happy, when time permits you, to supply you with any information you would like on iiArc. The only area I would have reservations on is the way that we implement encryption of the uploaded documents/images, but otherwise I would thoroughly enjoy defending our approach.

      As you mentioned, SaaS and SharePoint will have a huge impact on the CMS/DMS market in the coming years… It could easily be argued that SharePoint has already changed the CMS/DMS landscape massively… and I believe that some of the current bigger players within the SME market will need to change their sales models and product sets to meet the more demanding and much better informed clients that now exist, or run the risk of losing market share and potentially disappearing altogether…

      Kindest Regards

      Chris Morgan

      Managing Director

      Data Liberation Ltd

      Chris, this sounds very interesting. The architecture indeed sounds strong - especially given the SaaS aspect of your offering. It makes me wince to even think of backup/restore operations on DBs, but I have had to do many for various reasons. I would also submit that this is one very strong advantage for SMBs to look at SaaS offerings - on which I penned an article as well. There is still the classic, on-going debate on "owning the data" versus "renting the application" - especially in mission critical applications.

      With regards to the shifting marketplace, I would whole-heartedly agree. Microsoft, if not by education alone, has shifted the landscape already. I look for the future of CMS/DMS to have many consolidations and many closures... That being said, I would venture to say you are positioning your company very smartly if trends continue.

      I have a few limitations outlined at http://sharepointdocumentmanagement.wordpress.com/

      The big one lies in the fact that Microsoft recommends keeping Document Libraries at 2000 objects or less for performance reasons. Some serious planning needs to take place, especially if you are using it as a repository for scanned documents.

      Steve, thanks for the lead on that. I have heard many of the same things. MS's 2007 offerings for SharePoint technology have taken a huge leap forward, and they are getting noticed by many IT consultants who have an inside track with SMBs... They are instantly comfortable with a 'free' MS product and are strongly recommending its rollout for starting the chain reaction of document management.
     
