This is a cross-post from CloudAve.
While many despise WMA, DOC, MP3 and other proprietary formats, discussion about the data formats used by web applications has been surprisingly quiet. This is mainly because many web applications offer XML export or an API for exporting data to other services. But as data formats in the cloud become more complex, and as the sheer number of formats generated by cloud applications grows, standardizing data formats becomes just as important as it is on the desktop.
I recently went to hear Richard Stallman speak about copyright issues, and Håkon Wium Lie, the creator of CSS and CTO at Opera Software, made an interesting statement in his question to Stallman: “I believe that the need for open source in software is of lesser importance. What really matters is the data people produce and that we have proper standards to ensure portability between services and platforms.”
Desktop applications support import from and export to various data formats. My browser of choice, Safari, can import bookmarks from other browsers. iWork accepts, to a certain degree, the proprietary formats produced by Microsoft Office. Any accounting software will be able to import data from a market leader like Quicken or MYOB. We are beginning to see a similar set of functions in web applications, and in many cases this is powered by direct API integration between two services.
Because most web applications use XML for data, users are already offered an open format. But should we require open standards to ensure easy portability of data between services? Many types of data are fairly simple to represent in a standard manner. I understand that a standard will not suit every internal data structure of every web application, but the most important pieces of information can in most cases be specified to a common standard. A customer will in any case have a name, an address, phone numbers and so on.
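To make the customer example concrete, here is a minimal sketch in Python of two services mapping their own records onto one shared shape. The `Customer` class, both vendor layouts and every field name are assumptions made up for illustration; they do not describe any real service or existing standard.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Customer:
    """Hypothetical common customer record; the field names are illustrative."""
    name: str
    address: str = ""
    phone_numbers: List[str] = field(default_factory=list)


def from_vendor_a(record: dict) -> Customer:
    # Hypothetical vendor A stores one name string and a single phone number.
    phone = record.get("phone")
    return Customer(
        name=record["full_name"],
        address=record.get("street_address", ""),
        phone_numbers=[phone] if phone else [],
    )


def from_vendor_b(record: dict) -> Customer:
    # Hypothetical vendor B splits the name and stores a list of phone numbers.
    name = f"{record['first']} {record['last']}".strip()
    return Customer(
        name=name,
        address=record.get("addr", ""),
        phone_numbers=list(record.get("phones", [])),
    )


a = from_vendor_a(
    {"full_name": "Ada Lovelace", "street_address": "12 Example Street", "phone": "555-0100"}
)
b = from_vendor_b(
    {"first": "Ada", "last": "Lovelace", "addr": "12 Example Street", "phones": ["555-0100"]}
)
print(a == b)  # both internal layouts reduce to the same common record
```

Each service keeps its internal structure and only agrees on the shared record at the boundary, which is the compromise the paragraph above argues for.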
One example where a common standard would be very beneficial is accounting. Accounting is defined by commonly adopted principles but differs between countries in reports, tax setups and the like. At the end of the day, though, the data ends up as journal/transaction entries and account information. Every accounting vendor takes a different approach to this, so data import must either be designed around a particular format or be customized for each customer. This limits the choice of accounting vendors for someone using a less popular accounting service. It also results in lock-in for customers using applications whose data formats are less commonly implemented.
Data portability implies not only the ability to export data but also the ability to import it into another service. For a non-technical user, a data format that is not accepted by other services is just as useless as an inaccessible service. In desktop software this is often solved by exporting to a format defined by a major competitor, or by using a simple format such as CSV and leaving the user the job of matching fields.
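The field-matching chore that CSV leaves to the user can be sketched roughly as follows. This is only an illustration: the column names, the `FIELD_MAP` table and the `remap_csv` helper are hypothetical, and a real import would also need to handle types, encodings and validation.

```python
import csv
import io
from typing import Dict, List, Tuple

# Hypothetical mapping from the exporting service's column names
# to the importing service's field names.
FIELD_MAP = {
    "Customer Name": "name",
    "Addr": "address",
    "Tel": "phone",
}


def remap_csv(source_text: str, field_map: Dict[str, str]) -> Tuple[List[dict], List[str]]:
    """Rename mapped columns and report the columns that have no mapping."""
    reader = csv.DictReader(io.StringIO(source_text))
    unmapped = [col for col in (reader.fieldnames or []) if col not in field_map]
    rows = [
        {field_map[col]: value for col, value in row.items() if col in field_map}
        for row in reader
    ]
    return rows, unmapped


exported = (
    "Customer Name,Addr,Tel,Loyalty Tier\n"
    "Ada Lovelace,12 Example Street,555-0100,Gold\n"
)
rows, unmapped = remap_csv(exported, FIELD_MAP)
print(rows)
print(unmapped)  # columns the user would still have to match by hand
```

Anything listed in `unmapped` is data the target service silently drops unless someone intervenes, which is exactly where a non-technical user gets stuck.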
Image from Tim Berners-Lee's TED talk
Taking data with you from one service to another is not just about the data in the system you are moving from; it is also about maintaining links to other systems. I suspect that data values linked to external integrated services are overlooked in many data exports. Migrating a CRM system from one cloud service to another should also make it possible to keep links to external services. So any IDs in the CRM system that link to a project management system should be maintained.
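One way to picture this is an export that carries the external link IDs alongside the record itself. The sketch below is an assumption about how such an export could look; the `external_ids` field, the `export_customer` and `import_customer` helpers and the sample IDs are all hypothetical.

```python
import json


def export_customer(customer: dict, external_links: dict) -> str:
    """Serialize a customer together with its links to other services."""
    payload = {
        "name": customer["name"],
        # Keep the source system's own ID so the record can be traced back.
        "source_id": customer["id"],
        # Hypothetical shape: service name -> ID of the related record there.
        "external_ids": dict(external_links),
    }
    return json.dumps(payload, sort_keys=True)


def import_customer(serialized: str, new_id: int) -> dict:
    """Create the record under a new local ID without dropping the links."""
    payload = json.loads(serialized)
    return {
        "id": new_id,
        "name": payload["name"],
        "migrated_from": payload["source_id"],
        "external_ids": payload["external_ids"],
    }


exported = export_customer(
    {"id": 42, "name": "Ada Lovelace"},
    {"project_management": "PRJ-1007"},
)
imported = import_customer(exported, new_id=7)
print(imported["external_ids"])  # the link to the project system survives the move
```

The local ID changes during the migration, but the ID pointing into the project management system is carried over untouched, so the integration can be re-established on the new service.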
Elsewhere on the internet we have recently seen standardized data formats evolve, such as those for sitemaps and widgets. So maybe in the future we will see standard data formats for entities such as customers and projects. There seem to be a few organizations supporting the cause, but little has been done yet.
Do you have any experiences to share regarding data formats in the cloud or moving your data from one vendor to another?