How not to use XML!


This one goes out to the folks of MySQL who develop MySQL Workbench:

While starting the project i decided to model the database schema in a handy software, instead of doing all manually and writing SQL from hand. I had discoverd DBDesigner4 earlier and then found out that the man behind DBDesigner got employed by MySQL to work on a successor of MySQL workbench. That also explained why the last version of DBDesigner4 is already quite dated. Since i needed to convert the schema of Workbench into another format (more on that on a later post) i poked around the possibilities. Luckily for DBDesigner4 there was an XSL to convert into something I needed. I figured: Well if Workbench is based on DBDesigner4 the format of Workbench shouldn’t be that different. - False! It is an entire new format. Luckily I found another XSL which worked with the format of the first alpha of Workbench.

It wasn’t that much work to modify it to work with the first beta of Workbench. While modifing the stylesheet i noticed how broken the format is. While DBDesigner4 had a reasonable format, the whole Workbench file only consists of value- and link-tags.

Example of the beginning:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
<?xml version="1.0"?>
<data grt_format="2.0" document_type="MySQL Workbench Model" version="1.1.2">
  <value type="object" struct-name="workbench.Document" id="{09E16181-F56F-4679-9BE3-2B3011614FC4}" struct-checksum="0x168c26cf">
    <value _ptr_="091538E8" type="dict" key="customData"/>
    <value type="object" struct-name="app.DocumentInfo" id="{9036684B-D03A-4A4C-ADA3-06389EEAA76A}" struct-checksum="0x6f9ae369" key="info">
      <value type="string" key="author">dhaas</value>
      <value type="string" key="caption">New Model</value>
      <value type="string" key="dateChanged">Jan 30, 2008</value>
      <value type="string" key="dateCreated">Jan 01, 1970</value>
      <value type="string" key="description"></value>
      <value type="string" key="name">Properties</value>
      <link type="object" struct-name="workbench.Document" key="owner">{09E16181-F56F-4679-9BE3-2B3011614FC4}</link>
      <value type="string" key="project">Name of the project</value>
      <value type="string" key="version">1.0</value>
    </value>
    <value type="object" struct-name="workbench.logical.Model" id="{F1F49DC7-983C-4675-925A-8C11ECA4206C}" struct-checksum="0xfe2cd94a" key="logicalModel">
      <value _ptr_="091533A8" type="dict" key="customData"/>
      <value _ptr_="09153570" type="list" content-type="object" content-struct-name="model.Marker" key="markers"/>
      <value type="string" key="name"></value>
      <value _ptr_="091539D8" type="dict" key="options"/>
      <link type="object" struct-name="workbench.Document" key="owner">{09E16181-F56F-4679-9BE3-2B3011614FC4}</link>
      <value _ptr_="0B4A04C8" type="list" content-type="object" content-struct-name="workbench.logical.View" key="views"/>
    </value>

- Okay, i lied. Actually there is even a third tag. The data-tag surrounds the whole document.
As you see, the fileformat is merely a dump of the internal structures of the workbench document. Ignoring the problems that brings when writing stylesheets because you can only match value-tags and the struct-name attribute which makes the whole XSL kinda messy, this even brings some uglier problems with it: The link-tag. The link-tag represents a pointer in the struct. So in the xml it points to another value-tag somewhere above the document.
But the real problem arises now: Since you cannot distinguish the original and a secondary pointer to any object in C/C++. This means that when an object (which is in fact a pointer) is encountered for the first time, it is saved as value, and on subsequent occurences, it is dumped as a link-tag to the original value.

The problem is now, that the occurences of objects varies, depending on the content of the workbench-model. So for example if you have a graphical representation inside your model, all tables are referenced first in the model and thus dumped as value tags. Later on, when the real table definitions are dumped, only links to the already dumped value-tags are generated.
This makes it really painful to develop a working XSL, because you always have to match two cases of tags: either the information you are searching for is directly inside the value tag, or the value tag is absent. Then you have to find the link-tag, take its id and find the original value-tag by this id.

In line 21 of the example you see such a link which actually was a pointer to the compound document around it. In this case it does not really matter because the workbench.Document will always appear first in the file, but unfortunatly everywhere else this is not the case and causes a host of problems when trying to read the XML-format.

Morale of the story: If you plan to develop your own XML-format, do not just dump your internal data-structures, otherwise you could as well dump them in binary form.

Information and Links

Join the fray by commenting, tracking what others have to say, or linking to it from your blog.


Other Posts
MySQL Workbench + propel = useful
Blog started!

Write a Comment

Take a moment to comment and tell us what you think. Some basic HTML is allowed for formatting.

Reader Comments

[...] I wrote about converting MySQL Workbench files to Propel schema [...]