difficulties working with XML data

advantages of XML

XML has become the leading technology for data exchange because of its advantages:

  • XML is easy for both humans and computers to read. It is an open standard, and XML files can be read and written with any plain text editor.
  • XML files can contain complex data structures.
  • XML interfaces can easily be changed.
    • New elements can be added.
    • Programs reading the modified interface simply skip the new elements.
    • The programs do not crash!
  • XML elements do not have to be placed on a particular line or in a particular column.
  • The order of XML elements is normally not relevant.
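
The point about skipping new elements can be illustrated with a minimal sketch. It assumes a reader written in Python with the standard xml.etree.ElementTree module; the function name read_persons and the two sample documents are hypothetical and only serve the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical original interface: only <name> and <firstname>.
OLD = """<list_person>
    <person id="2">
        <name>Fischer</name>
        <firstname>Hans</firstname>
    </person>
</list_person>"""

# The same interface after a <birthdate> element has been added.
NEW = """<list_person>
    <person id="2">
        <name>Fischer</name>
        <firstname>Hans</firstname>
        <birthdate>1999-10-04</birthdate>
    </person>
</list_person>"""


def read_persons(xml_text):
    """Return (id, name, firstname) tuples; elements the reader
    does not ask for are simply never looked at."""
    root = ET.fromstring(xml_text)
    return [
        (person.get("id"), person.findtext("name"), person.findtext("firstname"))
        for person in root.findall("person")
    ]


# The reader returns the same result for both versions of the interface.
print(read_persons(OLD))
print(read_persons(NEW))
```

Because the reader addresses elements by name instead of by position, the added element does not disturb it.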

But: standard Unix tools do not work with XML data!

The advantages of XML turn into disadvantages, because standard Unix tools like "diff", "cmp", "sort", "join" and "comm" do not work correctly with XML files. Look at the following examples:

example 1

The following two files are equal in content; only the order of the elements differs. A standard comparison tool is not aware of this. It will always find differences, although the files are equal in content.

file: test1a.xml
<list_person>
    <person id="2">
        <name>Fischer</name>
        <firstname>Hans</firstname>
        <birthdate>1999-10-04</birthdate>
    </person>
    <person id="588521">
        <name>Becker</name>
        <firstname>Claudia</firstname>
        <birthdate>1990-01-18</birthdate>
    </person>
</list_person>
file: test1b.xml
<list_person>
    <person id="588521">
        <birthdate>1990-01-18</birthdate>
        <name>Becker</name>
        <firstname>Claudia</firstname>
    </person>
    <person id="2">
        <firstname>Hans</firstname>
        <name>Fischer</name>
        <birthdate>1999-10-04</birthdate>
    </person>
</list_person>
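
What an order-independent comparison has to do can be sketched as follows. This is only a minimal illustration in Python, assuming that element order is irrelevant at every level and that there is no mixed content; it is not how any particular tool works, and the names canonical and same_content are hypothetical.

```python
import xml.etree.ElementTree as ET

TEST1A = """<list_person>
    <person id="2">
        <name>Fischer</name>
        <firstname>Hans</firstname>
        <birthdate>1999-10-04</birthdate>
    </person>
    <person id="588521">
        <name>Becker</name>
        <firstname>Claudia</firstname>
        <birthdate>1990-01-18</birthdate>
    </person>
</list_person>"""

TEST1B = """<list_person>
    <person id="588521">
        <birthdate>1990-01-18</birthdate>
        <name>Becker</name>
        <firstname>Claudia</firstname>
    </person>
    <person id="2">
        <firstname>Hans</firstname>
        <name>Fischer</name>
        <birthdate>1999-10-04</birthdate>
    </person>
</list_person>"""


def canonical(elem):
    """Build a nested tuple for an element in which the children are
    sorted, so the result does not depend on the element order."""
    children = tuple(sorted(canonical(child) for child in elem))
    text = (elem.text or "").strip()
    return (elem.tag, tuple(sorted(elem.attrib.items())), text, children)


def same_content(xml_a, xml_b):
    """Compare two XML documents while ignoring the order of elements."""
    return canonical(ET.fromstring(xml_a)) == canonical(ET.fromstring(xml_b))


print(TEST1A == TEST1B)              # False: the text differs
print(same_content(TEST1A, TEST1B))  # True: the content is the same
```

The sketch loads both documents completely into memory, so it is suitable only for small files.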

example 2

The "<name>" and "<firstname>" of the person are equal in both files.
But in file "test2b.xml" there is no "<birthdate>" element.

If you compare these two files with, for example, the "diff" utility, you cannot exclude the "<birthdate>" element from the comparison.

file: test2a.xml
<list_person>
    <person id="2">
        <name>Fischer</name>
        <firstname>Hans</firstname>
        <birthdate>1999-10-04</birthdate>
    </person>
</list_person>
file: test2b.xml
<list_person>
    <person id="2">
        <name>Fischer</name>
        <firstname>Hans</firstname>
    </person>
</list_person>
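
Excluding an element from a comparison could look roughly like the following Python sketch. The function names signature and equal_ignoring are hypothetical, and the approach (skipping the ignored tags while building a comparable structure) is only one possible way to do it.

```python
import xml.etree.ElementTree as ET

TEST2A = """<list_person>
    <person id="2">
        <name>Fischer</name>
        <firstname>Hans</firstname>
        <birthdate>1999-10-04</birthdate>
    </person>
</list_person>"""

TEST2B = """<list_person>
    <person id="2">
        <name>Fischer</name>
        <firstname>Hans</firstname>
    </person>
</list_person>"""


def signature(elem, ignored):
    """Build a nested tuple for an element, leaving out every child
    whose tag is listed in `ignored`. Element order is kept."""
    children = tuple(
        signature(child, ignored) for child in elem if child.tag not in ignored
    )
    text = (elem.text or "").strip()
    return (elem.tag, tuple(sorted(elem.attrib.items())), text, children)


def equal_ignoring(xml_a, xml_b, ignored):
    """Compare two XML documents, skipping the elements named in `ignored`."""
    root_a = ET.fromstring(xml_a)
    root_b = ET.fromstring(xml_b)
    return signature(root_a, ignored) == signature(root_b, ignored)


print(equal_ignoring(TEST2A, TEST2B, set()))          # False
print(equal_ignoring(TEST2A, TEST2B, {"birthdate"}))  # True
```

A line-based tool has no notion of elements, so it cannot express this kind of exclusion.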

<xml>cmp-toolbox

  • comparing XML files
  • merging XML files
  • regrouping XML files
  • sorting XML files

<xml>cmp and large XML files

  • designed for large XML files
  • low memory consumption
  • very good performance

<xml>cmp-interfaces

  • command-line interface (Unix/DOS)
  • Java API

differences are shown in the context of the XML files:

  • all data + differences
  • only differences
  • output: XML and PDF
Software Fischer SOFIKA GmbH
Freseniusstr. 65
D-81247 Munich
Germany
Tel: +49 (0)89 / 81 00 90 15
Fax: +49 (0)89 / 81 00 90 16
Email: info@sofika.de