XML documents are composed of markup and content. Using these two fundamental building blocks, an XML document can represent a wide variety of data, ranging from software GUI files to high-end technical publications.

Markup

Six kinds of markup can occur in an XML document: elements, entity references, comments, processing instructions, marked sections (CDATA sections), and document type declarations.

Elements:

These are the most common form of markup. Delimited by angle brackets, most elements identify the nature of the content they surround. Some elements may be empty, in which case they have no content and can be written as a single empty-element tag, <element/>. If an element is not empty, it begins with a start-tag, <element>, and ends with an end-tag, </element>.

Attributes:

These are name-value pairs that occur inside start-tags after the element name. For example, <fontdata classtype="bold"> is a fontdata element with the attribute classtype having the value bold. In XML, all attribute values must be quoted, using either single or double quotes.
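As a rough illustration of how a parser sees elements and attributes, the sketch below reads the fontdata example with Python's standard xml.etree.ElementTree module. The choice of Python and the content "Heading text" are assumptions made for illustration only; they are not part of Alchemy CATALYST.

```python
import xml.etree.ElementTree as ET

# The fontdata element from the text, with illustrative content added.
element = ET.fromstring('<fontdata classtype="bold">Heading text</fontdata>')
print(element.tag)               # element name
print(element.get("classtype"))  # attribute value, quotes removed
print(element.text)              # content between start-tag and end-tag

# An empty element has no content, so its text is None.
empty = ET.fromstring('<fontdata classtype="bold"/>')
print(empty.text)

# An unquoted attribute value is not well-formed XML and is rejected.
try:
    ET.fromstring('<fontdata classtype=bold/>')
    unquoted_accepted = True
except ET.ParseError:
    unquoted_accepted = False
print(unquoted_accepted)
```

The last check reflects the rule that attribute values must always be quoted: the parser refuses the document rather than guessing where the value ends.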

Entity References:

The XML specification reserves certain characters, such as < and &, for markup. To insert these characters into your document as content, there must be an alternative way to represent them. In XML, entities are used to represent these special characters. Entities are also used to refer to often-repeated or varying text and to include the content of external files.

Every entity must have a unique name. To use an entity, you simply reference it by name. Entity references begin with an ampersand and end with a semicolon.

For example, the lt entity inserts a literal < into a document. So the string <element> can be represented in an XML document as &lt;element>.

A special form of entity reference, called a character reference, can be used to insert arbitrary Unicode characters into your document. This is a mechanism for inserting characters that cannot be typed directly on your keyboard.

Character references take one of two forms: decimal references, &#8478;, and hexadecimal references, &#x211E;. Both of these refer to character number U+211E from Unicode.
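The following sketch, again using Python's standard library purely for illustration, shows how entity and character references are resolved when a document is parsed. The element name p is an arbitrary placeholder.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# The lt entity reference becomes a literal < in the parsed content.
escaped = ET.fromstring("<p>&lt;element></p>")
print(escaped.text)

# Decimal and hexadecimal character references to the same character, U+211E.
decimal = ET.fromstring("<p>&#8478;</p>")
hexadecimal = ET.fromstring("<p>&#x211E;</p>")
print(decimal.text == hexadecimal.text)

# Going the other way: escape() replaces &, < and > with entity references
# so that arbitrary text can be placed safely inside element content.
safe = escape("a < b & c")
print(safe)
```

Both character references produce the same single character, because 8478 in decimal and 211E in hexadecimal are the same number.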

Comments:

These begin with <!-- and end with -->. Comments can contain any data except the literal string --. You can place comments between markup anywhere in your document.

Comments are not part of the textual content of an XML document and are displayed in Alchemy CATALYST as locked strings.
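To illustrate that comments are separate from the textual content, the sketch below parses a small document twice with Python's standard library. The comment text "reviewed" is made up for the example, and the behaviour shown is that of the Python parsers, not of Alchemy CATALYST.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

source = "<note><!-- reviewed --><to>Susan</to></note>"

# ElementTree's default parser discards comments, leaving only the content.
root = ET.fromstring(source)
content = "".join(root.itertext())
print(content)

# minidom keeps comments as separate nodes alongside the elements.
doc = minidom.parseString(source)
comments = [
    node.data
    for node in doc.documentElement.childNodes
    if node.nodeType == node.COMMENT_NODE
]
print(comments)
```

The comment is available to a tool that asks for it, but it never appears in the text extracted from the elements.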

Processing Instructions:

Commonly referred to as PIs, processing instructions provide an escape hatch used to send raw data to an XML application. Like comments, they are not textually part of the XML document, but the XML processor is required to pass them to an application. Processing instructions have the form <?name pidata?>. The name, called the PI target, identifies the PI to the application. Applications should process only the targets they recognize and ignore all other PIs.

Any data that follows the PI target is optional; it is intended for the application that recognizes the target. The names used in PIs may be declared as notations in order to formally identify them. PI names beginning with xml are reserved for XML standardization.
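A minimal sketch of how an application might receive a processing instruction is shown below, using Python's standard xml.dom.minidom module. The PI target page-layout and its data are hypothetical names invented for this example.

```python
from xml.dom import minidom

# "page-layout" is a hypothetical PI target used only for illustration.
doc = minidom.parseString(
    '<?xml version="1.0"?>'
    '<?page-layout columns="2"?>'
    '<note>Reminder</note>'
)

# Collect every processing instruction as a (target, data) pair.
instructions = [
    (node.target, node.data)
    for node in doc.childNodes
    if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
]
print(instructions)

# An application acts only on the targets it recognizes and ignores the rest.
recognized = [data for target, data in instructions if target == "page-layout"]
print(recognized)
```

The parser hands over the target and the raw data unchanged; interpreting the data is left entirely to the application.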

CDATA Sections:

In a document, a CDATA section instructs the parser to ignore most markup characters.

Consider a source code listing in an XML document. It might contain characters that the XML parser would ordinarily recognize as markup (< and &, for example). In order to prevent this, a CDATA section can be used.

<![CDATA[
*p = &q;
b = (i <= 3);
]]>

Between the start of the section, <![CDATA[, and the end of the section, ]]>, all character data is passed directly to the application without interpretation. Elements, entity references, comments, and processing instructions are all unrecognized, and the characters that comprise them are passed literally to the application. The only string that cannot occur inside a CDATA section is ]]>, because it ends the section.
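The effect of a CDATA section can be sketched with Python's standard xml.etree.ElementTree module. The wrapping element name code is an assumption made for the example.

```python
import xml.etree.ElementTree as ET

# Inside the CDATA section, & and < are passed through as literal characters.
wrapped = ET.fromstring("<code><![CDATA[*p = &q;\nb = (i <= 3);]]></code>")
print(wrapped.text)

# Without the CDATA section, &q; is read as a reference to an undefined
# entity and the parser rejects the document.
try:
    ET.fromstring("<code>*p = &q;</code>")
    parsed_without_cdata = True
except ET.ParseError:
    parsed_without_cdata = False
print(parsed_without_cdata)
```

The application receives the source code exactly as written, with no need to escape each special character individually.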

Example of an XML Document

Here is a simple example of what an XML document might look like, using some of the markup outlined above:

Sample XML Document

<?xml version="1.0" encoding="ISO-8859-1" ?>
<!-- Sample XML Document -->
<note>
  <to>Susan</to>
  <from>Chuck</from>
  <heading>Reminder</heading>
  <body>Don't forget to collect the kids from school today!</body>
</note>

Notice how every element in the sample has a start-tag and a matching end-tag. This pairing, together with correct nesting, is an essential characteristic of a 'well-formed' XML document. Notice also that the encoding is specified in the XML declaration at the very top of the file. Alchemy CATALYST uses this information to parse the file correctly and to encode the file when it is extracted from a Project TTK.
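The two points above, tag pairing and the encoding declaration, can be checked with a short sketch based on Python's standard xml.etree.ElementTree module. The helper name is_well_formed and the café test string are hypothetical, and the sketch describes how the Python parser behaves, not how Alchemy CATALYST is implemented.

```python
import xml.etree.ElementTree as ET

sample = b"""<?xml version="1.0" encoding="ISO-8859-1" ?>
<!-- Sample XML Document -->
<note>
  <to>Susan</to>
  <from>Chuck</from>
  <heading>Reminder</heading>
  <body>Don't forget to collect the kids from school today!</body>
</note>"""


def is_well_formed(xml_bytes):
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(xml_bytes)
        return True
    except ET.ParseError:
        return False


# The sample parses, and its child elements appear in document order.
tags = [child.tag for child in ET.fromstring(sample)]
print(tags)

# A start-tag without its matching end-tag is not well-formed.
print(is_well_formed(b"<note><to>Susan</note>"))

# The declared encoding tells the parser how to decode the bytes:
# 0xE9 is the letter e with an acute accent in ISO-8859-1.
declared = b'<?xml version="1.0" encoding="ISO-8859-1"?><body>caf\xe9</body>'
print(ET.fromstring(declared).text)

# Without the declaration the parser assumes UTF-8, where a lone 0xE9 byte
# is invalid, so the same content is rejected.
print(is_well_formed(b"<body>caf\xe9</body>"))
```

The same bytes are accepted or rejected depending only on the declaration, which is why the encoding needs to be stated correctly at the top of the file.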

What do you want to do?

Create a simple ezParse rule for an XML document.

Learn how to specify conditions in ezParse rules.

Learn how to define a translation-target in multi-lingual XML documents.