Validating with XML Schema

This section looks at the process of XML Schema validation. Although a full treatment of XML Schema is beyond the scope of this tutorial, this section shows you the steps you take to validate an XML document using an XML Schema definition. (To learn more about XML Schema, you can review the online tutorial, XML Schema Part 0: Primer . At the end of this section, you will also learn how to use an XML Schema definition to validate a document that contains elements from multiple namespaces.

Overview of the Validation Process

To be notified of validation errors in an XML document, the following must be true:

  • The factory must configured, and the appropriate error handler set.
  • The document must be associated with at least one schema, and possibly more.

Configuring the DocumentBuilder Factory

It is helpful to start by defining the constants you will use when configuring the factory. These are the same constants you define when using XML Schema for SAX parsing, and they are declared at the beginning of the DOMEcho example program.

static final String JAXP_SCHEMA_LANGUAGE =
    "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
static final String W3C_XML_SCHEMA =
    "http://www.w3.org/2001/XMLSchema";

Next, you configure DocumentBuilderFactory to generate a namespace-aware, validating parser that uses XML Schema. This is done by calling the setValidating method on the DocumentBuilderFactory instance dbf, that was created in Instantiate the Factory.

// ...

dbf.setNamespaceAware(true);
dbf.setValidating(dtdValidate || xsdValidate);

if (xsdValidate) {
    try {
        dbf.setAttribute(JAXP_SCHEMA_LANGUAGE, W3C_XML_SCHEMA);
    }
    catch (IllegalArgumentException x) {
        System.err.println("Error: JAXP DocumentBuilderFactory attribute " 
                           + "not recognized: " + JAXP_SCHEMA_LANGUAGE);
        System.err.println("Check to see if parser conforms to JAXP spec.");
        System.exit(1);
    }
}

// ...

Because JAXP-compliant parsers are not namespace-aware by default, it is necessary to set the property for schema validation to work. You also set a factory attribute to specify the parser language to use. (For SAX parsing, on the other hand, you set a property on the parser generated by the factory).

Associating a Document with a Schema

Now that the program is ready to validate with an XML Schema definition, it is necessary only to ensure that the XML document is associated with (at least) one. There are two ways to do that:

  • With a schema declaration in the XML document
  • By specifying the schema(s) to use in the application

Note - When the application specifies the schema(s) to use, it overrides any schema declarations in the document.


To specify the schema definition in the document, you would create XML like this:

<documentRoot xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation='YourSchemaDefinition.xsd'> [...]

The first attribute defines the XML namespace (xmlns) prefix, xsi, which stands for "XML Schema instance." The second line specifies the schema to use for elements in the document that do not have a namespace prefix-that is, for the elements you typically define in any simple, uncomplicated XML document. (You will see how to deal with multiple namespaces in the next section.)

You can also specify the schema file in the application, which is the case for DOMEcho.

static final String JAXP_SCHEMA_SOURCE =
    "http://java.sun.com/xml/jaxp/properties/schemaSource";
        
// ...

dbf.setValidating(dtdValidate || xsdValidate);
if (xsdValidate) {
    // ...    
}

if (schemaSource != null) {
    dbf.setAttribute(JAXP_SCHEMA_SOURCE, new File(schemaSource));
}

Here, too, there are mechanisms at your disposal that will let you specify multiple schemas. We will take a look at those next.

Validating with Multiple Namespaces

Namespaces let you combine elements that serve different purposes in the same document without having to worry about overlapping names.


Note - The material discussed in this section also applies to validating when using the SAX parser. You are seeing it here, because at this point you have learned enough about namespaces for the discussion to make sense.


To contrive an example, consider an XML data set that keeps track of personnel data. The data set may include information from a tax declaration form as well as information from the employee's hiring form, with both elements named form in their respective schemas.

If a prefix is defined for the tax namespace, and another prefix defined for the hiring namespace, then the personnel data could include segments like the following.

<employee id="...">
  <name>....</name>
  <tax:form>
     ...w2 tax form data...
  </tax:form>
  <hiring:form>
     ...employment history, etc....
  </hiring:form>
</employee>

The contents of the tax:form element would obviously be different from the contents of the hiring:form element and would have to be validated differently.

Note, too, that in this example there is a default namespace that the unqualified element names employee and name belong to. For the document to be properly validated, the schema for that namespace must be declared, as well as the schemas for the tax and hiring namespaces.


Note - The default namespace is actually a specific namespace. It is defined as the "namespace that has no name." So you cannot simply use one namespace as your default this week, and another namespace as the default later. This "unnamed namespace" (or "null namespace") is like the number zero. It does not have any value to speak of (no name), but it is still precisely defined. So a namespace that does have a name can never be used as the default namespace.


When parsed, each element in the data set will be validated against the appropriate schema, as long as those schemas have been declared. Again, the schemas can be declared either as part of the XML data set or in the program. (It is also possible to mix the declarations. In general, though, it is a good idea to keep all the declarations together in one place.)

Declaring the Schemas in the XML Data Set

To declare the schemas to use for the preceding example in the data set, the XML code would look something like the following.

<documentRoot
  xmlns:xsi=
  "http://www.w3.org/2001/XMLSchema-instance"
  xsi:noNamespaceSchemaLocation=
    "employeeDatabase.xsd"
  xsi:schemaLocation=
  "http://www.irs.gov.example.com/ 
   fullpath/w2TaxForm.xsd
   http://www.ourcompany.example.com/ 
   relpath/hiringForm.xsd"
  xmlns:tax=
    "http://www.irs.gov.example.com/"
  xmlns:hiring=
    "http://www.ourcompany.example.com/"
>

The noNamespaceSchemaLocation declaration is something you have seen before, as are the last two entries, which define the namespace prefixes tax and hiring. What is new is the entry in the middle, which defines the locations of the schemas to use for each namespace referenced in the document.

The xsi:schemaLocation declaration consists of entry pairs, where the first entry in each pair is a fully qualified URI that specifies the namespace, and the second entry contains a full path or a relative path to the schema definition. In general, fully qualified paths are recommended. In that way, only one copy of the schema will tend to exist.

Note that you cannot use the namespace prefixes when defining the schema locations. The xsi:schemaLocation declaration understands only namespace names and not prefixes.

Declaring the Schemas in the Application

To declare the equivalent schemas in the application, the code would look something like the following.

static final String employeeSchema = "employeeDatabase.xsd";
static final String taxSchema = "w2TaxForm.xsd";
static final String hiringSchema = "hiringForm.xsd";

static final String[] schemas = {
    employeeSchema,
    taxSchema, 
    hiringSchema,
};

static final String JAXP_SCHEMA_SOURCE =
    "http://java.sun.com/xml/jaxp/properties/schemaSource";

// ...

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance()
    
// ...

factory.setAttribute(JAXP_SCHEMA_SOURCE, schemas);

Here, the array of strings that points to the schema definitions (.xsd files) is passed as the argument to the factory.setAttribute method. Note the differences from when you were declaring the schemas to use as part of the XML data set.

  • There is no special declaration for the default (unnamed) schema.
  • You do not specify the namespace name. Instead, you only give pointers to the .xsd files.

To make the namespace assignments, the parser reads the .xsd files, and finds in them the name of the target namespace they apply to. Because the files are specified with URIs, the parser can use an EntityResolver (if one has been defined) to find a local copy of the schema.

If the schema definition does not define a target namespace, then it applies to the default (unnamed, or null) namespace. So, in our example, you would expect to see these target namespace declarations in the schemas:

  • A string that points to the URI of the schema
  • An InputStream with the contents of the schema
  • A SAX InputSource
  • A File
  • An array of Objects, each of which is one of the types defined here

An array of Objects can be used only when the schema language has the ability to assemble a schema at runtime. Also, when an array of Objects is passed it is illegal to have two schemas that share the same namespace.

Running the DOMEcho Sample With Schema Validation

To run the DOMEcho sample with schema validation, follow the steps below.

  1. Navigate to the samples directory.% cd install-dir/jaxp-1_4_2-release-date/samples.
  2. Compile the example class, using the class path you have just set.% javac dom/*
  3. Run the DOMEcho program on an XML file, specifying schema validation.

    Choose one of the XML files in the data directory and run the DOMEcho program on it with the -xsd option specified. Here, we have chosen to run the program on the file personal-schema.xml.

    % java dom/DOMEcho -xsd data/personal-schema.xml

    As you saw in Configuring the Factory, the -xsd option tells DOMEcho to perform validation against the XML schema that is defined in the personal-schema.xml file. In this case, the schema is the file personal.xsd, which is also located in the sample/data directory.

  4. Open personal-schema.xml in a text editor and delete the schema declaration.

    Remove the following from the opening <personnel> tag.

    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation='personal.xsd'

    Do not forget to save the file.

  5. Run DOMEcho again, specifying the -xsd option once more.% java dom/DOMEcho -xsd data/personal-schema.xml

    This time, you will see a stream of errors.

  6. Run DOMEcho one more time, this time specifying the -xsdss option and specifying the schema definition file.

    As you saw in Configuring the Factory, the -xsdss option tells DOMEcho to perform validation against an XML schema definition that is specified when the program is run. Once again, use the file personal.xsd.

    % java dom/DOMEcho -xsdss data/personal.xsd data/personal-schema.xml

    You will see the same output as before, meaning that the XML file has been successfully validated against the schema.