Class Parser
- All Implemented Interfaces:
- DTDConstants
- Direct Known Subclasses:
- DocumentParser
public class Parser extends Object implements DTDConstants
Unfortunately, there are many badly implemented HTML parsers out there, and as a result there are many badly formatted HTML files. This parser attempts to parse most HTML files. This means that the implementation sometimes deviates from the SGML specification in favor of HTML.
The parser treats \r and \r\n as \n. Newlines after start tags and before end tags are ignored, just as specified in the SGML/HTML specification.
The HTML specification does not specify very well how spaces are to be coalesced. Specifically, the following scenarios are not discussed (note that a space should be used here, but I am using &nbsp; to force the space to be displayed):
'<b>blah <i> <strike> foo' which can be treated as: '<b>blah <i><strike>foo'
as well as: '<p><a href="xx"> <em>Using</em></a></p>' which appears to be treated as: '<p><a href="xx"><em>Using</em></a></p>'
If strict is false, when a tag that breaks flow (TagElement.breaksFlow()) or trailing whitespace is encountered, all whitespace will be ignored until a non-whitespace character is encountered. This appears to give behavior closer to that of the popular browsers.
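The callback model described above can be sketched with a small subclass that logs what the parser reports. This is a minimal illustration, not the JDK's own usage: EventLogParser, loadHtml32Dtd and parseToEvents are hypothetical names, and the DTD loading step assumes that constructing a ParserDelegator registers the bundled HTML 3.2 DTD under the name "html32". The exact sequence of implied tags and text chunks depends on the DTD and on the whitespace rules above, so no output is shown.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.html.parser.DTD;
import javax.swing.text.html.parser.Parser;
import javax.swing.text.html.parser.ParserDelegator;
import javax.swing.text.html.parser.TagElement;

// Hypothetical demo class (not part of the JDK): logs each parser callback.
class EventLogParser extends Parser {
    final List<String> events = new ArrayList<>();

    EventLogParser(DTD dtd) {
        super(dtd);
    }

    @Override
    protected void handleStartTag(TagElement tag) {
        // fictional() reports tags the parser supplied itself, such as an omitted <body>.
        String kind = tag.fictional() ? "implied-start:" : "start:";
        events.add(kind + tag.getElement().getName());
    }

    @Override
    protected void handleEndTag(TagElement tag) {
        events.add("end:" + tag.getElement().getName());
    }

    @Override
    protected void handleEmptyTag(TagElement tag) {
        events.add("empty:" + tag.getElement().getName());
    }

    @Override
    protected void handleText(char[] text) {
        events.add("text:" + new String(text));
    }

    // Assumption: constructing a ParserDelegator registers the JDK's bundled
    // HTML 3.2 DTD, so that DTD.getDTD("html32") returns a populated DTD.
    static DTD loadHtml32Dtd() throws IOException {
        new ParserDelegator();
        return DTD.getDTD("html32");
    }

    // Parses the given markup and returns the recorded callbacks in order.
    static List<String> parseToEvents(String html) {
        try {
            EventLogParser parser = new EventLogParser(loadHtml32Dtd());
            parser.parse(new StringReader(html));
            return parser.events;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        parseToEvents("<p>Hello <b>world</b></p>").forEach(System.out::println);
    }
}
```

Running main prints one line per callback, which is a quick way to see which tags the parser implies and where it splits text for a given input.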
Field Summary
| Modifier and Type | Field | Description |
|---|---|---|
| protected DTD | dtd | The dtd. |
| protected boolean | strict | This flag determines whether or not the Parser will be strict in enforcing SGML compatibility. |
Fields declared in interface javax.swing.text.html.parser.DTDConstants
ANY, CDATA, CONREF, CURRENT, DEFAULT, EMPTY, ENDTAG, ENTITIES, ENTITY, FIXED, GENERAL, ID, IDREF, IDREFS, IMPLIED, MD, MODEL, MS, NAME, NAMES, NMTOKEN, NMTOKENS, NOTATION, NUMBER, NUMBERS, NUTOKEN, NUTOKENS, PARAMETER, PI, PUBLIC, RCDATA, REQUIRED, SDATA, STARTTAG, SYSTEM
Constructor Summary
Method Summary
| Modifier and Type | Method | Description |
|---|---|---|
| protected void | endTag(boolean omitted) | Handle an end tag. |
| protected void | error(String err) | Invokes the error handler with the 1st, 2nd and 3rd error message argument "?". |
| protected void | error(String err, String arg1) | Invokes the error handler with the 2nd and 3rd error message argument "?". |
| protected void | error(String err, String arg1, String arg2) | Invokes the error handler with the 3rd error message argument "?". |
| protected void | error(String err, String arg1, String arg2, String arg3) | Invokes the error handler. |
| protected void | flushAttributes() | Removes the current attributes. |
| protected SimpleAttributeSet | getAttributes() | Returns attributes for the current tag. |
| protected int | getCurrentLine() | Returns the line number of the line currently being parsed. |
| protected int | getCurrentPos() | Returns the current position. |
| protected void | handleComment(char[] text) | Called when an HTML comment is encountered. |
| protected void | handleEmptyTag(TagElement tag) | Called when an empty tag is encountered. |
| protected void | handleEndTag(TagElement tag) | Called when an end tag is encountered. |
| protected void | handleEOFInComment() | Called when the content terminates without closing the HTML comment. |
| protected void | handleError(int ln, String msg) | An error has occurred. |
| protected void | handleStartTag(TagElement tag) | Called when a start tag is encountered. |
| protected void | handleText(char[] text) | Called when PCDATA is encountered. |
| protected void | handleTitle(char[] text) | Called when an HTML title tag is encountered. |
| protected TagElement | makeTag(Element elem) | Makes a TagElement. |
| protected TagElement | makeTag(Element elem, boolean fictional) | Makes a TagElement. |
| protected void | markFirstTime(Element elem) | Marks the first time a tag has been seen in a document. |
| void | parse(Reader in) | Parse an HTML stream, given a DTD. |
| String | parseDTDMarkup() | Parses the Document Type Declaration markup declaration. |
| protected boolean | parseMarkupDeclarations(StringBuffer strBuff) | Parse markup declarations. |
| protected void | startTag(TagElement tag) | Handle a start tag. |
Field Details
dtd
protected DTD dtd
strict
protected boolean strict
Constructor Details
Parser
public Parser(DTD dtd)
Creates a parser with the specified dtd.
- Parameters:
- dtd - the dtd.
Method Details
getCurrentLine
protected int getCurrentLine()
- Returns:
- the line number of the line currently being parsed
makeTag
protected TagElement makeTag(Element elem, boolean fictional)
- Parameters:
- elem - the element storing the tag definition
- fictional - the value of the flag "fictional" to be set for the tag
- Returns:
- the created TagElement
makeTag
protected TagElement makeTag(Element elem)
- Parameters:
- elem - the element storing the tag definition
- Returns:
- the created TagElement
getAttributes
protected SimpleAttributeSet getAttributes()
- Returns:
- SimpleAttributeSet containing the attributes
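As a sketch of how getAttributes() might be used, the subclass below reads the href attribute inside handleStartTag and then clears the pending attributes with flushAttributes(). LinkCollector and its helper methods are hypothetical names. Two points are assumptions based on how the parser appears to behave rather than documented guarantees: implied (fictional) tags should be skipped so that they do not consume the attributes of the real tag that follows, and constructing a ParserDelegator registers the bundled "html32" DTD.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.html.HTML;
import javax.swing.text.html.parser.DTD;
import javax.swing.text.html.parser.Parser;
import javax.swing.text.html.parser.ParserDelegator;
import javax.swing.text.html.parser.TagElement;

// Hypothetical demo class (not part of the JDK): collects href values of <a> tags.
class LinkCollector extends Parser {
    final List<String> hrefs = new ArrayList<>();

    LinkCollector(DTD dtd) {
        super(dtd);
    }

    @Override
    protected void handleStartTag(TagElement tag) {
        if (tag.fictional()) {
            // Implied tag: leave the pending attributes for the real tag that follows.
            return;
        }
        if (tag.getHTMLTag() == HTML.Tag.A) {
            Object href = getAttributes().getAttribute(HTML.Attribute.HREF);
            if (href != null) {
                hrefs.add(href.toString());
            }
        }
        // Clear the attribute set so values do not carry over to the next tag.
        flushAttributes();
    }

    @Override
    protected void handleEmptyTag(TagElement tag) {
        if (!tag.fictional()) {
            flushAttributes();
        }
    }

    // Assumption: constructing a ParserDelegator registers the bundled HTML 3.2 DTD.
    static DTD loadHtml32Dtd() throws IOException {
        new ParserDelegator();
        return DTD.getDTD("html32");
    }

    // Parses the given markup and returns the href values in document order.
    static List<String> collect(String html) {
        try {
            LinkCollector parser = new LinkCollector(loadHtml32Dtd());
            parser.parse(new StringReader(html));
            return parser.hrefs;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String html = "<p><a href=\"https://example.com/a\">A</a> and <a href=\"/b\">B</a></p>";
        collect(html).forEach(System.out::println);
    }
}
```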
flushAttributes
protected void flushAttributes()
handleText
protected void handleText(char[] text)
- Parameters:
- text - the section text
handleTitle
protected void handleTitle(char[] text)
- Parameters:
- text - the title text
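A subclass can override handleTitle to capture the document title separately from body text. The sketch below is illustrative only: TitleGrabber and titleOf are hypothetical names, and it again assumes that constructing a ParserDelegator registers the bundled "html32" DTD.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.html.parser.DTD;
import javax.swing.text.html.parser.Parser;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical demo class (not part of the JDK): remembers the title text.
class TitleGrabber extends Parser {
    private final StringBuilder title = new StringBuilder();

    TitleGrabber(DTD dtd) {
        super(dtd);
    }

    @Override
    protected void handleTitle(char[] text) {
        // Append rather than assign, in case the title arrives in several chunks.
        title.append(text);
    }

    // Assumption: constructing a ParserDelegator registers the bundled HTML 3.2 DTD.
    static DTD loadHtml32Dtd() throws IOException {
        new ParserDelegator();
        return DTD.getDTD("html32");
    }

    // Returns the trimmed title text, or an empty string if none was reported.
    static String titleOf(String html) {
        try {
            TitleGrabber parser = new TitleGrabber(loadHtml32Dtd());
            parser.parse(new StringReader(html));
            return parser.title.toString().trim();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String html = "<html><head><title>Example</title></head><body><p>Hi</p></body></html>";
        System.out.println(titleOf(html));
    }
}
```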
handleComment
protected void handleComment(char[] text)
- Parameters:
- text - the comment being handled
handleEOFInComment
protected void handleEOFInComment()
handleEmptyTag
protected void handleEmptyTag(TagElement tag) throws ChangedCharSetException
- Parameters:
- tag - the tag being handled
- Throws:
- ChangedCharSetException - if the document charset was changed
handleStartTag
protected void handleStartTag(TagElement tag)
- Parameters:
- tag - the tag being handled
handleEndTag
protected void handleEndTag(TagElement tag)
- Parameters:
- tag - the tag being handled
handleError
protected void handleError(int ln, String msg)
- Parameters:
- ln - the number of the line containing the error
- msg - the error message
error
protected void error(String err, String arg1, String arg2, String arg3)
- Parameters:
- err - the error type
- arg1 - the 1st error message argument
- arg2 - the 2nd error message argument
- arg3 - the 3rd error message argument
error
protected void error(String err, String arg1, String arg2)
- Parameters:
- err - the error type
- arg1 - the 1st error message argument
- arg2 - the 2nd error message argument
error
protected void error(String err, String arg1)
- Parameters:
- err - the error type
- arg1 - the 1st error message argument
error
protected void error(String err)
- Parameters:
- err - the error type
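The error overloads above invoke the error handler, so a subclass that overrides handleError sees every reported error together with a line number. The following sketch collects those messages. ErrorCollector, report and create are hypothetical names, the "demo.error" type is made up for illustration, and the exact message text passed to handleError is an implementation detail, so no output is shown.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.html.parser.DTD;
import javax.swing.text.html.parser.Parser;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical demo class (not part of the JDK): collects reported errors.
class ErrorCollector extends Parser {
    final List<String> errors = new ArrayList<>();

    ErrorCollector(DTD dtd) {
        super(dtd);
    }

    @Override
    protected void handleError(int ln, String msg) {
        errors.add("line " + ln + ": " + msg);
    }

    // Hypothetical helper: exposes the protected two-argument error overload
    // so the path from error(...) to handleError(...) can be observed directly.
    void report(String type, String arg1) {
        error(type, arg1);
    }

    // Assumption: constructing a ParserDelegator registers the bundled HTML 3.2 DTD.
    static ErrorCollector create() {
        try {
            new ParserDelegator();
            return new ErrorCollector(DTD.getDTD("html32"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        ErrorCollector collector = create();
        // Report a made-up error type by hand.
        collector.report("demo.error", "example-argument");
        // Parse markup with a stray end tag; the parser may report its own errors.
        collector.parse(new StringReader("<p>one</b>\n<p>two"));
        collector.errors.forEach(System.out::println);
    }
}
```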
startTag
protected void startTag(TagElement tag) throws ChangedCharSetException
- Parameters:
- tag - the tag
- Throws:
- ChangedCharSetException - if the document charset was changed
endTag
protected void endTag(boolean omitted)
- Parameters:
- omitted - true if the tag is not actually present in the document, but is supposed by the parser
markFirstTime
protected void markFirstTime(Element elem)
- Parameters:
- elem - the element represented by the tag
parseDTDMarkup
public String parseDTDMarkup() throws IOException
- Returns:
- the string representation of the markup declaration
- Throws:
- IOException - if an I/O error occurs
parseMarkupDeclarations
protected boolean parseMarkupDeclarations(StringBuffer strBuff) throws IOException
- Parameters:
- strBuff - the markup declaration
- Returns:
- true if this is a valid markup declaration; otherwise false
- Throws:
- IOException - if an I/O error occurs
parse
public void parse(Reader in) throws IOException
- Parameters:
- in - the reader to read the source from
- Throws:
- IOException - if an I/O error occurs
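Since parse accepts any Reader, the same subclass can be fed from a string, a file, or a network stream. The sketch below extracts plain text and starts a new line at tags for which TagElement.breaksFlow() returns true. PlainTextExtractor and its extract methods are hypothetical names, and the DTD loading step assumes that constructing a ParserDelegator registers the bundled "html32" DTD. Which tags break flow is decided by the parser, so the exact line breaks are not shown.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.html.parser.DTD;
import javax.swing.text.html.parser.Parser;
import javax.swing.text.html.parser.ParserDelegator;
import javax.swing.text.html.parser.TagElement;

// Hypothetical demo class (not part of the JDK): reduces markup to plain text.
class PlainTextExtractor extends Parser {
    private final StringBuilder out = new StringBuilder();

    PlainTextExtractor(DTD dtd) {
        super(dtd);
    }

    @Override
    protected void handleText(char[] text) {
        // The default handleTitle forwards to handleText, so title text lands here too.
        out.append(text);
    }

    @Override
    protected void handleStartTag(TagElement tag) {
        breakLine(tag);
    }

    @Override
    protected void handleEndTag(TagElement tag) {
        breakLine(tag);
    }

    @Override
    protected void handleEmptyTag(TagElement tag) {
        breakLine(tag);
    }

    // Starts a new line at flow-breaking tags, avoiding consecutive blank lines.
    private void breakLine(TagElement tag) {
        if (tag.breaksFlow() && out.length() > 0 && out.charAt(out.length() - 1) != '\n') {
            out.append('\n');
        }
    }

    // Reads markup from any Reader; the caller is responsible for closing it.
    static String extract(Reader in) throws IOException {
        new ParserDelegator(); // assumption: registers the bundled HTML 3.2 DTD
        PlainTextExtractor parser = new PlainTextExtractor(DTD.getDTD("html32"));
        parser.parse(in);
        return parser.out.toString();
    }

    // Convenience overload for in-memory markup.
    static String extract(String html) {
        try (Reader in = new StringReader(html)) {
            return extract(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(extract("<h1>Hello</h1><p>World</p>"));
    }
}
```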
getCurrentPos
protected int getCurrentPos()
- Returns:
- the current position
© 1993, 2023, Oracle and/or its affiliates. All rights reserved.
Documentation extracted from Debian's OpenJDK Development Kit package.
Licensed under the GNU General Public License, version 2, with the Classpath Exception.
Various third party code in OpenJDK is licensed under different licenses (see Debian package).
Java and OpenJDK are trademarks or registered trademarks of Oracle and/or its affiliates.
 https://docs.oracle.com/en/java/javase/21/docs/api/java.desktop/javax/swing/text/html/parser/Parser.html