java.lang.Object
  extended by Segment
      extended by Source
Represents a source HTML document.
The first step in parsing an HTML document is always to construct a Source object from the source data, which can be a
String, Reader, InputStream or URL.
Each constructor uses all the evidence available to determine the original character encoding of the data.
Once the Source object has been created, you can immediately start searching for tags or elements within the document
using the tag search methods.
In certain circumstances you may be able to improve performance by calling the fullSequentialParse() method before calling any
tag search methods. See the documentation of the fullSequentialParse() method for details.
Any issues encountered while parsing are logged to a Logger object.
The setLogger(Logger) method can be used to explicitly set a Logger implementation for a particular Source instance,
otherwise the static Config.LoggerProvider property determines how the logger is set by default for all Source instances.
See the documentation of the Config.LoggerProvider property for information about how the default logging provider is determined.
Note that many of the useful functions which can be performed on the source document are
defined in its superclass, Segment.
The source object is itself a segment which spans the entire document.
Most of the methods defined in this class are useful for determining the elements and tags surrounding or neighbouring a particular character position in the document.
For information on how to create a modified version of this source document, see the OutputDocument class.
| Constructor Summary | |
Source(java.lang.CharSequence text)
Constructs a new Source object from the specified text. |
Source(java.io.InputStream inputStream)
Constructs a new Source object by loading the content from the specified InputStream. |
Source(java.io.Reader reader)
Constructs a new Source object by loading the content from the specified Reader. |
Source(java.net.URL url)
Constructs a new Source object by loading the content from the specified URL. |
| Method Summary | |
void |
clearCache()
Clears the tag cache of all tags. |
java.util.List |
findAllElements()
Returns a list of all elements in this source document. |
java.util.List |
findAllStartTags()
Returns a list of all start tags in this source document. |
java.util.List |
findAllTags()
Returns a list of all tags in this source document. |
Element |
findEnclosingElement(int pos)
Returns the most nested Element that encloses the specified position in the source document. |
Element |
findEnclosingElement(int pos,
java.lang.String name)
Returns the most nested Element with the specified name that encloses the specified position in the source document. |
Tag |
findEnclosingTag(int pos)
Returns the Tag that encloses the specified position in the source document. |
Tag |
findEnclosingTag(int pos,
TagType tagType)
Returns the Tag of the specified type that encloses the specified position in the source document. |
int |
findNameEnd(int pos)
Returns the end position of the XML Name that starts at the specified position. |
CharacterReference |
findNextCharacterReference(int pos)
Returns the CharacterReference beginning at or immediately following the specified position in the source document. |
Element |
findNextElement(int pos)
Returns the Element beginning at or immediately following the specified position in the source document. |
Element |
findNextElement(int pos,
java.lang.String name)
Returns the Element with the specified name beginning at or immediately following the specified position in the source document. |
Element |
findNextElement(int pos,
java.lang.String attributeName,
java.lang.String value,
boolean valueCaseSensitive)
Returns the Element with the specified attribute name/value pair beginning at or immediately following the specified position in the source document. |
EndTag |
findNextEndTag(int pos)
Returns the EndTag beginning at or immediately following the specified position in the source document. |
EndTag |
findNextEndTag(int pos,
java.lang.String name)
Returns the normal EndTag with the specified name beginning at or immediately following the specified position in the source document. |
EndTag |
findNextEndTag(int pos,
java.lang.String name,
EndTagType endTagType)
Returns the EndTag with the specified name and type beginning at or immediately following the specified position in the source document. |
StartTag |
findNextStartTag(int pos)
Returns the StartTag beginning at or immediately following the specified position in the source document. |
StartTag |
findNextStartTag(int pos,
java.lang.String name)
Returns the StartTag with the specified name beginning at or immediately following the specified position in the source document. |
StartTag |
findNextStartTag(int pos,
java.lang.String attributeName,
java.lang.String value,
boolean valueCaseSensitive)
Returns the StartTag with the specified attribute name/value pair beginning at or immediately following the specified position in the source document. |
Tag |
findNextTag(int pos)
Returns the Tag beginning at or immediately following the specified position in the source document. |
Tag |
findNextTag(int pos,
TagType tagType)
Returns the Tag of the specified type beginning at or immediately following the specified position in the source document. |
CharacterReference |
findPreviousCharacterReference(int pos)
Returns the CharacterReference at or immediately preceding (or enclosing) the specified position in the source document. |
EndTag |
findPreviousEndTag(int pos)
Returns the EndTag beginning at or immediately preceding the specified position in the source document. |
EndTag |
findPreviousEndTag(int pos,
java.lang.String name)
Returns the normal EndTag with the specified name at or immediately preceding (or enclosing) the specified position in the source document. |
StartTag |
findPreviousStartTag(int pos)
Returns the StartTag at or immediately preceding (or enclosing) the specified position in the source document. |
StartTag |
findPreviousStartTag(int pos,
java.lang.String name)
Returns the StartTag with the specified name at or immediately preceding (or enclosing) the specified position in the source document. |
Tag |
findPreviousTag(int pos)
Returns the Tag beginning at or immediately preceding (or enclosing) the specified position in the source document. |
Tag |
findPreviousTag(int pos,
TagType tagType)
Returns the Tag of the specified type beginning at or immediately preceding (or enclosing) the specified position in the source document. |
Tag[] |
fullSequentialParse()
Parses all of the tags in this source document sequentially from beginning to end. |
java.lang.String |
getCacheDebugInfo()
Returns a string representation of the tag cache, useful for debugging purposes. |
java.util.List |
getChildElements()
Returns a list of the top-level elements in the document element hierarchy. |
int |
getColumn(int pos)
Returns the column number of the specified character position in the source document. |
java.lang.String |
getDocumentSpecifiedEncoding()
Returns the document encoding specified within the text of the document. |
Element |
getElementById(java.lang.String id)
Returns the Element with the specified id attribute value. |
java.lang.String |
getEncoding()
Returns the character encoding scheme of the source byte stream used to create this object. |
java.lang.String |
getEncodingSpecificationInfo()
Returns a concise description of how the encoding of the source document was determined. |
Logger |
getLogger()
Returns the Logger that handles log messages. |
java.io.Writer |
getLogWriter()
Deprecated. Use ((WriterLogger)getLogger()).getWriter() instead. |
java.lang.String |
getNewLine()
Returns the newline character sequence used in the source document. |
ParseText |
getParseText()
Returns the parse text of this source document. |
java.lang.String |
getPreliminaryEncodingInfo()
Returns the preliminary encoding of the source document together with a concise description of how it was determined. |
int |
getRow(int pos)
Returns the row number of the specified character position in the source document. |
RowColumnVector |
getRowColumnVector(int pos)
Returns a RowColumnVector object representing the row and column number of the specified character position in the source document. |
SourceFormatter |
getSourceFormatter()
Formats the HTML source by laying out each non-inline-level element on a new line with an appropriate indent. |
Tag |
getTagAt(int pos)
Returns the Tag at the specified position in the source document. |
void |
ignoreWhenParsing(java.util.Collection segments)
Causes all of the segments in the specified collection to be ignored when parsing. |
void |
ignoreWhenParsing(int begin,
int end)
Causes the specified range of the source text to be ignored when parsing. |
CharStreamSource |
indent(java.lang.String indentString,
boolean tidyTags,
boolean collapseWhiteSpace,
boolean indentAllElements)
Deprecated. Use getSourceFormatter().setIndentString(indentString).setTidyTags(tidyTags).setCollapseWhiteSpace(collapseWhiteSpace).setIndentAllElements(indentAllElements) instead. |
boolean |
isLoggingEnabled()
Deprecated. Use getLogger().isInfoEnabled() instead. |
boolean |
isXML()
Indicates whether the source document is likely to be XML. |
void |
log(java.lang.String message)
Deprecated. Use getLogger().info(message) instead. |
Attributes |
parseAttributes(int pos,
int maxEnd)
Parses any Attributes starting at the specified position. |
Attributes |
parseAttributes(int pos,
int maxEnd,
int maxErrorCount)
Parses any Attributes starting at the specified position. |
void |
setLogger(Logger logger)
Sets the Logger that handles log messages. |
void |
setLogWriter(java.io.Writer writer)
Deprecated. Use setLogger(new WriterLogger(writer)) instead. |
java.lang.String |
toString()
Returns the source text as a String. |
| Methods inherited from class java.lang.Object |
clone, finalize, getClass, notify, notifyAll, wait, wait, wait |
| Constructor Detail |
public Source(java.lang.CharSequence text)
Constructs a new Source object from the specified text.
text - the source text.
public Source(java.io.Reader reader)
throws java.io.IOException
Constructs a new Source object by loading the content from the specified Reader.
If the specified reader is an instance of InputStreamReader, the getEncoding() method of the
created source object returns the encoding from InputStreamReader.getEncoding().
reader - the java.io.Reader from which to load the source text.
java.io.IOException - if an I/O error occurs.
public Source(java.io.InputStream inputStream)
throws java.io.IOException
Constructs a new Source object by loading the content from the specified InputStream.
The algorithm for detecting the character encoding of the source document from the raw bytes
of the specified input stream is the same as that for the Source(URL) constructor,
except that the first step is not possible as there is no
Content-Type header to check.
inputStream - the java.io.InputStream from which to load the source text.
java.io.IOException - if an I/O error occurs.
See Also: getEncoding()
public Source(java.net.URL url)
throws java.io.IOException
Constructs a new Source object by loading the content from the specified URL.
The algorithm for detecting the character encoding of the source document is as follows
(process termination is marked by ♦):
If the HTTP Content-Type header contains a charset parameter, then use the encoding specified in the value of the charset parameter. ♦
getEncoding() method
returns null. ♦
| BOM Bytes | Encoding |
|---|---|
| EF BB BF | UTF-8 |
| FF FE 00 00 | UTF-32 (little-endian) |
| 00 00 FE FF | UTF-32 (big-endian) |
| FF FE | UTF-16 (little-endian) |
| FE FF | UTF-16 (big-endian) |
| 0E FE FF | SCSU |
| 2B 2F 76 | UTF-7 |
| DD 73 66 73 | UTF-EBCDIC |
| FB EE 28 | BOCU-1 |
getPreliminaryEncodingInfo() method for details.
charset parameter was included in the HTTP
Content-Type header.
This is consistent with the preliminary encoding detected in this scenario.
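The BOM step of the algorithm above amounts to a prefix check on the first few bytes of the stream. The following Java sketch is illustrative only and is not the library's implementation: the class BomSniffer and its method detectBom are hypothetical, Java charset names are used for the UTF-8/16/32 rows, and the remaining rows simply return the labels from the table (SCSU, UTF-EBCDIC and BOCU-1 are not charsets a standard JRE is guaranteed to support).

```java
// Hypothetical sketch: maps a leading byte order mark to the encoding in the BOM table.
final class BomSniffer {
    // Signatures are tested in this order. FF FE 00 00 (UTF-32LE) must come
    // before FF FE (UTF-16LE), because the 2-byte BOM is a prefix of the 4-byte one.
    private static final int[][] SIGNATURES = {
        {0x00, 0x00, 0xFE, 0xFF},
        {0xFF, 0xFE, 0x00, 0x00},
        {0xDD, 0x73, 0x66, 0x73},
        {0xEF, 0xBB, 0xBF},
        {0x0E, 0xFE, 0xFF},
        {0x2B, 0x2F, 0x76},
        {0xFB, 0xEE, 0x28},
        {0xFF, 0xFE},
        {0xFE, 0xFF},
    };
    private static final String[] ENCODINGS = {
        "UTF-32BE", "UTF-32LE", "UTF-EBCDIC", "UTF-8", "SCSU",
        "UTF-7", "BOCU-1", "UTF-16LE", "UTF-16BE",
    };

    /** Returns the encoding signified by a leading BOM, or null if none of the table's BOMs is present. */
    public static String detectBom(byte[] head) {
        for (int i = 0; i < SIGNATURES.length; i++) {
            if (startsWith(head, SIGNATURES[i])) {
                return ENCODINGS[i];
            }
        }
        return null;
    }

    // Compares unsigned byte values against the signature.
    private static boolean startsWith(byte[] head, int[] sig) {
        if (head.length < sig.length) {
            return false;
        }
        for (int i = 0; i < sig.length; i++) {
            if ((head[i] & 0xFF) != sig[i]) {
                return false;
            }
        }
        return true;
    }
}
```

With this ordering a stream beginning FF FE 00 00 is reported as UTF-32 (little-endian), while FF FE followed by a non-zero byte is reported as UTF-16 (little-endian).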
url - the URL from which to load the source text.
java.io.IOException - if an I/O error occurs.
See Also: getEncoding()
| Method Detail |
public java.lang.String getDocumentSpecifiedEncoding()
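Returns the document encoding specified within the text of the document.
The document encoding can be specified within the document text in two ways. They are referred to generically in this library as an encoding specification, and are listed below in order of precedence:
1. An XML declaration at the start of the document, with the encoding specified in its encoding attribute:
<?xml version="1.0" encoding="ISO-8859-1" ?>
2. A META declaration, which is a META tag with attribute http-equiv="Content-Type".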
The encoding is specified in the charset parameter of a
Content-Type
HTTP header value, which is placed in the value of the meta tag's content attribute.
This META declaration should appear as early as possible in the HEAD element.
<META http-equiv=Content-Type content="text/html; charset=iso-8859-1">
Both of these tags must only use characters in the range U+0000 to U+007F, and in the case of the META declaration must use ASCII encoding. This, along with the fact that they must occur at or near the beginning of the document, assists in their detection and decoding without the need to know the exact encoding of the full text.
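A rough approximation of this search, honouring the order of precedence above, can be written with regular expressions. This is a hypothetical sketch (the class EncodingSpecSniffer and method find are not part of the library) and it is far less tolerant than a real tag parser: it only recognises an XML declaration at the very start of the text, and it assumes the http-equiv attribute appears before the content attribute in the META tag.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, regex-based approximation of finding an encoding specification.
final class EncodingSpecSniffer {
    // XML declaration at the very start of the text, e.g. <?xml version="1.0" encoding="ISO-8859-1" ?>
    private static final Pattern XML_DECL = Pattern.compile(
        "^<\\?xml\\s[^>]*?encoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    // META declaration, e.g. <META http-equiv=Content-Type content="text/html; charset=iso-8859-1">
    // Simplification: assumes http-equiv precedes the charset parameter within the tag.
    private static final Pattern META_DECL = Pattern.compile(
        "<meta\\s[^>]*http-equiv\\s*=\\s*[\"']?Content-Type[\"']?[^>]*charset\\s*=\\s*([A-Za-z0-9._:-]+)",
        Pattern.CASE_INSENSITIVE);

    /** Returns the specified encoding name, or null if no encoding specification is found. */
    static String find(String text) {
        Matcher m = XML_DECL.matcher(text);
        if (m.find()) {
            return m.group(1); // the XML declaration takes precedence
        }
        m = META_DECL.matcher(text);
        if (m.find()) {
            return m.group(1);
        }
        return null;
    }
}
```

Because the search only involves characters in the range U+0000 to U+007F, a sketch like this can run on text decoded with any ASCII-compatible preliminary encoding.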
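Returns: the encoding specified in the document, or null if no encoding is specified.
See Also: getEncoding()
public java.lang.String getEncoding()
Returns the character encoding scheme of the source byte stream used to create this object.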
The encoding of a document defines how the original byte stream was encoded into characters.
The HTTP specification section 3.4
uses the term "character set" to refer to the encoding, and the term "charset" is similarly used in Java
(see the class java.nio.charset.Charset).
This often causes confusion, as a modern "coded character set" such as Unicode
can have several encodings, such as UTF-8, UTF-16, and UTF-32.
See the Wikipedia character encoding article
for an explanation of the terminology.
This method makes the best possible effort to return the name of the encoding used to decode the original source byte stream
into character data. This decoding takes place in the constructor when a parameter based on a byte stream such as an
InputStream or URL is used to specify the source text.
The documentation of the Source(InputStream) and Source(URL) constructors describes how the return value of this
method is determined in these cases.
It is also possible in some circumstances for the encoding to be determined in the Source(Reader) constructor.
If a constructor was used that specifies the source text directly in character form (not requiring the decoding of a byte sequence)
then the document itself is searched for an encoding specification. In this case, this
method returns the same value as the getDocumentSpecifiedEncoding() method.
The getEncodingSpecificationInfo() method returns a simple description of how the value of this method was determined.
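Returns: the encoding of the source document, or null if the encoding is not known.
See Also: getEncodingSpecificationInfo()
public java.lang.String getEncodingSpecificationInfo()
Returns a concise description of how the encoding of the source document was determined.
The description is intended for informational purposes only. It is not guaranteed to have any particular format and cannot be reliably parsed.
See Also: getEncoding()
public java.lang.String getPreliminaryEncodingInfo()
Returns the preliminary encoding of the source document together with a concise description of how it was determined.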
It is sometimes necessary for the Source(InputStream) and Source(URL) constructors to search the document for an
encoding specification in order to determine the exact encoding
of the source byte stream.
In order to search for the document specified encoding before the exact encoding is known, a preliminary encoding is determined using the first four bytes of the input stream.
Because the encoding specification must only use characters in the range U+0000 to U+007F, the preliminary encoding need only have the following basic properties determined:
The encodings used to represent the most commonly encountered combinations of these basic properties are:
In some descriptions returned by this method, and the documentation below, a pattern is used to help demonstrate the contents of the first four bytes of the stream.
The patterns use the characters "00" to signify a zero byte, "XX" to signify a non-zero byte, and "??" to signify
a byte that can be either zero or non-zero.
The algorithm for determining the preliminary encoding is as follows:
"00 00..." : If the stream starts with two zero bytes, the default 32-bit big-endian encoding UTF-32BE is used.
"00 XX..." : If the stream starts with a single zero byte, the default 16-bit big-endian encoding UTF-16BE is used.
"XX ?? 00 00..." : If the third and fourth bytes of the stream are zero, the default 32-bit little-endian encoding UTF-32LE is used.
"XX 00..." or "XX ?? XX 00..." : If the second or fourth byte of the stream is zero, the default 16-bit little-endian encoding UTF-16LE is used.
"XX XX 00 XX..." : If the third byte of the stream is zero, the default 16-bit big-endian encoding UTF-16BE is used (assumes the first character is > U+00FF).
"4C XX XX XX..." : If the first four bytes are consistent with the EBCDIC encoding of an XML declaration ("<?xm") or a document type declaration ("<!DO"), or any other string starting with the EBCDIC character '<' followed by three non-ASCII characters (8th bit set), which is consistent with EBCDIC alphanumeric characters, the default EBCDIC-compatible encoding Cp037 is used.
"XX XX XX XX..." : Otherwise, if all of the first four bytes of the stream are non-zero, the default 8-bit ASCII-compatible encoding ISO-8859-1 is used.
If it was not necessary to search for a document specified encoding when determining the
encoding of this source document from a byte stream, this method returns null.
See the documentation of the Source(InputStream) and Source(URL) constructors for more detailed information about when the detection of a
preliminary encoding is required.
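The ordered byte-pattern rules above translate directly into a chain of tests on the first four bytes. The following Java sketch is illustrative rather than the library's code: the class PreliminaryEncoding is hypothetical, it assumes any BOM has already been handled and that at least four bytes are available, and the byte values used for "<?xm" (4C 6F A7 94) and "<!DO" (4C 5A C4 D6) are their Cp037 encodings.

```java
// Hypothetical sketch of the preliminary-encoding rules, applied in the documented order.
final class PreliminaryEncoding {
    /** Assumes at least four bytes are supplied (an assumption of this sketch). */
    static String detect(byte[] b) {
        if (b.length < 4) {
            throw new IllegalArgumentException("need at least 4 bytes");
        }
        int b0 = b[0] & 0xFF, b1 = b[1] & 0xFF, b2 = b[2] & 0xFF, b3 = b[3] & 0xFF;
        if (b0 == 0 && b1 == 0) return "UTF-32BE";   // "00 00..."
        if (b0 == 0) return "UTF-16BE";              // "00 XX..."
        if (b2 == 0 && b3 == 0) return "UTF-32LE";   // "XX ?? 00 00..."
        if (b1 == 0 || b3 == 0) return "UTF-16LE";   // "XX 00..." or "XX ?? XX 00..."
        if (b2 == 0) return "UTF-16BE";              // "XX XX 00 XX..."
        if (b0 == 0x4C && looksLikeEbcdic(b1, b2, b3)) return "Cp037"; // "4C XX XX XX..."
        return "ISO-8859-1";                         // "XX XX XX XX...": all four bytes non-zero
    }

    // 0x4C is '<' in Cp037; the next three bytes must fit one of the documented cases.
    private static boolean looksLikeEbcdic(int b1, int b2, int b3) {
        boolean xmlDecl = b1 == 0x6F && b2 == 0xA7 && b3 == 0x94;   // "<?xm" in Cp037
        boolean doctype = b1 == 0x5A && b2 == 0xC4 && b3 == 0xD6;   // "<!DO" in Cp037
        boolean highBits = (b1 & 0x80) != 0 && (b2 & 0x80) != 0 && (b3 & 0x80) != 0;
        return xmlDecl || doctype || highBits;
    }
}
```

By the time the last test is reached, the earlier zero-byte rules guarantee that none of the four bytes is zero, so the ISO-8859-1 fallback corresponds exactly to the "XX XX XX XX..." pattern.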
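The description returned by this method is intended for informational purposes only. It is not guaranteed to have any particular format and cannot be reliably parsed.
Returns: the preliminary encoding of the source document together with a concise description of how it was determined, or null if no preliminary encoding was required.
See Also: getEncoding()
public boolean isXML()
Indicates whether the source document is likely to be XML.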
The algorithm used to determine this is designed to be relatively inexpensive and to provide an accurate result in most normal situations. An exact determination of whether the source document is XML would require a much more complex analysis of the text.
The algorithm is as follows:
If the document contains a document type declaration that contains the text "xhtml", it is an XHTML document, and hence also an XML document.
As of version 2.5, this method no longer returns true if the document doesn't contain an HTML element.
The library is often used to parse partial HTML documents, so the lack of an HTML element is not a reliable test for an XML document.
Returns: true if the source document is likely to be XML, otherwise false.
public java.lang.String getNewLine()
Returns the newline character sequence used in the source document.
If the document does not contain any newline characters, this method returns null.
The three possible return values (aside from null) are "\n", "\r\n" and "\r".
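One plausible way to produce these return values is to report the first line break found in the text. The sketch below assumes that behaviour (the text above does not say which sequence is chosen when a document mixes newline styles), and the class NewLineSniffer is hypothetical.

```java
// Hypothetical sketch: classifies the first line break in the text.
final class NewLineSniffer {
    /** Returns "\n", "\r\n" or "\r" for the first line break found, or null if there is none. */
    static String firstNewLine(CharSequence text) {
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\n') {
                return "\n";
            }
            if (c == '\r') {
                // A carriage return immediately followed by a line feed is a single "\r\n" newline.
                return (i + 1 < text.length() && text.charAt(i + 1) == '\n') ? "\r\n" : "\r";
            }
        }
        return null;
    }
}
```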
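Returns: the newline character sequence used in the source document, or null if none is present.
public int getRow(int pos)
Returns the row number of the specified character position in the source document.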
pos - the position in the source document.
java.lang.IndexOutOfBoundsException - if the specified position is not within the bounds of the document.
See Also: getColumn(int pos), getRowColumnVector(int pos)
public int getColumn(int pos)
Returns the column number of the specified character position in the source document.
pos - the position in the source document.
java.lang.IndexOutOfBoundsException - if the specified position is not within the bounds of the document.
See Also: getRow(int pos), getRowColumnVector(int pos)
public RowColumnVector getRowColumnVector(int pos)
Returns a RowColumnVector object representing the row and column number of the specified character position in the source document.
pos - the position in the source document.
Returns: a RowColumnVector object representing the row and column number of the specified character position in the source document.
java.lang.IndexOutOfBoundsException - if the specified position is not within the bounds of the document.
See Also: getRow(int pos), getColumn(int pos)
public java.lang.String toString()
Returns the source text as a String.
Specified by: toString in interface java.lang.CharSequence
Overrides: toString in class Segment
public Tag[] fullSequentialParse()
Parses all of the tags in this source document sequentially from beginning to end.
Calling this method can greatly improve performance if most or all of the tags in the document need to be parsed.
Calling the findAllTags(), findAllStartTags(), findAllElements() or getChildElements() method on the Source object
performs a full sequential parse automatically.
There are however still circumstances where it should be called manually, such as when it is known that most or all of the tags in the document will need to be parsed,
but none of the abovementioned methods are used, or are called only after calling one or more other tag search methods.
If this method is called manually, it should be called soon after the Source object is created,
before any tag search methods are called.
By default, tags are parsed only as needed, which is referred to as parse on demand mode. In this mode, every call to a tag search method that is not returning previously cached tags must perform a relatively complex check to determine whether a potential tag is in a valid position.
Generally speaking, a tag is in a valid position if it does not appear inside any other tag. Server tags can appear anywhere in a document, including inside other tags, so this relates only to non-server tags. Theoretically, checking whether a specified position in the document is enclosed in another tag is only possible if every preceding tag has been parsed; otherwise it is impossible to tell whether one of the delimiters of the enclosing tag was in fact enclosed by some other tag before it, thereby invalidating it.
When this method is called, each tag is parsed in sequence starting from the beginning of the document, making it easy to check whether each potential tag is in a valid position. In parse on demand mode a compromise technique must be used for this check, since the theoretical requirement of having parsed all preceding tags is no longer practical. This compromise involves only checking whether the position is enclosed by other tags with certain tag types. The added complexity of this technique makes parsing each tag slower compared to when a full sequential parse is performed, but when only a few tags need parsing this is an extremely beneficial trade-off.
The documentation of the TagType.isValidPosition(Source, int pos, int[] fullSequentialParseData) method,
which is called internally by the parser to perform the valid position check,
includes a more detailed explanation of the differences between the two modes of operation.
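The advantage of the sequential approach can be illustrated with a deliberately simplified, hypothetical scanner. Because it walks the text once from the beginning, it always knows whether it is currently inside a tag, so a '<' inside a quoted attribute value is never mistaken for the start of a new tag. This is not the library's parser: the class ToySequentialScan is invented for illustration and it ignores comments, CDATA sections, server tags and every other tag type.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy scanner illustrating valid-position checks during a single sequential pass.
final class ToySequentialScan {
    /** Returns the positions of '<' characters that start a tag. */
    static List<Integer> findTagStarts(String text) {
        List<Integer> starts = new ArrayList<>();
        boolean inTag = false;
        char quote = 0; // quote character of the attribute value currently open, or 0 if none
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (!inTag) {
                if (c == '<') {          // outside any tag, so this '<' is in a valid position
                    inTag = true;
                    starts.add(i);
                }
            } else if (quote != 0) {
                if (c == quote) {
                    quote = 0;           // closing quote of an attribute value
                }
            } else if (c == '"' || c == '\'') {
                quote = c;               // opening quote of an attribute value
            } else if (c == '>') {
                inTag = false;           // end of the current tag
            }
        }
        return starts;
    }
}
```

For the input <a title="x<y">b</a> the sketch reports tag starts at positions 0 and 16; the '<' at position 11 is skipped because the scan already knows it lies inside the first tag. A parse on demand search starting at position 11 has no such context and must examine the surrounding text to reach the same conclusion.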
Calling this method a second or subsequent time has no effect.
This method returns the same list of tags as the Source.findAllTags() method, but as an array instead of a list.
If this method is called after any of the tag search methods are called,
the cache is cleared of any previously found tags before being restocked via the full sequential parse.
This is significant if the Segment.ignoreWhenParsing() method has been called since the tags were first found, as any tags inside the
ignored segments will no longer be returned by any of the tag search methods.
See also the Tag class documentation for more general details about how tags are parsed.
public java.util.List getChildElements()
Returns a list of the top-level elements in the document element hierarchy.
The objects in the list are all of type Element.
The term top-level element refers to an element that is not nested within any other element in the document.
The term document element hierarchy refers to the hierarchy of elements that make up this source document.
The source document itself is not considered to be part of the hierarchy, meaning there is typically more than one top-level element.
Even when the source represents an entire HTML document, the document type declaration and/or an
XML declaration often exist as top-level elements along with the HTML element itself.
The Element.getChildElements() method can be used to get the children of the top-level elements, with recursive use providing a means to
visit every element in the document hierarchy.
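Recursive use of getChildElements() amounts to a depth-first walk of the hierarchy. To keep the example self-contained, the hypothetical Node class below stands in for the library's Element class and exposes only a getChildElements() method; with the library itself, the starting list would come from Source.getChildElements().

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of a depth-first walk over an element hierarchy.
final class HierarchyWalk {
    /** Stand-in for an element that exposes its child elements. */
    static final class Node {
        final String name;
        private final List<Node> children = new ArrayList<>();

        Node(String name, Node... kids) {
            this.name = name;
            Collections.addAll(children, kids);
        }

        List<Node> getChildElements() {
            return children;
        }
    }

    /** Records "depth:name" for every element, visiting each element before its children. */
    static void visit(List<Node> elements, int depth, List<String> out) {
        for (Node e : elements) {
            out.add(depth + ":" + e.name);
            visit(e.getChildElements(), depth + 1, out);
        }
    }
}
```

Starting from a top-level list containing a document type declaration and an html element (with head, title and body below it), the walk records 0:!doctype, 0:html, 1:head, 2:title, 1:body, mirroring the point above that a document usually has more than one top-level element.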
The document element hierarchy differs from that of the Document Object Model
in that it is only a representation of the elements that are physically present in the source text. Unlike the DOM, it does not include any "implied" HTML elements
such as TBODY if they are not present in the source text.
Elements formed from server tags are not included in the hierarchy at all.
Structural errors in this source document such as overlapping elements are reported in the log. When elements are found to overlap, the position of the start tag determines the location of the element in the hierarchy.
Calling this method on the Source object performs a full sequential parse automatically.
A visual representation of the document element hierarchy can be obtained by calling:
getSourceFormatter().setIndentAllElements(true).setCollapseWhiteSpace(true).setTidyTags(true).toString()
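Overrides: getChildElements in class Segment
Returns: a list of the top-level elements in the document element hierarchy, guaranteed not null.
See Also: Element.getParentElement(), Element.getChildElements(), Element.getDepth()
public SourceFormatter getSourceFormatter()
Formats the HTML source by laying out each non-inline-level element on a new line with an appropriate indent.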
The output format can be configured by setting any number of properties on the returned SourceFormatter instance before
obtaining its output.
To create a SourceFormatter instance based on a Segment rather than an entire Source document,
use new SourceFormatter(segment) instead.
Returns: a SourceFormatter based on this source document.
public java.util.List findAllTags()
Returns a list of all tags in this source document.
Calling this method on the Source object performs a full sequential parse automatically.
See the Tag class documentation for more details about the behaviour of this method.
Overrides: findAllTags in class Segment
public java.util.List findAllStartTags()
Returns a list of all start tags in this source document.
Calling this method on the Source object performs a full sequential parse automatically.
See the Tag class documentation for more details about the behaviour of this method.
Overrides: findAllStartTags in class Segment
public java.util.List findAllElements()
Returns a list of all elements in this source document.
Calling this method on the Source object performs a full sequential parse automatically.
The elements returned correspond exactly to the start tags returned by the findAllStartTags() method.
Overrides: findAllElements in class Segment
public Element getElementById(java.lang.String id)
Returns the Element with the specified id attribute value.
This simulates the script method
getElementById
defined in DOM HTML level 1.
This is equivalent to findNextStartTag(0,"id",id,true).getElement(), assuming that the element exists.
A well formed HTML document should have no more than one element with any given id attribute value.
id - the id attribute value (case sensitive) to search for, must not be null.
Returns: the Element with the specified id attribute value, or null if no such element exists.
public final Tag getTagAt(int pos)
Returns the Tag at the specified position in the source document.
See the Tag class documentation for more details about the behaviour of this method.
This method also returns unregistered tags.
pos - the position in the source document, may be out of bounds.
Returns: the Tag at the specified position in the source document, or null if no tag exists at the specified position or it is out of bounds.
public Tag findPreviousTag(int pos)
Returns the Tag beginning at or immediately preceding (or enclosing) the specified position in the source document.
See the Tag class documentation for more details about the behaviour of this method.
pos - the position in the source document from which to start the search, may be out of bounds.
Returns: the Tag beginning at or immediately preceding the specified position in the source document, or null if none exists or the specified position is out of bounds.
public Tag findPreviousTag(int pos,
TagType tagType)
Returns the Tag of the specified type beginning at or immediately preceding (or enclosing) the specified position in the source document.
See the Tag class documentation for more details about the behaviour of this method.
pos - the position in the source document from which to start the search, may be out of bounds.
tagType - the TagType to search for.
Returns: the Tag with the specified type beginning at or immediately preceding the specified position in the source document, or null if none exists or the specified position is out of bounds.
public Tag findNextTag(int pos)
Returns the Tag beginning at or immediately following the specified position in the source document.
See the Tag class documentation for more details about the behaviour of this method.
Use Tag.findNextTag() to find the tag immediately following another tag.
pos - the position in the source document from which to start the search, may be out of bounds.
Returns: the Tag beginning at or immediately following the specified position in the source document, or null if none exists or the specified position is out of bounds.
public Tag findNextTag(int pos,
TagType tagType)
Returns the Tag of the specified type beginning at or immediately following the specified position in the source document.
See the Tag class documentation for more details about the behaviour of this method.
pos - the position in the source document from which to start the search, may be out of bounds.
tagType - the TagType to search for.
Returns: the Tag with the specified type beginning at or immediately following the specified position in the source document, or null if none exists or the specified position is out of bounds.
public Tag findEnclosingTag(int pos)
Returns the Tag that encloses the specified position in the source document.
See the Tag class documentation for more details about the behaviour of this method.
pos - the position in the source document, may be out of bounds.
Returns: the Tag that encloses the specified position in the source document, or null if the position is not within a tag or is out of bounds.
public Tag findEnclosingTag(int pos,
TagType tagType)
Returns the Tag of the specified type that encloses the specified position in the source document.
See the Tag class documentation for more details about the behaviour of this method.
pos - the position in the source document, may be out of bounds.
tagType - the TagType to search for.
Returns: the Tag of the specified type that encloses the specified position in the source document, or null if the position is not within a tag of the specified type or is out of bounds.
public Element findNextElement(int pos)
Returns the Element beginning at or immediately following the specified position in the source document.
This is equivalent to findNextStartTag(pos).getElement(),
assuming the result is not null.
pos - the position in the source document from which to start the search, may be out of bounds.
Returns: the Element beginning at or immediately following the specified position in the source document, or null if none exists or the specified position is out of bounds.
public Element findNextElement(int pos,
java.lang.String name)
Returns the Element with the specified name beginning at or immediately following the specified position in the source document.
This is equivalent to findNextStartTag(pos,name).getElement(),
assuming the result is not null.
Specifying a null argument to the name parameter is equivalent to
findNextElement(pos).
Specifying an argument to the name parameter that ends in a colon (:) searches for all elements
in the specified XML namespace.
This method also returns elements consisting of unregistered tags if the specified name is not a valid XML tag name.
pos - the position in the source document from which to start the search, may be out of bounds.
name - the name of the element to search for.
Returns: the Element with the specified name beginning at or immediately following the specified position in the source document, or null if none exists or the specified position is out of bounds.
public Element findNextElement(int pos,
java.lang.String attributeName,
java.lang.String value,
boolean valueCaseSensitive)
Returns the Element with the specified attribute name/value pair beginning at or immediately following the specified position in the source document.
This is equivalent to findNextStartTag(pos,attributeName,value,valueCaseSensitive).getElement(),
assuming the result is not null.
pos - the position in the source document from which to start the search, may be out of bounds.
attributeName - the attribute name (case insensitive) to search for, must not be null.
value - the value of the specified attribute to search for, must not be null.
valueCaseSensitive - specifies whether the attribute value matching is case sensitive.
Returns: the Element with the specified attribute name/value pair beginning at or immediately following the specified position in the source document, or null if none exists or the specified position is out of bounds.
public StartTag findPreviousStartTag(int pos)
Returns the StartTag at or immediately preceding (or enclosing) the specified position in the source document.
See the Tag class documentation for more details about the behaviour of this method.
pos - the position in the source document from which to start the search, may be out of bounds.
Returns: the StartTag at or immediately preceding the specified position in the source document, or null if none exists or the specified position is out of bounds.
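The phrase "at or immediately preceding (or enclosing)" means that a tag containing the position also counts as a match. A toy model over {begin, end} pairs shows the rule. PreviousTagSearch is hypothetical, the tags are assumed sorted by begin position, and -1 stands in for the null result.

```java
final class PreviousTagSearch {
    // tags: {begin, end} pairs sorted by begin position (toy representation).
    // Returns the index of the last tag beginning at or before pos. That
    // covers both a tag enclosing pos and one ending before pos.
    // -1 stands in for the library's null result.
    static int findPrevious(int[][] tags, int docLength, int pos) {
        if (pos < 0 || pos >= docLength) return -1;
        int result = -1;
        for (int i = 0; i < tags.length && tags[i][0] <= pos; i++) result = i;
        return result;
    }
}
```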
public StartTag findPreviousStartTag(int pos,
java.lang.String name)
Returns the StartTag with the specified name at or immediately preceding (or enclosing) the specified position in the source document.
See the Tag class documentation for more details about the behaviour of this method.
Specifying a null argument to the name parameter is equivalent to
findPreviousStartTag(pos).
This method also returns unregistered tags if the specified name is not a valid XML tag name.
pos - the position in the source document from which to start the search, may be out of bounds.
name - the name of the start tag to search for.
Returns: the StartTag with the specified name at or immediately preceding the specified position in the source document, or null if none exists or the specified position is out of bounds.
public StartTag findNextStartTag(int pos)
Returns the StartTag beginning at or immediately following the specified position in the source document.
See the Tag class documentation for more details about the behaviour of this method.
pos - the position in the source document from which to start the search, may be out of bounds.
Returns: the StartTag beginning at or immediately following the specified position in the source document, or null if none exists or the specified position is out of bounds.
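As the wording suggests, a forward search only returns tags that begin at or after pos, so unlike findPreviousStartTag it skips a tag that merely encloses pos. The following sketch models this with a deliberately naive regular expression (not how the library parses). It also shows the common idiom of resuming each search at the end of the previous tag. NextTagScan and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NextTagScan {
    // Naive start-tag pattern for illustration only: it ignores comments,
    // CDATA, script content and quoted '>' characters, which a real parser
    // must handle. End tags never match because '/' is not a letter.
    private static final Pattern START_TAG = Pattern.compile("<[A-Za-z][^>]*>");

    // Returns {begin, end} of the first start tag beginning at or after pos,
    // or null if pos is out of bounds or no such tag exists.
    static int[] findNextStartTag(String text, int pos) {
        if (pos < 0 || pos >= text.length()) return null;
        Matcher m = START_TAG.matcher(text);
        return m.find(pos) ? new int[] {m.start(), m.end()} : null;
    }

    // Scanning idiom: restart each search at the end of the previous tag.
    static List<String> allStartTags(String text) {
        List<String> out = new ArrayList<>();
        for (int[] t = findNextStartTag(text, 0); t != null; t = findNextStartTag(text, t[1]))
            out.add(text.substring(t[0], t[1]));
        return out;
    }
}
```

For "<p>Hi <b>x</b></p>", searching from position 1 (inside "<p>") finds "<b>" at position 6, not the enclosing "<p>".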
public StartTag findNextStartTag(int pos,
java.lang.String name)
Returns the StartTag with the specified name beginning at or immediately following the specified position in the source document.
See the Tag class documentation for more details about the behaviour of this method.
Specifying a null argument to the name parameter is equivalent to
findNextStartTag(pos).
Specifying an argument to the name parameter that ends in a colon (:) searches for all start tags
in the specified XML namespace.
This method also returns unregistered tags if the specified name is not a valid XML tag name.
pos - the position in the source document from which to start the search, may be out of bounds.
name - the name of the start tag to search for.
Returns: the StartTag with the specified name beginning at or immediately following the specified position in the source document, or null if none exists or the specified position is out of bounds.
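Whether a name counts as a valid XML tag name determines how the unregistered-tag rule above applies. The following check is an ASCII-only approximation of the XML 1.0 Name production. XmlNames is a hypothetical helper, and the full production also admits many non-ASCII characters that this sketch rejects.

```java
final class XmlNames {
    // ASCII-only approximation of the XML 1.0 Name production: a
    // NameStartChar followed by NameChars. Non-ASCII ranges are ignored.
    static boolean isValidXmlName(String name) {
        if (name == null || name.isEmpty()) return false;
        if (!isNameStartChar(name.charAt(0))) return false;
        for (int i = 1; i < name.length(); i++) {
            char c = name.charAt(i);
            if (!isNameStartChar(c) && !isExtraNameChar(c)) return false;
        }
        return true;
    }

    // ASCII subset of NameStartChar: ':' | '_' | [A-Z] | [a-z]
    private static boolean isNameStartChar(char c) {
        return c == ':' || c == '_' || (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    // Characters allowed after the first one but not at the start.
    private static boolean isExtraNameChar(char c) {
        return c == '-' || c == '.' || (c >= '0' && c <= '9');
    }
}
```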
public StartTag findNextStartTag(int pos,
java.lang.String attributeName,
java.lang.String value,
boolean valueCaseSensitive)
Returns the StartTag with the specified attribute name/value pair beginning at or immediately following the specified position in the source document.
See the Tag class documentation for more details about the behaviour of this method.
pos - the position in the source document from which to start the search, may be out of bounds.
attributeName - the attribute name (case insensitive) to search for, must not be null.
value - the value of the specified attribute to search for, must not be null.
valueCaseSensitive - specifies whether the attribute value matching is case sensitive.
Returns: the StartTag with the specified attribute name/value pair beginning at or immediately following the specified position in the source document, or null if none exists or the specified position is out of bounds.
public EndTag findPreviousEndTag(int pos)
Returns the EndTag beginning at or immediately preceding the specified position in the source document.
See the Tag class documentation for more details about the behaviour of this method.
pos - the position in the source document from which to start the search, may be out of bounds.
Returns: the EndTag beginning at or immediately preceding the specified position in the source document, or null if none exists or the specified position is out of bounds.
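Because a backward search accepts any end tag that begins at or before pos, an end tag that starts before pos and still contains it qualifies. A toy model on raw text using String.lastIndexOf makes this concrete. PreviousEndTag is hypothetical, it only looks for the "</" delimiter, and -1 stands in for null.

```java
final class PreviousEndTag {
    // Returns the begin index of the last "</" at or before pos, or -1 (in
    // place of null) if pos is out of bounds or no end tag precedes it.
    // lastIndexOf(str, fromIndex) only reports matches starting at or
    // before fromIndex, which mirrors "beginning at or immediately preceding".
    static int findPreviousEndTag(String text, int pos) {
        if (pos < 0 || pos >= text.length()) return -1;
        return text.lastIndexOf("</", pos);
    }
}
```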
public EndTag findPreviousEndTag(int pos,
java.lang.String name)
Returns the EndTag with the specified name at or immediately preceding (or enclosing) the specified position in the source document.
See the Tag class documentation for more details about the behaviour of this method.
pos - the position in the source document from which to start the search, may be out of bounds.
name - the name of the end tag to search for, must not be null.
Returns: the EndTag with the specified name at or immediately preceding (or enclosing) the specified position in the source document, or null if none exists or the specified position is out of bounds.
public EndTag findNextEndTag(int pos)
Returns the EndTag beginning at or immediately following the specified position in the source document.
See the Tag class documentation for more details about the behaviour of this method.
pos - the position in the source document from which to start the search, may be out of bounds.
Returns: the EndTag beginning at or immediately following the specified position in the source document, or null if none exists or the specified position is out of bounds.
public EndTag findNextEndTag(int pos,
java.lang.String name)
Returns the EndTag with the specified name beginning at or immediately following the specified position in the source document.
See the Tag class documentation for more details about the behaviour of this method.
pos - the position in the source document from which to start the search, may be out of bounds.
name - the name of the end tag to search for, must not be null.
Returns: the EndTag with the specified name beginning at or immediately following the specified position in the source document, or null if none exists or the specified position is out of bounds.
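A named forward search has to check where the name ends. Otherwise a search for b would also hit the end tag of body. The following sketch shows one way to do that on raw text. NextNamedEndTag is hypothetical, the case-insensitive matching is an assumption, and -1 stands in for null.

```java
import java.util.Objects;

final class NextNamedEndTag {
    // Returns the begin index of the first "</name" at or after pos whose
    // name is followed by '>' or whitespace, or -1 in place of null.
    // regionMatches(true, ...) compares ignoring case without changing
    // string lengths, so indices stay aligned with the original text.
    static int findNextEndTag(String text, int pos, String name) {
        Objects.requireNonNull(name, "name must not be null");
        if (pos < 0 || pos >= text.length()) return -1;
        String needle = "</" + name;
        for (int i = pos; i + needle.length() <= text.length(); i++) {
            if (!text.regionMatches(true, i, needle, 0, needle.length())) continue;
            int after = i + needle.length();
            // Require a name boundary so that "b" does not match "</body>".
            if (after < text.length()
                    && (text.charAt(after) == '>' || Character.isWhitespace(text.charAt(after))))
                return i;
        }
        return -1;
    }
}
```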
public EndTag findNextEndTag(int pos,
java.lang.String name,
EndTagType endTagType)
Returns the EndTag with the specified name and type beginning at or immediately following the specified position in the source document.
See the Tag class documentation for more details about the behaviour of this method.
pos - the position in the source document from which to start the search, may be out of bounds.
name - the name of the end tag to search for, must not be null.
endTagType - the type of the end tag to search for, must not be null.
Returns: the EndTag with the specified name and type beginning at or immediately following the specified position in the source document, or null if none exists or the specified position is out of bounds.
public Element findEnclosingElement(int pos)