Class UnicodeString

    • Constructor Detail

      • UnicodeString

        public UnicodeString()
    • Method Detail

      • makeUnicodeString

        public static UnicodeString makeUnicodeString(CharSequence in)
        Make a UnicodeString for a given CharSequence
        Parameters:
        in - the input CharSequence
        Returns:
        a UnicodeString using an appropriate implementation class
      • makeUnicodeString

        public static UnicodeString makeUnicodeString(int[] in)
        Make a UnicodeString for a given array of codepoints
        Parameters:
        in - the input array of codepoints
        Returns:
        a UnicodeString using an appropriate implementation class
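The two factory methods accept either UTF-16 text or an array of codepoints. The sketch below does not use UnicodeString itself; it relies only on standard JDK calls to show how the two input forms relate. The class name CodePointArrays and its method names are hypothetical, chosen for illustration.

```java
final class CodePointArrays {
    // Expand UTF-16 text into one int per Unicode codepoint.
    // A surrogate pair becomes a single array element.
    static int[] toCodePoints(CharSequence in) {
        return in.codePoints().toArray();
    }

    // Rebuild UTF-16 text from an array of codepoints.
    // Throws IllegalArgumentException if an element is not a valid codepoint.
    static String fromCodePoints(int[] in) {
        return new String(in, 0, in.length);
    }
}
```

For the text "a" followed by U+1F600, toCodePoints yields two elements even though the String holds three 16-bit chars.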
      • containsSurrogatePairs

        public static boolean containsSurrogatePairs(CharSequence value)
        Test whether a CharSequence contains Unicode codepoints outside the BMP range
        Parameters:
        value - the string to be tested
        Returns:
        true if the string contains non-BMP codepoints
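A check of this kind can be approximated with the JDK alone. The sketch below is not the library's implementation; NonBmpCheck and containsNonBmp are hypothetical names. It assumes that "outside the BMP" means a supplementary codepoint (U+10000 and above) formed by a valid surrogate pair, so an unpaired surrogate does not count.

```java
final class NonBmpCheck {
    // True if any codepoint in the text is U+10000 or above.
    // CharSequence.codePoints() combines valid surrogate pairs and
    // reports unpaired surrogates as their own (BMP-range) values.
    static boolean containsNonBmp(CharSequence value) {
        return value.codePoints().anyMatch(Character::isSupplementaryCodePoint);
    }
}
```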
      • uSubstring

        public abstract UnicodeString uSubstring(int beginIndex,
                                                 int endIndex)
        Get a substring of this string
        Parameters:
        beginIndex - the index of the first character to be included (counting codepoints, not 16-bit characters)
        endIndex - the index of the first character to be NOT included (counting codepoints, not 16-bit characters)
        Returns:
        a substring
        Throws:
        IndexOutOfBoundsException - if the selection goes off the start or end of the string (this function follows the semantics of String.substring(), not the XPath semantics)
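To illustrate the difference between codepoint indexes and 16-bit character indexes, the following sketch performs a codepoint-indexed substring on a plain java.lang.String. It is not how UnicodeString is implemented; CodePointSubstring is a hypothetical helper built on String.offsetByCodePoints.

```java
final class CodePointSubstring {
    // Substring where beginIndex and endIndex count codepoints, not chars.
    // Like String.substring(), it throws IndexOutOfBoundsException when the
    // selection runs off the start or end of the string.
    static String substring(String s, int beginIndex, int endIndex) {
        int from = s.offsetByCodePoints(0, beginIndex);
        int to = s.offsetByCodePoints(from, endIndex - beginIndex);
        return s.substring(from, to);
    }
}
```

With the text "a", U+1F600, "b", "c", the call substring(s, 1, 3) selects U+1F600 and "b", whereas s.substring(1, 3) would select only the two surrogate halves of U+1F600.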
      • uIndexOf

        public abstract int uIndexOf(int search,
                                     int start)
        Get the first match for a given character
        Parameters:
        search - the character to look for
        start - the position at which to start searching
        Returns:
        the position of the first occurrence of the sought character, or -1 if not found
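The same search can be sketched on a java.lang.String, converting between char offsets and codepoint positions. This is an illustration only: CodePointSearch is a hypothetical helper, and it assumes that both start and the result are counted in codepoints.

```java
final class CodePointSearch {
    // Position (in codepoints) of the first occurrence of the codepoint
    // 'search' at or after codepoint position 'start', or -1 if absent.
    // Throws IndexOutOfBoundsException if 'start' is outside the string.
    static int indexOf(String s, int search, int start) {
        int from = s.offsetByCodePoints(0, start);
        int charIndex = s.indexOf(search, from);
        return charIndex < 0 ? -1 : s.codePointCount(0, charIndex);
    }
}
```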
      • uCharAt

        public abstract int uCharAt(int pos)
        Get the character at a specified position
        Parameters:
        pos - the index of the required character (counting codepoints, not 16-bit characters)
        Returns:
        the character (Unicode codepoint) at the specified position
      • uLength

        public abstract int uLength()
        Get the length of the string, in Unicode codepoints
        Returns:
        the number of codepoints in the string
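Codepoint-based access and length differ from String.charAt() and String.length() whenever the text contains surrogate pairs. The sketch below shows the distinction using JDK methods only; CodePointAccess is a hypothetical helper, not part of the documented class.

```java
final class CodePointAccess {
    // Length in codepoints; a surrogate pair counts once.
    static int length(String s) {
        return s.codePointCount(0, s.length());
    }

    // Codepoint at a position counted in codepoints.
    // Throws IndexOutOfBoundsException if pos is out of range.
    static int charAt(String s, int pos) {
        return s.codePointAt(s.offsetByCodePoints(0, pos));
    }
}
```

Note that each charAt call here scans from the start of the string, so a loop over all positions is quadratic; storing the codepoints in an int[] avoids this.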
      • isEnd

        public abstract boolean isEnd(int pos)
        Ask whether a given position is at (or beyond) the end of the string
        Parameters:
        pos - the position to be tested (counting codepoints, not 16-bit characters)
        Returns:
        true if the specified position is at or beyond the end of the string
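A test of this shape is convenient as a loop condition when scanning a string one codepoint at a time. The sketch below uses a hypothetical stand-in class, CodePointSeq, backed by an int[]; it only mirrors the idea of isEnd and uCharAt and is not the documented class.

```java
final class CodePointSeq {
    private final int[] chars;

    CodePointSeq(CharSequence in) {
        chars = in.codePoints().toArray();
    }

    // Codepoint at the given position (counting codepoints).
    int charAt(int pos) {
        return chars[pos];
    }

    // True if pos is at or beyond the end of the sequence.
    boolean isEnd(int pos) {
        return pos >= chars.length;
    }

    // Count occurrences of a codepoint, scanning until isEnd reports true.
    static int countMatches(CodePointSeq s, int search) {
        int n = 0;
        for (int pos = 0; !s.isEnd(pos); pos++) {
            if (s.charAt(pos) == search) {
                n++;
            }
        }
        return n;
    }
}
```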
      • hashCode

        public int hashCode()
        Implementations of UnicodeString can be compared with each other, but not with other implementations of CharSequence
        Overrides:
        hashCode in class Object
        Returns:
        a hashCode that distinguishes this UnicodeString from others
      • equals

        public boolean equals(Object obj)
        Implementations of UnicodeString can be compared with each other, but not with other implementations of CharSequence
        Overrides:
        equals in class Object
        Parameters:
        obj - the object to be compared
        Returns:
        true if obj is a UnicodeString containing the same codepoints
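The restriction mirrors ordinary JDK behaviour, where "abc".equals(new StringBuilder("abc")) is false. As a sketch of codepoint-based equality with a matching hash code, the hypothetical class below (CodePoints, not the documented class) compares only against instances of its own type.

```java
final class CodePoints {
    private final int[] chars;

    CodePoints(CharSequence in) {
        chars = in.codePoints().toArray();
    }

    // Equal only to another CodePoints holding the same codepoints;
    // never equal to a String or other CharSequence.
    @Override
    public boolean equals(Object obj) {
        return obj instanceof CodePoints
                && java.util.Arrays.equals(chars, ((CodePoints) obj).chars);
    }

    // Consistent with equals: identical codepoints give identical hashes.
    @Override
    public int hashCode() {
        return java.util.Arrays.hashCode(chars);
    }
}
```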
      • compareTo

        public int compareTo(UnicodeString other)
        Compare two Unicode strings in codepoint collating sequence
        Specified by:
        compareTo in interface Comparable<UnicodeString>
        Parameters:
        other - the object to be compared
        Returns:
        less than 0, 0, or greater than 0 depending on the ordering of the two strings
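Codepoint order is not the same as the order given by String.compareTo(), which compares 16-bit code units: surrogates (U+D800 to U+DFFF) sort below U+E000 to U+FFFF even though they encode codepoints above U+FFFF. The sketch below, with the hypothetical helper CodePointOrder, compares two java.lang.String values by codepoint to show the difference; it is an illustration, not the documented implementation.

```java
final class CodePointOrder {
    // Compare two strings codepoint by codepoint.
    // Returns a negative value, 0, or a positive value.
    static int compare(String a, String b) {
        int i = 0;
        int j = 0;
        while (i < a.length() && j < b.length()) {
            int ca = a.codePointAt(i);
            int cb = b.codePointAt(j);
            if (ca != cb) {
                return Integer.compare(ca, cb);
            }
            i += Character.charCount(ca);
            j += Character.charCount(cb);
        }
        // One string is a prefix of the other: the shorter sorts first.
        return Integer.compare(a.length() - i, b.length() - j);
    }
}
```

For U+FFFF against U+10000, compare returns a negative value, while String.compareTo() returns a positive one because 0xFFFF is greater than the high surrogate 0xD800.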
      • asAtomic

        public AtomicValue asAtomic()
        Get an atomic value that encapsulates this match key. Needed to support the collation-key() function.
        Specified by:
        asAtomic in interface AtomicMatchKey
        Returns:
        an atomic value that encapsulates this match key