QChar Class

The QChar class provides a 16-bit Unicode character.

Header: #include <QChar>
qmake: QT += core

Note: All functions in this class are reentrant.

Public Types

enum Category { Mark_NonSpacing, Mark_SpacingCombining, Mark_Enclosing, Number_DecimalDigit, Number_Letter, …, Symbol_Other }
enum Decomposition { NoDecomposition, Canonical, Circle, Compat, Final, …, Wide }
enum Direction { DirAL, DirAN, DirB, DirBN, DirCS, …, DirWS }
enum JoiningType { Joining_None, Joining_Causing, Joining_Dual, Joining_Right, Joining_Left, Joining_Transparent }
enum Script { Script_Unknown, Script_Inherited, Script_Common, Script_Adlam, Script_Ahom, …, Script_ZanabazarSquare }
enum SpecialCharacter { Null, Tabulation, LineFeed, FormFeed, CarriageReturn, …, LastValidCodePoint }
enum UnicodeVersion { Unicode_1_1, Unicode_2_0, Unicode_2_1_2, Unicode_3_0, Unicode_3_1, …, Unicode_Unassigned }

Detailed Description

In Qt, Unicode characters are 16-bit entities without any markup or structure. This class represents such an entity. It is lightweight, so it can be used everywhere. Most compilers treat it like an unsigned short.
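Because a QChar holds only 16 bits, a code point above U+FFFF (up to LastValidCodePoint, 0x10ffff) has to be stored as two 16-bit units, a surrogate pair. The following is a minimal sketch of that arithmetic in standard C++, so that it compiles without Qt. The helper names needsTwoUnits(), leadUnit(), trailUnit() and joinUnits() are hypothetical; Qt exposes equivalent static functions such as QChar::requiresSurrogates(), QChar::highSurrogate(), QChar::lowSurrogate() and QChar::surrogateToUcs4().

```cpp
#include <cstdint>

// Hypothetical helpers (not Qt API) showing the UTF-16 surrogate arithmetic.

// True if the code point does not fit into a single 16-bit unit.
constexpr bool needsTwoUnits(std::uint32_t ucs4)
{
    return ucs4 >= 0x10000;
}

// First (high) unit of the pair, in the range 0xD800..0xDBFF.
constexpr std::uint16_t leadUnit(std::uint32_t ucs4)
{
    return static_cast<std::uint16_t>((ucs4 >> 10) + 0xD7C0);
}

// Second (low) unit of the pair, in the range 0xDC00..0xDFFF.
constexpr std::uint16_t trailUnit(std::uint32_t ucs4)
{
    return static_cast<std::uint16_t>((ucs4 & 0x3FF) + 0xDC00);
}

// Recombine a high/low pair into the original code point.
constexpr std::uint32_t joinUnits(std::uint16_t high, std::uint16_t low)
{
    return ((static_cast<std::uint32_t>(high) - 0xD800) << 10)
         + (static_cast<std::uint32_t>(low) - 0xDC00) + 0x10000;
}
```

For example, U+1F600 splits into the units 0xD83D and 0xDE00, and joinUnits(0xD83D, 0xDE00) yields 0x1F600 again.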

QChar provides a full complement of testing/classification functions, conversions to and from other formats, conversion from composed to decomposed Unicode, and comparison and case conversion where these are well defined.

The classification functions include functions like those in the standard C++ header <cctype> (formerly <ctype.h>), but operating on the full range of Unicode characters, not just the ASCII range. They all return true if the character is a certain type of character; otherwise they return false. These classification functions are isNull() (returns true if the character is '\0'), isPrint() (true if the character is any sort of printable character, including whitespace), isPunct() (any sort of punctuation), isMark() (Unicode Mark), isLetter() (a letter), isNumber() (any sort of numeric character, not just 0-9), isLetterOrNumber(), and isDigit() (decimal digits). All of these are wrappers around category(), which returns the Unicode-defined category of each character. Some of these also calculate derived properties (for example, isSpace() returns true if the character is of category Separator_* or is an exceptional code point from the Other_Control category).
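The relationship between category() and the classification wrappers can be sketched in standard C++ as follows. This is an illustration only, not Qt's implementation: the Category enum below is a stand-in that repeats the values documented for QChar::Category, and the helper names (inRange(), isMarkCategory() and so on) are hypothetical.

```cpp
// Stand-in for QChar::Category so the sketch compiles without Qt.
// The implicit numbering matches the documented values 0..29.
enum Category {
    Mark_NonSpacing, Mark_SpacingCombining, Mark_Enclosing,       // 0-2
    Number_DecimalDigit, Number_Letter, Number_Other,             // 3-5
    Separator_Space, Separator_Line, Separator_Paragraph,         // 6-8
    Other_Control, Other_Format, Other_Surrogate,                 // 9-11
    Other_PrivateUse, Other_NotAssigned,                          // 12-13
    Letter_Uppercase, Letter_Lowercase, Letter_Titlecase,         // 14-16
    Letter_Modifier, Letter_Other,                                // 17-18
    Punctuation_Connector, Punctuation_Dash, Punctuation_Open,    // 19-21
    Punctuation_Close, Punctuation_InitialQuote,                  // 22-23
    Punctuation_FinalQuote, Punctuation_Other,                    // 24-25
    Symbol_Math, Symbol_Currency, Symbol_Modifier, Symbol_Other   // 26-29
};
static_assert(Symbol_Other == 29, "enum must match the documented values");

// Hypothetical helper: true if c lies in the closed range [first, last].
constexpr bool inRange(Category c, Category first, Category last)
{
    return c >= first && c <= last;
}

// Each wrapper only asks which group of categories the character is in.
constexpr bool isMarkCategory(Category c)   { return inRange(c, Mark_NonSpacing, Mark_Enclosing); }
constexpr bool isDigitCategory(Category c)  { return c == Number_DecimalDigit; }
constexpr bool isNumberCategory(Category c) { return inRange(c, Number_DecimalDigit, Number_Other); }
constexpr bool isLetterCategory(Category c) { return inRange(c, Letter_Uppercase, Letter_Other); }
constexpr bool isPunctCategory(Category c)  { return inRange(c, Punctuation_Connector, Punctuation_Other); }
constexpr bool isLetterOrNumberCategory(Category c)
{
    return isLetterCategory(c) || isNumberCategory(c);
}
```

Note how a character of category Number_Letter counts as a number but not as a digit, which is the distinction between isNumber() and isDigit().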

QChar also provides direction(), which indicates the "natural" writing direction of this character. The joiningType() function indicates how the character joins with its neighbors (needed mostly for Arabic or Syriac). Finally, hasMirrored() indicates whether the character needs to be mirrored when it is printed in its "unnatural" writing direction.

Composed Unicode characters (like "å") can be converted to decomposed Unicode ("a" followed by "ring above") by using decomposition().
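As a rough illustration of what a decomposition looks like, the sketch below maps three composed characters to their decomposed sequences in standard C++. The function decomposeSketch() and its three-entry table are hypothetical; the real mapping comes from the Unicode Character Database and covers far more characters.

```cpp
#include <vector>

// Hypothetical, Qt-free illustration of canonical decomposition.
// Only three code points are handled; everything else is reported as
// having no decomposition (an empty result).
std::vector<char16_t> decomposeSketch(char16_t c)
{
    switch (c) {
    case 0x00C5: return {0x0041, 0x030A}; // A with ring above -> "A" + combining ring above
    case 0x00E5: return {0x0061, 0x030A}; // a with ring above -> "a" + combining ring above
    case 0x00E9: return {0x0065, 0x0301}; // e with acute      -> "e" + combining acute accent
    default:     return {};               // not in this sketch's table
    }
}
```

A plain "a" (U+0061) has no decomposition, so decomposeSketch(u'a') returns an empty sequence.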

In Unicode, comparison is not necessarily possible and case conversion is very difficult at best. Unicode, covering the "entire" world, also includes most of the world's case and sorting problems. operator==() and friends will do comparison based purely on the numeric Unicode value (code point) of the characters, and toUpper() and toLower() will do case changes when the character has a well-defined uppercase/lowercase equivalent. For locale-dependent comparisons, use QString::localeAwareCompare().
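The effect of comparing purely by numeric value can be seen in a small standard C++ sketch that sorts 16-bit code units the same way. The function sortByCodeUnit() is a hypothetical helper, not part of Qt.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: sort 16-bit code units by their numeric value only,
// which is the ordering that a purely numeric comparison produces.
std::u16string sortByCodeUnit(std::u16string s)
{
    std::sort(s.begin(), s.end());
    return s;
}
```

Sorting the characters "a", "Z", "é" (U+00E9) and "b" this way gives "Z", "a", "b", "é": uppercase "Z" (0x5A) precedes lowercase "a" (0x61), and "é" comes after every unaccented ASCII letter. A locale-aware comparison would usually order them differently.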

The conversion functions include unicode() (to a scalar), toLatin1() (to scalar, but converts all non-Latin-1 characters to 0), row() (gives the Unicode row), cell() (gives the Unicode cell), digitValue() (gives the integer value of any of the numerous digit characters), and a host of constructors.
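The row, cell and Latin-1 conversions described above amount to simple operations on the 16-bit value. The following standard C++ sketch shows the idea; rowOf(), cellOf() and toLatin1Sketch() are hypothetical names and this is not Qt's implementation.

```cpp
#include <cstdint>

// Hypothetical, Qt-free helpers operating on one 16-bit code unit.

// The row is the most significant byte of the code unit.
constexpr std::uint8_t rowOf(char16_t u)
{
    return static_cast<std::uint8_t>(u >> 8);
}

// The cell is the least significant byte of the code unit.
constexpr std::uint8_t cellOf(char16_t u)
{
    return static_cast<std::uint8_t>(u & 0xFF);
}

// Characters outside Latin-1 (above 0xFF) are converted to 0.
constexpr char toLatin1Sketch(char16_t u)
{
    return u > 0xFF ? '\0' : static_cast<char>(u);
}
```

For the euro sign U+20AC, the row is 0x20, the cell is 0xAC, and the Latin-1 conversion yields 0 because the character is not part of Latin-1.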

QChar provides constructors and cast operators that make it easy to convert to and from traditional 8-bit chars. If you have defined QT_NO_CAST_FROM_ASCII and QT_NO_CAST_TO_ASCII, as explained in the QString documentation, you will need to explicitly call fromLatin1(), or use QLatin1Char, to construct a QChar from an 8-bit char, and you will need to call toLatin1() to get the 8-bit value back.

For more information see "About the Unicode Character Database".

See also Unicode, QString, and QLatin1Char.

Member Type Documentation

enum QChar::Category

This enum maps the Unicode character categories.

The following categories are normative in Unicode:

Constant	Value	Description
QChar::Mark_NonSpacing	0	Unicode class name Mn
QChar::Mark_SpacingCombining	1	Unicode class name Mc
QChar::Mark_Enclosing	2	Unicode class name Me
QChar::Number_DecimalDigit	3	Unicode class name Nd
QChar::Number_Letter	4	Unicode class name Nl
QChar::Number_Other	5	Unicode class name No
QChar::Separator_Space	6	Unicode class name Zs
QChar::Separator_Line	7	Unicode class name Zl
QChar::Separator_Paragraph	8	Unicode class name Zp
QChar::Other_Control	9	Unicode class name Cc
QChar::Other_Format	10	Unicode class name Cf
QChar::Other_Surrogate	11	Unicode class name Cs
QChar::Other_PrivateUse	12	Unicode class name Co
QChar::Other_NotAssigned	13	Unicode class name Cn

The following categories are informative in Unicode:

Constant	Value	Description
QChar::Letter_Uppercase	14	Unicode class name Lu
QChar::Letter_Lowercase	15	Unicode class name Ll
QChar::Letter_Titlecase	16	Unicode class name Lt
QChar::Letter_Modifier	17	Unicode class name Lm
QChar::Letter_Other	18	Unicode class name Lo
QChar::Punctuation_Connector	19	Unicode class name Pc
QChar::Punctuation_Dash	20	Unicode class name Pd
QChar::Punctuation_Open	21	Unicode class name Ps
QChar::Punctuation_Close	22	Unicode class name Pe
QChar::Punctuation_InitialQuote	23	Unicode class name Pi
QChar::Punctuation_FinalQuote	24	Unicode class name Pf
QChar::Punctuation_Other	25	Unicode class name Po
QChar::Symbol_Math	26	Unicode class name Sm
QChar::Symbol_Currency	27	Unicode class name Sc
QChar::Symbol_Modifier	28	Unicode class name Sk
QChar::Symbol_Other	29	Unicode class name So

See also category().

enum QChar::Decomposition

This enum type defines the Unicode decomposition attributes. See the Unicode Standard for a description of the values.

Constant	Value
QChar::NoDecomposition	0
QChar::Canonical	1
QChar::Circle	8
QChar::Compat	16
QChar::Final	6
QChar::Font	2
QChar::Fraction	17
QChar::Initial	4
QChar::Isolated	7
QChar::Medial	5
QChar::Narrow	13
QChar::NoBreak	3
QChar::Small	14
QChar::Square	15
QChar::Sub	10
QChar::Super	9
QChar::Vertical	11
QChar::Wide	12

See also decomposition().

enum QChar::Direction

This enum type defines the Unicode direction attributes. See the Unicode Standard for a description of the values.

In order to conform to C/C++ naming conventions "Dir" is prepended to the codes used in the Unicode Standard.

Constant	Value	Description
QChar::DirAL	13
QChar::DirAN	5
QChar::DirB	7
QChar::DirBN	18
QChar::DirCS	6
QChar::DirEN	2
QChar::DirES	3
QChar::DirET	4
QChar::DirFSI	21	Since Qt 5.3
QChar::DirL	0
QChar::DirLRE	11
QChar::DirLRI	19	Since Qt 5.3
QChar::DirLRO	12
QChar::DirNSM	17
QChar::DirON	10
QChar::DirPDF	16
QChar::DirPDI	22	Since Qt 5.3
QChar::DirR	1
QChar::DirRLE	14
QChar::DirRLI	20	Since Qt 5.3
QChar::DirRLO	15
QChar::DirS	8
QChar::DirWS	9

See also direction().

enum QChar::JoiningType

This enum was introduced in Qt 5.3.

This enum type defines the Unicode joining type attributes. See the Unicode Standard for a description of the values.

In order to conform to C/C++ naming conventions "Joining_" is prepended to the codes used in the Unicode Standard.

Constant	Value
QChar::Joining_None	0
QChar::Joining_Causing	1
QChar::Joining_Dual	2
QChar::Joining_Right	3
QChar::Joining_Left	4
QChar::Joining_Transparent	5

See also joiningType().

enum QChar::Script

This enum type defines the Unicode script property values.

For details about the Unicode script property values see Unicode Standard Annex #24.

In order to conform to C/C++ naming conventions "Script_" is prepended to the codes used in the Unicode Standard.

Constant	Value	Description
QChar::Script_Unknown	0	For unassigned, private-use, noncharacter, and surrogate code points.
QChar::Script_Inherited	1	For characters that may be used with multiple scripts and that inherit their script from the preceding characters. These include nonspacing marks, enclosing marks, and zero width joiner/non-joiner characters.
QChar::Script_Common	2	For characters that may be used with multiple scripts and that do not inherit their script from the preceding characters.
QChar::Script_Adlam	132	Since Qt 5.11
QChar::Script_Ahom	126	Since Qt 5.6
QChar::Script_AnatolianHieroglyphs	127	Since Qt 5.6
QChar::Script_Arabic	8
QChar::Script_Armenian	6
QChar::Script_Avestan	80
QChar::Script_Balinese	62
QChar::Script_Bamum	84
QChar::Script_BassaVah	104	Since Qt 5.5
QChar::Script_Batak	93
QChar::Script_Bengali	12
QChar::Script_Bhaiksuki	133	Since Qt 5.11
QChar::Script_Bopomofo	36
QChar::Script_Brahmi	94
QChar::Script_Braille	54
QChar::Script_Buginese	55
QChar::Script_Buhid	44
QChar::Script_CanadianAboriginal	29
QChar::Script_Carian	75
QChar::Script_CaucasianAlbanian	103	Since Qt 5.5
QChar::Script_Chakma	96
QChar::Script_Cham	77
QChar::Script_Cherokee	28
QChar::Script_Chorasmian	153	Since Qt 5.15
QChar::Script_Coptic	46
QChar::Script_Cuneiform	63
QChar::Script_Cypriot	53
QChar::Script_Cyrillic	5
QChar::Script_Deseret	41
QChar::Script_Devanagari	11
QChar::Script_DivesAkuru	154	Since Qt 5.15
QChar::Script_Dogra	142	Since Qt 5.15
QChar::Script_Duployan	105	Since Qt 5.5
QChar::Script_EgyptianHieroglyphs	81
QChar::Script_Elbasan	106	Since Qt 5.5
QChar::Script_Elymaic	149	Since Qt 5.15
QChar::Script_Ethiopic	27
QChar::Script_Georgian	25
QChar::Script_Glagolitic	57
QChar::Script_Gothic	40
QChar::Script_Grantha	107	Since Qt 5.5
QChar::Script_Greek	4
QChar::Script_Gujarati	14
QChar::Script_GunjalaGondi	143	Since Qt 5.15
QChar::Script_Gurmukhi	13
QChar::Script_Han	37
QChar::Script_Hangul	26
QChar::Script_HanifiRohingya	144	Since Qt 5.15
QChar::Script_Hanunoo	43
QChar::Script_Hatran	128	Since Qt 5.6
QChar::Script_Hebrew	7
QChar::Script_Hiragana	34
QChar::Script_ImperialAramaic	87
QChar::Script_InscriptionalPahlavi	90
QChar::Script_InscriptionalParthian	89
QChar::Script_Javanese	85
QChar::Script_Kaithi	92
QChar::Script_Kannada	18
QChar::Script_Katakana	35
QChar::Script_KayahLi	72
QChar::Script_Kharoshthi	61
QChar::Script_KhitanSmallScript	155	Since Qt 5.15
QChar::Script_Khmer	32
QChar::Script_Khojki	109	Since Qt 5.5
QChar::Script_Khudawadi	123	Since Qt 5.5
QChar::Script_Lao	22
QChar::Script_Latin	3
QChar::Script_Lepcha	68
QChar::Script_Limbu	47
QChar::Script_LinearA	110	Since Qt 5.5
QChar::Script_LinearB	49
QChar::Script_Lisu	83
QChar::Script_Lycian	74
QChar::Script_Lydian	76
QChar::Script_Mahajani	111	Since Qt 5.5
QChar::Script_Makasar	145	Since Qt 5.15
QChar::Script_Malayalam	19
QChar::Script_Mandaic	95
QChar::Script_Manichaean	112	Since Qt 5.5
QChar::Script_Marchen	134	Since Qt 5.11
QChar::Script_MasaramGondi	138	Since Qt 5.11
QChar::Script_Medefaidrin	146	Since Qt 5.15
QChar::Script_MeeteiMayek	86
QChar::Script_MendeKikakui	113	Since Qt 5.5
QChar::Script_MeroiticCursive	97
QChar::Script_MeroiticHieroglyphs	98
QChar::Script_Miao	99
QChar::Script_Modi	114	Since Qt 5.5
QChar::Script_Mongolian	33
QChar::Script_Mro	115	Since Qt 5.5
QChar::Script_Multani	129	Since Qt 5.6
QChar::Script_Myanmar	24
QChar::Script_Nabataean	117	Since Qt 5.5
QChar::Script_Nandinagari	150	Since Qt 5.15
QChar::Script_Newa	135	Since Qt 5.11
QChar::Script_NewTaiLue	56
QChar::Script_Nko	66
QChar::Script_Nushu	139	Since Qt 5.11
QChar::Script_NyiakengPuachueHmong	151	Since Qt 5.15
QChar::Script_Ogham	30
QChar::Script_OlChiki	69
QChar::Script_OldHungarian	130	Since Qt 5.6
QChar::Script_OldItalic	39
QChar::Script_OldNorthArabian	116	Since Qt 5.5
QChar::Script_OldPermic	120	Since Qt 5.5
QChar::Script_OldPersian	60
QChar::Script_OldSogdian	147	Since Qt 5.15
QChar::Script_OldSouthArabian	88
QChar::Script_OldTurkic	91
QChar::Script_Oriya	15
QChar::Script_Osage	136	Since Qt 5.11
QChar::Script_Osmanya	52
QChar::Script_PahawhHmong	108	Since Qt 5.5
QChar::Script_Palmyrene	118	Since Qt 5.5
QChar::Script_PauCinHau	119	Since Qt 5.5
QChar::Script_PhagsPa	65
QChar::Script_Phoenician	64
QChar::Script_PsalterPahlavi	121	Since Qt 5.5
QChar::Script_Rejang	73
QChar::Script_Runic	31
QChar::Script_Samaritan	82
QChar::Script_Saurashtra	71
QChar::Script_Sharada	100
QChar::Script_Shavian	51
QChar::Script_Siddham	122	Since Qt 5.5
QChar::Script_SignWriting	131	Since Qt 5.6
QChar::Script_Sinhala	20
QChar::Script_Sogdian	148	Since Qt 5.15
QChar::Script_SoraSompeng	101
QChar::Script_Soyombo	140	Since Qt 5.11
QChar::Script_Sundanese	67
QChar::Script_SylotiNagri	59
QChar::Script_Syriac	9
QChar::Script_Tagalog	42
QChar::Script_Tagbanwa	45
QChar::Script_TaiLe	48
QChar::Script_TaiTham	78
QChar::Script_TaiViet	79
QChar::Script_Takri	102
QChar::Script_Tamil	16
QChar::Script_Tangut	137	Since Qt 5.11
QChar::Script_Telugu	17
QChar::Script_Thaana	10
QChar::Script_Thai	21
QChar::Script_Tibetan	23
QChar::Script_Tifinagh	58
QChar::Script_Tirhuta	124	Since Qt 5.5
QChar::Script_Ugaritic	50
QChar::Script_Vai	70
QChar::Script_Wancho	152	Since Qt 5.15
QChar::Script_WarangCiti	125	Since Qt 5.5
QChar::Script_Yezidi	156	Since Qt 5.15
QChar::Script_Yi	38
QChar::Script_ZanabazarSquare	141	Since Qt 5.11

This enum was introduced or modified in Qt 5.1.

See also script().

enum QChar::SpecialCharacter

Constant	Value	Description
QChar::Null	0x0000	A QChar with this value isNull().
QChar::Tabulation	0x0009	Character tabulation.
QChar::LineFeed	0x000a
QChar::FormFeed	0x000c
QChar::CarriageReturn	0x000d
QChar::Space	0x0020
QChar::Nbsp	0x00a0	Non-breaking space.
QChar::SoftHyphen	0x00ad
QChar::ReplacementCharacter	0xfffd	The character shown when a font has no glyph for a certain codepoint. A special question mark character is often used. Codecs use this codepoint when input data cannot be represented in Unicode.
QChar::ObjectReplacementCharacter	0xfffc	Used to represent an object such as an image when such objects cannot be presented.
QChar::ByteOrderMark	0xfeff
QChar::ByteOrderSwapped	0xfffe
QChar::ParagraphSeparator	0x2029
QChar::LineSeparator	0x2028
QChar::LastValidCodePoint	0x10ffff
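The ByteOrderMark and ByteOrderSwapped values are related: a byte order mark (0xfeff) read with the wrong byte order shows up as 0xfffe. The following standard C++ sketch illustrates one way to use that when inspecting the first 16-bit unit of UTF-16 data. The ByteOrder enum and the functions detectByteOrder() and swapBytes() are hypothetical and not part of Qt.

```cpp
// Hypothetical result type for the sketch.
enum class ByteOrder { Native, Swapped, Unknown };

// Classify UTF-16 data by looking at its first 16-bit unit.
constexpr ByteOrder detectByteOrder(char16_t first)
{
    return first == 0xFEFF ? ByteOrder::Native    // byte order mark as expected
         : first == 0xFFFE ? ByteOrder::Swapped   // mark read with swapped bytes
         : ByteOrder::Unknown;                    // no mark present
}

// Exchange the two bytes of a 16-bit unit.
constexpr char16_t swapBytes(char16_t u)
{
    return static_cast<char16_t>((u << 8) | (u >> 8));
}
```

If detectByteOrder() reports Swapped, applying swapBytes() to every unit restores the intended values; swapBytes(0xFFFE), for instance, gives 0xFEFF.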

enum QChar::UnicodeVersion

Specifies which version of the Unicode standard introduced a certain character.

Constant	Value	Description
QChar::Unicode_1_1	1	Version 1.1
QChar::Unicode_2_0	2	Version 2.0
QChar::Unicode_2_1_2	3	Version 2.1.2
QChar::Unicode_3_0	4	Version 3.0
QChar::Unicode_3_1	5	Version 3.1
QChar::Unicode_3_2	6	Version 3.2
QChar::Unicode_4_0	7	Version 4.0
QChar::Unicode_4_1	8	Version 4.1
QChar::Unicode_5_0	9	Version 5.0
QChar::Unicode_5_1	10	Version 5.1
QChar::Unicode_5_2	11	Version 5.2
QChar::Unicode_6_0	12	Version 6.0
QChar::Unicode_6_1	13	Version 6.1
QChar::Unicode_6_2	14	Version 6.2
QChar::Unicode_6_3	15	Version 6.3 (since Qt 5.3)
QChar::Unicode_7_0	16	Version 7.0 (since Qt 5.5)
QChar::Unicode_8_0	17	Version 8.0 (since Qt 5.6)
QChar::Unicode_9_0	18	Version 9.0 (since Qt 5.11)
QChar::Unicode_10_0	19	Version 10.0 (since Qt 5.11)
QChar::Unicode_11_0	20	Version 11.0 (since Qt 5.15)
QChar::Unicode_12_0	21	Version 12.0 (since Qt 5.15)
QChar::Unicode_12_1	22	Version 12.1 (since Qt 5.15)
QChar::Unicode_13_0	23	Version 13.0 (since Qt 5.15)
QChar::Unicode_Unassigned	0	The value is not assigned to any character in the version of Unicode that Qt supports (see currentUnicodeVersion()).

See also unicodeVersion() and currentUnicodeVersion().