HeaderDoc::BlockParse

Introduction

Core parser routines and parser interfaces

Discussion

The BlockParse package is a group of functions that are used for parsing declarations in every supported language except Python. (Support functions in this package are used when parsing Python, but the actual parsing of Python declarations happens in the PythonParse package.)

The main entry points are blockParse (used for parsing a declaration and returning information about what was parsed) and blockParseOutside (used for taking both a declaration and a HeaderDoc comment and reconciling them into a HeaderDoc object descended from HeaderElement).

Other important functions are cpp_add (adds a C preprocessor macro from a parse tree), cpp_add_string (adds a C preprocessor macro from a string), and blockParseReturnState (used for handling APIs inside classes; it interprets a ParserState object hidden away inside the parse tree for the class, returning the same results that blockParse would have returned had it been called on the individual declaration).



Member Functions

blockParse

The core of HeaderDoc's parse engine.

blockParseOutside

The outer block parser.

blockParseReturnState

The magic box.

bracematching

Returns the closing token to match a given opening token.

buildCommentFromFields

Reconstructs a HeaderDoc comment from a field list.

changeAll

Changes an array of TypeHelper objects.

changeAllMatching

Changes matching members of an array of TypeHelper objects.

configureAccessControlStateForClass

Configures the access control state and optional/required state for methods and variables within a class based on the current language and class type.

cpp_add

Adds a C preprocessor macro to the parser.

cpp_add_cl

Adds a C preprocessor macro passed in with the -D flag on the command line.

cpp_add_string

Adds a C preprocessor macro to the parser.

cpp_argparse

Parses C preprocessor arguments.

cpp_preprocess

Performs C preprocessing on a single token.

cpp_remove

Removes a token from the C preprocessor macros list.

cpp_subparse

Used by cpp_argparse to recursively perform preprocessing on tokens within the actual arguments to a macro.

cppHashMerge

Merges CPP hashes and CPP argument hashes based on interpreting a stack of #if ... #else ... #elif ... #endif directives.

cppsupers

Scrapes the C++ superclass information from a declaration.

decomment

Strips comments out of a return type declaration.

defParmParse

Parses #define arguments.

empty_comment

Returns true if a field set is effectively empty.

findMatch

Searches an array of TypeHelper objects for a matching name.

getAndClearCPPHash

Returns the current C preprocessor hash tables and wipes them clean for the next header.

getLangAndSublangFromClassType

Returns the new language and language dialect based on the token that began a class declaration.

ignore

Returns whether a token should be ignored.

macroRegexpFromList

Returns a regular expression for searching for macro tokens derived from a hash table.

mergeComplexAvailability

Merges availability from multiple sources.

nameObjDump

Dumps an array of TypeHelper objects for debugging purposes.

nspaces

A legacy piece of code that generates spaces for the raw declaration.

objForType

Returns a HeaderDoc object (Var, Enum, Typedef, CPPClass, etc.) for a given set of type information.

objlink

Creates "see also" references between related APIs.

pbs

A piece of debug code that prints the brace stack.

peekmatch

Returns the closing token that matches the token at the top of the brace stack.

setCPPHashes

Sets a new CPP hash and CPP argument hash in place of the existing one.

spacefix

A legacy piece of code that adjusts spacing in the raw declaration.


blockParse


The core of HeaderDoc's parse engine.

Parameters
fullpath

The path to the file being parsed.

fileoffset

The line number where the current block begins. The line number printed is (fileoffset + inputCounter).

inputLinesRef

A reference to an array of code lines.

inputCounter

The offset within the array. This is added to fileoffset when printing the line number.

argparse

Set to 1 for parsing function arguments, enum constants, or struct fields, 2 for reparsing embedded HeaderDoc markup in a class, 0 otherwise.

This has the following effects:

  • Disables warnings when parsing arguments to avoid seeing them twice.

  • Disables C preprocessing (to avoid double-replacement).

  • Sets $parseTokens{assignmentwithcolon} = 2 in AppleScript.

  • Disables the handling of the of and in tokens and label keywords in AppleScript.

  • Forces the block parser to return only the outer name for a type (a la $HeaderDoc::outerNamesOnly) if argparse is 2.

ignoreref

A reference to a hash of tokens to ignore on all headers.

perheaderignoreref

A reference to a hash of tokens, generated from @ignore headerdoc comments.

perheaderignorefuncmacrosref

A reference to a hash of tokens, generated from @ignorefuncmacro headerdoc comments.

keywordhashref

A reference to a hash of keywords.

case_sensitive

Boolean value that controls whether keywords should be processed in a case-sensitive fashion.

lang

The language family to use in parsing. Overrides HeaderDoc::lang.

sublang

The language variant to use in parsing. Overrides HeaderDoc::sublang.

Return Value

Returns the array ($inputCounter, $declaration, $typelist, $namelist, $posstypes, $value, @pplStack, $returntype, $privateDeclaration, $treeTop, $simpleTDcontents, $availability) to the caller.

Discussion

Most of the variables used by this parser are things that are used for determining what type of declaration we just parsed. Such variables are stored as keys in the $parserState variable. For more information about these variables, see the documentation for the ParserState class.

This parser consists of four parsers running in parallel:

  • The namePending parser — looks for names a certain number of non-keyword tokens after keyword tokens like struct. Used mainly for data structures.

  • The startOfDec parser — looks for names based on the number of tokens since the start of the declaration (SOD/SODEC). Used for functions, etc.

  • The parameter list parser.

  • The callback name parser — uses parameter list parse results.

Local Variables
External variables
HeaderDoc::parseIfElse

Enables parsing of if/else statements. Not used by HeaderDoc; used by other tools that share this parser.

HeaderDoc::fileDebug

Set to 1 by the outer layers when the filename matches a particular filename. When 1, enables extensive debugging output, which is useful when you need to debug a single file.

HeaderDoc::lang

The programming language being parsed. This is deprecated, and is used only if you do not pass in a value for the lang parameter.

HeaderDoc::sublang

The programming language dialect being parsed (e.g. cpp for C++). This is deprecated, and is used only if you do not pass in a value for the sublang parameter.

HeaderDoc::AccessControlState

The current access control state (public, private, protected, etc.). When a permanent access control change (with a colon after it) occurs, this global variable is modified. After a declaration, the temporary (per-declaration) access control state is restored from this variable.

HeaderDoc::parsing_man_pages

Set to 1 if (in C) you want a function declaration to end after the closing parenthesis even if there is no trailing semicolon. Do NOT set this for normal parsing; it will break many typedef declarations and similar. This also enables some broken man page detection for deleting lines that say "or" and "and".

Key parser state variables
continue

Indicates that parsing should continue. Upon receiving a terminating token, this gets set to zero, and parsing ends at the end of the line.

continue_no_return

This gets high when we see an opening brace at the start of parsing. If the parser returned, you would get a bogus declaration, so instead, the parser reboots itself, starting parsing from scratch at the next line.

lang

The programming language being parsed. Set from HeaderDoc::lang.

sublang

The programming language dialect being parsed (e.g. cpp for C++). Set from HeaderDoc::sublang.

callback_typedef_and_name_on_one_line

Legacy formatting cruft variable.

inRegexp

Indicates whether the parser is in a regular expression. Values are:

  • 0 — Not in a regex (or in the tail of a regex).

  • 1 — In the second part of a two-part regexp, or the only part of a one-part regexp.

  • 2 — Between the two parts; only occurs if the separator is neither '|' nor '/'. Otherwise, this state gets skipped.

  • 3 — In the first part of a regular expression after the first separator.

  • 4 — Before the first separator. This state ends instantly unless there is a prefix.

inRegexpFirstPart

When parsing regular expressions, the contents of the right side are largely unparsed (no parenthesis or bracket interpolation, for example). Thus, it is important to know whether you are in the left side or the right side during parsing. Unfortunately, the inRegexp variable only indicates how many pieces remain in the regexp. Although this is vital information, it is insufficient for this purpose. For a single-part regexp, you would have to look for 1, but for a two-part regexp, a 1 would indicate the last part instead of the first. This variable solves that problem.

Values are:

  • 0 — Not in the first part of a regular expression.

  • 1 — In the first part of a regular expression.

  • 2 — Before the first part of a regular expression.

The value is 2 up to and including the leading symbol (e.g. /). It goes to zero upon reaching the symbol that terminates the first part of the regular expression (e.g. /).

inRegexpCharClass

In a regular expression character class, the first character behaves differently; a closing bracket as the first character in a character class is treated as a literal. (For example, []] is a character class containing only a close bracket.) To support this, the inRegexpCharClass has several values:

  • 0 — Not in a character class.

  • 1 — In a character class (not at the beginning).

  • 2 — The first character of a character class. (Reduced to 1 at end of token loop.)

  • 3 — Just saw the opening bracket. (Reduced to 2 at end of token loop.)

  • 4 — In a nested character class. (Reduced to 1 after closing :] mark.)

  • 5 — In a nested character class after possible trailing colon. (Reduced to 1 if next character is a right bracket.)

  • 6 — In a nested character class at possible trailing colon. (Reduced to 5 at end of token loop.)
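The countdown through states 3, 2, and 1 can be sketched roughly as follows. This is a minimal illustration, not HeaderDoc's code: char_class_end is a hypothetical helper that treats each character as a token, models only states 0 through 3, and ignores negation (^) and the nested-class states 4 through 6.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of HeaderDoc): returns the index of the
# bracket that closes the first character class found at or after $start,
# or -1 if the class is never closed.
sub char_class_end {
    my ($regexp, $start) = @_;
    my @chars = split(//, $regexp);
    my $state = 0;      # mirrors inRegexpCharClass values 0 through 3
    my $escaped = 0;    # true if the previous character was a backslash

    for my $i ($start .. $#chars) {
        my $c = $chars[$i];
        if ($state == 0) {
            $state = 3 if ($c eq "[");    # just saw the opening bracket
        } elsif ($escaped) {
            $escaped = 0;                 # escaped character is a literal
        } elsif ($c eq "\\") {
            $escaped = 1;
        } elsif ($c eq "]" && $state == 1) {
            return $i;                    # a bracket in state 2 is a literal
        }
        # End of the token loop: 3 is reduced to 2, and 2 to 1.
        $state-- if ($state > 1);
    }
    return -1;
}
```

Because the closing bracket is honored only in state 1, a bracket seen as the first character of the class (state 2) falls through as a literal, which is the behavior described for []].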

regexpNoInterpolate

Certain regular expression commands don't result in any parsing within them (e.g. the tr command). If set, this is equivalent to setting inRegexpFirstPart to 0.

leavingRegexp

In the trailing part of a regular expression.

inParen

Indicates the number of levels of nested parentheses the current token is within.

inPType

Indicates that the parser is currently processing a Pascal type declaration.

ppSkipOneToken

Used to tell the parameter parsing code to skip the end-of-comment character. (The value of inComment (in the parserState object) goes to 0 before that code, so without this, it would end up at the start of the next parameter.)

asConcat

In AppleScript parsing, set to 1 when a vertical pipe operator (|) is encountered to protect an identifier. Set to 0 when the next vertical pipe operator is encountered.
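The toggling behavior can be sketched as follows. The helper protected_identifiers is hypothetical and only illustrates a flag that flips on each vertical pipe; the real parser tracks this flag inside its token loop.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the identifiers protected by vertical pipes.
sub protected_identifiers {
    my ($line) = @_;
    my $as_concat = 0;    # plays the role of asConcat
    my @identifiers;
    my $current = "";

    # The capturing group makes split return the pipes as separate tokens.
    foreach my $part (split(/(\|)/, $line)) {
        if ($part eq "|") {
            if ($as_concat) {
                push(@identifiers, $current);    # closing pipe
                $current = "";
            }
            $as_concat = $as_concat ? 0 : 1;     # flip on every pipe
        } elsif ($as_concat) {
            $current .= $part;                   # inside a protected name
        }
    }
    return @identifiers;
}

my @names = protected_identifiers("set |my var| to |other var|");
```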

Parameter parsing
inPrivateParamTypes

In the cruft after a colon in a C++ method declaration.

Token variables
curline

The (input) line being parsed.

part

The current token being processed (from curline).

nextpart

The token after the token being processed (from curline).

treepart

In some cases, it is necessary to drop a token for formatting purposes but keep it in the parse tree. When this is needed, the treepart variable contains the original token, and the part variable contains a placeholder value (generally a space).

lastchar

This variable is rather odd. The last token in this string is the last character, but it may contain multiple characters. This should probably not be used in the parser, but it is used in a few spots.

lastnspart

The last non-space token encountered.

lasttoken

The last token encountered (though newlines and carriage returns may be replaced by a space).

Parser states and parser state insertion
parserState

The ParserState object used for storing most of the parser state variables.

sethollow

This variable is normally 0. It gets set to 1 to tell the hollow insertion code (at the bottom of the token loop) to set the value of the hollow variable (in the parserState object) to the tree node for the current token (which has not been created yet at the time this variable gets set).

hollowskip

Indicates that in spite of sethollow being set to 1, the current node is a bad place to insert the parser state because it is one of the access control tokens (e.g. public/private) or because it isn't really being inserted into the tree.

pushParserStateAfterToken

Normally 0. Set to 1 if the parser state should be pushed onto the stack after this token.

pushParserStateAfterWordToken

Normally 0. Set to 1 if the parser state should be pushed onto the stack after the next word token. May also be set to 2 if the parser state should be pushed at the word token after the next word token.

pushParserStateAtBrace

Normally 0. Set to 1 if the parser state should be pushed onto the stack after the next opening brace.

occPushParserStateOnWordTokenAfterNext

Normally 0. The name of this variable is slightly misleading. When used, the variable is initially set to 2. On the next word token (and only a word token), this variable is decremented to 1.

At this point, matching behavior changes, and the parser state is pushed at the first token that is either a word token, an at sign (@), a minus sign (-), or a plus sign (+).

Tree management
treeTop

The top of the current parse tree.

treeCur

The current position in the parse tree.

treeNest

Used to control whether the code at the bottom of the token loop should trigger a loop nesting after the current token.

  • 0 — tokens after this one should be siblings of this one.

  • 1 — tokens after this one should be nested as children of this node.

  • 2 — tokens after this one should be nested as children of this node and this node has already been inserted into the tree, so it should not be inserted again at the bottom of the loop.

treeSkip

This gets set to 1 if the current part should not be inserted into the parse tree (generally because it has already been inserted in some form during parsing).

treePopOnNewLine

This indicates that the current position in the parse tree should be popped from the treeStack stack after the next newline character.

trailingHide

Indicates that this is a token that follows a state change to a new state in which the seenBraces flag was previously set, and that this token should be treated as though seenBraces were still set. This flag is only supported in bits of code after where it is first set (in the right closing brace code).

Parser stacks
regexpStack

Stack for regular expression characters.

braceStack

Stack for brace tokens, including the left curly brace, the start-of-template (sotemplate) value, the left square bracket, the left parenthesis and the opening class marker for class markers that aren't followed by a left curly brace (Objective-C @interface, for example).

parsedParamParseStack

A stack containing values from parsedParamParse (in the parserState object). These are pushed and popped on curly braces, parentheses, etc. This is basically used for keeping track of which split character to use as the parser goes into deeper nesting levels (e.g. when dropping into a function pointer/callback inside a struct).

treeStack

A stack of parse trees. These are pushed and popped at various points during the parse process as braces, colons, parentheses, etc. are encountered. The behavior is controlled by the variables treeNest, treeSkip, treePopTwo (in parserState), and treePopOnNewLine.

Legacy junk variables
prespace

Temporary variable used for leading space during formatting.

prespaceadjust

Temporary variable used for leading space during formatting.

scratch

Temporary storage used during formatting.

curstring

The string currently being parsed. Was at one time used for checking for quoting, but no longer.

continuation

An obscure spacing workaround.

forcenobreak

An obscure spacing workaround.

setNoInsert

When set to a nonzero value, the noInsert variable in the ParseTree object created after the next open curly brace gets set to this value.


blockParseOutside


The outer block parser.

Parameters
apiOwner

The API owner object (class, header, etc.) into which new declarations should be inserted.

fullpath

The full (possibly relative) path to the file being parsed.

inFunction

Set to 1 if an @function comment preceded this declaration.

inUnknown

Set to 1 if a new-style comment (with no top-level HeaderDoc tag) preceded this declaration.

inTypedef

Set to 1 if an @typedef comment preceded this declaration.

inStruct

Set to 1 if an @struct comment preceded this declaration.

inEnum

Set to 1 if an @enum comment preceded this declaration.

inUnion

Set to 1 if an @union comment preceded this declaration.

inConstant

Set to 1 if an @constant or @const comment preceded this declaration.

inVar

Set to 1 if an @var comment preceded this declaration.

inMethod

Set to 1 if an @method comment preceded this declaration.

inPDefine

Set to 1 if an @define comment preceded this declaration.

Set to 2 if an @defineblock or @definedblock comment preceded this declaration.

inClass

Set to 1 if an @class comment preceded this declaration.

inInterface

Set to 1 if an @interface comment preceded this declaration.

blockOffset

The line number where the current block begins. The line number printed is (blockOffset + inputCounter).

categoryObjectsref

A reference to the initial array of category (HeaderDoc::ObjCCategory) objects. New category objects are added to this array.

classObjectsref

A reference to the initial array of class (HeaderDoc::CPPClass and HeaderDoc::ObjCClass) objects. New class objects are added to this array.

classType

The class type, based on what class was last parsed. Used when parsing fragments within a class. Legal values are intf, occ, occCat, or any value that is valid for sublang.

This is used to determine whether to treat the @method tag as an Objective-C method (HeaderDoc::Method) or as a normal method (HeaderDoc::Function).

cppAccessControlState

The new access control state (public, private, etc.). It is named cpp because at the time it was named, the only language that required it was C++ (where sublang = "cpp").

fieldsref

An array of fields returned from a call to stringToFields on a HeaderDoc comment.

functionGroup

The function group currently in effect.

headerObject

The header object that will eventually contain any objects produced.

inputCounter

The offset within the array. This is added to blockOffset when printing the line number.

inputlinesref

A reference to an array of code lines.

lang

The language family to use in parsing. Overrides HeaderDoc::lang.

nlines

The number of lines in inputlinesref.

preAtPart

Text before the initial @ in the preceding HeaderDoc comment. Contains the discussion in a new-style comment. Otherwise, contains whitespace.

xml_output

Set to 1 if output should be in XML format, else 0. This sets the outputformat value on new objects.

localDebug

Set to 1 to enable lots of general debug spew.

hangDebug

Set to 1 to enable lots of debug spew specific to tracking down infinite loops.

parmDebug

Set to 1 to enable lots of debug spew specific to parameter handling.

blockDebug

Set to 1 to debug block handling (both define blocks and blocks wrapped in C preprocessor macros).

subparse

Set to 1 to use subparse mode (handling a declaration extracted out of an existing parse tree).

subparseTree

The source parse tree in subparse mode. Ignored otherwise.

nodec

No longer used. Always pass zero.

allow_multi

Pass 1 to allow blocks to be created when a #if statement is found immediately after a HeaderDoc comment. Pass 0 to disable this feature.

subparseCommentTree

Used in block mode because subparseTree is empty by definition when the comment precedes the declaration.

sublang

The language variant to use in parsing. Overrides HeaderDoc::sublang, which previous versions of this function used. Currently optional.

hashtreecur

A HashObject instance that reflects the current position in the CPP hash tree. This is used by the parser to manage the C preprocessor hash tables in the presence of #if directives.

For a detailed explanation, see the documentation for the HashObject class.

Although this is optional, if you don't pass these correctly, you won't get support for #if/#else/#endif blocks.

hashtreeroot

A HashObject instance that represents the root of the CPP hash tree. This is used by the parser to manage the C preprocessor hash tables in the presence of #if directives.

For a detailed explanation, see the documentation for the HashObject class.

Although this is optional, if you don't pass these correctly, you won't get support for #if/#else/#endif blocks.

Return Value

Returns the array ($inputCounter, $cppAccessControlState, $classType, @classObjects, @categoryObjects, $blockOffset, $numcurlybraces, $foundMatch, $lang, $sublang).

inputCounter
The new value for inputCounter, adjusted for the lines that were parsed.
cppAccessControlState
The new access control state (public, private, etc.)
classType
The new value for class type, based on what class was last parsed. Used when parsing fragments within a class.
classObjects
A reference to an array of class objects (either CPPClass or ObjCClass).
categoryObjects
A reference to an array of category objects (ObjCCategory).
blockOffset
The new block offset (relative to inputCounter), adjusted for the lines already parsed.
numcurlybraces
The number of curly braces parsed. Not particularly useful anymore.
foundMatch
True if this pass found an object that matches the requested type (e.g. an @function comment matched a function or function-like macro).
lang
The programming language.
sublang
The sublanguage (which may change as new classes are parsed).
Discussion

This is the block parser API you should generally be calling if you are reusing this code for other purposes. It parses a declaration and returns an appropriate set of HeaderDoc objects. It includes all of the HeaderDoc name processing voodoo. More explanation of this code is probably in order, but there's no time right now.

Common mistakes:

Unlike blockParse, you must increment the input counter or you risk an infinite loop. (When looping with blockParse, you must not increment the input counter or you will skip lines.)

Local Variables
blockmode

Possible values:

  • 0 — Not in a block of any kind.

  • 1 — Got an @defineblock comment, but have not yet seen any #define macros.

  • 2 — Got an @defineblock comment, and have seen at least one #define macro.

  • 3 — Got a #if macro before the first declaration. Treat the following declarations as a group until the corresponding #endif macro.


blockParseReturnState


The magic box.

Parameters
parserState

The topmost parser state context object from blockParse.

treeTop

The top of the parser tree object from blockParse.

argparse

Set to 1 for parsing function arguments, enum constants, or struct fields, 2 for reparsing embedded HeaderDoc markup in a class, 0 otherwise. For more details, see blockParse.

declaration

The declaration returned by blockParse. If you pass an empty string, the declaration is obtained from the parse tree.

inPrivateParamTypes

Set to 1 if a C++ method with private parameters has been parsed and the public declaration needs to be restored.

publicDeclaration

The public declaration to restore.

lastACS

The access control state when the block parser finished, including any access control changes parsed this round.

forcedebug

Set to 1 to dump lots of debug information.

fileoffset

The base line number of the LineRange object containing this declaration. In subparse mode (reprocessing a declaration embedded in a class), this value gets overwritten with the correct value from the tree. Thus, this value is only relevant when this function is called from blockParse itself.

subparse

Set to 0 when this is called from blockParse. Set to 1 when reinterpreting a parse tree obtained from a declaration within a class.

definename

The token for #define. Used to determine whether to run a separate parser to extract the #define macro parameters.

inputCounter

The line number relative to the start of the LineRange object containing this declaration. In subparse mode (reprocessing a declaration embedded in a class), this value gets overwritten with the correct value from the tree. Thus, this value is only relevant when this function is called from blockParse itself.

Discussion

The block parser consists of a fairly complex state machine. Inside it lies a complex state object that requires further interpretation if you want to derive any useful information from it.

This code was originally part of the blockParse function itself. However, to improve class handling performance, the code was modified to reuse the previous class parse and extract information about each embedded method, etc. To support this, the parser state information needed to be stored in the parse tree and interpreted later. Thus, this portion was split off from the parser to interpret the structure when needed.

This function is called in three main places: at the end of blockParse, in the blockParseOutside function when reprocessing a parse tree, and at the end of pythonParse.

Local Variables
External variables
HeaderDoc::outerNamesOnly

Set by the -O flag.


bracematching


Returns the closing token to match a given opening token.

Parameters
tos

The opening symbol.

calledByParser

If 1, returns the original symbol and prints a warning message on error. If 0, returns an empty string on error (with no warning).

Discussion

This is used by peekmatch (and by other bits of code) to find the ending token that matches a starting token for braces, parentheses, and various other tokens that behave similarly.
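The two error behaviors selected by calledByParser can be sketched as follows. The token table here is a hypothetical subset chosen for illustration; the real function handles additional language-specific tokens, and the warning text is invented.

```perl
use strict;
use warnings;

# Hypothetical subset of opening tokens and their closing tokens.
my %closing_token_for = (
    "{" => "}",
    "(" => ")",
    "[" => "]",
);

sub bracematching_sketch {
    my ($tos, $called_by_parser) = @_;

    return $closing_token_for{$tos} if (exists $closing_token_for{$tos});

    if ($called_by_parser) {
        # Parser callers get a warning and the original symbol back.
        warn "No closing token known for \"$tos\".\n";
        return $tos;
    }
    return "";    # other callers get a silent empty string
}
```

Returning the original symbol in the parser case lets parsing continue with a usable token, while non-parser callers can test for the empty string to detect an unknown opener.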


buildCommentFromFields


Reconstructs a HeaderDoc comment from a field list.

Parameters
fields

An array of fields.

preAtPart

The part before the first @ sign (the discussion of a new-style HeaderDoc comment, or empty for an old-style HeaderDoc comment).

message

Content to use if the field set is empty.


changeAll


Changes an array of TypeHelper objects.

sub changeAll($$$$)
Parameters
arrayRef

A reference to the array to modify.

element

The key in each object to modify.

value

The desired value for the specified key.

append

If 0, replace the existing value with $value.

If 1, append $value to the existing value (space-delimited).
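The replace and append modes can be sketched as follows. This sketch assumes that TypeHelper objects behave like hash references; the key names NAME and TYPE and the function name change_all_sketch are hypothetical.

```perl
use strict;
use warnings;

# Sketch only: sets or appends $value under $element in every object.
sub change_all_sketch {
    my ($array_ref, $element, $value, $append) = @_;

    foreach my $obj (@{$array_ref}) {
        if ($append && defined($obj->{$element}) && length($obj->{$element})) {
            $obj->{$element} .= " " . $value;    # space-delimited append
        } else {
            $obj->{$element} = $value;           # replace (or first value)
        }
    }
}

my @helpers = (
    { NAME => "a", TYPE => "int" },
    { NAME => "b", TYPE => "char" },
);
change_all_sketch(\@helpers, "TYPE", "const", 1);    # "int const", "char const"
change_all_sketch(\@helpers, "NAME", "renamed", 0);  # both names replaced
```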


changeAllMatching


Changes matching members of an array of TypeHelper objects.

sub changeAllMatching($$$$$$)
Parameters
arrayRef

A reference to the array to modify.

matchingElement

The key in each object to match.

matchingValue

The value for that key that, if matching, indicates the object should be modified.

element

The key in each object to modify.

value

The desired value for the specified key.

append

If 0, replace the existing value with $value.

If 1, append $value to the existing value (space-delimited).
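A minimal sketch of the matching variant follows. It assumes hash-reference objects and exact string comparison for the match, neither of which is confirmed by this documentation; the key names and the function name are hypothetical.

```perl
use strict;
use warnings;

# Sketch only: modifies $element in objects whose $matching_element
# equals $matching_value, leaving all other objects untouched.
sub change_all_matching_sketch {
    my ($array_ref, $matching_element, $matching_value,
        $element, $value, $append) = @_;

    foreach my $obj (@{$array_ref}) {
        next unless (defined($obj->{$matching_element})
            && $obj->{$matching_element} eq $matching_value);

        if ($append && defined($obj->{$element}) && length($obj->{$element})) {
            $obj->{$element} .= " " . $value;
        } else {
            $obj->{$element} = $value;
        }
    }
}

my @helpers = (
    { NAME => "a", TYPE => "int" },
    { NAME => "b", TYPE => "char" },
);
# Only the object named "b" is changed.
change_all_matching_sketch(\@helpers, "NAME", "b", "TYPE", "unsigned", 0);
```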


configureAccessControlStateForClass


Configures the access control state and optional/required state for methods and variables within a class based on the current language and class type.


cpp_add


Adds a C preprocessor macro to the parser.

sub cpp_add($$)
Parameters
parseTree

The parse tree for the macro in question.

dropdeclaration

True if the declaration's contents should be omitted entirely.


cpp_add_cl


Adds a C preprocessor macro passed in with the -D flag on the command line.


cpp_add_string


Adds a C preprocessor macro to the parser.

sub cpp_add_string($$)
Parameters
string

The string form of the macro in question.

dropdeclaration

True if the declaration's contents should be omitted entirely.


cpp_argparse


Parses C preprocessor arguments.

Parameters
name

The name of the C preprocessor macro for which these arguments are the parameters.

linenum

The line number where this line appears. Used for determining which #define directives apply at that point in time.

arglistref

An array containing a parse tree for each actual parameter to this instance of the C preprocessor macro (in order of occurrence).


cpp_preprocess


Performs C preprocessing on a single token.

Parameters
part

The part to process.

linenum

The line number where the part appears. Used for determining which #define directives apply at that point in time.

Return Value

Returns the array ($newtoken, $hasargs, @arguments).

Discussion

Much of the actual processing happens in the caller. For simple substitutions, this returns the updated part. For function-like macros, this returns true for the hasargs value and also returns an array of argument names for use when processing the contents of the macros. In practice, that third value is never used.
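The shape of the return value can be sketched as follows. The two hash tables, their contents, and the function name are hypothetical; the sketch ignores linenum and assumes that the macro body is what comes back as the new token for function-like macros.

```perl
use strict;
use warnings;

# Hypothetical macro tables: replacement text and formal argument names.
my %cpp_hash     = ( MAX_NAME => "255", MIN => "((a) < (b) ? (a) : (b))" );
my %cpp_arg_hash = ( MIN => "a,b" );

# Sketch only: returns ($newtoken, $hasargs, @arguments) for one token.
sub cpp_preprocess_sketch {
    my ($part) = @_;

    # Tokens that are not macros pass through unchanged.
    return ($part, 0) unless (exists $cpp_hash{$part});

    if (exists $cpp_arg_hash{$part}) {
        # Function-like macro: report the argument names to the caller.
        my @arguments = split(/\s*,\s*/, $cpp_arg_hash{$part});
        return ($cpp_hash{$part}, 1, @arguments);
    }

    # Simple substitution.
    return ($cpp_hash{$part}, 0);
}

my ($newtoken, $hasargs, @arguments) = cpp_preprocess_sketch("MIN");
```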


cpp_remove


Removes a token from the C preprocessor macros list.

sub cpp_remove($)
Discussion

Used with availability macros so that the C preprocessor doesn't strip out the availability macro tokens before the parser sees them.


cpp_subparse


Used by cpp_argparse to recursively perform preprocessing on tokens within the actual arguments to a macro.

sub cpp_subparse($)
Parameters
tree

A parse tree for the actual argument in question.


cppHashMerge


Merges CPP hashes and CPP argument hashes based on interpreting a stack of #if ... #else ... #elif ... #endif directives.

Discussion

Used when processing blocks that might corrupt each other.

For example, if you have a #if ... #else ... #endif block in which the #if side is a #define that defines the name of a nonexistent function to an existing function and the #else side or #elif side is a real function definition for that same symbol name, the C preprocessor would dutifully turn that function declaration into a declaration for the other function. Oops.

Instead, upon entering such a block, the parser makes a backup of the C preprocessor's working hashes (which contain C preprocessing tokens and argument lists). This gives the preprocessor a base state for the block. Whenever a #else or #elif directive appears, the parser makes an intermediate copy of the hash coming out of that block, then resets the working hashes to the base state (prior to the initial #if). When the closing #endif directive appears, the parser merges all of the intermediate (per-block) hashes together and sets the working hashes to that value.

For detailed explanation, see the documentation for HashObject.
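The backup, reset, and merge sequence described above can be sketched as follows. This is a simplified illustration that tracks a single hash (the real code also tracks an argument hash), uses invented macro names, and assumes that later branches win when two branches define the same key.

```perl
use strict;
use warnings;

my %working  = ( BASE => "1" );    # working CPP hash before the #if
my %base     = %working;           # backup taken on entering the block
my @branches;                      # one intermediate copy per branch

# #if side: this branch defines a macro that renames a function.
$working{my_func} = "existing_func";

# #else: save what the branch produced, then reset to the base state so
# that my_func is not rewritten while parsing the #else side.
push(@branches, { %working });
%working = %base;

# #else side: this branch defines a different macro.
$working{OTHER} = "2";

# #endif: save the last branch, then merge all of the per-branch hashes.
push(@branches, { %working });
%working = ();
foreach my $branch (@branches) {
    %working = (%working, %{$branch});
}
```

After the merge, the working hash contains the macros from every branch, so declarations that follow the #endif see all of them.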


cppsupers


Scrapes the C++ superclass information from a declaration.

sub cppsupers 
Discussion

This function is also used for the Java implements information.


decomment


Strips comments out of a return type declaration.

sub decomment 
Discussion

This should only be used when handling return types. It does not handle strings or anything requiring actual parsing. It strictly rips out C comments (both single-line and standard).
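A minimal sketch of this kind of comment stripping follows. It is not the actual implementation; like the function described above, it does no real parsing, so comment markers inside string literals would also be removed.

```perl
use strict;
use warnings;

# Sketch only: strips C comments from a return type string.
sub decomment_sketch {
    my ($string) = @_;

    $string =~ s{/\*.*?\*/}{}gs;    # standard comments, possibly multi-line
    $string =~ s{//[^\n]*}{}g;      # single-line comments, up to the newline

    return $string;
}

my $cleaned = decomment_sketch("const /* in */ char * // out\n");
```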


defParmParse


Parses #define arguments.

Parameters
declaration

The text of the declaration to parse.

inputCounter

The line number (for debugging purposes).

definename

The name of the #define.

braceDebug

Set to 1 to print debug info.

fullpath

The header file path (for debugging purposes).


empty_comment


Returns true if a field set is effectively empty.


findMatch


Searches an array of TypeHelper objects for a matching name.

sub findMatch(
    $$$) 
Parameters
arrayRef

A reference to the array to search.

element

The key in each object to search.

value

The expected value of that key.
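The search can be pictured with the sketch below. It assumes that each object is a hash reference and that the first match is returned; both are assumptions, and find_match, NAME, and TYPE are hypothetical names rather than the real TypeHelper interface.

```perl
use strict;
use warnings;

# Hypothetical stand-in for findMatch. Assumption: each object is a hash
# reference, so "the key in each object" is an ordinary hash key.
sub find_match {
    my ($array_ref, $element, $value) = @_;
    for my $obj (@$array_ref) {
        return $obj if defined $obj->{$element} && $obj->{$element} eq $value;
    }
    return;    # no match: undef in scalar context
}

my @helpers = (
    { NAME => 'width',  TYPE => 'int' },
    { NAME => 'height', TYPE => 'int' },
);
my $hit = find_match(\@helpers, 'NAME', 'height');
print defined $hit ? "found $hit->{NAME}\n" : "no match\n";
```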


getAndClearCPPHash


Returns the current C preprocessor hash tables and wipes them clean for the next header.


getLangAndSublangFromClassType


Returns the new language and language dialect based on the token that began a class declaration.

Parameters
classtype

The class token.

Discussion

This function takes a class token (class, @class, @interface, etc.) and returns a lang and sublang value. Pretty trivial, but critical.


ignore


Returns whether a token should be ignored.

sub ignore 
Return Value

Returns the availability string if one is available. Otherwise, returns 0 if the token is a normal token, 1 if the token is in the ignore list and should be dropped during parsing, or 3 if the token represents an availability macro that has arguments and thus needs special handling.

Discussion

Checks the ignore list and availability macros.


macroRegexpFromList


Returns a regular expression for searching for macro tokens derived from a hash table.

Parameters
nameref

A reference to a hash in which the names of the macro tokens (e.g. #define, #if, #ifdef) are the hash keys.

onlywithpound

If 0, includes all tokens as-is.

If 1, includes only tokens that begin with a # sign and strips off the leading #, e.g. define instead of #define.

If 2, includes only tokens that do not begin with a # sign.
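The three onlywithpound modes can be illustrated with a short sketch. It assumes the result is a plain alternation of escaped tokens, which may not match the form of the real regular expression; macro_regexp and the nopound key are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical stand-in for macroRegexpFromList.
sub macro_regexp {
    my ($nameref, $onlywithpound) = @_;
    my @tokens;
    for my $token (sort keys %$nameref) {
        if ($onlywithpound == 1) {
            next unless $token =~ /^#/;        # keep only #-prefixed tokens
            (my $bare = $token) =~ s/^#//;     # and strip the leading #
            push @tokens, $bare;
        } elsif ($onlywithpound == 2) {
            next if $token =~ /^#/;            # keep only tokens without a #
            push @tokens, $token;
        } else {
            push @tokens, $token;              # mode 0: everything, as-is
        }
    }
    # Assumption: the result is a plain alternation of escaped tokens.
    return join('|', map { quotemeta($_) } @tokens);
}

# 'nopound' is a made-up token used only to exercise mode 2.
my %macros = ('#define' => 1, '#if' => 1, 'nopound' => 1);
print macro_regexp(\%macros, $_), "\n" for 0 .. 2;
```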


mergeComplexAvailability


Merges availability from multiple sources.

Parameters
orig_avail

The original availability derived from comments.

nodearrayref

A reference to the array of availability nodes that the parser generated from parse tokens.


nameObjDump


Dumps an array of TypeHelper objects for debugging purposes.

sub nameObjDump(
    $) 
Parameters
arrayRef

A reference to the array to dump.


nspaces


A legacy piece of code that generates spaces for the raw declaration.

sub nspaces 

Deprecated

This is going away eventually.


objForType


Returns a HeaderDoc object (Var, Enum, Typedef, CPPClass, etc.) for a given set of type information.

Parameters
curObj

IN: The current master object from blockParseOutside. This master object is generated based on the top-level tag in the HeaderDoc comment, if present. (If the comment has no top-level tag, this is a generic HeaderElement object.)

typedefname

IN: The typedefname parse token (obtained from a call to parseTokens).

typestring

IN: The typestring field out of the parser state object. See the parserState class for more information.

posstypes

IN: The posstypes field out of the parser state object. See the parserState class for more information.

outertype

IN: The primary type returned by the parser. For example, in the case of a typedef struct declaration, the outer type would be typedef.

curtype

IN: The type that we are searching for (as defined by the HeaderDoc comment).

classType

IN: The class type of the enclosing context.

OUT: The class type of the class just parsed; unchanged if the current declaration is not a class.

classKeyword

IN: The class keyword from the HeaderDoc comment (e.g. for an @class comment, the value is class). If unspecified, the value is auto.

declaration

IN: The raw declaration. For classes, passed to classTypeFromFieldAndBPinfo. Otherwise unused.

fieldref

IN: A reference to the array of fields from the HeaderDoc comment.

functionGroup

IN: The name of the current function group.

varIsConstant

IN: Probably doesn't matter.

OUT: Returns 1 if the variable declaration is a constant, else 0.

blockmode

IN: Nonzero if the parser is in a #define block. (For details, see blockParseOutside.)

inClass

IN: Nonzero if the HeaderDoc comment began with @class.

inInterface

IN: Nonzero if the HeaderDoc comment began with @interface.

inTypedef

IN: Nonzero if the HeaderDoc comment began with @typedef.

inStruct

IN: Nonzero if the HeaderDoc comment began with @struct.

fullpath

IN: The filename with leading path parts (for debugging purposes/warnings).

inputCounter

IN: The position within the current text block (for debugging purposes/warnings).

blockOffset

IN: The offset of the current text block from the start of the file (for debugging purposes/warnings).

lang

IN: The programming language of the file being parsed. Used to determine whether certain Pascal-specific keywords are active.

outerLocalDebug

IN: The value of localDebug in blockParseOutside. Set high for debugging.

functionContents

IN: The function body. Used to populate the object (if it's a function).

apiOwner

IN: The object into which this object will eventually be inserted. (Used to set the appropriate field in the object; this function does NOT add the object to the apiOwner object in any way.)

subparseInputCounter

IN: An override for the inputCounter field used when doing a subparse (handling a parse tree that has already been parsed once). Leave unset normally.

subparseBlockOffset

IN: An override for the blockOffset field used when doing a subparse (handling a parse tree that has already been parsed once). Leave unset normally.

extendsClass

IN: The superclass name (obtained from the block parser).

implementsClass

IN: The name of the class that this class implements (Java-specific, obtained from the block parser).

alwaysProcessComment

IN: Indicates that the processComment() call should be made on the resulting object even if curtype is UNKNOWN (meaning that the comment would normally get processed later in blockParseOutside). Used only in the case of a conversion request in blockParseOutside.

Return Value

Returns the array ($extra, $classType, $varIsConstant).

Discussion

This logic got so large that it was too much of a pain to maintain in two places in blockParseOutside, hence the separate function.


objlink


Creates "see also" references between related APIs.

sub objlink 
Parameters
listref

A reference to an array of objects to be cross-linked.

Discussion

When the parser sees, for example, an @typedef comment, followed by a struct, followed by a typedef, it treats these as related APIs and automatically associates the comment with both declarations. This function links those together at the end.


pbs


A piece of debug code that prints the brace stack.

sub pbs 
Discussion

This does nothing unless localDebug is set to 1 below. This should probably be revisited to key off something in the calling function.


peekmatch


Returns the closing token that matches the token at the top of the brace stack.

sub peekmatch 
Parameters
ref

A reference to the brace stack array.

fullpath

The path of the current header. Used for error messages.

linenum

The current line number within the header. Used for error messages.

Discussion

This is a variant of peek.
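A minimal sketch of the idea, assuming a small lookup table of opening and closing tokens. The table contents, the peek_match name, and the error handling are assumptions for illustration; the real function may recognize many more tokens and report errors differently.

```perl
use strict;
use warnings;

# Assumption: a small, illustrative table of opening => closing tokens.
my %closer = ('{' => '}', '(' => ')', '[' => ']');

# Hypothetical stand-in for peekmatch: looks at the top of the brace stack
# without popping it and returns the closing token it expects.
sub peek_match {
    my ($ref, $fullpath, $linenum) = @_;
    my $top = $ref->[-1];
    unless (defined $top && exists $closer{$top}) {
        warn "$fullpath:$linenum: no matching close token for top of brace stack\n";
        return '';
    }
    return $closer{$top};
}

my @brace_stack = ('{', '(');
print peek_match(\@brace_stack, 'example.h', 12), "\n";
```

Because the stack is only inspected, the caller can compare the next token against the expected closer before deciding whether to pop.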


setCPPHashes


Sets a new CPP hash and CPP argument hash in place of the existing one.

Parameters
cpphashref

A reference to the new CPP symbol hash.

cpparghashref

A reference to the new CPP argument hash.


spacefix


A legacy piece of code that adjusts spacing in the raw declaration.

sub spacefix 

Deprecated

This is going away eventually.


Member Data

CPP_ARG_HASH

C preprocessor argument hash for the current header.

CPP_HASH

C preprocessor token hash for the current header.

HeaderDoc::BlockParse::VERSION

The revision control revision number for this module.

HeaderDoc::hideIDLAttributes

Controls whether IDL attributes (e.g. [foo]) should be hidden in HTML output.

HeaderDoc::includeFunctionContents

Tells the block parser to include the function body in the parse tree.

HeaderDoc::inputCounterDebug

Global variable that turns on input counter debugging in various parts of the code.

HeaderDoc::OptionalOrRequired

Stores whether Objective-C protocol methods are optional or required.

HeaderDoc::useParmNameForUnlabeledParms

Change this to 0 if you want to hide the parameter name for unlabeled parameters (old behavior).


CPP_ARG_HASH


C preprocessor argument hash for the current header.

my %CPP_ARG_HASH = ();  
Discussion

The argument hash contains a mapping of C preprocessor token names to their argument lists. For example, if you have the following define:

             #define FOO(x, y) (x + (3 * y))
         

then the C preprocessor argument hash would contain a key called FOO with a (string) value of x, y.


CPP_HASH


C preprocessor token hash for the current header.

my %CPP_HASH = ();  
Discussion

The token hash contains a mapping of C preprocessor tokens to their values. For example, if you have the following define:

             #define FOO(x, y) (x + (3 * y))
         

then the C preprocessor token hash would contain a key called FOO with a (string) value of (x + (3 * y)).
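The relationship between the two hashes can be shown by splitting the example define by hand. This is only a sketch: the lowercase hash names and the single regular expression are hypothetical, and the real parser builds these entries from parse trees or strings through cpp_add and cpp_add_string.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for CPP_HASH and CPP_ARG_HASH.
my (%cpp_hash, %cpp_arg_hash);
my $define = '#define FOO(x, y) (x + (3 * y))';

# Split a function-like define into its name, argument list, and value.
if ($define =~ /^#define\s+(\w+)\(([^)]*)\)\s+(.*)$/) {
    my ($name, $args, $body) = ($1, $2, $3);
    $cpp_arg_hash{$name} = $args;    # argument hash: name => argument list
    $cpp_hash{$name}     = $body;    # token hash: name => replacement text
}

print "args:  $cpp_arg_hash{FOO}\n";
print "value: $cpp_hash{FOO}\n";
```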


HeaderDoc::BlockParse::VERSION


The revision control revision number for this module.

$HeaderDoc::BlockParse::VERSION = '$Revision: 1333753010 $';  
Discussion

In the git repository, this value contains a timestamp expressed as the number of seconds since January 1, 1970.
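As a sketch, the number can be pulled out of the keyword string and turned into a readable date. The variable names are hypothetical, and the conversion assumes the value counts seconds in UTC.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

my $version = '$Revision: 1333753010 $';

# Pull the digits out of the keyword string.
my ($seconds) = $version =~ /\$Revision:\s*(\d+)\s*\$/;

# Interpret them as seconds since January 1, 1970 (assumed UTC).
my $stamp = strftime('%Y-%m-%d %H:%M:%S', gmtime($seconds));
print "$seconds -> $stamp UTC\n";
```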


HeaderDoc::hideIDLAttributes


Controls whether IDL attributes (e.g. [foo]) should be hidden in HTML output.

$HeaderDoc::hideIDLAttributes = 1;  
Discussion

By default, these tokens are hidden. Because this switch is unlikely to ever be used by anyone, it can be set only by changing the default value in BlockParse.pm from 1 to 0.


HeaderDoc::includeFunctionContents


Tells the block parser to include the function body in the parse tree.

$HeaderDoc::includeFunctionContents = 0;  

HeaderDoc::inputCounterDebug


Global variable that turns on input counter debugging in various parts of the code.

$HeaderDoc::inputCounterDebug = 0;  

HeaderDoc::OptionalOrRequired


Stores whether Objective-C protocol methods are optional or required.

$HeaderDoc::OptionalOrRequired = "";  

HeaderDoc::useParmNameForUnlabeledParms


Change this to 0 if you want to hide the parameter name for unlabeled parameters (old behavior).

$HeaderDoc::useParmNameForUnlabeledParms = 1;  
Discussion

Historically, HeaderDoc left out unlabeled parameters in constructing Objective-C method names. If you want that behavior, change this value. This is not an end-user-tunable parameter (without changing the code) because it doesn't seem likely that many people will want to change this behavior.