resolveLinks.c
Introduction
Resolves links between HTML files generated by

Functions
addAttribute
Adds an attribute to an HTML node, deleting any preexisting attribute with the same name as it does so.
void addAttribute( xmlNode *node, char *attname, char *attstring)
Discussion
This differs from the core XML routines because it performs the matching in a case-insensitive fashion, as HTML does.

addXRef
Inserts a new cross-reference into the cross-reference tree for later use in link resolution.
void addXRef( xmlNode *node, char *filename)

addXRefFromLine
Takes a line from an xref cache file, splits it into its components, and adds the xref into the xref tree.
int addXRefFromLine( char *line, char *basepath, int force_absolute)
Discussion
Each line in an xref file is in the following format:
Note: This format is subject to change at any time. No forwards or backwards compatibility is guaranteed between different versions of the resolveLinks tool.

addXRefSub
Recursive tree walk subroutine used by
void addXRefSub( xrefnode_t newnode, xrefnode_t tree)

checkDoc
Debug code. Not actually used during normal operation.
void checkDoc( xmlNode *node, xmlNode *parent, xmlNode *prev, htmlDocPtr dp)

closelogfile
Closes the log file.
void closelogfile()

countfiles
Counts the number of files in a list of files.
int countfiles( fileref_t rethead)
Discussion
This function is used for debugging purposes.

db_free
Debug version of
void db_free( void *ptr)

db_malloc
Debug version of
void *db_malloc( size_t length)

EnableCoreDumps
Debug code. Not actually used during normal operation.
static int EnableCoreDumps( void)

findAnchor
Searches a parse tree for the first anchor.
xmlNode *findAnchor( xmlNode *node)

fix_xrefbyname_t_to_head
Walks the
void fix_xrefbyname_t_to_head( xrefbyname_t node)

fix_xrefnode_t_to_head
Walks the
void fix_xrefnode_t_to_head( xrefnode_t node) ;

fixpath
Fixes a path by removing double slashes and trailing slashes.
char *fixpath( char *name)
Discussion
NOTE: This function has a side effect. The string passed via the name argument is modified in place. Since it can only shrink, not grow, it wasn't worth the potential for memory leaks to avoid this side effect.

free_xrefbyname_t_tree
Releases the
void free_xrefbyname_t_tree( xrefbyname_t node)
Discussion
The resolveLinks tool runs some internal consistency checks on its AVL tree code at each launch. This ensures that those checks do not leak.

free_xrefbyname_t_tree_sub
Recursively releases nodes in the
void free_xrefbyname_t_tree_sub( xrefbyname_t node)
Discussion
The resolveLinks tool runs some internal consistency checks on its AVL tree code at each launch. This ensures that those checks do not leak.

free_xrefnode_t_tree
Releases the
void free_xrefnode_t_tree( xrefnode_t node)
Discussion
The resolveLinks tool runs some internal consistency checks on its AVL tree code at each launch. This ensures that those checks do not leak.
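
The in-place behavior noted in the fixpath discussion can be illustrated with a minimal sketch. This is not the tool's implementation: the name fixpath_sketch and the decision to keep a lone "/" intact are assumptions made for illustration. It shows why the rewrite is safe to do in place: the write position never gets ahead of the read position, so the string can only shrink.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for fixpath; not the tool's actual code. */
char *fixpath_sketch(char *name)
{
    char *src = name;   /* read position */
    char *dst = name;   /* write position, never ahead of src */
    size_t len;

    assert(name != NULL);

    /* Collapse each run of slashes into a single slash. */
    while (*src != '\0') {
        if (src[0] == '/' && src[1] == '/') {
            src++;      /* drop this slash; the next one is kept */
            continue;
        }
        *dst++ = *src++;
    }
    *dst = '\0';

    /* Strip trailing slashes, keeping a lone "/" (an assumption). */
    len = strlen(name);
    while (len > 1 && name[len - 1] == '/') {
        name[--len] = '\0';
    }
    return name;
}
```

Because the argument itself is modified, a caller that still needs the original string would pass a copy.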

free_xrefnode_t_tree_sub
Recursively releases nodes in the
void free_xrefnode_t_tree_sub( xrefnode_t node)
Discussion
The resolveLinks tool runs some internal consistency checks on its AVL tree code at each launch. This ensures that those checks do not leak.

gatherXRefs
Walks the parse tree of an HTML file, gathers all API reference anchors, and adds them into the xref tree with
void gatherXRefs( xmlNode *node, htmlDocPtr dp, char *filename)

getFiles
Returns a list of HTML files within a given directory.

getHashPos
Truncates a link at the first hash mark (
int getHashPos( char *a)

getNextXRefPart
Returns the next part of an API reference and null-terminates the current one.
char *getNextXRefPart( char *data)
Return Value
Returns the next part of the API reference (the first character after the slash), or
Discussion
This function null-terminates the current part of an API reference by replacing the next slash (
WARNING: Modifies the string pointed to by

getrefparts
Divides an API reference into some of its constituent parts.
refparts_t getrefparts( char *origref, int parts)
Return Value
Returns the parts in a dynamically allocated

has_target
Returns whether an HTML node has a valid
int has_target( xmlNode *node)

insertName
Inserts a name node into the tree used by
int insertName( xrefbyname_t node, xrefbyname_t tree)

installedpath
Calculates the final absolute filesystem path for a link destination.
char *installedpath( char *filename)
Return Value
Returns an allocated chunk of memory (allocated by
Discussion
This takes into account the base path, whether the path is a (known) URI or not, and the global installation path (if set, else the global base path). This is similar to

isCommentedAnchor
Checks a comment to see if it looks like a commented-out anchor with a logicalPath attribute.
int isCommentedAnchor( char *commentString)

isEndOfLinkRequest
Returns true if the text (from an HTML comment) represents the end of a link request
int isEndOfLinkRequest( char *text)

ishdindex
Returns whether the filename looks like a frameset-based
int ishdindex( char *filename)

ispartialmatch
Returns whether a known symbol marker is a suitable partial match for a partial symbol marker in a link request.
int ispartialmatch( char *partial_ref, char *complete_ref)
Discussion
The API reference convention for languages with classes allows for multiple methods with the same name and different parameters and/or return types. The unfortunate result of this is that it is impossible to programmatically generate a "best guess" link request that matches it precisely because there is no way to guess the return type or parameter signature.
This function looks at the link request and checks to see if it is of one of the symbol types that are affected by this limitation. If so, it appends a slash to the link request and checks to see if it is an exact match for the first part of the actual link destination (including
the slash). If it is, this function returns
This is used in

isStartOfLinkRequest
Returns true if the text (from an HTML comment) represents the start of a link request
int isStartOfLinkRequest( char *text)

isURI
Returns whether a filename is a recognized URI scheme.
int isURI( char *filename)

main
The main program body.
int main( int argc, char *argv[])
Discussion
This tool processes links (both in anchor form and in a commented-out form) and named anchors, rewriting link destinations to point to those anchors.
Debugging Note: If

makeurl
Takes a filename (absolute paths only) and an anchor within that file and concatenates them into a full URL.
char *makeurl( char *rawfilename, char *offset, int retarget, int relativeToInput)

malloccopypart
Returns a newly allocated string containing a range of characters from a source string.
char *malloccopypart( char *source, int start, int length)

matchingPathParts
Returns the number of leading path parts that match.
int matchingPathParts( char *a, char *b, int *isbelowme)
Discussion
This is used to determine which of multiple possible link destinations is the closest match (for resolving API reference symbol conflicts).

nameFromAppleRef
Returns the actual object name from an API reference (UID).
char *nameFromAppleRef( char *ref)

nodefilenamerealpath
Returns (and caches) the
char *nodefilenamerealpath( xrefnode_t node)

nodeline
Generates a line in a cross-reference file for the specified path, UID, and title.
char *nodeline( char *path, char *xref, char *title)
Discussion
Used by

nodelist
Recursively descends an HTML parse tree, returning a list of nodes matching a given name.
struct nodelistitem *nodelist( char *name, xmlNode *root)

nodelist_rec
The recursive tree walk subroutine used by the
void nodelist_rec( char *name, xmlNode *cur, struct nodelistitem **nl)

nodematching
Returns the first node whose element name matches a given node.
xmlNode *nodematching( char *name, xmlNode *cur, int recurse)
Discussion
If
If
This function is part of the xml2man library code and is not used in this tool.

nodepath
Calculates the absolute filesystem path for a relative path.
char *nodepath( xrefnode_t node)
Discussion
This differs from
It takes into account the base path specified for the folder containing the file in question and adds it back on. If the path is a (known) URI, it returns the value unmodified.

onSameLevel
Returns true if the two nodes are at the same level in the XML parse tree, else false.
int onSameLevel( xmlNode *a, xmlNode *b)

openlogfile
Opens the log file.
void openlogfile()

partsOfPath
Returns an array of indices to parts of a path.
int *partsOfPath( char *path)

print_statistics
Prints cumulative statistics about this run of the tool.
void print_statistics( void)

printNodeRange
Dumps information about a range of HTML nodes for debugging purposes.
int printNodeRange( xmlNode *start, xmlNode *end)

printNodeRangeSub
The recursive portion of
int printNodeRangeSub( xmlNode *start, xmlNode *end, int leading)

printusage
Prints command-line usage information.
void printusage()

propString
Returns a string containing the text representation of a node's properties.
char *propString( xmlNode *node)

proptext
Returns the text contents of a named attribute in a list of attributes.
char *proptext( char *name, struct _xmlAttr *prop)
Return Value
Returns an object that must be released with a call to

propval
Returns the value of a numeric property.
int propval( char *name, struct _xmlAttr *prop)
Discussion
This function is part of the xml2man library code and is not used in this tool.

quietErrors
A function that throws away parse errors without spewing warnings.
void quietErrors( void *userData, xmlErrorPtr error)

readXRefFile
Reads a cross-reference cache file.
int readXRefFile( char *filename, char *basepath, int force_absolute)
Discussion
This is intended to allow eventual incorporation of cross-references that do not live in the same directory (or even on the same machine). It is currently unused.

realpath_workaround
Workaround for a bug in
char *realpath_workaround( char *path, char *buffer)
Discussion
In Mac OS X v10.6 and earlier, there is a bug in realpath such that it can return a static (non-malloc) chunk of memory if you pass a path of "/" with a NULL buffer. (Oops.) This bug is specific to Mac OS X v10.6 and earlier, but the workaround is harmless enough that it isn't worth special-casing it.

rebalance_xrefbyname_t_tree
Rebalances the
void rebalance_xrefbyname_t_tree( xrefbyname_t node, xrefbyname_t fromnode)

rebalance_xrefnode_t_tree
Rebalances the
void rebalance_xrefnode_t_tree( xrefnode_t node, xrefnode_t fromnode)

redirect_stderr_to_null
Redirects standard error to
void redirect_stderr_to_null( void)
Discussion
Before calling this, you must first call

refByName
Searches the (binary) tree of names for a symbol matching that name.
xrefnode_t refByName( char *name, char *basepath)
Return Value
Returns the cross-reference node (which provides the list of candidate API reference markers) that matches the specified name.
Discussion
This is a fallback for when exact matching fails. It allows certain "shot in the dark" link requests to match against something less exact.

refByNameRec
Recursive portion of
xrefnode_t refByNameRec( char *name, xrefbyname_t pos, char *basepath)

refLangChange
Takes an API reference and rewrites it, changing the language to the language specified by
char *refLangChange( char *ref, char *lang)

refpartsfree
Frees a
void refpartsfree( refparts_t rp)
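
The calling order implied by the redirect_stderr_to_null discussion (set up first, then redirect, then restore) can be sketched with the POSIX dup and dup2 calls. All sketch_ names here are hypothetical, and the choice of /dev/null as the redirection target is an assumption; the tool's own functions may be written differently.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

static int sketch_nullfd = -1;    /* descriptor for the null device */
static int sketch_stderrfd = -1;  /* saved copy of the original stderr */

/* Must run once before any redirection: opens the null device and
   saves a duplicate of stderr so that it can be restored later. */
int sketch_setup_redirection(void)
{
    sketch_nullfd = open("/dev/null", O_WRONLY);  /* assumed target */
    sketch_stderrfd = dup(STDERR_FILENO);
    return (sketch_nullfd >= 0 && sketch_stderrfd >= 0) ? 0 : -1;
}

/* Point descriptor 2 at the null device. */
void sketch_redirect_stderr_to_null(void)
{
    assert(sketch_nullfd >= 0);   /* setup must have been called */
    fflush(stderr);               /* do not lose buffered output */
    dup2(sketch_nullfd, STDERR_FILENO);
}

/* Point descriptor 2 back at the saved original. */
void sketch_restore_stderr(void)
{
    assert(sketch_stderrfd >= 0);
    fflush(stderr);
    dup2(sketch_stderrfd, STDERR_FILENO);
}
```

A caller would run sketch_setup_redirection once, then bracket a noisy call (for example, an HTML parse that prints warnings) with the redirect and restore functions.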

refRefChange
Rewrites an API reference, changing the
char *refRefChange( char *ref, char *extref)

relpath
Generates a relative path from one file to another.
char *relpath( char *target, char *fromFile, int isDir)
Return Value
Returns the relative path in a newly allocated chunk of memory that must be released with
Discussion
The value returned is the relative path of the file specified by
Important: Both files must be either relative or absolute paths; mixing absolute and relative paths in the same call is not allowed.

resolve
Attempts to resolve a space-delimited list of cross-references, allowing language fallback (C++ to C, etc.), and returns the URL associated with it.
char *resolve( char *xref, char *filename, int *retarget, char **frametgt)

resolve_main
The main pthread body for link resolution
static void *resolve_main( void *ref)

resolve_mainsub
Processes all of the files assigned to a given thread.
int resolve_mainsub( int pos)
Discussion
This function takes a single argument (the thread number) and processes all of the files in the file list corresponding with that thread number.

resolveLinks
The actual link resolution function.
void resolveLinks( xmlNode *node, htmlDocPtr dp, char *filename, char *filename_rp)
Discussion
Walks the parse tree of an HTML document and does the actual link resolution, inserting href attributes where applicable, converting anchors that fail to resolve into comments, and converting resolvable commented links back into anchors.

restore_stderr
Restores standard error after a call to
void restore_stderr( void)

rotate_xrefbyname_t_leftleft
Handles rotations of the

rotate_xrefbyname_t_leftright
Handles rotations of the

rotate_xrefbyname_t_rightleft
Handles rotations of the

rotate_xrefbyname_t_rightright
Handles rotations of the

rotate_xrefnode_t_leftleft
Handles rotations of the

rotate_xrefnode_t_leftright
Handles rotations of the

rotate_xrefnode_t_rightleft
Handles rotations of the

rotate_xrefnode_t_rightright
Handles rotations of the

round4
Rounds up a number to the nearest 4-byte boundary.
int round4( int k)

safe_asprintf
Compatibility shim for Linux
int safe_asprintf( char **ret, const char *format, ...)
Discussion
Unlike the BSD implementation of
Because it is poor programming practice to accept a pointer from a system routine without checking to see if the routine returned NULL, this results in a rather messy pair of checks in order to get the desired behavior (one check for the return value, then another for the pointer).
This function works around that flaw in the Linux implementation by simply checking the return value, then setting the variable pointed to by

searchfree
Releases a
void searchfree( searchobj_t obj)

searchref
Searches for an xref in the xref tree, returning the filename and anchor within that file, concatenated using
searchobj_t searchref( char *xref, xrefnode_t tree, int retarget, char *basepath)

setup_redirection
Opens a file descriptor to
void setup_redirection( void)

tailcompare
Compares the end of a filename to a given substring.
int tailcompare( char *string, char *tail)
Return Value
Returns 1 on a match or 0 on failure.

test_xrefbyname_t_tree
Tests the
int test_xrefbyname_t_tree()
Discussion
The resolveLinks tool runs some internal consistency checks on its AVL tree code at each launch. This runs some of those tests.
This test is not implemented. However, the tree
does get abused significantly between the resolveLinks test suite and the built-in consistency tests in

test_xrefnode_t_tree
Tests the
int test_xrefnode_t_tree()
Discussion
The resolveLinks tool runs some internal consistency checks on its AVL tree code at each launch. This runs some of those tests.

textmatching
Returns the text contents of the first node whose element name matches a given node.
char *textmatching( char *name, xmlNode *node, int missing_ok, int recurse)
Discussion
This function is related to
If the node name is "text" or "comment", it returns the text of the node itself. Otherwise, it returns the text of the node's first child.
This function is part of the xml2man library code and is not used in this tool.

ts_basename
Thread-safe wrapper for
char *ts_basename( char *path)

ts_dirname
Thread-safe wrapper for
char *ts_dirname( char *path)

verify_xrefbyname_t_tree
Verifies the depth values at all levels of an
int verify_xrefbyname_t_tree( xrefbyname_t node)

verify_xrefbyname_t_tree_sub
Verifies the depth values at all levels of an
int verify_xrefbyname_t_tree_sub( xrefbyname_t node, int *errors)

verify_xrefnode_t_tree
Verifies the depth values at all levels of an
int verify_xrefnode_t_tree( xrefnode_t node)

verify_xrefnode_t_tree_sub
Verifies the depth values at all levels of an
int verify_xrefnode_t_tree_sub( xrefnode_t node, int *errors)

writeFile
Writes an HTML parse tree to a file on disk.
void writeFile( xmlNode *node, htmlDocPtr dp, char *filename)

writeFile_sub
Walks a tree recursively and writes an HTML parse tree to disk.
void writeFile_sub( xmlNode *node, htmlDocPtr dp, FILE *fp, int this_node_and_children_only)
Discussion
Used by

writeProps
Writes a node's properties to a file.
void writeProps( xmlNode *node, FILE *fp)

writeXRefFile
Writes a cross-reference cache file.
void writeXRefFile( char *filename, char *indir)
Discussion
A cross-reference cache file can be used to provide an initial seed for the resolver. It contains the information needed to create a link to any of the content processed (subject to passing the correct flags to strip off leading path components and prepend new leading path components). This allows you to resolve the contents of multiple independent projects iteratively.
For example, you could have two projects, A and B, each of which exports a cross-reference cache file, and also imports the cache file from the other project when you build it. By doing this, the two projects can know nothing about each other except for the location of the cache file and the relative path on the server, yet can still link to one another.

writeXRefFileSub
Walks the tree and writes cross-references to a cache file.
void writeXRefFileSub( xrefnode_t node, FILE *fp)
Discussion
Used by

xmlNodeForComment
Returns a parsed XML node for a commented-out link.
xmlNode *xmlNodeForComment( xmlChar *commentguts, htmlDocPtr parentDoc)

xmlNodeGetRawString
Gets the raw text (with entities intact) for a single node.
char *xmlNodeGetRawString( htmlDocPtr dp, xmlNode *node, int whatever)
Discussion
This is similar to

Typedefs
fileref_t
A node in a list that contains pointers to HTML parse tree nodes that matched a specific pattern.
typedef struct fileref {
    char name[MAXPATHLEN];
    struct fileref *next;
    struct fileref *threadnext;
} *fileref_t;
Discussion
Used by the

refparts_t
The parts of an API reference (UID).
typedef struct refparts {
    char *refpart;
    char *langpart;
    char *rest;
} *refparts_t;

searchobj_t
An object returned by
typedef struct searchobj {
    char *uri;
    xrefnode_t obj;
} *searchobj_t;

xrefbyname_t
A node in a tree of pointers to
typedef struct _xrefbyname {
    char *name;
    xrefnode_t node;
    int maxleftdepth, maxrightdepth;
    struct _xrefbyname *left, *right, *dup, *parent;
} *xrefbyname_t;
Discussion
Used for looking up certain machine-generated or ambiguously described symbols as a last resort. The

xrefnode_t
A node in a tree of nodes that each describe information about a possible link destination.
typedef struct _xrefnode {
    char *basepath;
    char *filename;
#ifdef RP_CACHE
    char *filename_rp;
#endif
    char *fullpath; //
    char *xref;
    char *title;
    int fromseed : 1;
    int force_absolute : 1;
    int maxleftdepth, maxrightdepth;
    struct _xrefnode *left, *right, *dup, *parent;
} *xrefnode_t;
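
The maxleftdepth and maxrightdepth fields in the tree typedefs above lend themselves to a simple consistency check of the kind the verify and fix-to-head functions describe. The sketch below is a guess at the idea, not the tool's code: struct sketchnode and the sketch_ functions are hypothetical, and it assumes each field stores the height of the corresponding subtree (0 for an empty one).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced node: only the fields needed for depth checks. */
struct sketchnode {
    int maxleftdepth, maxrightdepth;
    struct sketchnode *left, *right, *parent;
};

/* Height of a subtree: 0 when empty, else 1 + the taller child. */
static int sketch_depth(const struct sketchnode *n)
{
    int l, r;
    if (n == NULL) return 0;
    l = sketch_depth(n->left);
    r = sketch_depth(n->right);
    return 1 + (l > r ? l : r);
}

/* Balance factor computed from the cached fields. */
int sketch_balance(const struct sketchnode *n)
{
    assert(n != NULL);
    return n->maxleftdepth - n->maxrightdepth;
}

/* Count cached depth values that disagree with the real tree shape. */
int sketch_verify(const struct sketchnode *n)
{
    int errors = 0;
    if (n == NULL) return 0;
    if (n->maxleftdepth != sketch_depth(n->left)) errors++;
    if (n->maxrightdepth != sketch_depth(n->right)) errors++;
    return errors + sketch_verify(n->left) + sketch_verify(n->right);
}

/* Walk from a node up to the head, recomputing the cached values. */
void sketch_fix_to_head(struct sketchnode *n)
{
    for (; n != NULL; n = n->parent) {
        n->maxleftdepth = sketch_depth(n->left);
        n->maxrightdepth = sketch_depth(n->right);
    }
}
```

Recomputing heights from scratch keeps the sketch short; a production tree would update the cached values incrementally instead.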
Structs and Unions
nodelistitem
A node in a list that contains pointers to HTML parse tree nodes that matched a specific pattern.
struct nodelistitem {
    xmlNode *node;
    struct nodelistitem *next;
    struct nodelistitem *prev;
};
Discussion
Used by the

Globals
broken
The number of broken link requests (such as link requests without an end marker).
int broken = 0;

debug_relpath
Broken out storage for the relpath debug bit from the
int debug_relpath = 0;
Discussion
Used for debugging bugs in the code that calculates relative paths between two absolute paths.

debug_reparent
Broken out storage for the reparent debug bit from the
int debug_reparent = 0;
Discussion
Used for debugging bugs in the code that converts commented-out link requests into live links.

debugging
Storage for the
int debugging = 0;

duplicates
Number of duplicate API reference markers encountered.
int duplicates = 0;

extrefs
The array of external cross-reference seed files encountered during command-line argument handling.
char *extrefs[MAXEXTREFS];

filedebug
Broken out storage for the file debug bit from the
int filedebug = 0;

force_absolute_globally
Set to 1 if the
int force_absolute_globally = 0;

global_basepath
Storage for the
char *global_basepath = "/";

global_basepath_set
Set to 1 if the
int global_basepath_set = 0;

global_installedpath
Storage for the
char *global_installedpath = NULL;

global_option_disable_name_matching
Set to 1 to disable by-name matching.
int global_option_disable_name_matching = 0;
Discussion
By default, resolveLinks allows certain link requests to contain only a name and searches for all matching references. If you do not need this, you can significantly improve performance by disabling it.

global_option_disable_partial_ref_matching
Set to 1 to disable partial matching.
Discussion
By default, resolveLinks looks for partial matching in C++ symbols. If you do not need this, you can significantly improve performance on larger docs by disabling this feature.

inputDirectory
The input directory, as passed in on the command line.
char *inputDirectory = NULL;

logfile
A file pointer to a log file.
FILE *logfile = NULL;
Discussion
Points to a file in /tmp for storing a detailed log of link resolution failures, etc.
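
A log file of the kind the logfile entry describes could be created along the following lines. This is a sketch only: the sk_ names, the filename template, and the use of mkstemp to pick the name are assumptions and are not taken from the tool.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

static FILE *sk_logfile = NULL;  /* open log stream, or NULL */
static char *sk_logname = NULL;  /* heap copy of the chosen filename */

/* Create a uniquely named file in /tmp and open it for writing. */
int sk_openlogfile(void)
{
    char tmpl[] = "/tmp/resolvelinks.XXXXXX";  /* hypothetical template */
    int fd;

    assert(sk_logfile == NULL);  /* the sketch expects one log at a time */
    fd = mkstemp(tmpl);          /* fills in the XXXXXX and opens the file */
    if (fd < 0) return -1;

    sk_logfile = fdopen(fd, "w");
    if (sk_logfile == NULL) {
        close(fd);
        unlink(tmpl);
        return -1;
    }
    sk_logname = strdup(tmpl);
    return sk_logname != NULL ? 0 : -1;
}

/* Close the stream; the file stays on disk for later inspection. */
void sk_closelogfile(void)
{
    if (sk_logfile != NULL) {
        fclose(sk_logfile);
        sk_logfile = NULL;
    }
}
```

Keeping the generated name in a global lets the program report where the log was written once the run finishes.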

logname
The temporary filename for the log file, as generated by
char *logname = NULL;

namehead
The top of the tree of cross reference nodes sorted by the name of the symbol.
xrefbyname_t namehead = NULL;

nextrefs
The number of external cross-reference seed files encountered during command-line argument handling.
int nextrefs = 0;

nfiles
The total number of files processed.
int nfiles = 0;

nodehead
The top of the tree of cross reference nodes sorted by the API reference symbol marker.
xrefnode_t nodehead = NULL;

nodot
Set to 1 to disable printing the dots. This makes debugging in Instruments less traumatic.
int nodot = 0;

nopercent
Disables emission of percent success/failure.
int nopercent = 0;
Discussion
To prevent bogus test failures caused by differences in floating point representation, this flag disables that bit of the output. In the future, this flag may be overloaded for other purposes; its sole purpose is to tell resolveLinks that it is being run from within the test framework.

nthreads
Number of threads to use (from the
int nthreads = 2;

nullfd
File descriptor for
int nullfd = -1;
Discussion
Used by

plain
The number of normal (non-logicalPath-based) links.
int plain = 0;

progname
A global copy of
char *progname;

resolved
The number of successfully resolved links.
int resolved = 0;

seeding_in_progress
Set to 1 while new reference markers are being added from seed files, 0 otherwise.
int seeding_in_progress = 0;
Discussion
Used to set the

stderrfd
File descriptor for storing a copy of stderr.
int stderrfd = -1;
Discussion
Used by

thread_exit
Storage for the exit status of helper threads.
int thread_exit[MAXTHREADS];

thread_processed_files
Per-thread storage for an index into each thread's array of files to be processed.
int thread_processed_files[MAXTHREADS];

threadfiles
The array of files to be handled by each helper thread.
fileref_t threadfiles[MAXTHREADS];

unresolved
The number of unresolved links.
int unresolved = 0;

unresolved_explicit
The number of unresolved links that were not machine-generated by automated processes.
int unresolved_explicit = 0;

warn_each
Broken out storage for the relpath warn_each bit from the
int warn_each = 0;
Discussion
If set, prints the file name and other info for each unresolvable link request.

writedebug
Broken out storage for the write debug bit from the
int writedebug = 0;

Macro Definitions
FIX_TYPE_TO_TOP
When used, declares the function fix_X_to_head where X is a type name.
#define FIX_TYPE_TO_TOP(TYPE, TOP)
Discussion
Part of the AVL code. The constructed function walks from the current node up to the head, fixing any incorrect maximum depth values as it does so.

REBALANCE_TREE
When used, declares the function rebalance_X_tree where X is a type name.
#ifdef NOAVL
#define REBALANCE_TREE(TYPE)
#else
#define REBALANCE_TREE(TYPE)
#endif
Discussion
Part of the AVL code. The constructed function rebalances the AVL tree after insertion of the node

ROTATE_LEFTLEFT
When used, declares the function rotate_X_leftleft where X is a type name.
#define ROTATE_LEFTLEFT(TYPE, TOP)
Discussion
Part of the AVL code. The constructed function handles rotations in which the left child of the left child needs to be pulled up.

ROTATE_LEFTRIGHT
When used, declares the function rotate_X_leftright where X is a type name.
#define ROTATE_LEFTRIGHT(TYPE, TOP)
Discussion
Part of the AVL code. The constructed function handles rotations in which the right child of the left child needs to be pulled up.

ROTATE_RIGHTLEFT
When used, declares the function rotate_X_rightleft where X is a type name.
#define ROTATE_RIGHTLEFT(TYPE, TOP)
Discussion
Part of the AVL code. The constructed function handles rotations in which the left child of the right child needs to be pulled up.

ROTATE_RIGHTRIGHT
When used, declares the function rotate_X_rightright where X is a type name.
#define ROTATE_RIGHTRIGHT(TYPE, TOP)
Discussion
Part of the AVL code. The constructed function handles rotations in which the right child of the right child needs to be pulled up.

VERIFY_TREE
When used, declares the function verify_X_tree where X is a type name.
#define VERIFY_TREE(TYPE)
Discussion
Part of the AVL code. The constructed function verifies the depth values at all levels of an AVL tree and returns the number of errors.

VERIFY_TREE_SUB
When used, declares the function verify_X_tree_sub where X is a type name.
#ifdef NOAVL
#define VERIFY_TREE_SUB(TYPE)
#else
#define VERIFY_TREE_SUB(TYPE)
#endif
Discussion
Part of the AVL code. The constructed function verifies the depth values at all levels of an AVL tree.
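
The pattern these macros describe, a macro that stamps out a type-specific function, can be sketched with the preprocessor's ## operator. The macro body below is an illustrative guess rather than the tool's code: SK_ROTATE_LEFTLEFT, sk_tree_t, and sk_head are hypothetical names, TOP is assumed to be the variable that holds the head of the tree, and the depth fields are assumed to hold subtree heights.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node type with the same depth and link fields. */
typedef struct sk_tree {
    int maxleftdepth, maxrightdepth;
    struct sk_tree *left, *right, *parent;
} *sk_tree_t;

/* Defines rotate_<TYPE>_leftleft: a single right rotation that makes
   node's left child the subtree root, lifting the left-left grandchild
   one level. Depth values of ancestors are not touched here. */
#define SK_ROTATE_LEFTLEFT(TYPE, TOP)                                      \
void rotate_##TYPE##_leftleft(TYPE node)                                   \
{                                                                          \
    TYPE pivot = node->left;                                               \
    assert(pivot != NULL);                                                 \
    node->left = pivot->right;                                             \
    if (pivot->right != NULL) pivot->right->parent = node;                 \
    node->maxleftdepth = pivot->maxrightdepth;                             \
    pivot->parent = node->parent;                                          \
    if (node->parent == NULL) TOP = pivot;                                 \
    else if (node->parent->left == node) node->parent->left = pivot;       \
    else node->parent->right = pivot;                                      \
    pivot->right = node;                                                   \
    node->parent = pivot;                                                  \
    pivot->maxrightdepth = 1 + (node->maxleftdepth > node->maxrightdepth   \
                                ? node->maxleftdepth                       \
                                : node->maxrightdepth);                    \
}

sk_tree_t sk_head = NULL;  /* head of the sketch tree */

/* Expands to the definition of rotate_sk_tree_t_leftleft(). */
SK_ROTATE_LEFTLEFT(sk_tree_t, sk_head)
```

Generating the functions from a macro lets two differently typed trees share one algorithm without void pointers or casts, at the cost of code that is harder to step through in a debugger.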