Database API

This chapter provides information that enables programmers to build applications that make use of Source-Navigator's project databases. It covers the API, including available functions and their syntax.

This chapter assumes that you are familiar with the Tcl or C programming languages. For information about Tcl see Practical Programming in Tcl and Tk¹ and Tcl and the Tk Toolkit² , and for information about C refer to The C Programming Language³ .

Introduction

Source-Navigator project information is stored in a database . There is one database for each project and the database may contain one or more views. A view is a named subset of records; this might be the subset of all .h files, or the subset of all files in a given subdirectory. Views are a powerful and fast way to narrow down a large project without building multiple projects (see Views).

A database consists of 15 to 25 files; each file consists of a table that contains symbol and index information. Database files are regular files in the operating system and can be shared between UNIX and Windows operating systems.

When you create a project with Source-Navigator, a database is created under the projectdir /SNDB4 directory. By default, this location can be changed in the Project Preferences dialog when the project is created. See General Project Preferences of User's Guide for more information. Each database file is created with a filename starting with projectname , where projectname is the project name you chose and projectdir is the project directory you chose. One or more databases may exist under the same directory.

Each file is a table that contains specific symbol information and indexes. A database table can be accessed via an index, which is handled internally by the database, or sequentially. To use the Tcl API, you do not need additional header files, compilers, or libraries. All of the Tcl commands you need to work with a Source-Navigator project database are built into the Source-Navigator Tcl interpreter called hyper.

Notes:

In previous versions of Source-Navigator, fields in the database were separated by space characters. To accommodate filenames that contain spaces, the field separator has been changed to an internal value that can be referenced through the read-only Tcl variable sn_sep. See the scripts under Cross-Reference Tables for more information on this variable.

The first three lines of the scripts in this chapter are "boilerplate" text to run these scripts under UNIX. On Windows you must delete these lines and run the remainder from a file using SNsdk.

To run standalone Tcl scripts (see multicludes.tcl) under Windows NT, an execution command must be used. The usage of this command is:

SNsdk <script name> <arguments>

To run the multicludes.tcl example script on a project called test1 in the C:\test directory the syntax would be:

SNsdk multicludes.tcl C:\test test1

Structure

The database may be accessed by a set of Tcl commands. The primary goals in designing the API were performance and flexibility; for high-level queries, the Tcl language provides a powerful and flexible solution.

The database supports the btree and hash file formats. The btree format is a representation of a sorted, balanced tree structure. The hash format is an extensible, dynamic hashing scheme.

The dbopen, close, del, get, put, and seq routines are used to access the database. Optimal database tuning calls and parameters may also be configured in Tcl using the cachesize and pagesize properties (see dbopen) to give optimum performance in specific applications.

Views

Views define sets of files to include or exclude from queries. The hidden files list is stored in a view. For example, in a project where you have both database- and GUI-specific source files, you can hide the database file sql.c and save the remaining project information as a view. Hiding this view means the records with references to the file sql.c must be skipped. The following example lists a view table:

set db_view [dbopen nav_view SNDB4/cpl.2 RDONLY 0644 hash \
cachesize=300000]
puts stdout [join [ $db_view seq -data] \n]

The list would be as follows:

sql.c

Using Views

Views have to be specified when a table is opened using the dbopen command; the application does not have to be changed.

The example below uses a view. The view table in the following example (SNDB4/cpl.2) is created using the Editor. The first view in a project has the suffix .1, the second .2. There is no limit for views.


#!/bin/sh
# Replace $HOME/snavigator with the Source-Navigator 
# installation directory! \
exec $HOME/snavigator/bin/hyper "$0" "$@"
#
# Don't forget the backslash before exec!
#
set db_view [dbopen nav_view SNDB4/cpl.2 RDONLY 0644 \
    hash cachesize=300000]

set db_functions [dbopen nav_func SNDB4/cpl.fu RDONLY \
    0644 btree cachesize=20000 $db_view]

# Output the list of matches with newline characters
# after each.
puts [join [$db_functions seq -data] \n]

# Force our way out of this event-driven shell.
exit

Cross-Reference Tables

Source-Navigator stores the cross-reference information in two tables with the suffixes by and to. They contain the same information, only their key format differs. The to table keeps the Refers-to information and the by table the Referred-by information. The following script opens the to cross-reference table of a project and lists its contents.


#!/bin/sh
# Replace $HOME/snavigator with the Source-Navigator 
# installation directory! \
exec $HOME/snavigator/bin/hyper "$0" "$@"
#
# Don't forget the backslash before exec!
#
set db_functions [dbopen nav_func .sn/cpl.to RDONLY 0644 btree \
  {cachesize=200000}]

puts [join [$db_functions seq -data] \n]
exit

The above script would generate these results:

# len fu # strlen fu p 000019 c.c
# main fu # glob_var gv w 000010 c.c
# main fu # len fu p 000013 c.c
# main fu # printf fu p 000013 c.c
# main fu # strcpy fu p 000012 c.c

The hash (#) characters mean that the symbols do not belong to any classes. To fetch only the references in the main function, modify the fetch instruction:

$db_functions seq -data [append "#" $sn_sep "main" $sn_sep]

In the query, note that the first character of the key must be a hash (#) character because main is not a method but a function. To be sure that only the references of main are reported, a separator character ($sn_sep) has to be added to the string main. Without the separator, the query would report the references of all functions whose names begin with the string main.

An example of the result after the modification:

# main fu # glob_var gv w 000010 c.c
# main fu # len fu p 000013 c.c
# main fu # printf fu p 000013 c.c
# main fu # strcpy fu p 000012 c.c

If an application collects references to a function Referred-by, it is better to use the by database table.

The script below opens the Referred-by table and reports every reference to the global variable glob_var and the function len.


#!/bin/sh
# Replace $HOME/snavigator with the Source-Navigator 
# installation directory! \
exec $HOME/snavigator/bin/hyper "$0" "$@"
#
# Don't forget the backslash before exec!
#
global sn_sep
set db_functions [dbopen nav_func SNDB4/cpl.by RDONLY \
    0644 btree cachesize=200000]
# Output the cross-references which match the following
# criteria, with newline characters between all of them.
puts [join $db_functions seq -data \
    [append "#" $sn_sep "glob_var" $sn_sep gv] \n]
puts [join $db_functions seq -data  \
    [append "#" $sn_sep "len" $sn_sep fu] \n]

# Force our way out of this event-driven shell.
exit

The result will be:

# glob_var gv # main fu w 000010 c.c
# len fu # main fu p 000013 c.c

The output above indicates that the symbols glob_var and len are only used by the function main.

Tcl API Functions

The Source-Navigator database tables can be accessed by means of Tcl commands created using the dbopen command. The database API can access Source-Navigator tables regardless of whether Source-Navigator is running, or a project is using the target tables.

dbopen

The dbopen command opens a table for reading and/or writing. dbopen creates a new Tcl object (command) with the name dbobject.

SYNOPSIS

dbopen dbobject
 tableName access permission type ?openinfo?

`dbobject`	The desired name of the new command. `dbopen` will fail if there is already a command named `dbobject`.
`tableName`	The `dbopen` command opens a table for reading and/or writing. Files not intended for permanent storage on disk can be created by setting the `tableName` parameter to `NULL`.
`access`	The `access` argument is as specified for the Tcl `open` routine; however, only the `CREAT`, `EXCL`, `RDONLY`, `RDWR`, and `TRUNC` flags are significant. Refer to the Tcl documentation for further information on modes for opening files.
`permission`	If a new file is created as part of the process of opening it, `permission` (an integer) sets the permissions for the file. On UNIX, this is done in conjunction with the process' file mode creation mask.
`type`	The `type` argument must be either `btree` or `hash`. The `btree` format represents a sorted, balanced tree structure. The hash format is an extensible, dynamic hashing scheme.
`openinfo`	The `openinfo` optional argument must be a valid Tcl list and can be used to set type-specific properties of database tables. The syntax is as follows: `property1=value,property2=value,...\ propertyn=value` See Hash Table Properties and Btree Table Properties for valid values for this parameter.

Hash Table Properties
Hash table properties	Meaning
`bsize`	Defines the hash table bucket size, and is, by default, 256 bytes. It may be preferable to increase the page size for disk-resident tables and tables with large data items.
`factor`	Indicates a desired density within the hash table. It is an approximation of the number of keys allowed to accumulate in any one bucket, determining when the hash table grows or shrinks. The default value is 8.
`nelem`	An estimate of the final size of the hash table. If not set, or set too low, hash tables will expand gradually as keys are entered. Although a slight degradation in performance may be noticed, the default value is 1.
`cachesize`	Suggested maximum size, in bytes, of the memory cache. This value is only advisory, and the access method will allocate more memory rather than fail.

Btree Table Properties
Btree table properties	Meaning
`flags`	The `flags` value is specified by applying the bitwise OR operation with any of these values: `R_DUP` permit duplicate keys in the tree; that is, permit insertion if the key to be inserted already exists in the tree. The default behavior is to overwrite a matching key when inserting a new key or to fail if the `R_NOOVERWRITE` flag is specified. The `R_DUP` flag is overridden by the `R_NOOVERWRITE` flag, and if the `R_NOOVERWRITE` flag is specified, attempts to insert duplicate keys into the tree will fail. If the database contains duplicate keys, the order of retrieval of key/data pairs is undefined if the get method is used; however, seq method calls with the `R_CURSOR` flag set will always return the logical "first" of any group of duplicate keys.
`cachesize`	Suggested maximum size (in bytes) of the memory cache. This value is only advisory, and the access method will allocate more memory rather than fail. Since every search examines the root page of the tree, caching the most recently used pages substantially improves access time. In addition, physical writes are delayed as long as possible, so a moderate cache can reduce the number of I/O operations significantly. Using a cache increases (but only increases) the likelihood of corruption or lost data if the system crashes while a tree is being modified. If cachesize is 0 (no size is specified), a default cache is used.
`minkeypage`	The minimum number of keys stored on any single page. This value determines which keys are stored on overflow pages; that is, if a key or data item is longer than the pagesize divided by the `minkeypage` value, it is stored on overflow pages instead of in the page itself. If `minkeypage` is 0 (no minimum number of keys is specified), a value of 2 is used.
`psize`	Page size is the size (in bytes) of the pages used for nodes in the tree. The minimum page size is 512 bytes, and the maximum page size is 64K. If `psize` is 0 (no page size is specified), a page size is selected based on the underlying file system I/O block size.
`lorder`	The byte order for integers in the stored database metadata. The number should represent the order as an integer; for example, big endian order would be the number 4,321. If `lorder` is 0 (no order is specified), the current host order is used.

If the file already exists (and the TRUNC flag is not specified), the values specified for the parameter flags, lorder, and psize are ignored in favor of the values used when the tree was created.

Forward sequential scans of a tree are from the least key to the greatest.

Space freed up by deleting key/data pairs from the tree is never reclaimed, although it is normally made available for reuse. This means that the btree storage structure is grow-only. The only solution is to avoid excessive deletions, or to create a fresh tree periodically from a scan of an existing one (see dbcp).

The instruction below opens (for read-only) a btree database table using cachesize of 2 MB.

set db [dbopen nav_classes brow.cl RDONLY 0644 btree \
    {cachesize=2000000}]

The next example opens a hash table that is used later as a view.

set db_view [dbopen nav_view SNDB4/progs.1 RDONLY \
    0644 hash "cachesize=300000"]
set db_functions [dbopen nav_func SNDB4/progs.fu \
    RDONLY 0644 btree "cachesize=200000" $db_view]

Methods

With commands created by dbopen, these methods can be used: close, del, exclude, get, isempty, put, reopen, seq, and sync. In the following examples, dbobject represents the command returned from dbopen.

close

SYNOPSIS

dbobject close

This method flushes any cached information to disk, frees any allocated resources, and closes the underlying table. Since key/data pairs may be cached in memory, a database should be closed or synchronized before the application exits; this flushes the cache to disk. Failure to close or synchronize can cause data loss. As a final step, dbobject is also destroyed.

del

SYNOPSIS

dboject
 del ?-glob pattern? ?-beg pattern? ?-end pattern?
    ?-regexp pattern? ?-strstr pattern? ?key? ?flags?

This method removes key/data pairs from the table.

With glob, beg, end, regexp, and strstr switches (applicable only to the btree tables), a pattern must be specified to delete every record whose key matches the pattern. If key is specified, only those records are checked for deletion whose keys begin with key.

The parameter flags may be set to the value R_CURSOR. Delete the record referenced by the cursor. The cursor must have previously been initialized.

This method returns the number of the deleted records.

exclude

SYNOPSIS

dbobject exclude view

This method effectively sets the view of dbobject by excluding all symbols not in the view. View must be the name of an already existing object created using an earlier dbopen command.

get

SYNOPSIS

dbobject get key

If key is found, this method returns key and the associated data in separate Tcl lists; otherwise, it returns an empty string. For more information, see Fetching Tables.

isempty

SYNOPSIS

dbobject isempty

This method returns 1 if the table (with its current view) is empty, otherwise 0.

put

SYNOPSIS

dbobject put key data ?flags?

This method stores key/data pairs in the table.

The parameter flags may be set to one of these values:

`R_CURSOR`	Replace the key/data pair referenced by the cursor. The cursor must have previously been initialized.
`R_NOOVERWRITE`	Enter the new key/data pair only if the key did not previously exist.
`R_SETCURSOR`	Store the key/data pair, setting or initializing the position of the cursor to reference it. (Applicable only to the `btree` tables.)

The default behavior of the put routines is to enter the new key/data pair, replacing any previously existing key.

This method returns 0 on success, and 1 if the R_NOOVERWRITE flag was set and the key already exists in the table.

reopen

SYNOPSIS

dbobject reopen

This method closes and reopens the table. This flushes data to disk and resets any views.

seq

SYNOPSIS

dbobject seq option

See Fetching Tables.

sync

SYNOPSIS

dbobject sync

If the table is in memory only, this method has no effect and will always succeed. This method returns 0 on success.

Fetching Tables

Database tables can be fetched by the get and seq methods.

get returns only one record if the fully qualified key can be found. For more information, see the description of the get method above.

seq can be used to fetch tables sequentially. If the key argument is not given, the whole table is fetched (records to be retrieved can be filtered with patterns). key limits the records that should be fetched. seq may begin at any time, and the position of the cursor is not affected by calls to the del, get, put, or sync methods. Use the optional filters -end, -glob, -nocase, -regexp, -result_filter, and -strstr to limit the retrieved records.

Using key assures the best performance because it already limits the records that should be fetched while the filters limit the results only after the records have been fetched.

By fetching, the view (if any) assigned to the table while it was open is always processed.

Fetch Methods

Database tables can be fetched sequentially using seq or with indexed get access.

seq

SYNOPSIS

dbobject seq ?-columns column_list? ?-data? ?-end pattern?
    ?-first? ?-format format_string? ?-glob pattern? ?-key?
    ?-nocase pattern? ?-regexp expression? 
    ?-result_filter pattern? ?-strstr pattern? ?-uniq? ?-list?
    ?key? ?flags?

`-columns column_list`
	This switch determines the order and format in which the fields of the records are to be retrieved. `column_list` is a Tcl list with the format: `{{col_num1 ?format?} {col_num2 ?format?}... {column_numbern ?format?}}` where `format` may be one of the following:
	/	Only the remainder of the field after the last '`/`' character is retrieved.
	`#separator`	The remainder of the field after the last '`/`' character is retrieved, and then `separator` and the field contents, until the last '`/`'. If the field doesn't contain any '`/`', only the field contents are retrieved.
	`%`	The field is formatted with the `sprintf` C Programming function as if it were: `sprintf(result,format,field);`
	`&`	The formatted result up to the current field is formatted as if it were: `sprintf(new_result, format, current_result)`; The '`&`' character is interpreted as if it were '`%`'.
	`:biteq:bitor`	Records are retrieved only if the current field contents fulfill the condition as follows: `(biteq & field) == biteq && (bitor & field)` Use this operator only for numerical fields.
	`=value`	Records are retrieved only if the current field equals to `value`.
	`-min:max`	Records are retrieved only if the current field is between `min` and `max`. The field is in the range if it equals to `min` or `max`. This operator can be used only for decimal numerical fields.
	`<value`	Records are retrieved only if the current field is less than `value`. Use this operator only for numerical fields.
	`>value`	Records are retrieved only if the current field is greater than `value`. Use this operator only for numerical fields.
	`\|suffix`	A suffix is added to the contents of the field.
	If `format` does not match any of the special characters listed above, it will just be appended after the field contents.
	In `column_list`, every field can be defined only once.
`-data`	When this switch is specified, the fetched data of the record is not retrieved.
`-end pattern`	Only those records are fetched whose keys end with `pattern`.
`-first`	When this switch is specified, only the first record to match the conditions is retrieved.
`-format format_string`	If this option is specified, every retrieved record is additionally formatted with the `sprintf` C Programming function as if it were: `sprintf(new_result,format_string,result);`
`-glob pattern`	The record is retrieved only if its key matches the `glob` pattern.
`-key`	If this key is specified, the keys of the selected records are not retrieved.
`-nocase pattern`	The record is retrieved only if its key matches the non-case-sensitive `glob` pattern.
`-regexp expression`	The record is retrieved only if its key matches the regular expression.
`-result_filter pattern`	The record is retrieved only if the formatted fields match the `glob` pattern.
`-strstr pattern`	The record is fetched only if the pattern can be found in the key of the fetched record. (For more information, see the `strstr` C library function).
`-uniq`	If this option is specified, duplicated lists in the result returned are eliminated.
`-list`	If neither the `column`, `data`, or `key` switch is used, the keys and data parts of the fetched records are retrieved in separated sub-Tcl list. If this switch is specified, the key and data parts are retrieved in the same lists.
`flags`	The `flags` value may be set to one of these values:
	`R_CURSOR`	The data associated with the specified key is returned. This differs from the `get` routines in that it sets or initializes the cursor to the location of the key as well. (For the `btree` access method, the returned key is not necessarily an exact match for the specified key. The returned key is the smallest key greater than or equal to the specified key, permitting partial key matches and range searches.)
	`R_FIRST`	The first key/data pair of the database is returned, and the cursor is set or initialized to reference it.
	`R_LAST`	The last key/data pair of the database is returned, and the cursor is set or initialized to reference it (applicable only to the `btree` tables.)
	`R_NEXT`	Retrieve the key/data pair immediately after the cursor. If the cursor is not yet set, this is the same as the `R_FIRST` flag. This value is taken if a flag is not given.
	`R_PREV`	Retrieve the key/data pair immediately before the cursor. If the cursor is not yet set, this is the same as the `R_LAST` flag (applicable only to the `btree` tables). `R_LAST` and `R_PREV` are available only for the `btree` tables because they each imply that the keys have an inherent order that does not change.

C Programming API Functions

This section provides the equivalent of manual pages for the C Programming API functions.

Note

You must #include <db.h> when using the C API, and you need to link the final executable with libdb.a and libtcl8.1.a to locate the database library routines.

dbopen

SYNOPSIS

#include <db.h>

DB *dbopen(const char *file, int flags, int mode, DBTYPE type, const void *openinfo);

dbopen is the library interface to database files.

dbopen opens file for reading and/or writing. Files that are never intended to be preserved on disk may be created by setting the file parameter to NULL.

The flags and mode arguments are as specified to the open(2) routine; however, only the O_CREAT, O_EXCL, O_EXLOCK, O_NONBLOCK, O_RDONLY, O_RDWR, O_SHLOCK, and O_TRUNC flags are meaningful. (Note that opening a database file O_WRONLY is meaningless.)

The type argument is of type DBTYPE, defined in the <db.h> include file. It may be set to DB_BTREE or DB_HASH.

The openinfo argument is a pointer to an access method specific structure described in the access method's manual page. If openinfo is NULL, each access method will use defaults appropriate for the system and the access method.

The dbopen routine returns a pointer to a DB structure on success and NULL on error. The DB structure is defined in the <db.h> include file, and contains at least these fields:

typedef struct {
    DBTYPE type;
    int (*close)(const DB *db);
    int (*del)(const DB *db, const DBT *key, u_int flags);
    int (*fd)(const DB *db);
    int (*get)(const DB *db, DBT *key, DBT *data, 
        u_int flags);
    int (*put)(const DB *db, DBT *key, const DBT *data,
        u_int flags);
    int (*sync)(const DB *db, u_int flags);
    int (*seq)(const DB *db, DBT *key, DBT *data, 
        u_int flags);
} DB;

The elements of this structure specify the database type and a set of functions required to perform operations on a database of this type. These functions take a pointer to a structure as returned by dbopen, and sometimes one or more pointers to key/data structures and a flag value.

The structure elements are:

`type`	The type of the underlying access method (and file format).
`close`	A pointer to a routine to flush any cached information to disk, free any allocated resources, and close the underlying file(s). Since key/data pairs may be cached in memory, failing to sync the file with a `close` or `sync` function may result in inconsistent or lost information. `close` routines return -1 on error (setting `errno`), and 0 on success.
`del`	A pointer to a routine to remove key/data pairs from the database.
	The parameter `flags` may be set to:
	`R_CURSOR`	Delete the record referenced by the cursor. The cursor must have previously been initialized.
	`del` routines return -1 on error (setting `errno`), 0 on success, and 1 if the specified `key` was not in the file.
`fd`	A pointer to a routine that returns a file descriptor representative of the underlying database. A file descriptor referencing the same file is returned to all processes that call `dbopen` with the same filename. This file descriptor may be safely used as an argument to the `fcntl(2)` and `flock(2)` locking functions. The file descriptor is not necessarily associated with any of the underlying files used by the access method. No file descriptor is available for use in memory databases. `fd` routines return -1 on error (setting `errno`), and the file descriptor on success.
`get`	A pointer to a routine that is the interface for keyed retrieval from the database. The address and length of the data associated with the specified `key` are returned in the structure referenced by `data`. `get` routines return -1 on error (setting `errno`), 0 on success, and 1 if the `key` was not in the file.
`put`	A pointer to a routine to store key/data pairs in the database.
	The parameter `flags` may be set to one of these values:
	`R_CURSOR`	Replace the key/data pair referenced by the cursor. The cursor must have previously been initialized.
	`R_NOOVERWRITE`	Enter the new key/data pair only if the key does not previously exist.
	`R_SETCURSOR` is available only for the `DB_BTREE` access method because it implies that the keys have an inherent order that does not change.
	The default behavior of the `put` routines is to enter the new key/data pair, replacing any previously existing key.
	`put` routines return -1 on error (setting `errno`), 0 on success, and 1 if the `R_NOOVERWRITE` flag was set and the key already exists in the file.
`seq`	A pointer to a routine that is the interface for sequential retrieval from the database. The address and length of the key are returned in the structure referenced by `key`, and the address and length of the data are returned in the structure referenced by `data`.
	Sequential key/data pair retrieval may begin at any time, and the position of the "cursor" is not affected by calls to the `del`, `get`, `put`, or `sync` routines. Modifications to the database during a sequential scan are reflected in the scan, that is, records inserted behind the cursor are not returned while records inserted in front of the cursor are returned.
	The `flags` value must be set to one of these values:
	`R_CURSOR`	The data associated with the specified key is returned. This differs from the `get` routines in that it sets or initializes the cursor to the location of the key as well. (For the `DB_BTREE` access method, the returned key is not necessarily an exact match for the specified key. The returned key is the smallest key greater than or equal to the specified key, permitting partial key matches and range searches.)
	`R_FIRST`	The first key/data pair of the database is returned, and the cursor is set or initialized to reference it.
	`R_LAST`	The last key/data pair of the database is returned, and the cursor is set or initialized to reference it. (Applicable only to the `DB_BTREE` access method.)
	`R_NEXT`	Retrieve the key/data pair immediately after the cursor. If the cursor is not yet set, this is the same as the `R_FIRST` flag.
	`R_PREV`	Retrieve the key/data pair immediately before the cursor. If the cursor is not yet set, this is the same as the `R_LAST` flag. (Applicable only to the `DB_BTREE` access method.)
	`R_LAST` and `R_PREV` are available only for the `DB_BTREE` access method because they each imply that the keys have an inherent order that does not change.
	`seq` routines return -1 on error (setting `errno`), 0 on success, and 1 if there are no key/data pairs less than or greater than the specified or current key.
`sync`	A pointer to a routine to flush any cached information to disk. If the database is in memory only, the `sync` routine has no effect and always succeeds.
	The `flags` value may be set to this value:
	`sync` routines return -1 on error (setting `errno`), and 0 on success.

Key/Data pairs

Access to all file types is based on key/data pairs. Both keys and data are represented by this data structure:

typedef struct {
  void *data;
  size_t size;
} DBT;

The elements of the DBT (mnemonic for "data base thang") structure are defined as follows:

`data`	A pointer to a byte string.
`size`	The length of the byte string.

Key and data byte strings may reference strings of essentially unlimited length although any two of them must fit into available memory at the same time. It should be noted that the access methods provide no guarantees about byte string alignment.

Errors

The dbopen routine may fail and set errno for any of the errors specified for the library routines open(2) and malloc(3) or the following:

`[EFTYPE]`	A file is incorrectly formatted.
`[EINVAL]`	A parameter, such as hash function or pad byte, has been specified that is incompatible with the current file specification, or which is not meaningful for the function (for example, use of the cursor without prior initialization), or there is a mismatch between the version number of the file and the software.

The close routines may fail and set errno for any of the errors specified for the library routines close(2), read(2), write(2), free(3), or fsync(2).

The del, get, put, and seq routines may fail and set errno for any of the errors specified for the library routines read(2), write(2), free(3), or malloc(3).

The fd routines will fail and set errno to ENOENT for any of the errors specified in memory databases.

The sync routines may fail and set errno for any of the errors specified for the library routine fsync(2).

Limitations

None of the access methods provides any form of concurrent access, locking, or transactions.

btree Database Access Method

SYNOPSIS

#include <db.h>

The routine dbopen is the library interface to database files. One of the supported file formats is btree files. For a general description of the database access methods, see dbopen(3); this describes only the btree-specific information.

The btree data structure is a sorted, balanced tree structure storing associated key/data pairs.

The btree access method specific data structure provided to dbopen is defined in the <db.h> include file as follows:

typedef struct {
  u_long flags;
  u_int cachesize;
  int maxkeypage;
  int minkeypage;
  u_int psize;
  int (*compare)(const DBT *key1, const DBT *key2);
  size_t (*prefix)(const DBT *key1, const DBT *key2);
  int lorder;
} BTREEINFO;

The elements of this structure are as follows:

`flags`	The `flags` value is specified by applying the bitwise OR operation with any of the following values:
	`R_DUP`	Permit duplicate keys in the tree, that is, permit insertion if the key to be inserted already exists in the tree. The default behavior, as described in `dbopen(3)`, is to overwrite a matching key when inserting a new key or to fail if the `R_NOOVERWRITE` flag is specified. The `R_DUP` flag is overridden by the `R_NOOVERWRITE` flag, and if the `R_NOOVERWRITE` flag is specified, attempts to insert duplicate keys into the tree will fail. If the database contains duplicate keys, the order of retrieval of key/data pairs is undefined if the `get` routine is used; however, `seq` routine calls with the `R_CURSOR` flag set will always return the logical "first" of any group of duplicate keys.
`cachesize`	A suggested maximum size (in bytes) of the memory cache. This value is only advisory, and the access method will allocate more memory rather than fail. Since every search examines the root page of the tree, caching the most recently used pages substantially improves access time. In addition, physical writes are delayed as long as possible, so a moderate cache can reduce the number of I/O operations significantly. Using a cache slightly increases the likelihood of corruption or lost data if the system crashes while a tree is being modified. If `cachesize` is 0 (no size is specified), a default cache is used.
`maxkeypage`	The maximum number of keys stored on any single page. Not currently implemented.
`minkeypage`	The minimum number of keys stored on any single page. This value determines which keys are stored on overflow pages; that is, if a key or data item is longer than the pagesize divided by the `minkeypage` value, it is stored on overflow pages instead of in the page itself. If `minkeypage` is 0 (no minimum number of keys is specified), a value of 2 is used.
`psize`	`psize` is the size (in bytes) of the pages used for nodes in the tree. The minimum page size is 512 bytes and the maximum page size is 64K. If `psize` is 0 (no page size is specified) a page size is chosen based on the underlying file system I/O block size.
`compare`	`compare` is the key comparison function. It must return an integer less than, equal to, or greater than zero if the first key argument is considered to be respectively less than, equal to, or greater than the second key argument. The same comparison function must be used on a given tree every time it is opened. If `compare` is NULL (no comparison function is specified), the keys are compared lexically, with shorter keys considered less than longer keys.
`prefix`	`prefix` is the prefix comparison function. If specified, this routine must return the number of bytes of the second key argument, which is necessary to determine that it is greater than the first key argument. If the keys are equal, the key length should be returned. Although the usefulness of this routine is very data dependent, in some data sets it can produce significantly reduced tree sizes and search times. If `prefix` is NULL (no prefix function is specified) and no comparison function is specified, a default lexical comparison routine is used. If `prefix` is NULL and a comparison routine is specified, no prefix comparison is done.
`lorder`	The byte order for integers in the stored database metadata. The number should represent the order as an integer; for example, big endian order would be the number 4,321. If `lorder` is 0 (no order is specified), the current host order is used.

If the file already exists (and the O_TRUNC flag is not specified), the values specified for the parameters flags, lorder, and psize are ignored in favor of the values used when the tree was created.

Forward sequential scans of a tree are from the least key to the greatest.

Space freed up by deleting key/data pairs from the tree is never reclaimed, although it is normally made available for reuse. This means that the btree storage structure is grow-only. The only solutions are to avoid excessive deletions, or to create a fresh tree periodically from a scan of an existing one.

Searches, insertions, and deletions in a btree will all complete in O (ln N) where N is the average fill factor. Often, inserting ordered data into a btree results in a low fill factor. This implementation has been modified to make ordered insertion the best case, resulting in a much better-than-normal page fill factor.

Errors

The btree access method routines may fail and set errno for any of the errors specified for the library routine dbopen(3).

hash database access method

SYNOPSIS

#include <db.h>

The routine dbopen is the library interface to database files. One of the supported file formats is hash files. For a general description of the database access methods, see dbopen(3); this describes only the hash specific information.

The hash data structure is an extensible, dynamic hashing scheme.

The access method specific data structure provided to dbopen is defined in the <db.h> include file as follows:

typedef struct {
  u_int bsize;
  u_int ffactor;
  u_int nelem;
  u_int cachesize;
  u_int32_t (*hash)(const void *, size_t);
  int lorder;
} HASHINFO;

The elements of this structure are as follows:

`bsize`	`bsize` defines the hash table bucket size, and is, by default, 256 bytes. It may be preferable to increase the page size for disk-resident tables and tables with large data items.
`ffactor`	`ffactor` indicates a desired density within the hash table. It is an approximation of the number of keys allowed to accumulate in any one bucket, determining when the hash table grows or shrinks. The default value is 8.
`nelem`	`nelem` is an estimate of the final size of the hash table. If not set or set too low, hash tables will expand gracefully as keys are entered, although a slight performance degradation may be noticed. The default value is 1.
`cachesize`	A suggested maximum size, in bytes, of the memory cache. This value is only advisory, and the access method will allocate more memory rather than fail.
`hash`	`hash` is a user-defined hash function. Since no hash function performs equally well on all possible data, the user may find that the built-in hash function performs poorly on a particular data set. User-specified hash functions must take two arguments (a pointer to a byte string and a length) and return a 32-bit quantity to be used as the hash value.
`lorder`	The byte order for integers in the stored database metadata. The number should represent the order as an integer; for example, big endian order would be the number 4,321. If `lorder` is 0 (no order is specified), the current host order is used. If the file already exists, the specified value is ignored, and the value specified when the tree was created is used.

If the file already exists (and the O_TRUNC flag is not specified), the values specified for the parameters bsize, ffactor, lorder, and nelem are ignored and the values specified when the tree was created are used.

If a hash function is specified, hash_open attempts to determine if the hash function specified is the same as the one with which the database was created, and will fail if it is not.

Backwardly compatible interfaces to the routines described in dbm(3) and ndbm(3) are provided; however, these interfaces are not compatible with previous file formats.

Errors

The hash access method routines may fail and set errno for any of the errors specified for the library routine dbopen(3).

Simple Query Tool

The example below shows a simple query tool written in the C programming language. Note that it works only for btree tables and that views are not supported.

#include <db.h>

main(int argc,char *argv[])
{
  DB    *db;
  DBT   data,key;
  int   flag,len;
  char  *pattern;
  if (argc != 3)
  {
    printf("usage: %s database pattern\n",argv[0]);
    exit(1);
  }
  if (!(db = dbopen(argv[1],O_RDONLY,0644,DB_BTREE,NULL)))
  {
    fprintf(stderr,"Could not open \
        \"%s\",%s\n",argv[1],
      strerror(errno));
    exit(2);
  }
  pattern = argv[2];
  len = strlen(pattern);
  key.data = (void *)pattern;
  key.size = len;
  for(flag = R_FIRST;
    db->seq(db,&key,&data,flag) == 0 &&
    strncmp(key.data,pattern,len) == 0; flag = R_NEXT)
  {
    printf("key:  %s\n",key.data);
    printf("data: %s\n",data.data);
  }
  db->close(db);
  exit (0);
}

To compile and link you can use the following Makefile:

SDK=/export/home/tom/snavigator/sdk

CFLAGS=  -I$(SDK)/include
LIB=     -L$(SDK)/lib -lpafdb

dbqry:    dbqry.c
  $(CC) -o $@ $< $(LIB)

Database Table Structures

Source-Navigator stores information about source files in project (database) tables to assure high performance with flexible query possibilities.

With the exception of the project file (that is itself also a hash database table), every table normally relies on the SNDB4 sub-directory of the project and can be accessed like any other database table. The following example shows what the table structure of a project database looks like on a UNIX system. It was produced using the shell command ls -l SNDB4.

-rw-r--r--   1 user sys        16384 Aug 12 12:19 cpl.1
-rw-r--r--   1 user sys        16384 Aug 12 12:34 cpl.2
-rw-r--r--   1 user sys         8192 Aug 12 12:19 cpl.by
-rw-r--r--   1 user sys         8192 Aug 12 12:19 cpl.cl
-rw-r--r--   1 user sys        16384 Aug 12 12:19 cpl.f
-rw-r--r--   1 user sys         8192 Aug 12 12:19 cpl.fil
-rw-r--r--   1 user sys         8192 Aug 12 12:19 cpl.fu
-rw-r--r--   1 user sys         8192 Aug 12 12:19 cpl.gv
-rw-r--r--   1 user sys         8192 Aug 12 12:19 cpl.iv
-rw-r--r--   1 user sys         8192 Aug 12 12:19 cpl.md
-rw-r--r--   1 user sys         8192 Aug 12 12:19 cpl.mi
-rw-r--r--   1 user sys         8192 Aug 12 12:19 cpl.to

In the following table, the symbol ? represents the sn_sep separator character. See the scripts under Cross-Reference Tables for more information on this variable. Additionally, all of the following keys must be on one line.

The hash # character in class names means that the symbol does not belong to any classes, and semicolon (;) separates the key and data parts. Positions consist of line and column numbers separated by a comma (,).

Database Table Structures
File Suffix	Table Type	Table Description	Record Format
0	hash	Ignored words	word;#
1	hash	Default view	filename;#
2	hash	Second view	filename;#
by	btree	Referred by	ref-class?ref-symbol-name?ref-type?class?symbol?type?access?position?filename;\ {caller_argument_types}?{ref_argument_types}
cl	btree	Classes	name?start_position?filename;end_position?attributes?{}?{class template}?{}?{comment}
com	btree	Common blocks	name?start_position?filename;end_position?attributes?{}?{}?{}?{comment}
con	btree	Constants	name?start_position?filename;end_position?attributes?{dec_type}?{}?{}?{comment}
cov	btree	Common value	common-block?name?start_position?filename;end_position?attributes?{}?{}?{}?{comment}
e	btree	Enumerations	name?start_position?filename;end_position?attributes?{}?{}?{}?{comment}
ec	btree	Enum-constants	name?start_position?filename;end_position?attributes?{}?{}?{}?{comment}
f	btree	Project files	name;group?parsing-time?highlight-file
fd	btree	Function	name?start_position?filename;end_position?attributes?{ret_type}?{arg_types}?{arg_names}?\ {comment}
fil	btree	Symbols of files	filename?start_position?class?identifier?type;end_position?high_start_pos?high_end_pos?\ arg_types
fr	btree	Friends	name?start_position?filename;end_position?attributes?{ret_type}?{arg_types}?{arg_names}?\ {comment}
fu	btree	Functions	name?start_position?filename;end_position?attributes?{ret_type}?{arg_types}?{arg_names}?\ {comment}
gv	btree	Variables	name?position?filename;attributes?{type}?{template?parameter}?{comment}
in	btree	Inheritances	class?base-class?start_position?filename;end_position?attributes?{}?{class template}?\ {inheritance?template}?{comment}
iu	btree	Include	included_file?start_position?include_from_file;0.0?0x0?{}?{}?{}?{}
iv	btree	Instance variables	class?variable-name?start_position?filename;end_position?attributes?{type}?{}?{}?{comment}
lv	btree	Local variables	function?variable-name?start_position?filename;end_position?attributes?{}?{type}?{}?\ {comment}
ma	btree	Macros	name?start_position?filename;end_position?attributes?{}?{}?{}?{comment}
md	btree	Method definitions	class?name?start_position?filename;end_position?attributes?{ret_type}?{arg_types}?\ {arg_names}?{comment}
mi	btree	Method implementations	class?name?start_position?filename;end_position?attributes?{ret_type}?{arg_types}?\ {arg_names}?{comment}
rem	btree	Remarks	filename?position?class?method_or_function;comment
su	btree	Subroutines	name?position?filename;attributes?{}?{}?{comment}
t	btree	Typedefs	name?position?filename;attributes?{original}?{}?{comment}
to	btree	Refers to	class?symbol-name?type?ref-class?ref-symbol?ref-type?access?position?filename;\ {caller_argument_types}?{ref_argument_types}
un	btree	Unions	name?position?filename;attributes?{}?{}?{comment}

Database API Program Examples

The Tcl script below opens a table for a fictitious Source-Navigator project named pure.


#!/bin/sh
# Replace $HOME/snavigator with the Source-Navigator 
# installation directory! \
exec $HOME/snavigator/bin/hyper "$0" "$@"
#
# Don't forget the backslash before exec!
#
set db_functions [dbopen nav_func SNDB4/pure.fu RDONLY \
    0644 btree cachesize=200000]
# Output the list of matches with newline characters after
# each.
puts [join [$db_functions seq] \n]

This shell script produces the following result:

{chk 000011.012 chk.c} {17.1 0x8 {void} {int} {size} {}}
{fnc1 000019.012 chk.c} {33.1 0x8 {void} {int,int,char *}
    {i,size,str} {}}
{keys 000026.005 keybind.tcl} {28.1 0x0 {} {} {k} {}}
{main 000035.000 chk.c} {38.1 0x0 {int} {} {} {}}

Each record contains two Tcl lists: the first is the key part, the second is the data part. If -key, -data or -columns is used, the key and the data parts are always retrieved in separate Tcl lists.

If you use the -data switch in the fetch command (e.g. $db_functions seq -data ) in the example script above, only the key is fetched and the result is the following:

chk 000011.012 chk.c
fnc1 000019.012 chk.c
keys 000026.005 keybind.tcl
main 000035.000 chk.c

To restrict the result to functions whose names begin with main, use the following command:

$db_functions seq -data "main"

To fetch only the functions with the name main, you should add a blank to the key value:

$db_functions seq -data "main$sn_sep"

The -columns switch can be used to change the order of fields. The query below retrieves the name of the files, then the name of the functions and their positions:

$db_functions seq -data -columns [list 2 0 1]

The result is:

chk.c chk 000011.012
chk.c fnc1 000019.012
keybind.tcl keys 000026.005
chk.c main 000035.000

As described on Fetch Methods, the Tcl list following -columns contains sub-lists. The first element of a sub-list identifies (offset number beginning from 0) which field should be retrieved, and the second element is an optional format that controls formatting. In the example below, the "" appends a blank after every retrieved field. Sometimes it is useful to use \t (tab) instead of blanks. In Source-Navigator, types are often shown in parentheses.

For example, to obtain a listing of every function, indicated as (fu), the following command

$db_functions seq -data -columns [list {2 "(fu) "} 0 1]

produces the result:

chk.c(fu) chk 000011.012
chk.c(fu) fnc1 000019.012
keybind.tcl(fu) keys 000026.005
chk.c(fu) main 000035.000

The format characters :, <, >, and = can be used to make comparisons of database table field contents. When a condition is not true, the record is not retrieved.

For the purposes of this example, the source file where the following C++ class TEST is defined will be called test.cc .

class TEST {
private:
  int inm()
  {
    return 0;
  }
  outside(int x,int y);

public:
  static void copy(){}

protected:
  int var;
};

TEST::outside(int x,int y)
{
}

The Tcl script below queries the project for all of the public methods of classes that have been defined in test.cc. The value 4 used in the example maps to the constant PAF_PUBLIC from sn.h.

#!/bin/sh
# Replace $HOME/snavigator with the Source-Navigator 
# installation directory! \
exec $HOME/snavigator/bin/hyper "$0" "$@"
#
# Don't forget the backslash before exec!
#

set db_prefix .sn/doc.md
set db [dbopen methods_db $db_prefix RDONLY 0444 btree]

set res [$db seq -col [list 0 1 3 2 "5 :0:4"] -end "test.cc"]
# 4 -> 1 for private methods, 4 -> 5 for public and private methods

puts [join $res "\n"]

The script produces this output:

TEST copy test.cc 000009.014 0x200c

To query the private methods, change the value 4 to 1 (the value of PAF_PUBLIC).

This modified script produces this output:

TEST inm test.cc 000003.006 0x2001
TEST outside test.cc 000007.002 0x1

To query the public and private methods, use the value 5. This is the bitwise OR of the values for PAF_PRIVATE and PAF_PUBLIC.

To query all static (SN_STATIC) methods defined in test.cc, change the script as follows:

set res [$db seq -col [list 0 1 3 2 "5 :8:7"] -end "test.cc"]

To query every method between lines 7-9 in test.cc, make the following query:

set res [$db seq -col [list 0 1 3 \
"2 <10" "2 >6"] -end "test.cc"]

Database Application Examples

The Source-Navigator installation contains a number of larger examples for useful tools that can be quickly realized using the database API. They are located in .../share/sdk/api/tcl/database/examples.

Source-Navigator can assist in a wide variety of software engineering and re-engineering tasks and these examples tend to address the common scenario of bringing under control inherited bodies of source code that may be poorly written and poorly understood.

These examples are all written in the Tcl programming language. Some examples utilize the Tk toolkit. None of the examples require that Source-Navigator be running in order to use them. They work on the database directly using the database API provided by the hyper interpreter that comes with Source-Navigator.

Note

At the top of each script is a path to the interpreter that may need to be edited to locate hyper on your system.

Most of the examples require at least two command line arguments: the path to the Source-Navigator project directory and the name of the project you're interested in. More details can be found in the comment block at the top of each script file, and each script is quite heavily documented.

Scripts

The example scripts are described below.

multicludes.tcl

This tool reports on redundant header files. By reducing #include complexity in a source file, compilation time can be reduced. This tool locates simple duplication, whereby foo.c may include bar.h (e.g. #include "bar.h") and then bar.h again later. By optionally specifying a -transitive command line argument to the script, a more thorough search through the header file graph is performed, such that includes of stdio.h may be detected as unnecessary if another included header file includes it on your behalf.

diamonds.tcl

This tool locates multiple inheritance "diamonds" in the class hierarchy of a project written in an object-oriented language like C++. In his book, Effective C++ ⁴ , Scott Myers points out the dangers associated with class hierarchies in which two classes derived from the same superclass are inherited by a fourth derived-most class. Diamonds are universally considered to be poor C++ programming practice and this tool can locate them in a Source-Navigator project.

call-freq.tk

This tool plots the caller/callee frequencies for all functions and methods in a project. Functions appearing to have called many functions or that are called by many functions may be ones requiring coverage testing, additional documentation, optimization, etc.

Each function is represented as a point on a graph. Clicking on a point opens a list box showing the name of the function and the caller/callee statistics.

clobber.tcl

This tool shows the names of all functions/methods in a project that modify a particular global variable.

constants.tcl

This tool identifies global variables in projects which are accessed as read-only objects. These variables are therefore candidates for becoming constants.

unimp-methods.tcl

This tool locates class method definitions that are surplus to a class (i.e. for which there is no method implementation). This tool is not always accurate as it will also suggest that methods that are defined inline are not implemented when they actually are.

unused.tcl

This tool determines where unused global variables exist in a project.

1. Welch, Brent B. 1997. Practical Programming in Tcl and Tk. 2nd ed. ISBN 0-13-616830-2. Return to text

2. Ousterhout, John K. 1994. Tcl and the Tk Toolkit. ISBN 0-201-63337-X. Return to text

3. Kernighan, Brian W., and Dennis M. Ritchie. 1988. The C Programming Language. 2nd ed. ISBN 0-13-110362-8. Return to text

4. Meyers, Scott. 1997. Effective C++: 50 Specific Ways to Improve Your Programs and Designs. 2nd ed. ISBN 0-20-192488-9. Return to text

Contents