DBZ(3Z)


NAME
       dbminit,  fetch, store, dbmclose - somewhat dbm-compatible
       database routines
       dbzfresh, dbzagain, dbzfetch, dbzstore - database routines
       dbzsync,   dbzsize,   dbzincore,   dbzcancel,  dbzdebug  -
       database routines

SYNOPSIS
       #include <<dbz.h>>

       dbminit(base)
       char *base;

       datum
       fetch(key)
       datum key;

       store(key, value)
       datum key;
       datum value;

       dbmclose()

       dbzfresh(base, size, fieldsep, cmap, tagmask)
       char *base;
       long size;
       int fieldsep;
       int cmap;
       long tagmask;

       dbzagain(base, oldbase)
       char *base;
       char *oldbase;

       datum
       dbzfetch(key)
       datum key;

       dbzstore(key, value)
       datum key;
       datum value;

       dbzsync()

       long
       dbzsize(nentries)
       long nentries;

       dbzincore(newvalue)

       dbzcancel()

       dbzdebug(newvalue)

DESCRIPTION
       These functions provide an indexing system for rapid  ran-
       dom  access  to  a  text file (the base file).  Subject to
       certain constraints, they are call-compatible with dbm(3),
       although  they  also  provide some extensions.  (Note that
       they are not  file-compatible  with  dbm  or  any  variant
       thereof.)

       In  principle,  dbz stores key-value pairs, where both key
       and value are arbitrary sequences of bytes,  specified  to
       the  functions  by  values of type datum, typedefed in the
       header file to be a structure with members dptr  (a  value
       of  type  char * pointing to the bytes) and dsize (a value
       of type int indicating how long the byte sequence is).

       In practice, dbz is  more  restricted  than  dbm.   A  dbz
       database  must  be  an  index  into  a base file, with the
       database values being fseek(3) offsets into the base file.
       Each such value must ``point to'' a place in the base file
       where the corresponding key sequence is found.  A key  can
       be  no  longer  than  DBZMAXKEY (a constant defined in the
       header file) bytes.  No key can be an initial  subsequence
       of  another, which in most applications requires that keys
       be either bracketed or terminated in  some  way  (see  the
       discussion  of  the fieldsep parameter of dbzfresh, below,
       for a fine point on terminators).

       Dbminit opens a database, an  index  into  the  base  file
       base, consisting of files base.dir and base.pag which must
       already exist.  (If the database is new,  they  should  be
       zero-length   files.)   Subsequent  accesses  go  to  that
       database until dbmclose is called to close  the  database.
       The  base  file need not exist at the time of the dbminit,
       but it must exist before accesses are attempted.

       Fetch searches the database for the specified key, return-
       ing the corresponding value if any.  Store stores the key-
       value pair in the database.  Store will  fail  unless  the
       database  files  are writeable.  See below for a complica-
       tion arising from case mapping.

       Dbzfresh is a  variant  of  dbminit  for  creating  a  new
       database  with  more  control  over  details.   Unlike for
       dbminit, the database files need not exist: they  will  be
       created if necessary, and truncated in any case.

       Dbzfresh's  size parameter specifies the size of the first
       hash table within the database, in key-value pairs.   Per-
       formance  will  be  best if size is a prime number and the
       number of key-value pairs stored in the database does  not
       exceed  about  2/3  of size.  (The dbzsize function, given
       the expected number of key-value  pairs,  will  suggest  a
       database  size  that meets these criteria.)  Assuming that
       an fseek offset is 4 bytes, the .pag file will  be  4*size

       bytes (the .dir file is tiny and roughly constant in size)
       until the number of key-value pairs exceeds about  80%  of
       size.   (Nothing  awful  will happen if the database grows
       beyond 100% of size, but accesses will slow down  somewhat
       and the .pag file will grow somewhat.)

       Dbzfresh's  fieldsep parameter specifies the field separa-
       tor in the base file.  If this is not  NUL  (0),  and  the
       last character of a key argument is NUL, that NUL compares
       equal to either a NUL or a  fieldsep  in  the  base  file.
       This  permits  use of NUL to terminate key strings without
       requiring that NULs appear in the base file.  The fieldsep
       of  a  database created with dbminit is the horizontal-tab
       character.

       For use in news systems, various  forms  of  case  mapping
       (e.g.  uppercase to lowercase) in keys are available.  The
       cmap parameter to dbzfresh is a single character  specify-
       ing which of several mapping algorithms to use.  Available
       algorithms are:
              0      case-sensitive:  no case mapping
              B      same as 0
              NUL    same as 0
              =      case-insensitive:  uppercase  and  lowercase
                     equivalent
              b      same as =
              C      RFC822   message-ID   rules,  case-sensitive
                     before `@'  (with  certain  exceptions)  and
                     case-insensitive after
              ?      whatever the local default is, normally C

       Mapping algorithm 0 (no mapping) is faster than the others
       and is overwhelmingly the correct choice for most applica-
       tions.   Unless compatibility constraints interfere, it is
       more efficient to pre-map the keys, storing mapped keys in
       the  base  file,  than to have dbz do the mapping on every
       search.

       For historical reasons, fetch and store expect  their  key
       arguments  to  be  pre-mapped, but expect unmapped keys in
       the base file.  Dbzfetch and dbzstore do the same jobs but
       handle  all  case mapping internally, so the customer need
       not worry about it.

       Dbz stores only the database values in its files,  relying
       on  reference  to the base file to confirm a hit on a key.
       References to the base  file  can  be  minimized,  greatly

       speeding up searches, if a little bit of information about
       the keys can be stored in the dbz files.  This is ``free''
       if  there are some unused bits in an fseek offset, so that
       the offset can be tagged with some information  about  the
       key.   The tagmask parameter of dbzfresh allows specifying
       the location of unused bits.  Tagmask  should  be  a  mask
       with one group of contiguous 1 bits.  The bits in the mask
       should be unused (0) in most offsets.  The bit immediately
       above  the mask (the flag bit) should be unused (0) in all
       offsets; (dbz)store will reject attempts to store  a  key-
       value  pair in which the value has the flag bit on.  Apart
       from this restriction, tagging is invisible to  the  user.
       As  a  special  case, a tagmask of 1 means ``no tagging'',
       for use with  enormous  base  files  or  on  systems  with
       unusual offset representations.

       A size of 0 given to dbzfresh is synonymous with the local
       default; the normal default  is  suitable  for  tables  of
       90-100,000  key-value pairs.  A cmap of 0 (NUL) is synony-
       mous with the character  0,  signifying  no  case  mapping
       (note  that  the  character ?  specifies the local default
       mapping, normally C).  A tagmask of 0 is  synonymous  with
       the  local default tag mask, normally 0x7f000000 (specify-
       ing the top bit in a 32-bit offset as the  flag  bit,  and
       the  next  7  bits as the mask, which is suitable for base
       files up to circa 24MB).  Calling dbminit(name)  with  the
       database    files   empty   is   equivalent   to   calling
       dbzfresh(name,0,'\t','?',0).

       When databases are regenerated periodically, as  in  news,
       it  is  simplest to pick the parameters for a new database
       based on the old one.  This also permits  some  memory  of
       past  sizes  of  the  old database, so that a new database
       size can be chosen to cover expected fluctuations.   Dbza-
       gain  is  a variant of dbminit for creating a new database
       as a new generation of  an  old  database.   The  database
       files  for  oldbase must exist.  Dbzagain is equivalent to
       calling dbzfresh with the same field separator, case  map-
       ping,  and  tag mask as the old database, and a size equal
       to the result of applying dbzsize to the largest number of
       entries in the oldbase database and its previous 10 gener-
       ations.

       When many accesses are being done by the same program, dbz
       is  massively faster if its first hash table is in memory.
       If an internal flag is 1, an attempt is made to  read  the
       table  in when the database is opened, and dbmclose writes
       it out to disk again (if it was read successfully and  has
       been  modified).   Dbzincore  sets  the  flag  to newvalue
       (which should be 0 or 1) and returns the  previous  value;
       this  does  not  affect  the status of a database that has
       already been opened.  The default is 0.   The  attempt  to
       read the table in may fail due to memory shortage; in this
       case dbz quietly  falls  back  on  its  default  behavior.

       Stores to an in-memory database are not (in general) writ-
       ten out to the file  until  dbmclose  or  dbzsync,  so  if
       robustness  in  the  presence  of  crashes  or  concurrent
       accesses is crucial, in-memory databases  should  probably
       be avoided.

       Dbzsync  causes  all buffers etc. to be flushed out to the
       files.  It is  typically  used  as  a  precaution  against
       crashes  or  concurrent  accesses when a dbz-using process
       will be running for a long time.  It is a somewhat  expen-
       sive operation, especially for an in-memory database.

       Dbzcancel  cancels  any pending writes from buffers.  This
       is typically useful  only  for  in-core  databases,  since
       writes  are  otherwise done immediately.  Its main purpose
       is to let a child process, in the wake of  a  fork,  do  a
       dbmclose without writing its parent's data to disk.

       If  dbz has been compiled with debugging facilities avail-
       able (which makes it bigger and a  bit  slower),  dbzdebug
       alters  the  value  (and returns the previous value) of an
       internal flag which (when 1; default is 0) causes  verbose
       and cryptic debugging output on standard output.

       Concurrent  reading of databases is fairly safe, but there
       is no (inter)locking, so concurrent updating is not.

       The database files include a record of the byte  order  of
       the  processor creating the database, and accesses by pro-
       cessors with different byte order will work, although they
       will be slightly slower.  Byte order is preserved by dbza-
       gain.  However, agreement on the size and internal  struc-
       ture  of  an fseek offset is necessary, as is consensus on
       the character set.

       An open database occupies three stdio  streams  and  their
       corresponding  file descriptors; a fourth is needed for an
       in-memory  database.   Memory  consumption  is  negligible
       (except for stdio buffers) except for in-memory databases.

SEE ALSO
       dbz(1) dbm(3) 

DIAGNOSTICS
       Functions returning int values return 0  for  success,  -1
       for  failure.   Functions  returning datum values return a
       value with dptr set to NULL for failure.  Dbminit attempts
       to  have errno set plausibly on return, but otherwise this
       is not guaranteed.  An errno of EDOM  from  dbminit  indi-
       cates  that  the database did not appear to be in dbz for-
       mat.

HISTORY
       The original  dbz  was  written  by  Jon  Zeeff  (zeeff@b-

       tech.ann-arbor.mi.us).   Later contributions by David But-
       ler and Mark Moraes.  Extensive reworking, including  this
       documentation, by Henry Spencer henry@zoo.toronto.edu as
       part of the C News project.   Hashing  function  by  Peter
       Honeyman.

BUGS
       The  dptr members of returned datum values point to static
       storage which is overwritten by later calls.

       Unlike dbm, dbz will misbehave if  an  existing  key-value
       pair  is  `overwritten'  by a new (dbz)store with the same
       key.  The user is responsible for avoiding this  by  using
       (dbz)fetch  first  to  check  for  duplicates; an internal
       optimization remembers the result of the first  search  so
       there is minimal overhead in this.

       Waiting  until  after  dbminit to bring the base file into
       existence will fail if chdir(2) has been used meanwhile.

       The RFC822 case mapper implements only a first  approxima-
       tion to the hideously-complex RFC822 case rules.

       The prime finder in dbzsize is not particularly quick.

       Should  implement  the dbm functions delete, firstkey, and
       nextkey.

       On C implementations which trap integer overflow, dbz will
       refuse to (dbz)store an fseek offset equal to the greatest
       representable positive number, as this would  cause  over-
       flow in the biased representation used.

       Dbzagain  perhaps ought to notice when many offsets in the
       old database were too big for tagging, and shrink the  tag
       mask to match.

       Marking  dbz's  file  descriptors close-on-exec would be a
       better approach to the problem dbzcancel tries to address,
       but that's harder to do portably.
DBE
Home Page
Subroutines Index
DB_File