Net::Z3950::AsyncZ sets options by means of named parameters for both the parent process and each of its child processes. Options for the parent are set in Net::Z3950::AsyncZ->new. Options for the child processes are set via the options parameter of Net::Z3950::AsyncZ->new; the value of this parameter must be a reference to an array of Net::Z3950::AsyncZ::Options::_params objects.
If a _params object doesn't exist for a child process, Net::Z3950::AsyncZ->new will create it with a set of default options. There will always be a _params object for every server in the servers array, and they are cross-indexed, that is, $_params_object[0] is used for $servers[0], etc. So, if you are creating your own array of _params objects, you must keep this parallelism in mind.
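One way to keep the two arrays parallel is to build the options list from a per-host lookup, leaving undef wherever no settings are supplied. The following is only a sketch in plain Perl: the helper align_options and the host names are hypothetical, and plain hash references stand in for the _params objects that asyncZOptions would return.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the module): returns one options slot per
# server, in server order, with undef where nothing was supplied.
sub align_options {
    my ($servers, $by_host) = @_;
    return map { $by_host->{ $_->[0] } } @$servers;
}

my @servers = (
    [ 'z3950.example.org', 210,  'books'   ],
    [ 'z3950.example.net', 210,  'serials' ],
    [ 'z3950.example.com', 7090, 'main'    ],
);

# Plain hashes stand in for _params objects here.
my %by_host = (
    'z3950.example.org' => { num_to_fetch => 3 },
    'z3950.example.com' => { num_to_fetch => 10 },
);

my @options = align_options(\@servers, \%by_host);

for my $i (0 .. $#servers) {
    my $state = defined $options[$i] ? 'custom' : 'default';
    print "$servers[$i][0]: $state options\n";
}
```

The undef in the middle slot keeps the third entry lined up with the third server, which is the same role undef plays in an options array passed to the constructor.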
[1] Options set in Net::Z3950::AsyncZ::new, which control the parent process and selected features of the child processes for which no alternatives are present; the alternatives are set as indicated in [2] and [3].
[2] Options set in a Net::Z3950::AsyncZ::Options::_params object, which is returned by Net::Z3950::AsyncZ::asyncZOptions(). There is one _params object for each server: if you don't create one, it is created for you with the default values. If you don't create a _params object for a server, then the log and query options set in the AsyncZ constructor will be used. The rationale behind this is that you usually will be asking one question across all servers and will usually be using only one log file for debugging. But in all other cases where it is possible to set an option for the child in both the AsyncZ constructor and _params, the _params setting will be used. At the moment this affects the format and num_to_fetch options.
[3] Options set in the Net::Z3950::Manager by using the Z3950_options option of the _params object. These take precedence over any others and must be passed in with the first _params object, that is, $_params_object[0], because AsyncZ uses only one Net::Z3950::Manager. The Manager is created when setting up the first server passed into the constructor.
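The fallback rules in [1] and [2] can be modelled in a few lines. This is a sketch of the behaviour as described in this document, not the module's own code: effective_option is a hypothetical function, plain hashes stand in for the constructor arguments and a _params object, the default values are placeholders, and only query, format and num_to_fetch are covered.

```perl
use strict;
use warnings;

# Placeholder defaults for the two options that do NOT fall back to the
# constructor once a _params object exists.
my %DEFAULTS = (format => 'default_format', num_to_fetch => 5);

# Hypothetical model: $ctor holds constructor options, $params holds the
# _params settings for one server, or is undef if none was created.
sub effective_option {
    my ($name, $ctor, $params) = @_;

    # A value set in the _params object always wins over the constructor.
    return $params->{$name} if $params && defined $params->{$name};

    # query falls back to the constructor whether or not _params exists.
    return $ctor->{$name} if $name eq 'query';

    # format and num_to_fetch fall back to the built-in default as soon as a
    # _params object exists, even if the constructor set another value.
    return $DEFAULTS{$name} if $params;
    return defined $ctor->{$name} ? $ctor->{$name} : $DEFAULTS{$name};
}

my %ctor   = (query => '@attr 1=4 perl', num_to_fetch => 20);
my %params = (pipetimeout => 40);   # created, but num_to_fetch left unset

print effective_option('query',        \%ctor, \%params), "\n";
print effective_option('num_to_fetch', \%ctor, \%params), "\n";
print effective_option('num_to_fetch', \%ctor, undef),    "\n";
```

The practical consequence is that creating a _params object for any reason resets format and num_to_fetch for that server unless they are set again in the object.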
=>
operator:
HTML=>0
In some instances, the type of variable is shown and defaults detailed in commentary:
format=>\&format
cb=>\&cb
callback function to which records will be sent as available.
See Output Callback.
format=>\&format
callback function to format individual lines of records.
See Format Callback.
If you create a _params
object for a server and do not set its format
option, then the default
format
will be used, even if you set the format
option of the
AsyncZ
constructor to another value.
interval=>1
Event loop timer interval in seconds: This controls how frequently AsyncZ checks to
see if servers have responded and if the timeout
period is up.
log=>undef
controls how extended error messages are handled. There are two sets of error messages: those handled through Net::Z3950::AsyncZ::ErrMsg, which are meant for the user, and those meant for debugging. The latter are generated by both AsyncZ and the Perl library and can accumulate at a rapid clip. AsyncZ writes its debugging messages to STDOUT, while those coming from library routines almost always go to STDERR. There are 3 options for log:
[1] undef, the default, in which case all debugging messages go to the terminal, and those written to STDOUT will end up in a browser if you are on the web.
[2] log=>Net::Z3950::AsyncZ::Errors::suppressErrors() (or log=>suppressErrors() if you import the function), in which case these messages will be suppressed.
[3] log=>$filespec, in which case all of these messages will go to the file specified in $filespec.
The Net::Z3950::AsyncZ::Options::_params object also has a log option, which means that you can specify a log file for each child process (i.e. for each server queried) while keeping a separate one for the parent. Or you can set up a system where parent and child_1 write to log.1, while child_2 and child_3 write to log.2, etc.
Note: All error logs are automatically opened and closed. Do NOT open or close them yourself!
maxpipes=>4
maximum number of forks to be executed at one time--the greater
the number the more resources are used--both of memory and cpu.
monitor=>0
timeout in seconds for a monitoring child process, or 0,
in which case a monitor is not set.
The monitor is a child process which runs a timer and kills the parent process if it exceeds the timeout period. You need to run the monitor only if your software hangs. An orderly shutdown of all running processes is then put into effect, the purpose of which is to prevent the development of zombie processes and to release all shared memory.
num_to_fetch=>5
number of records to fetch; this setting will be used only if you have not created a _params object. This means that if you create a _params object for the server and do not set its num_to_fetch option, then num_to_fetch will default to 5 even if you have set another value for num_to_fetch in the AsyncZ constructor.
options=>\@options
reference to an array of references to Net::Z3950::AsyncZ::Options::_params objects.
Each reference is obtained from a call to Net::Z3950::AsyncZ::asyncZOptions. For instance:
@options = ( asyncZOptions(option_1=>opt_1,option_2=>opt_2, . . .), undef, asyncZOptions(option_1=>opt_1,option_2=>opt_2, . . .) );
This array parallels the servers
array:
@servers = ( [$host_1, $port_1, $database_1], [$host_2, $port_2, $database_2], [$host_3, $port_3, $database_3] );
$options[0] is used for $servers[0] and $options[2] for $servers[2]. If a _params object is not found or if it is not defined, as for $servers[1], then a default _params object is created for the server.
query=>undef
the query string: its format depends on the Z3950 querytype, which defaults to 'prefix' (as in Net::Z3950). You can set a separate Z3950 querytype for each query, or you can change the querytype for all servers by using Z3950_options.
If you create a _params
for a server but do not set the query
option
in _params
, then this query
will be used. This means that you can
set one query
for all of your servers without having to re-set it for
each of the _params
objects you create. But if you create
a _params
with a different query
, then the query set in _params
will be used.
servers=>\@servers
reference to an array of references to servers, each in the form [$host, $port, $database].
See options above and AsyncZ.pod: "The Basic Script".
swap_attempts=>5
the number of times that a swap check will be done before exiting;
see swap_check
for details.
swap_check=>0
the number of seconds between checks for swapping activity--
used when querying a great number of servers and requesting large amounts of data. It
instructs AsyncZ to sleep for swap_check
number of seconds before processing any
further connections. If you are attempting to process too much data for the size of your RAM,
the system will have to swap out of memory into the swap space on your disk; too much swapping causes
loss of data and disk ``thrashing''--i.e. repeated disk access--and will overburden the system.
When swap_check
is set, AsyncZ will check for signs of swap activity; if it finds swap
activity it will go to sleep for the number of seconds set in swap_check
and then re-check
for swap_attempts
number of times. If the swap activity continues beyond this number
of checks, AsyncZ dies. For large throughput, you will probably want to set the monitor, and to set it for a long period of time, for instance, 3000 seconds. This means that you can set swap_check to a period of 10, 20, or 30 seconds. The values you set for these variables will depend on your own system memory resources and the amount of data you are processing.
Note: This has been tested only on Linux but should also work
on Unix, at least on Solaris.
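The interaction between swap_check and swap_attempts amounts to a bounded retry loop. The sketch below illustrates that loop only: wait_for_swap_to_settle is a hypothetical function, and the swap probe is a caller-supplied code reference, because this document does not say how AsyncZ itself detects swap activity.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the retry logic described above.
# Returns 1 if processing may continue, 0 if swapping never settled.
sub wait_for_swap_to_settle {
    my ($is_swapping, $swap_check, $swap_attempts) = @_;

    return 1 unless $swap_check;            # 0 disables the check entirely

    for my $attempt (1 .. $swap_attempts) {
        return 1 unless $is_swapping->();   # quiet: carry on with connections
        sleep $swap_check;                  # back off before re-checking
    }
    return 0;                               # still swapping: caller gives up
}

# Simulated probe: reports swap activity on the first two checks only.
my $calls = 0;
my $probe = sub { return ++$calls <= 2 };

my $ok = wait_for_swap_to_settle($probe, 1, 5);
print $ok ? "continuing after $calls checks\n" : "giving up\n";
```

In the worst case the loop sleeps for swap_check times swap_attempts seconds before giving up, so a monitor timeout, if set, should be comfortably longer than that product.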
timeout=>25
total timeout in seconds for all processes to complete their work.
timeout_min=>5
minimum timeout in seconds to exit the Event loop if all processes are finished;
a security blanket to make sure all processes get a chance to report their
results to the parent process before exiting the loop.
Where a _params option duplicates an AsyncZ::new option, consult the AsyncZ::new description for more details.
HTML=>0
if true use default HTML formatting for records, if false format as plain text;
see Row Formatting Priorities.
Z3950_options=>undef
reference to hash of additional Z3950 options.
These options are passed to the Z3950 Manager and take precedence over _params options and options set in Net::Z3950::AsyncZ->new.
Z3950_options
makes it possible to implement Z3950 options which may not be specifically
accounted for in any of the options to the AsyncZ module. For instance, to ask for
``full'' as opposed to ``brief'' records (which is the Z3950 default):
@options = ( asyncZOptions(Z3950_options=>{elementSetName=>'f'}), asyncZOptions(. . .), . . . );
Note: To use this option, it must appear in the first _params object of the _params array, $options[0], as in the above example. It is ignored in any subsequent uses. This means that you cannot set these options on a per-server basis; they apply across the board to all the servers you are querying. In the above example, for instance, you could not ask for brief records from some servers and full from others.
See Types of Options
cb=>\&cb
reference to callback function to which records will be sent as available
format=>\&format
reference to a callback function that formats each row of a record
interval=>5
timer interval for this forked process. See interval
above under Net::Z3950::AsyncZ::new
.
log=>undef
controls how extended error messages are handled for this
child process. A separate log file can be opened for each process.
Note: All error logs are automatically opened and closed.
See log
above under Net::Z3950::AsyncZ::new
.
num_to_fetch=>5
number of records to fetch from this server.
pipetimeout=>20
timeout in seconds for this child process
preferredRecordSyntax=>Net::Z3950::RecordSyntax::USMARC
the Z3950 preferredRecordSyntax for this child process
query=>undef
the query for this process
querytype=>'prefix'
Z3950 querytype for this child process; it can also be set to 'ccl' or 'ccl2rpn'.
raw=>0
(boolean) if true the raw record data for this process is returned; its format
is dependent on the render
option.
render=>1
(boolean) if true, the raw record data for this process is returned filtered through the Z3950 Record::render function; this is the default. If false, the raw data is returned unfiltered in its original state. The unfiltered raw data can be read using Net::Z3950::AsyncZ::prep_Raw and Net::Z3950::AsyncZ::get_ZRawRec.
startrec=>1
number of the first record to return from the result set.
utf8=>0
when set to true, conversions will be made to utf8/unicode characters from the character codes used in MARC records to represent non-latin1 and accented latin1 characters. When outputting utf8, you must call binmode on the output stream, for example:
binmode(STDOUT, ":utf8");
When outputting to a browser, you should also notify the browser:
print "Content-type: text/html; charset=utf-8\n\n";
print '<head><META http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body>';
See the sample script: MARC_HTML.pl
.
Note: To use utf8
you must have the MARC::Charset
module installed. Otherwise,
the utf8
option will be ignored.
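The effect of the binmode call can be seen without touching STDOUT by writing to in-memory filehandles. This is a generic Perl illustration and does not use AsyncZ; the sample string is made up.

```perl
use strict;
use warnings;

# Write one accented character to an in-memory filehandle with and without
# the :utf8 layer, then compare the bytes that were produced.
my $text = "caf\x{e9}";                 # 4 characters, the last is e-acute

open my $plain, '>', \my $latin1_bytes or die "open failed: $!";
print {$plain} $text;
close $plain;

open my $encoded, '>', \my $utf8_bytes or die "open failed: $!";
binmode($encoded, ':utf8');             # same call as binmode(STDOUT, ":utf8")
print {$encoded} $text;
close $encoded;

printf "without layer: %d bytes\n", length $latin1_bytes;
printf "with :utf8:    %d bytes\n", length $utf8_bytes;
```

Without the layer the accented character is written as a single latin1 byte, which a browser told to expect utf-8 will display incorrectly.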
If more than one option is set that affects the formatting of a record's rows, the following priority sequence is in effect:
raw, format, HTML, plaintext (default)
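Read as code, the priority sequence is a chain of early returns. The function below is a hypothetical illustration of that ordering, not part of the module; it only names the mode that would win for a given set of options.

```perl
use strict;
use warnings;

# Hypothetical helper: names the formatting mode that wins, following the
# order raw, format, HTML, plain text.
sub row_format_mode {
    my (%opt) = @_;
    return 'raw'    if $opt{raw};
    return 'format' if ref($opt{format}) eq 'CODE';
    return 'HTML'   if $opt{HTML};
    return 'plaintext';
}

my $formatter = sub { return "[$_[0]]" };   # made-up row formatter

print row_format_mode(HTML => 1, format => $formatter), "\n";
print row_format_mode(raw  => 1, HTML   => 1),          "\n";
print row_format_mode(),                                "\n";
```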
Net::Z3950::AsyncZ::Options::_params
provides a full range of get_option
/ set_option
methods,
enabling the dynamic setting of option values.
$_params_object->set_HTML(0); $num_to_fetch = $_params_object->get_num_to_fetch();
In addition there are functions for setting options with fixed values:
Function          Equivalent
set_marc_xtra()   set_marc_fields($Net::Z3950::AsyncZ::Report::xtra)
set_marc_all()    set_marc_fields($Net::Z3950::AsyncZ::Report::all)
set_marc_std()    set_marc_fields($Net::Z3950::AsyncZ::Report::std)
set_raw_on()      set_raw(1)
set_raw_off()     set_raw(0)
set_plaintext()   set_HTML(0)
set_HTML()        set_HTML(1)
set_prefix()      set_querytype('prefix')
set_ccl()         set_querytype('ccl')
set_GRS1()        set_preferredRecordSyntax(Net::Z3950::RecordSyntax::GRS1)
set_USMARC()      set_preferredRecordSyntax(Net::Z3950::RecordSyntax::USMARC)
The get/set methods guarantee that you have in fact set or queried the option you are interested in and, in the case of the fixed value options, that you have set it to the value required. You don't have to be concerned that a meaningless hash key will spring into existence through misspelling:
$_params_object = asyncZOptions(leg=>'Error.LOG', num_to_fish=>3);
In the case of some of the fixed value methods, one advantage is the obvious simplicity of calling set_GRS1() instead of set_preferredRecordSyntax(Net::Z3950::RecordSyntax::GRS1).
This method works to both get and set values.
$value = $_params_obj->option('option');
$old_options_ref = $_params_obj->option(option=>value, option=>value, option=>value, . . . );
params
in get mode: 'option' to be queried
in set mode: list of option=>value pairs to be set (or %hash)
returns
in get mode: $value of option being queried
in set mode: $old_options_ref, a reference to a hash of the option=>value pairs which have been replaced by the list or %hash
$bool = $_params_obj->validOption('option');
$bool = $_params_obj->invalidOption('option');
Both of the above methods enable you to determine whether an option you choose to set is a valid option. They are useful when using Net::Z3950::AsyncZ::Options::_params::option.
$option = 'num_to_fetch';
$_params_obj->validOption($option) ? $_params_obj->option($option=>3) : die "invalid option: $option";
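The same kind of check can be applied to a whole list of option=>value pairs before they are handed on. The sketch below is plain Perl and does not call the module: check_option_names is a hypothetical helper, and the list of known names is copied from the _params options described in this document, so the module's own list may differ.

```perl
use strict;
use warnings;

# Option names taken from the _params list in this document (an assumption,
# not read from the module itself).
my %KNOWN = map { $_ => 1 } qw(
    HTML Z3950_options cb format interval log num_to_fetch pipetimeout
    preferredRecordSyntax query querytype raw render startrec utf8
);

# Hypothetical helper: returns the option names that are not recognised,
# in sorted order.
sub check_option_names {
    my (%opt) = @_;
    return grep { !$KNOWN{$_} } sort keys %opt;
}

my @bad = check_option_names(leg => 'Error.LOG', num_to_fish => 3, utf8 => 1);
print "unknown options: @bad\n" if @bad;
```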
$_params_obj->test();
Calling this function will print a listing of defined options and values for
$_params_obj
.
Myron Turner <turnermm@shaw.ca> or <mturner@ms.umanitoba.ca>
Copyright 2003 by Myron Turner
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.