pydesign User's Manual

Author:Peter Dobcsanyi
Version:0.5
Web site:http://DesignTheory.org/software/pydesign/
Contact:pydesign@designtheory.org

Contents

1   Introduction

The pydesign package is a collection of Python modules and application programs built on top of these modules to deal with statistical and combinatorial designs encoded in a standard XML format. This data format and its mathematical semantics is described in a separate document: The External Representation of Block Designs. Instead of using the long name, we frequently refer to this standard XML format as ext-rep. The XML schema definition of ext-rep is given in Relax NG compact format. The most recent version is always available at http://DesignTheory.org/xml/design.rnc. We also refer to this XML schema as DTRS protocol. For an explanation why, and more details, see here.

This release contains the following components:

  • Modules:

    1. ext_rep -- XML I/O, Parsing
    2. block_design -- Fundamental data structures
  • Applications:

    1. bdnoniso -- Isomorphism Filter
    2. bdstat -- Statistical Expander
    3. bdtool -- Ext-rep Utility

The current release is compatible with DTRS protocol version 2.0.

1.1   Licence

Copyright (c) 2004 Peter Dobcsanyi

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.

2   Installation

The most recent version of pydesign is always available from here. The package, source code and documentation, is distributed as a compressed tar-file named pydesign-x.y.tar.gz, where x.y is the version number.

2.1   Dependencies

For a fully functional pydesign you need the following components:

  • Python 2.3 or newer. Python is available from Python.org both in source and in binary forms. Binaries can be found for many, including Windows and Mac, platforms. Alternatively, all general Linux distributions provide Python.
  • Numarray version 1.1.1 or newer.
  • The pynauty package for computing automorphism groups and canonical labelings for graphs. Only the bdnoniso application depends on pynauty.

In general, pydesign should work on all platforms (Linux, Windows, Mac, etc.) where Python can run.

2.2   Installing pydesign

Download pydesign, untar the distribution file then change to the pydesign-x.y/ directory and issue the command:

python setup.py install

Apart from the standard commands, setup.py also accepts the following package specific commands:

clean
remove build/ dist/ MANIFEST
uninstall
remove all installed components from the system
distclean
clean + uninstall

2.3   Documentation

The documentation can be found in the doc/ subdirectory.

3   Module ext_rep

This module provides I/O services for reading/writing XML data sources representing designs specified in The External Representation of Block Designs. The module is also an Application Programming Interface (API) to a data structure, called XTree, used to represent an ext-rep document internally in Python programs.

3.1   class Xtree

XTree is a class to create and manipulate a labeled rooted tree data structure whose underlying raw data structure is a tree whose nodes are tuples of the structure:

(node_name, {dictionary of attributes}, [list of children])

The dictionary of attributes represents the label associated with the node. This raw tree is not accessed directly, however, but through the XTree object's interface. XTree, in fact, is a lazy recursive wrapper around the raw structure and hides the implementation details. From the user's point of view, the internal nodes of the tree presented this way are of the XTree type. The leaves of the tree are either childless XTree objects or some particular Python data types. Currently, the following Python data types can be used for leaves: integer, floating point, string, list of integers.

In the example below we build a small blocks subtree then print the result in XML:

>>> from pydesign import ext_rep
>>> 
>>> b1 = ext_rep.XTree('block')
>>> b1.append_child([0, 1, 2])
>>> b2 = ext_rep.XTree('block')
>>> b2.append_child([0, 3, 4])
>>> 
>>> d = ext_rep.XTree('blocks')
>>> d.set_attribute('ordered', 'true')
>>> d.append_child(b1)
>>> d.append_child(b2)
>>> 
>>> ext_rep.xtree2xml(d)
<?xml version="1.0"?>
<blocks ordered="true">
   <block>
      <z>0</z>
      <z>1</z>
      <z>2</z>
   </block>
   <block>
      <z>0</z>
      <z>3</z>
      <z>4</z>
   </block>
</blocks>
>>> 

3.1.1   Methods and services provided by an XTree object

t.attribute
Return the value of the attribute named attribute.
t.child
Return the the first child with the name child.
len(t)
Return the number of children of t.
t.[i]
Return the i-th child of t, indexing starts from 0.
for child in t:
Iterate over t's children.
t.append_child(child)
Append child to the end of the list of children of t.
t.insert_child(index, child)
Insert child before texttt{index}-th child.
t.child_index(child_name)
Return the index of the first child named by child_name.
t.remove_child(child_name)
Remove the first child named by child_name.
t.set_attribute(attr_name, value)
Set an attribute named by attr_name to the given value.
t.xt_children, t.xt_attributes, t.xt_node
Direct access to the underlying raw components of t.

3.2   Class XTreeReader

This class implements a full (parser+converter) Reader for XML ext-rep documents. The parsing is based on expat but a tree structure is built based on the aforementioned XTree object. This tree structure is somewhat similar to a DOM tree however during parsing, an automatic conversion of values takes place. The rules of this conversion are:

  • Attribute values representing scalar values are converted into binary before stored in the attribute dictionary.
  • Elements enclosing scalar values are replaced with the binary representation of the scalar.
  • Elements enclosing a list of elements enclosing scalar values are replaced with a list of the binary format of the scalars.

XTreeReader provides one public method:

parse(xml_source)

xml_source should be either a string or a file like object which provides a read() method.

The parser provided by XTreeReader is not a validating parser at the moment. If an ext-rep document is not well formed with respect to the ext-rep schema design.rnc then the result of the parsing is undefined. However, during the conversion process into XTree, many malformation would result in raising exceptions. It is planned that future versions will provide a full validating parsers.

One can validate an ext-rep file against the schema using the jing program. For example:

jing -c design.rnc t2-v7-k3-L1.xml

Usually there is no need to use an XTreeReader object directly. The recommended way to access its parsing service using one of the parsing functions described in the Functions section.

3.3   Functions

decimal_sigfig(x, n)

Round floating point number x to one containing at most n significant (decimal) figures.

xml2xtree(ext-rep-data)

Return an XTree object parsed and built form ext-rep-data which is assumed to be a string containing an ext-rep document.

xmlfile2xtree(ext-rep-file)

Return an XTree object representing the content of the ext-rep file. The file can be compressed by gzip or bzip2, compression being recognized by standard extensions, if the corresponding modules are present in the standard Python library.

xtree2xml(t, step=1, outf=sys.stdout)

Print the XTree object t as a complete ext-rep XML document. The default output file is the standard output. The output is pretty printed with indentation step (default is 1). If step == 0 print in a compact format without indentation or line breaks. The beginning of a typical output would look like this:

<?xml version="1.0"?>
<list_of_designs
 design_type="block_design"
 dtrs_protocol="2.0"
 no_designs="80"
 pairwise_nonisomorphic="true"
 xmlns="http://designtheory.org/xml-namespace">
 <info>
...

print_subxt(t, level=0, step=1, outf=sys.stdout)

Print the XTree object t. This function is used for incremental output. level indicates the position of the subtree within the whole document. The other parameters are the same as in xtree2xml(). The example below shows the result of calling print_subxt() on the <indicators> subtree:

<indicators>
 <repeated_blocks flag="false"/>
 <resolvable flag="false"/>
 <affine_resolvable flag="false"/>
 <equireplicate
  flag="true"
  r="3"/>
 <constant_blocksize
  flag="true"
  k="3"/>
 <t_design
  flag="true"
  maximum_t="2"/>
 <connected
  flag="true"
  no_components="1"/>
 <pairwise_balanced
  flag="true"
  lambda="1"/>
 <variance_balanced flag="true"/>
 <efficiency_balanced flag="true"/>
 <cyclic flag="true"/>
 <one_rotational flag="false"/>
</indicators>

xtree2sexpr(t, outf=sys.stdout)

Print the XTree object t as Lisp S-expression. There is no difference between attributes and subtrees. Attributes are represented by (name value) pairs.

stamp_software(t, stamps)

Insert a software stamp as a <software> subtree into t which is assumed to be the <info> subtree of an ext-rep document. stamps, a list of strings, should carry version information about the caller application. The function extends this list with version information on the package and the Python interpreter used. The software stamp below was issued by the bdstat application:

<software>
 [ bdstat-0.7/188, numarray-0.9, pydesign-0.4/188, python-2.3.4 ]
</software>

3.4   Class XTreeProcessor

This class provides a framework to build a streaming ext-rep processor which is based on an incremental event-driven parser somewhere between a SAX and DOM type parser. However, an XTreeProcessor object also performs the on-the-fly conversion of leaves into Python data types like XTreeReader. The result is a stream of XTree objects representing smaller subtrees of the document, in particular individual block designs. Most of the applications in the pydesign package use this method, since ext-rep documents and consequently the corresponding full XTree objects can be prohibitely large.

The processing of a full ext-rep document is broken up into the processing of the following parts:

  • <list_of_designs ...> opening element.
  • <list_definition> subtree.
  • <info> subtree.
  • iterating over <designs> subtree by processing each <block_design> separately.
  • finishing with closing </designs> and </list_of_designs>.

After creating an XTreeProcessor object, the user can specify various call-back functions corresponding to the breakup structure above. Currently, the following attributes are available for this purpose:

list_of_designs_start_proc
called with the attribute dictionary of <list_of_designs> tag.
info_proc
called with the tree data structure corresponding to the <info> subtree.
block_design_proc
called with the tree data structure corresponding to the <block_design> subtree.

As the XTreeProcessor object goes through the ext-rep document it calls the appropriate call-back function when it encounters with the related component then outputs the part processed. If there is no call-back function specified it simply prints back the input unchanged.

Here is an example of how XTreeProcessor can be used:

from pydesign import ext_rep

def do_something(t):
    xt = ext_rep.XTree(t)
    for block in xt.blocks:
        block[0] = 0

proc = ext_rep.XTreeProcessor()
proc.block_design_proc = do_something

f = ext_rep.open_extrep_file('my-designs.xml')
proc.parse(f)
f.close()

This program replaces the first point of each block with 0 in all designs listed in the ext-rep file my-designs.xml.

4   Module block_design

The module provides two classes BaseBlockDesign and StatsBlockDesign. Many of their attributes are implemented as properties which are computed only when they are referred first time then get stored and never computed again. The reason for this is that the computation of these properties of the design is expensive both in CPU time and memory.

Further specialized block design classes will be included as the development is progressing.

4.1   Class BaseBlockDesign(init_bd)

This is the base class for the more specialized block design classes. The initialization parameter init_bd is either an XTree object representing a block_design subtree or a tuple (v, list_of_blocks).

If init_bd is a tuple then then during initialization a basic check is performed and the list of blocks get ordered in the canonical order as it is specified in the The External Representation of Block Designs.

There is no check or ordering of the blocks when init_bd is an XTree object.

BaseBlockDesign have the following attributes and properties:

v
Integer giving the number of points.
blocks
List of blocks, where a block is represented as an ordered list of nonnegative integers less than v. The list of blocks are ordered by length and within the same length lexicographically.
block_sizes (property)
A list containing the sizes of the blocks.
incidence_matrix (property)
The v by b incidence matrix of the design represented as a Numarray integer matrix.

4.2   Class StatsBlockDesign

This a subclass of BaseBlockDesign and provides a block design object for statistical computations.

Additionally to the ones inherited from BaseBlockDesign, StatsBlockDesign have the following attributes and properties:

concurrence_matrix (property)
The concurrence matrix of the design.
information_matrix (property)
The information matrix of the design.
eigenvalues (property)
The eigenvalues of the information matrix.

The exact definition of these terms can be found in the Statistical Properties section of The External Representation of Block Designs.

4.3   Functions

tdesign_params(t, v, k, L)

Return the t-design parameters as a tuple (t, v, b, r , k, L).

int_div(a, b)

Perform an integer division of a and b. Raise ArithmeticError if the remainder is not zero otherwise return the quotient.

binomial(n, k)

Return the number of k-element subsets of an n-element set.

two_subsets(v)

A generator to produce all two-element subsets of {0,1,...,v-1} in lexicographic order.

The functions below depend on the presence of the pynauty package.

block_design_automorphism_group(d)

Return the generators of the automorphism group of block design d.

is_isomorphic_block_design(a, b)

Return True if block design a is isomorphic to block design b, otherwise return False.

5   Applications

5.1   bdnoniso -- Isomorphism Filter

bdnoniso is an isomorphism filter for block designs.

Running bdnoniso without arguments prints its usage:

Isomorphism Filter for Block Designs.

Usage:
        bdnoniso [options] { ext-rep-file | - }

    -pn  Pretty print with n-space indentation. The default n is 1.

    -b  Use binary search for checking certificates.
        The default is to use simple linear search.

    -V  Print version and licensing information.

Collect a maximal set of pairwise nonisomorphic block designs from the
input files.  The program takes its input either from the specified
file or from the standard input when '-' is given instead of a file's
name.  The result is printed on the standard output.  The input files
can be compressed by gzip or bzip2.  The I/O format is "The External
Representation of Block Designs", see the specification at
http://designtheory.org/library/extrep/.

5.2   bdstat -- Statistical Expander

An expander is a program which takes as an input, a possible bare bone, ext-rep document and computes some properties for each design listed, then output a new expanded version of the ext-rep document.

Running bdstat without arguments prints its usage:

Statistical Expander for Block Designs.

Usage:
        bdstat [options] { ext-rep-file | - }

    -V  Print version and licensing information.

Compute the "statistical properties" for each design in the input file.
The program takes its input either from the specified file or from the
standard input when '-' is given instead of a file's name.  The output is
printed on the standard output.  The input files can be compressed by
gzip or bzip2.  The I/O format is "The External Representation of Block
Designs", see the specification at http://designtheory.org/library/extrep/.

The statistical properties computed by bdstat are defined in The External Representation of Block Designs. Please note, robustness properties are not provided yet.

5.3   bdtool -- Ext-rep Utility

bdtool is a utility to manipulate ext-rep documents.

Running bdtool without arguments prints its usage.

Tool to manipulate ext-rep files of block designs.

Usage:
        bdtool  [options]  { ext-rep-file | - }

    -o x/y/... 

        Specify an Xpath to a numerical component of the ext-rep document
        by which the output list of designs should be ordered. The XPath
        should be relative to <block_design>. The ordering is increasing.

    -O x/y/...

        The same as -o but in decreasing order.

    -m x/y/...

        Output only the designs having the smallest x/y/... value.

    -M x/y/...

        Output only the designs having the largest  x/y/... value.

    -s  Strip down an ext-rep file to the bare bone list of designs.

    -V  Print version and licensing information.

The program takes its input either from the specified file or from the
standard input when '-' is given instead of a file's name.  The output is
printed on the standard output.  The input files can be compressed by
gzip or bzip2.  The I/O format is "The External Representation of Block
Designs", see the specification at http://designtheory.org/library/extrep/.

The ordering operations read the entire ext-rep file in and build a
complete XTree in memory. So the size of an ext-rep file which can be
ordered depends on the available memory.  The other operations process
the ext-rep file iteratively design by design, there is no limit on the
size of the input in these cases. Although, searching for minimal or
maximal elements makes two passes over the input file.

If there are no options given the program simply pretty-print the input.

Here is an example for ordering an ext-rep file by the order of the automorphism group of the t-designs contained:

bdtool -O automorphism_group/permutation_group@order t2-v15-b15-r7-k7-L3.xml