Jeffrey Heinz
Assistant Professor
Dept. of Linguistics and Cognitive Science.
University of Delaware
42 E. Delaware Avenue
Newark, DE 19716
heinz@udel.edu
Software
Overview. I have written some software for manipulating
finite state machines, and elements of an alphabet which have various
properties. I typically use these programs from the command line, or
in scripts, which I find very effective (though if anyone is
interested in developing a graphical user interface for them, I'd like
to hear about it). These programs help me research properties of
phonological patterns and the particulars of certain kinds of learning
algorithms.
Requirements. The software available here is all implemented in the OCaML
programming language, so OCaML needs to be installed in order for it
to compile and run. You can learn more about OCaML and download it
from the OCaML homepage at http://caml.inria.fr/. Another option
for downloading and installing is GODI, which organizes a lot of
libraries for OCaML. It is easy to use and I recommend it. Go to http://godi.ocaml-programming.de/.
Cross-platform usage. On GNU/Linux and Mac OSes, it is
easy to install OCaML. With a Windows OS, you could install Cygwin and then
OCaML. Alternatively, if you use a Windows OS, you could simply add
the GNU/Linux OS to your computer and make it "dual-boot". Many
GNU/Linux installers (e.g. Ubuntu) have automated this process, making
it very simple. My own feeling is that this is a better choice than
Cygwin. It does involve partitioning your hard drive, so some empty
space is required. Here are
some instructions on how to do this with Ubuntu.
License. Everything here is under the GNU General Public
License.
Installing the software. Download the archives below and
unpack them.
gunzip file.tar.gz
tar -xvf file.tar
Then navigate into the relevant directory. Then type
make
make opt
mkdir doc
make doc
There is more detailed information about installation in the
README in the directory.
Using the software. See the documentation for details about
these OCaML libraries and command line utilities.
xtype
- xtype is an OCaML library that provides useful
functions for abstract types. Ultimately it is a highly flexible
toolkit for writing larger programs.
- Documentation
- Obtain xtype here: xtype.tar.gz
- xtype requires OCaML 3.07+.
fsm
- fsm provides an OCaML library which implements several
functions for finite state machines. Currently only finite state
acceptors are implemented. Modules for weighted finite state
acceptors and transducers are in the works. Functions implemented for
finite state acceptors include minimization, determinization,
concatenation, union, intersection, state-merging with various
state-equivalence criteria, prefix and suffix tree construction, etc.
- There is also a command line utility fsa which allows one
to manipulate finite state acceptors from the command line so that
one can pipe commands together, redirect output, etc.
- Documentation (covers both the OCaML
modules and the command line utility; click on Fsa in the documentation
for the command line utility)
- Obtain fsm here: fsm.tar.gz
- fsm requires OCaML 3.07+, and the xtype library.
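To make prefix tree construction concrete, here is a minimal sketch in plain OCaml. It does not use the fsm library: the record type `fsa` and the functions `prefix_tree` and `accepts` are hypothetical names invented for this illustration, and the code assumes a reasonably recent OCaml (4.05 or later, for `List.assoc_opt`) rather than 3.07.

```ocaml
(* Hypothetical, minimal acceptor representation for illustration only;
   these are NOT the types or functions of the fsm library. *)
type fsa = {
  start : int;
  finals : int list;
  delta : ((int * char) * int) list; (* (state, symbol) -> next state *)
}

(* Build a prefix tree acceptor: one state per distinct prefix of the
   input words, with state 0 standing for the empty prefix. *)
let prefix_tree (words : string list) : fsa =
  let next_id = ref 1 in
  let delta = ref [] in
  let finals = ref [] in
  let add_word w =
    let q = ref 0 in
    String.iter
      (fun c ->
        match List.assoc_opt (!q, c) !delta with
        | Some q' -> q := q' (* prefix already in the tree *)
        | None ->
            (* new prefix: allocate a fresh state and a transition to it *)
            let q' = !next_id in
            incr next_id;
            delta := ((!q, c), q') :: !delta;
            q := q')
      w;
    (* the state reached at the end of the word is accepting *)
    if not (List.mem !q !finals) then finals := !q :: !finals
  in
  List.iter add_word words;
  { start = 0; finals = !finals; delta = !delta }

(* Run the (deterministic) acceptor on a word. *)
let accepts (m : fsa) (w : string) : bool =
  let rec go q i =
    if i = String.length w then List.mem q m.finals
    else
      match List.assoc_opt (q, w.[i]) m.delta with
      | Some q' -> go q' (i + 1)
      | None -> false
  in
  go m.start 0
```

For example, `prefix_tree ["ab"; "abc"; "b"]` accepts exactly those three words. State-merging procedures of the kind mentioned above typically start from a tree like this and then collapse states judged equivalent.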
feature
- feature provides OCaML modules for working with a
feature system over a set of elements. Sequences of elements
(e.g. words) can be converted to sequences of feature bundles and
vice versa. Additional programs allow one to compute the natural
classes, find the partitions of the set by natural class, compute a
measure of similarity based on natural class (Frisch, Pierrehumbert,
and Broe 2004), as well as find the smallest natural class
containing two elements (i.e. minimal generalization; Albright and
Hayes 2002, 2003).
- There is also a command line utility fdo which allows one
to run several functions from the command line so that one can pipe
commands together, redirect output, etc.
- Documentation (covers both the OCaML
modules and the command line utility; click on Fdo in the documentation
for the command line utility)
- Obtain feature here: feature.tar.gz
- feature requires OCaML 3.07+, and the xtype library.
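The idea of the smallest natural class containing two elements can be sketched in a few lines of plain OCaml. This is an illustration under stated assumptions, not the feature library's API or file format: the toy feature system and the names `bundle`, `system`, `shared`, `natural_class` and `smallest_class` are all made up here, and a natural class is assumed to be the set of elements matching a conjunction of feature values.

```ocaml
(* Hypothetical toy feature system for illustration only;
   NOT the data format or API of the feature library. *)
type bundle = (string * bool) list (* feature name paired with its value *)

let system : (string * bundle) list = [
  "p", ["voice", false; "nasal", false; "labial", true];
  "b", ["voice", true;  "nasal", false; "labial", true];
  "m", ["voice", true;  "nasal", true;  "labial", true];
  "t", ["voice", false; "nasal", false; "labial", false];
  "d", ["voice", true;  "nasal", false; "labial", false];
  "n", ["voice", true;  "nasal", true;  "labial", false];
]

(* Feature values that two bundles have in common. *)
let shared (b1 : bundle) (b2 : bundle) : bundle =
  List.filter (fun (f, v) -> List.assoc_opt f b2 = Some v) b1

(* All elements whose bundle contains every feature value in [desc]. *)
let natural_class (desc : bundle) : string list =
  system
  |> List.filter (fun (_, b) -> List.for_all (fun fv -> List.mem fv b) desc)
  |> List.map fst

(* Any class containing both x and y can only mention feature values the
   two share, so using all shared values gives the smallest such class.
   Raises Not_found if x or y is not in the system. *)
let smallest_class (x : string) (y : string) : string list =
  natural_class (shared (List.assoc x system) (List.assoc y system))
```

With this toy system, `smallest_class "p" "b"` is `["p"; "b"]` (the non-nasal labials), while `smallest_class "p" "d"` is `["p"; "b"; "t"; "d"]`, since the only value those two share is non-nasal.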
fbfsa
- fbfsa provides an OCaML module for working with finite
state acceptors where the labels on the transitions are feature
bundles instead of individual symbols. It is often easier to write a
complex finite state machine using this featural information, but it
can be more difficult for it to process. This module allows one to
write the machine with feature bundles on the transitions, and then
translate it to an acceptor with individual symbols on the
transitions so it can be run properly with fsm above.
- There is also a command line utility fbfsa which allows one
to run this conversion function from the command line.
- To be posted soon. Email me if you want it right away.
- fbfsa requires OCaML 3.07+, and the xtype, fsm,
and feature libraries.
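The translation step described above can be pictured as follows. This is only a sketch in plain OCaml with hypothetical names (`bundle`, `system`, `matching`, `expand`) and a made-up three-symbol feature system; it does not reflect how fbfsa itself represents machines or performs the conversion.

```ocaml
(* Hypothetical toy data for illustration only;
   NOT the fbfsa file format or API. *)
type bundle = (string * bool) list

let system : (string * bundle) list = [
  "a", ["syllabic", true;  "voice", true];
  "t", ["syllabic", false; "voice", false];
  "d", ["syllabic", false; "voice", true];
]

(* Symbols whose full specification is compatible with a partial bundle. *)
let matching (label : bundle) : string list =
  system
  |> List.filter (fun (_, b) -> List.for_all (fun fv -> List.mem fv b) label)
  |> List.map fst

(* Replace each bundle-labeled transition (source, bundle, target) with
   one symbol-labeled transition per matching symbol. *)
let expand (trans : (int * bundle * int) list) : (int * string * int) list =
  List.concat
    (List.map
       (fun (q, label, q') -> List.map (fun s -> (q, s, q')) (matching label))
       trans)
```

A two-transition machine written with bundles, `[(0, ["syllabic", false], 1); (1, ["syllabic", true], 0)]`, expands to `[(0, "t", 1); (0, "d", 1); (1, "a", 0)]`. Note that if two bundles leaving the same state overlap, the expanded machine is nondeterministic.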
ssq
- ssq provides a command line utility ssq which
allows one to find subsequences of many kinds in data sets. Contiguous
subsequences, discontiguous subsequences, and subsequences with
particular amounts of contiguity and discontiguity can all be found.
Options are available for counting the subsequences, normalizing
counts (i.e. counting at most one per word), and including word
boundary symbols (or not).
- Documentation
- Obtain ssq here: ssq.tar.gz
- ssq requires OCaML 3.07+, and the xtype library.
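The difference between contiguous and discontiguous subsequences, and between raw and per-word counts, can be sketched in plain OCaml as below. None of this is ssq's code or its command line interface: the function names are hypothetical, `#` is just an assumed boundary symbol, words are treated as strings of single-character segments, and the sketch assumes a length `k` of at least 1 and OCaml 4.06 or later (for `List.init`). It does not cover subsequences with mixed amounts of contiguity.

```ocaml
(* Hypothetical helpers for illustration only; NOT the ssq implementation. *)

(* Split a word into its segments (here: single characters). *)
let explode (s : string) : char list =
  List.init (String.length s) (String.get s)

(* Optionally mark word edges; '#' is an assumed boundary symbol. *)
let with_boundaries (w : string) : string = "#" ^ w ^ "#"

(* First n items of a list, or None if the list is too short. *)
let rec take n l =
  if n = 0 then Some []
  else
    match l with
    | [] -> None
    | x :: xs ->
        (match take (n - 1) xs with
         | Some r -> Some (x :: r)
         | None -> None)

(* Contiguous subsequences of length k, one per starting position. *)
let rec contiguous k w =
  match take k w, w with
  | Some g, _ :: rest -> g :: contiguous k rest
  | _ -> []

(* All order-preserving subsequences of length k, contiguous or not. *)
let rec subsequences k w =
  if k = 0 then [ [] ]
  else
    match w with
    | [] -> []
    | x :: xs ->
        List.map (fun r -> x :: r) (subsequences (k - 1) xs) @ subsequences k xs

(* Count what [extract] finds across a word list. With ~per_word:true,
   each distinct subsequence is counted at most once per word. *)
let count ~per_word extract words =
  let tbl = Hashtbl.create 16 in
  List.iter
    (fun w ->
      let found = extract (explode w) in
      let found = if per_word then List.sort_uniq compare found else found in
      List.iter
        (fun s ->
          let n = try Hashtbl.find tbl s with Not_found -> 0 in
          Hashtbl.replace tbl s (n + 1))
        found)
    words;
  tbl
```

For the word list `["abab"; "ab"]`, `count ~per_word:false (contiguous 2)` records `ab` three times, whereas `count ~per_word:true (contiguous 2)` records it twice, once for each word containing it.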
Useful OCaML Links
Last modified: Tue Sep 29 14:38:13 EDT 2009