*|***************************************************************************|*
*|                                                                           |*
*|   Program: phred                                                          |*
*|   Version: 0.961028                                                       |*
*|                                                                           |*
*|   Copyright (C) 1993-1996 by Phil Green and Brent Ewing.                  |*
*|   All rights reserved.                                                    |*
*|                                                                           |*
*|   This software is a beta-test version of the phred package.              |*
*|   It should not be redistributed or used for any commercial               |*
*|   purpose, including commercially funded sequencing, without              |*
*|   written permission from the author and the University of                |*
*|   Washington.                                                             |*
*|                                                                           |*
*|   This software is provided ``AS IS'' and any express or                  |*
*|   implied warranties, including, but not limited to, the                  |*
*|   implied warranties of merchantability and fitness for a                 |*
*|   particular purpose, are disclaimed.  In no event shall                  |*
*|   the authors or the University of Washington be liable for               |*
*|   any direct, indirect, incidental, special, exemplary, or                |*
*|   consequential damages (including, but not limited to,                   |*
*|   procurement of substitute goods or services; loss of use,               |*
*|   data, or profits; or business interruption) however caused              |*
*|   and on any theory of liability, whether in contract, strict             |*
*|   liability, or tort (including negligence or otherwise)                  |*
*|   arising in any way out of the use of this software, even                |*
*|   if advised of the possibility of such damage.                           |*
*|                                                                           |*
*|   Portions of the code benefit from ideas due to Dave Ficenec,            |*
*|   LaDeana Hillier, Mike Wendl, and Tim Gleeson.  These are                |*
*|   indicated in the relevant source files.                                 |*
*|                                                                           |*
*|***************************************************************************|*


PHRED Documentation
-------------------

1. Introduction.

   Phred reads DNA sequencer trace data, calls bases, assigns quality
   values to the bases, and writes the base calls and quality values to
   output files.  Phred can read trace data from SCF files and ABI model
   373 and 377 DNA sequencer chromat files, automatically detecting the
   file format.  It automatically uncompresses compressed data files too.
   After calling bases, phred writes the sequences to files in either
   FASTA format, the format suitable for XBAP, PHD format, or the SCF
   format.  Quality values for the bases are written to FASTA format
   files or PHD files, which can be used by the phrap sequence assembly
   program in order to increase the accuracy of the assembled sequence.


2. Acknowledgements.

   Phred benefits from ideas developed by LaDeana Hillier, Mike Wendl,
   Dave Ficenec, Tim Gleeson, and Alan Blanchard.
   

3. Algorithms.

   Phred uses simple Fourier methods to examine the four base traces in
   the region surrounding each point in the data set in order to predict
   a series of evenly spaced peak locations.  That is, it determines
   where the peaks would be centered if there were no compressions,
   dropouts, or other factors shifting the peaks from their "true"
   locations.
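   The following Python sketch illustrates the idea behind the predicted
   locations under simplifying assumptions.  It applies one discrete
   Fourier transform to the sum of the four traces over a whole window,
   picks the strongest periodic component within an assumed range of
   peak spacings, and lays out evenly spaced peak centers with that
   period and phase.  Phred examines the region surrounding each point,
   so its predicted spacing can change along the read; this global
   version, the function name, and the period limits are hypothetical.

```python
import cmath
import math


def predicted_locations(traces, min_period=4.0, max_period=30.0):
    """Return evenly spaced predicted peak centers for one window of data.

    traces: four equal-length lists of trace amplitudes (A, C, G, T).
    min_period, max_period: assumed bounds on peak spacing, in samples.
    """
    n = len(traces[0])
    # Sum the four traces so that every base peak contributes to the rhythm.
    total = [sum(trace[i] for trace in traces) for i in range(n)]
    mean = sum(total) / n
    signal = [x - mean for x in total]

    # Keep the DFT component with the largest magnitude in the allowed range.
    best_k, best_coef = None, 0j
    for k in range(1, n // 2 + 1):
        if not (min_period <= n / k <= max_period):
            continue
        coef = sum(signal[i] * cmath.exp(-2j * math.pi * k * i / n)
                   for i in range(n))
        if best_k is None or abs(coef) > abs(best_coef):
            best_k, best_coef = k, coef
    if best_k is None:
        raise ValueError("no frequency within the allowed period range")

    period = n / best_k
    # A component A*cos(2*pi*k*i/n + phase) peaks where
    # 2*pi*k*i/n + phase is a multiple of 2*pi.
    phase = cmath.phase(best_coef)
    first = (-phase / (2 * math.pi) * period) % period
    locations = []
    x = first
    while x < n:
        locations.append(x)
        x += period
    return locations
```

   With ten peaks spaced 10 samples apart in a 100-sample window, the
   sketch returns ten locations close to 5, 15, ..., 95.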

   Next, phred examines each trace to find the centers of the actual, or
   observed, peaks and the areas of these peaks relative to their
   neighbors.  Because the peaks are detected independently along each
   of the four traces, many peaks overlap.  A dynamic programming
   algorithm then matches the observed peaks detected in the second step
   with the predicted peak locations found in the first step.
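   A minimal sketch of an order-preserving dynamic programming match, in
   the spirit of the description above, follows.  The cost model is an
   assumption chosen for illustration, not phred's: matching an observed
   peak to a predicted location costs their distance in samples, and
   leaving an observed peak or a predicted location unmatched costs a
   fixed penalty.  The function name and penalty values are hypothetical.

```python
def match_peaks(observed, predicted, skip_observed=3.0, skip_predicted=3.0):
    """Match sorted observed peak positions to sorted predicted locations.

    Returns a list of (observed_index, predicted_index) pairs, in order.
    Illustrative costs: |observed - predicted| for a match, and a fixed
    penalty for each observed peak or predicted location left unmatched.
    """
    n, m = len(observed), len(predicted)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    step = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    # Row-major order guarantees cost[i][j] is final before it is extended.
    for i in range(n + 1):
        for j in range(m + 1):
            here = cost[i][j]
            if here == inf:
                continue
            moves = []
            if i < n and j < m:
                moves.append((i + 1, j + 1,
                              here + abs(observed[i] - predicted[j]), "match"))
            if i < n:
                moves.append((i + 1, j, here + skip_observed, "skip_observed"))
            if j < m:
                moves.append((i, j + 1, here + skip_predicted, "skip_predicted"))
            for ni, nj, new_cost, kind in moves:
                if new_cost < cost[ni][nj]:
                    cost[ni][nj] = new_cost
                    step[ni][nj] = kind

    # Trace back from the end to recover the matched pairs.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        kind = step[i][j]
        if kind == "match":
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif kind == "skip_observed":
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return pairs
```

   For observed peaks at 5.2, 9.0, 15.1, and 24.8 and predicted locations
   at 5, 15, and 25, the cheapest match leaves the peak at 9.0 unmatched.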


4. Building and installing.

   The INSTALL file describes the steps for building and installing
   phred.


5. Running phred.

   Phred uses command line options to control input, processing, and
   output.  Each command line option begins with a dash, "-".
   For example, suppose you want to process a group of chromat
   files in the directory "chromat_dir".  For each chromat file you
   want phred to call the bases, append the base calls to a file
   named "seqs_fasta", and append the quality values to another file
   named "qual_fasta".  You would use the command

   $ phred -id chromat_dir -sa seqs_fasta -qa qual_fasta

   Compressed chromat files: phred checks whether the file was
   compressed with either "gzip" or "compress".  If the file was not
   compressed, phred reads and processes it immediately.  If it was
   compressed, phred creates a symbolic link to the compressed file
   in the temporary directory and uncompresses the file into the
   temporary directory without deleting the compressed version.  Phred
   then reads, processes, and deletes the uncompressed file.
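   One common way to recognize the two compressed formats is by their
   leading "magic" bytes: gzip files begin with the bytes 0x1f 0x8b, and
   files written by compress begin with 0x1f 0x9d.  The Python sketch
   below shows that check.  This document does not say how phred itself
   performs the test, so treat the approach and the function names as
   assumptions.

```python
GZIP_MAGIC = b"\x1f\x8b"      # first two bytes of a gzip file
COMPRESS_MAGIC = b"\x1f\x9d"  # first two bytes of a Unix compress (.Z) file


def compression_type(header):
    """Classify a file from its first bytes: "gzip", "compress", or None."""
    if header[:2] == GZIP_MAGIC:
        return "gzip"
    if header[:2] == COMPRESS_MAGIC:
        return "compress"
    return None


def file_compression_type(path):
    """Read the first two bytes of the file at path and classify them."""
    with open(path, "rb") as fh:
        return compression_type(fh.read(2))
```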
   


   The command line options are:

   Input Options
   -------------
   
   -id <directory>              Read and process files in <directory>.

   -if <file>                   Read and process the files listed in the
                                file <file>.  Each line in <file> must
                                specify a valid path to a single input file.

   -zd <directory>              Location of the compression program.  If
                                -zd is omitted, phred uses the current path
                                to find the compression program.

   -zt <directory>              Directory where the chromat is uncompressed.
                                If -zt is omitted, phred uses /usr/tmp.  When
                                phred processes a compressed file, it first
                                creates a symbolic link to the compressed
                                file in this temporary directory before it
                                uncompresses the file and reads it.  It
                                subsequently deletes the symbolic link and
                                the uncompressed file in the temporary
                                directory.


   Processing Options
   ------------------

   -nocall                      Disable phred base calling and set the
                                current sequence to the ABI base calls
                                that are read from the input file.  By
                                default, the current sequence is set
                                to the phred base calls.  This affects
                                the base trimming and output options.

   -trim <enzyme>               Perform sequence trimming on the current
                                sequence.  Bases are trimmed from the start
                                and end of the sequence on the basis of
                                trace quality.  In addition, <enzyme>
                                specifies a base sequence that is used
                                to trim bases off the start of the current
                                sequence.  You can specify a NULL enzyme
                                sequence using empty double quotes, "".
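   The exact trimming rules are not given here, so the following Python
   sketch is only a hypothetical model of the two steps the option
   describes.  It first removes bases from both ends while their quality
   falls below an assumed threshold; then, if a non-empty enzyme
   sequence is given, it searches an assumed window at the start of the
   remaining read and trims through the end of the enzyme site.  The
   threshold, the window size, and the function name are assumptions.

```python
def trim_read(bases, quals, enzyme, min_quality=20, search_window=50):
    """Return (start, end) so that bases[start:end] is the retained read.

    Hypothetical model: drop low-quality bases from both ends, then, if an
    enzyme sequence is given, drop everything up to and including its first
    occurrence near the start of the remaining read.
    """
    start, end = 0, len(bases)
    # Quality trimming from the start and from the end.
    while start < end and quals[start] < min_quality:
        start += 1
    while end > start and quals[end - 1] < min_quality:
        end -= 1
    # Enzyme trimming; an empty string ("") disables this step.
    if enzyme:
        hit = bases.find(enzyme, start, min(end, start + search_window))
        if hit != -1:
            start = hit + len(enzyme)
    return start, end
```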


   Output Options
   --------------

   -st fasta                    Set the output sequence file format
                                to FASTA. (Default.)
   -st xbap                     Set the output sequence file format
                                to XBAP.

   -s                           Write sequence output files with the
                                names obtained by appending ".seq" to
                                the names of the input files, and store
                                them in the directory where phred is
                                running.

   -s <file>                    Write a sequence output file with the
                                name <file>.
                                This option is valid for a single input
                                file only.

   -sd <directory>              Write sequence output files with the
                                names obtained by appending ".seq" to
                                the names of the input files, and write
                                them in the directory <directory>.
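   The naming rule shared by -s, -sd, and the other per-read output
   options can be sketched in Python as follows.  The sketch assumes
   that "the name of the input file" means its base name without the
   directory part, which is consistent with writing the output files in
   the directory where phred is running; the helper is hypothetical.

```python
import os


def output_path(input_path, suffix, directory=None):
    """Build a per-read output file name such as read1.seq or read1.qual.

    directory=None means the directory where phred is running (-s, -q);
    otherwise the file is placed in the given directory (-sd, -qd, ...).
    """
    name = os.path.basename(input_path) + suffix
    return name if directory is None else os.path.join(directory, name)
```

   Changing the suffix to ".qual" or ".poly" gives the names used by the
   corresponding quality and polymorphism options.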

   -sa <file>                   Write a sequence output file in FASTA
                                format with the name <file>.  The
                                file contains the base calls of all the
                                reads processed in this run of phred.
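   A FASTA record is a header line starting with ">" followed by the
   sequence on one or more lines.  The Python sketch below writes one
   record per read, using the input file name as the header as described
   under -raw.  The line width of 50 bases and the function names are
   assumptions, not necessarily what phred uses.

```python
def fasta_record(name, sequence, width=50):
    """Format one read as a FASTA record: a ">" header, then wrapped bases."""
    lines = [">" + name]
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join(lines) + "\n"


def append_fasta(path, name, sequence):
    """Append one record, so that the file collects every read of a run."""
    with open(path, "a") as fh:
        fh.write(fasta_record(name, sequence))
```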

   -qt fasta                    Set the output quality file format
                                to FASTA.  Quality values of trimmed-off
                                bases are set to zero. (Default.)
   -qt xbap                     Set the output quality file format
                                to XBAP.  Quality values of trimmed-off
                                bases are omitted.
   -qt mix                      Set the output quality file format
                                to FASTA.  Quality values are written
                                for all bases, including trimmed-off
                                bases.
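   The three -qt settings differ in how the quality values of trimmed-off
   bases are handled.  The Python sketch below models only that rule; it
   does not reproduce the XBAP file layout.  The function name and the
   convention that bases start..end-1 survive trimming are assumptions.

```python
def quality_values_for_output(quals, start, end, mode="fasta"):
    """Return the quality values to write for one read.

    quals: quality value of every base; bases start..end-1 survive trimming.
    mode: "fasta" zeroes trimmed-off values, "xbap" omits them,
    "mix" writes every value unchanged.
    """
    if mode == "fasta":
        return [q if start <= i < end else 0 for i, q in enumerate(quals)]
    if mode == "xbap":
        return list(quals[start:end])
    if mode == "mix":
        return list(quals)
    raise ValueError("unknown -qt mode: " + mode)
```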

   -q                           Write quality output files with the
                                names obtained by appending ".qual" to
                                the names of the input files, and store
                                them in the directory where phred is
                                running.
                                This option is valid for FASTA format
                                output files only.

   -q <file>                    Write a quality output file with the
                                name <file>.
                                This option is valid for a single input
                                file and a FASTA format output file only.

   -qd <directory>              Write quality output files with the
                                names obtained by appending ".qual" to
                                the names of the input files, and store
                                them in the directory <directory>.

   -qa <file>                   Write a quality output file in FASTA
                                format with the name <file>.  The
                                file contains the quality values of all the
                                reads processed in this run of phred.
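   A quality FASTA record has the same kind of header line as a sequence
   record, followed by one integer quality value per base, separated by
   white space.  The Python sketch below writes such a record; the number
   of values per line and the function name are assumptions.

```python
def quality_fasta_record(name, quals, per_line=25):
    """Format quality values as a FASTA-style record of integers."""
    lines = [">" + name]
    for i in range(0, len(quals), per_line):
        lines.append(" ".join(str(q) for q in quals[i:i + per_line]))
    return "\n".join(lines) + "\n"
```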

   -qr <file>                   Write, in the file <file>, a histogram of
                                the number of high quality bases per read.
                                This is meaningful when phred processes
                                more than one read.
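   This document does not define "high quality" or the histogram bins,
   so the Python sketch below assumes a quality threshold of 20 and bins
   100 bases wide purely for illustration; both values and the function
   name are hypothetical.

```python
from collections import Counter


def high_quality_histogram(reads, threshold=20, bin_width=100):
    """Count reads by their number of high-quality bases.

    reads: one list of quality values per read.
    Returns {bin_start: number_of_reads}; a read with n high-quality bases
    falls in the bin starting at (n // bin_width) * bin_width.
    """
    counts = Counter()
    for quals in reads:
        n_high = sum(1 for q in quals if q >= threshold)
        counts[(n_high // bin_width) * bin_width] += 1
    return dict(sorted(counts.items()))
```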

   -c                           Write SCF files with the trace data,
                                the base calls of the current sequences,
                                and the positions of the base calls.  The
                                SCF files have the names of the input
                                files (phred will refuse to write the SCF
                                file if you ask it to write the SCF file
                                in the directory in which the input file
                                resides).

   -c <file>                    Write an SCF file with the trace data,
                                the base calls of the current sequence,
                                and the positions of the base calls.
                                The SCF file has the name <file>.
                                This option is valid for a single input
                                file only.

   -cd <directory>              Write SCF files with the trace data,
                                the base calls of the current sequences,
                                and the positions of the base calls.
                                The SCF files are written in the directory
                                <directory> and have the same names
                                as the input files.

   -cp <1|2>                    Store SCF trace data as 1- or 2-byte
                                values.  If -cp is omitted, phred uses 1
                                when the maximum trace value is less than
                                256, and 2 otherwise.
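   The default rule for -cp amounts to checking whether every trace value
   fits in one unsigned byte (0 to 255).  A minimal Python sketch follows;
   the function name is hypothetical, and the sketch assumes trace values
   are non-negative integers.

```python
def choose_sample_bytes(traces, requested=None):
    """Return the number of bytes (1 or 2) used per SCF trace sample.

    requested: the -cp value, or None when -cp is omitted.
    """
    if requested is not None:
        if requested not in (1, 2):
            raise ValueError("-cp takes 1 or 2")
        return requested
    peak = max((max(trace) for trace in traces if trace), default=0)
    # One unsigned byte holds 0..255; larger values need two bytes.
    return 1 if peak < 256 else 2
```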

   -p                           Write a PHD file, which is used by the
                                consed editor to display bases.  A PHD
                                file contains a set of comments used by
                                consed for maintaining consistency between
                                the chromat file, the .ace file and
                                the PHD file, and it contains base data
                                as triples consisting of the base call,
                                quality, and position.  Phred always
                                writes the first version of the PHD
                                file for a read, which has the name
                                <name>.phd.1.  When a read is edited
                                using consed, consed writes a new version
                                of the PHD file; for example, the second
                                version has the name <name>.phd.2.  With
                                the -p option, <name> is the name of the
                                input file.
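   The Python sketch below illustrates two details from this description:
   the versioned file name (<name>.phd.1, <name>.phd.2, ...) and base data
   written as triples of base call, quality value, and trace position.
   The comment section that consed relies on is omitted, and the
   one-triple-per-line, space-separated layout and the helper names are
   assumptions.

```python
import re


def phd_filename(name, version=1):
    """Name of a PHD file version, e.g. read1.phd.1 (phred writes version 1)."""
    return f"{name}.phd.{version}"


def next_phd_version(name, existing_files):
    """Next free version number, given the file names already present."""
    pattern = re.compile(re.escape(name) + r"\.phd\.(\d+)$")
    versions = []
    for filename in existing_files:
        match = pattern.match(filename)
        if match:
            versions.append(int(match.group(1)))
    return max(versions, default=0) + 1


def base_triples(bases, quals, positions):
    """One "base quality position" line per called base (layout assumed)."""
    if not len(bases) == len(quals) == len(positions):
        raise ValueError("bases, quals, and positions must have equal length")
    return [f"{b} {q} {p}" for b, q, p in zip(bases, quals, positions)]
```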

   -p <name>                    Write a PHD file with the name <name>.phd.1.
                                This option is valid for a single input
                                file only.

   -pd <directory>              Write PHD files in the directory <directory>.
                                The PHD files have the names <name>.phd.1,
                                where <name> is the name of the input file.

   -d                           Write a data file that is used for detecting
                                polymorphic bases.  The file has the
                                name <name>.poly, where <name> is the
                                name of the input file.  The first line of
                                the file consists of the sequence name, the
                                smallest amplitude normalization factor, and
                                the amplitude normalization factors for the
                                A, C, G, and T traces.  One line for each
                                called base follows the header line.  The
                                information on each line consists of the
                                called base, the position of the called base,
                                the area of the called peak, the relative area
                                of the called peak, the uncalled base, the
                                position of the uncalled base, the area of the
                                uncalled base, the relative area of the
                                uncalled base, and the amplitudes of the four
                                traces at the position of the called base.
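   The Python sketch below reads a .poly file laid out as described
   above.  It assumes that the fields on each line are separated by
   white space and that trace positions are integers; the class and
   function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PolyBase:
    called_base: str
    called_position: int
    called_area: float
    called_relative_area: float
    uncalled_base: str
    uncalled_position: int
    uncalled_area: float
    uncalled_relative_area: float
    amplitudes: Tuple[float, ...]  # A, C, G, T amplitudes at the called base


@dataclass
class PolyFile:
    name: str
    smallest_factor: float
    factors: Tuple[float, ...]  # A, C, G, T amplitude normalization factors
    bases: List[PolyBase]


def parse_poly(text):
    """Parse the contents of a .poly file (whitespace-separated fields)."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    header = rows[0]
    bases = []
    for f in rows[1:]:
        bases.append(PolyBase(
            f[0], int(f[1]), float(f[2]), float(f[3]),
            f[4], int(f[5]), float(f[6]), float(f[7]),
            tuple(float(x) for x in f[8:12]),
        ))
    return PolyFile(header[0], float(header[1]),
                    tuple(float(x) for x in header[2:6]), bases)
```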

   -dd <directory>              Write polymorphism data files in the
                                directory <directory>.  The files have the
                                names <name>.poly, where <name> is the name
                                of the input file.
              
   -raw <name>                  Write <name> in the headers of the
                                sequence output file and the quality
                                output file.
                                By default, the name of the input file
                                is written in the headers of these files.
                                This option is valid for a single input
                                file only.

   -log                         Make phred append a log entry describing
                                the processing run in the file "phred.log".



   Miscellaneous
   -------------

   -h, -help                    Display a command line option summary.

   -V                           Display phred version.                                




   Examples
   --------

   If you plan to use phred base calls and base quality information as
   input to the phrap assembly program, run phred as follows.

   Let us say that you want to process the chromat files in subdirectory
   "chromat_dir".  You want phred to write the base calls to a FASTA
   file named "seqs_fasta" and the base quality values to "seqs_fasta.qual".
   In this case you run phred with the options

     $ phred -id chromat_dir -sa seqs_fasta -qa seqs_fasta.qual

   Phred reads and processes each file in the "chromat_dir" directory,
   writing the sequences to "seqs_fasta" and the quality values to
   "seqs_fasta.qual".  We recommend that you not use the trim option.
   Inaccurate bases called near the ends of the traces will not interfere
   with proper assembly.

   Subsequently you should screen out the vector in the sequences in
   "seqs_fasta" using cross_match:

     $ cross_match seqs_fasta vector.seq -minmatch 12 -minscore 20 -screen > screen.out

   This generates the screened sequence file "seqs_fasta.screen".  Then
   rename "seqs_fasta.qual" to "seqs_fasta.screen.qual" with the command

     $ mv seqs_fasta.qual seqs_fasta.screen.qual

   Run phrap to perform the sequence assembly as follows:

     $ phrap seqs_fasta.screen -ace > phrap.out

   Phrap writes the assembled contigs to the file
   "seqs_fasta.screen.contigs" and creates a .ace file that can be used
   to import the assembly into xbap, CONSED, or ace-mbly for editing.

   Refer to the file "phrap.doc", which is part of the phrap distribution,
   for information on cross_match and phrap.
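   If you run these steps often, a small driver script can chain them.
   The Python sketch below builds exactly the commands shown in this
   example and runs them with the standard subprocess module.  The
   function names are hypothetical, and the sketch assumes that phred,
   cross_match, and phrap are on your PATH and that "vector.seq" holds
   your vector sequences.

```python
import os
import subprocess


def pipeline_commands(chromat_dir="chromat_dir", seqs="seqs_fasta",
                      vector="vector.seq"):
    """Return the phred, cross_match, and phrap command lines used above."""
    return [
        ["phred", "-id", chromat_dir, "-sa", seqs, "-qa", seqs + ".qual"],
        ["cross_match", seqs, vector, "-minmatch", "12", "-minscore", "20",
         "-screen"],
        ["phrap", seqs + ".screen", "-ace"],
    ]


def run_pipeline(chromat_dir="chromat_dir", seqs="seqs_fasta",
                 vector="vector.seq"):
    """Run the three programs in order, stopping if any of them fails."""
    phred, cross_match, phrap = pipeline_commands(chromat_dir, seqs, vector)
    subprocess.run(phred, check=True)
    with open("screen.out", "w") as out:
        subprocess.run(cross_match, stdout=out, check=True)
    # Equivalent of: mv seqs_fasta.qual seqs_fasta.screen.qual
    os.replace(seqs + ".qual", seqs + ".screen.qual")
    with open("phrap.out", "w") as out:
        subprocess.run(phrap, stdout=out, check=True)
```

   Call run_pipeline() from the directory that contains "chromat_dir".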
             

End: PHRED.DOC