CONSED 6.0 DOCUMENTATION

If you are a new consed user, skip this section and jump down to
'QUICK TOUR OF CONSED'.

----------------------------------------------------------------------
IF YOU DO NOT USE PHREDPHRAP

This section is just for old consed users who have their own scripts
for running phred and phrap.  For example, some sites have scripts
that check every night for new reads.  If there are any new reads
found, the project is automatically reassembled, so it is ready for
the finishers in the morning.  If you have such a script, you will
have to modify it to take advantage of consed's consensus tags.

Consensus tags are tied to a particular assembly since they refer
to a particular contig and a particular consensus position.  For
example, there may be a consensus tag on Contig2 position 20.  On
subsequent reassembly, all reads may be assembled into a single
contig, Contig1.  Thus in the subsequent assembly, there is no such
thing as Contig2 position 20.  The script transferConsensusTags.perl
handles this by figuring out the correspondence between each old
assembly contig and position and each new contig and position.
transferConsensusTags.perl will then use this correspondence to
transfer all consensus tags from the old assembly to the new assembly.

Consensus tags are stored in the ace file.  Thus
transferConsensusTags.perl must be told the name of the old ace file
and the name of the new ace file.  There are 2 methods currently in
use at beta sites:

Method 1:  Before reassembling, check which ace file is most recent
and rename it to 

(project).screen.fasta.ace_saved_for_transfer_tags

Then reassemble.  Then run

transferConsensusTags.perl (project).screen.fasta.ace_saved_for_transfer_tags (project).screen.fasta.ace.1

Thereafter the transferred consensus tags will be in the new ace file
(project).screen.fasta.ace.1

Method 2:  Before reassembling, rename edit_dir to edit_dir.backup  
    Create a new edit_dir  
    Reassemble in edit_dir
    In edit_dir, run:

    transferConsensusTags.perl ../edit_dir.backup/(project).screen.fasta.ace.2 (project).screen.fasta.ace.1

Both of these methods (and many others) will work.
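
As a rough illustration of Method 1, the renaming step could be
scripted along the following lines.  This is only a sketch under
stated assumptions: the helper names newest_ace and
save_ace_for_transfer are made up for this example, "most recent" is
taken to mean most recently modified, and the reassembly command is
whatever your own script already runs.

```shell
# Hypothetical helpers for Method 1 (names are illustrative, not part of consed).

# Print the most recently modified ace file for project $1, or nothing.
newest_ace() {
    # ls -t sorts newest first; errors from an unmatched pattern are hidden
    ls -t "$1".screen.fasta.ace.* 2>/dev/null | head -n 1
}

# Rename the newest ace file so transferConsensusTags.perl can read it later.
save_ace_for_transfer() {
    newest=$(newest_ace "$1")
    [ -n "$newest" ] || return 1      # no ace file found, nothing to save
    mv "$newest" "$1.screen.fasta.ace_saved_for_transfer_tags"
}

# Intended order of operations inside edit_dir (left as comments because
# the reassembly command is site-specific):
#   save_ace_for_transfer (project)
#   ...reassemble with your own phred/phrap script...
#   transferConsensusTags.perl \
#       (project).screen.fasta.ace_saved_for_transfer_tags \
#       (project).screen.fasta.ace.1
```

The functions only touch file names, so they can be tried in a scratch
directory with empty files before being added to a nightly script.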

When you have made these changes, please test them!  Open an assembly,
add a consensus tag (see below), reassemble using your script, open
the new assembly, and check that the consensus tags are on the same
bases.

There is a possibility that bases in the old ace file have *no*
corresponding bases in the new ace file.  This would occur, for
example, if the low quality tail of a contig were tagged in the old
ace file.  Then with reassembly, let's suppose that reads were added
that made that tail high quality and significantly changed the
consensus in that location.  In this case, transferConsensusTags.perl
would fail to find the corresponding location in the new assembly, and
would log that failure to a file with extension ".err".
transferConsensusTags.perl always creates this file--if it is of 0
length, that means all tags transferred with no problem.  

If you have a script to automatically reassemble, you might want to
alert someone (such as by sending an email message) if the ".err" file
is not of zero length.
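
A check of this kind might look like the sketch below.  The function
name check_transfer_errors is hypothetical, and the path of the ".err"
file is passed in as an argument because its exact name depends on
your ace file.  How you alert someone (mail, pager, log) is left to
your own script.

```shell
# Hypothetical check: succeed if the .err file is empty, fail (and print
# a warning) if transferConsensusTags.perl logged any failures to it.
check_transfer_errors() {
    err_file=$1
    if [ -s "$err_file" ]; then       # -s is true for a file of non-zero length
        echo "WARNING: some consensus tags were not transferred; see $err_file"
        return 1
    fi
    return 0
}

# Example use in a nightly script (the alert command is up to you):
#   check_transfer_errors "$err_file" || ...send your alert here...
```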

If you use the script phredPhrap supplied with this distribution for
reassembling, you don't need to worry about such issues.  phredPhrap
is made to be run by a person from the command line.


--------------------------------------------------------------------------

QUICK TOUR OF CONSED


Release 6.0

Consed is a program for viewing and editing assemblies assembled with
the phrap assembly program.

To follow this Quick Tour will take you less than 1 hour.  However, it
will save you approximately 2 days of agony.  If you have lots of
excess time, and you prefer to waste 2 days in agony, then skip down
to "USING YOUR OWN DATA" below and do not do the quick tour.

Here is a quick tour of consed:

After unpacking all the files, including the 'standard' test data set
and consed:

1)  cd to standard/edit_dir

2)  start consed by typing one of the following:

(the one you should type depends on which executable you downloaded)

../../consed_sunos
../../consed_alpha
../../consed_solaris
../../consed_hp
../../consed_sgi


(If you are an SGI user, see the note at the bottom of this file.)

Three windows should pop up.  One of these will have the list of .ace
files and say 'select assembly file to open'.  Double click on
standard.fasta.screen.ace.1.  The 'select assembly file' window will
disappear and the window behind it will have a list of contigs and a
list of fragments.  I will call this 'the main consed window' in the rest
of this tutorial.  There should be only one contig in this example,
labelled 'Contig1'.  Double click on it.  Then the aligned reads
window should have some bases appear in it.

3) Try scrolling back and forth.  Try scrolling by dragging the thumb
of the scrollbar.  Also try scrolling by clicking on the 4 << < > >>
buttons for scrolling by small amounts.  For scrolling by tiny
amounts, click on the arrows at either end of the scrollbar.  For
scrolling by huge amounts, use the middle mouse button and just click
on some location on the scrollbar.

Notice the colors.  The bases that are in red are the ones that
disagree with the consensus.

Notice the different shades of background colors (around the bases).
They have the following meanings, but first, you need to understand
the meaning of the quality values.

A quality value of 10 means 1 error in 10**1.0 (ten to the 1.0 power)
A quality value of 20 means 1 error in 10**2.0
A quality value of 30 means 1 error in 10**3.0
A quality value of 40 means 1 error in 10**4.0

and for quality values in between:

A quality value of 25 means 1 error in 10**2.5

Get the idea?
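
If you want to check the arithmetic yourself, the relationship above
(1 error in 10**(quality/10) bases) can be computed with awk.  The
function name bases_per_error is just a label chosen for this example.

```shell
# Print N, rounded to a whole number, such that a base with quality $1
# has about 1 chance in N of being wrong: N = 10 ** (quality / 10).
bases_per_error() {
    awk -v q="$1" 'BEGIN { printf "%.0f\n", 10 ^ (q / 10) }'
}

bases_per_error 20    # prints 100
bases_per_error 25    # prints 316
```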

(These have actually been empirically verified--if you are interested
in the gory details, read the phred papers:

Ewing B, Hillier L, Wendl M, Green P: Basecalling of automated
sequencer traces using phred. I. Accuracy assessment.  Genome Research
8, 175-185 (1998).

Ewing B, Green P: Basecalling of automated sequencer traces using
phred. II. Error probabilities.  Genome Research 8, 186-194 (1998).

In that same copy of the journal is a paper about consed, as well.)


These quality values are shown in grey scales:

Quality 0 through 4 is given by dark grey
Quality 5 through 9 is given by a shade lighter
Quality 10 through 14 is given by a shade still lighter
.
.
.
Quality of 40 through 97 is given by white (the brightest shade)

A quality value of 99 is reserved for bases that have been edited and
the user is absolutely sure of the base (high quality edit).

A quality value of 98 is reserved for bases that have been edited and
the user is not sure of the base (low quality edit).
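
The banding described above can be summarized in a few lines of
shell.  This is a sketch only, not consed's own code: it assumes the
elided steps continue in bands of 5 up to quality 39, and the shade
numbers (0 for the darkest grey, 8 for white) are labels invented for
this example.

```shell
# Map a quality value to the display band described in the text.
# Assumption: bands of 5 between quality 0 and 39; shade numbers are
# illustrative labels, not values used by consed.
quality_band() {
    q=$1
    if [ "$q" -eq 99 ]; then
        echo "high quality edit"
    elif [ "$q" -eq 98 ]; then
        echo "low quality edit"
    elif [ "$q" -ge 40 ]; then
        echo "shade 8 (white)"
    else
        echo "shade $((q / 5))"       # integer division: 0-4 -> 0, 5-9 -> 1, ...
    fi
}
```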

The ends of reads show bases that are grey and have a black
background.  These are the low quality ends of reads or the unaligned
ends of reads, as determined by phrap.

4)  Click on a read base.  You will see the numeric value of the
quality shown in the xterm.  Click on the consensus base.  You will
similarly see its quality.  There are situations in which you really
want to see the numeric value of the quality.

5)  Put location 510 in the middle of the aligned reads window.  On
the main consed window, click on Options/General Preferences.  When the
General Preferences window pops up, click on 'Dim Low Quality Ends Of
Reads' False and 'Dim Unaligned Ends of Reads' False.  Then click on
'Apply'.  Now look back in the aligned reads window.  You should see
the ends of reads that were black now appear grey with red.  You are
seeing the clipped-off bases with all the same information as any
other base.  Since there is a huge amount of red (discrepant) bases,
the screen becomes distracting and busy.  Thus by default the low
quality clipped off bases are made with a black background and a grey
foreground so they don't distract you.  

On the Options/General Preferences window, change both 'Dim Low
Quality Ends of Reads' and 'Dim Unaligned Ends of Reads' back to True,
and click Apply & Dismiss.  We'll keep the ends dim for the rest of
the tour.

(Notice there is a distinction here between 'low quality ends of
reads' and 'unaligned ends of reads'.  For now, there is no
difference.  However, with a future version of phrap, there will be an
important difference.)

Now go to the menu labelled 'color', and pulldown and release on
'color means match'.  

Now you will notice different colors.  The colors have the following
meanings:

    Blue:   agrees with consensus
    Orange: disagrees with consensus
    Yellow: this stretch of this read was used to form the consensus
    Grey:   Low quality or unaligned ends of reads 

Now go back to the colormode 'color means quality and tags' (the
default) for the next exercise. 



TRACES AND EDITING

6) Put the cursor on the bases of one of the reads and click mouse
button 2.  The traces for that stretch of that read should popup.
There are 4 rows of bases in the trace window: the consensus "con",
the edited fragment bases "edt", the phred called bases "phd", and the
ABI called bases "ABI".  Notice that a red cursor blinks in
corresponding positions in the two windows.

7) Try editing in the trace window.  You can click the cursor on the
"edt" line and directly overstrike a base.  Try this.  Try undoing it
(by clicking on 'undo' ). 

You can insert a column of pads by pushing the space bar.  Try this.

(For those of you new to editing assemblies, a 'pad', which in consed
and phrap is represented by the '*' character, is used to align
two or more sequences such as these:
     gttgacagtaatcta
     gttgacataatcta
in which one sequence has an inserted or deleted base with respect to
the other.  By inserting the pad character, it is possible to get a
good alignment: 
     gttgacagtaatcta
     gttgaca*taatcta
This is the purpose of the pad character--it is just a placeholder.)
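
Because a pad is only a placeholder, removing the '*' characters from
a padded sequence gives back the unpadded bases.  A minimal sketch
(the function name strip_pads is made up for this example):

```shell
# Remove pad characters ('*') from a padded sequence.
strip_pads() {
    printf '%s\n' "$1" | tr -d '*'
}

strip_pads 'gttgaca*taatcta'    # prints gttgacataatcta
```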


Try highlighting a stretch of a read by holding down the control key
and mouse button 2 over the 'edt' (edited) read bases.  There will be
a popup which will give you the following choices:

    make high quality--makes the highlighted bases high quality
    change consensus--make the highlighted bases high quality and
        changes the consensus to agree with that stretch of the read
    make low quality--change their quality to the lowest possible
    make low quality to left end--same as above from the highlighted
        region to the left end of the read
    make low quality to right end--same as above from the highlighted
        region to the right end of the read
    change to n's--change the highlighted bases to 'n's and change
        their quality to the lowest possible
    change to n's to left end--same as above from the highlighted
        region to the left end of the read
    change to n's to right end--same as above from the highlighted
        region to the right end of the read
    add comment tag--allows user to add a comment to the stretch of bases
    add tag--allows user to add any tag to the stretch of bases

    dismiss--get rid of this popup

This popup is made so that nothing else works until you choose
something.  (NEVER iconify this box--if you do, nothing will work until
you deiconify it.)  Try each of these choices, except for tags, which 
you'll try below.   

In particular, you should try 'change consensus'.  This can also be
used to extend the consensus on the right, in case phrap did not
accurately find the cloning site.  However, you can't try this
feature with this sample database since there are no reads that extend
past the end of the consensus.  You will probably be able to try this
with your own data.

To delete a base, you can overstrike it with a '*' character.  (Phrap
ignores '*', so this is the same as deleting the character.)  There is
no way to remove the '*' from an assembly except by re-phrapping.  We
believe there should be a visual indication that a base was deleted.

To move the cursor, use the mouse and click on a different base.

HOTKEYS FOR EDITING

8) When you get really fast at editing, you will want to have a faster
method of doing these edits than having the popup and selecting an
option.  Thus the following hot keys exist:

    < and > (less than and greater than) to make n's to the left and
        right of the cursor 
    control-l and control-r to make low quality to the left and right
        of the cursor
    capital letters cause the base to be overstruck in high quality
        rather than low quality

Give these a try.

9)  Now that you have made some edits, try the 3rd colormode 'color means
edited'.  Notice that the bases that you have edited will stand out in
either white or grey (depending on whether the base was made high
quality or low quality).  Observe this both in the trace window and
the aligned reads window.  Return to the 'color means quality and
tags' colormode.

MULTIPLE UNDO

10) In the main consed window, click the 'Undo Edit...' button.  There
will be a popup indicating the most recent edit.  Click 'undo'.  Then
you will see the edit that was done before that.  Click 'undo'.
You can continue if you like.  You now know how to undo more than one
edit.  You cannot choose which edits to undo and which to not
undo--edits can only be undone in precisely reverse order from the 
order you made them.

SCROLLING TRACES

11)  In the aligned reads window, scroll along the contig to a different
point.  Click mouse button 2 on a read whose trace is already up.
Notice that the existing trace is scrolled to the new location.

ALIGNED TRACES

12) Dismiss all of your trace windows.  Then popup traces for 2
different reads in approximately the same location.  Scroll one of
them.  You may want to scroll by clicking the arrows or clicking to
the left or right of the thumb.  You will notice that both will
scroll.  Consed will do its best to keep corresponding peaks lined up.
(Consed can't line all of them up because the peak spacing is not
uniform and differs from read to read.)  You will notice that the
furthest left and right bases in each trace are aligned.  Try removing
a trace.  Try adding other traces.  Then click on 'No' for scrolling
the traces together and try scrolling.  You will now observe that they
scroll separately.

MULTIPLE TRACE POPUP

13) Dismiss the trace window.  In the aligned reads window, scroll to a
region that has many reads and that has some discrepancies--try
position 921.  Click with mouse button 2, but this time click on the
consensus.  At this location 3 traces will pop up--these are the 2
highest quality traces that agree with the consensus (one on each
strand) and the highest quality trace that disagrees with the
consensus.  This
feature is useful in areas of high coverage when you want to rapidly
just examine the most significant traces rather than looking at all of
them.

MAXIMUM NUMBER OF TRACES DISPLAYED

14) Try bringing up some other traces that aren't displayed, such as
K26-217c and then K26-526t.  You will notice that new reads are put at
the top of the stack of traces and, once there are 4 traces displayed,
traces are removed from the bottom of the stack.  If you want to
change this maximum number of traces to something besides 4, you can
do that: In the main consed window, pull down the "Options" menu,
release on General Preferences.  Try changing the "Max Number of
Traces Shown" to 5.  Then click 'OK'.  Now try adding additional traces
to the trace window.  You will notice that now the number of traces
shown will not exceed 5.


NAVIGATE

For this exercise, start in 'color means edited' mode.  (Put the cursor
on 'color', hold down mouse button 1, pull the cursor down to 'Color
Means Edited', and release mouse button 1.)  Bring up a few traces and
make some edits (see above for how to do that).

15) Go back to the aligned reads window.  Put the cursor on 'Navigate',
hold down mouse button 1, pull the cursor to 'Edits' and release the
mouse button.  Click on the 'Next' button repeatedly to go, one at a
time, to each place you edited.

16) Dismiss the navigate window.  Switch to 'color means quality and
tags'.  (Put the cursor on 'Color', hold down mouse button 1, pull the
cursor down to 'Color Means Quality and Tags', and release mouse
button 1.)  Put the cursor on 'Navigate', hold down mouse button 1,
pull the cursor to 'Low consensus quality', and release the mouse
button.  Click on 'Next' button repeatedly to go to the next low
quality consensus position.  This saves you from having to look
through large amounts of high quality data trying to find problem
areas.

You may want to click on the 'save' button to save to a file a copy of
this list of problem areas as you work through them.

In our experience, this will be the most important navigate list you
will use.  In fact, finishing consists mainly of adding reads and
rephrapping until this list is reduced to nothing.

17) Dismiss the navigate window.  Now put the cursor on 'Navigate',
hold down mouse button 1, pull the cursor to 'High quality
discrepancy', and release the mouse button.  You will notice there are
no entries (unless you created some yourself by editing).  That is
because there are no high quality discrepancies with this dataset.  So
let's force there to be some by lowering the quality threshold.
First, dismiss the old 'high quality discrepancy' window.

Go to the main consed window, pulldown the 'Options' menu and release
on 'General Preferences'.  Notice that the default for 'Threshold for
Navigate/High Quality Discrepancy' is 40.  Change it to 20 and click
'OK'.

Then follow the steps above to recreate it.  Now you
will see several entries.  Click 'next' repeatedly to go successively
to the next high quality discrepancy.

Dismiss the navigate window.

GOTO POSITION

18)  In the aligned reads window, click in the 'Pos:' box in the upper
right-hand corner.  Type in a number, such as 540, and push 'Return'.
The aligned reads window will scroll to position 540.  We find this
feature is particularly useful when one person wants another person to
look at something in the sequence.


COMPLEMENTING THE CONTIG


19)  Push 'Comp Contig' in the aligned reads window to complement the
contig.  Push it again to uncomplement it.


SEARCH FOR STRING

20) Try the 'Search For String' button on the main window.  Type in a
string (such as aaaca), and click 'ok'.  There should be a list of
'hits'.  Double click on one of the hits (or single click on it and
click on 'go'.)  Notice that the Aligned Reads Window scrolls to that
position and has the cursor on the found string.  (It might be
complemented.)

Dismiss this window.  Try this again, only this time select 'Search
Just Reads'.  This is searching the file standard.fasta.screen which
was created by crossmatch and is used as input to phrap.

COPY AND PASTE

21) Now try the following: In the aligned reads window, swipe some
bases by holding down the left mouse button.  You should see the bases
turn yellow, at least temporarily.  Then click the 'Search for String'
button on the contig list window.  Use mouse button 2 to paste the
bases you have just swiped into the 'Query string:' box.  Notice that
you can swipe bases either from the consensus or from a read.

The search for string is case-insensitive so don't worry about the
pasting being upper or lowercase.


SAVING THE ASSEMBLY

22) To save the assembly, just click on the 'save assembly' button in
any aligned reads window (it will save all contigs).  When the dialog
box comes up with a suggested name for the new ace file, I suggest you
use the one it suggests.  The idea is that the ace files:

(project).fasta.screen.ace.1
(project).fasta.screen.ace.2
(project).fasta.screen.ace.3
(project).fasta.screen.ace.4
(project).fasta.screen.ace.5

are in order of how old they are.  If you feel you are taking up too
much disk space, then start deleting the ace files starting at the
oldest.  I do not recommend that you overwrite existing ace files.
The version numbers just keep growing, and that is not a problem.
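
Consed suggests the next file name for you, so the following is only
relevant if you write your own scripts around the ace files.  It
sketches how a script might work out the next version number; the
function name next_ace_name is hypothetical, and the sketch assumes
the naming pattern shown above.

```shell
# Print the name the next ace file would get for project $1, i.e. one
# more than the highest existing numeric version.
next_ace_name() {
    project=$1
    max=0
    for f in "$project".fasta.screen.ace.*; do
        n=${f##*.}                        # text after the last dot
        case $n in
            ''|*[!0-9]*) continue ;;      # skip anything that is not a number
        esac
        if [ "$n" -gt "$max" ]; then
            max=$n
        fi
    done
    echo "$project.fasta.screen.ace.$((max + 1))"
}
```

Comparing the versions as numbers matters once you pass version 9: a
plain alphabetical listing would sort .ace.10 before .ace.2.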


RECOVERY FROM CRASHES

23)  It is important to feel that your data is safe, even if the
computer (or consed) were to crash.  Consed will recover your data
from such a crash. 

Make an edit and jot down its location.  Then simulate a crash by
going to the xterm where you started consed and typing ^C.  Restart
consed and select the same ace file you used before
(standard.fasta.screen.ace.1).  consed will tell you that there have
been edits since that ace file and ask whether you want to apply those
edits.  Answer 'yes', and the edits will be applied.  (This is similar
to the edit/recover feature on the old VMS operating system, if you
remember that.)

This is the purpose of the .wrk files--they are a log file of your
edits and they are added to as you make edits.

24) You should save your edits by clicking on the 'save assembly'
button on the aligned reads window.

COMPARE CONTIGS

25) Now you will see the 'compare contigs' feature.  Click on the
consensus in the aligned reads window to set the cursor.  Then click
'compare contigs'.  A large alignment window will come up.  Then,
still in the aligned reads window, scroll to some other location
within the contig and click on it to set the cursor somewhere else.
Then click 'compare contigs' again.  Now turn your attention to the
alignment window.  Try scrolling the two contigs by each other.  You
can click on each contig to set a cursor on each of them.  Set the
cursor on base 320 on the top and 390 on the bottom.  Then click
'align' to perform an alignment of the two contigs with the two cursor
locations pinned together.  You can try changing the cursor locations
and then clicking 'align' again.

Now set a cursor on a base in the alignment--the bottom half of this
window.  The 'scroll top contig' scrolls the corresponding aligned
reads window (the one with the consensus and all the reads) to the
corresponding position.  The 'scroll bottom contig' does the same with
the bottom contig.  Experiment with this.

This is one method of exploring joins of contigs that were not made by
phrap.  Another method is to use phrapview, supplied with phrap.
phrapview gives a high level view of all internal joins while "compare
contigs" shows the alignment of a single internal join.  Some users
have found them to work well together--phrapview to find a join and,
having found it, "compare contigs" to examine it in more detail.


TAGS

26) Bring up a trace for a read (as above).  Swipe some bases with the
middle mouse button while holding the control key down (as above).  Choose
'Add Tag'.  You will see a list of tag types that you can assign to
the highlighted bases.  Be sure that you notice that you can scroll
this list down so you can see the type "significantDiscrepancy".  Try
adding various tags.  Notice the different colors for the different
tag types.

Also try 'Add Comment Tag' which is the same except it allows you to
enter a multiple-line comment.


If you forget which tag type a particular color means, click on the
tag with right mouse button while holding the control key down.  You
can do this either in the aligned reads window or in the trace window.
(Alternatively, in the aligned reads window, click with the right
mouse button and then click on 'Show Tag Info').  Note that you can
modify the location of the tag in this popup--try this.  (Note that
you must modify the READ position--not the consensus position.)

The following tag types will be read by phrap (when it is implemented
in phrap):

     becomeConsensus
     ignoreMismatches
     ignoreMatches
     significantDiscrepancy

(Until phrap reads these tags, they have no effect.)

27) When you have created a bunch of tags, experiment with the
'navigate by tags'.  On the aligned reads window, choose the
'navigate' menu, item 'tags'.  Pick one of the tag types you have
created, and click 'next' through each of the tag locations.
Experiment with various tag types.


CONSENSUS TAGS

28) In the aligned reads window, swipe a stretch of consensus bases by
holding down the control key and holding down the middle mouse button.
Up will pop a list of tag types.  Click on one of them.  Try it again
somewhere else.  Try it with the tag type being 'comment'.  In this
case, you must enter a comment.  Notice the pretty colors!  If you
forget what a particular color means, you can click on the colored tag
with mouse button 3 while holding down the control key and the
information about the tag will pop up.

PRIMER-PICKING

**** Temporary step ****  After you have completed the
'install vector files' step (below), you should never do this again.

On the main window, click on 'Options'/'Primer Picking Preferences'.
Notice the question "Screen Primers Against Sequences in File?"  Click
on 'False'.  Then click 'ok' and the Primer Picking Preferences box
will pop down.  

**** end of temporary step ****

29) Go to some location near the right end of the contig, say 1180.
Click with mouse button 3 and click on either one of the forward
primer choices (either from subclone template or from clone template).
There will be a selection of primers that pass all of consed's
requirements.  Double click on one of them.  That will cause the
aligned reads window to scroll to show that oligo in context.  Click
on 'Accept Primer'.  Notice that an oligo tag is created for that
primer.  Dismiss the list of primers.

If you are interested in the details of primer-picking, see the
section 'PRIMER PARAMETERS' (below).


AUTOFINISH

30)  Try starting consed by typing:

consed -autofinish -ace standard.fasta.screen.ace.1

Consed will print out a list of primers you should make and reads you
should make from those primers in order to reduce the number of errors
below a target threshold.  This finishing tool is designed to be run
in batch after each assembly.  In a high throughput operation, the
production people can make these reads without anyone using consed to
examine the assembly interactively.  Only when consed -autofinish
cannot help you any longer (either it reduces the number of expected
errors below your error threshold, or it says it can't help you
further), must you bring up consed interactively and examine the
assembly.

Current restrictions: a) it just suggests custom primers and these are
assumed to be sequenced directly off the clone (not subclone)
template.  b) consed -autofinish must be run either on a monitor, or,
if run as part of a cron job, the cron job must setenv DISPLAY xxxxx
where xxxxx is some display that the cron job has access to.
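
The 'setenv' above is csh syntax.  If your cron job runs under sh
instead, the equivalent might look like the sketch below.  The wrapper
name run_autofinish is made up for this example, and the display is
passed in as an argument since only you know which display the cron
job has access to.

```shell
# Hypothetical wrapper for running autofinish from a batch (sh) script.
#   $1 = ace file to analyze
#   $2 = X display the job is allowed to use
run_autofinish() {
    ace_file=$1
    DISPLAY=$2
    export DISPLAY                    # sh equivalent of csh's 'setenv DISPLAY ...'
    consed -autofinish -ace "$ace_file"
}

# Example call from a nightly script:
#   run_autofinish standard.fasta.screen.ace.1 "$your_display"
```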





HIGHLIGHTING READ NAMES

31) In the aligned reads window, click on a read name with mouse
button 1.  The name will turn magenta.  Click again and it will turn
yellow again.  Try turning it magenta and then scrolling large
distances.  If you want to follow a particular read, this helps you
keep track of it as you scroll.

INCREMENTAL SEARCH FOR READ NAME

32) Restart consed.  Instead of clicking on a read or contig name,
type a read name into the 'Find read:' box.  Try typing "K26-5".  You
will notice that as you type each letter, the first item in the list
that matches the letters typed will be highlighted.  Experiment with
deleting a few letters and typing others.  This is a powerful method
of quickly getting to the read name you are interested in.  When you
get to the read you want, just type carriage return or click the 'OK'
button.


ONLINE DOCUMENTATION

33)  On the aligned reads window, click on the 'help' menu, 'show
documentation' item.  You will see this document.




At this point you've seen consed's current capabilities.  Now you will
want to try it on your own data.


----------------------------------------------------------------------------

USING YOUR OWN DATA AND INSTALLING CONSED

The next few steps will probably require the assistance of someone
with root access.


34) Put consed in /usr/local/genome/bin (or wherever you like to keep
consed).

35) Build phd2fasta.  The sources and a makefile for this program are
supplied with the consed distribution, in the subdirectory misc

36) Put the following files in /usr/local/genome/bin
They will also be found in subdirectory misc

phredPhrap
fasta2Phd.perl
phd2Ace.perl
ace2Oligos.perl
transferConsensusTags.perl

37) Get perl

In order to use consed, the ABI chromatograms must be run through a
gauntlet of phred, phd2fasta, crossmatch, transferConsensusTags.perl
and phrap.  In order to simplify this procedure, we have written a
perl script:

    phredPhrap

You MUST use this script if you are going to use consed.  There are
other methods of using phred and phrap to create an ace file, but
consed may not work.  If you go that route, you are on your own.  If
you want to be sure consed works, use phredPhrap.




phredPhrap requires perl, which is freely available from a number of
ftp sites, including those that have standard gnu unix utilities.


These machines had perl via anonymous FTP last time I checked:

        ftp.uu.net                 137.39.1.2 in /languages/perl
        ftp.netlabs.com            192.94.48.152 in /pub/outgoing/perl5.0
        coombs.anu.edu.au          150.203.76.2
        archive.cis.ohio-state.edu 128.146.8.52
        jpl-devvax.jpl.nasa.gov    128.149.1.143
        prep.ai.mit.edu            18.71.0.38 in /pub/gnu
        ftp.cs.ruu.nl              131.211.80.17  (Europe)

or try getting it from the web at:

    http://www.perl.com/perl/info/software.html

(If you don't know about perl, try it--it will save you a
huge amount of time over developing the same utilities in C, awk, or
csh or sh.)  

To work out any problems using phredPhrap, I suggest that you first
try it on a tiny database, such as the test database you were
using above.  Copy standard/* to a new location.  Then delete the
files in phd_dir and in edit_dir.  Then cd to edit_dir, and type:

phredPhrap standard -notags

phredPhrap may need to be edited to reflect where you have put phred,
phrap, phd2fasta, crossmatch, and the vector sequence library.
phredPhrap is very easy to read and modify.  (But keep a backup copy
in case you cause problems.)
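
The copy-and-clean steps described above could be wrapped up roughly
as follows.  This is a sketch, not part of the distribution: the
function name make_scratch_copy and the destination path are
assumptions, and it only removes plain files from phd_dir and
edit_dir.

```shell
# Copy the test data set $1 to a new location $2 and empty the copy's
# phd_dir and edit_dir so phredPhrap starts from the chromatograms only.
make_scratch_copy() {
    src=$1
    dest=$2
    mkdir -p "$dest" || return 1
    cp -R "$src"/. "$dest"/ || return 1
    rm -f "$dest"/phd_dir/* "$dest"/edit_dir/*
}

# Then, as in the text:
#   make_scratch_copy standard /some/scratch/location
#   cd /some/scratch/location/edit_dir
#   phredPhrap standard -notags
```

Working on a copy means the original test data set stays intact if
you want to repeat the quick tour later.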


When you have worked out all the problems with installing phred,
phrap, crossmatch, phd2seqfasta, phd2qualfasta, and phredPhrap, this
should work flawlessly and you should be able to bring up consed on
the newly-created standard.fasta.screen.ace.1

Now run consed on this ace file and add some consensus tags.  Save the
assembly as standard.fasta.screen.ace.2

Then run

phredPhrap standard standard.fasta.screen.ace.2

phredPhrap will create a new ace file:  standard.fasta.screen.ace.3
which will contain the consensus tags transferred from
standard.fasta.screen.ace.2 

Bring up consed on standard.fasta.screen.ace.3 and you will see your
consensus tags.

When you have successfully done that, you are now ready to do the same
with a larger database (such as your own data).


38) Install the cosmid, BAC, M13, (or whatever you use) vector files.

These files should be in fasta format and be named:

primerSubcloneScreen.seq for the subclone (M13, plasmid, or whatever
    you use) vector sequences 
primerCloneScreen.seq for the clone (BAC, cosmid, or whatever you use)
    vector sequences 

These should be put in:
/usr/local/genome/lib/screenLibs

(This location is configurable via X resources, but this is the easiest
place to put them, since it is the default location.)

To check that this works, do step PRIMER PICKING (above), except this
time skip the 'temporary step'.  Thus you will now be using the
full-blown primer picking program that screens against vector
sequence.
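
Before trying primer picking, you may want a quick sanity check that
the two files are in place.  The sketch below is an illustration only;
the function name check_screen_libs is made up, and the "looks like
fasta" test is simply that at least one line starts with '>'.

```shell
# Check that both vector screen files exist, are non-empty, and contain
# at least one fasta header line.  $1 = directory (defaults to the
# standard location named in the text).
check_screen_libs() {
    dir=${1:-/usr/local/genome/lib/screenLibs}
    status=0
    for name in primerSubcloneScreen.seq primerCloneScreen.seq; do
        file=$dir/$name
        if [ ! -s "$file" ]; then
            echo "missing or empty: $file"
            status=1
        elif ! grep -q '^>' "$file"; then
            echo "no fasta header line found in: $file"
            status=1
        fi
    done
    return $status
}
```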


39) Create the following directory structure

Directory structure:
    top level directory
        subdirectory 'chromat_dir'--chromatograms go in here
        subdirectory 'phd_dir'--just create this.
        subdirectory 'edit_dir'--just create this.

If you already have your chromatograms somewhere else, you can make
chromat_dir be a link to wherever you have them.  

The various phrap and crossmatch files will be put into edit_dir.
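
If you set up many projects, the directory structure above can be
created by a small program.  The following Python sketch is only an
illustration and is not part of consed; the function name
create_project_dirs and its arguments are hypothetical.

```python
import os


def create_project_dirs(top_dir, existing_chromats=None):
    """Create chromat_dir, phd_dir, and edit_dir under top_dir.

    If existing_chromats is given, chromat_dir is made a symbolic link
    to that directory instead of a new, empty directory.
    Returns the path of chromat_dir.
    """
    os.makedirs(top_dir, exist_ok=True)
    for name in ("phd_dir", "edit_dir"):
        os.makedirs(os.path.join(top_dir, name), exist_ok=True)

    chromat_dir = os.path.join(top_dir, "chromat_dir")
    if existing_chromats is None:
        os.makedirs(chromat_dir, exist_ok=True)
    elif not os.path.lexists(chromat_dir):
        # Link to the place where the chromatograms already live.
        os.symlink(os.path.abspath(existing_chromats), chromat_dir)
    return chromat_dir
```

For example, create_project_dirs("myproject") makes the three
subdirectories, and create_project_dirs("myproject", "/data/traces")
links chromat_dir to an existing directory (the paths here are made
up for the example).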

40) cd to the edit_dir directory, and run phredPhrap as above.

41) (optional) If you have problems and need to start again, delete all
files from phd_dir and edit_dir.  Then repeat the step above.

42) cd to edit_dir and run consed

You should see a file with the extension .ace.1
Double click on it.

You should see a list of contigs.  

Double click on the one you want to see.

Now you should see a big colorful alignment of your sequences.
Repeat some of the experimenting you did with the test data set above.



----------------------------------------------------------------------------

PRIMER PARAMETERS


On the main window, click on 'Options'/'Primer Picking Preferences'
again.  A great deal of science and experimentation has gone into
setting these defaults and I suggest you do not change them.  However,
I know you will anyway, so now you know where to find them.

This is what they mean (I suggest you skip over this for now):

    PrimersNumberOfBasesToBackupToStartLooking
        Consed is designed for you to put the cursor on the left-most
        (or right-most) edge of a region that you want to cover with a
        new read.  Since the data quality immediately after an oligo
        is not good, you don't want the oligo immediately next to the
        region you want to cover, but rather a little bit back from
        it.  This parameter gives how far back.

    PrimersWindowSizeInLooking
        This is the width of the region in which consed looks for
        primers.  So if PrimersNumberOfBasesToBackupToStartLooking is
        50 and PrimersWindowSizeInLooking is 450, and you are looking
        for a forward primer, then consed will look from 500 bases
        to the left of the cursor up to 50 bases to the left of the
        cursor.  If you are looking for a reverse primer, then consed
        will start looking 50 bases to the right of the cursor and
        continue until 500 bases to the right of the cursor.

    PrimersMinimumLengthOfAPrimer
    PrimersMaximumLengthOfAPrimer
        (just what they sound like)

    PrimersMaxInsertSizeOfASubclone
        When you click on forward or reverse primer/subclone template,
        consed knows that it is all right if it finds a primer that
        has an additional match to somewhere else in the assembly, as
        long as that location is not on the same subclone template you
        intend to use.  Consed uses this parameter to specify the
        range of the search for unacceptable additional matches.

    PrimersMinMeltingTemp
    PrimersMaxMeltingTemp
        Consed uses the nearest-neighbor (with salt concentration
        correction) formula, just as all modern primer picking
        programs do.

    PrimersMaxSelfMatchScore
        In choosing a primer, you don't want the primer to bind to
        itself (form a hairpin) or bind to another copy of itself.  It
        is particularly bad if it binds to another copy at its 3' end.
        This parameter is used in the algorithm that tests this.

    PrimersMaxMatchElsewhereScore
        In choosing a primer, it is important that the primer not
        stick somewhere besides the place you are trying to get a
        read--a "false match".  This can cause a primer to fail even
        if the false match is not perfect.  The worst kind of false
        matches are those that extend to the 3' end of the primer, and
        worse yet if they have a high percentage of G/C matches since
        G and C bind more tightly than A and T.  The algorithm used
        here takes both of these effects into account.  This parameter
        sets the max acceptable false match.

    PrimersMinQuality
        Some primers fail because the primers don't match where they
        are supposed to.  This is because the sequence where the
        primer is supposed to stick isn't accurately known.  Thus it
        is important to be certain of the sequence where the primer is
        chosen from.  This parameter is an indication of this
        certainty--it is the min quality of every base in an
        acceptable primer.

    PrimersMaxLengthOfMononucleotideRepeat
        Folklore says that mononucleotide repeats are bad.  To please
        consed users, I've put this check in.

    Screen Primers Against Sequences in File?  True False
        It is important that the primers not stick to the vector of
        the template.  Thus you must provide consed with two files--a
        file in fasta format of all subclone vectors, and a file in
        fasta format of all clone vectors.  Consed will not accept any
        primer that has a match against the appropriate one of these
        vectors (depending on whether you click in the aligned reads
        window mouse button 3 on forward/reverse primer from subclone
        template or clone template).  A primer that has a false match
        to a vector is rejected if that false match has a score worse
        than PrimersMaxMatchElsewhereScore.
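
To make two of these parameters concrete, here is a small Python
sketch of the search window described under
PrimersWindowSizeInLooking and of the check implied by
PrimersMaxLengthOfMononucleotideRepeat.  It is only an illustration
and is not consed's actual code.  The function names are
hypothetical, and the values 50 and 450 are taken from the example
above, not necessarily from consed's defaults.

```python
def primer_search_window(cursor, forward, backup=50, window=450):
    """Return (start, end) consensus positions of the primer search.

    backup plays the role of PrimersNumberOfBasesToBackupToStartLooking
    and window the role of PrimersWindowSizeInLooking.
    """
    if forward:
        # Forward primer: search to the left of the cursor.
        return (cursor - backup - window, cursor - backup)
    # Reverse primer: search to the right of the cursor.
    return (cursor + backup, cursor + backup + window)


def longest_mononucleotide_run(primer):
    """Return the length of the longest run of one repeated base."""
    longest = 0
    run = 0
    previous = None
    for base in primer.lower():
        run = run + 1 if base == previous else 1
        previous = base
        longest = max(longest, run)
    return longest
```

With the cursor at consensus position 1000, a forward primer is
sought between positions 500 and 950, and a reverse primer between
1050 and 1500.  A candidate primer would be rejected by the repeat
check if longest_mononucleotide_run() exceeded the configured
maximum.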


You can also read about this in the consed paper:

Gordon, D., C. Abajian, and P. Green. 1998. Consed: A graphical tool
for sequence finishing. Genome Research. 8:195-202


----------------------------------------------------------------------------

FOR PROGRAMMERS AND FELLOW TRAVELLERS ONLY



CONSED VERSION

On the command line, type:

consed -v

This is particularly useful to system administrators to make sure that
the latest version is installed on all computers.


CONSED CUSTOMIZATION

Click the "Info" menu on the window with the list of contigs, then
click the 'Show X Resources' menu item.  This shows you what
to put in your ~/.Xdefaults file to change any of the colors to
something you like better.  Note that for changes to .Xdefaults to
take effect, you must do one of the following:

    1) xrdb -remove
        or
    2) xrdb -load ~/.Xdefaults

I only use the former (keep my server empty of resources so that
changes to .Xdefaults take effect for all newly created processes).

Changes in ~/.Xdefaults only affect one user.  If you want to make a
change to affect all consed users on the system, put a file called
'consed' in /usr/lib/X11/app-defaults.  In this file, you must put ALL
resources listed in the 'Show X Resources' list.  Modify the ones you
want different.


COMPRESSING CHROMATOGRAMS

If you are interested in compressing your chromatogram files, go into
chromat_dir and gzip one of the chromatogram files.  Then see the X
resources under 'Info'/'Show X Resources' as described above.  Notice
there is an X resource consed.gunzipFullPath which, by default, is
/usr/local/bin/gunzip.  If your gunzip is not there, you must change
this X resource in your ~/.Xdefaults file.  Put a line that looks like
this:

consed.gunzipFullPath: /usr/local/bin/gunzip

(or whatever your path to gunzip is).

Restart consed and bring up the corresponding trace.  You will notice
no appreciable delay.
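
If you later decide to compress chromatograms from a program rather
than by hand, the following Python sketch does what the gzip command
does: it writes (name).gz and removes the original.  This is only an
illustration, not part of consed; the function name is hypothetical,
and it assumes consed finds the compressed file under the name that
the gzip command would have produced.

```python
import gzip
import os
import shutil


def gzip_chromatogram(path):
    """Compress one chromatogram file the way 'gzip path' would.

    Writes path + '.gz', removes the uncompressed file, and returns
    the name of the compressed file.
    """
    gz_path = path + ".gz"
    with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.remove(path)
    return gz_path
```

Try it on a single file in chromat_dir first, and check that consed
still brings up the trace, before compressing everything.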


CONSED -ACE

Try bringing up consed like this:

consed -ace (name of ace file)

This can be useful if you are going to have consed brought up from
some other program.


NO PHD FILES

Try bringing up consed like this:

consed -nophd

This mode does not allow editing and does not show quality
information, since that information is kept in the phd files.  It
allows you to view an assembly when you have only the ace file and no
phd files or chromatograms.  I neither recommend nor support this
option!


CUSTOM NAVIGATION

You can also create a file that has special locations you want to
examine in consed.  By clicking in the main window on "Navigate/Custom
Navigation", you can read that file in.  Then you can use it to go to
each region, just as you did the low consensus quality regions
(above).  Using this feature, you can write programs that will create
this file, and then use consed to examine these regions.  The format
of the file is quite simple, as follows:

TITLE:  Single Stranded Regions
 
BEGIN_REGION
TYPE: CONSENSUS
CONTIG: Contig1
UNPADDED_CONS_POS: 1 5
COMMENT: This is a comment
END_REGION
 
BEGIN_REGION
TYPE: CONSENSUS
CONTIG: Contig1
UNPADDED_CONS_POS: 20 25
COMMENT: This is comment 2
END_REGION
 
BEGIN_REGION
TYPE: CONSENSUS
CONTIG: Contig1
UNPADDED_CONS_POS: 40 45
COMMENT: This is comment 3
END_REGION
 
BEGIN_REGION
TYPE: READ
CONTIG: Contig1
READ: K26-394c
UNPADDED_CONS_POS: 820 850
COMMENT: This is a comment
END_REGION
 
BEGIN_REGION
TYPE: READ
CONTIG: Contig1
READ: K26-394c
UNPADDED_CONS_POS: 870 880
COMMENT: This is a comment
END_REGION
 
Notice that the first 3 are consensus locations and the last 2 are
locations on a read.  You can have any number of either type of
locations, and can have a mixture of both types of locations.

You can cut/paste the above into a file and try it.
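
Since the file is meant to be written by programs, here is a minimal
Python sketch that produces it.  It is only an illustration based on
the example above; the function names are hypothetical.  It copies
the layout of the example (including the two spaces after TITLE:),
since I have not stated how sensitive consed is to whitespace.

```python
def format_region(contig, start, end, comment, read=None):
    """Return one BEGIN_REGION ... END_REGION block as text.

    If read is given, the region is of TYPE READ; otherwise it is of
    TYPE CONSENSUS.
    """
    lines = ["BEGIN_REGION"]
    lines.append("TYPE: READ" if read else "TYPE: CONSENSUS")
    lines.append("CONTIG: %s" % contig)
    if read:
        lines.append("READ: %s" % read)
    lines.append("UNPADDED_CONS_POS: %d %d" % (start, end))
    lines.append("COMMENT: %s" % comment)
    lines.append("END_REGION")
    return "\n".join(lines) + "\n"


def write_navigation_file(path, title, regions):
    """Write a custom navigation file.

    regions is a list of dicts with the keys contig, start, end,
    comment, and (optionally) read.
    """
    with open(path, "w") as out:
        out.write("TITLE:  %s\n" % title)
        for region in regions:
            out.write("\n")  # blank line between regions, as above
            out.write(format_region(**region))
```

For example, a region given as dict(contig="Contig1", start=820,
end=850, comment="This is a comment", read="K26-394c") produces the
fourth region of the example file above.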


CONTROL OF CONSED FROM SOME OTHER PROGRAM

Consed can be controlled by some other program.  For example, you
might have a program that displays mapping data and you would like the
user to be able to click on a location and have consed come up showing
the bases in that region.  This feature allows a programmer to do
this.


The external program can start up consed as follows:

consed -socket (local port number) -ace (ace filename)

For example,

consed -socket 5432 -ace standard.fasta.screen.ace

After consed finishes starting up (including your answering whether you
want to apply edits), you will see this message in the xterm:

success bind to local port number: 5432

And then you will see a file created by consed in the default
directory called consedSocketLocalPortNumber

This gives the port number of the Berkeley socket that consed has
opened and is listening on.  Thus your program can read this file and
create a connection to the Berkeley socket created by consed.

Once the connection is established, your program can send commands to
consed at that socket indicating to consed which contig to display and
what consensus position to scroll to.  Currently, the only acceptable
command is:

Scroll (contigname) (consensus position)

Just send such a command to the Berkeley socket, and consed will
respond appropriately.
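
As an illustration, a controlling program written in Python might
look like the sketch below.  This is not part of consed, and several
details are my assumptions rather than documented behavior: that the
file consedSocketLocalPortNumber contains the port number as plain
text, that consed is running on the same machine, and that the
command is sent as ASCII text ending in a newline.  The function
names are hypothetical.

```python
import socket


def read_consed_port(path="consedSocketLocalPortNumber"):
    """Read the port number from the file that consed creates.

    Assumes the first whitespace-separated token in the file is the
    port number.
    """
    with open(path) as handle:
        return int(handle.read().split()[0])


def send_scroll(port, contig, position, host="127.0.0.1"):
    """Send one Scroll command to consed and return the text sent.

    Assumes a newline-terminated ASCII command; adjust this if consed
    expects something different.
    """
    command = "Scroll %s %d\n" % (contig, position)
    with socket.create_connection((host, port)) as conn:
        conn.sendall(command.encode("ascii"))
    return command
```

A program would call read_consed_port() once consed has started, and
then call, for example, send_scroll(port, "Contig1", 820) each time
the user clicks on a location.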



AUTOMATIC ORDERING OF OLIGOS

I heard of a finisher who manually ordered 72 oligos.  She had to
cut/paste the bases of each oligo.  That is not only painful, but also
error prone.  I've supplied you a script that you can use to
automatically determine which oligos have been newly requested since
the last order, aggregate them into a single order, and email the
request off.

The script is ace2Oligos.perl.  It takes as parameters the name of an
ace file and the name of the oligo file.  The oligo file is a list of
oligos that have been ordered for that particular project, and looks
like this:

name=G1980A181.1
sequence=ctgcatggctaggga
template=seq from subclone
date=980427 temp=52
 
name=G1980A181.2
sequence=tcttactttctgactttcattt
template=seq from clone
date=980427 temp=50

ace2Oligos.perl finds all oligo tags in the ace file and makes sure
that all of them are in this oligo file.

To automatically order oligos each night, there is an additional
script you will have to write.  I suggest that you run your script
each night under cron and that it do the following:

For each project, it will look for the most recent ace file.  It will
run ace2Oligos.perl on that ace file and direct the oligo file to be
in the parent directory of edit_dir, phd_dir, and chromat_dir for that
project.  Thus there will be one oligos file for each project.  Your
script will run ace2Oligos.perl once for each project.

Then your script would, for each project, look in the oligos file for
new oligos, and aggregate the unordered oligos into a central file,
which it would email to the oligo company.  If it finds any new oligos
in an oligo file, it draws a line at the bottom:

-------------------------------

which indicates that all oligos above it have been ordered.  When this script
looks at this file the next night, it uses this line to determine
whether any additional oligos have been requested since the previous
order.  (The idea of this line came from St Louis.)  Thus the oligos
file tells you which oligos have been ordered and which have not yet
been ordered.
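
The part of your script that finds the not-yet-ordered oligos could
be sketched in Python as follows.  This is only an illustration; the
function names are hypothetical.  It assumes the oligo file looks
like the example above (blank-line-separated records of key=value
fields) and that a line consisting only of dashes marks the end of
the oligos already ordered.

```python
import re

# A field is key=value; a value runs to the next " key=" or end of line,
# so "template=seq from clone" and "date=980427 temp=50" both parse.
FIELD = re.compile(r"(\w+)=(.*?)(?=\s+\w+=|$)")


def is_order_line(line):
    """True if the line consists only of dashes."""
    stripped = line.strip()
    return len(stripped) > 0 and set(stripped) == {"-"}


def parse_oligos(lines):
    """Turn the lines of an oligo file into a list of dicts."""
    records = []
    current = {}
    for line in lines:
        line = line.strip()
        if not line or is_order_line(line):
            if current:
                records.append(current)
                current = {}
            continue
        current.update(FIELD.findall(line))
    if current:
        records.append(current)
    return records


def unordered_oligos(text):
    """Return the oligos that follow the last line of dashes."""
    lines = text.splitlines()
    last_marker = -1
    for index, line in enumerate(lines):
        if is_order_line(line):
            last_marker = index
    return parse_oligos(lines[last_marker + 1:])
```

If the file has no line of dashes yet, unordered_oligos() returns
every oligo in the file.  After emailing the order, your script
would append a line of dashes to the oligo file.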



----------------------------------------------------------------------------

ADVANCED PHRAP/CONSED USAGE



BACKING OUT EDITS AFTER YOU HAVE SAVED THE ASSEMBLY

If you decide that all your edits are terrible and you want to start
over (perhaps you have been training a new finisher), the cleanest
solution is to delete everything except for the chromats and just run
phredPhrap again.  Thus you would delete everything in edit_dir and
phd_dir, but leave everything in chromat_dir alone.


SELECTIVELY BACKING OUT EDITS AND REMOVING READS

You should only attempt the below after you have spent, say, more than
40 hours total in consed and are quite familiar with how things
*should* work.  If you make a mistake in the process below, you can
really mess things up.  In that case, the most certain way to clean up
your mess is to delete everything in phd_dir and edit_dir and run
phredPhrap again.  (Realize that you will lose all your edits by doing
this.)

If you want to back out some of the edits, but not others, you will
need to learn a little about how consed stores the edits.  So here
goes:

phd_dir

This directory contains one or more files for each read.  For
example, if a read is G1980A181_672.s1, there is a phd file:

G1980A181_672.s1.phd.1

This was created by phred and contains the base calls.

There also may be subsequent versions created by consed:

G1980A181_672.s1.phd.2
G1980A181_672.s1.phd.3
G1980A181_672.s1.phd.4

These contain the edited bases and read tags, but are otherwise
identical.  When you edit a read, and then click 'save assembly',
consed creates a new version of the phd file for the read you just
edited.

When you reassemble using phredPhrap, phredPhrap uses the phd file
with the highest version number (in this case G1980A181_672.s1.phd.4)
to supply the bases for that read (G1980A181_672.s1).

Thus if you wanted to back out all edits to a particular read (but not
back out all edits to all reads), you must delete all of the versions
of phd files for that particular read except for the first one (the
one ending in .phd.1).  Then you must reassemble running the script
phredPhrap.

If you wanted to remove a read entirely from an assembly, then you
must delete all copies of the phd files and reassemble.

Backing out newly added consensus tags is easier--you can just delete
the more recent versions of ace files that contain the newly added
consensus tags.
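
Because deleting the wrong phd files can mess things up, it may help
to have a program list the files first.  The Python sketch below is
only an illustration, with hypothetical function names; it relies on
the (read name).phd.(version) naming shown above.  It only lists
files and deletes nothing.

```python
import os
import re

# Matches names like G1980A181_672.s1.phd.4
PHD_NAME = re.compile(r"^(?P<read>.+)\.phd\.(?P<version>\d+)$")


def phd_versions(phd_dir, read_name):
    """Return the sorted version numbers of the phd files of a read."""
    versions = []
    for filename in os.listdir(phd_dir):
        match = PHD_NAME.match(filename)
        if match and match.group("read") == read_name:
            versions.append(int(match.group("version")))
    return sorted(versions)


def files_to_back_out_edits(phd_dir, read_name):
    """List the phd files to delete to back out all edits to a read.

    Every version except .phd.1 (the one created by phred) is listed.
    """
    return ["%s.phd.%d" % (read_name, version)
            for version in phd_versions(phd_dir, read_name)
            if version > 1]
```

The last element of phd_versions() is the version that phredPhrap
would use on the next reassembly.  After checking the list from
files_to_back_out_edits(), delete those files yourself and then
reassemble with phredPhrap.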


ADDING READS WITHOUT CHROMATOGRAM FILES

This may happen if you, for example, download sequence from Genbank
and want to assemble it along with your reads.  Use the following
script:

fasta2Phd.perl (name of file with fasta sequence)

It will create a file whose name is taken from the fasta file name:
for example, if the fasta filename is Contig1.fasta, then the phd file
will be called Contig1.phd.1.  The fasta name in the file is ignored.
You can then put this in the phd_dir, and reassemble using phredPhrap.
However, you will not be able to edit this read, since you won't be
able to bring up a chromatogram for it.  (This restriction may be
removed in the future if there is enough interest--however, our belief
is that users should generally be required to look at the traces when
they are making edits to reads.)

VIEWING THE CHROMATOGRAM OF SINGLETS OR NON-ASSEMBLED READS

If you have a chromatogram, you can use consed to view it, even if it
hasn't been assembled.  To do this, make the same edit_dir, phd_dir,
and chromat_dir as above, put the chromatogram into chromat_dir, and run
phred on it to generate the phd file, which goes into phd_dir.

Then go to edit_dir and run:

phd2Ace.perl (name of phd file)

For example, if your phd file is myRead.phd.1
from edit_dir, type:

phd2Ace.perl myRead.phd.1

This will produce myRead.ace

Then just start consed normally:
consed -ace myRead.ace
and you can view the chromatogram.


CORRECTING FALSE JOINS MADE BY PHRAP

Phrap may put several reads together that you believe do not belong
together.  (For example, you may see several high quality
discrepancies between the reads.)  If you are sure these reads do not
belong together, you can force a subsequent reassembly by phrap to not
assemble those reads together.  You do that by changing to 'high
quality' (see above for how to do this) one of the reads that agrees
with the consensus and one of the reads that disagrees with the
consensus (has a high quality discrepancy).  You must do this for the
2 reads over the same range of consensus positions.  The reads must
both be aligned to the consensus at that location.  In the currently
released version of phrap, the way to tell a read is not aligned to
the consensus at a particular position is that there are long strings
of discrepant bases.  For example, if there are 10 agreeing bases in a
row, then a discrepancy or two, and then another 10 agreeing bases,
you can be pretty sure that the read is aligned against the consensus
at that location.  You should make high quality more than just one
base--3 or 4 is a good number.

For example, suppose the reads are:

    ATTGCCCG
    ATTGACCG
        ^

In this case you should swipe GCCC of the top read and GACC on the
bottom read.

If you have done all of this correctly, and you reassemble, then phrap
will not put these reads together.





--------------------------------------------------------------------------

NOTE TO SGI USERS

In /usr/lib, there must be a file: libCsup.so

If you don't have this file, you must get it from SGI.  To get it, if
you are on Irix 6.2 through 6.4, request:

SG0001637 "C++ Exception handling patch for 7.00 (and above) compilers
on irix 6.2" (it's on the "Development Options 7.1" CD).

If you are on Irix 5.3, install patch 1600

To make things easier for you, I've included my libCsup.so
This might save you having to get the patches above.



--------------------------------------------------------------------------

WHAT IS NEW IN CONSED 6.0

This section is mainly intended for advanced consed users to quickly
see what is new in this version.  Novice consed users should consult the
quick-tour (above).


error rate displayed

primer picking 
    In a test of 98 primers selected by this program for cosmid
    sequencing reactions, all succeeded.  We believe this primer
    picking program to be so successful because it takes advantage of
    consed's knowledge about the entire assembly.  It has been beta
    tested at numerous sites, and has been further improved since then.
    Oligos are given unique names by consed, allowing you to
    automate the ordering of oligos, if you choose. 

autoFinish
    This function of consed will tell you the expected number of
    errors per megabase in your current assembly, and will tell you
    primers to make and reads to make in order to reduce the expected
    number of errors below a threshold.

    You can experiment with this by typing: consed -autofinish -ace
    (ace filename)

    There is also an interactive way of using autoFinish.

consensus tags
    Tags can now be added to the consensus!  This includes oligo tags
    which can be manually added or automatically added by the primer
    picking program (above).  It also includes comment tags, repeat
    tags, sequencing and cloning vector tags, etc.  You can navigate
    by any particular type of consensus tag.

automatic ordering of oligos
    Consed and a script keep track of which oligos you have chosen and
    ordered.  You must write a script that runs our script, formats
    the information as required by the oligo company, and emails it to
    the company.

ability to extend the consensus
    Just in case phrap/crossmatch incorrectly picks out the cloning
    site at the end of the cosmid or BAC, you can correct it in
    consed.

dim low quality ends of reads
    This (optionally) dims the low quality ends of the reads
    (without the distracting red discrepancy color).

show quality of consensus base
    You know how you can click on a base of a read and the quality
    value is printed out?  Now the same thing works with consensus
    bases.

better coordination with polyphred
    Tags built in for polyphred.  Hooks for future improvements.

ability of an external program to control consed
    Another program can cause consed to scroll without interfering
    with normal interactive consed operation.

small items to make editing more pleasant
    improved vertical scrolling
    visual indication of which traces are popped up    
    navigation window shows information for comment tags and oligo tags

bugs fixed

friendlier error handling
    Some common errors have messages telling the user what they
    probably did wrong.  

allow pads at beginning or end of sequence

improved start-up performance
improved trace scroll performance



------------------------------------------------------------------------------
David Gordon                                    gordon@genome.washington.edu
Dept. of Molecular Biotechnology                
Box 352145                                      
University of Washington
Seattle, WA 98195
------------------------------------------------------------------------------