CONTENTS:
----------------------------------------------------------------------
IF YOU DO NOT USE PHREDPHRAP

This section is just for old consed users who have their own scripts for
running phred and phrap. For example, some sites have scripts that check
every night for new reads. If any new reads are found, the project is
automatically reassembled, so it is ready for the finishers in the
morning. If you have such a script, you will have to modify it to take
advantage of consed's consensus tags.

Consensus tags are tied to a particular assembly, since each one refers to
a particular contig and a particular consensus position. For example,
there may be a consensus tag on Contig2 position 20. On subsequent
reassembly, all reads may be assembled into a single contig, Contig1.
Thus in the new assembly there is no such thing as Contig2 position 20.

The script transferConsensusTags.perl handles this by working out the
correspondence between each contig and position in the old assembly and
each contig and position in the new assembly. It then uses this
correspondence to transfer all consensus tags from the old assembly to
the new one.

Consensus tags are stored in the ace file, so transferConsensusTags.perl
must be told the name of the old ace file and the name of the new ace
file. There are 2 methods currently in use at beta sites:

Method 1:
   Before reassembling, check which ace file is most recent and rename it
   to (project).fasta.screen.ace_saved_for_transfer_tags
   Then reassemble.
   Then run:
      transferConsensusTags.perl (project).fasta.screen.ace_saved_for_transfer_tags (project).fasta.screen.ace.1
   Thereafter the transferred consensus tags will be in the new ace file
   (project).fasta.screen.ace.1

Method 2:
   Before reassembling, rename edit_dir to edit_dir.backup
   Create a new edit_dir
   Reassemble in edit_dir
   In edit_dir, run:
      transferConsensusTags.perl ../edit_dir.backup/(project).fasta.screen.ace.2 (project).fasta.screen.ace.1

Both of these methods (and many others) will work. When you have made
these changes, please test them! Open an assembly, add a consensus tag
(see below), reassemble using your script, open the new assembly, and
check that the consensus tags are on the same bases.

It is possible that bases in the old ace file have *no* corresponding
bases in the new ace file. This would occur, for example, if the low
quality tail of a contig were tagged in the old ace file, and then, on
reassembly, reads were added that made that tail high quality and
significantly changed the consensus in that location. In this case,
transferConsensusTags.perl would fail to find the corresponding location
in the new assembly, and would log that failure to a file with extension
".err". transferConsensusTags.perl always creates this file--if it is of
0 length, all tags were transferred with no problem. If you have a
script to reassemble automatically, you might want it to alert someone
(for example, by sending an email message) if the ".err" file is not of
zero length.

If you use the script phredPhrap supplied with this distribution for
reassembling, you don't need to worry about such issues. phredPhrap is
made to be run by a person from the command line.

--------------------------------------------------------------------------
QUICK TOUR OF CONSED Release 6.0

Consed is a program for viewing and editing assemblies assembled with the
phrap assembly program. Following this Quick Tour will take you less than
1 hour.
However, it will save you approximately 2 days of agony. If you have lots
of excess time, and you prefer to waste 2 days in agony, then skip down to
"USING YOUR OWN DATA" below and do not do the quick tour.

Here is a quick tour of consed. After unpacking all the files, including
the 'standard' test data set and consed:

1) cd to standard/edit_dir

2) Start consed by typing one of the following (the one you should type
   depends on which executable you downloaded):

      ../../consed_sunos
      ../../consed_alpha
      ../../consed_solaris
      ../../consed_hp
      ../../consed_sgi

   (If you are an SGI user, see the note at the bottom of this file.)

   3 windows should pop up. One of these will have the list of .ace files
   and say 'select assembly file to open'. Double click on
   standard.fasta.screen.ace.1

   The 'select assembly file' window will disappear and the window behind
   it will have a list of contigs and a list of fragments. I will call
   this 'the main consed window' in the rest of this tutorial. There
   should be only one contig in this example, labelled 'Contig1'. Double
   click on it. Bases should then appear in the aligned reads window.

3) Try scrolling back and forth. Try scrolling by dragging the thumb of
   the scrollbar. Also try scrolling by small amounts with the 4 buttons
   << < > >>. For scrolling by tiny amounts, click on the arrows at
   either end of the scrollbar. For scrolling by huge amounts, click with
   the middle mouse button on some location on the scrollbar.

   Notice the colors. The bases in red are the ones that disagree with
   the consensus. Notice also the different shades of background color
   around the bases. To understand what they mean, you first need to
   understand the meaning of the quality values.
   A quality value of 10 means 1 error in 10**1.0 (ten to the 1.0 power)
   A quality value of 20 means 1 error in 10**2.0
   A quality value of 30 means 1 error in 10**3.0
   A quality value of 40 means 1 error in 10**4.0
   and for quality values in between:
   A quality value of 25 means 1 error in 10**2.5

   Get the idea? In general, a quality value of q means 1 error in
   10**(q/10). (These values have actually been empirically verified--if
   you are interested in the gory details, read the phred papers:

      Ewing B, Hillier L, Wendl M, Green P: Basecalling of automated
      sequencer traces using phred. I. Accuracy assessment. Genome
      Research 8, 175-185 (1998).

      Ewing B, Green P: Basecalling of automated sequencer traces using
      phred. II. Error probabilities. Genome Research 8, 186-194 (1998).

   The same issue of the journal contains a paper about consed, as well.)

   These quality values are shown as grey scales:

      Quality 0 through 4 is shown as dark grey
      Quality 5 through 9 is shown a shade lighter
      Quality 10 through 14 is shown a shade still lighter
         .
         .
         .
      Quality 40 through 97 is shown as white (the brightest shade)

   A quality value of 99 is reserved for bases that have been edited when
   the user is absolutely sure of the base (high quality edit). A quality
   value of 98 is reserved for bases that have been edited when the user
   is not sure of the base (low quality edit).

   At the ends of reads, bases are shown in grey on a black background.
   These are the low quality ends of reads or the unaligned ends of reads,
   as determined by phrap.

4) Click on a read base. You will see the numeric value of its quality
   shown in the xterm. Click on a consensus base. You will similarly see
   its quality. There are situations in which you really want to see the
   numeric value of the quality.

5) Put location 510 in the middle of the aligned reads window. On the
   main consed window, click on Options/General Preferences. When the
   General Preferences window pops up, set 'Dim Low Quality Ends Of Reads'
   to False and 'Dim Unaligned Ends of Reads' to False. Then click on
   'Apply'.
   Now look back in the aligned reads window. You should see that the ends
   of reads that were black now appear grey with red. You are seeing the
   clipped-off bases with all the same information as any other base.
   Since there are huge numbers of red (discrepant) bases, the screen
   becomes busy and distracting. Thus by default the low quality
   clipped-off bases are shown with a black background and a grey
   foreground so they don't distract you.

   On the Options/General Preferences window, change both 'Dim Low Quality
   Ends of Reads' and 'Dim Unaligned Ends of Reads' back to True, and
   click 'Apply & Dismiss'. We'll keep the ends dim for the rest of the
   tour.

   (Notice there is a distinction here between 'low quality ends of reads'
   and 'unaligned ends of reads'. For now, there is no difference.
   However, with a future version of phrap, there will be an important
   difference.)

   Now go to the menu labelled 'color', pull down and release on 'color
   means match'. You will notice different colors, which have the
   following meanings:

      Blue:   agrees with consensus
      Orange: disagrees with consensus
      Yellow: this stretch of this read was used to form the consensus
      Grey:   low quality or unaligned ends of reads

   Now go back to the colormode 'color means quality and tags' (the
   default) for the next exercise.

TRACES AND EDITING

6) Put the cursor on the bases of one of the reads and click mouse button
   2. The traces for that stretch of that read should pop up. There are 4
   rows of bases in the trace window: the consensus "con", the edited
   fragment bases "edt", the phred-called bases "phd", and the ABI-called
   bases "ABI". Notice that a red cursor blinks in corresponding positions
   in the two windows.

7) Try editing in the trace window. You can click the cursor on the "edt"
   line and directly overstrike a base. Try this. Try undoing it (by
   clicking on 'undo'). You can insert a column of pads by pushing the
   space bar. Try this.
   (For those of you new to editing assemblies, a 'pad', which in consed
   and phrap is represented by the '*' character, is used to align two or
   more sequences such as these:

      gttgacagtaatcta
      gttgacataatcta

   in which one sequence has an inserted or deleted base with respect to
   the other. By inserting the pad character, it is possible to get a good
   alignment:

      gttgacagtaatcta
      gttgaca*taatcta

   This is the purpose of the pad character--it is just a placeholder.)

   Try highlighting a stretch of a read by holding down the control key
   and mouse button 2 over the 'edt' (edited) read bases. A popup will
   give you the following choices:

      make high quality--makes the highlighted bases high quality
      change consensus--makes the highlighted bases high quality and
         changes the consensus to agree with that stretch of the read
      make low quality--changes their quality to the lowest possible
      make low quality to left end--same as above, from the highlighted
         region to the left end of the read
      make low quality to right end--same as above, from the highlighted
         region to the right end of the read
      change to n's--changes the highlighted bases to 'n's and changes
         their quality to the lowest possible
      change to n's to left end--same as above, from the highlighted
         region to the left end of the read
      change to n's to right end--same as above, from the highlighted
         region to the right end of the read
      add comment tag--allows you to add a comment to the stretch of bases
      add tag--allows you to add any tag to the stretch of bases
      dismiss--gets rid of this popup

   This popup is made so that nothing else works until you choose
   something. (NEVER iconify this box--if you do, nothing will work until
   you deiconify it.)

   Try each of these choices, except for tags, which you'll try below. In
   particular, you should try 'change consensus'. This can also be used to
   extend the consensus on the right, in case phrap did not accurately
   find the cloning site.
   However, you can't try this feature with this sample database, since
   there are no reads that extend past the end of the consensus. You will
   probably be able to try this with your own data.

   To delete a base, you can overstrike it with a '*' character. (Phrap
   ignores '*', so this is the same as deleting the character.) There is
   no way to remove the '*' from an assembly except by re-phrapping. We
   believe there should be a visual indication that a base was deleted.

   To move the cursor, use the mouse and click on a different base.

HOTKEYS FOR EDITING

8) When you get really fast at editing, you will want a faster method of
   doing these edits than bringing up the popup and selecting an option.
   Thus the following hot keys exist:

      < and > (less than and greater than): make n's to the left and
         right of the cursor
      control-l and control-r: make low quality to the left and right of
         the cursor
      capital letters: overstrike the base in high quality rather than
         low quality

   Give these a try.

9) Now that you have made some edits, try the 3rd colormode 'color means
   edited'. Notice that the bases you have edited stand out in either
   white or grey (depending on whether the base was made high quality or
   low quality). Observe this both in the trace window and the aligned
   reads window. Return to the 'color means quality and tags' colormode.

MULTIPLE UNDO

10) In the main consed window, click the 'Undo Edit...' button. A popup
    will show the most recent edit. Click 'undo'. Then you will see the
    edit that was done before that. Click 'undo'. You can continue if you
    like. You now know how to undo more than one edit. You cannot choose
    which edits to undo and which not to undo--edits can only be undone in
    precisely the reverse of the order in which you made them.

SCROLLING TRACES

11) In the aligned reads window, scroll along the contig to a different
    point. Click mouse button 2 on a read whose trace is already up.
    Notice that the existing trace is scrolled to the new location.

ALIGNED TRACES

12) Dismiss all of your trace windows. Then pop up traces for 2 different
    reads in approximately the same location. Scroll one of them. You may
    want to scroll by clicking the arrows or clicking to the left or right
    of the thumb. You will notice that both will scroll. Consed will do
    its best to keep corresponding peaks lined up. (Consed can't line all
    of them up because the peak spacing is not uniform and differs from
    read to read.) You will notice that the furthest left and right bases
    in each trace are aligned. Try removing a trace. Try adding other
    traces. Then click on 'No' for scrolling the traces together and try
    scrolling. You will now observe that they scroll separately.

MULTIPLE TRACE POPUP

13) Dismiss the trace window. In the aligned reads window, scroll to a
    region that has many reads and some discrepancies--try position 921.
    Click with mouse button 2, but this time click on the consensus. At
    this location 3 traces will pop up: the highest quality trace on each
    strand that agrees with the consensus (2 traces), and the highest
    quality trace that disagrees with the consensus. This feature is
    useful in areas of high coverage, when you want to rapidly examine
    just the most significant traces rather than looking at all of them.

MAXIMUM NUMBER OF TRACES DISPLAYED

14) Try bringing up some other traces that aren't displayed, such as
    K26-217c and then K26-526t. You will notice that new reads are put at
    the top of the stack of traces and, once there are 4 traces displayed,
    traces are removed from the bottom of the stack. If you want to change
    this maximum number of traces to something besides 4, you can do that:
    in the main consed window, pull down the "Options" menu and release on
    General Preferences. Try changing the "Max Number of Traces Shown" to
    5. Then click 'OK'. Now try adding additional traces to the trace
    window.
    You will notice that now the number of traces shown will not exceed 5.

NAVIGATE

For this exercise, start in 'color means edited' mode. (Put the cursor on
'color', hold down mouse button 1, pull the cursor down to 'Color Means
Edited', and release mouse button 1.) Bring up a few traces and make some
edits (see above for how to do that).

15) Go back to the aligned reads window. Put the cursor on 'Navigate',
    hold down mouse button 1, pull the cursor to 'Edits' and release the
    mouse button. Click the 'Next' button repeatedly to go to each place
    you edited.

16) Dismiss the navigate window. Switch to 'color means quality and tags'.
    (Put the cursor on 'Color', hold down mouse button 1, pull the cursor
    down to 'Color Means Quality and Tags', and release mouse button 1.)
    Put the cursor on 'Navigate', hold down mouse button 1, pull the
    cursor to 'Low consensus quality', and release the mouse button. Click
    the 'Next' button repeatedly to go to each successive low quality
    consensus position. This saves you from having to look through large
    amounts of high quality data trying to find problem areas. You may
    want to click on the 'save' button to save a copy of this list of
    problem areas to a file as you work through them.

    In our experience, this will be the most important navigate list you
    will use. In fact, finishing consists mainly of adding reads and
    rephrapping until this list is reduced to nothing.

17) Dismiss the navigate window. Now put the cursor on 'Navigate', hold
    down mouse button 1, pull the cursor to 'High quality discrepancy',
    and release the mouse button. You will notice there are no entries
    (unless you created some yourself by editing). That is because there
    are no high quality discrepancies in this dataset. So let's force
    there to be some by lowering the quality threshold. First, dismiss the
    old 'high quality discrepancy' window. Go to the main consed window,
    pull down the 'Options' menu and release on 'General Preferences'.
    Notice that the default for 'Threshold for Navigate/High Quality
    Discrepancy' is 40. Change it to 20 and click 'OK'. Then follow the
    steps above to bring up the 'High quality discrepancy' list again. Now
    you will see several entries. Click 'next' repeatedly to go to each
    successive high quality discrepancy. Dismiss the navigate window.

GOTO POSITION

18) In the aligned reads window, click in the 'Pos:' box in the upper
    right-hand corner. Type in a number, such as 540, and push 'Return'.
    The aligned reads window will scroll to position 540. We find this
    feature particularly useful when one person wants another person to
    look at something in the sequence.

COMPLEMENTING THE CONTIG

19) Push 'Comp Contig' in the aligned reads window to complement the
    contig. Push it again to uncomplement it.

SEARCH FOR STRING

20) Try the 'Search For String' button on the main window. Type in a
    string (such as aaaca), and click 'ok'. There should be a list of
    'hits'. Double click on one of the hits (or single click on it and
    click on 'go'). Notice that the aligned reads window scrolls to that
    position and has the cursor on the found string. (It might be
    complemented.) Dismiss this window.

    Try this again, only this time select 'Search Just Reads'. This
    searches the file standard.fasta.screen, which was created by
    crossmatch and is used as input to phrap.

COPY AND PASTE

21) Now try the following: in the aligned reads window, swipe some bases
    by holding down the left mouse button. You should see the bases turn
    yellow, at least temporarily. Then click the 'Search for String'
    button on the contig list window. Use mouse button 2 to paste the
    bases you have just swiped into the 'Query string:' box. Notice that
    you can swipe bases either from the consensus or from a read. The
    search for string is case-insensitive, so don't worry about whether
    the pasted bases are upper or lowercase.

SAVING THE ASSEMBLY

22) To save the assembly, just click on the 'save assembly' button in any
    aligned reads window (it will save all contigs).
    When the dialog box comes up with a suggested name for the new ace
    file, I recommend using the suggested name. The idea is that the ace
    files:

       (project).fasta.screen.ace.1
       (project).fasta.screen.ace.2
       (project).fasta.screen.ace.3
       (project).fasta.screen.ace.4
       (project).fasta.screen.ace.5

    are in order of age, with .ace.1 the oldest. If you feel you are taking
    up too much disk space, delete ace files, starting with the oldest. I
    do not recommend that you overwrite existing ace files. The version
    numbers just keep growing, and that is not a problem.

RECOVERY FROM CRASHES

23) It is important to feel that your data is safe, even if the computer
    (or consed) were to crash. Consed will recover your data from such a
    crash. Make an edit and jot down its location. Then simulate a crash
    by going to the xterm where you started consed and typing ^C. Restart
    consed and select the same ace file you used before
    (standard.fasta.screen.ace.1). Consed will tell you that there have
    been edits since that ace file was saved and ask whether you want to
    apply those edits. Answer 'yes', and the edits will be applied. (This
    is similar to the edit/recover feature on the old VMS operating
    system, if you remember that.) This is the purpose of the .wrk files:
    they are a log of your edits, appended to as you make each edit.

24) You should save your edits by clicking on the 'save assembly' button
    on the aligned reads window.

COMPARE CONTIGS

25) Now you will see the 'compare contigs' feature. Click on the
    consensus in the aligned reads window to set the cursor. Then click
    'compare contigs'. A large alignment window will come up. Then, still
    in the aligned reads window, scroll to some other location within the
    contig and click on it to set the cursor somewhere else. Then click
    'compare contigs' again.

    Now turn your attention to the alignment window. Try scrolling the two
    contigs past each other. You can click on each contig to set a cursor
    on each of them.
    Set the cursor on base 320 on the top and 390 on the bottom. Then click
    'align' to perform an alignment of the two contigs with the two cursor
    locations pinned together. You can try changing the cursor locations
    and then clicking 'align' again.

    Now set a cursor on a base in the alignment--the bottom half of this
    window. 'scroll top contig' scrolls the corresponding aligned reads
    window (the one with the consensus and all the reads) to the
    corresponding position. 'scroll bottom contig' does the same for the
    bottom contig. Experiment with this.

    This is one method of exploring joins of contigs that were not made by
    phrap. Another method is to use phrapview, supplied with phrap.
    phrapview gives a high level view of all internal joins, while "compare
    contigs" shows the alignment of a single internal join. Some users have
    found that they work well together--phrapview to find a join and,
    having found it, "compare contigs" to examine it in more detail.

TAGS

26) Bring up a trace for a read (as above). Swipe some bases with the
    middle mouse button while holding the control key down (as above).
    Choose 'Add Tag'. You will see a list of tag types that you can assign
    to the highlighted bases. Note that you can scroll this list down to
    see the type "significantDiscrepancy". Try adding various tags. Notice
    the different colors for the different tag types. Also try 'Add
    Comment Tag', which is the same except that it allows you to enter a
    multiple-line comment.

    If you forget which tag type a particular color means, click on the
    tag with the right mouse button while holding the control key down.
    You can do this either in the aligned reads window or in the trace
    window. (Alternatively, in the aligned reads window, click with the
    right mouse button and then click on 'Show Tag Info'.) Note that you
    can modify the location of the tag in this popup--try this. (Note that
    you must modify the READ position--not the consensus position.)
    The following tag types will be read by phrap (once this is
    implemented in phrap):

       becomeConsensus
       ignoreMismatches
       ignoreMatches
       significantDiscrepancy

    (Until phrap reads these tags, they have no effect.)

27) When you have created a bunch of tags, experiment with navigating by
    tags. On the aligned reads window, choose the 'navigate' menu, item
    'tags'. Pick one of the tag types you have created, and click 'next'
    to go through each of the tag locations. Experiment with various tag
    types.

CONSENSUS TAGS

28) In the aligned reads window, swipe a stretch of consensus bases by
    holding down the control key and the middle mouse button. A list of
    tag types will pop up. Click on one of them. Try it again somewhere
    else. Try it with the tag type 'comment'. In this case, you must enter
    a comment. Notice the pretty colors! If you forget what a particular
    color means, you can click on the colored tag with mouse button 3
    while holding down the control key, and the information about the tag
    will pop up.

PRIMER-PICKING

**** Temporary step ****
After you have completed the 'install vector files' step (below), you
should never do this again. On the main window, click on
'Options'/'Primer Picking Preferences'. Notice the question "Screen
Primers Against Sequences in File?" Click on 'False'. Then click 'ok' and
the Primer Picking Preferences box will pop down.
**** end of temporary step ****

29) Go to some location near the right end of the contig, say 1180.
    Click with mouse button 3 and click on either one of the forward
    primer choices (either from subclone template or from clone template).
    There will be a selection of primers that pass all of consed's
    requirements. Double click on one of them. That will cause the aligned
    reads window to scroll to show that oligo in context. Click on 'Accept
    Primer'. Notice that an oligo tag is created for that primer. Dismiss
    the list of primers.
    If you are interested in the details of primer-picking, see the
    section 'PRIMER PARAMETERS' (below).

AUTOFINISH

30) Try starting consed by typing:

       consed -autofinish -ace standard.fasta.screen.ace.1

    Consed will print out a list of primers you should make, and reads you
    should make from those primers, in order to reduce the number of
    errors below a target threshold. This finishing tool is designed to
    be run in batch after each assembly. In a high throughput operation,
    the production people can make these reads without anyone using
    consed to examine the assembly interactively. Only when consed
    -autofinish can no longer help you (either because it has reduced the
    number of expected errors below your error threshold, or because it
    says it can't help you further) do you need to bring up consed
    interactively and examine the assembly.

    Current restrictions:
       a) It just suggests custom primers, and these are assumed to be
          sequenced directly off the clone (not subclone) template.
       b) consed -autofinish needs an X display: either run it from a
          terminal on a monitor or, if it is run as part of a cron job,
          the cron job must setenv DISPLAY xxxxx, where xxxxx is some
          display that the cron job has access to.

HIGHLIGHTING READ NAMES

31) In the aligned reads window, click on a read name with mouse button
    1. The name will turn magenta. Click again and it will turn yellow
    again. Try turning it magenta and then scrolling large distances. If
    you want to follow a particular read, this helps you keep track of it
    as you scroll.

INCREMENTAL SEARCH FOR READ NAME

32) Restart consed. Instead of clicking on a read or contig name, type a
    read name into the 'Find read:' box. Try typing "K26-5". You will
    notice that as you type each letter, the first item in the list that
    matches the letters typed so far is highlighted. Experiment with
    deleting a few letters and typing others. This is a powerful method of
    quickly getting to the read name you are interested in. When you get
    to the read you want, just type carriage return or click the 'OK'
    button.
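The incremental search in step 32 behaves like a prefix lookup in a
sorted list of names. The Python sketch below shows one way such a lookup
can work; it is an illustration, not consed's actual code. It assumes the
read names are kept in sorted order, and apart from the "K26-" style names
mentioned in the tour, the read names in it are made up.

```python
import bisect


def first_match(sorted_names, prefix):
    """Return the index of the first name starting with prefix, or None.

    Assumes sorted_names is in ascending (case-sensitive) order.
    """
    # bisect_left finds where prefix would be inserted; every name that
    # starts with prefix sorts at or after that point.
    i = bisect.bisect_left(sorted_names, prefix)
    if i < len(sorted_names) and sorted_names[i].startswith(prefix):
        return i
    return None


# Hypothetical read names for illustration only.
names = sorted(["K26-217c", "K26-526t", "K26-530t", "K26-61s", "K26-9b"])

# Simulate typing "K26-5" one letter at a time, as in step 32.
typed = ""
for letter in "K26-5":
    typed += letter
    i = first_match(names, typed)
    print(repr(typed), "->", names[i] if i is not None else "(no match)")
```

Because the list is sorted, each keystroke costs only a binary search
rather than a scan of every read name, which keeps the highlight responsive
even for projects with many reads.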
ONLINE DOCUMENTATION

33) On the aligned reads window, click on the 'help' menu, 'show
    documentation' item. You will see this document.

At this point you've seen consed's current capabilities. Now you will want
to try it on your own data.

----------------------------------------------------------------------------
USING YOUR OWN DATA AND INSTALLING CONSED

The next few steps will probably require the assistance of someone with
root access.

34) Put consed in /usr/local/genome/bin (or wherever you like to keep
    consed).

35) Build phd2fasta. The sources and a makefile for this program are
    supplied with the consed distribution, in the subdirectory misc.

36) Put the following files in /usr/local/genome/bin. You will find them
    in the subdirectory misc:

       phredPhrap
       fasta2Phd.perl
       phd2Ace.perl
       ace2Oligos.perl
       transferConsensusTags.perl

37) Get perl.

    In order to use consed, the ABI chromatograms must be run through a
    gauntlet of phred, phd2fasta, crossmatch, phrap and
    transferConsensusTags.perl. To simplify this procedure, we have written
    a perl script: phredPhrap

    You MUST use this script if you are going to use consed. There are
    other methods of using phred and phrap to create an ace file, but
    consed may not work with the result. If you go that route, you are on
    your own. If you want to be sure consed works, use phredPhrap.

    phredPhrap requires perl, which is freely available from a number of
    ftp sites, including those that have standard gnu unix utilities.
    These machines had perl via anonymous FTP last time I checked:

       ftp.uu.net                  137.39.1.2      in /languages/perl
       ftp.netlabs.com             192.94.48.152   in /pub/outgoing/perl5.0
       coombs.anu.edu.au           150.203.76.2
       archive.cis.ohio-state.edu  128.146.8.52
       jpl-devvax.jpl.nasa.gov     128.149.1.143
       prep.ai.mit.edu             18.71.0.38      in /pub/gnu
       ftp.cs.ruu.nl               131.211.80.17   (Europe)

    or try getting it from the web at:

       http://www.perl.com/perl/info/software.html

    (If you don't know about perl, try it--it will save you a huge amount
    of time over developing the same utilities in C, awk, csh or sh.)

    To work out any problems using phredPhrap, I suggest that you first try
    it on a tiny database, such as the test database you were using above.
    Copy standard/* to a new location. Then delete the files in phd_dir and
    in edit_dir. Then cd to edit_dir, and type:

       phredPhrap standard -notags

    phredPhrap may need to be edited to reflect where you have put phred,
    phrap, phd2fasta, crossmatch, and the vector sequence library.
    phredPhrap is very easy to read and modify. (But keep a backup copy in
    case you cause problems.)

    When you have worked out all the problems with installing phred,
    phrap, crossmatch, phd2seqfasta, phd2qualfasta, and phredPhrap, this
    should work flawlessly and you should be able to bring up consed on the
    newly-created standard.fasta.screen.ace.1

    Now run consed on this ace file and add some consensus tags. Save the
    assembly as standard.fasta.screen.ace.2, then run:

       phredPhrap standard standard.fasta.screen.ace.2

    phredPhrap will create a new ace file, standard.fasta.screen.ace.3,
    which will contain the consensus tags transferred from
    standard.fasta.screen.ace.2. Bring up consed on
    standard.fasta.screen.ace.3 and you will see your consensus tags. When
    you have successfully done that, you are ready to do the same with a
    larger database (such as your own data).

38) Install the cosmid, BAC, M13 (or whatever you use) vector files.
    These files should be in fasta format and be named:

       primerSubcloneScreen.seq  for the subclone (M13, plasmid, or
                                 whatever you use) vector sequences
       primerCloneScreen.seq     for the clone (BAC, cosmid, or whatever
                                 you use) vector sequences

    These should be put in:

       /usr/local/genome/lib/screenLibs

    (This location is configurable via X resources, but this is the
    easiest place to put them, since it is the default location.)

    To check that this works, repeat step 29 in PRIMER-PICKING (above),
    except this time skip the 'temporary step'. You will now be using the
    full-blown primer picking program that screens against vector
    sequence.

39) Create the following directory structure:

       top level directory
          subdirectory 'chromat_dir'--chromatograms go in here
          subdirectory 'phd_dir'--just create this
          subdirectory 'edit_dir'--just create this

    If you already have your chromatograms somewhere else, you can make
    chromat_dir be a link to wherever you have them. The various phrap and
    crossmatch files will be put into edit_dir.

40) cd to the edit_dir directory, and run phredPhrap as above.

41) (optional) If you have problems and need to start again, delete all
    files from phd_dir and edit_dir. Then repeat the step above.

42) cd to edit_dir and run consed. You should see a file with the
    extension .ace.1; double click on it. You should see a list of
    contigs. Double click on the one you want to see. Now you should see a
    big colorful alignment of your sequences. Repeat some of the
    experimenting you did with the test data set above.

----------------------------------------------------------------------------
PRIMER PARAMETERS

On the main window, click on 'Options'/'Primer Picking Preferences' again.
A great deal of science and experimentation has gone into setting these
defaults, and I suggest you do not change them. However, I know you will
anyway, so now you know where to find them.
This is what they mean (I suggest you skip over this for now):

PrimersNumberOfBasesToBackupToStartLooking
   Consed is designed for you to put the cursor on the left-most (or right-most) edge of a region that you want to cover with a new read. Since the data quality immediately after an oligo is not good, you don't want the oligo immediately next to the region you want to cover, but rather a little way back from it. This parameter gives how far back.

PrimersWindowSizeInLooking
   This is the width of the region in which consed looks for primers. So if PrimersNumberOfBasesToBackupToStartLooking is 50 and PrimersWindowSizeInLooking is 450, and you are looking for a forward primer, then consed will look from 500 bases to the left of the cursor up to 50 bases to the left of the cursor. If you are looking for a reverse primer, then consed will start looking 50 bases to the right of the cursor and continue until 500 bases to the right of the cursor.

PrimersMinimumLengthOfAPrimer
PrimersMaximumLengthOfAPrimer
   (just what they sound like)

PrimersMaxInsertSizeOfASubclone
   When you click on forward or reverse primer/subclone template, consed knows that it is all right if it finds a primer that has an additional match somewhere else in the assembly, as long as that location is not on the same subclone template you intend to use. Consed uses this parameter to specify the range of the search for unacceptable additional matches.

PrimersMinMeltingTemp
PrimersMaxMeltingTemp
   Consed uses the nearest-neighbor formula (with salt concentration correction), just as all modern primer picking programs do.

PrimersMaxSelfMatchScore
   In choosing a primer, you don't want the primer to bind to itself (form a hairpin) or bind to another copy of itself. It is particularly bad if it binds to another copy at its 3' end. This parameter is used in the algorithm that tests this.
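The search window set by PrimersNumberOfBasesToBackupToStartLooking and PrimersWindowSizeInLooking (at the top of this list) is simple arithmetic relative to the cursor. The Python sketch below only illustrates that arithmetic and is not consed's code: the function name is hypothetical, the default values are the example numbers from the text rather than necessarily consed's defaults, and clipping at the ends of the contig is not handled.

```python
def primer_search_window(cursor, backup=50, window=450, forward=True):
    """Return (start, end) of the region searched for a primer.

    backup plays the role of PrimersNumberOfBasesToBackupToStartLooking and
    window the role of PrimersWindowSizeInLooking. Positions are consensus
    offsets; the ends of the contig are not clipped here.
    """
    if forward:
        # Forward primer: the window lies to the left of the cursor.
        return cursor - backup - window, cursor - backup
    # Reverse primer: the window lies to the right of the cursor.
    return cursor + backup, cursor + backup + window


# Example from the text, with the cursor at consensus position 1000.
print(primer_search_window(1000))                 # (500, 950)
print(primer_search_window(1000, forward=False))  # (1050, 1500)
```

With the text's numbers, a forward primer is sought between 500 and 50 bases to the left of the cursor, and a reverse primer between 50 and 500 bases to the right.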
PrimersMaxMatchElsewhereScore
   In choosing a primer, it is important that the primer not stick anywhere besides the place you are trying to get a read (a "false match"). A false match can cause a primer to fail even if it is not perfect. The worst kind of false matches are those that extend to the 3' end of the primer, and worse yet if they have a high percentage of G/C matches, since G and C bind more tightly than A and T. The algorithm used here takes both of these effects into account. This parameter sets the maximum acceptable false match score.

PrimersMinQuality
   Some primers fail because they don't match where they are supposed to. This happens because the sequence where the primer is supposed to stick isn't accurately known. Thus it is important to be certain of the sequence the primer is chosen from. This parameter is an indication of this certainty: it is the minimum quality of every base in an acceptable primer.

PrimersMaxLengthOfMononucleotideRepeat
   Folklore says that mononucleotide repeats are bad. To please consed users, I've put this check in.

Screen Primers Against Sequences in File?  True / False
   It is important that the primers not stick to the vector of the template. Thus you must provide consed with two files: a file in fasta format of all subclone vectors, and a file in fasta format of all clone vectors. Consed will not accept any primer that has a match against the appropriate one of these vectors (depending on whether you chose, with mouse button 3 in the aligned reads window, forward/reverse primer from subclone template or from clone template). A primer that has a false match to a vector is rejected if that false match has a score worse than PrimersMaxMatchElsewhereScore.

You can also read about this in the consed paper:

   Gordon, D., C. Abajian, and P. Green. 1998. Consed: A graphical tool for sequence finishing. Genome Research
   8:195-202.

----------------------------------------------------------------------------
FOR PROGRAMMERS AND FELLOW TRAVELLERS ONLY

CONSED VERSION

On the command line, type:

   consed -v

This is particularly useful to system administrators to make sure that the latest version is installed on all computers.

CONSED CUSTOMIZATION

Click the 'Info' menu on the window with the list of contigs on it. Click the 'Show X Resources' menu item. This shows you what to put in your ~/.Xdefaults file to change any of the colors to something you like better.

Note that for changes to .Xdefaults to take effect, you must do one of the following:

   1) xrdb -remove
   or
   2) xrdb -load ~/.Xdefaults

I only use the former (I keep my server empty of resources so that changes to .Xdefaults take effect for all newly created processes).

Changes in ~/.Xdefaults affect only one user. If you want a change to affect all consed users on the system, put a file called 'consed' in /usr/lib/X11/app-defaults. In this file, you must put ALL resources listed in the 'Show X Resources' list. Modify the ones you want different.

COMPRESSING CHROMATOGRAMS

If you are interested in compressing your chromatogram files, go into chromat_dir and gzip one of the chromatogram files. Then see the X resources under 'Info'/'Show X Resources' as described above. Notice there is an X resource consed.gunzipFullPath, which by default is /usr/local/bin/gunzip. If your gunzip is not there, you must change this X resource in your ~/.Xdefaults file. Put in a line that looks like this:

   consed.gunzipFullPath: /usr/local/bin/gunzip

(or whatever your path to gunzip is). Restart consed and bring up the corresponding trace. You will notice no appreciable delay.

CONSED -ACE

Try bringing up consed like this:

   consed -ace (name of ace file)

This can be useful if you are going to have consed brought up from some other program.
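If you prefer to script the ~/.Xdefaults change described under COMPRESSING CHROMATOGRAMS, here is a minimal Python sketch. The helper name set_x_resource is hypothetical; it only rewrites or appends a plain "name: value" line and does not understand continuation lines or wildcard resource specifications. You still need to run xrdb as described above before restarting consed.

```python
from pathlib import Path


def set_x_resource(xdefaults, name, value):
    """Set 'name: value' in an X resources file, replacing an existing line.

    Lines that do not define `name` (including '!' comments) are left as-is.
    If the resource is not present, it is appended at the end of the file.
    """
    path = Path(xdefaults)
    lines = path.read_text().splitlines() if path.exists() else []
    new_line = f"{name}: {value}"
    for i, line in enumerate(lines):
        # The resource name is everything before the first colon.
        if line.split(":", 1)[0].strip() == name:
            lines[i] = new_line
            break
    else:
        lines.append(new_line)
    path.write_text("\n".join(lines) + "\n")


# Usage (the gunzip path is only an example):
# set_x_resource(Path.home() / ".Xdefaults", "consed.gunzipFullPath", "/usr/bin/gunzip")
```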
NO PHD FILES

Try bringing up consed like this:

   consed -nophd

This mode does not allow editing and does not show quality information. It allows you to view an assembly when you have only the ace file and no phd files or chromatograms. You will not be able to see the quality information, since that information is kept in the phd files. I neither recommend nor support this option!

CUSTOM NAVIGATION

You can also create a file that lists special locations you want to examine in consed. By clicking in the main window on 'Navigate'/'Custom Navigation', you can read that file in. Then you can use it to go to each region, just as you did with the low consensus quality regions (above). Using this feature, you can write programs that create this file, and then use consed to examine these regions. The format of the file is quite simple, as follows:

TITLE: Single Stranded Regions
BEGIN_REGION
TYPE: CONSENSUS
CONTIG: Contig1
UNPADDED_CONS_POS: 1 5
COMMENT: This is a comment
END_REGION
BEGIN_REGION
TYPE: CONSENSUS
CONTIG: Contig1
UNPADDED_CONS_POS: 20 25
COMMENT: This is comment 2
END_REGION
BEGIN_REGION
TYPE: CONSENSUS
CONTIG: Contig1
UNPADDED_CONS_POS: 40 45
COMMENT: This is comment 3
END_REGION
BEGIN_REGION
TYPE: READ
CONTIG: Contig1
READ: K26-394c
UNPADDED_CONS_POS: 820 850
COMMENT: This is a comment
END_REGION
BEGIN_REGION
TYPE: READ
CONTIG: Contig1
READ: K26-394c
UNPADDED_CONS_POS: 870 880
COMMENT: This is a comment
END_REGION

Notice that the first 3 are consensus locations and the last 2 are locations on a read. You can have any number of either type of location, and can mix both types. You can cut/paste the above into a file and try it.

CONTROL OF CONSED FROM SOME OTHER PROGRAM

Consed can be controlled by some other program. For example, you might have a program that displays mapping data, and you would like the user to be able to click on a location and have consed come up showing the bases in that region.
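Because the CUSTOM NAVIGATION file described above is meant to be written by your own programs, here is a minimal Python sketch of such a writer. It assumes one keyword per line, which is how the example file above is laid out; the function name and the dict keys (type, contig, read, start, end, comment) are hypothetical choices made for this sketch.

```python
def navigation_file(title, regions):
    """Build text in the custom navigation format.

    Each region is a dict with keys 'type' ('CONSENSUS' or 'READ'), 'contig',
    'start', 'end' (unpadded consensus positions), 'comment', and, for READ
    regions only, 'read'.
    """
    out = [f"TITLE: {title}"]
    for r in regions:
        if r["type"] not in ("CONSENSUS", "READ"):
            raise ValueError(f"unknown region type: {r['type']}")
        out.append("BEGIN_REGION")
        out.append(f"TYPE: {r['type']}")
        out.append(f"CONTIG: {r['contig']}")
        if r["type"] == "READ":
            out.append(f"READ: {r['read']}")
        out.append(f"UNPADDED_CONS_POS: {r['start']} {r['end']}")
        out.append(f"COMMENT: {r['comment']}")
        out.append("END_REGION")
    return "\n".join(out) + "\n"


# Usage (the output filename is only an example):
# with open("regions.nav", "w") as f:
#     f.write(navigation_file("Single Stranded Regions", regions))
```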
An external program can start up consed with a socket for receiving such commands, as follows:

   consed -socket (local port number) -ace (ace filename)

For example:

   consed -socket 5432 -ace standard.fasta.screen.ace

After consed finishes coming up (including your clicking whether you want to apply edits), you will see this message in the xterm:

   success bind to local port number: 5432

You will also see a file called consedSocketLocalPortNumber, created by consed in the default directory. This gives the port number of the Berkeley socket that consed has opened and is listening on. Thus your program can read this file and create a connection to the Berkeley socket created by consed. Once the connection is established, your program can send commands to consed at that socket indicating which contig to display and what consensus position to scroll to. Currently, the only acceptable command is:

   Scroll (contigname) (consensus position)

Just send such a command to the Berkeley socket, and consed will respond appropriately.

AUTOMATIC ORDERING OF OLIGOS

I heard of a finisher who manually ordered 72 oligos. She had to cut/paste the bases of each oligo. That is not only painful, but also error prone. I've supplied a script that you can use to automatically determine which oligos have been newly requested since the last order, aggregate them into a single order, and email the request off.

The script is ace2Oligos.perl. It takes as parameters the name of an ace file and the name of the oligo file. The oligo file is a list of oligos that have been ordered for that particular project, and looks like this:

   name=G1980A181.1
   sequence=ctgcatggctaggga
   template=seq from subclone
   date=980427
   temp=52
   name=G1980A181.2
   sequence=tcttactttctgactttcattt
   template=seq from clone
   date=980427
   temp=50

ace2Oligos.perl finds all oligo tags in the ace file and makes sure that all of them are in this oligo file.
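ace2Oligos.perl maintains this file, and the nightly script described in the next paragraph has to read it. Here is a minimal Python sketch of that reading side. It assumes one key=value pair per line, that each oligo record begins with its name= line, and that a line made up only of dashes marks everything above it as ordered; all function names are hypothetical.

```python
def is_separator(line):
    """True for a line consisting only of dashes, such as '-------'."""
    s = line.strip()
    return len(s) >= 3 and set(s) == {"-"}


def parse_oligos(lines):
    """Group key=value lines into one dict per oligo; 'name=' starts a record."""
    records = []
    for line in lines:
        key, sep, value = line.strip().partition("=")
        if not sep:
            continue  # skip blank lines, separators, anything not key=value
        if key == "name":
            records.append({})
        if records:
            records[-1][key] = value
    return records


def unordered_oligos(text):
    """Oligos listed after the last separator line, i.e. not yet ordered."""
    lines = text.splitlines()
    last = max((i for i, line in enumerate(lines) if is_separator(line)), default=-1)
    return parse_oligos(lines[last + 1:])


def mark_ordered(text, separator="-" * 31):
    """Append a separator line if there are new oligos; otherwise no change."""
    if not unordered_oligos(text):
        return text
    return text.rstrip("\n") + "\n" + separator + "\n"
```

In this sketch, the nightly job would collect unordered_oligos() from every project's file, send the order, and only then write back mark_ordered(), so that a failed order leaves the oligos unmarked.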
To automatically order oligos each night, there is an additional script you will have to write. I suggest that you run your script each night under cron and that it do the following. For each project, it looks for the most recent ace file. It runs ace2Oligos.perl on that ace file and directs the oligo file to be in the parent directory of edit_dir, phd_dir, and chromat_dir for that project. Thus there will be one oligo file for each project, and your script will run ace2Oligos.perl once for each project.

Then, for each project, your script looks in the oligo file for new oligos and aggregates the unordered oligos into a central file, which it emails to the oligo company. If it finds any new oligos in an oligo file, it draws a line at the bottom:

   -------------------------------

which indicates that all oligos above it have been ordered. When the script looks at this file the next night, it uses this line to determine whether any additional oligos have been requested since the previous order. (The idea of this line came from St Louis.) Thus the oligo file tells you which oligos have been ordered and which have not yet been ordered.

----------------------------------------------------------------------------
ADVANCED PHRAP/CONSED USAGE

BACKING OUT EDITS AFTER YOU HAVE SAVED THE ASSEMBLY

If you decide that all your edits are terrible and you want to start over (perhaps you have been training a new finisher), the cleanest solution is to delete everything except the chromats and just run phredPhrap again. Thus you would delete everything in edit_dir and phd_dir, but leave everything in chromat_dir alone.

SELECTIVELY BACKING OUT EDITS AND REMOVING READS

You should only attempt the procedure below after you have spent, say, more than 40 hours total in consed and are quite familiar with how things *should* work. If you make a mistake in the process below, you can really mess things up.
In that case, the most certain way to clean up your mess is to delete everything in phd_dir and edit_dir and run phredPhrap again. (Realize that you will lose all your edits by doing this.)

If you want to back out some of the edits, but not others, you will need to learn a little about how consed stores the edits. So here goes:

phd_dir
   This directory contains one or more files for each read. For example, if a read is G1980A181_672.s1, there is a phd file:

      G1980A181_672.s1.phd.1

   This was created by phred and contains the base calls. There may also be subsequent versions created by consed:

      G1980A181_672.s1.phd.2
      G1980A181_672.s1.phd.3
      G1980A181_672.s1.phd.4

   These contain the edited bases and read tags, but are otherwise identical.

When you edit a read and then click 'save assembly', consed creates a new version of the phd file for the read you just edited. When you reassemble using phredPhrap, phredPhrap uses the phd file with the highest version number (in this case G1980A181_672.s1.phd.4) as the bases for that read (G1980A181_672.s1).

Thus if you want to back out all edits to a particular read (but not all edits to all reads), you must delete all versions of the phd file for that read except the first one (the one ending in .phd.1). Then you must reassemble by running phredPhrap.

If you want to remove a read entirely from an assembly, you must delete all copies of its phd files and reassemble.

Backing out newly added consensus tags is easier: you can just delete the more recent versions of the ace files that contain the newly added consensus tags.

ADDING READS WITHOUT CHROMATOGRAM FILES

This may happen if, for example, you download sequence from Genbank and want to assemble it along with your reads.
Use the following script:

   fasta2Phd.perl (name of file with fasta sequence)

It will create a phd file whose name is taken from the fasta file name: for example, if the fasta filename is Contig1.fasta, then the phd file will be called Contig1.phd.1. The name on the fasta header line inside the file is ignored. You can then put this file in phd_dir and reassemble using phredPhrap.

However, you will not be able to edit this read, since you won't be able to bring up a chromatogram for it. (This restriction may be removed in the future if there is enough interest; however, our belief is that users should generally be required to look at the traces when they are making edits to reads.)

VIEWING THE CHROMATOGRAM OF SINGLETS OR NON-ASSEMBLED READS

If you have a chromatogram, you can use consed to view it, even if it hasn't been assembled. To do this, make the same edit_dir, phd_dir, and chromat_dir as above, put the chromatogram into chromat_dir, and run phred on it to generate the phd file, which goes into phd_dir. Then go to edit_dir and run:

   phd2Ace.perl (name of phd file)

For example, if your phd file is myRead.phd.1, type this from edit_dir:

   phd2Ace.perl myRead.phd.1

This will produce myRead.ace. Then just start consed normally:

   consed -ace myRead.ace

and you can view the chromatogram.

CORRECTING FALSE JOINS MADE BY PHRAP

Phrap may put several reads together that you believe do not belong together. (For example, you may see several high quality discrepancies between the reads.) If you are sure these reads do not belong together, you can force a subsequent reassembly by phrap not to assemble those reads together. You do that by changing to 'high quality' (see above for how to do this) one of the reads that agrees with the consensus and one of the reads that disagrees with the consensus (has a high quality discrepancy). You must do this for the 2 reads over the same range of consensus positions. The reads must both be aligned to the consensus at that location.
In the currently released version of phrap, the way to tell that a read is not aligned to the consensus at a particular position is that there are long strings of discrepant bases. For example, if there are 10 agreeing bases in a row, then a discrepancy or two, and then another 10 agreeing bases, you can be pretty sure that the read is aligned against the consensus at that location.

You should make more than just one base high quality: 3 or 4 is a good number. For example, suppose the reads are:

   ATTGCCCG
   ATTGACCG
       ^

In this case you should swipe GCCC on the top read and GACC on the bottom read.

If you have done all of this correctly and you reassemble, phrap will not put these reads together.

--------------------------------------------------------------------------
NOTE TO SGI USERS

In /usr/lib, there must be a file:

   libCsup.so

If you don't have this file, you must get it from SGI. If you are on Irix 6.2 through 6.4, request:

   SG0001637 "C++ Exception handling patch for 7.00 (and above) compilers on irix 6.2"

(it's on the "Development Options 7.1" CD). If you are on Irix 5.3, install patch 1600.

To make things easier for you, I've included my libCsup.so. This might save you having to get the patches above.

--------------------------------------------------------------------------
WHAT IS NEW IN CONSED 6.0

This section is mainly intended for advanced consed users to quickly see what is new in this version. Novice consed users should consult the quick tour (above).

error rate displayed

primer picking
   In a test of 98 primers selected by this program for cosmid sequencing reactions, all succeeded. We believe this primer picking program is so successful because it takes advantage of consed's knowledge about the entire assembly. It has been beta tested at numerous sites, and has been further improved since then. Oligos are given unique names by consed, allowing you to automate the ordering of oligos, if you choose.
autoFinish
   This function of consed will tell you the expected number of errors per megabase in your current assembly, and will tell you which primers and reads to make in order to reduce the expected number of errors below a threshold. You can experiment with this by typing:

      consed -autofinish -ace (ace filename)

   There is also an interactive way of using autoFinish.

consensus tags
   Tags can now be added to the consensus! This includes oligo tags, which can be added manually or automatically by the primer picking program (above). It also includes comment tags, repeat tags, sequencing and cloning vector tags, etc. You can navigate by any particular type of consensus tag.

automatic ordering of oligos
   Consed and a script keep track of which oligos you have chosen and ordered. You must write a script that runs our script, formats the information as required by the oligo company, and emails it to the company.

ability to extend the consensus
   Just in case phrap/crossmatch incorrectly picks out the cloning site at the end of the cosmid or BAC, you can correct it in consed.

dim low quality ends of reads
   This (optionally) dims the low quality ends of the reads (without the distracting red discrepancy color).

show quality of consensus base
   You know how you can click on a base of a read and the quality value is printed out? Now the same thing works with consensus bases.

better coordination with polyphred
   Tags built in for polyphred. Hooks for future improvements.

ability of an external program to control consed
   Another program can cause consed to scroll without interfering with normal interactive consed operation.

small items to make editing more pleasant
   improved vertical scrolling
   visual indication of which traces are popped up
   navigation window shows information for comment tags and oligo tags

bugs fixed

friendlier error handling
   Some common errors have messages telling the user what they probably did wrong.
allow pads at beginning or end of sequence

improved start-up performance

improved trace scroll performance

------------------------------------------------------------------------------
David Gordon
gordon@genome.washington.edu
Dept. of Molecular Biotechnology
Box 352145
University of Washington
Seattle, WA 98195
------------------------------------------------------------------------------