
JoinMap Version 1.4

A computer program to generate genetic linkage maps

Developed by

Piet Stam

Centre for Plant Breeding and Reproduction Research CPRO-DLO P.O. Box 16 6700 AA Wageningen The Netherlands

and

Department of Genetics Wageningen Agricultural University Dreyenlaan 2 6703 HA Wageningen The Netherlands

E-mail (internet) P.Stam@cpro.agro.nl

License Agreement
Referencing
1) Copyright
2) Overview of JoinMap
3) Data Input
4) Output
5) Critical LOD Scores
6) Job Parameters
7) How to run JoinMap
8) Errors and Warnings
9) Operating Systems
10) Example Data Files

License Agreement

The JoinMap package, including the JoinMap 1.4 version program, the documentation and the associated files, is distributed under the following license terms. Installation of the program on any computer or any use of the program implies that the user and the user's organisation (hereafter referred to as "you") agree to the following terms.

1. Warranty or lack thereof. JoinMap is provided on an "as is" basis, with no warranty of any type, including warranty of suitability for any particular purpose or ability to function correctly on any type of computer. No technical support can be given. Note that this agreement implies that neither the author nor CPRO-DLO can at any time be held responsible for any damage resulting from using this program. At no time can the author or CPRO-DLO be charged with claims associated with such damage.

2. Redistribution rights. You are given permission to copy JoinMap for distribution within your organisation only. When doing so the integrity of JoinMap should be preserved, i.e. the copyright notice and this license agreement must remain part of it. Note that this agreement prohibits both selling and free distribution of JoinMap to any person outside your organisation.

Referencing.

When publishing results that have been obtained with the JoinMap package, reference should be made to the following paper:

P. Stam (1993) Construction of integrated genetic linkage maps by means of a new computer package: JoinMap. The Plant Journal 3:739-744.

1. Copyright (C) 1993.

The copyright of the JoinMap package, which is the JoinMap program and this manual, is owned by the author, Piet Stam. Academic institutions may use the package freely (a charge for mailing and floppy disk only). Commercial companies can buy the package for their own use. Users may not charge for copies or modified versions of the package. Charges are to cover the cost of peripheral activities and consumables only. Users cannot claim any refund in case of damage resulting from errors in the package.

Registered trademarks:

IBM : International Business Machines Corporation
MS-DOS : Microsoft Corporation
VAX, VMS : Digital Equipment Corporation
UNIX : American Telephone and Telegraph

2. OVERVIEW OF JOINMAP.

Reference:

P. Stam (1993) Construction of integrated genetic linkage maps by means of a new computer package: JoinMap. The Plant Journal 3:739-744.

JoinMap calculates genetic recombination frequencies between genes (markers) and aligns these in a linear order on a genetic (recombination) map.

JoinMap takes two basic types of data as input:

- raw population data, e.g. backcross, F2, recombinant inbred lines (RILs)
- pairwise estimates of recombination percentages, together with their standard error.

Raw data are not restricted to a particular type; for example, both backcross data and F2 data may occur in a single data input file. In most cases such distinct segregation types will correspond to distinct populations (progenies), i.e. a population in which all markers segregate according to the backcross ratio (1:1) and a population in which all markers segregate according to an F2 ratio (1:2:1, 1:3, or 3:1). But even within a single population the segregation type may vary between markers; one marker may segregate as in an F2, whereas another one may segregate as in a backcross.

JoinMap is especially useful for combining genetic maps that are based on distinct data sets. If, for example, raw F2 data are available for the markers a, b, c, d, e, f, g, h, and raw backcross data are available for the markers f, g, h, k, l, m (supposedly belonging to the same linkage group), JoinMap will calculate the combined map for the genes a,..., m.

In addition to raw data, the user can always supply linkage information, taken from other sources, in the form of a list of pairwise estimates of recombination percentages, together with their standard error (the latter being required for a proper weighting of the estimate).

After reading the data, JoinMap first calculates all pairwise recombination frequencies for those pairs on which linkage information is available. (In the above example no linkage info is available for the pairs (a,k), (a,l),...,(e,m)). If more than one piece of information is available on a particular pair (such as the pairs (f,g),...,(g,h)), these are combined into a single estimate (after weighting with the "amount of information" contained in the distinct pieces of info).

Based on these pairwise estimates, the markers are grouped into linkage groups. In order to establish these linkage groups the user must supply JoinMap with a critical LOD score (linklod). Marker pairs with a recombination LOD score above this critical LOD are considered to be linked. Two genes are placed in distinct linkage groups if they are not linked (by LOD value) to any member of the other group.

When JoinMap "detects" several linkage groups, the input data file is split into separate files, corresponding to these linkage groups; they can be used as input in subsequent runs.
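The grouping rule described above (a marker belongs to a group if it is linked, by LOD value, to at least one member) amounts to single-linkage clustering. The following Python sketch illustrates the idea with a union-find structure; it is not JoinMap's source code, and the function name `linkage_groups` and the example LOD values are hypothetical.

```python
from collections import defaultdict

def linkage_groups(markers, pair_lods, linklod):
    """Group markers: a marker joins a group when its LOD score with at
    least one member of that group exceeds linklod."""
    parent = {m: m for m in markers}

    def find(m):
        # Follow parent links to the group representative (with path halving).
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for (a, b), lod in pair_lods.items():
        if lod > linklod:
            parent[find(a)] = find(b)   # merge the two groups

    groups = defaultdict(list)
    for m in markers:
        groups[find(m)].append(m)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical LOD scores, for illustration only.
example_lods = {("a", "b"): 5.1, ("b", "c"): 3.4, ("c", "k"): 0.8, ("k", "l"): 4.0}
groups = linkage_groups(["a", "b", "c", "k", "l", "m"], example_lods, linklod=3.0)
print(groups)
```

Note that markers a and c end up in the same group although no direct a-c estimate exists; being linked through b is sufficient.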

If the data comprise a proper single linkage group, JoinMap sequentially builds up the genetic map. It starts with the marker pair containing most linkage info (there are some additional conditions for this first pair, such as both being linked to at least two other markers). At each step a marker is added to the map; the marker to be added is chosen on the basis of its total linkage info with the markers that were placed earlier on the map.

Calculations for positioning markers and for map distances (centiMorgan, cM) are performed in terms of map distances rather than recombination frequencies. This implies that the user must supply JoinMap with a genetic mapping function (mf), which translates recombination frequencies into map distances and vice versa. The choice is between Haldane's and Kosambi's mf. The first assumes absence of interference between cross-overs in meiosis, whereas the latter assumes a certain degree of interference. For most higher organisms Kosambi's mf results in a better overall fit of recombination data. However, no a priori choice can be made on this point.
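For reference, the two mapping functions in their standard textbook form can be written as the short Python sketch below (recombination frequency r as a fraction between 0 and 0.5, distance in cM). The function names are hypothetical and the code is independent of JoinMap's own implementation.

```python
import math

def haldane_cm(r):
    """Haldane: map distance (cM) from recombination fraction r (no interference)."""
    return -50.0 * math.log(1.0 - 2.0 * r)

def haldane_r(d_cm):
    """Inverse Haldane: recombination fraction from map distance in cM."""
    return 0.5 * (1.0 - math.exp(-d_cm / 50.0))

def kosambi_cm(r):
    """Kosambi: map distance (cM) from recombination fraction r (some interference)."""
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))

def kosambi_r(d_cm):
    """Inverse Kosambi: recombination fraction from map distance in cM."""
    return 0.5 * math.tanh(d_cm / 50.0)

for r in (0.05, 0.20, 0.40):
    print(r, round(haldane_cm(r), 2), round(kosambi_cm(r), 2))
```

For any r between 0 and 0.5 the Kosambi distance is smaller than the Haldane distance, which is why a change of mapping function changes the addition rule over adjacent intervals.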

JoinMap has been designed for non-interactive use; positioning of markers needs no (inter)action of the user. This was done in order to avoid the time-consuming user-controlled trial-and-error search for the best order in some other packages. It does not mean, however, that the user cannot steer at all. In addition to the input data, the user may supply (in a separate file) so-called "fixed sequences" of markers. JoinMap will in that case try to produce a map which does not contradict any of these "fixed sequences". If the data are seriously in conflict with "fixed sequences", a warning is issued. Information about "fixed sequences" could have arisen from sources other than the raw data (e.g. physical mapping). The use of fixed sequences will in general significantly reduce computation time, since in the search for the best fitting order, any order conflicting with a "fixed order" is skipped.

The method of "calculating" a map is as follows. For a given order of the markers, an adapted version of the Jensen & Jorgensen method of least squares is used (Jensen & Jorgensen, 1975, Hereditas 80:6-16). A likelihood ratio is calculated which indicates the goodness-of-fit. The best fitting order is found by trial-and-error, though some tricks are applied to prevent searching the whole parameter space. After adding a marker to the map, the map region containing the latest added marker is "reshuffled" to find a possibly better order; after each second added marker the whole map is "reshuffled" to this end. Reshuffling means rearranging the markers in a moving interval until no better solution can be found.

The matrix routines necessary for the least square procedure, the routine for finding the maximum likelihood solution for recombinant inbred lines, as well as some routines for dynamic storage allocation were taken from "Numerical Recipes in C, the art of scientific computing" by W.H. Press, B.P. Flannery, S.A. Teukolsky and W.T. Vetterling, Cambridge University Press, 1988.
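To give an impression of what a least squares fit for a given marker order involves, the Python sketch below estimates the lengths of the adjacent intervals from pairwise distances by plain weighted least squares. This is only an illustration of the general principle, not the adapted Jensen & Jorgensen procedure used by JoinMap; the function name `fit_intervals`, the data layout and the example distances and weights are assumptions.

```python
def fit_intervals(order, pair_dist):
    """Weighted least squares for a fixed marker order.

    pair_dist maps (marker_a, marker_b) -> (distance_cM, weight).
    Returns the fitted lengths of the adjacent intervals along `order`.
    Every adjacent interval must be covered by at least one pair,
    otherwise the normal equations are singular.
    """
    pos = {m: i for i, m in enumerate(order)}
    n = len(order) - 1                      # number of adjacent intervals
    ata = [[0.0] * n for _ in range(n)]     # normal matrix A'WA
    atd = [0.0] * n                         # right-hand side A'Wd
    for (a, b), (dist, w) in pair_dist.items():
        lo, hi = sorted((pos[a], pos[b]))
        span = range(lo, hi)                # intervals spanned by this pair
        for i in span:
            atd[i] += w * dist
            for j in span:
                ata[i][j] += w
    # Solve the normal equations by Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(ata[r][col]))
        ata[col], ata[piv] = ata[piv], ata[col]
        atd[col], atd[piv] = atd[piv], atd[col]
        for r in range(col + 1, n):
            f = ata[r][col] / ata[col][col]
            for c in range(col, n):
                ata[r][c] -= f * ata[col][c]
            atd[r] -= f * atd[col]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(ata[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (atd[r] - s) / ata[r][r]
    return x

# Hypothetical, internally consistent distances (cM) with equal weights.
dists = {("a", "b"): (10.0, 1.0), ("b", "c"): (20.0, 1.0), ("a", "c"): (30.0, 1.0)}
intervals = fit_intervals(["a", "b", "c"], dists)
print(intervals)
```

Repeating such a fit for alternative orders and comparing the goodness-of-fit is the essence of the trial-and-error search described above.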

Estimates of recombination frequencies from raw data are obtained by using the EM algorithm; this is an iterative way to obtain maximum likelihood estimates.

JoinMap produces the best fitting map distances between markers as well as the cumulative distance (both top-down and bottom-up).

JoinMap was written in C; for portability reasons the source code is ANSI C.

3. DATA INPUT.

Data are read from text (ASCII) files. Coding of genotypes (single characters) is as follows.

A : homozygous
H : heterozygous
B : alternative homozygous
C : non-A, i.e. either H or B
D : non-B, i.e. either A or H
- or U : unknown.

The segregation type is coded according to the genotype of the parent(s). Backcross types of segregation are denoted as AxH, BxH, HxA and HxB, or as AxH(BC), BxH(BC), HxA(BC) and HxB(BC).

The F2 type of segregation may be denoted by either F2 or HxH.

There is a difference between the segregation types "F2" and "HxH"; in the latter case it is assumed that the phase of linkage is unknown. In that case JoinMap deduces the phase for pairs of markers from the data. In a normal F2 however the phase is always known, so in that case it is wise to use "F2" as segregation type. The same holds for the backcross segregation types; in a true backcross the type AxH(BC), or one of the others, as applicable, should be used.

Recombinant inbred lines (RILs) are recorded as RILn or riln, where n denotes the generation; examples: RIL6, ril11. Note that F2, f2, RIL2 and ril2 indicate the same type of segregation.

The use of the characters A, B, C, D, H, F, U is case-insensitive; however the "cross sign" (x) in the segregation type code MUST be the lower case x; it is essential that no white space characters occur in the codes for segregation types; see examples.

When using RIL data, all markers of a population (lines of a set) must be of the same segregation type (if not, you are asking the impossible).

The other segregation types with unknown phase (F2, AxH, HxB, etc.) may vary from marker to marker within a population. This feature enables the analysis of an offspring generation from a cross between two individuals in an outbreeding species. In such an offspring some markers may segregate according to the type HxB, whereas other markers may segregate according to HxH.
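The coding rules for segregation types can be summarised in a small Python checker such as the one below. It is a sketch based on one reading of this section (the "(BC)" suffix and "F2" taken as known phase, the bare cross codes and "HxH" as unknown phase); the function name `classify_segregation` and the returned labels are hypothetical and do not correspond to anything in the JoinMap program.

```python
import re

_BACKCROSS = re.compile(r"^(AxH|BxH|HxA|HxB)(\(BC\))?$")
_RIL = re.compile(r"^RIL(\d+)$")

def classify_segregation(code):
    """Return (kind, detail) for a segregation type code, or raise ValueError."""
    if any(ch.isspace() for ch in code):
        raise ValueError("no white space allowed in segregation type")
    # Letters are case-insensitive, but the cross sign must stay a lower case x.
    norm = "".join(ch if ch == "x" else ch.upper() for ch in code)
    if norm == "HxH":
        return ("F2", "phase unknown")
    if norm in ("F2", "RIL2"):              # RIL2 is the same as F2
        return ("F2", "phase known")
    m = _BACKCROSS.match(norm)
    if m:
        return ("backcross", "phase known" if m.group(2) else "phase unknown")
    m = _RIL.match(norm)
    if m:
        return ("RIL", "generation " + m.group(1))
    raise ValueError("unknown segregation type: " + code)

for code in ("F2", "hxh", "AxH(BC)", "ril11"):
    print(code, classify_segregation(code))
```

A code such as "HXH" (upper case cross sign) is rejected by this checker, in line with the rule stated above.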

Raw data format.

The following is an example of a small (hypothetical and genetically meaningless) raw data file, exemplifying the format.

2
pop_1
30	5
ASH1711 F2  DD--DDDDDB DDDDBDBDDD BDDDBDDBDD
pym     F2  ahhhahbhah AAAABHBHHH BAAHBAABHH
gamma12 F2  aaccacccac acac---ccc aacccacccc
121-b   f2  BAHHHHBHAA BAAHBHHABA BAAAHAHBAH
548-ae  HxH BDDDDDBDDD BDDDBDDDBD bdddbdbbdd
pop_2
50	5
gamma12 f2  CACCC ACCAC CCCAC UUCCA ACUUC ACACA
            AACCC CCCCA CCACC CCCCA
121-b   HxH HAHBH ABHHA HHBAB BHAHH HBHHH AHHHA
            AHBHB HHHAA HBAHH BBHHA
548-ae  F2  DDDBD DBDDD DDBDB BDDDD ----- DDDDD
            DDBDB DDDDD DDDDD BBDDD
hypo518 hxh HHHBA ABHHA HHBAB BHAAA HAAHH HHHHH
            AHBBB --H-- HBAHH HHHBH
DAR-bs  F2  CCCCA ACCCA CCCAC CCAAA CAACC CCACCA
            CCCCC ----- CCC-- CCCC

Explanation / meaning:

1st line : 2 populations in this file
2nd line : label for first population (should not contain white space characters)
3rd line : population size is 30, 5 markers are segregating
4th line : marker name (no white space characters!), segregation type, thirty genotypes
5th line : similar
6th line : similar
7th line : similar
8th line : similar
9th line : label for second population
10th line : population size is 50, 5 markers are segregating
11th line : marker name, segregation type, 50 genotypes
et cetera.

The above exemplifies the following features:

* upper and lower case may be used for genotypes

* upper and lower case may be used for segregation types

* individuals may be separated by blanks, tabs, and newline characters for grouping and readability.

* the code combinations A/C and B/D cover dominance; they can also be used in case a codominant marker cannot be scored with certainty in a particular individual.

* a particular marker may be segregating in several populations; the above example illustrates the aligning of two maps: the populations 1 and 2 could be used to construct distinct maps; however when feeding JoinMap with both data sets in a single file, as above, it will produce a combined map.

!!! IMPORTANT !!! Characters in marker names are case-sensitive, i.e. markers fm45 and Fm45 are treated as being different.

!!! IMPORTANT !!! Make sure that marker names are unique within a population, i.e. a marker name should occur only once in a population data set.
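The raw data format lends itself to token-wise reading, since blanks, tabs and newlines are all just separators. The Python sketch below reads a file of this form into a dictionary. It is a minimal illustration under the assumption that the file is a plain stream of whitespace-separated tokens; the function name `parse_raw` and the returned structure are hypothetical and unrelated to JoinMap's internal routines.

```python
def parse_raw(text):
    """Parse raw population data into {label: (size, {marker: (type, genotypes)})}."""
    tokens = iter(text.split())

    def take():
        try:
            return next(tokens)
        except StopIteration:
            raise ValueError("end-of-file reached before all data read")

    populations = {}
    for _ in range(int(take())):                 # number of populations
        label = take()
        size, n_markers = int(take()), int(take())
        markers = {}
        for _ in range(n_markers):
            name, seg_type = take(), take()
            genotypes = ""
            while len(genotypes) < size:         # grouping of individuals is free
                genotypes += take().upper()      # genotype codes are case-insensitive
            if len(genotypes) != size:
                raise ValueError("too many genotypes for marker " + name)
            if set(genotypes) - set("ABCDHU-"):
                raise ValueError("illegal genotype code in marker " + name)
            if name in markers:                  # marker names are case-sensitive
                raise ValueError("duplicate marker name " + name)
            markers[name] = (seg_type, genotypes)
        populations[label] = (size, markers)
    return populations

# A tiny hypothetical file: two populations sharing the marker m1.
sample = """2
pop_1
6 2
m1 F2 ahb hab
m2 HxH DDB
BDD
pop_2
4 1
m1 f2 C-AC
"""
data = parse_raw(sample)
print(data["pop_1"][1]["m1"])
```

Markers that occur in more than one population (m1 in the sample) are what allows two maps to be joined.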

Independent estimates as data.

Independent estimates of recombination percentages may be included in the input data file in the example format below.

indep_est_OBrien_86

gl25	ms36	20.3 4.2
g3045	KWS12	4.15 0.35
BR46	KWS12	13.21 2.11
ms36	BR46	18.7 7.2
ms36	g3045	9.8 2.1
ms36	gl25	22.8 4.5

Each of the above lines has two marker names followed by recombination PERCENTAGE (!) and standard error of this estimated percentage. The source of such estimates could be any type of linkage experiment, or literature reference.

The above mini data set exemplifies the following features.

* a set of independent estimates is preceded by an identifier (indep_est_OBrien_86 in this case; NO white space characters in identifiers !!).

* a particular pair of markers may occur more than once (the pair gl25 / ms36).

* the order of the markers within a line is irrelevant.

* the order of the pairs within the file is irrelevant.

If a data file contains both raw population data and pairwise estimates, the raw population data must come first.
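When a pair occurs more than once, the separate estimates have to be merged using their standard errors as a measure of reliability. The Python sketch below shows one common way of doing this, inverse-variance weighting. That choice is an assumption made for illustration; the manual states only that estimates are weighted by their "amount of information", and the function name `combine_estimates` is hypothetical.

```python
from collections import defaultdict

def combine_estimates(lines):
    """Merge repeated pairwise estimates by inverse-variance weighting.

    Each line holds: marker, marker, recombination percentage, standard error.
    Returns {frozenset({marker_a, marker_b}): (percentage, standard_error)}.
    """
    by_pair = defaultdict(list)
    for line in lines:
        a, b, rec, se = line.split()
        # A frozenset makes the order of the two markers irrelevant.
        by_pair[frozenset((a, b))].append((float(rec), float(se)))
    combined = {}
    for pair, estimates in by_pair.items():
        weights = [1.0 / se ** 2 for _, se in estimates]
        total = sum(weights)
        mean = sum(w * rec for w, (rec, _) in zip(weights, estimates)) / total
        combined[pair] = (mean, total ** -0.5)
    return combined

# The two gl25 / ms36 lines and one other line from the example above.
estimates = combine_estimates([
    "gl25 ms36 20.3 4.2",
    "ms36 gl25 22.8 4.5",
    "g3045 KWS12 4.15 0.35",
])
print(estimates[frozenset(("gl25", "ms36"))])
```

The combined percentage lies between the two separate estimates, and its standard error is smaller than either of the original ones.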

Fixed sequences.

If the order of sets of (at least three) markers is known in advance, this information can be stored in a "fixed sequence" file which is read by JoinMap. The following is an example of a fixed sequence file with three known sequences.

@ KAG114 mNR12 BAT15 pCOT25
@ KAG114 WG88 AG87 am14 pCOT25
@ am14 AG87 mNR12 KAG114 ZW21

!!! IMPORTANT !!! A fixed sequence may extend over several text lines; each sequence MUST start with the at-sign (@).

The "polarity" of a fixed sequence is irrelevant (see lines 2 and 3 above; JoinMap "looks" either way). JoinMap checks for internal contradictions in a list of fixed sequences; if so, a message is issued and execution stops.
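The fixed sequence format and the "either way" check can be illustrated with the Python sketch below. It assumes that marker names never contain the at-sign; the function names `read_fixed_sequences` and `respects` are hypothetical, and the code does not reproduce JoinMap's own consistency checks.

```python
def read_fixed_sequences(text):
    """Split the contents of a fixed sequence file into lists of marker names.

    Each sequence starts with '@' and may extend over several text lines.
    """
    return [chunk.split() for chunk in text.split("@") if chunk.strip()]

def respects(order, fixed):
    """True if the markers of `fixed` that occur in `order` appear in the same
    relative order, read in either direction (polarity is irrelevant)."""
    pos = {m: i for i, m in enumerate(order)}
    idx = [pos[m] for m in fixed if m in pos]
    return idx == sorted(idx) or idx == sorted(idx, reverse=True)

fixed_file = """@ KAG114 mNR12 BAT15 pCOT25
@ KAG114 WG88 AG87 am14 pCOT25
@ am14 AG87 mNR12
  KAG114 ZW21
"""
sequences = read_fixed_sequences(fixed_file)
candidate = ["pCOT25", "BAT15", "am14", "AG87", "mNR12", "WG88", "KAG114", "ZW21"]
print([respects(candidate, seq) for seq in sequences])
```

During a search over marker orders, any candidate order for which `respects` returns False for some fixed sequence could be skipped without fitting a map.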

Fixed sequences, even when based on more than circumstantial evidence, may contradict the segregation data of the input data file. In that case JoinMap will produce a map fitting closely to the restrictions of the "fixed" sequences, and a warning is issued. The use of fixed sequences will, when not in conflict with the segregation data, substantially reduce computation time. (During the trial-and-error search for the best fitting order, all orders violating the restrictions of fixed sequences are skipped.)

4. OUTPUT.

JoinMap produces (apart from echoing the "job parameters" (see below)) the following output.

1. A complete list of all pairwise recombination estimates (percentages !!) together with their standard errors is written to a file; the format of this file is such that it can be used as input in a later run. The file name is the same as the input file name, except for the extension, which is always "REC". So running JM with a raw data file named name.ext will produce an output file name.REC. When running JM with different values of linklod to establish linkage groups (see below), using the pairwise data file as input reduces the computation time, since the time-consuming calculation of all pairwise recombination estimates from raw data is skipped.

!!! IMPORTANT !!! Input file names MUST be of the form: name.extension, i.e. it must contain a period.

2. When the data suggest the existence of more than one linkage group (by linklod value (see below)), new input data files are created, each of them corresponding to a linkage group. These data files have extension .001, .002, .003, etc. For example, if the data in file name.ext suggest the existence of four linkage groups, new data files name.001, name.002, name.003 and name.004 are extracted from name.ext. Remark: new data files are created only for those linkage groups that contain more than two markers.

3. (Optional) At each step in constructing the map, the intermediate map is written to the output file. The output of a map lists the markers with their map distance (centiMorgan) as well as the cumulative map distance. Both a top-down and a bottom-up list are produced (JoinMap doesn't "know" top and bottom of a linkage group).

4. Together with each map, a chi-square value, indicating the overall goodness-of-fit is printed. This feature may be helpful in detecting "troublesome" or "suspect" markers. When, after having added a marker to the map, the (average) chi-square value jumps upward, this indicates that the linkage data for this marker is at variance with the "previous" linkage data. When mapping large linkage groups and using consistent data, the average chi-square value normally fluctuates between 0.5 and 3.0. An average chi-square value of over 6.0 should alarm the mapper.

5. At completion the final map is printed.

6. (Optional) An ordered list of marker pairs and corresponding linkage information. The ordering corresponds to the order in the final map. Should the map read a b c d e f, the ordering of pairs in this list is (a,b) (a,c) (a,d) (a,e) (a,f) (b,c) (b,d) (b,e) (b,f) (c,d) (c,e) etc.

The numbers expressing the linkage information for each pair are the following.

a) calculated map distance (joint best estimate).
b) "direct" map distance; this is the calculated recombination frequency translated into centiMorgans with the (inverse) mapping function.
c) recombination percentage (same as in the list sub 1).
d) LOD score (same as in the list sub 1).

The difference between the numbers a) and b) is as follows. a) is based on the joint information for all markers and b) is based on the information for this particular pair only. In the ideal case the numbers b) should be a non-decreasing series when running down the list for all pairs involving a particular marker. This will be the exception rather than the rule, because in calculating the best fitting map distance (a) the estimates (c) do not contribute equally, but are weighted by their LOD value. Nevertheless, inspection of this list can be helpful in detecting serious inconsistencies in the data.

This output can be suppressed with the switch "pairs:n[o]" (see below). When feeding JM with a data set with over 200 (say) markers the pairwise list may become quite long. Especially when the markers belong to different linkage groups, this pairwise information is not very interesting in most cases; hence the option to suppress it.

7. Finally a map is printed based on the alternative mapping function (the user must supply a mapping function, either Haldane's or Kosambi's). The order of markers in this second map is basically the same as in the first one; some minor rearrangements may occur, however. (A change of mapping function, from Haldane to Kosambi or vice versa, results in a different addition rule for recombination frequencies over adjacent intervals, and, therefore will affect the goodness-of-fit of any given order.)

In addition to the output described above, two small output files (top-down and bottom-up) are created. These merely list the markers and the cumulative map distances. The format of these files (with extensions .DM1 and .DM2) is such that they can be used directly as input to the graphics program DrawMap ((C) Johan W van Ooijen). DrawMap is available from Johan W van Ooijen, CPRO-DLO, PO Box 16, 6700 AA Wageningen, The Netherlands; it runs on IBM compatible PCs with the MS-DOS operating system and produces graphics output to a variety of devices.

5. CRITICAL LOD SCORES.

The user has to supply JoinMap with two numbers, the critical LOD score for linkage detection and the critical LOD score to be used in calculating map distances. These are referred to as "linklod" and "maplod", respectively.

Linklod. Linklod is used to establish linkage groups. Pairs of markers with a LOD score less than linklod are considered unlinked. (JoinMap considers a marker to be a member of a linkage group if it is significantly (by LOD score > linklod) linked to at least one other member of the group.) The usual value for linklod in linkage analysis is in the range 2.0 - 3.0. When starting from scratch (i.e. no linkage groups known in advance), it is wise to try several values of linklod. Large values will tend to cause "fragmentation" into many small linkage groups; small values will tend to create few large linkage groups.

Maplod. Maplod is used as follows. Only information for marker pairs with a LOD score above maplod is used in the calculation of map distances. When setting, for example, the value of maplod equal to 0.01, this results in using even very weak linkage information (usually corresponding to recombination values slightly less than 50%). The LOD value however not only depends on the observed recombination frequency; it also depends on other factors, such as dominance/no dominance, coupling phase/repulsion phase, segregation type (F2/backcross/RIL) and population size. The choice of maplod values may, at first sight, seem arbitrary. However, since LOD scores are used as weights in the mapping calculations, the less informative estimates of recombination frequencies contribute less to the jointly estimated map distances. Nevertheless it may be instructive in some cases to set the value of maplod well above 0.05. Especially when dealing with large linkage groups (over 50 markers, say) that cover a large total map distance, many marker pairs will show insignificant linkage (if at all); to be on the safe side in such a case a maplod value in the range 0.5 - 1.0 will ensure that no information is used which comes from distant markers. The author's experience is that internally consistent data sets with a high level of "interconnectedness" (i.e. sufficient "anchor" points to join distinct maps) are virtually insensitive to the value of maplod. In practice this means that if a change in the value of maplod results in a significantly different ordering of the markers, this indicates inconsistency of the data.
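To give a feeling for the magnitude of LOD scores, the Python sketch below computes the textbook two-point LOD score for the simplest case, a fully informative backcross with known phase. It is an illustration only; JoinMap's own calculation covers other segregation types, dominance and phase, and the function name `backcross_lod` is hypothetical.

```python
import math

def backcross_lod(recombinants, n):
    """Two-point LOD score for a backcross of n individuals, of which
    `recombinants` are recombinant: log10 of L(r_hat) / L(0.5)."""
    r = recombinants / n
    if r >= 0.5:
        return 0.0                      # no evidence for linkage
    log_l = (n - recombinants) * math.log10(1.0 - r)
    if recombinants > 0:
        log_l += recombinants * math.log10(r)
    return log_l + n * math.log10(2.0)

for k in (0, 10, 30, 45):
    print(k, round(backcross_lod(k, 100), 2))
```

The same recombination percentage gives a higher LOD score in a larger population, which is one reason why a single critical value cannot suit all data sets.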

6. JOB PARAMETERS.

When running JoinMap it must be supplied with the following information.

1. Name of the input data file.
2. (optional) Name of fixed sequence(s) file.
3. Name of output file.
4. Mapping function (Haldane's or Kosambi's).
5. Critical LOD score for linkage groups (linklod).
6. Critical LOD score for calculation of maps (maplod).
7. Whether or not output of intermediate maps is required.
8. Whether or not the output has to be written to the screen.
9. Whether or not a file with all pairwise recombination data is to be written. (This file can be used as input in later runs.)
10. Whether or not the pairwise estimates are written to the standard output file.

These "arguments" can be supplied in two ways, i.e. either from keyboard or as command line arguments.

Arguments 4 through 10 can be set to default "values" by issuing the command "jm set " (see below).

When activating JoinMap by the command "jm" (no command line arguments) one has to enter the arguments from keyboard as answers to questions. The defaults, as suggested in square brackets, will be used when hitting just the carriage return key at a question.

7. HOW TO RUN JOINMAP.

When having installed JoinMap on your system, it is activated by the command "jm", optionally followed by one or more command line arguments.

A command line has the following general format:

jm [help] [?] [set] [i[nput]:filename] [o[utput]:filename] [f[ixed]:filename] [mf:mapping function] [ll:decimal number] [ml:decimal number] [scr[een]:y[es]/n[o]] [im:y[es]/n[o]] [rec:y[es]/n[o]] [pairs:y[es]/n[o]] [go]

(A command line is completed by pressing the carriage return key.)

With the exception of "help", "?", "set" and "go", arguments consist of two parts separated by a colon (:). It is essential that no white space characters occur on either side of the colon.

In order to facilitate the use of command line arguments the following strings are equivalent.

left of colon:

i/in/input (input file)
o/out/output (output file)
f/fx/fixed (fixed sequence(s) file)
mf/mapfun (mapping function)
ll/linklod (linklod)
ml/maplod (maplod)
s/scr/screen (screen output)
oi/inter/intermap (output of intermediate maps)
rec (output of pairwise estimates to a separate file, to be used as input in later runs)
pairs (output of pairwise estimates to standard output file)

right of colon:

h/hal/haldane
k/kos/kosambi
y/yes
n/no

This means that, for example, the commands "jm i:grass.dat o:grass.out mf:h ll:2.0 sc:y" and "jm in:grass.dat output:grass.out mapfun:hal linklod:2.0 screen:yes" are fully equivalent.

Command line arguments may be given in ANY ORDER.
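The equivalence of the argument spellings can be pictured as a simple look-up, as in the Python sketch below. It handles only the spellings listed above and is not derived from the JoinMap source; the names `KEY_ALIASES` and `normalise` and the returned structure are hypothetical.

```python
KEY_ALIASES = {
    "input": ("i", "in", "input"),
    "output": ("o", "out", "output"),
    "fixed": ("f", "fx", "fixed"),
    "mapfun": ("mf", "mapfun"),
    "linklod": ("ll", "linklod"),
    "maplod": ("ml", "maplod"),
    "screen": ("s", "scr", "screen"),
    "intermap": ("oi", "inter", "intermap"),
    "rec": ("rec",),
    "pairs": ("pairs",),
}
MAPFUN = {"h": "haldane", "hal": "haldane", "haldane": "haldane",
          "k": "kosambi", "kos": "kosambi", "kosambi": "kosambi"}
YES_NO = {"y": "yes", "yes": "yes", "n": "no", "no": "no"}
FLAGS = {"help", "?", "set", "go"}
LOOKUP = {alias: key for key, aliases in KEY_ALIASES.items() for alias in aliases}

def normalise(args):
    """Map command line arguments (in any order) to canonical settings."""
    settings, flags = {}, []
    for arg in args:
        if arg in FLAGS:
            flags.append(arg)
            continue
        # Split at the first colon only, so that a file name such as
        # c:\maps\chr4.hou keeps its own colon.
        left, colon, right = arg.partition(":")
        if not colon or left not in LOOKUP:
            raise ValueError("unrecognised argument: " + arg)
        key = LOOKUP[left]
        if key == "mapfun":
            right = MAPFUN[right]
        elif key in ("screen", "intermap", "rec", "pairs"):
            right = YES_NO[right]
        settings[key] = right
    return settings, flags

short = normalise("i:grass.dat o:grass.out mf:h ll:2.0 s:y".split())
long = normalise("screen:yes linklod:2.0 mapfun:hal output:grass.out in:grass.dat".split())
print(short == long)
```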

When not all the arguments needed by JoinMap are issued in the command line, the user is asked to input the remaining ones from keyboard.

Arguments "help" and "?". The commands "jm help" and "jm ?" will produce a few quick help screens.

The command "jm set". This enables the user to set the arguments 4..8 to default values. The defaults are written to the file JM.SET, residing in the current working directory. When the user is to enter arguments from keyboard the suggested defaults are taken from this file. In case JM.SET does not exist, JoinMap will suggest its own defaults.

The use of "go". A command line with "go" as its last argument will cause JoinMap to use the defaults in JM.SET for those arguments that were not listed in the command line.

EXAMPLES.

"jm"

This will cause a number of questions to appear on the screen, some followed by a suggestion (default) in square brackets. If the suggested "value" is to be taken, just hit the carriage return key; otherwise type the answer.

"jm in:chr3.dat o:chr3.rst"

Data will be read from chr3.dat; output will be written to chr3.rst; Remaining arguments to be entered from keyboard as answers to questions.

"jm input:chr4.dat out:c:\maps\chr4.hou mf:hal scr:no linklod:2.0"

Data are read from chr4.dat; output is written to c:\maps\chr4.hou; Haldane's mapping function is used; no output written to screen; linklod takes the value 2.0; remaining parameters to be entered from keyboard.

"jm in:ch67.dat out:ch67.dat mapfun:kosambi"

This will produce an error message because input and output file names must be different; execution stops.

"jm out:test56.out in:tes56.dat linklod:3.0 pairs:y maplod:0.1 rec:y go"

Data to be read from tes56.dat; output written to test56.out; no fixed sequence(s) file to be read; sets linklod equal to 3.0; pairwise data are written to standard output file; sets maplod equal to 0.1; a file tes56.rec is created with all pairwise recombination percentages, together with their standard errors; reads remaining arguments from JM.SET. Execution stops and a message is issued when JM.SET does not exist.

"jm i:bonas.rfl fix:bonas.fix go"

Data to be read from bonas.rfl; output written to bonas.jmo (default), bonas.dm1 and bonas.dm2; fixed sequences to be read from bonas.fix; reads remaining arguments from JM.SET.

Remark. The use of "go" avoids the entering of text from the keyboard. When using JoinMap interactively, the defaults can also be used by hitting the carriage return key at each question. This way of entering the arguments is fast enough in most cases. However, when processing batch jobs the use of "go" also facilitates the preparation of the batch file. Assuming that JM.SET resides in the current working directory, a batch file might look like this:

jm in:c45.dat o:c45hal1.out mf:hal ml:0.05 go
jm in:c45.dat o:c45kos1.out mf:kos ml:0.05 go
jm in:c45.dat o:c45hal2.out mf:hal ml:0.25 go
jm in:c45.dat o:c45kos2.out mf:kos ml:0.25 go

Default file names.

Consider the command

"jm i:beta_5.rap go"

In this case no output file is specified, which will cause the output file name to be the default, i.e. "beta_5.jmo". In general: if the input file name reads "[path]infilename.ext", the default output file name reads "[path]infilename.jmo". Similarly the file names for the *.dm* files (DrawMap input) are extracted from the output file name. The output file name "[path]outfilename.ext" will cause the files "[path]outfilename.dm1" and "[path]outfilename.dm2" to be created.

When the argument for the fixed sequence(s) file does not occur in the command line, or when at the corresponding question just the carriage return key is hit, JoinMap assumes that no fixed sequences are to be read. If a filename IS entered, but this file does not exist (or it is for some other reason not available for reading), JoinMap also assumes that no fixed sequences are to be read.
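The naming rule can be expressed compactly, as in the Python sketch below. It assumes that the extension is whatever follows the last period in the name; the function name `default_names` is hypothetical and the code is not taken from JoinMap.

```python
def default_names(input_name, output_name=None):
    """Return (output file, .dm1 file, .dm2 file) following the rule above."""
    def swap_ext(name, ext):
        stem, dot, _ = name.rpartition(".")
        if not dot:
            raise ValueError("file name must contain a period: " + name)
        return stem + "." + ext

    out = output_name or swap_ext(input_name, "jmo")
    return out, swap_ext(out, "dm1"), swap_ext(out, "dm2")

print(default_names("beta_5.rap"))
print(default_names("chr3.dat", "chr3.rst"))
```

The second call shows that the DrawMap file names follow the output file name, not the input file name.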

8. ERRORS AND WARNINGS.

Most error messages and warnings are self-explanatory. Fatal errors halt execution.

!!! Always inspect the output file for the occurrence of WARNINGS !!!

Format errors in data.

Most errors of this type are caught in an early phase and result in a WARNING being issued; fatal errors (such as fewer individuals in a population than indicated in the heading) will stop the program.

Some common format error messages.

* unknown segregation type encountered (not fatal; data are not used).
* illegal genotype code encountered.
* end-of-file reached before all data read.

Memory allocation errors.

Depending on the size of the data, memory is (dynamically) allocated; when your system doesn't have enough memory, execution stops after having issued a message saying "short of memory". If so, this usually happens in an early phase before calculations have started, but it may also occur later in the process of building up the map.

Part of the memory allocation is not dynamic, i.e. some array bounds are fixed. Version 1.4 of JoinMap can deal with a maximum of 1500 markers in a data set. These limits can only be changed by changing the source code, recompiling and linking.

Other run-time errors.

At a certain phase in building the map it may occur that "no next marker can be found"; this is the result of the fact that none of the remaining markers (not yet placed on the map) is linked to at least two markers placed on the map so far. (In order to be able to position a next marker on the map unambiguously it must be linked to at least two other ones on the map.) In such a case the position can be forced by the use of a "fixed sequence" containing this "unplaceable" marker and at least two markers that could be mapped before the error occurred.

When data are really messy (such as extremely distorted segregation ratios with dominant markers), the (iterative) calculation of recombination frequencies may not converge to a (biologically) realistic solution within a reasonable number of iterations. This will halt execution; inspection of the raw data is recommended in that case.

Halting execution without error messages may occasionally occur; this is in most cases due to unrecognized data format errors or otherwise pathological (biologically impossible) data.

9. OPERATING SYSTEMS.

JoinMap presently has been prepared to run under four operating systems, i.e. MS-DOS (version 3.1 and later) for use on IBM-compatible PCs, VAX VMS (version 5.2 and later) for use on a VAX system, UNIX for use on SUN Sparc workstations (SunOS release 4.1.3), and for Macintosh machines.

MS-DOS.

The MS-DOS version is distributed as a file with the name JM.EXE. Special installation is not required; simply copy JM.EXE to a proper directory. When a math coprocessor is installed, it will be used. (Without a coprocessor JoinMap may become very slow when running large data sets; an 80486 machine is to be preferred.)

For use on MS-DOS machines a second version, using 32 bit addressing, is also distributed. This version utilizes a DOS extender so that additional memory on your mother board is available for the program. This version will not run on machines with a 286 processor; you'll need a 386 or 486 processor. Be sure that your emm386.sys does not use the "noems" switch. The 32 bit version of JoinMap is distributed as an executable with the name JM32.exe.

SUN Workstations (UNIX)

When arrived on disk:

1. Copy the file jmsun.uue to your directory;

2. decode jmsun.uue to jm.tar with the command uudecode jmsun.uue

3. extract all files from the archive-file jm.tar with the command tar xf jm.tar

When arrived by E-mail :

1. extract the file jmsun.uue from the mail;

2. decode jmsun.uue to jm.tar with the command uudecode jmsun.uue

3. extract all files from the archive-file jm.tar with the command tar xf jm.tar

MACINTOSH machines.

When arrived on a floppy disk: No special installation required for Macs. Simply copy the program to your system and run.

When arrived by E-mail:

1. extract the file jmmac.bh from your mail;

2. convert jmmac.bh with BinHex 4.0; this produces a self-extracting archive;

3. running the archive produces a folder with:

* the joinmap program
* the documentation file
* example data files.

VAX VMS.

The VAX VMS version is primarily distributed as the executable file JMVAX.EXE. In order to be able to use command line arguments under VMS, the program must be installed and run as a "foreign command". The easiest way to make JoinMap run properly on a VAX is as follows (consult your system manager if necessary).

1. Copy JMVAX.EXE to a directory.

2. Issue a command like

$ jm == "$ DSK$:jmvax.exe"

($ DSK$ is a device, possibly containing a path name, $ DISK$USERDSK[MYNAME.MAPS.JM] is a syntactically valid device)

Now the symbol jm is installed as a foreign command that invokes the image file jmvax.exe.

It is not unwise to put the command $ jm == "$ DSK$:jmvax.exe" in your login.com file and run the login program; at successive logins the installation as a foreign command is then performed at logging in.

(3. Try "jm ?" or "jm help" to see if it works.)

UUENCODEd binary files that arrived by E-mail must, of course, be UUDECODEd before being executable.

10. EXAMPLE DATA FILES.

When receiving the package on floppy disk, a number of example data input files (EXAM_*.DAT) should be present. These data sets exemplify the various types of data input. It may be instructive to inspect these and to feed JoinMap with these example data, to vary the command line argument "values", etc. before analyzing your own data. Use them as an exercise (some of them contain format errors; try to find out what's wrong!).

Happy mapping!