archive-edu.com » EDU » R » ROCKEFELLER.EDU

Total: 626

  • Pawe 3D
    di-allelic disease and a marker locus in disequilibrium. Our webtool, PAWE-3D, allows one to perform power calculations considering a range of values for any subset of the eight parameters, with the remaining parameters specified at a single value. If we consider a range for only one parameter, the resulting figure is a graph. If we consider a range for exactly two parameters, the resulting figure is a contour plot. If we consider a range for three or more parameters, the resulting figure is a histogram.

    The figures are created by randomly sampling 100,000 data points, assuming either a Uniform or Beta prior distribution for values in the n-dimensional cube determined by the endpoints of the n user-specified intervals. For the Beta distribution, the user specifies a mean and a variance for the distribution. These values are then transformed into the parameters necessary for determination of the particular Beta distribution on the (0, 1) interval. A simple linear transformation maps the Beta distribution on (0, 1) to the interval for the specific parameter. For more details, see the PAWE-3D Helpfile.

    Run Pawe 3D

    Please cite the following two references when reporting results obtained from PAWE-3D:

    Gordon D, Haynes C, Blumenfeld J, Finch SJ (2005) PAWE-3D: visualizing Power for Association With Error in case-control genetic studies of complex traits. Bioinformatics 21:3935-3937.
    Gordon D, Finch SJ, Nothnagel M, Ott J (2002) Power and sample size calculations for case-control genetic association tests when errors are present: application to single nucleotide polymorphisms. Hum Hered 54:22-33.

    If power and/or sample size calculations are performed using phenotype misclassification error, then please also cite the following papers, depending upon the test statistic considered:

    Linear Trend Test: Zheng G, Tian X (2005) The impact of diagnostic error on testing genetic association in case-control studies. Stat Med 24:869-882.
    Genotypic Test: Edwards BJ, Haynes C, Levenstien MA, Finch SJ, Gordon D (2005) Power and sample size calculations in the presence of phenotype errors for case-control genetic association studies. BMC Genet 6:18.

    References

    Gordon D, Finch SJ, Nothnagel M, and Ott J (2002) Power and sample size calculations for case-control genetic association tests when errors are present: application to single nucleotide polymorphisms. Hum Hered 54:22-33.
    Boehnke M (1986) Estimating the power of a proposed linkage study: a practical computer simulation approach. Am J Hum Genet 39:513-27.
    Purcell S, Cherny SS, and Sham PC (2003) Genetic power calculator: design of linkage and association genetic mapping studies of complex traits. Bioinformatics 19:149-150.
    Weeks DE, Ott J, and M L G (1990) SLINK: a general simulation program for linkage analysis. Am J Hum Genet 47:A204 (Supplement).
    De La Vega FM, Gordon D, Su X, Scafe C, Isaac H, Gilbert DA, and Spier EG (2005) Power and sample size calculations for genetic case-control studies using gene-centric SNP maps: application to

    Original URL path: http://linkage.rockefeller.edu/pawe3d/ (2012-11-26)
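
    The Beta-prior sampling described in the PAWE-3D excerpt can be sketched in a few lines of Python. This is only an illustration, not PAWE-3D's code: the excerpt does not say how the mean and variance are transformed, so the method-of-moments formulas below are an assumption, as is the reading that the mean and variance are given on the (0, 1) scale. The function names are hypothetical.

```python
import random


def beta_params_from_moments(mean, var):
    """Method-of-moments Beta(a, b) parameters on (0, 1).

    Assumption: mean and variance are given on the (0, 1) scale.
    A Beta distribution requires 0 < var < mean * (1 - mean).
    """
    if not 0.0 < mean < 1.0:
        raise ValueError("mean must lie strictly between 0 and 1")
    limit = mean * (1.0 - mean)
    if not 0.0 < var < limit:
        raise ValueError("variance must lie strictly between 0 and mean*(1-mean)")
    common = limit / var - 1.0
    return mean * common, (1.0 - mean) * common


def sample_parameter(lo, hi, mean, var, n=100_000, seed=0):
    """Draw n Beta values on (0, 1) and map them linearly onto [lo, hi]."""
    a, b = beta_params_from_moments(mean, var)
    rng = random.Random(seed)
    return [lo + (hi - lo) * rng.betavariate(a, b) for _ in range(n)]


if __name__ == "__main__":
    # Hypothetical range for one parameter, e.g. an error rate in [0.01, 0.05].
    draws = sample_parameter(0.01, 0.05, mean=0.5, var=0.05)
    print(min(draws), max(draws), sum(draws) / len(draws))
```

    With several parameters, one such draw per parameter gives a point in the n-dimensional cube; a Uniform prior corresponds to replacing the Beta draw with a uniform draw on (0, 1).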


  • simibd
    programs included here, you must compile them to run on your machine. We recommend that you read the README.XSLINK before doing this, as there are compilation instructions specific to xslink included there. However, simply issuing the command make will produce the executables simibd and xslink, which can perform the SimIBD and SimISO calculations (see reference). Note that if you have a version of the program already made, you must issue the command make cleanall before attempting to make a new version. In our experience, producing optimized code provides a significant increase in speed while still affording correct answers. If you are unfamiliar with producing optimized code when compiling, see your system administrator for assistance. We recommend using gcc with at least one level of optimization (the default setting for compiling with make). The default compiler and options can be changed by editing the Makefile in this directory.

    USE

    Using simibd is simply a matter of having a LINKAGE-formatted pedigree and locus file. Then issue the command

        simibd pedfile locfile

    where pedfile is the name of the pedigree file and locfile is the name of the locus (or data) file. You will be prompted for several pieces of information, including:

      - the family weighting function;
      - the total number of replicates (this number will determine the accuracy of the resulting p-value);
      - the number of replicates per xslink run (this number allows the run to be broken up into smaller pieces, thereby reducing the amount of disk space necessary for calculation);
      - the number of the trait locus;
      - the value of the trait signifying affection;
      - the marker locus to be analyzed.

    We recommend using the weighting function 1/sqrt(p), where p is the population frequency of a given allele. The number of replicates will vary depending on the level of accuracy desired for the p-value and, to a lesser extent, on the number of affecteds in the pedigree. The number of replicates per xslink run depends on the amount of disk space that you wish to devote to writing the replicates generated by xslink. Try using 10% of the total number of replicates as a starter value, i.e., if you are using 1000 replicates, try using 100 replicates per xslink run.

    If you would like to compute either the SimISO or the SimAPM statistic in addition to the SimIBD statistic, issue the same command as above, but with the optional -a or -u. For example, to get the SimAPM result, issue the command

        simibd -a pedfile locfile

    The command simibd -au pedfile locfile will generate results for SimIBD, SimISO, and SimAPM. Simibd attempts to keep you up to date about its progress and will estimate the time necessary to complete the current task at hand. When the run is complete, you will have before you a great deal of information. For your convenience, the brief output is contained in the file simibd.out. This contains only the summary values of the statistics and the associated p-values for each family. Other

    Original URL path: http://linkage.rockefeller.edu/soft/simibd.html (2012-11-26)
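
    The two numeric recommendations in the simibd excerpt (the 1/sqrt(p) weighting function and the 10% starter value for replicates per xslink run) can be illustrated with a small Python sketch. This is not simibd's code: the function names are hypothetical, and the way the total is split into runs is an assumption about what "broken up into smaller pieces" means.

```python
import math


def allele_weight(p):
    """Recommended weighting function 1/sqrt(p) for allele frequency p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("allele frequency must be in (0, 1]")
    return 1.0 / math.sqrt(p)


def replicates_per_run(total, fraction=0.10):
    """Starter value: 10% of the total number of replicates (at least 1)."""
    return max(1, round(total * fraction))


def plan_runs(total, per_run):
    """Split the total into run sizes (assumed scheme: full runs, then a remainder)."""
    full, rest = divmod(total, per_run)
    return [per_run] * full + ([rest] if rest else [])


if __name__ == "__main__":
    print(allele_weight(0.25))                          # rarer alleles get larger weights
    per_run = replicates_per_run(1000)
    print(per_run, plan_runs(1000, per_run))
```

    Smaller runs lower the peak disk usage at the cost of more xslink invocations.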

  • SIMLINK
    00 60 0 0 80 0 80 M F TRAIT LOCUS DAT PEDIG DAT 31171 2413 19771

    The following records, in the given order and with variables and formats as described below, are required in the control file (see Examples).

    1. Control Information. The following nine variables, in order, each within an 8-column field, all but the last right-justified (8I8,F8.5). Note: This record and its format have been substantially altered since version 4.0. The definition of NTHETA has also been changed to include free recombination.

       Col. 1-8    NREP: the number of replicate data sets to simulate.
       Col. 9-16   NMLOCI: the number of marker loci. If 1, then lod scores are calculated; if 2, then two markers are assumed to flank the trait locus and location scores are calculated.
       Col. 17-24  PENOPT: the indicator of the type of penetrance function for the trait. 1: a piecewise linear penetrance function for a dichotomous trait; 2: a cumulative normal penetrance function for a dichotomous trait; 3: a quantitative trait due to a mixture of normal distributions.
       Col. 25-32  IFREE: indicator of whether free recombination between the trait and marker locus/loci is to be simulated. 0 if no; 1 if yes.
       Col. 33-40  NTHETA: if using one marker locus, the number of different true recombination fractions between the trait and marker loci to be considered. Ignored if using two flanking marker loci.
       Col. 41-48  IECHO: data echoing indicator. 0 if data will not be echoed in the output file; 1 if data will be echoed in the output file.
       Col. 49-56  INDINF: identify key individuals by heterozygosity/homozygosity status. 0 if no; 1 if yes.
       Col. 57-64  LNKOPT: linkage heterogeneity option indicator. 0 if genetic homogeneity is assumed; 1 if genetic heterogeneity is allowed.
       Col. 65-72  ALPHA: probability that a pedigree is segregating the linked form of the trait (ignored if LNKOPT = 0).

    2. Recombination Fractions/Map Distance. If lod scores are to be calculated (NMLOCI = 1), the set of possible true recombination fractions between the trait and marker loci, input in fields eight columns wide (8F8.6). If 
location scores are to be calculated NMLOCI 2 the true map distance in Morgans between the two marker loci only one distance is allowed followed by the distance option variable DISOPT input in fields eight columns wide F8 6 I8 with DISOPT right justified Col 1 8 First true recombination fraction if one marker locus or the true map distance if two marker loci Col 9 16 Second true recombination fraction if one marker locus or DISOPT if two marker loci right justified DISOPT 0 says to allow for multiple locations for the disease locus between the two markers DISOPT 1 says to assume the disease locus is in the middle DISOPT 1 requires much less computation Col 17 24 Third true recombination fraction if one marker locus etc 3 Parameter values for the trait penetrance function For each possible trait genotype gender combination input four parameters per line in fields eight columns wide 4F8 4 see Outline of the Power Calculation line 3 for a male with trait genotype 11 line 4 for a male with trait genotype 12 line 5 for a male with trait genotype 22 line 6 for a female with trait genotype 11 line 7 for a female with trait genotype 12 line 8 for a female with trait genotype 22 Here alleles 1 and 2 correspond to the first and second trait alleles entered in the locus file respectively For a dichotomous trait with a piecewise linear penetrance function PENOPT 1 Col 1 8 minimum age or whatever quantitative variable is to be used Col 9 16 maximum age Col 17 24 minimum penetrance i e penetrance at the minimum age Col 25 32 maximum penetrance i e penetrance at the maximum age Note If a constant penetrance of 80 is desired independent of age a line with the values 0 60 80 80 could be entered For a dichotomous trait with a cumulative normal penetrance function PENOPT 2 Col 1 8 mean age for the penetrance function Col 9 16 standard deviation of age for the penetrance function Col 17 24 minimum penetrance assuming an age of minus infinity Col 25 32 maximum penetrance assuming an 
age of plus infinity If dealing with a quantitative trait due to a mixture of normal distributions PENOPT 3 Col 1 8 mean trait value at age zero Col 9 16 rate at which the mean trait value changes linearly with age Col 17 24 standard deviation of the trait value at age zero Col 25 32 rate at which the standard deviation of the trait value changes linearly with age 4 Male and female symbols The symbols used to identify males and females in the pedigree file e g M and F or 1 and 2 Enter the symbols in character fields eight columns wide 2A8 Col 1 8 male symbol Col 9 16 female symbol 5 Trait locus name The name given the trait locus in the locus file Enter the name in a character field eight columns wide A8 Col 1 8 trait locus name 6 Locus file name The name of the locus file in character format A 7 Pedigree file name The name of the pedigree file in character format A 8 Seeds for the random number generator These three positive integers will be used to start the random number generator used in the simulation Wichman and Hill 1982 The values should be relatively large though no larger than 32767 and should be changed from one run to the next Input the numbers right justified in fields eight columns wide 3I8 Col 1 8 First random number generator seed Col 9 16 Second random number generator seed Col 17 24 Third random number generator seed Note The control file should end with an end of file symbol B The Locus File The locus file contains information describing the genetic loci involved in the power calculation This includes one trait locus and either one or two marker loci The sample locus file below includes a trait locus and two markers and could be used for a linkage power calculation based on location scores TRAIT AUTOSOME 2 3 d 99 D 01 1 1 d d 2 2 d d D d 3 1 D d MARKER1 AUTOSOME 2 3 1 50 2 50 11 1 1 1 12 1 1 2 22 1 2 2 ABO AUTOSOME 3 4 A 26 B 06 O 68 A 2 A A A O B 2 B B B O AB 1 A B O 1 O O The trait locus has autosomal dominant inheritance with reduced 
penetrance the specific penetrance functions are described in the control file Because the D allele is relatively rare the D D genotype is assumed impossible and unaffected spouses in the pedigree file see below will be assumed not at risk phenotype 1 While these assumptions are not exactly true they are reasonably accurate and they result in a much simplified power calculation We strongly recommend the use of such assumptions whenever possible It is important to remember that this is a power calculation approximate answers should be quite satisfactory Note excluding either homozygous genotype is not appropriate for an X linked trait since hemizygous males are assumed by MENDEL to be homozygous for their allele The first marker in the locus file is a two allele codominant marker with equal allele frequencies note allele names can be characters including numbers Given no prior interest in a particular marker we generally use such a codominant marker as a compromise along the broad continuum between infinitely polymorphic magic markers at one extreme and two allele polymorphisms with one rare allele at the other extreme The second marker is the ABO locus and demonstrates how dominance relationships are dealt with when all genotypes are allowed for Inspection of this example shows that data on the loci are provided one locus at a time with the following records also see Examples and Lange et al 1988 Trait locus general information the following four variables in 2A8 2I2 format the two integer variables right justified Col 1 8 the name of the trait locus Col 9 16 the chromosomal type of the trait locus AUTOSOME if the trait locus is autosomal X LINKED if the trait locus is X linked Col 17 18 number of alleles at the trait locus must be 2 Col 19 20 number of trait phenotypes by convention this must be 3 for a dichotomous trait see below or 0 for a quantitative trait Trait allele information for each allele a record with the following two variables in A8 F8 5 format Col 
1 8 trait allele name Col 9 16 trait allele frequency Note Allele frequencies should sum to 1 0 For each trait phenotype enter record 3 below once and record 4 below once for each trait genotype that corresponds to the particular trait phenotype For dichotomous traits three trait phenotypes are possible 1 normal and not at risk of becoming affected 2 normal and at risk of becoming affected 3 affected Using the not at risk phenotype 1 when possible for example for spouses who marry into the pedigree for a relatively rare trait can result in substantial computational savings since it will usually correspond to fewer possible trait genotypes than the at risk phenotype 2 For quantitative traits by convention zero trait phenotypes are possible Note The dichotomous trait phenotypes must be 1 2 or 3 in that order and the trailing decimal points are required Trait phenotype information dichotomous traits only the following two variables in a record in A8 I2 format the integer variable right justified Col 1 8 trait phenotype name 1 2 or 3 in that order Col 9 10 number of trait genotypes associated with this trait phenotype Trait phenotype genotype correspondence dichotomous traits following each trait phenotype record list the trait genotypes corresponding to that phenotype one record per genotype each genotype in A17 format Each genotype is denoted by its two allele names separated by a slash The slash character should not be part of an allele name Note For an X linked trait no special symbols are required for males If a listed phenotype is appropriate for both females and males only the associated homozygous genotypes will be assigned to a male with the phenotype Internally the program identifies hemizygous genotypes with the corresponding homozygous genotypes Data on the marker loci are provided one locus at a time with the following records 5 8 required for each marker locus Marker locus general information the following four variables in 2A8 2I2 format the two integer 
variables right justified Col 1 8 the marker locus name Col 9 16 the chromosomal type of the marker locus AUTOSOME if the marker locus is autosomal X LINKED if the marker locus is X linked Col 17 18 number of alleles at the marker locus Col 19 20 number of phenotypes at the marker locus Note Lod location score calculation time can increase rapidly as a function of the number of marker alleles Given more alleles attendant array sizes may also become too large particularly on microcomputers Marker allele information for each allele a record with the following two variables in A8 F8 5 format Col 1 8 marker allele name Col 9 16 marker allele frequency Note Allele frequencies should sum to 1 0 For each phenotype for the current marker enter record 7 below once and record 8 below once for each marker genotype that corresponds to the particular marker phenotype Marker phenotype information the following two variables in a record in A8 I2 format the integer variable right justified Col 1 8 marker phenotype name Col 9 10 number of marker genotypes associated with this marker phenotype Marker phenotype genotype correspondence following each marker phenotype record list the marker genotypes associated with the marker phenotype in one record per marker genotype each genotype in A17 format Each marker genotype is denoted by its two allele names separated by a slash The slash character should not be part of an allele name Note For an X linked trait no special symbols are required for males If a listed phenotype is appropriate for both females and males only the associated homozygous genotypes will be assigned to a male with the phenotype Internally the program identifies hemizygous genotypes with the corresponding homozygous genotypes End of file symbol The locus file must end with one and only one end of file symbol THIS IS CRITICAL On some computers and with some word processors an end of file symbol is added automatically and the symbol is invisible On other computers there 
is a visible or partially visible symbol All FORTRAN 77 compilers have an ENDFILE command if it is necessary to produce the end of file symbol C The Pedigree File The pedigree file contains information describing the pedigrees identified for use in the power calculation The sample pedigree file below includes two pedigrees of ten and six individuals respectively I3 1X A8 3 A3 1X 2A1 A2 T15 A2 A3 A4 10 FAMILY1 1 M 3 1 80 2 F 1 1 70 3 1 2 F 3 1 80 4 1 2 M 1 1 80 5 8 9 F 3 1 80 6 4 5 M 1 1 80 7 4 5 M 1 1 85 8 M 3 1 80 9 F 1 1 75 10 8 9 F 3 1 50 6 FAMILY2 1 5 6 M 3 1 80 2 F 1 1 70 3 1 2 F 3 1 80 4 1 2 M 3 1 80 5 M 3 1 80 6 F 1 1 80 In the pedigree file two format statements are followed by information on each pedigree one pedigree at a time Pedigree information includes a pedigree description record followed by a record for each pedigree member The following records in the given order and with variables and formats as described below are required in the pedigree file see Examples and Lange et al 1988 Pedigree record format statement This FORTRAN format statement is used to read the pedigree description records It should consist of an integer format for reading the number of individuals in a pedigree and a character format maximum of eight characters for reading the pedigree ID For example I3 1X A8 Individual record format statement This FORTRAN format statement is used to read the individual records Each individual record consists of an ID parents IDs gender MZ twin status trait phenotype for the first time in character format corresponding exactly to what appears in the locus file for a dichotomous trait or a blank field if this is for a quantitative trait trait phenotype again present for both dichotomous and quantitative traits the observable phenotype indicator and penetrance variable such as age In order to read a dichotomous trait phenotype a second time a tab T can be used to reread the previous field two different fields must be read for quantitative trait data 
see below All items or fields on an individual record should be read in character format A and each should consist of eight characters or less This includes the quantitative variables trait phenotype observable phenotype indicator and penetrance variable for which decimal points are mandatory For example 3 A3 1X 2A1 A2 T15 A2 A3 A4 Pedigree information This record is present once for each pedigree Enter the following two variables in the format specified in record 1 Field 1 the number of individuals in the pedigree right justified Field 2 the pedigree ID optional Individual data This record is present once for each pedigree member For each pedigree member input the following variables in the format specified in record 2 Field 1 Individual s ID Field 2 ID of one of his her parents blank if the parent is not in the pedigree Field 3 ID of the other parent blank if the parent is not in pedigree Field 4 Individual s gender using symbols specified in the control file for example M or F 1 or 2 Field 5 MZ twin status must be left blank since SIMLINK does not allow for MZ twins Field 6 Individual s trait phenotype see note below for quantitative traits Field 7 Individual s trait phenotype again Field 8 Indicator of the availability of the individual s phenotypes if a linkage study is carried out 0 if marker phenotypes should not be simulated and the trait phenotype should be left as specified in the pedigree file 1 if marker phenotypes should be simulated and a trait phenotype should be simulated if not listed in the pedigree file 2 if marker phenotypes should be simulated and the trait phenotype should be left as specified in the pedigree file 3 if marker phenotypes should not be simulated and the trait phenotype should be simulated if not listed in the pedigree file Note These last two options were not available in earlier versions of SIMLINK Field 9 penetrance function variable for example age Note 1 Individual IDs must be unique within pedigrees Note 2 Either both 
parents or neither parent of a person must be listed in a pedigree Note 3 Missing values for any field must be represented by blanks Note 4 For a dichotomous trait the trait phenotype is read twice for each individual This can be done either by having two identical input fields and reading them both or having a single input field and reading it twice using a tab T in the format statement For a quantitative trait there must be two separate trait phenotype fields The first trait phenotype field must be left blank and the second trait phenotype field must contain the quantitative trait phenotype This approach to input makes it possible to use the same program for both dichotomous and quantitative traits Our apologies for any confusion it may cause End of file symbol The pedigree file must end with one and only one end of file symbol THIS IS CRITICAL On some computers and with some word processors this is done automatically and the symbol is invisible On other computers there is a visible or partially visible symbol All FORTRAN 77 compilers have an ENDFILE command if it is necessary to produce the end of file symbol 7 Compiling and Running SIMLINK There are several approaches to compiling and running SIMLINK If you are using an IBM or compatible microcomputer running DOS and if the default array dimensions with the shipped version see below happen to be appropriate for your problem you can skip compiling and just do the following Copy SIMLINK EXE from the diskette to your hard disk Create the control file the locus file and the pedigree file as described above Return to DOS and type SIMLINK CR where CR represents a carriage return SIMLINK will say hello and ask you for the control input file name see above and the output file name the name of the file to contain the results of the power calculation SIMLINK will then simulate and compute for awhile The SIMLINK EXE included on the diskette was compiled and linked using MICROSOFT FORTRAN version 5 00 with the commands FL 
c FOR FL c Gt0 SIMLINK FOR FL FeSIMLINK F C00 OBJ These commands may be found in the file MS BAT The advantages of MICROSOFT FORTRAN are that it is widely used generates fast executable code and with academic discount is inexpensive However for DOS based microcomputers I much prefer the Lahey F77L EM 32 FORTRAN This compiler generates executable code that is nearly as fast as that generated by MICROSOFT FORTRAN In addition F77L EM 32 breaks the 640K barrier imposed by DOS This makes it possible to carry out essentially any linkage power calculation on a microcomputer given sufficient time Using MICROSOFT FORTRAN many linkage power calculations simply cannot be carried out due to lack of array space Recompiling with MICROSOFT FORTRAN or with some other FORTRAN compiler will require compiling each of the FOR files on the floppy and linking them together with SIMLINK FOR as the main program The three commands listed above will accomplish this for MICROSOFT FORTRAN For F77L EM 32 the corresponding commands are F77L3 SIMLINK B I L F77L3 SIM1 B I L F77L3 SIM2 B I L F77L3 MEN1 B I L F77L3 MEN2 B I L UP L32 SIMLINK SIM1 SIM2 MEN1 MEN2 These commands may be found in the file F77L BAT SIMLINK has also been successfully compiled and run on SUN workstations and DEC VAX minicomputers To run SIMLINK on a VAX edit SIMLINK and change all occurrences of WRITE 0 to WRITE If you are running VMS use the G FLOATING option rather than the F FLOATING option when you compile SIMLINK For Examples 1 4 described below SIMLINK required about 3 45 17 50 17 40 and 12 00 minutes seconds elapsed time on my EVEREX 386 33MHz IBM compatible when compiled using MICROSOFT FORTRAN as above and about 4 15 20 30 20 30 and 12 30 minutes seconds elapsed time when compiled using F77L EM 32 as above After running SIMLINK two scratch files will exist SIMDOC SCR and SIMERR SCR SIMDOC SCR contains each pedigree member s ID assigned by the user and the corresponding ID used by MENDEL SIMERR SCR contains the 
MENDEL batch file and error messages produced by MENDEL These error messages will also be output to the screen In addition some error messages will be printed to the output file The ID numbers given in these error messages correspond to those being used by MENDEL To determine which individual s MENDEL is referring to check the file SIMDOC SCR After a successful SIMLINK run these two scratch files can be deleted 8 Output from SIMLINK The output from SIMLINK takes the form of up to seven tables depending on the analyses carried out Maximum lod location scores for each replicate of each pedigree are estimated by quadratic interpolation over the lod location score values calculated at the test recombination fractions map distances Table 1 Summary of Information Used in the Simulation Table 1 summarizes the information used in the simulation This includes the trait locus name the number of pedigree replicates simulated true recombination fractions map distances and the test recombination fractions map distances used Tables 2 and 3 give estimates of the mean maximum lod location score and the probabilities of maximum lod location scores greater than specified constants for each of the true recombination fractions map distances These estimates are given for each pedigree separately listed under 1 2 and so forth for the pedigrees combined assuming genetic homogeneity under SUMMED for the pedigrees combined allowing for between pedigree heterogeneity under SUMMEDH optional and for any one pedigree over all the available pedigrees under ANY The values for a specific pedigree give estimates of the expected information provided by that pedigree The values for the summed pedigrees estimate the expected information provided by pooling the data Pooling the data in this way assumes that the trait is caused by a single genetic locus that is there is no heterogeneity The values for the summed pedigrees allowing for heterogeneity estimates the expected information provided by pooling 
the data while explicitly allowing for heterogeneity The values under ANY correspond to the information provided when an analysis is carried out under the assumption of genetic heterogeneity and information from different pedigrees is not pooled but the trait is actually homogeneous Table 2 Estimated Mean Maximum Lod Location Score for a Marker Pair This table lists the estimated mean maximum lod location score its standard error and the maximum maximum lod location score among all replicates for each pedigree for the summed pedigrees assuming homogeneity for the summed pedigrees allowing for between pedigree heterogeneity optional and for any of the pedigrees These estimates are reported for each of the true recombination fractions map distances Note Since the maximum of the sum is usually less than the sum of the maxima the expected maximum summed lod location score for all pedigrees combined will usually be less than the sum of the expected maximum lod location scores for the individual pedigrees Table 3 Estimated Probabilities of Maximum Lod Location Scores Greater than Specified Constants for a Linked Marker Pair This table lists the estimates and standard errors of probabilities of maximum lod location scores greater than 0 5 1 0 1 5 2 0 2 5 and 3 0 for each pedigree for the summed pedigrees assuming homogeneity for the summed pedigrees allowing for heterogeneity optional and for any of the pedigrees These values are reported for each of the true recombination fractions map distances For linked loci estimates of the probabilities of maximum lod location scores greater than 3 0 give estimates of the power of a proposed linkage study based on the corresponding data and the assumption of a linked marker or a pair of flanking markers at the given recombination fraction map distance For unlinked loci these same estimates give estimates of the probability of incorrectly inferring linkage to an unlinked marker or pair of markers In statistical terms this estimates 
the probability a of making a type I error for a single analysis. Since many markers will often be considered, the overall probability of making a type I error is greater. Assuming that the linkage calculations for the different marker pairs are independent, the overall probability of making a type I error becomes 1 - (1 - a)^n, where n is the number of marker pairs and ^ represents exponentiation.

Table 4: Estimated Probabilities of Maximum Location Scores Greater Than Specified Constants, Averaged Over the Interval Between the Two Marker Loci. This table lists estimates of the average probability, when the trait locus is located somewhere between the two marker loci, of a maximum location score greater than constants 0.5, 1.0, 1.5, 2.0, 2.5, and 3.0, for each pedigree, for the summed pedigrees assuming homogeneity, for the summed pedigrees allowing for heterogeneity, and for any of the pedigrees. Table 4 is omitted when simulating only one marker locus, or if only a single location for the disease locus was chosen in the control file (see above). See Boehnke (1986) for a method using two-point lod scores to calculate a lower bound on the information provided by flanking markers and location scores.

Tables 5 and 6 provide estimates of the expected lod/location score and the probability of a lod/location score greater than specified constants when the marker pair is unlinked. These tables differ from Tables 2 and 3 by reporting values for each test recombination fraction/map distance, rather than maximizing over all test recombination fractions/map distances. Tables 5 and 6 can be used to estimate the distance to each side of an unlinked marker pair that is likely to be excluded using the available pedigrees. Tables 5 and 6 are included only if free recombination is simulated, that is, IFREE = 1.

Table 5: Estimated Mean Lod/Location Score for an Unlinked Marker Pair. For each test recombination fraction/map distance, this table gives the estimate of the mean lod/location score, its standard error, and the sample 
maximum and minimum lod (location) scores for each pedigree and for the summed pedigrees assuming homogeneity. In addition, an estimate of the test recombination fraction (map distance) at which the mean lod (location) score equals -2.0 is printed. This estimate is based on quadratic interpolation of the lod (location) score. This recombination fraction (map distance) gives an estimate of the expected exclusion distance when testing for linkage to an unlinked marker (pair). If interpolation is not possible, asterisks are printed.
Table 6. Estimated Probabilities of Lod (Location) Scores Greater than Specified Constants for an Unlinked Marker (Pair). For each test recombination fraction (map distance), estimates and standard errors for the probabilities of lod (location) scores greater than -2.0, -1.5, -1.0, 2.5, and 3.0 are given. For each test recombination fraction (map distance), one minus the probability of a lod (location) score greater than -2.0 gives an estimate of the probability that linkage will be excluded for at least that distance from an unlinked marker (pair).
9. Four Sample Problems. Input files for these examples are EXAMPLE*.CON, EXAMPLE*.LOC, and EXAMPLE*.PED; output files are EXAMPLE*.OUT (* = 1, 2, 3, 4). These files are all included on the diskette. Before using SIMLINK for your own data, we strongly recommend running the test problems to verify that you are obtaining the same results. The example input files should be helpful when you go to prepare input files for your own analyses.
Example 1: Eight Pedigrees, Autosomal Dominant Trait with Piecewise Linear Penetrance Function. Each of the eight pedigrees in this example is identical to that described by Ploughman and Boehnke (1989). Eight copies are used to achieve a moderate-sized power estimate for demonstration purposes. Pedigrees 1 through 8 are segregating an autosomal dominant trait with complete penetrance by age 40. Three pedigree members (numbered 4, 6, and 7 in each of the pedigrees) are unaffected, at risk, and below the age of 40. The penetrance for these pedigree
members is described by a piecewise linear function PENOPT 1 which increases from 0 at age 0 to 1 0 at age 40 for trait genotypes DD and Dd and is 0 at all ages for trait genotype dd The remaining pedigree members are either affected or unaffected and assumed not to be at risk The ages listed for these pedigree members are not needed by the penetrance function and hence need not be correct see pedigree file Only 20 replicates are simulated in this example so that it can be used to quickly check that the program is producing the same results as are given in EXAMPLE1 OUT Control file EXAMPLE1 CON Column numbers are provided for easy reference they are not part of the input file 1 2 3 4 5 6 1234567890123456789012345678901234567890123456789012345678901234 20 1 1 1 4 1 0 0 0 00 0 10 0 20 0 50 2 True rec frac 0 0 40 0 0 0 1 0 3 for males DD 0 0 40 0 0 0 1 0 for males Dd 0 0 40 0 0 0 0 0 for males dd 0 0 40 0 0 0 1 0 for females DD 0 0 40 0 0 0 1 0 for females Dd 0 0 40 0 0 0 0 0 for females dd M F 4 male and female symbols AUTODOM 5 trait locus name EXAMPLE1 LOC 6 locus file name EXAMPLE1 PED 7 pedigree file name 3791 3271 313 8 seeds for random number generator The control line states that 20 replicates will be simulated for each pedigree NREP 20 1 marker locus will be simulated NMLOCI 1 the penetrance function is piecewise linear PENOPT 1 free recombination will be simulated IFREE 1 4 true recombination fractions will be considered NTHETA 4 echo the data IECHO 1 do not examine the effects of individual heterozygosity homozygosity status INDINF 0 and assume the trait is homogeneous LNKOPT 0 Since LNKOPT 0 SIMLINK assumes the linked fraction alpha is 1 Linked marker phenotypes will be simulated at the following true recombination fractions between the trait and marker loci 0 00 0 10 0 20 and 0 50 The minimum age maximum age minimum penetrance and maximum penetrance for the piecewise linear penetrance function for each possible trait genotype gender combination The male 
and female symbols used in the pedigree file are M and F The trait locus name is AUTODOM in the locus file The locus file name is EXAMPLE1 LOC chosen to make clear the contents of the file The pedigree file name is EXAMPLE1 PED chosen to make clear the contents of the file These three values are chosen as seeds for the random number generator If the same values are used in a later run the same results will be obtained If they are changed the results will change too Locus file EXAMPLE1 LOC Column numbers are provided for easy reference they are not part of the input file 1 2 12345678901234567890123456789 Comments AUTODOM AUTOSOME 2 3 1 Trait locus information D 01 2 Trait allele information d 99 1 1 3 Trait phenotype information d d 4 Pheno geno correspondence 2 2 3 Trait phenotype information D d 4 Pheno geno correspondence d d 4 Pheno geno correspondence 3 1 3 Trait phenotype information D d 4 Pheno geno correspondence MARKER1 AUTOSOME 2 3 5 Marker locus information A 50 6 Marker allele information B 50 AA 1 7 Marker phenotype information A A 8 Pheno geno correspondence AB 1 7 Marker phenotype information A B 8 Pheno geno correspondence BB 1 7 Marker phenotype information B B 8 Pheno geno correspondence The trait locus name is AUTODOM it is autosomal has 2 alleles and 3 phenotypes The 2 trait alleles are the dominant disease susceptibility allele D with allele frequency 0 01 and the recessive allele d with allele frequency 0 99 3 4 There are 3 trait phenotypes phenotype 1 has 1 associated genotype d d phenotype 2 has 2 associated genotypes D d and d d and phenotype 3 has 1 associated genotype D d Because it is so rare genotype D D has been omitted from this analysis reducing the amount of computation time substantially We strongly recommend this approach whenever feasible Note Homozygous genotypes should not be eliminated if the trait locus is X linked 5 The marker locus name is MARKER1 it is autosomal has 2 alleles and 3 phenotypes 2 The 2 marker alleles are A 
and B each with allele frequency 0 50 3 4 There are 3 marker phenotypes phenotype AA has 1 associated genotype A A phenotype AB has 1 associated genotype A B and phenotype BB has 1 associated genotype B B so that the marker is codominant Pedigree file EXAMPLE1 PED Column numbers are provided for easy reference they are not part of the input file 1 2 12345678901234567890123456789 Comments I3 1X A8 1 Pedigree record format 3 A3 1X 2A1 A2 T15 A2 A3 A4 2 Individual record format 10 FAMILY 1 3 Pedigree information 1 M 3 1 80 4 Individual data 2 F 1 1 70 3 1 2 F 3 1 80 4 1 2 M 2 1 30 5 8 9 F 3 1 80 6 4 5 M 2 1 10 7 4 5 M 2 1 5 8 M 3 1 80 9 F 1 1 75 10 8 9 F 1 1 50 10 FAMILY 2 3 Pedigree information 1 M 3 1 80 4 Individual data 2 F 1 1 70 3 1 2 F 3 1 80 4 1 2 M 2 1 30 5 8 9 F 3 1 80 6 4 5 M 2 1 10 7 4 5 M 2 1 5 8 M 3 1 80 9 F 1 1 75 10 8 9 F 1 1 50 10 FAMILY 8 3 Pedigree information 1 M 3 1 80 4 Individual data 2 F 1 1 70 3 1 2 F 3 1 80 4 1 2 M 2 1 30 5 8 9 F 3 1 80 6 4 5 M 2 1 10 7 4 5 M 2 1 5 8 M 3 1 80 9 F 1 1 75 10 8 9 F 1 1 50 Each pedigree record consisting of the number of individuals in a pedigree and the pedigree ID optional
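
Two of the calculations described in this excerpt are small enough to sketch directly. The Python below is a minimal illustration, not SIMLINK code: the function names are hypothetical, and clamping the penetrance outside the [minimum age, maximum age] range is an assumption. It uses the Example 1 settings (minimum age 0, maximum age 40, minimum penetrance 0, maximum penetrance 1.0 for genotypes DD and Dd) and the ages 30, 10, and 5 listed in the pedigree file for the at-risk members 4, 6, and 7.

```python
def piecewise_linear_penetrance(age, min_age, max_age, min_pen, max_pen):
    """Penetrance rising linearly from min_pen at min_age to max_pen at max_age.

    Assumption: ages outside [min_age, max_age] are clamped to the end values.
    """
    if age <= min_age:
        return min_pen
    if age >= max_age:
        return max_pen
    frac = (age - min_age) / (max_age - min_age)
    return min_pen + frac * (max_pen - min_pen)


def overall_type1_error(a, n):
    """Overall type I error over n independent marker pairs: 1 - (1 - a)^n."""
    return 1.0 - (1.0 - a) ** n


if __name__ == "__main__":
    # At-risk members 4, 6 and 7 of each example pedigree (ages 30, 10, 5).
    for age in (30, 10, 5):
        print(age, piecewise_linear_penetrance(age, 0.0, 40.0, 0.0, 1.0))
    # Per-analysis error rate a = 0.001 over 100 independent marker pairs.
    print(overall_type1_error(0.001, 100))
```

With these settings the three at-risk members have penetrances of 0.75, 0.25, and 0.125, and the second function shows how quickly a small per-analysis error rate grows as the number of marker pairs increases.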

    Original URL path: http://linkage.rockefeller.edu/soft/simlink.html (2012-11-26)

  • User's Guide to the TDTae program 2
    are DSB, GLHO, SPL, MA. If no loci are specified, all will be analyzed. We explain each of these options in the order of their appearance above.
    -a: The default setting for the minimum minor allele frequency of any allele being tested is 10. That means, for a SNP with two alleles (coded 1 and 2), unless each allele has a frequency of at least 10, TDTae will not analyze that marker. For multi-allelic loci (with coded alleles 1, 2, 3, etc.), unless allele i and "not i" (i.e., all other alleles) each have a frequency of at least 10, TDTae will not analyze that allele. This option, which requires a positive integer to follow it, allows the user to change the minimum frequency. For example, typing -a 20 changes the minimum allele frequency requirement to 20. Our experience with this software is that the minimum allele frequency should be at least 10. The maximization method does not perform well when the minor allele frequency is very small (see also Section 4.2).
    -g: This option groups together all alleles whose number of appearances is below the minimum count (default 30; also see the -a command above) into one allele.
    -s: This option allows for calculation of support intervals (Edwards 1992) for each of the maximum likelihood parameters under H0 and H1. The default setting is 2; that is, the endpoints of the 2-unit support interval (i.e., 100:1 odds) of the MLEs of each parameter are provided. The default setting can be changed by using the -b option (see directly below).
    -b: This option enables the user to specify the length of the support interval calculated with the -s option (above). This option must be followed by a positive integer. For example, typing -b 3 produces a 3-unit support interval instead of the default 2-unit interval.
    -n: When performing maximization under H0 or H1, a two-stage procedure is employed (see Section 4.1 below on maximization). Once the grid search (1st stage) is finished, parameter values corresponding to the largest n log-likelihoods are used as starting points for the Powell maximization method (Acton 1970; Brent 1973; Jacobs 1977). This option allows the user to specify the number n of largest log-likelihoods that will be followed up (default is 5). When using this option, the user must specify a positive integer indicating the number of largest log-likelihoods that will be followed up.
    -po: This option instructs the program that the format of the pedigree file is post-MAKEPED format. If this option is not used, then the program will assume that the format is pre-MAKEPED.
    -o: This option enables the user to specify the name of the output file. It is followed by the user-specified name of the file. If this option is not used, the results will be written to the screen only.
    -f: Invoking this option enables the user to specify a file of marker names that will be used when reporting results. If no such file of marker names is provided, then the output file will label each of the markers Locus 1, Locus 2, etc.
    -t: With this option, the user can view which individuals were trimmed by the TDTae program to decrease the computational load. Also see the -x option (next).
    -x: In its present formulation, TDTae's computational time to produce results increases with the number of individuals in a pedigree. This option enables the user to trim the number of founders from the pedigree so that the size of the pedigree that is analyzed is reduced. This option must be followed by a positive integer. The default maximum number of founders in a pedigree is 9.
    -v: This option allows the user to view the progress of the maximizations for each marker locus and allele. It also provides an estimate of the time until completion for each TDTae analysis with a given allele.
    -e: With this option, the user can test whether recoded genotypes on founders are in Hardy-Weinberg proportions. This option uses the method employed in the HWE program (http://linkage.rockefeller.edu/ott/linkutil.htm#HWE). This option requires that the user specify an output file for the results.
    The next three options are related. The default analysis for TDTae involves maximization over the
parameters r1 and r2 under the alternative, with no constraints placed on the relationship of these parameters. As such, the resulting test statistic has 2 degrees of freedom (df). The following options allow the user to perform a 1 df test, subject to constraints on the parameters r1 and r2. These tests might be used if the user has some prior knowledge of the mode of inheritance of the trait being studied and wishes to potentially increase power by reducing the degrees of freedom. The following constraints are invoked when using the three options:
    -d: dominant mode of inheritance
    -r: recessive mode of inheritance
    -m: multiplicative mode of inheritance
    It is interesting to note that using the -m option is equivalent to performing a TDT analysis with the original TDT statistic (Weinberg 1999).
    -c: This option allows the user to specify the number of cuts (c) that are used in the first stage of the search procedure (see Section 4.1 below). This option must be followed by a positive integer greater than 1. The default number of cuts (c) used is 5.
    2.2 Example files with this distribution. Example pedigree files, marker files, and output files are provided with the distribution of this software. The pedigree files are pedsim_err.pre (simulated data), psor17.pre (real data from a study of psoriasis pedigrees on chromosome 17; Helms et al. 2003), and sito.pre (real data from a study of sitosterolemia pedigrees on chromosome 2; Lee et al. 2001). The corresponding marker files are markers_pedsim.txt and markers_sito.txt (there is no marker file for the psoriasis data).
    2.3 Error models. To run the TDTae program with your data, you will need, as mentioned above, a pedigree file in either pre- or post-MAKEPED format. You must also specify the particular error model that you will use when performing the TDTae analyses. The choices are GLHO (Gordon, Liu, Heath, Ott; Gordon et al. 2001), DSB (Douglas, Skol, Boehnke; Douglas et al. 2002), SPL (Sobel, Papp, Lange; Sobel et al. 2002), and MA (Mote, Anderson; Mote and Anderson 1965). A brief description of each error model is
provided here. Notationally, we assume that all markers have two (possibly down-coded) alleles, labeled 1 and 2. The parameter list for each error model is provided below this description. Also see our website, http://linkage.rockefeller.edu/pawe.
    The GLHO model introduces errors into alleles, as opposed to genotypes. It is described by 2 parameters. The DSB model introduces errors into genotypes and is the only model for which it is not possible for a homozygous 11 genotype to be incorrectly recoded as a homozygous 22 genotype, or vice versa. It is described by 2 parameters. The SPL model is for di-allelic loci and is described by 3 parameters. It is the most general error model possible for di-allelic loci under the constraint that errors are independent of the particular allele. The MA model, which is the most general error model possible in the sense that it can describe all other error models, is described by 6 parameters. The GLHO, SPL, and MA error models all allow for errors in which one homozygote is incorrectly coded as another homozygote.
    Gordon-Heath-Liu-Ott (GLHO) error model parameters. The parameter settings for this error model are:
    E1 = Pr(1 allele incorrectly coded as 2 allele)
    E2 = Pr(2 allele incorrectly coded as 1 allele)
    Both entries must be positive real numbers less than 1.0.
    Douglas-Skol-Boehnke (DSB) error model. The parameter settings for this error model are:
    Gamma = Pr(homozygous 11 or 22 genotype incorrectly coded as heterozygote 12)
    Eta = Pr(heterozygote 12 genotype incorrectly coded as homozygote 11 or 22)
    Both entries must be positive real numbers less than 1.0. Note: for the Eta parameter, it is assumed that the 12 genotype has an equal probability (0.5) of being incorrectly coded as 11 or 22. Also, the notation used here comes from the Gordon et al. (2002) reference.
    Sobel-Papp-Lange (SPL) error model. The parameter settings for this error model are:
    V1 = Pr(true homozygote incorrectly coded as heterozygote)
    V2 = Pr(one homozygote incorrectly coded as another homozygote)
    V3 = Pr(true heterozygote incorrectly coded as a homozygote)
    Note: This parameterization of the SPL error model is an improvement over the parameterization previously used (Gordon et al. 2002) in that it only requires three parameter settings. The author gratefully acknowledges S. Seaman and P. Holmans for the improvement. All entries must be positive real numbers less than 1.0, subject to the following constraints: V1 + V2 <= 1.0 and V3 <= 0.5.
    Mote and Anderson (MA) error model. The parameter settings for this error model are:
    e21 = Pr(12 genotype observed | 11 true)
    e31 = Pr(22 genotype observed | 11 true)
    e12 = Pr(11 genotype observed | 12 true)
    e32 = Pr(22 genotype observed | 12 true)
    e13 = Pr(11 genotype observed | 22 true)
    e23 = Pr(12 genotype observed | 22 true)
    The following constraints are needed for the MA error model. The MA error model is the most robust error model in that it completely characterizes all other error models, given certain constraints. Therefore, it is the best error model to use. However, it comes with a computational price: it requires three more parameters to be maximized than the SPL model and four more than the GLHO and DSB error models.
    3.0 INTERPRETING RESULTS FROM TDTae OUTPUT. A critical ingredient in running the TDTae analysis is interpretation of the outcome. The program produces MLEs of the parameters, values for the TDTae statistic, and p-values both uncorrected and corrected for multiple testing. Headings for each of the parameters are as follows:
    r1: MLE of the genotypic relative risk under alternative (H1) and null (H0) hypotheses.
    r2: MLE of the genotypic relative risk under alternative (H1) and null (H0) hypotheses.
    p11: MLE of the genotype frequency under alternative (H1) and null (H0) hypotheses (note that the allele being tested is considered the 2 allele for estimation purposes).
    p12: MLE of the genotype frequency under alternative (H1) and null (H0) hypotheses (note that the allele being tested is considered the 2 allele for estimation purposes).
    LogLike: Maximum log-likelihood estimates of the data under alternative (H1) and null (H0)
hypotheses, using the two-stage search procedure (for purposes of programming, -LogLike is minimized rather than LogLike being maximized).
    LRT: The TDTae statistic; this quantity is given by the formula 2[LogLike(H1) - LogLike(H0)].
    P: p-value (uncorrected for multiple testing) corresponding to the LRT statistic for the allele being tested.
    Corrected: p-value corrected for multiple testing. The correction is done as follows: if k alleles at a marker locus are tested, and p is the uncorrected p-value corresponding to a particular allele, then the corrected p-value is given by 1 - (1 - p)^(k - 1). See our paper (Gordon et al. 2004) for more details. Note that for SNPs, no correction for multiple testing is performed.
    Also provided are MLEs for all error model parameters. See Section 2.3.1 above for the list of different error model parameters.
    3.1 Example runs. We present here the results of some example runs. The first example uses the simulated data provided in this distribution (pedsim_err.txt and markers_pedsim.txt). We comment that the results file below was created by typing:
    tdtae -f marker_pedsim.txt -n 20 -o pedsim_tdtae.out pedsim_err.pre GLHO
    Note that we chose the error model of Gordon et al. (Gordon et al. 2001) for this analysis because we simulated the data according to that error model.
    Results from program TDTAE Version 2.01 (using NR library)
    Written By: Chad Haynes and Derek Gordon
    Please email tdtae@linkage.rockefeller.edu with any bugs or problems
    Locus 1: SNP1
    Allele 1: 875 occurrences (29.2%)
    MLE  r1        r2        p11       p12       E1        E2        LogLike      LRT        P         Corrected
    H1   1.048007  1.508604  0.583394  0.378448  0.079448  0.000010  1287.559815  1.568215   0.456527  0.456527
    H0   1.000000  1.000000  0.575082  0.385035  0.075341  0.000010  1288.343922
    Locus 2: SNP2
    Allele 1: 770 occurrences (25.7%)
    MLE  r1        r2        p11       p12       E1        E2        LogLike      LRT        P         Corrected
    H1   0.860711  0.338686  0.599816  0.358233  0.055971  0.000010  1214.811515  5.065659   0.079434  0.079434
    H0   1.000000  1.000000  0.448872  0.466392  0.000010  0.180059  1217.344344
    Locus 3: SNP3
    Allele 1: 883 occurrences (29.6%)
    MLE  r1        r2        p11       p12       E1        E2        LogLike      LRT        P         Corrected
    H1   0.942563  1.324528  0.580013  0.345977  0.091583  0.084099  1330.769422  0.683097   0.710669  0.710669
    H0   1.000000  1.000000  0.608122  0.328371  0.101303  0.047451  1331.110970
    Locus 4: SNP4
    Allele 1: 662 occurrences (22.4%)
    MLE  r1        r2        p11       p12       E1        E2        LogLike      LRT        P         Corrected
    H1   0.775813  0.123752  0.559178  0.362844  0.032243  0.186968  1159.912489  15.579787  0.000414  0.000414
    H0   1.000000  1.000000  0.716827  0.251595  0.085297  0.014342  1167.702383
    These data were simulated so that the first three markers (SNPs 1-3) are null, and the last SNP is in both linkage and linkage disequilibrium with a trait locus. Also, approximately 25% of the parents in this file were not genotyped, and genotyping error was simulated according to the GLHO model, with each error parameter being set to 0.10. Because all data were simulated independently, a Bonferroni correction is appropriate. Thus we see that, even in the presence of missing parental data and genotyping errors, the TDTae method provides accurate information, in that it indicates that the trait locus is located near SNP 4. The TDTae statistic is not significant at the 5% level for any other marker after the Bonferroni correction. We also note that, despite the relatively large sample size (500 trios), error parameter estimation is not consistent from marker to marker. Thus, error parameter estimation should be used with caution when considering data from trios. Note that for two of the loci (2 and 4), the MLEs for the genotypic relative risk values r1 and r2 for allele 1 are both less than 1. These values can be converted to genotypic relative risks for the non-1 allele using the formulas where the prime superscript indicates genotypic relative risk for the non-1 allele.
    In the next example, we present results for selected markers from the Sitosterolemia data (Lee et al. 2001) provided in this distribution. The pedigree file is sito_ped.txt and the marker file is markers_sito.txt. We choose the DSB model for our error model, although, as the output file indicates, there are no observed
genotyping errors in this data set. Also, because we know that the disease is inherited in a recessive fashion, we chose the -r option when running our analyses (Section 2.1.1, Command Line Options). The advantage of using this option is that there is only one degree of freedom for the corresponding TDTae LRT statistic. Also, we chose the -a 20 option to allow testing for alleles whose minimal number of occurrences is 20. We comment that the results file below was created by typing:
    tdtae -a 20 -f markers_sito.txt -n 20 -v -r -c 10 -o sito_tdtae.out sito_ped.txt DSB 13 15 20
    Results from program TDTAE Version 2.01 (using NR library)
    Written By: Chad Haynes and Derek Gordon
    Please email tdtae@linkage.rockefeller.edu with any bugs or problems
    Locus 13: D2S4009
    Allele 2: 39 occurrences (23.5%)
    MLE  r1        r2         p11       p12       Gamma     Eta       LogLike    LRT       P         Corrected
    H1   1.000000  8.987662   0.606503  0.305744  0.000000  0.000000  52.606985  6.274741  0.012267  0.012267
    H0   1.000000  1.000000   0.592418  0.318584  0.000000  0.000000  55.744355
    Locus 15: D2S2298
    Allele 2: 98 occurrences (57.6%)
    MLE  r1        r2         p11       p12       Gamma     Eta       LogLike    LRT       P         Corrected
    H1   1.000000  22.562454  0.245505  0.554387  0.000000  0.000000  59.997096  35.35158  0.000000  0.000000
    H0   1.000000  1.000000   0.228886  0.552130  0.000000  0.000000  77.672887
    Locus 20: D2S2174
    Allele 1: 37 occurrences (22.3%)
    MLE  r1        r2         p11       p12       Gamma     Eta       LogLike    LRT       P         Corrected
    H1   1.000000  1.479179   0.575724  0.327053  0.000000  0.000000  55.185774  0.158508  0.690551  0.904241
    H0   1.000000  1.000000   0.574890  0.326453  0.000000  0.000000  55.265027
    Allele 2: 38 occurrences (22.9%)
    MLE  r1        r2         p11       p12       Gamma     Eta       LogLike    LRT       P         Corrected
    H1   1.000000  10000.00   0.691292  0.270247  0.000000  0.000000  41.537543  21.31348  0.000004  0.000008
    H0   1.000000  1.000000   0.682654  0.278131  0.000000  0.000000  52.194284
    Allele 4: 45 occurrences (27.1%)
    MLE  r1        r2         p11       p12       Gamma     Eta       LogLike    LRT       P         Corrected
    H1   1.000000  4.452777   0.532816  0.397743  0.000000  0.000000  57.419163  4.988331  0.025541  0.050429
    H0   1.000000  1.000000   0.527557  0.397515  0.000000  0.000000  59.913329
    There are
a few interesting things to note about this output. First, the TDTae statistic is computed for several alleles at each marker. As can be noted by studying the maximum LRT value for each marker and the corresponding minimal corrected p-value, the results are highly significant. We comment that the genotype relative risk estimates for r2 are large for each marker. We comment that the location of the Sitosterolemia genes ABCG5 and ABCG8 are approximately 20
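
The relationship between the LogLike, LRT, P, and Corrected columns in the example output can be checked with a short calculation. The Python below is a sketch under stated assumptions, not TDTae code. The function names are hypothetical. The printed LogLike magnitudes are treated as negative log-likelihoods, because the listing does not preserve signs. P is taken from a chi-square reference distribution with 2 df (unconstrained) or 1 df (constrained). The multiple-testing correction 1 - (1 - p)^(k - 1) for k tested alleles is the form consistent with the Corrected values printed for Locus 20.

```python
import math


def lrt_statistic(neg_loglike_h1, neg_loglike_h0):
    """LRT = 2 * (LogLike(H1) - LogLike(H0)), given printed magnitudes.

    Assumption: the printed values are negative log-likelihoods.
    """
    return 2.0 * (neg_loglike_h0 - neg_loglike_h1)


def chi2_upper_tail(x, df):
    """Upper-tail chi-square probability using closed forms for df = 1 or 2."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only handles df = 1 or df = 2")


def corrected_p(p, k):
    """Corrected p-value when k alleles at one marker are tested."""
    return 1.0 - (1.0 - p) ** (k - 1)


if __name__ == "__main__":
    # Locus 1 (SNP1) of the simulated data: unconstrained test, 2 df.
    lrt = lrt_statistic(1287.559815, 1288.343922)
    print(round(lrt, 6), round(chi2_upper_tail(lrt, 2), 6))
    # Locus 13 of the sitosterolemia data: -r option, 1 df.
    print(round(chi2_upper_tail(6.274741, 1), 6))
    # Locus 20, allele 1: three alleles tested at this marker.
    print(round(corrected_p(0.690551, 3), 6))
```

Under these assumptions the 2 df values match the printed P column to six decimals. The 1 df values agree to about four decimals, which suggests the program evaluates the tail probability with a slightly different numerical routine.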

    Original URL path: http://linkage.rockefeller.edu/derek/UserGuideTDTae2.htm (2012-11-26)

  • 1. introduction
    and tested using vers. 2.0 and 2.3 of the VAX C compiler on a Microvax II with 5 Mb memory under the MicroVMS operating system. It has been successfully ported to a number of other computers, including other VAXes, SUN and Apollo workstations, and the Mac II. CRI-MAP requires a lot of memory; it is desirable to run it on a computer with at least 1 Mb RAM or virtual memory if you will be analyzing more than 10 loci simultaneously. It may be possible to run it on an IBM AT under DOS if your data set is small and you reduce the default memory allocations, although we have not tried this yet. A small data set with several chromosome 7 markers is provided with the program, for the purpose of testing it only. I would appreciate being informed of any difficulties in implementing the program, bugs, errors or gaps in the documentation, or suggestions for improvement. Historical note: Version 1 of CRI-MAP was originally written by me in the summer of 1986 in the language APL; the portion of that version which does maximum likelihood estimation for a fixed locus order was based on algorithms developed in collaboration with Eric Lander (Lander and Green, 1987). Collaborative's chromosome 7 map (Barker et al., 1987) was constructed using that version of CRI-MAP, running on an IBM XT. In the summer of 1987, parts of the original version were translated into C with the help of Steve Crooks and used in constructing the genome map published in Cell; Eric and his group at MIT independently constructed maps using the program MAPMAKER. At this time I also discovered the layered EM maximum likelihood search method described in Green (1988). In January 1988 I worked out a

    Original URL path: http://linkage.rockefeller.edu/soft/crimap/intro1.html (2012-11-26)

  • section 2. using cri-map with general pedigrees
    genotypes of ancestors and descendants, and compute a likelihood which is the weighted sum of the likelihood expressions for each particular choice of genotype. CRI-MAP thus ignores some of the available information. In cases that we have examined, however, the information loss appears to be small. If the missing locus genotype is in an original parent (i.e., an individual with no ancestors in the pedigree), then in a full likelihood analysis the population allele frequencies are used to assign probabilities to the various possible genotypes. These enter into the likelihood by influencing the probability that the allele in any child of the original parent is derived from that parent. CRI-MAP does not make use of allele frequencies: for any allele in a child of an original parent, CRI-MAP determines the parental origin when this can be deduced from the other genotype information, but otherwise assigns equal probability to the two possible parental origins for the allele. In fact, little information is lost by this procedure, except when the allele is rare. For a rare dominant disease, for example, if one parent is known to be affected, the other should therefore be scored as unaffected rather than as missing. It should be noted in any case that the frequencies of RFLP alleles have usually been estimated in a different population (e.g., the CEPH family parents) from the population from which the disease family is drawn. It is our experience that allele frequencies may vary dramatically between populations. Therefore, it may be inappropriate to perform a full likelihood analysis of disease linkage if the results of that study depend in an essential way on parameters estimated from another population. A somewhat more limited analysis which makes no assumptions concerning allele frequencies, such as that given by CRI-MAP

    Original URL path: http://linkage.rockefeller.edu/soft/crimap/general2.html (2012-11-26)

  • 3. getting started
    program is: crimap <chromosome number> <option>. Example: crimap 7a twopoint. The option name must be entered in lower case letters. The chromosome number, which may consist of any string of digits possibly followed by letters (for example, 7p or 17nf or 0a), may be replaced by the name of the parameter file described below, for example: crimap chr7a.par twopoint. You must provide a .gen file, named in accordance with the conventions described below and residing in the same directory as CRI-MAP, which contains the raw genotype data, and run the prepare option first in order to create the other files required by the program. All program output, apart from specific information written to one of the four files described in the next section, is displayed using the printf function in C. It will thus be displayed on the terminal (unless redirected to a file by means of commands to the operating system) if the program is run interactively, or written to a log file if the program is run in batch mode. The latter procedure is the most convenient way to make a copy of the output. In UNIX and some other operating systems, one can simply redirect the output to a file. The prepare and merge options are the only ones requiring interactive input. As a test run with the chromosome 7 data set (chr7a.gen) provided with the program, use the command crimap 7a prepare to create a .dat file and a .par file for subsequent use by the option all with the loci 2, 8, 9, 10. Specify any two of these as the ordered loci and the other two as the inserted loci. Use default values for the other parameters. NOTE: If the program stops prematurely, displaying the message Your compiler uses a different size
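
The invocation rules in this excerpt can be expressed as a small validation helper. The Python below is a hypothetical wrapper, not part of CRI-MAP. The function names are invented for illustration, and the chr<id>.<ext> naming pattern is only inferred from the chr7a.gen and chr7a.par examples in the text (the .dat name in particular is an assumption). The sketch builds the argument list without running the program.

```python
import re

# "Any string of digits possibly followed by letters", e.g. 7a, 7p, 17nf, 0a.
_CHROMOSOME_ID = re.compile(r"\d+[A-Za-z]*")


def crimap_command(chromosome, option):
    """Return the argument list for: crimap <chromosome number> <option>."""
    if not _CHROMOSOME_ID.fullmatch(chromosome):
        raise ValueError("chromosome must be digits optionally followed by letters")
    if not option or option != option.lower():
        raise ValueError("the option name must be entered in lower case")
    return ["crimap", chromosome, option]


def data_file_names(chromosome):
    """Guess the data file names for a chromosome id (assumed naming pattern)."""
    return {ext: "chr{}.{}".format(chromosome, ext) for ext in ("gen", "dat", "par")}


if __name__ == "__main__":
    print(" ".join(crimap_command("7a", "prepare")))
    print(data_file_names("7a"))
```

Keeping the command as a list rather than a single string makes it straightforward to hand to a process launcher later without shell quoting concerns.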

    Original URL path: http://linkage.rockefeller.edu/soft/crimap/start3.html (2012-11-26)

  • 4. file structures
    by prepare but required only for the map-building options (build, instant, and quick). You will need to learn about the structures of the .gen and .par files, but can ignore the descriptions of the other file types if you wish. Each file is in ASCII format and can be edited with a text editor. For readability, the user can insert additional blank or end-of-line characters into these

    Original URL path: http://linkage.rockefeller.edu/soft/crimap/file4.html (2012-11-26)