GENECONV Molecular Biology Computer Program


GENECONV:  Statistical Tests for Detecting Gene Conversion - Version 1.81a

Given an alignment of DNA or protein sequences, GENECONV finds the most likely candidates for aligned gene conversion events between pairs of sequences in the alignment. The program can also look for gene conversion events from outside of the alignment. Candidate events are ranked by multiple-comparison corrected P-values and listed to a spreadsheet-like output file.

See GENECONV PROGRAM FILES below to download the program, documentation, and example files, and see the GENECONV documentation (gconvdoc.html or gconvdoc.pdf) for program documentation and theory.

NOTES FOR USING GENECONV WITH UNIX: Some standard C library functions (vprintf, vsprintf, etc.) are compiled in different ways in different versions of C compilers, sometimes with the same version number for the same compiler. This can cause a segmentation fault in some cases. See AN IMPORTANT NOTE FOR USING UNIX below for a simple fix that does not involve recompiling the program.

An earlier version of some of these procedures was described in

Sawyer, S. A. (1989) Statistical tests for detecting gene conversion.
Molecular Biology and Evolution 6, 526-538.

Newer features in GENECONV:

  • a much better score for finding apparent gene conversion events than in Sawyer (1989),
  • can allow for mismatches within possible gene conversion events,
  • better global comparison procedures,
  • sorted spreadsheet-like output,
  • within-group comparisons for more powerful multiple comparison procedures,
  • new methods for detecting gene conversion events from outside of the alignment, and
  • can list ``junction sequences'' immediately adjacent to the endpoints of putative gene conversion events.
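The idea of a candidate fragment shared by a pair of sequences can be illustrated with a deliberately simplified score: the longest run of consecutive polymorphic sites (sites where not all sequences in the alignment agree) at which the two sequences are identical. This is only a sketch under that assumption. It is not GENECONV's score, it does not allow mismatches within a fragment, and the function name longest_agreeing_run is hypothetical.

```c
#include <stddef.h>

/* Hypothetical illustration, not GENECONV's scoring method.
   seqs holds nseq aligned sequences, each of length len.
   Returns the length of the longest run of consecutive polymorphic
   sites at which sequences ia and ib have the same character. */
static size_t longest_agreeing_run(const char *const seqs[], size_t nseq,
                                   size_t ia, size_t ib, size_t len)
{
    size_t best = 0, run = 0, site, k;

    for (site = 0; site < len; site++) {
        int polymorphic = 0;

        /* A site is polymorphic if any sequence differs from the first. */
        for (k = 1; k < nseq; k++) {
            if (seqs[k][site] != seqs[0][site]) {
                polymorphic = 1;
                break;
            }
        }
        if (!polymorphic)
            continue;           /* monomorphic sites are skipped, not counted */

        if (seqs[ia][site] == seqs[ib][site]) {
            run++;
            if (run > best)
                best = run;
        } else {
            run = 0;            /* a mismatch ends the current run */
        }
    }
    return best;
}
```

For the toy alignment AACGTA, AACGTT, TTCGAT, the first two sequences agree at three consecutive polymorphic sites (the two C/G columns are monomorphic and are skipped), so longest_agreeing_run returns 3 for that pair.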

    GENECONV can be used in two ways:

  • GENECONV_HELPER, which provides a text-windowed interface for entering the input sequence file name and other options, and
  • GENECONV itself, which runs in a command-line window with command-line arguments.

    Both GENECONV versions read an aligned sequence file and write one or more output files. The program GENECONV_HELPER calls GENECONV, so that both programs are needed if you use GENECONV_HELPER.

    See ``A quick start: program input and output'' if you want to use GENECONV immediately. However, you should also look at ``Assessing significance: pairwise and global P-values'', at least briefly, to see the difference between global and pairwise fragments and the difference between permutation and KA P-values. (Global fragments have P-values that are multiple-comparison corrected for all possible sequence pairs, while the P-values for pairwise fragments are not. Both kinds of P-values are naturally corrected for sequence length. Global fragments are more important than pairwise fragments, since low P-values for pairwise fragments might be due to the large number of pairwise comparisons.)
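    To see why many pairwise comparisons can produce low pairwise P-values by chance: an alignment of n sequences has n(n-1)/2 sequence pairs, so 20 sequences give 190 pairs. The sketch below uses a simple Bonferroni bound (multiply a single-pair P-value by the number of pairs, capped at 1) purely for illustration; GENECONV's global P-values are computed by the permutation and KA procedures described in the documentation, not by this formula. The function names are hypothetical.

```c
/* Number of distinct sequence pairs in an alignment of n sequences. */
static long num_pairs(long n)
{
    return n * (n - 1) / 2;
}

/* Bonferroni bound for the smallest of several P-values:
   multiply a single-pair P-value by the number of comparisons,
   capping the result at 1. Illustration only. */
static double bonferroni(double p_pair, long comparisons)
{
    double p = p_pair * (double)comparisons;
    return p > 1.0 ? 1.0 : p;
}
```

    Under this bound, a pairwise P-value of 0.001 among the 190 pairs of a 20-sequence alignment corresponds to a corrected value of about 0.19, which is not significant at the usual 0.05 level.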

    It might also be helpful to browse through ``A first example'' in the GENECONV documentation.

    Input sequence files can be in NEXUS, CLUSTAL, Pearson/FASTA, NBRF/PIR, PHYLIP interleaved, or ASF formats.


    GENECONV PROGRAM FILES   (Version 1.81a) :

    DOCUMENTATION:

    gconvdoc.html (207Kb) -- Program documentation, including examples and explanation of the theory. This file can be read online or else downloaded and read offline as a local file in a Web browser. The HTML file is lightly formatted except for the table of contents and glossary, and could also be read in a text editor or word processor.

    gconvdoc.pdf (214Kb) -- The same program documentation in PDF format. The HTML file is easier to browse because of its internal links, but the PDF file will look nicer if printed.

    WINDOWS 95/98/NT/XP:

    geneconv.exe (194Kb) -- Executable command-line program for Microsoft Windows 95, 98, NT, and XP. See the documentation for program syntax and examples.

    geneconv_helper.exe (56Kb) -- Text-windowed interface for GENECONV. Download BOTH PROGRAMS to the same directory if you prefer windowed environments to command-line programs. GENECONV_HELPER can then be run through the ``My Computer'' or Windows Explorer interfaces.

    dos.examples.tar.gz (52Kb) -- GENECONV sample input and output files. Expands to 42 DOS text files (314Kb) including log files. The file dos.examples.tar.gz can be expanded by any DOS expander program, such as WinZip, that understands gzip and tar formats. If you have DOS versions of gzip and tar, enter gzip -dv dos.examples.tar.gz and then tar xvf dos.examples.tar. The log files (.sum) have system-dependent information, but the other output files should be identical when generated on all systems. A Windows batch file dosamps.bat is included that generates all of the output files and shows the GENECONV program parameters that were used. (The program parameters are also written to the output file.) The batch file took 54 seconds to run on a 733 MHz Intel machine running Windows NT 4.0.

    dos.source.tar.gz (136Kb) -- C source for GENECONV and GENECONV_HELPER. Expands to 21 DOS text files and one binary icon file (total: 492Kb). This includes a batch file makgconv.bat that calls the Borland C compiler bcc32 to compile GENECONV and GENECONV_HELPER. Also includes C source for a program make2rand that writes random nucleotide alignments. See dos.examples.tar.gz above for information about how to unpack dos.source.tar.gz . Note the text file zaddicon.note that discusses how icons are added to GENECONV and GENECONV_HELPER. (This is done automatically by makgconv.bat.)

    WARNING: Some Windows browsers can rename ``dos.examples.tar.gz'' as ``dos.examples.tar.tar'' or ``dos.source.tar.gz'' as ``dos.source.tar.tar''. The file has not been corrupted: it has just been renamed. Neither WinZip nor gzip.exe will work correctly if the file extension is .tar. If this happens, rename the file back to ``dos.examples.tar.gz'' or ``dos.source.tar.gz'' and it should work correctly.

    UNIX:

    unix.source.tar.gz (134Kb) -- C source for GENECONV and GENECONV_HELPER for compiling in UNIX. Expands to 18 UNIX text files (477Kb). The source files are identical to the files of the same name in dos.source except that these are UNIX text files. The UNIX batch file makgconv.csh calls the GNU C compiler gcc to compile GENECONV and GENECONV_HELPER. To run the UNIX batch file, enter csh makgconv.csh or source makgconv.csh . See unix.examples.tar.gz below for information about how to unpack unix.source.tar.gz . See also ``Notes for Compiling in UNIX'' below.

    unix.examples.tar.gz (51Kb) -- The same files as dos.examples.tar.gz above but for UNIX. Expands to 42 text files (306Kb) including log files. To expand this file on a UNIX system, enter gzip -dv unix.examples.tar.gz and then tar xvf unix.examples.tar. The log files (.sum) have system-dependent information, but the other output files should be identical when generated on all systems.
    A UNIX batch file dosamps.csh is included that generates all of the output files and shows the GENECONV program parameters. (The program parameters are also written to the output file.) To run the UNIX batch file, enter csh dosamps.csh or source dosamps.csh .

    NOTES: See also ``AN IMPORTANT NOTE FOR USING UNIX'' and ``NOTES FOR COMPILING IN UNIX'' below.

    
    

    New features in Version 1.81a (5-16-2007):

    Bug corrected in GENECONV_HELPER that caused problems with output file paths with embedded spaces.

    New features in Version 1.80 (8-18-2000) and 1.81 (8-29-2000):

    Option for sorting global fragment lists alphabetically by sequence name as opposed to by P-value. Default of P=0.05 or better for global fragment lists instead of P=0.15. Pairwise fragment lists no longer included as the default. Option for adding a constant to displayed site offsets in output. Distinguished NEXUS comments (``[!...]'') are written to output files in NEXUS format. Change in ASF format so that matchchar must be explicitly specified. Many other small changes and minor bug fixes.

    New features in Version 1.70 (11-21-99):

    Text-window interface for easier use. More input sequence file formats. Autosensing of nucleotide vs. protein sequences. Display of fragments with significant permutation Bonferroni-corrected pairwise P-value but not global BLAST-like permutation P-value. Options for batch mode handling. Longest fragments selected in tie groups of overlapping fragments as opposed to leftmost fragment. Various other small changes.

    New features in Version 1.02 (5-31-99):

    Better handling of sequence and file names with embedded spaces and punctuation characters. Expanded documentation file. Dynamically expanding line buffers (previous line sizes were restricted to 2000 bytes). Tolerance for some errors in input files that are mildly damaged by mail programs or word processors. More detailed explanation in output files. Better handling of NEXUS files. Parsed (``distinguished'') comments in ASF format input files. Various other small changes.

    New features in Version 1.01 (1-19-99):

    Improvements in reading NEXUS files. Allowing upper case, lower case, or mixed case NEXUS keywords. Provision for writing ``distinguished'' comments to the output file. Various other small changes.

    Version 1.00 posted 9-14-98.

    
    

    AN IMPORTANT NOTE FOR USING UNIX:

    Some standard C library functions (vprintf, vsprintf, etc.) are compiled in different ways in different versions of UNIX C compilers. This can happen even with the same version number for the same compiler. (This presumably can happen with C++/Java compilers as well.) This can cause a segmentation fault in some cases. The problem arises only when GENECONV writes to its log file (e.g., myfile.nex.sum), as opposed to its main output file myfile.nex.frags.

    With the command-line program GENECONV, the easiest workaround is to run GENECONV with the syntax (for example)

    geneconv myfile.nex -nolog

    with the extra argument -nolog . This suppresses writing the *.sum log file and, more importantly, avoids the segmentation fault without changing the source and recompiling. With GENECONV_HELPER, try entering -nolog as part of the command line that you enter in GENECONV_HELPER. I will post a version of GENECONV that avoids this problem as soon as I get a chance. (Since GENECONV_HELPER is basically a wrapper for GENECONV, that would solve the problem with GENECONV_HELPER as well.)
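    For readers who recompile from source: one common way that code using the vprintf family works with one compiler and crashes with another is passing the same va_list to two such calls. The C standard leaves a va_list indeterminate after vfprintf() returns, so a second use needs its own copy made with va_copy (a C99 macro that older, pre-C99 compilers may lack). Whether this is the cause in GENECONV is not stated on this page; the helper log_both below is hypothetical and not taken from the GENECONV source.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper (not from the GENECONV source): write one
   formatted message to stdout and, if log is non-NULL, to a log file.
   Returns the number of characters written to stdout. */
static int log_both(FILE *log, const char *fmt, ...)
{
    va_list ap, ap_copy;
    int n;

    va_start(ap, fmt);
    /* vfprintf() consumes its va_list, so copy it before the first use
       and give the second call its own fresh argument list. */
    va_copy(ap_copy, ap);
    n = vfprintf(stdout, fmt, ap);
    if (log != NULL)
        vfprintf(log, fmt, ap_copy);
    va_end(ap_copy);
    va_end(ap);
    return n;
}
```

    Reusing ap for the second vfprintf() call instead of ap_copy may appear to work on some platforms and crash on others, which matches the kind of compiler-dependent behavior described above.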

    
    

    NOTES FOR COMPILING IN UNIX:

    1. The file unix.source.tar.gz unpacks to a directory unix.source with C source for GENECONV and GENECONV_HELPER. This directory contains a UNIX batch file makgconv.csh that can be run by entering either source makgconv.csh or csh makgconv.csh. The batch file has two commands that call gcc to compile GENECONV and GENECONV_HELPER.

    2. GENECONV_HELPER uses the C library function ``system()'' to call GENECONV. Although system() is part of the ANSI C standard library, whether it can actually run a command depends on the operating system; it seems to be supported in most UNIX operating systems. If ``system()'' is not supported, GENECONV_HELPER will not work properly.

    3. Most UNIX compilers require the command-line argument -lm to load the math library when the source files need basic math functions, as is the case here. This argument is included in makgconv.csh.  However, some UNIX compilers load the math library automatically and crash if -lm is entered explicitly. In that case, edit makgconv.csh to remove the argument -lm.

    4. UNIX binaries were compiled with gcc and tested on the sample input files on a variety of UNIX machines. In each case, the output files were identical to the corresponding Windows output files. (See, however, ``An Important Note for Using UNIX'' above. The log files differ because GENECONV log files contain system-dependent information, such as timing information in the DOS log files.)
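    Regarding note 2: a program can ask the C library whether a command processor is available by calling system(NULL), which returns nonzero if one is, before calling system() with a real command. The sketch below also wraps the input file name in double quotes, since a file name with embedded spaces would otherwise be split into several arguments (compare the Version 1.81a fix above). The helpers build_command and run_command are hypothetical and not taken from the GENECONV_HELPER source, and this simple quoting does not handle file names that themselves contain double quotes.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical sketch: build the command line "geneconv "<infile>" -nolog"
   in buf. Returns 0 on success, -1 if buf is too small. */
static int build_command(char *buf, size_t size, const char *infile)
{
    int n = snprintf(buf, size, "geneconv \"%s\" -nolog", infile);

    /* snprintf() returns the length it needed; n >= size means truncation. */
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}

/* Run cmd only if the C library reports a command processor.
   Returns -1 if none is available, else system()'s own
   (implementation-defined) return value. */
static int run_command(const char *cmd)
{
    if (system(NULL) == 0)
        return -1;
    return system(cmd);
}
```

    For the input file name my file.nex, build_command produces geneconv "my file.nex" -nolog, which a shell passes to GENECONV as two arguments rather than three.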
    
    

    ADDITIONAL NOTE:

    David Robertson has links to programs for detecting gene conversion on a variety of different platforms by a wide variety of different methods. The Web page is at

    http://bioinf.man.ac.uk/~robertson/recombination/
    
    
    Acknowledgment:
    This work was supported by the National Science Foundation under grants DMS-9707045 and DMS-0107420.
    ``Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the National Science Foundation.''
    
    


    Send email comments to sawyer@math.wustl.edu

    Stanley Sawyer
    Department of Mathematics
    Washington University in St. Louis
    St. Louis, Missouri 63130, USA

    Web address:   http://www.math.wustl.edu/~sawyer
    Email address:   sawyer@math.wustl.edu

    The program GENECONV is free for academic use, but commercial rights are reserved.
    The program may be freely distributed for academic use, as long as it is not altered or renamed.


    Last modifications May 16, 2007;  August 11, 2013