Workshop: Variant Discovery in Next-Generation Sequencing (NGS) Data

The emergence of next-generation sequencing (NGS) technology has fundamentally altered the way we conduct eco-evolutionary research. This has led to an increasing demand for highly skilled researchers who can effectively analyze and manage such large-scale data sets. Within this framework, a 2-3 day workshop will be hosted at the Royal Belgian Institute of Natural Sciences in Brussels, Belgium, where invited instructors from the Broad Institute (Cambridge, MA) will outline how to process NGS data using the Broad’s freely available Genome Analysis Toolkit (GATK).

GATK is a versatile structured programming framework offering a variety of tools for processing NGS data, with a key focus on data quality control and on correctly calling SNPs and indels. GATK can handle basic actions such as data access and conversion, but it also includes a set of specialized tools, called "walkers", that you can use out of the box, individually or chained into scripted workflows, to perform anything from simple data diagnostics to complex "reads-to-results" analyses.

During this workshop, instructors will focus on the core steps involved in calling variants with the GATK “Best Practices” workflow. You will learn why each step is essential to the calling process, what key operations are performed on the data at each step, and how to use the GATK tools to get the most accurate and reliable results out of your dataset.

Visit the GATK forum.

The workshop consisted of lecture-style sessions (first two days) during which the GATK development team explained the rationale, theory and real-life applications of the ‘Best Practices’. During the optional hands-on sessions (third day) the GATK team helped each participant work through interactive exercises and tutorials in which they applied the ‘Best Practices’ to real datasets.

Date: 24-26 June 2014

Venue: Royal Belgian Institute of Natural Sciences, Vautierstraat 29, 1000 Brussels, Belgium.

Registration: free but mandatory. Closed. The hands-on session (third day) was limited to 25 participants.

This workshop was organized within the framework of the IUAP-project SPEEDY (BELSPO), BRAIN-project ‘GENESORT’ (BELSPO) and the Belgian Network for DNA Barcoding, BeBoL (FWO, JEMU), from which it received financial support.

        

Program:

Tuesday, 24 June 2014 Best Practices for Variant Discovery

  • 10:00 Opening address & introduction
  • 10:15 Intro to NGS for GATK
  • 10:45 Overview of GATK & DNA Best Practices
  • 11:10 Mapping & marking duplicates
  • 11:30 Coffee break / question time
  • 12:00 Indel realignment
  • 12:30 Base quality score recalibration
  • 13:00 Lunch break
  • 14:00 Variant calling
  • 14:55 Variant quality score recalibration
  • 15:30 Coffee break / question time
  • 15:50 Genotype refinement & functional annotation
  • 16:10 Variant manipulation & analysis
  • 16:30 End

Wednesday, 25 June 2014 Beyond the core Best Practices

  • 10:00 Opening address & introduction
  • 10:15 Applying GATK to non-human organisms
  • 10:45 Applying GATK to RNAseq analysis
  • 11:30 Coffee break / question time
  • 12:00 Differences in experimental design: whole genomes, exomes or small targets
  • 12:30 The benefits of analyzing cohorts of samples rather than single samples
  • 13:00 Lunch break
  • 14:00 Quality control of inputs and outputs
  • 14:20 Benchmarking results with standard resources
  • 14:40 Resources, documentation & support
  • 15:00 Coffee break / question time
  • 15:30 Parallelism options in the GATK
  • 15:50 Building pipelines with Queue
  • 16:30 End

Thursday, 26 June 2014 Hands-on exercises (max. 25 participants)
We go through the Best Practices step by step using real data sets. The session is mostly aimed at beginners, but basic familiarity with command-line tools is expected. Participants must bring their own laptop running Linux or Mac OS X (on a Windows machine, you can install a virtual machine such as VMware Player).

  • 10:00 Basic setup, usage & resources
  • 10:30 Data processing with GATK and related software
  • 12:30 Lunch break
  • 13:30 Variant calling
  • 14:30 Evaluating your callset
  • 16:00 End