\name{mismatchPlot}
\alias{mismatchPlot}

\title{mismatchPlot}
\description{
Plotting function that returns a \code{ggplot2} object representing the
mismatches and coverages of the specified samples in the specified region.
}
\usage{
mismatchPlot( data, sampledata, samples=sampledata$Sample, windowsize = NULL, position = NULL, range = NULL, plotReference = TRUE, refHeight=8, printReference = TRUE, printRefSize = 2, tickSpacing = c(10,10) )
}
\arguments{
  \item{data}{The data to be plotted, as returned by \code{h5dapply} or \code{h5readBlock}.}
  \item{sampledata}{The sample data for the cohort represented by
    \code{data}, as returned by \code{getSampleData}.}
  \item{samples}{A character vector listing the names of the samples to be
    plotted. Defaults to all samples described in \code{sampledata}.}
  \item{windowsize}{Size of the window to plot on each side of
    \code{position}. The total interval that is plotted will be
    \code{[position - windowsize, position + windowsize]}.}
  \item{position}{The position at which the plot shall be centered.}
  \item{range}{Integer vector of two elements specifying the range of coordinates to be plotted. Use either \code{position} plus \code{windowsize} or \code{range}; if both are provided, \code{range} takes precedence over \code{position} and \code{windowsize}.}
  \item{plotReference}{Boolean flag specifying whether a reference track should be plotted. Only takes effect if there is a slot named \code{Reference} in the \code{data} object passed to the function.}
  \item{refHeight}{Height of the reference track in coverage units. The default of 8 means the reference track is as high as a coverage of 8 reads would be in the plot of a sample.}
  \item{printReference}{Boolean flag indicating whether a text representation of the reference should be overlaid on the reference track. Can only be true if \code{plotReference} is true.}
  \item{printRefSize}{Size parameter of the \code{geom_text} layer used to print the reference. This value is unitless and needs to be manually optimised for a given plot.}
  \item{tickSpacing}{Integer vector of two elements, specifying the spacing of ticks along the x and y axes respectively.}
}
\details{
  If \code{position} and \code{windowsize} are specified, this function creates
  a plot centered on \code{position} using the coverage and
  mismatch counts stored in \code{data}, annotating it with the sample
  information provided in the data.frame \code{sampledata} and showing
  all samples listed in \code{samples}. If \code{range} is specified, the plot
  will cover the positions from \code{range[1]} to \code{range[2]}.
  The difference between specifying \code{range} and specifying \code{position} plus
  \code{windowsize} lies only in the labelling of the x-axis and the coordinate
  system used on the x-axis. In the former case the x-axis uses the
  genomic coordinates specified in \code{range}; in the latter case the x-axis
  coordinates run from \code{-windowsize} through \code{+windowsize} and position
  \code{0} is labelled with the value provided in the \code{position} parameter.
  Furthermore, when \code{position} and \code{windowsize} are provided, two black lines marking
  the center position are drawn (this is useful for visualising SNVs).
  
  If neither \code{range} nor \code{position} and \code{windowsize} are specified, the function will try to extract this information from the \code{data} object. If \code{data} is the return value of a call to \code{h5dapply} or \code{h5readBlock}, this works without further input.
  
  The plot has the genomic position on the x-axis. The y-axis encodes counts, with positive values representing the forward strand and negative values the reverse strand. The coverage is shown in grey, deletions in purple, and the mismatches in the colors specified in the legend. Note that for each possible mismatch there is an additional color for low-quality counts (coming from the first and last sequencing cycles); for example, \code{C} is filled dark red and \code{C_lq} light red.
  
  If \code{data} is the result of a call to \code{h5dapply} representing multiple blocks of data, as defined by the \code{range} parameter to \code{h5dapply}, then the plot will contain the mismatch plots of the individual ranges next to each other.
}
\value{
  A \code{ggplot} object containing the mismatch plot. It can be used
  like any other ggplot object, i.e. additional layers and styles may be
  applied by simply adding them to the plot.
  }
\author{
Paul Pyl
}

\examples{
  # loading library and example data
  library(h5vc)
  tallyFile <- system.file( "extdata", "example.tally.hfs5", package = "h5vcData" )
  sampleData <- getSampleData( tallyFile, "/ExampleStudy/16" )
  position <- 29979628
  windowsize <- 30
  samples <- sampleData$Sample[sampleData$Patient == "Patient8"]
  data <- h5readBlock(
    filename = tallyFile,
    group = "/ExampleStudy/16",
    names = c("Coverages", "Counts", "Deletions", "Reference"),
    range = c(position - windowsize, position + windowsize)
  )
  #Plotting with position and windowsize
  p <- mismatchPlot(
    data = data,
    sampledata = sampleData,
    samples = samples,
    windowsize = windowsize,
    position = position
  )
  print(p)
  #plotting with range and modified tickSpacing and refHeight
  p <- mismatchPlot(
    data = data,
    sampledata = sampleData,
    samples = samples,
    range = c(position - windowsize, position + windowsize),
    tickSpacing = c(20, 5),
    refHeight = 5
  )
  print(p)
  #plotting without specifying range or position
  p <- mismatchPlot(
    data = data,
    sampledata = sampleData,
    samples = samples
  )
  print(p)
  #Plotting multiple regions (with small overlaps)
  library(IRanges)
  dataList <- h5dapply(
    filename = tallyFile,
    group = "/ExampleStudy/16",
    names = c("Coverages", "Counts", "Deletions", "Reference"),
    range = IRanges(start = seq( position - windowsize, position + windowsize, 20), width = 30 )
  )
  p <- mismatchPlot(
    data = dataList,
    sampledata = sampleData,
    samples = samples
  )
  print(p)
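  # Illustrative sketch (not part of the original examples): since the return
  # value is described as a ggplot object, further layers and styles can be
  # added with the + operator. This assumes the ggplot2 package is installed;
  # the title text is an arbitrary choice for demonstration.
  library(ggplot2)
  p <- p + ggtitle("Mismatch plot for Patient8 (multiple regions)")
  print(p)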
}