Date created:    Apr 15, 2021
Last modified:   Oct 15, 2021

Abstract

This example shows how to generate a Manhattan plot for a given human genome.


Introduction

Manhattan plot is used to visualize large datasets, with large number of data points, and a significant distribution (variance) for some of these points (features). This is a primary visualization tool in genom-wide association studies (GWAS).

For a basic example of a Manhattan plot used in genomics reasearch see: https://en.wikipedia.org/wiki/Manhattan_plot .

The below example is heavely based on the Vignette (documentation, example) from the R Bioconducotr Sushi package by Douglas H. Phanstiel, Alan P. Boyle, Carlos L. Araya, and Mike Snyder (https://www.bioconductor.org/packages/release/bioc/vignettes/Sushi/inst/doc/Sushi.pdf).


Install Sushi package

In oder to produce a Manhattan plot with R Bioconductor, one can use the Sushi package (https://www.bioconductor.org/packages/release/bioc/vignettes/Sushi/inst/doc/Sushi.pdf):

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("Sushi")
## Bioconductor version 3.12 (BiocManager 1.30.16), R 4.0.3 (2020-10-10)
## Warning: package(s) not installed when version(s) same as current; use `force = TRUE` to
##   re-install: 'Sushi'
library('Sushi')
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## Loading required package: biomaRt

Data

hg18 chromosomes

Sushu_hg18_genome stores the information on the length of human chromosomes as described by NCBI36/hg18 genome build (see: https://www.bioconductor.org/packages/release/bioc/manuals/Sushi/man/Sushi.pdf).

We can display the number of base pairs in each chromosome (in millions, i.e., 1*106):

data(Sushi_hg18_genome)
data.frame(Sushi_hg18_genome$V1, Sushi_hg18_genome$V2/1000000)
##    Sushi_hg18_genome.V1 Sushi_hg18_genome.V2.1e.06
## 1                  chr1                  247.24972
## 2                 chr10                  135.37474
## 3                 chr11                  134.45238
## 4                 chr12                  132.34953
## 5                 chr13                  114.14298
## 6                 chr14                  106.36858
## 7                 chr15                  100.33892
## 8                 chr16                   88.82725
## 9                 chr17                   78.77474
## 10                chr18                   76.11715
## 11                chr19                   63.81165
## 12                 chr2                  242.95115
## 13                chr20                   62.43596
## 14                chr21                   46.94432
## 15                chr22                   49.69143
## 16                 chr3                  199.50183
## 17                 chr4                  191.27306
## 18                 chr5                  180.85787
## 19                 chr6                  170.89999
## 20                 chr7                  158.82142
## 21                 chr8                  146.27483
## 22                 chr9                  140.27325

For the reference number of base pairs per chromosome see: https://www.ncbi.nlm.nih.gov/books/NBK22266/.

GWAS bed data

Bed data representing human genes with coordinates from NCBI36/hg18 genome build (see: https://www.bioconductor.org/packages/release/bioc/manuals/Sushi/man/Sushi.pdf).

The beginning of the GWAS bed data frame:

data(Sushi_GWAS.bed)
head(Sushi_GWAS.bed)
##   chr.hg18 pos.hg18 pos.hg18.1       rsid pval.GC.DBP V6
## 1     chr1  1695996    1695996  rs6603811    0.003110  .
## 2     chr1  1696020    1696020  rs7531583    0.000824  .
## 3     chr1  1698661    1698661 rs12044597    0.001280  .
## 4     chr1  1711339    1711339  rs2272908    0.001510  .
## 5     chr1  1712792    1712792  rs3737628    0.001490  .
## 6     chr1  1736016    1736016 rs12408690    0.004000  .

Number of coordinates (number of rows):

nrow(Sushi_GWAS.bed)
## [1] 32760

Note the names of the columns:

names(Sushi_GWAS.bed)
## [1] "chr.hg18"    "pos.hg18"    "pos.hg18.1"  "rsid"        "pval.GC.DBP"
## [6] "V6"

Manhattan Plot

A “genome” parameter is required to set how many base pairs there are in each chromosome. The number of base pairs is reflected in the distance for each chromosome on the X axis.

Please note that p-values (pval.GC.DBP) are stored in the 5th column of the data frame (i.e., Sushi_GWAS.bed[,5]).

plotManhattan(bedfile=Sushi_GWAS.bed,
              pvalues=Sushi_GWAS.bed[,5],
              col=SushiColors(6),
              genome=Sushi_hg18_genome,
              cex=0.75)

labelgenome(genome=Sushi_hg18_genome,n=4,scale="Mb",
            edgeblankfraction=0.20,cex.axis=.5)

axis(side=2,las=2,tcl=.2)
mtext("log10(P)",side=2,line=1.75,cex=1,font=2)
mtext("chromosome",side=1,line=1.75,cex=1,font=2)