suppressMessages(library(tidyverse))suppressMessages(library(glue))##PRE = "/Users/haekyungim/Library/CloudStorage/Box-Box/LargeFiles/imlab-data/data-Github/web-data"PRE="~/Downloads/"DATA =glue("{PRE}/2024-04-10-gwas-catalog-analysis-2022")if(!file.exists(DATA)) system(glue("mkdir -p {DATA}"))WORK=DATADOWNLOAD_DATE="2024-04-10"## download data from GWAS catalog## https://www.ebi.ac.uk/gwas/docs/file-downloads# - [ ] make sure the DATAFILE name is the same as the one you just downloaded# DATAFILE = "gwas_catalog_v1.0.2-associations_e105_r2022-04-07.tsv"DATAFILE="full"filename =glue("{DATA}/{DATAFILE}")if(!file.exists(filename)) system(glue("wget -P {DATA} https://www.ebi.ac.uk/gwas/api/search/downloads/full"))
Code
gwascatalog =read_tsv(filename)
Warning: One or more parsing issues, call `problems()` on your data frame for details,
e.g.:
dat <- vroom(...)
problems(dat)
Rows: 589865 Columns: 34
── Column specification ────────────────────────────────────────────────────────
Delimiter: "\t"
chr (23): FIRST AUTHOR, JOURNAL, LINK, STUDY, DISEASE/TRAIT, INITIAL SAMPLE...
dbl (9): PUBMEDID, UPSTREAM_GENE_DISTANCE, DOWNSTREAM_GENE_DISTANCE, MERGE...
date (2): DATE ADDED TO CATALOG, DATE
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## show number of entries in the GWAS catalog with cancergwascatalog %>%select(`DISEASE/TRAIT`, MAPPED_GENE) %>%filter(grepl("cancer",`DISEASE/TRAIT`)) %>%dim()
[1] 10169 2
Number of distinct loci reported in GWAS of various cancers
Number of entries in the GWAS catalog with cancer, unique loci (it’s a reasonable assumption that mapped genes will be a good way to count loci)
---title: lecture 8date: 2024-04-10author: Haky Imcategories: - bios25328 - lectureeditor_options: chunk_output_type: console---Lecture 8 notes [link](https://www.icloud.com/keynote/025sZaAoLKDAekdsgjQ7-KAhg#L8-QTL-analysis-GTEx-2024)## Download the GWAS catalog```{r prelim}suppressMessages(library(tidyverse))suppressMessages(library(glue))##PRE = "/Users/haekyungim/Library/CloudStorage/Box-Box/LargeFiles/imlab-data/data-Github/web-data"PRE="~/Downloads/"DATA =glue("{PRE}/2024-04-10-gwas-catalog-analysis-2022")if(!file.exists(DATA)) system(glue("mkdir -p {DATA}"))WORK=DATADOWNLOAD_DATE="2024-04-10"## download data from GWAS catalog## https://www.ebi.ac.uk/gwas/docs/file-downloads# - [ ] make sure the DATAFILE name is the same as the one you just downloaded# DATAFILE = "gwas_catalog_v1.0.2-associations_e105_r2022-04-07.tsv"DATAFILE="full"filename =glue("{DATA}/{DATAFILE}")if(!file.exists(filename)) system(glue("wget -P {DATA} https://www.ebi.ac.uk/gwas/api/search/downloads/full"))``````{r}gwascatalog =read_tsv(filename)dim(gwascatalog)names(gwascatalog)``````{r}## show number of entries in the GWAS catalog with cancergwascatalog %>%select(`DISEASE/TRAIT`, MAPPED_GENE) %>%filter(grepl("cancer",`DISEASE/TRAIT`)) %>%dim()```> Number of distinct loci reported in GWAS of various cancersNumber of entries in the GWAS catalog with cancer, unique loci (it's a reasonable assumption that mapped genes will be a good way to count loci)```{r}gwascatalog %>%select(`DISEASE/TRAIT`, MAPPED_GENE) %>%filter(grepl("cancer",`DISEASE/TRAIT`)) %>%unique() %>%dim()``````{r}## plot histogram of GWAS loci by yeargwascat_sig = gwascatalog %>%mutate(year=as.factor(lubridate::year(lubridate::as_date(`DATE ADDED TO CATALOG`)))) %>%filter(`P-VALUE`<5e-8)gwascat_sig %>%filter(year!="2024") %>%ggplot(aes(year)) +geom_bar() +theme_bw(base_size =15) +scale_x_discrete(breaks=c("2008","2012","2016","2020","2022")) +xlab("year") +ylab("GWAS loci reported p<5e-8") +ggtitle(paste0("GWAS Catalog Downloaded ",DOWNLOAD_DATE) )##ggsave(glue::glue("{DATA}/gwas-catalog/gwas-catalog-by-year.pdf"))```