Author

Liz Gibbons

Published

April 12, 2024

Goals and Introduction

In this lab we are going to use the PRSice software to take height GWAS results and predict height of the Personal Genome Project individuals using their genotype data.

Goals

  • Download data from box
  • Download PRSice
  • Run exploratory analysis of data
  • Run PRSice
  • Plot observed versus predicted for height

Setup

load R packages and define paths

You will have to install all these packages first:

(r code block)

Show the code
install.packages("tidyverse")
install.packages("RSQLite")
install.packages("glue")

(r code block)

Show the code
suppressMessages(library(tidyverse))
suppressMessages(library(RSQLite))
suppressMessages(library(glue))

Download data

If you are on a Mac install the mac appropriate binary:

(bash code block)

Show the code
# wget https://github.com/choishingwan/PRSice/releases/download/2.3.5/PRSice_mac.zip

If you are on Posit cloud do the linux binary:

(bash code block)

Show the code
wget https://github.com/choishingwan/PRSice/releases/download/2.3.5/PRSice_linux.zip

Unzip the binary: (bash code block)

Show the code
unzip ./PRSice_linux.zip

move the executable to the directory where you keep your software (optional) ** the first time you run PRSice, it will install additional components make sure PRSice is executable ** chmod u+x PRSice

(bash code block)

Show the code
chmod +x ./PRSice_linux

Terminal Setup

(bash code block)

Show the code
export PRE="/cloud/project"
export DATA="$PRE/Lab-personal-genomes-project-data"

export PRSice="/cloud/project/" # file/path/to/PRSice
export PRSice_os="PRSice_linux" # optional: which operating system is your prsice named for
#$PRSice/$PRSice_os --help
cd $DATA
export WORK=$DATA

Exploration of the Data

open the database with phenotype information and list tables

(r code block)

Show the code
dbname <- paste0(DATA,"/repgp-data.sqlite3") ##This is just to create the file path to the sqlite3 file 
## connect to db
db = RSQLite::dbConnect(SQLite(), dbname)
## list tables
dbListTables(db)

query database users table

(r code block)

Show the code
dbListFields(db,"users")

(r code block)

Show the code
query <- function(...) dbGetQuery(db, ...)
users = query("select * from users")
names(users)

look at distribution by count for each column

(r code block)

Show the code
users %>% count(race)

(r code block)

Show the code
users  %>% count(gender)

(r code block)

Show the code
users %>% count(blood)

plot height distribution by gender

(r code block)

Show the code
users %>% ggplot(aes(height,fill=gender)) + geom_density(alpha=0.6) + ggtitle("Height by gender - Missing gender, *, has bi-modal distr.") + theme_minimal(base_size = 15)

run PRSice

create phenotype data file

create phenotype data file as intersection of the plink formatted fam file and the users table from the database repgp-data.sqlite3

(r code block)

Show the code
fam = read_tsv(glue::glue("{DATA}/repgp.fam"),col_names = FALSE)

(r code block)

Show the code
names(fam)[1:2] = c("FID","IID")
fam <- fam %>% select(FID, IID) %>% inner_join(users %>% select(sample,height,weight,gender), by=c("IID"="sample")) 
write_tsv(fam,file=glue::glue("{DATA}/phenodata.txt"))

execute bash command to run PRSice

(bash code block)

Show the code
mkdir /cloud/project/Lab-personal-genomes-project-data/output

Rscript /cloud/project/PRSice.R --dir /cloud/project \
    --prsice /cloud/project/PRSice_linux \
    --base /cloud/project/Lab-personal-genomes-project-data/ukb_height.gz \
    --target cloud/project/Lab-personal-genomes-project-data/repgp.chr# \
    --snp variant_id \
    --A1 effect_allele \
    --A2 non_effect_allele \
    --stat effect_size \
    --beta \
    --pvalue pvalue \
    --pheno-file /cloud/project/Lab-personal-genomes-project-data/phenodata.txt \
    --pheno-col height \
    --bar-levels 5e-08,5e-07,5e-06,5e-05,5e-04,5e-03,5e-02,5e-01,1 \
    --fastscore \
    --binary-target F \
    --thread 2 \
    --out /cloud/project/Lab-personal-genomes-project-data/output/height_score_all

If you get duplicated SNP ID, follow the instructions

(bash code block)

Show the code
PRSice="/cloud/project/"
WORK="/cloud/project/Lab-personal-genomes-project-data/"

Rscript $PRSice/PRSice.R --dir $PRSice \
    --prsice $PRSice/PRSice_mac \
    --base $WORK/ukb_height.gz \
    --extract $WORK/output/height_score_all.valid \
    --target $WORK/repgp.chr# \
    --snp variant_id \
    --A1 effect_allele \
    --A2 non_effect_allele \
    --stat effect_size \
    --beta \
    --pvalue pvalue \
    --pheno-file $WORK/phenodata.txt \
    --pheno-col height \
    --bar-levels 5e-08,5e-07,5e-06,5e-05,5e-04,5e-03,5e-02,5e-01,1 \
    --fastscore \
    --binary-target F \
    --thread 2 \
    --out $WORK/output/height_score_all

This may take a couple minutes

plot observed vs predicted height

(r code block)

Show the code
## we already have the phenotype data in the fam variable
predicted_height = read_delim(glue::glue("{DATA}/output/height_score_all.best"),delim=" ")

(r code block)

Show the code
combined <- predicted_height %>% inner_join(fam,by=c("IID"="IID")) 
combined %>% ggplot(aes(PRS,height)) + geom_point() + theme_minimal(base_size = 15)

regress observed height with predicted height (PRS)

(r code block)

Show the code
summary(lm(height ~ PRS,data=combined)) %>% coef() 

© HakyImLab and Listed Authors - CC BY 4.0 License