Usage
To run quasar, use the command ./quasar on the command line after installation. Flags and options specify how quasar will run.
To list all the possible options and see quasar's help you can run:
./quasar --help
To see the version of quasar you are using run:
./quasar --version
Quickstart
We recommend running quasar to perform cis-eQTL mapping using the negative binomial model and adjusted profile likelihood estimation of the negative binomial dispersion parameter. To run quasar in this mode use the command:
./quasar --plink plink_prefix \
--bed phenotype_data.bed \
--cov covariate_data.tsv \
--mode cis \
--model nb_glm \
--use-apl \
--out nb_fit
QTL mapping modes
The quasar software can be run in three modes: cis, trans, and gwas. These modes specify which variants are tested for association with a particular feature.
In mode cis, variants within ± the window size of the gene are tested (see the phenotype data format for details). By default the window size is 1 Mb, but it can be changed using the --window_size flag. We refer to this set of variants as the cis-window for that feature.
In mode trans, all variants except those in the cis-window are tested for association. In mode gwas, all variants are tested for association.
Note that during the development of quasar the cis mode was tested more extensively than the trans and gwas modes. There are also methodological issues with trans-eQTL mapping: reads that map to multiple locations can cause false-positive associations.
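The three modes can be illustrated with a short Python sketch. This is not quasar's implementation: the function name, the inclusive interval bounds, and the requirement that cis variants lie on the feature's chromosome are assumptions made for illustration.

```python
def is_tested(mode, var_chrom, var_pos, feat_chrom, feat_start, feat_end,
              window=1_000_000):
    """Return True if a variant would be tested against a feature in `mode`.

    Assumption: the cis-window is the closed interval
    [feat_start - window, feat_end + window] on the feature's chromosome.
    """
    in_cis = (
        var_chrom == feat_chrom
        and feat_start - window <= var_pos <= feat_end + window
    )
    if mode == "cis":
        return in_cis
    if mode == "trans":
        return not in_cis
    if mode == "gwas":
        return True
    raise ValueError(f"unknown mode: {mode!r}")


# A hypothetical feature at 5,000,000-5,000,001 on chromosome 1.
feature = ("1", 5_000_000, 5_000_001)
print(is_tested("cis", "1", 5_900_000, *feature))    # variant inside the window
print(is_tested("trans", "2", 5_900_000, *feature))  # variant on another chromosome
```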
Statistical models
The quasar software package supports a wide range of statistical models used to residualise the expression values. The supported models are:
- lm: linear model
- nb_glm: negative binomial GLM
- lmm: linear mixed model
- p_glmm: Poisson generalised linear mixed model (GLMM)
- p_glm: Poisson GLM (not recommended due to producing a very high rate of false positives)
- nb_glmm: negative binomial GLMM (not generally recommended due to producing highly similar results to the Poisson GLMM while being slower; can be used if there is known to be high relatedness between samples)
When the model is a mixed model, i.e. one of lmm, p_glmm or nb_glmm, the --grm flag (see below) must be used to specify a genetic relatedness matrix, which is used in the covariance matrix of the random effects.
When the model is nb_glm or nb_glmm, the --use-apl flag can be specified to use the Cox-Reid adjusted profile likelihood (APL) when estimating the negative binomial dispersion parameter. Use of the APL is recommended as it reduces the number of false positives, but it is slightly slower than standard maximum likelihood estimation.
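The rules above can be checked before launching a run. The following Python sketch is a hypothetical pre-flight helper, not part of quasar; how quasar itself reacts to an inconsistent combination (for example --use-apl with lm) is not documented here, so the messages are assumptions.

```python
MIXED_MODELS = {"lmm", "p_glmm", "nb_glmm"}
NB_MODELS = {"nb_glm", "nb_glmm"}
ALL_MODELS = {"lm", "p_glm", "nb_glm"} | MIXED_MODELS


def check_model_options(model, grm=None, use_apl=False):
    """Return a list of problems with a planned combination of options."""
    problems = []
    if model not in ALL_MODELS:
        problems.append(f"unknown model: {model}")
        return problems
    if model in MIXED_MODELS and grm is None:
        problems.append(f"{model} is a mixed model, so --grm is required")
    if use_apl and model not in NB_MODELS:
        problems.append("--use-apl only applies to nb_glm and nb_glmm")
    return problems


print(check_model_options("nb_glm", use_apl=True))  # consistent combination
print(check_model_options("lmm"))                   # reports the missing GRM
```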
Data formats
Genotype data
--plink/-p
The genotype data should be in plink2 binary format with .bed/.bim/.fam files named plink_prefix.bed/.bim/.fam
The .bed/.bim/.fam files can be generated from a VCF file using the following command:
plink2 --make-bed \
--output-chr chrX \
--vcf ${plink_prefix}.vcf.gz \
--out ${plink_prefix}
If using --make-bed with PLINK 1.9 or earlier, add the --keep-allele-order flag.
Phenotype data
--bed/-b
The phenotype data is a tab-separated file in which rows are features. The first four columns give feature information and the remaining columns hold the values for each sample, with the sample ids as column headers. For example,
#chr start end phenotype_id sample_1 sample_2 sample_3 ...
1 113871759 113813811 ENSG00000134242 39 43 45 ...
...
The start and end values are used to specify the centre of the cis-window. To centre the window on the gene TSS, set start = TSS and end = TSS + 1, so that the cis-window is [TSS - window, TSS + window + 1]. Alternatively, set the start and end values to the start and end of the gene, so that the cis-window is [start - window, end + window].
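As a small worked example of the TSS convention, the Python sketch below builds one phenotype row and computes the resulting window. The helper names and the gene identifier gene_A are hypothetical; only the interval arithmetic comes from the description above.

```python
def tss_centred_row(chrom, tss, phenotype_id, values):
    """Build one tab-separated phenotype row with start = TSS, end = TSS + 1."""
    fields = [chrom, tss, tss + 1, phenotype_id, *values]
    return "\t".join(str(f) for f in fields)


def cis_window(start, end, window=1_000_000):
    """Return the cis-window [start - window, end + window] as a tuple."""
    return start - window, end + window


row = tss_centred_row("1", 2_000_000, "gene_A", [39, 43, 45])
print(row)
# With the default 1 Mb window this is [TSS - window, TSS + window + 1].
print(cis_window(2_000_000, 2_000_001))
```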
For the count-based models (i.e. nb_glm, p_glm, p_glmm and nb_glmm), count data should be passed to quasar. These can be either RNA-seq counts or pseudobulk scRNA-seq counts (the sum of the counts over each individual's cells). For the linear models (i.e. lm and lmm), we recommend that when analysing scRNA-seq counts the mean for each individual is passed to quasar.
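The two aggregations can be sketched in plain Python as follows. The function, the cell names, and the counts are made up for illustration, and any normalisation applied before taking the mean is left out because it is not specified here.

```python
from collections import defaultdict


def pseudobulk(cell_counts, cell_to_individual, how="sum"):
    """Aggregate per-cell counts for one gene to one value per individual.

    how="sum" gives pseudobulk counts for the count-based models;
    how="mean" gives per-individual means for lm and lmm.
    """
    grouped = defaultdict(list)
    for cell, count in cell_counts.items():
        grouped[cell_to_individual[cell]].append(count)
    if how == "sum":
        return {ind: sum(vals) for ind, vals in grouped.items()}
    if how == "mean":
        return {ind: sum(vals) / len(vals) for ind, vals in grouped.items()}
    raise ValueError(f"unknown aggregation: {how!r}")


# Hypothetical counts for one gene in four cells from two individuals.
counts = {"cell_1": 3, "cell_2": 0, "cell_3": 5, "cell_4": 1}
donors = {"cell_1": "sample_1", "cell_2": "sample_1",
          "cell_3": "sample_2", "cell_4": "sample_2"}
print(pseudobulk(counts, donors, how="sum"))
print(pseudobulk(counts, donors, how="mean"))
```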
Covariate data
--cov/-c
The covariate data is a tab-separated file with rows as samples and first column sample_id and other columns containing the covariates. For example,
sample_id covariate_1 covariate_2 ...
sample_1 1 5.4 ...
sample_2 1 3.1 ...
...
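Before a run it can be useful to compare the samples in the covariate and phenotype files. The Python sketch below does this for in-memory stand-ins of the two files; the helper names are hypothetical, and whether quasar requires identical sample sets or ordering is not stated here, so the check only reports differences.

```python
import csv
import io


def covariate_sample_ids(handle):
    """Read sample ids from the first column of a covariate file."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    if header[0] != "sample_id":
        raise ValueError("first column should be named sample_id")
    return [row[0] for row in reader if row]


def phenotype_sample_ids(handle):
    """Read sample ids from the header of a phenotype file (fifth column on)."""
    header = handle.readline().rstrip("\n").split("\t")
    return header[4:]


# Small in-memory stand-ins for covariate_data.tsv and phenotype_data.bed.
cov = io.StringIO("sample_id\tcovariate_1\nsample_1\t5.4\nsample_2\t3.1\n")
bed = io.StringIO("#chr\tstart\tend\tphenotype_id\tsample_1\tsample_3\n")

missing = set(phenotype_sample_ids(bed)) - set(covariate_sample_ids(cov))
print(sorted(missing))  # samples with phenotypes but no covariates
```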
Genetic relatedness matrix
--grm/-g
A tab-separated text file containing the genetic relatedness matrix in matrix format. For example,
sample_id sample_1 sample_2 sample_3 sample_4 ...
sample_1 1 0.18 0.03 -0.3
sample_2 0.1 1 0.4 0.1
sample_3 0.04 0.45 1 0.1
sample_4 -0.1 0.4 0 1 ...
...
To construct the GRM we recommend using the plink2 --make-king command after pruning variants.
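A file in this layout can be sanity-checked with a few lines of Python. The reader below is a sketch under the assumption that the first row and first column carry the same sample ids in the same order; it is not quasar's parser, and the two-sample matrix is made up.

```python
import io


def read_grm(handle):
    """Parse a matrix-format GRM into (sample ids, rows of floats)."""
    header = handle.readline().rstrip("\n").split("\t")
    col_ids = header[1:]
    row_ids, matrix = [], []
    for line in handle:
        if not line.strip():
            continue  # skip blank lines
        fields = line.rstrip("\n").split("\t")
        row_ids.append(fields[0])
        matrix.append([float(x) for x in fields[1:]])
    if row_ids != col_ids:
        raise ValueError("row and column sample ids differ")
    if any(len(row) != len(col_ids) for row in matrix):
        raise ValueError("GRM is not square")
    return col_ids, matrix


# Hypothetical two-sample GRM held in memory instead of a file on disk.
text = "sample_id\tsample_1\tsample_2\nsample_1\t1\t0.1\nsample_2\t0.1\t1\n"
ids, grm = read_grm(io.StringIO(text))
print(ids, grm)
```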
Output
quasar produces two files:
- {out-prefix}-quasar-variant.txt which contains variant information
- {out-prefix}-quasar-cis-gene.txt which contains gene information
These files are written to the directory in which quasar is run.
Option list
| Option | Argument | Type | Description |
|---|---|---|---|
| --plink | FILE | Required | Plink files prefix, assumes that {prefix}.bed, {prefix}.bim, {prefix}.fam exist |
| --cov | FILE | Required | Covariate data file |
| --bed | FILE | Required | Phenotype bed file |
| --grm | FILE | Optional | A (dense) genetic relatedness matrix |
| --out | STRING | Optional | The output file prefix |
| --mode | STRING | Required | The mode used to run quasar in. One of: cis, trans, gwas. |
| --model | STRING | Required | The model used to residualise phenotype data. One of: lm, lmm, p_glm, nb_glm, p_glmm or nb_glmm. |
| --window_size | NUMBER | Optional | The size of the cis window in base pairs. Default: 1000000 |
| --use-apl | FLAG | Optional | Use Cox-Reid adjusted profile likelihood when estimating negative binomial dispersion |
| --verbose | FLAG | Optional | Write additional information to the console |