The genio (GENetics I/O) package provides easy-to-use and efficient readers and writers for formats in genetics research. It currently targets Plink, Eigenstrat, and GCTA formats (more to come). Plink BED/BIM/FAM and GCTA GRM formats are fully supported. The lightning-fast read_bed and write_bed functions (written in Rcpp) read and write genotypes between native R matrices and the Plink BED format. The make_* functions create default FAM and BIM files to go with simulated genotype data. Otherwise, the package consists of wrappers for readr functions that add missing extensions and column names (often absent in these files).
You can install the released version of genio from CRAN with:
install.packages("genio")
Install the latest development version from GitHub:
install.packages("devtools") # if needed
library(devtools)
install_github("OchoaLab/genio", build_vignettes = TRUE)
You can see the package vignette, which has more detailed documentation, by typing this into your R session:
vignette('genio')
Load the library:
library(genio)
Note that write_plink writes all three BED/BIM/FAM files together, while each write_{bed,bim,fam} function creates a single file.
# write your genotype matrix stored in an R native matrix
# (here we create a small example with random data)
# create 10 random genotypes
X <- rbinom(10, 2, 0.5)
# replace 3 random genotypes with missing values
X[sample(10, 3)] <- NA
# turn into 5x2 matrix
X <- matrix(X, nrow = 5, ncol = 2)

# also create a simulated phenotype vector
pheno <- rnorm(2) # two individuals as above

# write simulated data to all BED/BIM/FAM files in one handy command
# missing BIM and FAM columns are automatically generated
# data dimensions are validated for provided data
write_plink('random', X, pheno = pheno)
### same thing in separate steps:
# create default tables to go with simulated genotype data
fam <- make_fam(n = 2)
bim <- make_bim(n = 5)
# overwrite with simulated phenotype
fam$pheno <- pheno

# write simulated data to BED/BIM/FAM separately (one command each)
# extension can be omitted and it still works!
write_bed('random', X)
write_fam('random', fam)
write_bim('random', bim)
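As a quick sanity check on the write commands above, the following sketch writes a small random genotype matrix and reads it back with read_plink. It assumes the genio package is installed, writes to a temporary path rather than the working directory, and strips the row and column names from the genotypes read back before comparing them with the unnamed input matrix.

```r
library(genio)

# small example: 5 loci (rows) by 2 individuals (columns), with missing values
X <- rbinom(10, 2, 0.5)
X[sample(10, 3)] <- NA
X <- matrix(X, nrow = 5, ncol = 2)

# write BED/BIM/FAM to a temporary base path (extensions are added automatically)
file <- tempfile('random')
write_plink(file, X)

# read everything back and compare genotypes, ignoring dimnames
data <- read_plink(file)
stopifnot(isTRUE(all.equal(unname(data$X), X)))

# clean up the three temporary files
unlink(paste0(file, c('.bed', '.bim', '.fam')))
```

If the check passes silently, the genotypes (including the missing values) survived the round trip unchanged.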
# read individual and locus data into "tibbles"
# read plink data all at once
data <- read_plink('sample')
# extract genotypes and annotation tables
X <- data$X
bim <- data$bim
fam <- data$fam

# Plink files read individually
bim <- read_bim('sample.bim')
fam <- read_fam('sample.fam')
X <- read_bed('sample.bed', nrow(bim), nrow(fam))

# Eigenstrat formats
snp <- read_snp('sample.snp')
ind <- read_ind('sample.ind')

# in all cases extension can be omitted and it still works!
bim <- read_bim('sample')
fam <- read_fam('sample')
snp <- read_snp('sample')
ind <- read_ind('sample')

# write these data to other files
# here extensions are also added automatically
# write all plink files together, ensuring consistency
write_plink('new', X, bim, fam)
# write plink files individually
write_fam('new', fam)
write_bim('new', bim)
write_bed('new', X)
# Eigenstrat files
write_ind('new', ind)
write_snp('new', snp)
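The read_bed call above is given the numbers of loci and individuals because a Plink BED file does not record its own dimensions. As a rough illustration in base R, and going by the Plink BED layout (a 3-byte header, then 2 bits per genotype, with each locus padded to a whole number of bytes), the expected file size can be computed from those dimensions. The helper name bed_expected_size is hypothetical and is not part of genio.

```r
# expected size in bytes of a BED file with m_loci loci and n_ind individuals
# (hypothetical helper, not a genio function)
bed_expected_size <- function(m_loci, n_ind) {
    # 3 header bytes, plus ceiling(n_ind / 4) bytes per locus (4 genotypes per byte)
    3 + m_loci * ceiling(n_ind / 4)
}

# the 5 x 2 example matrix above: 3 + 5 * 1 bytes
bed_expected_size(5, 2)
```

Comparing this number against file.size() of a BED file is a cheap way to catch mismatched BIM or FAM tables before reading genotypes.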
# read data from GRM files:
# - sample.grm.bin (kinship matrix),
# - sample.grm.N.bin (sample sizes matrix), and
# - sample.grm.id (family and ID table for individuals in this data)
obj <- read_grm( 'sample' )
# the kinship matrix
kinship <- obj$kinship
# the pair sample sizes matrix
M <- obj$M
# the fam and ID tibble
fam <- obj$fam

# write data into new GRM files
# writes: new.grm.bin, new.grm.N.bin, new.grm.id
write_grm( 'new', kinship, M = M, fam = fam )
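For readers without existing GRM files, the sketch below builds a small made-up kinship matrix, writes it with write_grm, and reads it back with read_grm. The matrix values and sample sizes are illustrative only. It assumes genio is installed, and the comparison uses a loose tolerance on the assumption that the binary GRM format stores values in single precision.

```r
library(genio)

# toy symmetric kinship matrix for 3 individuals (values are made up)
n <- 3
kinship <- matrix(
    c(0.5, 0.1, 0.0,
      0.1, 0.5, 0.2,
      0.0, 0.2, 0.5),
    nrow = n
)
# toy matrix of pair sample sizes
M <- matrix(100, nrow = n, ncol = n)
# default family and individual IDs
fam <- make_fam(n = n)

# write to a temporary base path: creates .grm.bin, .grm.N.bin, .grm.id
file <- tempfile('new')
write_grm(file, kinship, M = M, fam = fam)

# read back and compare, ignoring dimnames and allowing for float rounding
obj <- read_grm(file)
stopifnot(isTRUE(all.equal(unname(obj$kinship), kinship, tolerance = 1e-6)))

# clean up the temporary files
unlink(paste0(file, c('.grm.bin', '.grm.N.bin', '.grm.id')))
```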