| Title: | A Pipeline for Meta-Genome Wide Association | 
| Version: | 2.0.4 | 
| Date: | 2018-06-15 | 
| Description: | Correlates variation within the meta-genome to phenotype variation in a target species using association studies. Follows the pipeline described in Chaston, J.M. et al. (2014) <doi:10.1128/mBio.01631-14>. | 
| License: | MIT + file LICENSE | 
| LazyData: | true | 
| Imports: | ape, coxme, doParallel, dplyr, foreach, iterators, lme4, multcomp, parallel, plyr, qqman, survival, seqinr | 
| Suggests: | knitr, rmarkdown | 
| VignetteBuilder: | knitr | 
| Depends: | R (≥ 3.0) | 
| RoxygenNote: | 6.0.1 | 
| NeedsCompilation: | no | 
| Packaged: | 2018-06-16 03:47:53 UTC; coripenrod | 
| Author: | Corinne Sexton [aut], John Chaston [aut, cre], Hayden Smith [ctb] | 
| Maintainer: | John Chaston <john_chaston@byu.edu> | 
| Repository: | CRAN | 
| Date/Publication: | 2018-07-12 07:20:17 UTC | 
Main OrthoMCL Analysis
Description
Main function for analyzing the statistical association of OG (orthologous group) presence with phenotype data
Usage
AnalyzeOrthoMCL(mcl_data, pheno_data, model, species_name, resp = NULL,
  fix2 = NULL, rndm1 = NULL, rndm2 = NULL, multi = 1, time = NULL,
  event = NULL, time2 = NULL, startnum = 1, stopnum = "end",
  output_dir = NULL, sig_digits = NULL, princ_coord = 0)
Arguments
mcl_data | 
 output of FormatAfterOrtho; a list of matrices; (1) a presence/absence matrix of taxa per OG, (2) a list of the specific protein ids within each OG  | 
pheno_data | 
 a data frame of phenotypic data with specific column names used to specify response variable as well as other fixed and random effects  | 
model | 
 linear model with gene presence as fixed effect (lm); linear mixed effect models with gene presence as fixed effect and additional variables specified as: one random effect (lmeR1), two independent random effects (lmeR2ind), two random effects with rndm2 nested in rndm1 (lmeR2nest), or two independent random effects with one additional fixed effect (lmeF2); Wilcoxon test with gene presence as fixed effect (wx); survival tests with support for multi-core design: with two random effects (survmulti), and with two times as well as an additional fixed variable (survmulticensor)  | 
species_name | 
 Column name in pheno_data containing 4-letter species designations  | 
resp | 
 Column name in pheno_data containing response variable  | 
fix2 | 
 Column name in pheno_data containing second fixed effect  | 
rndm1 | 
 Column name in pheno_data containing first random variable  | 
rndm2 | 
 Column name in pheno_data containing second random variable  | 
multi | 
 (can only be used with survival tests) Number of cores  | 
time | 
 (can only be used with survival tests) Column name in pheno_data containing first time  | 
event | 
 (can only be used with survival tests) Column name in pheno_data containing event  | 
time2 | 
 (can only be used with survival tests) Column name in pheno_data containing second time  | 
startnum | 
 number of test to start on  | 
stopnum | 
 number of test to stop on  | 
output_dir | 
 (if using survival tests) directory where small output files will be placed before using SurvAppendMatrix. Must specify a directory if choosing to output small files, else only written as a matrix  | 
sig_digits | 
 number of digits to display for p-values and means of data; defaults to NULL (no rounding)  | 
princ_coord | 
 the number of principal coordinates to be included in the model as fixed effects (1, 2, or 3); if a decimal is specified, as many principal coordinates as are needed to account for that fraction of the variance will be included in the analysis  | 
Value
A matrix with the following columns: OG, p-values, Bonferroni corrected p-values, mean phenotype of OG-containing taxa, mean phenotype of OG-lacking taxa, taxa included in OG, taxa not included in OG
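To make the columns above concrete, here is a minimal base R sketch of a per-OG linear model test followed by a Bonferroni correction. It is not the package's implementation: the toy presence/absence matrix, the phenotype values, and the helper `test_og` are all hypothetical, and only the column names `Treatment` and `RespVar` are borrowed from the example data.

```r
# Toy inputs; every name and value here is hypothetical.
pa <- matrix(c(1, 1, 1, 0, 0, 0,   # OG1
               1, 0, 1, 0, 1, 0,   # OG2
               1, 1, 0, 1, 1, 0),  # OG3
             nrow = 6,
             dimnames = list(paste0("tax", LETTERS[1:6]), c("OG1", "OG2", "OG3")))
pheno <- data.frame(
  Treatment = rep(rownames(pa), each = 2),
  RespVar   = c(10.1, 9.8, 10.4, 10.0, 9.9, 10.3, 5.2, 4.9, 5.1, 5.3, 4.8, 5.0),
  stringsAsFactors = FALSE
)

# Fit RespVar ~ presence for one OG; return the p-value and the two group means.
test_og <- function(og) {
  present <- pa[pheno$Treatment, og]          # one 0/1 value per observation
  fit <- lm(pheno$RespVar ~ present)
  c(pval         = summary(fit)$coefficients["present", "Pr(>|t|)"],
    mean_contain = mean(pheno$RespVar[present == 1]),
    mean_lack    = mean(pheno$RespVar[present == 0]))
}

# One row per OG, then correct the p-values for the number of tests.
stats <- t(vapply(colnames(pa), test_og, numeric(3)))
result <- data.frame(OG = rownames(stats), stats,
                     corrected_pval = p.adjust(stats[, "pval"], method = "bonferroni"))
print(result)
```

The Bonferroni step multiplies each p-value by the number of tests and caps the result at 1, which is why the corrected column can never be smaller than the raw one.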
Examples
#Linear Model
## Not run: 
mcl_mtrx <- AnalyzeOrthoMCL(after_ortho_format, pheno_data, 'lm',
 'Treatment', resp='RespVar')
## End(Not run)
# the rest of the examples are not run for time's sake
#Linear Mixed Effect with one random effect
## Not run: 
mcl_mtrx <- AnalyzeOrthoMCL(after_ortho_format, pheno_data, 'lmeR1',
'Treatment', resp='RespVar', rndm1='Experiment')
## End(Not run)
#Linear Mixed Effect with two independent random effects
## Not run: 
mcl_mtrx <- AnalyzeOrthoMCL(after_ortho_format, pheno_data, 'lmeR2ind',
 'Treatment', resp='RespVar', rndm1='Experiment', rndm2='Vial')
## End(Not run)
#Linear Mixed Effect with rndm2 nested in rndm1
## Not run: 
mcl_mtrx <- AnalyzeOrthoMCL(after_ortho_format, pheno_data, 'lmeR2nest',
 'Treatment',  resp='RespVar', rndm1='Experiment', rndm2='Vial')
## End(Not run) 
#Linear Mixed Effect with two independent random effects and one additional fixed effect
## Not run: 
mcl_mtrx3 <- AnalyzeOrthoMCL(after_ortho_format, pheno_data, 'lmeF2',
 'Treatment', resp='RespVar', fix2='Treatment', rndm1='Experiment', rndm2='Vial', princ_coord = 4)
## End(Not run)
#Wilcoxon Test
## Not run: 
mcl_mtrx <- AnalyzeOrthoMCL(after_ortho_format, pheno_data, 'wx',
 'Treatment', resp='RespVar')
## End(Not run)
# ~ 5 minutes
#Survival with two independent random effects, run on multiple cores
## Not run: 
mcl_mtrx <- AnalyzeOrthoMCL(after_ortho_format, starv_pheno_data, 'TRT', model='survmulti',
 time='t2', event='event', rndm1='EXP', rndm2='VIAL', multi=1)
## End(Not run)
# ~ 5 minutes
#Survival with two independent random effects and one additional fixed effect,
#including drops on multi cores
## Not run: 
mcl_mtrx <- AnalyzeOrthoMCL(after_ortho_format, starv_pheno_data, 'TRT', model='survmulticensor',
 time='t1', time2='t2', event='event', rndm1='EXP', rndm2='VIAL', fix2='BACLO', multi=1)
 
## End(Not run)
#to be appended with SurvAppendMatrix
Show Principal Components Breakdown
Description
Function to show principal components statistics based on the OrthoMCL presence/absence groupings.
Usage
CalculatePrincipalCoordinates(mcl_data)
Arguments
mcl_data | 
 output of FormatAfterOrtho; a list of 2 elements: (1) a binary matrix indicating the presence/absence of genes in each OG and (2) a vector of OG names  | 
Value
returns a named list of principal components and accompanying proportion of variance for each
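The idea of a per-coordinate proportion of variance can be sketched in base R with `cmdscale()`. This is an illustration under stated assumptions, not the function's actual code: the binary (Jaccard-type) distance, the toy matrix, and the 0.8 target are choices made for the example only. The last step mirrors how a decimal `princ_coord` could be turned into a number of coordinates.

```r
# Hypothetical presence/absence matrix: 5 taxa (rows) by 6 OGs (columns).
pa <- matrix(c(1, 1, 1, 0, 0, 0,
               1, 1, 0, 0, 0, 1,
               1, 0, 1, 1, 0, 0,
               0, 0, 0, 1, 1, 1,
               0, 1, 0, 1, 1, 0),
             nrow = 5, byrow = TRUE,
             dimnames = list(paste0("tax", LETTERS[1:5]), paste0("OG", 1:6)))

# Classical multidimensional scaling (principal coordinates analysis).
d   <- dist(pa, method = "binary")      # assumed distance measure
pco <- cmdscale(d, k = 2, eig = TRUE)   # pco$eig holds all eigenvalues

# Negative eigenvalues can occur with non-Euclidean distances; treat them as 0.
eig      <- pmax(pco$eig, 0)
prop_var <- eig / sum(eig)
names(prop_var) <- paste0("PCo", seq_along(prop_var))
print(round(prop_var, 3))

# Smallest number of coordinates whose cumulative share reaches the target.
target   <- 0.8
n_needed <- which(cumsum(prop_var) >= target)[1]
print(n_needed)
```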
Examples
CalculatePrincipalCoordinates(after_ortho_format)
Format file from output of OrthoMCL algorithm before use in AnalyzeOrthoMCL
Description
After running OrthoMCL and/or submitting to www.orthomcl.org, formats the output file to be used in AnalyzeOrthoMCL
Usage
FormatAfterOrtho(file, format = "ortho")
Arguments
file | 
 Path to the OrthoMCL output file  | 
format | 
 Specification of the method by which file was obtained: defaults to 'ortho' for output from orthomcl.org. Other option is 'groups' for output from local run of OrthoMCL software.  | 
Value
a list of matrices; (1) a presence/absence matrix of taxa per OG, (2) a list of the specific protein ids within each OG
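As a rough picture of what building such a presence/absence matrix involves, the sketch below parses a few lines in the layout commonly written by a local OrthoMCL run (`group: taxon|protein taxon|protein ...`). The line layout, the helper `parse_group`, and the taxa-as-rows orientation are assumptions made for illustration; FormatAfterOrtho may differ in all of them.

```r
# Hypothetical lines in a "groups"-style layout.
lines <- c("OG1: taxA|p1 taxB|p7 taxC|p3",
           "OG2: taxA|p2 taxA|p9",
           "OG3: taxB|p4 taxC|p5")

# Split one line into the group name, its protein ids, and the taxa it covers.
parse_group <- function(line) {
  parts   <- strsplit(line, ": ", fixed = TRUE)[[1]]
  members <- strsplit(parts[2], " ", fixed = TRUE)[[1]]
  list(og      = parts[1],
       members = members,
       taxa    = unique(sub("\\|.*$", "", members)))
}

groups   <- lapply(lines, parse_group)
all_taxa <- sort(unique(unlist(lapply(groups, `[[`, "taxa"))))

# 1 where a taxon has at least one protein in the group, 0 otherwise.
pa <- vapply(groups,
             function(g) as.integer(all_taxa %in% g$taxa),
             integer(length(all_taxa)))
dimnames(pa) <- list(all_taxa, vapply(groups, `[[`, character(1), "og"))
print(pa)
```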
Examples
file <- system.file('extdata', 'orthologGroups.txt', package='MAGNAMWAR')
after_ortho_format <- FormatAfterOrtho(file)
file_grps <- system.file('extdata', 'groups_example_r.txt', package='MAGNAMWAR')
after_ortho_format_grps <- FormatAfterOrtho(file_grps, format = 'groups')
Format all raw GenBank fastas to single OrthoMCL compatible fasta file
Description
Creates the composite fasta file for use in running OrthoMCL and/or submitting to www.orthomcl.org
Usage
FormatMCLFastas(fa_dir, genbnk_id = 4)
Arguments
fa_dir | 
 Path to the directory where all raw GenBank files are stored. Note that all file names must be changed to a 4-letter code representing each species and have the '.fasta' file extension  | 
genbnk_id | 
 (Only necessary for the deprecated version of fasta headers) The index of the sequence ID in the GenBank pipe-separated annotation line (default: 4)  | 
Value
Returns nothing, but prints the path to the final OrthoMCL compatible fasta file
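The role of `genbnk_id` is easier to see with a small sketch of the header rewrite. OrthoMCL expects headers of the form `>taxon|protein_id`, and older GenBank headers are pipe-separated with the accession in the fourth field. The helper `to_orthomcl_header` and the header string are made up for illustration and are not part of the package.

```r
# Hypothetical helper: rebuild a fasta header as ">taxon|id", taking the id
# from the given position of a pipe-separated GenBank-style header.
to_orthomcl_header <- function(header, taxon, id_index = 4) {
  fields <- strsplit(sub("^>", "", header), "|", fixed = TRUE)[[1]]
  paste0(">", taxon, "|", fields[id_index])
}

# Made-up header in the older "gi|...|ref|...|" style.
old_header <- ">gi|000000|ref|XX_000001.1| example protein"
new_header <- to_orthomcl_header(old_header, taxon = "abcd")
print(new_header)
```

Here the 4-letter code `abcd` stands in for the species code that the file name would supply.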
Examples
## Not run: 
dir <- system.file('extdata', 'fasta_dir', package='MAGNAMWAR')
dir <- paste(dir,'/',sep='')
formatted_file <- FormatMCLFastas(dir)
## End(Not run)
Join Representative Sequences
Description
Joins the OrthoMCL output matrix to representative sequences
Usage
JoinRepSeq(mcl_data, fa_dir, mcl_mtrx, fastaformat = "new")
Arguments
mcl_data | 
 output of FormatAfterOrtho; a list of matrices; (1) a presence/absence matrix of taxa per OG, (2) a list of the specific protein ids within each OG  | 
fa_dir | 
 Path to the directory where all raw GenBank files are stored. Note that all file names must be changed to a 4-letter code representing each species and have the '.fasta' file extension  | 
mcl_mtrx | 
 OrthoMCL output matrix from AnalyzeOrthoMCL()  | 
fastaformat | 
 options: new & old; new = no GI numbers included; defaults to new  | 
Value
Returns the original OrthoMCL output matrix with additional columns: representative sequence taxon, representative sequence id, representative sequence annotation, representative sequence
Examples
## Not run: 
dir <- system.file('extdata', 'fasta_dir', package='MAGNAMWAR')
dir <- paste(dir,'/',sep='')
joined_mtrx_grps <- JoinRepSeq(after_ortho_format_grps, dir, mcl_mtrx_grps, fastaformat = 'old')
## End(Not run)
Manhattan Plot of All Taxa
Description
Manhattan plot that graphs all p-values for taxa.
Usage
ManhatGrp(mcl_data, mcl_mtrx, tree = NULL)
Arguments
mcl_data | 
 FormatAfterOrtho output  | 
mcl_mtrx | 
 output of AnalyzeOrthoMCL()  | 
tree | 
 tree file optional, used for ordering taxa along x axis  | 
Value
a manhattan plot
Examples
ManhatGrp(after_ortho_format, mcl_mtrx)
# Note: the equation of the line of significance defaults to -log10((.05)/dim(pdgs)[1])
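The default line of significance quoted in the note, -log10(0.05 / number of PDGs), is a Bonferroni threshold on the -log10 scale. A minimal sketch with made-up p-values shows which groups would sit above such a line; the vector `pvals` and its names are hypothetical.

```r
# Made-up p-values for four groups.
pvals <- c(OG1 = 1e-6, OG2 = 0.03, OG3 = 0.4, OG4 = 0.0004)

# Manhattan plots show -log10(p), so small p-values become tall points.
neg_log_p <- -log10(pvals)

# Bonferroni threshold for alpha = 0.05 on the same scale.
threshold <- -log10(0.05 / length(pvals))

significant <- names(pvals)[neg_log_p > threshold]
print(significant)
```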
Plot of a PDG and Data with Standard Error Bars
Description
Bar plot of PDG vs phenotype data with presence of taxa in PDG indicated by color
Usage
PDGPlot(data, mcl_matrix, OG = "NONE", species_colname, data_colname,
  xlab = "Taxa", ylab = "Data", ylimit = NULL, tree = NULL,
  order = NULL, main_title = NULL)
Arguments
data | 
 R object of phenotype data  | 
mcl_matrix | 
 AnalyzeOrthoMCL output  | 
OG | 
 optional parameter, a string with the name of chosen group (OG) to be colored  | 
species_colname | 
 name of column in phenotypic data file with taxa designations  | 
data_colname | 
 name of column in phenotypic data file with data observations  | 
xlab | 
 string to label barplot's x axis  | 
ylab | 
 string to label barplot's y axis  | 
ylimit | 
 optional parameter to limit y axis  | 
tree | 
 optional parameter (defaults to NULL) Path to tree file, orders the taxa by phylogenetic distribution, else it defaults to alphabetical  | 
order | 
 vector with order of taxa names for across the x axis (defaults to alpha ordering)  | 
main_title | 
 string for title of the plot (defaults to OG)  | 
Value
a barplot with taxa vs phenotypic data complete with standard error bars
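The bar heights and error bars in a plot of this kind come down to a mean and a standard error per taxon. The sketch below computes both in base R on a hypothetical data frame; it assumes the standard error is sd / sqrt(n), which may or may not match what PDGPlot computes internally.

```r
# Hypothetical phenotype data: three taxa, four observations each.
pheno <- data.frame(
  Treatment = rep(c("taxA", "taxB", "taxC"), each = 4),
  RespVar   = c(10, 12, 11, 13,  5, 6, 5, 6,  8, 8, 9, 7),
  stringsAsFactors = FALSE
)

# Mean and standard error of the mean for each taxon.
means <- tapply(pheno$RespVar, pheno$Treatment, mean)
ses   <- tapply(pheno$RespVar, pheno$Treatment,
                function(x) sd(x) / sqrt(length(x)))

print(rbind(mean = means, se = ses))
```

These two vectors are what one would hand to `barplot()` for the bars and to `arrows()` (with `angle = 90, code = 3`) for the error bars.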
Examples
PDGPlot(pheno_data, mcl_mtrx, 'OG5_126778', 'Treatment', 'RespVar', ylimit=12)
Number of PDGs vs OGs/PDG
Description
Barplot that indicates the number of PDGs vs OGs (clustered orthologous groups) in a PDG
Usage
PDGvOG(mcl_data, num = 40, ...)
Arguments
mcl_data | 
 FormatAfterOrtho output  | 
num | 
 an integer indicating where the x axis should end and be compiled  | 
... | 
 args to be passed to barplot  | 
Value
a barplot with a height determined by the second column and the first column abbreviated to accommodate visual spacing
Examples
PDGvOG(after_ortho_format_grps,2)
Phylogenetic Tree with Attached Bar Plot and Standard Error Bars
Description
Presents data for each taxon, including standard error bars, next to a phylogenetic tree.
Usage
PhyDataError(phy, data, mcl_matrix, species_colname, data_colname,
  color = NULL, OG = NULL, xlabel = "xlabel", ...)
Arguments
phy | 
 Path to tree file  | 
data | 
 R object of phenotype data  | 
mcl_matrix | 
 AnalyzeOrthoMCL output  | 
species_colname | 
 name of column in data file with taxa designations  | 
data_colname | 
 name of column in data file with data observations  | 
color | 
 optional parameter, (defaults to NULL) assign colors to individual taxa by providing file (format: Taxa | Color)  | 
OG | 
 optional parameter, (defaults to NULL) a string with the name of the chosen group to be colored  | 
xlabel | 
 string to label barplot's x axis  | 
... | 
 argument to be passed from other methods such as parameters from barplot() function  | 
Value
A phylogenetic tree with a barplot of the data (with standard error bars) provided matched by taxa.
Examples
file <- system.file('extdata', 'muscle_tree2.dnd', package='MAGNAMWAR')
PhyDataError(file, pheno_data, mcl_mtrx, species_colname = 'Treatment', data_colname = 'RespVar',
 OG='OG5_126778', xlabel='TAG Content')
Print OG Sequences
Description
Print all protein sequences and annotations in a given OG
Usage
PrintOGSeqs(after_ortho, OG, fasta_dir, out_dir = NULL, outfile = "none")
Arguments
after_ortho | 
 output from FormatAfterOrtho  | 
OG | 
 name of OG  | 
fasta_dir | 
 directory to fastas  | 
out_dir | 
 complete path to output directory  | 
outfile | 
 name of file that will be written to  | 
Value
A fasta file with all protein sequences and ids for a given OG
Examples
## Not run: 
OG <- 'OG5_126968'
dir <- system.file('extdata', 'fasta_dir', package='MAGNAMWAR')
dir <- paste(dir,'/',sep='')
PrintOGSeqs(after_ortho_format, OG, dir)
## End(Not run)
QQPlot
Description
Makes a qqplot of the p-values obtained through AnalyzeOrthoMCL
Usage
QQPlotter(mcl_mtrx)
Arguments
mcl_mtrx | 
 matrix generated by AnalyzeOrthoMCL  | 
Value
a qqplot of the p-values obtained through AnalyzeOrthoMCL
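A QQ plot of p-values compares the sorted observed values with the quantiles expected under a uniform distribution, usually on the -log10 scale. The sketch below builds the two axes in base R from simulated p-values; it shows the general technique only and is not taken from QQPlotter.

```r
# Simulated p-values standing in for an AnalyzeOrthoMCL result.
set.seed(1)
pvals <- runif(200)

# Observed: sorted p-values; expected: uniform quantiles of the same length.
observed <- -log10(sort(pvals))
expected <- -log10(ppoints(length(pvals)))

# Points near the diagonal suggest the p-values behave as expected under the null.
head(cbind(expected, observed))
```

Plotting `observed` against `expected` with `plot()` and adding `abline(0, 1)` gives the familiar diagonal reference line.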
Examples
QQPlotter(mcl_mtrx)
Write RAST files to GenBank format for OrthoMCL Analysis
Description
Useful for reformatting RAST files to GBK format
Usage
RASTtoGBK(input_fasta, input_reference, out_name_path)
Arguments
input_fasta | 
 path to input fasta file  | 
input_reference | 
 path to a .csv file; it should be downloaded from RAST in Excel format and then saved as a .csv (the tab-delimited version has compatibility problems)  | 
out_name_path | 
 name and path of the file to write to  | 
Examples
## Not run: 
lfrc_fasta <- system.file('extdata', 'RASTtoGBK//lfrc.fasta', package='MAGNAMWAR')
lfrc_reference <- system.file('extdata', 'RASTtoGBK//lfrc_lookup.csv', package='MAGNAMWAR')
lfrc_path <- system.file('extdata', 'RASTtoGBK//lfrc_out.fasta', package='MAGNAMWAR')
RASTtoGBK(lfrc_fasta,lfrc_reference,lfrc_path)
## End(Not run)
Append Survival Test Outputs
Description
Function used to append all .csv files that are outputted from AnalyzeOrthoMCL into one matrix.
Usage
SurvAppendMatrix(work_dir, out_name = "surv_matrix.csv", out_dir = NULL)
Arguments
work_dir | 
 the directory where the output files of AnalyzeOrthoMCL are located  | 
out_name | 
 file name of outputted matrix  | 
out_dir | 
 the directory where the outputted matrix is placed  | 
Value
A csv file containing a matrix with the following columns: OG, p-values, Bonferroni corrected p-values, mean phenotype of OG-containing taxa, mean phenotype of OG-lacking taxa, taxa included in OG, taxa not included in OG
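Appending many small result files into one table can be sketched in base R as follows. The temporary directory, file names, and two-column layout are invented for the example; the real per-test files written by AnalyzeOrthoMCL carry more columns, and SurvAppendMatrix may do more than concatenate them.

```r
# Write two small, made-up result files into a fresh temporary directory.
work_dir <- tempfile("surv_parts_")
dir.create(work_dir)
write.csv(data.frame(OG = "OG1", pval1 = 0.01),
          file.path(work_dir, "part1.csv"), row.names = FALSE)
write.csv(data.frame(OG = c("OG2", "OG3"), pval1 = c(0.2, 0.5)),
          file.path(work_dir, "part2.csv"), row.names = FALSE)

# Read every .csv in the directory and stack the rows into one data frame.
files    <- list.files(work_dir, pattern = "\\.csv$", full.names = TRUE)
combined <- do.call(rbind, lapply(files, read.csv, stringsAsFactors = FALSE))
print(combined)

# Save the combined table next to the parts.
write.csv(combined, file.path(work_dir, "surv_matrix.csv"), row.names = FALSE)
```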
Examples
## Not run: 
file <- system.file('extdata', 'outputs', package='MAGNAMWAR')
directory <- paste(file, '/', sep = '')
SurvAppendMatrix(directory)
## End(Not run)
Print analyzed matrix
Description
Writes a tab separated version of the analyzed OrthoMCL data with or without the joined representative sequences
Usage
WriteMCL(mtrx, filename)
Arguments
mtrx | 
 Matrix derived from AnalyzeOrthoMCL  | 
filename | 
 File name to save final output  | 
Value
The path to the written file
Examples
## Not run: 
WriteMCL(mcl_mtrx, 'matrix.tsv')
# mcl_mtrx previously derived from AnalyzeOrthoMCL() or JoinRepSeq()
## End(Not run)
Formatted output of OrthoMCL.
Description
A list created by inputting the output of OrthoMCL clusters into the FormatAfterOrtho function.
Usage
after_ortho_format
Format
List of 2: (1) presence absence matrix, (2) protein ids:
- pa_matrix
 matrix showing taxa presence/absence in OG
- proteins
 matrix listing protein_id contained in each OG
Formatted output of OrthoMCL.
Description
A list created by inputting the output of OrthoMCL clusters into the FormatAfterOrtho function.
Usage
after_ortho_format_grps
Format
List of 2: (1) presence absence matrix, (2) protein ids:
- pa_matrix
 matrix showing taxa presence/absence in OG
- proteins
 matrix listing protein_id contained in each OG
Final output of JoinRepSeq.
Description
A data frame containing the final results of statistical analysis with protein ids, annotations, and sequences added.
Usage
joined_mtrx
Format
A data frame with 17 rows and 11 variables:
- OG
 taxa cluster id, as defined by OrthoMCL
- pval1
 p-value, based on presence absence
- corrected_pval1
 Bonferroni p-value, corrected by number of tests
- mean_OGContain
 mean of all taxa phenotypes in that OG
- mean_OGLack
 mean of all taxa phenotypes not in that OG
- taxa_contain
 taxa in that cluster
- taxa_miss
 taxa not in that cluster
- rep_taxon
 randomly selected representative taxa from the cluster
- rep_id
 protein id, from randomly selected representative taxa
- rep_annot
 fasta annotation, from randomly selected representative taxa
- rep_seq
 AA sequence, from randomly selected representative taxa
Final output of JoinRepSeq.
Description
A data frame containing the final results of statistical analysis with protein ids, annotations, and sequences added.
Usage
joined_mtrx_grps
Format
A data frame with 10 rows and 11 variables:
- OG
 taxa cluster id, as defined by OrthoMCL
- pval1
 p-value, based on presence absence
- corrected_pval1
 Bonferroni p-value, corrected by number of tests
- mean_OGContain
 mean of all taxa phenotypes in that OG
- mean_OGLack
 mean of all taxa phenotypes not in that OG
- taxa_contain
 taxa in that cluster
- taxa_miss
 taxa not in that cluster
- rep_taxon
 randomly selected representative taxa from the cluster
- rep_id
 protein id, from randomly selected representative taxa
- rep_annot
 fasta annotation, from randomly selected representative taxa
- rep_seq
 AA sequence, from randomly selected representative taxa
Final output of AnalyzeOrthoMCL
Description
A matrix containing the final results of statistical analysis.
Usage
mcl_mtrx
Format
A matrix with 17 rows and 7 variables:
- OG
 taxa cluster id, as defined by OrthoMCL
- pval1
 p-value, based on presence absence
- corrected_pval1
 Bonferroni p-value, corrected by number of tests
- mean_OGContain
 mean of all taxa phenotypes in that OG
- mean_OGLack
 mean of all taxa phenotypes not in that OG
- taxa_contain
 taxa in that cluster
- taxa_miss
 taxa not in that cluster
Final output of AnalyzeOrthoMCL
Description
A matrix containing the final results of statistical analysis.
Usage
mcl_mtrx_grps
Format
A matrix with 10 rows and 7 variables:
- OG
 taxa cluster id, as defined by OrthoMCL
- pval1
 p-value, based on presence absence
- corrected_pval1
 Bonferroni p-value, corrected by number of tests
- mean_OGContain
 mean of all taxa phenotypes in that OG
- mean_OGLack
 mean of all taxa phenotypes not in that OG
- taxa_contain
 taxa in that cluster
- taxa_miss
 taxa not in that cluster
Triglyceride (TAG) content of fruit flies dataset.
Description
A subset of the TAG content of fruit flies, collected in the Chaston Lab, to be used as a brief example for tests in AnalyzeOrthoMCL.
Usage
pheno_data
Format
A data frame with 586 rows and 4 variables:
- Treatment
 4-letter taxa designation of associated bacteria
- RespVar
 response variable, TAG content
- Vial
 random effect variable, vial number of flies
- Experiment
 random effect variable, experiment number of flies
Starvation rate of fruit flies dataset.
Description
A subset of the Starvation rate of fruit flies, collected in the Chaston Lab, to be used as a brief example for survival tests in AnalyzeOrthoMCL.
Usage
starv_pheno_data
Format
A matrix with 543 rows and 7 variables:
- EXP
 random effect variable, experiment number of flies
- VIAL
 random effect variable, vial number of flies
- BACLO
 fixed effect variable, loss of bacteria in flies
- TRT
 4-letter taxa designation of associated bacteria
- t1
 time 1
- t2
 time 2
- event
 event