Summary
Guide on how to create igraph objects from pathways imported from Pathway Commons.
Package
graphsim 1.0.2
1 Importing Pathways
1.1 Motivations
Here we demonstrate how to create igraph objects (Csardi and Nepusz 2006) for pathways, in a form compatible with graphsim. Example objects are provided with the package, and these examples contain additional details on how they were imported into R. This guide uses the paxtoolsr package (Luna et al. 2016) from Bioconductor.
Graph objects can carry edge properties (Barabási and Oltvai 2004). Here we show how to define the “state” parameter, which can be used to distinguish inhibiting edges. Following the convention in molecular biology, we use different arrowheads to show these.
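As a minimal sketch of the idea, the base R snippet here tags each edge of a toy edge table with a numeric state. The gene names, the "activates"/"inhibits" labels and the encoding (1 for activating, 2 for inhibiting) are assumptions made for illustration; check the graphsim documentation (for example ?plot_directed) for the encoding it expects.

```r
# Toy edge table: gene names and interaction labels are made up for illustration
edges <- data.frame(
  from = c("GENE_A", "GENE_B", "GENE_C"),
  to = c("GENE_B", "GENE_C", "GENE_A"),
  type = c("activates", "inhibits", "activates"),
  stringsAsFactors = FALSE
)

# Assumed encoding: 1 = activating edge, 2 = inhibiting edge
edges$state <- ifelse(edges$type == "inhibits", 2, 1)
print(edges)
```

Keeping the state as its own column means it can be carried along when the edge table is later turned into a graph.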
1.2 Getting started
The Bioconductor package paxtoolsr, used to import the data, can be installed as follows.
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("paxtoolsr")
To perform these steps, the following packages must be loaded.
library("igraph")
library("graphsim")
library("paxtoolsr")
1.3 Importing data
We will demonstrate downloading a Reactome (https://reactome.org/) pathway (Croft et al. 2014) from Pathway Commons (http://www.pathwaycommons.org/). Reactome pathways are also available in the reactome.db package (https://bioconductor.org/packages/release/data/annotation/html/reactome.db.html; Fabregat et al. 2017; Ligtenberg 2019), but that package contains only gene set information. We use Pathway Commons because it also provides the directed graph structure of the edges.
We download a Pathway Commons release in BioPAX (OWL) format, which is later converted to the Extended Simple Interaction Format (SIF). We use a legacy release (version 7) so that the results can be reproduced.
# Importing data
results <- downloadPc2(version = 7, selectedFileName = "Pathway%20Commons.7.Reactome.BIOPAX.owl.gz")
However, we recommend using the latest version, which is:
print(latest_version)
#> [1] 12
To download the latest release instead, set the version argument of downloadPc2() accordingly (the file name may also differ between releases).
The results will be cached here:
Sys.getenv("PAXTOOLSR_CACHE")
1.3.1 Searching results
We then search Pathway Commons for the pathways of interest. See the paxtoolsr vignette for details.
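With verbose = TRUE, searchPc() prints the web-service URL that each search sends. As a rough sketch of how the query text ends up in that URL, the hypothetical helper build_search_url() rebuilds it with base R's URLencode(). It reproduces the URLs printed in this guide but is an assumption, not paxtoolsr's internal code.

```r
# Hypothetical helper: rebuilds the search URL that searchPc() reports with verbose = TRUE.
# It mirrors the printed URL only; it is not how paxtoolsr builds requests internally.
build_search_url <- function(query, type = "pathway", page = 0) {
  base <- "http://www.pathwaycommons.org/pc2/search.xml"
  # URLencode() turns the space in "PI3K Cascade" into %20
  paste0(base, "?q=", URLencode(query), "&page=", page, "&type=", type)
}

build_search_url("PI3K")
build_search_url("PI3K Cascade")
```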
## Search Pathway Commons for 'PI3K'-related pathways
searchResults <- searchPc(q = "PI3K", type = "pathway", verbose = TRUE)
#> URL: http://www.pathwaycommons.org/pc2/search.xml?q=PI3K&page=0&type=pathway
pathways <- xpathSApply(searchResults, "/searchResponse/searchHit/name", xmlValue)
length(pathways)
#> [1] 100
head(pathways)
#> [1] "PI3K events in ERBB2 signaling"
#> [2] "Class IB PI3K non-lipid kinase events"
#> [3] "Erythropoietin activates Phosphoinositide-3-kinase (PI3K)"
#> [4] "Activated NTRK3 signals through PI3K"
#> [5] "Activated NTRK2 signals through PI3K"
#> [6] "Trk receptor signaling mediated by PI3K and PLC-gamma"
1.3.2 Downloading a Pathway
We can create a local OWL file for a single pathway as follows. First we search for the pathway by name.
## Search Pathway Commons for the 'PI3K Cascade' pathway
searchResults <- searchPc(q = "PI3K Cascade", type = "pathway", verbose = TRUE)
#> URL: http://www.pathwaycommons.org/pc2/search.xml?q=PI3K%20Cascade&page=0&type=pathway
pathways <- xpathSApply(searchResults, "/searchResponse/searchHit/name", xmlValue)
length(pathways)
#> [1] 100
head(pathways)
#> [1] "PI3K Cascade"
#> [2] "IGF1R signaling cascade"
#> [3] "IRS-mediated signalling"
#> [4] "LPA receptor mediated events"
#> [5] "Insulin receptor signalling cascade"
#> [6] "EPHA2 forward signaling"
We then select the first pathway.
pathway <- pathways[1]
pathway
#> [1] "PI3K Cascade"
To save it to a local OWL file, we first convert the search results to a data frame and keep only the columns we need.
library("plyr")
#>
#> Attaching package: 'plyr'
#> The following object is masked from 'package:paxtoolsr':
#>
#> summarize
#convert to data frame
searchResultsDf <- ldply(xmlToList(searchResults), data.frame)
dim(searchResultsDf)
#> [1] 105 22
# Simplified results
simplifiedSearchResultsDf <- searchResultsDf[, c("name", "uri", "biopaxClass")]
head(simplifiedSearchResultsDf)
#> name
#> 1 PI3K Cascade
#> 2 IGF1R signaling cascade
#> 3 IRS-mediated signalling
#> 4 LPA receptor mediated events
#> 5 Insulin receptor signalling cascade
#> 6 EPHA2 forward signaling
#> uri
#> 1 https://identifiers.org/reactome/R-HSA-109704
#> 2 https://identifiers.org/reactome/R-HSA-2428924
#> 3 https://identifiers.org/reactome/R-HSA-112399
#> 4 http://pathwaycommons.org/pc12/Pathway_ebbd43e6d7ede5ba46b0b03c4566c06f
#> 5 https://identifiers.org/reactome/R-HSA-74751
#> 6 http://pathwaycommons.org/pc12/Pathway_41112300a6e2adfd271ada175fc3f63d
#> biopaxClass
#> 1 Pathway
#> 2 Pathway
#> 3 Pathway
#> 4 Pathway
#> 5 Pathway
#> 6 Pathway
We then look up the URI of the Reactome "PI3K Cascade" pathway and write that pathway to a local file.
## Use an XPath expression to extract the results of interest. In this case, the
## URIs (IDs) for the pathways from the results
tmpSearchResults <- xpathSApply(searchResults, "/searchResponse/searchHit/uri", xmlValue)
## Name of the local file to save the pathway into
biopaxFile <- "bioxpax-reactome-pi3k-cascade.owl"
## Extract a URI for a pathway in the search results and save into a file
idx <- which(grepl("reactome", simplifiedSearchResultsDf$uri) &
             grepl("PI3K Cascade", simplifiedSearchResultsDf$name, ignore.case = TRUE))
uri <- simplifiedSearchResultsDf$uri[idx]
saveXML(getPc(uri, format = "BIOPAX"), file = biopaxFile)
#> [1] "bioxpax-reactome-pi3k-cascade.owl"
1.3.3 Create SIF object
We convert the BioPAX file to the Extended Simple Interaction Format (SIF). This gives a table of nodes (genes, proteins and small molecules) and a table of edges describing the relationships between them.
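To make the shape of this result concrete, here is a small hypothetical stand-in for the list returned by toSifnx(), using the column names that appear in the output in this guide. The identifiers and interactions are made up; the point is that each edge row links two node identifiers through one interaction type.

```r
# Hypothetical stand-in for the list returned by toSifnx(); column names follow
# the output shown in this guide, identifiers are made up
toySIF <- list(
  nodes = data.frame(
    PARTICIPANT = c("P1", "P2", "SM1"),
    PARTICIPANT_TYPE = c("ProteinReference", "ProteinReference", "SmallMoleculeReference"),
    stringsAsFactors = FALSE
  ),
  edges = data.frame(
    PARTICIPANT_A = c("P1", "P2"),
    INTERACTION_TYPE = c("in-complex-with", "controls-production-of"),
    PARTICIPANT_B = c("P2", "SM1"),
    stringsAsFactors = FALSE
  )
)

cat("nodes:", nrow(toySIF$nodes), "\n")
cat("edges:", nrow(toySIF$edges), "\n")

# Every edge endpoint should be listed in the node table
endpoints <- c(toySIF$edges$PARTICIPANT_A, toySIF$edges$PARTICIPANT_B)
all(endpoints %in% toySIF$nodes$PARTICIPANT)
```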
resultsSIF <- toSifnx(inputFile = biopaxFile)
print(paste(c("nodes:", nrow(resultsSIF$nodes))))
#> [1] "nodes:" "44"
print(paste(c("edges:", nrow(resultsSIF$edges))))
#> [1] "edges:" "1219"
With node properties:
results.nodesDF <- as.data.frame(resultsSIF$nodes)
head(results.nodesDF)
#> PARTICIPANT PARTICIPANT_TYPE PARTICIPANT_NAME
#> 1 Q8NEB9 ProteinReference PK3C3_HUMAN
#> 2 Q06124 ProteinReference PTN11_HUMAN
#> 3 Q9UEF7 ProteinReference KLOT_HUMAN
#> 4 Q99570 ProteinReference PI3R4_HUMAN
#> 5 O95750 ProteinReference FGF19_HUMAN
#> 6 P12034 ProteinReference FGF5_HUMAN
#> UNIFICATION_XREF RELATIONSHIP_XREF
#> 1 uniprot knowledgebase:Q8NEB9 hgnc symbol:PIK3C3
#> 2 uniprot knowledgebase:Q06124 hgnc symbol:PTPN11
#> 3 uniprot knowledgebase:Q9UEF7 hgnc symbol:KL
#> 4 uniprot knowledgebase:Q99570 hgnc symbol:PIK3R4
#> 5 uniprot knowledgebase:O95750 hgnc symbol:FGF19
#> 6 uniprot knowledgebase:P12034 hgnc symbol:FGF5
With edge properties:
results.edgesDF <- as.data.frame(resultsSIF$edges)
head(results.edgesDF)
#> PARTICIPANT_A
#> 1 1-phosphatidyl-1D-myo-inositol 4,5-bisphosphate
#> 2 1-phosphatidyl-1D-myo-inositol 4,5-bisphosphate
#> 3 1-phosphatidyl-1D-myo-inositol 4,5-bisphosphate
#> 4 1-phosphatidyl-1D-myo-inositol 4,5-bisphosphate
#> 5 1-phosphatidyl-1D-myo-inositol 4,5-bisphosphate
#> 6 1-phosphatidyl-1D-myo-inositol 4,5-bisphosphate
#> INTERACTION_TYPE
#> 1 used-to-produce
#> 2 used-to-produce
#> 3 reacts-with
#> 4 consumption-controlled-by
#> 5 consumption-controlled-by
#> 6 consumption-controlled-by
#> PARTICIPANT_B
#> 1 1-phosphatidyl-1D-myo-inositol 3,4,5-trisphosphate
#> 2 ADP
#> 3 ATP
#> 4 O00459
#> 5 O15520
#> 6 O43320
#> INTERACTION_DATA_SOURCE
#> 1 Reactome
#> 2 Reactome
#> 3 Reactome
#> 4 Reactome
#> 5 Reactome
#> 6 Reactome
#> INTERACTION_PUBMED_ID PATHWAY_NAMES
#> 1 12660731;1381348;16847462;19805105;21827948;7543144 PI3K Cascade
#> 2 12660731;1381348;16847462;19805105;21827948;7543144 PI3K Cascade
#> 3 12660731;1381348;16847462;19805105;21827948;7543144 PI3K Cascade
#> 4 12660731;1381348;16847462;19805105;21827948;7543144 PI3K Cascade
#> 5 12660731;1381348;16847462;19805105;21827948;7543144 PI3K Cascade
#> 6 12660731;1381348;16847462;19805105;21827948;7543144 PI3K Cascade
We see that the edges come from a single Reactome pathway and data source.
table(results.edgesDF$PATHWAY_NAMES)
#>
#> PI3K Cascade
#> 902
table(results.edgesDF$INTERACTION_DATA_SOURCE)
#>
#> Reactome
#> 1218
Edges are defined by several interaction types, some of which are directional.
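A sketch of how directed and undirected interaction types could be told apart. Treating in-complex-with, interacts-with, neighbor-of and reacts-with as undirected is our assumption based on the Pathway Commons SIF documentation. Please verify it against the current documentation before relying on it.

```r
# Assumed grouping: these SIF interaction types are symmetric (undirected);
# all other types are read as directed, from PARTICIPANT_A to PARTICIPANT_B
undirected_types <- c("in-complex-with", "interacts-with", "neighbor-of", "reacts-with")

# The interaction types that occur in this pathway
types <- c("chemical-affects", "consumption-controlled-by", "controls-production-of",
           "in-complex-with", "neighbor-of", "reacts-with", "used-to-produce")

# TRUE for types assumed to carry a direction
is_directed_type <- setNames(!(types %in% undirected_types), types)
print(is_directed_type)
```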
table(results.edgesDF$INTERACTION_TYPE)
#>
#> chemical-affects consumption-controlled-by
#> 30 78
#> controls-production-of in-complex-with
#> 78 286
#> neighbor-of reacts-with
#> 741 1
#> used-to-produce
#> 4
1.3.4 Filtering genes and metabolites
We can then optionally filter out edges that do not involve genes or proteins. One way is to remove the interaction types that involve metabolites.
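The three subsetting steps for this filter can also be written as one %in% test against a vector of excluded types. A sketch on a made-up edge table:

```r
# Toy edge table with made-up identifiers
edges <- data.frame(
  PARTICIPANT_A = c("P1", "P2", "SM1", "P3"),
  INTERACTION_TYPE = c("in-complex-with", "chemical-affects", "reacts-with", "controls-production-of"),
  PARTICIPANT_B = c("P2", "SM1", "SM2", "SM1"),
  stringsAsFactors = FALSE
)

# Interaction types involving metabolites, as removed in this section
metabolite_types <- c("chemical-affects", "reacts-with", "used-to-produce")

# Keep rows whose type is not in the excluded set (one subset instead of three)
edges <- edges[!edges$INTERACTION_TYPE %in% metabolite_types, ]
table(edges$INTERACTION_TYPE)
```

One difference from the != version: %in% never returns NA, so a row with a missing interaction type is kept as is rather than turned into a row of NA values.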
results.edgesDF <- results.edgesDF[results.edgesDF$INTERACTION_TYPE != "chemical-affects",]
results.edgesDF <- results.edgesDF[results.edgesDF$INTERACTION_TYPE != "reacts-with",]
results.edgesDF <- results.edgesDF[results.edgesDF$INTERACTION_TYPE != "used-to-produce",]
table(results.edgesDF$INTERACTION_TYPE)
#>
#> consumption-controlled-by controls-production-of
#> 78 78
#> in-complex-with neighbor-of
#> 286 741
Ions can be removed as follows while retaining the other metabolites.
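The ion filter tests each endpoint column against each ion separately. A sketch of the same filter as a single condition, on a toy table where "2+" and "3+" stand in for ion nodes as in this section:

```r
# Toy edge table with made-up identifiers; "2+" and "3+" play the role of ions
edges <- data.frame(
  PARTICIPANT_A = c("P1", "2+", "P2", "P3"),
  INTERACTION_TYPE = c("in-complex-with", "neighbor-of", "neighbor-of", "in-complex-with"),
  PARTICIPANT_B = c("P2", "P1", "3+", "P1"),
  stringsAsFactors = FALSE
)
ions <- c("2+", "3+")

# Drop an edge if either endpoint is an ion; keep the first three columns
keep <- !(edges$PARTICIPANT_A %in% ions | edges$PARTICIPANT_B %in% ions)
edges <- edges[keep, 1:3]
print(edges)
```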
results.edgesDF <- results.edgesDF[results.edgesDF[,1] != "2+",1:3]
results.edgesDF <- results.edgesDF[results.edgesDF[,3] != "2+",1:3]
results.edgesDF <- results.edgesDF[results.edgesDF[,1] != "3+",1:3]
results.edgesDF <- results.edgesDF[results.edgesDF[,3] != "3+",1:3]
Alternatively, we can screen the nodes for proteins.
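The protein screen keeps an edge only when both of its endpoints are protein nodes. A toy version of that logic, with made-up identifiers:

```r
# Made-up node and edge tables using the SIF column names
nodes <- data.frame(
  PARTICIPANT = c("P1", "P2", "P3", "SM1"),
  PARTICIPANT_TYPE = c("ProteinReference", "ProteinReference", "ProteinReference",
                       "SmallMoleculeReference"),
  stringsAsFactors = FALSE
)
edges <- data.frame(
  PARTICIPANT_A = c("P1", "P2", "SM1"),
  INTERACTION_TYPE = c("in-complex-with", "controls-production-of", "chemical-affects"),
  PARTICIPANT_B = c("P2", "SM1", "P3"),
  stringsAsFactors = FALSE
)

# Identifiers of protein nodes only
proteins <- nodes$PARTICIPANT[nodes$PARTICIPANT_TYPE == "ProteinReference"]

# An edge survives only if both of its endpoints are proteins
edges <- edges[edges$PARTICIPANT_A %in% proteins & edges$PARTICIPANT_B %in% proteins, ]
print(edges)
```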
table(results.nodesDF$PARTICIPANT_TYPE)
#>
#> ProteinReference SmallMoleculeReference
#> 39 5
# Extract protein nodes
results.nodesDF <- results.nodesDF[results.nodesDF$PARTICIPANT_TYPE == "ProteinReference",]
# Keep only edges whose endpoints are both protein nodes
results.edgesDF <- results.edgesDF[results.edgesDF$PARTICIPANT_A %in% results.nodesDF$PARTICIPANT,]
results.edgesDF <- results.edgesDF[results.edgesDF$PARTICIPANT_B %in% results.nodesDF$PARTICIPANT,]
print(paste(c("nodes:", nrow(results.nodesDF))))
#> [1] "nodes:" "39"
print(paste(c("edges:", nrow(results.edgesDF))))
#> [1] "edges:" "1027"
1.3.5 Creating an igraph object
We then create an edge list from the SIF object. First, we map the participant identifiers used in the edges to gene symbols.
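A sketch of the identifier-to-symbol mapping on a made-up node table. One detail matters here: match() returns positions in the table it searches, so the lookup must use the same node table, in the same row order, that gene_names was built from. Otherwise the names shift onto the wrong participants.

```r
# Made-up node table mimicking resultsSIF$nodes; the small molecule has no HGNC symbol
nodes <- data.frame(
  PARTICIPANT = c("P1", "SM1", "P2"),
  PARTICIPANT_NAME = c("PROT1_HUMAN", "ATP", "PROT2_HUMAN"),
  RELATIONSHIP_XREF = c("hgnc symbol:GENE1", "", "hgnc symbol:GENE2"),
  stringsAsFactors = FALSE
)

# Start from the display names, then use the HGNC symbol wherever one is given
gene_names <- nodes$PARTICIPANT_NAME
has_symbol <- grepl("hgnc symbol:", nodes$RELATIONSHIP_XREF, fixed = TRUE)
gene_names[has_symbol] <- vapply(
  strsplit(nodes$RELATIONSHIP_XREF[has_symbol], ":", fixed = TRUE),
  function(x) x[2], character(1)
)

# Look up edge endpoints in the same table gene_names was built from,
# so the positions returned by match() line up with gene_names
edge_ids <- c("P2", "P1", "SM1")
mapped <- gene_names[match(edge_ids, nodes$PARTICIPANT)]
print(mapped)
```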
#extract names
gene_names <- resultsSIF$nodes$PARTICIPANT_NAME
#replace with gene symbol (if defined)
gene_names[grep("hgnc symbol:", resultsSIF$nodes$RELATIONSHIP_XREF)] <- sapply(strsplit(grep("hgnc symbol:", resultsSIF$nodes$RELATIONSHIP_XREF, value = TRUE), ":"), function(x) x[2])
gene_names
#> [1] "PIK3C3"
#> [2] "PTPN11"
#> [3] "KL"
#> [4] "PIK3R4"
#> [5] "FGF19"
#> [6] "FGF5"
#> [7] "ADP"
#> [8] "FGF3"
#> [9] "FLT3"
#> [10] "FGFR2"
#> [11] "GRB2"
#> [12] "FRS2"
#> [13] "FGF8"
#> [14] "FGF7"
#> [15] "FGF1"
#> [16] "GAB1"
#> [17] "PIK3R1"
#> [18] "FGF2"
#> [19] "FGF4"
#> [20] "FGFR4"
#> [21] "FGF16"
#> [22] "PIK3R2"
#> [23] "FGF22"
#> [24] "FGF23"
#> [25] "FGF6"
#> [26] "FGF20"
#> [27] "FLT3LG"
#> [28] "IRS2"
#> [29] "TLR9"
#> [30] "KLB"
#> [31] "IRS1"
#> [32] "GAB2"
#> [33] "1-phosphatidyl-1D-myo-inositol 3,4,5-trisphosphate"
#> [34] "FGFR1"
#> [35] "FGF17"
#> [36] "FGF18"
#> [37] "heparan sulfate"
#> [38] "1-phosphatidyl-1D-myo-inositol 4,5-bisphosphate"
#> [39] "FGF9"
#> [40] "FGFR3"
#> [41] "FGF10"
#> [42] "PIK3CB"
#> [43] "ATP"
#> [44] "PIK3CA"
We then match the gene symbols to the edge participants.
# Match against the full node table: gene_names follows the row order of resultsSIF$nodes
results.edgesDF$PARTICIPANT_A <- gene_names[match(results.edgesDF$PARTICIPANT_A, resultsSIF$nodes$PARTICIPANT)]
results.edgesDF$PARTICIPANT_B <- gene_names[match(results.edgesDF$PARTICIPANT_B, resultsSIF$nodes$PARTICIPANT)]
head(results.edgesDF[, c(1, 3)])
#> PARTICIPANT_A PARTICIPANT_B
#> 86 FGF16 heparan sulfate
#> 87 FGF16 heparan sulfate
#> 88 FGF16 FGFR4
#> 89 FGF16 FGFR4
#> 90 FGF16 1-phosphatidyl-1D-myo-inositol 3,4,5-trisphosphate
#> 91 FGF16 1-phosphatidyl-1D-myo-inositol 3,4,5-trisphosphate
We then create an igraph object from the SIF edge list.
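A minimal igraph sketch with a made-up edge list of gene symbols. Repeated rows in the SIF edge table become parallel edges in the resulting graph; igraph's simplify() can merge them if that is not wanted.

```r
library("igraph")

# Made-up edge list: one row per directed edge, first column = source, second = target
el <- matrix(c("GENE1", "GENE2",
               "GENE1", "GENE2",
               "GENE2", "GENE3"), ncol = 2, byrow = TRUE)
g <- graph_from_edgelist(el, directed = TRUE)

vcount(g)        # number of distinct genes
ecount(g)        # number of edges, duplicates included
any_multiple(g)  # TRUE because GENE1 -> GENE2 appears twice
```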
library("igraph")
g <- graph_from_edgelist(as.matrix(results.edgesDF[,c(1, 3)]))
g
#> IGRAPH d711b49 DN-- 39 1027 --
#> + attr: name (v/c)
#> + edges from d711b49 (vertex names):
#> [1] FGF16->heparan sulfate
#> [2] FGF16->heparan sulfate
#> [3] FGF16->FGFR4
#> [4] FGF16->FGFR4
#> [5] FGF16->1-phosphatidyl-1D-myo-inositol 3,4,5-trisphosphate
#> [6] FGF16->1-phosphatidyl-1D-myo-inositol 3,4,5-trisphosphate
#> [7] FGF16->FGFR1
#> [8] FGF16->FGFR1
#> + ... omitted several edges
library("graphsim")
plot_directed(g, arrow_clip = 0.25, col.arrow = "grey75", cex.arrow = 0.5, fill.node = "lightblue", cex.node = 1.25)
2 Session info
Here is the output of sessionInfo() on the system on which this document was compiled running pandoc 2.1:
#> R version 4.0.2 (2020-06-22)
#> Platform: x86_64-apple-darwin17.0 (64-bit)
#> Running under: macOS Catalina 10.15.7
#>
#> Matrix products: default
#> BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
#> LAPACK: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libLAPACK.dylib
#>
#> locale:
#> [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods
#> [7] base
#>
#> other attached packages:
#> [1] plyr_1.8.6 paxtoolsr_1.22.0 XML_3.99-0.5
#> [4] rJava_0.9-13 graphsim_1.0.2 igraph_1.2.6.9001
#>
#> loaded via a namespace (and not attached):
#> [1] Rcpp_1.0.5 pillar_1.4.7 compiler_4.0.2
#> [4] BiocManager_1.30.10 R.methodsS3_1.8.1 bitops_1.0-6
#> [7] prettydoc_0.4.0 R.utils_2.10.1 tools_4.0.2
#> [10] digest_0.6.27 tibble_3.0.4 lifecycle_0.2.0
#> [13] jsonlite_1.7.1 evaluate_0.14 lattice_0.20-41
#> [16] pkgconfig_2.0.3 rlang_0.4.8 Matrix_1.2-18
#> [19] curl_4.3 yaml_2.2.1 mvtnorm_1.1-1
#> [22] xfun_0.19 stringr_1.4.0 httr_1.4.2
#> [25] knitr_1.30 vctrs_0.3.5 hms_0.5.3
#> [28] gtools_3.8.2 caTools_1.18.0 grid_4.0.2
#> [31] R6_2.5.0 rmarkdown_2.5 readr_1.4.0
#> [34] magrittr_2.0.1 ellipsis_0.3.1 gplots_3.1.0
#> [37] htmltools_0.5.0 matrixcalc_1.0-3 KernSmooth_2.23-18
#> [40] stringi_1.5.3 rjson_0.2.20 crayon_1.3.4
#> [43] R.oo_1.24.0
3 References
Barabási, A. L., and Oltvai, Z. N. 2004. “Network Biology: Understanding the Cell’s Functional Organization.” Nat Rev Genet 5 (2): 101–13.
Croft, D., Mundo, A. F., Haw, R., Milacic, M., Weiser, J., Wu, G., Caudy, M., et al. 2014. “The Reactome pathway knowledgebase.” Nucleic Acids Res 42 (database issue): D472–D477. https://doi.org/10.1093/nar/gkt1102.
Csardi, G., and Nepusz, T. 2006. “The Igraph Software Package for Complex Network Research.” InterJournal Complex Systems: 1695. https://igraph.org/.
Fabregat, A., Sidiropoulos, K., Viteri, G. et al. 2017. “Reactome pathway analysis: a high-performance in-memory approach.” BMC Bioinformatics 18: 142. https://doi.org/10.1186/s12859-017-1559-2.
Ligtenberg, W. 2019. “reactome.db: A set of annotation maps for reactome.” R package version 1.68.0. https://bioconductor.org/packages/release/data/annotation/html/reactome.db.html.
Luna, A., Babur, Ö., Aksoy, B. A., Demir, E., and Sander, C. 2016. “PaxtoolsR: Pathway Analysis in R Using Pathway Commons.” Bioinformatics 32 (8): 1262–4. https://doi.org/10.1093/bioinformatics/btv733.