Network construction of genome-wide epistatic interaction for plant height traits in Arabidopsis thaliana
Abstract:
Objective Using plant height in Arabidopsis thaliana and epistatic network models as the basis, this study constructed interaction and regulatory networks at different levels to explore how multiple genes interact within complex networks during plant growth and development.
Method Eighty-four recombinant inbred lines (RILs) of Arabidopsis thaliana were used as experimental material, yielding 417 495 single nucleotide polymorphisms (SNPs) and plant height growth data at 8 time points. Functional mapping was used to test the association between the sequenced genotypes and the plant height trait. Then, combining the concept of modularity from systems biology with the idea of dimensionality reduction from statistics, sparse, directed, and quantifiable epistatic interaction networks were constructed among modules and among genes on the basis of a system of ordinary differential equations. Finally, The Arabidopsis Information Resource (TAIR) database was used for enrichment analysis and functional annotation of candidate genes in the different functional modules.
Result In the macroscopic genetic regulatory network, most functional modules exerted positive regulation during Arabidopsis development, and their interaction strategies changed over time. In the microscopic regulatory network, AT4G29140, a gene closely associated with structural development, up-regulated all other loci in the network and was down-regulated only by senescence-related genes. AT4G36910, a gene involved in maintaining cellular homeostasis, did not actively regulate other loci, and its expression depended heavily on control by other genes. AT4G22680 may exert its regulatory function by regulating the expression of RP1.
Conclusion From the perspectives of association analysis and complex networks, this study explored the epistatic mechanisms affecting the growth of Arabidopsis thaliana and provides a new method and approach for dissecting the genetic architecture of plants.
Key words:
- complex trait
- genetic variance
- statistical model
- epistatic network
- Arabidopsis thaliana
Figure 1. Significant SNPs identified by functional mapping and gene annotations
A, p-values of the SNPs on each chromosome and the significant QTLs, computed by functional mapping (FunMap); the red line marks the significance threshold determined by the Bonferroni method. B, gene annotation of the significant loci.
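As a rough illustration of the Bonferroni threshold line in panel A, the sketch below divides an assumed family-wise error rate by the number of SNP tests. The SNP count of 417 495 comes from the abstract; the error rate of 0.05 and the function name `bonferroni_threshold` are assumptions made for illustration, not values stated by the authors.

```python
import math


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


n_snps = 417_495   # SNP count reported in the abstract
alpha = 0.05       # assumed family-wise error rate (not given in the source)

threshold = bonferroni_threshold(alpha, n_snps)
# Height of the threshold line on a -log10(p) Manhattan plot.
neg_log10_threshold = -math.log10(threshold)

print(f"per-SNP threshold: {threshold:.3e}")
print(f"-log10 threshold:  {neg_log10_threshold:.2f}")
```

Under these assumptions the per-SNP threshold is on the order of 1e-7, so a SNP would be flagged only when its -log10(p) lies above roughly 6.9.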
Figure 2. Functional clustering of dynamic genetic variance
A, the 25 functional modules regulating plant height growth in Arabidopsis thaliana, identified by functional clustering; blue lines show the mean genetic variance (cluster centers) and gray lines show the actual genetic variances. B, number of biological processes involved in each module.
Figure 3. Macro genetic network of plant height growth in Arabidopsis thaliana
A, macro genetic regulatory network among the 25 modules; red and blue arrows denote up-regulation and down-regulation, respectively, and line thickness indicates the strength of the interaction. B, numbers of incoming and outgoing links for each network module.
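The link statistics in panel B can be sketched from a signed, weighted adjacency matrix. The example below is a minimal illustration with a hypothetical 3-module matrix; the convention that `adj[i, j]` holds the effect of module j on module i, and the helper name `link_counts`, are assumptions rather than details given in the source.

```python
import numpy as np


def link_counts(adj, tol=0.0):
    """Count incoming and outgoing links per node.

    Assumed convention: adj[i, j] is the effect of node j on node i,
    positive for up-regulation and negative for down-regulation.
    """
    adj = np.asarray(adj, dtype=float)
    links = np.abs(adj) > tol          # True where a directed link exists
    np.fill_diagonal(links, False)     # ignore self-effects
    incoming = links.sum(axis=1)       # links pointing at each node
    outgoing = links.sum(axis=0)       # links leaving each node
    return incoming, outgoing


# Hypothetical 3-module network (values are illustrative only).
adj = np.array([
    [0.0, 0.8, 0.0],    # module 1 is up-regulated by module 2
    [-0.3, 0.0, 0.0],   # module 2 is down-regulated by module 1
    [0.5, 0.2, 0.0],    # module 3 is up-regulated by modules 1 and 2
])

incoming, outgoing = link_counts(adj)
n_up = int((adj > 0).sum())     # number of up-regulating links
n_down = int((adj < 0).sum())   # number of down-regulating links

print("incoming:", incoming.tolist())   # [1, 1, 2]
print("outgoing:", outgoing.tolist())   # [2, 2, 0]
print("up-regulating:", n_up, "down-regulating:", n_down)
```

In this toy network, module 3 only receives links and sends none, which is the kind of passive role the abstract describes for AT4G36910 at the gene level.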
Figure 4. Module dynamic genetic variance curves fitted by Legendre family of orthogonal polynomials
The mean genetic variance of each module (green line) is decomposed into an independent effect curve (red line) and dependent effect curves regulated by other modules (blue lines).
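To make the Legendre fitting step more concrete, the sketch below fits a smooth curve observed at 8 time points with a low-order Legendre expansion using NumPy. It covers only the curve-fitting part, not the authors' decomposition into independent and dependent effects. The polynomial order, the toy curve, and the helper names are assumptions chosen for illustration.

```python
import numpy as np
from numpy.polynomial import legendre as leg


def scale_to_unit_interval(t):
    """Map time points linearly onto [-1, 1], where Legendre polynomials are orthogonal."""
    t = np.asarray(t, dtype=float)
    return 2.0 * (t - t.min()) / (t.max() - t.min()) - 1.0


def fit_legendre(t, y, order):
    """Least-squares fit of y(t) with Legendre polynomials up to the given order."""
    x = scale_to_unit_interval(t)
    coef = leg.legfit(x, y, order)     # coefficients of P0, P1, ..., P_order
    fitted = leg.legval(x, coef)       # fitted curve at the observed time points
    return coef, fitted


t = np.arange(1, 9)                    # 8 time points (units are hypothetical)
x = scale_to_unit_interval(t)

# Toy "genetic variance" curve built as 1.0*P0 + 0.5*P1 + 0.25*P2.
y = 1.0 + 0.5 * x + 0.25 * (1.5 * x**2 - 0.5)

coef, fitted = fit_legendre(t, y, order=3)
print(np.round(coef, 6))               # recovered Legendre coefficients
```

Because the toy curve is itself a combination of the first three Legendre polynomials, the fit recovers its coefficients up to floating-point error, with the third-order coefficient close to zero. Real variance curves would need an order chosen by a model selection criterion.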
Figure 5. Micro regulatory networks of plant height growth in Arabidopsis thaliana
A, gene regulatory network of sub-module Sub-M3 in module 8 and the regulatory relationships of the significant gene AT4G22680 (orange) within it. B, gene regulatory network of sub-module Sub-M5 in module 8 and the regulatory relationships of the significant genes AT4G29140 and AT4G36910 (orange) within it. Dots of different colors represent loci in the module: red loci down-regulate (inhibit) other loci in the network, blue loci up-regulate (promote) other loci, and green loci are those regulated by the significant genes.
Table 1. Distribution of major biological processes in modules

| Biological process | GO ID | Module |
| --- | --- | --- |
| Cellular process | GO:0009987 | 1,2,5,6,8,9,10,11,12,13,14,16,17,18,20,21,22,23,24,25 |
| Cell communication | GO:0007154 | 2,9,10,11,12,14,16,18,20,22,24 |
| Negative regulation of metabolic process | GO:0009892 | 18 |
| Cellular response to stimulus | GO:0051716 | 1,2,6,8,9,11,12,13,14,16,18,20,22,24 |
| Leaf development | GO:0048366 | 1,7,18,23 |
| Plant organ development | GO:0099402 | 2,6,7,9,11,12,13,14,15,16,17,18,20,22,23,24,25 |
| Intracellular signal transduction | GO:0035556 | 11,16 |
| Small molecule metabolic process | GO:0044281 | 1,6,10,11,14,16,18,20,21,22,24,25 |
| Biological regulation | GO:0065007 | 1,2,5,6,8,9,10,11,12,13,14,16,17,18,20,21,22,23,24,25 |
| Homeostatic process | GO:0042592 | 3,4,6,19 |
| Regulation of gene expression | GO:0010468 | 1,2,8,9,10,11,12,13,14,16,18,20,22,24 |
| Response to hormone | GO:0009725 | 1,2,8,9,10,11,12,13,14,16,17,18,20,22,24 |
| Regulation of transcription, DNA-templated | GO:0006355 | 2,8,9,10,11,12,13,14,16,18,20,22,24 |
| Root development | GO:0048364 | 2,3,11,14,15,18,19,20,22 |