# Parental Nutrition and Genetic Risk of Childhood Obesity: Exploring Hypothetical Interventions with Causal Inference Methods

### Methods:

#### Sample

The participants in this study are a subsample of adolescents from the population-based ALSPAC cohort, which recruited pregnant women in the South West of England [25, 26]. All pregnant women expected to give birth between April 1, 1991 and December 31, 1992 were invited to participate in the original cohort. Initially, 14,451 pregnant women enrolled, and 13,988 children were still alive at the end of the first year. To ensure independence of individuals, one sibling per set of multiple births (*N* = 203 sets) was randomly selected for inclusion in our sample. For these analyses, the final subsample comprised participants with data on exposure, mediators, and outcome (defined below; *N* = 4248). Please note that the study website contains details of all available data through a fully searchable data dictionary and variable search tool: http://www.bristol.ac.uk/alspac/researchers/our-data/.

#### Measures

### Exposure

Genotype data were available for 9,915 children out of a total of 15,247 ALSPAC participants. Participants were genotyped on the genome-wide Illumina HumanHap550 quad chip. Individuals with disproportionate levels of individual missingness (i.e., >3%), insufficient sample replication (identity by descent < 0.8), biological sex mismatch, or non-European ancestry (as defined by multidimensional scaling using the HapMap Phase II, release 22, reference populations) were excluded. SNPs with a minor allele frequency (MAF) of <1%, a call rate of <95%, or a departure from Hardy-Weinberg equilibrium (\(p < 5 \times 10^{-7}\)) were removed. Imputation was performed with Impute3 using HRC 1.0 as the reference panel [27], and phasing was performed using ShapeIT (v2.r644). Finally, post-imputation quality checks were performed: all SNPs with MAF < 1%, an Impute3 information quality metric < 0.8, or a departure from Hardy-Weinberg equilibrium (\(p < 5 \times 10^{-7}\)) were removed. After data cleaning, a total of 8,654 individuals and 4,054,653 SNPs remained eligible for analysis.
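The SNP-level thresholds described above can be written as a simple filter. The following is a minimal sketch in Python, not the pipeline used in the study: the `SnpStats` record, its field names, and the example SNPs are hypothetical, and only the numeric thresholds come from the text.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds taken from the text; everything else here is illustrative.
MAF_MIN = 0.01         # minor allele frequency
CALL_RATE_MIN = 0.95   # genotype call rate
HWE_P_MIN = 5e-7       # Hardy-Weinberg equilibrium test p-value
INFO_MIN = 0.8         # imputation information metric (post-imputation only)


@dataclass
class SnpStats:
    """Hypothetical per-SNP summary record."""
    snp_id: str
    maf: float
    call_rate: float
    hwe_p: float
    info: Optional[float] = None  # None for genotyped, not-yet-imputed SNPs


def passes_qc(s: SnpStats) -> bool:
    """Return True if the SNP survives all filters."""
    if s.maf < MAF_MIN or s.call_rate < CALL_RATE_MIN or s.hwe_p < HWE_P_MIN:
        return False
    if s.info is not None and s.info < INFO_MIN:
        return False
    return True


snps = [
    SnpStats("snp_a", maf=0.20, call_rate=0.99, hwe_p=0.30, info=0.95),
    SnpStats("snp_b", maf=0.005, call_rate=0.99, hwe_p=0.30, info=0.95),  # too rare
    SnpStats("snp_c", maf=0.20, call_rate=0.99, hwe_p=1e-9, info=0.95),   # fails HWE
    SnpStats("snp_d", maf=0.20, call_rate=0.99, hwe_p=0.30, info=0.50),   # poorly imputed
]
kept = [s.snp_id for s in snps if passes_qc(s)]
print(kept)
```

In practice such filters are applied by dedicated genetics software; the sketch only makes the inclusion logic explicit.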

Polygenic scores (PGS) were derived from summary statistics from the Genetic Investigation of Anthropometric Traits consortium, which served as the discovery cohort [11]. PGS were calculated using a high-dimensional Bayesian regression framework, which places a continuous shrinkage prior on the effect sizes of the included single nucleotide polymorphisms (SNPs) [23]. This method has the advantage that researchers can include all available SNPs in the PGS, without clumping or choosing a *p*-value threshold for inclusion. It has been shown to outperform other polygenic scoring methods in terms of the amount of variance explained [28]. The final PGS included 754,458 SNPs.
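Once per-SNP weights are available, a polygenic score is conventionally a weighted sum of effect-allele dosages. The sketch below illustrates only that final step in Python with NumPy. The dosages and weights are made-up numbers, the shrinkage step that produces the weights is assumed to have been run already, and the standardization at the end is an optional convenience that the text does not describe.

```python
import numpy as np

# Rows are individuals, columns are SNPs; entries are effect-allele dosages (0 to 2).
dosages = np.array([
    [0.0, 1.0, 2.0],
    [2.0, 2.0, 0.0],
    [1.0, 0.0, 1.0],
])

# Hypothetical per-SNP effect sizes, assumed to be already shrunk by the
# Bayesian regression step.
weights = np.array([0.10, -0.20, 0.05])

# Weighted sum of dosages for each individual.
raw_score = dosages @ weights

# Optional: standardize to mean 0 and SD 1 within the sample.
z_score = (raw_score - raw_score.mean()) / raw_score.std(ddof=1)
print(raw_score, z_score)
```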

To consider different levels of exposure and to facilitate interpretation, we categorized the distribution of PGS-BMI scores into quintiles: lowest, low, medium, high, and highest risk. The mean and standard deviation of the PGS-BMI in each group are reported in Supplementary Table 1.
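As an illustration of the quintile coding, the sketch below cuts a simulated score at its 20th, 40th, 60th, and 80th percentiles and summarizes each group. The scores are random draws, not ALSPAC data, and the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
pgs = rng.standard_normal(1000)  # simulated scores for illustration only

# Cut points at the 20th, 40th, 60th and 80th percentiles.
cuts = np.quantile(pgs, [0.2, 0.4, 0.6, 0.8])

# np.digitize returns 0..4; add 1 so that 1 = lowest and 5 = highest risk.
category = np.digitize(pgs, cuts) + 1

labels = {1: "lowest", 2: "low", 3: "medium", 4: "high", 5: "highest"}
for j in range(1, 6):
    in_group = category == j
    print(labels[j], int(in_group.sum()),
          round(float(pgs[in_group].mean()), 2),
          round(float(pgs[in_group].std(ddof=1)), 2))
```

Coding the lowest quintile as 1 matches the use of category 1 as the reference level in the analysis.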

### Mediators

When the children were about 10.7 years old, parents were asked to report on their feeding behavior through a questionnaire with a total of 13 items. Parents rated how often they engaged in different parental feeding behaviors. Exploratory factor analyses suggested three factors with eigenvalues >1. After oblique rotation, two items did not load sufficiently on any of the three factors (factor loadings <0.4) and were therefore dropped. The final solution comprised three subscales (latent factors): Emotional Feeding (4 items, e.g., "I cheer her up with something to eat when she is sad or upset"), Restriction (4 items, e.g., "I consciously keep some food beyond her reach"), and Pressure to Eat (3 items, e.g., "I insist she eat all the food on the plate"). These three factors of parental feeding behavior correspond to the most studied constructs in the literature [29]. Factor scores on these three parental feeding behaviors were treated as joint mediators between genetic liability and the outcome, BMI at 12 years, because previous interventions took a holistic approach that focused on modifying a range of feeding behaviors rather than one specific behavior [30]. A full list of items, response option frequencies, and subscales can be found in Supplementary Table 2.

### Outcome

Height and weight were measured during clinic visits when the children were approximately 12 years old (mean = 12.5 years, SD = 0.6). Weight was measured with a Tanita Body Fat Analyzer (Tanita TBF UK Ltd) to the nearest 50 g. Height was measured to the nearest millimeter using a Harpenden Stadiometer (Holtain Ltd). BMI was calculated by dividing weight (in kg) by height (in m) squared.
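The BMI formula can be made concrete with a short sketch. The function names and the example measurements are hypothetical. The unit conversion assumes weight recorded in grams and height in millimeters, which is one plausible way to store measurements taken to the nearest 50 g and 1 mm.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms divided by height in meters squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def bmi_from_raw(weight_g: float, height_mm: float) -> float:
    """Convert grams and millimeters to kilograms and meters, then compute BMI."""
    return bmi(weight_g / 1000.0, height_mm / 1000.0)


# Hypothetical child: 45.00 kg and 1500 mm tall.
example = bmi_from_raw(45000, 1500)
print(round(example, 1))
```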

### Covariates

High maternal education at the child's birth was defined as the mother having completed education to A level, the requirement for applying to university in the UK. Additional covariates were child sex and maternal self-reported pre-pregnancy BMI.

### Analysis

We adapted the interventional disparity measure approach of Micali et al. [22]. This method aims to estimate how much of the difference in the outcome (Y, BMI at 12 years) attributable to a difference in the exposure (X, PGS) would remain if the mediating factors (M, parental feeding behaviors) were modified by a hypothetical intervention. In the context of genetic liability, this framework allows researchers to assess the magnitude of the disparity that would persist if downstream factors were changed [21, 24]. The diagram in Fig. 1 illustrates this conceptual model.

The effect of interest (that is, our estimation target or estimand) is the interventional disparity measure direct effect (IDM-DE). It captures the difference in the outcome between exposure and non-exposure to X that would be observed if we could intervene and distribute the mediator M as if X were set to the no-exposure value [22]. In our case, X, the PGS-BMI, has 5 levels (1 = lowest risk, 2 = low risk, 3 = medium risk, 4 = high risk, 5 = highest risk), which we index by *j*. The IDM-DE is therefore specified separately for *j* = 2, 3, 4, 5, with *j* = 1 treated as the reference value. In particular, let \(M_C^1\) be a random draw from the distribution of M, conditional on the confounders C, when X is set to the reference value 1, and let Y(m) be the potential outcome when the mediator M is set to the value m, here the randomly drawn value \(M_C^1\). Note that M is three-dimensional, so \(M_C^1\) represents random draws from the joint distribution of the three parental feeding behaviors.

The disparity measures of interest are then defined, for *j* = 2, 3, 4, 5, as:

$$\mathrm{IDM\text{-}DE}_j = \sum_c \left[ E\left\{ Y\left(M_C^1\right) \mid X = j,\; C = c \right\} - E\left\{ Y\left(M_C^1\right) \mid X = 1,\; C = c \right\} \right] \Pr(C = c). \tag{1a}$$

These four disparity measures capture the contrast between two levels of X while fixing the distribution of the mediators to that of a hypothetical scenario in which X is set to the reference value 1. They represent the magnitude of the disparity in children's BMI due to genetic liability (as captured by the PGS) that would persist if all parental feeding behaviors were distributed as in the lowest risk category (hypothetical intervention 1).

Since this may be an unrealistic scenario, we also defined these quantities for a second hypothetical scenario (hypothetical intervention 2), in which the reference distributions of parental feeding behaviors, from which the random draws are taken, correspond to genetic liability being set one risk category lower than observed. For example, for a child in the highest risk category (*j* = 5), this hypothetical intervention would shift the distribution of parental feeding behaviors to that of the category below (*j* = 4). The same applies to the other categories, shifting from high risk (*j* = 4) to medium risk (*j* = 3), and so on. For this setting, Eq. 1a is modified to allow the reference category to shift. Each contrast compares the adjacent categories *j* + 1 and *j*, for *j* = 1, 2, 3, 4:

$$\mathrm{IDM\text{-}DE}_j = \sum_c \left[ E\left\{ Y\left(M_C^j\right) \mid X = j + 1,\; C = c \right\} - E\left\{ Y\left(M_C^j\right) \mid X = j,\; C = c \right\} \right] \Pr(C = c). \tag{1b}$$

As in previous work [22], we consider interventions that change all mediators jointly, because a hypothetical intervention is unlikely to address only one parental feeding behavior and because several aspects of parental feeding are likely to be correlated. Under the assumptions of no unmeasured confounding of the M-Y relationships, of consistency for the mediators (i.e., that \(E\{Y(m) \mid X = j,\; C = c\} = E\{Y \mid X = j,\; C = c,\; M = m\}\)), and of no interference with respect to the mediators, these quantities can be estimated from the data.

In addition, we report estimates of the adjusted total association (Adj-TA) between PGS-BMI and BMI at 12 years, at each exposure level (category of genetic liability) compared with the reference [22]. With *j* = 1 treated as the reference group, i.e., under hypothetical intervention 1, this is defined, for *j* = 2, 3, 4, 5, as:

$$\mathrm{Adj\text{-}TA}_j = \sum_c \left[ E\left\{ Y \mid X = j,\; C = c \right\} - E\left\{ Y \mid X = 1,\; C = c \right\} \right] \Pr(C = c). \tag{2a}$$

For hypothetical intervention 2, comparing the adjacent categories *j* + 1 and *j* for *j* = 1, 2, 3, 4, this becomes:

$$\mathrm{Adj\text{-}TA}_j = \sum_c \left[ E\left\{ Y \mid X = j + 1,\; C = c \right\} - E\left\{ Y \mid X = j,\; C = c \right\} \right] \Pr(C = c). \tag{2b}$$

Analyses, consisting of a series of regressions for the mediators and the outcome, were performed in Stata version 16. Estimation used parametric plug-in estimation with Monte Carlo simulation on a 1000-fold expanded dataset, with 1000 bootstrap samples. The regression models included interactions between confounders and mediators.
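The plug-in and Monte Carlo steps can be sketched as follows for Eq. 1a and Eq. 2a. This is an illustrative Python/NumPy analogue on simulated data, not the Stata code used in the study. The data-generating values, the single binary confounder, the linear models with multivariate normal residuals, the helper names, and the 20-fold expansion (instead of 1000-fold) are all assumptions made for brevity, and the bootstrap is omitted.

```python
import numpy as np

rng = np.random.default_rng(1)
n, levels, K = 5000, 5, 20  # K = Monte Carlo expansion factor (1000 in the text)

# Simulated data (hypothetical, not ALSPAC).
c = rng.integers(0, 2, n).astype(float)   # one binary confounder C
x = rng.integers(1, levels + 1, n)        # exposure category X in 1..5
m = 0.2 * (x - 1)[:, None] + 0.1 * c[:, None] + rng.standard_normal((n, 3))
y = 0.5 * (x - 1) + 0.3 * m.sum(axis=1) + 0.2 * c + rng.standard_normal(n)


def dummies(xv):
    """Indicator columns for X = 2..5 (X = 1 is the reference)."""
    return (xv[:, None] == np.arange(2, levels + 1)).astype(float)


def design_m(xv, cv):
    """Design matrix for models of M (or of Y without mediators) on X and C."""
    return np.column_stack([np.ones(len(xv)), dummies(xv), cv])


def design_y(xv, mv, cv):
    """Outcome design: X, M, C, plus confounder-by-mediator interactions."""
    return np.column_stack([np.ones(len(xv)), dummies(xv), mv, cv,
                            cv[:, None] * mv])


# Step 1: fit the mediator, outcome, and total-association regressions.
beta_m, *_ = np.linalg.lstsq(design_m(x, c), m, rcond=None)
resid = m - design_m(x, c) @ beta_m
sigma = np.cov(resid, rowvar=False)       # joint residual covariance of M
beta_y, *_ = np.linalg.lstsq(design_y(x, m, c), y, rcond=None)
beta_t, *_ = np.linalg.lstsq(design_m(x, c), y, rcond=None)

# Step 2: expand the data K-fold and draw mediators as if X were set to 1.
c_big = np.repeat(c, K)
ref = np.ones(len(c_big), dtype=int)
m_star = (design_m(ref, c_big) @ beta_m
          + rng.multivariate_normal(np.zeros(3), sigma, len(c_big)))

# Step 3: plug-in contrasts, averaged over the empirical distribution of C.
idm_de, adj_ta = {}, {}
for j in range(2, levels + 1):
    xj = np.full(len(c_big), j)
    idm_de[j] = float(np.mean(design_y(xj, m_star, c_big) @ beta_y
                              - design_y(ref, m_star, c_big) @ beta_y))
    adj_ta[j] = float(np.mean(design_m(np.full(n, j), c) @ beta_t
                              - design_m(np.ones(n, dtype=int), c) @ beta_t))
    print(j, round(idm_de[j], 2), round(adj_ta[j], 2))
```

In this simulation the true direct contrast for category *j* is 0.5(*j* − 1), and the true total association is 0.68(*j* − 1), so the estimates should fall close to those values. Because the sketch's outcome model has no exposure-by-mediator interaction, the mediator draws cancel in each contrast. The same code structure carries over to outcome models that include such interactions, where the draws do affect the result.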