Search code examples
rp-valuesignificancer-lavaansemplot

How do I include p-value and R-square for the estimates in semPaths?


I am using semPaths (semPlot package) to draw my structural equation models. After some trial and error, I have a pretty good script to show what I want. Except, I haven’t been able to figure out how to include the p-value/significance levels of the estimates/regression coefficients in the figure.

Can/how can I include significance levels either as e.g. p-value in the edge labels below the estimate or as a broken line for insignificance or …? I am also interested in including the R-square, but not as critically as the significance level.

This is the script I am using so far:

semPaths(fitmod.bac.class2,
     what = "std",
     whatLabels = "std",
     style="ram",
     edge.label.cex = 1.3,
     layout = 'tree',
     intercepts=FALSE,
     residuals=FALSE,
     nodeLabels = c("Negati-\nvicutes","cand_class\n_MB_A2_108", "CO2", "Bacilli","Ignavi-\nbacteria","C/N", "pH","Water\ncontent"),
     sizeMan=7 )

Example of one of the SemPath outputs Example of one of the SemPath outputs

In this example the following are not significant:

  • Ignavibacteria -> First_C_CO2_ugC_gC_day, p = 0.096
  • pH -> Ignavibacteria, p = 0.151
  • cand_class_MB_A2_108 <-> Bacilli correlation, p = 0.054

I am a R-user and not really a coder, so I might just be missing a crucial point in the arguments.

I am testing a lot of different models at the moment, and would really like not to have to draw them all up by hand.

update: Using semPlotModel: Am I right in understanding that semPlotModel doesn’t include the significance levels from the sem function (see my script and output below)? I am specifically looking to include the P(>|z|) for regressions and covariance. Is it just me that is missing that, or is it not included? If it is not included, my solution is simply just to custom the edge labels.

    {model.NA.UP.bac.class2 <- '

  #LATANT VARIABLES

  #REGRESSIONS 

  #soil organic carbon quality

  c_Negativicutes ~ CN  

  #microorganisms
  First_C_CO2_ugC_gC_day    ~ c_Bacilli
  First_C_CO2_ugC_gC_day    ~ c_Ignavibacteria
  First_C_CO2_ugC_gC_day    ~ c_cand_class_MB_A2_108
  First_C_CO2_ugC_gC_day    ~ c_Negativicutes

  #pH
  c_Bacilli            ~pH
  c_Ignavibacteria    ~pH
  c_cand_class_MB_A2_108~pH
  c_Negativicutes     ~pH

  #COVARIANCE
  initial_water       ~~ CN
  c_cand_class_MB_A2_108 ~~ c_Bacilli 


  '

  fitmod.bac.class2 <- sem(model.NA.UP.bac.class2, data=datapNA.UP.log, missing="ml", meanstructure=TRUE, fixed.x=FALSE, std.lv=FALSE, std.ov=FALSE)
  summary(fitmod.bac.class2, standardized=TRUE, fit.measures=TRUE, rsq=TRUE)

  out <- capture.output(summary(fitmod.bac.class2, standardized=TRUE, fit.measures=TRUE, rsq=TRUE))
}

Output:

  lavaan 0.6-5 ended normally after 188 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of free parameters                         28

  Number of observations                            30
  Number of missing patterns                         1

Model Test User Model:

  Test statistic                                17.816
  Degrees of freedom                                16
  P-value (Chi-square)                           0.335

Model Test Baseline Model:

  Test statistic                               101.570
  Degrees of freedom                                28
  P-value                                        0.000

User Model versus Baseline Model:

  Comparative Fit Index (CFI)                    0.975
  Tucker-Lewis Index (TLI)                       0.957

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)                472.465
  Loglikelihood unrestricted model (H1)        481.373

  Akaike (AIC)                                -888.930
  Bayesian (BIC)                              -849.697
  Sample-size adjusted Bayesian (BIC)         -936.875

Root Mean Square Error of Approximation:

  RMSEA                                          0.062
  90 Percent confidence interval - lower         0.000
  90 Percent confidence interval - upper         0.185
  P-value RMSEA <= 0.05                          0.414

Standardized Root Mean Square Residual:

  SRMR                                           0.107

Parameter Estimates:

  Information                                 Observed
  Observed information based on                Hessian
  Standard errors                             Standard

Regressions:
                           Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  c_Negativicutes ~                                                             
    CN                        0.419    0.143    2.939    0.003    0.419    0.416
  c_cand_class_MB_A2_108 ~                                                      
    CN                       -0.433    0.160   -2.707    0.007   -0.433   -0.394
  First_C_CO2_ugC_gC_day ~                                                      
    c_Bacilli                 0.525    0.128    4.092    0.000    0.525    0.496
    c_Ignavibacter            0.207    0.124    1.667    0.096    0.207    0.195
    c_c__MB_A2_108            0.310    0.125    2.475    0.013    0.310    0.301
    c_Negativicuts            0.304    0.137    2.220    0.026    0.304    0.271
  c_Bacilli ~                                                                   
    pH                        0.624    0.135    4.604    0.000    0.624    0.643
  c_Ignavibacteria ~                                                            
    pH                        0.245    0.171    1.436    0.151    0.245    0.254
  c_cand_class_MB_A2_108 ~                                                      
    pH                        0.393    0.151    2.597    0.009    0.393    0.394
  c_Negativicutes ~                                                             
    pH                        0.435    0.129    3.361    0.001    0.435    0.476

Covariances:
                            Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  CN ~~                                                                          
    initial_water              0.001    0.000    2.679    0.007    0.001    0.561
 .c_cand_class_MB_A2_108 ~~                                                      
   .c_Bacilli                 -0.000    0.000   -1.923    0.054   -0.000   -0.388

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .c_Negativicuts    0.145    0.198    0.734    0.463    0.145    3.826
   .c_c__MB_A2_108    1.038    0.226    4.594    0.000    1.038   25.076
   .Frs_C_CO2_C_C_   -0.346    0.233   -1.485    0.137   -0.346   -8.115
   .c_Bacilli         0.376    0.135    2.778    0.005    0.376    9.340
   .c_Ignavibacter    0.754    0.170    4.424    0.000    0.754   18.796
    CN                0.998    0.007  145.158    0.000    0.998   26.502
    pH                0.998    0.008  131.642    0.000    0.998   24.034
    initial_water     0.998    0.008  125.994    0.000    0.998   23.003

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .c_Negativicuts    0.001    0.000    3.873    0.000    0.001    0.600
   .c_c__MB_A2_108    0.001    0.000    3.833    0.000    0.001    0.689
   .Frs_C_CO2_C_C_    0.001    0.000    3.873    0.000    0.001    0.408
   .c_Bacilli         0.001    0.000    3.873    0.000    0.001    0.586
   .c_Ignavibacter    0.002    0.000    3.873    0.000    0.002    0.936
    CN                0.001    0.000    3.873    0.000    0.001    1.000
    initial_water     0.002    0.000    3.873    0.000    0.002    1.000
    pH                0.002    0.000    3.873    0.000    0.002    1.000

R-Square:
                   Estimate
    c_Negativicuts    0.400
    c_c__MB_A2_108    0.311
    Frs_C_CO2_C_C_    0.592
    c_Bacilli         0.414
    c_Ignavibacter    0.064

Warning message:
In lav_model_hessian(lavmodel = lavmodel, lavsamplestats = lavsamplestats,  :
  lavaan WARNING: Hessian is not fully symmetric. Max diff = 5.15131396241486e-05

Solution

  • This example is taken from ?semPaths since we don't have your object.

    library('semPlot')
    
    modFile <- tempfile(fileext = '.OUT')
    download.file('http://sachaepskamp.com/files/mi1.OUT', modFile)
    

    Use semPlotModel to get the object without plotting. There you can inspect what is to be plotted. I just dug around without reading the docs until I found what it seems to be using.

    After you run semPlotModel, the object has an element x@Pars which contains the edges, nodes, and the std which is being used for the edge labels in your case. semPaths also has an argument that allows you to make custom edge labels, so you can take the data you need from x@Pars and add your p-values:

    x <- semPlotModel(modFile)
    x@Pars
    #                     label    lhs edge    rhs    est       std   group fixed par
    # 1        lambda[11]^{(y)} perfIQ   ->     pc  1.000 0.6219648 Group 1  TRUE   0
    # 2        lambda[21]^{(y)} perfIQ   ->     pa  0.923 0.5664888 Group 1 FALSE   1
    # 3        lambda[31]^{(y)} perfIQ   ->     oa  1.098 0.6550159 Group 1 FALSE   2
    # 4        lambda[41]^{(y)} perfIQ   ->     ma  0.784 0.4609990 Group 1 FALSE   3
    # 5   theta[11]^{(epsilon)}     pc  <->     pc  5.088 0.6131598 Group 1 FALSE   5
    # 10  theta[22]^{(epsilon)}     pa  <->     pa  5.787 0.6790905 Group 1 FALSE   6
    # 15  theta[33]^{(epsilon)}     oa  <->     oa  5.150 0.5709541 Group 1 FALSE   7
    # 20  theta[44]^{(epsilon)}     ma  <->     ma  7.311 0.7874800 Group 1 FALSE   8
    # 21                psi[11] perfIQ  <-> perfIQ  3.210 1.0000000 Group 1 FALSE   4
    # 22           tau[1]^{(y)}         int     pc 10.500        NA Group 1 FALSE   9
    # 23           tau[2]^{(y)}         int     pa 10.374        NA Group 1 FALSE  10
    # 24           tau[3]^{(y)}         int     oa 10.663        NA Group 1 FALSE  11
    # 25           tau[4]^{(y)}         int     ma 10.371        NA Group 1 FALSE  12
    # 11       lambda[11]^{(y)} perfIQ   ->     pc  1.000 0.6515609 Group 2  TRUE   0
    # 27       lambda[21]^{(y)} perfIQ   ->     pa  0.923 0.5876948 Group 2 FALSE   1
    # 31       lambda[31]^{(y)} perfIQ   ->     oa  1.098 0.6981974 Group 2 FALSE   2
    # 41       lambda[41]^{(y)} perfIQ   ->     ma  0.784 0.4621919 Group 2 FALSE   3
    # 51  theta[11]^{(epsilon)}     pc  <->     pc  5.006 0.5754684 Group 2 FALSE  14
    # 101 theta[22]^{(epsilon)}     pa  <->     pa  5.963 0.6546148 Group 2 FALSE  15
    # 151 theta[33]^{(epsilon)}     oa  <->     oa  4.681 0.5125204 Group 2 FALSE  16
    # 201 theta[44]^{(epsilon)}     ma  <->     ma  8.356 0.7863786 Group 2 FALSE  17
    # 211               psi[11] perfIQ  <-> perfIQ  3.693 1.0000000 Group 2 FALSE  13
    # 221          tau[1]^{(y)}         int     pc 10.500        NA Group 2 FALSE   9
    # 231          tau[2]^{(y)}         int     pa 10.374        NA Group 2 FALSE  10
    # 241          tau[3]^{(y)}         int     oa 10.663        NA Group 2 FALSE  11
    # 251          tau[4]^{(y)}         int     ma 10.371        NA Group 2 FALSE  12
    # 26               alpha[1]         int perfIQ -2.469        NA Group 2 FALSE  18
    

    As you can see there are more edge labels than ones that are plotted, and I have no idea how it chooses which to use, so I am just taking the first four from each group (since there are four edges shown and the stds match those. Maybe there is an option to plot all of them or select which ones you need--I haven't read the docs.

    ## take first four stds from each group, generate some p-values
    l <- sapply(split(x@Pars$std, x@Pars$group), function(x) head(x, 4))
    set.seed(1)
    l <- sprintf('%.3f, p=%s', l, format.pval(runif(length(l)), digits = 2))
    l
    # [1] "0.622, p=0.27" "0.566, p=0.37" "0.655, p=0.57" "0.461, p=0.91" "0.652, p=0.20" "0.588, p=0.90" "0.698, p=0.94" "0.462, p=0.66"
    

    Then you can plot the object with your new labels, edgeLabels = l

    layout(1:2)
    semPaths(
      x,
      edgeLabels = l,
      ask = FALSE, title = FALSE,
      what = 'std',
      whatLabels = 'std',
      style = 'ram',
      edge.label.cex = 1.3,
      layout = 'tree',
      intercepts = FALSE,
      residuals = FALSE,
      sizeMan = 7 
    )
    

    enter image description here