Log linear modeling in SPSS

updated April 2024

in-class activity using the collaborative dataset describing reporting in publications

We start with a data file collab_data_2024_cleaned.csv that was created from within the R markdown file chisq-inclass2024.Rmd.

General process to analyze categorical outcomes with categorical predictors

Categorical outcome decision process

Are race/ethnicity reporting, income/ses reporting, and location reporting related?

Step 1. Generate a frequency table using Crosstabs
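For anyone who wants to check the Crosstabs output outside SPSS, the counting behind Step 1 can be sketched in Python. This is a minimal sketch: the rows are made up for illustration, and the column order (race, income/ses, location, each coded 0/1) is an assumption, not the actual layout of collab_data_2024_cleaned.csv.

```python
from collections import Counter

# Toy rows, invented for illustration (NOT the class dataset).
# Assumed order: (race_reported, income_reported, location_reported), coded 0/1.
rows = [
    (1, 1, 1), (1, 0, 1), (0, 0, 1), (0, 0, 0),
    (1, 1, 1), (0, 0, 1), (0, 1, 0), (1, 0, 1),
]

def crosstab(rows):
    """Count how many rows fall in each cell of the 2 x 2 x 2 table."""
    counts = Counter(rows)
    # List every combination explicitly so empty cells show up as 0.
    return {
        (r, i, l): counts.get((r, i, l), 0)
        for r in (0, 1) for i in (0, 1) for l in (0, 1)
    }

table = crosstab(rows)
for cell in sorted(table):
    print(cell, table[cell])
```

Cells with a count of 0 are worth spotting at this stage, because empty cells can cause trouble when the loglinear model is fitted.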

Step 2. Analyze -> Log Linear -> Model Selection

Step 2a - log linear menu

Enter the three variables into the "Factors" box, and use the "Define Range" button to set the range for each (minimum = 0, maximum = 1).

Step 2b - log linear model specification

Step 3. Backward Elimination

Steps 4 and 5. Look at K-way and Higher-order effects table and the Partial Associations table

Step 4 - K-way table
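To make the idea behind the K-way tests and backward elimination concrete, here is a rough sketch of how a likelihood ratio statistic for a reduced model can be computed by iterative proportional fitting. It is not SPSS's implementation. The counts are invented, the axis order (race, income/ses, location) is an assumption, and the helper names are hypothetical.

```python
import math
from itertools import product

# Assumed axis order: 0 = race, 1 = income/ses, 2 = location (each coded 0/1).
CELLS = list(product((0, 1), repeat=3))

# Toy counts, invented for illustration (NOT the class dataset).
observed = dict(zip(CELLS, [61, 40, 4, 12, 25, 30, 10, 34]))

def margin(table, axes):
    """Sum a three-way table down to the margin over the given axes."""
    out = {}
    for cell, n in table.items():
        key = tuple(cell[a] for a in axes)
        out[key] = out.get(key, 0.0) + n
    return out

def fit_ipf(observed, terms, iters=200):
    """Expected counts for the hierarchical model whose highest-order
    terms are `terms`. Assumes every fitted margin is non-zero."""
    fitted = {cell: 1.0 for cell in CELLS}
    for _ in range(iters):
        for axes in terms:
            obs_m = margin(observed, axes)
            fit_m = margin(fitted, axes)
            for cell in CELLS:
                key = tuple(cell[a] for a in axes)
                fitted[cell] *= obs_m[key] / fit_m[key]
    return fitted

def g2(observed, fitted):
    """Likelihood ratio chi-square comparing observed and fitted counts."""
    return 2.0 * sum(
        n * math.log(n / fitted[c]) for c, n in observed.items() if n > 0
    )

# Model with all two-way terms: its G2 tests the three-way interaction (df = 1).
no_three_way = fit_ipf(observed, [(0, 1), (0, 2), (1, 2)])
# Model with race*income and income*location only (df = 2).
final_model = fit_ipf(observed, [(0, 1), (1, 2)])

g2_no_three_way = g2(observed, no_three_way)
g2_final = g2(observed, final_model)
print("three-way term:", round(g2_no_three_way, 3))
print("dropping race*location:", round(g2_final - g2_no_three_way, 3))
print("final model fit:", round(g2_final, 3))
```

Backward elimination repeats this comparison: drop a term, refit, and keep the simpler model only if the change in the likelihood ratio statistic is not significant.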

Step 6. Look at Goodness of Fit for the final model

Step 6 - goodness of fit

Step 7. Look at 2-way contingency tables to interpret significant effects.

  1. Race reporting*Income reporting: the table shows that race is reported more often than expected under the null hypothesis when income is reported (observed N = 37, expected N = 21) and less often than expected when income is not reported (observed N = 47, expected N = 63).
    Step 7.1 - Race*Income

  2. Income reporting*Location reporting: the table shows that location is reported more often than expected under the null hypothesis when income/ses is reported (observed N = 47, expected N = 37) and less often than expected when income/ses is not reported (observed N = 101, expected N = 111).
    Step 7.2 - Location*Income
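The expected counts, chi-square statistics, and odds ratios for the two tables above can be checked by hand. In this sketch the first row of each table comes from the observed counts quoted above. The second row is NOT taken from the SPSS output: it is back-calculated from the quoted expected counts and the reported odds ratios, so treat it as an assumption. The function names are hypothetical.

```python
# Rows: variable reported (yes, no). Columns: income/ses reported (yes, no).
# First row of each table is quoted in the text; second row is inferred.
race_by_income = [[37, 47], [17, 115]]
location_by_income = [[47, 101], [7, 61]]

def expected_counts(table):
    """Expected counts under independence: row total * column total / N."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    return [[row_totals[i] * col_totals[j] / n for j in range(2)]
            for i in range(2)]

def chi_square(table, yates=False):
    """Pearson chi-square for a 2 x 2 table, optionally continuity-corrected."""
    (a, b), (c, d) = table
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(0.0, diff - n / 2)
    return n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def odds_ratio(table):
    """Cross-product ratio for a 2 x 2 table."""
    (a, b), (c, d) = table
    return (a * d) / (b * c)

for name, tab in [("race*income", race_by_income),
                  ("location*income", location_by_income)]:
    print(name,
          "expected:", expected_counts(tab)[0],
          "pearson:", round(chi_square(tab), 3),
          "corrected:", round(chi_square(tab, yates=True), 3),
          "odds ratio:", round(odds_ratio(tab), 3))
```

Under this reconstruction, the value 24.961 reported for race by income matches the continuity-corrected statistic, while 11.447 for location by income matches the uncorrected Pearson statistic, so it may be worth checking which row of the SPSS Chi-Square Tests table each number was read from.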

Reporting - how to report a finding like this:

  1. Report the likelihood ratio statistic for the final model.
  2. For any terms that are significant, report the change in chi-square.
  3. If you break down any higher-order interactions in subsequent analyses, report the relevant chi-square statistics (and odds ratios).

For this example we could report:
The three-way loglinear analysis produced a final model that retained the interactions of (a) race/ethnicity reporting by income/ses reporting and (b) location reporting by income/ses reporting. The likelihood ratio statistic for this final model was χ²(2) = 3.222, p = .200, indicating that the model's expected frequencies did not differ significantly from the observed frequencies. The highest-order interaction (race/ethnicity reporting by income/ses reporting by location reporting) was not significant (comparison of the model without the three-way interaction to the full model: χ²(1) = 0.986, p = .320), and the race/ethnicity reporting by location reporting interaction was not significant (model without this term compared to the model with all second-order interactions: χ²(1) = 2.236, p = .135). To understand the two-way associations, separate chi-square tests were performed to examine (a) race/ethnicity reporting by income/ses reporting (combining location reporting categories) and (b) location reporting by income/ses reporting (combining race/ethnicity reporting categories). There was a significant association between whether race/ethnicity was reported and whether income/ses was reported, χ²(1) = 24.961, p < .0001, odds ratio = 5.325. Furthermore, there was a significant association between whether income/ses was reported and whether location was reported, χ²(1) = 11.447, p = .001, odds ratio = 4.055. [Plots or contingency tables can be used to characterize the two-way associations completely.]
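The p-values in a write-up like this can be sanity-checked from the chi-square statistics alone. The sketch below uses only the Python standard library and covers just the two cases needed here (df = 1 and df = 2). The function name is hypothetical.

```python
import math

def chi2_pvalue(stat, df):
    """Upper-tail p-value of a chi-square statistic, for df = 1 or 2 only."""
    if df == 1:
        # A chi-square with 1 df is a squared standard normal.
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        # A chi-square with 2 df is exponential with mean 2.
        return math.exp(-stat / 2.0)
    raise ValueError("this sketch only handles df = 1 or df = 2")

# Statistics as reported in the example write-up.
final_model = (3.222, 2)
three_way = (0.986, 1)
race_by_location = (2.236, 1)

for label, (stat, df) in [("final model", final_model),
                          ("three-way term", three_way),
                          ("race*location term", race_by_location)]:
    print(label, round(chi2_pvalue(stat, df), 3))

# The two removed terms account for the final model's lack of fit.
print(round(three_way[0] + race_by_location[0], 3))
```

The last line reflects a useful check: the chi-square changes for the two terms removed during backward elimination (0.986 and 2.236) add up to the likelihood ratio statistic of the final model (3.222), with the degrees of freedom adding in the same way (1 + 1 = 2).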

That's all for today!!!

References