PRESERVE.
SET DECIMAL DOT.

GET DATA  /TYPE=TXT
  /FILE="C:\Users\jb1094\Downloads\ice_bucket.csv"
  /ENCODING='UTF8'
  /DELIMITERS="\t"
  /QUALIFIER='"'
  /ARRANGEMENT=DELIMITED
  /FIRSTCASE=2
  /DATATYPEMIN PERCENTAGE=95.0
  /VARIABLES=
  upload_day AUTO
  /MAP.
RESTORE.
CACHE.
EXECUTE.

Data written to the working file.
1 variables and 2323000 cases written.
Variable: upload_day         Type: Number  Format : F2

Substitute the following to build syntax for these data.
  /VARIABLES=
   upload_day F2

DATASET NAME DataSet1 WINDOW=FRONT.
* Chart Builder.
GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=upload_day MISSING=LISTWISE REPORTMISSING=NO
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: upload_day=col(source(s), name("upload_day"))
  GUIDE: axis(dim(1), label("upload_day"))
  GUIDE: axis(dim(2), label("Frequency"))
  GUIDE: text.title(label("Simple Histogram of upload_day"))
  ELEMENT: interval(position(summary.count(bin.rect(upload_day))), shape.interior(shape.square))
END GPL.

GGraph: Simple Histogram of upload_day (chart not reproduced)

upload_day is unimodal and positively (right) skewed.

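The note above is a visual judgement from the histogram. The EXAMINE runs below put numbers on shape with their Skewness and Kurtosis statistics. As a minimal sketch in plain Python (not SPSS), these are the bias-adjusted sample formulas that, as we understand it, SPSS uses for both statistics. They are the same formulas as Excel's SKEW and KURT. Positive skewness means a longer right tail. Kurtosis is reported as excess kurtosis, so 0 corresponds to a normal distribution.

```python
from math import sqrt


def _mean_sd(xs):
    """Mean and sample standard deviation (n - 1 denominator)."""
    n = len(xs)
    mean = sum(xs) / n
    sd = sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    if sd == 0:
        raise ValueError("all values are identical; shape is undefined")
    return mean, sd


def sample_skewness(xs):
    """Bias-adjusted skewness G1 = n / ((n-1)(n-2)) * sum(z^3)."""
    n = len(xs)
    if n < 3:
        raise ValueError("skewness needs at least 3 values")
    mean, sd = _mean_sd(xs)
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / sd) ** 3 for x in xs)


def sample_kurtosis(xs):
    """Bias-adjusted excess kurtosis G2 (0 for a normal distribution)."""
    n = len(xs)
    if n < 4:
        raise ValueError("kurtosis needs at least 4 values")
    mean, sd = _mean_sd(xs)
    z4 = sum(((x - mean) / sd) ** 4 for x in xs)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * z4
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


# A long right tail (the value 10) gives positive skewness.
print(round(sample_skewness([1, 2, 3, 10]), 3), round(sample_kurtosis([1, 2, 3, 10]), 3))
```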
DATASET ACTIVATE DataSet1.
DATASET CLOSE DataSet2.
DATASET ACTIVATE DataSet1.
DATASET CLOSE DataSet3.
PRESERVE.
SET DECIMAL DOT.

GET DATA  /TYPE=TXT
  /FILE="C:\Users\jb1094\Downloads\nhanes_selectvars_n500.csv"
  /ENCODING='UTF8'
  /DELIMITERS=","
  /QUALIFIER='"'
  /ARRANGEMENT=DELIMITED
  /FIRSTCASE=2
  /DATATYPEMIN PERCENTAGE=95.0
  /VARIABLES=
  ID AUTO
  Gender AUTO
  Age AUTO
  Weight AUTO
  Height AUTO
  AlcoholDay AUTO
  Depressed AUTO
  BMI AUTO
  SleepHrsNight AUTO
  SleepTrouble AUTO
  /MAP.
RESTORE.
CACHE.
EXECUTE.

Data written to the working file.
10 variables and 500 cases written.
Variable: ID                 Type: Number  Format : F5
Variable: Gender             Type: String  Format : A6
Variable: Age                Type: Number  Format : F2
Variable: Weight             Type: Number  Format : F5.1       One or more values were set to system-missing.
Variable: Height             Type: Number  Format : F5.1       One or more values were set to system-missing.
Variable: AlcoholDay         Type: String  Format : A2
Variable: Depressed          Type: String  Format : A7
Variable: BMI                Type: Number  Format : F5.2       One or more values were set to system-missing.
Variable: SleepHrsNight      Type: String  Format : A2
Variable: SleepTrouble       Type: String  Format : A3

Substitute the following to build syntax for these data.
  /VARIABLES=
   ID F5
   Gender A6
   Age F2
   Weight F5.1
   Height F5.1
   AlcoholDay A2
   Depressed A7
   BMI F5.2
   SleepHrsNight A2
   SleepTrouble A3
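The log above reflects AUTO type detection. Gender and Depressed are text. AlcoholDay, SleepHrsNight and SleepTrouble were also read as strings. Weight, Height and BMI became numeric, with some values set to system-missing. As we understand /DATATYPEMIN PERCENTAGE=95.0, AUTO assigns a type only when at least 95% of a variable's values conform to it, and nonconforming values in a numeric variable become system-missing. AlcoholDay and SleepHrsNight were therefore presumably read as strings because more than 5% of their entries were not numbers. The helper below is a hypothetical Python approximation of that rule, not SPSS's implementation. Ignoring blank cells in the percentage is an assumption.

```python
def auto_type(values, min_percentage=95.0):
    """Hypothetical approximation of AUTO typing with /DATATYPEMIN PERCENTAGE.

    Returns ("numeric", parsed_values) or ("string", original_values).
    """

    def is_number(text):
        try:
            float(text)
            return True
        except ValueError:
            return False

    # Assumption: blank cells are ignored when computing the percentage.
    non_blank = [v for v in values if v.strip()]
    if not non_blank:
        return "string", list(values)
    share = 100.0 * sum(is_number(v) for v in non_blank) / len(non_blank)
    if share >= min_percentage:
        # Numeric column: entries that do not parse become missing
        # (None here, system-missing in SPSS).
        return "numeric", [float(v) if is_number(v) else None for v in values]
    return "string", list(values)


# Hypothetical columns, not the NHANES file.
made_up_numeric = ["78.0", "91.3", "NA"] + ["70.5"] * 37  # 39 of 40 parse: 97.5%
made_up_mixed = ["1", "2", "NA", "NA"] + ["3"] * 6        # 8 of 10 parse: 80%
print(auto_type(made_up_numeric)[0], auto_type(made_up_mixed)[0])  # numeric string
```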

DATASET NAME DataSet4 WINDOW=FRONT.
EXAMINE VARIABLES=Height
  /PLOT BOXPLOT HISTOGRAM NPPLOT
  /COMPARE GROUPS
  /STATISTICS DESCRIPTIVES
  /CINTERVAL 95
  /MISSING LISTWISE
  /NOTOTAL.

Height is negatively skewed (skewness = -1.546) and non-normal (Shapiro-Wilk W = .844, p < .001), with many low-value outliers.

Explore

[DataSet4]

Case Processing Summary
                       Cases
                       Valid           Missing         Total
                       N     Percent   N     Percent   N     Percent
Height                 477   95.4%     23    4.6%      500   100.0%

Descriptives
                                                Statistic   Std. Error
Height
        Mean                                    159.859     1.0004
        95% Confidence Interval   Lower Bound   157.894
        for Mean                  Upper Bound   161.825
        5% Trimmed Mean                         161.858
        Median                                  165.400
        Variance                                477.393
        Std. Deviation                          21.8493
        Minimum                                 84.9
        Maximum                                 194.5
        Range                                   109.6
        Interquartile Range                     19.5
        Skewness                                -1.546      .112
        Kurtosis                                2.104       .223

Tests of Normality
                       Kolmogorov-Smirnov(a)        Shapiro-Wilk
                       Statistic   df    Sig.       Statistic   df    Sig.
Height                 .158        477   .000       .844        477   .000
a. Lilliefors Significance Correction
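The standard errors printed next to Skewness and Kurtosis depend only on the number of valid cases. A minimal Python sketch of the usual formulas reproduces the .112 and .223 shown for n = 477. Dividing each statistic by its standard error gives a rough z-score. A common rule of thumb treats |z| > 1.96 as a significant departure at the .05 level. With samples this large, even modest departures exceed that cutoff, so the plots remain the main guide.

```python
from math import sqrt


def se_skewness(n):
    """Standard error of sample skewness for n valid cases."""
    return sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def se_kurtosis(n):
    """Standard error of sample (excess) kurtosis for n valid cases."""
    return 2 * se_skewness(n) * sqrt((n * n - 1) / ((n - 3) * (n + 5)))


n = 477                     # valid Height cases, all ages
skew, kurt = -1.546, 2.104  # statistics from the Descriptives table
print(round(se_skewness(n), 3), round(se_kurtosis(n), 3))  # 0.112 0.223

z_skew = skew / se_skewness(n)
z_kurt = kurt / se_kurtosis(n)
print(round(z_skew, 1), round(z_kurt, 1))  # -13.8 9.4
```

The same functions give .133/.265 for n = 336 and .124/.247 for n = 390, matching the later Explore runs.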

Charts (not reproduced): Histogram of Height, Normal Q-Q Plot of Height, Detrended Normal Q-Q Plot of Height, Boxplot of Height

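The low-value outliers come from the boxplot. As we understand EXAMINE's boxplot, a case is marked with a circle when it lies more than 1.5 box lengths (IQR) beyond the edge of the box, and with an asterisk when it lies more than 3 box lengths beyond it. The minimal Python sketch below applies that rule to made-up heights (hypothetical data, not the NHANES file). It takes quartiles from `statistics.quantiles` with `method="exclusive"`. SPSS draws the box from Tukey's hinges, so fences can differ slightly in small samples.

```python
from statistics import quantiles


def boxplot_flags(values):
    """Label values beyond Tukey-style fences: 'outlier' if more than 1.5 IQR
    outside the box, 'extreme' if more than 3 IQR outside it."""
    q1, _, q3 = quantiles(values, n=4, method="exclusive")
    iqr = q3 - q1
    flagged = []
    for v in values:
        distance = max(q1 - v, v - q3, 0.0)  # 0 for values inside the box
        if distance > 3 * iqr:
            flagged.append((v, "extreme"))
        elif distance > 1.5 * iqr:
            flagged.append((v, "outlier"))
    return flagged


# Hypothetical heights in cm: two short cases form a low tail.
heights = [150, 160, 162, 165, 166, 168, 170, 172, 175, 115, 85]
print(boxplot_flags(heights))  # [(115, 'outlier'), (85, 'extreme')]
```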
USE ALL.
COMPUTE filter_$=(Age >= 18).
VARIABLE LABELS filter_$ 'Age >= 18 (FILTER)'.
VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.
FORMATS filter_$ (f1.0).
FILTER BY filter_$.
EXECUTE.
EXAMINE VARIABLES=Height
  /PLOT BOXPLOT HISTOGRAM NPPLOT
  /COMPARE GROUPS
  /STATISTICS DESCRIPTIVES
  /CINTERVAL 95
  /MISSING LISTWISE
  /NOTOTAL.
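This run applies two exclusions. FILTER BY keeps only cases whose filter_$ equals 1. A missing Age yields a missing filter_$, and FILTER excludes those cases as well. EXAMINE's /MISSING LISTWISE then drops selected cases with no Height. That is why the Case Processing Summary for this run reports 339 cases in total but 336 valid. The minimal Python sketch below applies the same two steps to hypothetical rows.

```python
# Hypothetical rows (age, height_cm); None marks a missing value.
rows = [(34, 170.2), (12, 140.5), (45, None), (19, 181.0), (None, 165.0), (8, 128.3)]


def adult_valid_heights(rows, min_age=18):
    # Like FILTER BY filter_$ with filter_$ = (Age >= 18): a missing Age
    # gives a missing filter value, so the case is not selected.
    selected = [(age, h) for age, h in rows if age is not None and age >= min_age]
    # Like /MISSING LISTWISE: drop selected cases whose Height is missing.
    valid = [h for _, h in selected if h is not None]
    return selected, valid


selected, valid = adult_valid_heights(rows)
print(len(rows), len(selected), len(valid))  # 6 3 2
```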
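
For participants aged 18 or older, Height is approximately normally distributed (skewness = -.070; Shapiro-Wilk W = .995, p = .432) with no outliers. The skew and low-value outliers in the full sample were therefore likely due to its age range: it includes children and adolescents who have not yet reached adult height.
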
Explore

Case Processing Summary
                       Cases
                       Valid           Missing         Total
                       N     Percent   N     Percent   N     Percent
Height                 336   99.1%     3     0.9%      339   100.0%

Descriptives
                                                Statistic   Std. Error
Height
        Mean                                    168.629     .5575
        95% Confidence Interval   Lower Bound   167.533
        for Mean                  Upper Bound   169.726
        5% Trimmed Mean                         168.680
        Median                                  168.750
        Variance                                104.427
        Std. Deviation                          10.2189
        Minimum                                 143.1
        Maximum                                 194.5
        Range                                   51.4
        Interquartile Range                     13.9
        Skewness                                -.070       .133
        Kurtosis                                -.399       .265

Tests of Normality
                       Kolmogorov-Smirnov(a)        Shapiro-Wilk
                       Statistic   df    Sig.       Statistic   df    Sig.
Height                 .035        336   .200*      .995        336   .432
a. Lilliefors Significance Correction
*. This is a lower bound of the true significance.
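The confidence interval in the Descriptives table can be reproduced from the mean, standard deviation and n, using SE = SD / √n and CI = mean ± t × SE. Here t is the two-sided 95% critical value on n - 1 degrees of freedom. The minimal sketch below takes t ≈ 1.967 for df = 335 from a t table rather than computing it. That value is an assumption of this example.

```python
from math import sqrt

# Adult Height, taken from the Descriptives table.
n, mean, sd = 336, 168.629, 10.2189
# Assumption: two-sided 95% t critical value for df = n - 1 = 335, from a t table.
t_crit = 1.967

se = sd / sqrt(n)  # standard error of the mean
lower = mean - t_crit * se
upper = mean + t_crit * se
print(f"SE = {se:.4f}, 95% CI = [{lower:.3f}, {upper:.3f}]")
# SE = 0.5575, 95% CI = [167.532, 169.726]
```

The lower bound differs from the printed 167.533 in the last digit, presumably because SPSS works from the unrounded mean.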

Charts (not reproduced): Histogram of Height, Normal Q-Q Plot of Height, Detrended Normal Q-Q Plot of Height, Boxplot of Height (Age >= 18)

PRESERVE.
SET DECIMAL DOT.

GET DATA  /TYPE=TXT
  /FILE="C:\Users\jb1094\Downloads\cort-hypothetical.txt"
  /ENCODING='UTF8'
  /DELCASE=LINE
  /DELIMITERS="\t"
  /ARRANGEMENT=DELIMITED
  /FIRSTCASE=2
  /DATATYPEMIN PERCENTAGE=95.0
  /VARIABLES=
  ID AUTO
  cortisol_baseline AUTO
  /MAP.
RESTORE.

CACHE.
EXECUTE.

Data written to the working file.
2 variables and 400 cases written.
Variable: ID                 Type: Number  Format : F5
Variable: cortisol_baseline   Type: Number  Format : F6.3       One or more values were set to system-missing.

Substitute the following to build syntax for these data.
  /VARIABLES=
   ID F5
   cortisol_baseline F6.3

DATASET NAME DataSet5 WINDOW=FRONT.
EXAMINE VARIABLES=cortisol_baseline
  /PLOT BOXPLOT HISTOGRAM NPPLOT
  /COMPARE GROUPS
  /STATISTICS DESCRIPTIVES
  /CINTERVAL 95
  /MISSING LISTWISE
  /NOTOTAL.

cortisol_baseline is positively skewed (skewness = 2.453), heavy-tailed (kurtosis = 10.088) and non-normal (Shapiro-Wilk W = .805, p < .001).

Explore

[DataSet5]

Case Processing Summary
                       Cases
                       Valid           Missing         Total
                       N     Percent   N     Percent   N     Percent
cortisol_baseline      390   97.5%     10    2.5%      400   100.0%

Descriptives
                                                Statistic   Std. Error
cortisol_baseline
        Mean                                    4.79593     .170363
        95% Confidence Interval   Lower Bound   4.46098
        for Mean                  Upper Bound   5.13088
        5% Trimmed Mean                         4.43909
        Median                                  3.93150
        Variance                                11.319
        Std. Deviation                          3.364405
        Minimum                                 .389
        Maximum                                 28.294
        Range                                   27.905
        Interquartile Range                     3.281
        Skewness                                2.453       .124
        Kurtosis                                10.088      .247

Tests of Normality
                       Kolmogorov-Smirnov(a)        Shapiro-Wilk
                       Statistic   df    Sig.       Statistic   df    Sig.
cortisol_baseline      .140        390   .000       .805        390   .000
a. Lilliefors Significance Correction

Charts (not reproduced): Histogram of cortisol_baseline, Normal Q-Q Plot of cortisol_baseline, Detrended Normal Q-Q Plot of cortisol_baseline, Boxplot of cortisol_baseline

COMPUTE log_cortisol_baseline=LN(cortisol_baseline).
EXECUTE.
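LN is the natural logarithm and is defined only for positive values. The smallest cortisol_baseline here is .389, so all 390 valid values can be transformed and the 10 missing values stay missing. The Case Processing Summary for log_cortisol_baseline shows the same 390/10 split. The minimal Python sketch below is a missing-aware log. It assumes, as we understand SPSS's LN, that non-positive or missing inputs give a missing result rather than an error.

```python
import math


def ln_or_missing(x):
    """Natural log that returns None (missing) for missing or non-positive input.

    Assumption: this mirrors SPSS's LN returning system-missing instead of
    stopping when the argument is not positive.
    """
    if x is None or x <= 0:
        return None
    return math.log(x)


# Raw minimum, median and maximum of cortisol_baseline, plus a missing and a zero value.
values = [0.389, 3.9315, 28.294, None, 0.0]
logged = [ln_or_missing(v) for v in values]
print([None if v is None else round(v, 2) for v in logged])
# [-0.94, 1.37, 3.34, None, None]
```

Because LN is increasing, the minimum and maximum of log_cortisol_baseline (-.94 and 3.34) are simply the logs of the raw minimum and maximum.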
EXAMINE VARIABLES=log_cortisol_baseline
  /PLOT BOXPLOT HISTOGRAM NPPLOT
  /COMPARE GROUPS
  /STATISTICS DESCRIPTIVES
  /CINTERVAL 95
  /MISSING LISTWISE
  /NOTOTAL.
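
The log-transformed cortisol_baseline is close to normally distributed. Skewness drops to -.149, and the Shapiro-Wilk test is not significant (W = .994, p = .139), although the Lilliefors-corrected Kolmogorov-Smirnov test is (D = .047, p = .041).
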
Explore

Case Processing Summary
                       Cases
                       Valid           Missing         Total
                       N     Percent   N     Percent   N     Percent
log_cortisol_baseline  390   97.5%     10    2.5%      400   100.0%

Descriptives
                                                Statistic   Std. Error
log_cortisol_baseline
        Mean                                    1.3687      .03226
        95% Confidence Interval   Lower Bound   1.3053
        for Mean                  Upper Bound   1.4322
        5% Trimmed Mean                         1.3758
        Median                                  1.3690
        Variance                                .406
        Std. Deviation                          .63702
        Minimum                                 -.94
        Maximum                                 3.34
        Range                                   4.29
        Interquartile Range                     .80
        Skewness                                -.149       .124
        Kurtosis                                .528        .247

Tests of Normality
                       Kolmogorov-Smirnov(a)        Shapiro-Wilk
                       Statistic   df    Sig.       Statistic   df    Sig.
log_cortisol_baseline  .047        390   .041       .994        390   .139
a. Lilliefors Significance Correction
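The mean and its confidence limits above are on the log scale and are easiest to read after back-transforming with exp(). The exponentiated mean of ln(x) is the geometric mean of cortisol_baseline. The exponentiated limits give a 95% CI for that geometric mean, which is asymmetric on the original scale. The minimal Python sketch below uses the values from the Descriptives table above.

```python
import math

# log_cortisol_baseline: mean and 95% CI bounds from the Descriptives table.
mean_log, ci_low_log, ci_high_log = 1.3687, 1.3053, 1.4322

# exp(mean of ln x) is the geometric mean on the original cortisol scale.
geo_mean = math.exp(mean_log)
ci = (math.exp(ci_low_log), math.exp(ci_high_log))
print(f"geometric mean = {geo_mean:.2f}, 95% CI = [{ci[0]:.2f}, {ci[1]:.2f}]")
# geometric mean = 3.93, 95% CI = [3.69, 4.19]
```

The geometric mean (about 3.93) is below the arithmetic mean of 4.796 reported for the raw variable, as expected for right-skewed data.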

Charts (not reproduced): Histogram of log_cortisol_baseline, Normal Q-Q Plot of log_cortisol_baseline, Detrended Normal Q-Q Plot of log_cortisol_baseline, Boxplot of log_cortisol_baseline