June 1, 2018; version 1.0
README for SAS Code to Calculate GPQI-2016 Total and Component Scores
© 2018 Philip J. Brewster, John F. Hurdle and Patricia M. Guenther
INTRODUCTION
This SAS program can be used to calculate Grocery Purchase Quality Index-2016 (GPQI-2016) total and
component scores from food purchase data (dollars and cents) that have been summarized into the 29
categories of the USDA Food Plans. The code can be adapted to calculate GPQI-2016 scores for data that
use a smaller number of categories (see Table 1 below).
This program calculates GPQI-2016 component and total scores for households, each identified by a unique
identifier (HHNUM) and represented by one row of the data. When multiple time periods of shopping data are available,
it is possible to add secondary identifiers (e.g., week, month) to calculate the GPQI-2016 scores for each
of those periods. In this case, there will be more than one row per household ID.
This code creates unweighted statistics for the GPQI-2016 scores. Consult the SAS documentation for
the PROC SURVEY* options to account for sample weights and complex survey designs in the analysis.
This program was tested using SAS, version 9.4.
Updates to this program will be posted at the Hive: University of Utah Research Data Repository.
For details on the development of the GPQI-2016, read Brewster et al., 2017, Journal of Food
Composition and Analysis 64(1):119-126.
LICENSE
This code is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0
International License. Please view the license at http://creativecommons.org/licenses/by-nc-sa/4.0/.
This code may be used for personal, educational, or research purposes, but may not be used for
commercial purposes without the express consent of the copyright holders. Contact John F. Hurdle
(john.hurdle@utah.edu) regarding commercial use.
REQUIRED DATASETS
A CSV (comma-separated-values) file with the unit of analysis (i.e., the household) labeled as HHNUM in
the first column, followed by a series of columns (an array), labeled c1-c31, that summarize the food
purchases (cost) in dollars and cents (0.00) for the 29 Food Plan categories (c1-c29), for bottled water
(c30), and for purchases that do not belong in any of these categories (e.g., alcohol) (c31).
If a category contains no purchases or is not used in the 15-category version of the category map (see
Table 1 below), that position in the array should contain zeroes (0.00). However, the column array
(c1-c31) must be complete, with all 31 expected columns present; otherwise, the SAS code will need
to be revised to accommodate the different input format.
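For users who want to check an input file before running the SAS program, the required layout can be sketched in Python as follows. This is a minimal sketch, not part of the SAS code: the function name read_household_purchases is hypothetical, and the strict column-order check is an assumption about what a well-formed file looks like, not a description of what PROC IMPORT enforces.

```python
import csv
import io

# Expected layout: HHNUM, then c1-c31 (c1-c29 = Food Plan categories,
# c30 = bottled water, c31 = purchases outside the Food Plans).
EXPECTED_COLUMNS = ["HHNUM"] + [f"c{i}" for i in range(1, 32)]


def read_household_purchases(csv_text):
    """Parse CSV text into a list of (HHNUM, [c1..c31]) tuples.

    Hypothetical helper; raises ValueError if the header is not exactly
    HHNUM followed by c1-c31.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError("expected columns HHNUM, c1-c31 in that order")
    rows = []
    for record in reader:
        # Empty categories must be coded as 0.00, so every cell should parse.
        costs = [float(record[f"c{i}"]) for i in range(1, 32)]
        rows.append((record["HHNUM"], costs))
    return rows
```

A file with a missing column (for example, only c1-c29) fails the header check rather than being silently padded, which mirrors the README's requirement that the c1-c31 array be complete.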
The CSV file’s path and name should be specified in the first FILENAME statement of the SAS code so
that the file can be referenced as input_HH in the code that follows.
OUTPUT DATASET
The output dataset is described in Step 5 below.
PROCESSING
This program carries out 5 steps:
Step 1. Locate the required datasets and variables, and make necessary edits to the datasets.
a. Input the standardized expenditure shares (29 rows) and output them as a one-row array.
This array will be merged (appended) to each row of the input file in the next step.
b. Read in the CSV file described above and referenced by SAS FILENAME input_HH using SAS
PROC IMPORT.
Sum the purchases per food category (c1-c30) to obtain the total spent on foods in the Food
Plans plus bottled water, creating variable fpspend.
The total fpspend is the denominator for calculating the observed expenditure shares for each
household (HHNUM).
The code also sums the purchases that fall outside the Food Plan category schema (fpspend +
c31 = totspend).
Although totspend is not used to score the GPQI-2016, it may be informative to examine the percentage
(share) of expenditures in c31, which the Food Plan model does not account for, or to check whether any
food types falling under c31 should be reclassified (remapped).
SAS PROC UNIVARIATE will display descriptive statistics for each of the food category
expenditures (c1-c31) and for these totals.
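The totals described in Step 1b can be sketched for a single household as below. This is an illustrative Python sketch, not the SAS implementation; the function name expenditure_totals is hypothetical, and returning a c31 share of 0.0 when totspend is zero is an assumption.

```python
def expenditure_totals(costs):
    """Return (fpspend, totspend, c31_share) for one household.

    costs is the 31-element list c1-c31 (index 0 holds c1).
    Hypothetical helper for illustration only.
    """
    if len(costs) != 31:
        raise ValueError("expected 31 category costs (c1-c31)")
    fpspend = sum(costs[:30])       # c1-c30: Food Plan categories plus bottled water
    totspend = fpspend + costs[30]  # add c31, purchases outside the Food Plans
    # Share of spending not accounted for by the Food Plan model
    # (assumption: 0.0 when nothing was purchased at all).
    c31_share = costs[30] / totspend if totspend > 0 else 0.0
    return fpspend, totspend, c31_share
```

For example, a household that spent $1.00 in each of c1-c29, $2.00 on bottled water, and $9.00 on c31 items would have fpspend = 31.00, totspend = 40.00, and a c31 share of 0.225.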
Step 2. Calculate the ratios which are the basis for scoring the GPQI-2016.
For each of the 30 food category expenditure array positions (c1-c30), calculate the observed
expenditure share.
x1-x30 = c1-c30 / fpspend
SAS PROC UNIVARIATE will display descriptive statistics for these observed household expenditure
shares per category.
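The observed-share formula x1-x30 = c1-c30 / fpspend can be sketched in Python as follows. This is a minimal sketch: the function name observed_shares is hypothetical, and rejecting households with zero fpspend is an assumption, since the README does not say how the SAS code handles that case.

```python
def observed_shares(costs):
    """x1-x30 = c1-c30 / fpspend (index 0 holds x1).

    costs is the 31-element list c1-c31; c31 is excluded from fpspend.
    """
    food_plan_costs = costs[:30]
    fpspend = sum(food_plan_costs)
    if fpspend == 0:
        # Assumption: a household with no Food Plan purchases cannot be scored.
        raise ValueError("fpspend is zero; shares are undefined")
    return [c / fpspend for c in food_plan_costs]
```

Because c31 is excluded from the denominator, the 30 shares always sum to 1 for any household that can be scored.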
Obtain the ratio of the observed expenditure share to the standardized expenditure share for
scoring.
r1-r29 = (x1-x29 / y1-y29) * 100, where y1-y29 are the standardized expenditure shares input in Step 1a.
Note: there is no standardized expenditure share or expected value for purchases of bottled water
(x30).
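The per-category ratio can be sketched in Python as below. The standardized shares y1-y29 are defined in the SAS code and are not reproduced in this README, so the sketch takes them as an argument; the function name share_ratios is hypothetical, and the values in any usage example would be placeholders, not the real standards.

```python
def share_ratios(x, y):
    """r1-r29 = (x1-x29 / y1-y29) * 100.

    x: observed shares x1-x30 (x30, bottled water, has no standard and is ignored).
    y: standardized expenditure shares y1-y29, taken from the SAS code.
    """
    if len(y) != 29 or len(x) < 29:
        raise ValueError("expected at least 29 observed and exactly 29 standardized shares")
    return [x[i] / y[i] * 100 for i in range(29)]
```

A category whose observed share is half its standardized share yields r = 50; a value of 100 means the household matched the standard for that category.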
Step 3. Recast the Food Plan food categories to GPQI-2016 food components.
The program uses the mappings described in the table below to group the 29 Food Plan categories
into their corresponding GPQI-2016 components. Also shown is a mapping from 15 food categories,
collapsed from the 29, to the 11 GPQI-2016 components.
Table 1. Map of food categories to GPQI-2016 component scoring numbers
FP    Food Plan category description                      Alt.    GPQI-2016             Score
no.                                                       no.     description           no.
_____________________________________________________________________________________________
1     Whole grain bread, rice, pasta, pastries            1       Whole grains          5
      (including whole grain flours)
2     Whole grain cereals (including hot cereal mixes)    1       Whole grains          5
3     Popcorn and other whole-grain snacks                1       Whole grains          5
4     Non-whole grain bread, cereal, rice, pasta, pies,   4       Refined grains        9
      pastries, snacks, flours
5     All potato products                                 9       Other vegetables      1
6     Dark green vegetables                               6       Greens and beans      2
7     Orange vegetables                                   9       Other vegetables      1
8     Canned and dry beans, lentils, legumes              6       Greens and beans      2
9     Other vegetables                                    9       Other vegetables      1
10    Whole fruits                                        10      Whole fruit           4
11    Fruit juices                                        11      100% fruit juices     3
12    Whole milk, yogurt, and cream                       12      Regular-fat dairy     6
13    Lower-fat and skim milk and yogurt                  13      Lower-fat dairy       6
14    All cheese (including cheese soup and sauce)        12      Regular-fat dairy     6
15    Milk drinks and milk desserts                       25      Sweets and sodas      10
16    Beef, pork, veal, and lamb                          16      Meat, poultry, eggs   7
17    Chicken and turkey                                  16      Meat, poultry, eggs   7
18    Fish and fish products                              18      Seafood and nuts      8
19    Bacon, sausages, lunch meats (including spreads)    19      Processed meats       11
20    Nuts, nut butters, and seeds                        18      Seafood and nuts      8
21    Eggs and egg mixtures                               16      Meat, poultry, eggs   7
22    Table fats, oils, and salad dressing                29      Unused                99
23    Gravies, sauces, condiments, and spices             29      Unused                99
24    Coffee and tea                                      29      Unused                99
25    Soft drinks, sodas, fruit drinks, and ades          25      Sweets and sodas      10
26    Sugars, sweets, and candies                         25      Sweets and sodas      10
27    Soups, ready to serve and condensed                 29      Unused                99
28    Soups, dry                                          29      Unused                99
29    Frozen and refrigerated entrees and other mixed     29      Unused                99
      foods
30    Bottled water                                       30      Unused                99
31    Other foods not in Food Plans (e.g., alcohol,       31      Unused                99
      baby foods)
FP no. = Food Plan category number. Alt. no. = alternate mapping number when 15 collapsed food
categories are used instead of 29. Score no. = GPQI-2016 scoring number (score1-score11 in the SAS
code); 99 = unused.
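The scoring-number column of Table 1 can be expressed as a lookup table. The Python sketch below only sums observed shares per scoring number (score1-score11); how the SAS code then combines those groups into components such as Total Fruits or Total Vegetables is not described in this README, so that step is left out. The names SCORE_NUMBER and shares_by_score_number are hypothetical.

```python
# Food Plan category (1-29) -> GPQI-2016 scoring number (score1-score11),
# transcribed from Table 1. Categories scored as 99 (unused) are omitted,
# as are c30 (bottled water) and c31 (other foods).
SCORE_NUMBER = {
    1: 5, 2: 5, 3: 5, 4: 9, 5: 1, 6: 2, 7: 1, 8: 2, 9: 1,
    10: 4, 11: 3, 12: 6, 13: 6, 14: 6, 15: 10, 16: 7, 17: 7,
    18: 8, 19: 11, 20: 8, 21: 7, 25: 10, 26: 10,
}


def shares_by_score_number(x):
    """Sum observed shares x1-x29 into groups 1-11 (index 0 holds x1)."""
    totals = {n: 0.0 for n in range(1, 12)}
    for category, score_number in SCORE_NUMBER.items():
        totals[score_number] += x[category - 1]
    return totals
```

Keeping the mapping as data rather than as branching logic makes it straightforward to swap in the 15-category alternate mapping from the same table.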
Step 4. Run the GPQI-2016 scoring steps which calculate the GPQI-2016 total and component scores.
The scoring standards for the index were created by summing the standardized expenditure shares
for the Food Plan categories that contribute to each of the 11 components. Similarly, the observed
household expenditure shares per Food Plan category are summed. To obtain a score for each of the
11 GPQI-2016 components, the aggregate observed food category expenditure share is first divided
by the corresponding aggregate standardized expenditure share.
For the adequacy components (listed in Table 2 below), multiplying the ratio by the maximum points
for the component results in the component score. For ratios greater than 1.0, the score is
capped at the maximum number of points. For the Dairy component, the two Food
Plan categories with regular fat content (or the one in the 15-category schema) are constrained so
that expenditures from these categories cannot contribute more than their per-category share.
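The adequacy rule (ratio times maximum points, capped at the maximum) can be sketched in Python as below. This sketch does not model the special Dairy constraint on the regular-fat categories, because the README does not give the exact formula; the function name adequacy_score is hypothetical.

```python
def adequacy_score(observed_share, standard_share, max_points):
    """Adequacy component score: MaxPoints * ratio, capped at MaxPoints.

    observed_share: summed observed expenditure share for the component.
    standard_share: summed standardized expenditure share for the component.
    The Dairy regular-fat constraint is not modeled here.
    """
    ratio = observed_share / standard_share
    return max_points * min(1.0, ratio)
```

For a 10-point component, spending half the standardized share earns 5 points, and spending three times the standard still earns only 10.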
For the moderation components (listed in Table 2 below), a ratio of 1.0 or less is assigned the maximum number
of points. To establish the cut-point (i.e., the upper-bound ratio value >1.0) for assigning the minimum
score of zero, the authors used estimates from the USDA/ERS National Household Food Acquisition and
Purchase Survey, 2012 (FoodAPS). That survey collected information on all foods purchased or
otherwise acquired during a 7-day period by 4,826 households in the 48 contiguous states. The data
sets include the prices paid for foods at the item level. Then the relevant categories were combined
into the moderation categories for scoring (i.e., Refined Grains, Processed Meats, and Sweets and
Sodas). The authors computed the base ratio values for food items purchased at stores in each of
the moderation categories and then estimated the population’s 85th percentile. These values were
4.9 for Refined Grains, 18.9 for Processed Meats, and 14.9 for Sweets and Sodas. Ratios at the 85th
percentile or higher are assigned a score of zero. Ratios from 1.0 to the value at the 85th percentile
are assigned scores linearly descending from the maximum points (MaxPoints) to zero. Expressed
mathematically,
Score = MaxPoints*(max(0.0,min(1.0,1.0-((ratio-1.0)/(85pctl-1.0))))).
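The moderation formula above translates directly into code. The Python sketch below uses the 85th-percentile values reported in this section; the names P85 and moderation_score are hypothetical.

```python
# 85th-percentile ratios reported above for the moderation components.
P85 = {"Refined Grains": 4.9, "Processed Meats": 18.9, "Sweets and Sodas": 14.9}


def moderation_score(ratio, p85, max_points):
    """Score = MaxPoints * max(0, min(1, 1 - (ratio - 1) / (p85 - 1)))."""
    return max_points * max(0.0, min(1.0, 1.0 - (ratio - 1.0) / (p85 - 1.0)))
```

The inner min caps ratios at or below 1.0 at full credit, and the outer max floors ratios at or above the 85th percentile at zero. For Refined Grains (10 points), a ratio of 2.95 lies halfway between 1.0 and 4.9 and so earns about 5 points.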
Finally, the total GPQI-2016 score is the sum of the 11 component scores. A maximum of 75 points is
possible. The GPQI-2016 components and their maximum scores are listed in Table 2. Refer to the
SAS code for the standardized expenditure shares that correspond to the point system.
Table 2. GPQI-2016 components and maximum scores
Component               Max. score
__________________________________
Adequacy:
  Total Fruits                    5
  Whole Fruits                    5
  Total Vegetables                5
  Greens and Beans                5
  Whole Grains                   10
  Dairy                          10
  Total Protein Foods             5
  Seafood and Nuts                5
Moderation:
  Processed Meats                 5
  Refined Grains                 10
  Sweets and Sodas               10
Total score                      75
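The total score is the sum of the 11 component scores, bounded by the maxima in Table 2. The Python sketch below adds a range check on each component as a safeguard; that check and the names MAX_POINTS and total_score are assumptions for illustration, not features documented for the SAS code.

```python
# Maximum points per GPQI-2016 component, from Table 2.
MAX_POINTS = {
    "Total Fruits": 5, "Whole Fruits": 5, "Total Vegetables": 5,
    "Greens and Beans": 5, "Whole Grains": 10, "Dairy": 10,
    "Total Protein Foods": 5, "Seafood and Nuts": 5,
    "Processed Meats": 5, "Refined Grains": 10, "Sweets and Sodas": 10,
}


def total_score(component_scores):
    """Sum the 11 component scores after checking each against its maximum."""
    if set(component_scores) != set(MAX_POINTS):
        raise ValueError("expected exactly the 11 GPQI-2016 components")
    for name, score in component_scores.items():
        if not 0 <= score <= MAX_POINTS[name]:
            raise ValueError(f"{name} score {score} is outside 0-{MAX_POINTS[name]}")
    return sum(component_scores.values())
```

A household receiving full credit on every component scores 75, the maximum stated above.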
Step 5. Display and save the results.
a. The SAS program saves one GPQI-2016 total score and one set of component scores for each
household row (HHNUM) and exports them to a CSV file referenced by SAS FILENAME scores. (A local
file name and path must be specified by the user at the top of the program.)
b. Finally, the SAS code calculates the unweighted mean and median GPQI-2016 scores across all
households using PROC SURVEYMEANS. The saved results are exported to an Excel file referenced by
SAS FILENAME stats. (Again, a local file name and path must be specified by the user at the top of
the program.)
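For readers reproducing Step 5 outside SAS, the per-household export and the unweighted summary can be sketched with the Python standard library. This is not equivalent to PROC SURVEYMEANS: it computes a simple unweighted mean and median and writes CSV text rather than an Excel file. The column names (total, score1-score11) and the function names are hypothetical and may differ from the SAS output.

```python
import csv
import io
import statistics


def write_scores_csv(scores_by_household):
    """Return CSV text with one row per HHNUM: the total score, then score1-score11.

    scores_by_household maps HHNUM -> {"total": ..., "score1": ..., ..., "score11": ...}.
    Column names are hypothetical.
    """
    fieldnames = ["HHNUM", "total"] + [f"score{i}" for i in range(1, 12)]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    for hhnum, scores in scores_by_household.items():
        writer.writerow({"HHNUM": hhnum, **scores})
    return buffer.getvalue()


def summarize_totals(scores_by_household):
    """Unweighted mean and median of the total scores across households."""
    totals = [s["total"] for s in scores_by_household.values()]
    return statistics.mean(totals), statistics.median(totals)
```

If sample weights or a complex survey design apply, the weighted estimates should come from the PROC SURVEY* procedures mentioned in the introduction rather than from this sketch.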
Please send any comments or questions regarding this code to phil.brewster@utah.edu.