June 1, 2018; version 1.0

README for SAS Code to Calculate GPQI-2016 Total and Component Scores
© 2018 Philip J. Brewster, John F. Hurdle and Patricia M. Guenther

INTRODUCTION

This SAS program calculates Grocery Purchase Quality Index-2016 (GPQI-2016) total and component scores from food purchase data (dollars and cents) that have been summarized into the 29 categories of the USDA Food Plans. The code can be adapted to calculate GPQI-2016 scores for data that use a smaller number of categories (see Table 1 below).

The program calculates GPQI-2016 component and total scores for households, using a unique identifier (HHNUM) that represents one row of the data. When multiple time periods of shopping data are available, secondary identifiers (e.g., week, month) can be added to calculate the GPQI-2016 scores for each of those periods. In that case, there will be more than one row per household ID.

This code creates unweighted statistics for the GPQI-2016 scores. Consult the SAS documentation for the PROC SURVEY* options to account for sample weights and complex survey designs in the analysis.

This program was tested using SAS, version 9.4. Updates to this program will be posted at the Hive: University of Utah Research Data Repository.

For details on the development of the GPQI-2016, read Brewster et al., 2017, Journal of Food Composition and Analysis 64(1):119-126.

LICENSE

This code is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Please view the license at http://creativecommons.org/licenses/by-nc-sa/4.0/.

This code may be used for personal, educational, or research purposes, but may not be used for commercial purposes without the express consent of the copyright holders. Contact John F. Hurdle (john.hurdle@utah.edu) regarding commercial use.

REQUIRED DATASETS

A CSV (comma-separated values) file with the unit of analysis (i.e., the household) labeled as HHNUM in the first column, followed by a series of columns (an array), labeled c1-c31, that summarize the food purchases (cost) in dollars and cents (0.00): c1-c29 for the 29 Food Plan categories, c30 for bottled water, and c31 for purchases that do not belong in any of these categories (e.g., alcohol). If a category contains no purchases or is not used in the 15-category version of the category map (see Table 1 below), that position in the array should contain zeroes (0.00). The column array (c1-c31) must nevertheless be complete, with all 31 columns present; otherwise the SAS code will need to be revised to accommodate a different input format.

The CSV file's path should be specified in the first FILENAME statement of the SAS code so that the code that follows can refer to it (as input_HH).

OUTPUT DATASET

The output dataset is described in Step 5 below.

PROCESSING

This program carries out 5 steps:

Step 1. Locate the required datasets and variables, and make necessary edits to the datasets.

a. Input the standardized expenditure shares (29 rows) and output them as a 1-row array. This array will be merged (appended) to each row of the input file in the next step.

b. Read in the CSV file described above, referenced by SAS FILENAME input_HH, using SAS PROC IMPORT. Sum the purchases per food category (c1-c30) to obtain the total spent on foods in the Food Plans plus bottled water, creating the variable fpspend. The total fpspend is the denominator for calculating the observed expenditure shares for each household (HHNUM). The code also adds the purchases that fall outside the Food Plan category schema (c31) to obtain total spending: totspend = fpspend + c31.
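The two totals described in Step 1b can be illustrated outside SAS. The following is a minimal Python sketch, not part of the SAS program: it assumes the CSV layout described under REQUIRED DATASETS (HHNUM followed by c1-c31), and the function name household_totals and the sample household are made up for illustration.

```python
import csv
import io

N_COLUMNS = 31     # c1-c31, as described under REQUIRED DATASETS
N_FOOD_PLAN = 30   # c1-c30 make up the fpspend denominator


def household_totals(csv_text):
    """Return {HHNUM: (fpspend, totspend)} for each row of the CSV text.

    Hypothetical helper for illustration; the SAS program does this
    with PROC IMPORT and a data step.
    """
    totals = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        costs = [float(row[f"c{i}"]) for i in range(1, N_COLUMNS + 1)]
        fpspend = sum(costs[:N_FOOD_PLAN])          # c1-c30
        totspend = fpspend + costs[N_COLUMNS - 1]   # fpspend + c31
        totals[row["HHNUM"]] = (fpspend, totspend)
    return totals


# Made-up household: $10.00 in c1, $2.50 in c30, $4.00 in c31, zeroes elsewhere.
header = "HHNUM," + ",".join(f"c{i}" for i in range(1, N_COLUMNS + 1))
values = ["0.00"] * N_COLUMNS
values[0], values[29], values[30] = "10.00", "2.50", "4.00"
sample_csv = header + "\n" + "H001," + ",".join(values) + "\n"

totals = household_totals(sample_csv)
print(totals)
```

For the made-up household, fpspend covers c1 and c30 only, and totspend additionally includes the c31 purchases.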
Although c31 is not used for scoring the GPQI-2016, it may be informative to examine the percent (share) of expenditures that fall in c31 and are therefore not accounted for in the Food Plan model, or to check whether any food types falling under c31 might require reclassification (remapping). SAS PROC UNIVARIATE will display descriptive statistics for each of the food category expenditures (c1-c31) and for these totals.

Step 2. Calculate the ratios which are the basis for scoring the GPQI-2016.

For all 30 food category expenditure array positions used in scoring, calculate the observed expenditure share:

   x1-x30 = c1-c30 / fpspend

SAS PROC UNIVARIATE will display descriptive statistics for these observed household expenditure shares per category.

Obtain the ratio of the observed expenditure share to the standardized expenditure share for scoring, where y1-y29 are the standardized expenditure shares input in Step 1a:

   r1-r29 = (x1-x29 / y1-y29) * 100

Note: there is no standardized expenditure share or expected value for purchases of bottled water (x30).

Step 3. Recast the Food Plan food categories to GPQI-2016 food components.

The program uses the mappings described in the table below to group the 29 Food Plan categories into their corresponding GPQI-2016 components. Also shown is a mapping from 15 food categories, collapsed from the 29, to the 11 GPQI-2016 components.

Table 1.
Map of food categories to GPQI-2016 component scoring numbers

Food Plan Food Plan category                        Alternate mapping GPQI-2016             GPQI-2016 scoring
category  description                               number, using 15  description           number
number                                              collapsed food                          (score1-score11
                                                    categories                              in the SAS code)
                                                    instead of 29
_____________________________________________________________________________________________________________
1         Whole grain bread, rice, pasta, pastries  1                 Whole grains          5
          (including whole grain flours)
2         Whole grain cereals (including hot        1                 Whole grains          5
          cereal mixes)
3         Popcorn and other whole-grain snacks      1                 Whole grains          5
4         Non-whole grain bread, cereal, rice,      4                 Refined grains        9
          pasta, pies, pastries, snacks, flours
5         All potato products                       9                 Other vegetables      1
6         Dark green vegetables                     6                 Greens and beans      2
7         Orange vegetables                         9                 Other vegetables      1
8         Canned and dry beans, lentils, legumes    6                 Greens and beans      2
9         Other vegetables                          9                 Other vegetables      1
10        Whole fruits                              10                Whole fruit           4
11        Fruit juices                              11                100% fruit juices     3
12        Whole milk, yogurt, and cream             12                Regular-fat dairy     6
13        Lower-fat and skim milk and yogurt        13                Lower-fat dairy       6
14        All cheese (including cheese soup and     12                Regular-fat dairy     6
          sauce)
15        Milk drinks and milk desserts             25                Sweets and sodas      10
16        Beef, pork, veal, and lamb                16                Meat, poultry, eggs   7
17        Chicken and turkey                        16                Meat, poultry, eggs   7
18        Fish and fish products                    18                Seafood and nuts      8
19        Bacon, sausages, lunch meats (including   19                Processed meats       11
          spreads)
20        Nuts, nut butters, and seeds              18                Seafood and nuts      8
21        Eggs and egg mixtures                     16                Meat, poultry, eggs   7
22        Table fats, oils, and salad dressing      29                Unused                99
23        Gravies, sauces, condiments, and spices   29                Unused                99
24        Coffee and tea                            29                Unused                99
25        Soft drinks, sodas, fruit drinks, and     25                Sweets and sodas      10
          ades
26        Sugars, sweets, and candies               25                Sweets and sodas      10
27        Soups, ready to serve and condensed       29                Unused                99
28        Soups, dry                                29                Unused                99
29        Frozen and refrigerated entrees and       29                Unused                99
          other mixed foods
30        Bottled water                             30                Unused                99
31        Other foods not in Food Plans (e.g.,      31                Unused                99
          alcohol, baby foods)

Step 4. Run the GPQI-2016 scoring steps which calculate the GPQI-2016 total and component scores.

The scoring standards for the index were created by summing the standardized expenditure shares for the Food Plan categories that contribute to each of the 11 components. Similarly, the observed household expenditure shares per Food Plan category are summed. To obtain a score for each of the 11 GPQI-2016 components, the aggregate observed food category expenditure share is first divided by the aggregate standardized expenditure share.

For the adequacy components (listed in Table 2 below), multiplying the ratio by the maximum points for the component gives the component score. For ratios greater than 1.0, the score is capped at the maximum number of points. For the Dairy component, the two Food Plan categories with regular fat content (or the one in the 15-category schema) are constrained so that expenditures from these categories cannot contribute more than their per-category share.

For the moderation components (listed in Table 2 below), a ratio of 1.0 is assigned the maximum number of points. To establish the cut-point (i.e., the upper-bound ratio value >1.0) for assigning the minimum score of zero, the authors used estimates from the USDA/ERS National Household Food Acquisition and Purchase Survey, 2012 (FoodAPS). That survey collected information on all foods purchased or otherwise acquired during a 7-day period by 4,826 households in the 48 contiguous states. The data sets include the prices paid for foods at the item level. The relevant categories were then combined into the moderation categories for scoring (i.e., Refined Grains, Processed Meats, and Sweets and Sodas). The authors computed the base ratio values for food items purchased at stores in each of the moderation categories and then estimated the population's 85th percentile.
These values were 4.9 for Refined Grains, 18.9 for Processed Meats, and 14.9 for Sweets and Sodas. Ratios at the 85th percentile or higher are assigned a score of zero. Ratios from 1.0 to the value at the 85th percentile are assigned scores that descend linearly from the maximum points (MaxPoints) to zero. Expressed mathematically, where 85pctl is the 85th-percentile ratio for the component:

   Score = MaxPoints * max(0.0, min(1.0, 1.0 - (ratio - 1.0) / (85pctl - 1.0)))

Finally, the total GPQI-2016 score is the sum of the 11 component scores. A maximum of 75 points is possible. The GPQI-2016 components and their maximum scores are listed in Table 2. Refer to the SAS code for the standardized expenditure shares that correspond to the point system.

Table 2. GPQI-2016 components and maximum scores

Component                  Max. score
_____________________________________
Adequacy:
   Total Fruits                 5
   Whole Fruits                 5
   Total Vegetables             5
   Greens and Beans             5
   Whole Grains                10
   Dairy                       10
   Total Protein Foods          5
   Seafood and Nuts             5
Moderation:
   Processed Meats              5
   Refined Grains              10
   Sweets and Sodas            10
Total score                    75

Step 5. Display and save the results.

a. The SAS program saves one GPQI-2016 total score and one set of component scores for each household row (HHNUM) and exports them to a CSV file referenced by SAS FILENAME scores. (A local file name and path must be specified by the user at the top of the program.)

b. Finally, the SAS code calculates an unweighted set of mean and median GPQI-2016 scores for all households using PROC SURVEYMEANS. The saved results are exported to an Excel file referenced by SAS FILENAME stats. (Again, a local file name and path must be specified by the user at the top of the program.)

Please send any comments or questions regarding this code to phil.brewster@utah.edu.
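
As an illustration of the scoring rules in Step 4, the following minimal Python sketch applies the adequacy cap and the moderation formula to a single component ratio. It is not the SAS program: the function names are hypothetical, the ratio is assumed to be on the scale where 1.0 means the observed share equals the standardized share, and the Dairy constraint on regular-fat categories is deliberately left out. The cut-points are the 85th-percentile values quoted in Step 4.

```python
# 85th-percentile ratio cut-points quoted in Step 4 (estimated from FoodAPS).
CUT_POINTS = {
    "Refined Grains": 4.9,
    "Processed Meats": 18.9,
    "Sweets and Sodas": 14.9,
}


def adequacy_score(ratio, max_points):
    """Adequacy: score rises with the ratio and is capped at max_points.

    Hypothetical helper; ignores the Dairy regular-fat constraint.
    """
    return max_points * min(1.0, ratio)


def moderation_score(ratio, max_points, pctl85):
    """Moderation: full points at ratio <= 1.0, zero at ratio >= pctl85,
    descending linearly in between (the formula given in Step 4)."""
    return max_points * max(0.0, min(1.0, 1.0 - (ratio - 1.0) / (pctl85 - 1.0)))


# Made-up ratios for illustration only.
print(adequacy_score(0.5, 5))                                    # half the standard
print(adequacy_score(2.0, 10))                                   # above the standard
print(moderation_score(1.0, 10, CUT_POINTS["Refined Grains"]))   # at the standard
print(moderation_score(30.0, 5, CUT_POINTS["Processed Meats"]))  # beyond the cut-point
```

Under these assumptions, the total score would be the sum of the 11 component scores, each computed with the maximum points listed in Table 2.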