
Combining SAS Data Sets: Methods

Definition

Concatenating data sets is the combining of two or more data sets, one after the other, into a single data set. The number of observations in the new data set is the sum of the number of observations in the original data sets. The order of observations is sequential. All observations from the first data set are followed by all observations from the second data set, and so on.

In the simplest case, all input data sets contain the same variables. If the input data sets contain different variables, observations from one data set have missing values for variables defined only in other data sets. In either case, the variables in the new data set are the same as the variables in the old data sets.

Syntax

Use this form of the SET statement to concatenate data sets:

SET data-set(s);

where

data-set: specifies any valid SAS data set name.

For a complete description of the SET statement, see SAS Language Reference: Dictionary.

DATA Step Processing During Concatenation

Compilation phase: SAS reads the descriptor information of each data set that is named in the SET statement and then creates a program data vector that contains all the variables from all data sets as well as variables created by the DATA step.
Execution -- Step 1: SAS reads the first observation from the first data set into the program data vector. It processes the first observation and executes other statements in the DATA step. It then writes the contents of the program data vector to the new data set. The SET statement does not reset the values in the program data vector to missing, except for variables whose value is calculated or assigned during the DATA step.
Execution -- Step 2: SAS continues to read one observation at a time from the first data set until it finds an end-of-file indicator. The values of the variables in the program data vector are then set to missing, and SAS begins reading observations from the second data set, and so forth, until it reads all observations from all data sets.
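The read order described above can be observed directly. The following is a minimal sketch, assuming two small data sets in the WORK library that mimic the examples in this section. The variable Source and the IN= names FromAnimal and FromPlant are made up for illustration. It uses the IN= data set option to record which input data set contributed each observation.

```sas
/* Illustrative input data; names and values are assumptions for this sketch. */
data work.animal;
   input Common $ Animal $ Number;
   datalines;
a Ant 5
b Bird .
;
run;

data work.plant;
   input Common $ Plant $ Number;
   datalines;
g Grape 69
h Hazelnut 55
;
run;

/* Concatenate: all ANIMAL observations are read first, then all PLANT observations. */
data work.concatenation;
   length Source $ 6;
   set work.animal(in=FromAnimal) work.plant(in=FromPlant);
   /* IN= variables are temporary, so copy the information into a real variable. */
   if FromAnimal then Source = 'ANIMAL';
   else Source = 'PLANT';
run;

proc print data=work.concatenation;
run;
```

In the printed result, the observations marked ANIMAL should all appear before those marked PLANT, and Plant should be missing on the ANIMAL rows (and Animal on the PLANT rows).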

Example 1: Concatenation Using the DATA Step

In this example, each data set contains the variables Common and Number, and the observations are arranged in the order of the values of Common. Generally, you concatenate SAS data sets that have the same variables. In this case, each data set also contains a unique variable to show the effects of combining data sets more clearly. The following shows the ANIMAL and the PLANT input data sets in the library that is referenced by the libref EXAMPLE:

           ANIMAL                            PLANT

OBS  Common  Animal  Number       OBS  Common  Plant     Number

 1     a     Ant       5           1     g     Grape       69
 2     b     Bird                  2     h     Hazelnut    55
 3     c     Cat      17           3     i     Indigo
 4     d     Dog       9           4     j     Jicama      14
 5     e     Eagle                 5     k     Kale         5
 6     f     Frog     76           6     l     Lentil      77

The following program uses a SET statement to concatenate the data sets and then prints the results:

libname example 'SAS-data-library';

data example.concatenation;
   set example.animal example.plant;
run;

proc print data=example.concatenation;
   var Common Animal Plant Number;
   title 'Data Set CONCATENATION';
run;

Concatenated Data Sets (DATA Step)

            Data Set CONCATENATION                            1

        Obs    Common    Animal    Plant       Number

          1      a       Ant                      5
          2      b       Bird                     .
          3      c       Cat                     17
          4      d       Dog                      9
          5      e       Eagle                    .
          6      f       Frog                    76
          7      g                 Grape         69
          8      h                 Hazelnut      55
          9      i                 Indigo         .
         10      j                 Jicama        14
         11      k                 Kale           5
         12      l                 Lentil        77

The resulting data set CONCATENATION has 12 observations, which is the sum of the observations from the combined data sets. The program data vector contains all variables from all data sets. The values of variables found in one data set but not in another are set to missing.

Example 2: Concatenation Using SQL

You can also use the SQL language to concatenate tables. In this example, SQL reads each row in both tables and creates a new table named COMBINED. The following shows the YEAR1 and YEAR2 input tables:

       YEAR1                 YEAR2

       Date1                 Date2

        1996
        1997                  1997
        1998                  1998
        1999                  1999
                              2000
                              2001

The following SQL code creates and prints the table COMBINED.

proc sql;
   title 'SQL Table COMBINED';
   create table combined as
      select * from year1
      outer union corr
      select * from year2;
   select * from combined;
quit;

Concatenated Tables (SQL)

            SQL Table COMBINED                              1

                        Year
                    --------
                        1996
                        1997
                        1998
                        1999
                        1997
                        1998
                        1999
                        2000
                        2001

Appending Files

Instead of concatenating data sets or tables, you can append them and produce the same results as concatenation. SAS concatenates data sets (DATA step) and tables (SQL) by reading each row of data to create a new file. To avoid reading all the records, you can append the second file to the first file by using the APPEND procedure:

proc append base=year1 data=year2;
run;

The YEAR1 file will contain all rows from both tables.

Note: You cannot use PROC APPEND to add observations to a SAS data set in a sequential library.

Efficiency

If no additional processing is necessary, using PROC APPEND or the APPEND statement in PROC DATASETS is more efficient than using a DATA step to concatenate data sets.
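As an illustration of the second option, here is a minimal sketch of the APPEND statement in PROC DATASETS. The two WORK tables and their values are assumptions made up for this example, not the YEAR1 and YEAR2 tables shown earlier.

```sas
/* Illustrative tables in the WORK library (assumed data). */
data work.year1;
   input Year;
   datalines;
1996
1997
;
run;

data work.year2;
   input Year;
   datalines;
1998
1999
;
run;

/* Append YEAR2 to YEAR1 in place; the rows already in YEAR1 are not reread. */
proc datasets library=work nolist;
   append base=year1 data=year2;
quit;

proc print data=work.year1;
run;
```

After the step runs, WORK.YEAR1 should hold the rows from both tables, while WORK.YEAR2 is left unchanged.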

Definition

Interleaving uses a SET statement and a BY statement to combine multiple data sets into one new data set. The number of observations in the new data set is the sum of the number of observations from the original data sets. However, the observations in the new data set are arranged by the values of the BY variable or variables and, within each BY group, by the order of the data sets in which they occur. You can interleave data sets either by using a BY variable or by using an index.

Syntax

Use this form of the SET statement to interleave data sets when you use a BY variable:

SET data-set(s);
BY variable(s);

where

data-set: specifies a one-level name, a two-level name, or one of the special SAS data set names.
variable: specifies each variable by which the data set is sorted. These variables are referred to as BY variables for the current DATA or PROC step.

Use this form of the SET statement to interleave data sets when you use an index:

SET data-set-1 . . . data-set-n KEY= index;

where

data-set: specifies a one-level name, a two-level name, or one of the special SAS data set names.
index: provides nonsequential access to observations in a SAS data set, which are based on the value of an index variable or key.

For a complete description of the SET statement, including SET with the KEY= option, see SAS Language Reference: Dictionary.

Sort Requirements

Before you can interleave data sets, the observations must be sorted or grouped by the same variable or variables that you use in the BY statement, or you must have an appropriate index for the data sets.
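A minimal sketch of both ways to meet this requirement follows, assuming two unsorted WORK data sets whose names and values are invented for illustration. ANIMAL is sorted with PROC SORT, while PLANT is left unsorted and given a simple index on Common through PROC DATASETS.

```sas
/* Illustrative, deliberately unsorted input data (assumed values). */
data work.animal;
   input Common $ Animal $;
   datalines;
c Cat
a Ant
b Bird
;
run;

data work.plant;
   input Common $ Plant $;
   datalines;
b Banana
c Coconut
a Apple
;
run;

/* Option 1: sort the data set by the BY variable. */
proc sort data=work.animal;
   by Common;
run;

/* Option 2: leave the data set unsorted and create a simple index on the BY variable. */
proc datasets library=work nolist;
   modify plant;
   index create Common;
quit;

/* Both inputs now satisfy the requirement, so they can be interleaved. */
data work.interleaving;
   set work.animal work.plant;
   by Common;
run;
```

Sorting rewrites the data set once, while an index adds maintenance overhead but leaves the physical order untouched. Which one is preferable depends on how the data set is used elsewhere.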

DATA Step Processing During Interleaving

Compilation phase

SAS reads the descriptor information of each data set that is named in the SET statement and then creates a program data vector that contains all the variables from all data sets as well as variables created by the DATA step.
SAS creates the FIRST.variable and LAST.variable for each variable listed in the BY statement.

Execution -- Step 1

SAS compares the first observation from each data set that is named in the SET statement to determine which BY group should appear first in the new data set. It reads all observations from the first BY group from the selected data set. If this BY group appears in more than one data set, it reads from the data sets in the order in which they appear in the SET statement. The values of the variables in the program data vector are set to missing each time SAS starts to read a new data set and when the BY group changes.

Execution -- Step 2

SAS compares the next observations from each data set to determine the next BY group and then starts reading observations from the selected data set in the SET statement that contains observations for this BY group. SAS continues until it has read all observations from all data sets.
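The FIRST.variable and LAST.variable values mentioned in the compilation phase are temporary and are not written to the output data set. The sketch below, which assumes two small WORK data sets invented for illustration, copies them into ordinary variables (the names FirstInGroup and LastInGroup are made up here) so that the BY-group boundaries become visible.

```sas
/* Illustrative input data, already in order of Common (assumed values). */
data work.animal1;
   input Common $ Animal1 $;
   datalines;
a Ant
a Ape
b Bird
;
run;

data work.plant1;
   input Common $ Plant1 $;
   datalines;
a Apple
b Banana
;
run;

data work.interleaving_flags;
   set work.animal1 work.plant1;
   by Common;
   /* Copy the temporary BY-group flags into variables that are kept. */
   FirstInGroup = first.Common;
   LastInGroup  = last.Common;
run;

proc print data=work.interleaving_flags;
run;
```

With this input, FirstInGroup should be 1 on the Ant and Bird rows, and LastInGroup should be 1 on the Apple and Banana rows, because within each BY group the observations from ANIMAL1 are read before those from PLANT1.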

Example 1: Interleaving in the Simplest Case

In this example, each data set contains the BY variable Common, and the observations are arranged in order of the values of the BY variable. The following shows the ANIMAL and the PLANT input data sets in the library that is referenced by the libref EXAMPLE:

        ANIMAL                         PLANT

   OBS  Common  Animal         OBS  Common  Plant

    1     a     Ant             1     a     Apple
    2     b     Bird            2     b     Banana
    3     c     Cat             3     c     Coconut
    4     d     Dog             4     d     Dewberry
    5     e     Eagle           5     e     Eggplant
    6     f     Frog            6     f     Fig

The following program uses SET and BY statements to interleave the data sets, and prints the results:

data example.interleaving;
   set example.animal example.plant;
   by Common;
run;

proc print data=example.interleaving;
   title 'Data Set INTERLEAVING';
run;

Interleaved Data Sets

            Data Set INTERLEAVING                             1

          Obs    Common    Animal    Plant

            1      a       Ant
            2      a                 Apple
            3      b       Bird
            4      b                 Banana
            5      c       Cat
            6      c                 Coconut
            7      d       Dog
            8      d                 Dewberry
            9      e       Eagle
           10      e                 Eggplant
           11      f       Frog
           12      f                 Fig

The resulting data set INTERLEAVING has 12 observations, which is the sum of the observations from the combined data sets. The new data set contains all variables from both data sets. The values of variables found in one data set but not in the other are set to missing, and the observations are arranged by the values of the BY variable.

Example 2: Interleaving with Duplicate Values of the BY Variable

If the data sets contain duplicate values of the BY variables, the observations are written to the new data set in the order in which they occur in the original data sets. This example contains duplicate values of the BY variable Common. The following shows the ANIMAL1 and PLANT1 input data sets:

        ANIMAL1                      PLANT1

   OBS  Common  Animal1         OBS  Common  Plant1

    1     a      Ant             1     a     Apple
    2     a      Ape             2     b     Banana
    3     b      Bird            3     c     Coconut
    4     c      Cat             4     c     Celery
    5     d      Dog             5     d     Dewberry
    6     e      Eagle           6     e     Eggplant

The following program uses SET and BY statements to interleave the data sets, and prints the results:

data example.interleaving2;
   set example.animal1 example.plant1;
   by Common;
run;

proc print data=example.interleaving2;
   title 'Data Set INTERLEAVING2: Duplicate BY Values';
run;

Interleaved Data Sets with Duplicate Values of the BY Variable

            Data Set INTERLEAVING2: Duplicate BY Values                1

          Obs    Common    Animal1    Plant1

            1      a        Ant
            2      a        Ape
            3      a                  Apple
            4      b        Bird
            5      b                  Banana
            6      c        Cat
            7      c                  Coconut
            8      c                  Celery
            9      d        Dog
           10      d                  Dewberry
           11      e        Eagle
           12      e                  Eggplant

The number of observations in the new data set is the sum of the observations in all the data sets. The observations are written to the new data set in the order in which they occur in the original data sets.

Example 3: Interleaving with Different BY Values in Each Data Set

The data sets ANIMAL2 and PLANT2 both contain BY values that are present in one data set but not in the other. The following shows the ANIMAL2 and the PLANT2 input data sets:

        ANIMAL2                        PLANT2

   OBS  Common  Animal2          OBS  Common  Plant2

    1     a      Ant              1     a     Apple
    2     c      Cat              2     b     Banana
    3     d      Dog              3     c     Coconut
    4     e      Eagle            4     e     Eggplant
                                  5     f     Fig

This program uses SET and BY statements to interleave these data sets, and prints the results:

data example.interleaving3;
   set example.animal2 example.plant2;
   by Common;
run;

proc print data=example.interleaving3;
   title 'Data Set INTERLEAVING3: Different BY Values';
run;

Interleaving Data Sets with Different BY Values

            Data Set INTERLEAVING3: Different BY Values                  1

          Obs    Common    Animal2    Plant2

           1       a        Ant
           2       a                  Apple
           3       b                  Banana
           4       c        Cat
           5       c                  Coconut
           6       d        Dog
           7       e        Eagle
           8       e                  Eggplant
           9       f                  Fig

The resulting data set has nine observations arranged by the values of the BY variable.

Comments and Comparisons

In other languages, the term merge is often used to mean interleave. SAS reserves the term merge for the operation in which observations from two or more data sets are combined into one observation. The observations in interleaved data sets are not combined; they are copied from the original data sets in the order of the values of the BY variable.
If one table has multiple rows with the same BY value, the DATA step preserves the order of those rows in the result.
To use the DATA step, the input tables must be appropriately sorted or indexed. SQL does not require the input tables to be in order.
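To make the last comparison concrete, the following sketch shows one way an SQL query might produce an interleaved-style result from unsorted input. It assumes two small WORK tables invented for illustration and combines OUTER UNION CORR with an ORDER BY clause; this is an illustrative equivalent, not a form given in the text above.

```sas
/* Illustrative, deliberately unsorted input tables (assumed values). */
data work.animal;
   input Common $ Animal $;
   datalines;
c Cat
a Ant
b Bird
;
run;

data work.plant;
   input Common $ Plant $;
   datalines;
b Banana
a Apple
c Coconut
;
run;

/* No prior sort or index is needed; ORDER BY arranges the combined rows. */
proc sql;
   create table work.interleaved_sql as
      select * from work.animal
      outer union corr
      select * from work.plant
      order by Common;
quit;

proc print data=work.interleaved_sql;
run;
```

Unlike the DATA step, this query does not promise any particular order among rows that share the same value of Common, so the ANIMAL row is not guaranteed to precede the PLANT row within a group.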

Definition

One-to-one reading combines observations from two or more data sets into one observation by using two or more SET statements to read observations independently from each data set. This process is also called one-to-one matching. The new data set contains all the variables from all the input data sets. The number of observations in the new data set is the number of observations in the smallest original data set. If the data sets contain common variables, the values that are read in from the last data set replace the values that were read in from earlier data sets.

Syntax

Use this form of the SET statement for one-to-one reading:

SET data-set-1;
SET data-set-2;

where

data-set-1: specifies a one-level name, a two-level name, or one of the special SAS data set names. data-set-1 is the first file that the DATA step reads.
data-set-2: specifies a one-level name, a two-level name, or one of the special SAS data set names. data-set-2 is the second file that the DATA step reads.

CAUTION: Use care when you combine data sets with multiple SET statements. Using multiple SET statements to combine observations can produce undesirable results. Test your program on representative samples of the data sets before using this method to combine them.

For a complete description of the SET statement, see SAS Language Reference: Dictionary.

DATA Step Processing During a One-to-One Reading

Compilation phase: SAS reads the descriptor information of each data set named in the SET statement and then creates a program data vector that contains all the variables from all data sets as well as variables created by the DATA step.
Execution -- Step 1: When SAS executes the first SET statement, SAS reads the first observation from the first data set into the program data vector. The second SET statement reads the first observation from the second data set into the program data vector. If both data sets contain the same variables, the values from the second data set replace the values from the first data set, even if the value is missing. After reading the first observation from the last data set and executing any other statements in the DATA step, SAS writes the contents of the program data vector to the new data set. The SET statement does not reset the values in the program data vector to missing, except for those variables that were created or assigned values during the DATA step.
Execution -- Step 2: SAS continues reading from one data set and then the other until it detects an end-of-file indicator in one of the data sets. SAS stops processing with the last observation of the shortest data set and does not read the remaining observations from the longer data set.

Example 1: One-to-One Reading: Processing an Equal Number of Observations

The SAS data sets ANIMAL and PLANT both contain the variable Common, and are arranged by the values of that variable. The following shows the ANIMAL and the PLANT input data sets:

        ANIMAL                    PLANT

OBS  Common  Animal       OBS  Common  Plant

 1     a     Ant           1     a     Apple
 2     b     Bird          2     b     Banana
 3     c     Cat           3     c     Coconut
 4     d     Dog           4     d     Dewberry
 5     e     Eagle         5     e     Eggplant
 6     f     Frog          6     g     Fig

The following program uses two SET statements to combine observations from ANIMAL and PLANT, and prints the results:

data twosets;
   set animal;
   set plant;
run;

proc print data=twosets;
   title 'Data Set TWOSETS - Equal Number of Observations';
run;

Data Set Created from Two Data Sets That Have Equal Observations

            Data Set TWOSETS - Equal Number of Observations                1

          Obs    Common    Animal    Plant

           1       a       Ant       Apple
           2       b       Bird      Banana
           3       c       Cat       Coconut
           4       d       Dog       Dewberry
           5       e       Eagle     Eggplant
           6       g       Frog      Fig

Each observation in the new data set contains all the variables from all the data sets. Note, however, that the Common variable value in observation 6 contains a "g." The value of Common in observation 6 of the ANIMAL data set was overwritten by the value in PLANT, which was the data set that SAS read last.

Comments and Comparisons

The results that are obtained by reading observations using two or more SET statements are similar to those that are obtained by using the MERGE statement with no BY statement. However, with one-to-one reading, SAS stops processing before all observations are read from all data sets if the number of observations in the data sets is not equal.
Using multiple SET statements with other DATA step statements makes the following applications possible:
- merging one observation with many
- conditionally merging observations
- reading from the same data set twice.
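The first application in the list, merging one observation with many, can be sketched as follows. This is only an illustration under assumed data: the WORK data sets, the summary variable TotalNumber, and the derived variable Share are all invented for the example.

```sas
/* Illustrative detail data (assumed values). */
data work.animal;
   input Common $ Animal $ Number;
   datalines;
a Ant 5
c Cat 17
f Frog 76
;
run;

/* Build a one-observation data set that holds the total of Number. */
proc means data=work.animal noprint;
   var Number;
   output out=work.total(keep=TotalNumber) sum=TotalNumber;
run;

data work.with_total;
   /* Read the single summary observation only on the first iteration.
      Its value stays in the program data vector because variables read
      by SET are not reset to missing between iterations. */
   if _n_ = 1 then set work.total;
   /* Read the detail observations one at a time. */
   set work.animal;
   Share = Number / TotalNumber;
run;

proc print data=work.with_total;
run;
```

The condition on _N_ matters here. Without it, the first SET statement would reach the end of the one-observation data set on the second iteration and stop the DATA step after a single output observation.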

Definition

One-to-one merging combines observations from two or more SAS data sets into a single observation in a new data set. To perform a one-to-one merge, use the MERGE statement without a BY statement. SAS combines the first observation from all data sets in the MERGE statement into the first observation in the new data set, the second observation from all data sets into the second observation in the new data set, and so on. In a one-to-one merge, the number of observations in the new data set equals the number of observations in the largest data set that was named in the MERGE statement.

If you use the MERGENOBY= SAS system option, you can control whether SAS issues a message when MERGE processing occurs without an associated BY statement.
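A short sketch of how the option might be used, assuming two small WORK data sets invented for illustration: setting MERGENOBY=WARN asks SAS to write a warning to the log when a MERGE statement has no BY statement.

```sas
/* Illustrative input data (assumed values). */
data work.animal;
   input Common $ Animal $;
   datalines;
a Ant
b Bird
;
run;

data work.plant;
   input Common $ Plant $;
   datalines;
a Apple
b Banana
;
run;

/* Request a log warning for any MERGE that has no BY statement. */
options mergenoby=warn;

data work.combined;
   merge work.animal work.plant;
run;

/* Return to the setting that issues no message. */
options mergenoby=nowarn;
```

A stricter setting, MERGENOBY=ERROR, treats such a merge as an error, which can be useful when a missing BY statement is more likely a mistake than an intentional one-to-one merge.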

Syntax

Use this form of the MERGE statement to merge SAS data sets:

MERGE data-set(s);

where

data-set: names at least two existing SAS data sets.

CAUTION: Avoid using duplicate values or different values of common variables. One-to-one merging with data sets that contain duplicate values of common variables can produce undesirable results. If a variable exists in more than one data set, the value from the last data set that is read is the one that is written to the new data set. The variables are combined exactly as they are read from each data set. Using a one-to-one merge to combine data sets with different values of common variables can also produce undesirable results. If a variable exists in more than one data set, the value from the last data set read is the one that is written to the new data set even if the value is missing. Once SAS has processed all observations in a data set, all subsequent observations in the new data set have missing values for the variables that are unique to that data set.

For a complete description of the MERGE statement, see SAS Language Reference: Dictionary.

DATA Step Processing During One-to-One Merging

Compilation phase: SAS reads the descriptor information of each data set that is named in the MERGE statement and then creates a program data vector that contains all the variables from all data sets as well as variables created by the DATA step.
Execution -- Step 1: SAS reads the first observation from each data set into the program data vector, reading the data sets in the order in which they appear in the MERGE statement. If two data sets contain the same variables, the values from the second data set replace the values from the first data set. After reading the first observation from the last data set and executing any other statements in the DATA step, SAS writes the contents of the program data vector to the new data set. Only those variables that are created or assigned values during the DATA step are set to missing.
Execution -- Step 2: SAS continues until it has read all observations from all data sets.

Example 1: One-to-One Merging with an Equal Number of Observations

The SAS data sets ANIMAL and PLANT both contain the variable Common, and the observations are arranged by the values of Common. The following shows the ANIMAL and the PLANT input data sets:

        ANIMAL                    PLANT

OBS  Common  Animal       OBS   Common  Plant

 1     a     Ant           1      a     Apple
 2     b     Bird          2      b     Banana
 3     c     Cat           3      c     Coconut
 4     d     Dog           4      d     Dewberry
 5     e     Eagle         5      e     Eggplant
 6     f     Frog          6      g     Fig

The following program merges these data sets and prints the results:

data combined;
   merge animal plant;
run;

proc print data=combined;
   title 'Data Set COMBINED';
run;

Merged Data Sets That Have an Equal Number of Observations

            Data Set COMBINED                               1

          Obs    Common    Animal    Plant

           1       a       Ant       Apple
           2       b       Bird      Banana
           3       c       Cat       Coconut
           4       d       Dog       Dewberry
           5       e       Eagle     Eggplant
           6       g       Frog      Fig

Each observation in the new data set contains all variables from all data sets. If two data sets contain the same variables, the values from the second data set replace the values from the first data set, as shown in observation 6.

Example 2: One-to-One Merging with an Unequal Number of Observations

The SAS data sets ANIMAL1 and PLANT1 both contain the variable Common, and the observations are arranged by the values of Common. The PLANT1 data set has fewer observations than the ANIMAL1 data set. The following shows the ANIMAL1 and the PLANT1 input data sets:

        ANIMAL1                    PLANT1

OBS  Common  Animal       OBS   Common  Plant

 1     a     Ant           1      a     Apple
 2     b     Bird          2      b     Banana
 3     c     Cat           3      c     Coconut
 4     d     Dog
 5     e     Eagle
 6     f     Frog

The following program merges these unequal data sets and prints the results:

data combined1;
   merge animal1 plant1;
run;

proc print data=combined1;
   title 'Data Set COMBINED1';
run;

Merged Data Sets That Have an Unequal Number of Observations

            Data Set COMBINED1                              1

          Obs    Common    Animal    Plant

           1       a       Ant       Apple
           2       b       Bird      Banana
           3       c       Cat       Coconut
           4       d       Dog
           5       e       Eagle
           6       f       Frog

Note that observations 4 through 6 contain missing values for the variable Plant.

Example 3: One-to-One Merging with Duplicate Values of Common Variables

The following example shows the undesirable results that you can obtain by using one-to-one merging with data sets that contain duplicate values of common variables. The value from the last data set that is read is the one that is written to the new data set. The variables are combined exactly as they are read from each data set. In the following example, the data sets ANIMAL1 and PLANT1 contain the variable Common, and each data set contains observations with duplicate values of Common. The following shows the ANIMAL1 and the PLANT1 input data sets:

        ANIMAL1                    PLANT1

OBS  Common  Animal       OBS   Common  Plant

 1     a     Ant           1      a     Apple
 2     a     Ape           2      b     Banana
 3     b     Bird          3      c     Coconut
 4     c     Cat           4      c     Celery
 5     d     Dog           5      d     Dewberry
 6     e     Eagle         6      e     Eggplant

The following program produces the data set MERGE1 and prints the results:

/* This program illustrates undesirable results. */
data merge1;
   merge animal1 plant1;
run;

proc print data=merge1;
   title 'Data Set MERGE1';
run;

Undesirable Results with Duplicate Values of Common Variables

            Data Set MERGE1                                1

          Obs    Common    Animal1    Plant1

           1       a       Ant        Apple
           2       b       Ape        Banana
           3       c       Bird       Coconut
           4       c       Cat        Celery
           5       d       Dog        Dewberry
           6       e       Eagle      Eggplant

The number of observations in the new data set is six. Note that observations 2 and 3 contain undesirable values. SAS reads the second observation from data set ANIMAL1. It then reads the second observation from data set PLANT1 and replaces the values for the variables Common and Plant1. The third observation is created in the same way.

Example 4: One-to-One Merging with Different Values of Common Variables

The following example shows the undesirable results obtained from using the one-to-one merge to combine data sets with different values of common variables. If a variable exists in more than one data set, the value from the last data set that is read is the one that is written to the new data set even if the value is missing. Once SAS processes all observations in a data set, all subsequent observations in the new data set have missing values for the variables that are unique to that data set. In this example, the data sets ANIMAL2 and PLANT2 have different values of the Common variable. The following shows the ANIMAL2 and the PLANT2 input data sets:

        ANIMAL2                    PLANT2

OBS  Common  Animal       OBS   Common  Plant

 1     a     Ant           1      a     Apple
 2     c     Cat           2      b     Banana
 3     d     Dog           3      c     Coconut
 4     e     Eagle         4      e     Eggplant
                           5      f     Fig

The following program produces the data set MERGE2 and prints the results:

/* This program illustrates undesirable results. */
data merge2;
   merge animal2 plant2;
run;

proc print data=merge2;
   title 'Data Set MERGE2';
run;

Undesirable Results with Different Values of Common Variables

            Data Set MERGE2                                1

          Obs    Common    Animal2    Plant2

           1       a        Ant       Apple
           2       b        Cat       Banana
           3       c        Dog       Coconut
           4       e        Eagle     Eggplant
           5       f                  Fig

Comments and Comparisons

The results from a one-to-one merge are similar to the results obtained from using two or more SET statements to combine observations. However, with the one-to-one merge, SAS continues processing all observations in all data sets that were named in the MERGE statement.

Definition

Match-merging combines observations from two or more SAS data sets into a single observation in a new data set according to the values of a common variable. The number of observations in the new data set is the sum of the largest number of observations in each BY group in all data sets. To perform a match-merge, use the MERGE statement with a BY statement. Before you can perform a match-merge, all data sets must be sorted by the variables that you specify in the BY statement or they must have an index.

Syntax

Use this form of the MERGE statement to match-merge data sets:

MERGE data-set(s);
BY variable(s);

where

data-set: names at least two existing SAS data sets from which observations are read.
variable: names each variable by which the data set is sorted or indexed. These variables are referred to as BY variables.

For a complete description of the MERGE and the BY statements, see SAS Language Reference: Dictionary.
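The sort requirement stated in the definition can be sketched as follows. This is a minimal illustration, not part of the original examples: the data sets UNSORTED_ANIMAL and UNSORTED_PLANT, their values, and the output names are hypothetical. PROC SORT with the OUT= option writes sorted copies, which are then match-merged.

```sas
/* Hypothetical unsorted inputs, for illustration only */
data unsorted_animal;
   input Common $ Animal $;
   datalines;
c Cat
a Ant
b Bird
;

data unsorted_plant;
   input Common $ Plant $;
   datalines;
b Banana
c Coconut
a Apple
;

/* Sort each input by the BY variable; OUT= leaves the originals unchanged */
proc sort data=unsorted_animal out=animal_sorted;
   by Common;
run;

proc sort data=unsorted_plant out=plant_sorted;
   by Common;
run;

/* Match-merge the sorted copies on the BY variable */
data combined_sorted;
   merge animal_sorted plant_sorted;
   by Common;
run;
```

If the MERGE step were run on the unsorted inputs instead, SAS would normally stop with an error stating that the BY variables are not properly sorted.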

DATA Step Processing During Match-Merging

Compilation stage: SAS reads the descriptor information of each data set that is named in the MERGE statement and then creates a program data vector that contains all the variables from all data sets as well as variables created by the DATA step. SAS creates the FIRST.variable and LAST.variable for each variable that is listed in the BY statement.
Execution - Step 1: SAS looks at the first BY group in each data set that is named in the MERGE statement to determine which BY group should appear first in the new data set. The DATA step reads into the program data vector the first observation in that BY group from each data set, reading the data sets in the order in which they appear in the MERGE statement. If a data set does not have observations in that BY group, the program data vector contains missing values for the variables unique to that data set.
Execution - Step 2: After processing the first observation from the last data set and executing other statements, SAS writes the contents of the program data vector to the new data set. SAS retains the values of all variables in the program data vector except those variables that were created by the DATA step; SAS sets those values to missing. SAS continues to merge observations until it writes all observations from the first BY group to the new data set. When SAS has read all observations in a BY group from all data sets, it sets all variables in the program data vector to missing. SAS looks at the next BY group in each data set to determine which BY group should appear next in the new data set.
Execution - Step 3: SAS repeats these steps until it reads all observations from all BY groups in all data sets.
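The FIRST.variable and LAST.variable values created at compilation are temporary and are not written to the output data set. The following sketch copies them into ordinary variables so that BY-group boundaries can be inspected. The data sets DEMO_ANIMAL and DEMO_PLANT and the variable names FirstInGroup and LastInGroup are hypothetical and chosen only for illustration.

```sas
/* Hypothetical inputs, already in order of the BY variable */
data demo_animal;
   input Common $ Animal $;
   datalines;
a Ant
a Ape
b Bird
;

data demo_plant;
   input Common $ Plant $;
   datalines;
a Apple
b Banana
;

data by_flags;
   merge demo_animal demo_plant;
   by Common;
   /* FIRST.Common and LAST.Common are temporary; copy them to keep them */
   FirstInGroup = first.Common;
   LastInGroup  = last.Common;
run;

proc print data=by_flags;
   title 'BY-Group Flags During a Match-Merge';
run;
```

Each flag is 1 on the first (or last) observation of a BY group and 0 otherwise, which makes it easy to see where SAS resets the program data vector between groups.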

Example 1: Combining Observations Based on a Criterion

The SAS data sets ANIMAL and PLANT each contain the BY variable Common, and the observations are arranged in order of the values of the BY variable. The following shows the ANIMAL and the PLANT input data sets:

        ANIMAL                         PLANT

 OBS  Common  Animal          OBS  Common  Plant

  1     a     Ant              1     a     Apple
  2     b     Bird             2     b     Banana
  3     c     Cat              3     c     Coconut
  4     d     Dog              4     d     Dewberry
  5     e     Eagle            5     e     Eggplant
  6     f     Frog             6     f     Fig

The following program merges the data sets according to the values of the BY variable Common, and prints the results:

data combined;
   merge animal plant;
   by Common;
run;

proc print data=combined;
   title 'Data Set COMBINED';
run;

Data Sets Combined by Match-Merging

            Data Set COMBINED                               1

          Obs    Common    Animal    Plant

           1       a       Ant       Apple
           2       b       Bird      Banana
           3       c       Cat       Coconut
           4       d       Dog       Dewberry
           5       e       Eagle     Eggplant
           6       f       Frog      Fig

Each observation in the new data set contains all the variables from all the data sets.

Example 2: Match-Merge with Duplicate Values of the BY Variable

When SAS reads the last observation from a BY group in one data set, SAS retains its values in the program data vector for all variables that are unique to that data set until all observations for that BY group have been read from all data sets. In the following example, the data sets ANIMAL1 and PLANT1 contain duplicate values of the BY variable Common. The following shows the ANIMAL1 and the PLANT1 input data sets:

        ANIMAL1                        PLANT1

 OBS  Common  Animal1         OBS  Common  Plant1

  1     a     Ant              1     a     Apple
  2     a     Ape              2     b     Banana
  3     b     Bird             3     c     Coconut
  4     c     Cat              4     c     Celery
  5     d     Dog              5     d     Dewberry
  6     e     Eagle            6     e     Eggplant

The following program produces the merged data set MATCH1, and prints the results:

data match1;
   merge animal1 plant1;
   by Common;
run;

proc print data=match1;
   title 'Data Set MATCH1';
run;

Match-Merged Data Set with Duplicate BY Values

            Data Set MATCH1                                1

          Obs    Common    Animal1    Plant1

           1       a        Ant       Apple
           2       a        Ape       Apple
           3       b        Bird      Banana
           4       c        Cat       Coconut
           5       c        Cat       Celery
           6       d        Dog       Dewberry
           7       e        Eagle     Eggplant

In observation 2 of the output, the value of the variable Plant1 is retained until all observations in the BY group are written to the new data set. Match-merging also produced duplicate values in ANIMAL1 for observations 4 and 5.

Example 3: Match-Merge with Nonmatched Observations

When SAS performs a match-merge with nonmatched observations in the input data sets, SAS retains the values of all variables in the program data vector even if the value is missing. The data sets ANIMAL2 and PLANT2 do not contain all values of the BY variable Common. The following shows the ANIMAL2 and the PLANT2 input data sets:

        ANIMAL2                        PLANT2

 OBS  Common  Animal2         OBS  Common  Plant2

  1     a     Ant              1     a     Apple
  2     c     Cat              2     b     Banana
  3     d     Dog              3     c     Coconut
  4     e     Eagle            4     e     Eggplant
                               5     f     Fig

The following program produces the merged data set MATCH2, and prints the results:

data match2;
   merge animal2 plant2;
   by Common;
run;

proc print data=match2;
   title 'Data Set MATCH2';
run;

Match-Merged Data Set with Nonmatched Observations

            Data Set MATCH2                                1

          Obs    Common    Animal2    Plant2

           1       a        Ant       Apple
           2       b                  Banana
           3       c        Cat       Coconut
           4       d        Dog
           5       e        Eagle     Eggplant
           6       f                  Fig

As the output shows, all values of the variable Common are represented in the new data set, including missing values for the variables that are in one data set but not in the other.
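When only the matched BY values are wanted, one common approach is the IN= data set option, which is not used in the examples of this section. The sketch below re-creates ANIMAL2 and PLANT2 from the listing in this example so that it is self-contained; the flag names inAnimal and inPlant and the output name MATCHED_ONLY are hypothetical.

```sas
/* Re-create the two input data sets shown in this example */
data animal2;
   input Common $ Animal2 $;
   datalines;
a Ant
c Cat
d Dog
e Eagle
;

data plant2;
   input Common $ Plant2 $;
   datalines;
a Apple
b Banana
c Coconut
e Eggplant
f Fig
;

data matched_only;
   /* IN= creates a temporary 0/1 flag for each contributing data set */
   merge animal2(in=inAnimal) plant2(in=inPlant);
   by Common;
   /* Subsetting IF: keep only BY values present in both inputs */
   if inAnimal and inPlant;
run;
```

With these inputs, only the BY values a, c, and e appear in both data sets, so the nonmatched observations for b, d, and f are dropped.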

Updating with the UPDATE and the MODIFY Statements

Definitions

Updating a data set refers to the process of applying changes to a master data set. To update data sets, you work with two input data sets. The data set containing the original information is the master data set, and the data set containing the new information is the transaction data set.

You can update data sets by using the UPDATE statement or the MODIFY statement:

UPDATE	uses observations from the transaction data set to change the values of corresponding observations from the master data set. You must use a BY statement with the UPDATE statement because all observations in the transaction data set are keyed to observations in the master data set according to the values of the BY variable.
MODIFY	can replace, delete, and append observations in an existing data set. Using the MODIFY statement can save disk space because it modifies data in place, without creating a copy of the data set.

The number of observations in the new data set is the sum of the number of observations in the master data set and the number of unmatched observations in the transaction data set.

For complete information about the UPDATE and the MODIFY statements, see "Statements" in SAS Language Reference: Dictionary.

Syntax of the UPDATE Statement

Use this form of the UPDATE statement to update a master data set:

UPDATE master-data-set transaction-data-set;
BY variable-list;

where

master-data-set: names the SAS data set that is used as the master file.
transaction-data-set: names the SAS data set that contains the changes to be applied to the master data set.
variable-list: specifies the variables by which observations are matched.

If the transaction data set contains duplicate values of the BY variable, SAS applies both transactions to the observation. The last values that are copied into the program data vector are written to the new data set. If your data is in this form, use the MODIFY statement instead of the UPDATE statement to process your data.

CAUTION: Values of the BY variable must be unique for each observation in the master data set. If the master data set contains two observations with the same value of the BY variable, the first observation is updated and the second observation is ignored. SAS writes a warning message to the log when the DATA step executes.

For complete information about the UPDATE statement, see SAS Language Reference: Dictionary.

Syntax of the MODIFY Statement

This form of the MODIFY statement is used in the examples that follow:

MODIFY master-data-set;
BY variable-list;

where

master-data-set: specifies the SAS data set that you want to modify.
variable-list: names each variable by which the data set is ordered.

Note: The MODIFY statement does not support changing the descriptor portion of a SAS data set, such as adding a variable.

For complete information about the MODIFY statement, see SAS Language Reference: Dictionary.

DATA Step Processing with the UPDATE Statement

Compilation stage

SAS reads the descriptor information of each data set that is named in the UPDATE statement and creates a program data vector that contains all the variables from all data sets as well as variables created by the DATA step.
SAS creates the FIRST.variable and LAST.variable for each variable that is listed in the BY statement.

Execution - Step 1

SAS looks at the first observation in each data set that is named in the UPDATE statement to determine which BY group should appear first. If the transaction BY value precedes the master BY value, SAS reads from the transaction data set only and sets the variables from the master data set to missing. If the master BY value precedes the transaction BY value, SAS reads from the master data set only and sets the unique variables from the transaction data set to missing. If the BY values in the master and transaction data sets are equal, it applies the first transaction by copying the nonmissing values into the program data vector.

Execution - Step 2

After completing the first transaction, SAS looks at the next observation in the transaction data set. If SAS finds one with the same BY value, it applies that transaction too. The first observation then contains the new values from both transactions. If no other transactions exist for that observation, SAS writes the observation to the new data set and sets the values in the program data vector to missing. SAS repeats these steps until it has read all observations from all BY groups in both data sets.

Updating with Nonmatched Observations, Missing Values, and New Variables

In the UPDATE statement, if an observation in the master data set does not have a corresponding observation in the transaction data set, SAS writes the observation to the new data set without modifying it. Any observation from the transaction data set that does not correspond to an observation in the master data set is written to the program data vector and becomes the basis for an observation in the new data set. The data in the program data vector can be modified by other transactions before it is written to the new data set. If a master data set observation does not need updating, the corresponding observation can be omitted from the transaction data set.

SAS does not replace existing values in the master data set with missing values if those values are coded as periods (for numeric variables) or blanks (for character variables) in the transaction data set. To replace existing values with missing values, you must either create a transaction data set in which missing values are coded with the special missing value characters, or use the UPDATEMODE=NOMISSINGCHECK statement option.

With UPDATE, the transaction data set can contain new variables to be added to all observations in the master data set.

To view a sample program, see Example 3: Using UPDATE for Processing Nonmatched Observations, Missing Values, and New Variables.

Sort Requirements for the UPDATE Statement

If you do not use an index, both the master data set and the transaction data set must be sorted by the same variable or variables that you specify in the BY statement that accompanies the UPDATE statement. The values of the BY variable should be unique for each observation in the master data set. If you use more than one BY variable, the combination of values of all BY variables should be unique for each observation in the master data set. The BY variable or variables should be ones that you never need to update.

Note: The MODIFY statement does not require sorted files. However, sorting the data improves efficiency.
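Because UPDATE expects unique BY values in the master data set, it can help to check for duplicates before updating. The following is a minimal sketch of one way to do that with FIRST.variable and LAST.variable; the data set names MASTER_CHECK and DUP_KEYS and the values are hypothetical.

```sas
/* Hypothetical master data set with one duplicated BY value (b) */
data master_check;
   input Common $ Animal1 $ Plant1 $;
   datalines;
a Ant Apple
b Bird Banana
b Bird Banana
c Cat Coconut
;

/* Sort by the BY variable, as UPDATE requires when no index is used */
proc sort data=master_check;
   by Common;
run;

/* Keep every observation whose BY value occurs more than once */
data dup_keys;
   set master_check;
   by Common;
   if not (first.Common and last.Common);
run;
```

An empty DUP_KEYS data set indicates that every BY value in the master data set is unique. Here it contains the two observations with Common equal to b.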

Using an Index with the MODIFY Statement

The MODIFY statement maintains the index. You do not have to rebuild the index as you do for the UPDATE statement.

Choosing between UPDATE or MODIFY with BY

Using the UPDATE statement is comparable to using MODIFY with BY to apply transactions to a data set. While MODIFY is a more powerful tool with several other applications, UPDATE is still the tool of choice in some cases. The following table helps you choose whether to use UPDATE or MODIFY with BY.

***MODIFY with BY versus UPDATE***
Issue	MODIFY with BY	UPDATE
Disk space	saves disk space because it updates data in place	requires more disk space because it produces an updated copy of the data set
Sort and index	sorted input data sets are not required, although for good performance, it is strongly recommended that both data sets be sorted and that the master data set be indexed	requires only that both data sets be sorted
When to use	use only when you expect to process a SMALL portion of the data set	use if you expect to need to process most of the data set
Where to specify the modified data set	specify the updated data set in both the DATA and the MODIFY statements	specify the updated data set in the DATA and the UPDATE statements
Duplicate BY-values	allows duplicate BY-values in both the master and the transaction data sets	allows duplicate BY-values in the transaction data set only (If duplicates exist in the master data set, SAS issues a warning.)
Scope of changes	cannot change the data set descriptor information, so changes such as adding or deleting variables, variable labels, and so on, are not valid	can make changes that require a change in the descriptor portion of a data set, such as adding new variables, and so on
Error checking	has error-checking capabilities using the _IORC_ automatic variable and the SYSRC autocall macro	needs no error checking because transactions without a corresponding master record are not applied but are added to the data set
Data set integrity	data may only be partially updated due to an abnormal task termination	no data loss occurs because UPDATE works on a copy of the data

For more information about tools for combining SAS data sets, see Statements or Procedures for Combining SAS Data Sets.

Primary Uses of the MODIFY Statement

The MODIFY statement has three primary uses:

modifying observations in a single SAS data set.
modifying observations in a single SAS data set directly, either by observation number or by values in an index.
modifying observations in a master data set, based on values in a transaction data set. MODIFY with BY is similar to using the UPDATE statement.

Several of the examples that follow demonstrate these uses.
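As a rough sketch of the first two uses, the following steps modify a data set in place and then locate observations through an index with the KEY= option. The data sets STOCK and RECEIVED and the variables PartNo, InStock, and Added are hypothetical and chosen only for illustration. The third use, MODIFY with BY, appears in the examples of this section.

```sas
/* Hypothetical master and transaction data sets */
data stock;
   input PartNo $ InStock;
   datalines;
P1 10
P2 20
P3 30
;

data received;
   input PartNo $ Added;
   datalines;
P2 5
P3 1
;

/* Use 1: modify every observation of a single data set in place. */
/* With no explicit OUTPUT, REPLACE, or REMOVE, the default is REPLACE. */
data stock;
   modify stock;
   InStock = InStock + 100;
run;

/* Use 2: locate observations directly by values in an index. */
/* First create a simple index on PartNo. */
proc datasets library=work nolist;
   modify stock;
   index create PartNo;
quit;

data stock;
   set received;
   modify stock key=PartNo;
   if _iorc_ = 0 then
      do;
         InStock = InStock + Added;
         replace;
      end;
   else _error_ = 0;   /* no matching PartNo: skip this transaction */
run;
```

The variable Added comes from RECEIVED and is not written to STOCK, because MODIFY cannot change the descriptor portion of the data set.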

Example 1: Using UPDATE for Basic Updating

In this example, the data set MASTER contains original values of the variables Animal and Plant. The data set NEWPLANT is a transaction data set with new values of the variable Plant. The following shows the MASTER and the NEWPLANT input data sets:

        MASTER                           NEWPLANT

 OBS Common Animal Plant           OBS Common Plant

  1    a    Ant    Apple            1    a    Apricot
  2    b    Bird   Banana           2    b    Barley
  3    c    Cat    Coconut          3    c    Cactus
  4    d    Dog    Dewberry         4    d    Date
  5    e    Eagle  Eggplant         5    e    Escarole
  6    f    Frog   Fig              6    f    Fennel

The following program updates MASTER with the transactions in the data set NEWPLANT, writes the results to UPDATE_FILE, and prints the results:

data update_file;
   update master newplant;
   by common;
run;

proc print data=update_file;
   title 'Data Set Update_File';
run;

Master Data Set Updated by Transaction Data Set

            Data Set Update_File                             1

          Obs    Common    Animal    Plant

           1       a       Ant       Apricot
           2       b       Bird      Barley
           3       c       Cat       Cactus
           4       d       Dog       Date
           5       e       Eagle     Escarole
           6       f       Frog      Fennel

Each observation in the new data set contains a new value for the variable Plant.

Example 2: Using UPDATE with Duplicate Values of the BY Variable

If the master data set contains two observations with the same value of the BY variable, the first observation is updated and the second observation is ignored. SAS writes a warning message to the log. If the transaction data set contains duplicate values of the BY variable, SAS applies both transactions to the observation. The last values copied into the program data vector are written to the new data set. The following shows the MASTER1 and the DUPPLANT input data sets.

        MASTER1                           DUPPLANT

 OBS Common Animal1 Plant1           OBS Common Plant1

  1    a    Ant     Apple             1    a    Apricot
  2    b    Bird    Banana            2    b    Barley
  3    b    Bird    Banana            3    c    Cactus
  4    c    Cat     Coconut           4    d    Date
  5    d    Dog     Dewberry          5    d    Dill
  6    e    Eagle   Eggplant          6    e    Escarole
  7    f    Frog    Fig               7    f    Fennel

The following program applies the transactions in DUPPLANT to MASTER1 and prints the results:

data update1;
   update master1 dupplant;
   by Common;
run;

proc print data=update1;
   title 'Data Set Update1';
run;

Updating Data Sets with Duplicate BY Values

            Data Set Update1                               1

          Obs    Common    Animal1    Plant1

           1       a        Ant       Apricot
           2       b        Bird      Barley
           3       b        Bird      Banana
           4       c        Cat       Cactus
           5       d        Dog       Dill
           6       e        Eagle     Escarole
           7       f        Frog      Fennel

When this DATA step executes, SAS generates a warning message stating that there is more than one observation for a BY group. However, the DATA step continues to process, and the data set UPDATE1 is created.

The resulting data set has seven observations. Observations 2 and 3 have duplicate values of the BY variable Common. However, the value of the variable PLANT1 was not updated in the second occurrence of the duplicate BY value.

Example 3: Using UPDATE for Processing Nonmatched Observations, Missing Values, and New Variables

In this example, the data set MASTER2 is a master data set. It contains a missing value for the variable Plant2 in the first observation, and not all of the values of the BY variable Common are included. The transaction data set NONPLANT contains a new variable Mineral, a new value of the BY variable Common, and missing values for several observations. The following shows the MASTER2 and the NONPLANT input data sets:

        MASTER2                             NONPLANT

 OBS  Common  Animal2  Plant2        OBS  Common  Plant2   Mineral

  1     a     Ant                     1     a     Apricot  Amethyst
  2     c     Cat      Coconut        2     b     Barley   Beryl
  3     d     Dog      Dewberry       3     c     Cactus
  4     e     Eagle    Eggplant       4     e
  5     f     Frog     Fig            5     f     Fennel
                                      6     g     Grape    Garnet

The following program updates the data set MASTER2 and prints the results:

data update2_file;
   update master2 nonplant;
   by Common;
run;

proc print data=update2_file;
   title 'Data Set Update2_File';
run;

Results of Updating with New Variables, Nonmatched Observations, and Missing Values

            Data Set Update2_File                             1

     Obs    Common    Animal2    Plant2      Mineral

      1       a        Ant       Apricot     Amethyst
      2       b                  Barley      Beryl
      3       c        Cat       Cactus
      4       d        Dog       Dewberry
      5       e        Eagle     Eggplant
      6       f        Frog      Fennel
      7       g                  Grape       Garnet

As shown, all observations now include values for the variable Mineral. The value of Mineral is set to missing for some observations. Observations 2 and 6 in the transaction data set did not have corresponding observations in MASTER2, and they have become new observations. Observation 3 from the master data set was written to the new data set without change, and the value for Plant2 in observation 4 was not changed to missing. Three observations in the new data set have updated values for the variable Plant2.

The following program uses the UPDATEMODE statement option on the UPDATE statement, and prints the results:

data update2_file;
   update master2 nonplant updatemode=nomissingcheck;
   by Common;
run;

proc print data=update2_file;
   title 'Data Set Update2_File - UPDATEMODE Option';
run;

Results of Updating with the UPDATEMODE Option

            Data Set Update2_File - UPDATEMODE Option                   1

     Obs    Common    Animal2    Plant2      Mineral

      1       a        Ant       Apricot     Amethyst
      2       b                  Barley      Beryl
      3       c        Cat       Cactus
      4       d        Dog       Dewberry
      5       e        Eagle
      6       f        Frog      Fennel
      7       g                  Grape       Garnet

The value of Plant2 in observation 5 is set to missing because the UPDATEMODE=NOMISSINGCHECK option is in effect.

For detailed examples for updating data sets, see Combining and Modifying SAS Data Sets: Examples.

Example 4: Updating a Master Data Set by Adding an Observation

If the transaction data set contains an observation that does not match an observation in the master data set, you must alter the program. The Year value in observation 5 of TRANSACTION has no match in MASTER. The following shows the MASTER and the TRANSACTION input data sets:

        MASTER                       TRANSACTION

 OBS  Year   VarX   VarY       OBS  Year   VarX   VarY

  1   1985    x1     y1         1   1991    x2
  2   1986    x1     y1         2   1992    x2      y2
  3   1987    x1     y1         3   1993    x2
  4   1988    x1     y1         4   1993            y2
  5   1989    x1     y1         5   1995    x2      y2
  6   1990    x1     y1
  7   1991    x1     y1
  8   1992    x1     y1
  9   1993    x1     y1
 10   1994    x1     y1

You must use an explicit OUTPUT statement to write a new observation to a master data set. (The default action for a DATA step using a MODIFY statement is REPLACE, not OUTPUT.) Once you specify an explicit OUTPUT statement, you must also specify a REPLACE statement. The following DATA step updates data set MASTER, based on values in TRANSACTION, and adds a new observation. This program also uses the _IORC_ automatic variable for error checking. (For more information about error checking, see Error Checking When Using Indexes to Randomly Access or Update Data.)

data master;
   modify master transaction;
   by Year;
   if _iorc_=%sysrc(_sok) then replace;
   else if _iorc_=%sysrc(_dsenmr) then
      do;
         output;
         _error_=0;
      end;
   else
      do;
         put "Unexpected error at Observation: " _n_;
         _error_=0;
         stop;
      end;
run;

proc print data=master;
   title 'Updated Master Data Set -- MODIFY';
   title2 'One Observation Added';
run;

Modified Master Data Set

            Updated Master Data Set -- MODIFY                       1
                  One Observation Added

               Obs    Year    VarX    VarY

                 1    1985     x1      y1
                 2    1986     x1      y1
                 3    1987     x1      y1
                 4    1988     x1      y1
                 5    1989     x1      y1
                 6    1990     x1      y1
                 7    1991     x2      y1
                 8    1992     x2      y2
                 9    1993     x2      y2
                10    1994     x1      y1
                11    1995     x2      y2

SAS added a new observation, observation 11, to the MASTER data set and updated observations 7, 8, and 9.
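For comparison, the same transactions can be applied with the UPDATE statement, which adds nonmatched transaction observations without an explicit OUTPUT statement but writes a new data set instead of modifying the master in place. The sketch below is an illustration only: it re-creates the two inputs under the hypothetical names MASTER_COPY and TRANSACTION_COPY and writes the result to the hypothetical data set MASTER_UPDATED. In the DATALINES, a single period stands for a missing value.

```sas
/* Re-create the inputs of this example under hypothetical names */
data master_copy;
   input Year VarX $ VarY $;
   datalines;
1985 x1 y1
1986 x1 y1
1987 x1 y1
1988 x1 y1
1989 x1 y1
1990 x1 y1
1991 x1 y1
1992 x1 y1
1993 x1 y1
1994 x1 y1
;

data transaction_copy;
   input Year VarX $ VarY $;
   datalines;
1991 x2 .
1992 x2 y2
1993 x2 .
1993 . y2
1995 x2 y2
;

/* UPDATE applies both 1993 transactions to one master observation */
/* and adds the nonmatched 1995 observation to the new data set.   */
data master_updated;
   update master_copy transaction_copy;
   by Year;
run;
```

Both inputs are already in order of Year, so no PROC SORT step is needed here. The trade-off is the one listed in the comparison table: UPDATE needs room for a second copy of the data, whereas MODIFY changes MASTER in place.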