More efficient sorting.
Aside from cutting down the
number of variables to process (using DROP / KEEP) and the amount of
data (using WHERE), the NOEQUALS option can improve sort speeds. This
option tells SAS that it need not preserve the original relative order
of observations that have the same BY values.
E.g. for the data:
DistAcc  | Prodcode
CL001234 | A12345
CL001235 | A12346
CL001235 | A12345
CL001234 | A12346
When
sorting by DistAcc, SAS would by default keep the product codes within
each account in the order they currently appear (for CL001234: A12345,
then A12346). If that order is not important, using
the NOEQUALS option will speed up processing.
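As a rough sketch of how these options combine in one step, assuming a data set named bigfile that holds the DistAcc and Prodcode variables from the example data (the 'CL' prefix filter and the output name work.sorted are purely illustrative, not from the original):

```sas
/* Sketch only: bigfile, work.sorted and the 'CL' filter are assumed names/values */
proc sort data=bigfile(keep=distacc prodcode       /* read only the variables needed      */
                       where=(distacc =: 'CL'))    /* read only the observations needed   */
          out=work.sorted
          noequals;                                /* input order within equal BY values  */
                                                   /* need not be preserved               */
  by distacc;
run;
```

Putting KEEP= and WHERE= on the input data set means the unwanted variables and observations never reach the sort at all.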
For very large files, the SAS option SORTANOM=512 may speed up processing.
Also, if
sorting large files is becoming problematic, consider splitting the
data into chunks, sorting each chunk separately, then joining the sorted files using PROC APPEND.
E.g. suppose
Proc sort data=bigfile out=lib2.sorted noequals;
by distacc;
run;
fails due to a lack of sort space, even though the maximum is already allocated. Splitting the file will take longer, but will ultimately work:
Proc sort data=bigfile(where=(distacc<:'N')) Out=lib2.sorted NoEquals;
By Distacc;
run;
Proc sort data=bigfile(where=(distacc>=:'N')) Out=Part2 NoEquals;
By Distacc;
run;
Proc append base=lib2.sorted new=part2;
run;
The first sort handles all
accounts whose first character sorts before 'N' (the first half of the
alphabet); the second handles the rest. Because every account in the
second file sorts after every account in the first, appending the two
gives a fully sorted file. Using PROC APPEND is more efficient than
using a SET statement, because the base data set does not need to be
read observation by observation before the second is added.
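For comparison, the SET statement alternative would look roughly like the sketch below, reusing the data set names from the example above. It should give the same result, but the DATA step reads and rewrites every observation of lib2.sorted as well as part2, whereas PROC APPEND only reads part2.

```sas
/* Sketch of the slower alternative: DATA step concatenation.       */
/* Both inputs are read in full and lib2.sorted is written again.   */
data lib2.sorted;
  set lib2.sorted part2;
run;
```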