More efficient sorting.


Aside from cutting down the number of variables to process (using DROP / KEEP) and the number of observations (using WHERE), the NOEQUALS option can improve sort speeds. This option tells SAS that it does not need to preserve the original relative order of observations that have identical BY values.
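As a minimal sketch of combining all three techniques in one step, the example below restricts the variables, filters the observations, and sorts with NOEQUALS. The output name work.sorted_subset and the 'CL' prefix filter are assumptions chosen for illustration only; bigfile, distacc and prodcode follow the names used elsewhere on this page.

```sas
/* KEEP= limits the variables read, WHERE= limits the observations,   */
/* and NOEQUALS lets SAS ignore the original order within a BY group. */
/* The WHERE= variable (distacc) is also listed in KEEP=.             */
proc sort data=bigfile(keep=distacc prodcode
                       where=(distacc =: 'CL'))   /* hypothetical filter */
          out=work.sorted_subset                   /* hypothetical name   */
          noequals;
    by distacc;
run;
```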

For example, given the data:

DistAcc     Prodcode
CL001234    A12345
CL001235    A12346
CL001235    A12345
CL001234    A12346

When sorting by DistAcc, SAS would by default keep the product codes within each account in the order they currently appear. If that order is not important, using the NOEQUALS option can speed up processing.
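The difference can be sketched with the four observations above. The data set names accounts, sorted_equals and sorted_noequals are assumptions made for this illustration.

```sas
/* Build the four sample observations. */
data accounts;
    input distacc $ prodcode $;
    datalines;
CL001234 A12345
CL001235 A12346
CL001235 A12345
CL001234 A12346
;
run;

/* EQUALS (the default): within each distacc value the product codes */
/* stay in their original relative order.                            */
proc sort data=accounts out=sorted_equals equals;
    by distacc;
run;

/* NOEQUALS: the order of product codes within each distacc value */
/* is not guaranteed, which can save processing time.             */
proc sort data=accounts out=sorted_noequals noequals;
    by distacc;
run;
```

With EQUALS, account CL001234 keeps A12345 before A12346, and account CL001235 keeps A12346 before A12345. With NOEQUALS, either order within an account is possible, so it should only be used when nothing downstream relies on that order.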

For very large files, the SAS option SORTANOM=512 may speed up processing.

Also, if sorting large files is becoming problematic, consider splitting the data into chunks, sorting each chunk separately, and then joining the sorted files using an append procedure.

For example, suppose the following sort fails due to a lack of sort space, even though the maximum is already allocated:

proc sort data=bigfile out=lib2.sorted noequals;
    by distacc;
run;

Splitting the file will take longer, but will ultimately work:

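/* First chunk: accounts that sort before 'N' */
proc sort data=bigfile(where=(distacc <: 'N')) out=lib2.sorted noequals;
    by distacc;
run;

/* Second chunk: accounts that sort at or after 'N' */
proc sort data=bigfile(where=(distacc >=: 'N')) out=part2 noequals;
    by distacc;
run;

/* Add the second sorted chunk to the end of the first */
proc append base=lib2.sorted new=part2;
run;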

The first sort handles all accounts starting in the first half of the alphabet (before 'N'); the second handles the rest. Because every account in the first file sorts before every account in the second, the append procedure adds the two together and the result is a fully sorted file. Using append is more efficient than using a SET statement, because the first data set does not need to be read observation by observation before the second is added.
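For comparison, the SET statement alternative mentioned above might look like the first step below. It would be run instead of the PROC APPEND step, not in addition to it. The second step is an optional check, added here as a suggestion rather than part of the original tip: a DATA step with a BY statement stops with an error if it meets observations that are out of order.

```sas
/* Alternative to PROC APPEND (use one or the other, not both).    */
/* This rewrites lib2.sorted and reads every observation from both */
/* inputs, which is why PROC APPEND is usually cheaper.            */
data lib2.sorted;
    set lib2.sorted part2;
run;

/* Optional check: reading with BY raises an error if the combined */
/* file is not in distacc order.                                   */
data _null_;
    set lib2.sorted;
    by distacc;
run;
```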