What is SAS?
Founded in 1976, SAS (originally "Statistical Analysis System") has
recorded double-digit growth since its inception, and is the largest
privately owned software company in the world. Its software is
installed in 96 of the FTSE 100 companies, and it is recognised as the
market leader for business intelligence and analysis software.
SAS was originally created (by Dr Jim Goodnight) to provide a
programming environment for statisticians. Its development followed two
key paths: the first was to create the market-leading resource for
statistical and data analysis; the second was to provide a flexible and
powerful multi-platform programming language and development
environment. The result is one of the most highly structured, flexible
programming languages available, facilitating very rapid application
development, coupled with a point-and-click development and analysis
environment which is comfortable for non-programmers.
History and Product development
Consistently re-investing 25% of its revenue in R&D, SAS has followed
many key research paths, bolstering its key product ("The SAS System")
each time:
- Starting with statistical analysis, SAS became the leader in
statistical analysis software and was used extensively by
pharmaceutical companies for clinical trials analysis, and by banking
institutions for data analysis.
- SAS then improved its graphics capabilities to provide a simple
programming interface to graphics and presentation tools, plus extended
time series analysis.
- The product's user interface was improved with the launch of the
Full-Screen Product range, which enabled interactive browsing and
editing of data with one programming statement, while most other
programming environments were relying on DOS, file manager, and the
mainframe TSO language statements. Two years later (1983), spreadsheet
capabilities were introduced, allowing direct spreadsheet-style
processing of SAS-based data (Microsoft Excel was first launched two
years later).
- A rapid application development environment (SAS/AF) was introduced,
more platforms (minicomputers and micros) were supported, and
PC-to-mainframe linking was introduced.
- SAS becomes, almost exclusively, the programming language used behind mainframe performance analysis and capacity planning.
- An industry-specific product is launched in SAS/PH Clinical, securing
SAS's position as the tool used for clinical research, whilst SAS/LAB
and SAS/CALC are introduced, providing guided data analysis for
statisticians. This marks a point where SAS is becoming more and more
open to non-programmers, with a point-and-click user interface in which
most commands are supported by menu options.
- The next year sees the launch of SAS version 6, with massive
programming improvements, the creation of SAS/Frame (equivalent to the
Windows Software Development Kit), ODBC (Open Database Connectivity)
support, SAS/ACCESS (enabling direct programming access to almost all
databases, such as IMS and DB2) and SAS/Connect (connecting SAS
programs across all widely used platforms).
Point-and-click support also moves into applications development in the form of SAS/EIS ("Executive Information Systems").
- SAS moves into data warehousing, linking its database capabilities
with its statistical facilities so that concrete relationships can be
found between data, "transforming the information into knowledge" as
SAS puts it. This solution also has a point-and-click development
environment, and soon wins the "Datamation" product of the year award
for three years in a row. At the same time, SAS launches strategic
partnerships with Price Waterhouse and KPMG, securing its software as a
preferred development environment.
- Enterprise Resource Planning ("ERP") software is introduced. More R&D
is pumped into data analysis, focussing on providing business
intelligence through a point-and-click environment.
So SAS has almost reinvented itself in terms of the industries and
technologies it supports in order to maintain healthy growth, but it
has always maintained support for the products it developed for those
industries (and they are the wealthiest and most profitable
industries), securing its high revenue base.
What is the SAS Language?
The SAS programming language is split into two parts: the Data step and
the Procedure step. The Data step is concerned with reading data and
manipulating it into the format you want. The Procedure step
facilitates reporting (text, charts and HTML), data summarisation, and
analysis.
The language is also supported by SAS macros, which allow program
statements to be re-used with aspects of the statements changed to suit
different data. For example, program statements can be written to
manipulate data in a certain way, and output the data to a particular
file. Enclosing the statements within a macro allows you to very simply
specify a different input and output file without repeating the same
program statements.
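As a minimal sketch of that idea (the macro name copy_filtered, the
dataset names and the amount variable are all hypothetical, and SAS
datasets stand in for the input and output files), the same statements
can be reused with different inputs and outputs like this:

```sas
/* Hypothetical sample data so the sketch runs on its own */
data sales_2023;
    input region $ amount;
    datalines;
North 1500
South 400
;
run;

data sales_2024;
    input region $ amount;
    datalines;
North 900
South 2100
;
run;

/* Hypothetical macro: the statements are written once, and the */
/* input, output and threshold are supplied on each call        */
%macro copy_filtered(in=, out=, min_amount=0);
    data &out;
        set &in;
        /* keep only records at or above the threshold */
        if amount >= &min_amount;
    run;
%mend copy_filtered;

/* Two calls, two different inputs and outputs, one set of statements */
%copy_filtered(in=sales_2023, out=big_2023, min_amount=1000);
%copy_filtered(in=sales_2024, out=big_2024, min_amount=1000);
```

Keyword parameters (in=, out=) are used here so that each call reads
almost like a sentence; positional parameters would work equally well.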
Once data is read in, it is held in SAS datasets which also hold the
name, description, type, length, output format, and input format of
every variable in the dataset. When the dataset is reused, all of those
attributes are already known to the program statements (for example, if
the data is output using a procedure step, its output format (e.g. a
date or currency) will automatically be adopted according to its
definition), and if the dataset is merged with another dataset, all of
those attributes are inherited by the new one.
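The following sketch illustrates this under assumed names (the orders
dataset, its two variables and the sample values are hypothetical). The
attributes are declared once when the dataset is built, and later steps
rely on them without restating anything:

```sas
data orders;
    /* The informat, format and label below are stored in the dataset */
    informat order_date date9.;
    format   order_date date9. amount dollar10.2;
    label    order_date = 'Order date'
             amount     = 'Order value';
    input order_date amount;
    datalines;
01JAN2024 1250.5
15FEB2024 80
;
run;

/* No formats are restated here: PROC PRINT takes them from the dataset */
proc print data=orders label;
run;

/* PROC CONTENTS lists the stored type, length, format, informat and label */
proc contents data=orders;
run;
```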
The instructions within a data step are assumed to be performed against
every record in the data set, removing the need to add loop controls to
read new records.
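A small sketch of this implicit loop, using a hypothetical dataset of
temperatures (the dataset and variable names are assumptions made for
illustration):

```sas
data temps_f;
    input city $ temp_f;
    datalines;
London 59
Madrid 86
Oslo 41
;
run;

data temps_c;
    set temps_f;
    /* No loop is written: this assignment runs once for every */
    /* record read by the SET statement                        */
    temp_c = (temp_f - 32) * 5 / 9;
run;

proc print data=temps_c;
run;
```

The second data step contains no DO loop and no end-of-file test, yet
all three records are converted.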
Strengths
- Rapid Application Design.
The programming language behind SAS is both simple and powerful. With
the data step, records and variables can be manipulated very flexibly,
giving the programmer complete control. Procedure steps, on the other
hand, can be used very simply (with almost no other control information
required), or very openly (with complete control over formatting and
structure).
For example, a data step can be used to read in data from another
source, subset it to the required level, create new variables, and
merge it with data from another source. A procedure step can then sort
the data into the required order in just two statements (add one word
to remove duplicates), and another procedure can then create an HTML
table of the data in one statement (and one more to create an index).
The programmer is relieved of the need to create their own routines for
doing this.
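One way this sequence might look is sketched below. The datasets,
variables, the 1.2 multiplier and the output file name are hypothetical,
and the HTML table is produced here with the ODS HTML destination
wrapped around PROC PRINT, which is an assumption about the method
rather than necessarily the one-statement approach referred to above:

```sas
/* Hypothetical source data */
data customers;
    input cust_id name $;
    datalines;
1 Adams
2 Baker
3 Clark
;
run;

data orders;
    input cust_id amount;
    datalines;
1 250
1 250
2 40
3 900
;
run;

/* Data step: merge the two sources, subset, and create a new variable */
/* (both inputs are already in cust_id order, as BY-merging requires)  */
data big_orders;
    merge orders(in=has_order) customers;
    by cust_id;
    if has_order and amount >= 100;
    amount_with_tax = amount * 1.2;   /* hypothetical tax rate */
run;

/* Sort in two statements; NODUPKEY is the one extra word that */
/* drops records with a repeated cust_id                       */
proc sort data=big_orders nodupkey;
    by cust_id;
run;

/* Write the result as an HTML table (file location is an assumption) */
ods html file='big_orders.html';
proc print data=big_orders noobs;
run;
ods html close;
```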
The Function Point Score for SAS is very high when compared to other programming languages such as C and COBOL, and even Visual Basic.
For example, in SAS I can write x=a+b; and in COBOL I could write "ADD A
TO B GIVING X", but first I'd have to set up an Environment division to
define what computer I'm working on, then a Data division to define the
length and type of X, A, and B. I'd also have to compile and link the
source code program, which would give me a separate load module, etc.
Another example is that in SAS I can say:
Proc Means Data=xyz;
Run;
to tell me the variable name, number of records, mean, standard deviation, minimum and maximum of every numeric variable. If I add:
Class x;
in the middle, it will generate the same statistics for every group of records with the same value of x.
If I add:
Output Out=Red;
it will output the statistics to a new SAS dataset called Red.
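Put together, and run against a small made-up dataset (the contents of
xyz and the variable names height and weight are assumptions for
illustration only), the whole step might look like this:

```sas
/* Hypothetical data: x is the grouping variable */
data xyz;
    input x $ height weight;
    datalines;
A 150 50
A 160 60
B 170 70
B 180 80
;
run;

proc means data=xyz;
    class x;           /* one set of statistics per value of x */
    output out=Red;    /* also save the statistics as a dataset */
run;

/* Inspect the saved statistics */
proc print data=Red;
run;
```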
If I did the same thing in, say, Visual Basic for Applications (VBA) in
Excel, carrying out the analysis against a range of cells, I could
write something like:
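Public Sub ABC()
    ' Give every variable its own type: "Dim a, b As Double"
    ' would leave a as a Variant
    Dim Num As Long
    Dim Mean As Double, StdDev As Double, Min As Double, Max As Double
    Dim i As Integer
    Dim xyz As Range, col As Range
    Set xyz = Range("A1:E20")
    For i = 0 To 4
        ' Take one 20-row column of the range at a time
        Set col = xyz.Offset(0, i).Resize(20, 1)
        Num = WorksheetFunction.Count(col)
        Mean = WorksheetFunction.Average(col)
        StdDev = WorksheetFunction.StDev(col)
        Min = WorksheetFunction.Min(col)
        Max = WorksheetFunction.Max(col)
        ' Each continued line must end with " _"
        MsgBox "For Column " & (i + 1) & vbCr & "N = " & Num & vbCr _
            & "Mean = " & Mean & vbCr & "StdDev = " & StdDev & vbCr _
            & "Min = " & Min & vbCr & "Max = " & Max
    Next i
End Sub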
But if I wanted subtotals for each class of x, I'd need a lot more code.
- Pre-built products.
In addition to the hundreds of pre-built procedure steps, several
pre-built products are provided, such as the full-screen Browse and
Edit procedures, which enable data entry and browsing of data sets with
one statement. The programmer doesn't need to worry about control of
which record is being looked at (or even whether it is being edited by
another user, for that matter), or the position and format of records
on the screen (although these can be edited if necessary), etc.
A big positive of these pre-built products is that they can be referred
to and used in applications without worrying about any new fields being
added, or existing ones being deleted. As the data in the application
expands, the procedure automatically caters for this. Additionally,
controls are embedded for searching for records, limiting the data
being viewed to particular categories, and moving to the required
record.
Cost benefit: development, maintenance and testing are all free!
- Multi-platform, database and application support.
Code developed on one platform can easily be transported to another.
In addition, a program working on one platform can communicate with and
control another. SAS can read and write to almost any source of data.
SAS supports object-oriented programming, and can reference any
standard Windows object (e.g. Excel) and control it (e.g. it can create
and populate an Excel spreadsheet).
- Liberal syntax and easy to read.
Statements are written in English with few codes required, and the layout is simple and easy to read. See above for examples.
- Powerful development environment.
The development environment in which programs are created, tested, and
(optionally) run has the look and feel of a Windows environment, with
windows for the program editor, run log, output window, dataset and
external data windows showing libraries assigned, and their formats. It
also provides powerful tools for browsing results files and their
attributes, listing text files, etc.
- Support for 4GL programming.
Within SAS it becomes very easy to write program statements which will
write program statements and then execute them. For example, I can hold
a list of filenames in a dataset, then run through that list of
filenames, conditionally generating program statements to perform
operations on each one. This opens up the possibility of truly
ubiquitous source code.
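A minimal sketch of this technique uses the CALL EXECUTE routine. The
control dataset and its run_flag variable are hypothetical, and the
list holds names of sample datasets from the SASHELP library (assumed
to be installed) rather than external filenames, so that the example
runs on its own:

```sas
/* Hypothetical control dataset: one row per dataset to process */
data ds_list;
    length dsname $ 41 run_flag $ 1;
    input dsname $ run_flag $;
    datalines;
sashelp.class Y
sashelp.cars N
sashelp.air Y
;
run;

/* For each flagged row, build a PROC CONTENTS step as text and */
/* queue it; the generated steps run after this data step ends  */
data _null_;
    set ds_list;
    if run_flag = 'Y' then
        call execute(cats('proc contents data=', dsname, '; run;'));
run;
```

Only the rows flagged Y generate a step, so changing the control
dataset changes what the program does without touching the code.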
Weaknesses
- Cost
SAS is an interpreted language as opposed to a compiled language, which
means that rather than creating a set of machine instructions which can
be run on their own, the program statements are compiled into a set of
machine instructions each time the program is run. This is because some
of the pre-built tools and products aren't compiled but are treated
more like independent applications.
This means that a copy of SAS must be available to run the programs,
and this software is licensed, so a fee is payable annually to use it.
This price must be offset against the reduced cost of development and
maintenance, which is significant. Also, the cost per license falls as
a company buys more copies.
- Locked-out access to SAS datasets.
While data is in a SAS dataset, it cannot be read by any other language or system.