From: basugwm@basug.org on behalf of basugwm [basugwm@basug.org]
Sent: Thursday, February 26, 2004 1:22 PM
Subject: BASUG March 2004 Quarterly Meeting

The Boston Area SAS(r) Users Group
Quarterly Meeting
March 23, 2004

TOPIC: Programming techniques related to data mining.

Many programming techniques that programmers use week in and week out come in handy for data mining. These presentations delve into some of those familiar programming techniques, and how they might be used for mining data.

WHEN: Tuesday, March 23, 2004, 8:30AM to 12:00PM

WHERE: Holiday Inn - Newton (Directions are included below)
399 Grove Street
Newton, MA 02462
617-969-5300

INDIVIDUAL, ON-LINE REGISTRATION REQUIRED. NO EMAIL!
To register, visit: http://www.basug.org/register.php3

CONTACT: If you have questions about the meeting contact:
Brian Saper: briansaper@earthlink.net

Please Note the following: This meeting is appropriate for SAS users of all levels. Experience with statistics or data mining is not required.

AGENDA:
8:30 - 8:45 - Sign in and Coffee Break
8:45 - 9:00 - Meeting Announcements and Introductions
9:00 - 9:50 - "SAS/Access to External Databases: Wisdom for the Warehouse User", by Judy Loren
9:50 - 10:05 - Break and Refreshments
10:05 - 10:50 - "Match-Making Tools for Data Mining: What works Faster, Simpler and When?", by Om Kundu
10:50 - 11:05 - Break and Refreshments
11:05 - 11:25 - "Using the Power of SAS Macro Language to Manage Variables", by Judy Loren
11:25 - 11:50 - "After a Hard Day at the Data Mines", by Bob Virgile

ABSTRACTS AND BIOGRAPHIES:

"SAS/Access to External Databases: Wisdom for the Warehouse User", by Judy Loren, Health Dialog Data Service

With SAS/Access, SAS users can read from and write to almost any database product: DB2, Oracle, Informix, Sybase, MS SQL Server, or Teradata, just to name a few. ODBC opens up even more warehouse doors.
SAS/Access offers several ways to connect: procs, such as Import and Export; the libname statement option that treats database tables like SAS datasets; and pass-through, which passes user-written SQL directly to the external product to execute and return the results to SAS. The fun starts when the warehouse tables are too large or too complex to allow the easy approach.

This tutorial reviews all the techniques briefly, then focuses on the situations that call for advanced expertise. Examples demonstrate using SAS with DB2, Oracle, and Microsoft Access. Important details like how to refer to missing values in various databases, and how to use macros in pass-through SQL, make the tutorial concrete and useful. Ever had to select a small number of records from a huge external table based on a set of key values in a SAS dataset? In case you missed it, here's some shortcut code.

This tutorial is for SAS users who need access to a non-SAS data warehouse, particularly if that warehouse is really large or complex. Knowledge of SQL is not necessary, but it will help you follow the examples.

Biography: Judy is a Senior Analyst at Health Dialog Data Service, the analytical arm of Health Dialog, which partners with insurance companies to coach members with chronic conditions or impending health care decisions. She uses SAS to warehouse claims information and evaluate the effectiveness of interventions. An avowed SAS bigot, Judy is proud to follow in her grandmother's footsteps--the one who converted the State of Maine payroll to a computer-based system back when programming meant hard-wiring boards.

"Match-Making Tools for Data Mining: What works Faster, Simpler, Better and When?", by Om Kundu

Merging data from disparate data sources supplied by multiple data providers becomes an arduous task because of inherent differences in the notation of identical entities. The situation is further compounded by inevitable data-entry errors when populating a database.
As a result, implementing SQL Joins or DATA Merges based on exact key matches fails to pick up a significant number of observations that should have been merged. It becomes imperative to implement Joins that are based not on exact Key matches but on alphanumeric character pattern recognition, matching Key patterns in one table with the Key patterns in a second dataset. However, such Joins often cannot be implemented because PROC SQL is not optimized for Index or Substring-based matches.

This talk will explore SAS techniques and algorithms that allow Users to exploit Data-Mining tools to circumvent software limitations for non-matching keys. We will also cover cases where generating Macro variables with cascaded PROC SQL queries may be an efficient alternative to invoking CALL SYMPUT in DATA steps. The presentation will also help users make better-informed judgment calls about when and where it makes more computational sense to use PROC SQL vis-à-vis DATA steps.

Biography: Om Kundu is a Senior Consultant at Deloitte & Touche in the Economic Consulting-FAS practice. His programming, statistical and strategy consulting experience spans several verticals encompassing networking/telecom, financial securities, retail and biotech. His prior experience has included analyzing the competitive landscape for Internetworking and Wireless technologies in Intellectual Property and Litigation consulting engagements. Om has developed Data-Mining algorithms related to online content-personalization with collaborative filtering and rules-based inference engines utilizing clustering algorithms and expert systems. His previous R&D experience includes efficient algorithmic design in quantum computing and genetic algorithms.
He has also served as a technology strategy analyst in designing enterprise Data-Warehousing and CRM architectures deploying secure Internetworking front-end seamlessly integrated with back-end RDBMS, object-oriented technologies, and R/MOLAP. Mr. Kundu holds B.S. degrees in Mathematical and Computational Sciences, with a Physics minor, from Stanford University.

"Using the Power of SAS Macro Language to Manage Variables", by Judy Loren

When you set out to mine data for gems of information, you never know what data you may encounter. How well you prepare data for your mining techniques can set you up for success. This short presentation will walk through some code that solves a problem that can occur when converting data from an external database to a SAS dataset: the lengths and/or formats and informats of text variables come across as 255. This code can be generalized to handle other kinds of dataset management.

"After a Hard Day at the Data Mines", by Bob Virgile

After the data mining is completed, the final stage is often an optimization problem. How can you apply the knowledge derived from mining to a real world situation? (In fact, some definitions of data mining include this optimization stage as part of the process.) This presentation examines one approach to such an application. Note that it focuses on the issues and approaches that can be taken, not the SAS code itself.

Biography: Bob Virgile is an independent trainer and consultant with over 20 years of experience designing and teaching SAS classes. He formerly composed the problem-solving contests for SUGI and NESUG, and has written two books for SAS Institute. Currently, he is barred from participating in the NESUG SAS Bowl due to superior SAS knowledge.

DIRECTIONS

DRIVING:

FROM BOSTON: Take I-90 West to Exit 15, then take Route 128 South (I-95) 1/4 mile to Exit 22. When you exit, stay right and bear right at the fork onto Grove Street. Hotel is on your left.
FROM SOUTH OF BOSTON: Take Route 128 North (I-95) to Exit 22. When you exit, stay right and bear right at the fork onto Grove Street. Hotel is on your left.

FROM WESTERN MASS: Take I-90 East to Exit 14, then take Route 128 South (I-95) 1/4 mile to Exit 22. When you exit, stay right and bear right at the fork onto Grove Street. Hotel is on your left.

FROM NEW HAMPSHIRE: Take I-93 South to Route 128 South (I-95), follow for approximately 15 miles to Exit 22. When you exit, stay right and bear right at the fork onto Grove Street. Hotel is on your left.

FROM RHODE ISLAND: Take I-95 North to Route 128 North (I-95). Follow to Exit 22. When you exit, stay right and bear right at the fork onto Grove Street. Hotel is on your left.

PUBLIC TRANSPORTATION:

The hotel is adjacent to the Riverside T Station. From Kenmore Square take the Green Line - D (Riverside) to the Riverside stop.

The hotel is also accessible from downtown Boston via Express Bus #500 (EXPRESS BUS Riverside - Downtown Via Mass. Turnpike.). See http://www.mbta.com/schedmaps/bus/index.cfm for detailed bus routes and schedules. The bus drops off at the Riverside T Station, adjacent to the hotel.

BASUG CONTACTS:

BASUG's Mail Address:
BASUG
PO Box 253
Boston, MA 02117

To email our Webmaster: basugwm@basug.org

SUBSCRIBE TO OUR EMAIL LIST:

Subscribers receive automatic e-mail notification of upcoming meetings, courses, and conferences of interest to local SAS users.

To subscribe to the BASUG message list, send a message to majordomo@basug.org, and in the BODY of the message specify:
subscribe basugnews

To unsubscribe from the BASUG message list, send a message to majordomo@basug.org, and in the BODY of the message specify:
unsubscribe basugnews