Sunday, February 9, 2014

Session 4 Updates

Hi all,

Session 4 wrapped up yesterday.

We covered the basics of text analysis (the why, the how, and the what-comes-after) as a prelude to the 'web extraction of text data' piece which, technically, was the centerpiece from the DC course POV.

1. In hindsight, I should've anticipated the issues that arose in trying to live-run the R code on an untested machine, especially in Section A. My Research Assistant, Ankit Anand, usually handles this stuff (package installation, dry-testing the code, etc.) before the session begins in the MBA courses where I have covered this, and I got used to that. Ankit's not in town this week and the usual checklist simply escaped me. So, basically, sorry about the hiccups in running the R code in class.

I'm still working on a version of the code that you can run without such trouble. Please ensure you have the latest version of Java installed on your machines before you start.

---------------------------------------

2. I've received student queries about additional sources of study material. Well, there are two ways to go about it. If you are presently working on a problem in R and hit a roadblock, then the best thing is to simply google your query. Chances are sites like Stack Overflow will have an answer. It usually works very well for me.

On the other hand, if you are looking for a structured way to start, then there are any number of books you could consider picking up. Below I list some that can help the rank beginner get started:

A beginner's guide to R from Computerworld, a video introduction to R from Google, and a full-fledged book from Springer on how to get started in R.

Better still is this list of links to books and downloads for R. More advanced users, especially after you are introduced to supervised machine learning as part of the CBA program, may want to consider the following (some of which are free downloads):

Machine Learning with R, by Brett Lantz. The link takes you to the table of contents, which you can browse, along with a sample chapter.

This short document from MIT's OpenCourseWare on machine learning is a useful repository of the very basic datasets, algorithms and packages a beginner can use to get started on the machine learning part of R analytics.

---------------------------------------

3. Regarding text analytics in particular, here's a quick set of code that can get you started with the basic things we did in class (in addition to the code I will send you).

In any case, you are advised to subscribe to the r-bloggers.com daily newsletter for quick daily overviews of what's new and hot in R. Here is a link and expert commentary on text mining in R from R-bloggers.com, for instance.

This is an example of Q&A at Stack Overflow, which is among the pre-eminent sites for code-level discussions on R (and other packages).

---------------------------------------

4. Whew! That's it from me for now. There'll be homework for this session. It will involve extracting, storing and processing web-based text data, as well as processing text data from your class into semantic network analyses, etc. But that's all for later.

See you in class soon.

Sudhir
