Monday, October 29, 2012

Here we go again

Gretchen Rubin ponders the arrival of Hurricane Sandy. Meanwhile, the constant flow of tweets and updates is making me more anxious than I would be if I just ignored it. After all, we can only sit at home and wait for the inevitable power outage.

From the radar, Sandy looks impressive. (Picture from CWG.)

Saturday, October 27, 2012

The state of apple

Finally broke down and got an Apple computer. In order to receive updates I had to create an Apple ID. Okay. But … in order to do this I had to give them a credit card number???

Okay … I guess this is what Apple has come to - besides being a pussy and a wussy, it is trying to extract as much money as it can by coming out with incremental improvements in products that are about as imaginative as innovations in shaving blades.

I remember when it used to be single blades, and then double, and then wow! triple! and then … and so forth.

The iPad and iPhone were an innovative interface, just as the shaving blade was an innovation at the time. The incremental improvements that have come from Apple are just as predictable - bigger, brighter screen, smaller version - what next? A set of wet wipes attached?

Friday, October 19, 2012

When we censor ourselves what does it say about us?

I was looking forward to getting Enid Blyton’s Famous Five series for the kids. I had enjoyed them when I was in the pre-teen/early teen years. Here are some comments I found when I looked them up on Amazon:

By Titan
This book is not Enid Blyton's original text but an edited version. It is an outrage that this is not made more clear. If I want to read Enid Blyton, I want to read the original words, not a doctored 'modern' version. Apparently the original (first folio) version is still available so you might want to look for that instead of buying this one. I would give the original version 5 stars, incidentally. I loved the Famous Five books as a child.

By Tatyana A Privalova
Chapter one, this edition: She wants a good talking to.
Chapter one, older editions: She wants a good spanking.

Enough said.

By emma_e_brown
Like the last reviewer I am thoroughly disgusted that the publishers have "updated" the text. I was looking forward to a big nostalgia trip, but no lashings of ginger beer here by jove! This is hardly replacing a racially offensive toy with naughty teddies and I see no reason whatsoever for butchering these classic stories. What next - The Lion, The With, and the Ipod?!! Surely contemporary children are just as capable of appreciating a period story as we were?

These reviewers gave the book one star because of the edits. I’m not sure I’d go that far. I had noticed these ‘politically correct edits’ when the kids were younger. In Curious George Goes Fishing, the current edition has George seeing a large man go by. In an older edition of Curious George Flies a Kite (which we got as a hand-me-down), from which Goes Fishing was excerpted, it was a fat man.

What is the message that we are trying to send? That we don’t want to mislabel someone? Or is it that we don’t want the kids to think badly of us -- that we used to call people fat, that we used to spank? When kids grow up as they all inevitably do, they’ll know what fat and spank are - so what did we achieve by withholding these ‘truths’ from them?

Wednesday, October 17, 2012

Data scientist(?)

One of the apparently ‘hot’ jobs these days is the data scientist. So hot, in fact, that the Harvard Business Review has named it the sexiest job of the 21st century. I came across Rachel Schutt’s Data Science class at Columbia (HT: Andrew Gelman) and she has a description of what a data scientist does (or should do):

What is a Data Scientist?
Let me start with academia because that’s quicker. Then industry.
In Academia: No one calls themselves a Data Scientist yet in universities. There are 60 students in my class from across disciplines. I thought when I proposed the course it would be statisticians, applied mathematicians and computer scientists who showed up. Actually it’s them plus sociologists, journalists, political scientists, biomedical informatics students, students from NYC government agencies and non-profits related to social welfare, someone from the architecture school, environmental engineering, pure mathematicians, business marketing students, and students who already work as data scientists. Am I missing someone? They’re all interested in figuring out ways to solve important problems, often of social value, with data.

For the term Data Science to catch on in academia at the level of the faculty, the research area needs to be more formally defined. I see a rich set of problems that could be many PhD theses. My current working definition is a Data Scientist in this setting is a Scientist (from social scientists to biologists) who work with large amounts of data, and must grapple with computational problems posed by the structure, size, messiness and nature of the data, while simultaneously solving a real world problem. Across academic disciplines, the computational and deep data problems are the same. So if researchers across departments join forces, they can solve multiple real-world problems from different domains.

In Industry:
It depends on the level of seniority and whether you’re talking about the internet industry in particular. The role of data scientist need not be exclusive to the tech world, but that’s where the term originated so for the purposes of the conversation, let me say what it means there:

A Chief Data Scientist should be setting the data strategy of the company which involves a variety of things: setting everything up from the engineering and infrastructure for collecting data and logging, to privacy concerns; deciding what data will be user-facing, how data is going to be used to make decisions, and how it’s going to be built back into the product. She should manage a team of engineers, scientists and analysts and she should communicate with leadership across the company including the CEO, CTO and product leadership. She’ll also be concerned with patenting innovative solutions, and setting research goals.

More generally, a data scientist is someone who knows how to extract meaning from and interpret data, which requires both tools and methods from statistics and machine learning, as well as being human. She spends a lot of time in the process of collecting, cleaning and munging data, because data is never clean. This process requires persistence, statistics and software engineering skills– skills that are also  necessary for understanding biases in the data, and for debugging logging. Once she gets the data into shape, a crucial part is exploratory data analysis which combines visualization and data sense. She’ll find patterns, build models and algorithms, some with the intention of understanding product usage and the overall health of the product, and others serve as prototypes that ultimately get baked back into the product. She may design experiments, and is a critical part of data-driven decision making. She’ll communicate with team members, engineers, and leadership in clear language and using data visualizations so that even if her colleagues are not immersed in the data themselves, they will understand the implications.

Looking at the syllabus, it sure sounds a lot like data mining. I guess being a scientist beats being a miner.

Friday, October 5, 2012

I’d like to share something with you

But you have to pay for it.
Am I the only one who thinks that this is ironic?

P.S. By the time I read MR's post on the book, the price had risen from $3.99 to $4.99.

Thursday, October 4, 2012

Grades, quizzes, homework and Coursera

As I sit through some of the quizzes, homework and tests at Coursera I am reminded of why these things suck so bad. They suck even though there is very little at stake (for me - except probably pride!) and they undermine all incentive to learn. I'll bet that this wasn't what on-line educators had in mind when they launched.

Assignments and quizzes are graded in several ways, and the elements listed below are applied in different combinations:

  1. Multiple attempts are allowed, with soft deadlines and hard deadlines, and a penalty after the soft deadline has passed
  2. Only one attempt is allowed
  3. Feedback on a question-by-question basis explaining what was right and what was wrong
  4. No feedback at all - just a total score - not even an accounting of which question is right or wrong.
  5. Time limits are imposed (or not)

So (1) could be combined with (4), (2) with (3), and so forth. These are just from the courses that I’ve been in, so peer grading is excluded, and since there were no writing assignments I can’t comment on those. At this point, I'll just say that #4 really sucks! #3 is great, although when it is combined with #1 the student already has the answer on the next attempt, since the questions do not change. So in a way this sucks too.

The question that really needs to be asked is whether grades reveal anything, and reasonable people will have different opinions on this. What does an A reveal? What if the whole class got As? Does it make a difference? As the discussion in the embedded link points out, grades reveal as much as they can only if there is a point of reference. My sense is that a grade by itself is meaningless.

Again, Khan Academy is moving ahead of the curve on this front. The two relevant concepts are achievement of proficiency (either full, average or below average would be a starting delineation, though Khan isn’t doing this just yet) and adaptive testing. In other words, achievement of proficiency via adaptive testing. Students learn through quizzes, tests, and exams. If they get a question wrong, an algorithm drops back to an easier question and then steps the student back up to the harder question - either presented again or in a different form or even both.
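To make the stepping idea concrete, here is a rough sketch in Python. It is not Khan Academy's actual algorithm, which I haven't seen; the five difficulty levels, the one-level-at-a-time rule and the names next_level and run_session are all my own assumptions for illustration.

```python
def next_level(level, correct, lowest=0, highest=4):
    """Move up one difficulty level after a right answer, down one after a miss.

    Assumption: difficulty is a simple integer scale from `lowest` to `highest`.
    """
    step = 1 if correct else -1
    return max(lowest, min(highest, level + step))


def run_session(answers, start=2, target=4):
    """Replay a list of right/wrong answers (True/False).

    Returns the difficulty levels visited and whether the student ended up
    answering correctly at the target (hardest) level.
    """
    level = start
    visited = [level]
    for correct in answers:
        if correct and level == target:
            return visited, True  # right answer at the hardest level
        level = next_level(level, correct, 0, target)
        visited.append(level)
    return visited, False


# One miss, then a steady climb back up to the hardest level.
levels, proficient = run_session([False, True, True, True, True])
print(levels, proficient)
```

A wrong answer here costs the student nothing except one extra step on the way up, which is the point.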

If there is anything to be taken away from this approach, it is the following: grades do not penalize learning. As it stands now, grades are more of a penalty than a reward or an incentive. If there are two relevant measures that can be used to gauge a student it is these: persistence and length of time to achieve some proficiency level. These two measures reveal more than the grade itself. One student may have worked hard for an A while another breezes through the class. A potential employer looking at an A cannot tell whether the person is a hard worker or someone who did well by being smart but has no work ethic.
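Here is a small Python sketch of how those two measures could be read off a log of attempts. Everything in it is an assumption on my part: the log format (minutes since the first attempt, right or wrong), the definition of proficiency as three correct answers in a row, and the function name proficiency_measures.

```python
def proficiency_measures(attempts, streak_needed=3):
    """Return (attempts_used, minutes_elapsed) at the point of proficiency.

    `attempts` is a chronological list of (minutes, correct) pairs.
    Assumption: proficiency means `streak_needed` right answers in a row.
    Returns None if the student never gets there.
    """
    streak = 0
    for count, (minutes, correct) in enumerate(attempts, start=1):
        streak = streak + 1 if correct else 0
        if streak >= streak_needed:
            return count, minutes - attempts[0][0]
    return None


# Two made-up students who both end up proficient.
grinder = [(0, False), (5, False), (12, True), (20, False),
           (31, True), (40, True), (52, True)]
natural = [(0, True), (2, True), (4, True)]

print(proficiency_measures(grinder))  # many attempts, long time
print(proficiency_measures(natural))  # few attempts, short time
```

Both students would get the same grade, but the two numbers tell very different stories about how they got there.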

Coursera can implement this type of approach to learning. It does encroach into the testing field where companies like ETS and Pearson may start feeling the heat and it isn’t clear if this is something that Coursera will want to get into (depending on the VCs). Unfortunately, Coursera is far away from implementing this approach although I believe the technology and the architecture are already in place. 

The same problems that plague teaching at the college level in the bricks and mortar setting carry over into the online setting. Adaptive testing needs a really large test bank and professors are really loath to build one since research is what they are more interested in. Perhaps it can be assigned to real teachers as opposed to professors, and by this I mean the teaching assistants and those not under the publish or perish system. It could even be open or crowd sourced. And perhaps more importantly it forces teachers to think about proficiency in certain concepts - not just the course itself. In order for adaptive testing to work well, the concept has to be well defined so that the software can step back and present another relevant question. It is easier at Khan Academy since they focus on basic skills but much much harder at the college level.

It would make learning a little bit more fun and take away the feeling that getting a question wrong on a quiz is a penalty.

Tuesday, October 2, 2012

Are software companies pussies and wussies

Or more precisely, are Apple, Samsung, etc. behaving like a bunch of the above? This follows from the earlier post, but the recent salvo from Samsung suggests that things are getting out of hand:

Samsung has added Apple's latest handset to a US patent lawsuit claiming the iPhone 5 infringes eight of its technologies.

HTC, Motorola, Microsoft, RIM and other tech firms are also involved in ongoing US lawsuits.
Legal experts have expressed concern at some of the tactics being used, including Judge Richard Posner who threw out a case involving Motorola and Apple in June, rebuking both firms.

He has now followed this up with a blog post in which he calls for an overhaul of the law regarding software patents.

"Nowadays most software innovation is incremental, created by teams of software engineers at modest cost, and also ephemeral - most software innovations are quickly superseded," he wrote.

"Software innovation tends to be piecemeal - not entire devices, but components, so that a software device (a cellphone, a tablet, a laptop, etc) may have tens of thousands of separate components (bits of software code or bits of hardware), each one arguably patentable.

Does it matter whom you plagiarise from?

According to Bob Dylan, it does:

Journalist Mikal Gilmore asked Dylan what he thinks of the "controversy" over quotations in his songs, stemming from the works of other writers, including Japanese author Junichi Saga and poet Henry Timrod.
"Oh, yeah, in folk and jazz, quotation is a rich and enriching tradition," he responded. "That certainly is true. It's true for everybody, but me. There are different rules for me. And as far as Henry Timrod is concerned, have you even heard of him? Who's been reading him lately? And who's pushed him to the forefront?... And if you think it's so easy to quote him and it can help your work, do it yourself and see how far you can get. Wussies and pussies complain about that stuff. It's an old thing - it's part of the tradition. It goes way back."


Meanwhile, Dylan himself has been caught up in lawsuits involving the use of his name. In 1994, he filed a trademark infringement lawsuit against Apple, asking for a court order to keep the computer giant from using his name. There are also reports that Dylan reached an out-of-court settlement in 1995 with Hootie & the Blowfish over the band's hit song "Only Wanna Be With You." Dylan reportedly claimed frontman Darius Rucker borrowed some of his lyrics in the track.

It sounds as though it’s okay for Bob Dylan to plagiarise from a nobody but it’s not okay to copy from Bob Dylan.

Here’s some old news:

In interviews promoting Amused to Death, Roger Waters, formerly of Pink Floyd, claimed that Lloyd Webber had plagiarised short chromatic riffs from the 1971 song "Echoes" for sections of The Phantom of the Opera, released in 1986; nevertheless, he decided not to file a lawsuit regarding the matter.[25] The songwriter Ray Repp made a similar claim about the same song, but insisted that Lloyd Webber stole the idea from him. Unlike Roger Waters, Ray Repp did decide to file a lawsuit, but the court eventually ruled in Lloyd Webber's favour.[26]

Coursera thoughts

I enrolled in a few courses and the methods of presentation differ:
  1. Professor stands in front of a screen and projects slides with some notes made using a tablet PC which then gets thrown onto the screen
  2. Khan Academy type approach with “real time” writing/slides with highlights as the professor goes through the points
  3. Slides with a background voice and occasional face time.

I’m a little undecided whether this is the way of the future. Method 1 is the most boring. It reminds me of being in class and doesn’t really leverage technology to the fullest. Perhaps it’s because this is the method most professors are used to. Similarly for method 3.

The Khan Academy approach is the most engaging though probably the hardest to produce, especially if the slides cover a difficult topic. I tend to drift off less with this approach.

The major problem I have with Coursera is that when watching the videos I can’t ‘flip’ through the slides without rewinding the video. I almost need two computer screens or a printout of the slides. The split screen doesn’t work too well on my monitor.

The discussion forums are - well - discussion forums. There is a lot of noise to filter through though I’ve found it interesting and useful to look through them. I would like to participate more but time constraints preclude that.

The reason I am signed up for so many courses is that I am afraid the course won’t be offered again. Why not automatically re-offer the course? I am also worried that all the course materials will go off the server once the course is completed. Coursera isn’t very clear on the policies regarding course material - except to say it varies by professor. The main problem seems to be people downloading the videos and reposting them on YouTube and I can see why they wouldn’t want that. In which case they should just make the material available.

For instance, I can’t really see what a course is like unless I enroll, and if the material doesn’t look too interesting then I un-enroll. But what if a course is over and I would like to look at the materials? I presume I would have to do the same but would the material be available? I probably wouldn’t get a certificate but that isn’t my main concern.

Which brings me to another point: I was surprised to see the intensity in getting that certificate. Is there really a value to the certificate in a labor market? I’m in an academic bubble so I don’t really know. But because of the desire to get a certificate it is not unexpected to find that there are concerns about cheating and plagiarism.

In short, it is unclear what Coursera is really about - is it a substitute for college classes? I don’t think so. If anything it might be used to leverage some classes especially intro level classes. For instance, if I were a professor I might want to license with Coursera to have some courses made available - not because I’m too lazy to teach - but I can use it to double down on the intensity of the class.

One of my professors used to have us do the reading before class. The incentive was that every class began with a quiz on that reading that counted toward the grade. The quiz served as feedback as to what material was confusing or unclear - something that he wanted to know early on. I would substitute the background reading with Coursera videos.

One course that I am in is Scott Page’s Model Thinking. This is turning out to be a general overview/survey type course. Instead of doing NetLogo demos in class I would use Scott’s videos as a pre-class assignment and spend class time actually doing NetLogo programming.

Is Coursera going anywhere? Yes and no. I am intensely grateful for these opportunities to learn something new and Coursera is the way to go. The quizzes and exams force me to go down a road and the certificate is an incentive, I’ll admit that. After all, MIT’s OCW has been around for years and every year I say I’ll try it out but I never have. Do I think that it will replace college level courses? Not entirely. It may cut down the time to finish if some of the courses can substitute for intro level classes.