© 2017 by Doran Bae 

You are reading Coding Otter, stuffed with articles about big data, data science development, ML in production, managing data science projects, and much more.
About the host
I'm Doran Bae, Data Scientist @TVB turning data into products and stories.

(Repost) 5 Bite-Sized Data Science Summaries

I found the following YouTube channel and post very helpful for staying afloat in the ocean of information. All credit to the author, Cassie Kozyrkov!




5 Bite-Sized Data Science Summaries

In the spirit of teamwork, the Next Rewind video series asked a bunch of people to pick up to five favorite talks from Google Cloud Next SF 2018 and discuss them on camera in no more than five minutes.

5 favorite talks. 5 video summaries. 5 minutes or less.

Did I go for it? You bet! Partly because I love sharing stuff that makes me happy with you all (surprise!) and partly because a year ago I was re-e-eally bad at speaking in front of a camera — I won’t ever conquer its heartless unblinking eye unless I force myself to practice. With a bit of luck I’ll learn to stop doing things like pointing to the left when I’m supposed to be pointing right…

5 Favorite Talks

Here are my 5 favorite talks from Next 2018 and the reasons I picked them. I got first dibs to choose out of over 300 talks, so these topics aren’t offcuts! They’re really the ones I think data science enthusiasts will appreciate most.

So, without further ado, here’s my list, arranged from least nitty gritty to most nitty gritty.

#1 Real businesses are already using AI for fun and profit!

If you’ve heard a sentiment like “Sure, that’s some shiny math, but you can’t use it for anything practical,” think again! That is so 2008. A lot has changed in the past decade. Allow me a brief history-lesson digression that isn’t in the video.

AI spent over half a century being more hype than happening. You’d think that’s because the algorithms weren’t there, but deep learning (that thing you’re calling “AI”) is a child of the 60s.

Many of the algorithms were around since the 60s but in those days the tools weren’t great yet and processing power wasn’t abundant.

The real reason is that the tools weren’t great yet (the software equivalent of the prototype radio that can only be used by the grad students who built it… out of sticky tape and dreams; breathe near it and it falls apart) and processing power wasn’t abundant.

Before cloud technologies, you couldn’t build a prototype unless you built a data center first.

Cloud technologies change all that. Cloud providers share their hardware with whoever wants to give it a whirl, which means AI is a try-before-you-buy proposition in a way it couldn’t have been a decade ago.

Cloud providers also build tools with general-purpose consumption in mind and they’re much better than they used to be. That’s something I love about humankind: whenever someone invents a useful tool, other someones tend to step up and make it easy to use. Compared with the radios of the 1890s, today’s radios are much easier to set up and much more likely to survive a high-velocity flight towards the nearest wall.

Many people don’t realize that the story of today’s applied AI is actually a story about The Cloud.

Of course, all of this is useless without data and that’s another reason why AI is trending up, up, UP! The world’s collecting more data than ever before, so businesses now have the fuel to make AI tick. So what I’m saying is: AI’s real now, and that’s exciting!

That’s why I picked a use cases talk: people need to know this stuff isn’t sci-fi anymore. AI is here and it’s awesome!

But then I ran into a problem: Rajen’s original talk was such a dense barrage of use cases that summarizing them was going to be impossible (what a great problem to have!). So I’m using my 3 minutes to pick out some great bits of advice from the original talk and to cheerlead you all into checking out the full thing to soak in the sheer scale of the examples feast.

#2 What is machine learning and how do I eat it? (Without a PhD)

Hey, you know I’ll take any opportunity to highlight that research AI and applied AI are different disciplines… and if I can give a shout-out to the applied side, you can hardly expect me to pass up the chance, right?

We need more straight-talk about basics and application in language engineers and tech lovers can appreciate.

Lak’s original talk is all-in on skipping the standard linear algebra of the postdoctoral contingent in favor of straight talk about basics and application in language engineers and tech lovers can appreciate, so it already scored massive points with me. It also colors in the ideas with examples of real use cases, then seals the deal by sprinkling in four excellent bits of advice. It’s those gems that I’ve chosen to use my 3 minutes expanding on:

Machine learning can be used to solve many problems for which you are writing rules today.
Machine learning is how you personalize applications.
Design systems with the expectation that you will have more data next year.
Use a platform that lets you forget about infrastructure and offers great pre-built models.
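
To make the first piece of advice concrete, here is a minimal Python sketch (not from the talk) that contrasts a hard-coded rule with a cut-off learned from labeled examples. The data, the 300.0 guess, and every function name are hypothetical, and the "learning" is just a one-feature decision stump, the simplest stand-in for a real model.

```python
# Hypothetical labeled examples: (transaction amount, did a human flag it?)
EXAMPLES = [
    (12.0, False), (40.0, False), (95.0, False),
    (130.0, True), (220.0, True), (510.0, True),
]


def handwritten_rule(amount: float) -> bool:
    """A cut-off that someone guessed and hard-coded."""
    return amount > 300.0


def learn_threshold(examples):
    """Pick the cut-off that misclassifies the fewest examples."""
    values = sorted(amount for amount, _ in examples)
    # Candidate cut-offs sit halfway between neighbouring amounts.
    candidates = [(low + high) / 2 for low, high in zip(values, values[1:])]

    def errors(threshold):
        return sum((amount > threshold) != label for amount, label in examples)

    return min(candidates, key=errors)


def accuracy(predict, examples):
    """Fraction of examples on which the rule agrees with the label."""
    return sum(predict(amount) == label for amount, label in examples) / len(examples)


threshold = learn_threshold(EXAMPLES)


def learned_rule(amount: float) -> bool:
    """The same kind of rule, but with the cut-off taken from the data."""
    return amount > threshold


print("hand-written rule accuracy:", accuracy(handwritten_rule, EXAMPLES))
print("learned threshold:", threshold)
print("learned rule accuracy:", accuracy(learned_rule, EXAMPLES))
```

The learned rule fits these six examples better than the guess does, and it would move on its own if next year's data looked different, which is also the point of the third piece of advice.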

#3 You can do machine learning in SQL now(!!)

If your massive beast of a database won’t come to your machine learning, bring machine learning to your database! BigQuery has just given you linear and logistic regression right in SQL. Now you don’t have to know the pain of exporting your database to shove it kicking and screaming into your TensorFlow setup!

If you’re an expert analyst, your currency is speed, but machine learning on massive datasets takes forever.

Why is this mindblowing? If you’re an expert analyst, your currency is speed. The faster you can see if there’s potential in a dataset, the more headpats you get. Alas, if you operate at giant scale, you’ve probably made your peace with spending approximately forever exporting data to try out even a basic machine learning model. No longer!

This. Is. Instant. Gratification.

Not only does BigQuery ML accelerate analytics for those who’re operating at eye-popping scale, but in the spirit of overachieving it does so with bonus lovely stuff like ROC curves and feature distribution analysis. I use my 2 minutes to gush and give you a sneak peek into what it looks like when you rev it up. If you’re inspired, Naveed and Abhishek’s original talk has the full demo.
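
For a rough idea of what "machine learning in SQL" looks like, here is a small Python sketch (not taken from the talk) that only builds the BigQuery ML statements as strings. The dataset, table, model, and column names are hypothetical, and nothing is sent to BigQuery here: you would paste the printed SQL into the BigQuery console or run it through a client of your choice.

```python
def build_create_model_sql(model: str, source_table: str, label_column: str) -> str:
    """Build a BigQuery ML statement that trains a logistic regression model.

    All three arguments are placeholders for your own names. This is plain
    string formatting for illustration, so only pass trusted values.
    """
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS (model_type = 'logistic_reg', input_label_cols = ['{label_column}']) AS\n"
        f"SELECT * FROM `{source_table}`"
    )


def build_roc_curve_sql(model: str) -> str:
    """Build a query that returns ROC curve points for a trained model."""
    return f"SELECT * FROM ML.ROC_CURVE(MODEL `{model}`)"


# Hypothetical names, for illustration only.
MODEL = "my_dataset.churn_model"
TABLE = "my_dataset.customers"

print(build_create_model_sql(MODEL, TABLE, "churned"))
print()
print(build_roc_curve_sql(MODEL))
```

Swapping `'logistic_reg'` for `'linear_reg'` gives the linear regression variant. The data never leaves the warehouse, which is where the speed-up for analysts comes from.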

#4 Data scientists, you no longer need a black belt in infrastructure

This is a story about better tools that empower data scientists to do more of what they love and less of what feels like a chore. It’s also about broader empowerment: better tools democratize access to technologies that allow people to be architects of their own bright futures. I already waxed lyrical about this in another blog post (What do you call AI without the boring bits?), but getting chores out of the way so people can focus on being creative and doing the parts they love whips me into a frenzy of passion, so that’s why the Kubeflow talk makes my list of favorites.

Kubeflow is essentially a ski-lift for your mountain of machine learning chores!

Data scientists, you’d love to be able to bring scalable machine learning to a hybrid cloud environment, but look me in the eye and tell me you really want to spend precious modeling and analysis time on learning Kubernetes and figuring out things like autoscaling based on job submission, optimized VMs, and data exfiltration prevention. No? Well, luckily you don’t have to.

Congratulations on waiting it out long enough to have it taken care of by Kubeflow, kind of like you don’t need to build your own computer anymore.

I use my 3 minutes to take you through the basics of machine learning composability, scalability, and portability, then show you glimpses of the gorgeous demos of what Kubeflow and Elastifile (data portability) can do from David’s original talk.

#5 TensorFlow is on a trajectory of increasing cuddliness

Okay, let’s be real: the Top 5 was going to have a TensorFlow spot, regardless. It’s just such a staple of the data science diet. The reason I’m delighted (as opposed to dutifully docile) in picking Laurence’s original talk is that it highlights awesome new features that make TensorFlow not only better than ever, but also more friendly. I was so excited about this one that I blogged about it before I even made the video. If you don’t like to watch things, you can get the text in 9 things you should know about TensorFlow. Or get the summary of the summary here in the next paragraph. (Efficiency!)

TensorFlow is the industrial lathe of data science, designed for state-of-the-art AI on giant datasets.

If you work with giant datasets or if you’re after the state-of-the-art in AI, TensorFlow is probably on your radar. It’s the industrial lathe of data science and in its early days it seemed to take its user-friendliness advice from industrial lathes as well. If you ran away screaming, come baaaack! It’s much cuddlier now and has some fabulous new features.

I use my 2.5 minutes to zoom you through my favorite highlights, which include opportunities for self-expression that you’ll find more palatable if Python is your mother tongue, and a host of other languages you can use if you’re not a Pythonista, including JavaScript (hello, doing everything in the browser!). There are also improvements in data processing, model sharing, and support for machine learning on mobile and toaster alike.

The other 300+ talks were also great, but these five warmed a special place in my data science heart. Hope you enjoy them! (Check out the Next Rewind video series to see flash summaries of a great variety of tech topics.)