So You Want to Get Into Data Science
Congratulations! Data Science is one of the hottest, hardest, most challenging, and most rewarding careers, and it is full of top-notch minds. Your journey is bound to be full of fun, challenges, enlightenment, and achievements (big or small). New papers are published daily, or even hourly. New techniques and experiments are developed regularly. New ways of thinking become the norm. And what seemed magical before is proven feasible.
But You Don’t Know Where to Start
But getting into Data Science is not easy. Far from it. The learning curve is brutal. There is so much to learn: Linear Algebra, Calculus, Statistics, Python, SQL, Machine Learning, Algorithms, Optimization, Data Wrangling, Data Visualization, Software Engineering, DevOps, … The list goes on and on.
Some people have a background in math or statistics, which definitely helps. Yet you still need a solid foundation in software engineering to be efficient and successful in your career. But this is not a problem, you say. After all, we live in an era of booming online education, and there are plenty of courses, paid and free, to choose from. **True, but this is precisely where the problem is.** The biggest challenge for self-education these days is not a lack of educational resources, but finding the best or most relevant ones.
Enter CS50. If You Are Only Allowed to Take One CS Course, Take CS50.
What is CS50? It is the introductory course on computer science taught at Harvard University by Professor David J. Malan. It is the largest class at Harvard with 800 students, 102 staff, and a professional production team. It is offered both on campus and online. I’ve only taken the online version, and it’s still **the** best computer science course I have come across, period. Let me tell you why:
- The learning curve is so well designed that it feels like watching a great suspense movie
The CS50 staff seem to know precisely what you do and do not know before each lecture (they have zero expert blindness), so a lecture never assumes anything you are not yet familiar with. The course smoothly guides you through the key concepts of computer science and makes them seem obvious. It raises questions from time to time and addresses them later with more in-depth explanations. You’ll have plenty of ‘a-ha’ moments, and it almost feels like watching a suspense movie.
- It covers the core fundamentals of computer science and leaves plenty of room for you to dig deeper
The course covers most of the critical computer science elements: C, Python, Data Structures, Algorithms, Software Engineering, Resource Management, Web Development, etc. It goes deep enough that you understand all the essential concepts while also knowing where to look if you want to dig deeper.
- It orchestrates a variety of ways to teach challenging or dry concepts, so it never feels boring
CS50 has many ways to teach and keep you engaged. You’ll play a game to understand different sorting algorithms, receive a rubber duck to experience the famous Rubber Duck Debugging, watch experiments with an ‘array of lights 🚥’ to learn data structures, and even eat a delicious breakfast 🍞 while exploring the idea of pseudo-code. (One of my favorites is where David J. Malan uses a Yellow Pages phone book to explain binary search, tears the book in half, and throws one half away. A defining moment in CS50 indeed.)
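If you haven’t seen the phone book demo, the idea fits in a few lines of Python. This is only my own minimal sketch, not code from the course: the function name `find_name` and the sample names are made up for illustration, and it assumes the list is already sorted alphabetically, just like a phone book.

```python
def find_name(names, target):
    """Return the index of target in the sorted list names, or -1 if absent."""
    low, high = 0, len(names) - 1
    while low <= high:
        mid = (low + high) // 2          # open the "book" in the middle
        if names[mid] == target:
            return mid
        if names[mid] < target:
            low = mid + 1                # target is later: discard the front half
        else:
            high = mid - 1               # target is earlier: discard the back half
    return -1                            # nothing left to search


# Hypothetical, already-sorted "phone book"
phone_book = ["Alice", "Bob", "Carol", "David", "Eve", "Mallory", "Trent"]
print(find_name(phone_book, "Eve"))      # prints 4
```

Each pass throws away half of what is left, which is exactly what tearing the book in half does: a book of a thousand pages needs only about ten tears.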
- It is interactive, fun, and engaging; the time flies by, and you’ll be amazed at what you can do once the course is over
The learning experience is so fun that the time flies by without you noticing. Some of the problem sets are quite challenging, yet not impossible, and you’ll feel proud of yourself once you crack them. You’ll probably fall in love with the joy of problem-solving. If you are stuck, there is an online community on almost every social network platform (Twitter, Reddit, Stack Exchange, Facebook, etc.) where you can get help.
- Out-of-class activities get you familiar with the ‘developer culture,’ which is essential for your future career
Puzzle days, office hours, CS50 Fairs, the final project ‘All-nighter’ hackathon (free breakfast at IHOP if you stay up all night): lots of activities are designed to get you familiar with the ‘developer culture’ and better prepare you for the software engineering world.
- State-of-the-art course software to get you started
How great can a computer science course be if its staff don’t use software tools they developed themselves? Over the years, CS50’s staff has developed a series of tools to help students write code, submit homework, check their code quality and syntax, tidy up their code style, and even generate color-coded code documentation in PDF form! These are all neat and useful ‘training wheels,’ as David J. Malan puts it, and they will help you get up to speed.
But please don’t just take my word for it; see what YouTube CEO Susan D. Wojcicki said about her experience:
And It Is Great for Data Science Too
Besides being a great course, CS50 is also very **relevant** to Data Science. It helps you lay a solid foundation of software engineering for your future career:
- It teaches you C. More importantly, through C, you learn the fundamentals of how a computer works: how memory works, what a pointer is, how data structures are built, etc.
- If you can write C, then you can quickly learn to write C++. C and C++ (often through Cython) power the performance-critical parts of data science libraries like NumPy, pandas, and scikit-learn.
- It teaches Python, which is the primary high-level language for Machine Learning and Data Science.
- It teaches SQL, which is one of the most widely used languages in Data Science.
- It also teaches web programming, which is useful when you try to deploy your model to production.
So essentially everything taught in the course is useful to you in some way, and the foundation it helps you build will go a long way.
CS50 and Beyond!
Once you have finished the course, you’ll be more knowledgeable and confident to continue your Data Science journey. Here are a couple of possible directions to take from there:
Jeremy Howard’s Fast.ai Course to Start a ‘Top-down’ Approach for ML
Fast.ai is fantastic and unique. It enables you to build state-of-the-art deep learning models within the first lesson with fewer than ten lines of code. Then it goes deeper and deeper into the how and why. The only prerequisite is one year of coding experience, which CS50 will have already prepared you for.
Andrew Ng’s Machine Learning Course at Coursera
Another great Machine Learning course, but in a ‘Bottom-up’ style. It smoothly explains the math fundamentals first and gradually builds up the knowledge to piece together complicated machine learning models from scratch. I have an article that explains the difference between Andrew Ng’s and Jeremy Howard’s approaches to machine learning education and recommends a potentially efficient way to learn.
Corey Schafer’s YouTube channel, Python and OOP Tutorials
As good as it is, CS50 only covers the generic and basic concepts of Python. You’ll need more in-depth knowledge to code efficiently for your data science projects. For this, I recommend Corey Schafer’s YouTube channel. He is one of the best Python educators I have come across, and he explains complicated ideas in a crystal-clear way. Not one second of his videos is wasted. The content is concise, to the point, and highly condensed. He has playlists for basic Python, SQL, Matplotlib, Git, and Object-Oriented Programming.
Learning Data Science is never a breeze, and I hope this article helps a little in alleviating the pain and making your journey a bit more efficient and fun. If you know other courses and resources that are also great, please feel free to leave a response so others can see them too. Thanks!
Any feedback or constructive criticism is welcome. You can find me on Twitter @lymenlee or on my blog at wayofnumbers.com.