AWS Machine Learning Specialty exam

Vijay Kamath
5 min read · Mar 29, 2021

Hi, I recently passed the AWS Machine Learning — Specialty exam. I thought I would share some of my experiences to help others prepare for this exam, which I would categorize as “challenging but totally do-able if you are well prepared.”

This blog has three parts. First, I will cover why someone might want to take the AWS Machine Learning — Specialty exam. Then I will share some of the online materials that I found useful in preparing for the exam. And finally, the cheat sheet of observations I used to push me through.

Why did I take this exam?

The primary reason was to stay up to speed with new developments in tech and ML/AI. These disciplines move extremely fast and are revolutionizing our world, enabling some really interesting capabilities in day-to-day life. For example, I used to wonder (not anymore) how Netflix, Spotify, and Amazon recommend movies, shows, music, and books, or how Tesla’s Autopilot (self-driving) capabilities work. Like it or not, the bottom line is that the ML/AI revolution is already happening, and it is only going to accelerate and become further entrenched in our day-to-day lives. That means keeping up with an ever-accelerating high-speed train.

The secondary aspect was to better understand which machine learning techniques to use and when to use them. The ML/AI toolbox is ever-growing. While many SaaS offerings push their “auto ML” capabilities to be used out of the box, I am always fascinated to look “under the hood” and understand what happens in the lifecycle of a machine learning project, especially when evaluating what data features to use/extract/suppress/encode, what algorithms to fit, what hyperparameters to tune, when to re-train or refit a model already in production, and so on.

To set a bit of context about my background: I would categorize myself more as a web architect with lots of experience in transactional SQL and cloud engineering. I started dabbling in the machine learning space as a hobby. You certainly do not need a master’s or PhD degree to pass this exam, nor will you need to go really deep into the math.

Online courses

Below are some useful online resources. These courses worked well for me given my background and experience, so feel free to use them as a starting point while exploring other online content to find what works for you.

I brushed up on my basic ML and data science skills with the ML & Data Science Bootcamp course on Udemy before starting to prepare for the exam. This course helped me to:

· Dust off my Python coding knowledge

· Understand the fundamentals of ML with greater clarity

· Start ML coding and testing outside AWS (using Google Colab) to get a broader perspective

My two main courses were the A Cloud Guru ML course and the Udemy course from Stephane Maarek and Frank Kane. These courses are tuned specifically for the Specialty exam and stick faithfully to the AWS script while imparting:

  • The domains of knowledge for the AWS Certified Machine Learning — Specialty exam
  • Best practices for using the tools and platforms of AWS for data engineering, data analysis, machine learning modelling, model evaluation and deployment
  • Hands-on labs designed to challenge your intuition, creativity and knowledge of the AWS platform

After learning, I took a lot of practice exams. A Cloud Guru provides good practice exams, and I did a few on Udemy as well (Abhishek Singh, Analytics Ustad).

The machine learning practice exams available on Jon Bonso’s portal were extremely helpful.

I also did the Whizlabs practice tests but found some of the questions too advanced.

Lastly, a word of caution: this exam is not just an advanced-level theoretical exam; it is also quite applied, covering the latest modelling techniques and cloud services. It is always good to get some hands-on experience under your belt with various labs. While some of the topics are AWS specific, the learnings apply universally across the entire lifecycle of a machine learning project, in any industry.

A few pointers I picked up along the way

1. Know your distributions

This is really basic statistics knowledge. You will get tested on it, and you should be able to answer these questions quickly. A question will describe a scenario and ask which distribution best models it.

With a very thin background in statistics, I found these questions quite tricky and had to start from the basics, but for data scientists or people with a statistical background, they should be a breeze.
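As a quick refresher on the kind of reasoning these questions expect, here is a minimal sketch of the Poisson PMF, a distribution that often shows up in “events per time interval” scenarios. The help-desk scenario below is my own illustration, not taken from any exam question.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for a Poisson distribution with rate lam (events per interval)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# Illustrative scenario: a help desk averages 3 calls per hour.
# What is the probability of exactly 5 calls in the next hour?
p = poisson_pmf(5, 3.0)
print(round(p, 4))  # ≈ 0.1008
```

The trick on the exam is recognizing the cue words: counts of independent events over a fixed interval suggest Poisson, a fixed number of success/failure trials suggests binomial, and symmetric continuous measurements suggest normal.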

2. The Amazon Kinesis Family

I was comfortable with this one, as I had studied it for a Professional-level certification and have used it on multiple projects. The Kinesis family of services, together with AWS Glue, makes up the majority of the data engineering domain (20%) of the ML exam. A simple cheat sheet:

· Only Kinesis Data Firehose can load streaming data into S3; it can also provide data compression (for S3), as well as data conversions to Parquet/ORC.

· Kinesis Data Firehose does not transform data on its own (such as CSV → JSON), but it can invoke an AWS Lambda function to do so.

· Kinesis Data Analytics is mainly used for real-time analytics via SQL; there are two custom AWS ML SQL functions: RANDOM_CUT_FOREST (for anomaly detection) and HOTSPOTS (for identifying dense regions). If the question says “real-time analytics,” the answer is usually the option with Kinesis Data Analytics.

· Kinesis Analytics will use IAM permissions to access streaming sources and destinations.

· There is also S3 Analytics, which is not to be confused with Kinesis Analytics. S3 Analytics is used for storage class analysis.
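To make the Firehose-plus-Lambda transformation point above concrete, here is a minimal sketch of a transformation Lambda that converts CSV records to JSON. The Firehose invocation model passes base64-encoded records and expects each back with a `recordId`, a `result`, and re-encoded `data`; the `user_id,score` CSV schema is purely hypothetical.

```python
import base64
import json

def handler(event, context):
    """Firehose transformation Lambda sketch: CSV -> JSON.

    Assumes each incoming record is a CSV line 'user_id,score'
    (a made-up schema for illustration).
    """
    output = []
    for record in event["records"]:
        # Firehose delivers record payloads base64-encoded.
        line = base64.b64decode(record["data"]).decode("utf-8").strip()
        user_id, score = line.split(",")
        payload = json.dumps({"user_id": user_id, "score": float(score)}) + "\n"
        output.append({
            "recordId": record["recordId"],          # must echo the original id
            "result": "Ok",                          # or "Dropped"/"ProcessingFailed"
            "data": base64.b64encode(payload.encode("utf-8")).decode("utf-8"),
        })
    return {"records": output}
```

The newline appended to each JSON payload matters in practice: Firehose concatenates records when writing to S3, so without a delimiter the objects become hard to parse downstream.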

3. AWS Glue

AWS Glue is an ETL tool. It automatically crawls files on S3 and generates a schema. When you have a large amount of data hosted on S3, it is very convenient to query the Glue-generated Athena tables for data exploration. Glue also comes with a custom AWS algorithm, the FindMatches ML transform, which identifies potentially duplicated records even when the two records do not match exactly.

Glue sets up elastic network interfaces to enable jobs to connect to other resources securely. Glue can run Spark jobs, and it supports both Python 2.7 and Python 3.6.
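To build intuition for what a Glue crawler does when it “generates a schema,” here is a much-simplified sketch of type inference over sampled CSV values. The type names mirror Glue/Athena conventions (`bigint`, `double`, `string`); the real crawler uses classifiers and is far more sophisticated.

```python
def infer_type(values):
    """Guess a column type from sample string values (simplified sketch)."""
    def is_int(v):
        try:
            int(v)
            return True
        except ValueError:
            return False

    def is_float(v):
        try:
            float(v)
            return True
        except ValueError:
            return False

    if all(is_int(v) for v in values):
        return "bigint"
    if all(is_float(v) for v in values):
        return "double"
    return "string"

def infer_schema(header, rows):
    """Map each column name to an inferred type, crawler-style."""
    columns = list(zip(*rows))  # transpose rows into columns
    return {name: infer_type(col) for name, col in zip(header, columns)}

schema = infer_schema(["id", "price", "label"],
                      [["1", "9.99", "cat"], ["2", "3.50", "dog"]])
# -> {'id': 'bigint', 'price': 'double', 'label': 'string'}
```

Once the crawler has produced a schema like this in the Glue Data Catalog, Athena can query the underlying S3 files directly, which is what makes the exploration workflow so convenient.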

4. Security

Security can get all-consuming in AWS; there is a specialty certification just for it. For the ML exam, at a minimum, you will need to know how security works with S3, as well as security around SageMaker and how to protect your data in transit to and from SageMaker.

5. Amazon SageMaker

Amazon SageMaker, being the flagship AWS fully managed ML service, will be tested heavily in the ML exam. Other than the security that I mentioned above, some of the things you will need to know include:

· Understand all of SageMaker’s built-in algorithms; you will be tested on them! Also, understand which algorithms can be sped up by multi-core, multi-instance, or GPU.

· SageMaker can only get its training data from S3, and there is pipe mode or file mode; pipe mode usually speeds things up when the data is very large, because data is streamed to the algorithm rather than fully downloaded first.

· For hyperparameter tuning in SageMaker, understand what the options are and how automatic model tuning works.
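As a conceptual illustration of what automatic model tuning automates, here is a random-search sketch over two hyperparameter ranges. SageMaker’s tuner actually defaults to Bayesian optimization (and exposes `ContinuousParameter`/`IntegerParameter`-style ranges), and the objective function below is a stand-in for a real training job’s validation metric.

```python
import random

def objective(learning_rate, max_depth):
    """Hypothetical validation metric to maximize; stands in for the
    objective metric a real SageMaker training job would emit."""
    return -(learning_rate - 0.1) ** 2 - 0.01 * (max_depth - 6) ** 2

def random_search(n_trials, seed=0):
    """Sample hyperparameters from their ranges, keep the best trial."""
    rng = random.Random(seed)
    best_score, best_params = None, None
    for _ in range(n_trials):
        params = {
            "learning_rate": rng.uniform(0.001, 0.5),  # continuous range
            "max_depth": rng.randint(3, 10),           # integer range
        }
        score = objective(**params)
        if best_score is None or score > best_score:
            best_score, best_params = score, params
    return best_score, best_params

score, params = random_search(200)
```

The exam cares less about the search internals than about the setup: which metric you optimize, whether you maximize or minimize it, and how the parameter ranges are declared.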

I hope this article serves you well in your preparation for the Machine Learning Specialty certification, and that you continue your machine learning journey to better yourself personally and professionally.

Best of luck to you all!
