From time to time I am asked: how does one become a data scientist? What courses are necessary? How long will it take? How did you become a DS? I have answered these questions several times, so it seemed worth writing a post to help aspiring data scientists.
I got a master's degree at MSU Faculty of Economics (Moscow, Russia) and worked for ~4 years as an analyst/consultant in the field of ERP-system implementation. …
What is mentoring? How can I find a mentor? Questions like these are often asked on forums, on social media, and in communities. Mentorship is supposed to be an exciting arrangement in which an experienced person shares insights and guides a mentee on the path to success pro bono. Some people suggest finding a mentor at work; others say that you can reach out to a potential mentor anywhere.
But how do things proceed from the mentor’s point of view? Maybe one day, a person just decides that it is time to mentor and starts helping people? …
Data Science is often advertised as one of the best jobs in the world. Many companies are talking about AI. There are hundreds of different MOOCs for thousands of people who want to work as data scientists. Recruiters poach people, enticing them with better salaries and benefits. New SOTA approaches regularly appear and break records. You can hear about cool applications like AlphaZero or GPT-3 from many sources. People constantly use virtual assistants and other technical wonders.
But… is everything really so great for those who really work in the field? Well, not exactly.
In fact, many data scientists…
Be like gradient descent — learn from the errors!
Several weeks ago another Kaggle competition ended — Bengali.AI Handwritten Grapheme Classification.
Bengali is the 5th most spoken language in the world. This challenge aimed to improve approaches to Bengali handwriting recognition. Its alphabet has 49 letters and 18 diacritics, which means there are a great many possible graphemes (the smallest units of a written language).
In this competition we were asked to predict classes for three separate parts of each grapheme — the grapheme root, the vowel diacritic, and the consonant diacritic.
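A common way to handle such a task is a single shared backbone with three output heads, one per target. Here is a minimal PyTorch sketch of that idea; the backbone is a toy stand-in (in practice a pretrained CNN would go there), while the class counts (168 roots, 11 vowel diacritics, 7 consonant diacritics) and the 137×236 grayscale input size match the competition data.

```python
import torch
import torch.nn as nn

class GraphemeClassifier(nn.Module):
    """One shared backbone, three classification heads."""
    def __init__(self, feature_dim=512):
        super().__init__()
        # Toy backbone for illustration only; a pretrained CNN would replace this.
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feature_dim), nn.ReLU(),
        )
        self.root_head = nn.Linear(feature_dim, 168)       # grapheme roots
        self.vowel_head = nn.Linear(feature_dim, 11)       # vowel diacritics
        self.consonant_head = nn.Linear(feature_dim, 7)    # consonant diacritics

    def forward(self, x):
        features = self.backbone(x)
        return (self.root_head(features),
                self.vowel_head(features),
                self.consonant_head(features))

model = GraphemeClassifier()
# A batch of 4 grayscale images at the competition's 137x236 resolution.
logits = model(torch.randn(4, 1, 137, 236))
```

Each head gets its own loss, and the three losses are summed (often with weights) for a single backward pass.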
More than 2,000 teams took part in the…
Taking part in Kaggle competitions is a serious challenge. You need to spend a lot of time and effort, study new things, and try many tricks to get a high score. And often even this isn't enough, because there are a lot of great people who have more experience, more free time, more hardware, or some other advantage (maybe they even have all of them).
Previously I was able to get only silver medals in competitions. Sometimes it was thanks to luck (after the shake-up), sometimes it was due to a lot of work. …
A new journey is beginning…
The end of this journey
As I wrote in the first post of this series, I was able to study the DL Nanodegree thanks to successfully completing the PyTorch Scholarship Challenge. I had already gone through several DL courses, so I could compare them to this Nanodegree. As for theory, I already knew most of the material in the lessons, except GANs and model deployment. Still, I learned some new things and got a lot of practice. This helped me to…
A sixth part of the Nanodegree: Deploying on SageMaker
Deploying a model
In this lesson, we learn about model deployment. Most of this part of the Nanodegree is about deploying models on Amazon's SageMaker. I think this is too specific, and there are enough tutorials on doing it, so I focus on information about deployment itself.
Cloud computing can simply be thought of as transforming an Information Technology (IT) product into a service. With our vacation photos…
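The product-to-service idea can be sketched without any SageMaker specifics: a trained model stops being a file you ship and becomes an endpoint you call. Below is a toy illustration using only the Python standard library; the `predict` function is a hypothetical stand-in (a fixed linear rule) where a real model would be loaded.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(features):
    """Stand-in for a trained model: a fixed linear scoring rule."""
    weights = [0.5, -0.2, 0.1]
    score = sum(w * x for w, x in zip(weights, features))
    return {"label": "positive" if score > 0 else "negative", "score": score}

class PredictHandler(BaseHTTPRequestHandler):
    """Accepts JSON like {"features": [1.0, 0.0, 0.0]} and returns a prediction."""
    def do_POST(self):
        length = int(self.headers["Content-Length"])
        payload = json.loads(self.rfile.read(length))
        body = json.dumps(predict(payload["features"])).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # Clients now consume the model as a service, never touching its weights.
    HTTPServer(("localhost", 8080), PredictHandler).serve_forever()
```

Managed platforms like SageMaker take care of exactly this wrapper — plus scaling, monitoring, and versioning — which is why the lesson treats deployment as turning a product into a service.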
Nowadays there are a lot of pre-trained NLP models that are SOTA and beat all benchmarks: BERT, XLNet, RoBERTa, ERNIE… They are successfully applied to various datasets even when little data is available.
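The reason small datasets suffice is transfer learning: the pre-trained encoder is frozen (or lightly fine-tuned) and only a small task head is trained. Here is a generic PyTorch sketch of that pattern; the encoder is a toy stand-in for a real model like BERT, and the data is random placeholder tokens.

```python
import torch
import torch.nn as nn

# Toy stand-in for a pre-trained encoder (in practice: BERT, RoBERTa, etc.).
encoder = nn.Sequential(
    nn.Embedding(1000, 64),      # token embeddings
    nn.Flatten(),                # (batch, seq*emb)
    nn.Linear(64 * 16, 128),     # sentence representation
)

# Freeze the "pre-trained" weights: only the small head below is trained.
for p in encoder.parameters():
    p.requires_grad = False

head = nn.Linear(128, 2)  # binary sentiment head
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Tiny labelled dataset: far fewer examples than training from scratch needs.
tokens = torch.randint(0, 1000, (8, 16))   # 8 reviews, 16 tokens each
labels = torch.randint(0, 2, (8,))

for _ in range(5):
    optimizer.zero_grad()
    logits = head(encoder(tokens))
    loss = loss_fn(logits, labels)
    loss.backward()
    optimizer.step()
```

With a real pre-trained encoder the representation already captures the language, so a few hundred labelled examples can be enough for a usable classifier.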
At the end of July (23.07.2019–28.07.2019) there was a small online hackathon on Analytics Vidhya in which participants were invited to perform sentiment analysis on drug reviews. It was complicated for several reasons:
A fifth part of the Nanodegree: GAN
Generative Adversarial Networks
In this lesson we learn about various types of GANs and how to implement them. Also, we’ll work on a fourth project — generating faces.
The first lesson on GANs is led by Ian Goodfellow, who invented GANs! It was very exciting to see him!
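The core GAN training loop is the same regardless of architecture: the discriminator learns to separate real from generated samples, and the generator learns to fool it. Here is a minimal sketch on 1-D toy data (real samples drawn from N(3, 1)) rather than faces, to keep it self-contained.

```python
import torch
import torch.nn as nn

latent_dim = 8
G = nn.Sequential(nn.Linear(latent_dim, 16), nn.ReLU(), nn.Linear(16, 1))
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))  # outputs logits

opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(200):
    real = torch.randn(32, 1) + 3.0                 # "real" data ~ N(3, 1)
    fake = G(torch.randn(32, latent_dim))           # generated samples

    # Discriminator step: push real toward 1, fake toward 0.
    opt_d.zero_grad()
    d_loss = (bce(D(real), torch.ones(32, 1))
              + bce(D(fake.detach()), torch.zeros(32, 1)))
    d_loss.backward()
    opt_d.step()

    # Generator step: make the discriminator label fakes as real.
    opt_g.zero_grad()
    g_loss = bce(D(fake), torch.ones(32, 1))
    g_loss.backward()
    opt_g.step()
```

Note `fake.detach()` in the discriminator step: it stops gradients from that loss flowing back into the generator, so each network is updated only by its own objective.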
A fourth part of the Nanodegree: RNN
Recurrent Neural Networks
In this lesson we learn about recurrent neural nets, try word2vec, implement attention, and do many other things. Also, we'll work on a third project — generating TV scripts.
In this lesson, we go through the basics of RNNs — recurrent neural networks. There are many applications of this type of network, and one of them is generating sequences. It could be a sequence of text or a time series. …
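Sequence generation with an RNN reduces to next-token prediction: train the network to predict each character from the ones before it, then feed its own outputs back in. Here is a tiny character-level sketch in PyTorch (a GRU on a toy string; the text and hyperparameters are illustrative).

```python
import torch
import torch.nn as nn

text = "hello world hello world "
chars = sorted(set(text))
stoi = {c: i for i, c in enumerate(chars)}
itos = {i: c for c, i in stoi.items()}
vocab = len(chars)

class CharRNN(nn.Module):
    def __init__(self, vocab, hidden=32):
        super().__init__()
        self.embed = nn.Embedding(vocab, 16)
        self.rnn = nn.GRU(16, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab)

    def forward(self, x, h=None):
        out, h = self.rnn(self.embed(x), h)
        return self.out(out), h

model = CharRNN(vocab)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

# Targets are the input shifted by one character: predict the next char.
ids = torch.tensor([stoi[c] for c in text]).unsqueeze(0)
x, y = ids[:, :-1], ids[:, 1:]

for _ in range(100):
    optimizer.zero_grad()
    logits, _ = model(x)
    loss = loss_fn(logits.reshape(-1, vocab), y.reshape(-1))
    loss.backward()
    optimizer.step()

# Generation: feed the model's own predictions back in, carrying the hidden state.
idx = torch.tensor([[stoi["h"]]])
h = None
generated = "h"
for _ in range(10):
    logits, h = model(idx, h)
    idx = logits[:, -1].argmax(dim=-1, keepdim=True)
    generated += itos[idx.item()]
```

The same loop structure works for time series — swap the embedding for numeric features and the softmax head for a regression output.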