Data Science interviews: what is expected of you

Data Science is a very promising field. Over the past year, we at ERAM received 210 resumes from people who want to work in Data Science. Of these, 43 people were invited to a technical interview, but only seven received job offers. If demand is so high, why so few?
We talked to technical interviewers and found out that many candidates have a poor idea of what analysts actually do, so their knowledge and skills are not always relevant to the job. Some think that experience with Big Data is enough to work in Data Science, some are sure that watching a few machine learning courses will suffice, and some think it is not necessary to understand algorithms well.
Dmitry Nikitko and Mikhail Kamalov, data analysts and technical interviewers at ERAM, told us what they expect from candidates at interviews, what questions they ask, what they value in a resume, and how to prepare for the interview.
Before the interview, recruiters send candidates a test. The multiple-choice part is checked automatically. The free-form answers are read by technical interviewers.

What you need to know

In short, a data analyst is a person who can program (in most cases in Python), understands statistics, mathematics, and algorithms, and speaks English.
English is needed not only to read specialized literature and work with documentation. Many analysts communicate directly with foreign customers. The ability to translate from the language of data scientists into one that the business understands is also useful here.

Is a specialized degree mandatory?

It is important to know mathematics well, and a technical degree is a big plus. Most data scientists at ERAM are mathematicians, programmers, or physicists. But this is not a strict requirement: we have a linguist on staff, and we recently hired a sociologist who, after graduation, processed the results of sociological surveys, built models, made forecasts, and analyzed social graphs. That experience is relevant to Data Science, so the candidate was interesting to us.
In general, it is not true that a person with a technical background suits us and a person with a humanities background does not. It all depends on your skills and experience. For example, a computational linguist who has learned to write code is a more interesting candidate than a Big Data engineer who has worked with MapReduce and Hadoop but is not versed in algorithms, or than someone with an advanced degree in statistics but no practical experience.

What is valued in a resume

Work experience is the most valuable. If you have already worked in Data Science, describe in detail what you did, which algorithms you used, and what skills you have.
If you do not have any work experience, the following will be a big plus in a resume:
• A short description of your pet projects. It is important that the candidate not only knows the theory but has also had a chance to practice.
• Participation in hackathons. At the very least, this says that you worked in a team and (most likely) built a working solution in a limited time. Hackathons are also good because employers may notice you there. Then you might not need to send a resume at all.
• Participation in machine learning competitions (Kaggle, DrivenData). If you took part in, or even won, the Instacart competition on Kaggle, where the task was to build a recommendation system, you will be able to solve a business problem with similar goals faster. But in our experience, winning such a competition does not always mean that the candidate knows, for example, how the algorithms he used actually work.

What is asked at the interview

The purpose of a Data Science interview, as of any other, is to find out how well a person understands the subject area. First, the interviewer asks about the basics of machine learning and statistics. The answers show the depth and breadth of the candidate's fundamental knowledge. Then come more specific questions, for example on natural language processing, time series, or recommender systems. If the candidate says that he knows how to work with graphs, images, or other kinds of data, he will be asked about that too.
Universal soldiers are extremely rare, so the interview questions depend on the candidate's experience. Interviewers usually ask about past projects: which technologies were used and why. After that, they may ask the candidate to reason through a problem. And of course there will be some theoretical questions.
Here are some questions that may be asked during the interview:
Neural networks
- What methods of preventing overfitting (regularization) in neural networks do you know? How do they work? Where should batch normalization be inserted?
- What is the difference between a neural network with one output and a sigmoid activation function and the same neural network, but with two outputs and softmax?
- Let's imagine that we have a multilayered fully connected network with a nonlinear activation function. What will happen to the neural network if we remove the nonlinearity?
- Why use global pooling?
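
As an illustration of the sigmoid-versus-softmax question above, here is a minimal sketch in plain Python. It assumes the two-output network produces logits z0 and z1 for the negative and positive class; the function names are hypothetical and chosen only for this example. It checks numerically that the softmax probability of the positive class equals a sigmoid applied to the difference z1 - z0, which is one way to start reasoning about the question, not a complete answer.

```python
import math


def sigmoid(z):
    """Logistic function for a single logit."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(logits):
    """Softmax over a list of logits, shifted by the max for numerical stability."""
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def positive_class_prob(z0, z1):
    """Probability of class 1 from a two-output softmax head."""
    return softmax([z0, z1])[1]


# The two heads agree whenever the single logit equals z1 - z0.
for z0, z1 in [(0.3, 1.7), (-2.0, 0.5), (4.0, 4.0)]:
    assert math.isclose(positive_class_prob(z0, z1), sigmoid(z1 - z0))
```

The predicted probabilities coincide, so the practical difference lies elsewhere: the two-output version carries one redundant degree of freedom in its last layer.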

Image recognition
- How is quality evaluated in object detection problems?
- What architectures of neural networks for semantic segmentation do you know?
- How and why would you use transfer learning?
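
For the object detection question above, one building block that candidates are often expected to know is intersection over union (IoU) between a predicted and a ground-truth bounding box. The sketch below is an assumption about what a reasonable answer might include, not the interviewers' reference answer. Boxes are assumed to be tuples (x_min, y_min, x_max, y_max), and the function name `iou` is hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x_min, y_min, x_max, y_max)."""
    # Corners of the overlapping rectangle, if there is one.
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Width and height are clamped at zero for boxes that do not overlap.
    intersection = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection

    return intersection / union if union > 0 else 0.0
```

A detection is typically counted as correct when its IoU with a ground-truth box exceeds a chosen threshold, and aggregate metrics such as mean average precision are built on top of that decision.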

Time series
- How do you correctly evaluate model quality when working with time series?
- What should we do with seasonality in the data?
- How do you search for anomalies in time series?
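
On the question of evaluating models on time series, a common line of reasoning is that a random train/test split leaks future information into training, so the splits should respect time order. The sketch below shows one such scheme, an expanding-window (walk-forward) split. The function name and parameters are hypothetical, and observations are assumed to be already sorted by time.

```python
def expanding_window_splits(n_samples, n_splits, test_size):
    """Yield (train_indices, test_indices) pairs where training data always precedes test data."""
    first_test_start = n_samples - n_splits * test_size
    if first_test_start <= 0:
        raise ValueError("not enough samples for the requested number of splits")

    for k in range(n_splits):
        test_start = first_test_start + k * test_size
        test_end = test_start + test_size
        # Train on everything before the test window; the window then moves forward.
        yield list(range(test_start)), list(range(test_start, test_end))


for train_idx, test_idx in expanding_window_splits(n_samples=10, n_splits=3, test_size=2):
    # No training index may come after the start of the test window.
    assert max(train_idx) < min(test_idx)
```

Each fold trains only on the past and tests on the period that immediately follows, which mirrors how the model would be used in production.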

Natural language processing
- What is topic modeling based on? How does the algorithm work? How would you choose the number of topics that the algorithm will learn?
- You have text reviews with ratings on a 5-point scale. How would you build a system that predicts the rating from the review text? How would you evaluate the quality of this system?
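
For the evaluation part of the review-rating question, one point a candidate might raise is that ratings are ordered, so a prediction of 4 for a true 5 is a smaller error than a prediction of 1. The sketch below compares plain accuracy with mean absolute error on made-up ratings. The data and function names are illustrative assumptions, not taken from a real project.

```python
def accuracy(true_ratings, predicted_ratings):
    """Share of reviews whose rating was predicted exactly."""
    if len(true_ratings) != len(predicted_ratings) or not true_ratings:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(t == p for t, p in zip(true_ratings, predicted_ratings))
    return hits / len(true_ratings)


def mean_absolute_error(true_ratings, predicted_ratings):
    """Average distance, in rating points, between prediction and truth."""
    if len(true_ratings) != len(predicted_ratings) or not true_ratings:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(t - p) for t, p in zip(true_ratings, predicted_ratings))
    return total / len(true_ratings)


# Made-up example: both models are never exactly right, but one is much closer.
true = [5, 5, 1, 1]
near_miss = [4, 4, 2, 2]
far_miss = [1, 1, 5, 5]

assert accuracy(true, near_miss) == accuracy(true, far_miss) == 0.0
assert mean_absolute_error(true, near_miss) < mean_absolute_error(true, far_miss)
```

Accuracy cannot tell the two models apart here, while mean absolute error can, which is one argument for reporting a metric that takes the order of the ratings into account.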

While the candidate reasons and solves problems, interviewers ask many clarifying questions and try to put the candidate in real-world conditions. For example, the candidate proposes a solution, and the interviewer adds new conditions to the task.
"What will you do if the data set is imbalanced?"
"How will you solve the problem if there are missing values in the data?"
"What will you do if there are outliers in the data?"
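
The three follow-up questions above each have many possible answers. The sketch below shows one simple baseline for each, using only the Python standard library: inverse-frequency class weights for imbalance, median imputation for missing values, and the 1.5 * IQR rule for outliers. These particular techniques, the function names, and the sample data are assumptions made for illustration; the right choice depends on the task.

```python
import statistics
from collections import Counter


def class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def iqr_outlier_flags(values, k=1.5):
    """Flag values lying more than k * IQR outside the first or third quartile."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [v < low or v > high for v in values]


# Made-up data for a quick check of each helper.
weights = class_weights(["a"] * 8 + ["b"] * 2)   # the rare class gets the larger weight
filled = impute_median([1.0, None, 3.0, None, 8.0])
flags = iqr_outlier_flags([10, 11, 12, 13, 14, 15, 100])
```

In an interview it is usually worth explaining the trade-offs as well, for example that median imputation hides the fact that a value was missing, or that flagged outliers may be valid rare events rather than errors.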

In addition, they may ask how the candidate organizes his working time, how he tracks experiments and checks that they are reproducible, how he processes large amounts of data, and how he builds data processing pipelines.

Typical mistakes in interviews

The candidate does not understand how the algorithms he used work.
Interviewers always ask about the algorithms candidates have used: what parameters they have and how to set them. If there is no answer, or the candidate says that he tuned the algorithm "by intuition", that is bad. If you use an algorithm, it is worth taking the time to understand how to configure it.
The candidate does not understand how to apply his knowledge in real-world conditions.
It happens like this: a candidate knows the theory well but does not know how to handle problems that come up on projects. It is important not only to be able to find insights in the data, do feature engineering, and build models, but also to understand how to put all of this into production or how to make a solution run faster.
The candidate cannot reason independently.
If a person answers questions with "I'll google it" too often, that is not a good sign. Of course, data scientists use Google, but being able to reason independently is also important: sometimes there are problems with no ready-made solution, and you need to invent something of your own.
The candidate makes up how a system works.
Sometimes people cannot answer a question about how a particular system works, so they start inventing an answer, hoping to guess correctly. We do not recommend this: the interviewer will notice. It is better to say honestly, "I do not know." That leaves more time for other questions and raises the chance that you will be asked about something you do understand.


To anyone who wants to work in Data Science, we recommend watching and reading:
• Course "Programming in Python" on Stepik
• Course "Introduction to machine learning" on Coursera
• Course "Machine learning and data analysis" on Coursera
• Course "Machine learning" by Constantine Vorontsov
• Courses in deep learning on Coursera
• Course "Neural networks" on Stepik
• Book "Deep Learning Book"
• Book "Deep learning: immersion in the world of neural networks", the first book on deep learning in Russian
• Book on NLP "Speech and Language Processing"
• Book on information retrieval and NLP "Introduction to Information Retrieval"
• Articles on opendatascience
• Course "Algorithms and Data Structures" by Maxim Babenko
