Table of Contents
- 1 How would you decide whether you can use a machine learning technique on a dataset or not?
- 2 What is the difference between supervised machine learning and unsupervised machine learning?
- 3 How do you select a suitable machine learning algorithm for a certain application?
- 4 Can you combine supervised and unsupervised learning?
- 5 How do you make sure your model is not overfitting?
- 6 What examples can you find to justify the usage of unsupervised learning?
- 7 Can machine learning be used to detect spam emails?
- 8 What is data ownership in machine learning?
How would you decide whether you can use a machine learning technique on a dataset or not?
Here are some important considerations when choosing an algorithm:
- Size of the training data. More data generally gives more reliable predictions.
- Accuracy and/or interpretability of the output.
- Speed or training time.
- Linearity of the relationship between the features and the target.
- Number of features.
How do you choose between supervised and unsupervised learning?
The main distinction between the two approaches is the use of labeled datasets. Put simply, supervised learning uses labeled input and output data, while unsupervised learning does not. The choice therefore depends largely on whether labeled data is available for your problem.
What is the difference between supervised machine learning and unsupervised machine learning?
In a supervised learning model, the algorithm learns from a labeled dataset, which provides an answer key the algorithm can use to evaluate its accuracy on the training data. An unsupervised model, in contrast, is given unlabeled data that it tries to make sense of by extracting features and patterns on its own.
What is an example of supervised learning?
One practical example of supervised learning problems is predicting house prices. By leveraging data coming from thousands of houses, their features and prices, we can now train a supervised machine learning model to predict a new house’s price based on the examples observed by the model.
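As a minimal sketch of the house price example, the snippet below fits a linear model to a handful of labeled examples and predicts the price of an unseen house. The data, the two features (area and number of rooms), and the helper names `fit_linear_model` and `predict` are made up for illustration; a real project would use far more data and would likely use a dedicated library.

```python
import numpy as np

# Made-up labeled examples: [area in m^2, number of rooms] and the known price.
features = np.array([[50, 2], [80, 3], [120, 4], [65, 2], [100, 3]], dtype=float)
prices = np.array([170.0, 240.0, 330.0, 200.0, 280.0])  # in thousands


def fit_linear_model(X, y):
    """Least-squares fit of y ~ intercept + X @ weights (hypothetical helper)."""
    design = np.column_stack([np.ones(len(X)), X])
    coefficients, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefficients


def predict(coefficients, X):
    """Apply the fitted intercept and weights to new feature rows."""
    design = np.column_stack([np.ones(len(X)), X])
    return design @ coefficients


coefficients = fit_linear_model(features, prices)
new_house = np.array([[90.0, 3.0]])  # a house the model has not seen
predicted_price = predict(coefficients, new_house)[0]
print(f"Predicted price: {predicted_price:.1f} thousand")
```

The prices are the labels: the model only learns because each training house comes with a known answer.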
How do you select a suitable machine learning algorithm for a certain application?
Choosing the right machine learning algorithm can be broken down into the following steps:
- 1- Categorize the problem.
- 2- Understand your data:
  - Analyze the data.
  - Process the data.
  - Transform the data.
- 3- Find the available algorithms.
- 4- Implement machine learning algorithms.
- 5- Optimize hyperparameters.
How will you select a suitable machine learning algorithm for a problem statement?
If it is a regression problem, use linear regression, decision trees, random forest, KNN, etc. If it is a classification problem, use logistic regression, random forest, XGBoost, AdaBoost, SVM, etc. If it is an unsupervised learning problem, use clustering algorithms such as K-means.
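The rule of thumb above can be written down as a small lookup. This is only an illustrative sketch: the function name `suggest_algorithms` and its two parameters are hypothetical, and the candidate lists simply repeat the ones named in the text rather than ranking them.

```python
# Candidate algorithms per problem type, taken from the text above.
CANDIDATES = {
    "regression": ["Linear regression", "Decision trees", "Random forest", "KNN"],
    "classification": ["Logistic regression", "Random forest", "XGBoost", "AdaBoost", "SVM"],
    "clustering": ["K-means"],
}


def suggest_algorithms(has_labels, target_is_numeric=False):
    """Return (problem_type, candidate algorithms) for a problem statement.

    has_labels: True if the dataset contains a target variable.
    target_is_numeric: True if that target is a continuous number.
    """
    if not has_labels:
        problem_type = "clustering"  # no target variable: unsupervised
    elif target_is_numeric:
        problem_type = "regression"
    else:
        problem_type = "classification"
    return problem_type, CANDIDATES[problem_type]


print(suggest_algorithms(has_labels=True, target_is_numeric=True))
print(suggest_algorithms(has_labels=True))
print(suggest_algorithms(has_labels=False))
```

The lookup only narrows the field; the candidates still need to be compared on the actual data.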
Can you combine supervised and unsupervised learning?
In a strict definitional sense, there is no such thing as “mixing unsupervised learning and supervised learning”, since any problem for which you have target variables is by definition supervised learning, and a problem without target variables is unsupervised learning. In practice, however, the two are combined in semi-supervised learning, where a small labeled dataset is used together with a larger unlabeled one.
What are some examples of unsupervised learning?
Below is a list of some popular unsupervised learning algorithms and techniques:
- K-means clustering.
- KNN (k-nearest neighbors), when used for unsupervised neighbor search rather than as a classifier or regressor.
- Hierarchical clustering.
- Anomaly detection.
- Neural networks (for example, autoencoders).
- Principal Component Analysis.
- Independent Component Analysis.
- Apriori algorithm.
How do you make sure your model is not overfitting?
Common ways to reduce the risk of overfitting a machine learning model:
- 1- Keep the model simpler: use fewer variables and parameters, so that the model fits less of the noise in the training data.
- 2- Use cross-validation techniques such as k-fold cross-validation.
- 3- Use regularization techniques such as LASSO.
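To make the cross-validation point concrete, here is a minimal sketch of k-fold cross-validation written with NumPy only. The synthetic data (a noisy straight line), the polynomial models, and the helper names `k_fold_indices` and `cv_mse` are assumptions made for illustration. The idea is to compare models of different complexity by their error on held-out folds rather than on the data they were trained on.

```python
import numpy as np


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs that split shuffled indices into k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx


def cv_mse(x, y, degree, k=5):
    """Mean held-out squared error of a polynomial fit of the given degree."""
    errors = []
    for train_idx, test_idx in k_fold_indices(len(x), k):
        coeffs = np.polyfit(x[train_idx], y[train_idx], degree)
        pred = np.polyval(coeffs, x[test_idx])
        errors.append(np.mean((pred - y[test_idx]) ** 2))
    return float(np.mean(errors))


# Synthetic data: a straight line plus noise (made up for this example).
rng = np.random.default_rng(42)
x = np.linspace(-1.0, 1.0, 30)
y = 3.0 * x + rng.normal(scale=0.3, size=x.size)

# Compare a simple model with more flexible ones on held-out data.
for degree in (1, 3, 8):
    print(f"degree={degree}: cross-validated MSE = {cv_mse(x, y, degree):.3f}")
```

A model whose held-out error is clearly worse than that of a simpler alternative is a candidate for overfitting.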
Is K-means supervised or unsupervised?
K-means clustering is an unsupervised learning algorithm: unlike supervised learning, it uses no labeled data. K-means divides objects into clusters so that objects within a cluster are similar to each other and dissimilar to objects in other clusters.
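The following is a minimal sketch of the K-means loop (assign each point to its nearest center, then move each center to the mean of its points), using NumPy. The toy points and the function name `k_means` are made up for illustration, and the initial centers are simply picked at evenly spaced positions in the data to keep the example deterministic; library implementations typically use smarter initialization.

```python
import numpy as np


def k_means(points, k, n_iter=20):
    """Cluster points into k groups; returns (labels, centers)."""
    n = len(points)
    # Simplifying assumption: take evenly spaced data points as initial centers.
    centers = points[np.linspace(0, n - 1, k).astype(int)].copy()
    for _ in range(n_iter):
        # Distance from every point to every center, shape (n, k).
        distances = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break  # assignments have stabilized
        centers = new_centers
    return labels, centers


# Two well-separated groups of unlabeled 2-D points (toy data).
points = np.array([
    [1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
    [8.0, 8.0], [8.2, 7.9], [7.8, 8.1],
])
labels, centers = k_means(points, k=2)
print(labels)
print(centers)
```

No labels are passed in at any point; the grouping comes only from the distances between points.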
What examples can you find to justify the usage of unsupervised learning?
Some use cases for unsupervised learning — more specifically, clustering — include:
- Customer segmentation, or understanding different customer groups around which to build marketing or other business strategies.
- Genetics, for example clustering DNA patterns to analyze evolutionary biology.
What is the supervised learning technique?
Supervised learning, also known as supervised machine learning, is a subcategory of machine learning and artificial intelligence. It is defined by its use of labeled datasets to train algorithms that classify data or predict outcomes accurately.
Can machine learning be used to detect spam emails?
Yes. Machine learning methods have recently been used to detect and filter spam emails successfully. A systematic review of popular machine-learning-based spam filtering approaches covers the important concepts, attempts, efficiency, and research trends in spam filtering.
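As one illustration of how such a filter can work, here is a minimal sketch of a naive Bayes text classifier in plain Python. Naive Bayes is a common choice for spam filtering, but it is chosen here as an assumption, not because the text prescribes it. The six training messages and the function names `train_naive_bayes` and `classify` are made up for this example.

```python
import math
from collections import Counter


def train_naive_bayes(examples):
    """Count words per label from a list of (text, label) pairs."""
    word_counts = {}
    label_counts = Counter()
    vocabulary = set()
    for text, label in examples:
        tokens = text.lower().split()
        word_counts.setdefault(label, Counter()).update(tokens)
        label_counts[label] += 1
        vocabulary.update(tokens)
    return word_counts, label_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log-probability for the text."""
    word_counts, label_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        score = math.log(label_counts[label] / total_docs)  # class prior
        total_words = sum(counts.values())
        for token in text.lower().split():
            # Add-one (Laplace) smoothing so unseen words do not zero the score.
            score += math.log((counts[token] + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Tiny made-up training set.
training = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("cheap money offer", "spam"),
    ("meeting agenda for monday", "ham"),
    ("project report attached", "ham"),
    ("lunch on monday", "ham"),
]
model = train_naive_bayes(training)
print(classify("claim your free prize", model))
print(classify("agenda for the monday meeting", model))
```

This is a supervised approach: it depends on messages that have already been labeled as spam or not spam.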
How can machine learning be used for fraud detection?
Machine learning can be used to build fraud detection algorithms that help solve these real-world problems. One example is email phishing: a fraud or cybercrime in which attackers send fake sites and messages to users via email.
What is data ownership in machine learning?
The concept of ownership breaks down with ML datasets that are an aggregate of data from many users. Essentially, data engineers need to be granted view-access to an entire set of data in order to effectively use the dataset.
Is it possible to classify emails based on their message body?
In this case I wanted to classify emails based on their message body, definitely an unsupervised machine learning task. Instead of loading all 500k+ emails, I chunked the dataset into a couple of files with 10k emails each. Trust me, you don’t want to load the full Enron dataset into memory and run complex computations on it.
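The chunking idea can be sketched with a small generator that yields fixed-size batches, so only one batch is held in memory at a time. The helper name `chunked` and the stand-in list of email identifiers are hypothetical; how the original author actually split the files is not shown in the text.

```python
from itertools import islice


def chunked(items, chunk_size):
    """Yield successive lists of at most chunk_size items from any iterable."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return  # iterable exhausted
        yield chunk


# Stand-in for a large collection of emails (here just 25 placeholder IDs).
emails = [f"email-{i}" for i in range(25)]

for number, batch in enumerate(chunked(emails, 10)):
    # Each batch could be written to its own file or processed separately.
    print(f"chunk {number}: {len(batch)} emails")
```

Because `chunked` accepts any iterable, the same pattern works when emails are read lazily from disk instead of from a list.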