The concepts in this task provide a high-level overview of machine learning. Machine learning is a subset of artificial intelligence that focuses on building machines that simulate human intelligence by learning from data.
Advances in technology have made data easier to collect and share. The type of data available determines the kind of training a machine learning model can undergo. The two main types of training are supervised and unsupervised learning. In supervised learning, the dataset contains the output you are trying to predict, known as the label. When the label is not present in the dataset, the training is called unsupervised learning.
Note that the Evispot ML platform creates supervised models. In supervised machine learning, the goal is to use the variables to predict the label for never-before-seen data. For example, the label could be whether or not a person paid back a loan, and the variables could be income, age, and occupation.
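To make the distinction between variables and label concrete, here is a minimal sketch in Python. The records, the column name `paid_back`, and the helper `split_variables_and_label` are hypothetical and only illustrate the loan example; they are not part of the Evispot ML platform.

```python
# Hypothetical loan records: three variables plus the label we want to predict.
records = [
    {"income": 52000, "age": 34, "occupation": "nurse", "paid_back": True},
    {"income": 18000, "age": 22, "occupation": "student", "paid_back": False},
    {"income": 75000, "age": 45, "occupation": "engineer", "paid_back": True},
]

LABEL = "paid_back"  # assumed name for the label column


def split_variables_and_label(rows, label):
    """Separate each row into its input variables and its label."""
    variables = [{k: v for k, v in row.items() if k != label} for row in rows]
    labels = [row[label] for row in rows]
    return variables, labels


variables, labels = split_variables_and_label(records, LABEL)
print(variables[0])
print(labels)
```

A supervised model learns the mapping from `variables` to `labels`. In an unsupervised setting only `variables` would be available.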
A machine learning model is only as good as the data used to train it. If you train your model on garbage data, you will get a garbage model. Before uploading a dataset into a tool that helps you build a machine learning model, such as the Evispot ML platform, ensure that the dataset has been cleaned and prepared for training.
Data preparation can include extracting, parsing, joining, standardizing, augmenting, and cleansing data.
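As a rough illustration of a few of these steps (parsing, standardizing, and cleansing), the sketch below cleans a handful of raw string records. The field names and the rule of dropping incomplete rows are assumptions made for this example; real preparation depends on your own dataset.

```python
def clean_rows(raw_rows):
    """Parse, standardize, and cleanse raw string records."""
    cleaned = []
    for row in raw_rows:
        # Parsing: strip whitespace and thousands separators from numbers.
        income = row.get("income", "").replace(",", "").strip()
        age = row.get("age", "").strip()
        # Standardizing: use one consistent spelling for categories.
        occupation = row.get("occupation", "").strip().lower()
        # Cleansing: drop rows with missing or non-numeric values.
        if not income.isdigit() or not age.isdigit() or not occupation:
            continue
        cleaned.append(
            {"income": int(income), "age": int(age), "occupation": occupation}
        )
    return cleaned


raw = [
    {"income": "52,000", "age": " 34", "occupation": "Nurse "},
    {"income": "", "age": "22", "occupation": "student"},
    {"income": "75000", "age": "n/a", "occupation": "Engineer"},
]
prepared = clean_rows(raw)
print(prepared)
```

Only the first record survives here, because the other two have a missing income and a non-numeric age. Dropping rows is the simplest choice; filling in missing values is another option that this sketch does not cover.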
Variable engineering, or data transformation, is the process of creating new variables from existing ones. One common transformation is to review the variables and identify which of them can be combined into new ones that improve the model’s performance. Variable engineering is very time-consuming because of its repetitive nature.
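A small sketch of what combining variables can look like is shown below. The `loan_amount` column, the `loan_to_income` ratio, and the age cut-off of 25 are hypothetical choices for illustration, not variables that the Evispot ML platform is known to generate.

```python
def add_engineered_variables(row):
    """Return a copy of the row with two derived variables added."""
    enriched = dict(row)
    # Combine two existing variables into one ratio (assumes income is non-zero).
    enriched["loan_to_income"] = row["loan_amount"] / row["income"]
    # Turn a numeric variable into a simple flag; the cut-off is an assumption.
    enriched["is_young"] = row["age"] < 25
    return enriched


applicant = {"income": 50000, "age": 22, "loan_amount": 10000}
enriched = add_engineered_variables(applicant)
print(enriched)
```

The original row is left unchanged, so different candidate variables can be tried on the same data without side effects.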
Once the data has been transformed, or you are reasonably satisfied with the transformations, the next step in creating a model is selecting an algorithm.
In supervised machine learning, there are many algorithms to choose from for training. The right algorithm or algorithms depend on the size and structure of your dataset and on the type of problem you are trying to solve. The best-performing algorithms for your dataset can be found through trial and error.
These algorithms include gradient boosted trees, regression trees, and random forests, to name a few.
When training a machine learning model, it is good practice to split your dataset into three subsets: training, validation, and test sets. A common ratio is 60-20-20: 60% of the dataset for training, 20% for validation, and the remaining 20% for testing. The training set is the data used to train the model, and it needs to be large enough to produce meaningful results. The validation set is data held back from training and used to evaluate the trained model and tune its hyperparameters, which in turn improves its performance.
Finally, the test set is data that has also been held back and is used to confirm the final model’s results. The Evispot ML platform performs this split automatically, but you can also change the percentages or upload a separate test set for evaluation and model comparison.
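The 60-20-20 split described above can be sketched with the Python standard library. The function name `split_dataset`, its parameters, and the fixed seed are assumptions for this example; this is not how the Evispot ML platform implements its split.

```python
import random


def split_dataset(rows, train_frac=0.6, validation_frac=0.2, seed=42):
    """Shuffle the rows and split them into training, validation, and test sets."""
    shuffled = list(rows)
    # A fixed seed makes the shuffle, and therefore the split, reproducible.
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_validation = round(len(shuffled) * validation_frac)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_validation]
    # Whatever remains after training and validation becomes the test set.
    test = shuffled[n_train + n_validation:]
    return train, validation, test


train, validation, test = split_dataset(range(100))
print(len(train), len(validation), len(test))  # 60 20 20
```

Shuffling before splitting avoids a biased split when the rows are stored in a meaningful order, for example sorted by date. Every row ends up in exactly one of the three sets.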
One of the significant challenges in developing a single production-ready model is that it can take weeks or months to build. Developing a model involves variable engineering, model building, and model deployment.
All of these tasks are repetitive and time-consuming, and they require advanced knowledge of variable generation, algorithms, parameters, and model deployment. Finally, you need in-depth knowledge of, and confidence in, how the model was generated in order to justify how it made its decisions.
AutoML, or Automated Machine Learning, is the process of automating algorithm selection, variable generation, hyperparameter tuning, iterative modeling, and model assessment. AutoML tools such as the Evispot ML platform make it easy to train and evaluate machine learning models. Automating the repetitive tasks around machine learning development allows individuals to focus on the data and the business problems they are trying to solve.
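To give a feel for one of the steps that AutoML automates, the sketch below tunes a single hyperparameter by trying several values and keeping the one that scores best on the validation set. It uses a tiny k-nearest-neighbours classifier as a stand-in because it fits in a few lines; the toy data, the function names, and the choice of algorithm are all assumptions and do not describe how the Evispot ML platform works internally.

```python
def knn_predict(train_rows, x, k):
    """Predict a label by majority vote among the k nearest training points."""
    nearest = sorted(train_rows, key=lambda row: abs(row[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > len(nearest) else 0


def accuracy(train_rows, eval_rows, k):
    """Fraction of evaluation rows whose label is predicted correctly."""
    correct = sum(knn_predict(train_rows, x, k) == label for x, label in eval_rows)
    return correct / len(eval_rows)


def tune_k(train_rows, validation_rows, candidate_ks):
    """Try every candidate k and keep the one with the best validation accuracy."""
    scores = {k: accuracy(train_rows, validation_rows, k) for k in candidate_ks}
    best_k = max(scores, key=scores.get)  # the first best candidate wins ties
    return best_k, scores


# Hypothetical toy data: (income in thousands, 1 = paid back, 0 = did not).
train_rows = [(15, 0), (20, 0), (25, 0), (30, 1), (40, 1), (55, 1), (60, 1), (22, 1)]
validation_rows = [(18, 0), (23, 0), (35, 1), (50, 1)]

best_k, scores = tune_k(train_rows, validation_rows, candidate_ks=[1, 3, 5])
print(best_k, scores)
```

With this toy data, `k=1` is misled by the single unusual training point at 22, while `k=3` and `k=5` classify every validation row correctly, so the search returns 3. An AutoML tool runs the same kind of loop over many algorithms and many hyperparameters at once, and then confirms the winner on the held-back test set.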
With this in mind, let’s load and explore the data that we will use to predict whether a person will pay back their loan.
© Evispot 2022 All rights reserved.