The AI Dictionary
AI terms and jargon simply explained.
I often find explanations online to be more complicated than they need to be. Here, I hope to fix that. New terms will continue to be added over time.
Click terms to view expanded definitions.
Do let me know of any corrections and improvements, and of any terms you would like added!
Activation Function
A function applied after the linear step in a neuron to introduce nonlinearity. Without it, a stack of linear layers would collapse into a single linear function.
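A minimal NumPy sketch of why the nonlinearity matters (the matrices are illustrative): two linear layers with no activation in between are equivalent to one linear layer.

    import numpy as np

    x = np.array([1.0, 2.0])                  # input
    W1 = np.array([[0.5, -0.2], [0.1, 0.4]])  # first linear layer
    W2 = np.array([[1.0, 0.3], [-0.6, 0.2]])  # second linear layer

    # With no activation between them, the two layers collapse into one:
    print(np.allclose(W2 @ (W1 @ x), (W2 @ W1) @ x))  # True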
Bagging
An ensembling technique. When bagging, each model is trained on a random subset of the rows (sampled with replacement) and a random subset of the columns.
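A minimal NumPy sketch of drawing one bagged sample (the data and sizes are illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))  # 100 rows, 5 columns of illustrative data
    n_rows, n_cols = X.shape

    row_idx = rng.integers(0, n_rows, size=n_rows)       # rows, with replacement
    col_idx = rng.choice(n_cols, size=3, replace=False)  # a random subset of columns
    X_bag = X[row_idx][:, col_idx]  # train one model on this; repeat for each model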
Cross Entropy Loss
A loss function for classification models with multiple categories. It penalizes a prediction by the negative log of the probability assigned to the correct category.
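A minimal NumPy sketch for a single prediction (the probabilities are illustrative):

    import numpy as np

    probs = np.array([0.7, 0.2, 0.1])  # predicted probabilities for 3 categories
    target = 0                         # index of the correct category

    loss = -np.log(probs[target])  # ~0.357; a confident, correct prediction gives a low loss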
Dataloader
An object that takes items from a dataset and assembles them into batches. Note that this object does not decide which indices to load, and hence is not a sampler.
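A minimal PyTorch sketch (the dataset is illustrative); with shuffle=True, the DataLoader delegates the index order to a random sampler internally:

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    dataset = TensorDataset(torch.arange(10).float())
    loader = DataLoader(dataset, batch_size=4, shuffle=True)

    for (batch,) in loader:
        print(batch)  # tensors of up to 4 items each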
Decoder (Transformers)
A component of a transformer that is used for generating text. An example is the autocomplete feature on a smartphone’s keyboard.
Dot Product
The operation of multiplying each pair of corresponding elements in two vectors and summing the products. Also known as a linear combination.
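A minimal NumPy sketch (the vectors are illustrative):

    import numpy as np

    a = np.array([1.0, 2.0, 3.0])
    b = np.array([4.0, 5.0, 6.0])

    # Multiply corresponding elements, then sum: 1*4 + 2*5 + 3*6 = 32
    print(np.dot(a, b))  # 32.0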
Embedding
A table, or matrix, where each row represents an item and each column describes the items in some way. The real magic of embeddings happens when you combine two embeddings together in some way to obtain further information.
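A minimal NumPy sketch, assuming a recommendation-style example (all numbers illustrative): combining one row of a user embedding with one row of a movie embedding via a dot product gives a rough affinity score.

    import numpy as np

    user_factors  = np.array([0.9, -0.2, 0.4])  # one user's row in a user embedding
    movie_factors = np.array([0.8,  0.1, 0.3])  # one movie's row in a movie embedding

    # Combining the two embeddings yields further information:
    # a rough score for how much this user may like this movie
    score = np.dot(user_factors, movie_factors)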
Encoder (Transformers)
A component of a transformer that is used for “understanding” text. Encoders are typically used for classifying sentences by sentiment and figuring out which parts of a sentence refer to, for example, a person or location.
Gradient Accumulation
A technique for fitting large models on a not-so-powerful GPU: gradients from several small batches are accumulated before a single parameter update, simulating a larger batch size.
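A minimal PyTorch sketch (the model, data, and step counts are illustrative):

    import torch

    model = torch.nn.Linear(10, 1)
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    accum_steps = 4  # 4 small batches stand in for 1 large batch

    for step in range(8):
        x, y = torch.randn(2, 10), torch.randn(2, 1)  # small batch that fits in memory
        loss = torch.nn.functional.mse_loss(model(x), y)
        (loss / accum_steps).backward()  # gradients accumulate across calls
        if (step + 1) % accum_steps == 0:
            opt.step()       # one update per accumulated group
            opt.zero_grad()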
Gradient Boosting Machine (GBM)
An ensembling technique where, instead of averaging the predictions of all models, each successive model predicts the remaining error (residual) of the models before it. The predicted errors are then summed to obtain the final prediction.
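A minimal scikit-learn sketch using small trees (the data is illustrative):

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.default_rng(0)
    X = rng.uniform(size=(200, 1))
    y = np.sin(6 * X[:, 0])  # illustrative target

    pred = np.zeros_like(y)
    for _ in range(10):
        residual = y - pred  # the error of the ensemble so far
        tree = DecisionTreeRegressor(max_depth=2).fit(X, residual)
        pred += tree.predict(X)  # summing the predicted errors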
Inference
Using a trained model for predictions.
K-Fold Cross Validation
An ensembling technique where each model is trained and validated on a different split of the dataset. For example, with five folds, each model is trained on a different 80% of the data and validated on the remaining 20%.
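A minimal scikit-learn sketch (the dataset is illustrative):

    import numpy as np
    from sklearn.model_selection import KFold

    X = np.arange(10).reshape(10, 1)

    # 5 folds: each model trains on a different 80% and validates on the other 20%
    for train_idx, valid_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
        X_train, X_valid = X[train_idx], X[valid_idx]
        # train one model on X_train, validate it on X_valid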
Learning Rate
A numerical value which controls how much the gradients update the parameters of a model.
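A minimal sketch of the basic gradient descent update (all numbers illustrative):

    weight = 2.0
    gradient = 0.5
    lr = 0.1  # the learning rate

    # A larger learning rate means a bigger step
    weight -= lr * gradient  # 2.0 - 0.1 * 0.5 = 1.95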
Linear Combination
The operation of multiplying each pair of corresponding elements in two vectors and summing the products. Also known as the dot product (see the example under Dot Product).
Named Entity Recognition (NER)
An NLP classification task where a sentence is broken into its components, and the model attempts to assign each component to a specific entity (e.g., person, place, organization).
Neuron
A basic processor of information. It consists of a linear combination followed by an activation function.
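A minimal NumPy sketch of one neuron (the inputs and weights are illustrative):

    import numpy as np

    def relu(z):
        return np.maximum(0, z)  # the activation function

    x = np.array([0.5, -1.0, 2.0])   # inputs
    w = np.array([0.3,  0.8, -0.5])  # weights
    b = 0.1                          # bias

    output = relu(np.dot(w, x) + b)  # linear combination, then activation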
OneR Classifier
The simplest type of decision tree. The tree only contains a single split.
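A minimal scikit-learn sketch: a decision tree capped at depth 1 produces exactly one split (the dataset is a standard illustrative one).

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)

    # max_depth=1 allows only a single split
    stump = DecisionTreeClassifier(max_depth=1).fit(X, y)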
Rectified Linear Unit (ReLU)
An activation function that clips any value less than zero to zero.
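A minimal NumPy sketch:

    import numpy as np

    def relu(x):
        # values below zero are clipped to zero; positive values pass through
        return np.maximum(0, x)

    print(relu(np.array([-2.0, -0.5, 0.0, 1.5])))  # [0.  0.  0.  1.5]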
Sampler
An algorithm that decides which indices of a dataset to load. Note that this algorithm does not load the data itself, and hence is not a dataloader.
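A minimal PyTorch sketch (the dataset is illustrative): the sampler yields only indices, never data.

    import torch
    from torch.utils.data import RandomSampler, TensorDataset

    dataset = TensorDataset(torch.arange(10))
    sampler = RandomSampler(dataset)
    print(list(sampler))  # a shuffled list of indices, e.g. [3, 7, 0, ...]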
Tabular Model
A model trained on tabular data. It is used to predict a specified column in the data.
Tokenization
Splitting a document into smaller pieces (tokens), typically its component words or subwords.
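A minimal sketch of naive word-level tokenization (real tokenizers also handle punctuation and subwords more carefully):

    document = "The quick brown fox."

    tokens = document.lower().replace(".", "").split()
    print(tokens)  # ['the', 'quick', 'brown', 'fox']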
Transformer
The name given to a Natural Language Processing (NLP) architecture that, in a nutshell, either fills in the blanks or autocompletes text. Transformers consist of an encoder, a decoder, or both.
Zero-shot
A prefix indicating that a pretrained model can be used without any finetuning.