Augmentation is a data preprocessing step that increases the amount of training data you have available by creating slightly modified copies of examples already in your dataset.
Augmentation is typically used in supervised learning, where every training example must be labeled. Because each generated example inherits the label of the base example it was derived from, no extra labeling is needed, saving significant time and money.
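As a rough illustration of label transfer, here is a minimal sketch using a horizontal flip. The image is a plain list of pixel rows, and the `hflip_image`, `hflip_box`, and `augment_example` helpers are hypothetical names made up for this example. It assumes bounding boxes are stored as `(x_min, y_min, x_max, y_max)` in pixel-edge coordinates. The class label carries over unchanged, while the box has to be mirrored along with the pixels.

```python
def hflip_image(image):
    """Mirror a 2D image (a list of pixel rows) left to right."""
    return [row[::-1] for row in image]


def hflip_box(box, image_width):
    """Mirror an (x_min, y_min, x_max, y_max) box horizontally."""
    x_min, y_min, x_max, y_max = box
    return (image_width - x_max, y_min, image_width - x_min, y_max)


def augment_example(image, label, box):
    """Create a new labeled example from an existing one.

    The class label is reused as-is; only the box coordinates change.
    """
    width = len(image[0])
    return hflip_image(image), label, hflip_box(box, width)


# A tiny 4x3 "image" with a bright object in the top-right corner.
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 0, 0],
]
label = "cat"
box = (2, 0, 4, 2)  # covers the bright pixels

new_image, new_label, new_box = augment_example(image, label, box)
print(new_label, new_box)
```

One labeled example has become two without anyone drawing a new box by hand.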
Augmentation is especially useful in computer vision, where simple transformations can simulate a variety of real-world conditions from a small amount of data.
Common image augmentations include:
(Screenshots courtesy of Roboflow, which makes augmentation effortless!)
Flip
Rotate
Crop/Zoom
Shear
Hue
Saturation
Brightness
Greyscale
Cutout/Occlusion
Blur
Noise
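To make a few of these concrete, here is a minimal sketch of flip, brightness, cutout, and noise on a small greyscale image stored as a list of rows with pixel values from 0 to 255. The function names and parameters are assumptions made for illustration; in practice you would likely reach for an image library rather than nested lists.

```python
import random


def flip_horizontal(image):
    """Mirror each row left to right."""
    return [row[::-1] for row in image]


def adjust_brightness(image, factor):
    """Scale every pixel by `factor`, clamping to the 0..255 range."""
    return [[max(0, min(255, round(p * factor))) for p in row] for row in image]


def cutout(image, top, left, size, fill=0):
    """Overwrite a square patch with `fill` to simulate occlusion."""
    out = [row[:] for row in image]  # copy so the input stays untouched
    for y in range(top, min(top + size, len(out))):
        for x in range(left, min(left + size, len(out[0]))):
            out[y][x] = fill
    return out


def add_noise(image, sigma, rng):
    """Add Gaussian noise to every pixel, clamping to 0..255."""
    return [
        [max(0, min(255, round(p + rng.gauss(0, sigma)))) for p in row]
        for row in image
    ]


# A 4x4 gradient image used as a stand-in for real data.
image = [[10 * (x + y) for x in range(4)] for y in range(4)]
rng = random.Random(0)  # seeded so the noise is repeatable

augmented = {
    "flip": flip_horizontal(image),
    "brightness": adjust_brightness(image, 1.5),
    "cutout": cutout(image, top=1, left=1, size=2),
    "noise": add_noise(image, sigma=5.0, rng=rng),
}

for name, result in augmented.items():
    print(name, result[0])
```

Each function returns a new image and leaves the original alone, so the same base example can feed several augmentations.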
Experimentation is required to determine the “best” augmentations for each specific problem, especially as over-augmentation can decrease model performance.
Ideally, this is done through an “ablation study”, where augmentations are tested one at a time to isolate the performance impacts and determine the optimal combination.
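The loop behind a one-at-a-time ablation can be sketched as follows. `ablation_study` is a hypothetical helper, and `toy_train_and_evaluate` is only a stand-in for a real training run: the scores it returns are made-up numbers chosen to show the bookkeeping, not measured results.

```python
def ablation_study(augmentations, train_and_evaluate):
    """Compare each augmentation, enabled on its own, against a baseline.

    `train_and_evaluate` takes a list of enabled augmentation names and
    returns a validation score where higher is better.
    """
    baseline = train_and_evaluate([])
    deltas = {}
    for name in augmentations:
        deltas[name] = train_and_evaluate([name]) - baseline
    return baseline, deltas


def toy_train_and_evaluate(enabled):
    """Stand-in for real training. The numbers below are invented."""
    made_up_effect = {"flip": 3, "blur": -1, "noise": 0}
    # Pretend score: correct validation predictions out of 100.
    return 80 + sum(made_up_effect[name] for name in enabled)


baseline, deltas = ablation_study(["flip", "blur", "noise"], toy_train_and_evaluate)
helpful = [name for name, delta in deltas.items() if delta > 0]

print("baseline:", baseline)
print("deltas:", deltas)
print("worth keeping:", helpful)
```

Testing augmentations in isolation shows which ones help, but their effects may not simply add up, so the final combination is still worth validating as a whole.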
More recently, augmentation has been taken a step further through synthetic data generation, where 3D models are created and then simulated in a number of environments.
Similar techniques can be applied to other domains, such as natural language processing (NLP), to create robust training sets from limited data.
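For text, a minimal sketch might look like this. The specific operations, random word swap and random word deletion, are assumptions chosen for illustration, and the helper names are hypothetical. As with images, every generated sentence keeps the label of the sentence it came from.

```python
import random


def random_swap(words, rng):
    """Return a copy of `words` with two random positions swapped."""
    out = list(words)
    if len(out) >= 2:
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def random_deletion(words, p, rng):
    """Drop each word with probability `p`, always keeping at least one."""
    if not words:
        return []
    kept = [w for w in words if rng.random() >= p]
    return kept if kept else [rng.choice(words)]


def augment_text(text, label, n_copies, rng):
    """Generate `n_copies` perturbed variants that reuse the original label."""
    words = text.split()
    examples = []
    for _ in range(n_copies):
        new_words = random_deletion(random_swap(words, rng), 0.1, rng)
        examples.append((" ".join(new_words), label))
    return examples


rng = random.Random(42)  # seeded so runs are repeatable
original = "the battery life on this phone is great"
examples = augment_text(original, "positive", n_copies=3, rng=rng)

for text, label in examples:
    print(label, "|", text)
```

Perturbations this blunt can change the meaning of a sentence, so it is worth spot-checking the output, for the same reason over-augmentation can hurt image models.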