#77 - Vitaliy Chiley (Cerebras)

16 June 2022

68 min

Vitaliy Chiley is a Machine Learning Research Engineer at Cerebras Systems, a next-generation computing hardware company. We spoke about how deep learning workloads, including sparse workloads, can run faster on Cerebras hardware.

[00:00:00] Housekeeping

[00:01:08] Preamble

[00:01:50] Vitaliy Chiley Introduction

[00:03:11] Cerebras architecture

[00:08:12] Memory management and FLOP utilisation

[00:18:01] Centralised vs decentralised compute architecture

[00:21:12] Sparsity

[00:23:47] Does Sparse NN imply Heterogeneous compute?

[00:29:21] Cost of distributed memory stores?

[00:31:01] Activation vs weight sparsity

[00:37:52] What constitutes a dead weight to be pruned?

[00:39:02] Is it still a saving if we have to choose between weight and activation sparsity?

[00:41:02] Cerebras is a cool place to work

[00:44:05] What is sparsity? Why do we need to start dense?

[00:46:36] Evolutionary algorithms on Cerebras?

[00:47:57] How can we start sparse? Google RigL

[00:51:44] Inductive priors, why do we need them if we can start sparse?

[00:56:02] Why anthropomorphise inductive priors?

[01:02:13] Could Cerebras run a cyclic computational graph?

[01:03:16] Are NNs locality sensitive hashing tables?

References:

Rigging the Lottery: Making All Tickets Winners [RigL]

https://arxiv.org/pdf/1911.11134.pdf

[D] DanNet, the CUDA CNN of Dan Ciresan in Jurgen Schmidhuber's team, won 4 image recognition challenges prior to AlexNet

https://www.reddit.com/r/MachineLearning/comments/dwnuwh/d_dannet_the_cuda_cnn_of_dan_ciresan_in_jurgen/

A Spline Theory of Deep Learning [Balestriero]

https://proceedings.mlr.press/v80/balestriero18b.html

Machine Learning Street Talk (MLST)