Conv layers without Batch Normalization

We suggest designing the experiment as follows:
Step 1. Define a shallow VGG-like network (refer to https://github.com/NervanaSystems/distiller/blob/master/distiller/models/cifar10/vgg_cifar.py), say a VGG with 3 or 4 Conv layers and without Batch Normalization. A shallow network is easier to debug.
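
To make Step 1 concrete, a minimal PyTorch sketch of such a network follows; the layer widths and classifier size are illustrative assumptions, not taken from the Distiller model file:

    import torch
    import torch.nn as nn

    class ShallowVGG(nn.Module):
        """A 4-Conv VGG-style network for 32x32 CIFAR-10 inputs, no BatchNorm."""
        def __init__(self, num_classes=10):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.ReLU(inplace=True),
                nn.MaxPool2d(2),                                  # 32x32 -> 16x16
                nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(inplace=True),
                nn.MaxPool2d(2),                                  # 16x16 -> 8x8
                nn.Conv2d(128, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(256, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
                nn.MaxPool2d(2),                                  # 8x8 -> 4x4
            )
            self.classifier = nn.Linear(256 * 4 * 4, num_classes)

        def forward(self, x):
            return self.classifier(self.features(x).flatten(1))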
Step 2. Train this shallow VGG baseline network on CIFAR-10 and record its Top-1 accuracy; refer to https://github.com/NervanaSystems/distiller/tree/master/examples/baseline_networks
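
A bare-bones training and evaluation loop for Step 2 might look like the sketch below; it reuses the ShallowVGG class from the Step 1 sketch, and the hyperparameters are placeholders (the Distiller baseline examples drive this through their compress_classifier.py script instead):

    import torch
    import torch.nn.functional as F
    from torch.utils.data import DataLoader
    from torchvision import datasets, transforms

    def top1_accuracy(model, loader, device):
        """Fraction of test samples whose argmax prediction is correct."""
        model.eval()
        correct = total = 0
        with torch.no_grad():
            for x, y in loader:
                x, y = x.to(device), y.to(device)
                correct += (model(x).argmax(1) == y).sum().item()
                total += y.numel()
        return correct / total

    device = "cuda" if torch.cuda.is_available() else "cpu"
    tfm = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
    ])
    train_loader = DataLoader(datasets.CIFAR10("data", train=True, download=True, transform=tfm),
                              batch_size=128, shuffle=True)
    test_loader = DataLoader(datasets.CIFAR10("data", train=False, download=True, transform=tfm),
                             batch_size=256)

    model = ShallowVGG().to(device)
    opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)
    for epoch in range(60):                      # epoch count is an arbitrary placeholder
        model.train()
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)
            opt.zero_grad()
            F.cross_entropy(model(x), y).backward()
            opt.step()
    print("baseline Top-1:", top1_accuracy(model, test_loader, device))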
Step 3. Run the Automated Gradual Pruner (AGP) (https://github.com/NervanaSystems/distiller/tree/master/examples/agp-pruning) on this shallow VGG to prune the network weights during retraining, and record the Top-1 accuracy.
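
For orientation, Distiller's AGP implements the cubic sparsity schedule from Zhu & Gupta's gradual pruning paper, configured through a YAML schedule file; the schedule itself is simple enough to sketch directly:

    def agp_target_sparsity(step, s_init, s_final, start_step, end_step):
        """Cubic AGP schedule: target sparsity ramps from s_init to s_final
        between start_step and end_step, then stays at s_final."""
        if step < start_step:
            return s_init
        if step >= end_step:
            return s_final
        progress = (step - start_step) / (end_step - start_step)
        return s_final + (s_init - s_final) * (1.0 - progress) ** 3

    # e.g. ramping from 0% to 80% sparsity over 100 steps:
    print(agp_target_sparsity(50, 0.0, 0.8, 0, 100))   # -> 0.70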
Steps 1–3 establish the baseline for network pruning in the spatial domain.
Next, extend to pruning in the frequency domain:
Step 4. Referring to Figure 1 of the frequency-domain pruning paper, transform the shallow VGG network pre-trained in Step 2 to the frequency domain and test the Top-1 accuracy. It should match the accuracy reported in Step 2.
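
Assuming the paper's transform is an orthonormal 2-D DCT applied to each spatial kernel (an assumption here; substitute whatever transform Figure 1 actually specifies), a round-trip sanity check for Step 4 could look like the following, applied to the model trained in the Step 2 sketch:

    import torch
    import torch.nn as nn
    from scipy.fft import dctn, idctn

    def kernel_to_freq(weight):
        """2-D orthonormal DCT over the spatial dims of a Conv weight
        of shape (out_ch, in_ch, kH, kW); returns same-shape coefficients."""
        return torch.from_numpy(dctn(weight.detach().cpu().numpy(),
                                     axes=(-2, -1), norm="ortho"))

    def freq_to_kernel(coeffs):
        """Inverse of kernel_to_freq."""
        return torch.from_numpy(idctn(coeffs.numpy(), axes=(-2, -1), norm="ortho"))

    # The transform is orthonormal, so a round trip with no pruning must
    # reproduce the spatial weights, and hence the Step 2 accuracy.
    for name, module in model.named_modules():
        if isinstance(module, nn.Conv2d):
            restored = freq_to_kernel(kernel_to_freq(module.weight))
            assert torch.allclose(module.weight.cpu(), restored, atol=1e-5), name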
Step 5. Referring to Section 2.2 of the paper, prune the frequency-domain coefficients generated in Step 4 with different manually chosen thresholds, then test the Top-1 accuracy of the pruned network and observe how the performance changes. For debugging purposes, you can prune the Conv layers one at a time.
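
A sketch of the threshold sweep in Step 5, reusing the helpers above and zeroing small-magnitude coefficients one Conv layer at a time (the threshold value is an arbitrary placeholder; Section 2.2 of the paper defines the actual pruning criterion):

    def prune_freq_coeffs(module, threshold):
        """Zero DCT coefficients with |coeff| < threshold, write the pruned
        kernels back, and return the fraction of coefficients removed."""
        coeffs = kernel_to_freq(module.weight)
        mask = coeffs.abs() >= threshold
        with torch.no_grad():
            module.weight.copy_(freq_to_kernel(coeffs * mask).to(module.weight.device))
        return 1.0 - mask.float().mean().item()

    # Pruning accumulates across layers here; reload the Step 2 checkpoint
    # between layers to measure each layer's sensitivity in isolation.
    for name, module in model.named_modules():
        if isinstance(module, nn.Conv2d):
            sparsity = prune_freq_coeffs(module, threshold=0.05)
            acc = top1_accuracy(model, test_loader, device)
            print(f"{name}: sparsity={sparsity:.2%}, Top-1={acc:.2%}")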
You can compare the compression ratio and accuracy between Step 5 and Step 3; the comparison can verify, to some extent, whether Steps 4 and 5 are implemented correctly.
The final step is to integrate frequency-domain pruning with AGP training.
Step 6. Implement the update rule described in Section 2.4 of the paper on top of the AGP training code from Step 3, and perform frequency-domain pruning during training.
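
One simplified way to wire this together, continuing the earlier sketches, is to run a dense optimizer step and then re-apply frequency-domain magnitude pruning at the AGP-scheduled sparsity. Note this is a stand-in for the paper's Section 2.4 update rule, not a reproduction of it; because the mask is recomputed each step from dense updates, pruned coefficients may regrow, in the spirit of dynamic pruning:

    def prune_to_sparsity(module, sparsity):
        """Zero the smallest-magnitude fraction `sparsity` of the layer's
        DCT coefficients and write the result back to the spatial weights."""
        coeffs = kernel_to_freq(module.weight)
        k = int(sparsity * coeffs.numel())
        if k > 0:
            threshold = coeffs.abs().flatten().kthvalue(k).values
            coeffs = coeffs * (coeffs.abs() > threshold)
        with torch.no_grad():
            module.weight.copy_(freq_to_kernel(coeffs).to(module.weight.device))

    step = 0
    for epoch in range(30):                      # placeholder retraining budget
        model.train()
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)
            opt.zero_grad()
            F.cross_entropy(model(x), y).backward()
            opt.step()                           # dense update of all weights
            s = agp_target_sparsity(step, 0.0, 0.8, 0, 20000)
            for m in model.modules():
                if isinstance(m, nn.Conv2d):
                    prune_to_sparsity(m, s)      # re-mask in the frequency domain
            step += 1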
