Global testing

Jeffrey Wigger requested to merge wigger/decentralizepy:globalTesting into main
  • Adds the option to evaluate on the test set using the 'allreduced' weights. The old behaviour is still available by setting --centralized_test_eval to 0. Evaluation on the training set with the 'allreduced' weights can be enabled by setting --centralized_train_eval to 1; this is not recommended because it takes significantly longer.
  • The final weights of each node are now stored in a folder called 'weights' inside the log directory.
  • The plotting functions are updated to handle the new log files.
  • After 50 global epochs, testing happens only every second global epoch.
  • Includes fixes for Shakespeare. The code for the test set was mistakenly used for the training set. The training set has been further reduced to 97545 samples.
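
For reviewers, the evaluation schedule and the 'allreduced' evaluation described above can be pictured roughly as follows. This is only an illustrative sketch, not the code in this merge request: the names `should_test` and `average_weights` are hypothetical, it assumes that 'allreduced' means an element-wise average of the node weights, and it assumes that epoch 50 itself is still tested and that even-numbered epochs are the ones kept afterwards.

```python
def should_test(global_epoch, threshold=50):
    """Decide whether testing runs in this global epoch.

    Assumption: epochs up to and including `threshold` are always tested;
    afterwards only even-numbered epochs are tested.
    """
    if global_epoch <= threshold:
        return True
    return global_epoch % 2 == 0


def average_weights(node_weights):
    """Element-wise mean of per-node weight vectors.

    Assumption: the 'allreduced' weights are a plain average over nodes.
    `node_weights` is a list with one flat list of floats per node.
    """
    n = len(node_weights)
    return [sum(values) / n for values in zip(*node_weights)]


if __name__ == "__main__":
    # Two hypothetical nodes, each holding a two-parameter model.
    averaged = average_weights([[1.0, 2.0], [3.0, 4.0]])
    tested = [e for e in range(48, 56) if should_test(e)]
    print(averaged, tested)
```

With centralized evaluation, the test set would then be evaluated once on the averaged weights instead of once per node.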

