decentralizepy merge requests
https://gitlab.epfl.ch/sacs/decentralizepy/-/merge_requests

Refactor and add federated + parameter server + central peer sampling
https://gitlab.epfl.ch/sacs/decentralizepy/-/merge_requests/15 · 2022-10-18T09:39:51Z · Rishi Sharma

This PR merges the refactoring, removes some unnecessary code, and adds Elisabeth's contributions.

Only start star topology when needed
https://gitlab.epfl.ch/sacs/decentralizepy/-/merge_requests/13 · 2022-05-16T22:10:00Z · Jeffrey Wigger

Connections for centralized training are only created when the cte flag is set.

Fixes to the previous PR
https://gitlab.epfl.ch/sacs/decentralizepy/-/merge_requests/12 · 2022-05-10T12:02:06Z · Jeffrey Wigger

Fixes subsampling and sharing with compression.
Open issue:
~~Only the metadata for the indices is counted towards the meta_data size; other metadata, such as the seed for subsampling, is ignored.~~
Now all metadata is counted, including the overhead of the dictionary as well as the `degree` and `iteration` added in `step`.

Compression
https://gitlab.epfl.ch/sacs/decentralizepy/-/merge_requests/11 · 2022-05-09T12:42:27Z · Jeffrey Wigger

global epoch plotting and an option for non centralized plotting
https://gitlab.epfl.ch/sacs/decentralizepy/-/merge_requests/10 · 2022-05-05T15:35:02Z · Jeffrey Wigger

- Adds the global epoch plotting script (it also creates a file that lists the max accuracies and min losses for each tested method)
- Adds an option to plot.py to switch between centralized and non-centralized testing

Global testing
https://gitlab.epfl.ch/sacs/decentralizepy/-/merge_requests/9 · 2022-05-03T11:57:12Z · Jeffrey Wigger

- Adds the option to run the test-set evaluation on the 'allreduced' weights. The old behaviour is still available by setting --centralized_test_eval to 0. It is also possible to evaluate the train set on the 'allreduced' weights by setting --centralized_train_eval to 1; however, this is not recommended since it takes significantly longer.
- The final weights of each node are now stored in a folder called 'weights' inside the log directory.
- The plotting functions are updated to handle the new log files.
- After 50 global epochs, testing happens only during every second global epoch.
- Includes fixes for Shakespeare: the code for the test set was wrongly used for the train set. The size of the train set has been further reduced to 97545 samples.

updated configs and run files
https://gitlab.epfl.ch/sacs/decentralizepy/-/merge_requests/8 · 2022-04-28T20:24:51Z · Jeffrey Wigger

Changes the config files to use SGD instead of Adam. Adds a script that runs the tests 5 times with different seeds. Adds a 200-second sleep between rounds.

Shared parameter counter
https://gitlab.epfl.ch/sacs/decentralizepy/-/merge_requests/7 · 2022-03-30T18:40:56Z · Jeffrey Wigger

Adds a vector that counts, for every parameter, how many times it has been shared. It also includes a new plotting module that visualizes this data.

gridsearch fix
https://gitlab.epfl.ch/sacs/decentralizepy/-/merge_requests/6 · 2022-03-28T09:59:12Z · Jeffrey Wigger

Fixed the calculation of `new_iterations`.

Reddit
https://gitlab.epfl.ch/sacs/decentralizepy/-/merge_requests/5 · 2022-03-23T14:25:05Z · Jeffrey Wigger

Adds the Reddit dataset and an RNN network. An example config file is provided as well.
gridsearch run file + 192 nodes regular graph
https://gitlab.epfl.ch/sacs/decentralizepy/-/merge_requests/4 · 2022-03-22T12:16:23Z · Jeffrey Wigger

Adds the grid search run file and the 192-node regular random graph of degree eight.

FFT Wavelets and more
https://gitlab.epfl.ch/sacs/decentralizepy/-/merge_requests/3 · 2022-03-21T17:47:06Z · Jeffrey Wigger

Adds several sharing methods and their accompanying training implementations:
- FFT with frequency-change-based parameter selection (FrequencyAccumulator)
- Wavelet with frequency-change-based parameter selection (FrequencyWaveletAccumulator)
- topK with model-change-based parameter selection (ModelChangeAccumulator)
- TopKParams: selects the topK highest values for sharing
It also adds an example config file for each mentioned sharing method.
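The change-based topK selection described in the list above can be sketched in a few lines (an illustrative sketch only, not the actual decentralizepy implementation; the function and variable names are hypothetical):

```python
import numpy as np

def topk_share(params, params_at_last_share, k):
    """Pick the k parameters whose accumulated change is largest.

    Returns int32 indices and the corresponding current values,
    mirroring the change-based selection and the int32 index
    encoding mentioned in this merge request.
    """
    change = np.abs(params - params_at_last_share)
    # indices of the k largest absolute changes (unordered)
    indices = np.argpartition(change, -k)[-k:].astype(np.int32)
    return indices, params[indices]

# toy usage: parameters 1 and 3 changed the most, so only they are shared
old = np.zeros(5)
new = np.array([0.1, -2.0, 0.0, 3.0, 0.5])
idx, vals = topk_share(new, old, k=2)
```

A receiver can rebuild a sparse update from the (index, value) pairs, which is what makes sharing only the topK parameters worthwhile.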
Additionally it adds:
- a 96-node regular random graph with degree four
- plot.py now also JSON-dumps the average train loss and test loss
- changes the run.sh template to store the logging data on the NFS
- adds `PyWavelets` to setup.cfg
- testing.py now crashes if the logging directory already exists, to prevent accidentally overwriting old experiments
- converts indices to int32 before encoding
- removes unneeded imports

moving to pickle; two threads per proc
https://gitlab.epfl.ch/sacs/decentralizepy/-/merge_requests/2 · 2022-03-08T09:38:30Z · Jeffrey Wigger

Moves from JSON encoding to pickle and limits each process to two threads.

using uid instead of rank for the femnist data partition
https://gitlab.epfl.ch/sacs/decentralizepy/-/merge_requests/1 · 2022-02-17T16:29:20Z · Jeffrey Wigger

For the femnist dataset, the data gets partitioned on every machine based on the rank of the process in [0, `procs_per_machine`]. If the data is not already pre-partitioned, then all machines will use the entire dataset, i.e., a sample is used `machines` times.
This PR changes the Femnist dataset to use the `uid` for partitioning, as is done for Celeba.
TODO:
- [x] Either change all femnist config files such that n_procs now equals `procs_per_machine * machines`, or set `n_procs` in Dataset.py to `mapping.n_machines * mapping.procs_per_machine`
Update:
Removed the n_procs config option. For both the Femnist and Celeba datasets, the data is now split across the global number of processes.
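The uid-based partitioning can be illustrated with a small sketch (hypothetical helper names, not the decentralizepy code): each process derives a globally unique id from its machine id and per-machine rank, and the dataset is split by that id, so no sample is assigned twice across machines.

```python
def global_uid(machine_id, rank, procs_per_machine):
    """A globally unique process id; unlike the per-machine rank,
    it never collides across machines."""
    return machine_id * procs_per_machine + rank

def partition(samples, uid, n_global_procs):
    """Round-robin split of the dataset by global uid."""
    return samples[uid::n_global_procs]

# 2 machines x 2 processes per machine -> 4 disjoint partitions
data = list(range(8))
parts = [partition(data, global_uid(m, r, 2), 4)
         for m in range(2) for r in range(2)]
```

With rank-based partitioning, processes with the same rank on different machines would receive identical shards; with the global uid, the four partitions above are disjoint and together cover the dataset exactly once.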