Reduced the cost of the pull communication

With more than 32 ranks, a sub-communicator will be used
for the pull communication. This significantly reduces the cost of the
pull communication when the pull groups are small. With large pull groups the total
simulation performance might not improve much, because ranks
that are not in the sub-communicator will later wait for the pull
ranks during the communication for the constraints.
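
A minimal sketch of the rank selection described above, not the actual
implementation. It assumes each rank knows whether it holds pull atoms and
computes the color each rank would pass to MPI_Comm_split. The names
pullCommColors, c_maxRanksForGlobalComm, c_colorPull and c_colorExcluded are
hypothetical; c_colorExcluded stands in for MPI_UNDEFINED, for which
MPI_Comm_split returns MPI_COMM_NULL.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption from the text above: a sub-communicator is only used
// when there are more than 32 ranks.
constexpr int c_maxRanksForGlobalComm = 32;

// Hypothetical colors. In real MPI code the excluded color would be
// MPI_UNDEFINED, so that excluded ranks receive MPI_COMM_NULL.
constexpr int c_colorPull     = 0;
constexpr int c_colorExcluded = -1;

// Returns, for every rank, the color it would pass to MPI_Comm_split.
// With few ranks all ranks share one color, i.e. everyone participates.
std::vector<int> pullCommColors(const std::vector<bool>& rankHasPullAtoms)
{
    const int  numRanks   = static_cast<int>(rankHasPullAtoms.size());
    const bool useSubComm = numRanks > c_maxRanksForGlobalComm;

    std::vector<int> colors(rankHasPullAtoms.size(), c_colorPull);
    if (useSubComm)
    {
        for (std::size_t rank = 0; rank < rankHasPullAtoms.size(); rank++)
        {
            // Ranks without pull atoms stay out of the sub-communicator.
            if (!rankHasPullAtoms[rank])
            {
                colors[rank] = c_colorExcluded;
            }
        }
    }
    return colors;
}
```

With 64 ranks of which only one holds pull atoms, only that rank gets
c_colorPull. The other 63 ranks skip the pull reductions, but they may still
wait for the pull rank at the next collective step.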
Added a pull_comm_t struct to separate the data used for communication.

Change-Id: I92b64d098b508b11718ef3ae175b771032ad7be2