Once, while working on my own project about neural combinatorial optimization, I ran into this error:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [CUDABoolType [1024, 21]] is at version 139; expected version 138 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
This error is baffling: the exception is thrown at loss.backward(), which is clearly far away from where the real problem lies, and the only other useful information is the dtype and shape of the offending tensor. That makes pinpointing the actual cause quite difficult.
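To see why the traceback is so unhelpful, here is a minimal sketch (not the original project's code) that triggers the same class of error. torch.sigmoid saves its output for the backward pass, so modifying that output in place bumps the tensor's version counter, and autograd detects the mismatch only later, at backward():

```python
import torch

a = torch.ones(3, requires_grad=True)
b = torch.sigmoid(a)   # sigmoid's backward needs its *output*, so b is saved
b.add_(1)              # in-place edit bumps b's version counter (0 -> 1)

err = None
try:
    b.sum().backward()  # autograd notices the saved tensor was modified
except RuntimeError as e:
    err = e
print(err)  # "... modified by an inplace operation ... expected version 0"
```

Note that the exception surfaces at backward(), not at the add_() call that actually caused it, which mirrors the situation above. Enabling torch.autograd.set_detect_anomaly(True) before the forward pass makes PyTorch record a traceback for each forward op, so the error report additionally points at the operation that produced the corrupted tensor.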