Commit f6cdc6c1 authored by Gary Ruben

Improved the recovery logic for the case where a network failure caused nodes to be marked as processed, so we now just roll back to the last successful transfer of a nonzero node
parent 722bde73
@@ -21,8 +21,9 @@ Note that current version creates two files in the same directory as this script
 Known issues
 ------------
-Note: The current version of fabric generates warnings. This issue is discussed
-here: https://github.com/paramiko/paramiko/issues/1369
+Note: The current version of fabric generates harmless warnings. This issue is
+discussed
+here: https://github.com/paramiko/paramiko/issues/1369
 Notes
 -----
@@ -30,7 +31,7 @@ This is a possible option for checksumming:
 https://stackoverflow.com/q/45819356/
 KERNEL_CHECKSUM=$(cpio --to-stdout -i kernel.fat16 < archive.cpio | sha256sum | awk '{print $1}')
-We used the following command to check whether a transfer was succusseful
+We used the following command to check whether a transfer was successful
 immediately prior to a failure of the ASCI filesystem.
 The command to count the number of files in a tarball
 $ tar -tf Lamb_Lung_Microfil_CT_18011B_right_CT.tar | wc -l
@@ -61,7 +62,7 @@ LOG_FILENAME = os.path.join(
     f"{EXPERIMENT_NAME}-{timestamp}.log"
 )
 REMOTE_LOGIN = "gary.ruben@monash.edu@sftp1.synchrotron.org.au"
-SRC_PATH = "/data/13660b/asci/input"
+SRC_PATH = f"/data/{EXPERIMENT_NAME}/asci/input"
 DEST_PATH = "/home/grub0002/bapcxi/vault/IMBL_2018_Oct_McGillick"
@@ -172,12 +173,14 @@ if __name__ == "__main__":
         print('tree:')
         pprint.pprint(tree)
-        # Reset nodes with count==0 to unprocessed. We observed a failure that
-        # mistakenly reported source tree nodes to have 0 files, so force a
-        # recheck of those. The side-effect is to recheck genuinely empty nodes.
-        for node in tree:
+        # Reset nodes at the end of the list with count==0 to unprocessed
+        # This is done because we observed a failure that mistakenly reported
+        # source tree nodes to have 0 files, so force a recheck of those.
+        for node in reversed(tree):
             if node.count == 0:
                 node.processed = False
+            else:
+                break
     else:
         # Get the directory tree from remote server as a list
         with Connection(REMOTE_LOGIN) as c:
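
The rollback in the last hunk can be tried in isolation. The following is a minimal sketch, not the script's actual code: the `Node` dataclass and the `reset_trailing_empty` helper are hypothetical stand-ins, and only the `count` and `processed` attributes are taken from the diff. It shows that walking the list in reverse and stopping at the first nonzero node resets only the trailing run of zero-count nodes, leaving earlier empty nodes marked as processed.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for the script's tree node."""
    src: str
    count: int = 0
    processed: bool = False


def reset_trailing_empty(tree):
    """Mark the run of count==0 nodes at the end of the list as unprocessed."""
    for node in reversed(tree):
        if node.count == 0:
            node.processed = False
        else:
            # First nonzero node from the end: treat it as the last good transfer.
            break
    return tree


tree = [
    Node("a", count=3, processed=True),
    Node("b", count=0, processed=True),  # empty node before the last good transfer
    Node("c", count=5, processed=True),  # last successful nonzero transfer
    Node("d", count=0, processed=True),  # suspect: may have been recorded after the failure
    Node("e", count=0, processed=True),
]
reset_trailing_empty(tree)
print([n.processed for n in tree])  # [True, True, True, False, False]
```

Compared with the removed loop over the whole tree, this avoids rechecking every genuinely empty node, on the assumption that spurious zero counts only appear after the point of failure.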