Commit f6cdc6c1 authored by Gary Ruben

Improved the recovery logic: a network failure caused nodes to be marked as processed, so we now just roll back to the last successful transfer of a nonzero node
parent 722bde73
@@ -21,8 +21,9 @@ Note that current version creates two files in the same directory as this script
 Known issues
 ------------
-Note: The current version of fabric generates warnings. This issue is discussed
+Note: The current version of fabric generates harmless warnings. This issue is
+discussed
 here: https://github.com/paramiko/paramiko/issues/1369
 Notes
 -----
@@ -30,7 +31,7 @@ This is a possible option for checksumming:
 https://stackoverflow.com/q/45819356/
 KERNEL_CHECKSUM=$(cpio --to-stdout -i kernel.fat16 < archive.cpio | sha256sum | awk '{print $1}')
-We used the following command to check whether a transfer was succusseful
+We used the following command to check whether a transfer was successful
 immediately prior to a failure of the ASCI filesystem.
 The command to count the number of files in a tarball
 $ tar -tf Lamb_Lung_Microfil_CT_18011B_right_CT.tar | wc -l
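The `tar -tf ... | wc -l` check in the notes above can also be done from Python, which may be convenient inside the transfer script itself. This is only a sketch using the standard-library `tarfile` module; the function name `count_tar_members` is hypothetical and does not appear in the script.

```python
import tarfile


def count_tar_members(tar_path):
    """Count the entries in a tarball, similar to `tar -tf FILE | wc -l`.

    Hypothetical helper, not part of the original script. Directories are
    counted as entries too, as they are in the `tar -tf` listing.
    """
    # "r:*" lets tarfile detect the compression (if any) automatically.
    with tarfile.open(tar_path, "r:*") as tar:
        # Iterating a TarFile yields one TarInfo per archive member.
        return sum(1 for _ in tar)
```

Comparing this count with the number of files in the source directory gives a cheap sanity check, though unlike a checksum it cannot detect corrupted file contents.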
@@ -61,7 +62,7 @@ LOG_FILENAME = os.path.join(
     f"{EXPERIMENT_NAME}-{timestamp}.log"
 )
 REMOTE_LOGIN = "gary.ruben@monash.edu@sftp1.synchrotron.org.au"
-SRC_PATH = "/data/13660b/asci/input"
+SRC_PATH = f"/data/{EXPERIMENT_NAME}/asci/input"
 DEST_PATH = "/home/grub0002/bapcxi/vault/IMBL_2018_Oct_McGillick"
@@ -172,12 +173,14 @@ if __name__ == "__main__":
         print('tree:')
         pprint.pprint(tree)
-        # Reset nodes with count==0 to unprocessed. We observed a failure that
-        # mistakenly reported source tree nodes to have 0 files, so force a
-        # recheck of those. The side-effect is to recheck genuinely empty nodes.
-        for node in tree:
+        # Reset nodes at the end of the list with count==0 to unprocessed
+        # This is done because we observed a failure that mistakenly reported
+        # source tree nodes to have 0 files, so force a recheck of those.
+        for node in reversed(tree):
             if node.count == 0:
                 node.processed = False
+            else:
+                break
     else:
         # Get the directory tree from remote server as a list
         with Connection(REMOTE_LOGIN) as c:
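The effect of switching to `reversed(tree)` with an early `break` can be seen in a small standalone sketch. The `Node` class here is a hypothetical stand-in with only the `count` and `processed` attributes used in the diff (plus an assumed `src` field); the real node type in the script may differ, and `reset_trailing_empty` is an illustrative name.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for the script's tree node."""
    src: str
    count: int = 0
    processed: bool = False


def reset_trailing_empty(tree):
    """Mark the trailing run of zero-count nodes as unprocessed.

    Walks the list from the end and stops at the first node with a nonzero
    count, so zero-count nodes earlier in the list are left untouched.
    Returns the number of nodes that were reset.
    """
    reset = 0
    for node in reversed(tree):
        if node.count != 0:
            break
        node.processed = False
        reset += 1
    return reset


tree = [
    Node("a", 5, True),
    Node("b", 0, True),   # empty node before the last nonzero transfer
    Node("c", 3, True),
    Node("d", 0, True),   # trailing zero-count nodes: suspected failures
    Node("e", 0, True),
]
print(reset_trailing_empty(tree))  # 2
```

Only `d` and `e` are rechecked on the next run; `b` stays processed because a later node (`c`) transferred successfully, which suggests `b` was genuinely empty rather than a casualty of the failure.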