CS 6150: Reliable Computing
Semester Hours: 3.0
Contact Hours: 3
Coordinator: Ray Kresman
Text: TBD
Author: TBD
Year: TBD
SPECIFIC COURSE INFORMATION
Catalog Description
Techniques for writing reliable software, including n-version programming, fault-tolerant data structures and formal proofs of correctness. Rollback and recovery methods. Fault-tolerant hardware and methods of hardware error detection and correction. Prerequisites: full admission to the MS in CS program, or consent of the department.
Course type: ELECTIVE
SPECIFIC COURSE GOALS
- I can articulate why empirical software testing cannot fully guarantee software correctness.
- I can write the specifications (predicates) that should hold at various points in simple programs.
- I understand how to use axiomatic techniques to prove both partial and total correctness of simple programs.
- I can define and give examples of groups, rings and vector spaces.
- I can explain the relationship between minimum Hamming distance and error detection/correction capability.
- I can construct a basis, or generator (G) matrix, to derive codewords for messages.
- I can construct a parity-check (H) matrix and use it to detect and correct errors in received data.
- I can explain how Hamming codes are applied to memory error detection and correction.
- I can construct fault-tolerant data structures; for example, I can modify a linked list to permit error detection and correction.
- I understand how to derive test points that can detect a variety of linear domain errors.
- I can explain the tradeoff between memory and CPU in masking hardware faults.
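The coding goals above (minimum Hamming distance, the G and H matrices, and syndrome-based correction) can be illustrated with the Hamming (7,4) code. This is a minimal Python sketch, not course material: it assumes the systematic form G = [I4 | P], H = [P^T | I3] over GF(2) with one particular choice of P (any P whose four rows are distinct and each have weight at least 2 gives seven distinct nonzero H columns), and the function names are hypothetical.

```python
from itertools import combinations, product

# Assumed parity part P (4x3); one valid choice among several.
P = [
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 1],
    [1, 1, 1],
]

# Generator matrix G = [I4 | P] (4x7) and parity-check matrix H = [P^T | I3] (3x7).
G = [[1 if i == j else 0 for j in range(4)] + P[i] for i in range(4)]
H = [[P[i][r] for i in range(4)] + [1 if r == j else 0 for j in range(3)] for r in range(3)]


def encode(msg):
    """Codeword c = m G (mod 2) for a 4-bit message m."""
    return [sum(msg[i] * G[i][j] for i in range(4)) % 2 for j in range(7)]


def syndrome(word):
    """s = H w^T (mod 2); an all-zero syndrome means no error was detected."""
    return [sum(H[r][j] * word[j] for j in range(7)) % 2 for r in range(3)]


def correct(word):
    """Single-error correction: flip the bit whose H column equals the syndrome."""
    s = syndrome(word)
    if not any(s):
        return list(word)
    for j in range(7):
        if [H[r][j] for r in range(3)] == s:
            fixed = list(word)
            fixed[j] ^= 1
            return fixed
    raise ValueError("syndrome matches no column of H")


def hamming_distance(a, b):
    """Number of positions in which two words differ."""
    return sum(x != y for x, y in zip(a, b))


def min_distance():
    """Minimum Hamming distance over all pairs of the 16 codewords."""
    codewords = [encode(list(m)) for m in product([0, 1], repeat=4)]
    return min(hamming_distance(a, b) for a, b in combinations(codewords, 2))
```

Here `min_distance()` evaluates to 3, so the code can detect up to d_min - 1 = 2 bit errors, or correct up to floor((d_min - 1) / 2) = 1 bit error, which is what `correct` exploits: every single-bit error yields a distinct nonzero syndrome.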
LIST OF TOPICS COVERED
- Fault-Tolerant Hardware
- Tandem computer architecture(*)
- Stratus computer architecture
- The (4,2) computer architecture
- Hardware error detection and correction through coding(*)
- Redundant array of inexpensive disks (RAID)(*)
- Fault-Tolerant Software
- Formal proofs of correctness(*)
- Axiomatic semantics and proof rules
- Weakest precondition
- Strongest postcondition
- Invariants and assertions
- Formal specification – an overview
- VDM or Z
- Algebraic specification and data types
- Rollback and recovery, checkpointing(*)
- Software Safety
- N-version techniques(*)
- Fault-tolerant data structures and scrubbing(*)
- Use of error-detection codes in software
- Data integrity in distributed transactions
- Validation protocols for transactions
- Distributed checkpointing
- Formal proofs of correctness(*)
- Estimation of Mean Time Between Failures (MTBF)
- Numerical aspects of software testing
- Domain testing
- Effect of redundant components
- Effect of scrubbing
- Standards for software fault-tolerance
(*) These topics are core material to be covered every time the course is taught.
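The topic of fault-tolerant data structures, and the goal of modifying a linked list to permit error detection and correction, can be made concrete with one common robust-list pattern: a circular doubly linked list with a header node, an identifier tag in every node and a stored node count, so the structure carries redundant information that an audit routine can check. This Python sketch is an illustration under those assumptions only; the tag values, class and method names are hypothetical, and the repair routine assumes damage is confined to forward (`next`) pointers.

```python
NODE_ID = "NODE"  # hypothetical identifier tag stored in every data node


class Node:
    def __init__(self, value=None, tag=NODE_ID):
        self.tag = tag
        self.value = value
        self.next = self
        self.prev = self


class RobustList:
    """Circular doubly linked list with a header node and a redundant count."""

    def __init__(self):
        self.header = Node(tag="HEADER")
        self.count = 0

    def append(self, value):
        node = Node(value)
        last = self.header.prev
        node.prev, node.next = last, self.header
        last.next = node
        self.header.prev = node
        self.count += 1

    def values(self):
        out, node = [], self.header.next
        while node is not self.header:
            out.append(node.value)
            node = node.next
        return out

    def _chain_ok(self, step):
        """Follow `step` ("next" or "prev") from the header: it must visit
        exactly `count` tagged nodes and then land back on the header."""
        node = getattr(self.header, step)
        for _ in range(self.count):
            if node is None or getattr(node, "tag", None) != NODE_ID:
                return False
            node = getattr(node, step)
        return node is self.header

    def audit(self):
        """Error detection: both chains are consistent and mutually inverse."""
        if not (self._chain_ok("next") and self._chain_ok("prev")):
            return False
        node = self.header
        for _ in range(self.count + 1):
            if node.next.prev is not node:
                return False
            node = node.next
        return True

    def repair_forward(self):
        """Error correction, assuming only forward pointers were damaged:
        rebuild every `next` pointer from the intact backward chain."""
        if not self._chain_ok("prev"):
            raise ValueError("backward chain is damaged too; cannot repair")
        node = self.header
        for _ in range(self.count + 1):
            node.prev.next = node
            node = node.prev
```

In this sketch, scrubbing would amount to calling `audit` periodically and invoking `repair_forward` when it fails; the count, tags and backward pointers cost extra memory and CPU time, which is the tradeoff named in the course goals.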
Updated: 12/17/2025 04:53PM