This is Penn State

Inferring Software Specifications from Open Source Repositories

 

Today individuals, society, and the nation critically depend on software to manage critical infrastructures for power, banking and finance, air traffic control, telecommunication, transportation, national defense, and healthcare. Specifications are critical for communicating the intended behavior of software systems to software developers and users and to make it possible for automated tools to verify whether a given piece of software indeed behaves as intended. Safety critical applications have traditionally enjoyed the benefits of such specifications, but at a great cost. Because producing useful, non-trivial specifications from scratch is too hard, time consuming, and requires expertise that is not broadly available, such specifications are largely unavailable. The lack of specifications for core libraries and widely used frameworks makes specifying applications that use them even more difficult. The absence of precise, comprehensible, and efficiently verifiable specifications is a major hurdle to developing software systems that are reliable, secure, and easy to maintain and reuse.

This project brings together an interdisciplinary team of researchers with complementary expertise in formal methods, software engineering, machine learning and big data analytics to develop automated or semi-automated methods for inferring the specifications from code. The resulting methods and tools combine analytics over large open source code repositories to augment and improve upon specifications by program analysis-based specification inference through synergistic advances across both these areas.

The broader impacts of the project include: transformative advances in specification inference and synthesis, with the potential to dramatically reduce, the cost of developing and maintaining high assurance software; enhanced interdisciplinary expertise at the intersection of formal methods software engineering, and big data analytics; Contributions to research-based training of a cadre of scientists and engineers with expertise in high assurance software.