'Big code' initiative aims to build better software

DARPA seeks to harness big data to improve software reliability.

The Defense Advanced Research Projects Agency is attempting to take big data analytics to the next level with a "big code" project designed to improve overall software reliability through a large-scale repository of software that drives big data.

The DARPA "big code" initiative, formally known as Mining and Understanding Software Enclaves, or MUSE, seeks to leverage software analysis and big data analytics to improve the way software is built, debugged and verified.

The goal of the "big code" effort is to apply the principles of big data analytics to "identify and understand deep commonalities among the constantly evolving [body] of software drawn from the hundreds of billions of lines of open source code available today," DARPA program manager Suresh Jagannathan said in a statement.

The MUSE program treats the details of software programs as a data set with the goal of "discovering new relationships – enclaves – among this 'big code' to build better, more robust software," Jagannathan said.

The research agency's Information Innovation Office added that it is seeking to transform the way software is written and maintained. MUSE would replace the traditional test/debug/validate cycle with "'always on' program analysis, mining, inspection and discovery," Jagannathan said.

The MUSE approach would also create a community infrastructure built around what DARPA calls a continuously operating "specification-mining engine." The engine would attempt to leverage "deep program analyses" and the key ideas underpinning big data analytics to build a database containing inferences about software program properties, behaviors and vulnerabilities.

The research agency sponsors high-risk, high-payoff technology programs that may or may not lead to military and commercial applications. DARPA is best known for creating the ARPANET, the wide-area network that served as the forerunner of today's Internet.

Program officials said they hoped "the collective knowledge gleaned from this effort would facilitate new mechanisms for dramatically improving software reliability, and help develop radically different approaches for automatically constructing and repairing complex software."

The need to improve software reliability grows as operation of bigger segments of the nation's critical infrastructure is automated. As networks are scaled, software errors triggered during program execution are frequently found to be the cause of network failures and security vulnerabilities.

DARPA issued a solicitation earlier this year seeking research proposals in areas such as program analysis, software verification and big data analytics that could be used to specify properties of complex software systems.

The agency said it expects to award contracts in five different "big code" research areas: software integrity evaluators, artifact generators, a mining engine, analytics and infrastructure. The mining engine category will require expertise in big data, machine learning and databases, the DARPA solicitation said.

The three-phase MUSE program will include a series of demonstration workshops at the end of each phase designed to reflect orders-of-magnitude increases in the amount of "big code" developed under the research program along with "corresponding advances in scalability of analyses and analytics," the DARPA announcement said.