Timing Technology: Lessons From The Media Lab
While reading “Funding Breakthrough Research: Promises and Challenges of the ‘ARPA Model’”, Azoulay et al 2018, on DARPA, I noticed an interesting comment:
The two Goldstein & Kearney 2018 papers sounded interesting but alas, are listed only as unpublished manuscripts; only one is available as a preprint. I was surprised that an agency as well known and as intimately involved in computing history could be described as having one internal history, ever, and looked up a PDF copy of Strategic Computing: DARPA and the Quest for Machine Intelligence, 1983–1993, Roland & Shiman 2002.
The preface explains the odd footnote: while they may have had some access to internal archival data, they had a lot less access than they requested, DARPA was not enthusiastic about it, and DARPA eventually canceled their book contract (they published anyway). This leads to an… interesting preface. You don’t often hear historians of solicited official histories say things like “they never lied to us, as best as we can tell”, they just “simply could not understand why we wanted to see the materials we requested”, or recount that their “requests for access to these [emails] were most often met with laughter”, noting that “We were never explicitly denied access to records controlled by DARPA; we just never gained complete access.” Frustrated, they…
In one anecdote from the interviews, Lynn Conway shows up with a stack of internal DARPA documents, states that an NDA prevents her from talking about them (as if anyone cared about NDAs from decades before), and refuses to show any of the documents to the interviewer, leaving me rather bemused: why bother? (Although in this case, it may just be that Conway is a jerk; one might remember her from helping try to frame Michael Bailey for sexual abuse.) I was reminded a little of Carter Scholz’s novel Radiance, also from 2002, which touches on SDI and indirectly on SCI.
The book itself doesn’t seem to have suffered too badly for the birth pangs. It’s an overview of the birth and death of the SCI, organized in chunks by the manager. The division by manager is not an accident: R&S comment deprecatingly about DARPA personnel being focused on the technology, and invoke the strawman of technological determinism; they seem to adopt the common historian pose that a sophisticated historian focuses on people and that it is naive & unsophisticated to invoke objective constraints of science & technology & physics. This is wrong in the context of SCI, as their in-depth recounting will eventually make clear. The people did not have much to do with the failures: stuff like gallium arsenide or expert systems or autonomous robots didn’t work out because they don’t work or are hard or require computing power unavailable at the time, not because some bureaucrat made a bad naming choice or ran afoul of the wrong Senator. People don’t matter to something like Moore’s law. Man proposes but Nature disposes: you can fake medicine or psychology easily, but it’s harder to fake a robot not running into trees. Fortunately, for all the time R&S spend on project managers shuffling around acronyms, they still devote adequate space to the actual science & technology and do a good job of it.
So what was SCI? It was a 1983–1993 add-on to ARPA’s existing funding programs, where the spectre of Japan’s Fifth Generation Project was used to lobby Congress for additional R&D funding which would be devoted to a cluster of interconnected technological opportunities ARPA spied on the US horizon, to push them forward simultaneously and break the logjams. (As always, “funding comes from the threat”, though many were highly skeptical that Fifth Generation would go anywhere or that its intended goals, much of which was simply to work around flaws in Japanese language handling, were much of a threat, and most Western evaluations of it generally describe it as a failure or at least not a notably productive R&D investment.) The systems included gallium arsenide chips to get around silicon’s poor thermal/radiation tolerance and operate at faster frequencies as well, VLSI chips which would combine previously disparate chips onto a single small chip as part of a silicon design ecosystem which would design & manufacture chips much faster than previously, parallel processing computers going far beyond just 1 or 2 processors, autonomous car robots, AI expert systems, and advanced user-friendly software tools in general. The name was chosen to try to benefit from Reagan’s SDI, but while the military connections remained throughout, the connection was ultimately quite tenuous and the gallium arsenide chips were deliberately split out to SDI to avoid contamination, although the US military would still be the best customer for many of the products & the connections continued to alienate people. Surprisingly (shockingly, even), computer networking was not a major SCI focus: the ARPA networking PM Barry Leiner kept clear of SCI (not needing the money & fearing a repeat of know-nothing Republican Congressmen searching for something to axe). The funding ultimately amounted to about $1 billion (in 1993 dollars), trivial compared to total military funding, but still real money.
The project implementation followed ARPA’s existing loose oversight paradigm, where traveling project managers were empowered to dispense grants to applicants on their own authority, depending primarily on their own good taste to match talented researchers with ripe opportunities, with bureaucracy limited to meeting with the grantees semi-annually or annually for progress reports & evaluation, often in groups so as to let researchers test each other’s mettle & form social ties. (“ARPA program managers like to repeat the quip that they are 75 entrepreneurs held together by a common travel agent.”) An ARPA PM would humbly ‘surf’ the cutting-edge, going with the waves rather than swimming upstream, so to speak, to follow growing trends while cutting their losses on dead ends, to bring things through the ‘valley of death’ between lab prototype and the real world:
Done wrong, of course, this results in a corrupt slush fund doling out R&D funds to an incestuous network of grantees for technologies always just on the horizon and whose failure is always excused by the claim that high-risk research often won’t work out, or results in elaborate systems trying to do too many things and collapsing under the weight of many advanced half-debugged systems chaotically interacting (eg ILLIAC IV). Having been conceived in scientific sin and born of blue-uniform bureaucracy while midwifed by conniving committees, SCI’s prospects might not look too great.
So, did SCI work out? The answer is a definite, unqualified—maybe:
The end of SCI coincided with (and partially caused) the AI winter, but SCI went beyond just the Lisp machine & expert system software companies we associate with it. Of the systems, some worked out, others were good ideas but the time wasn’t ripe in an unforeseeable way and have been maturing ever since, some have poked along in a kind of permanent stasis (not dead but not alive either), others were dead ends but dead ends in important ways, and some are plain dead. In order, one might list: parallel commodity processors and rapid development of large silicon chips via a subsidized foundry, the autonomous cars/vehicles and generalized machine intelligence systems and expert systems, Thinking Machines’s Connection Machine, and Josephson junctions.
Pining for the fjords: super-fast superconducting Josephson junctions were rapidly abandoned before becoming officially part of SCI research, while gallium arsenide suffered a similar fate—at the time, they were exciting and Cray Computers infamously bet big on the Cray 3 achieving its OOM improvement in part with gallium arsenide chips, but somehow it never quite worked out or replaced silicon and remains in a small niche. (I doubt it was SDI’s fault, since gallium arsenide has had 2 decades since, and there’s been a ton of commercial incentive to find a replacement for silicon as it gets ever harder to shrink silicon nodes.)
Important failures: autonomous vehicles and generalized AI systems represent an interesting intermediate case: the funded vehicles, like the work at CMU, were useless—expensive, slow, trivially confused by slight differences in roads or scenery, unable to cope in realtime with more than monochrome images with pitiful resolutions like 640x640px or smaller because the computer vision algorithms were too computationally demanding, and the development bogged down by endless tweaks and hacking with regular regressions in capability. But these research programs and demos were direct ancestors of the DARPA Grand Challenge, which itself kickstarted the current self-driving car boom a decade ago. ARPA and the military didn’t get the exciting vehicles promised by the early ’90s, but they do now have autonomous cars and especially drones, and it’s amazing to think that Google Waymo cars are wandering around Arizona now regularly picking up and dropping off riders without a single fatality or major injury after millions of miles. As far as I can tell, Waymo wouldn’t exist now without the DARPA Grand Challenge, and it seems possible that DARPA was encouraged by the mixed success of the SCI vehicles, so that’s an interesting case of potential success albeit delayed. (But then, we do expect that with technology—Amara’s law.)
Parallel computers: Thinking Machines benefited a lot from SCI as did other parallel computing projects, and while TM did fail and the computers we use now don’t resemble the Connection Machine at all, the field of parallel processing was proven out (ie systems with thousands of weak CPUs could be successfully built, programmed, realize OOM performance gains, and be sold commercially). I’d noticed once that a lot of parallel computing architectures we use now seemed to stem from an efflorescence in the 1980s, but it was only while reading R&S and noting all the familiar names that I realized that this was not a coincidence: many of them were ARPA-funded at the time. Even without R&S noting that the parallel computing work was successfully rolled over into high performance computing (HPC), SCI’s investment into parallel computing was a big success.
A successful adjunct to the parallel computing was an interesting program I’d never heard of before: MOSIS. MOSIS was essentially a government-subsidized chip foundry, competitive with commercial chip foundries, which would accept student & researcher submissions of VLSI chip designs like CPUs or ASICs and make physical chips in combined batches to save costs. Anyone with interesting new ideas could email in a design and get back within 2 months a real live chip for a few hundred dollars. The chips were made cheaply and quickly, quality-checked, and with assurance of privacy, and the service ran well over a thousand projects a year at its height (peaking at 1,880 in 1989). This is quite a cool program to run and must have been a godsend, especially for anyone trying to make custom chips for parallel projects. (“SC also supported BBN’s Butterfly parallel processor, Charles Seitz’s Hypercube and Cosmic Cube at CalTech, Columbia’s Non-Von, and the CalTech Tree Machine. It supported an entire newcomer as well, Danny Hillis’s Connection Machine, coming out of MIT. All of these projects used MOSIS services to move their design ideas into experimental chips.”) It was involved in early GPU work (Clark’s Geometry Engine) and RISC designs like MIPS and even oddities like systolic array chips/computers like the iWarp. Sadly, MOSIS was a bit of a victim of its own success and drew political fire.
Expert systems and planners are generally listed as a ‘failure’ and the cause of the AI Winter, and it’s true they didn’t give us HAL as some GOFAI people hoped, but they did find a useful niche and have been important: R&S give a throwaway paragraph noting that one system from SCI, DART, was used in planning logistics for the first Gulf War and saved the DoD more money than the whole SCI program cost. (The listed reference, “DART: Revolutionizing Logistics Planning”, Hedberg 2002, actually makes the bolder claim that DART “paid back all of DARPA’s 30 years of investment in AI in a matter of a few months, according to Victor Reis, Director of DARPA at the time.” Which could be equally well taken as a comment on how expensive a war is, how inefficient DoD logistics planning was, or how little has been invested in AI.) It’s also worth noting that speech recognition based on Hidden Markov models & n-grams, the first speech recognition systems which were any use (underlying successes like Dragon NaturallySpeaking), was a success here, even if now obsolesced by deep learning.
Perhaps the most relevant area to contemporary AI discussions of deep learning is the expert systems. Why was there such optimism? Expert systems had accomplished a few successes: MYCIN/DENDRAL (although it was never used in production), some mining/oil case studies like PROSPECTOR, a customer configuration assistant XCON for DEC… And SCI was a synergistic program, remember, providing the chips and then powerful parallel computers whose expert systems would scale up to the tens of thousands of rules per second estimated necessary for things like the autonomous vehicles:
Small wonder, then, that Robert Kahn and the architects of SC believed in 1983 that AI was ripe for exploitation. It was finally moving out of the laboratory and into the real world, out of the realm of toy problems and into the realm of real problems, out of the sterile world of theory and into the practical world of applications. …That such a goal appeared within reach in the early 1980s is a measure of how far the field had already come. In the early 1970s, the MYCIN expert system had taken twenty person-years to produce just 475 rules. The full potential of expert systems lay in programs with thousands, even tens and hundreds of thousands, of rules. To achieve such levels, production of the systems had to be dramatically streamlined. The commercial firms springing up in the early 1980s were building custom systems one client at a time. DARPA would try to raise the field above that level, up to the generic or universal application. Thus was shaped the SC agenda for AI. While the basic program within IPTO continued funding for all areas of AI, SC would seek […] in four areas critical to the program’s applications: (1) speech recognition would support Pilot’s Associate and Battle Management; (2) natural language would be developed primarily for Battle Management; (3) vision would serve primarily the Autonomous Land Vehicle; and (4) expert systems would be developed for all of the applications. If AI was the penultimate tier of the SC pyramid, then expert systems were the pinnacle of that tier. Upon them all applications depended. Development of a generic expert system that might service all three applications could be the crowning achievement of the program. Optimism on this point was fueled by the whole philosophy behind SC. AI in general, and expert systems in particular, had been hampered previously by lack of computing power.
Feigenbaum, for example, had begun DENDRAL on an IBM 7090 computer, with about 130K bytes of core memory and an operating speed between 50 and 100,000 floating point operations per second. Computer power was already well beyond that stage, but SC promised to take it to unprecedented levels—a gigaflop by 1992. Speed and power would no longer constrain expert systems. If AI could deliver the generic expert system, SC would deliver the hardware to run it. Compared to existing expert systems running 2,000 rules at 50–100 rules per second, SC promised running 30,000 rules firing at 12,000 rules per second and six times real time.
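To put the quoted figures side by side, here is a back-of-the-envelope sketch. The variable names are mine, the numbers are only those given in the passage above, and the person-year estimate naively assumes MYCIN’s rule-writing rate would hold at scale (an assumption for illustration, not something R&S claim).

```python
# Figures quoted from Roland & Shiman above; variable names are illustrative only.
existing_rules = 2_000
existing_rate_low, existing_rate_high = 50, 100   # rules fired per second
promised_rules = 30_000
promised_rate = 12_000                            # rules fired per second

# How much bigger and faster the promised systems were supposed to be.
rulebase_growth = promised_rules / existing_rules
speedup_vs_fast = promised_rate / existing_rate_high
speedup_vs_slow = promised_rate / existing_rate_low

# MYCIN took twenty person-years to produce 475 rules.
mycin_rules_per_person_year = 475 / 20

# ASSUMPTION: knowledge engineering stays as slow as it was for MYCIN.
person_years_for_promised = promised_rules / mycin_rules_per_person_year

print(f"rule base growth: {rulebase_growth:.0f}x")
print(f"firing-rate speedup: {speedup_vs_fast:.0f}x to {speedup_vs_slow:.0f}x")
print(f"person-years at MYCIN's pace: {person_years_for_promised:.0f}")
```

On these numbers the hardware side needed a speedup of 120x to 240x, while the rule base needed to grow 15-fold, which at MYCIN’s pace would have meant on the order of 1,260 person-years of rule-writing for a single system.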
What happened was that the hardware came into existence, but the expert systems didn’t scale. They instantly hit a combinatorial wall, couldn’t solve the grounding problem, and knowledge engineering never became feasible at the level where you might encode a human’s knowledge. Expert systems also struggled to be extended beyond symbolic systems to real data like vision or sound. AI didn’t have remotely enough computing power to do anything useful, and it didn’t have methods which could use the computing power if it had it. We got the VLSI chips, we got the gigahertz processors even without gallium arsenide, we got the gigaflops and then the teraflops and now the petaflops—but what do you do with an expert system on those? Nothing. The grand goals of SCI relied on all the parts doing their part, and one part fell through:
Only four years into the SC program, when Schwartz was about to terminate the IntelliCorp and Teknowledge contracts, expectations for expert systems were already being scaled back. By the time that Hayes-Roth revised his article for the 1992 edition of the Encyclopedia, the picture was still more bleak. There he made no predictions at all about program speeds. Instead he noted that rule-based systems still lacked “a precise analytical foundation for the problems solvable by RBSs . . . and a theory of knowledge organization that would enable RBSs to be scaled up without loss of intelligibility of performance.” SC contractors in other fields, especially applications, had to rely on custom-developed software of considerably less power and versatility than those envisioned when contracts were made with IntelliCorp and Teknowledge. Instead of a generic expert system, SC applications relied increasingly on […], a change in terminology that reflected the direction in which the entire field was moving. This is strikingly similar to the pessimistic evaluation Schwartz had made in 1987. It was not just that IntelliCorp and Teknowledge had failed; it was that the enterprise was impossible at current levels of experience and understanding…Does this mean that AI has finally migrated out of the laboratory and into the marketplace? That depends on one’s perspective. In 1994 the U.S. Department of Commerce estimated the global market for AI systems to be about $900 million, with North America accounting for two-thirds of that total. Michael Schrage, of the Sloan School’s Center for Coordination Science at MIT, concluded in the same year that “AI is, dollar for dollar, probably the best software development investment that smart companies have made.” Frederick Hayes-Roth, in a wide-ranging and candid assessment, insisted that “KBS have attained a permanent and secure role in industry”, even while admitting the many shortcomings of this technology.
Those shortcomings weighed heavily on AI authority Daniel Crevier, who concluded that “the expert systems flaunted in the early and mid-1980s could not operate as well as the experts who supplied them with knowledge. To true human experts, they amounted to little more than sophisticated reminding lists.” Even Edward Feigenbaum, the father of expert systems, has conceded that the products of the first generation have proven narrow, brittle, and isolated. As far as the SC agenda is concerned, Hayes-Roth’s 1993 opinion is devastating: “The current generation of expert and KBS technologies had no hope of producing a robust and general human-like intelligence.” …Each new [ALV] feature and capability brought with it a host of unanticipated problems. A new panning system, installed in early 1986 to permit the camera to turn as the road curved, unexpectedly caused the vehicle to veer back and forth until it ran off the road altogether. The software glitch was soon fixed, but the panning system had to be scrapped anyway; the heavy, 40-pound camera stripped the device’s gears whenever the vehicle made a sudden stop. Given such unanticipated difficulties and delays, Martin increasingly directed its efforts toward achieving just the specific capabilities required by the milestones, at the expense of developing more general capabilities. One of the lessons of the first demonstration, according to the ALV engineers, was the importance of defining […], because “too much time was wasted doing things not appropriate to proof of concept.” Martin’s selection of technology was conservative. It had to be, as the ALV program could afford neither the lost time nor the bad publicity that a major failure would bring. One BDM observer expressed concern that the pressure of the demonstrations was encouraging Martin to cut corners, for instance by using the “flat earth” algorithm with its two-dimensional representation.
ADS’s obstacle-avoidance algorithm was so narrowly focused that the company was unable to test it in a parking lot; it worked only on roads. …The vision system proved highly sensitive to environmental conditions: the quality of light, the location of the sun, shadows, and so on. The system worked differently from month to month, day to day, and even test to test. Sometimes it could accurately locate the edge of the road, sometimes not. The system reliably distinguished the pavement of the road from the dirt on the shoulders, but it was fooled by dirt that was tracked onto the roadway by heavy vehicles maneuvering around the ALV. In the fall, the sun, now lower in the sky, reflected brilliantly off the myriads of polished pebbles in the tarmac itself, producing glittering reflections that confused the vehicle. Shadows from trees presented problems, as did asphalt patches from the frequent road repairs made necessary by the harsh Colorado weather and the constant pounding of the eight-ton vehicle. …Knowledge-based systems in particular were difficult to apply outside the environment for which they had been developed. A vision system developed for autonomous navigation, for example, probably would not prove effective for an automated manufacturing assembly line. “There’s no single universal mechanism for problem solving”, Amarel would later say, “but depending on what you know about a problem, and how you represent what you know about the problem, you may use one of a number of appropriate mechanisms.” …In another major shift in emphasis, SC2 removed […] from its own plateau on the pyramid, subsuming it under the general heading […]. This seemingly minor shift in nomenclature signaled a profound reconceptualization of AI, both within DARPA and throughout much of the computer community. The effervescent optimism of the early 1980s gave way to more sober appraisal. AI did not scale.
In spite of impressive achievements in some fields, designers could not make systems work at a level of complexity approaching human intelligence. Machines excelled at data storage and retrieval; they lagged in judgment, learning, and complex pattern recognition. …During SC, AI had proved unable to exploit the powerful machines developed in SC’s architectures program to achieve Kahn’s generic capability in machine intelligence. On the fine-grained level, AI, including many developments from the SC program, is ubiquitous in modern life. It inhabits everything from automobiles and consumer electronics to medical devices and instruments of the fine arts. Ironically, AI now performs miracles unimagined when SC began, though it can’t do what SC promised.
Given how people keep reaching back to the AI Winter in discussions of connectionism (I mean, deep learning), it’s interesting to contrast the two paradigms. Deep learning has long since escaped into the commercial market and, indeed, is primarily driven by industry researchers at this point. The case studies are innumerable (and many are secret due to their considerable commercial value). DL handles grounding problems & raw sensory data well and indeed struggles most on problems with richly formalized structures like hierarchies/categories/directed graphs (ML practitioners currently tend to use decision tree methods like XGBoost for those), or which require using rules & logical reasoning (somewhat like humans). Perhaps most importantly from the perspective of SCI and HPC, deep learning scales: it parallelizes in a number of ways, and it can soak up indefinite amounts of computing power & data. You can train a CNN on a few hundred or thousand images usefully, but Facebook & Google have run experiments scaling from millions of images to hundreds of millions and billions (eg Gao et al 2017, Gross et al 2017, Sun et al 2017, Mahajan et al 2018, Laanait et al 2019, Anonymous et al 2019), and the CNNs steadily improve their performance on both their assigned task and in accomplishing transfer learning. Similarly in reinforcement learning, the richer the resources available, the richer a NN can be trained (OA chart; consider how deep Zero’s NN is compared to the original AlphaGo, or Ape-X or Impala for learning many ALE games simultaneously, or just the other day, OpenAI’s 5v5 Dota 2 progress via essentially brute force). Even self-driving car programs which are a byword for incompetence deal just fine with all the issues that bedeviled ALV by using, well, ‘a single universal mechanism for problem solving’ (which we call CNNs, which can do anything from image segmentation to human language translation).
These points are all the more striking as there is no sign that hardware improvements are over or that any inherent limits have been hit; even the large-scale experiments criticized as ‘boil the oceans’ projects nevertheless spend what are trivial amounts of money by both global economic and R&D criteria, like a few million dollars of GPU time. But none of this could have been done in the 1980s, or early 1990s. (As Hinton says, why didn’t connectionism work back then? Because the computers were thousands of times too slow, the datasets were thousands of times too small, and some of the neural network details like initializations & activations were broken.)
Considering all this, it’s not a surprise that the AI part of SC didn’t pan out and eventually got axed, as it should have. Sometimes the time is not ripe. Hero can invent the steam engine, but you don’t get steam engine trains until it’s steam engine train time, and the best intentions of all the bureaucrats in the world can’t affect that much. The turnover in managers and political interference may well have been enough to “disrupt the careful orchestration that its ambitious agenda required”, but this was more in the nature of shooting a dead horse. R&S seem, somewhat reluctantly, to ultimately assent to the view they critiqued at the beginning, held by the ARPA staff, that the failure of SC is more a demonstration of technological determinism than of social & political contingency, and more about the technology than the people:
…Thus, for all of their agency, their story appears to be one driven by the technology. If they were unable to socially construct this technology, to maintain agency over technological choice, does it then follow that some technological imperative shaped the SC trajectory, diverting it in the end from machine intelligence to high performance computing? Institutionally, SC is best understood as an analog of the development programs for the Polaris and Atlas ballistic missiles. An elaborate structure was created to sell the program, but in practice the plan bore little resemblance to day-to-day operations. Conceptually, SC is best understood by mixing Thomas Hughes’s framework of large-scale technological systems with Giovanni Dosi’s notions of research trajectories. Its experience does not quite map on Hughes’s model because the managers could not or would not bring their reverse salients on line. It does not quite map on Dosi because the managers regularly dealt with more trajectories and more variables than Dosi anticipates in his analyses. In essence, the managers of SC were trying to research and develop a complex technological system. They succeeded in developing some components; they failed to connect them in a system. The overall program history suggests that at this level of basic or fundamental research it is best to aim for a broad range of capabilities within the technology base and leave integration to others…While the Fifth Generation program contributed significantly to Japan’s national infrastructure in computer technology, it did not vault that country past the United States…SC played an important role, but even some SC supporters have noted that the Japanese were in any event headed on the wrong trajectory even before the United States mobilized itself to meet their challenge. …In some ways the varying records of the SC applications shed light on the program models advanced by Kahn and Cooper at the outset. 
Cooper believed that the applications would pull technology development; Kahn believed that the evolving technology base would reveal what applications were possible. Kahn’s appraisal looks more realistic in retrospect. It is clear that expert systems enjoyed significant success in planning applications. This made possible applications ranging from Naval Battle Management to DART. Vision did not make comparable progress, thus precluding achievement of the ambitious goals set for the ALV. Once again, the program went where the technology allowed. Some reverse salients resisted efforts to orchestrate advance of the entire field in concert. If one component in a system did not connect, the system did not connect. In the final analysis, SC failed for want of connection.
Reading about SC furnishes an unexpected lesson about the importance of believing in Moore’s Law and having techniques which can scale. What are we doing now which won’t scale, and what waves are we paddling up instead of surfing?
Source: Gwern.net
Problem solving • Image segmentation • Language • Computer engineering • Economies of scale • Money • Research and development • Graphics processing unit • Geoffrey Hinton • Connectionism • Artificial neural network • Artificial intelligence • Steam locomotive • Steam locomotive • Train • Steam engine • The Best Intentions • DARPA • Technology demonstration • Technological determinism • Social science • Politics • Technology • Person • Structure and agency • Technology • Technology • Agency (philosophy) • Technology • Technology • Artificial intelligence • Supercomputer • Analog computer • UGM-27 Polaris • SM-65 Atlas • Ballistic missile • Planning • Concept • Thomas Hughes • Technology • System • Giovanni Dosi • Medical research • Map • Scientific modelling • Management • Function (mathematics) • Management • Trajectory • Variable (mathematics) • Analysis • Essence • Management • Medical research • Complexity • Technology • System • System • System • History • Basic research • Research • Technology • Japan • Computer • United States • United States • Pull technology • Expert system • Moore's law • Surfing •
So what was SCI? It was a 1983–1993 add-on to ARPA’s existing funding programs, where the spectre of Japan’s Fifth Generation Project was used to lobby Congress for additional R&D funding which would be devoted to a cluster of interconnected technological opportunities ARPA spied on the US horizon, to push them forward simultaneously and break the logjams. (As always, “funding comes from the threat”, though many were highly skeptical that Fifth Generation would go anywhere or that its intended goals, much of which was to simply work around flaws in Japanese language handling, were much of a threat, and most Western evaluations of it generally describe it as a failure or at least not a notably productive R&D investment.) The systems included gallium arsenide chips, which would tolerate heat and radiation better than silicon and operate at faster frequencies as well; VLSI chips, which would combine previously disparate chips onto a single small chip as part of a silicon design ecosystem that would design & manufacture chips much faster than previously; parallel processing computers going far beyond just 1 or 2 processors; autonomous vehicles; AI expert systems; and advanced user-friendly software tools in general. The name was chosen to try to benefit from Reagan’s SDI, but while the military connections remained throughout, the connection was ultimately quite tenuous and the gallium arsenide chips were deliberately split out to SDI to avoid contamination, although the US military would still be the best customer for many of the products & the connections continued to alienate people. Surprisingly (shockingly, even), computer networking was not a major SCI focus: the ARPA networking PM Barry Leiner kept clear of SCI (not needing the money & fearing a repeat of know-nothing Republican Congressmen searching for something to axe). The funding ultimately amounted to about $1 billion in 1993 dollars (roughly $2.1 billion inflation-adjusted), trivial compared to total military funding, but still real money.
The project implementation followed ARPA’s existing loose oversight paradigm, in which traveling project managers were empowered to dispense grants to applicants on their own authority, depending primarily on their own good taste to match talented researchers with ripe opportunities. Bureaucracy was limited to meeting with the grantees semi-annually or annually for progress reports & evaluation, often in groups so as to let researchers test each other’s mettle & form social ties. (“ARPA program managers like to repeat the quip that they are 75 entrepreneurs held together by a common travel agent.”) An ARPA PM would humbly ‘surf’ the cutting edge, going with the waves rather than swimming upstream, so to speak, following growing trends while cutting their losses on dead ends, to bring things through the ‘valley of death’ between lab prototype and the real world.
Done wrong, of course, this results in a corrupt slush fund doling out R&D funds to an incestuous network of grantees for technologies always just on the horizon and whose failure is always excused by the claim that high-risk research often won’t work out, or results in elaborate systems trying to do too many things and collapsing under the weight of many advanced half-debugged systems chaotically interacting (eg ILLIAC IV). Having been conceived in scientific sin and born of blue-uniform bureaucracy while midwifed by conniving committees, SCI’s prospects might not look too great.
So, did SCI work out? The answer is a definite, unqualified “maybe”.
The end of SCI coincided with (and partially caused) the AI winter, but SCI went beyond just the Lisp machine & expert system software companies we associate with the AI winter. Of the systems, some worked out; others were good ideas for which the time wasn’t ripe in an unforeseeable way, and which have been maturing ever since; some have poked along in a kind of permanent stasis (not dead but not alive either); others were dead ends, but dead ends in important ways; and some are plain dead. In order, one might list: parallel commodity processors and rapid development of large silicon chips via a subsidized foundry, the autonomous cars/vehicles and generalized machine intelligence systems and expert systems, Thinking Machines’s Connection Machine, and Josephson junctions.
Pining for the fjords: super-fast superconducting Josephson junctions were rapidly abandoned before becoming officially part of SCI research, while gallium arsenide suffered a similar fate. At the time, both were exciting, and Cray Computer Corporation infamously bet big on the Cray-3 achieving its order-of-magnitude (OOM) improvement in part with gallium arsenide chips, but somehow gallium arsenide never quite worked out or replaced silicon, and it remains in a small niche. (I doubt it was SDI’s fault, since gallium arsenide has had 2 decades since, and there’s been a ton of commercial incentive to find a replacement for silicon as it gets ever harder to shrink silicon nodes.)
Important failures: autonomous vehicles and generalized AI systems represent an interesting intermediate case. The funded vehicles, like the work at CMU, were useless: expensive, slow, trivially confused by slight differences in roads or scenery, and unable to cope in realtime with more than monochrome images at pitiful resolutions like 640x640px or smaller because the computer vision algorithms were too computationally demanding, while development was bogged down by endless tweaks and hacking with regular regressions in capability. But these research programs and demos were direct ancestors of the DARPA Grand Challenge, which itself kickstarted the current self-driving car boom a decade ago. ARPA and the military didn’t get the exciting vehicles promised by the early ’90s, but they do now have autonomous cars and especially drones, and it’s amazing to think that Google Waymo cars are wandering around Arizona now regularly picking up and dropping off riders without a single fatality or major injury after millions of miles. As far as I can tell, Waymo wouldn’t exist now without the DARPA Grand Challenge, and it seems possible that DARPA was encouraged by the mixed success of the SCI vehicles, so that’s an interesting case of potential success, albeit delayed. (But then, we do expect that with technology: Amara’s law.)
Parallel computers: Thinking Machines benefited a lot from SCI, as did other parallel computing projects, and while TM did fail and the computers we use now don’t resemble the Connection Machine at all, the field of parallel processing was proven out (i.e. systems with thousands of weak CPUs could be successfully built, programmed, made to realize order-of-magnitude performance gains, and commercially sold). I’d noticed once that a lot of the parallel computing architectures we use now seemed to stem from an efflorescence in the 1980s, but it was only while reading R&S and noting all the familiar names that I realized that this was not a coincidence: many of them were ARPA-funded at this time. Even without R&S noting that the parallel computing work was successfully rolled over into high-performance computing (HPC), SCI’s investment into parallel computing was a big success.
A successful adjunct to the parallel computing was an interesting program I’d never heard of before: MOSIS. MOSIS was essentially a government-subsidized chip foundry, competitive with commercial chip foundries, which would accept student & researcher submissions of VLSI chip designs like CPUs or ASICs and make physical chips in combined batches to save costs. Anyone with interesting new ideas could email in a design and within 2 months get back a real live chip for a few hundred dollars. The chips were made cheaply and quickly, were quality-checked, and came with an assurance of privacy; the service ran thousands of projects a year (peaking at 1,880 in 1989). This is quite a cool program to run and must have been a godsend, especially for anyone trying to make custom chips for parallel projects. (“SC also supported BBN’s Butterfly parallel processor, Charles Seitz’s Hypercube and Cosmic Cube at CalTech, Columbia’s Non-Von, and the CalTech Tree Machine. It supported an entire newcomer as well, Danny Hillis’s Connection Machine, coming out of MIT. All of these projects used MOSIS services to move their design ideas into experimental chips.”) It was involved in early GPU work (Clark’s Geometry Engine), RISC designs like MIPS, and even oddities like systolic array chips/computers such as the iWarp. Sadly, MOSIS was a bit of a victim of its own success and drew political fire.
Expert systems and planners are generally listed as a ‘failure’ and the cause of the AI Winter, and it’s true they didn’t give us HAL as some GOFAI people hoped, but they did find a useful niche and have been important. R&S give a throwaway paragraph noting that one system from SCI, DART, was used in planning logistics for the first Gulf War and saved the DoD more money than the whole SCI program combined cost. (The listed reference, “DART: Revolutionizing Logistics Planning”, Hedberg 2002, actually makes the bolder claim that DART “paid back all of DARPA’s 30 years of investment in AI in a matter of a few months, according to Victor Reis, Director of DARPA at the time.” This could equally well be taken as a comment on how expensive a war is, how inefficient DoD logistics planning was, or how little has been invested in AI.) It’s also worth noting that speech recognition based on hidden Markov models & n-grams, the first speech recognition systems which were any use (underlying successes like Dragon NaturallySpeaking), was a success here, even if now obsolesced by deep learning.
Perhaps the most relevant area to contemporary AI discussions of deep learning is the expert systems. Why was there such optimism? Expert systems had accomplished a few successes: MYCIN (although it was never used in production) and DENDRAL, some mining/oil case studies like PROSPECTOR, a customer configuration assistant XCON for DEC… And SCI was a synergistic program, remember, providing the chips and then powerful parallel computers on which expert systems would scale up to the tens of thousands of rules per second estimated necessary for things like the autonomous vehicles:
Small wonder, then, that Robert Kahn and the architects of SC believed in 1983 that AI was ripe for exploitation. It was finally moving out of the laboratory and into the real world, out of the realm of toy problems and into the realm of real problems, out of the sterile world of theory and into the practical world of applications. …That such a goal appeared within reach in the early 1980s is a measure of how far the field had already come. In the early 1970s, the MYCIN expert system had taken twenty person-years to produce just 475 rules. The full potential of expert systems lay in programs with thousands, even tens and hundreds of thousands, of rules. To achieve such levels, production of the systems had to be dramatically streamlined. The commercial firms springing up in the early 1980s were building custom systems one client at a time. DARPA would try to raise the field above that level, up to the generic or universal application. Thus was shaped the SC agenda for AI. While the basic program within IPTO continued funding for all areas of AI, SC would seek in four areas critical to the program’s applications: (1) speech recognition would support Pilot’s Associate and Battle Management; (2) natural language would be developed primarily for Battle Management; (3) vision would serve primarily the Autonomous Land Vehicle; and (4) expert systems would be developed for all of the applications. If AI was the penultimate tier of the SC pyramid, then expert systems were the pinnacle of that tier. Upon them all applications depended. Development of a generic expert system that might service all three applications could be the crowning achievement of the program. Optimism on this point was fueled by the whole philosophy behind SC. AI in general, and expert systems in particular, had been hampered previously by lack of computing power. 
Feigenbaum, for example, had begun DENDRAL on an IBM 7090 computer, with about 130K bytes of core memory and an operating speed between 50 and 100,000 floating point operations per second. Computer power was already well beyond that stage, but SC promised to take it to unprecedented levels—a gigaflop by 1992. Speed and power would no longer constrain expert systems. If AI could deliver the generic expert system, SC would deliver the hardware to run it. Compared to existing expert systems running 2,000 rules at 50–100 rules per second, SC promised running 30,000 rules firing at 12,000 rules per second and six times real time.
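The figures in the passage above can be turned into rough ratios to show how large a jump SC was promising. This is only a back-of-the-envelope sketch: the constants are the numbers quoted from R&S, while the helper name `person_years_for` and the assumption that rule-writing effort scales linearly from MYCIN’s rate are illustrative assumptions, not anything R&S claim. The DENDRAL figure uses only the upper bound of the quoted range.

```python
# Constants are the figures quoted from R&S above.
MYCIN_PERSON_YEARS = 20           # effort to build MYCIN
MYCIN_RULES = 475                 # rules that effort produced

EXISTING_RULES = 2_000            # size of existing expert systems
EXISTING_RATES = (50, 100)        # rules per second, low and high
SC_RULES = 30_000                 # promised rule count
SC_RATE = 12_000                  # promised rules per second

DENDRAL_FLOPS_UPPER = 100_000     # upper bound quoted for the IBM 7090
SC_FLOPS_TARGET = 1_000_000_000   # "a gigaflop by 1992"


def person_years_for(rules: int) -> float:
    """Hypothetical helper: effort to write `rules` rules at MYCIN's pace.

    Assumes effort grows linearly with rule count, which is an
    illustrative simplification and not a claim from the source.
    """
    return rules * MYCIN_PERSON_YEARS / MYCIN_RULES


# How much bigger and faster the promised systems were.
rule_count_ratio = SC_RULES / EXISTING_RULES
rate_ratios = tuple(SC_RATE / rate for rate in EXISTING_RATES)
hardware_ratio = SC_FLOPS_TARGET / DENDRAL_FLOPS_UPPER

if __name__ == "__main__":
    print(f"rule count: {rule_count_ratio:.0f}x")
    print(f"firing rate: {rate_ratios[1]:.0f}x to {rate_ratios[0]:.0f}x")
    print(f"hardware vs DENDRAL upper bound: {hardware_ratio:,.0f}x")
    print(f"person-years for {SC_RULES:,} rules: {person_years_for(SC_RULES):,.0f}")
```

On these assumptions, writing 30,000 rules at MYCIN’s pace would cost well over a thousand person-years, which is the sense in which production “had to be dramatically streamlined”: the hardware ratios were achievable, but the authoring effort was the part that had to change.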
What happened was that the hardware came into existence, but the expert systems didn’t scale. They instantly hit a combinatorial wall, couldn’t solve the grounding problem, and knowledge engineering never became feasible at the level where you might encode a human’s knowledge. Expert systems also struggled to be extended beyond symbolic systems to real data like vision or sound. AI didn’t have remotely enough computing power to do anything useful, and it didn’t have methods which could use the computing power if it had it. We got the VLSI chips, we got the gigahertz processors even without gallium arsenide, we got the gigaflops and then the teraflops and now the petaflops—but what do you do with an expert system on those? Nothing. The grand goals of SCI relied on all the parts doing their part, and one part fell through:
Only four years into the SC program, when Schwartz was about to terminate the IntelliCorp and Teknowledge contracts, expectations for expert systems were already being scaled back. By the time that Hayes-Roth revised his article for the 1992 edition of the Encyclopedia, the picture was still more bleak. There he made no predictions at all about program speeds. Instead he noted that rule-based systems still lacked “a precise analytical foundation for the problems solvable by RBSs . . . and a theory of knowledge organization that would enable RBSs to be scaled up without loss of intelligibility of performance.” SC contractors in other fields, especially applications, had to rely on custom-developed software of considerably less power and versatility than those envisioned when contracts were made with IntelliCorp and Teknowledge. Instead of a generic expert system, SC applications relied increasingly on , a change in terminology that reflected the direction in which the entire field was moving. This is strikingly similar to the pessimistic evaluation Schwartz had made in 1987. It was not just that IntelliCorp and Teknowledge had failed; it was that the enterprise was impossible at current levels of experience and understanding…Does this mean that AI has finally migrated out of the laboratory and into the marketplace? That depends on one’s perspective. In 1994 the U.S. Department of Commerce estimated the global market for AI systems to be about $900 million [roughly $1,862 million inflation-adjusted from 1994], with North America accounting for two-thirds of that total. Michael Schrage, of the Sloan School’s Center for Coordination Science at MIT, concluded in the same year that “AI is, dollar for dollar, probably the best software development investment that smart companies have made.” Frederick Hayes-Roth, in a wide-ranging and candid assessment, insisted that “KBS have attained a permanent and secure role in industry”, even while admitting the many shortcomings of this technology. 
Those shortcomings weighed heavily on AI authority Daniel Crevier, who concluded that “the expert systems flaunted in the early and mid-1980s could not operate as well as the experts who supplied them with knowledge. To true human experts, they amounted to little more than sophisticated reminding lists.” Even Edward Feigenbaum, the father of expert systems, has conceded that the products of the first generation have proven narrow, brittle, and isolated. As far as the SC agenda is concerned, Hayes-Roth’s 1993 opinion is devastating: “The current generation of expert and KBS technologies had no hope of producing a robust and general human-like intelligence.” …Each new [ALV] feature and capability brought with it a host of unanticipated problems. A new panning system, installed in early 1986 to permit the camera to turn as the road curved, unexpectedly caused the vehicle to veer back and forth until it ran off the road altogether. The software glitch was soon fixed, but the panning system had to be scrapped anyway; the heavy, 40-pound camera stripped the device’s gears whenever the vehicle made a sudden stop. Given such unanticipated difficulties and delays, Martin increasingly directed its efforts toward achieving just the specific capabilities required by the milestones, at the expense of developing more general capabilities. One of the lessons of the first demonstration, according to the ALV engineers, was the importance of defining , because “too much time was wasted doing things not appropriate to proof of concept.” Martin’s selection of technology was conservative. It had to be, as the ALV program could afford neither the lost time nor the bad publicity that a major failure would bring. One BDM observer expressed concern that the pressure of the demonstrations was encouraging Martin to cut corners, for instance by using the “flat earth” algorithm with its two-dimensional representation. 
ADS’s obstacle-avoidance algorithm was so narrowly focused that the company was unable to test it in a parking lot; it worked only on roads.…The vision system proved highly sensitive to environmental conditions—the quality of light, the location of the sun, shadows, and so on. The system worked differently from month to month, day to day, and even test to test. Sometimes it could accurately locate the edge of the road, sometimes not. The system reliably distinguished the pavement of the road from the dirt on the shoulders, but it was fooled by dirt that was tracked onto the roadway by heavy vehicles maneuvering around the ALV. In the fall, the sun, now lower in the sky, reflected brilliantly off the myriads of polished pebbles in the tarmac itself, producing glittering reflections that confused the vehicle. Shadows from trees presented problems, as did asphalt patches from the frequent road repairs made necessary by the harsh Colorado weather and the constant pounding of the eight-ton vehicle. …Knowledge-based systems in particular were difficult to apply outside the environment for which they had been developed. A vision system developed for autonomous navigation, for example, probably would not prove effective for an automated manufacturing assembly line. “There’s no single universal mechanism for problem solving”, Amarel would later say, “but depending on what you know about a problem, and how you represent what you know about the problem, you may use one of a number of appropriate mechanisms.”…In another major shift in emphasis, SC2 removed from its own plateau on the pyramid, subsuming it under the general heading . This seemingly minor shift in nomenclature signaled a profound reconceptualization of AI, both within DARPA and throughout much of the computer community. The effervescent optimism of the early 1980s gave way to more sober appraisal. AI did not scale. 
In spite of impressive achievements in some fields, designers could not make systems work at a level of complexity approaching human intelligence. Machines excelled at data storage and retrieval; they lagged in judgment, learning, and complex pattern recognition. …During SC, AI had proved unable to exploit the powerful machines developed in SC’s architectures program to achieve Kahn’s generic capability in machine intelligence. On the fine-grained level, AI, including many developments from the SC program, is ubiquitous in modern life. It inhabits everything from automobiles and consumer electronics to medical devices and instruments of the fine arts. Ironically, AI now performs miracles unimagined when SC began, though it can’t do what SC promised.
Given how people keep reaching back to the AI Winter in discussions of connectionism (I mean, deep learning), it’s interesting to contrast the two paradigms. Deep learning long ago escaped into the commercial market and is at this point driven primarily by industry researchers. The case studies are innumerable (and many are secret due to their considerable commercial value). DL handles grounding problems & raw sensory data well, and indeed struggles most on problems with richly formalized structures like hierarchies/categories/directed graphs (ML practitioners currently tend to use decision tree methods like XGBoost for those), or which require using rules & logical reasoning (somewhat like humans). Perhaps most importantly from the perspective of SCI and HPC, deep learning scales: it parallelizes in a number of ways, and it can soak up indefinite amounts of computing power & data. You can usefully train a CNN on a few hundred or thousand images, but Facebook & Google have run experiments scaling from millions of images up to hundreds of millions and billions (eg Gao et al 2017, Gross et al 2017, Sun et al 2017, Mahajan et al 2018, Laanait et al 2019, Anonymous et al 2019), and the CNNs steadily improve their performance both on their assigned task and in transfer learning. Similarly in reinforcement learning, the richer the resources available, the richer a NN can be trained (see the OpenAI chart; consider how deep AlphaGo Zero’s NN is compared to the original AlphaGo’s, or Ape-X or Impala learning many ALE games simultaneously, or, just the other day, OpenAI’s 5v5 Dota 2 progress via essentially brute force). Even self-driving car programs which are a byword for incompetence deal just fine with all the issues that bedeviled ALV by using, well, ‘a single universal mechanism for problem solving’ (which we call CNNs, which can do anything from image segmentation to human language translation). 
These points are all the more striking as there is no sign that hardware improvements are over or that any inherent limits have been hit; even the large-scale experiments criticized as ‘boil the oceans’ projects nevertheless spend what are trivial amounts of money by both global economic and R&D criteria, like a few million dollars of GPU time. But none of this could have been done in the 1980s, or early 1990s. (As Hinton says, why didn’t connectionism work back then? Because the computers were thousands of times too slow, the datasets were thousands of times too small, and some of the neural network details like initializations & activations were broken.)
Considering all this, it’s not a surprise that the AI part of SC didn’t pan out and eventually got axed, as it should have. Sometimes the time is not ripe. Hero can invent the steam engine, but you don’t get steam engine trains until it’s steam engine train time, and the best intentions of all the bureaucrats in the world can’t affect that much. The turnover in managers and political interference may well have been enough to “disrupt the careful orchestration that its ambitious agenda required”, but this was more in the nature of shooting a dead horse. R&S seem, somewhat reluctantly, to ultimately assent to the view they critiqued at the beginning, held by the ARPA staff: that the failure of SC is more a demonstration of technological determinism than of social & political contingency, and more about the technology than the people:
…Thus, for all of their agency, their story appears to be one driven by the technology. If they were unable to socially construct this technology, to maintain agency over technological choice, does it then follow that some technological imperative shaped the SC trajectory, diverting it in the end from machine intelligence to high performance computing? Institutionally, SC is best understood as an analog of the development programs for the Polaris and Atlas ballistic missiles. An elaborate structure was created to sell the program, but in practice the plan bore little resemblance to day-to-day operations. Conceptually, SC is best understood by mixing Thomas Hughes’s framework of large-scale technological systems with Giovanni Dosi’s notions of research trajectories. Its experience does not quite map on Hughes’s model because the managers could not or would not bring their reverse salients on line. It does not quite map on Dosi because the managers regularly dealt with more trajectories and more variables than Dosi anticipates in his analyses. In essence, the managers of SC were trying to research and develop a complex technological system. They succeeded in developing some components; they failed to connect them in a system. The overall program history suggests that at this level of basic or fundamental research it is best to aim for a broad range of capabilities within the technology base and leave integration to others…While the Fifth Generation program contributed significantly to Japan’s national infrastructure in computer technology, it did not vault that country past the United States…SC played an important role, but even some SC supporters have noted that the Japanese were in any event headed on the wrong trajectory even before the United States mobilized itself to meet their challenge. …In some ways the varying records of the SC applications shed light on the program models advanced by Kahn and Cooper at the outset. 
Cooper believed that the applications would pull technology development; Kahn believed that the evolving technology base would reveal what applications were possible. Kahn’s appraisal looks more realistic in retrospect. It is clear that expert systems enjoyed significant success in planning applications. This made possible applications ranging from Naval Battle Management to DART. Vision did not make comparable progress, thus precluding achievement of the ambitious goals set for the ALV. Once again, the program went where the technology allowed. Some reverse salients resisted efforts to orchestrate advance of the entire field in concert. If one component in a system did not connect, the system did not connect. In the final analysis, SC failed for want of connection.
Reading about SC furnishes an unexpected lesson about the importance of believing in Moore’s Law and having techniques which can scale. What are we doing now which won’t scale, and what waves are we paddling up instead of surfing?
Source: Gwern.net