Progressive Gauge

Media and Research


Jack Vaughan

Old Big Data Today – Or the clarion of shiny new thingness

October 21, 2023 By Jack Vaughan

LLMs and Generative AI are the next steps forward for machine learning, the not-so-little engine that saved AI from the horror of technological irrelevance. Here, I note some similarities between today’s AI and yesterday’s Big Data – followed in a subsequent post by some observations from Andrew Ng, a machine learning pioneer, looking ahead to cost-effective use cases for the new tooling.

Similarity breeds comparison

A jaundiced view might hold that Generative AI has taken over where Big Data left off. All the rage a few years ago, Big Data has since fled the scene. Does it appear anywhere now on a Gartner representation of the life of hype?

Big Data leaders, after they redefined recommendation engines and  social media personalization, were often asked what Big Data was supposed to do next. The answer turned out to be “machine learning.” Flash forward to the present, and this has morphed into Large Language Models (LLMs) and prompt engineering.

There are plenty of differences between then and now. Let’s dwell on some similarities:

*As in the Big Data/Hadoop days of yesteryear, getting great gobs of custom data into the LLM is time consuming, labor intensive and error prone.

*The shiny new thingness may lure developers to chase the technology (which makes the resume sparkle) while short-changing the use case; that is, pursuing indefensible applications with short and less-than-stellar commercial life spans.

*And, as with Big Data – and just about every innovation that has ever come about – what works as a prototype may fail to scale in production. As well, what worked for a small army of Google sysadmins may not work for you, or prove saleable either.

*The first tooling is raw, and development can become a trudge of semi-blind trial and error.

*There is a megaton bomb of hyperbole that explodes, followed by hemming and hawing, nitpicking and numb lethargy. See ‘Faded Love and Hadoop’.

These problems are familiar to innovators, but LLMs bring new classes of problems too. What some developers will find persistently annoying is a flakiness in interaction with the LLM. You can prompt it repeatedly with the same input and get different output back. I asked Google Bard about this, and the answer was: “Overall, whether or not prompt engineering is fun is up to you.”
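For the code-curious, here is a minimal sketch of why the same prompt can come back with different answers. It assumes nothing more than the temperature-scaled token sampling most LLM services expose; the tiny “vocabulary” and scores below are invented purely for illustration.

    import numpy as np

    # Toy next-token scores: the model slightly prefers "A" over "B" and "C".
    logits = np.array([2.0, 1.6, 0.4])
    tokens = ["A", "B", "C"]

    def sample(temperature=0.8):
        scaled = logits / temperature                # lower temperature -> sharper distribution
        probs = np.exp(scaled) / np.exp(scaled).sum()
        return np.random.default_rng().choice(tokens, p=probs)

    # The "same prompt" (same scores) still yields different picks, run to run.
    print([sample() for _ in range(5)])              # e.g. ['A', 'B', 'A', 'A', 'C']
    print([sample(0.01) for _ in range(5)])          # near-greedy: almost always 'A'

Turn the temperature down toward zero and the output becomes nearly deterministic, which is the knob many prompt engineers end up reaching for.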

Of course, a great effort is underway, and development teams will soon benefit from both the successes achieved and failures endured. Among the questions that should direct their efforts: Does the technology solve a widely-found problem of significant weight? In our next post, let’s find out what Andrew Ng says! – Jack Vaughan

THIS IS PART 1 OF 2. FOR PART 2, GO TO “Use cases ultimately pave Generative AI’s path: Face it!”

 

Nvidia and Cloudflare Deal; NYT on Human Feedback Loop

September 29, 2023 By Jack Vaughan

Cloudflare Powers Hyper-local AI inference with Nvidia – The triumph that is Nvidia these days can’t be overstated – although the wolves on Wall St. have sometimes tried. Still, Nvidia is a hardware company. Ok, let’s say Nvidia is still arguably a hardware company. Its chops are considerable, but essentially it’s all about the GPU.

Nvidia is ready to take on the very top ranks in computing. But, to do so, it needs more feet on the street. So, it is on the trail of such, as seen in a steady stream of alliances, partners and showcase customers.

That’s a backdrop to this week’s announcement that Cloudflare Inc. will deploy Nvidia GPUs along with Nvidia Ethernet switch ASICs at the Internet’s edge. The purpose is to enable AI inferencing, which is the runtime task that follows AI model training.

“AI inference on a network is going to be the sweet spot for many businesses,” Matthew Prince, CEO and co-founder, Cloudflare, said in a company release concerning the Cloudflare/Nvidia deal. Cloudflare said NVIDIA GPUs will be available for inference tasks in over 100 cities (or hyper-localities) by the end of 2023, and “nearly everywhere Cloudflare’s network extends by the end of 2024.”

Cloudflare has found expansive use in web application acceleration, and could help Nvidia in its efforts to capitalize on GPU technology’s use in the amber fields of next-wave generative AI applications.

With such alliances, all Nvidia has to do is keep punching out those GPUs – and development tools for model building.

***  ***  ***  ***

NYT on Human Feedback – The dirty little secret in the rise of machine learning was labeling. Labeling can be human-labor intensive, time-consuming and expensive. It harkens back to the days when ‘computer’ was a human job title, and filing index cards was a way of life.

Amazon’s Mechanical Turk – a crowdsourcing marketplace amusedly named after the 18th Century chess-playing machine “automaton” that was actually powered by a chess master hidden inside the apparatus — is still a very common way to label machine learning data.

Labeling doesn’t go away as Generative AI happens. As the world delves into what Generative AI is, it turns out that human labelers are a pretty significant part of it.

That was borne out by some of the research I did in the summer for “LLMs, generative AI loom large for MLOps practices” for SDxCentral.com. Sources for the story also discussed how “reinforcement learning from human feedback” was needed for the Large Language Models underpinning Generative AI.

The cost of reinforcement learning, which keeps model output in line with what people expect, is more than a small part of the sticker shock C-suite execs are experiencing with Generative AI.

Like everything else, the process may improve. Sources suggest retrieval augmented generation (RAG) is generally less labor intensive than data labeling. RAG retrieves relevant information from an external database and provides it to the model “as-is.”

RAG is meant to address one of ChatGPT’s and Generative AI’s most disturbing traits: It can make a false claim with amazingly smug confidence. Humans have to keep a check on it.
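For the mechanically curious, here is a minimal sketch of that retrieve-then-prompt loop. It is only an illustration, not any particular vendor’s pipeline: a toy TF-IDF similarity search stands in for a production vector store, and generate() is a hypothetical placeholder for whatever LLM call an application actually makes.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    # A toy "external database" of passages the model never saw in training.
    docs = [
        "Hadoop is an open-source framework for distributed storage and batch processing.",
        "Reinforcement learning from human feedback aligns model output with human preferences.",
        "Retrieval augmented generation supplies external facts to a model at query time.",
    ]
    question = "What does retrieval augmented generation do?"

    # Score each passage against the question and keep the closest match.
    vectorizer = TfidfVectorizer().fit(docs + [question])
    best = cosine_similarity(vectorizer.transform([question]),
                             vectorizer.transform(docs)).argmax()

    # The retrieved passage is handed to the model "as-is" inside the prompt.
    prompt = f"Answer using only this context:\n{docs[best]}\n\nQuestion: {question}"
    # answer = generate(prompt)   # hypothetical LLM call
    print(prompt)

The grounding text travels with the question, so the model has less room to improvise a confident-sounding answer out of thin air.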

But the build out of RAG requires some super smarts.  As we have come to see, many of today’s headline AI acquisitions are as much about gaining personnel with advanced know-how as they are about gaining some software code or tool. This type of smarts comes at a high price, just as the world’s most powerful GPUs do.

This train of thought is impelled by a recent piece by Cade Metz for the New York Times. “The Secret Ingredient of ChatGPT Is Human Advice” considers reinforcement learning from human feedback, which is said to drive much of the development of artificial intelligence across the industry. “More than any other advance, it has transformed chatbots from a curiosity into mainstream technology,” Metz writes of human feedback aligned with Generative AI.

Metz’s capable piece discusses the role that expert humans are playing in making Generative AI work, with some implication that we should get non-experts involved, too. In response to the story, one Twitter wag suggested that Expert Systems are making a comeback. If so, guess we will have to make do with more expert humans until “The Real AI” comes along! – Jack Vaughan

 

Oppenheimer at Harvard

August 28, 2023 By Jack Vaughan

Looking back today at “American Prometheus” – the book upon which this summer’s widely noted Oppenheimer film is based. I recall fashioning a mini tour/book review covering Oppenheimer’s Cambridge when I originally read Kai Bird and Martin J. Sherwin’s 2005 book.

Yes, it’s still summer, so I am sharing it here! With some editing. Editing is ever with us.

Oppenheimer’s tragedy truly is an American tragedy, and it is too little known. Worth noting: The creation of the Atom bomb is the ultimate tale of science and technology for bad and good. It redefined life for the generations that followed.

At the start, in his college days, the leader of the team of scientists that created the first A-bomb was a delicate mesh of scientist and poet. In the end, he was a heart-broken figure, done in by his lethal invention, and by his soft spot for arty friends who, steeped in the ethos of their times, promoted liberal and communist causes.

Oppenheimer did not have his roots in Boston, but he did pass through here, like so many others. It was in the air in Boston/Cambridge as much as anywhere: the mechanical, fluid and electronic sounds of a military-industrial complex based on the scientific breakthroughs and technical innovations of the mid-20th Century.

The son of a wealthy West Side New York clothier, Oppenheimer refused the fellowship Harvard offered him when he entered the university in 1922. Oppenheimer began his Harvard days as a chemistry student.

The chemist had been the epitome of the scientist – but that was changing just as he was entering college. He was not looking for a lucrative career. Oppenheimer worried his future would be that of an industrial chemist, testing toothpastes. But physics was uncovering wonder after wonder. He read prodigiously. His tenure at Harvard preceded construction of the Mallinckrodt Lab, so his chemistry studies were likely in the basement of University Hall in Harvard Yard.

He looked to take as many advanced physics classes as he possibly could. He didn’t have the basic courses. But he read five science books a week. And he was picking physics texts unknown to the typical student. The American Prometheus authors report that one physics professor, reviewing Oppy’s petition [replete with a list of texts he’d read] to take graduate classes, remarked: “Obviously, if he says he’s read these books, he’s a liar, but he should get a Ph.D. for knowing their titles.” He was brash and precocious.

The famous figures of science and math [in which Oppenheimer thought himself deficient] passed through Harvard’s gates. Oppenheimer attended lectures by Whitehead and Bohr. Still, he nurtured a love for literature. He was a great polymath. He read The Waste Land, and wrote poetry of sadness and loneliness. He edited a school literary journal known as The Gad-Fly [under the auspices of the Liberal Club at 66 Winthrop St]. After Harvard, he discovered Proust.

He kept much to himself. Had but a few friends. “His diet often consisted of little more than chocolate, beer and artichokes. Lunch was often just a ‘black and tan’ – a piece of toast slathered with peanut butter and topped with chocolate syrup.” When he lived in Cambridge, like so many other great scientific thinkers in so many places, he took to long walks. He lived for a while at 60 Mount Auburn Street.

A mentor at Harvard could well have been future Nobelist Percy Bridgman. Oppenheimer admired a strain in this physicist, who was noted for his studies of materials at high temperatures and pressures and for his openness to imaginatively approaching the philosophy of science.

“Oppy’s” outsider status at Harvard could be laid to his sensitivity, but just as significant if not more so was his Jewish heritage. He came to the school at a time when its head was considering a quota system to reduce the growing number of Jewish entrants. Surely, the straight road to Harvard success was not fully open to him, even if that is what he’d desired. He was offered a graduate teaching position but turned it down.

Oppenheimer graduated from Harvard in three years. He wrote a friend: “Even in the last stages of senile aphasia I will not say that education, in an academic sense, was only secondary when I was at college. I plough through about five or ten big scientific books a week, and pretend to research. Even if, in the end, I’ve got to satisfy myself with testing toothpaste, I don’t want to know it till it has happened.”

From Harvard he went on to study at Göttingen in Germany, Thomson’s famed Cavendish Lab, Caltech, Berkeley, and, after the War, Princeton. Surely the Jewish Ethical Culture School he attended as a lad, which had a summer school adjunct in New Mexico, and the mesas of New Mexico, where he placed the crucial workings of the Manhattan Project, were most formative.

He and his friends skipped the Harvard commencement to drink lab alcohol in a dorm room. He had one drink and retired.

Other books on this topic worth noting include “Now It Can Be Told: The Story of the Manhattan Project” by Leslie Groves and, most particularly, “The Making of the Atomic Bomb” by Richard Rhodes.

Large models cooling

July 16, 2023 By Jack Vaughan

Molecular Sampler – The week just passed brought news of a combined MIT/IBM team suggesting a less compute-intensive route to AI-driven materials science.  The group said it used a subset of a larger data pool to predict molecular properties. The use case has gained attention in both ML and quantum computing circles – where a drive to speed material development and drug discovery could lead to cost savings, better health outcomes and yet-to-be-imagined innovations.

Like most AI advances of late, the work gains inspiration from NLP techniques. The methods used to predict molecular properties tap into “grammar rule production,” which by now has a long lineage. The number of ways to combine atoms runs to a 1 followed by 100 zeros, which is to say grammar rule production for materials is a big job, and that style of computation is daunting and may not be immediately exploited.

Because the grammar rule production process is too difficult even for large-scale modern computing, the research team put its efforts into preparatory paring of data, a short-cut technique that goes back to the beginning of time. Some notes from the MIT information office:

“In language theory, one generates words, sentences, or paragraphs based on a set of grammar rules. You can think of a molecular grammar the same way. It is a set of production rules that dictate how to generate molecules or polymers by combining atoms and substructures.

“The MIT team created a machine-learning system that automatically learns the ‘language’ of molecules — what is known as a molecular grammar — using only a small, domain-specific dataset. It uses this grammar to construct viable molecules and predict their properties.”

As I read it, the MIT-IBM team has come up with a simulation sampler approach. The ‘smaller corpus’ approach is much explored these days as implementers try to take some of the ‘Large’ out of Large Language Models. One may always wonder whether such synthesis ultimately can yield true results. I trust an army better qualified will dig into the details of the sampling technique used here over the weekend.
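To make the “grammar rules generate molecules” idea concrete, here is a toy production-rule sketch. It is not the MIT-IBM grammar – their system learns its rules from a small, domain-specific dataset – just an illustration of how a handful of hand-written rules can expand a start symbol into candidate structure strings.

    import random

    # Hand-written toy rules; each symbol expands into substructures or further symbols.
    RULES = {
        "CHAIN": [["UNIT"], ["UNIT", "CHAIN"]],   # a chain is one unit, or a unit plus another chain
        "UNIT":  [["C"], ["O"], ["C", "RING"]],   # a unit is an atom, or an atom with a ring attached
        "RING":  [["(c1ccccc1)"]],                # a fixed benzene-like substructure
    }

    def generate(symbol="CHAIN"):
        if symbol not in RULES:                   # terminal substructure: emit as-is
            return symbol
        expansion = random.choice(RULES[symbol])
        return "".join(generate(s) for s in expansion)

    # Each call derives a different candidate string from the same small rule set.
    for _ in range(5):
        print(generate())

A learned grammar would also have to score which expansions are chemically plausible and tie generated structures to predicted properties; the point here is only the production-rule mechanics.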

***  ***  ***  ***

ChatGPT damper – The signs continue to point to a welcome damper on ChatGPT (AI) boosterism – now that each deadline journalist in the world has asked the bot to write up a global heatwave story or Met red-carpet opening story in the style of Hemingway or Mailer or another.

Among the signals of cooling:

*There’s investor Adam Coons. The chief portfolio manager at Winthrop Capital Management said the AI trade on Wall Street will continue but then fade as a hot button.

For a stock market that has endorsed mega-cap growth stocks for their ChatGPT chops, it has become a FOMO trade. “In the near term that trade will continue to work. There’s enough investors still willing to chase that narrative,” he told Reuters. On the other hand, Coons and Winthrop Capital are cautious on it, as the hyperbole has obscured the true potential. He said:

“We are moving away from the AI narrative. We think that there’s still too much to be shown. Particularly [with] Nvidia, we think the growth figures that are being priced into that stock just don’t make sense. And there’s just really not enough proof statements from a monetization standpoint behind what AI can really do within the tech sector.”

*There’s Pinecone COO Bob Wiederhold speaking at VB Transform – Pinecone is at the forefront of the surging vector databases that appear to have a special place in formative LLM applications. Still, Wiederhold sees a need for a realistic approach to commercializing the phenomenon.

His comments as described by Matt Marshall on VentureBeat:

Wiederhold acknowledged that the generative AI market is going through a hype cycle and that it will soon hit a “trough of reality” as developers move on from prototyping applications that have no ability to go into production. He said this is a good thing for the industry as it will separate the real production-ready, impactful applications from the “fluff” of prototyped applications that currently make up the majority of experimentation.

*There’s Rob Hirschfeld’s commentary “Are LLMs Leading DevOps Into a Tech Debt Trap?” on DevOps.com – Hirschfeld is concerned with the technical debt generative AI LLMs could heap onto today’s DevOps crews, which are already awash in quickly built, inefficiently engineered Patch Hell. Code generation is often the second-cited LLM use case (after direct mail and press releases).

Figuring out an original developer’s intent has always been the cursed task of those who maintain our innovations – but LLMs have the potential to bring on a new mass of mute code fragments contrived from LLM web whacks. Things could go from worse to worser, all the rosy pictures of no-code LLM case studies notwithstanding. Hirschfeld, who is CEO at infrastructure consultancy RackN, writes:

Since they are unbounded, they will cheerfully use the knowledge to churn out terabytes of functionally correct but bespoke code…It’s easy to imagine a future where LLMs crank out DevOps scripts 10x faster. We will be supercharging our ability to produce complex, untested automation at a pace never seen before! On the surface, this seems like a huge productivity boost because we (mistakenly) see our job as focused on producing scripts instead of working systems…But we already have an overabundance of duplicated and difficult-to-support automation. This ever-expanding surface of technical debt is one of the major reasons that ITOps teams are mired in complexity and are forever underwater.

News is about sudden change. Generative AI, ChatGPT and LLMs brought that in spades. It is all a breathless rush right now, and analysis can wait. But the limelight on generative AI is slightly dimmed. That is good, because what is real will be easier to see. Importantly, reporters and others are now asking those probing follow-up questions like: “How much, how soon?”

It’s almost enough to draw an old-time skeptical examiner into the fray. – Jack Vaughan

 

Adage

“Future users of large data banks must be protected from having to know how the data is organized in the machine…” – E.F. Codd, “A Relational Model of Data for Large Shared Data Banks”

Noted Passing: Henry Petroski, technology historian who studied failures in engineering

July 4, 2023 By Jack Vaughan

Henry Petroski’s early focus was on the ideas and experience of civil engineering, but surely he became influential to all types of engineers over a long public career. He died June 14 in Durham, N.C. at 81.

As a Duke University professor, Petroski looked closely at the art and science of engineering, and I think he came up with some very meaningful conclusions. Studying the history of failures in rockets, buildings, bridges and the like was his special pursuit. His books include “To Engineer is Human,” “The Evolution of Useful Things,” and “The Pencil.”

My take-away from seeing him lecture and appear on TV, and from reading his books and Scientific American articles was this:

Styles of engineering come into use, formulated by individuals who learn first principles from (often painful) failures. Then the style becomes taken for granted. Successive generations of engineers push at the boundaries of the basic style, and mistakes are made that are sometimes deadly.

Among the object lessons in engineering failures Petroski would often cite were the failure of the elevated skywalks at a Kansas City Hyatt Regency hotel, the collapse of the Tacoma Narrows Bridge in Washington State, the collapse of the Twin Towers of the World Trade Center in New York as the result of deliberate terrorist air crashes, and the loss of two NASA space shuttles.

His writing could take the form of excessively fine-grained pedantry – I couldn’t forge all the way through “The Pencil.” Still, thanks to Petroski I did learn that the pencil’s history was much about finding the right combination of graphite and clay – and I continue to study the pencils I sharpen with particular attention.

It was interesting to learn from the New York Times obituary of Petroski’s childhood recollection: making towers and bridges out of pantry cans and boxes. Guess he caught the analytical bug early!

I had the opportunity to interview Petroski very briefly after he spoke to a hall of software engineers at the OOPSLA Conference in Tampa in 2001. This was just a few weeks after 9-11, when the airways had just reopened, and it was a pretty tense time for travel. I recall that he was open to our questions. The assembled object-oriented programming crowd was enthralled, and the questions in the scrum after his speech were ones without clear answers at that time. Engineers ask questions, and wonder, especially about catastrophic events.

I can’t find my raw notes on that long-ago interview, but I will mark here that, when the new World Trade Center was built, there was far more use of concrete, which is less pervious to conflagration.

I did find a write-up I did for Application Development Trends on Petroski at OOPSLA. I will include a bit of that and a link. I tried to draw engineering principles from his studies that might apply to software architecture issues of the day, though that bit sounds weak. There were things to be afeared of in the burgeoning architecture of web services – but not the ones I imagined/predicted in 2001.

I can’t read this piece without thinking of my indebtedness to the great crew at ADT, led by the late Mike Bucken, who gave me so many opportunities to damn the torpedoes and get something interesting out there to our readers. The list of editors that would let me run the headline “There’s No Success Like Failure” is pretty short. – Jack Vaughan

 

From “There’s No Success Like Failure” on adtmag.com

If you interviewed a system designer who admitted to his or her list of failures in design, you would probably begin plotting ways to end the meeting and get to the next job candidate, wouldn’t you? You probably wouldn’t consider hiring the person.

 

An obsession with failure could be a problem, but a modicum of fear of failure—a respect for the phenomena that can undo a design—may be healthy in a designer or developer. Maybe you should hear out a job candidate who is capable of analytically discussing a failed project or two.

 

If you are wary of this advice, I don’t blame you, but you might be more inclined to follow it if you were to hear from Henry Petroski. This Duke University professor of history and civil engineering spoke at last fall’s OOPSLA Conference in Tampa, Fla. In a kick-off keynote address, Petroski discussed success and failure in design throughout history, concluding that there is a unique interrelationship between the two.

 

“All materials are flexible if slender enough,” asserted Petroski, who noted that bridge designers tend to go toward the sleek and aesthetic as they get further away in time from the first principles. The Tacoma Narrows Bridge breakup of 1940 stands—well it doesn’t exactly stand, it fell—as a testament to Petroski’s assertion.

Scholar Petroski took his OOPSLA audience back to ancient Rome to make his point. He discussed Vitruvius. That author of key architecture texts went to great lengths to consider failures of stone-and-axle variations (that’s how they moved pillars) of the day. Vitruvius suggested that following a successful design to an ultimate conclusion is not the way to proceed.

The big ships of the era of European exploration, many of which failed, came in for consideration. “Ships made of wood were scaled up, every dimension doubled,” said Petroski. “At a certain size, they would break in two.”

Petroski noted that nature does not design this way (blindly scaling up); the leg bones of large and small animals are not exactly proportional. Cable-stayed bridges are now the rage in bridge design, noted Petroski. Their design is becoming increasingly ambitious, he added, and some failure may be in store.

“Failures in bridge style seem to repeat in 30-year [intervals],” said Petroski. “Engineers are ambitious. Everyone wants to build the largest bridge in the world. Cable stay bridges are exhibiting problems.”

Whenever the envelope is pushed, he indicated, “there is opportunity for phenomena to manifest that were not obvious in the small.”

Failure could be generational, said Petroski; when engineers start to work with new design paradigms they take great care. Then as things get familiar, they forget about the fundamentals and they push, sometimes beyond the real design limits.

RELATED
Petroski speaks – YouTube
Obituary – New York Times
Read the rest of “There’s No Success Like Failure” – ADTmag.com Jan 2002.

