Truth Goggles: the Enlightenment Dream of Automated Fact Checking

Comment submitted to the TechCrunch article “True Or False? Automatic Fact-Checking Coming To The Web – Complications Follow” by Devin Coldewey, 11/28/2011.

> “the layering of reference and context onto the information you read”.

This exists generally in the well-tested paradigm of citation and reputation, as it functions, for example, in peer-reviewed literature. It seems that Daniel Schultz’s “truth goggles” could be seen as a particular version of this, in which the annotation of the base layer is automated rather than authored, and the citation framework is specifically the fact-check databases Politifact and NewsTrust (for now).

If the citation framework were generalized to allow many annotators and reference sources, then I believe we’d be close to the model of the http://Hypothes.is project.

Pure algorithmic assessment of “fact,” reasoning, and valid judgment is at minimum an extremely complex, long-term problem, and is quite possibly unsolvable in some respects. In a human, distributed trust system such as present-day peer review, we trust that communicators are incented by reputation to uphold agreed-upon standards of evidence and judgment. Writers, journal editors, research funders, research institutions, etc. collectively build a system which, ideally, systematically rewards adherence to shared objective standards and ethics. In this model, we don’t necessarily have to understand how each link in the system performs its complex evaluations; we rely on the fact that the participants are well incented to do them correctly, and are sufficiently cross-monitored to be trusted.

Regardless of peer-review mechanism, we face thorny questions of what constitutes “true” or “factual,” and how people are affected by information. Coldewey says “facts are facts and fiction is fiction,” and I keep hearing versions of this in discussions of fact-checking systems and civic media; but to me it is a rather vast and optimistic supposition. What theories of language, of propaganda, of politics, of media effects, of cognitive science support the view that people become truthful, and rationally deliberate together, if we just put more “factually” true “information” out there? The view seems based more on traditional Enlightenment faith than on hard evidence of how communication works.

I would like to see more media analytics and reading environments based on empirical evidence and cognitive-science models of what actually causes what effect on readers, and I am doing some work in that direction.

Anyway, I think Schultz’s work is interesting and valuable, especially the distributed / API aspect, and I’m glad to see it covered and to see the rapidly developing conversation around these issues.

Tim McCormick

http://tjm.org

follow me on Twitter: @mccormicktim

Image credit:  TechCrunch

Brainpickings: more about curation vs parasitism

further comment posted on Maria Popova’s post on Brainpickings.org: “Free Ride: Digital Parasites and the Fight for the Business of Culture”:

“Maria, I agree with you fully that there are dubious practices out there regarding online content, which may endanger the creation and curation of the culture we want. Also, there are interesting new practices and protocols emerging for curating, etc., and I would like to contribute to fostering and protecting such wonderful things as Brainpickings/Brainpicker.

However, I think that distinguishing good curation from parasitic/unethical/illegal practice is quite subtle and complex, as shown by the jurisprudence over aggregation, and by the Romenesko case. It is to help work out these issues that I offer observations about your practices vs. Huffington Post’s.


As a side note, I am personally especially interested in issues of algorithmic curation: recommender systems, design for serendipity, applying cognitive science to reading environments, etc. In many cases, such systems aggregate and operate upon creative or curatorial work done by many people, so they may raise tricky questions of whom to credit for a discovery, and how.

Eli Pariser (of “The Filter Bubble”) recently argued at MassHumanities that we need some things to be human-curated because machines can’t deliver the type of serendipity and discovery we need. I’m not convinced one should essentialize and separate these two dimensions: humans can be mechanical (witness most newspapers and newspaper articles) and algorithms can deliver serendipity and surprise. What we really have, now and increasingly, is cyborg curation, i.e. blended human and algorithmic work. Consider a search engine: algorithmic, but based on large-scale harvesting of human curatorial intentionality in the form of links and content. Tools like Google Reader and Twitter dramatically expand my ability to receive human-curated and human-created work from hundreds of diverse sources, efficiently and egalitarianly.

Notice for the 1968 presentation by Douglas Engelbart on “augmenting human intellect,” referred to as the “mother of all demos”

The real question is how to build systems that serve our needs (including the incenting of creation and curation, not just the end-user experience). I think one good way to frame the goal is that we are “augmenting human intellect,” as Engelbart put it in 1962.

As I see it, one reason there will be a large algorithmic component to future “curation” is that, from an end-user’s point of view, relevance, serendipity, and value are individual, and thus greatly enhanceable by personalization. Economically, human curators can’t do personalized curation for every end user, so machines will play a big role there.

. . . .

Anyway, back to the issue of ethical/legal distinctions between curation and parasitism. I read the helpful paper and article by Kimberly Isbell on aggregation legal issues and best practices. Applying this to your discussion of Free Ride and parasitism, correct me if I’m wrong, but it seems you focus on two main means of distinction:

1) crediting
2) commercial use

So to take 1), crediting:

> without crediting sources of discovery…it’s anywhere between
> unethical and downright illegal

I just observe that the overwhelming norm, across media, is that people don’t credit their immediate discovery source. Some books may have thorough acknowledgements, and academic work may cite the works, workshops, or conversations which led to ideas, but these seem to be exceptions. If I look at most articles in magazines, online or off, or at blogs, etc., I don’t think it’s common for each element’s source to be credited. On Twitter, which is perhaps the emerging super-discovery platform, there’s barely room to credit, and the difficulty is suggested by the fact that of the 85 most recent @brainpicker tweets I looked at yesterday at noon, I counted only 5 with in-tweet credit (RT, via, HT, etc.).

I think there are many factors that incline people not to identify discovery sources, not just lack of ethics: the source may be considered irrelevant, edited out for space, or thought undesirably revealing of sources or journalistic methods; the discovery may have been algorithmic and not clearly creditable; etc.

Legally, I don’t see strong precedent for requiring disclosure of sources: as far as I can tell, the law in this area, such as around copyright and hot news, concerns reuse of material, and doesn’t address sources of discovery. Topics and facts are not copyrightable, and practically, it may be very difficult to prove where a media source discovered any given item. You suggest that HuffPo’s discovery of the AAS item from a source other than you is “statistically” unlikely, but that sounds like it would unfortunately be a difficult case to make, legally or otherwise. Would I want arbitrary sources out there judging my blog posts or tweets as unethical or illegal based on the statistical likelihood of my topic coming from them? That sounds like exerting ownership of ideas, which our courts have explicitly rejected.

As for how credit may be given, I’d suggest that explicit credit in the text of the piece is much better than implicit credit in the form of a link. What’s on the other end of a link may disappear, change, be offline for a particular reader at a particular moment, or fail to be discovered/admitted as evidence in any legal test. We can also predict that users frequently read without following out-links; so, for example, comparing the Brainpickings article that merely linked to the AAS with the HuffPo article that explicitly named the AAS exhibit, I would guess that HuffPo readers were far more likely, perhaps 100x as likely, to learn about the exhibit. I know you do usually name creators/sources in Brainpickings, of course.

point 2), commercial use:
I infer that you distinguish between the non-commercial Brainpicker/Brainpickings and, say, the commercial HuffPo because of the “commercial” test in the fair use exception to copyright. From working for some years at a not-for-profit that had commercial operations, I’ve learned that the delineation of “commercial use” can be quite complex. For example:

> The Twitter example I find irrelevant – the curation I
> do there isn’t benefitting me in any way
>
> Twitter is not “monetizable” in the way HuffPo..

There are many ways that Twitter posting is both directly and indirectly monetizable. For example, you can do what a number of feeds already do and run “sponsored posts,” disclosed or not. There are many marketers who pay people for favorable tweeting, along with favorable reviews, blog posts, and comments. Whether you do this or not, it means Twitter is not prima facie non-monetizable. Twitter links can earn associate fees, as yours do with the many Brainpicker links leading to Amazon, which carry a Brainpicker associates tag that lets you earn commission on sales. Your Twitter links also often lead to Brainpickings, on which you solicit donations.

More broadly, having a large following on Twitter is a clear asset in many realms, such as applying for any media- or social-networking-related job. You noted that “followers…[are] a different kind of currency.” If you got a social-media fellowship at MIT or Harvard’s Nieman Foundation, or got writing/curating work at the Atlantic, would you really say that having 100k+ followers had nothing to do with it?

My experience is that unless you are a registered not-for-profit organization, and your activity falls clearly within the not-for-profit mission of that organization, claiming non-commercial use is not clear-cut. It doesn’t necessarily matter that you are, de facto, not making money; what matters is your legal status and whether the activity is in keeping with that status grant. Practically, individuals, or any party not affirmatively classified as not-for-profit, can often encounter difficulties claiming a fair use exemption this way.

You may point out that you provide a “public service” and give your curation away for free. But any commercial Web site might also say it performs a public service by offering freely accessible content. And ad monetization can be, and frequently is, avoided by readers’ use of, say, an ad blocker or, as in your case, Google Reader, which sites like HuffPo don’t prevent me from doing.

Anyway, I thank you again for the cabinet of wonders that is Brainpickings/picker, and hope that my ruminations may be of some help.  I’d like to keep in conversation as I work on my own discovery-tool / curation projects, and perhaps publish some findings this year.

Free Ride? Creating vs. Curating vs. Aggregating

This was a comment I wrote to Maria Popova, curator of the popular Brain Pickings blog / Twitter feed, on her article “Free Ride: Digital Parasites and the Fight for the Business of Culture” (November 16, 2011).   That article reviewed Robert Levine’s “Free Ride:  How Digital Parasites are Destroying the Culture Business, and How the Culture Business Can Fight Back.”

Maria, I appreciate what you do and often read interesting items pointed to by your Twitter feed.

However, I am having difficulty following how you can sharply distinguish between creators and “aggregators.” To me, what you do is primarily aggregation (you curate and point me to other content) and it has value. Barely ever do I read something that you fully “created,” say, wrote word for word without referencing or summarizing anything else; and of course, hardly anything is created ex nihilo like that.

You give Huffington Post’s item about the Victorian map of a woman’s heart, which you say was lifted from your article, as an example of “parasite” practices, whereby “editorial and curatorial merit are being hijacked…not benefitting the original creator or curator in any way.”

So, I compared your piece, HuffPo’s, and the original source, and came to a different conclusion. Your piece features a map shown in the American Antiquarian Society’s current exhibit, “Beauty, Virtue & Vice.” The AAS is the currently relevant “creator,” having collected the materials, mounted the show, and put the map images online. However, your article text doesn’t mention the AAS or the exhibit, merely linking in one place to an AAS web page from which one might possibly infer and navigate to information about the show.

By comparison, HuffPo’s article explicitly credits the exhibition, gives the AAS’s full name and the show’s name, and fully encourages readers to view the show: “check out the whole exhibit here — it’s worth it”. You say this was HuffPo “reposting a reworded article,” but as I compare them, the text is entirely different, and it’s not self-evident that they took the item from you. Presumably a lot of other people saw and passed around references to the exhibition; it’s at least possible HuffPo had another source.

Looking around the internet, or at media in general, I would hardly say it’s pervasive practice for people to cite exactly how they first came across sources; this would often be impractical, could cause legal issues, expose working methods, etc. Few of your Twitter posts, or those of Read Write Web or any other major source, for example, say how the cited item was found. HuffPo may have found the subject from your blog / Twitter feed, but they evidently went and looked at it themselves and wrote it up; didn’t they just find it via you, just as you find things via other sources all the time?

I don’t mean to explain or defend Huffington Post’s practices in general; I just use this as an illustrative case. My point is that there doesn’t seem to be a huge distinction between the “creator” curation you say you do, which is often just a pointer to another source, and the “parasite” aggregation of the HuffPo example you cite.

I’d say it’s theoretically and practically very difficult to clearly distinguish between creating, curating, assembling, and “aggregating.” Authors assemble, editors create, filmmakers “direct.” The “assemblage” that I, Twitter, Google Reader, etc. do in pulling together my daily online reading has great cultural value to me. I see a big continuum of combinatorial activities, a bounty which we can both use and add to, not the sharply delineated creators/parasites you suggest. There may be a case for the “parasite” view, but I don’t see that you’ve made it here.

Anyway, thanks for Brainpickings, I’m a supporter.

Steve Jobs’ Path: from Marconi to the iPhone

More on the extraordinary genius loci of Silicon Valley.

The site named “Silicon Valley’s Birthplace” is the HP Garage in Palo Alto, where David Packard and William Hewlett formed Hewlett-Packard in 1938, and developed their first products.

the "birthplace of Silicon Valley," Hewlett & Packard's garage at 367 Addison Ave, Palo Alto

"The birthplace of electronics" 813 Emerson Ave., Palo Alto

I was fascinated to learn, via some online geo-roaming, that this is only three blocks from the site of the Electronics Research Laboratory where, around 1911, the “father of radio,” Lee de Forest, invented the triode vacuum tube and the amplifier, laying the foundation of the electronic era.

Silicon Valley is often considered to have been essentially “founded” by Frederick Terman, the Dean of Stanford University’s School of Engineering, through whom Hewlett and Packard met in 1935, and who attracted large military research funding to Stanford and championed the process of research commercialization.

However, recent scholarship, such as Timothy Sturgeon’s “How Silicon Valley Came to Be” in Martin Kenney, ed., Understanding Silicon Valley (2000), has revealed the long-overlooked earlier era of the Valley’s technology ecosystem, starting particularly with the founding of the Federal Telegraph Company in Palo Alto in 1909. Sturgeon notes that all the features later associated with Silicon Valley were present on a small scale even then: military research funding, university involvement, ferocious patent wars, international industrial competition and industrial policy, etc. Prefiguring Terman, Stanford’s president at the time, David Starr Jordan, had backed the 1909 startup that would become Federal Telegraph.

Into this scene, in 1910, came one of history’s greatest inventors and Edison-esque prodigious tinkerers, Lee de Forest. After receiving a B.S. and a PhD from Yale University’s Sheffield Scientific School, de Forest had gone to Chicago, where he worked as a translator of science articles for popular magazines. However, as described in Steven Johnson’s recent, wonderful Where Good Ideas Come From: The Natural History of Innovation, “de Forest’s true passion lay in the cabinet of wonders he had assembled in his bedroom on Washington Boulevard: batteries, spark gap transmitters, electrodes — all the building blocks that would be assembled in the coming decade to invent the age of electronics.”

Lee de Forest (1873–1961), the “father of radio”

De Forest came to San Francisco and began working for the Federal Telegraph Company of Palo Alto. At the time, the U.S. Government was anxious to develop new ship-to-shore radio signaling technology and not lose out to technology leaders such as Britain’s Marconi. Federal Telegraph joined the race to win the lucrative Navy contracts on offer.

In Federal Telegraph’s Electronics Research Laboratory, at Channing Ave. and Emerson St. in Palo Alto, the explorations de Forest had begun in Chicago eventually yielded the triode, or three-element vacuum tube, and the amplifier. Although de Forest did not initially fully understand the science of how these worked, or anticipate their most valuable applications, the inventions revolutionized radio technology and laid the foundation for the transistor and all of modern electronics.

Johnson does not remark in that book on the connection of de Forest and Hewlett-Packard to nearly the same site in Palo Alto. However, the close coincidence powerfully suggests one of Johnson’s main themes: that certain locations are extraordinarily fertile innovation loci, due to the dense interconnection of creative elements.

One possible factor in the extraordinary coincidence is that both Lee de Forest’s laboratory and Hewlett & Packard’s garage were located on the west edge of Professorville, the residential area of Palo Alto where much of the Stanford faculty and administration have traditionally lived. That represented one of the greatest concentrations of technical and entrepreneurial talent and capital to be found in the country, and the de Forest and H-P sites lay directly on the path between those people and the Palo Alto train station, downtown Palo Alto, and the main approach to Stanford. Had they scoured the country for a spot offering easy, frequent, and serendipitous contact with highly suitable collaborators and backers, they could hardly have done better.


Palo Alto, showing locations of de Forest and Hewlett & Packard's original labs, in between "Professorville" and downtown / Stanford / train station

There’s a core concept in economic development theory for how an initial event, such as the establishment of the QWERTY keyboard or the siting of a factory, constrains the pattern of future activity: path dependence. In the case of de Forest’s and Hewlett-Packard’s neighborhood laboratories, there may also have been a literal path dependence: the pathway of Palo Alto’s and Stanford’s elite walking home past their doors.

Seventy years after de Forest, another restless genius, a local kid named Steve Jobs, began in junior high school to tinker with electronics parts scavenged from school and from the plentiful electronic spare-parts outlets in the area. He went to meetings of the Hewlett-Packard Explorers Club, where company scientists gave lectures at the new headquarters building a short way from the original HP Garage, and at one point he called the home of Bill Hewlett himself to ask for some needed parts. Hewlett gave him the parts, and also a summer job on an assembly line building HP frequency counters.

Jobs later worked weekends at Halted, an electronic-parts outlet in Sunnyvale, and frequented the king of the electronics-parts warehouses, Haltek in Mountain View, near where Google is located today. Haltek’s vast holdings extended even to vintage vacuum tubes of the de Forest era. It was possibly the world’s greatest electronics tool kit, and it was free for anyone to walk into and hang around as long as they wanted.

Just like de Forest, Hewlett and Packard, and countless other innovators who came to this place, Jobs was like the biblical seed sown on rich soil. When Jobs moved to Mountain View at age five, he landed directly on arguably the century’s most fertile ground for technological innovation, a place of uniquely dense recombination and interwoven pathways. A hundred years after Marconi, once again a revolutionary radio technology, this time the smartphone, emanated from the Valley, led by Jobs’ iPhone. From the same ground walked by de Forest in 1910, genius had soared. This year, after a funeral service at Stanford’s Memorial Church, Jobs was laid to rest, just a short walk from de Forest’s Emerson Ave.

Hill Towns of Silicon Valley: the Citadel

geography of power: I love the visual of this SV hill town. If you follow Sand Hill Road past the world’s top tech venture-capital firms, at the end it winds up here, crossing over the outer-ring golf-course defenses, through the condo ring, past the inner ring of VC compounds, and arriving finally, at the center, at Harvard Business School. Click on the image to enlarge, or view it in Google Maps.

Sand Hill Road, Menlo Park