A few summers ago I worked at Connexions. I was put in charge of developing a “Search and Discoverability” system for the repository of modules; i.e., a way for people to discover existing content. We had a brute force keyword search but wanted something more sophisticated.
One option was to maintain a traditional classification system, similar to what biologists use to put living things into groups and subgroups. But who would pick the categories? A bunch of computer programmers? In such a system, whoever picks the categories, no matter how carefully, will inescapably introduce third-party structure and commentary, precisely the sort of thing from which a project like ours wanted to completely stay away.
Another option was some sort of “Lens” (filters and/or sorting) system, even one by which users could develop and publish their own lenses. But users would likely become too dependent on obvious/popular lenses, creating a sub-class (almost a pun, sort of intended) of tragically undiscovered modules.
As a result of these discussions, at Connexions we would say “classification is censorship”. Abstracting that, one could argue that all non-“author”-specified metadata is censorship; a third party is telling future viewers how certain content should be perceived and used (and effectively restricting access to other content).
Will mentioned this in his blog the other day. He talks about Google’s view (Lens…) of the web as the metadata in question. The way that Google chooses to sort its search results has a huge effect on how people perceive information on the Internet. Google’s indexing and sorting algorithms create an effective metadata layer between its users and the information they seek. Google’s users are presented with whatever Google decides they are looking for. Google’s algorithms define our view into the web and, arguably, our perception of reality.
Granted, most of the searches in Google are a pretty raw reflection of keyword population and link relevance in the web, and that’s pretty much Google’s stated goal. However, this is a classic case of tyranny of the plurality: a tool/policy used by almost everyone is designed to only serve the needs of the plurality. As long as the needs of the plurality are met, the needs of all other subsets of the population essentially do not even have to be considered. This is a Bad Thing.
Here’s the part of my convention-questioning essay where I lay out 3 solid examples, using concepts that you can relate to, all arranged in contexts that shock you and make you think outside the box. Unfortunately, I can’t think of any good examples, because I am in the plurality, and Google has never let me down. Even when I tried to think outside the box to come up with an example, Google laughed in my face. I was thinking one example would be “most people who search for ‘dvd player’ are probably looking to buy one, but what if I wanted to know how one works? I would probably have to do a more complicated search”. But then I did a search for “dvd player” and on the first page was “How DVDs Work”. So for this search all the information is accessible…
…or is it? By definition, it’s impossible for me to think of a good example of a search that someone who is not like me might want to do. Furthermore, it’s impossible to determine the quality of our search and discoverability tools when (1) we only have one set of such tools, and (2) we don’t know the internal policies/methodologies of these tools. Google meets both of these criteria.
Let’s reconsider something I said above: “Granted, most of the searches in Google are a pretty raw reflection of keyword population and link relevance in the web, and that’s pretty much Google’s stated goal.” By that I meant, “so what if .01% of users searching on keyword ‘llama’ want 50 links to videos of llamas wearing sweaters? Google doesn’t claim to do that, it claims to return what is popularly requested when other users search on that keyword, and it satisfies that”…
…or does it? Even if Google outwardly acknowledges having plurality-centric policies (which we have now determined are Bad), there is no way to know if that is what they are actually achieving.
Now reconsider something else I just said above: “I am in the plurality, and Google has never let me down.” The bigger problem is that in the process of serving the plurality, Google defines the plurality. Google’s stated goal is to meet the plurality’s expectations of their ability to access information. But since Google is our only search and discoverability tool on the web, Google’s estimation of how the plurality wants to access information ends up defining which information we attempt to access in the first place!
This unknown-internal-policy problem has recently been discussed at great length in the context of electronic voting. In a nutshell: no matter how good any given system’s security is, it is impossible to have perfect security. This is okay with things like credit card transactions; people are willing to live with the risk that 1 in 10,000 [note: I made up this number] transactions will be fraudulent. Most of the time the credit card company can eat the loss. They raise the price of their products very slightly to compensate for this, and everyone is happy to live with that system.
However, with voting, not only are the stakes much higher, but the system is much less robust. If a race is very close, or if for any reason there is any question regarding the accuracy of the count, either because of technical error or malicious attack, there is no way to double check the machines.
This is why many people are advocating for a paper trail. The vote can be quickly administered and counted with the electronic machines, but each voter will also get a paper receipt that lists their selections. They can then verify that the receipt matches the on-screen display. The receipt then gets put in a box, to be used in a recount if necessary. (More info can be found at blackboxvoting.org and Avi Rubin’s Homepage).
Paper receipts are considered to be the bare minimum feature for a reasonably trustworthy electronic voting system. Some people even demand that the software be open-source, for two main reasons: (1) it allows hundreds of thousands of programmers to review the system for security vulnerabilities and bugs, and (2) it of course almost completely obliterates the possibility of someone inserting malicious code to rig the election (this point is a little trickier, since malicious code could be inserted just before the election, after the codebase had been reviewed… so there would have to be some sort of live, continual hash done on the software installed on each machine).
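To make the “continual hash” idea a little more concrete, here is a minimal sketch in Python. It is not taken from any real voting system; the function names, and the idea that the reviewed build’s SHA-256 digest has been published somewhere trustworthy, are my own assumptions for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so a large binary never has to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_reviewed_build(installed: Path, reviewed_digest: str) -> bool:
    """Hypothetical check: does the installed software hash to the
    digest that was published for the publicly reviewed codebase?"""
    return sha256_of_file(installed) == reviewed_digest.lower()
```

An auditor would call `matches_reviewed_build` on each machine’s installed software, repeatedly, and flag any mismatch. A check like this only helps if it is run by something other than the software being checked, since a compromised machine could simply report the digest it is expected to report.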
For the very same reasons listed above, I advocate that Google must in the near future open-source its search/indexing tools. There are certain things that our society considers to be basic resources that all humans should have easy and constant access to: air, water, electricity, et cetera. Google’s search engine has become such a public resource. All users of the internet – millions of people in the world – rely heavily on Google’s search tools on a daily basis in order to do fundamental, important tasks.
In most industries, regulation, in the form of breaking up a company into several other companies, is sufficient to maintain reasonable quality and competitive prices. But even if things such as water and electricity are not competitively priced, the nature of the product received cannot be hidden from the user. In the case of unacceptable levels of contamination in drinking water, scientists still have the opportunity and the ability to test for these things at any time. With Google, it is essentially impossible for a consumer or a third party to determine the quality of Google’s product. A closed-source search engine monopoly means a monopoly on information.
Argument: Wait I don’t get it. How is Google going to make money? I mean, even if we say that they don’t need to or something, if they don’t make money they can’t operate in the first place.
My Response: First, open-sourcing their code doesn’t mean they are going to give it away, or allow other people to use it. It only means they will show people their internal policies. Anyone who attempts to use their code commercially will be prosecuted.
Second, Google has phenomenal resources in terms of computing power (thousands and thousands of computers stuck in a warehouse somewhere) and bandwidth (think 3,000 cable modems). It’s going to be quite a while before any other entity can come close to matching them, and anyone who is big enough to do so will be unable to stay under the radar and will eventually be charged for illegal use of Google’s code.
Third, Google has many other products, most of which leverage its search engine. These would all continue to be closed-source.
Argument: Google is a private company and has the right to do whatever it wants with the products it develops.
My Response: I don’t disagree with this in and of itself, but I believe that the power and responsibility into which Google has fallen ethically requires them to serve a need greater than securing their “intellectual property”.
Argument: Google isn’t the Internet, it is one of many tools on the Internet that people can choose to use. Google has no responsibility to deliver any particular type of product to any particular people.
My Response: If someone invented a cure for all types of cancer, would it be ethical for the drug company to not distribute it until they could get their marketing campaign together? Would it be ethical for quality control in the manufacturing of the drug to go slack because the drug was in such high demand that unhappy consumers were inconsequential to profits? Would it be ethical for the drug to not be affordable to some cancer patients? Some things are so valuable to so many people that we must regulate them in order to guarantee a given level of quality within a given timeframe.
Argument: I’m a Libertarian / free-market purist / fiscal conservative, and I really don’t think regulation is ever a good idea.
My Response: So am I. And I also don’t like industry regulation, except for extreme cases. If the free-market dictated that water or electricity wouldn’t be available to certain remote areas, I think that wou– wait a second, I just gave you the cancer example and you are whining about regulation? You think production and distribution of the cure for cancer should be completely dictated by the free market?? gdfjkdfjgkldsf
Argument: Google isn’t doing anything magic. They have thousands of computers and some smart programmers. Any other business, or even the government, could do the same thing and be competitive.
My Response: This is partially true. Any of the big computing companies such as Microsoft, Sun, or IBM could become aggressive and create a popular, competitive search engine. It doesn’t matter if the market settles on 1 winner or 3 winners, it’s going to be a race to the middle. Companies will compete by tacking things onto the feature list, or raw marketing muscle. The “quality” of the product would not be perceptible to the user and would not be how the search engines compete. Indeed it would be to their great advantage to do whatever they could to lower the consumers’ expectations, in order to develop a more controllable, consistent market. And again, even if competition DID “improve quality”, we still wouldn’t know what we were getting, because the system would be closed-source. Of course, in this environment, people like me wouldn’t trust or use any of the search engines, so we would just be back to the days of no search-engines.
So, Google can choose to either inevitably be one of several big-name, mediocre, marketing-driven search engine companies, or it can open-source its search engine and guarantee its place in the history books as the creator of the most powerful and important information tool to date.
Admittedly, perhaps in the near future, computers will be fast and cheap enough so that a modestly funded government agency or non-profit organization would be able to maintain a web-index using open-source tools. In this case, everything I’ve said here is irrelevant. But I don’t think this will be the case. Google will always be far ahead of the computing-power curve.
Google is not going to open source their search indexing code for the reason that it will make it even easier for people to game the system and boost their page’s search engine ranking on Google.
You raise the idea that any information filtering will be inherently biased, but this is how it’s always been. The editors of the 1905 Encyclopedia Britannica chose what topics to include articles on, and edited those articles to suit their tastes. There has to be some kind of default ‘Lens’ to view internet search engine results. Whatever minor biases Google might have (you can’t seem to come up with any good examples) aren’t really that important, because with the right search criteria you should still be able to zero in on what you need.
Yes, in a perfect Marxist world Google would be open source. Unfortunately, in this hypothetical world Google may not exist because Larry Page still needs a way to put food on his hypothetical table.
Google is a good company - they provide services that play nicely with others. The real sign of a monopoly is that they use their power in one market to unfairly influence another. I have not (yet, perhaps, is a valid argument) seen this happening. Until that happens, leave the DOJ out of this. No one is going to influence Google’s policies by finger waving.
There are many alternative searches that I use regularly, staying clear of Google; examples being Technorati and Wikipedia. I use them because the results I get there are more relevant; the data is organized better for the search I am trying to do. Google will not always be the be-all and end-all. Integrating Google Maps may kill Mapquest, but let’s face it, Google Maps is better to begin with.
If you are interested in open source search, I suggest you check out Lucene: http://lucene.apache.org/java/docs/index.html. Google isn’t keeping you from working on it.
You are like an officer from the Enterprise who has beamed down to a planet with some less advanced people to try to convince them that they need to evacuate their village. You are all like, trying not to violate the prime directive by revealing information that could fuck with their natural development, and that makes it hard to get the job done. You’ll all be like “well, there are problems that I can’t explain” and they are like “but the goddess bishtoog has never failed us before!”
P.S. at the end of the episode you get beaten up, driven out of the village and left for dead. Sorry, that’s just the way it goes. Give up now and our primitive asses might spare you and even give you one of our daughters as a wife. Your kid’s face will only be about half as fucked up as hers. But he will also be a total weener.
Google publishes human-readable words on a website. That’s all they do. Their only “product” is written speech.
Speech-production is an activity I’d not like to see regulated.
I would love to see them open their source, mind you, but I wouldn’t like to see them forced to open it. Their right to publish shouldn’t be contingent on anything else.
Unlike the cancer cure example, Google per se does not directly cure diseases or save lives, so I would think the ethics would be different, but I don’t know much about ethics. But how about, instead of trying to regulate a private corporation, advocating the creation of a public Department of Internet Search that would be charged with creating the biggest and baddest search engine ever and would be funded by enough tax dollars to compete for the best and brightest at Google? Sort of like how people are now trying to travel to space with private money based on technology originally developed at NASA with public money, but in reverse. And it would also, if only under the Freedom of Information Act, be open source. Of course, people would start complaining when taxpayer-funded servers start caching porn and such. And maybe changes could only be made to it via voting for lawmakers and presidents instead of committing CVS patches or whatever. But those are small details. ;-)
Oops, I wrote all this before reading your last paragraph. Damn. Anyway, in response to your penultimate paragraph, I don’t think people at Google really care all that much about creating a place for their company in the history books, unless doing so will also help them put food on the table, or buy new Lexuses, or whatever example you want to use.
I think you’re giving them too much credit. After all, they only came into popularity (maybe that’s a key word, here, as opposed to ones you use, such as monopoly and censorship) a few years ago, and we all lived fine before them. Maybe life wasn’t as good, but it was still fine. We could still do things like vote (well, most of us), work on projects such as Lucene, as suggested above, and there was still no cure for cancer.