An Aspiring Scientist’s Frustration with Modern-Day Academia: A Resignation

Here is a mind-blowing text that was sent to all EPFL researchers (presumably) by a doctoral student during the weekend. It expresses feelings that are worth thinking about.

Just to be crystal-clear:

  • I am not the author of this text.
  • I don’t publish the name of its author, since I have no proof that his/her e-mail address was not spoofed.
  • I don’t think that the facts described are a problem unique to EPFL, nor to any other Swiss university: on the contrary, this is probably a worldwide phenomenon.
  • Finally, I would like to make very clear that I did not experience the same feelings at all during my (very happy) PhD times at EPFL. So, don’t try to make any parallel with my own experience.
  • Like the author, I don’t have any good idea how to change the system towards a better one.

Still, if you are or have been in the academic world, I think it is worth investing 10 minutes to read this text.

Dear EPFL,
I am writing to state that, after four years of hard but enjoyable PhD work at this school, I am planning to quit my thesis in January, just a few months shy of completion. Originally, this was a letter that was intended only for my advisors. However, as I prepared to write it I realized that the message here may be pertinent to anyone involved in research across the entire EPFL, and so have extended its range just a bit. Specifically, this is intended for graduate students, postdocs, senior researchers, and professors, as well as for the people at the highest tiers of the school’s management. To those who have gotten this and are not in those groups, I apologize for the spam.
While I could give a multitude of reasons for leaving my studies – some more concrete, others more abstract – the essential motivation stems from my personal conclusion that I’ve lost faith in today’s academia as being something that brings a positive benefit to the world/societies we live in. Rather, I’m starting to think of it as a big money vacuum that takes in grants and spits out nebulous results, fueled by people whose main concerns are not to advance knowledge and to effect positive change, though they may talk of such things, but to build their CVs and to propel/maintain their careers. But more on that later.
Before continuing, I want to be very clear about two things. First, not everything that I will say here is from my personal firsthand experience. Much is also based on conversations I’ve had with my peers, outside the EPFL and in, and reflects their experiences in addition to my own. Second, any negative statements that I make in this letter should not be taken to heart by all of its readers. It is not my intention to demonize anyone, nor to target specific individuals. I will add that, both here and elsewhere, I have met some excellent people and would not – not in a hundred years – dare accuse them of what I wrote in the previous paragraph. However, my fear and suspicion is that these people are few, and that all but the most successful ones are being marginalized by a system that, feeding on our innate human weaknesses, is quickly getting out of control.
I don’t know how many of the PhD students reading this entered their PhD programs with the desire to actually *learn* and to somehow contribute to science in a positive manner. Personally, I did.  If you did, too, then you’ve probably shared at least some of the frustrations that I’m going to describe next.
(1) Academia: It’s Not Science, It’s Business
I’m going to start with the supposition that the goal of “science” is to search for truth, to improve our understanding of the universe around us, and to somehow use this understanding to move the world towards a better tomorrow. At least, this is the propaganda that we’ve often been fed while still young, and this is generally the propaganda that universities that do research use to put themselves on lofty moral ground, to decorate their websites, and to recruit naïve youngsters like myself.
I’m also going to suppose that in order to find truth, the basic prerequisite is that you, as a researcher, have to be brutally honest – first and foremost, with yourself and about the quality of your own work. Here one immediately encounters a contradiction, as such honesty appears to have a very minor role in many people’s agendas. Very quickly after your initiation in the academic world, you learn that being “too honest” about your work is a bad thing and that stating your research’s shortcomings “too openly” is a big faux pas. Instead, you are taught to “sell” your work, to worry about your “image”, and to be strategic in your vocabulary and where you use it. Preference is given to good presentation over good content – a priority that, though understandable at times, has now gone overboard. The “evil” kind of networking seems to be openly encouraged. With so many business-esque things to worry about, it’s actually surprising that *any* scientific research still gets done these days. Or perhaps not, since it’s precisely the naïve PhDs, still new to the ropes, who do almost all of it.
(2) Academia: Work Hard, Young Padawan, So That One Day You Too May Manage!
I sometimes find it both funny and frightening that the majority of the world’s academic research is actually being done by people like me, who don’t even have a PhD degree. Many advisors, whom you would expect to truly be pushing science forward with their decades of experience, do surprisingly little and only appear to manage the PhD students, who slave away on papers that their advisors then put their names on as a sort of “fee” for having taken the time to read the document (sometimes, in particularly desperate cases, they may even try to steal first authorship). Rarely do I hear of advisors who actually go through their students’ work in full rigor and detail, with many apparently having adopted the “if it looks fine, we can submit it for publication” approach.
Apart from feeling the gross unfairness of the whole thing – the students, who do the real work, are paid/rewarded amazingly little, while those who manage it, however superficially, are paid/rewarded amazingly much – the PhD student is often left wondering if they are only doing science now so that they may themselves manage later. The worst is when a PhD who wants to stay in academia accepts this and begins to play on the other side of the table. Every PhD student reading this will inevitably know someone unlucky enough to have fallen upon an advisor who has accepted this sort of management and is now inflicting it on their own students – forcing them to write paper after paper and to work ridiculous hours so that the advisor may advance his/her career or, as is often the case, obtain tenure. This is unacceptable and needs to stop. And yet as I write this I am reminded of how EPFL has instituted its own tenure-track system not too long ago.
(3) Academia: The Backwards Mentality
A very saddening aspect of the whole academic system is the amount of self-deception that goes on, which is a “skill” that many new recruits are forced to master early on… or perish. As many PhD students don’t truly get to choose their research topic, they are forced to adopt what their advisors do and to do “something original” on it that could one day be turned into a thesis. This is all fine and good when the topic is genuinely interesting and carries a lot of potential. Personally, I was lucky to have this be the case for me, but I also know enough people who, after being given their topic, realized that the research direction was of marginal importance and not as interesting as it was hyped up by their advisor to be.
This seems to leave the student with a nasty ultimatum. Clearly, simply telling the advisor that the research is not promising/original does not work – the advisor has already invested too much of his time, reputation, and career into the topic and will not be convinced by someone half his age that he’s made a mistake. If the student insists, he/she will be labeled as “stubborn” and, if the insisting is too strong, may not be able to obtain the PhD. The alternative, however unpleasant, is to lie to yourself and to find arguments that you’re morally comfortable with that somehow convince you that what you’re doing has important scientific value. For those for whom obtaining a PhD is a *must* (usually for financial reasons), the choice, however tragic, is obvious.
The real problem is that this habit can easily carry over into one’s postgraduate studies, until the student themselves becomes like the professor, with the backwards mentality of “it is important because I’ve spent too many years working on it”.
(4) Academia: Where Originality Will Hurt You
The good, healthy mentality would naturally be to work on research that we believe is important. Unfortunately, most such research is challenging and difficult to publish, and the current publish-or-perish system makes it difficult to put bread on the table while working on problems that require at least ten years of labor before you can report even the most preliminary results. Worse yet, the results may not be understood, which, in some cases, is tantamount to them being rejected by the academic community. I acknowledge that this is difficult, and ultimately cannot criticize the people who choose not to pursue such “risky” problems.
Ideally, the academic system would encourage those people who are already well established and trusted to pursue these challenges, and I’m sure that some already do. However, I cannot help but get the impression that the majority of us are avoiding the real issues and pursuing minor, easy problems that we know can be solved and published. The result is a gigantic literature full of marginal/repetitive contributions. This, however, is not necessarily a bad thing if it’s a good CV that you’re after.
(5) Academia: The Black Hole of Bandwagon Research
Indeed, writing lots of papers of questionable value about a given popular topic seems to be a very good way to advance your academic career these days. The advantages are clear: there is no need to convince anyone that the topic is pertinent and you are very likely to be cited more since more people are likely to work on similar things. This will, in turn, raise your impact factor and will help to establish you as a credible researcher, regardless of whether your work is actually good/important or not. It also establishes a sort of stable network, where you pat other (equally opportunistic) researchers on the back while they pat away at yours.
Unfortunately, not only does this lead to quantity over quality, but many researchers, having grown dependent on the bandwagon, then need to find ways to keep it alive even when the field begins to stagnate. The results are usually disastrous. Either the researchers begin to think up creative but completely absurd extensions of their methods to applications for which they are not appropriate, or they attempt to suppress other researchers who propose more original alternatives (usually, they do both). This, in turn, discourages new researchers from pursuing original alternatives and encourages them to join the bandwagon, which, though founded on a good idea, has now stagnated and is maintained by nothing but the pure will of the community that has become dependent on it. It becomes a giant, money-wasting mess.
(6) Academia: Statistics Galore!
“Professors with papers are like children,” a professor once told me. And, indeed, there seems to exist an unhealthy obsession among academics regarding their numbers of citations, impact factors, and numbers of publications. This leads to all sorts of nonsense, such as academics making “strategic citations”, writing “anonymous” peer reviews where they encourage the authors of the reviewed paper to cite their work, and gently trying to tell their colleagues about their recent work at conferences or other networking events or sometimes even trying to slip each other their papers with a “I’ll-read-yours-if-you-read-mine” wink and nod. No one, when asked if they care about their citations, will ever admit to it, and yet these same people will still know the numbers by heart. I admit that I’ve been there before, and hate myself for it.
At the EPFL, the dean sends us an e-mail every year saying how the school is doing in the rankings, and we are usually told that we are doing well. I always ask myself what the point of these e-mails is. Why should it matter to a scientist if his institution is ranked tenth or eleventh by such and such committee? Is it to boost our already overblown egos? Wouldn’t it be nicer for the dean to send us an annual report showing how EPFL’s work is affecting the world, or how it has contributed to resolving certain important problems? Instead, we get these stupid numbers that tell us what universities we can look down on and what universities we need to surpass.
(7) Academia: The Violent Land of Giant Egos
I often wonder if many people in academia come from insecure childhoods where they were never the strongest or the most popular among their peers, and, having studied more than their peers, are now out for revenge. I suspect that yes, since it is the only explanation I can give to explain why certain researchers attack, in the bad way, other researchers’ work. Perhaps the most common manifestation of this is via peer reviews, where these people abuse their anonymity to tell you, in no ambiguous terms, that you are an idiot and that your work isn’t worth a pile of dung. Occasionally, some have the gall to do the same during conferences, though I’ve yet to witness this latter manifestation personally.
More than once I’ve heard leading researchers in different fields refer to other methods with such beautiful descriptions as “garbage” or “trash”, sometimes even extending these qualifiers to pioneering methods whose only crime is that they are several decades old and which, as scientists, we ought to respect as a man respects his elders. Sometimes, these people will take a break from saying bad things about people in their own fields and turn their attention to other domains – engineering academics, for example, will sometimes make fun of the research done in the humanities, ridiculing it as ludicrous and inconsequential, as if what they did was more important.
(8) Academia: The Greatest Trick It Ever Pulled was Convincing the World That It was Necessary
Perhaps the most crucial, piercing question that the people in academia should ask themselves is this: “Are we really needed?” Year after year, the system takes in tons of money via all sorts of grants. Much of this money then goes to pay underpaid and underappreciated PhD students who, with or without the help of their advisors, produce some results. In many cases, these results are incomprehensible to all except a small circle, which makes their value difficult to evaluate in any sort of objective manner. In some rare cases, the incomprehensibility is actually justified – the result may be very powerful but may, for example, require a lot of mathematical development that you really do need a PhD to understand. In many cases, however, the result, though requiring a lot of very cool math, is close to useless in application.
This is fine, because real progress is slow. What’s bothersome, however, is how long a purely theoretical result can be milked for grants before the researchers decide to produce something practically useful. Worse yet, there often does not appear to be a strong urge for people in academia to go and apply their result, even when this becomes possible, which most likely stems from the fear of failure – you are morally comfortable researching your method as long as it works in theory, but nothing would hurt more than to try to apply it and to learn that it doesn’t work in reality. No one likes to publish papers which show how their method fails (although, from a scientific perspective, they’re obliged to).
These are just some examples of things that, from my humble perspective, are “wrong” with academia. Other people could probably add others, and we could go and write a book about it. The problem, as I see it, is that we are not doing very much to remedy these issues, and that a lot of people have already accepted that “true science” is simply an ideal that will inevitably disappear with the current system proceeding along as it is. As such, why risk our careers and reputations to fight for some noble cause that most of academia won’t really appreciate anyway?
I’m going to conclude this letter by saying that I don’t have a solution to these things. Leaving my PhD is certainly not a solution – it is merely a personal decision – and I don’t encourage other people to do anything of the sort. What I do encourage is some sort of awareness and responsibility. I think that there are many of us, certainly in my generation, who would like to see “academia” be synonymous with “science”. I know I would, but I’ve given up on this happening and so will pursue true science by some other path.
While there was a time when I thought that I would be proud to have the letters “PhD” after my name, this is unfortunately no longer the case. However, nothing can take away the knowledge that I’ve gained during these four years, and for that, EPFL, I remain eternally grateful.
My sincerest thanks for reading this far

Awakening Zombie Code in Apache

At the end of last year, while playing with hash-DoS (see this previous post and my Insomni’hack 2013 talk for the whole context), I found some funny things in the source code of the Apache httpd web server. They concern the module mod_auth_digest, which is responsible for authenticating users according to the challenge-response protocols standardized in RFC 2617, namely MD5 and MD5-sess. Essentially, these authentication mechanisms protect passwords in transit between the client and the web server from an adversary spying on the communication: the passwords do not appear in the clear, as is the case for the Basic authentication mechanism provided by the module mod_auth_basic, but only in hashed form.

Those two variants work in slightly different ways. By default, Apache uses the MD5 variant, and the parameter qop (standing for “Quality of Protection”) is set to auth, as stated by the official documentation. First, two values are computed: \mathrm{HA1} = \mathrm{MD5}(\text{username:realm:password}) and \mathrm{HA2} = \mathrm{MD5}(\text{method:digestURI}); then, the client computes the response as \mathrm{MD5}(\text{HA1:nonce:nonceCount:clientNonce:qop:HA2}) and sends it to the web server, which can repeat the same computation as it knows all the parameter values.
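The computation above can be sketched in a few lines of Python. This is only an illustration of the formulas as described here, not code taken from Apache; the function names md5_hex and digest_response are made up for this sketch, and it assumes that each MD5 value is passed along as a lowercase hexadecimal string.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the lowercase hexadecimal MD5 digest of a string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri,
                    nonce, nonce_count, cnonce, qop="auth"):
    """Compute the Digest response for the MD5 variant with qop=auth."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")   # secret-dependent part
    ha2 = md5_hex(f"{method}:{uri}")                  # request-dependent part
    return md5_hex(f"{ha1}:{nonce}:{nonce_count}:{cnonce}:{qop}:{ha2}")
```

The server, which knows the password (or HA1) and all the other parameters, would call the same function and compare its result with the value received from the client.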

The MD5-sess variant works in a slightly different way. The value \mathrm{HA1} is computed only once, on the first request by the client following the receipt of a WWW-Authenticate challenge from the server. It uses the server nonce from that challenge and the first client nonce value to construct \mathrm{HA1} as \mathrm{MD5}(\mathrm{MD5}(\text{username:realm:password})\text{:nonce:cnonce}). The rationale for this construction is explained in RFC 2617:

This creates a ‘session key’ for the authentication of subsequent requests and responses which is different for each “authentication session”, thus limiting the amount of material hashed with any one key. […] Because the server need only use the hash of the user credentials in order to create the HA1 value, this construction could be used in conjunction with a third party authentication service so that the web server would not need the actual password value. The specification of such a protocol is beyond the scope of this specification.
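As a rough sketch of the MD5-sess construction, the session key could be derived as follows in Python. The function name session_ha1 is hypothetical, and feeding the inner MD5 value as a hexadecimal string is an assumption of this sketch rather than something stated in the text above.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the lowercase hexadecimal MD5 digest of a string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def session_ha1(username, realm, password, nonce, cnonce):
    """MD5-sess: derive HA1 once per authentication session."""
    # Hash of the user credentials; a third-party service could supply this
    # value so that the web server never sees the actual password.
    credentials_hash = md5_hex(f"{username}:{realm}:{password}")
    # Bind the credentials hash to the server nonce and first client nonce.
    return md5_hex(f"{credentials_hash}:{nonce}:{cnonce}")
```

Since the nonce and cnonce change from one authentication session to the next, two sessions of the same user end up with different HA1 values, which is the “session key” behaviour described in the quote.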

More details can be found on the dedicated Wikipedia page. Interestingly, although it is possible to configure Apache httpd to use MD5-sess, through the AuthDigestAlgorithm directive, the documentation tells us that “MD5-sess is not correctly implemented yet”. Trying to use it in a .htaccess file results in an error 500 (“Internal Server Error”), and the httpd server gently explains why in the error logs:

Essentially, the use of MD5-sess is killed by the following routine:

Furthermore, other mechanisms, like one-time nonces (as a side note, cryptographically speaking, a nonce is a number that must be used only once…) and nonce-count checking, are not supported either:

All those mechanisms require storing server-side information in a shared memory segment, as one needs some synchronization between the different threads. Still, there is a lot of code in the source of the module mod_auth_digest that is related to handling those mechanisms. Some configuration directives are also documented, like AuthDigestShmemSize, although shared memory seems to be used only by those disabled features. In summary, there appears to be a lot of zombie code in this mod_auth_digest module. Let’s try to awaken it 😉 !

The routine note_digest_auth_failure() is responsible for handling authentication errors, and it still contains code that accesses the shared memory segment, more precisely through the routine gen_client(). The following piece of code is pretty interesting:

The conditions conf->check_nc and !strcasecmp(conf->algorithm, "MD5-sess") are always false, but conf->nonce_lifetime == 0 can be made true through the AuthDigestNonceLifetime directive.

Here is a proof of concept: I have put the following .htaccess file in the /aaa directory

and the following tiny Python script sends HTTP requests with a missing opaque field at a rather slow pace:

This is sufficient to trigger floating-point exceptions (I also observed NULL pointer dereferences if the AuthDigestShmemSize directive is used) and to repeatedly crash the different threads, hence rendering the whole httpd server unavailable to legitimate requests.

In summary, if one is able to put an AuthDigestNonceLifetime directive somewhere in a .htaccess file, either directly or through injection, then one is able to completely sabotage an Apache httpd installation. This seemed pretty annoying to me, especially with shared web hosting environments in mind. At the time of writing this post, this works with versions 2.4.4 and 2.2.24, which are the latest ones.

For the record, I have contacted the Apache security team, first directly without success, then through the oCERT crew (thank you guys for your quick answer!), and I received the following answer:

Like a Hot Knife Through Butter

More or less recently, an interesting line of attacks against software has been revisited, namely Hash-DoS, or, in a nutshell, exploiting weak hash functions used in a hash table implementation to trigger a denial-of-service.

To the best of my knowledge, this problem was exposed as early as 1998 in Phrack by Solar Designer. Variants were then discussed by Crosby and Wallach at USENIX 2003, formally defining algorithmic complexity attacks; by Klink and Wälde during 28c3 in 2011, applying the idea to PHP, Python, Java, Ruby, etc.; and more recently by Aumasson, Bernstein and Bosslet (see their slides at Appsec Forum 2012, and their upcoming talk at 29c3), showing that the proposed solutions, essentially randomizing the hash function, were not always as effective as expected.

Technically, this kind of attack consists of generating a large number of colliding inputs for the hash table. Hence, instead of having an O(1) average access time to a stored element, one can force the hash table to have an O(n) one. If one wants to explore all the elements in the hash table, the worst-case complexity becomes O(n^2) instead of O(n). In practice, being able to generate multi-collisions (i.e., multiple inputs, not just two, mapping to the same output) depends on the properties of the hash function that maps elements to keys.
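To make the complexity argument concrete, here is a toy experiment in Python. It is a sketch under stated assumptions, unrelated to any real implementation: the table ChainedTable, the helper count_comparisons and the Java-style multiplicative hash hash31 are chosen for illustration only, because multi-collisions for that hash are easy to build from the colliding blocks "Aa" and "BB".

```python
from itertools import product


def hash31(key: str) -> int:
    """Java-style multiplicative string hash, truncated to 32 bits."""
    h = 0
    for ch in key:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h


class ChainedTable:
    """Toy hash table with separate chaining that counts key comparisons."""

    def __init__(self, n_buckets: int = 1024):
        self.buckets = [[] for _ in range(n_buckets)]
        self.comparisons = 0

    def insert(self, key: str) -> None:
        bucket = self.buckets[hash31(key) % len(self.buckets)]
        for existing in bucket:          # linear scan of the chain
            self.comparisons += 1
            if existing == key:
                return
        bucket.append(key)


def count_comparisons(keys) -> int:
    """Insert all keys into a fresh table and return the comparison count."""
    table = ChainedTable()
    for key in keys:
        table.insert(key)
    return table.comparisons


# "Aa" and "BB" have the same hash31 value and the same length, so every
# concatenation of eight such blocks collides with all others (256 keys).
colliding_keys = ["".join(b) for b in product(("Aa", "BB"), repeat=8)]
benign_keys = [f"file{i:04d}" for i in range(len(colliding_keys))]

print("benign keys:   ", count_comparisons(benign_keys), "comparisons")
print("colliding keys:", count_comparisons(colliding_keys), "comparisons")
```

With the colliding keys, every insertion walks a single chain containing all previously inserted keys, so the total work grows quadratically with the number of keys, whereas the benign keys are spread over many short chains.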

In this short post, I’d like to show how hash-DoS can be applied to the btrfs file system with some astonishing and unexpected success. Btrfs, while still in the development stage, is widely considered to be a viable successor to ext4, and an implementation of it is already part of the Linux kernel. According to this page,

Directories are indexed in two different ways. For filename lookup, there is an index comprised of keys:

Directory Objectid | BTRFS_DIR_ITEM_KEY | 64 bit filename hash

The default directory hash used is crc32c, although other hashes may be added later on. A flags field in the super block will indicate which hash is used for a given FS.

The second directory index is used by readdir to return data in inode number order. This more closely resembles the order of blocks on disk and generally provides better performance for reading data in bulk (backups, copies, etc). Also, it allows fast checking that a given inode is linked into a directory when verifying inode link counts. This index uses an additional set of keys:

Directory Objectid | BTRFS_DIR_INDEX_KEY | Inode Sequence number

The inode sequence number comes from the directory. It is increased each time a new file or directory is added.

Knowing how trivial it is to compute multi-collisions for a CRC-based hash, I could not resist playing a bit. Roughly speaking, computing the CRC of an n-bit message M consists of interpreting the message as a polynomial M(x) of degree n-1 over \mathrm{GF}(2), dividing it by the CRC-defining polynomial P(x), and taking the remainder, hence writing something like \mathrm{CRC} = M(x) - Q(x)\cdot P(x). Obviously, adding any multiple of P(x) to the message M(x) will generate a collision. For the gory details, see this page.
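The collision argument can be checked with a small Python sketch that follows the simplified description above: polynomials over \mathrm{GF}(2) are stored as integers (bit i is the coefficient of x^i), and the CRC is taken to be the plain remainder of M(x) modulo P(x). Real CRC32C implementations additionally use bit reflection, an initial value and a final XOR, which this sketch deliberately ignores; the helper names poly_mod, clmul and collide are made up for the illustration.

```python
# Castagnoli (CRC32C) polynomial, including the leading x^32 term.
CRC32C_POLY = 0x11EDC6F41


def poly_mod(m: int, p: int = CRC32C_POLY) -> int:
    """Remainder of the GF(2) polynomial m divided by p."""
    deg_p = p.bit_length() - 1
    while m.bit_length() - 1 >= deg_p:
        # Cancel the leading term of m with a shifted copy of p.
        m ^= p << (m.bit_length() - 1 - deg_p)
    return m


def clmul(a: int, b: int) -> int:
    """Carry-less (GF(2)) product of the polynomials a and b."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def collide(m: int, q: int, p: int = CRC32C_POLY) -> int:
    """Return m + q*p over GF(2), which has the same remainder as m."""
    return m ^ clmul(q, p)


message = int.from_bytes(b"some_file_name", "big")
for q in range(1, 5):
    other = collide(message, q)
    print(hex(other), poly_mod(other) == poly_mod(message))
```

Each nonzero multiplier q yields a different message with the same remainder, which is why multi-collisions for a CRC are essentially free to produce.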

I basically found two different attacks:

  • I measured the time needed to create 4000 empty files with randomly chosen names in the same directory. This takes about 0.2 seconds. The box used is a Fedora distribution within a VM (and btrfs was on a loopback device).

    Then, I measured the time needed to create 4000 empty files in the same directory, with names chosen so that they hash to the same CRC32C value. This operation fails after 5 (!) seconds, having created only 61 files. In other words, this first attack allows an adversary, in a shared directory scenario, to prevent a victim from creating a file with a name known in advance. According to the btrfs maintainer, Chris Mason,

Collisions are a known issue with any of the hash based directories. […] The impact of the DOS is that a malicious user is able to prevent the creation of specific file names. For this to impact other users on the system, it must be done in directories where both the malicious user and the victim user have permission to create files. The obvious example is /tmp, but there are other cases that may be more site-specific. Group writable directories have other security issues, and so we picked the hash knowing this kind of DOS was possible. It is good practice to avoid the shared directories completely if you’re worried about users doing these kinds of attacks.

  • A bit annoyed by this answer, I tried harder and found the following: I created several files with random names in a directory (around 500). The time required to remove them is negligible. Then, I created the same number of files, but gave them only 55 different crc32c values. The time required to remove them is so large that I was not able to measure it, and I killed the process after 220 minutes (!). The Python script I used is the following, and borrows some code from StalkR:

    More exactly, I mounted a 1GB btrfs file system on a loopback device:

    In the exploit script, just set the variable hack = False to generate empty files with random names, or hack = True to generate colliding filenames and hence trigger the phenomenon. Here is a screenshot of what I obtained:

Given the result, it looks like playing with collisions is much more likely to trigger an infinite loop than just a complexity increase; at least, the btrfs code surely does not expect to handle many collisions.

Essentially, to thwart both attacks, I would recommend using a modern lightweight hash algorithm, such as SipHash, instead of CRC32C. Another alternative is to avoid data structures that have a high worst-case complexity, like hash tables, for storing data that can potentially be manipulated by malicious users. At the cost of a bit of average-case performance, data structures like red-black trees have a guaranteed search time in O(\log(n)) (I learned this while reading the source code of nginx).
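The idea of a keyed hash can be sketched in Python as follows. Since the Python standard library does not expose SipHash as a callable function, this sketch swaps in keyed BLAKE2b from hashlib purely as a stand-in; it is not SipHash and says nothing about what btrfs should actually use. The names keyed_bucket and SECRET_KEY are hypothetical.

```python
import hashlib
import os


def keyed_bucket(name: bytes, key: bytes, n_buckets: int) -> int:
    """Map a name to a bucket index using a secret-keyed hash."""
    # Keyed BLAKE2b with a 64-bit output, used here instead of SipHash.
    digest = hashlib.blake2b(name, digest_size=8, key=key).digest()
    return int.from_bytes(digest, "big") % n_buckets


# The key is drawn at random and kept secret from the users.
SECRET_KEY = os.urandom(16)

for name in (b"alpha.txt", b"beta.txt", b"gamma.txt"):
    print(name.decode(), "->", keyed_bucket(name, SECRET_KEY, 1024))
```

Without knowledge of the key, an attacker can no longer precompute a set of names that all land in the same bucket, which is exactly what makes the CRC-based attacks so cheap.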

For the record, this vulnerability was reported to the btrfs maintainer Chris Mason on November 14th, 2012. He acknowledged the bug, but then did not answer any of my e-mails. In his acknowledgement, he mentioned that

 My plan is to get this fixed for the 3.8 merge window. We’re juggling a lot of patches right now but I expect to fix things then.

[UPDATE OF 17/12/2012] As several readers of this post have noticed (and I would like to warmly thank them for their feedback), the second attack does NOT generate an infinite loop within the btrfs code, but merely within the bash expansion code, which is responsible for expanding the command line rm *. This can be seen in the above screenshot, as the CPU is burnt in userland, and not in the kernel. Hence, what I thought to be a complexity attack against the btrfs file system is actually a (less glamorous) complexity attack against bash.

That said, after I communicated this unfortunate glitch to the btrfs maintainer Chris Mason, he kindly replied with the following:

You’ve found a real problem in btrfs though. Changes since I tested the crc overflow handling have made us not deal with our EOVERFLOW error handling completely right, and there are cases where we force the FS readonly when we shouldn’t. So this is a valid bug, I’m just waiting on some review of my fix, which will get backported to a number of kernels.

To summarize, the message I wanted to convey through this post remains valid: if one uses a weak hash function, as is the case in the btrfs file system, one should assume that malicious users can generate zillions of collisions, and one should accordingly write robust code able to handle those collisions efficiently.

Another possibility consists of using a lightweight cryptographic hash function, which turns the search for multi-collisions into a hard task. The security/performance tradeoff to find here is definitely a delicate and hard decision to make.

Finally, the first attack described, i.e., making it impossible to create a given file within a shared directory, remains valid.