Coder in a World of Code

Sunday, 18 November 2018

Poor man's map-reduce

Sometimes, you've just got to say "fuck it" and go with what you've got. I recently had such a night.

I had a large line-oriented input file, like a CSV file, no time, and a slow processing script. The script couldn't be optimised easily because each record needed a network call. I also didn't have the time to upgrade the script to use the language's built-in concurrency or parallelism. I needed answers sooner rather than later; every minute was a minute I was not in bed ahead of an important meeting where I would present the data and my analysis.

I needed to run my script against two different back-end systems, and a quick back-of-the-envelope estimate put the runtime over two hours. Two hours that I did not have.

Long story short, there was no time for caution, and it was do-or-die. The answer, of course, is ~~improv everywhere~~ shell scripting.

To sum up, I had:

A large line-oriented input file
A slow line-oriented processing script
A *nix system
No time for caution

Map-reduce

Map-reduce is a way of processing large input data quickly. The input is split into chunks, processed in parallel, then brought back together at the end.
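For illustration only, the same shape can be sketched in Python. This is not what I did on the night (I used the shell), and process_line is a hypothetical stand-in for the slow per-record work; the chunk size and worker count are arbitrary defaults.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(lines, size):
    """Split a list of lines into consecutive chunks of at most `size` lines."""
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def process_line(line):
    """Hypothetical stand-in for the slow, network-bound work on one record."""
    return line.upper()


def map_chunk(chunk):
    """Map step: process every line in one chunk."""
    return [process_line(line) for line in chunk]


def map_reduce(lines, chunk_size=459, workers=48):
    """Split the input, process the chunks in parallel, then merge the results."""
    chunks = chunked(lines, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_chunk, chunks))
    # Reduce step: flatten, de-duplicate, and sort the per-chunk results.
    return sorted({result for chunk in mapped for result in chunk})
```

Threads suit this sketch because the work is assumed to be network-bound; CPU-bound work would probably want processes instead.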

My problem happened to fit this model, but I had no time to wrestle my script. I split up my input file, ran the script on each part, and then brought the answers back together manually.

Shell to the rescue!

First, I used split -l to break my input file up into chunks:

split -l 459 input.txt

This gave me ~48 files with names like xaa and xab, each with ~460 lines. Each file would take ~1 minute to process at 6 TPS, which was about the maximum throughput of the processing script.
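The back-of-the-envelope numbers can be checked with a few lines of Python. The figures are the approximate ones quoted in this post, and the total assumes every chunk is full, so treat the result as a rough estimate.

```python
LINES_PER_CHUNK = 459   # from `split -l 459`
CHUNKS = 48             # approximate number of chunk files
RATE_TPS = 6            # approximate throughput of one script instance
BACKENDS = 2            # the script had to run against two back-ends

total_lines = LINES_PER_CHUNK * CHUNKS                    # about 22,000 lines
seconds_per_chunk = LINES_PER_CHUNK / RATE_TPS            # a little over a minute
serial_minutes = total_lines / RATE_TPS / 60 * BACKENDS   # just over two hours

print(f"{total_lines} lines, {seconds_per_chunk:.1f} s per chunk, "
      f"{serial_minutes:.0f} min if run serially")
```

Run serially, that is roughly an hour per back-end; run as 48 parallel processes, each back-end takes about as long as a single chunk.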

Next, I launched my script 48 times:

for i in x*
do script.py "$i" > "output-$i" &
done

Off it went, with 48 processes launched in parallel, each writing to its own output file.

Bringing it back together

The files need to be brought back together. In my case, a colleague had written another script to analyse a single output file and generate a report. I needed to join all my files back together.

cat output-x* | sort | uniq > aggregate-output.txt

Error handling

Some lines in the input would cause my processing script to crash. Because I needed to do this process twice, I tried two different approaches:

Let it crash and re-process missed lines later
Code up try/catch/continue.

The first approach turned out to be frustrating and time consuming. I ended up using wc -l to quickly work out which files had failed, re-collecting, re-splitting, and re-running after removing some of the offending lines. This was especially difficult because a single chunk could have multiple "poison" lines, so I ended up going down to chunks with 2 lines. Very annoying.
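The wc -l check amounts to comparing line counts between each chunk and its output. A rough Python equivalent is sketched below, assuming the script writes one output line per input line, that chunks use the default split names (xaa, xab, ...), and that outputs are named output-<chunk>. The incomplete_chunks helper is hypothetical, not something I had on the night.

```python
from pathlib import Path


def count_lines(path):
    """Count lines in a file, roughly like `wc -l`."""
    with open(path, "rb") as handle:
        return sum(1 for _ in handle)


def incomplete_chunks(directory="."):
    """Return names of chunks whose output has fewer lines than the input.

    Assumes chunks are named x?? (the `split` default) and that each
    output file is named output-<chunk>.
    """
    failed = []
    for chunk in sorted(Path(directory).glob("x??")):
        output = chunk.with_name(f"output-{chunk.name}")
        produced = count_lines(output) if output.exists() else 0
        if produced < count_lines(chunk):
            failed.append(chunk.name)
    return failed
```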

The second approach was much better, although it did need a little extra print >>sys.stderr "msg" to list the poison lines. All in, the second approach was quicker in this case.
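A minimal sketch of the try/catch/continue idea, written for Python 3 rather than the Python 2 print syntax quoted above. The process_line function is a hypothetical stand-in for the real per-record work, and the message format is an assumption.

```python
import sys


def process_line(line):
    """Hypothetical stand-in for the real per-record work; may raise."""
    if not line.strip():
        raise ValueError("empty record")
    return line.strip().upper()


def process_all(lines, out=sys.stdout, err=sys.stderr):
    """Process each line, logging poison lines to `err` and carrying on."""
    poisoned = 0
    for number, line in enumerate(lines, start=1):
        try:
            result = process_line(line)
        except Exception as exc:
            # Record the offending line on stderr and continue with the next.
            print(f"poison line {number}: {line.rstrip()!r} ({exc})", file=err)
            poisoned += 1
            continue
        print(result, file=out)
    return poisoned
```

Because the poison lines go to stderr, redirecting stdout to the output file keeps the results clean while the failures can be collected separately.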

Conclusion

In the end, I took the processing down from >2 hours to a couple of minutes with only a few moments 'investment' with the shell.

I would not recommend this to any sane person, except in the fairly narrow circumstances listed above. It's difficult to test, requires manual shell work, and failures can really set you back.

From a software engineering perspective it's ridiculous. On the other hand, it's not stupid if it works.

Sunday, 2 October 2016

systemd: OpenSSL of the future?

OpenSSL has received a lot of flak over the years. Both cryptographic and implementation flaws have been plentiful, and researchers are all over it. Much has been said about the software engineering process that was used to put it all together.

It appears to me that systemd is following a similar path, with another vulnerability turning up in the last few days. A very quick assessment of this vulnerability would place it as a 'medium' (CVSSv3 between 5.5 and 6.5) severity vulnerability. Ayer's post goes on to use this example as another data point in favour of the "systemd is insecure" camp.

OpenSSL is bad. It is security-critical software that has a constant stream of high and critical severity bugs. However, it's not too much of a problem for me. I update the library and restart the affected services of a test system, then on to the production systems. Usually this is just reloading or restarting nginx and sshd. The impact to end-users is small or non-existent. For some people, an OpenSSL bug is much worse; it all depends on the exact nature of the system you run and the particular bug.

On the other hand, we have systemd. We've not seen as many issues so far. But we can't just update and restart systemd, because it is tightly integrated with many components. I feel like this is similar to the libc vulnerabilities we saw a little while back. The maintenance impact is much bigger because we're forced to do a full system restart.

This is one of the many reasons why privilege separation is such a good thing. Every new process gets the updated code. So while the minimal core of the application may contain vulnerable code, it might be easy to show that it is never reached. On the other hand, the privilege-separated workers don't need that much attention, so we just restart them and have them pick up the upgrades. Further, each worker process can be individually sandboxed and allowed to fail.
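As a very rough sketch of the "fresh worker per request" idea, the Python below runs each request in a new process, so a worker always starts from the currently installed code and a crash stays contained. It shows process isolation only: a real privilege-separated design would also drop privileges and sandbox the worker, which is not shown. The worker source and the handle_request name are hypothetical.

```python
import subprocess
import sys

# Hypothetical worker: parses one untrusted request read from stdin.
WORKER_SOURCE = """
import sys
request = sys.stdin.read()
if "crash" in request:
    raise RuntimeError("worker hit a bug")
print(request.strip().upper())
"""


def handle_request(request):
    """Run one request in a fresh worker process and report the outcome.

    Each call starts a new interpreter, so the worker loads whatever code
    is currently installed, and a crash is contained to that worker.
    """
    finished = subprocess.run(
        [sys.executable, "-c", WORKER_SOURCE],
        input=request,
        capture_output=True,
        text=True,
        timeout=30,
    )
    if finished.returncode != 0:
        return None  # The worker failed; the supervisor carries on.
    return finished.stdout.strip()
```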

I agree with Ayer's perspective. Software is written by fallible humans who make mistakes with alarming regularity. When we dive into coding without thinking about the long-term implications, we set ourselves up for failure. If we build a monolith, any RCE bug is fatal. If we fail to sandbox our systems, any path traversal bug is fatal. We should avoid designs where common flaws are fatal.

Engineers need to be thinking about a few core questions:

How important is this system going to be?
How do we make bugs less likely?
How do we make bugs less problematic?

These are the central questions that a software development process answers. These are the questions which we frequently ignore. We purchase time-to-market with steep maintenance costs. If the systemd developers persist in ignoring these deeper issues, I think systemd will be the OpenSSL of the future. A constant stream of high and critical severity issues causing a never-ending headache for those that use it.

If you are starting a project, or running a project, please look at the design of vsftpd and OpenSSH. Look over the various secure software development life-cycles. Think about using a language that supports you. If you're going to push people to use your system, consider the security implications of your decisions and remember that hope is not a strategy.

Saturday, 2 January 2016

Top 5 Security Fails of 2015

Much like every other year, 2015 had a veritable smorgasbord of security breaches and failures. This top 5 list, in chronological order, catalogues the trials and tribulations of security in 2015.

Contributing author: Sephy Hallow.

1. GHOST

Source: openclipart

This year opened with a bit of a bang, with CVE-2015-0235 being announced in January 2015. This vulnerability was branded as "GHOST." This was an issue with a core library which underlies almost every piece of Linux software. A successful attack would result in remote code execution on the target machine, gaining a CVSS score of 10.0 out of a possible 10.

The only saving grace was that it was difficult to determine if a particular piece of software actually used the library in a vulnerable way. As it turned out, very few pieces of software were actually vulnerable, but the difficulty determining that led to a fair few people going into panic mode for a day or two.

Score: 1/5 - All Ghost and No Ghoulies

2. Office of Personnel Management

Source: Office of Personnel Management Seal

The Office of Personnel Management (OPM) breach was announced in June of 2015. Although the number of records exposed was initially estimated at four million, this breach turned out to be a gift that kept on giving, with the estimate ballooning to 18 and finally 21.5 million records. Even better, the records were said to contain highly sensitive information from background checks, including personally identifying information, social security numbers, and even security clearance data.

What made this a real show stopper was the inept response. Putting aside the inability to simply count the number of records compromised, this became a comedy of errors as it was eventually shown that the OPM had been warned several times regarding shoddy security practices. In the aftermath of the attack, OPM set about trying to spread the blame far and wide, and speculated on the identity of the perpetrators rather than fixing their systems.

Score: 5/5 - Bureau Prats

3. Stagefright

Source: Charles Darwin

This year, no stone was left unturned, with security researchers turning their ingenuity to Android. Their efforts uncovered a glorious bounty of not one, not two, but eight vulnerabilities in a single library. Six of the eight vulnerabilities scored the maximum CVSS of 10.0 out of 10, with a 9.3 and a 5.0 thrown in for good measure. The vulnerabilities manifested themselves in the library named libstagefright, which was used for showing media files. A proof-of-concept exploit was developed which triggered the issue by the means of a crafted MMS message, and did not require user interaction.

Obviously, everyone quickly deployed the fix, right? Wrong. In reality we're talking about the Android ecosystem here, with the multiple phone carriers who are well known for not pushing security updates out to users. Oh, and the carriers lock the devices so that users cannot apply the patches themselves. Seems like a winning combination.

Score: 3/5 - Phantom of the Opera-ting System

4. Ashley Madison

Source: No Wedding

Not one to be outdone on the sensitivity of information recovered, The Impact Team leaked some 25GB of customer data from Ashley Madison in August. Who is Ashley Madison, you ask? None other than that upstanding company whose motto is "Life is short. Have an affair." Clearly, their real motto was "Life is short. Security is for losers."

The data included roughly everything: financial information, names, addresses, and details of sexual fantasies. The internet took up harassing and bullying the victims whilst half the criminal underworld attempted to extort the victims. At least one person is known to have committed suicide, having directly cited the leak as their motivation for doing so.

Score: 5/5 - Security Blows

5. TalkTalk

Source: TalkTalk Logo

In October this year, TalkTalk's defences crumbled after coming under an alleged "significant and sustained cyber-attack" and purportedly leaked the details of some four million customers. Back in reality, this was a simple SQL injection attack which could have been conducted by a relatively unskilled teenager in their bedroom. The attackers demanded a ransom of roughly £17 million. Eventually, TalkTalk revised their estimate of the number of records accessed down to approximately two hundred thousand, and the Metropolitan Police arrested a 15 year-old from Northern Ireland in connection with the breach.

After issuing a ransom demand and confusing the heck out of TalkTalk, the attackers either used or sold on the data, resulting in at least one victim losing nearly £3,000 to fraud. Finally, the CEO, Dido Harding, was hauled in front of a Home Affairs Select Committee and asked to account for TalkTalk's (in)actions regarding the incident, including allegations that they had "covered up both the scale and duration of this attack[.]" Nice.

Score: 3/5 - All Talk

Tuesday, 24 November 2015

Inferior Process and Incompetent Developers

In Falhar's recent post, "Everybody is doing TDD," they claim that every developer uses test-driven development (TDD), because they will either automate their tests, or they will manually test their application. They go on to state that those who are manually testing their applications are "fully incompetent." I agree that, with a sufficiently broad definition, almost anyone who tests their programs is undertaking TDD; whether that broadly-defined TDD matches the commonly accepted definition is a different matter. However, I want to argue that those who do not produce automated tests are not necessarily incompetent, but rather that this is a matter of context.

Let's take three developers working on three separate projects.

Developer A is working on a security critical software library. The library implements a well-known cryptographic construction, which is defined in one or more RFC documents. Prior to development, this developer produces an automated test suite which consists of the test vectors from the RFC and property-based randomised tests. They work alone, so there is no code or design review, but they do use commonly available static analysis and code style tools to ensure that their work is consistent and free of "obvious" errors.
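As a sketch of what Developer A's suite might contain, the Python below pairs a known-answer test with a randomised property test. Here hashlib's SHA-256 stands in for the library under test (an assumption for illustration), the known answer is the published digest of "abc", and the property check uses the standard random module rather than a dedicated property-based testing library.

```python
import hashlib
import random

# Known-answer test: the SHA-256 digest of b"abc" from the published
# test vectors (FIPS 180 and RFC 6234).
ABC_DIGEST = "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"


def test_known_answer():
    assert hashlib.sha256(b"abc").hexdigest() == ABC_DIGEST


def test_incremental_matches_one_shot(trials=200, seed=0):
    """Property: hashing in two pieces equals hashing the whole message."""
    rng = random.Random(seed)
    for _ in range(trials):
        message = bytes(rng.randrange(256) for _ in range(rng.randrange(0, 300)))
        cut = rng.randrange(0, len(message) + 1)
        piecewise = hashlib.sha256(message[:cut])
        piecewise.update(message[cut:])
        assert piecewise.digest() == hashlib.sha256(message).digest()
```

A fixed seed keeps the randomised test reproducible, which matters when a failure needs to be investigated later.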

Developer B is a keen gardener, but is very forgetful. In order to ensure that they do not forget to tend their various plants according to a complex schedule, they write a program to help them remember. When run by cron, the program sends them an email with the names of the plants to water. There is no over-arching specification, the requirements are all encoded within the developer's head. If the program fails, the primary impact is that some plants are not watered for a day or two, or the schedule does not work out quite as planned. To develop this program, the developer uses some simple shell scripts, and a single crontab entry.

Finally, we have Developer C. Developer C is working on the control software for a turbofan engine (commonly called a jet engine). They are part of a large team, which includes safety managers, requirements engineers, and so on. The development time scale is on the order of a decade, and starts with requirements gathering, hazard analyses, risk assessments, and so on. Due to the fact that a failed engine could send searing hot fragments of turbine blade into the passenger cabin, the decision is made to formally verify the software. Developers are not expected to test their code; they're expected to write code which can be shown to be equivalent to the specification. Testing is handled by a large and dedicated assurance team, who test both the components, and the system as a whole. The closest to testing that Developer C undertakes is checking that their code and associated proof holds according to the verifier.

It does not make sense to refer to any of the above developers as incompetent, despite the fact that only one of them is practising TDD. Each project calls for differing levels of assurance, and therefore different processes. Each process is completely adequate for the context, and further, it is possible that a single developer undertakes each of the projects outlined, some as part of their hobby, and some as part of their usual employment. There is no incompetence here, just different assurance levels.

TDD is a tool which is available to many developers. Not using TDD does not mark a developer as incompetent. Using a process which is inappropriate for the assurance level required by a project may well result in poor outcomes, but often developers do not decide on the process. In the cases where developers do decide on the process, it may be the case that their choices are guided by forces other than software correctness, such as market forces, management pressure, team familiarity, and so on. There may be cases where the wrong process is used for the situation; this would often be referred to as negligence, and would likely be incompetence.

Saturday, 7 November 2015

Ransomware on Linux

Dr.WEB is reporting that ransomware has come to the Linux ecosystem. Fortunately, this has only affected "tens" of users thus far. In particular, this malware is targeting those with a lot to lose: web site administrators. This gives the malware a good chance of ensnaring some business-critical data or functionality, thereby giving the victim a bit more incentive to pay the ransom.

Ransomware has been around for some time in the Windows ecosystem. Previously these programs would show a dialogue, claiming that the machine was locked and could be unlocked when a suitable payment was made. In reality, these were often just programs configured to run automatically on start-up, and did not directly endanger user data. In recent years, these have made attempts at encrypting the user's data and putting the key out of reach. A prompt payment promises to return the key, and thus the data, to the victim. These have had varying levels of success, with the "best" managing to pull in millions of dollars for their creators. They have not been without flaws that allowed victims to recover their data without paying: some variants stored the key locally on the machine, and some eventually had their keys disclosed by security researchers, although some have yet to be broken. Often, organisations have no option but to pay the ransom.

Fortunately, this particular strain of malware requires extensive user interaction to run, requiring root privileges. This does not prevent future generations of this malware piggy-backing on other access vectors, such as vulnerable web browsers, email clients, web servers, and so on. I would predict that we will see this kind of malware attached to remote exploits in the moderately near future. Even using old exploits, or only encrypting a user's home directory could turn up quite the bounty for the attacker, as those who don't update their systems may well not have suitable backup processes in place to recover from the attack, and many people store their valuable files in their home directory.

There are a few options to mitigate the risk posed by this threat. However, none will be wholly effective, so a combination may be required. For some organisations, this will simply be a strengthening or verification of existing defences. For others, this threat may call for entirely new defences to be deployed.

The first and most common would be to ensure that all systems under your control have all relevant security patches applied. This should limit the likelihood of an exploit being used to launch an attack without user interaction. A backup system which stores backups offline should be used. If an on-line backup system is in use, either deploy an offline system or ensure that a previously saved backup cannot be overwritten by a corrupted copy, or easily reached by an attacker. This will reduce the impact of a breach, as it should be possible to recover from relatively recent backups in the event of a compromise. Where possible, software which consumes untrusted input, such as web browsers, email clients, web servers, and so on, should be placed into a suitable sandbox environment. This should reduce the likelihood that the malware will be able to reach critical business data. Finally, better user education may reduce the likelihood of a breach, as users may be better able to detect social engineering attacks which might otherwise have led them to run the malware.

It is fortunate that Linux has several sandbox mechanisms available, and an appropriate one can be selected. Such mechanisms include chroots, SELinux, AppArmor, or seccomp-bpf. Other systems, such as FreeBSD, should not be considered invulnerable, and similar mitigations should be applied, such as the use of jails or Capsicum. Unfortunately, restricting a complex web browser's access to the file system may have unexpected consequences, or simply be very time consuming. Ubuntu provides an AppArmor profile to do this for Chromium. However, it is not without its issues, such as not being able to determine if it is the default browser on the system.

Saturday, 8 August 2015

SQLite and Testing

Categorical claims are often the source of faulty statements. "Don't test with SQLLite [sic] when you use Postgres in Production" by Robellard is a fantastic example. I actually agree with a variant of this statement: "If you need high levels of assurance, don't test with SQLite alone when you use Postgres in production."

Robellard bases his claim on several points, noting that "SQLite has different SQL semantics than Postgres," "SQLite has different bugs than Postgres," and "Postgres has way more features that SQLite." He has a couple more points, but all of these largely amount to a discrepancy between SQLite and Postgres, or between one Postgres version and another, leading to a defect. These points are a genuine concern, but his claim relies on using exactly one database back-end for testing, and exactly one risk profile for various applications.
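One concrete instance of the "different SQL semantics" point can be sketched with Python's built-in sqlite3 module. The table and values are made up for illustration. An ordinary (non-STRICT) SQLite table accepts text in a column declared INTEGER, whereas Postgres would reject the same insert as invalid input for an integer column.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE items (name TEXT, quantity INTEGER)")

# SQLite uses type affinity: text that cannot be converted to an integer is
# stored as text, even though the column is declared INTEGER.
connection.execute("INSERT INTO items VALUES (?, ?)", ("widget", "many"))

stored = connection.execute(
    "SELECT quantity, typeof(quantity) FROM items"
).fetchone()
print(stored)  # ('many', 'text')
```

A test suite backed only by SQLite would not notice code that writes the wrong type here, which is exactly the kind of discrepancy at issue.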

As a quick diversion, I am not using the common definition of risk which is synonymous with chance. I am using a more stringent definition: "the effect of uncertainty on objectives" as specified in ISO Guide 73:2009. This definition often requires an assessment of both the impact and likelihood of some form of scenario to obtain a fuller picture of an "effect."

If the risk posed by defects caused by an SQLite-Postgres discrepancy is too high, then you'll likely want to use Postgres as part of your testing strategy. If the risk posed is sufficiently low, then SQLite alone may be appropriate. These are predicated on the risk posed by defects, and the organisational appetite for risk.
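To make that decision concrete, here is a deliberately simple sketch in Python. The 1 to 5 ordinal scales, the multiplication, and the appetite threshold are all illustrative assumptions; they are not taken from ISO Guide 73, and real assessments are usually more nuanced.

```python
def risk_score(likelihood, impact):
    """Combine ordinal likelihood and impact ratings (1 to 5) into a score."""
    for rating in (likelihood, impact):
        if rating not in range(1, 6):
            raise ValueError("ratings must be integers from 1 to 5")
    return likelihood * impact


def needs_postgres_tests(likelihood, impact, appetite):
    """Return True when the scored risk exceeds the organisation's appetite."""
    return risk_score(likelihood, impact) > appetite
```

For example, a likely and damaging discrepancy (4, 5) exceeds an appetite of 10, while an unlikely and minor one (1, 2) does not.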

A testing strategy comprising several different testing methodologies can often be thought of as a filter of several layers. Different layers are variously better or worse at surfacing different types of defects. Some are more likely to surface defects within components, and others are better at locating defects in the interactions between components. Other "layers" might be useful for catching other classes of defects. Each layer reduces the likelihood of a defect reaching production, which reduces the risk that defects pose. Each layer also has a cost associated with writing and maintaining that layer.

It's quite common for different layers to be run at different times. For instance, mock-based unit tests might be run very frequently by developers. This provides the developers with very quick feedback on their work. Integration tests backed by an in-memory database might be run prior to committing. These take a little longer to run and so might get run less often, but still catch most problems caused by erroneous component interactions. A continuous integration (CI) server might run integration tests backed by Postgres, and slower UI tests periodically. Finally, penetration tests might be conducted on a yearly or six-monthly basis.
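As an example of the in-memory integration layer, the sketch below uses Python's unittest and sqlite3 modules. The add_user function and the users table are hypothetical stand-ins for real application code.

```python
import sqlite3
import unittest


def add_user(connection, name):
    """Code under test: insert a user and return the new row count."""
    connection.execute("INSERT INTO users (name) VALUES (?)", (name,))
    return connection.execute("SELECT COUNT(*) FROM users").fetchone()[0]


class AddUserIntegrationTest(unittest.TestCase):
    """Fast integration layer: a fresh in-memory database per test."""

    def setUp(self):
        self.connection = sqlite3.connect(":memory:")
        self.connection.execute("CREATE TABLE users (name TEXT NOT NULL)")

    def tearDown(self):
        self.connection.close()

    def test_add_user_increments_count(self):
        self.assertEqual(add_user(self.connection, "ada"), 1)
        self.assertEqual(add_user(self.connection, "grace"), 2)
```

Tests like this run in milliseconds, which is what makes them cheap enough to run before every commit; the slower Postgres-backed run can then be left to the CI server.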

This sort of process aims to allow developers the flexibility to work with confidence by providing quick feedback. However, it also provides heavier-weight checking for the increased levels of assurance required for the risk-averse organisation. An organisation with a greater appetite for risk may remove one or more of those layers, such as in-memory integration tests, to speed development. This saves them money and time but increases their exposure to risk posed by defects.

SQLite is just a tool which may be used as part of one's testing strategy. Declaring "Don't test with SQLLite [sic] when you use Postgres in Production" ignores how it may be usefully applied to reduce risk in a project. In many cases SQLite is entirely appropriate, as the situation simply does not require high levels of assurance. In other cases, it may form part of a more holistic approach alongside testing against other database backends, or be removed entirely.

Not every organisation is NASA, and not every project handles secrets of national import. Most failures do not kill people. An honest assessment of the risks would ideally drive the selection of the testing strategy. Oftentimes this selection will be balanced against other concerns, such as time-to-market and budget. There is no silver bullet. A practical, well-rounded solution is often most appropriate.

Saturday, 25 July 2015

Infosec's ability to quantify risk

In Robert Graham's latest post, "Infosec's inability to quantify risk," Graham makes the following claim:

"Infosec isn't a real profession. Among the things missing is proper "risk analysis". Instead of quantifying risk, we treat it as an absolute. Risk is binary, either there is risk or there isn't. We respond to risk emotionally rather than rationally, claiming all risk needs to be removed. This is why nobody listens to us. Business leaders quantify and prioritize risk, but we don't, so our useless advice is ignored."

I'm not going to get into a debate as to the legitimacy of infosec as a profession. My job entails an awful lot of infosec duties, and there are plenty of folks turning a pretty penny in the industry. It's simply not my place to tell people what they can and cannot define as a "profession."

However, I do take issue with the claim that the infosec community lack proper risk analysis tools. We have risk management tools coming out of our ears. We have risk management tools at every level. We have those used at the level of design and implementation, for assessing the risk a vulnerability poses to an organisation, and even tools for analysing risk at an organisational level.

At the design and implementation level, we have software maturity models. Many common ones explicitly include threat modelling and other risk assessment and analysis activities.

One of the explicit aims of the Building Security in Maturity Model (BSIMM) is "Informed risk management decisions." Some activities in the model include "Identify PII obligations" (CP1.2) and "Identify potential attackers" (AM1.3). These are the basic building blocks of risk analysis activities.

The Open Software Assurance Maturity Model (OpenSAMM) follows a similar pattern, including a requirement to "Classify data and applications based on business risk" (SM2) and "Explicitly evaluate risk from third-party components" (TA3).

Finally, the Microsoft Security Development Lifecycle requires that users "Use Threat Modelling" to "[...] determine risks from those threats, and establish appropriate mitigations." (SDL Practice #7).

So, we can clearly see that risk analysis is required during the design and implementation of a system. Although no risk management methodology is prescribed by the maturity models, it's easy to see that we're clearly in an ecosystem that's not only acutely aware of risk, but also the way those risks will impact organisational objectives.

If these maturity models fail to produce adequately secure software, we need to understand how bad a vulnerability is. Put simply, statements like "On the scale of 1 to 10, this is an 11" are not useful. I understand why such statements are sometimes necessary, but I worry about the media becoming fatigued.

Vulnerabilities are classified using one of several methods. Off the top of my head, I can think of three:

Common Vulnerability Scoring System (CVSS)
DREAD Risk Assessment Model (Wikipedia)
STRIDE (Wikipedia)

These allow for those with infosec duties to roughly determine the risk that a vulnerability may pose to their organisation. Put simply, they allow for the assessment of the risk posed to one's systems. They are a (blunt) tool for risk assessment.
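As a small illustration of how blunt but usable these tools are, the Python below maps a CVSS v3 base score onto the qualitative severity bands defined by the CVSS v3 specification. The function name is my own; computing the base score itself from the metric vector is not shown.

```python
def cvss_v3_rating(score):
    """Map a CVSS v3 base score to its qualitative severity rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError("CVSS scores range from 0.0 to 10.0")
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"
```

A rating like this is only a starting point; the environmental context of one's own systems still has to be applied on top.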

Finally, there are whole-organisation mechanisms for managing risks, which are often built into an Information Security Management System (ISMS). One of the broadest ISMS standards is BS ISO/IEC 27001:2013, which states:

"The organization shall define and apply an information security risk assessment process [...]"

If this seems a bit general, you should be aware that an example of a risk management process (which includes mechanisms for risk assessment & analysis) is available in BS ISO/IEC 27005:2011.

Let's look at the CERT Operationally Critical Threat, Asset, and Vulnerability Evaluation (OCTAVE) Allegro technical report:

"OCTAVE Allegro is a methodology to streamline and optimize the process of assessing information security risks [...]"

Similarly, Appendix A provides guidance on risk management, which includes sections on risk assessment and analysis.

Yet another standard is NIST SP800-30 Revision 1, "Guide for Conducting Risk Assessments". It states its purpose quite clearly in section 1.1, "Purpose and Applicability":

"The purpose of Special Publication 800-30 is to provide guidance for conducting risk assessments [...]"

NIST SP800-30 Revision 1 also provides an example of how to conduct a risk assessment.

As you can see, members of the infosec community have quite a few tools for risk assessment and analysis at our finger-tips. From the design and implementation of software, through to the assessment of individual vulnerabilities, and even for assessing, analysing, and mitigating organisational risk, we're well equipped.

The infosec community is often very bad at communicating, and the media likes a salacious story. How often have you heard that a cure for cancer has been found, sight returned to the blind, and teleportation achieved? Recently, members of the infosec community have played into this, but that does not eliminate the fact that we do have tools for proper risk management. Our field is not so naive that we blindly believe all risk to be unacceptable.