Source Code for /public/appendix-security/index.php

<?php
// Include the functions.php file, which provides functions we use all over the place:
require('../inc/functions.php');

// Render our HTML header, including this lesson's title
add_header('Appendix B: Security');
?>

<p>
  When you're making your website "mildly dynamic" with PHP, these are the security principles you need to be aware of:
</p>

<h2 id="xss">Don't trust user input! (XSS)</h2>
<p>
  Any data that comes from a user, whether you find it in <code>$_GET</code>, <code>$_POST</code>, <code>$_COOKIE</code>,
  or even e.g. their browser's name and version number (<code>$_SERVER['HTTP_USER_AGENT']</code>), cannot be trusted.
  It can <em>always</em> be tampered with by a malicious user. So:
</p>
<ul>
  <li>
    If you're sending (e.g. <code>echo</code>) user input back to the user, or to another user, you should <em>escape</em>
    it first:
    <ul>
      <li>
        <code>htmlspecialchars()</code> is the best solution in almost all circumstances if you're outputting to a web page, e.g.:<br>
        <code>Thanks for visiting my site, &lt;?php echo htmlspecialchars( $_GET['name'] ); ?&gt;!</code>
      </li>
      <li>
        If you're outputting <em>into an HTML attribute</em>, you must also make sure that the attribute's value is properly wrapped in quotes, e.g.:<br>
        <code>&lt;img src="&lt;?php echo htmlspecialchars( $_POST['profile_pic_url'] ); ?&gt;"&gt;</code>. If you miss the quote marks, a user could
        submit both the image file name <em>and</em> an extra attribute like <code>myfile.gif onload=...</code>, adding their own JavaScript
        to your site. (Stick to double quotes: before PHP 8.1, <code>htmlspecialchars()</code> didn't escape single quotes unless you passed
        <code>ENT_QUOTES</code>.)
      </li>
      <li>
        If you want users to be able to format what they submit, consider requiring them to use a safer markup language like Markdown or
        BBCode rather than raw HTML, and then convert that to HTML for display.
      </li>
      <li>
        If BBCode/Markdown isn't an option, you'll need to <em>sanitize</em> the HTML you're given by users. Note that
        while <code>strip_tags()</code> can remove tags that aren't on your allowlist, it leaves the attributes of allowed tags untouched
        (so e.g. <code>&lt;b onmouseover="..."&gt;</code> gets through if you allow <code>&lt;b&gt;</code>), and so it's still
        vulnerable to attacks: if you're handling user HTML, consider a library like
        <a href="http://htmlpurifier.org/">HTML Purifier</a>.
      </li>
    </ul>
  </li>
  <li>
    If you're expecting user input in a particular format, you should <em>validate</em> it before use. For example:
    <ul>
      <li>
        If you're expecting one of a set of values, <code>in_array()</code> (with its third, "strict" argument set to <code>true</code>) is helpful, e.g.:<br>
        <code>if( in_array( $_POST['fave'], [ 'Rumi', 'Mira', 'Zoey' ], true ) ) { $fave_demon_hunter = $_POST['fave']; }</code>.
      </li>
      <li>
        If you're expecting a number, something like <code>$age = intval( $_POST['age'] );</code> will coerce it into an integer
        (it'll be <code>0</code> if they put something completely invalid in).
      </li>
      <li>
        If you're expecting an email address, <code>$email = filter_var( $_POST['email'], FILTER_VALIDATE_EMAIL );</code> does
        a good job of checking it looks like a valid email address. It returns <code>false</code> if it's invalid.
        <code><a href="https://www.php.net/manual/en/function.filter-var.php">filter_var()</a></code> has other helpful
        filters for checking things like URLs, IP addresses, and more.
      </li>
    </ul>
  </li>
</ul>
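<p>
  Putting some of those ideas together, here's a minimal sketch of handling a hypothetical form with <code>name</code>,
  <code>fave</code> and <code>age</code> fields: values that must come from a known list or be numbers are <em>validated</em>,
  and free text is <em>escaped</em> as it's output. The <code>e()</code> and <code>render_greeting()</code> functions are
  made-up names for illustration, not part of PHP.
</p>
<pre><code>&lt;?php
// Hypothetical helper: escapes text for HTML content or a quoted attribute.
// ENT_QUOTES escapes both " and ' (older PHP versions only escaped " by default).
function e( string $text ): string {
  return htmlspecialchars( $text, ENT_QUOTES, 'UTF-8' );
}

// Hypothetical function: builds a greeting from untrusted form data (e.g. $_POST)
function render_greeting( array $input ): string {
  // Validate: only accept one of a known set of values (strict comparison)
  $fave = $input['fave'] ?? '';
  if( ! in_array( $fave, [ 'Rumi', 'Mira', 'Zoey' ], true ) ) {
    $fave = 'nobody in particular';
  }

  // Validate: coerce the age to an integer (0 if it's nonsense)
  $age = intval( $input['age'] ?? 0 );

  // Free text can contain anything, so accept it, but escape it on output
  $name = is_string( $input['name'] ?? null ) ? $input['name'] : 'stranger';

  return '&lt;p&gt;Thanks for visiting, ' . e( $name ) . ' (age ' . $age . ')! '
    . 'Your favourite demon hunter is ' . e( $fave ) . '.&lt;/p&gt;';
}

echo render_greeting( $_POST );
</code></pre>
<p>
  This sketch quietly replaces invalid choices with a safe default; depending on your form, you might prefer to show an error and
  ask the user to try again.
</p>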

<h2 id="xsrf">Before you change anything, ensure the user intended it (XSRF)</h2>
<p>
  A whole class of attacks is based on the idea that you can trick a user into doing something without meaning to. E.g.
  if your guestbook takes what comes from the <code>&lt;form&gt;</code> and adds it as an entry, what's to stop <em>my</em>
  site from embedding a form that submits to <em>your</em> guestbook? (If that's not something you care about, that's fine,
  but you should ask the question each time: a button that deletes a user's account might be much more serious!)
</p>
<p>
  The simplest way to prevent this kind of attack is to give the user a random "token" when they visit the form, and then check
  for the token when they submit it. Because every user gets a different, unguessable token (and another site can't read the one
  on your page), an attacker can't build a form that includes a valid token for their victim. Here's how you might do that:
</p>
<ol>
  <li>
    On your form page, create a token and store it in a session variable (so it's available on the next page too; remember
    to call <code>session_start()</code> before any output), e.g.:<br>
    <code>&lt;?php $_SESSION['token'] = bin2hex( random_bytes( 32 ) ); ?&gt;</code>
  </li>
  <li>
    Put that token into a hidden field on your form, e.g.:<br>
    <code>&lt;input type="hidden" name="token" value="&lt;?php echo $_SESSION['token']; ?&gt;"&gt;</code>
  </li>
  <li>
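    When the form is submitted, check that the token is present and matches the one you stored in the session, e.g.:<br>
    <code>if( empty( $_SESSION['token'] ) || ! hash_equals( $_SESSION['token'], $_POST['token'] ?? '' ) ) { die( 'XSRF: Invalid token. Please go back and try again.' ); }</code><br>
    Checking that the stored token isn't empty stops a missing token from "matching" another missing token, and
    <code>hash_equals()</code> compares the two strings in constant time, which is the recommended way to compare secrets.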
  </li>
</ol>
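<p>
  The three steps above might fit together like this. It's a minimal sketch that assumes PHP sessions work on your host;
  <code>csrf_token()</code> and <code>csrf_token_is_valid()</code> are hypothetical helper names, not built-in functions.
  For brevity the form page and the receiving page are shown in one script.
</p>
<pre><code>&lt;?php
// Sessions must be started before any output is sent
session_start();

// Hypothetical helper: creates a token once per session and returns it
function csrf_token(): string {
  if( empty( $_SESSION['token'] ) ) {
    $_SESSION['token'] = bin2hex( random_bytes( 32 ) );
  }
  return $_SESSION['token'];
}

// Hypothetical helper: true only if a token was submitted and it matches the stored one
function csrf_token_is_valid( $submitted ): bool {
  if( empty( $_SESSION['token'] ) || ! is_string( $submitted ) ) {
    return false;
  }
  return hash_equals( $_SESSION['token'], $submitted ); // constant-time comparison
}

// On the page that receives the form: check the token before changing anything
if( ( $_SERVER['REQUEST_METHOD'] ?? 'GET' ) === 'POST' ) {
  if( ! csrf_token_is_valid( $_POST['token'] ?? null ) ) {
    http_response_code( 403 );
    die( 'XSRF: Invalid token. Please go back and try again.' );
  }
  // ...it's now reasonably safe to act on the submission...
}

// On the form page: put the token in a hidden field
echo '&lt;input type="hidden" name="token" value="' . csrf_token() . '"&gt;';
</code></pre>
<p>
  The token is hex-encoded, so it's safe to echo into the attribute without further escaping.
</p>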

<h2 id="path-traversal">Don't give users the files they ask for without checking first (path traversal)</h2>
<p>
  In <a href="/05-accepting-input/#fruits-and-filters">an example in lesson 5</a>, we allowed users to choose a fruit file
  on the server. We kept this safe by only letting users choose from a pre-approved list of fruit files: the browser sent just the ID
  number of the fruit they wanted, and we looked up the matching file by that ID number on the server.
</p>
<p>
  Furthermore, we used a simple <code>&lt;img&gt;</code> tag to display the fruit image, so the user could only see files that they
  could already access over the Web anyway. We didn't have PHP read the file <em>for</em> them (PHP could potentially have permission
  to read any file on your server, even the ones not intended to be shared on the Web, like your logs).
</p>
<p>
  If we needed to allow users to specify the file name themselves, we'd need to check that the requested file is within the
  allowed directory. Here's how we might do that:
</p>
<ol>
  <li>
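    Run <code>realpath()</code> on both the path the user requested and the path within which they're allowed to request files.
    (It returns <code>false</code> if the path doesn't exist, so treat that as a rejection too.)
  </li>
  <li>
    Ensure that the start of the requested path matches the entire allowed path, followed by a directory separator (otherwise
    e.g. <code>/srv/fruit-secrets/key.txt</code> would look like it's inside <code>/srv/fruit</code>).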
  </li>
</ol>
<p>
  You can see an example of this in <a href="/source-viewer.php?file=public/inc/functions.php">the <code>sanitize_file_path()</code>
  function used on this site</a>, which rejects any paths that are outside of the application directory of this site. To do this,
  it first runs <code>realpath()</code> on both the "allowed" path and the "requested" path, which turns them into full and absolute
  paths without e.g. any <code>../</code> sequences. Then it uses <code>strpos()</code> to ensure that the requested path
  <em>begins with</em> (i.e. is within) the allowed path; if not, it throws a 403 (Forbidden) error.
</p>
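<p>
  Here's a minimal sketch of those two steps. It is <em>not</em> this site's actual <code>sanitize_file_path()</code>:
  <code>resolve_user_file()</code> is a hypothetical name, and it returns <code>false</code> rather than sending a 403 itself,
  so you can decide how to respond.
</p>
<pre><code>&lt;?php
// Returns the real path of $filename if it's inside $allowed_dir, or false otherwise.
function resolve_user_file( string $allowed_dir, string $filename ) {
  // realpath() resolves ../ sequences and symlinks, and returns false
  // if the path doesn't exist
  $allowed   = realpath( $allowed_dir );
  $requested = realpath( $allowed_dir . DIRECTORY_SEPARATOR . $filename );
  if( $allowed === false || $requested === false ) {
    return false;
  }

  // Add a trailing separator so that e.g. /srv/fruit-secrets/key.txt
  // isn't mistaken for being inside /srv/fruit
  $prefix = rtrim( $allowed, DIRECTORY_SEPARATOR ) . DIRECTORY_SEPARATOR;
  if( strpos( $requested, $prefix ) !== 0 ) {
    return false;
  }
  return $requested;
}
</code></pre>
<p>
  For example, <code>resolve_user_file( __DIR__ . '/fruit', $_GET['file'] ?? '' )</code> would return the full path of a file inside
  a hypothetical <code>fruit</code> directory, or <code>false</code> for anything missing or outside it, in which case you might
  call <code>http_response_code( 403 )</code> and stop.
</p>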

<?php draw_potion( 'green' ); ?>

<h2 id="cookies">How can cookies be secured?</h2>
<p>
  <a href="/06-retaining-state/">Lesson 6</a> demonstrated the use of the <code>setcookie()</code> function to set a cookie, which can
  then be retrieved on subsequent requests via <code>$_COOKIE</code>. But users can tamper with cookies, so you don't want to trust
  them blindly.
</p>
<p>
  (Note that PHP's built-in sessions, accessed via <code>$_SESSION</code>, are already protected against this, so you don't need to
  worry about them: the session cookie holds only a random ID, and the data itself stays on your server.)
</p>
<p>
  There are two major approaches to securing cookies:
</p>
<ul>
  <li>
    <strong>Keep the data on the server</strong>: come up with a long random number and give <em>that</em> to the user in their cookie.
    Store the data on your server associated with that number. When the user comes back, you can use the number to look up the data you
    stored. This prevents users from seeing the data or tampering with it, because it's stored on your server. To impersonate somebody
    else they'd need to guess the long number their victim had been given.
    (This is the approach used by <code>$_SESSION</code> for session cookies.)
  </li>
  <li>
    <strong>Sign or encrypt the data</strong>: a second approach is to cryptographically protect the data in the cookie. You can either
    add a "signature" (which lets the user <em>view</em> the contents of their cookie, but lets you detect if they tamper with
    it) or fully encrypt it (which prevents them from even seeing its contents). In either case, the signature or encryption must be
    done using a key that's known only to your server.
  </li>
</ul>
<p>
  Both approaches are valid. Storing the data on the server is usually preferred because it's less complex, but if you don't have a
  way to consistently store data, or if your website is spread over multiple servers that don't share a common data store, then you
  might prefer signing or encrypting cookies.
</p>
<p>
  Making (non-session) cookies tamper-proof goes beyond the scope of this cookery class!
</p>
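<p>
  Still, to give a flavour of the "sign" approach, here's a minimal sketch using an HMAC (a keyed hash) from PHP's built-in
  <code>hash_hmac()</code>. <code>COOKIE_SECRET</code>, <code>sign_cookie_value()</code> and <code>verify_cookie_value()</code> are
  hypothetical names, and the secret shown is a placeholder: a real one must be long, random, and never visible to users. The value
  isn't encrypted, so users can still read it; they just can't change it without the signature failing to match.
</p>
<pre><code>&lt;?php
// Hypothetical placeholder: load a long random secret known only to your server
const COOKIE_SECRET = 'replace-with-a-long-random-secret';

// Returns "value.signature"; the value is base64-encoded so it can't contain a '.'
function sign_cookie_value( string $value ): string {
  $encoded   = base64_encode( $value );
  $signature = hash_hmac( 'sha256', $encoded, COOKIE_SECRET );
  return $encoded . '.' . $signature;
}

// Returns the original value if the signature matches, or null if it was tampered with
function verify_cookie_value( string $cookie ): ?string {
  $parts = explode( '.', $cookie, 2 );
  if( count( $parts ) !== 2 ) {
    return null;
  }
  [ $encoded, $signature ] = $parts;
  $expected = hash_hmac( 'sha256', $encoded, COOKIE_SECRET );
  if( ! hash_equals( $expected, $signature ) ) {
    return null;
  }
  $value = base64_decode( $encoded, true );
  return $value === false ? null : $value;
}
</code></pre>
<p>
  You might use it as <code>setcookie( 'theme', sign_cookie_value( 'dark' ) );</code> and later
  <code>$theme = verify_cookie_value( $_COOKIE['theme'] ?? '' ) ?? 'light';</code>. A signature doesn't stop a user from
  replaying an older cookie that you signed yourself, so this alone isn't enough for anything like a login.
</p>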

<?php
// Render our HTML footer
add_footer();
?>