Category: meta

  • WordPress: There and Back Again

    I am horrible at settling on which blogging software to use. Over the years, I have gone from WordPress to Ghost, Jekyll to Hyde. Most recently I did a lot of work to make the jump from WordPress to Jekyll to reduce my hosting costs, only to realize the barrier to writing had become higher, and that’s just not the hurdle my neurodivergent brain wants when I’m in the mood to write long-form.

    Love it or hate it, WordPress has put a lot into making its WYSIWYG editor actually, well, what you get. I personally like the blocks and the composability of the post or page you are working on (compared to the “classic” editor, as well as other WYSIWYG platforms like Drupal or Joomla, but that’s a discussion for another time).

    WordPress 2: The Electric Boogaloo feat. ActivityPub

    Okay so it’s not really my second install of WordPress, but I can say it’s my second one carrying the older content of this iteration (please don’t hurt me, I just wanted to use the subtitle).

    The thing that really pushed me over the edge, though, was the possibility of integrating WordPress with ActivityPub (via a plugin) and having my posts show up on my Mastodon timeline.

    While the discussion of the death of Twitter is for another post, I hope it is a catalyst to usher in an age of interoperability of various platforms under the umbrella of ActivityPub, and there are signs that it may be headed this way (beyond Mastodon’s explosive growth).

    Tumblr? I thought you were dead

    Tumblr was not a name I had on my 2022 bingo card, but Matt has been making moves. Back in 2019, Automattic picked up Tumblr at a bargain from Verizon (which had inherited it through its Yahoo acquisition). I, like many others, kinda just filed that news into the archives of my brain.

    “Huh, neat. I like Matt Mullenweg and Automattic. Hope they do good things with it.”

    – Me, 2019

    The Reports of my Death are Greatly Exaggerated

    And good things they did. Matt announced at the beginning of this month that they would be reversing the ban on nudity, and more recently that they would be adding support for ActivityPub. Obviously the timing plays into the death of the Bird App, but these are good moves nevertheless.

    The reason I mention this in a post about WordPress is that I have a small inkling they will make ActivityPub a core part of WordPress. I hope so. This is pure conjecture, and if they have already started work or announced that they are going to, that’s pure coincidence with when I am writing this.

    I don’t mind a few plugins, but I like to keep my install lean and would prefer to use functionality built into WordPress, as there’s a support path going forward. A first-party plugin would also be acceptable, as I can understand if not everyone wants that social functionality built into their blog.

    But the Bots!

    Yep, I am fully aware that bots try all day long to get into WordPress sites. I see them in my CloudFront statistics trying to log in to sites that are just static files on S3.

    Bots trying their hardest but failing

    I wanted a setup where I could still reach the actual server hosting the stack without issue, while the public is served entirely through a CDN (to help with global distribution) and access to the admin panel is blocked.

    The actual WordPress site is a Docker stack, configured through a docker-compose.yml file, completely self-contained. It’s fronted by Caddy, which handles the origin’s SSL.

    The tricky bit is “serving” the site using only the one hostname so I could access the origin directly just by setting a DNS override in my router to point at the actual origin, but not otherwise exposing the site to any strange passerby (i.e. not running it as the “default” site).

    I set Caddy to serve the WordPress container only for the blog’s hostname (blog.emily.sh), plus a second domain that serves a dummy blank page (only a single line in Caddy!). CloudFront doesn’t accept naked IPs as an origin, so that dummy domain is what I configured as the origin; it resolves to the same IP, which means CloudFront’s requests still land on the real server. You could alternatively use the EC2-generated hostname, but I apparently forgot to turn that on for the VPC subnet I deployed into, and any other domain pointing to that IP works just as well.

    Here is an example Caddyfile, though I replaced the dummy domain with an example one so it’s not that easy to find the origin.

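    # Only requests carrying this Host header reach WordPress.
    blog.emily.sh {
            root * /var/www/html
            php_fastcgi wordpress:9000
            file_server
    }

    # Placeholder site that CloudFront uses as its origin domain name.
    dummyhost.this.is.yours {
            respond "Hello"
    }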
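
    The post relies on a DNS override in the router to reach the origin directly. As an alternative for a quick one-off check, a small Python sketch like the one below could connect to the origin IP while presenting the blog hostname for both TLS (SNI) and the Host header, which is what decides which Caddy site answers. The function names and the IP in the usage comment are hypothetical, and the sketch assumes the origin listens on port 443 with a certificate valid for the hostname.

```python
import socket
import ssl


def build_head_request(hostname: str, path: str = "/") -> bytes:
    """Build a bare HTTP/1.1 HEAD request that names the desired site in Host."""
    return (
        f"HEAD {path} HTTP/1.1\r\n"
        f"Host: {hostname}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")


def status_line_for(ip: str, hostname: str, path: str = "/", timeout: float = 5.0) -> str:
    """Connect to `ip` on port 443, present `hostname`, and return the HTTP status line."""
    context = ssl.create_default_context()
    with socket.create_connection((ip, 443), timeout=timeout) as raw:
        # server_hostname sets SNI and is also used to verify the certificate.
        with context.wrap_socket(raw, server_hostname=hostname) as tls:
            tls.sendall(build_head_request(hostname, path))
            with tls.makefile("rb") as reader:
                return reader.readline().decode("latin-1").strip()


# Usage (hypothetical origin IP from the documentation range):
#   status_line_for("203.0.113.10", "blog.emily.sh")
```

    Nothing here bypasses the protection described in this post; it only works for someone who already knows the origin IP.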

    I created a Lambda@Edge function to rewrite the ‘Host’ header in flight before hitting the origin so Caddy knows to actually serve my blog instead of the dummy site. I wanted to write this as a CloudFront function directly, but they only allow you to attach those to Viewer Request and Viewer Response events, and I needed to rewrite the headers on the Origin Request event.
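
    A minimal sketch of what such an origin-request function could look like in Python follows. The post doesn’t say which runtime or code was actually used, so the handler name and the BLOG_HOST constant are assumptions for illustration. It relies on the standard Lambda@Edge event shape, where the request sits under Records[0].cf.request and headers are keyed by lowercase name.

```python
# Assumed value: the hostname Caddy is configured to serve WordPress on.
BLOG_HOST = "blog.emily.sh"


def handler(event, context):
    """Origin-request handler: swap the Host header before CloudFront calls the origin."""
    request = event["Records"][0]["cf"]["request"]

    # Lambda@Edge headers map a lowercase name to a list of {key, value} pairs.
    request["headers"]["host"] = [{"key": "Host", "value": BLOG_HOST}]

    # Returning the request tells CloudFront to continue on to the origin with it.
    return request
```

    CloudFront still connects to the dummy domain configured as the origin, but the request that arrives at Caddy now names the blog, so the WordPress site block is the one that answers.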

    For a sort of visual:
    Viewer Request → CloudFront → Lambda@Edge → Origin Request

    Et voilà, some obfuscation from the bots. Don’t get me wrong, it’s not perfect, and it’s easy enough to work around if you know a certain IP is the origin, but it’s good enough to cut down on a lot of the chatter while letting me bypass the CDN completely and still be secure. Security is about layers though, and this is just one of many (I know, I know, security through obscurity isn’t actual security, but tell me how many fewer bots you get when you don’t run SSH on 22).

    A .well-known Aside

    The .well-known route should pass through without issue, though I made a behavior for it so that it is not cached at all on the CDN. This way, when it comes time to renew the origin SSL cert, Caddy should have no issue, though I haven’t tested that flow yet. I do know ActivityPub correctly populates its .well-known entry, so I do not think there will be any issue with SSL renewals.

    Likewise, I have a behavior for the admin route that simply requires any request to be signed with IAM credentials authorized to make that call on my AWS account. Since those are kinda hard to come by, I deem it an acceptable way to deal with it. I could have written a Lambda to immediately return any given status code, or pointed the route at S3 hosting a static page, but why pay for the compute and/or storage to make that happen? If someone who’s not me has access to my AWS account, I have much bigger things to worry about than someone bypassing my CDN.

    There will be a tutorial on doing this for your blog soon! Just need to find the time to write it.