Category Archives: Web dev

Getting rsync to delete old non-empty directories containing excluded files

For some older projects we’re still deploying code with rsync. It's not perfect, but it works. Temporary files are excluded using --exclude-from=exclude.txt, which is great until the parent folder of an excluded file needs to be deleted.

For example, say your file structure looks like this:
[Figure: Rsync delete file structure]

Then you remove the data/cache directory completely from the source and sync again:

$ rm -rf data/cache
$ rsync -a --delete --exclude="*.tmp" --exclude="data/logs/*.log" src/ dest/

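You’re going to get the error “cannot delete non-empty directory: data/cache” because data/cache/config.tmp still exists on the destination. It matches the *.tmp exclude, so rsync won't delete it, and therefore can't remove its parent directory either.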

Unfortunately, --delete-excluded won’t help here: it would also delete the log files and any other .tmp files created on the destination, which you want to keep.

Fortunately, rsync has filter rules (--include and --exclude are just shorthand for them), and they support a “perishable” modifier. Perishable excludes behave exactly as required: matching files are excluded from the sync and aren’t deleted on the destination, unless they are in a directory that is being deleted because it no longer exists on the source.

The syntax for excluding files becomes:
$ rsync -a --delete --filter="-,p *.tmp" --filter="-,p data/logs/*.log" src/ dest/

The “-” signifies the filter is to exclude matches. The “p” makes the exclude perishable.

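If, like us, you were using --exclude-from for the patterns, you can keep the same file and load it with a merge rule, where “-” treats every line as an exclude pattern and “p” makes them all perishable: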
$ rsync --filter="merge,p- /home/ideal/scripts/push/excludes/lenta.txt"
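If your deploy script builds the rsync command itself, the same conversion can be done in code. Here is a minimal PHP sketch, assuming the patterns arrive one per line from your own exclude list. perishableFilterArgs() is a hypothetical helper, and the merge rule above remains the simpler option when you already have a file.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: turn exclude patterns into perishable rsync filter
 * arguments ("-,p PATTERN"), each escaped so it can be passed through a shell.
 */
function perishableFilterArgs(array $patterns): array
{
    $args = [];
    foreach ($patterns as $pattern) {
        $pattern = trim($pattern);
        // Skip blank lines and comments, as rsync does in exclude files.
        if ($pattern === '' || $pattern[0] === '#' || $pattern[0] === ';') {
            continue;
        }
        $args[] = escapeshellarg('--filter=-,p ' . $pattern);
    }
    return $args;
}

// src/ and dest/ are placeholders for your real source and destination.
$cmd = 'rsync -a --delete '
    . implode(' ', perishableFilterArgs(['*.tmp', 'data/logs/*.log']))
    . ' src/ dest/';
echo $cmd, PHP_EOL;
```

To reuse an existing exclude file, something like perishableFilterArgs(file('exclude.txt', FILE_IGNORE_NEW_LINES)) would feed its lines in.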

This is all documented in the rsync man page under “FILTER RULES”; it just took me a while to work out the right syntax. Hopefully these snippets help someone out.

Parsing CIF files to get train schedule data in PHP

I’ve had some fun looking at what data you can get via the National Rail Open Data scheme, and was really impressed by the ActiveMQ implementation they’ve got for Real Time Train Movement Messages!

The messages National Rail send allow you to plot trains to stations, or even along a route, but each message only contains station IDs or train IDs – which is kinda boring. I like to visualise the data, on a map for example.

To get the station name and location for a particular train message, you have to access an entirely different database from another provider. Along comes ATOC with their CIF files, which look very scary compared to ActiveMQ.

To cut a long story as short as I can, the CIF files contain a lot of information (400MB+ files) stored as fixed-width records: each line is one record, and each field is found by its character position and length. You can sign up to download the files on the ATOC website, and the specification for the files is available here.

Why the blog post? Well, I needed a way to parse these files to populate a Mongo database and wanted to promote the PHP CIF parser I’ve started work on: https://github.com/rb-cohen/php-cif-parser
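To give a flavour of what parsing fixed-width records involves, here is a minimal PHP sketch. It is not the code from php-cif-parser. It assumes the first two characters of each line identify the record type (HD, TI and so on), and it streams the file line by line so a 400MB+ file is never held in memory at once. cifRecords(), sliceFields() and countRecordTypes() are hypothetical names, and any field offsets you pass to sliceFields() should come from the specification.

```php
<?php
declare(strict_types=1);

/**
 * Stream a CIF file one record (line) at a time. Yields [recordType, line]
 * pairs, assuming the record type is the first two characters of the line.
 */
function cifRecords(string $path): Generator
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        while (($line = fgets($handle)) !== false) {
            $line = rtrim($line, "\r\n");
            if ($line === '') {
                continue; // ignore blank lines
            }
            yield [substr($line, 0, 2), $line];
        }
    } finally {
        fclose($handle);
    }
}

/**
 * Cut a fixed-width record into named fields.
 * $layout maps field name => [zero-based start offset, length].
 */
function sliceFields(string $line, array $layout): array
{
    $fields = [];
    foreach ($layout as $name => [$start, $length]) {
        // substr() gives '' (PHP 8) or false (PHP 7) past the end of the line.
        $fields[$name] = trim((string) substr($line, $start, $length));
    }
    return $fields;
}

/** Count how many records of each type a CIF file contains. */
function countRecordTypes(string $path): array
{
    $counts = [];
    foreach (cifRecords($path) as [$type]) {
        $counts[$type] = ($counts[$type] ?? 0) + 1;
    }
    return $counts;
}
```

For example, if TI records carry a seven-character TIPLOC code straight after the record type (my reading of the spec, so verify it), sliceFields($line, ['tiploc' => [2, 7]]) would pull it out.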

 

Getting the raw POST or PUT data of a request in ZF2 MVC

If you’re sending raw data to ZF2 over HTTP (perhaps a JSON document), you can read it with the getContent() method of Zend\Http\PhpEnvironment\Request.

How does this work in practice?

In a controller you might want to do something like this:

<?php
namespace Application\Controller;

use Zend\Mvc\Controller\AbstractActionController;
use Zend\View\Model\ViewModel;

class IndexController extends AbstractActionController {
    public function indexAction() {
        // Raw request body as a string, e.g. a JSON document sent with POST or PUT
        $data = $this->getRequest()->getContent();

        return new ViewModel(array('data' => $data));
    }
}

But if you’re always expecting a certain format, say JSON, a controller plugin could give you back the array or object rather than a lot of json_decode calls in your controllers.

The plugin:

<?php
 
namespace My\Mvc\Controller\Plugin;
 
use Zend\Mvc\Controller\Plugin\Params as ZendParams;
 
class Params extends ZendParams {
 
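    /**
     * Decode the raw request body as JSON.
     * Returns an associative array, or false if the body is empty,
     * is not valid JSON, or decodes to an empty value.
     */
    public function fromJson() {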
        $body = $this->getController()->getRequest()->getContent();
        if (!empty($body)) {
            $json = json_decode($body, true);
            if (!empty($json)) {
                return $json;
            }
        }
 
        return false;
    }
 
}

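Once the plugin is registered under the name params in your module's controller_plugins config (so it replaces the built-in one), it's as simple as: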

public function indexAction() {
    $data = $this->params()->fromJson(); // decoded array, or false
    return new ViewModel(array('data' => $data));
}
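Because of the !empty() checks, the plugin returns false for a missing body, for invalid JSON and for a valid but empty document such as []. If you need to tell those cases apart, here is a minimal sketch of a stricter decoder. decodeJsonBody() is a hypothetical helper, not part of ZF2, and it assumes PHP 7.3+ for JSON_THROW_ON_ERROR.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: decode a raw request body as JSON.
 * Returns null when there is no body and the decoded array otherwise;
 * throws when the body is invalid JSON or not a JSON object/array.
 */
function decodeJsonBody(string $body): ?array
{
    if (trim($body) === '') {
        return null;
    }
    // JSON_THROW_ON_ERROR makes json_decode() throw JsonException on bad input
    $decoded = json_decode($body, true, 512, JSON_THROW_ON_ERROR);
    if (!is_array($decoded)) {
        throw new UnexpectedValueException('Expected a JSON object or array');
    }
    return $decoded;
}
```

Inside the plugin, fromJson() could then return decodeJsonBody($body) and let the controller decide how to respond to an exception.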

Help! Sometimes getContent() doesn’t return my data.

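The method uses file_get_contents('php://input') to get the raw data from the HTTP request. Unfortunately, before PHP 5.6 the php://input stream could only be read once per request. I had a plug-in that also called file_get_contents('php://input'), and if $request->getContent() was called after that, it returned nothing! Reading the body in one place avoids this, for example by having other code call $request->getContent() instead of reading php://input directly, since the request object keeps the body once it has read it.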
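If code outside the request object also needs the body, one option is to route every read through a single shared helper. Here is a minimal sketch: the RawBody class is hypothetical, and its $stream parameter exists only so it can be pointed at a file for testing.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: read the raw request body once and remember it, so
 * code that runs later in the same request still sees the data.
 */
final class RawBody
{
    /** @var string|null body cached after the first read */
    private static $cached = null;

    public static function get(string $stream = 'php://input'): string
    {
        if (self::$cached === null) {
            $data = file_get_contents($stream);
            self::$cached = ($data === false) ? '' : $data;
        }
        // Later calls return the cached copy without touching the stream
        // again, even if a different $stream is passed.
        return self::$cached;
    }
}
```

The trade-off is that the whole body stays in memory for the rest of the request, which is fine for JSON payloads but not for large uploads.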

Wishing for a Fitocracy API

So it's been a while since my last post. Rather than come up with some decent content, I thought I’d waste time updating the blog's theme and building a whizzy new plugin. Progress, right?

I was hoping to get my Fitocracy stats up on the blog as a widget, but it turns out that's (next to) impossible at the moment. I get that an API isn’t always first on the agenda, or something users should expect to have, but it would be nice!

As soon as I can get a feed of my profile data, I’ll add the widget and release the plugin to the world. I’ve put in a request on the Fito forums; let's see if anyone has any ideas.

There are some other unofficial Fitocracy APIs out there, but none seem to work at the moment!