Saturday, May 8, 2010

How csplit turned my software into hardware

I'm typing this text with my penis.

This is made possible largely due to the erection inducing side-effects of csplit — a program I'd never used until today.

Story time!

Once upon a time in a far distant land, there was a handsome, charming, young prince who had never known true love... Meanwhile, somewhere else, I was sitting at my desk, scratching my head, wondering how the cunting fuck I was going to extract the tiny part of this 1.8GB database dump that I needed. The dump was generated with something like
mysqldump --all-databases > all_databases.sql
giving me a file with a metric fuck-load of sql to create databases, and insert data into them — from which I needed one particular database.

Of course I tried to edit it with Vim first (before realizing how large it was) and had to kill it before it started eating into my swap. Gedit gave me similar results, and Emacs just told me to fuck off. However, a quick trip to Google gave me the answer: csplit! At which point my four foot blackzilla punched a hole through the front of my jeans and stabbed me in the eye.

Anyway, I went over to my dump file and grepped the bastard to find out the order that the "CREATE DATABASE" statements appeared in and then snipped out the one I wanted with these hacks (where foo_database is the database I wanted to extract, and bar_database is the next one in the file):
csplit all_databases.sql '/CREATE DATABASE.*foo_database/' '/CREATE DATABASE.*bar_database/'
This produced three files (xx00, xx01, xx02):
  • xx00 containing everything before the first line matching /CREATE DATABASE.*foo_database/ (a regex)
  • xx01 containing everything from the line matching the first regex up to, but not including, the line matching the second regex
  • xx02 containing everything from the line matching the second regex (/CREATE DATABASE.*bar_database/) to the end of the file
This put the code that I wanted into xx01. Very handy!
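
If you want to poke at this without a 1.8GB dump to hand, here is a rough sketch on a toy file. The contents of the fake dump, the csplit_demo directory and the part_ prefix are all made up for illustration; -s (don't print byte counts) and -f (set the output prefix) are standard csplit options.

```shell
#!/bin/sh
# Toy stand-in for the real dump; these contents are invented for the demo.
mkdir -p csplit_demo
cat > csplit_demo/all_databases.sql <<'EOF'
-- MySQL dump header
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `aaa_database`;
INSERT INTO t VALUES (1);
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `foo_database`;
INSERT INTO t VALUES (2);
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `bar_database`;
INSERT INTO t VALUES (3);
EOF

# List the CREATE DATABASE statements with line numbers to see their order.
grep -n 'CREATE DATABASE' csplit_demo/all_databases.sql

# Split at the two matching lines.
# -s: stay quiet; -f: write pieces as csplit_demo/part_00, part_01, part_02.
csplit -s -f csplit_demo/part_ csplit_demo/all_databases.sql \
    '/CREATE DATABASE.*foo_database/' \
    '/CREATE DATABASE.*bar_database/'

# The middle piece should hold only the foo_database section.
cat csplit_demo/part_01
```

Using a prefix keeps the pieces out of the way of any stray xx* files, and grep -n is a cheap way to check which database follows the one you want before committing to the split.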

A related tool that could also have been useful is plain split. For example
split --bytes=10000000 a_large_file
would split a file up into 10 MB chunks.
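
A small sketch of the same idea on a file you can actually wait for. The split_demo directory, the 2500-byte filler file and the chunk_ prefix are assumptions for the demo; -b is the short form of --bytes.

```shell
#!/bin/sh
# Make a 2500-byte filler file standing in for a_large_file.
mkdir -p split_demo
head -c 2500 /dev/zero > split_demo/a_large_file

# Cut it into 1000-byte pieces named chunk_aa, chunk_ab, chunk_ac.
split -b 1000 split_demo/a_large_file split_demo/chunk_
ls -l split_demo/chunk_*

# Gluing the pieces back together in name order gives the original file.
cat split_demo/chunk_* > split_demo/rejoined
cmp split_demo/a_large_file split_demo/rejoined && echo "identical"
```

Bear in mind that splitting by bytes cuts wherever the count lands, which can be halfway through an SQL statement. Splitting by line count with split -l avoids that, at the cost of unevenly sized pieces.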