
Fixing Mojibake / borked Unicode text

Here’s a problem I’ve recently hit at work. I hit the Google geocode API with a couple of thousand addresses and stored the JSON results. Using the WebClient DownloadString method, I didn’t think to set anything for the encoding. Browsing through the output files I see “DorfstraÃŸe” where I’m expecting to see “Dorfstraße”. Urgh, mojibake!

Here’s my understanding of what went wrong:

  • Google’s API serves down the UTF8 bytes for “Dorfstraße”: 0x44, 0x6f, 0x72, 0x66, 0x73, 0x74, 0x72, 0x61, 0xc3, 0x9f, 0x65
  • WebClient’s Encoding property uses the system default; in my case, Windows-1252. If I had set this property to UTF8 I wouldn’t have these mojibake files.
  • In UTF8 the ß character is represented with the bytes 0xc3, 0x9f
  • WebClient interprets these bytes as Windows-1252 and gets: “DorfstraÃŸe”
  • Saving this with File.WriteAllText “uses UTF-8 encoding without a Byte-Order Mark”. This turns “DorfstraÃŸe” into the bytes: 0x44, 0x6f, 0x72, 0x66, 0x73, 0x74, 0x72, 0x61, 0xc3, 0x83, 0xc5, 0xb8, 0x65
  • Open the file without a BOM and you see: “DorfstraÃŸe”
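The chain of events above can be replayed in a few lines. This is only a sketch in Python (not the C# WebClient code from my project), using Python's "cp1252" codec as a stand-in for Windows-1252; the variable names are made up for illustration.

```python
# Replay the double-encoding accident step by step.
original = "Dorfstraße"

# What the API sends: the UTF-8 bytes, where ß becomes 0xc3 0x9f.
utf8_bytes = original.encode("utf-8")

# What the client does wrong: decode those bytes as Windows-1252,
# so 0xc3 becomes "Ã" and 0x9f becomes "Ÿ".
misread = utf8_bytes.decode("cp1252")

# What lands on disk: the garbled string written back out as UTF-8.
on_disk = misread.encode("utf-8")

print(utf8_bytes.hex(" "))
print(misread)
print(on_disk.hex(" "))
```

The two hex dumps should line up with the byte lists in the bullets above.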

I’ve got the files on my disk. How would I fix it? Here’s the plan to put things in reverse:

  1. read in the file bytes and interpret as UTF8 so we are back to “DorfstraÃŸe”
  2. get the Western European (Windows) bytes for that string
  3. interpret those bytes as UTF8

…the code:
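A minimal sketch of the three steps above, written in Python rather than C#. The function name `repair_mojibake` is hypothetical, and "cp1252" is Python's name for the Western European (Windows) code page.

```python
def repair_mojibake(raw: bytes) -> str:
    """Undo one round of UTF-8 -> Windows-1252 -> UTF-8 double encoding."""
    # Step 1: read the file bytes as UTF-8, giving the garbled string.
    garbled = raw.decode("utf-8")
    # Step 2: get the Windows-1252 bytes for that string,
    # which are the bytes the server originally sent.
    original_bytes = garbled.encode("cp1252")
    # Step 3: interpret those bytes as UTF-8, as they were meant to be.
    return original_bytes.decode("utf-8")


# The bytes of the damaged file from the walkthrough above.
damaged = bytes.fromhex("446f726673747261c383c5b865")
print(repair_mojibake(damaged))
```

One caveat with this Python version: its cp1252 codec leaves five byte values (0x81, 0x8d, 0x8f, 0x90, 0x9d) unmapped, so text whose original UTF-8 contained one of those bytes may raise an error at step 1 or 2 instead of round-tripping.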

Amazon Web Services Presentation

Well, it’s been a while since I’ve updated the blog. I have been doing plenty of tinkering; I just need to write up a ‘catch up’ post in the next few days. In the meantime, here’s a presentation I gave on Amazon Web Services for Sydney’s ALT.NET group.

  • 0:00 Intro
  • 0:43 Quick summary of the services
  • 4:29 Using the services
  • 5:35 Query Request code demo
  • 9:10 EC2 intro
  • 11:36 EC2 code demo
  • 20:29 S3 intro
  • 21:19 S3 / CloudFront code demo
  • 29:21 Mechanical Turk intro
  • 31:35 Mechanical Turk code demo

It’s best to watch it full screen at HD resolution.

Google Authenticator One-time Password Algorithm in Javascript

I’ve recently set up 2-factor authentication on my Google account. The new 2nd factor or “thing you have” is a smartphone application which generates 6-digit one-time passwords.

I was a bit surprised when I stumbled on this article: Two Factor SSH with Google Authenticator. Turns out the algorithm used to generate the OTPs is an open standard. When you set up an account in the smartphone app you are storing a key that’s used to create an HMAC of the current time.

You can read the specifics of the algorithm in the TOTP RFC Draft. I really like the idea that you can use the smartphone app to generate OTPs for your own application. I’ve implemented the algorithm in JavaScript on jsfiddle. JavaScript is nice and readable, but please don’t implement your verification client side! 🙂
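For reference, here is a rough sketch of the same idea in Python (it is not the jsfiddle code). It assumes the parameters from the TOTP spec, since published as RFC 6238: HMAC-SHA1, a 30-second time step, and a Base32-encoded shared key as used by the Google Authenticator app. The function name and arguments are my own.

```python
import base64
import hashlib
import hmac
import struct
import time


def totp(secret_b32, for_time=None, step=30, digits=6):
    """Time-based one-time password from a Base32 secret (HMAC-SHA1)."""
    # Normalise the key: drop spaces, upper-case, restore Base32 padding.
    cleaned = secret_b32.replace(" ", "").upper()
    cleaned += "=" * (-len(cleaned) % 8)
    key = base64.b32decode(cleaned)

    # The moving factor is the number of whole time steps since the epoch,
    # packed as an 8-byte big-endian integer.
    now = time.time() if for_time is None else for_time
    counter = int(now // step)
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()

    # Dynamic truncation: the low nibble of the last byte picks a
    # 4-byte window; the top bit of that window is masked off.
    offset = digest[-1] & 0x0F
    number = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(number % 10 ** digits).zfill(digits)


# Base32 form of the ASCII test key "12345678901234567890" from RFC 6238.
print(totp("GEZDGNBVGY3TQOJQGEZDGNBVGY3TQOJQ"))
```

A server would run the same calculation with its copy of the key and compare the result against what the user typed, usually allowing a step or two of clock drift.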


  • 2012-Sept-6: jsSHA moved location
  • 2012-Sept-12: Something suspect about the way I’m converting BASE32 to bytes. Changed it to grab full bytes from the binary string, and ignore anything left over.
  • 2014-April-22: Github not a CDN anymore.. 🙂 Moved references to bootstrap and jsSHA

A Quick Intro to Facebook’s Auth + Graph API

Update 2011-02-19: Facebook appear to have changed their documentation a bit; I’ve just gone through the article to keep everything in line with the Facebook documentation. The old authorization URLs still work, just hoping to save any confusion.

Another quick blog article just to settle/document some concepts in my own head, and hopefully provide a quick intro to someone out there.

The Facebook Developers site documents all the APIs that you can use to integrate Facebook.  Coming into this with no knowledge of the API can be pretty overwhelming.  This article is a quick working example of authentication and the graph API.  You can read the Facebook references here:  Authentication and Graph API.

The authentication here is OAuth 2.0. This is the protocol you’d use if you want to write server-side code to allow people to log into your site via Facebook, or link existing accounts. The authentication process also gives you an access token needed to read/write Facebook statuses/photos/etc. via the Graph API.

  • First create an application here:
  • You will receive an <App ID> and an <App Secret>
  • You can use this to build an ‘authorize’ URL. This is the location you’ll send your users to initiate the authentication, e.g. on a ‘log in with Facebook’ button on your website: <App ID>&redirect_uri=http://localhost/blah.aspx&scope=offline_access,publish_stream,user_photos,read_stream

    • redirect_uri – the user will be redirected here after they’ve authorized your application
    • scope – the permissions you are requesting to access, more info: Extended Permissions
    • Check out Dialog Form Factors to display a pop-up or mobile styled authentication page
    • Check out OAuth Dialog’s display property to display a pop-up or mobile styled authentication page
  • The user is shown a page to authorize the access your application is requesting:
  • The user ‘allows’ your application and gets redirected back to the location you specified in the authorize URL, with a <code> attached:
  • Your server now takes the code and exchanges it for an access token by performing a GET against the access_token URL: <App ID>&redirect_uri=http://localhost/blah.aspx&client_secret=<App Secret>&code=<Code>
  • Facebook responds with an access token:
    access_token=<access token>
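The URL building and response parsing in the steps above can be sketched in Python. Several things here are assumptions: the endpoint constants are placeholders (the real Facebook URLs aren’t reproduced here), `client_id` is assumed as the parameter that carries the App ID because that is the standard OAuth 2.0 name, and the helper names are made up.

```python
from urllib.parse import parse_qs, urlencode

# Placeholder endpoints for illustration only, not real Facebook URLs.
AUTHORIZE_ENDPOINT = "https://example.invalid/oauth/authorize"
TOKEN_ENDPOINT = "https://example.invalid/oauth/access_token"


def build_authorize_url(app_id, redirect_uri, scopes):
    """URL to send the user to so they can authorize the application."""
    query = urlencode({
        "client_id": app_id,           # assumed name for the App ID
        "redirect_uri": redirect_uri,
        "scope": ",".join(scopes),     # comma-separated permissions
    })
    return f"{AUTHORIZE_ENDPOINT}?{query}"


def build_token_url(app_id, app_secret, redirect_uri, code):
    """URL the server GETs to swap the returned code for an access token."""
    query = urlencode({
        "client_id": app_id,
        "redirect_uri": redirect_uri,
        "client_secret": app_secret,
        "code": code,
    })
    return f"{TOKEN_ENDPOINT}?{query}"


def parse_token_response(body):
    """Pull the token out of a body shaped like access_token=<access token>."""
    return parse_qs(body)["access_token"][0]


print(build_authorize_url("<App ID>", "http://localhost/blah.aspx",
                          ["offline_access", "publish_stream"]))
```

Using `urlencode` means the redirect_uri and any odd characters in the code or secret get percent-encoded for you rather than pasted in by hand.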

Your application is now authorized to read/write data with the permissions of the user via the Graph API. Hit some URLs to get a JSON response from Facebook:

  • The feed:<access token>
  • Photo albums:<access token>

Even better, upload a photo on behalf of the user! Download cURL for your platform: cURL download. Run cURL from a command line:

-F access_token=<access token>
-F source=@IMG_2693.jpg
-F "message=Testing yet again!"

The -F arguments build a form POST, and the @ attaches the file as a file upload. If you’re running this from the Windows command prompt, don’t forget to escape any pipe characters in your access_token with a caret (^).

So that’s a quick intro to what is possible via the graph and authentication APIs. I see plenty of other APIs on the developer site I haven’t tinkered with yet. If you see anything interesting leave me a comment, and I might pull it apart in a new blog post!