On syncing data

I’m working on an ETL pipeline that pulls from a REST API, does some transformations, and upserts the results into a database. The REST API occasionally returns invalid data (e.g. alphabetic characters in the zip code, or misspelled US state names), and also occasionally throws HTTP 500 error pages.

I’m supposed to stall the entire pipeline whenever the REST API throws an error, but the errors occur somewhat often (it’s relatively young software) and may take a few days to fix. Since the pipeline syncs fixed windows of data (one day at a time), a stall of several days leaves gaps, which necessitates either running a backfill task manually, or making the daily job smart enough to run from the start time of the last successful job rather than from a fixed time span back.

We have a daily task that syncs data updated in the past day, and a separate backfill task that syncs data from an optional start date (the beginning of time by default) to a specified end date, but maybe we should ditch the daily tasks and modify the backfill task to track its latest sync date and run daily. In hindsight, I think we should’ve done that from the beginning.
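That last idea is easy to sketch. Here’s a hypothetical outline (the function and variable names are mine, not from our codebase), where the daily job resumes from a persisted watermark instead of a fixed one-day window:

```javascript
// Hypothetical sketch of a daily sync that resumes from the last successful
// run. The watermark is faked with an in-memory variable; a real job would
// persist it in a job-state table.
var lastSuccessfulSyncDate = null;

function computeSyncWindow(now, defaultStart) {
  // Resume from the last success; fall back to "the beginning of time"
  var start = lastSuccessfulSyncDate || defaultStart;
  return { start: start, end: now };
}

function recordSuccess(window) {
  // Only advance the watermark when the whole window synced cleanly, so a
  // stalled pipeline automatically re-covers the gap on its next run
  lastSuccessfulSyncDate = window.end;
}
```

If the REST API throws a 500 and the run aborts before recordSuccess, the next run’s window simply stretches back over the gap, and no manual backfill is needed.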

Testing data processing applications

We’re testing a data processing application. There are multiple pipeline stages, and their boundaries (i.e. inputs and outputs) are well defined. The bulk of the code deals with reshaping data from one form to another; there is very little functional logic. Therefore, testing should be focused mainly on how the code reacts to:

  • Null data – when certain fields have null/nil/None value
  • “Empty” data – when certain fields are populated with the empty string “”, which is distinct from null/nil/None.
  • Missing fields – when not just the value is missing, but the field (or column) is missing entirely from the input
  • Improperly formatted data – from fields where the data is slightly off, like different date string formats, all the way to fuzz testing.

Also to test:

  • Validate input schema – especially if you have no control over it
  • Validate output schema – This is basically a regression test

I think a test harness for this should:

  • Make it easy to maintain/refresh test data. This may involve pulling inputs from your data sources, but testing shouldn’t be interrupted if the refresh fails.
  • Have designated “base” input objects
  • Have API calls for modifying input and then validating the output without having to manually reset the input object
  • Make sure the output schema is valid
  • Make sure the output values fall within a valid range
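To make those points concrete, here’s a minimal sketch of the harness API I have in mind (all names here are hypothetical): a designated base input, a helper that clones and modifies it so tests never have to reset it manually, and an output validator.

```javascript
// Hypothetical test-harness sketch: a designated "base" input object plus a
// helper that applies per-test overrides without mutating the base, so each
// test starts from a known-good record and tweaks exactly one thing.
var baseInput = { name: 'Ada', state: 'WA', zip: '98101' };

function withChange(change) {
  // Clone the base, then apply the per-test overrides
  var input = JSON.parse(JSON.stringify(baseInput));
  Object.keys(change).forEach(function(key) { input[key] = change[key]; });
  return input;
}

function validateOutput(output) {
  // Output-schema check doubling as a regression test
  return typeof output.name === 'string' &&
         /^[0-9]{5}$/.test(output.zip);
}
```

A null-field test then becomes a one-liner along the lines of validateOutput(pipeline(withChange({ zip: null }))), and the base object is untouched for the next test.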

Apple push notifications on Amazon SNS from node.js

We’re going to hook up three things today: Apple Push Notification Service (APNS), Amazon Simple Notification Service (SNS), and node.js. This has the potential to be a mega-post, so instead, I’m going to write an annotated checklist. Maybe later I’ll turn it into a multi-part series.


Let’s talk about how all of these hook together. We want to send push notifications to iOS to alert users that something happened, and APNS is the only way to push to iOS. Eventually, we may also want to develop an Android version of our app, which would mean configuring a Google app for push as well. That’s where Amazon SNS comes in: it sits in front of the platform-specific services (it also supports Baidu cloud messaging and Windows Phone), so our server code only has to talk to SNS. (Google Firebase can likewise send push notifications to both iOS and Android, but our focus here is on SNS.) To interface with SNS from node.js, Amazon provides a library called aws-sdk.

Setting up Apple Push Notifications
  1. Create an App ID in Apple Dev Center with “Push Notifications” enabled. Create a certificate for development and production and download them.
  2. Add the App ID to a provisioning profile.
  3. In Xcode, go to the project settings. Update the provisioning profile. That should associate the push notification certificates with your app.
  4. In Xcode, go to the project targets. Go to the “Capabilities” heading and enable “Push Notifications”. Then scroll down to “Background Modes”, enable it, and check the box for “Remote notifications”.
  5. In app code, get the device token. I’m not going into detail here, but if you’re writing natively, use didRegisterForRemoteNotificationsWithDeviceToken, or if you’re using the react-native-push-notification package, add a handler for onRegister: function(token) { … }

The iOS simulator cannot receive push notifications, so device token calls on the simulator will always fail.

Setting up Amazon SNS

First, a word about SNS concepts. There are two avenues for subscribing a user to notifications: topics and applications. Topics are for broadcasting to a group of users subscribed to the same topic. Applications are for sending notifications to specific endpoints. In this case, we’ll use applications.

  1. From the AWS console, go to the SNS dashboard. Go to “Applications”, then “Create Platform Application”. Enter an application name, then upload the APNS certificate that you downloaded from step 1 of setting up APNS, above. After choosing the .p12 file (you may have to export it from Keychain Access), click “Load credentials from file”, then finish up by clicking “Create platform application”. Take note of the ARN of the newly created platform application; you’ll need it later.

Getting aws-sdk on node.js
  1. In your node.js project, run npm install aws-sdk --save.
  2. Configure your AWS credentials: on Linux/macOS, create a file called ~/.aws/credentials. On Windows, create a file called C:\Users\USER_NAME\.aws\credentials. Hopefully you still have your AWS credentials from when you created your AWS account, because this should be the contents of the credentials file:
    [default]
    aws_access_key_id = YOUR_ACCESS_KEY_ID
    aws_secret_access_key = YOUR_SECRET_ACCESS_KEY
  3. At the top of your source files which will call SNS, add the following lines:
    var AWS = require('aws-sdk');
    AWS.config.update({region: 'us-west-2'});
    var sns = new AWS.SNS();

    Region should be filled in as appropriate. AWS calls will not work unless you configure the region.

Updating the device token from node.js

Before you chain all the SNS calls together, you might want to promisify them first. All the SNS calls are of the form sns.apiCall(params, callback), where params is an object containing the call parameters, and callback is a function(err, data). Promises are another topic I’m not going to detail, but here’s an example of a manually promisified API – it will save you from JavaScript callback hell:

function getEndpointAttributes(endpointArn) {
  return new Promise(function(resolve, reject) {
    var params = { EndpointArn: endpointArn };
    sns.getEndpointAttributes(params, function(err, data) {
      if (err) reject(err);
      else resolve(data);
    });
  });
}

  1. First, you should determine if you already have the user’s device token AND endpoint ARN stored someplace like your application database.
  2. If you don’t have the endpoint ARN, call sns.createPlatformEndpoint with the platform application ARN (step 1 from “Setting up Amazon SNS” above) and the user’s device token (you should code your mobile app to send this as a parameter), which will return the newly created endpoint ARN. The endpoint ARN is Amazon’s address for pushing notifications to your mobile app on a specific device. Save both the device token and endpoint ARN to some place like your application database.
  3. If you DO have the device token and endpoint ARN, compare the device tokens. Device tokens are mostly persistent, but are liable to change once in a while. If the device token you received and the device token you stored are the same, there’s nothing else to do.
  4. If the device token you received and the device token you stored are different, then you need to delete the old endpoint by calling sns.deleteEndpoint with the old endpoint ARN, and then call sns.createPlatformEndpoint as detailed in step 2.
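Steps 1-4 boil down to a small decision function. This is a hedged sketch (decideEndpointAction is a made-up helper); the caller would perform the actual sns.createPlatformEndpoint and sns.deleteEndpoint calls based on the result:

```javascript
// Hypothetical sketch of steps 1-4: decide which SNS calls are needed given
// what we have stored and the token the device just reported. The caller
// then invokes sns.createPlatformEndpoint / sns.deleteEndpoint accordingly.
function decideEndpointAction(stored, receivedToken) {
  if (!stored || !stored.endpointArn) {
    // Step 2: no endpoint yet, create one from the device token
    return { action: 'create', token: receivedToken };
  }
  if (stored.deviceToken === receivedToken) {
    // Step 3: tokens match, nothing to do
    return { action: 'none' };
  }
  // Step 4: token changed; delete the old endpoint and create a new one
  return { action: 'recreate', oldEndpointArn: stored.endpointArn, token: receivedToken };
}
```

Keeping the decision separate from the SNS calls also makes this logic trivially unit-testable, which is handy because the token-rotation path is rarely exercised in practice.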

Sending a push notification
  1. Call sns.getEndpointAttributes with the user’s endpoint ARN. Make sure that the attributes have the “Enabled” property set to true, otherwise don’t send the notification.
  2. Call sns.publish with params containing Message, Subject, and TargetArn. TargetArn should be the user’s endpoint ARN.
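Putting step 2 into code, a minimal sketch might look like this (buildPublishParams is a made-up wrapper; Message, Subject, and TargetArn are the actual sns.publish parameter names):

```javascript
// Hypothetical sketch of step 2: build the parameters for sns.publish.
// TargetArn addresses one specific device endpoint rather than a topic.
function buildPublishParams(endpointArn, subject, message) {
  return {
    Message: message,
    Subject: subject,
    TargetArn: endpointArn
  };
}

// A real call would then be:
//   sns.publish(buildPublishParams(userEndpointArn, 'Hi', 'Hello!'), callback);
```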

If the notification fails to send, the endpoint may have been silently disabled. SNS will disable an endpoint if the notification service (APNS or GCM) tells it that the device token is invalid. That may be caused by a bad certificate, or the device token expiring without a new one being uploaded for that user. To find out why, you can go to the SNS console, select your platform application, go to the “Actions” menu and select “Delivery status”. From that dialog, you can create an IAM role to log delivery failures to CloudWatch.

Lumens vs Candelas vs Lux

Whether you’re growing plants or have an obsession with flashlights, you will encounter the terms candela, lux and lumens. These all refer to measuring light, but what’s the difference?

Radiant flux

Let’s start from the beginning. Light is composed of photons, each with a specific energy depending on its wavelength. The radiant flux is simply the amount of light energy being emitted per second, so its unit is the joule per second, also known as the watt.

Luminous flux and the lumen

However, radiant flux doesn’t take into account the sensitivity of the human eye to certain wavelengths. To account for this, the power of each wavelength being emitted is weighted with a luminosity function, which models the sensitivity of the human eye. The unit for wavelength-weighted power is called the lumen.

A lumen is the total power emitted by a light source, weighted by the sensitivity of the human eye to certain wavelengths.

Example: The human eye is most sensitive to greenish-yellow light at 555 nm wavelength. So if we had a single green (555nm) LED and a single red (650nm) LED emitting 1 watt of radiant flux, the green LED would actually have a higher lumen value than the red LED.

Factoring in the sensitivity profile of the human eye has implications: luminous flux and the lumen measure light as perceived by humans, which makes the lumen more useful whenever humans are involved. This is great for light bulbs and flashlights, but less than ideal for non-human applications, like grow lights for plants.

Luminous intensity and the candela

Luminous flux and the lumen take into account the wavelength(s) of light being emitted, but they don’t account for the direction. So the next step is measuring light in a certain direction. If we lived in 2 dimensions, this would be radians or degrees, but since we live in 3 dimensions, we use the unit for solid angles, which is a steradian (just like how there are 360 degrees or 2π radians in a circle, there are 4π steradians in a sphere). So luminous intensity measures how much light power (lumens) is concentrated into a solid angle. This unit is called the candela: lumens per steradian.

Example: A light source putting out 2 lumens is focused so all its light is in a 1 steradian beam. The luminous intensity of the light beam is 2 candelas. Now, the beam is focused into a tighter, 0.5 steradian beam. The light source isn’t putting out any more or less power, but the light was focused into a more intense beam, so the light beam is now 4 candelas.

A candela is a lumen per steradian (solid angle).

A common candle has luminous intensity of roughly 1 candela. I found this confusing, because candles emit light in (almost) all directions, like a sphere. Since there are 4π steradians in a sphere, the total human-visible light output of the candle is roughly 4π lumens, or about 12.57 lumens.

You may have heard of candlepower – it’s an obsolete unit of luminous intensity equal to 0.981 candelas.

Then what’s lux?

We just went over how a candela is a lumen per solid angle. Lux is a similar unit that measures lumens per area. Instead of lumens per steradian, lux is lumens per square meter. When you’re measuring the amount of light hitting a surface, it’s called illuminance; if you’re measuring the amount of light coming from a surface, it’s called luminous emittance. Illuminance is sometimes called “brightness”, but that term can have multiple meanings.

Example: You have 2 identical flashlights that each emit 2 lumens in a 10° angle.  If you pointed one of the flashlights at a wall 1 meter away, and the other flashlight at a wall 10 meters away, the patch of wall illuminated by the first flashlight has a greater illuminance (higher lux) than the patch of wall illuminated by the second flashlight. That is, both flashlights have the same luminous flux and luminous intensities, but one patch of wall is “brighter” than the other one.
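Putting rough numbers on that example (my own back-of-the-envelope math, with an idealized beam that spreads all 2 lumens evenly over a 10° cone):

```javascript
// Back-of-the-envelope numbers for the flashlight example above.
var lumens = 2;
var halfAngle = 5 * Math.PI / 180;                        // 10° cone => 5° half-angle
var steradians = 2 * Math.PI * (1 - Math.cos(halfAngle)); // solid angle of the cone
var candelas = lumens / steradians;                       // ≈ 84 cd for both flashlights

// Illuminance: the lit patch has area = solid angle × distance², so
// lux falls off with the square of the distance to the wall.
function luxAt(meters) {
  return lumens / (steradians * meters * meters);
}
// luxAt(1) ≈ 84 lux; luxAt(10) ≈ 0.84 lux: a hundredfold difference
```

Both flashlights have the same candela rating, yet the far wall receives one hundredth the lux, which is exactly the flux/intensity/illuminance distinction the units are designed to capture.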

A lux is a lumen per square meter.

Lumens, candela and lux are great for human applications, but in the world of hydroponics and grow lights, they’re pretty useless. Red and blue wavelengths drive plant growth, and green light is pretty much wasted on plants. When you’re buying or building an LED grow light, the spec sheet should specify radiant flux in watts (or radiant intensity in watts per steradian). Lumens are meaningless in the context of grow lights unless you know the emission spectrum of the lights in your comparison.

    • Lumens and watts can both be thought of as measures of the raw number of photons being pumped out by the light source, but lumens take into account the sensitivity of the human eye.
    • Candelas take direction (i.e. “tightness of the beam”) into account (lumens per steradian), and so measures the luminous intensity.
    • Lux takes area into account (lumens per square meter), and so measures the illuminance or “brightness” of an area.

Simple UI Animations with react-native

Facebook’s react-native platform has an animation API that lets you animate Text, Image or View components. You can animate properties of the components, then make them play on an event, chain them together sequentially, or play them all together.

Styles you can animate:

Here are some stylesheet properties of components that you can animate:

  • Opacity
  • Width
  • Height
  • Translation offsets
  • Rotation

Animation variables:

One-dimensional values to be animated (such as opacity or height) are stored in an Animated.Value object, while two-dimensional values (such as XY translation) are stored in an Animated.ValueXY object. These are initialized like this:

import { 'Animated' } from react-native;
this.state = {
  opacityValue: new Animated.Value(255),
  translation: new Animated.ValueXY(0, 0)

Specifying animation types

There are three ways to procedurally change the value of an animation variable: spring, decay, and timing. To call these functions, you specify which variable is being animated and a configuration object, with parameters described below. In each case, the target value at the end of the animation is called toValue.

  • Spring is straightforward to use; it causes the value to bounce, and you can optionally set the friction (“bounciness”) and tension (“speed”).
  • Decay is also straightforward to use; it’s simply an exponential decay function where you set the deceleration (e.g., 0.997) and the initial velocity, which is required.
  • Timing has 3 parameters that you can set: delay and duration are obvious, but the third one, easing, requires a deeper understanding to use. The default easing function is linear, but if you want to override it with something like Easing.sin(t) or Easing.bezier(x1, y1, x2, y2), you must add import { Easing } from 'react-native'; to the top of your source code. The easing functions you can use are here.

Example to get something to fade out over 200ms:

Animated.timing(this.state.opacityValue, {toValue: 0, duration: 200})

You can also set the value directly by calling setValue() on the Animated.Value, e.g. this.state.opacityValue.setValue(0).

Assigning animation variables to a component

Using the above example of opacity, just set a stylesheet property:

render() {
  let myStyle = {
    opacity: this.state.opacityValue
  };
  return (
    <Animated.View style={myStyle}>
      ...
    </Animated.View>
  );
}

Note how <Animated.View> is used instead of <View>. For Animated.ValueXY variables, you may have to get the X and Y values directly by accessing myXYValue.x._value and myXYValue.y._value.

Calling the animation

To start an animation, you simply call .start() on the Animated object, optionally passing a callback function.

  Animated.timing(this.state.opacityValue,
    {toValue: 0, duration: 200}).start(() =>
      Alert.alert('Animation done!'))

The callback function is how you get animations to loop. At the time of writing, react-native does not have built-in parameters for looping an animation.

Sequential and parallel animations

You can start animations in sequence or in parallel by calling Animated.sequence() or Animated.parallel() with an array of Animation calls. It’s better explained by an example:

  Animated.parallel([
    Animated.spring(this.state.heartSize, { toValue: 1, friction: 0.7, tension: 0.4 }),
    Animated.spring(this.state.circleSize, { toValue: 0.5, friction: 1.0, tension: 0.3 })
  ]).start();
To summarize:

  • Create Animation variables by using Animated.Value()
  • Assign Animation variables to stylesheet properties attached to <Animated.View>, <Animated.Text> or <Animated.Image> tags
  • Configure an animation by calling Animated.timing(), Animated.spring(), or Animated.decay()
  • Start an animation by calling .start() on the configured animation
  • Start animations in sequence or parallel by wrapping multiple animations in an array and passing the array as a parameter to Animated.sequence() or Animated.parallel()
  • For more details, see the Animated API reference or the react-native Animations guide

Full code example

import React, { Component } from 'react';
import { Alert, Animated, Button, Text, View } from 'react-native';

export default class AnimationView extends Component {

  constructor(props) {
    super(props);
    this.state = {
      opacityValue: new Animated.Value(1),
      heartSize: new Animated.Value(0.5),
      circleSize: new Animated.Value(1.0)
    };
  }

  render() {
    let textStyle = {
      opacity: this.state.opacityValue
    };
    let heartStyle = {
      transform: [{scale: this.state.heartSize}]
    };
    let circleStyle = {
      transform: [{scale: this.state.circleSize}]
    };
    return (
      <View>
        <Button onPress={() => {
            Animated.timing(this.state.opacityValue, { toValue: 0, duration: 1000 }).start(() => Alert.alert('Animation done!'));
          }}
          title="Fade text away"
          color="powderblue" />
        <View style={{backgroundColor: "powderblue"}}>
          <Animated.Text style={textStyle}>Fade away</Animated.Text>
        </View>

        <Button onPress={() => {
            Animated.parallel([
              Animated.spring(this.state.heartSize, { toValue: 1, friction: 0.7, tension: 0.4 }),
              Animated.spring(this.state.circleSize, { toValue: 0.5, friction: 1.0, tension: 0.3 })
            ]).start();
          }}
          title="Initiate love"
          color="pink" />
        <View style={{flexDirection: "row", backgroundColor: "white"}}>
          <Animated.Image style={heartStyle} source={require('./heart.png')}/>
          <Animated.Image style={circleStyle} source={require('./circle.png')}/>
        </View>
      </View>
    );
  }
}

Receiving and processing emails in Azure vs. AWS

One big difference I noticed between Amazon Web Services and Windows Azure is that Azure doesn’t have built-in support for sending and receiving emails, while AWS has Simple Email Service (SES). (Aside: one of my friends at Amazon says they’re running out of words to put between “Simple” and “Service”.) This is probably because the power of the cloud has been harnessed to send spam emails in the past, and indeed, Azure compute IP blocks have been added to spam blacklists. To send emails from Azure, Microsoft recommends using an external service like SendGrid.

However, what if you want to receive emails in Azure? I did some digging and there isn’t really a well-defined way. The only plausible results I found were setting up a compute role with a third-party SMTP server, or using an external service like Postmark, which cleans up email for you before calling a user-provided webhook. Postmark is nice because you can configure a spam filter before your webhook is called, which you would have to do manually if you were to run your own SMTP server.

Now let’s take a look at Amazon SES. SES has been around for a couple of years now (the documentation was first released in 2011). You can create a rule set that chains multiple actions together when an email is received. For example, the first action can copy the email to an S3 (Simple Storage Service) bucket, the next can trigger a Lambda function (a serverless piece of code) to process the email, and the final action can make an SNS (Simple Notification Service) call to send a push notification to a mobile app, all from within AWS.

Amazon can even register your domain name using Amazon Route 53, and you probably should use that as well, because transferring an existing domain is a 7-step hassle. Also, if you registered or transferred your domain recently, ICANN rules prohibit you from transferring again for 60 days. Don’t shoot yourself in the foot by registering a domain name with another registrar within 60 days of setting up SES, or you won’t be able to use that domain for receiving email until the prohibition expires.

So if you’re trying to send and receive emails in Windows Azure, you’re pretty much forced to use a third party service, whereas if you’re with Amazon, you can do everything from within AWS.

Getting Started with Azure Functions

I was investigating Azure Functions not so long ago when I came across the Azure Function challenge – a set of tasks posted by Microsoft for people to try out and learn about Azure Functions. Azure Functions is a relatively new (at the time of writing) service, and is Microsoft’s answer to AWS Lambda. It allows you to write a snippet of code which is triggered by certain events, and you’re charged by the number of times your code is executed and the memory consumption × time (measured in GB-seconds) of your executions. Here’s how I got started.

Editing tools

The first method of actually getting the code into Azure is directly through portal.azure.com. The second method is GitHub combined with Visual Studio Tools for Azure Functions, which is still in preview, has a particular set of prerequisites to install, and has a few known issues. However, its biggest advantages are that you get IntelliSense and that you can run the code locally in an Azure simulator.

Triggers

Triggers are what cause your Azure Function to run. You can select the trigger either through the Azure Portal, or by editing the file function.json in the Azure Function’s project directory, which is where triggers and data bindings are configured. A few example triggers are BlobTrigger, HttpTrigger, QueueTrigger, and TimerTrigger.

Input and output binding

Once you get the trigger set up, you most likely will want to get at the input parameters, and also read or write to Azure Storage. For HTTP triggers, the HTTP request is bound by default to a request parameter called req, and the HTTP response is the return value. The example code provided by Microsoft does a good job of illustrating how to use both of them.

Accessing Azure Storage

Sometimes you want to access Azure Storage to… well, to store something from the request. I found the documentation here kind of lacking (Azure Functions hit v1.0 in November 2016) and figured this out through a lot of trial and error, so I hope someone finds this useful.

To write out to an Azure Table, first you need an actual Azure Storage Table… Assuming you have one set up, you need the account name, account key, the storage account’s connection string, and the target table name. Give the table parameter a name like tableBinding, which you will use in code. This is what my configuration looks like in the Azure portal:

This is what the corresponding function.json looks like:

  {
    "bindings": [
      {
        "type": "httpTrigger",
        "direction": "in",
        "webHookType": "genericJson",
        "name": "req"
      },
      {
        "type": "http",
        "direction": "out",
        "name": "res"
      },
      {
        "type": "table",
        "name": "tableBinding",
        "tableName": "AzureChallengeTable",
        "connection": "azurefunctionsc99a8a83_STORAGE",
        "direction": "out"
      }
    ],
    "disabled": false
  }

Note the “connection” property, which is set to the name of an environment variable containing the connection string for the storage account. That variable can be defined locally in your appsettings.json file, or in the Azure Functions portal under Function app settings > Configure app settings > App settings. This is what allows you to access the Azure Storage account if you’re running locally.

  {
    "IsEncrypted": false,
    "Values": {
      "AzureWebJobsStorage": "",
      "AzureWebJobsDashboard": "",
      "azurefunctionsc99a8a83_STORAGE": "ConnectionStringHere"
    }
  }

We configured the table binding to be called tableBinding, so our function signature now looks like this:

public static HttpResponseMessage Run(HttpRequestMessage req, CloudTable tableBinding, TraceWriter log)

You will also need to add #r "Microsoft.WindowsAzure.Storage" to the top of the run.csx file and add using Microsoft.WindowsAzure.Storage.Table; (and import the NuGet package if you’re coding from VS). Now, we can insert objects that are a subclass of TableEntity (make sure you define how PartitionKey and RowKey are calculated) with a line like this:

tableBinding.Execute(TableOperation.Insert(myEntity));
To read:

var tableResult = tableBinding.Execute(TableOperation.Retrieve<MyEntity>(partitionKey, rowKey));
var myEntity = tableResult.Result as MyEntity;

If you wrote your code in VS, you can upload the project to GitHub, then use the Azure Functions portal to deploy the code from GitHub. One very important thing I noticed here is that the *.json and run.csx files must be at the top level of your repo; if you have multiple functions residing in multiple directories in the repo, Azure won’t find the code. So this basically means you need one repo per Azure Function.

One other quirk I noticed was that sometimes I got an HTTP 4xx response when testing, which was caused by an Azure bug with function keys. The workaround for this is to call the Azure Function with the key for the function, not the admin key (which is shared by all your Azure Functions).

In conclusion

Azure Functions (and AWS Lambda) is promising because it is simple, covers some very useful and common scenarios, and is easy to scale, but I found the documentation kind of spotty and hope it will improve in the coming months.

Raspberry Pi time lapse plant cam

Recently, I decided I wanted to take time lapse photos of plants in my hydroponics rig as they grew. I looked around to see if a device existed which would do that job, since a couple of friends bought and set up security cameras for their houses over the winter holidays. Everything I found was overkill for my needs; what I basically needed was a laptop with a webcam to take photos every minute or so and stitch them together into a video. So, I bought a Raspberry Pi 3 from Amazon and hooked up my Microsoft LifeCam. Here’s how I set it up:

Raspberry Pi

The Model 3 B comes with wifi built in, so all you really need is a webcam and a power supply. It’s powered by a 5V 2A micro USB port, so you need something like a wall charger – PC USB ports don’t supply enough current.

The webcam

It’s a Microsoft LifeCam 720p Cinema. I forgot where I got it, but I think I won it as a prize a few years ago. Make sure to check online if your webcam will work with Linux/Raspbian.

Out-of-the-box configuration

I downloaded the non-GUI version of Raspbian and imaged it to an SD card; after plugging in the keyboard, monitor, and power supply, the first thing I did was change the password of the default account by running passwd. The second order of business was getting the wifi connected by editing /etc/wpa_supplicant/wpa_supplicant.conf. While editing the wpa_supplicant.conf file, I noticed the keyboard mapping was set to Great Britain by default, so I ran sudo raspi-config to set the locale, time zone, and keyboard layout, and to enable the SSH server so I wouldn’t need to keep my keyboard and monitor plugged in.

Installing packages

With localization configured and the wifi successfully connected, it was time to start downloading packages. I started with sudo apt-get update and sudo apt-get upgrade. I installed the webcam package with sudo apt-get install fswebcam, and the video encoding tools with sudo apt-get install libav-tools.

Putting it all together

I wrote a bash script which:

  1. Creates a directory for the day, if it doesn’t exist, and then changes into it
  2. Takes a snapshot and saves it in the format imgNNNN.jpg, starting with 0001
  3. Stitches together a video of all the .jpgs in the current directory
  4. Uploads it to iechoi.net

The full script is here.

There are already resources online which describe how to use fswebcam. My first few snapshots were inconsistent because the webcam needed some time to settle on the lighting conditions, so I added the -S option:
# -S 5 skips the first 5 frames; allows the camera to get settled with
#   lighting conditions
fswebcam -S 5 img$n.jpg

Raspbian doesn’t have ffmpeg; it has avconv instead, which runs more or less the same:
avconv -r 25 -f image2 -i img%04d.jpg -c:v h264 -crf 1 -y plant-$DAY.mp4

Uploading to iechoi.net:
rsync -avzhe ssh *.mp4 xxxx@iechoi.net:www/mov/
However, I didn’t want to enter my password for every rsync call, so I set up an SSH key on the Raspberry Pi by running ssh-keygen -t rsa, then copied ~/.ssh/id_rsa.pub to ~/.ssh on iechoi.net. I also had to authorize the key in my web host’s SSH settings.

Finally, I set the script to run every 15 minutes between the hours of 5am and 10pm by running crontab -e and then entering the following line:
*/15 5-22 * * * ./capture.sh