Making NASA JPL's Small Bodies DB APIs Dev Friendly

Working with startups for over a decade may have spoiled me. API developer experience has come a long way in that time, SaaSifying backends to make building companies efficient, logical, and consistent.

Sadly, after months of doing “science stuff”, my feeling is that this is a major missed opportunity in scientific projects and academia (and to be fair, few scientists building these APIs are software developers by trade).

In many cases, though, it feels like things could be better, and that this is impeding scientific progress. It’s also something large scientific and research-based organizations should be paying attention to, since it flies in the face of their core missions.

To single out someone who can take the punch, let’s pick on NASA JPL. It’s astounding that this group can put car-sized rovers onto other planets but cannot make core scientific data easily accessible.

And please take this criticism as well-meaning (everyone). I am sure there are under-the-hood challenges in systems or funding or resourcing that are invisible to me and make this situation difficult to remedy. That said, having the greatest library of books in the world is useless if no one can check them out or find what they need in the stacks.

The JPL SBDB API

NASA JPL’s Small Bodies Database API has issues.

  • an obvious, scrape-web-results-and-stuff-into-an-API approach to what may be a very old, antiquated underlying system (Fortran 77? C?)
  • a whack JSON return format of a “fields” listing and “data” in one big array of arrays
  • a limit of 1 request at a time per IP address (a huge speed bump for any attempt at parallel, or even concurrent, processing)

Complaints aside, the scientific content and value of NASA JPL SBDB data is amazing.

You can literally calculate the position and orbit of a small rock on the other side of the solar system to ridiculous degrees of accuracy. And use that to help fuel your calculations if you’re one of those geeky astro/physics science hippy types (such as I aspire to be). It’s mind-blowing, tbh. Just, well… hard to use. Which is a shame.

(And to be fair to JPL, we switched to their API after we found both uptime and major data issues with the MPC’s - the Minor Planet Centre’s - orbital data. Thus my complaint about the general quality of science APIs across the field.)

Here’s what I learned trying to bend their API to our will and transmute their data into something useful - a format most modern devs would want to use, with better developer UX. I hope it helps you out.

Approach

Basically, we’re going to change the cryptic fields and mass data blob into the type of key:value structures us devs like. Yes, I already raised this issue with JPL when I was asked how we could improve the system. There was a curt email response citing “lack of resources” and the fact that their API returns were, in fact, “legal json” (ok, technically true, but c’mon…). I have been told that despite the curt, official reply, they do realize there are issues and are working on a better future version of the API (NASA JPL! Call me! I’d even work pro bono on these sorts of things!).

So, as mentioned, the API returns:

  1. Fields
  2. Data

Fields is a listing of, well… the fields, largely as shortforms and cryptic letters most scientists (but not devs) would be familiar with, and Data is a big ol’ array of arrays holding the values for those fields, row by row, in the same order.
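To make that concrete, the response body is shaped roughly like this (the values here are illustrative and trimmed, not actual SBDB output):

{
  "fields": ["full_name", "pdes", "prefix", "diameter", "q", "e", ...],
  "data": [
    ["    1P/Halley                      ", "1P", "P", "11.0", ".586", ".967", ...],
    ["    2P/Encke                       ", "2P", "P", "4.8", ".336", ".848", ...]
  ]
}

Each row in data lines up positionally with fields, which is exactly the mapping we’ll rebuild below.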

For the project I’m helping on, we’re interested in having an up-to-date list of comets and certain asteroids. So, if a keen astronomer spots one of these new objects, we want to make sure we’ve got the fresh deets on that puppy - which means we need a job to keep our objects updated from NASA.

It may surprise you (it certainly did me), but there are about 35 new comets discovered every year, and more asteroids than you can shake a stick at. And having data on new discoveries is really important, particularly as we start to learn more about extra-solar objects.

So, I crafted two URLs for us to use, as (sadly) the SBDB API is not GraphQL.
The SBDB needs a crafted GET URL whose query parameters spell out what you want from it, as well as the target for that information:

  1. comets
  2. asteroids (most of which we discard)

const jpl_comets = "https://ssd-api.jpl.nasa.gov/sbdb_query.api?fields=full_name,pdes,prefix,diameter,density,albedo,q,ad,per,rot_per,i,n,ma,G,H,K1,M1,t_jup,w,tp,a,e&sb-kind=c&full-prec=1"
const jpl_asteroids = "https://ssd-api.jpl.nasa.gov/sbdb_query.api?fields=full_name,pdes,prefix,diameter,density,albedo,q,ad,per,rot_per,i,n,ma,G,H,K1,M1,t_jup,w,tp,a,e&sb-kind=a&sb-class=PAA,HYA,CEN,TNO&full-prec=1"

What you see is a long list of comma-separated characteristics of a small body, followed by the kind of small body data we want (the sb-kind term: c for comets, a for asteroids), an extra modifier for asteroids (sb-class, which pulls in certain classes of objects like TNOs - Trans-Neptunian Objects), and a request for full precision (full-prec=1; if you don’t do this you get “display friendly” numbers that throw errors on most computing systems).

Even then, the data you get back is a little strange. Numbers between 0 and 1 are not preceded by a 0, so you get things like .3824, and the name field is space-padded (leading to my suspicion/assertion above that they’ve merely “API-ed” their web interface and have some very old programs underneath calculating the data). It’s really tabular text data they’ve stuffed into a “json interface”.
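Neither quirk is fatal in Go, for what it’s worth. Here’s a tiny standalone sketch (with made-up example values) showing that strings.TrimSpace and strconv.ParseFloat cope with the padding and the leading-dot decimals just fine:

// quirks.go - a throwaway demo of cleaning two SBDB quirks
package main

import (
	"fmt"
	"log"
	"strconv"
	"strings"
)

func main() {
	// Space-padded names come back looking like this.
	name := strings.TrimSpace("    1P/Halley                 ") // "1P/Halley"

	// Leading-dot decimals such as ".3824" parse without complaint.
	ecc, err := strconv.ParseFloat(".3824", 64)
	if err != nil {
		log.Fatal(err)
	}

	fmt.Println(name, ecc) // 1P/Halley 0.3824
}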

Ok, let’s roll up our sleeves a bit and turn this into something dev-usable. I’m using the excellent gorm as an ORM for our APIs, but you can do the same with Ent, or any other ORM layer (or straight up structs as below).

What I’m effectively doing is taking a response object and then transforming it into a form similar to our small bodies objects table.

So, let’s start creating a response model for this data:

// /models/jpl.go
package models

// Plain structs - no ORM needed for the response shapes themselves.
type JplResponse struct {
 Fields []string `json:"fields"` // the "fields" header list from SBDB
 Data   [][]any  `json:"data"`   // one row of values per object, in field order
}

// JplJsonMap holds a single object's values re-keyed by field name.
type JplJsonMap map[string]any

type JplResponseObject struct {
 Pds4Lid              string         `json:"pds4_lid"` // not returned by the queries above; fill in separately if you need it
 UiName               string         `json:"full_name"`
 Prefix               string         `json:"prefix"`
 Name                 string         `json:"pdes"`
 Diameter             float64        `json:"diameter"` // the SBDB field is diameter, not radius; halve it downstream if you need a radius
 Density              float64        `json:"density"`
 Albedo               float64        `json:"albedo"`
 Perihelion           float64        `json:"q"`
 Aphelion             float64        `json:"ad"`
 OrbitalPeriod        float64        `json:"per"`
 RotationalPeriod     float64        `json:"rot_per"`
 Inclination          float64        `json:"i"`
 MeanMotion           float64        `json:"n"`
 MeanAnomaly          float64        `json:"ma"`
 GMag                 float64        `json:"G"`
 HMag                 float64        `json:"H"`
 K1Mag                float64        `json:"K1"`
 M1Mag                float64        `json:"M1"`
 TisserandParameter   float64        `json:"t_jup"`
 ArgumentOfPerihelion float64        `json:"w"`
 TimeOfPerihelion     float64        `json:"tp"`
 SemimajorAxis        float64        `json:"a"`
 Eccentricity         float64        `json:"e"`
}

So, now our task is to get the data from the sub-optimal fields:data JSON into JplResponse, map it through JplJsonMap, and land it in the JplResponseObject struct so it’s useful, more easily manipulated, and better named (the use of ambiguous and often undocumented abbreviations in astronomy is a scourge - though at least JPL lists theirs. #truestory I once beat my head for a couple of hours porting a small astro program because I assumed a t meant “time”, as it does in astrophysics, rather than someone using it for “temperature”).

Anyhow, sounds not so hard, right? Well, the API is “quirky”, so there are a few little tricks you need to add in there to make sure things don’t blow up.

Let’s break out a controller/handler (I leave it to you to give the function a route). I’m using the Fiber framework, but this approach works just as well with Echo, Fuego, Chi, or, now that we’re up at Go 1.22, straight up net/http.
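(If you want something concrete for that route, a minimal Fiber wiring might look like the sketch below. The path, port, and main package layout are my own choices here, not part of our actual service.)

// /main.go - hypothetical wiring for the handler below
package main

import (
	"log"

	"github.com/gofiber/fiber/v2"

	controllers "github.com/wakatara/goma/controllers" // replace wakatara with your go mod init
)

func main() {
	app := fiber.New()

	// Expose the JPL sync as a GET route; in practice you'd more likely
	// trigger this from a scheduled job than a public endpoint.
	app.Get("/jpl/objects", controllers.GetJplObjects)

	log.Fatal(app.Listen(":3000"))
}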

// /controllers/jpl.go
package controllers

import (
 "fmt"
 "io"
 "log"
 "net/http"
 "os"
 "strconv"
 "strings"

 "github.com/goccy/go-json"   // faster, drop-in json parser
 "github.com/gofiber/fiber/v2"
 config "github.com/wakatara/goma/config"  // replace wakatara with your go mod init
 models "github.com/wakatara/goma/models"  // replace wakatara with your go mod init
)

const jpl_comets = "https://ssd-api.jpl.nasa.gov/sbdb_query.api?fields=full_name,pdes,prefix,diameter,density,albedo,q,ad,per,rot_per,i,n,ma,G,H,K1,M1,t_jup,w,tp,a,e&sb-kind=c&full-prec=1"
const jpl_asteroids = "https://ssd-api.jpl.nasa.gov/sbdb_query.api?fields=full_name,pdes,prefix,diameter,density,albedo,q,ad,per,rot_per,i,n,ma,G,H,K1,M1,t_jup,w,tp,a,e&sb-kind=a&sb-class=PAA,HYA,CEN,TNO&full-prec=1"

func GetJplObjects(c *fiber.Ctx) error {

 jpl_urls := []string{jpl_comets, jpl_asteroids}
 JplResponses := []models.JplResponse{}
 JsonMaps := []models.JplJsonMap{}

  // Iterate over the two jpl urls and grab responses
 for _, jpl_url := range jpl_urls {
  response, err := http.Get(jpl_url)
  if err != nil {
   log.Fatal(err)
  }

  body, err := io.ReadAll(response.Body)
  response.Body.Close()
  if err != nil {
   log.Fatal(err)
  }

  // Capture Unmarshal's error so a bad payload doesn't slip through silently
  JplResponse := models.JplResponse{}
  err = json.Unmarshal(body, &JplResponse)
  if err != nil {
   fmt.Print(err.Error())
   os.Exit(1)
  }
  JplResponses = append(JplResponses, JplResponse)
 }

 // This is the meat: a double range statement.
 // For each individual response, walk its Data rows and,
 // for each row, key every element by the matching entry
 // in Fields to build a field:value map per object.
 for _, JR := range JplResponses {
  for i := 0; i < len(JR.Data); i++ {
   JM := models.JplJsonMap{}
   for j := 0; j < len(JR.Fields); j++ {
    JM[JR.Fields[j]] = JR.Data[i][j]
   }
   JsonMaps = append(JsonMaps, JM)
  }
 }
 jpls_json := processJpl(JsonMaps)

 // Now you have the data in JSON key:value form
 // for each field. Unmarshal it into your
 // response objects.
 JROs := []models.JplResponseObject{}
 err := json.Unmarshal(*jpls_json, &JROs)
 if err != nil {
  log.Fatal(err)
 }

 // Naively return the transformed objects; in our pipeline we
 // filter and tag them further before they go anywhere (see below).
 return c.JSON(JROs)
}
And tada! You actually have data you can use from JPL.

From here you could simply naively return this in a map if you so chose.

We filter the data after this to process it into a useful form for our specific purposes (removing objects we’d never care about or would never receive imagery for, tagging, and making sure we have a nice references JSON for people if they need to cite where the data came from).

But I find this “convert fields and data lists into useful JSON” move a common pattern that is good to have in your pocket when you need to get data from less dev-UX-focused APIs.
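If you want that pattern in your pocket as an actual helper rather than inline loops, a generic version (hypothetical name, same idea as the double range above) is only a few lines of Go:

// zipFieldsAndData pairs a "fields" header list with rows of values,
// producing one field-keyed map per row.
func zipFieldsAndData(fields []string, rows [][]any) []map[string]any {
	out := make([]map[string]any, 0, len(rows))
	for _, row := range rows {
		m := make(map[string]any, len(fields))
		for i, field := range fields {
			if i < len(row) { // guard against short rows
				m[field] = row[i]
			}
		}
		out = append(out, m)
	}
	return out
}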

In our particular case, we then process this into a form to check against our existing Objects database and append new objects into the table, so we have the most up-to-date objects to match against inbound data (since processing imagery on something that has no object attached to it is a quick path to our dead_letter box for manual checking in our ingestion pipelines).

Fin

Basically, we take something that is super valuable but unusable and transform it into something that can be used with most modern ORMs - whether you’re throwing it into (or pulling it out of) a database table, or doing more precise manipulation in a processing system. Not rocket science, but it took me enough time to puzzle this stuff out that I wanted to leave breadcrumbs for someone else who might need to do something useful with the API.

Anyhow, I hope you found this useful and time-saving (as I imagine there’s a small but frustrated circle of readers who will run across this via search) and that it helped solve the headbanging you were doing to get the data into a format to get your astro work done. The above info was hard won and required quite a bit of work to sort into something generalizable.

Please let me know if you can see improvements in approach or are interested in followups. At some point, I’m hoping to provide a much longer post (and academic paper) on how the system we’re building works and what you may take away from it for your own scientific work.

If this post was useful to you, lemme know via mail or elephant below. Feel free to mention or ping me at @awws on Mastodon or email me at hola@wakatara.com.


How to more effectively use the NASA JPL Small Bodies DB APIs.

Daryl Manning



2024-02-17 09:14 +0800