Making NASA JPL's Small Bodies DB APIs Dev Friendly
Working with startups for over a decade may have spoiled me. API developer experience has come a long way in that time, SaaSifying backends to make building companies efficient, logical, and consistent.
Sadly, after months of doing “science stuff”, my feeling is this is a major missed opportunity in scientific projects and academia (and, to be fair, few scientists building these APIs are software developers by trade).
In many cases, though, it feels like it is impeding scientific progress. And it is something large scientific and research-based organizations should be paying attention to, since it flies in the face of their core missions.
To single out someone who can take the punch, let’s pick on NASA JPL. It’s astounding this group can put car-sized rovers onto other planets but cannot make core scientific data accessible (and, to be frank, this is perhaps symptomatic of a systemic problem with American political anti-intellectualism and the desire to build counter-evident border walls and militarize police instead of funding proven avenues of economic progress like core scientific research. </rant>).
And please take this criticism as well-meaning. I am sure there are under-the-hood challenges in systems or funding or resourcing that are invisible to me. That said, having the greatest library of books in the world is useless if no one can check them out or find what they need in the stacks.
The JPL SBDB API
NASA JPL’s Small Bodies Database API has issues.
- an obvious, scrape-web-results-and-stuff-into-an-API approach to what may be a very old, antiquated underlying system (Fortran 77? C?)
- a whack JSON return format: a “fields” listing and “data” in one big array of arrays
- a limit of 1 request at a time per IP address (a huge speed bump for any attempt at parallel, or even concurrent, processing)
Complaints aside, the scientific content and value of NASA JPL SBDB data is amazing.
You can literally calculate the position and orbit of a small rock on the other side of the solar system to ridiculous degrees of accuracy, and use that to help fuel your calculations if you’re one of those geeky astro/physics science hippy types (such as I aspire to be). It’s mind-blowing, tbh. Just, well… hard to use. Which is a shame.
(And to be fair to JPL, we switched to their API after we found both uptime and major data issues with the MPC’s - the Minor Planet Centre’s - orbital data. Thus my complaint about the general quality of science APIs across the field.)
Here’s what I learned bending their API to our will and transmuting their data into something useful, in a format most modern devs would want to use, with better developer UX. I hope it helps you out.
Approach
Basically, we’re going to change the cryptic fields and mass data blob into the type of key:value structures we devs like. Yes, I’ve already raised this issue with JPL when I was asked how we could improve the system. There was a curt “lack of resources” email response and the fact that their API returns were, in fact, “legal json” (ok, technically true, but c’mon…). I have been told that despite the curt, official reply, they do realize there are issues and are working on a better future version of the API (NASA JPL! Call me! I even work pro bono on these sorts of things!).
So, as mentioned, the API returns:
- fields
- data
“fields” is a listing of, well… the fields, largely as shortforms and cryptic letters most scientists (but not devs) would be familiar with, and “data” is a big ol’ field which is an array of arrays of the data for the aforementioned fields.
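To make that shape concrete, here’s a small sketch of the zip operation we’ll do in the controller later. The payload here is abridged and made up for illustration; it only mimics the fields/data structure, not real SBDB output:

```go
package main

import "encoding/json"

// zipRows pairs each data row with the field names, turning SBDB's
// fields/data arrays into the key:value maps devs expect.
func zipRows(fields []string, data [][]any) []map[string]any {
	rows := make([]map[string]any, 0, len(data))
	for _, row := range data {
		m := map[string]any{}
		for j, f := range fields {
			if j < len(row) {
				m[f] = row[j]
			}
		}
		rows = append(rows, m)
	}
	return rows
}

// parseSBDB unmarshals an SBDB-style payload and zips it into maps.
func parseSBDB(raw []byte) ([]map[string]any, error) {
	var resp struct {
		Fields []string `json:"fields"`
		Data   [][]any  `json:"data"`
	}
	if err := json.Unmarshal(raw, &resp); err != nil {
		return nil, err
	}
	return zipRows(resp.Fields, resp.Data), nil
}
```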
For the project I’m helping on, we’re interested in having an up-to-date list of comets and certain asteroids. So, if a keener astronomer sees one of these new objects we want to make sure we’ve got the fresh deets on that puppy. So, we need a job to keep our objects updated from NASA.
It may surprise you (it certainly did me), but about 35 new comets are discovered every year, and more asteroids than you can shake a stick at. And having data on new discoveries is really important, particularly as we start to learn more about extra-solar objects.
So, I crafted two URLs for us to use since (sadly) the SBDB API is not GraphQL. The SBDB needs a crafted GET url corresponding to a parameter list of what you want from it, as well as the target for that information:
- comets
- asteroids (most of which we discard)
const jpl_comets = "https://ssd-api.jpl.nasa.gov/sbdb_query.api?fields=full_name,pdes,prefix,diameter,density,albedo,q,ad,per,rot_per,i,n,ma,G,H,K1,M1,t_jup,w,tp,a,e&sb-kind=c&full-prec=1"
const jpl_asteroids = "https://ssd-api.jpl.nasa.gov/sbdb_query.api?fields=full_name,pdes,prefix,diameter,density,albedo,q,ad,per,rot_per,i,n,ma,G,H,K1,M1,t_jup,w,tp,a,e&sb-kind=a&sb-class=PAA,HYA,CEN,TNO&full-prec=1"
What you see is a long list of comma-separated characteristics of a small body, followed by the kind of small body data we want (the sb-kind term: c for comets, a for asteroids), an extra modifier for asteroids (sb-class, which pulls in certain classes of objects like TNOs - Trans-Neptunian Objects), and then a request for full precision (since if you don’t do this you get “display friendly” numbers that throw errors on most computing systems).
Even then, the data you get back is a little strange. Numbers between 0 and 1 are not preceded by a 0, so you get things like .3824, and the name field is space padded (leading to my suspicion/assertion above that they’ve merely “api-ed” their web interface and have some very old programs underneath calculating the data. It’s really tabular text data they’ve stuffed into a “json interface”).
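Happily, Go’s strconv.ParseFloat already accepts the leading-dot form, so cleaning a single value mostly comes down to trimming and an optional parse. A sketch (normalize is my own helper name; in real use you would skip string-typed fields like full_name and pdes, which can look numeric):

```go
package main

import (
	"strconv"
	"strings"
)

// normalize trims SBDB's space padding and, where the value parses
// as a number (including the leading-dot form like ".3824"),
// returns a float64; otherwise it returns the cleaned string.
// Non-string values pass through untouched.
func normalize(v any) any {
	s, ok := v.(string)
	if !ok {
		return v
	}
	s = strings.TrimSpace(s)
	if f, err := strconv.ParseFloat(s, 64); err == nil {
		return f
	}
	return s
}
```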
Ok, let’s roll up our sleeves a bit and turn this into something dev-usable. I’m using the excellent gorm as an ORM for our apis, but you can do the same with Ent, or any other orm layer (or straight up structs as below).
What I’m effectively doing is taking a response object and then transforming it into a form similar to our small bodies objects table.
So, let’s start creating a response model for this data:
// /models/jpl.go
package models

type JplResponse struct {
	Fields []string `json:"fields"`
	Data   [][]any  `json:"data"`
}
type JplJsonMap map[string]any
type JplResponseObject struct {
	Pds4Lid              string  `json:"pds4_lid"`
	UiName               string  `json:"full_name"`
	Prefix               string  `json:"prefix"`
	Name                 string  `json:"pdes"`
	Diameter             float64 `json:"diameter"` // the API field is diameter, not radius
	Density              float64 `json:"density"`
	Albedo               float64 `json:"albedo"`
	Perihelion           float64 `json:"q"`
	Aphelion             float64 `json:"ad"`
	OrbitalPeriod        float64 `json:"per"`
	RotationalPeriod     float64 `json:"rot_per"`
	Inclination          float64 `json:"i"`
	MeanMotion           float64 `json:"n"`
	MeanAnomaly          float64 `json:"ma"`
	GMag                 float64 `json:"G"`
	HMag                 float64 `json:"H"`
	K1Mag                float64 `json:"K1"`
	M1Mag                float64 `json:"M1"`
	TisserandParameter   float64 `json:"t_jup"`
	ArgumentOfPerihelion float64 `json:"w"`
	TimeOfPerihelion     float64 `json:"tp"`
	SemimajorAxis        float64 `json:"a"`
	Eccentricity         float64 `json:"e"`
}
So, now our task is to get data from the sub-optimal json fields:data form into JplResponse, through a mapping via JplJsonMap, and into the JplResponseObject struct to make it useful, more easily manipulated, and better named (the use of ambiguous and often undocumented abbreviations in astronomy is a scourge - though at least JPL lists theirs. #truestory: I once beat my head for a couple of hours porting a small astro program because I assumed a t meant “time”, as it does in astrophysics, rather than someone using it for “temperature”).
Anyhow, sounds not so hard, right? Well, the API is “quirky”, so there are a few little tricks you need to add in there to make sure things don’t blow up.
Let’s break out a controller/handler (I leave it to you to give the function a route). I’m using the Fiber framework, but this approach works just as well with Echo, Fuego, Chi, or, now that we’re up at Go 1.22, straight up net/http.
// /controllers/jpl.go
package controllers

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"os"

	"github.com/goccy/go-json" // faster, drop-in json parser
	"github.com/gofiber/fiber/v2"

	models "github.com/wakatara/goma/models" // replace wakatara with your go mod init
)
const jpl_comets = "https://ssd-api.jpl.nasa.gov/sbdb_query.api?fields=full_name,pdes,prefix,diameter,density,albedo,q,ad,per,rot_per,i,n,ma,G,H,K1,M1,t_jup,w,tp,a,e&sb-kind=c&full-prec=1"
const jpl_asteroids = "https://ssd-api.jpl.nasa.gov/sbdb_query.api?fields=full_name,pdes,prefix,diameter,density,albedo,q,ad,per,rot_per,i,n,ma,G,H,K1,M1,t_jup,w,tp,a,e&sb-kind=a&sb-class=PAA,HYA,CEN,TNO&full-prec=1"
func GetJplObjects(c *fiber.Ctx) error {
	jpl_urls := []string{jpl_comets, jpl_asteroids}
	JplResponses := []models.JplResponse{}
	JsonMaps := []models.JplJsonMap{}
	// Iterate over the two jpl urls and grab responses
	for _, jpl_url := range jpl_urls {
		response, err := http.Get(jpl_url)
		if err != nil {
			log.Fatal(err)
		}
		body, err := io.ReadAll(response.Body)
		response.Body.Close()
		if err != nil {
			log.Fatal(err)
		}
		jplResponse := models.JplResponse{}
		err = json.Unmarshal(body, &jplResponse)
		if err != nil {
			fmt.Print(err.Error())
			os.Exit(1)
		}
		JplResponses = append(JplResponses, jplResponse)
	}
	// This is the meat: a double range statement.
	// For each individual response, walk the rows of Data,
	// pair each element with its field name, and collect
	// the resulting key:value map.
	for _, JR := range JplResponses {
		for i := 0; i < len(JR.Data); i++ {
			JM := models.JplJsonMap{}
			for j := 0; j < len(JR.Fields); j++ {
				JM[JR.Fields[j]] = JR.Data[i][j]
			}
			JsonMaps = append(JsonMaps, JM)
		}
	}
	jpls_json := processJpl(JsonMaps)
	// Now you have the data in a json key:value form
	// for each field. Unmarshal it into your response object.
	JROs := []models.JplResponseObject{}
	err := json.Unmarshal(*jpls_json, &JROs)
	if err != nil {
		log.Fatal(err)
	}
	return c.JSON(JROs)
}
And tada! You actually have data you can use from JPL.
From here you could simply naively return this in a map if you so chose.
We filter the data after this to process it into a useful form for our specific purposes (so removing objects we’d never care about or would never receive imagery for, as well as tagging, and making sure we have a nice references json for people if they need to cite where data came from).
But I find this pattern - converting field and data lists into useful json - a good one to have in your pocket when you need to get data out of less dev-UX-focused APIs.
In our particular case, we then process this into a form to check against our existing Objects database and append new objects into the table so we have the most up-to-date objects to match against inbound data (since processing imagery on something that has no object attached to it is a quick path to our dead_letter box for manual checking in our ingestion pipelines).
Fin
Basically, we take something that is super valuable but unusable and transform it into something that can be used with most modern ORMs, either by throwing it into (or pulling it out of) a database table, or by more precise manipulation in a processing system. Not rocket science, but it took me enough time to puzzle this stuff out that I wanted to leave breadcrumbs for anyone else who might need to do something useful with the API.
Anyhow, I hope you found this useful and time-saving (as I imagine there’s a small but frustrated circle of readers who will run across this via search) and that it solved the headbanging you were doing to get the data into a format for your astro work. The above info was hard won and took quite a bit of work to sort into something generalizable.
Please let me know if you can see improvements in approach or are interested in followups. At some point, I’m hoping to provide a much longer post (and academic paper) on how the system we’re building works and what you may take away from it for your own scientific work.
If this post was useful to you, lemme know via mail or elephant below. Feel free to mention or ping me at @awws on mastodon or email hola@wakatara.com.