My First Transit Spectrum Code

I recently submitted an HST Cycle 22 proposal to observe the hot super-Earth 55 Cnc e. To estimate the feasibility of the proposed observations, I relied on the spectra for 55 Cnc e generated by Hu and Seager (2014). However, Hu and Seager considered only atmospheres with an atomic makeup of at least 50% H. This is problematic, because the ~2000 K equilibrium temperature of 55 Cnc e means that H, either atomic or molecular, can escape from the surface at timescales of less than 100 years. Atomic C and O (and certainly the molecular combinations therein), on the other hand, do not escape — at least so long as the exobase temperature does not much exceed 10,000 K. Thus, if 55 Cnc e has an atmosphere at all, it is liable to be dominated by CO or CO2, not the H-bearing species predicted in Hu and Seager’s photochemical model.

55 Cnc e System

55 Cnc e and its host star: one of the few star-planet pairs that can be shown to scale! (This is a rough attempt to do so, at least.)


Unfortunately, I did not have enough time leading up to the proposal deadline to create my own tools for simulating the transit spectrum of a planet so that I could examine C-O dominated atmospheres. Yet it has been bugging me, and I would like to have this tool both for my understanding and for future reasearch/proposals. These past few weeks, I finally got around to it.

I looked to the paper by Benneke and Seager (2011) for a general roadmap in the creation of my code. However, their code is way more sophisticated than anything I can create in a few weeks of tinkering (especially since I am still getting familiar with my new favorite coding language, Python). Even if I could replicate their code exactly, it’s not my style to follow a recipe unless I really have no idea what I am doing. In the end, I settled on writing a pipeline that computes the transit depth for a planet with a simple isothermal, single-species atmosphere, but that employs extensive abosprtion line data. This code will thus allow legitimate, if crude, estimates of a transit spectrum for a planet. Below I briefly describe the creation of this code, and the trials and tribulations therein, in several parts.

Computing absorption coefficients

Following Benneke and Seager, I made use of the HITEMP database of high-temperature absorption lines. Another, more extensive, database, HITRAN, is available for low-temperature lines. The two databases have the same form, so the functions I created for reading in and processing HITEMP line data should be easily extensible to HITRAN. (These data can be accessed by anonymous ftp to

The data provided by HITRAN/HITEMP and their proper use are described in a 1996 paper (see especially the appendices). In brief, the database provides the wavenumber [cm-1] of the line and its intensity [cm-1/(molecule cm-2)]. These two are the most important parameters, but the database then also provides pressure broadening coefficients, a pressure shift parameter, the lower state energy, and so on.

Aside: What is absorption line intensity?

Given my minimal background in absorption spectroscopy, I was mystified by the “intensity” parameter. I’m used to intensity being a term to express the flux per unit solid angle of a radiation field. So what is it in this context? I haven’t found an explanation that I find entirely satisfactory. After mulling over how this parameter is used for a while, here is what I have decided. Disclaimer: I give no guarantees that this is the “right” way to conceptualize the intensity.

If I were going to try to make a record of a single absorption line, I would measure the cross section for absorption as a function of wavenumber (well, wavelength really, but HITRAN/HITEMP use wavenumber, so let’s stick with that). However, the resulting line profile changes according to the strength of the relevant broadening mechanisms (natural, pressure, Doppler), so to generalize my measurement I would compute the integral of the absorption coefficient over wavenumber. Even if the broadening mechanisms change, this integral should be constant.

This would allow me to compute the absorption coefficient as a function of wavelength, but only for a molecule that happens to be in the lower ro-vibrational state of this transition. In reality, only some fraction of the molecules in the gas doing the absorbing will be in this state. So, if I wanted to compute the optical depth of a column, I would have to use Botlzmann statistics (assuming local thermodynamic equilibrium, of course) to determine the fraction of the molecules in the right state to produce the absorption line in question, then compute the column density of just those particles, and finally multiply by the absorption cross section to get the optical depth. However, I could just lump my calculation of what fraction of the molecules are in the proper state into my computation of absorption coefficients. This would simply amount to taking the product the integral I computed earlier and the fraction of molecules in the right state. This is convenient, though it hides some of the physics.

This product is the “intensity” that the HITRAN/HITEMP databases provide. Thus, to compute the absorption, one first uses Boltzmann statistics to compute the intensity at the temperature in question (referencing from the intensity at 296 K that is provided in the database). Then this is multiplied by the line profile to determine the absorption cross section as a function of wavelength, and finally by the column density of the relevant species to compute the optical depth. This is how I understand the intensity.

I should note that, by my formulation, it makes sense to express intensity in units of cm2 molecule-1 cm-1 = cm molecule-1. However, others generally express it as cm-1/molecule cm-2. Theses units nudge the reader to think of intensity as a “per column density” quantity, but I prefer “the integral of cross sections over wavenumber, with a correction for state fractions.”

Back to computing absorption coefficients

I wrote the Python module to utilize the HITEMP data. Making a function to parse the text-file data was fairly trivial — hitemp.readlines. There a lot of lines, and I knew from the start I wouldn’t have the computing power to include them all. Consequently, I wrote readlines such that it read in data only for lines that are both within the user-defined wavelength range of interest and that have an intensity above some fraction of the strongest line’s intensity.

The other function contained in the hitemp module, hitemp.grid_P, sums absorption coefficients of all lines down to the chosen intensity on a grid of wavenumber and pressure. Since I assume an isothermal atmosphere, the code only considers a single temperature, although it would be easy to modify it to create a datacube over wavenumber, pressure, and temperature.  Unfortunately, the code require that the ratio of the partition sum at the desired temperature over the partition sum at the reference temperature (296 K) be supplied as input. A FORTRAN code, TIPS, is included in the HITRAN database that estimates these partition sums, but it operates via an interactive command-line prompt. I simply didn’t have the time to try to make this code play nice with Python, so for now one must compute the desired ratio of partition sums externally.

This part of the code presented an interesting problem. For the coarse wavelength grid I wished to use, the line profiles are very narrow relative to the wavelength spacing of the grid. Thus, simply computing the value of the line profile at each point on the wavelength grid is very, very wrong. A narrow line will often fall between two points on the grid and, essentially, be lost. In reality, if absorption from that line were observed using a low resolution spectrograph, the spectrogrograph pixels would measure the average value of the absorption over the width of the pixel, not just the value at the center of the pixel.

This made gridding the absorption coefficients a much more intense numerical task. To properly average the lines over the grid, I created the general function, so named because it, in essence, integrates and interpolates at the same time. The function, given the coordinates of a curve and the endpoints of a set of bins, computes the average value of the curve within each bin. Thus, the integral of the result is the same as the integral of the input. This is not the case for normal interpolation. Besides line profiles, I expect this function to be very useful when dealing with probability density functions.

Because of this issue of missing narrow lines, the grid_P option computes the Voigt profile of each line at points specifically chosen to cllearly resolve the structure of the line with a minimum of computations. Then, the absorption coefficients (the product of the profile and intensity) are intergolated (not interpolated) onto the supplied wavenumber grid. This is done successively for each line at each pressure and the absorption coefficients are summed. It takes a really long time. The code could probably be dramatically sped up for coarse wavelength grids by using analytical epxressions (or even approximations) to directly compute the integral of the line over bins corresponding to each wavenumber grid point, especially for the line wings — but that’s a future project.

The end result: I can compute the cumulative (all lines) absorption coefficients over a wavelength-pressure grid for later interpolation to different pressures. As a side note, I found that pressure, at least for coarse wavelength grids, does not greatly affect the results because even strong pressure broadening still produces narrow lines relative to grid spacings of  10 cm-1. So when applied to coarse spectra and atmospheres of moderate to low pressure, a pressure grid of only a few points suffices.

Spectral Resolution

Absorption coefficients of a set of molecular absorption combs for CO at resolutions of 0.2, 2, and 10 cm-1. The peak absorption coefficients get “diluted” as the resolution is coarsened.


Generating the transit spectrum (or “probing the onion”)

Computationally, generating the transit spectrum is surprisingly fast once the absorption coefficients over a wavelength-pressure grid are available. I created the module with the single function compute_depth to compute the transit depth as a function of wavenumber. The function assumes an isothermal atmosphere of temperature T and base pressure P0 at r0, the radius of the planet’s surface or cloud deck. The user sets the lower pressure limit to consider (i.e. the maximum altitude), and the code then divides the atmosphere into N shells of even thickness with that limit. At each pressure, the absorption coefficients are interpolated from the provided wavenumber-pressure grid.


If we could spatially resolve a planet transit at two wavelengths, one in the core of an absorption line and another well outside of the line, it would look something like this. The apparent size of the planet varies, thereby producing different transit depths.


With the profile of the atmosphere established, the code then considers lines of sight (LOSs) passing through the atmosphere at the radius of each shell. Thus, an LOS at the outermost shell passes only through that shell. The next LOS down passes through both the outermost shell and the one beneath. The next down passes through three shells… so on and so forth. For each LOS, the path length within each shell (it varies according to the geometry – see figure below) is computed and multiplied by the density of molecules to determine the column density. This is then multiplied by the absorption coefficients to compute the optical depth for just that shell, and finally the optical depths due to each shell intersected by the LOS are summed.

The Onion

The geometry of computing transit depths with atmospheric absorption.


The optical depth of each LOS can be used to compute the opacity of an annulus around the planetary disk. The product of the opacity and area of the annulus give the equivalent, fully opaque area that would absorb the same amount of light. The equivalent areas of each shell are  summed together with the geometric size of the planet to determine the overall “effective area” of the planet transit. Finally, these are divided by the area of the stellar disk to yield the transit depth.

 The Results: Possible Spectra of 55 Cnc e

At long last, I used my code to generate spectra of 55 Cnc e for three different extreme cases: 1 bar atmospheres of pure H2O, CO, and CO2. I only looked at the wavelengths covered by HST’s Wide Field Camera 3 instrument with the G140L grism, since this is the instrument I proposed to use. The spectra all show prominent features with amplitudes of at least 20 ppm. Unsurprisingly, water produces the clearest absorption signature.

55 Cnc e Spectra

Spectra of pure 1 bar atmospheres for 55 Cnc e (all assume a radius of 2 REarth and mass of 8 Earth).


I then used a code I created when writing my proposal to evaluate the capacity of G140L data to distinguish these spectra from a flat spectrum. (This code is ugly as it was both early in my days of Python coding and rather rushed, so I won’t post it.) I found that it takes several 10’s of orbits worth of HST data to rule out these spectra to at least 2-sigma. Doing the same with a millibar instead of a bar of atmosphere requires over 100 orbits to rule out an atmosphere. So, if no spectral structure is evident in the 26 orbits I proposed, we should be able to rule out an atmosphere of about a bar. Ultimately, it would be nice to get down to millibar levels, since then we could place a strong constraint on the sub-telluric density of 55 Cnc, which in turn would be evidence for an exotic, carbon-rich core (Madhusudhan et al. 2012; ApJ 759:40).

Viola! I’m excited to now have a tool in my quiver that I made all by myself to use with future transit spectroscopy endeavors.