# A Low Power Digital IC Emulating Intracellular Calcium Dynamics

Hamid Soleimani<sup>1</sup> and Emmanuel. M. Drakakis<sup>2</sup>

<sup>1</sup>Department of Bioengineering, Imperial College London, SW7 2AZ U.K., E-mail: hs15@imperial.ac.uk

<sup>2</sup>Department of Bioengineering, Imperial College London, SW7 2AZ U.K., E-mail: e.drakakis@imperial.ac.uk, Tel/FAX: +44-(0)207-59-45182

### March 23, 2018

#### Abstract

Low power/area cytomorphic chips may be interfaced and ultimately implanted in the human body for cell–sensing and cell–control applications of the future. In such electronic platforms, it is crucial to accurately mimic the biological time–scales and operate in real–time. This paper proposes a methodology where slow nonlinear dynamical systems describing the behavior of naturally encountered biological systems can be efficiently realised in hardware. To this end, as a case study, a low power and efficient digital ASIC capable of emulating slow intracellular calcium dynamics with time–scales reaching to seconds has been fabricated in the commercially available AMS 0.35  $\mu m$  technology and compared with its analog counterpart. The fabricated chip occupies an area of 1.5  $mm^2$  (excluding the area of the pads) and consumes 18.93 nW for each calcium unit from a power supply of 3.3 V. The presented cytomimetic topology follows closely the behavior of its biological counterpart, exhibiting similar time–domain calcium ions dynamics. Results show that the implemented design has the potential to speed up large–scale simulations of slow intracellular dynamics by sharing cellular units in real–time.

*Keywords:* Cytomimetic, nonlinear circuit, digital ASIC, synchronous cellular model, Calcium–Induced Calcium Release (CICR) model

## 1 Introduction

Real-time and large-scale simulations of biological systems codified by means of non-linear dynamics have been a hot research topic in the last two decades. However, fundamental issues such as excessive resource consumption have mainly prevented its progress so far. Mead was the first to highlight similarities in the computations of neurobiology and electronics aiming at real-time and large-scale simulations [1]. Since then, there have always been articulated arguments about the type and structure of electronic platforms where such biological systems can be implemented upon. Time, space, and energy are the three important physical criteria to judge such electronic systems [2]. In general, there are two main approaches for the implementation of such systems:

- Analog platforms [3–5] that have been considered to be the main choice for the direct implementation of intra/extracellular biological dynamics for years. Such systems are power efficient, however, parameter adjustment is generally challenging and very good layout is imperative in order for the resulting topologies to be immune from variability and mismatch. Analog devices can directly interact with biological systems without the need for data conversions, saving hardware cost for such designs.
- Digital platforms [6–8] that use digital computational units to implement the mathematical equations codifying the behavior of biological intra/extracellular dynamics. Generally, a digital platform achieves short development time, high reconfigurability, notable reliability and immunity to device mismatch. However, unlike analog devices such systems need analog to digital and digital to analog converters (ADCs and DACs) in order to interact with biological systems, imposing higher hardware cost to the design.

Depending on the specifications of the targeted application, one of these approaches or a mixture of both might be more efficient and useful. For example, in the case of large–scale implantable cytomorphic chips, where the biological dynamics is slow, the size of capacitors utilised in each analog processor may become very large (for example, in [9] it is reported as 1  $\mu$ F). Although, this problem could be alleviated to some extent by scaling the baseline current flowing into the circuit, however, other problems such as poor sensitivity appear and thus not practical. In such cases, digital platforms may benefit from the slow character of the targeted dynamics by emulating more units in a pipelined structure, although data conversion is necessary. It can be shown that as

| Parameters            | m=n=p=1 | m=n=p=2 | m=n=2, p=4 |
|-----------------------|---------|---------|------------|
| $z_0 \ (\mu M/s)$     | 1       | 1       | 1          |
| $z_1 eta \ (\mu M/s)$ | 2       | 6       | 3          |
| $V_{M_2} \ (\mu M/s)$ | 250     | 100     | 65         |
| $V_{M_3} \ (\mu M/s)$ | 2000    | 700     | 500        |
| $K_2 \ (\mu M/s)$     | 1       | 1       | 1          |
| $K_R \ (\mu M/s)$     | 30      | 15      | 2          |
| $K_A \ (\mu M/s)$     | 2.5     | 2.5     | 0.9        |
| $k_f \ (s^{-1})$      | 0.1     | 0       | 1          |
| $k (s^{-1})$          | 5       | 8       | 10         |

Table 1: Biological Values for the intracellular  $Ca^{2+}$  Oscillations Model with Various Hill Functions.

the number of emulating cells increases, the hardware cost of data conversion drops. In this paper, we show that digital hardware are efficient platforms in terms of area and power consumption for the implementation of slow biological dynamics in large–scale and real–time. The validity of this claim is confirmed by fabricating (in the commercially available AMS 0.35  $\mu m$  technology) and tested by means of a digital ASIC emulating the Calcium–Induced Calcium Release (CICR) model introduced in [10].

# 2 Synchronous Cellular Calcium Model

The CICR model introduced in [10], describes intracellular  $Ca^{2+}$  oscillations. In that model, the amount of  $Ca^{2+}$  released is controlled by the level of stimulus through modulation of the  $IP_3$  level. The following two-dimensional (2–D) dynamical system codifies the model:

$$\frac{dx}{dt} = z_0 + z_1\beta - z_2(x) + z_3(x,y) + k_f y - kx$$
(1)

$$\frac{dy}{dt} = z_2(x) - z_3(x, y) - k_f y$$
(2)

where

$$z_2(x) = V_{M_2} \frac{x^n}{K_2^n + x^n}$$
(3)

$$z_3(x,y) = V_{M_3} \frac{y^m}{K_R^m + y^m} \frac{x^p}{K_A^p + x^p}$$
(4)

with x and y representing the concentration of free  $Ca^{2+}$  in the cytosol and in the  $IP_3$  insensitive pool, respectively. The quantity  $z_0$  represents the constant  $Ca^{2+}$  input from the extracellular medium and  $z_1$  refers to the  $IP_3$  modulated release of  $Ca^{2+}$  from the  $IP_3$  sensitive store. The parameter  $\beta$  defines the amount of  $IP_3$ and therefore measures the saturation of the  $IP_3$  receptor. The biochemical rates  $z_2$  and  $z_3$  refer to the pumping of  $Ca^{2+}$  into the  $IP_3$  insensitive store and to the release of  $Ca^{2+}$  from that store into the cytosol respectively. The parameters  $V_{M_2}$ ,  $V_{M_3}$ ,  $K_2$ ,  $K_R$ ,  $K_A$ ,  $k_f$  and k are the maximum values of  $z_2$  and  $z_3$ , threshold constants for pumping, release and activation and rate constants, respectively. Parameters n, m, and p represent the Hill coefficients characterising the pumping, release and activation processes, respectively. Depending on the values of the Hill coefficients, different degrees of cooperativity can be achieved and this allows us to study various intracellular functions. The various values of the biological model parameters are presented in Table 1.

According to [11], the cellular calcium model developed upon this biological model is defined as follows:

$$\begin{cases} x^{+} - x^{-} = \Delta t \cdot F(x^{-}, y^{-}) + IN_{ext} \\ y^{+} - y^{-} = \Delta t \cdot G(x^{-}, y^{-}) \end{cases}$$
(5)

where

$$\begin{cases} F(x^{-}, y^{-}) = F(x_{min} + X^{-}.\Delta x, \ y_{min} + Y^{-}.\Delta y) \\ G(x^{-}, y^{-}) = G(x_{min} + X^{-}.\Delta x, \ y_{min} + Y^{-}.\Delta y) \\ IN_{ext} = \Delta t \cdot IN. \end{cases}$$
(6)

The + and - signs represent the current and previous states respectively. In this system, the phase plane is mapped onto a discrete space and accordingly the state variables are points connecting the cellular phase plane together. Thus, X = i where  $i \in \mathbf{R} \equiv 0, 1, ..., R-1$  and Y = j where  $j \in \mathbf{S} \equiv 0, 1, ..., S-1$  are discrete state variables corresponding to x and y in the continuous space. The location of each state point in the  $\mathbf{R} \times \mathbf{S}$  cellular space is given by:

$$\begin{cases} X = \lfloor \frac{x - x_{min}}{\Delta x} \rfloor & ; \quad x \in [x_{min}, x_{max}) \\ Y = \lfloor \frac{y - y_{min}}{\Delta y} \rfloor & ; \quad y \in [y_{min}, y_{max}) \end{cases}$$
(7)

where  $\Delta x = \frac{x_{max} - x_{min}}{R}$  and  $\Delta y = \frac{x_{max} - x_{min}}{S}$ . The *F* and *G* velocity functions are determined based on the biological dynamical system, in this case CICR calcium model:

$$\begin{cases} \frac{dx}{dt} = F(x, y) + IN\\ \frac{dy}{dt} = G(x, y) \end{cases}$$
(8)

where

$$\begin{cases}
F(x,y) = z_0 - z_2(x) + z_3(x,y) + k_f y - kx \\
G(x,y) = z_2(x) - z_3(x,y) - k_f y \\
IN = z_1 \beta.
\end{cases}$$
(9)

and

$$z_2(x) = V_{M_2} \frac{x^n}{K_2^n + x^n} \tag{10}$$

$$z_3(x,y) = V_{M_3} \frac{y^m}{K_R^m + y^m} \frac{x^p}{K_A^p + x^p}$$
(11)

The direction of new motions on the cellular space can be formulated as:

$$\begin{cases} X^{+} - X^{-} = 0 \Rightarrow \text{"no change"} \\ X^{+} - X^{-} \ge 1 \Rightarrow \text{"upward motion"} \\ X^{+} - X^{-} \le -1 \Rightarrow \text{"downward motion"} \end{cases}$$
(12)

$$\begin{cases} Y^{+} - Y^{-} = 0 \Rightarrow \text{"no change"} \\ Y^{+} - Y^{-} \ge 1 \Rightarrow \text{"upward motion"} \\ Y^{+} - Y^{-} \le -1 \Rightarrow \text{"downward motion"}. \end{cases}$$
(13)

It should be also noted that the resultant motion on the 2–D cellular phase planes is determined by a combination of motions in both X and Y directions and the number of motions on the cellular space can correspond to more than one step in each clock cycle.

## **3** Hardware Implementation

The proposed ASIC design presented in Fig. 1(a) implements (5) including parameters introduced in Table 1 for the m=n=2, p=4 Hill function. In this structure red, black and blue arrows respectively demonstrate in/out ports, internal buses and future possible connections leading to higher design flexibility (the blue connections are not implemented in the current version). It contains five major blocks. In this architecture, the velocity functions codified by (9) are hardwired in *Storage Blocks* (this can be also reconfigurable, see Section 7) and then according to (5), though *Inputs* and *Adders* units, the input (in case of x state variable) and previous values of state variables will be added to this to build the next time stamp of the state variables every one clock cycle. The result will be passed into *ShiftRegisters* and shifted once to the right side. The outputs of these blocks will be transferred as outputs through UART interface and also fed back to the *Addresser* blocks to find the next addresses to fetch the next velocity values from *Storage Blocks*. This process is recursively repeated until simulations end. Here is an explanation of each block in detail:

### 3.0.1 Input/Output Interfaces

In this design, the output signals are provided in two parallel and serial forms. The parallel output interface was implemented as a back-up plan, in case if the serial does not function, however, after fabrication both interfaces properly worked. The parallel signals comprise 6 pins for each state variable (X/Y), while the serial output transmits the state variables on a single wire according to the UART protocol. For the parallel case, the system only needs  $CLK_{core}$  to properly function. This will trigger the digital processing units in the entire design. The transmission bit rate for the parallel case is equal to  $12 \times CLK_{core}$  and for the serial one is determined by the UART clock ranging from 115200 to 110 bit per second. It should be noted that in the serial mode, since the UART packets carry 10 bits (one start, two stop and seven data bits), the  $CLK_{core}$  is defined as  $\frac{CLK_{serial1}}{10 \times Num. of packets}$ ; in this case Num. of packets is 4 since there are two packets per each state variable.



Figure 1: (a) The detailed internal structure of the ASIC design capable of emulating 16 calcium units in real-time. (b) Microphotograph of the chip, fabricated in the commercially available 0.35  $\mu m$  AMS technology. The chip occupies an area of 1.5  $mm^2$ including the pipelined network and excluding the area of the pads.

 $CLK_{serial1}$  is equal to the UART protocol baud rate and  $CLK_{serial2}$  is the UART receiver sampling frequency, which must be at least 8 times faster than  $CLK_{serial1}$  to make sure all input bits are captured properly by the UART receiver. Due to the limited number of pads, the input of the system only accepts values via the UART serial protocol and is calibrated by the *Controling Signals* pins.

## 3.0.2 Storage Blocks

The cellularized velocity values for all cells are stored in the storage blocks. The velocity vectors are calculated off-line using (2) according to predetermined model parameters. These signed values are stored in two  $R \times S$  (number of pixels) storage blocks. The size of velocity components is exactly the same as the bandwidth of the system and defines the length of each memory cell, which is 14 bits in this design.

### 3.0.3 Adders

An adder is employed for each dimension, which adds the velocity value fetched from the storage blocks to the previous state of the dynamical variable. According to (5), the input is also added to this value for the X cellular variable. The transferred values from the storage blocks and the input determine the motion direction on the cellular phase plane. For example, when the value of velocity received from  $Shift Reg_X$  (see Fig. 1(a)) plus the input is positive, an upward motion in the X direction occurs, while if the sum is negative the motion is downward. The same holds for  $Shift Reg_Y$  and the resultant direction on the 2D discrete phase planes is determined by motions in both X and Y directions. The absolute value of the velocities determines the amount of increase or decrease in the state variables in each step. Thus, the higher the absolute velocity value, the bigger the increase of the state variable.

#### 3.0.4 Addressers

The location of each state variable in the cellular phase plane is calculated by this block. In other words, this block is responsible for converting the non-cellular output variables to the cellular addresses to fetch the next velocity values from the storage blocks. This conversion is based on the relation (3), where the cellular phase plane was introduced as a discrete mesh-like plane and built from the continuous space. The address of the next state in the cellular space can be conveniently implemented by means of one subtract and one shift operation.

### **3.0.5** Shift Registers

Since the critical path in the proposed cellular model is determined only by one subtractor, we can easily share the digital hardware for large scale simulations in cytomimetic circuit design. To this end, a network of pipelined calcium units corresponding to 16 individual CICR models capable of operating in real-time is presented. The maximum value of this equals to *clock core*  $\cdot$  *dt* leading to a significant area reduction in such systems. As illustrated in Fig. 1(a), *Shift Reg<sub>X</sub>* and *Shift Reg<sub>Y</sub>* store and shift the data of each calcium unit in each clock pulse.



Figure 2: The printed circuit boards containing: (1) fabricated chip with PGA84 package mounted on a PGA ZIF header; (2) UART interface chip (FT232) and mini USB connector; (3) FPGA Spartan-6 XC6SLX150; (4) USB interface for programming the FPGA by the PC.

## 3.1 Hardware Layout

The resulting IC, including the serial UART interface and the pipelined network covers an area of 1.5  $mm^2$  (excluding the area of the pads), while consuming a power of 18.93 nW for each calcium unit. This value is calculated by measuring the total power consumption of the design using post layout synthesis simulations in Cadence and then divided by 16 (number of shared calcium units). It is powered from a 3.3 V supply and it contains 3788 basic gates. Fig. 1(b) displays a microphotograph of the chip layout in which different modules correspond to the structure shown in Fig. 1(a). The chip has one main module and four submodules. Thirteen outputs and six inputs are embedded in the design comprising the power supply pins to communicate with external devices. It should be stressed that as shown in the figure, the design has been fabricated along with another AFE design (placed on the left side of the chip).

# 4 Chip vs Simulated Time Domain Results

## 4.1 Experimental Setup

The experimental setup consists of three main components: a PC, a generic FPGA interface and the cytomimetic digital ASIC. The PC records data and controls the cytomimetic system via the FPGA interface. Fig.2 shows the printed circuit boards that host two main hardware components of the system, including an FPGA, and the cytomimetic chip. The serial UART interface enables the chip to communicate bi-directionally with the PC. The parallel outputs of the chip are also connected to the FPGA and available to the PC through the UART serial interface implemented on the chip board.

Both FPGA and chip boards contain the circuitry needed to ensure the proper functionality and testing of the chip, FPGA and the serial transceiver, such as voltage regulators and connectors to measure digital input/output voltages from the chip. The chip is driven by three main clocks supplied by the FPGA board. In this setup, 115200 baud rate is used in the design to transmit data to PC, thus, the  $CLK_{serial1} = 115200$ ,  $CLK_{core} = \frac{115200}{40} = 2880$  (each output value is coded by two 10 bits and the system has two outputs at the time, therefore 40 bits are needed to be sent out for one meaningful set of the state variables) and the  $CLK_{serial2} = 115200 * 8$ .

## 4.2 Single Calcium Behavior

Time domain waveforms for the biological and cellular model with two different output forms (parallel and serial) simulated respectively by MATLAB and ASIC are shown in Figs. 3(a1–c1). The cellular parameters corresponding to the implemented 32–pixel cellular model are  $\Delta x = 0.0625$ ,  $\Delta y = 0.0625$ ,  $x_{min} = -0.1$ ,



Figure 3: (a1–c1) Time domain comparison between the biological (MATLAB) and the cellular model with two different output protocols (chip–parallel and chip–serial) for a single calcium unit. Cytosolic  $Ca^{2+}$  and  $IP_3$  insensitive pool  $Ca^{2+}$  waveforms are respectively shown in red and black color graphs. (a2–c2) and (a3–c3) are time domain and raster plot results respectively of a network activity comprised of 16 calcium units with inhomogeneous inputs.

Table 2: Input Values for the Network Model Constructed by 16 of Calcium Units.

| Parameters           | Biological Model | Cellular Model |
|----------------------|------------------|----------------|
| $\alpha \ (\mu M/s)$ | 0.2              | 0.27           |
| $\gamma ~(\mu M/s)$  | 2.7              | 2.7            |

 $x_{max} = 1.9$ ,  $y_{min} = -0.1$  and  $y_{max} = 1.9$ . Results show acceptable agreement between the MATLAB and chip results, however due to the low resolution of the fixed point parallel outputs, Fig. 3(b1) shows poorer results compared to the serial case. This is mainly because of low resolution of the parallel outputs (6 bits per output) compared to the serial ones (14 bits per output).

### 4.3 Network Behavior

To investigate the applicability of the proposed cellular model in a large scale design, the results simulated by both MATLAB and ASIC of a network model constructed with 16 calcium units are compared. The input function for the simulated network in both cases is given by:

$$IN_{ext_i} = \alpha.\eta + \gamma \tag{14}$$

where  $\eta$  is a random number between 0 to 1 generated by a uniform distribution and other parameters are presented in Table 2. The time domain signals and the corresponding raster plots for the biological and the chip models with two different output protocols (parallel and serial) are demonstrated in Figs. 3(a2–c2) and 3(a3–c3) respectively.

In this figure, temporal evolution of the firing rate shows good agreement between the cellular and biological models. It can be seen that the calcium units excited by a noisy input are destabilised after a certain amount of time. The chip with parallel outputs still shows a bit poorer time domain result (Fig. 3(b2)). However, since in large scale simulations/emulations, the statistical nature of such activities is of interest in general, the trivial disagreement is not significant in the raster plot shown in Fig. 3(b3).

# 5 Analog vs Digital Processing Unit

To verify the suitability of digital platforms as the processing unit for slow biological dynamics, here we qualitatively and quantitatively compare both analog and digital designs on the same case study. In analog designs, computations are performed continuously and based on the physics of the devices (continuous–time continuous– value designs). In contrast, in digital designs, computations are performed upon discrete values of physical variables (discrete-time discrete-value designs). This would require to use data converters in the digital design in order to interact with biological systems.

In general it can be argued that analog designs consume less area compared to their digital equivalents, but when emulating slow biological systems in large-scale and real-time, the size of capacitors in analog designs may become large and impractical. In such cases, digital designs become more area efficient as they can benefit from the slow character of biological dynamics by emulating more units in real-time. Quantitative measurements of the fabricated chip are compared with simulated results reported in [12]. In order for the the proposed digital processing unit to operate properly in real-world applications, it has to be used along with mixed-mode convertors. Therefore, for the sake of comparison, the area and power consumptions of a 10-bit CMOS DAC and a 6-bit ADC are extracted from [13-14] and adapted according to the fabrication technology, operating frequency and power supply used in this paper. In this comparison, ADC and DAC have Nyquist architectures while sub Nyquist could also work in the system, most likely leading to lower power consumption for the digital design. It should be noted that one ADC and one DAC converter can be shared between all 16 units. Table 3 shows almost 4 and 18 times area and power reduction respectively including ADC and DAC modules. Note that the simulated power consumption reported from [12] is static, thus the total average value may be even higher leading to further power reduction for the digital design. Such a reduction in area and power consumptions is only based on 16 pipelined calcium units, while further reductions can be achieved by sharing more units. The maximum number of shared units is limited by the operating frequency and integration time step and can be calculated as  $clock \ core \times dt$ , which in this case is 700k shared units. In large-scale designs the hardware cost consumed by data converters becomes small compared to the processing part and converges to the size of two sets of shift registers. The efficiency of digital designs for the emulation of slow biological dynamics increases with scaling down of the feature size.

On the other hand, the operation of the digital design is characterised by current spikes whose typical duration is very short. Such current spikes observed during measurements of the fabricated chip were managed at layout stage by sizing the width of the power wires and adding decoupling capacitors. Large–scale analog designs (e.g. very long cochlear cascades) are prone to noise, mostly due to thermal fluctuations in physical devices, while in digital designs noise is due to round–off error which can be alleviated significantly at the expense of increased datapath bit length. Moreover, in large–scale analog designs computation is also offset prone due to mismatches in the parameters of the physical devices leading to lower accuracy. While in digital designs, computation is not offset prone since it is insensitive to mismatches in the parameters of the physical devices. Generally, it is also accepted that digital designs have a better scalability property in comparison with their analog counterparts [2]. It should be noted however that in the case of cytomorphic chips in which a few implantable cell–sensing and cell–control units are needed and scaling–up is not critical, analog ultra–low–power designs [15] may offer more practical interfacing (and better performance) with the biological systems. For both strategies (analog and digital), additional circuits for signal conditioning are required, however, as mentioned before, the impact of this on the digital design would be negligible for large scale applications.

# 6 Area and Power Tradeoffs for Large Scale Designs

Nowadays, many biological processes are codified in the form of mathematical dynamical systems. According to the nature of these processes, various time-domain evolution speeds are observed ranging from milliseconds to hours. For example, in bioelectrical systems (e.g. spiking neural networks in the brain) time scales are in the order of milliseconds while in pure biochemical systems (e.g. the expression of proteins in the cell) time scales can be in the order of hours. In digital synchronous designs, the frequency of the system is defined as follows:

$$Operating \ Frequency = \frac{Pipelined \ Units}{dt}$$
(15)

where Pipelined Units is the number of embedded pipelined units in the digital design and dt is the Euler time step. Table 4 illustrates the relation between the speed of the biological dynamical system (CICR model) and the digital ASIC power consumption. As expected, the lower (the slower) the biological timescale, the lower the power consumption. The reason for this decrease is that the dominant part of the power consumption in such

Table 3: Area and Average Power Comparison Between the Proposed ASIC Design and the Equivalent Analog Counterpart Emulating the Same Case Study.

| Quantitative Parameters   | Digital (per calcium unit)+ ADC+ DAC | Analog [7] |
|---------------------------|--------------------------------------|------------|
|                           | measured                             | simulated  |
| Area $(mm^2)$             | $\frac{1.50+0.86+0.35}{16} = 0.169$  | 0.661      |
| Average Power ( $\mu W$ ) | $\frac{0.30+0.64+0.43}{16} = 0.08$   | 1.53       |

Table 4: Average Power Consumption for the Fabricated Digital ASIC Consisting 16 Pipelined Calcium Units with the Area of  $1.5 mm^2$ .

| Bio–timescale (sec) | dt (sec)                                | Freq. (Hz)      | Ave. Power $(\mu W)$ |
|---------------------|-----------------------------------------|-----------------|----------------------|
| 1                   | $\frac{1}{64}$                          | $64 \times 16$  | 0.200                |
| 0.5                 | $\frac{1}{128}$                         | $128 \times 16$ | 0.303                |
| 0.25                | $\frac{11}{256}$                        | $256 \times 16$ | 0.486                |
| 0.125               | $\frac{\frac{200}{1}}{\frac{512}{512}}$ | $512 \times 16$ | 0.849                |
| 0.0625              | $\frac{1}{1024}$                        | $1024\times16$  | 1.586                |

designs is the dynamic switching part demonstrated by [2]:

$$P_{switching} \propto V_{DD}^2 \times C_{load} \times f \tag{16}$$

where  $V_{DD}$  is the power supply voltage,  $C_{load}$  and f are the capacitance load and the number of toggles in the output node. This means that faster biological dynamical systems with higher operating frequency demand more power consumption. As shown in Table 4, when the biological timescale decreases, dt should also decrease leading to higher operating frequency and consequently higher power consumption. However, such an issue can be alleviated by using more silicon area and less sharing of hardware resources especially with the modern fabrication technology capable of being scaled down even to 16nm. For example, IBM has implemented the most dense neuromorphic chip (28 nm technology) comprising 1 and 256 million individual neurons and synapses respectively with only 73 mW power consumption [16].

By scaling down the fabricated chip to 28 nm, the current area would shrink to 0.0096  $mm^2$  (almost 156 times smaller). Such a reduction in area permits the designer to implement more physical calcium units in silicon. Therefore optimum design would share maximum pipelined designs so that the power constraints are met and the implemented design can properly operate in real-time.

# 7 Reconfigurability

The proposed hardware architecture potentially has the ability to be fully reconfigurable and reprogrammable for all possible CICR dynamics with various Hill coefficient values as shown in [11]. This reconfigurability stems from the memory-based architecture of the proposed digital hardware, which is a limitation in practice when it comes to the implementation of dynamical systems in analog CMOS-based design. However, this option was not fabricated in the ASIC as the design was area limited. The reconfiguration ability could be implemented in hardware as depicted by two blue arrows in Fig. 1(a). The simulated hardware results extracted from [11] are shown in Fig. 4 for two other calcium dynamics. It should be noted that such a reconfigurability comes with a negligibly higher hardware cost (about 7% and 9% higher area and power consumption respectively) and calls for properly tuned cellular parameters for a certain output dynamic range.



Figure 4: Simulated hardware waveforms of a single calcium unit for the proposed cellular model with two different Hill coefficient value sets of (a) m=n=1, (b) m=n=2. Cytosolic  $Ca^{2+}$  and  $IP_3$  insensitive pool  $Ca^{2+}$  waveforms are respectively shown in black and red color.

# 8 Truncation Error

As explained in [5], depending on the cellular space dimension, a truncation error can be observed in the system in each clock cycle. By assuming a small enough dt so that its corresponding error is negligible, two error

Table 5: The Measured Time and Phase Domain Errors for a 32–pixel Hardware Cellular Model with Various Hill Functions and Output Resolutions.

| Cellular model | E                       | ζ      | RSME (serial) | RSME (parallel) |
|----------------|-------------------------|--------|---------------|-----------------|
| m=n=p=1        | 4.1425e-06              | 0.1157 | 0.3126        | 0.4356          |
| m=n=p=2        | 5.3421e-06              | 0.1793 | 0.2932        | 0.3934          |
| m=n=2, p=4     | 8.9856e-06              | 0.0293 | 0.1031        | 0.1864          |
| Average        | <mark>6.1567e-06</mark> | 0.1081 | 0.2363        | 0.3384          |

criteria can be defined [5]: I) The quantity

$$\epsilon = \sum_{i=0}^{R-1} \sum_{j=0}^{S-1} \frac{\epsilon_{x_{(i,j)}} + \epsilon_{y_{(i,j)}}}{2R \cdot S}.$$
(17)

where  $\epsilon_x$  and  $\epsilon_y$  are the difference between the continuous state variables and the corresponding cellular values and R and S are the dimensions of the cellular space. Clearly, if the memory bit-length increases, the corresponding  $\epsilon$  decreases, leading to smaller time domain errors. II) The quantity

$$\zeta = \sum_{i=0}^{\tau-1} \frac{x_{(i)}^{+} - X_{(i)}^{+} + y_{(i)}^{+} - Y_{(i)}^{+}}{2\tau}$$
(18)

where  $x^+ = \frac{x_{continuous}^+ \pm \epsilon_x - x_{min}}{\Delta x}$ ,  $y^+ = \frac{y_{continuous}^+ \pm \epsilon_y - y_{min}}{\Delta y}$  and  $\tau$  is the error measurement time, a multiple of dt. By increasing the resolution of the stored velocities (number of pixels), the corresponding  $\zeta$  decreases, leading to smaller time domain errors. As a consequence of the aforementioned truncation errors, we define a root mean square error (RMSE) to measure the time domain error of the proposed cellular system compared to the biological model. The measured time and phase domain errors for a 32-pixel cellular model with various Hill functions and output resolutions are shown in Table 5. It should be stressed that the truncation errors appear in the form of momentary and permanent lag/lead and deviation in the time domain signals where the velocity changes are more erratic and uneven. However, under certain conditions applied upon the memory bit-length and the number of pixels, the cellular trajectories can track fairly the continuous ones in the phase plane as seen in the 32-pixel cellular model with 14-bit (4.10) memory bit-length. The internal error values are the same for both serial and parallel interfaces, however, external ones shown with RSME are different as the interfaces have different resolutions.

# 9 Conclusion

This paper has demonstrated that slow nonlinear dynamical systems describing the behavior of natural biological systems can be efficiently realised on digital platforms. The validity of this claim was confirmed by the emulation of intracellular calcium dynamics on a low power digital cytomimetic chip. Measured results showed that the proposed hardware model emulates the time domain behaviour of the biological model with an acceptable accuracy and applies no serious limitation on the critical path of the system. This implies that a large number of calcium units can be realised in real-time. Therefore, as a proof of concept, it has also been confirmed that the pipelined cellular network containing 16 calcium units can properly operate in real-time. A multi-dimensional comparison with the exact analog design realised in the same technology (AMS  $0.35 \ \mu m$ ) is offered. Results showed that when emulating slow biological dynamics in large-scale and real-time including A/D and D/A conversion, digital cytomimetic designs seem to carry advantages compared to their analog counterparts. However, this conclusion may not be generally valid for other applications with different design specifications.

# 10 Acknowledgement

This work is supported by funding from the Hans Rausing Scholarship.

## References

- [1] C. A. Mead, "Neuromorphic electronic systems," Proceedings of the IEEE, vol. 78, pp. 1629–1636, 1990.
- [2] R. Sarpeshkar, "Analog versus digital: extrapolating from electronics to neurobiology," Neural computation, vol. 10, no. 7, pp. 1601–1638, 1998.

- [3] R. Sarpeshkar, Ultra Low Power Bioelectronics: Fundamentals, Biomedical Applications, and Bio-inspired Systems. Cambridge, U.K.: Cambridge Univ. Press, 2010.
- [4] G. Indiveri, E. Chicca, and R. Douglas, "A VLSI array of low-power spiking neurons and bistable synapses with spike-timing dependent plasticity," *IEEE Trans. Neural Netw.*, vol. 17, no. 1, pp. 211–221, 2006.
- [5] E. Jokar, H. Soleimani and E. M. Drakakis, "Systematic Computation of Nonlinear Bilateral Dynamical Systems With a Novel Low-Power Log-Domain Circuit," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 64, no. 8, pp. 2013–2025, 2017.
- [6] A. Cassidy, and A. G. Andreou, "Dynamical digital silicon neurons," Proc. Biomed Circuits Syst Conf. (BIOCAS), pp. 289–292, 2008.
- [7] H. Soleimani, M. Bavandpour, A. Ahmadi, and D. Abbott, "Digital implementation of a biological astrocyte model and its application," *IEEE Trans. Neural Netw. Learn. Syst.*, vol. 26, no. 1, pp. 127–139, Jan. 2015.
- [8] H. Soleimani, A. Ahmadi and M. Bavandpour, "Biologically inspired spiking neurons: Piecewise linear models and digital implementation," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 59, no. 12, pp. 2991– 3004, 2012.
- S. S. Woo, J. Kim and R. Sarpeshkar, "A Cytomorphic Chip for Quantitative Modeling of Fundamental Bio-Molecular Circuits," *Biomedical Circuits and Systems, IEEE Transactions on*, vol. 9, no. 4, pp. 527–542, 2015.
- [10] G. Dupont, and A. Goldbeter, "One-pool model for  $Ca^{2+}$  oscillations involving  $Ca^{2+}$  and inositol 1, 4, 5-trisphosphate as co-agonists for  $Ca^{2+}$  release," *Cell calcium*, vol. 14, no. 4, pp. 311–322, 1993.
- [11] H. Soleimani and E. M. Drakakis, "A compact synchronous cellular model of nonlinear calcium dynamics: simulation and FPGA synthesis results," *IEEE Trans. Biomedical Circuit and Sys*tems, DOI:10.1109/TBCAS.2016.2636183, 2016.
- [12] K. I. Papadimitriou, G. B. V. Stan, and E. M. Drakakis, "Systematic computation of nonlinear cellular and molecular dynamics with low-power CytoMimetic circuits: a simulation study," *PloS one*, vol. 8, no. 2, pp. e53591, 2013.
- [13] A. Van den Bosch, M. A. Borremans, M. S. Steyaert, and w. Sansen, "A 10-bit 1-GSample/s Nyquist current-steering CMOS D/A converter," *IEEE Journal of Solid-State Circuits*, vol. 36, no. 3, pp. 315– 324, 2001.
- [14] C. Sandner, M. Clara, A. Santner, T. Hartig, and F. Kuttner, "A 6-bit 1.2-GS/s Low-power Flash-ADC in 0.13-μm digital CMOS," *IEEE Journal of Solid-State Circuits*, vol. 40, no. 7, pp. 1499–1505, 2005.
- [15] A. Houssein, K. I. Papadimitriou and E. M. Drakakis, "A 1.26 uW Cytomimetic IC Emulating Complex Nonlinear Mammalian Cell Cycle Dynamics: Synthesis, Simulation and Proof-of-Concept Measured Results," *Biomedical Circuits and Systems, IEEE Transactions on*, vol. 9, no. 4, pp. 543–554, 2015.
- [16] TrueNorth Project [Online]. Available: http://www.research.ibm.com/articles/brain-chip.shtml, last visited, Jan. 2016.