This tutorial covers how DRAM (Dynamic Random Access Memory), and more specifically SDRAM (Synchronous DRAM), works and how you can use it in your projects. We will be using the SDRAM Shield.
What is RAM?
It is first important to understand what RAM is in general before diving into a specific type. RAM is simply a large block of memory that you can access more or less at random very quickly. It provides temporary storage for your design for things like images, video, or sampled data. In some applications it can even be used to store the instructions and data for a processor.
Notice that I used the word temporary. This is because RAM is a volatile form of memory, which means that without power, the contents of the memory will be lost.
RAM is organized into banks, rows, and columns. I like to think of RAM as a set of notebooks where each notebook is a bank, each page is a row, and each line is a column. Each bank, or notebook, can be accessed independently of the other banks. Each bank is made up of many rows, and each row has many columns. To access a specific piece of data you must specify all three pieces of information: the bank, row, and column.
The actual protocol required to access data depends on the type of RAM being used. However, all RAM breaks out a very similar interface: an address input, which specifies the row and column; a bank select input, which specifies the bank; a data input/output, which is used for reading and writing data; and a few control signals.
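To make the notebook picture concrete, here is a small Python model of how the controller later in this tutorial (sdram.luc) splits its 23-bit word address: bits 22:10 pick the row (page), bits 9:8 pick the bank (notebook), and bits 7:0 pick the 32-bit word within the row (line). The bit positions come from the controller code; the function names are hypothetical, and Python is used only because the Lucid code can't be run outside the FPGA tools.

```python
# Bit layout assumed from sdram.luc:
#   addr[22:10] -> row (13 bits), addr[9:8] -> bank (2 bits), addr[7:0] -> column word (8 bits)
ROW_BITS, BANK_BITS, COL_BITS = 13, 2, 8


def split_address(addr):
    """Split a 23-bit word address into (bank, row, column word)."""
    if not 0 <= addr < 1 << (ROW_BITS + BANK_BITS + COL_BITS):
        raise ValueError("address must fit in 23 bits")
    col = addr & ((1 << COL_BITS) - 1)
    bank = (addr >> COL_BITS) & ((1 << BANK_BITS) - 1)
    row = addr >> (COL_BITS + BANK_BITS)
    return bank, row, col


def column_address(col):
    """Column address for READ/WRITE: each 32-bit word spans four 8-bit
    columns, so the word index is shifted left by 2 (A10 stays 0)."""
    return col << 2


bank, row, col = split_address(0x7FFFFF)
print(bank, row, col)        # 3 8191 255
print(column_address(col))   # 1020
```

Because each 32-bit word is stored as four 8-bit columns, the column address sent with a READ or WRITE is the word index shifted left by two, which is what c{2b0, 1b0, addr.q[7:0], 2b0} builds in the controller.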
How DRAM works
So now that you know RAM of any type is used to store large amounts of data, how does DRAM actually store it?
The basic storage element behind DRAM is the capacitor. Just as a basic refresher, a capacitor is a device that is able to store a charge. You can think of one much like a balloon: just as you can fill a balloon with some air, you can fill a capacitor with some charge.
The basic cell in DRAM looks like the following.
There is simply a capacitor that stores a charge, and a transistor that allows charge to either be put into the capacitor or taken out.
These cells are arranged into a large 2D array of rows and columns. These are the same rows and columns from before.
When you write data to DRAM, charge is placed on capacitors that should have a value of 1, but no charge is placed on capacitors that have a value of 0.
When you read data from DRAM, the charge on the capacitor is measured using a circuit called a sense amplifier. If the sense amplifier detects charge on the capacitor, it outputs a 1; otherwise it assumes the cell holds a 0.
There are two main problems with the fundamental design of DRAM. First, to read the charge from a capacitor, the charge must be drained. This makes every read destructive: once you read a piece of data from DRAM, the value is no longer stored in the memory array. To deal with this, the data must be written back into the array when you are done with it. This is called precharging.
To make the interface to DRAM a bit more efficient, an entire row is read into a buffer in the DRAM. The process of reading a row into that buffer is referred to as opening or activating the row. Once a row is open, data can be read from or written to any column in that row without having to open it again.
However, only one row per bank can be open at a time. To read from a different row in the same bank, you must first precharge the current row, then open the new row.
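The open-row rules above can be sketched as a toy Python model of a single bank. It follows the simplified picture in this section (activating drains the row into the buffer, precharging writes it back); the class and method names are hypothetical, and it says nothing about the electrical details.

```python
class ToyBank:
    """Toy model of one DRAM bank: a 2D array of cells plus a one-row buffer."""

    def __init__(self, rows, cols):
        self.array = [[0] * cols for _ in range(rows)]
        self.open_row = None   # index of the activated row, or None
        self.buffer = None     # contents of the open row

    def activate(self, row):
        if self.open_row is not None:
            raise RuntimeError("precharge the open row before activating another")
        self.buffer = self.array[row]
        # Reading drains the capacitors, so the array no longer holds the row.
        self.array[row] = [None] * len(self.buffer)
        self.open_row = row

    def _require_open(self):
        if self.open_row is None:
            raise RuntimeError("no row is open")

    def read(self, col):
        self._require_open()
        return self.buffer[col]

    def write(self, col, value):
        self._require_open()
        self.buffer[col] = value

    def precharge(self):
        if self.open_row is None:
            return                                # nothing open, nothing to write back
        self.array[self.open_row] = self.buffer   # write the row back
        self.open_row, self.buffer = None, None


bank = ToyBank(rows=4, cols=8)
bank.activate(2)
bank.write(5, 1)
bank.precharge()      # row 2 is back in the array
bank.activate(3)      # a different row: allowed now that row 2 is closed
print(bank.read(5))   # 0
bank.precharge()
bank.activate(2)
print(bank.read(5))   # 1
```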
The second fundamental flaw of DRAM, and the reason it is called dynamic RAM, is that capacitors leak charge. That means that once a charge is stored on a capacitor, it will start losing that charge. This happens either through the transistor connected to it, or through the capacitor itself. What this means for your data is that, if neglected, the values stored will be lost.
The fix to this problem is to periodically refresh each row. A refresh consists of simply reading a row then writing it back into the array. This process ensures that the capacitors retain their charge.
The amount of time a row can go between refreshes depends on the DRAM. The SDRAM chip on the SDRAM Shield must have each row refreshed at least every 64ms.
SDRAM can generally perform the refresh operation for you, but you still must tell it when to refresh.
DRAM vs SDRAM
The difference between these two types of RAM is that SDRAM is synchronous and DRAM is not. All this means is that the SDRAM uses a clock while DRAM does not. The benefits to SDRAM are that inputs and outputs are synchronized to whatever it is connected to, in our case the FPGA, as well as some speed benefits due to pipelining.
SDRAM is much more common than plain DRAM.
It is also worth noting that DDR (Double Data Rate) RAM, usually heard in the context of computers, is a form of SDRAM.
DRAM vs SRAM
The difference between DRAM and SRAM is a bit more interesting. SRAM operates fundamentally differently than DRAM. It doesn't store data on capacitors, but instead uses two inverters back to back.
This solves both of the problems discussed earlier: destructive reads and leaking charge. However, this comes at a price, literally. SRAM is much more expensive than DRAM because it is much less dense. Each SRAM cell is much larger than a DRAM cell, meaning you can't pack nearly as many into the same area.
SRAM is, however, faster and uses less power than DRAM. Because of this, it is still used frequently in digital systems for things like caches. Modern CPUs have something like 8-16MB of very fast SRAM cache, while the computer may have 1000 times that much DRAM (8GB or more).
The Controller
Create a new project based on the Base Project. We now need to add the SDRAM controller to our project. In the Component Selector, select Controllers/SDRAM Controller. We also need the pin definitions for the SDRAM Shield, so also check off Constraints/SDRAM Shield. Add these to your project.
Open up the sdram.luc file and take a look at it. It can be helpful to have the datasheet for the SDRAM chip open.
// Interface to the SDRAM chip
global Sdram {
// Outputs
struct out {
clk, // clock
cle, // clock enable
cs, // chip select
cas, // column address strobe
ras, // row address strobe
we, // write enable
dqm, // data tri-state mask
bank [2], // bank address
addr [13] // column/row address
}
// Inouts
struct inOut {
dq [8] // data bus
}
}
module sdram (
input clk, // clock
input rst, // reset
// SDRAM interface
inout<Sdram.inOut> sdramInOut,
output<Sdram.out> sdramOut,
// Memory interface
input<Memory.master> memIn,
output<Memory.slave> memOut
) {
// Commands for the SDRAM
//const CMD_UNSELECTED = 4b1000; // Unused
const CMD_NOP = 4b0111; // No operation
const CMD_ACTIVE = 4b0011; // Activate a row
const CMD_READ = 4b0101; // Start a read
const CMD_WRITE = 4b0100; // Start a write
//const CMD_TERMINATE = 4b0110; // Unused
const CMD_PRECHARGE = 4b0010; // Precharge a row
const CMD_REFRESH = 4b0001; // Perform a refresh
const CMD_LOAD_MODE_REG = 4b0000; // Load mode register
.clk(clk) {
.rst(rst) {
fsm state = {
INIT, // Initial state
WAIT, // Generic wait state
PRECHARGE_INIT, // Start initial precharge
REFRESH_INIT_1, // Perform first refresh
REFRESH_INIT_2, // Perform second refresh
LOAD_MODE_REG, // Load mode register
IDLE, // Main idle state
REFRESH, // Perform a refresh
ACTIVATE, // Activate a row
READ, // Start a read
READ_RES, // Read results
WRITE, // Perform a write
PRECHARE // Precharge bank(s)
};
}
// DFF to store the next state to go into after WAIT state
dff next_state[state.WIDTH];
// IO buffer flip-flops are important for timing
// The #IOB parameter tells the tools to pack the
// dff into the IO buffer which is important for
// consistent timing.
dff cle (#IOB(1)); // clock enable
dff dqm (#IOB(1)); // data mask
dff cmd [4] (#IOB(1)); // command (we, cas, ras, cs)
dff bank [2] (#IOB(1)); // bank select
dff a [13] (#IOB(1)); // address
dff dq [8] (#IOB(1)); // data output
dff dqi [8] (#IOB(1)); // data input
dff dq_en; // data bus output enable
dff addr [23]; // operation address
dff data [32]; // operation data
dff rw_op; // operation read/write flag
dff out_valid; // output valid
dff delay_ctr [16]; // delay counter
dff byte_ctr [2]; // byte counter
dff refresh_ctr [10]; // refresh counter
dff refresh_flag; // refresh counter expired flag
dff ready; // controller ready flag
dff saved_rw; // saved command read/write flag
dff saved_addr [23]; // saved command address
dff saved_data [32]; // saved command data
dff row_open [4]; // row in bank open flags
dff row_addr [4][13]; // open row addresses
dff precharge_bank [3]; // bank(s) to precharge
}
// xil_XXX modules aren't real modules but rather
// hardware primitives inside the FPGA.
// The ODDR2 is used to output the FPGA clock to
// an output pin because a clock can't be directly
// routed as an output.
xil_ODDR2 oddr (
#DDR_ALIGNMENT("NONE"),
#INIT(0),
#SRTYPE("SYNC")
);
// The IODELAY2 is used to delay the clock a bit
// in order to align the data with the clock edge.
// These settings assume a 100MHz clock and the
// SDRAM Shield being stacked next to the Mojo.
xil_IODELAY2 iodelay (
#IDELAY_VALUE(0),
#IDELAY_MODE("NORMAL"),
#ODELAY_VALUE(100),
#IDELAY_TYPE("FIXED"),
#DELAY_SRC("ODATAIN"),
#DATA_RATE("SDR")
);
// Connections
always {
// Connect the dffs to the outputs
sdramOut.cle = cle.q;
sdramOut.cs = cmd.q[3];
sdramOut.ras = cmd.q[2];
sdramOut.cas = cmd.q[1];
sdramOut.we = cmd.q[0];
sdramOut.dqm = dqm.q;
sdramOut.bank = bank.q;
sdramOut.addr = a.q;
sdramOut.clk = iodelay.DOUT; // delayed clock
sdramInOut.enable.dq = dq_en.q;
sdramInOut.write.dq = dq.q;
memOut.data = data.q;
memOut.busy = !ready.q;
memOut.valid = out_valid.q;
// Connections for the IODELAY2
iodelay.ODATAIN = oddr.Q; // use the ODDR2 output as the source
iodelay.IDATAIN = 0;
iodelay.T = 0;
iodelay.CAL = 0;
iodelay.IOCLK0 = 0;
iodelay.IOCLK1 = 0;
iodelay.CLK = 0;
iodelay.INC = 0;
iodelay.CE = 0;
iodelay.RST = 0;
// Connections for the ODDR2
oddr.C0 = clk;
oddr.C1 = ~clk;
oddr.CE = 1;
oddr.D0 = 0; // using 0 for D0 and 1 for D1 inverts the clock
oddr.D1 = 1; // because D0 is output on the rising edge of C0
oddr.R = 0;
oddr.S = 0;
}
// Logic
always {
// default values
dqi.d = sdramInOut.read.dq;
dq_en.d = 0;
cmd.d = CMD_NOP;
dqm.d = 0;
bank.d = 0;
a.d = 0;
out_valid.d = 0;
byte_ctr.d = 0;
// Continuously increment the refresh counter
// Once it passes 750 (about 7.5us at 100MHz), a refresh needs to happen
// The maximum delay is 7.813us
refresh_ctr.d = refresh_ctr.q + 1;
if (refresh_ctr.q > 750) {
refresh_ctr.d = 0; // reset the timer
refresh_flag.d = 1; // set the refresh flag
}
// If we are ready for a new command and we get one...
if (ready.q && memIn.valid) {
saved_rw.d = memIn.write; // save the type
saved_data.d = memIn.data; // save the data
saved_addr.d = memIn.addr; // save the address
ready.d = 0; // don't accept new commands
}
case (state.q) {
///// INITIALIZE /////
state.INIT:
ready.d = 0; // not ready while initializing
row_open.d = 0; // no rows open yet
cle.d = 1; // enable the clock
state.d = state.WAIT; // need to wait
delay_ctr.d = 10100; // for 101us (100us minimum)
next_state.d = state.PRECHARGE_INIT; // move to PRECHARGE_INIT after
state.WAIT:
delay_ctr.d = delay_ctr.q - 1; // decrement counter
if (delay_ctr.q == 0) { // if 0
state.d = next_state.q; // go to the next state
if (next_state.q == state.WRITE) { // if it's WRITE
dq_en.d = 1; // enable the data bus
dq.d = data.q[7:0]; // and output the first byte
}
}
state.PRECHARGE_INIT:
cmd.d = CMD_PRECHARGE; // need to precharge all banks
a.d[10] = 1; // all banks
state.d = state.WAIT; // need to wait after
next_state.d = state.REFRESH_INIT_1; // move to REFRESH_INIT_1 after
delay_ctr.d = 0; // delay 20ns (min 15ns)
state.REFRESH_INIT_1:
cmd.d = CMD_REFRESH; // need to perform two refreshes
state.d = state.WAIT; // need to wait after a refresh
next_state.d = state.REFRESH_INIT_2; // move to REFRESH_INIT_2 after
delay_ctr.d = 7; // delay 90ns (min 66ns)
state.REFRESH_INIT_2:
cmd.d = CMD_REFRESH; // need to perform two refreshes
state.d = state.WAIT; // need to wait after a refresh
next_state.d = state.LOAD_MODE_REG; // move to LOAD_MODE_REG after
delay_ctr.d = 7; // delay 90ns (min 66ns)
state.LOAD_MODE_REG:
cmd.d = CMD_LOAD_MODE_REG; // load the mode register
// Reserved, Burst Access, Standard Op, CAS = 2, Sequential, Burst = 4
a.d = c{3b000, 1b0, 2b00, 3b010, 1b0, 3b010};
state.d = state.WAIT; // need to wait
next_state.d = state.IDLE; // move to IDLE after
delay_ctr.d = 1; // delay 30ns (min 2 clock cycles)
refresh_flag.d = 0; // don't need refresh
refresh_ctr.d = 1; // reset the counter
ready.d = 1; // we can now accept commands
///// IDLE STATE /////
state.IDLE:
if (refresh_flag.q) { // if we need to perform a refresh
state.d = state.PRECHARE; // first precharge everything
next_state.d = state.REFRESH; // then refresh
precharge_bank.d = 3b100; // precharge all banks
refresh_flag.d = 0; // refresh was taken care of
} else if (!ready.q) { // if we have a waiting command
ready.d = 1; // we can accept another now
rw_op.d = saved_rw.q; // save the command type
addr.d = saved_addr.q; // save the address
if (saved_rw.q) // if write
data.d = saved_data.q; // save the data
// if there is already an open row
if (row_open.q[saved_addr.q[9:8]]) {
// if the row is the one we want
if (row_addr.q[saved_addr.q[9:8]] == saved_addr.q[22:10]) {
// the row is already open so just perform the operation
if (saved_rw.q)
state.d = state.WRITE;
else
state.d = state.READ;
} else { // need to open the row
state.d = state.PRECHARE; // first need to close current one
precharge_bank.d = c{1b0, saved_addr.q[9:8]}; // row to close
next_state.d = state.ACTIVATE; // then open the correct one
}
} else { // nothing is already open
state.d = state.ACTIVATE; // so just open the row
}
}
///// REFRESH /////
state.REFRESH:
cmd.d = CMD_REFRESH; // send refresh command
state.d = state.WAIT; // need to wait
next_state.d = state.IDLE; // go back to IDLE after
delay_ctr.d = 6; // wait 8 cycles, 80ns (min 66ns)
///// ACTIVATE /////
state.ACTIVATE:
cmd.d = CMD_ACTIVE; // activate command
a.d = addr.q[22:10]; // row address
bank.d = addr.q[9:8]; // bank select
delay_ctr.d = 0; // delay 20ns (15ns min)
state.d = state.WAIT; // need to wait
// set the next state based on the command
next_state.d = rw_op.q ? state.WRITE : state.READ;
row_open.d[addr.q[9:8]] = 1; // row is now open
row_addr.d[addr.q[9:8]] = addr.q[22:10]; // address of row
///// READ /////
state.READ:
cmd.d = CMD_READ; // read command
a.d = c{2b0, 1b0, addr.q[7:0], 2b0}; // address of column
bank.d = addr.q[9:8]; // bank select
state.d = state.WAIT; // need to wait
next_state.d = state.READ_RES; // go to READ_RES after
delay_ctr.d = 2; // wait 3 cycles
state.READ_RES:
byte_ctr.d = byte_ctr.q + 1; // count 4 bytes
data.d = c{dqi.q, data.q[31:8]}; // shift in each byte
if (byte_ctr.q == 3) { // if we read all 4 bytes
out_valid.d = 1; // output is valid
state.d = state.IDLE; // return to IDLE
}
///// WRITE /////
state.WRITE:
byte_ctr.d = byte_ctr.q + 1; // count 4 bytes
if (byte_ctr.q == 0) // first byte is write command
cmd.d = CMD_WRITE; // send command
dq.d = data.q[7:0]; // output the data
data.d = data.q >> 8; // shift data
dq_en.d = 1; // enable data bus output
a.d = c{2b0, 1b0, addr.q[7:0], 2b0}; // column address
bank.d = addr.q[9:8]; // bank select
if (byte_ctr.q == 3) // if we wrote all 4 bytes
state.d = state.IDLE; // return to IDLE
///// PRECHARGE /////
state.PRECHARE:
cmd.d = CMD_PRECHARGE; // precharge command
a.d[10] = precharge_bank.q[2]; // all banks flag
bank.d = precharge_bank.q[1:0]; // single bank select
state.d = state.WAIT; // need to wait
delay_ctr.d = 0; // delay 20ns (15ns min)
if (precharge_bank.q[2]) // if all banks flag
row_open.d = 0; // they are all closed
else // otherwise
row_open.d[precharge_bank.q[1:0]] = 0; // only selected was closed
default: // shouldn't be here
state.d = state.INIT; // restart the FSM
}
}
}
This module uses some advanced features that we haven't covered yet, so let's get right into them.
Structs
The most obvious new part of this module is the use of structs. In this case, two structs are declared in the global namespace Sdram.
// Interface to the SDRAM chip
global Sdram {
// Outputs
struct out {
clk, // clock
cle, // clock enable
cs, // chip select
cas, // column address strobe
ras, // row address strobe
we, // write enable
dqm, // data tri-state mask
bank [2], // bank address
addr [13] // column/row address
}
// Inouts
struct inOut {
dq [8] // data bus
}
}
You can declare structs inside your module, but they are then local to your module and can only be used internally (not in port definitions).
A struct definition consists of the struct keyword, followed by the name of the struct, followed by the list of the struct's members. A member declaration consists of a name, an optional struct type, and an optional array size.
Take a look at the following example.
struct color {
red[8],
green[8],
blue[8]
}
struct display {
x[12],
y[12],
pixel<color>
}
In this example we have two structs, color and display. The color struct has three members, red, green, and blue, each an 8-bit array.
The display struct is a bit more complex. The first two members, x and y, are 12-bit arrays, but the third, pixel, is itself a struct of type color.
The <name> notation is used to specify the struct type. This can be used with struct members, input, output, inout, sig, and dff types.
Accessing Struct Members
Now that we know how to declare a struct, we need to be able to access its members. Let's look at another example.
struct foo {
a,
b[4],
c[8]
}
.clk(clk) {
dff<foo> bar;
dff<foo> cat[2];
}
sig<foo> dog;
always {
bar.d.b = bar.q.c[3:0]; // d/q must be selected first
dog = bar.q; // structs can be assigned directly to others of the same type
cat.d[0] = dog; // need to select a single element before accessing the struct
cat.d[1].a = bar.q.a;
}
The members of a struct are accessed the same way as the d/q signals of a dff or the signals of a module instance.
The SDRAM controller uses two other structs defined in memory_bus.luc. These are split into a different file as they may be used by other modules besides the SDRAM controller.
The Interface
Take a look at the ports of the SDRAM controller.
module sdram (
input clk, // clock
input rst, // reset
// SDRAM interface
inout<Sdram.inOut> sdramInOut,
output<Sdram.out> sdramOut,
// Memory interface
input<Memory.master> memIn,
output<Memory.slave> memOut
) {
We have the canonical clock and reset inputs. We then have a bunch of IO condensed into 4 lines by using structs. The connection to the SDRAM chip consists of an output and an inout. The interface from the controller to the rest of the FPGA is broken into an input and an output. Take a look at the struct definitions for details of their contents.
Commands
The SDRAM chip accepts a series of commands that we define as constants for easier use.
// Commands for the SDRAM
//const CMD_UNSELECTED = 4b1000; // Unused
const CMD_NOP = 4b0111; // No operation
const CMD_ACTIVE = 4b0011; // Activate a row
const CMD_READ = 4b0101; // Start a read
const CMD_WRITE = 4b0100; // Start a write
//const CMD_TERMINATE = 4b0110; // Unused
const CMD_PRECHARGE = 4b0010; // Precharge a row
const CMD_REFRESH = 4b0001; // Perform a refresh
const CMD_LOAD_MODE_REG = 4b0000; // Load mode register
These commands connect to the CS, RAS, CAS, and WE pins on the SDRAM. See page 31 of the datasheet if you want to know more.
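Since all four pins are active low, the constants are easiest to read as pin levels. This Python sketch copies the constants from sdram.luc and the bit order used in the controller's always block (cmd[3] drives CS, cmd[2] RAS, cmd[1] CAS, cmd[0] WE); the helper names are hypothetical.

```python
# Command encodings copied from sdram.luc, written as (CS, RAS, CAS, WE).
COMMANDS = {
    "UNSELECTED": 0b1000,
    "NOP": 0b0111,
    "ACTIVE": 0b0011,
    "READ": 0b0101,
    "WRITE": 0b0100,
    "TERMINATE": 0b0110,
    "PRECHARGE": 0b0010,
    "REFRESH": 0b0001,
    "LOAD_MODE_REG": 0b0000,
}


def pins(cmd):
    """Pin levels (cs, ras, cas, we) for a 4-bit command; 0 means asserted."""
    return (cmd >> 3) & 1, (cmd >> 2) & 1, (cmd >> 1) & 1, cmd & 1


def decode(cs, ras, cas, we):
    """Name of the command the SDRAM sees for the given pin levels."""
    if cs:
        return "UNSELECTED"  # chip not selected: RAS/CAS/WE are ignored
    code = (ras << 2) | (cas << 1) | we
    return next(name for name, value in COMMANDS.items() if value == code)


print(pins(COMMANDS["ACTIVE"]))  # (0, 0, 1, 1)
print(decode(0, 1, 0, 1))        # READ
```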
The FSM
The entire controller is based around a single FSM.
.clk(clk) {
.rst(rst) {
fsm state = {
INIT, // Initial state
WAIT, // Generic wait state
PRECHARGE_INIT, // Start initial precharge
REFRESH_INIT_1, // Perform first refresh
REFRESH_INIT_2, // Perform second refresh
LOAD_MODE_REG, // Load mode register
IDLE, // Main idle state
REFRESH, // Perform a refresh
ACTIVATE, // Activate a row
READ, // Start a read
READ_RES, // Read results
WRITE, // Perform a write
PRECHARE // Precharge bank(s)
};
}
The relationships between these states are summed up in the state diagram shown below. The WAIT state is omitted for clarity.
When the board is powered on (or reset) the FSM starts in the INIT state. SDRAM requires a bit of initialization before you can read and write to it. This is also covered in the datasheet (page 42) for those curious.
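The last initialization step loads the mode register. The value the controller sends in its LOAD_MODE_REG state, c{3b000, 1b0, 2b00, 3b010, 1b0, 3b010}, can be rebuilt field by field. The Python below follows the field layout in the controller's comment (burst length 4, sequential, CAS latency 2, standard operation); the function and argument names are hypothetical, and any setting other than the ones this controller uses should be checked against the datasheet.

```python
BURST_LENGTH_CODES = {1: 0b000, 2: 0b001, 4: 0b010, 8: 0b011}


def mode_register(burst_length=4, interleaved=False, cas_latency=2,
                  single_write=False):
    """13-bit value placed on A[12:0] during LOAD_MODE_REG.

    Layout (LSB first): A[2:0] burst length, A3 burst type (0 = sequential),
    A[6:4] CAS latency, A[8:7] operating mode (00 = standard),
    A9 write burst mode (0 = use the burst length), A[12:10] reserved (0).
    """
    if cas_latency not in (2, 3):
        raise ValueError("this sketch only encodes CAS latency 2 or 3")
    value = BURST_LENGTH_CODES[burst_length]
    value |= int(interleaved) << 3
    value |= cas_latency << 4
    value |= 0b00 << 7                 # standard operation
    value |= int(single_write) << 9
    return value


print(f"{mode_register():013b}")  # 0000000100010
print(hex(mode_register()))       # 0x22
```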
After the SDRAM is initialized, the controller sits in the IDLE state until one of two things happens: either it's time to perform a refresh or there is a pending operation.
First, let's talk about the refresh. To manage the refreshing, there is a timer that tells the controller to send another refresh operation. The SDRAM requires 8,192 refresh commands to be sent every 64ms. That means you can either send a refresh command every 7.813µs or all 8,192 commands in a batch every 64ms. To provide a more uniform interface, this controller sends the refresh commands evenly spaced. This limits the maximum amount of time the controller will be busy doing refreshes. In some applications where you need very fast burst speeds, but have some known down time, performing burst refreshing can be better.
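The numbers in that paragraph work out as follows. This Python sketch assumes the 100MHz clock used throughout the controller and mimics the refresh_ctr logic (count up every cycle, then raise refresh_flag and restart from 0 once the count passes 750); the function name is hypothetical.

```python
REFRESH_WINDOW_NS = 64_000_000   # 64 ms
REFRESH_COMMANDS = 8192          # refresh commands needed per window
CLK_MHZ = 100                    # clock assumed by sdram.luc

interval_ns = REFRESH_WINDOW_NS / REFRESH_COMMANDS   # 7812.5 ns
max_cycles = int(interval_ns * CLK_MHZ // 1000)      # whole clock cycles


def cycles_between_flags(threshold=750, cycles=5000):
    """Simulate refresh_ctr: it counts up every cycle and, once q > threshold,
    sets refresh_flag and restarts from 0. Returns the spacing between flags."""
    q, flags = 0, []
    for cycle in range(cycles):
        if q > threshold:
            flags.append(cycle)
            q = 0
        else:
            q += 1
    return flags[1] - flags[0]


print(interval_ns, max_cycles, cycles_between_flags())  # 7812.5 781 752
```

With the > 750 comparison the counter runs from 0 to 751, so a refresh is requested every 752 cycles (7.52µs). That leaves roughly 29 cycles of slack under the 781-cycle limit for an operation that is already in progress when the flag goes up.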
When a read or write command is pending, the controller first checks whether the requested row is open. If it is already open, life is great: the controller simply reads from or writes to the row. If no row is open in that bank, it first opens the row before performing the operation. The worst case is when a different row in the same bank is already open. In that case the other row must be precharged before the controller can open the new row and perform the operation.
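That decision can be sketched as a small Python model of the row_open and row_addr bookkeeping the controller does in its IDLE state. It lists only the commands that reach the SDRAM (the WAIT cycles between them are ignored) and uses the same address bits as the controller (bank = addr[9:8], row = addr[22:10]); the class and method names are hypothetical.

```python
class RowTracker:
    """Hypothetical model of the controller's row_open/row_addr bookkeeping."""

    def __init__(self, banks=4):
        self.open_row = [None] * banks   # None means no row is open in that bank

    def access(self, addr, write):
        """Commands sent for one read/write of the 23-bit word address addr."""
        bank = (addr >> 8) & 0b11        # addr[9:8]
        row = addr >> 10                 # addr[22:10]
        op = "WRITE" if write else "READ"
        if self.open_row[bank] == row:
            return [op]                  # row already open
        cmds = []
        if self.open_row[bank] is not None:
            cmds.append("PRECHARGE")     # close the other row first
        self.open_row[bank] = row
        return cmds + ["ACTIVATE", op]

    def refresh(self):
        """Refresh path: precharge all banks, then REFRESH."""
        self.open_row = [None] * len(self.open_row)
        return ["PRECHARGE_ALL", "REFRESH"]


t = RowTracker()
print(t.access(0x000400, write=True))   # ['ACTIVATE', 'WRITE']
print(t.access(0x000401, write=False))  # ['READ']
print(t.access(0x000800, write=False))  # ['PRECHARGE', 'ACTIVATE', 'READ']
print(t.access(0x000100, write=False))  # ['ACTIVATE', 'READ']
```

Addresses 0x400 and 0x401 share bank 0, row 1; 0x800 is row 2 of the same bank (the worst case); 0x100 is in bank 1, which had nothing open.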
Each of these operations has some number of cycles the SDRAM requires to complete (the reason for the WAIT state). These sometimes vary with the clock frequency (in other words, they have a set amount of real time). This controller assumes a clock rate of 100MHz. This is important for other reasons as well that will be discussed a little later. All of these delays and timing specifications can be found in the datasheet (many of them are on pages 27-28).
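Every command state in the controller is followed by the WAIT state, and a WAIT with delay_ctr = n spaces that command and the next one n + 2 clock cycles apart. That relationship, read from the controller's own comments (0 gives 20ns and 7 gives 90ns at 100MHz), turns a datasheet minimum into a counter value. The helper below is a hypothetical Python sketch of that arithmetic.

```python
def min_delay_ctr(t_min_ns, clk_mhz=100):
    """Smallest delay_ctr value that keeps two commands at least t_min_ns apart.

    Assumes a command state followed by WAIT with delay_ctr = n spaces the
    commands n + 2 clock cycles apart, as the comments in sdram.luc indicate.
    t_min_ns should be an integer number of nanoseconds.
    """
    period_ps = 1_000_000 // clk_mhz              # 10_000 ps at 100 MHz
    cycles = -(-(t_min_ns * 1000) // period_ps)   # ceiling division
    return max(cycles - 2, 0)


print(min_delay_ctr(15), min_delay_ctr(66), min_delay_ctr(90))  # 0 5 7
print(min_delay_ctr(66, clk_mhz=50))                            # 2
```

The controller's values meet or exceed these minimums. For example, it uses 6 and 7 for the 66ns refresh time rather than 5, which buys a little margin, and the same minimums need different counter values at a different clock rate.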
This mostly sums up how the controller works. If you want an even deeper understanding, you need to take a look at the rest of the code in the controller as well as the SDRAM datasheet.
However, there is some advanced voodoo magic going on in the controller code that is worth mentioning.
Dealing with the Hardware
When you start interfacing with a relatively high-speed external device, you have to start dealing with FPGA-specific details. There are two hardware-related issues addressed in the controller. The first is that the FPGA can't route a clock signal directly to an output pin. This is because the clock and the general logic of an FPGA use separate routing resources, and there isn't a way for the clock signal to move back into the general routing system. However, we can use an ODDR2 primitive to work around this.
// The ODDR2 is used to output the FPGA clock to
// an output pin because a clock can't be directly
// routed as an output.
xil_ODDR2 oddr (
#DDR_ALIGNMENT("NONE"),
#INIT(0),
#SRTYPE("SYNC")
);
// Connections for the ODDR2
oddr.C0 = clk;
oddr.C1 = ~clk;
oddr.CE = 1;
oddr.D0 = 0; // using 0 for D0 and 1 for D1 inverts the clock
oddr.D1 = 1; // because D0 is output on the rising edge of C0
oddr.R = 0;
oddr.S = 0;
This is the instantiation of an ODDR2 module. If you look at the project files, you will notice there is no xil_ODDR2.luc file. This is because it isn't really a module, but rather an FPGA primitive. ODDR2, or Output Double Data Rate 2, is a primitive that is generally used to output data on both the rising and falling edges of the clock (hence, double data rate). However, in this case we are using the ODDR2 simply to output our clock signal. You can't output the clock signal directly because of how the FPGA is structured internally, so instead you can use the ODDR2 with its data pins wired to 0 or 1.
When C0 has a rising edge, D0 is output until C1 has a rising edge. At that point D1 is output. Notice that in our case C1 is actually just the clock inverted. That means D0 is output when the clock rises and D1 is output when the clock falls.
You may be thinking now, "Ok... but if D0 is output when the clock rises, shouldn't D0 be 1 and D1 be 0?" Very good, my young padawan. That is exactly right if you wanted the output to be the same as the clock. However, we don't want that. We want the output clock to be our clock inverted!
Why the *&^$# would we want the clock to be inverted? Wouldn't that mean that the SDRAM would read its inputs and change its outputs on our falling edge? Oh wait... that's exactly what we want! This gives each device half a clock cycle for its outputs to become stable before the other device samples them. This all has to do with satisfying the setup and hold times of both devices. If you don't know what that means, check out the External IO Tutorial.
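The D0/D1 reasoning can be checked with an idealized Python model that lists the ODDR2 output for each half clock period. It ignores the propagation delay through the ODDR2 and IODELAY2, and the function name is hypothetical.

```python
def oddr2_output(clk_levels, d0, d1):
    """Toy ODDR2 with C0 = clk and C1 = ~clk, one entry per half clock period:
    D0 is driven after each rising edge of C0 (clk high) and D1 after each
    rising edge of C1 (clk low)."""
    return [d0 if level else d1 for level in clk_levels]


CLK_MHZ = 100
half_period_ns = 1000 / CLK_MHZ / 2    # time each device gets to settle

clk = [1, 0] * 4
print(oddr2_output(clk, d0=1, d1=0))   # [1, 0, 1, 0, 1, 0, 1, 0]
print(oddr2_output(clk, d0=0, d1=1))   # [0, 1, 0, 1, 0, 1, 0, 1]
print(half_period_ns)                  # 5.0
```

The first call reproduces the clock; the second, with the controller's D0 = 0 and D1 = 1, produces the inverted clock, and at 100MHz each half period is 5ns.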
Timing is the other hardware related issue we need to account for and we will use another FPGA primitive, the IODELAY2, to deal with it.
// The IODELAY2 is used to delay the clock a bit
// in order to align the data with the clock edge.
// These settings assume a 100MHz clock and the
// SDRAM Shield being stacked next to the Mojo.
xil_IODELAY2 iodelay (
#IDELAY_VALUE(0),
#IDELAY_MODE("NORMAL"),
#ODELAY_VALUE(100),
#IDELAY_TYPE("FIXED"),
#DELAY_SRC("ODATAIN"),
#DATA_RATE("SDR")
);
// Connections for the IODELAY2
iodelay.ODATAIN = oddr.Q; // use the ODDR2 output as the source
iodelay.IDATAIN = 0;
iodelay.T = 0;
iodelay.CAL = 0;
iodelay.IOCLK0 = 0;
iodelay.IOCLK1 = 0;
iodelay.CLK = 0;
iodelay.INC = 0;
iodelay.CE = 0;
iodelay.RST = 0;
As you may have guessed from the name, the IODELAY2 block adds a delay to inputs and outputs. In this case we are using it to delay the clock being output to the SDRAM. There are a lot of features of these primitives that aren't being used here. However, if you want to check them out in their full glory, take a look at the UG381 document from Xilinx (ODDR2 starts on page 62 and IODELAY2 starts on page 74).
We need the delay because simply inverting the clock doesn't quite ensure timing is met. We need to shift it a little more.
The important settings here are DELAY_SRC, which is set to "ODATAIN" so that the IODELAY2 delays an output, and ODELAY_VALUE, which sets how much to delay the signal.
The actual amount of delay that is given per step of ODELAY_VALUE is a bit fuzzy and will actually vary over temperature and voltage in the Spartan 6 chip. However, with a 100MHz clock, using a delay of 100 (maximum is 255) ensures that the setup and hold times are being met. This delay was found empirically by running lots of tests checking for read/write errors.
The last piece of the puzzle is making sure that the input and output registers are packed into IOBs, or Input/Output Blocks.
// IO buffer flip-flops are important for timing
// The #IOB parameter tells the tools to pack the
// dff into the IO buffer which is important for
// consistent timing.
dff cle (#IOB(1)); // clock enable
dff dqm (#IOB(1)); // data mask
dff cmd [4] (#IOB(1)); // command (we, cas, ras, cs)
dff bank [2] (#IOB(1)); // bank select
dff a [13] (#IOB(1)); // address
dff dq [8] (#IOB(1)); // data output
dff dqi [8] (#IOB(1)); // data input
The dff type has a parameter, IOB, that, when set to 1, will mark that flip-flop to be packed into an IOB.
What the heck is an IOB? An IOB is the block of logic attached to each pin of the FPGA, and it includes flip-flops of its own. These flip-flops aren't in the typical FPGA fabric, but rather right at the inputs and outputs.
We want to make sure these registers are packed into IOBs to ensure that there are no additional delays due to the signal needing to propagate through the FPGA.
To make sure these registers are actually packed into the IOB, their output/input can't connect to anything other than the top level output/input. If you tried to read these signals in some other part of your design, the tools would be forced to pull the flip-flop out of the IOB, possibly messing up timing. This is why it is important that these signals go directly to the top level inputs/outputs.
Xilinx Primitives
At the time of writing, the IODELAY2 and ODDR2 are the only primitives supported by the Mojo IDE. To see all the supported primitives, type xil and the auto-complete will list the known modules (the primitives are always prefixed with xil_). More primitives will be added over time.
This sums up how the controller works, but now we need to use it for something.
Using the Controller
What good is a fancy SDRAM controller if we don't even use it? NO GOOD, that's what! To demonstrate how to use the controller we are going to create a tester. Our module will write a bunch of stuff to the RAM, then read it back to make sure the contents are still there and correct.
There is one big problem with creating a tester like this. What do we write to the RAM? It has to be something easily generated because we don't have enough memory to memorize all the values. If we did we wouldn't be using the SDRAM. We could use part of the address, but this causes a very artificial pattern that can fail to detect some problems.
Instead we will use a pseudo-random number generator. The key word there is pseudo, which in layman's terms translates to not-really-a-random number generator. It generates random-looking numbers that are actually completely predictable. That's a great property for us because we need to be able to regenerate the exact same 8,388,608-number sequence to verify our writes.
From the components library add Math/Pseudo-random Number Generator to your project.
module pn_gen #(
// SEED needs to always be non-zero
// Since seed is XORed with the 32MSBs of SEED, we need the 96 LSBs to be nonzero.
SEED = 128h843233523a613966423b622562592c62: SEED.WIDTH == 128 && SEED[95:0] != 0
)(
input clk, // clock
input rst, // reset
input next, // generate next number flag
input seed [32], // seed used on reset
output num [32] // "random" number output
) {
.clk(clk) {
dff x[32], y[32], z[32], w[32]; // state storage
}
sig t [32]; // temporary results
always {
num = w.q; // output is from w
t = x.q ^ (x.q << 11); // calculate intermediate value
if (next) { // if we need a new number
x.d = y.q; // shift values along
y.d = z.q;
z.d = w.q;
// magic formula from Wikipedia
w.d = w.q ^ (w.q >> 19) ^ t ^ (t >> 8);
}
// Manually reset the flip-flops so we can change the reset value
if (rst) {
x.d = SEED[0+:32];
y.d = SEED[32+:32];
z.d = SEED[64+:32];
w.d = SEED[96+:32] ^ seed;
}
}
}
This algorithm is called Xorshift, and this module is simply a port of the version presented on Wikipedia.
This module will generate a new number each time next is high. It can be reset to start the sequence over. If the value of seed changes, the sequence will be different.
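Because the sequence is completely predictable, the same generator can be modeled in software, for example to precompute the values a given seed should produce. The Python class below is a hypothetical port of pn_gen: it splits the default SEED parameter the same way (x from bits 31:0 up to w from bits 127:96, XORed with seed) and applies the same update when advance() is called, which stands in for holding next high for one cycle.

```python
MASK32 = 0xFFFFFFFF
DEFAULT_SEED = 0x843233523A613966423B622562592C62  # pn_gen's default SEED


class PnGen:
    """Hypothetical Python port of pn_gen (Xorshift with 128 bits of state)."""

    def __init__(self, seed=0, base_seed=DEFAULT_SEED):
        # Same split as the rst branch: SEED[0+:32] -> x ... SEED[96+:32] ^ seed -> w
        self.x = base_seed & MASK32
        self.y = (base_seed >> 32) & MASK32
        self.z = (base_seed >> 64) & MASK32
        self.w = ((base_seed >> 96) & MASK32) ^ (seed & MASK32)

    @property
    def num(self):
        return self.w   # like num = w.q

    def advance(self):
        """One cycle with next = 1."""
        t = (self.x ^ (self.x << 11)) & MASK32   # keep 32 bits, like sig t[32]
        self.x, self.y, self.z = self.y, self.z, self.w
        self.w = self.w ^ (self.w >> 19) ^ t ^ (t >> 8)


writer = PnGen(seed=0)
written = []
for _ in range(4):
    written.append(writer.num)   # value written to the current address
    writer.advance()

reader = PnGen(seed=0)           # "reset" replays the same sequence
expected = []
for _ in range(4):
    expected.append(reader.num)
    reader.advance()
print(written == expected)       # True
print(hex(written[0]))           # 0x84323352
```

Creating a new PnGen with the same seed replays the sequence from the start, which is the software equivalent of what the tester does by pulsing pn_gen.rst before its read pass.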
This type of number generator is great for hardware because it uses only XOR and shift operations, both of which are really cheap. However, it isn't a particularly good random number generator and should not be used for cryptographic purposes, where merely looking random isn't good enough.
The Memory Interface
Before we get into our tester module, we need to understand the interface used for reading and writing the SDRAM. Take a look at memory_bus.luc.
// Generic Memory Interface
global Memory {
// Memory slave outputs/master inputs
struct slave {
data [32], // data read
valid, // data valid
busy // device busy
}
// Memory master outputs/slave inputs
struct master {
data [32], // data to write
valid, // data valid
addr [23], // address to write/read
write // 1 = write, 0 = read
}
}
The interface consists of a master and a slave. The slave in this case is the SDRAM controller (the one receiving commands) and we will play the role of the master by issuing commands.
Whenever we want to issue a command, we need to first make sure that slave.busy is 0. This indicates that the controller can accept a new command.
To issue a write command we set master.write to 1, master.addr to the address we want to write to, master.data to the value we want to write, and finally master.valid to 1 to indicate a new command.
To perform a read, we set master.write to 0, master.addr to the address to read, and master.valid to 1. The value of master.data is ignored. We then need to wait for slave.valid to be 1. When it is, slave.data holds the value we requested. Note that slave.busy may go back to 0 before the read is actually complete. This is because the busy flag only says when the controller can accept a new request, not necessarily when it is idle. If you issue multiple read requests, they will be processed in the order they are received, so the data comes back in that same order.
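The handshake rules above can be illustrated with a small Python simulation. This is a hypothetical toy, not the real controller. ToySlave gives every read the same fixed latency and raises busy once depth reads are in flight. The names ToySlave and write_then_verify, and the depth and latency parameters, are all invented for illustration. The model shows busy dropping while a read is still pending and responses arriving in request order, which is why the master can match each slave.valid against the next expected value.

```python
from collections import deque


class ToySlave:
    """Hypothetical, simplified stand-in for the controller side of Memory.
    Not cycle-accurate: every read takes `latency` cycles and at most
    `depth` reads may be in flight before busy is raised."""

    def __init__(self, depth=2, latency=3):
        self.mem = {}
        self.pending = deque()  # in-flight reads as [cycles_left, addr]
        self.depth = depth
        self.latency = latency
        self.valid = False  # models slave.valid
        self.data = 0       # models slave.data

    @property
    def busy(self):
        # busy only means "can't accept a new command", not "idle"
        return len(self.pending) >= self.depth

    def tick(self, valid=False, write=False, addr=0, data=0):
        """Advance one clock cycle with the given master signals."""
        if valid:
            assert not self.busy, "master issued a command while busy"
            if write:
                self.mem[addr] = data
            else:
                self.pending.append([self.latency, addr])
        self.valid = False
        for req in self.pending:
            req[0] -= 1
        # all reads share one latency, so the oldest always finishes first
        if self.pending and self.pending[0][0] == 0:
            _, done_addr = self.pending.popleft()
            self.valid = True
            self.data = self.mem.get(done_addr, 0)


def write_then_verify(slave, values):
    """Master side: write values to addresses 0.., then read them back."""
    addr = 0
    while addr < len(values):
        if not slave.busy:
            slave.tick(valid=True, write=True, addr=addr, data=values[addr])
            addr += 1
        else:
            slave.tick()
    errors = issued = checked = 0
    while checked < len(values):
        if issued < len(values) and not slave.busy:
            slave.tick(valid=True, write=False, addr=issued)
            issued += 1
        else:
            slave.tick()
        if slave.valid:  # responses come back in the order requested
            if slave.data != values[checked]:
                errors += 1
            checked += 1
    return errors
```

For example, `write_then_verify(ToySlave(), [5, 6, 7, 8, 9])` returns 0 errors. The ram_test module below follows the same pattern in hardware.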
The Tester
Create a new module named ram_test and copy the following into it.
module ram_test (
    input clk,                    // clock
    input rst,                    // reset
    output<Memory.master> memOut, // memory interface
    input<Memory.slave> memIn,
    output leds [8]               // status LEDs
  ) {
  .clk(clk) {
    .rst(rst) {
      fsm state = {WRITE, READ}; // states
      dff addr [23];             // current address
      dff error [7];             // number of errors
      dff seed [32];             // seed for each run
    }
    pn_gen pn_gen; // pseudo-random number generator
  }
  always {
    // Show the state and number of errors on the LEDs
    leds = c{state.q == state.READ, error.q};
    pn_gen.seed = seed.q;     // use seed.q as the seed
    pn_gen.next = 0;          // don't generate new numbers
    pn_gen.rst = rst;         // connect rst by default
    memOut.addr = addr.q;     // use addr.q as the address
    memOut.write = 1bx;       // don't care
    memOut.data = pn_gen.num; // use the pseudo-random number as data
    memOut.valid = 0;         // invalid
    case (state.q) {
      state.WRITE:
        if (!memIn.busy) {          // if RAM isn't busy
          pn_gen.next = 1;          // generate a new number
          addr.d = addr.q + 1;      // increment the address
          memOut.write = 1;         // perform a write
          memOut.valid = 1;         // command is valid
          if (addr.q == 23x{1}) {   // if address is maxed
            addr.d = 0;             // reset to 0
            state.d = state.READ;   // switch states
            pn_gen.rst = 1;         // reset the number generator
          }
        }
      state.READ:
        if (!memIn.busy) {          // if RAM isn't busy
          addr.d = addr.q + 1;      // increment the address
          memOut.valid = 1;         // command is valid
          memOut.write = 0;         // perform a read
          if (addr.q == 23x{1}-1)   // if address is almost max
            seed.d = seed.q + 1;    // generate a new seed
          if (addr.q == 23x{1}) {   // if address is maxed
            addr.d = 0;             // reset to 0
            state.d = state.WRITE;  // switch state
            pn_gen.rst = 1;         // reset the number generator
          }
        }
        if (memIn.valid) {          // if new data
          pn_gen.next = 1;          // go to the next number
          // if the data doesn't match the random number and the
          // error counter isn't maxed out
          if (memIn.data != pn_gen.num && !&error.q)
            error.d = error.q + 1;  // increment the error counter
        }
      default:                      // should never get here
        state.d = state.WRITE;      // get to a known state
    }
  }
}
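Our tester has two states, WRITE and READ. We start in the WRITE state and fill up the RAM with pseudo-random numbers. Once the RAM is full, we reset the number generator and move to the READ state.
In the READ state, we read each value back while generating the same sequence of numbers again. If a value we read back doesn't match the corresponding number in our sequence, we increment the error counter. The error counter is set up to saturate at 127 errors, so if there are a ton of errors it will simply max out instead of wrapping around to 0. Near the end of each READ pass, seed is incremented one address early so that seed.q already holds the new value when the generator is reset. As a result, every pass writes a different pattern to the RAM.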
We need to be able to see what our tester is doing so we will use the LEDs to show the status. We hook up leds[7] to the state (so we know when it's reading or writing) and the rest to the error counter.
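The two small pieces of bit logic here are the saturating error counter and the LED concatenation. Below is a minimal Python sketch that mirrors them. The function names count_error and pack_leds are hypothetical helpers, not part of the tutorial. The sketch shows that `!&error.q` stops the 7-bit counter at 127 and that `c{...}` places the state bit in leds[7].

```python
ERROR_BITS = 7
ERROR_MAX = (1 << ERROR_BITS) - 1  # 127, the largest value of dff error [7]


def count_error(error):
    """One mismatch: add 1 unless every bit is already 1 (the !&error.q test)."""
    return error if error == ERROR_MAX else error + 1


def pack_leds(is_read, error):
    """c{state.q == state.READ, error.q}: the state bit becomes leds[7]."""
    return (int(is_read) << ERROR_BITS) | (error & ERROR_MAX)
```

With no errors during a READ pass, `pack_leds(True, 0)` is `0b10000000`, so only the left-most LED is lit.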
Generating the Clock
If you've been paying attention (you have, haven't you?), you probably noticed that the SDRAM controller says it assumes a clock of 100MHz. However, the Mojo's clock is only 50MHz. Whatever will we do? Luckily, the FPGA has a super rad circuit called a PLL (phase-locked loop) that lets you generate new clocks. Even more rad is that there are tools to help us set it up.
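A PLL produces a new frequency by multiplying and dividing the input clock: f_vco = f_in × M / D and f_out = f_vco / O. The Clocking Wizard picks the actual primitive and settings for you, so the Python sketch below only illustrates the arithmetic. The limits in it are assumed Spartan-6-style PLL ranges that you should check against the datasheet. The sketch also ignores other constraints such as the phase-detector input range. The function name pll_settings is hypothetical.

```python
from fractions import Fraction

# Assumed PLL limits (verify against the device datasheet)
MULT_RANGE = range(1, 65)     # feedback multiplier M
DIVCLK_RANGE = range(1, 53)   # input divider D
OUTDIV_RANGE = range(1, 129)  # output divider O
VCO_MIN_MHZ, VCO_MAX_MHZ = 400, 1000


def pll_settings(f_in_mhz, f_out_mhz):
    """List (M, D, O) so that f_in * M / D is an allowed VCO frequency
    and f_in * M / (D * O) equals f_out exactly."""
    results = []
    for d in DIVCLK_RANGE:
        for m in MULT_RANGE:
            vco = Fraction(f_in_mhz) * m / d
            if not VCO_MIN_MHZ <= vco <= VCO_MAX_MHZ:
                continue
            o = vco / f_out_mhz
            if o.denominator == 1 and o.numerator in OUTDIV_RANGE:
                results.append((m, d, o.numerator))
    return results
```

For 50MHz in and 100MHz out, one candidate is M = 10, D = 1, O = 5. That gives a 500MHz VCO divided by 5.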
We are going to be using the Core Generator tool from Xilinx. Support for this tool is built into the Mojo IDE, so simply click Project->Launch CoreGen.
Under FPGA Features and Design/Clocking, double-click Clocking Wizard.
You're a clocking wizard, Harry!
Change the name to just clk_wiz because the default is UGLY. Also uncheck Phase alignment (we don't care about that) and set the primary input clock to 50MHz.
On the next page you shouldn't have to change anything as CLK_OUT1 is already set to generate 100MHz.
On page 3, uncheck everything because again, we don't care.
Skip page 4 and on page 5, remove the 1 from the signal names. We only have one input and one output so why bother labeling them 1?
Finally, click Generate.
Once it finishes generating the core, you can close all the CoreGen windows. The core should automagically (it's a word, trust me) be under the Cores section of your project.
The Top Level
Now that we have all the pieces, we need to hook them all up.
If you take a look at the sdram_shield.ucf file we added at the beginning of the tutorial, you'll notice that there are only two signals defined, sdramOut and sdramInOut, with each of their bits constrained individually.
NET "sdramOut<0>" LOC = P5 | IOSTANDARD = LVTTL | SLEW = FAST; # clk
NET "sdramOut<1>" LOC = P2 | IOSTANDARD = LVTTL | SLEW = FAST; # cle
NET "sdramOut<6>" LOC = P6 | IOSTANDARD = LVTTL | SLEW = FAST; # cs
NET "sdramOut<2>" LOC = P115 | IOSTANDARD = LVTTL | SLEW = FAST; # cas
NET "sdramOut<5>" LOC = P111 | IOSTANDARD = LVTTL | SLEW = FAST; # ras
NET "sdramOut<3>" LOC = P112 | IOSTANDARD = LVTTL | SLEW = FAST; # we
NET "sdramOut<4>" LOC = P114 | IOSTANDARD = LVTTL | SLEW = FAST; # dqm
NET "sdramOut<7>" LOC = P116 | IOSTANDARD = LVTTL | SLEW = FAST; # bank[0]
NET "sdramOut<8>" LOC = P117 | IOSTANDARD = LVTTL | SLEW = FAST; # bank[1]
NET "sdramOut<9>" LOC = P118 | IOSTANDARD = LVTTL | SLEW = FAST; # addr[0]
NET "sdramOut<10>" LOC = P119 | IOSTANDARD = LVTTL | SLEW = FAST; # addr[1]
NET "sdramOut<11>" LOC = P120 | IOSTANDARD = LVTTL | SLEW = FAST; # addr[2]
NET "sdramOut<12>" LOC = P121 | IOSTANDARD = LVTTL | SLEW = FAST; # addr[3]
NET "sdramOut<13>" LOC = P138 | IOSTANDARD = LVTTL | SLEW = FAST; # addr[4]
NET "sdramOut<14>" LOC = P139 | IOSTANDARD = LVTTL | SLEW = FAST; # addr[5]
NET "sdramOut<15>" LOC = P140 | IOSTANDARD = LVTTL | SLEW = FAST; # addr[6]
NET "sdramOut<16>" LOC = P141 | IOSTANDARD = LVTTL | SLEW = FAST; # addr[7]
NET "sdramOut<17>" LOC = P142 | IOSTANDARD = LVTTL | SLEW = FAST; # addr[8]
NET "sdramOut<18>" LOC = P143 | IOSTANDARD = LVTTL | SLEW = FAST; # addr[9]
NET "sdramOut<19>" LOC = P137 | IOSTANDARD = LVTTL | SLEW = FAST; # addr[10]
NET "sdramOut<20>" LOC = P144 | IOSTANDARD = LVTTL | SLEW = FAST; # addr[11]
NET "sdramOut<21>" LOC = P1 | IOSTANDARD = LVTTL | SLEW = FAST; # addr[12]
NET "sdramInOut<0>" LOC = P101 | IOSTANDARD = LVTTL | SLEW = FAST; # dq[0]
NET "sdramInOut<1>" LOC = P102 | IOSTANDARD = LVTTL | SLEW = FAST; # dq[1]
NET "sdramInOut<2>" LOC = P104 | IOSTANDARD = LVTTL | SLEW = FAST; # dq[2]
NET "sdramInOut<3>" LOC = P105 | IOSTANDARD = LVTTL | SLEW = FAST; # dq[3]
NET "sdramInOut<4>" LOC = P7 | IOSTANDARD = LVTTL | SLEW = FAST; # dq[4]
NET "sdramInOut<5>" LOC = P8 | IOSTANDARD = LVTTL | SLEW = FAST; # dq[5]
NET "sdramInOut<6>" LOC = P9 | IOSTANDARD = LVTTL | SLEW = FAST; # dq[6]
NET "sdramInOut<7>" LOC = P10 | IOSTANDARD = LVTTL | SLEW = FAST; # dq[7]
This is set up so that all the signals will pack into the structs defined in the SDRAM controller.
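Because every bit is constrained by hand, a typo in an index or a pin is easy to miss. Below is a small Python sketch (a hypothetical helper script, not part of the Mojo tooling) that parses the NET ... LOC lines and reports bus bits with no constraint or pins used more than once. The bus widths 22 and 8 are simply the bit counts in the listing above.

```python
import re

# Matches lines such as: NET "sdramOut<0>" LOC = P5 | IOSTANDARD = LVTTL ...
NET_RE = re.compile(r'NET\s+"(\w+)<(\d+)>"\s+LOC\s*=\s*(\w+)')


def parse_ucf(text):
    """Return {(bus, index): pin} for every indexed NET ... LOC line."""
    pins = {}
    for line in text.splitlines():
        m = NET_RE.search(line)
        if not m:
            continue
        key = (m.group(1), int(m.group(2)))
        if key in pins:
            raise ValueError(f"{key[0]}<{key[1]}> is constrained twice")
        pins[key] = m.group(3)
    return pins


def missing_bits(pins, bus, width):
    """Indices 0..width-1 of `bus` that have no LOC constraint."""
    return [i for i in range(width) if (bus, i) not in pins]


def reused_pins(pins):
    """Pins that are assigned to more than one net bit."""
    seen, reused = set(), []
    for pin in pins.values():
        if pin in seen:
            reused.append(pin)
        seen.add(pin)
    return reused
```

For the constraints listed above, `missing_bits(pins, "sdramOut", 22)`, `missing_bits(pins, "sdramInOut", 8)`, and `reused_pins(pins)` all come back empty.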
We can add them to mojo_top.luc like below.
output<Sdram.out> sdramOut, // SDRAM outputs
inout<Sdram.inOut> sdramInOut // SDRAM inouts
We now just need to instantiate our modules and hook everything up.
module mojo_top (
    input clk,                    // 50MHz clock
    input rst_n,                  // reset button (active low)
    output led [8],               // 8 user controllable LEDs
    input cclk,                   // configuration clock, AVR ready when high
    output spi_miso,              // AVR SPI MISO
    input spi_ss,                 // AVR SPI Slave Select
    input spi_mosi,               // AVR SPI MOSI
    input spi_sck,                // AVR SPI Clock
    output spi_channel [4],       // AVR general purpose pins (used by default to select ADC channel)
    input avr_tx,                 // AVR TX (FPGA RX)
    output avr_rx,                // AVR RX (FPGA TX)
    input avr_rx_busy,            // AVR RX buffer full
    output<Sdram.out> sdramOut,   // SDRAM outputs
    inout<Sdram.inOut> sdramInOut // SDRAM inouts
  ) {
  sig rst;  // reset signal
  sig fclk; // 100MHz clock
  // boost clock to 100MHz
  clk_wiz clk_wiz;
  always {
    clk_wiz.CLK_IN = clk;   // 50MHz in
    fclk = clk_wiz.CLK_OUT; // 100MHz out (it's like magic!)
  }
  .clk(fclk) {
    // The reset conditioner is used to synchronize the reset signal to the FPGA
    // clock. This ensures the entire FPGA comes out of reset at the same time.
    reset_conditioner reset_cond;
    .rst(rst) {
      // inouts need to be connected at instantiation and directly to an inout of the module
      sdram sdram (.sdramInOut(sdramInOut));
      ram_test ram_test;
    }
  }
  always {
    reset_cond.in = ~rst_n;        // input raw inverted reset signal
    rst = reset_cond.out;          // conditioned reset
    spi_miso = bz;                 // not using SPI
    spi_channel = bzzzz;           // not using flags
    avr_rx = bz;                   // not using serial port
    led = ram_test.leds;           // connect LEDs to ram_test
    sdram.memIn = ram_test.memOut; // connect ram_test to controller
    ram_test.memIn = sdram.memOut; // connect controller to ram_test
    sdramOut = sdram.sdramOut;     // connect controller to SDRAM
  }
}
You should be able to build your project now. Stack your SDRAM Shield onto your Mojo and load the project! If everything went well, you should see the left-most LED (leds[7], which is lit during the READ state) blinking and the other 7 off (no errors). Each time the LED blinks, 32MB of data has been written to and read back from the SDRAM!
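The 32MB figure follows directly from the interface: a 23-bit address selects one 32-bit word. The quick Python check below does that arithmetic. It also computes a lower bound on how long one blink takes. The bound relies only on ram_test issuing at most one command per fclk cycle, one write and one read per address. The real controller also spends cycles on work such as refresh, so the actual blink period will be longer than this bound.

```python
ADDR_BITS = 23         # memOut.addr is 23 bits wide
WORD_BYTES = 32 // 8   # each address holds one 32-bit word
F_CLK_HZ = 100e6       # fclk from the clock wizard

total_bytes = (1 << ADDR_BITS) * WORD_BYTES

# Lower bound: at most one command per fclk cycle, one write + one read per address
min_cycles_per_pass = 2 * (1 << ADDR_BITS)
min_seconds_per_pass = min_cycles_per_pass / F_CLK_HZ

print(total_bytes // (1024 * 1024), "MiB per pass")
print(f"at least {min_seconds_per_pass * 1000:.1f} ms per pass")
```

This prints 32 MiB per pass and at least 167.8 ms per pass, so the LED can blink at most about six times per second.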