Lab 6: Distributed Arithmetic Solution


• Introduction

In this lab, we will be designing signed Distributed Arithmetic (DA). DA is an important FPGA technology and is extensively used in computing the sum of products without using a multiplier:

$$y = \langle c, x \rangle = \sum_{n=0}^{N-1} c[n] \times x[n] \qquad (1)$$

Convolution, correlation, and matrix multiplication can all be formulated as dot products of this form. The computation of DA usually takes one cycle per input bit, i.e. B + 1 cycles for the (B + 1)-bit inputs used in Eq. 2 (4 cycles in this lab, where the inputs are 4 bits wide). A prerequisite for a DA design is that the filter coefficients c[n] are known a priori.

A (B + 1)-bit signed number can be represented as:

$$x[n] = -2^B \times x_B[n] + \sum_{b=0}^{B-1} x_b[n] \times 2^b \quad \text{with } x_b[n] \in \{0, 1\}, \qquad (2)$$

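where $x_b[n]$ denotes the $b$th bit of $x[n]$, and $x_B[n]$ the sign bit. Combining Eq. 1 and 2 together, we have:

$$
\begin{aligned}
y &= \sum_{n=0}^{N-1} c[n] \left( -2^B \times x_B[n] + \sum_{b=0}^{B-1} x_b[n] 2^b \right) \\
  &= c[0] \left( -x_B[0] 2^B + x_{B-1}[0] 2^{B-1} + \cdots + x_0[0] 2^0 \right) \\
  &\quad + c[1] \left( -x_B[1] 2^B + x_{B-1}[1] 2^{B-1} + \cdots + x_0[1] 2^0 \right) \\
  &\quad \;\vdots \\
  &\quad + c[N-1] \left( -x_B[N-1] 2^B + x_{B-1}[N-1] 2^{B-1} + \cdots + x_0[N-1] 2^0 \right) \\
  &= -\left( c[0] x_B[0] + c[1] x_B[1] + \cdots + c[N-1] x_B[N-1] \right) 2^B \\
  &\quad + \left( c[0] x_{B-1}[0] + c[1] x_{B-1}[1] + \cdots + c[N-1] x_{B-1}[N-1] \right) 2^{B-1} \\
  &\quad \;\vdots \\
  &\quad + \left( c[0] x_0[0] + c[1] x_0[1] + \cdots + c[N-1] x_0[N-1] \right) 2^0
\end{aligned}
$$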

Therefore,

$$y = -2^B \left( \sum_{n=0}^{N-1} c[n] \times x_B[n] \right) + \sum_{b=0}^{B-1} 2^b \left( \sum_{n=0}^{N-1} c[n] \times x_b[n] \right) \qquad (3)$$

Since $x_b[n] \in \{0, 1\}$ for $b \in [0 \ldots B]$, the SOP $\sum_{n=0}^{N-1} c[n] \times x_b[n]$ is just a sum in which each c[n] is either taken or not. Therefore, we can make a LUT/ROM with $2^N$ entries, where each entry holds the sum of a different combination of c[0] to c[N − 1].
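As a small illustration, consider an assumed N = 2 case (not the lab's configuration), with the LUT address formed from the bits $(x_b[1], x_b[0])$. The bit ordering of the address is an assumption made here for illustration only:

```latex
\begin{array}{cc|c}
x_b[1] & x_b[0] & \text{LUT entry} \\ \hline
0 & 0 & 0 \\
0 & 1 & c[0] \\
1 & 0 & c[1] \\
1 & 1 & c[0] + c[1]
\end{array}
```

The same construction for N = 4 inputs yields 16 entries.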

The data flow (architecture) of DA is shown in Figure 1, and the input and output ports are shown in Figure 2.

• clk => Clock signal. We use rising edge for flops.

• rst => Reset signal. The sequential elements used in your design should have a synchronous, active-high reset, i.e. if the reset is pulled high, the flops should be reset at the next rising edge of clk.
• in_valid => Indicates that the input data are valid. There are 4 input vectors in each valid set.

• The computation to be done is:

data_out = data_in_0 * coef_0 + data_in_1 * coef_1

+ data_in_2 * coef_2 + data_in_3 * coef_3,

where all data_in_* and coef_* are 4-bit signed numbers. The coefficients are as follows:

coef_0 = 7, coef_1 = 3, coef_2 = -8, coef_3 = -5

• next_in indicates whether the design is ready to take the next input data. If next_in is high, the design is ready. Since your design may take multiple cycles to finish the computation of one valid set of inputs, it is your responsibility to drive next_in. From the view of the testbench: while it sees next_in at logic low, it holds the current valid set of input data. When it sees next_in at logic high, it updates the input with a new set of data (either valid or invalid) in the next cycle.

• out_valid => valid signal for the output. When out_valid = 1, the outputs data_out are valid.
• For simplicity, there is no ready signal next_out on the output side of the design. That means there is no back pressure from the testbench. The testbench will only sample the valid output data and write them into the output.txt file.
• The controller selects which group of bits is used as the address to look up the LUT.

• Note that Figure 1 may be incomplete. It is up to your design whether and where to add additional pipeline registers.
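The computation described in the list above can be checked against a small software reference model before any RTL is written. The following is a minimal Python sketch, not the lab's reference implementation: the names `build_lut` and `da_dot` are hypothetical, and it assumes that address bit n of the LUT comes from data_in_n. It performs one LUT lookup per input bit position and subtracts the partial sum for the sign bit, following Eq. 3.

```python
COEFS = (7, 3, -8, -5)   # coef_0 .. coef_3 as given in the lab text
WIDTH = 4                # inputs are 4-bit signed, so B = 3


def build_lut(coefs):
    """Entry `addr` holds the sum of coefs[n] for every bit n set in addr."""
    return [sum(c for n, c in enumerate(coefs) if (addr >> n) & 1)
            for addr in range(1 << len(coefs))]


def da_dot(data, coefs=COEFS, width=WIDTH):
    """Bit-serial DA model: one LUT lookup per bit position, no multiplier."""
    lut = build_lut(coefs)
    mask = (1 << width) - 1
    acc = 0
    for b in range(width):
        # Assumed addressing: bit b of data[n] becomes address bit n.
        addr = 0
        for n, x in enumerate(data):
            addr |= (((x & mask) >> b) & 1) << n
        partial = lut[addr] << b      # weight 2^b
        if b == width - 1:
            acc -= partial            # sign bit carries weight -2^B
        else:
            acc += partial
    return acc


if __name__ == "__main__":
    print(da_dot((7, 7, -8, -8)))     # largest possible result
    print(da_dot((-8, -8, 7, 7)))     # smallest possible result
```

With these coefficients and 4-bit signed inputs, the result lies between -171 and 174, so a two's-complement data_out needs at least 9 bits to hold every value.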

[Figure 1: Data flow of a signed DA system. Blocks shown: Controller; input bit slices XB[n] … X1[n], X0[n] of data_in_0 … data_in_(N-1); LUT; optional pipeline register; 2^b shift; +/- accumulator with register. Signals shown: in_valid, next_in, out_valid, data_out.]

[Figure 2: Input, output ports of the design (da.vhd). Ports shown: clk, rst, data_in_0, data_in_1, data_in_2, data_in_3, in_valid, next_in, data_out, out_valid.]

The LUT/ROM can be implemented in various writing styles:

• Use case statement.
• Use constant rom: array (0 to x) of type := (xxxx, xxxx, xxxx, etc);. In this case, an initial value is necessary.
• Use hard-coded wires similar to Lab 5.
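Whichever writing style is chosen, the table contents are the same and can be tabulated up front. The sketch below is a hypothetical helper (`rom_rows` is not part of the lab material) that lists every address, its signed value, and its two's-complement bit pattern for the lab's coefficients. It assumes that address bit n comes from data_in_n and that the designer picks the ROM word width; both are assumptions, not requirements stated in the lab.

```python
COEFS = (7, 3, -8, -5)  # coef_0 .. coef_3 as given in the lab text


def rom_rows(coefs, width):
    """Return (address_bits, value, twos_complement_bits) for every entry.

    Assumptions: address bit n comes from data_in_n, and `width` is a
    ROM word width chosen by the designer.
    """
    rows = []
    for addr in range(1 << len(coefs)):
        value = sum(c for n, c in enumerate(coefs) if (addr >> n) & 1)
        if not -(1 << (width - 1)) <= value < (1 << (width - 1)):
            raise ValueError(f"{value} does not fit in {width} signed bits")
        word = format(value & ((1 << width) - 1), f"0{width}b")
        rows.append((format(addr, f"0{len(coefs)}b"), value, word))
    return rows


if __name__ == "__main__":
    for addr_bits, value, word in rom_rows(COEFS, width=5):
        print(addr_bits, value, word)
```

With these coefficients the entries range from -13 to 10, so 5 signed bits are enough for the table itself; a wider word may still be convenient depending on how the accumulator is built.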

The grading will take into consideration the functional correctness, utilization (area), timing and the power of the design. Three re-submissions are allowed for this lab.

• Instructions

1. Before writing RTL code, think carefully about how you are going to complete your data flow architecture, and work out the possible timing diagrams of your design.

2. Consider the question listed in Section 3: can you update your design in an area-efficient way?
3. Complete your design in only ONE file: da.vhd. NOTE: You are NOT allowed to use multiplication in this lab, which means you cannot use assignments like a <= b * c;.

4. Run the simulation for da.vhd using the provided testbench tb_da.vhd to match your output file output.txt with the reference output_ref.txt.

5. Run the synthesis and implementation of your design da.vhd in Vivado using the provided constraint file constraints.xdc.
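For step 4, the comparison of output.txt against output_ref.txt can be automated. The following is a minimal sketch, assuming both files are plain text with one result per line; the exact file format and the paths are assumptions, and `first_mismatch` is a hypothetical helper name.

```python
from itertools import zip_longest
from pathlib import Path


def first_mismatch(actual_path, reference_path):
    """Return None if the files match line by line (ignoring surrounding
    whitespace), else a tuple (line_number, actual_line, expected_line).
    A missing line on either side is reported as None."""
    actual = [ln.strip() for ln in Path(actual_path).read_text().splitlines()]
    expected = [ln.strip() for ln in Path(reference_path).read_text().splitlines()]
    pairs = zip_longest(actual, expected)
    for line_no, (got, want) in enumerate(pairs, start=1):
        if got != want:
            return line_no, got, want
    return None


if __name__ == "__main__":
    # Paths are placeholders; adjust them to where your simulator writes.
    print(first_mismatch("output.txt", "output_ref.txt"))
```

A result of None means every line matched; otherwise the tuple points at the first line to inspect in the waveform.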

• Deliverables

1. Create a PDF report containing:

▪ Tables of

(a) Resources (Only “Slice Logic”, “IO and GT Specific”, “Primitives”) from da_utilization_placed.rpt
(b) Power (Dynamic and Static) from da_power_routed.rpt

(c) Worst Negative Slack (“Design Timing Summary”) from da_timing_summary_routed.rpt
▪ (20% of grade) Answer the question: Is there an area-efficient way to do the left shift that follows the LUT output?

(a) (5%) If yes, draw the updated block diagram starting from the LUT on the left to the output data_out on the right.
(b) (5%) Write down the mathematical recurrence relation derived from Eq. 3 that supports your answer above.
(c) (10%) Why is the original left shift not a good idea, and why is the updated one area-efficient? (The answer should not exceed 4 sentences.)

The name of the PDF should be of the form lab6_firstname_lastname.pdf, e.g. lab6_george_burdell.pdf.

2. (70% of grade) The ONLY completed design file da.vhd . The 70% grading will be based on:

◦ (50%) Functional correctness of your design.

◦ (20%) Sampling period and area of your design.

◦ (extra 5%, within a total of 100%) An extra 5% will be given if the left shift is implemented in an area-efficient way.

3. The simulated output file output.txt in the “run” folder.

4. The simulated output file output_cycle.txt in the “run” folder.

5. (10% of grade) The implemented area (utilization), power and timing reports, namely da_utilization_placed.rpt, da_power_routed.rpt and da_timing_summary_routed.rpt.

Move all of these files into a folder called lab6_firstname_lastname_gtID. Zip the folder and upload the archive lab6_firstname_lastname_gtID.zip on T-Square (e.g. lab6_george_burdell_123456789.zip). The first name and last name should be all in lower case. Please strictly follow this naming convention; otherwise my script will not work and you may lose points.

Note: Late submissions are not accepted. In case of extraordinary circumstances, written permission must be obtained from Dr. Madisetti.
