Thursday, April 7, 2011

Soft Core ALU, CLA Instruction

Ok, I finally had time to button up the carry-lookahead adder (CLA) I have been working on for my ALU soft core project. I have been developing a 256-bit adder, but a lot of the time went into the fairly involved test bench and compile structure built around it.

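I developed a test bench that tests the four different layers of the CLA (4, 16, 64 and 256 bits) individually, while providing two different "types" of tests to run: an exhaustive sweep of every input combination (STRAIGHT) and a short set of spot checks.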

I'll post the test bench here.

/*
-----------------------------------------------------
               OriginalD Proprietary
                        2011
-----------------------------------------------------
File     : cla_tb.v
Designer : Dustin Brothers 
Date     : April 1, 2011
Abstract : Provides testing for all levels of the CLA
         block.
-----------------------------------------------------
Description    :
   This test bench can exercise each of the four levels
of the CLA block (4, 16, 64 and 256 bits). The TEST
variable given when launching the makefile selects
which level is instantiated and tested.
   This test bench is self verifying in that it will
print out the status of each test performed after the
testing has been done.

ToDo           :

Known Issues   :

-----------------------------------------------------
Development Log
-----------------------------------------------------
Date     Init     Description
-----------------------------------------------------
*/

`timescale 1ns/100ps

module cla_tb;

// Common Ports
reg cin;
wire cout, pg, gg;

`ifdef DUMP
// Dumpfile
initial
begin
   $dumpfile( "file.dmp" );
   $dumpvars;
end
`endif

/**************** For 4-bit test ***************/
`ifdef CLA4
// CLA4 Specific Ports
reg [3:0] din1, din2;
wire [3:0] dout;

// Task Variable
`define WIDTH 3

// Instance
cla_4bit CLA4(
   .din1(din1),
   .din2(din2),
   .cin(cin),
   .dout(dout),
   .cout(cout),
   .pg(pg),
   .gg(gg)
   );
`endif /********* End 4-bit test **********/

/**************** For 16-bit test ***************/
`ifdef CLA16
// CLA16 Specific Ports
reg [15:0] din1, din2;
wire [15:0] dout;

// Task Variable
`define WIDTH 15 

// Instance
cla_16bit CLA16(
   .din1(din1),
   .din2(din2),
   .cin(cin),
   .dout(dout),
   .cout(cout),
   .pg(pg),
   .gg(gg)
   );
`endif /********* End 16-bit test *********/

/**************** For 64-bit test ***************/
`ifdef CLA64
// CLA64 Specific Ports
reg [63:0] din1, din2;
wire [63:0] dout;

// Task Variable
`define WIDTH 63

// Instance
cla_64bit CLA64(
   .din1(din1),
   .din2(din2),
   .cin(cin),
   .dout(dout),
   .cout(cout),
   .pg(pg),
   .gg(gg)
   );
`endif /********* End 64-bit test *********/

/**************** For 256-bit test ***************/
`ifdef CLA256
// CLA256 Specific Ports
reg [255:0] din1, din2;
wire [255:0] dout;

// Task Variable
`define WIDTH 255 

// Instance
cla_256bit CLA256(
   .din1(din1),
   .din2(din2),
   .cin(cin),
   .dout(dout),
   .cout(cout),
   .pg(pg),
   .gg(gg)
   );
`endif /********* End 256-bit test *********/

// Setup Loop Variables
reg [1:0] carry;                          // Need the extra bit for the loop comparison
reg [`WIDTH+1:0] data1, data2;            // Need the extra bit for the loop comparison

// Simulation
initial
begin
   // Simulation
   $display( "######## Start of SIM #########" );

`ifdef STRAIGHT
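   // Loop Through all the awesomeness: every cin/din1/din2 combination.
   // Only practical for the narrow adders (CLA4 is 2 * 16 * 16 = 512 checks);
   // the loop count grows as 2**(2*(`WIDTH+1)+1), so use the spot checks for the wide ones.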
   for( carry = 0; carry < 2; carry = carry + 1 )
   begin
      #5
      cin <= carry[0];
      for( data1 = 0; data1 < 2**(`WIDTH+1); data1 = data1 + 1 )
      begin
         din1 <= data1[`WIDTH:0];
         for( data2 = 0; data2 < 2**(`WIDTH+1); data2 = data2 + 1 )
         begin
            din2 <= data2[`WIDTH:0];
            #1 testInOut( din1, din2, dout, cin, cout );
         end
      end
   end
`else
   cin <= 1'b0;
   din1 <= 'h0;
   din2 <= 'h1;
   #1 testInOut( din1, din2, dout, cin, cout );

   #5
   din1 <= 2**((`WIDTH+1)/2);             // Go half way
   din2 <= 2**((`WIDTH+1)/4);             // Go quarter way
   #1 testInOut( din1, din2, dout, cin, cout );

   #5
   din1 <= 2**((`WIDTH+1)/3);
   din2 <= 2**((`WIDTH+1)/8);
   #1 testInOut( din1, din2, dout, cin, cout );

   #5
   din1 <= 2**((`WIDTH+1)/8);
   din2 <= 2**(`WIDTH+1) - 1;     
   #1 testInOut( din1, din2, dout, cin, cout );

   cin <= 1'b1;
   din1 <= 'h0;
   din2 <= 'h1;
   #1 testInOut( din1, din2, dout, cin, cout );

   #5
   din1 <= 2**((`WIDTH+1)/2);             // Go half way
   din2 <= 2**((`WIDTH+1)/4);             // Go quarter way
   #1 testInOut( din1, din2, dout, cin, cout );

   #5
   din1 <= 2**((`WIDTH+1)/3);
   din2 <= 2**((`WIDTH+1)/8);
   #1 testInOut( din1, din2, dout, cin, cout );

   #5
   din1 <= 2**((`WIDTH+1)/8);
   din2 <= 2**(`WIDTH+1) - 1;     
   #1 testInOut( din1, din2, dout, cin, cout );
`endif

   #100
   $display( "######### End of SIM ##########" );
   $finish;
end


// Test Function
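task testInOut;
   input [`WIDTH:0] in1, in2, out1;
   input c1, c2;
   reg [`WIDTH+1:0] expected;             // Extra bit holds the expected carry out

begin
   // Zero-extend the operands so the sum keeps its carry at every width.
   // Comparing against 2**(`WIDTH+1)-1 wraps once the adder is wider than the
   // simulator's integer size, which mis-checks cout in the 64- and 256-bit cases.
   expected = {1'b0, in1} + {1'b0, in2} + c1;

   // !== also flags x/z on the outputs, which != would let through as a Pass
   if( {c2, out1} !== expected )
      $display( "In1: %h, In2: %h, CarryIn: %h, CarryOut: %h, Out: %h, Exp: %h -- Fail", in1, in2, c1, c2, out1, expected[`WIDTH:0] );
   else
      $display( "In1: %h, In2: %h, CarryIn: %h, CarryOut: %h, Out: %h, Exp: %h -- Pass", in1, in2, c1, c2, out1, expected[`WIDTH:0] );
end

endtask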

endmodule

Basically I realized that I'd either be creating four test benches that are almost exactly the same, or I could branch one test bench, make it kind of monolithic, and provide a very dynamic testing environment.

I like "self verifying" test benches. Do you know what I mean by that? I mean that the test bench is so self-contained that it drives the logic, knows what the result should be, and compares the output to what it expects. Can this become involved? Absolutely! Is this a very advanced way to write a test bench? Absolutely! Is it a good way to write a test bench? Absolutely! When I run "make sim TEST=CLA256", my test bench runs through several steps, verifying not only the data out but also the carry, and tells me "Pass" or "Fail". I don't even have to open a simulator window!
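As a rough illustration of the self-verifying idea, here is a minimal sketch that counts failures and prints a single summary line at the end. The adder4_model module is a hypothetical behavioral stand-in for the real CLA (whose source is not shown in this post), and the names selfcheck_tb, errors and expected are made up for the example.

```verilog
`timescale 1ns/100ps

// Hypothetical stand-in for the device under test (not the real cla_4bit)
module adder4_model(
   input  [3:0] din1, din2,
   input        cin,
   output [3:0] dout,
   output       cout
   );
   // 5-bit result keeps the carry out
   assign {cout, dout} = din1 + din2 + cin;
endmodule

module selfcheck_tb;

reg  [3:0] din1, din2;
reg        cin;
wire [3:0] dout;
wire       cout;

integer   i, errors;
reg [4:0] expected;                       // Extra bit holds the expected carry out

adder4_model DUT(
   .din1(din1),
   .din2(din2),
   .cin(cin),
   .dout(dout),
   .cout(cout)
   );

initial
begin
   errors = 0;

   // 2 * 16 * 16 = 512 input combinations
   for( i = 0; i < 512; i = i + 1 )
   begin
      {cin, din1, din2} = i[8:0];
      #1;                                 // Let the outputs settle
      expected = {1'b0, din1} + {1'b0, din2} + cin;
      if( {cout, dout} !== expected )
      begin
         errors = errors + 1;
         $display( "In1: %h, In2: %h, CarryIn: %b, Got: %h, Exp: %h -- Fail", din1, din2, cin, {cout, dout}, expected );
      end
   end

   // One summary line, so a long log cannot hide a failure
   if( errors == 0 )
      $display( "PASS: 512 checks, 0 failures" );
   else
      $display( "FAIL: %0d of 512 checks failed", errors );
   $finish;
end

endmodule
```

Printing only the failures plus one summary keeps the log short for large sweeps; printing every vector, as the full test bench does, is easier to review by eye.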

I know, I know, this can be dangerous. I agree! Be careful not to get too focused on the pass/fail printouts, as they can bite you. That's also why (if you noticed in my test bench) I print out the values as well as a pass/fail flag. It allows the designer to review the results.
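One concrete way a pass/fail printout can mislead is an unknown (x) output. The sketch below is a standalone demo, not part of the test bench above, and the module name xcheck_demo is made up. It compares an all-x value against a known one with both != and !==.

```verilog
module xcheck_demo;

reg [3:0] out1, expected;

initial
begin
   out1     = 4'bxxxx;                    // Undriven or unknown output
   expected = 4'h5;

   // != returns x here, and an x condition takes the else branch
   if( out1 != expected )
      $display( "!=  : Fail" );
   else
      $display( "!=  : Pass" );

   // !== compares x/z bits literally, so the mismatch is caught
   if( out1 !== expected )
      $display( "!== : Fail" );
   else
      $display( "!== : Pass" );

   $finish;
end

endmodule
```

With the logical inequality the unknown output is reported as a Pass, while the case inequality reports a Fail, which is why having the printed values next to the flag is worth the extra log lines.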

Yes this code says "proprietary" but please feel free to glean any concepts from it that you like!

Here's the list of development:
CLA - Done
Subtractor
Multiplier
Divider
FPU Adder
FPU Subtractor
FPU Multiplier
FPU Divider

Oh, in its current state, my CLA can perform the 256-bit addition in one clock cycle at up to 111 MHz (a critical path of about 9 ns), assuming a 1 ns gate delay at a 2 um process size. Reference: Xilinx XAPP120 v2.0 document.
