How I made a Blakecoin miner

I don't know about anyone else, but I was stunned by the rapid rise of bitcoin in 2017. The excitement has died down by now, of course, but in 2017 everyone who could be bothered was talking and writing about cryptocurrencies.
I saw people trying to make money on cryptocurrencies in every way they could. Some spent all their savings on video cards and started mining on their own in a garage. Some invested in cloud mining. Some tried to set up their own pool. Some started producing chocolate bitcoins, and some are producing mineral water:
I also began studying what exactly these bitcoins are. I even started my own investigation of the SHA256 algorithm and wrote an article about it here on the hub: "Is it possible to calculate bitcoins faster, simpler, or more easily?". My research into hashing algorithms is still ongoing. Maybe someday I'll write a separate article about it. For now, here is what I did: I took an existing FPGA Blakecoin miner project and adapted it to a board I had on hand, the Marsohod3, and also adapted the project for the DE10-Standard.
How I adapted the project for the Marsohod3 is described here. For Cyclone V everything is essentially the same; only the Quartus project revision differs, blake_cv. My sources are:
To my regret, only three Blake hash cores fit into my Cyclone V.
The FPGA falls just short of the capacity needed for four. I run the project at 120 MHz, and each core computes one Blake hash per clock cycle, so the project's performance is 120 * 3 = 360 MH/s. Not very much, to be honest. However, as I said, I already had the board, so I don't need to recoup its cost. Quartus still reports Fmax = 150 MHz. I could try raising the frequency, but I'm afraid I would have to add a fan, and it would buzz. I don't need this crypto badly enough to put up with a buzz in the room as well.
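The arithmetic above is easy to replay for other configurations. A minimal sketch, assuming each core is fully pipelined and delivers one hash per clock as described; the function name hashrate_mhs is mine, not something from the project:

```python
def hashrate_mhs(cores: int, clock_mhz: float, hashes_per_clock: float = 1.0) -> float:
    """Aggregate hash rate in MH/s for identical, fully pipelined cores."""
    return cores * clock_mhz * hashes_per_clock


current = hashrate_mhs(cores=3, clock_mhz=120)     # the configuration described above
at_fmax = hashrate_mhs(cores=3, clock_mhz=150)     # if the design really ran at the reported Fmax
four_cores = hashrate_mhs(cores=4, clock_mhz=120)  # if a fourth core had fit

print(current, at_fmax, four_cores)
```

So even at the reported Fmax the gain would be 25%, which is the trade-off against the fan noise.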
The general idea of the project is this: the board carries a chip that contains both an FPGA and a dual-core ARM:
When the board starts, U-Boot first loads the FPGA, then Linux boots, and the cgminer mining program runs in it. At first I thought I could create a virtual communication channel between the ARM and the FPGA. That is in fact possible, but it did not work out that way. The thing is that cgminer talks to hardware miners over USB and uses the libusb library. So it is easier for me to connect the FPGA to the Linux system through an FTDI USB-to-COM converter than to build an elaborate connection from the FPGA to the ARM bus. I have done that once before, and it was not very easy.
This is what my "miner" looks like now (I put a heatsink with thermal paste on the Cyclone V, otherwise it gets very hot):
To tell the truth, my main problems arose not with the FPGA project but with cgminer.
The problems are as follows:
1) Which cgminer should I take as the basis for my work? And the related question: which pool should I connect to in order to start mining? What is the connection between these questions? It would seem there is no problem: just take the freshest cgminer you can find. But wait: on GitHub there are 98 forks of cgminer. They all differ somehow. Which are good, which are bad, and which of them even work? So much for open source. Each author added or fixed something for themselves, or broke something, or made their own coin. It is not easy to sort out. I found a site where a single page links both to a cgminer project on GitHub and to a GitHub project for FPGA. So these two projects apparently can, and should, work together.
2) Since I took the FPGA project by the author kramble as my basis, it would of course be logical to take the patches he ships with his project. But this is not without problems either. He has patches for cgminer-??? and cgminer-???. I decided it was better to take the newer one, 3.4.?, but only lost time on it. It seems the author began adapting to this version but did not finish, and it is quite raw. I had to take ???, and that seems to be a thoroughly old version.
3) Authors who modify cgminer in their forks for their altcoins do not keep the comments and function names in the code accurate. The word bitcoin often appears here and there in the code, even though the fork apparently no longer mines bitcoin at all and works only with an altcoin.
4) Tests. WHERE ARE THE TESTS? I don't understand how anyone can build a complex product without tests. I did not find any.
To be honest, even getting started was not easy. Imagine that you need to run some project in an FPGA, but it is not very clear what it should do, how it receives data, what data, and in what form it must produce the result. This FPGA project must be accompanied by some program, which you don't know exactly where to get, but which must detect the miner board, send something to it (it is not known what), and receive something back. In what format, in what blocks, how often: nothing is known.
In practice, by studying kramble's cgminer patches, I could more or less work out how it should work.
The file usbutils.c lists the devices that can be treated as external hardware miners on the USB bus:
static struct usb_find_devices find_dev[] = {
#ifdef USE_BFLSC
    {
        .drv = DRV_BFLSC,
        .name = "BAS",
        .ident = IDENT_BAS,
        .idVendor = IDVENDOR_FTDI,
        .idProduct = 0x601?
        //.iManufacturer = "Butterfly Labs",
        .iProduct = "BitFORCE SHA256 SC",
        .kernel = ?
        .config = ?
        .interface = ?
        .timeout = BFLSC_TIMEOUT_MS,
        .latency = LATENCY_STD,
        .epcount = ARRAY_SIZE(bas_eps),
        .eps = bas_eps },
#endif
    /* ... */
    {
        .drv = DRV_ICARUS,
        .name = "BLT",
        .ident = IDENT_BLT,
        .idVendor = IDVENDOR_FTDI,
        .idProduct = 0x601?
        //.iProduct = "Dual RS232-HS",
        .iProduct = "USB <-> Serial Cable",
        .kernel = ?
        .config = ?
        .interface = ?
        .latency = LATENCY_STD,
        .epcount = ARRAY_SIZE(ftdi2232h_eps),
        .eps = ftdi2232h_eps },
    /* ... */
};

I added a descriptor for my FTDI-2232H USB-to-COM converter to this structure. Now, if cgminer detects a device with VendorId/DeviceId = 0x0403:0x601?, it will try to work with it as if it were an Icarus board, even though it is not one.
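The matching idea can be sketched outside of cgminer. This is only an illustration and not cgminer's actual lookup code: the UsbMatch type and the build_table and find_driver helpers are hypothetical, and EXAMPLE_PID is a made-up placeholder because the real product ID is not fully given above. Only the FTDI vendor ID 0x0403 and the "BLT" entry name come from the text.

```python
from typing import List, NamedTuple, Optional

FTDI_VENDOR_ID = 0x0403  # vendor ID quoted in the article


class UsbMatch(NamedTuple):
    driver: str      # which miner driver should claim the device
    name: str        # short device name shown to the user
    vendor_id: int
    product_id: int


def build_table(converter_pid: int) -> List[UsbMatch]:
    """Device table with one extra entry for the USB-to-COM converter."""
    return [UsbMatch("icarus", "BLT", FTDI_VENDOR_ID, converter_pid)]


def find_driver(table: List[UsbMatch], vendor_id: int, product_id: int) -> Optional[UsbMatch]:
    """Return the first table entry whose IDs match the plugged-in device."""
    for entry in table:
        if entry.vendor_id == vendor_id and entry.product_id == product_id:
            return entry
    return None


EXAMPLE_PID = 0x1234  # placeholder value for illustration only
table = build_table(EXAMPLE_PID)
hit = find_driver(table, FTDI_VENDOR_ID, EXAMPLE_PID)
miss = find_driver(table, FTDI_VENDOR_ID, 0x9999)
print(hit.driver if hit else None, miss)
```

The consequence is the one described above: any converter with matching IDs is handed to the Icarus driver, whether or not a miner sits behind it, which is why a separate detection step is needed.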
Next, look at the file driver-icarus.c, which contains the function icarus_detect_one:
static bool icarus_detect_one(struct libusb_device *dev, struct usb_find_devices *found)
{
    int this_option_offset = ++option_offset;
    struct ICARUS_INFO *info;
    struct timeval tv_start, tv_finish;

    /* Blakecoin detection hash
     * N.B. golden_ob MUST take less time to calculate than the timeout set in icarus_open()
     * {midstate, data} = {256'h553bf521cf6f816d21b2e3c660f29469f8b6ae935291176ef5dda6fe442ca6e? 96'hd1d9011caafb56522d4278bf};
     */
    const char golden_ob[] =
    const char golden_nonce[] = "0142b9b1"; // "000187a2";
    const uint32_t golden_nonce_val = 0x0142b9b1; // 0x000187a2;
    unsigned char ob_bin[64];
    unsigned char nonce_bin[ICARUS_READ_SIZE];
    char *nonce_hex;
    int baud, uninitialised_var(work_division), uninitialised_var(fpga_count);
    struct cgpu_info *icarus;
    int ret, err, amount, tries;
    bool ok;
    char tmpbuf[256]; // lancelot52
    unsigned char *wr_buf = ob_bin;
    int bufLen = sizeof(ob_bin);

    icarus = usb_alloc_cgpu(&icarus_drv, 1);
    if (!usb_init(icarus, dev, found))
        goto shin;

    usb_buffer_enable(icarus);
    get_options(this_option_offset, icarus, &baud, &work_division, &fpga_count);
    hex2bin(ob_bin, golden_ob, sizeof(ob_bin));

    tries = 2;
    ok = false;
    while (!ok && tries-- > 0) {
        icarus_initialise(icarus, baud);
        /* send the known test work to the board */
        err = usb_write_ica(icarus, (char *)wr_buf, bufLen, &amount, C_SENDTESTWORK);
        if (err != LIBUSB_SUCCESS || amount != bufLen)
            continue;
        memset(nonce_bin, ? sizeof(nonce_bin));
        /* wait for the board to return the nonce it found */
        ret = icarus_get_nonce(icarus, nonce_bin, &tv_start, &tv_finish, NULL, 500);
        /* ... */

The point of this is as follows. The program sends the board a job whose answer is known in advance. The job specifies the nonce from which to start the search, and that nonce is slightly smaller than the true GOLDEN nonce. The board therefore starts counting from the given point, stumbles upon the GOLDEN nonce within a fraction of a second, and returns it. The program receives the result right away, compares it with the correct answer, and it becomes clear at once whether this really is a hardware miner it can work with.
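The detection handshake described above can be mimicked in a few lines. This is only a sketch under loud assumptions: SHA-256 from hashlib stands in for Blake-256 (which the Python standard library does not provide), the 12-bit target is chosen so the demo runs quickly, and toy_hash, fake_board and detect are hypothetical names, not cgminer functions.

```python
import hashlib
import struct

ZERO_BITS = 12  # toy target; a real miner checks far more leading zero bits


def toy_hash(header: bytes, nonce: int) -> bytes:
    """Stand-in hash (SHA-256, NOT Blake-256) over the header plus a 32-bit nonce."""
    return hashlib.sha256(header + struct.pack("<I", nonce)).digest()


def meets_target(digest: bytes, zero_bits: int = ZERO_BITS) -> bool:
    """True if the digest starts with at least zero_bits zero bits."""
    return int.from_bytes(digest, "big") >> (256 - zero_bits) == 0


def fake_board(header: bytes, start_nonce: int, max_steps: int = 64):
    """Simulated miner: scan upward from start_nonce and return the first hit."""
    for nonce in range(start_nonce, start_nonce + max_steps):
        if meets_target(toy_hash(header, nonce)):
            return nonce
    return None


def detect(board, header: bytes, golden_nonce: int, backoff: int = 16) -> bool:
    """Start the board slightly below the known answer and compare its reply."""
    start = max(golden_nonce - backoff, 0)
    return board(header, start) == golden_nonce


header = b"known test work"
# Prepare the known answer once: the first nonce from 0 that meets the toy target.
golden = fake_board(header, 0, max_steps=1 << 20)

print(detect(fake_board, header, golden))         # a working "board"
print(detect(lambda h, s: None, header, golden))  # a device that never answers
```

Because the board starts only a few nonces below the known answer, the check finishes almost instantly, which is the reason for not starting the search from zero.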
And here a terrible problem arose: the project contains patches in C, a test program in Python, and a testbench for the FPGA.
In the C patches, the test data looks like this:
1) patch for cgminer-???
const char golden_ob[] =
const char golden_nonce[] = "00468bb4";
const uint32_t golden_nonce_val = 0x00468bb4;

2) patch for cgminer-???
const char golden_ob[] =
const char golden_nonce[] = "000187a2";
const uint32_t golden_nonce_val = 0x000187a2;

So which one is right? The input data is the same, yet the golden nonce is declared differently!!! A paradox. (I'll say up front that the patch for cgminer-??? contains an error: the nonce 0x000187a2 is incorrect. But how much time I wasted on it!)
The project also has a Python test program that reads a text file, extracts data from it, and sends it to the board over a serial port. There the test data looks like this:

Well, that is completely different!
Then I realized that this is not the data sent to the board itself. Data is extracted from it, converted into a job in a special way, and only then sent to the board.
But all the same, among the test data for the Python program there are NO jobs resembling those in the C program!!!
So next I look at the Verilog testbench:
blakeminer #(.comm_clk_frequency(comm_clk_frequency)) uut
    (clk, RxD, TxD, led, extminer_rxd, extminer_txd, dip, TMP_SCL, TMP_SDA, TMP_ALERT);

// TEST DATA (diff=1) NB target, nonce, data, midstate (shifted from the msb/left end) - GENESIS BLOCK
reg [415:0] data = 416'h000007ffffbd9207ffff001e11f35052d554469e3171e6831d493f45254964259bc31bade1b5bb1ae3c327bc54073d19f0ea633b;

// ALSO test starting at -1 and -2 nonce to check for timing issues
// reg [415:0] data = 416'h000007ffffbd9206ffff001e11f35052d554469e3171e6831d493f45254964259bc31bade1b5bb1ae3c327bc54073d19f0ea633b;
// reg [415:0] data = 416'h000007ffffbd9205ffff001e11f35052d554469e3171e6831d493f45254964259bc31bade1b5bb1ae3c327bc54073d19f0ea633b;

reg serial_send = 0;
wire serial_busy;
reg [31:0] data_32 = 0;
reg [31:0] start_cycle = 0;

serial_transmit #(.comm_clk_frequency(comm_clk_frequency), .baud_rate(baud_rate)) sertx
    (.clk(clk), .TxD(RxD), .send(serial_send), .busy(serial_busy), .word(data_32));

Here is a sample data packet that the board is supposed to accept. But again, it bears no resemblance to the data packet in the C program or to the data for the Python test program.
The lack of common test data shared by the Python program, the C code, and the Verilog really spoils the picture. The components have no common points of contact and no common tests, and that is sad.
The Verilog side of the Blakecoin miner project also held yet another form of torment for me.
If you run the project in simulation with the Verilog testbench, then with the test data 416'h000007ffffbd9207ffff001e11f35052d5544 the simulator finds and returns the GOLDEN nonce just fine.
Then I compile the project for a real FPGA board, send the same data from the Python program, and the board does not find the GOLDEN nonce.
It turns out that the test data in the Verilog testbench is "slightly bad". It is meant for low difficulty, where the resulting hash has only 24 leading zero bits rather than the required 32.
The file experimental/LX150-FourPiped/BLAKE_CORE_FOURPIPED.v contains this code:
reg gn_match_d = 1'b0;
always @ (posedge clk)
`ifndef SIM
    gn_match_d <= (IV7 ^ b76 ^ d74) == 0;
`else
    gn_match_d <= (IV7[23:0] ^ b76[23:0] ^ d74[23:0]) == 0;
`endif

The Verilog simulation does not test what will actually run in hardware!
That is, on a real FPGA board we check for 32 leading zero bits, while in simulation we check only 24. Just lovely. I want to smack the author.
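To see how the two checks can disagree on the same data, here is a small sketch that mirrors the two Verilog expressions. It assumes IV7, b76 and d74 are 32-bit words, as the bit slices suggest; the Python function names and the sample values are mine.

```python
MASK_32 = 0xFFFFFFFF
MASK_24 = 0x00FFFFFF


def gn_match_hw(iv7: int, b76: int, d74: int) -> bool:
    """Hardware build: all 32 bits of the XOR must be zero."""
    return ((iv7 ^ b76 ^ d74) & MASK_32) == 0


def gn_match_sim(iv7: int, b76: int, d74: int) -> bool:
    """Simulation build (SIM defined): only bits [23:0] are compared."""
    return ((iv7 ^ b76 ^ d74) & MASK_24) == 0


# Sample words whose XOR is 0xAB000000: the low 24 bits are zero, the top 8 are not.
iv7, b76, d74 = 0xAB123456, 0x00123456, 0x00000000

print(gn_match_sim(iv7, b76, d74))  # accepted in simulation
print(gn_match_hw(iv7, b76, d74))   # rejected in hardware

# For uniformly random words, the 24-bit check passes this many times more often.
ratio = 2**32 // 2**24
print(ratio)
```

A test vector that is a golden nonce only under the 24-bit check will therefore pass in the simulator and never be reported by the real board.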
Of course, I overcame all of this in the end. At least, the Python test program now prints encouraging messages:
Okay, so what's the result? How much did I mine?
Unfortunately, nothing at all.
Just as I was ready to start mining, at the end of January, the Blakecoin difficulty rose sharply:
Now I could leave the board running for a day, and it even found solutions, but the pool did not accept them: still too few leading zeros.
I tried switching to another currency, VCASH. With this currency the pool at least occasionally gave me encouraging messages like this:
But all the same, the VCASH pool does not credit me anything either. Such sorrow.
I would like to take this opportunity to ask knowledgeable people a question.
I have an Nvidia 1060 graphics card. It delivers ???GHash/sec on Blakecoin, and two or three times an hour it produces a nonce that the pool accepts (and pays a few pennies for). I figured that if my FPGA board computes 360 MHash/sec, which is about 3 times worse than the graphics card, then I would get at least one pool-accepted nonce every two hours. However, that does not happen. Not a single penny, even after a whole day. Where the catch is remains a mystery to me.
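That expectation can be put into numbers. A rough sketch, assuming that accepted shares arrive as a Poisson process whose rate is proportional to hash rate and that the pool applies the same share target to both devices; the figures (2 to 3 shares per hour on the GPU, a roughly 3x slower FPGA) are the ones quoted above, and the function names are mine:

```python
import math


def expected_shares(gpu_shares_per_hour: float, slowdown: float, hours: float) -> float:
    """Expected accepted shares for a device `slowdown` times slower than the GPU."""
    return gpu_shares_per_hour * hours / slowdown


def prob_no_shares(expected: float) -> float:
    """Poisson probability of seeing zero events when `expected` are expected."""
    return math.exp(-expected)


low = expected_shares(gpu_shares_per_hour=2, slowdown=3, hours=24)
high = expected_shares(gpu_shares_per_hour=3, slowdown=3, hours=24)

print(f"expected shares per day: {low:.0f} to {high:.0f}")
print(f"chance of a completely empty day: at most {prob_no_shares(low):.1e}")
```

Under these assumptions an empty day would be extremely unlikely to be plain bad luck, so the cause probably lies elsewhere, although this estimate cannot say where.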
In my spare time I am now trying to work out whether the existing FPGA project can be optimized somehow, for example by using the built-in memory or something else. Maybe, if I'm lucky, I'll think of something.