Well, I finished the last item on my checklist for the initial datapath design. Not all of the Zigbee data features are implemented yet (multicasting, binding, and endpoint grouping are still missing), but data can now be transmitted, received, routed, and sent indirectly. It's a start, and enough for a proof of concept on the design direction.

I'm quite satisfied with how the design is turning out. Contiki has been a real life-saver. Not just the OS, but I've discovered that there are a lot of code gems hiding inside. I've rewritten most of my tables to use the LIST functions inside Contiki. They're a library of generic linked list functions, and since they're already being used by the Contiki OS, my use of them to implement some of the queues and tables is basically free (in terms of code space).
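To give a feel for why one generic list library can back many different tables, here's a rough sketch of the idea. It is not Contiki's actual code: the names sk_link, sk_list_add, sk_list_remove, sk_list_length, and nbr_entry are made up for illustration, and the only assumption is that every table entry starts with a link field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical link header. Any struct that starts with one of these
 * can be stored in a list, so one set of functions serves every table. */
struct sk_link { struct sk_link *next; };

/* Example table entry (made-up fields). The link must come first. */
struct nbr_entry {
    struct sk_link link;
    uint16_t addr;
    uint8_t lqi;
};

/* Append item to the tail of the list whose head pointer is *head. */
void sk_list_add(struct sk_link **head, void *item)
{
    struct sk_link *n = item;
    assert(head != NULL && n != NULL);
    n->next = NULL;
    while (*head != NULL)
        head = &(*head)->next;
    *head = n;
}

/* Unlink item if it is on the list; do nothing otherwise. */
void sk_list_remove(struct sk_link **head, void *item)
{
    for (; *head != NULL; head = &(*head)->next) {
        if (*head == (struct sk_link *)item) {
            *head = (*head)->next;
            return;
        }
    }
}

/* Count the entries on the list. */
int sk_list_length(const struct sk_link *head)
{
    int n = 0;
    for (; head != NULL; head = head->next)
        n++;
    return n;
}
```

A neighbor table, a routing table, and a transmit queue can all reuse these three functions. Each new table only costs the entry struct and one head pointer.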

Another one I liked was the callback timer. This is a hidden little gem inside the Rime stack that implements two functions in one: a timer and a callback. Once the timer expires, it will call the function of your choice and even pass data to it. Basically, it rocks! I rewrote all of the network layer timing functions to take advantage of this beautiful library. It got rid of one process and a whole lot of code. I also used it in the MAC layer and will probably modify the APS layer to use it as well.
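For anyone who hasn't seen the pattern, a callback timer can be sketched in a few lines. This is only a model of the concept, not the Rime code: cb_timer, cb_timer_set, cb_timer_poll, and the rreq example are all hypothetical names, and the sketch assumes a free-running tick counter that the main loop passes in.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*cb_fn)(void *data);

/* Hypothetical callback timer: a deadline plus the function to call
 * and the data to hand to it. */
struct cb_timer {
    unsigned long start;  /* tick when the timer was armed */
    unsigned long delay;  /* ticks to wait before firing */
    cb_fn fn;
    void *data;
    int active;
};

/* Arm the timer to call fn(data) once delay ticks have passed. */
void cb_timer_set(struct cb_timer *t, unsigned long now,
                  unsigned long delay, cb_fn fn, void *data)
{
    assert(t != NULL && fn != NULL);
    t->start = now;
    t->delay = delay;
    t->fn = fn;
    t->data = data;
    t->active = 1;
}

/* Call from the main loop. Fires at most once per arming and
 * returns 1 when the callback ran. Unsigned subtraction keeps the
 * comparison valid across tick counter wraparound. */
int cb_timer_poll(struct cb_timer *t, unsigned long now)
{
    assert(t != NULL);
    if (!t->active || now - t->start < t->delay)
        return 0;
    t->active = 0;
    t->fn(t->data);
    return 1;
}

/* Made-up example: count a retry when a route request times out. */
struct rreq { int retries; };

void rreq_timeout(void *data)
{
    ((struct rreq *)data)->retries++;
}
```

The nice part is that the timeout handler gets its context through the data pointer, so there's no need for a separate process that wakes up, scans a table, and figures out which entry expired.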

These two libraries, along with the Contiki process handling, saved me a lot of effort that would have probably extended both the design and debugging time of this project by a factor of 2 to 4.

Another thing that saved my ass multiple times is my self-checking tests. Sure it was a pain to write the tests, and to write the self-checking framework, and the expected values, but it caught so many bugs that it proved to me how much of an idiot I was. I made some of the stupidest mistakes in the world, and I probably wouldn't have caught them without it. A good example is the off-by-one bug I found recently. The self-checker caught a problem where the data length was one byte longer than it should be. It turned out that it was caused by a typo. Ha ha ha. I almost cried. But without the tests, it would have never gotten caught. And those off-by-one bugs really suck to debug in real hardware. 

Anyways, I'm satisfied with the data path in its limited current state, and will proceed on to the MAC and NWK management functions. But first, I'm gonna take a day or two away from this stack. It's become too obsessive for me, and I want to enjoy some of my new free time (I converted to part-time work at my job so I could spend more time with my wife, dog, and this fucking stack).


I've spent the past couple of days working on the data path in the MAC layer and it wasn't as easy as I expected it to be. I think the toughest part was making some hard decisions, literally. The MAC layer is where the line between hardware and software starts to get blurry. Since it's so close to the hardware layer, you start running into issues where chip architecture starts to matter. Here is an example.

Many chips have a feature called auto-ACK. It's a nice feature because if the ACK request bit is set in the MAC frame, then the chip will automatically detect it and send an ACK back to the node that issued the frame. Easy, right? But here's the rub. IEEE 802.15.4 has a feature called indirect transfers. It's used for devices that sleep a lot, like battery-powered end devices. A router would basically buffer a frame destined for the sleepy end device. When the device wakes up, it will send a data request to the router to see if there are any messages for it. If the router has a message for it, it will indicate it in the ACK to the data request by setting the "frame pending" bit.

This screws up the whole auto-ACK thing because now you have two types of ACKs: a regular one, which just acknowledges that the device received the frame, and a special one, which indicates that you received the data request and that you have a frame pending for that device.
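The decision the software (or the chip) has to make can be sketched like this. It's just an illustration of the logic described above, not code from the stack: indirect_q, rx_info, ack_kind, and choose_ack are hypothetical names, and the indirect queue is reduced to a plain array of destination addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INDIRECT_MAX 4

/* Hypothetical indirect queue: addresses of sleepy devices that
 * currently have a frame buffered for them. */
struct indirect_q {
    uint16_t dest[INDIRECT_MAX];
    size_t count;
};

/* The few things about a received frame that matter for the ACK. */
struct rx_info {
    int ack_request;      /* ACK request bit was set in the frame */
    int is_data_request;  /* frame is a data request command */
    uint16_t src;         /* address of the sender */
};

enum ack_kind { ACK_NONE, ACK_PLAIN, ACK_FRAME_PENDING };

/* Pick which ACK, if any, should answer the received frame. */
enum ack_kind choose_ack(const struct indirect_q *q,
                         const struct rx_info *rx)
{
    size_t i;

    assert(q != NULL && rx != NULL && q->count <= INDIRECT_MAX);
    if (!rx->ack_request)
        return ACK_NONE;
    if (rx->is_data_request) {
        /* Only a data request can earn the frame pending bit, and
         * only if something is buffered for that sender. */
        for (i = 0; i < q->count; i++) {
            if (q->dest[i] == rx->src)
                return ACK_FRAME_PENDING;
        }
    }
    return ACK_PLAIN;
}
```

The catch is that a chip doing auto-ACK on its own can't see the indirect queue, so it has no way to run the lookup in the middle of this function unless the software tells it the answer ahead of time.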

It was pointed out to me that my introduction in the About button is woefully inadequate. Let me start over and introduce myself.

My name is Chris Wang, aka Akiba. Akiba is slang for Akihabara in Tokyo, which is nicknamed Electric City. It's also Japanese slang for geek (otaku).

My current (up to last week) occupation was as part of the sales team (I can already hear the eyes rolling) at a semiconductor company, and my specialty was in USB. I recently quit so I could spend more time with my family and try to finish the Zigbee project. Now let me tell you a bit about my background.

I didn't actually start my career out as an engineer. I was originally a professional dancer and toured with artists doing dance videos and concerts. My specialty at that time was breakdancing, locking, and new-school hip hop dance styles. At that time, I had a dance crew and we would always be together, either practicing or performing at gigs. That crew was called "Freaks of Nature", hence the name FreakLabs.

Unfortunately, as we got more known, people from the crew got recruited to become artists in Taiwan (since we were mostly Chinese). The girl became part of a pop group called Babes, two of the guys went on to form/become Machi, and another is part of a group called F4. That basically left me alone back in Southern California with nobody to dance or do gigs with. In case you were wondering, I did get one record contract, but my mom made me turn it down and go back to school.

Tired of the destitute life of a dancer, I went back to the university to finish my degree in Applied Physics. Apparently, it was even harder to get a job as a physicist than as a dancer, since you needed a PhD to really get anywhere. Luckily, applied physics carried over easily into electrical engineering, so I took two more classes and was able to squeeze out an EE degree as well.

My first job out of school was doing FPGA design at a startup. That was back when you still had to cobble all the tools together in the Xilinx Foundation package and squeeze your code to fit them into an XC40xx FPGA. I would compile the designs on a Pentium 100 and one compile run would take six hours to a day depending on the version of the Xilinx software. At that time, the state of FPGAs was horribly buggy. It doesn't seem like it's much different today.

Found a nasty bug today. I've been working on the MAC layer recently and stumbled across this one by mistake. Currently, when a frame is received, it gets thrown into a queue and a receive event is posted to the MAC layer. The MAC layer will then process the frame. However, I made a mistake when I was writing one of my tests and accidentally generated two receive interrupts sequentially. Only one frame was taken out of the queue though and the second one was lost.

It turns out that after an event is processed, it will be cleared which means that the second event also got cleared from the event queue. Anyways, I modified the code to process frames until the buffer is empty in case two or more frames arrive back-to-back. It fixed the test and it logically removes the bug as well. Glad I took the time to set up a test environment and write a bunch of tests on the stack. That bug would have been at least a few days to a week to debug in actual hardware.
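Here's a small model of the fix, assuming the event behaves like a single flag that gets cleared once it's handled. It isn't the stack's code: rx_queue, rx_isr, and mac_rx_event are hypothetical names, and frames are stood in for by plain integers.

```c
#include <assert.h>
#include <stddef.h>

#define RXQ_SIZE 4

/* Hypothetical receive queue shared by the interrupt and the MAC. */
struct rx_queue {
    int frames[RXQ_SIZE];  /* stand-ins for frame buffer handles */
    size_t head;           /* index of the oldest queued frame */
    size_t count;          /* number of queued frames */
    int event_pending;     /* one flag, however many interrupts fire */
};

/* Interrupt side: queue the frame and post the receive event.
 * Returns 0 if the queue was full and the frame was dropped. */
int rx_isr(struct rx_queue *q, int frame)
{
    assert(q != NULL);
    if (q->count == RXQ_SIZE)
        return 0;
    q->frames[(q->head + q->count) % RXQ_SIZE] = frame;
    q->count++;
    q->event_pending = 1;
    return 1;
}

/* MAC side: one event may stand for several frames, so keep going
 * until the queue is empty. Returns the number of frames handled. */
int mac_rx_event(struct rx_queue *q)
{
    int handled = 0;

    assert(q != NULL);
    if (!q->event_pending)
        return 0;
    q->event_pending = 0;
    while (q->count > 0) {
        /* Real code would parse q->frames[q->head] here. */
        q->head = (q->head + 1) % RXQ_SIZE;
        q->count--;
        handled++;
    }
    return handled;
}
```

With an if in place of the while, two back-to-back interrupts would leave the second frame stranded in the queue with no event left to wake the MAC, which is the bug described above.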

Now that I'm approximately into the sixth week of the FreakZ stack development, thought I'd provide everyone with a rough status update.

I'm basically finished with the NWK layer data path and the APS data path was also implemented, albeit without the group and binding tables. I think the data path is a major portion of the stack, or at least the most important one. It provides the majority of the functionality of the stack, with the rest being network management functions such as scanning for new networks, joining devices to the networks, removing them, etc. The data path is not fully implemented in each of the layers, but the basic functionality is there. The APS layer can handle ACK'd and non-ACK'd transmissions, the NWK layer can handle data routing and broadcasting, and the MAC can already handle unicasting and broadcasting, although more work needs to be done on this layer.

This all being said, I did a sizing on GCC, currently using the x86 port. I don't know how much it will change if you try a different target, but at least this can give you a rough idea:

Code size: ~16k
RAM size: ~3.2k

These sizes include a stripped down version of the Contiki OS, which is already quite small. The stack is taking up about 13-14k and Contiki is roughly 3k. The main contributor to RAM usage is the buffer pool and the numerous tables that make up the Zigbee stack. I'm currently using six frame buffers which take up about 1k of RAM and most of my tables hold about ten entries. It's possible to reduce the frame buffers to about three (saving about 600 bytes of RAM), although you may get more dropped frames. You can even go down to two frame buffers if you feel adventurous.

Man. I spent the last three days learning how to use Flash so I could create this simple tutorial. You'd think that it would be easy to create a simple interactive thing like this, but it was actually extremely painful. You can't just draw the symbols and move things around. You actually need to program in a language that Adobe (Macromedia) made called ActionScript. The programming part is no problem, but searching the massive documentation to find the right method to do what I wanted caused me to pound a large hole in the wall with my head. I still don't know why it requires a zillion languages just to do some stuff on the web.

Anyways, this is something that I've been wanting to do for a long time. Trying to explain Zigbee mesh routing using just words is very difficult, and has the tendency to cause people's eyes to glaze over. With an interactive simulation, you can do a play-by-play analysis of what's going on which hopefully will make it easier to see the flow of events that makes up the AODV routing algorithm. I originally had the idea when I was studying the routing algorithm myself. It's almost impossible to understand what's going on by reading the Zigbee spec. That's like trying to understand how to use Linux by reading the source code. You need to be a real masochist. 

There weren't many good tutorials on the web for AODV mesh routing, either. So to save the people the same pain I had to go through in studying it, I created this little applet. To move forward one frame, just click anywhere on the graphic. To move back, press the space bar. Flash doesn't have a handler for the right-click button, because Apple users usually only have one mouse button. You can't tell who will be using the Flash file since it's accessed inside a web browser. You'll need to have Adobe Flash v9 installed (or higher depending on when you're reading this) in order to use it. Otherwise, it will probably just give you a bad graphic, no graphic, or it will cause your computer to explode.

I'm also planning to try and make other tutorials similar to this for tree routing, source routing, and possibly some other aspects of Zigbee. I think it's much easier to learn visually than by reading that monster document. Without further ado, here's the tutorial. Hope you like it. If there are any problems, leave a comment for me...especially if you find bugs.

I've been working on the NWK layer broadcasts all week. When I first started it, I thought it wouldn't be too difficult. I mean broadcasts are just frames that you send out with a broadcast address, right?

Unfortunately, that didn't turn out to be the case. There's this nasty little problem with broadcasts on wireless networks, where if you mishandle it, you end up with an infinite broadcast loop. I had nightmares of generating broadcast storms that would spread from node to node and never end. The problem is that broadcasts are quite sensitive to the stop conditions.

When you receive a broadcast from a neighbor node, you can't just forward it. You need to know more info about it such as:
1) is it a new broadcast?
2) is it a broadcast that you previously transmitted and that a neighbor node is forwarding back to you?
3) is it a new broadcast, but has the same identifier as the previous one?

Passive ACK
These are just some of the questions that come up when you handle these things, and the wrong response could make you end up in an infinite broadcast loop. Zigbee takes care of these situations by requiring the use of a broadcast transaction table (BTT). Each entry in the BTT consists of at least the address of the node that forwarded the broadcast and the broadcast identifier. Once you implement the BTT, you then need to create an entry for each broadcast that you receive and keep a record of which device sent it. That way, you can filter new broadcasts from ones that you have already forwarded. They call this a "passive ack" system, since you're not allowed to use the 802.15.4 ACK on broadcast frames.
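A stripped-down version of that filtering might look like the sketch below. It's only a sketch under stated assumptions: btt, btt_entry, and btt_should_forward are hypothetical names, entries never expire (a real table would need a timeout so identifiers can be reused), and a full table simply refuses new broadcasts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BTT_SIZE 8

/* Hypothetical BTT entry: the two fields mentioned above plus a
 * flag that marks the slot as taken. */
struct btt_entry {
    uint16_t addr;  /* address recorded for the broadcast */
    uint8_t id;     /* broadcast identifier */
    int used;
};

struct btt {
    struct btt_entry e[BTT_SIZE];
};

/* Returns 1 if the broadcast is new (it gets recorded and should be
 * forwarded) and 0 if it is a repeat or the table has no room. */
int btt_should_forward(struct btt *t, uint16_t addr, uint8_t id)
{
    struct btt_entry *free_slot = NULL;
    size_t i;

    assert(t != NULL);
    for (i = 0; i < BTT_SIZE; i++) {
        struct btt_entry *e = &t->e[i];
        if (e->used && e->addr == addr && e->id == id)
            return 0;  /* already seen: this is the stop condition */
        if (!e->used && free_slot == NULL)
            free_slot = e;
    }
    if (free_slot == NULL)
        return 0;  /* assumption: drop when the table is full */
    free_slot->addr = addr;
    free_slot->id = id;
    free_slot->used = 1;
    return 1;
}
```

The lookup before the insert is what breaks the loop: when a neighbor rebroadcasts your own frame back to you, the address and identifier match an existing entry and the frame stops there.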