Joined: 04 Nov 2006 Posts: 89 Location: The Dalles, Oregon USA
Posted: Sun Sep 20, 2009 8:06 Post subject: nvram show from cfe truncates output and locks up router
While trying to troubleshoot WL-500W bricking issues, I ran into an interesting situation. (12533 mega)
In brief, nvram show from cfe not only causes the router to stop responding and require a reboot (after some output), but its output is also massively truncated compared with nvram show from an ssh console.
I am particularly interested in seeing someone do the following with another Asus WL-500W as well as with some other make/model of router.
Can someone with serial console access to their router do the following for me?
- Obtain serial access.
- Break startup by holding Ctrl+C while plugging in the router until ^C characters start showing on screen.
- Run nvram show.
- Take note of the last couple of variables displayed and/or (preferably) copy the entire output to a file.
- Did the router stop responding to keyboard input?
- Hard reboot the router (if locked up; otherwise use the 'reboot' command).
- Log in via ssh or telnet.
- Run nvram show from the Linux prompt.
- Take note of the last couple of variables shown and/or (preferably) copy the output to a file.
Then:
- Compare the two files / last variables (cfe vs Linux console nvram show).
- Do they match?
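If you end up with both dumps saved as text, the comparison can be automated. Here is a rough sketch in C, assuming each dump has one name=value entry per line; count_missing_in_cfe and dump_has_name are names made up for this example, not anything from the router source. Values with embedded newlines (keys, certs) will confuse it, since a continuation line containing '=' looks like a variable name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if `dump` has a line that starts with name= (name is n bytes long). */
static int dump_has_name(const char *dump, const char *name, size_t n)
{
    const char *line = dump;

    while (line && *line) {
        if (strncmp(line, name, n) == 0 && line[n] == '=')
            return 1;
        line = strchr(line, '\n');
        if (line)
            line++;              /* step past the newline */
    }
    return 0;
}

/*
 * Count variables that appear in `linux_dump` but not in `cfe_dump`.
 * Lines without '=' (such as the trailing "size: ..." line) are skipped.
 * Each missing name is printed to stdout.
 */
int count_missing_in_cfe(const char *cfe_dump, const char *linux_dump)
{
    int missing = 0;
    const char *line = linux_dump;

    assert(cfe_dump != NULL && linux_dump != NULL);
    while (line && *line) {
        const char *eol = strchr(line, '\n');
        size_t len = eol ? (size_t)(eol - line) : strlen(line);
        const char *eq = memchr(line, '=', len);

        if (eq && eq != line &&
            !dump_has_name(cfe_dump, line, (size_t)(eq - line))) {
            printf("missing from CFE dump: %.*s\n", (int)(eq - line), line);
            missing++;
        }
        line = eol ? eol + 1 : NULL;
    }
    return missing;
}
```

Read both files into memory and pass them in; a non-zero return means the cfe output was cut short relative to the Linux one.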
I don't *think* that nvram show in cfe should crash the router or cfe, but I am not sure.
I can't help but wonder whether some CFE or other system bug is corrupting nvram and thus causing the bricking.
Posted: Wed Sep 23, 2009 5:38 Post subject:
Looking through the source for the router, I think I may have found a hard limit on the nvram show length that depends on a set of values fixed at compile time by Asus... (if Asus tweaked the values in the Broadcom code at all)
First, let's look at the code that you are actually calling from the command line when you run nvram show
From GPL_WL_500W_2006/WL500W/src/cfe/cfe/arch/mips/board/bcm947xx/src/
So, for the nvram 'show' command, it appears the code checks whether there is enough memory to buffer the nvram output, and if so it calls nvram_getall, passing buf and NVRAM_SPACE.
Once it has retrieved nvram using that function, it cycles through formatting and printing it with the for statement.
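To make the printing step concrete, here is a small sketch in C of walking a buffer in the name=value\0 ... \0\0 layout that the bcmnvram.h comment describes. This is not the CFE code (that listing is not reproduced in this thread); print_nvram_buf and its cap parameter are hypothetical names for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Print every entry in a buffer laid out as name=value\0 ... \0\0 and
 * return the number of bytes the entries occupy (their NULs included).
 * `cap` bounds the walk so a buffer that lacks the final empty string
 * cannot run the loop past the end.
 */
size_t print_nvram_buf(const char *buf, size_t cap)
{
    size_t off = 0;

    assert(buf != NULL);
    while (off < cap && buf[off] != '\0') {
        const char *end = memchr(buf + off, '\0', cap - off);
        size_t len = end ? (size_t)(end - (buf + off)) : cap - off;

        printf("%.*s\n", (int)len, buf + off);
        off += len + 1;          /* skip the entry and its NUL */
    }
    return off > cap ? cap : off;
}
```

The point of the cap is that a loop relying only on the terminating empty string has nothing to stop it if the buffer is damaged or was never fully terminated.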
This leads to 2 questions...
1. What is the definition of nvram_getall and
2. Where is NVRAM_SPACE defined?
Ok, so found part of the nvram_getall here:
GPL_WL_500W_2006/WL500W/src/include/bcmnvram.h
Code:
/*
* Get all NVRAM variables (format name=value\0 ... \0\0).
* @param buf buffer to store variables
* @param count size of buffer in bytes
* @return 0 on success and errno on failure
*/
extern int nvram_getall(char *nvram_buf, int count);
/*
* returns the crc value of the nvram
* @param nvh nvram header pointer
*/
uint8 nvram_calc_crc(struct nvram_header * nvh);
#endif /* _LANGUAGE_ASSEMBLY */
/* The NVRAM version number stored as an NVRAM variable */
#define NVRAM_SOFTWARE_VERSION "1"
#define NVRAM_CRC_START_POSITION 9 /* magic, len, crc8 to be skipped */
#define NVRAM_CRC_VER_MASK 0xffffff00 /* for crc_ver_init */
#ifdef __cplusplus
}
#endif
#endif /* _bcmnvram_h_ */
Notice the hard-coded constant defined in this header: #define NVRAM_SPACE 0x8000. I have no idea what is going on here, as there doesn't appear to be any action on the count variable or the *nvram_buf pointer.
How much space does that actually consist of? Well, 8000 hex is 32768... but does that tell us anything? Is it bits? Bytes?
Ah, in another place we seem to have the definition of nvram_getall, which actually acts on the variables (I think the header above only declares the function and this file provides the body):
GPL_WL_500W_2006/WL500W/src/linux/linux/arch/mips/brcm-boards/bcm947xx/nvram_linux.c
Code:
nvram_getall(char *buf, int count)
{
    unsigned long flags;
    int ret;
    spin_lock_irqsave(&nvram_lock, flags);
    ret = _nvram_getall(buf, count);
    spin_unlock_irqrestore(&nvram_lock, flags);
    return ret;
}
which calls _nvram_getall in GPL_WL_500W_2006/WL500W/src/shared/nvram.c....
Code:
_nvram_getall(char *buf, int count)
{
    uint i;
    struct nvram_tuple *t;
    int len = 0;
    bzero(buf, count);
    /* Write name=value\0 ... \0\0 */
    for (i = 0; i < ARRAYSIZE(nvram_hash); i++) {
        for (t = nvram_hash[i]; t; t = t->next) {
            if ((count - len) > (strlen(t->name) + 1 + strlen(t->value) + 1))
                len += sprintf(buf + len, "%s=%s", t->name, t->value) + 1;
            else
                break;
        }
    }
    return 0;
}
Ok, so if you follow through these, "NVRAM_SPACE" appears to be initially hard-set by the #define statement (bcmnvram.h) then passed into nvram_getall (nvram_linux.c) as variable 'count', and passed again to _nvram_getall (nvram.c), where nvram is actually iterated through.
I do not quite understand how _nvram_getall uses NVRAM_SPACE, but it does appear to use it in the comparison that decides when to stop iterating through nvram: (count - len) is the room left in the buffer, and a name=value pair is only copied if it fits.
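To see what that comparison does in practice, here is a stripped-down user-space sketch in C of the same fit check. It uses a flat array instead of the nvram_hash buckets, and struct pair and pack_pairs are invented for this example, so treat it as an illustration of the idea only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct pair { const char *name; const char *value; };

/*
 * Pack pairs into buf as name=value\0 ... \0\0, stopping at the first
 * pair that does not fit in the remaining (count - len) bytes.
 * Returns the number of pairs actually written.
 */
size_t pack_pairs(char *buf, size_t count, const struct pair *pairs, size_t n)
{
    size_t len = 0, i;

    assert(buf != NULL && pairs != NULL);
    memset(buf, 0, count);
    for (i = 0; i < n; i++) {
        /* name + '=' + value + terminating NUL */
        size_t need = strlen(pairs[i].name) + 1 + strlen(pairs[i].value) + 1;

        if (count - len > need)
            len += (size_t)sprintf(buf + len, "%s=%s",
                                   pairs[i].name, pairs[i].value) + 1;
        else
            break;               /* no room left: the rest is dropped */
    }
    return i;
}
```

With a small count the later pairs simply never make it into the buffer. Note also that the quoted _nvram_getall returns 0 whether or not everything fit, so a caller has no way to know the output was cut short, and its break only leaves the inner loop, so entries in later hash buckets may still be copied if they are small enough.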
But what is the NVRAM_SPACE variable actually set to? Is it set statically to 0x8000 by the above include file?
A bit more looking for how the variable was used, and found this little gem...
What this tells us is that the output from nvram show designating free space is a calculation made using that same NVRAM_SPACE variable.
From the dd-wrt command prompt, nvram show works fine on the Asus WL-500W and prints a line at the end giving the used size and the free space 'left'. The two numbers add up to 32768, which is the value set above... answering the question: 0x8000 is a number of bytes.
Code:
size: 29709 bytes (3059 left)
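The arithmetic here is 29709 + 3059 = 32768 = 0x8000. A tiny sketch in C that does the same check on any such line, assuming the exact "size: N bytes (M left)" wording shown above; nvram_total_from_size_line is a hypothetical helper, not part of any firmware.

```c
#include <assert.h>
#include <stdio.h>

#define NVRAM_SPACE 0x8000   /* 32768 bytes, the value quoted in this thread */

/*
 * Parse a line of the form "size: 29709 bytes (3059 left)" and return
 * used + left, or -1 if the line does not match that form.
 */
long nvram_total_from_size_line(const char *line)
{
    long used, left;

    assert(line != NULL);
    if (sscanf(line, "size: %ld bytes (%ld left)", &used, &left) != 2)
        return -1;
    return used + left;
}
```

If the total from a cfe dump differed from the total from a Linux dump, that would point at the two being built with different NVRAM_SPACE values.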
Does dd-wrt rely on CFE to call nvram functions? I wouldn't think it does, as CFE is a bootloader...
If dd-wrt doesn't interface with CFE for nvram functions, then the value set for NVRAM_SPACE during compilation could be different between the CFE and dd-wrt code, right?
In the case of truncated output, that could be what is biting us here... NVRAM_SPACE may have been set much smaller in CFE than in dd-wrt.
Could this cause other issues with dd-wrt flashing to nvram? We need to look more closely into where else NVRAM_SPACE is used...
Of course, this is all just some educated guessing and conjecture..
Posted: Thu Sep 24, 2009 3:19 Post subject:
Okay,
Did a round of troubleshooting tonight, including flashing back to stock firmware and checking cfe nvram show, upgrading firmware and checking again, then changing settings and checking again.
The result is that something in the closing of the ssh and vpn certs / keys is crashing nvram show in cfe. In one variable's case it actually caused a panic/reboot.
This indicates we need to figure out which characters are causing the problem... note that it did not happen on the openvpn_dh variable.
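One way to narrow down the characters is to list every byte in a value that is not plain printable ASCII. The sketch below, in C, does that. It is only a diagnostic aid: which bytes (if any) actually upset CFE is unknown, the guess that the key values carry embedded newlines is an assumption, and report_suspect_bytes is a made-up name.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/*
 * Print every byte in `value` that is not printable ASCII (for example
 * a newline embedded in a key) and return how many such bytes were found.
 */
size_t report_suspect_bytes(const char *name, const char *value)
{
    size_t i, hits = 0;

    assert(name != NULL && value != NULL);
    for (i = 0; value[i] != '\0'; i++) {
        unsigned char c = (unsigned char)value[i];

        if (!isprint(c)) {
            printf("%s: byte 0x%02x at offset %lu\n",
                   name, (unsigned)c, (unsigned long)i);
            hits++;
        }
    }
    return hits;
}
```

Running it over a value that crashes cfe and over openvpn_dh, which does not, and comparing the two reports would be a reasonable first step.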
I think this type of problem *may* still be causing the bricking, but it's hard to tell if this is *normal* on most routers...
No one else can help? All I need is for someone to set up ssh access so that nvram includes sshd_dss_host_key, then serial in to cfe and do an nvram show...
More detailed information below.
Reflashed the router using the wl500g-clear-nvram.trx, wl500g-recover.trx then the factory WL500W_2.0.0.6_EN_CN_TW_DE_KR.trx.
Confirmed that nvram show from cfe would complete successfully, and checked the NVRAM_SPACE variable by adding the used and left.
Code:
...
url_date_x=1111111
ddns_server_x=
wl_antdiv=-1
usb_bannum_x=0
size: 8522 bytes (24246 left)
*** command status = 0
CFE>
8522 + 24246 = 32768 -- Not the problem!? Okay...
So the problem is not a compile-time change, since 32768 is the same value that dd-wrt uses. Hmm. Well, let's get into dd-wrt and see what happens with cfe nvram show...
Flashed with dd-wrt.v24_mini_asus.trx to see if nvram show in cfe would work...
nvram show from cfe still completes successfully (and adds to 32768)
Used the cfe 'save' command to save a copy of all of the area defined as text by cfe...
Code:
CFE> save 192.168.11.10:nvram_default_mega 0x80800000 203184
TFTP Client.
2109828 bytes written to 192.168.11.10:nvram_default_mega
Ok, so now cfe nvram show is working just fine... time to start configuring...
First, let's configure my standard stuff on the services page...
Does it still work?
Weird bug... and it is questionable whether it is causing the bricking or not, but I HAVE had the most problems with bricking on my routers while I was configuring VPN on the services page...
Joined: 01 Feb 2007 Posts: 138 Location: Wherever the boat takes me.
Posted: Fri Sep 25, 2009 3:47 Post subject:
Those are characters that many programmers have trouble with, especially on Linux. I do know that dd-wrt chokes on them if they are used in a value in nvram.
There are others, but those are the biggest problems. They are legal characters, just not programmer-friendly, and they usually require special handling when used in variables.
Ray