CWNE Essays – TalesFromtheWi-Fi

To help prevent plagiarism, there has been a growing trend to publish the essays you submitted for your CWNE. So, I’m joining and publishing mine here.

Essay #1 – Low Power Mode

Background

An increase in Wi-Fi complaints frequently occurred following power disruptions of the power sourcing equipment (PSE). A cursory analysis revealed the access points were running in low-power mode, which left the access points and auto-RRM unable to properly monitor and manage the RF environment. The problem was noted on wireless access points connected to and receiving power from a variety of Cisco switches. This essay explores why the access points were powering up in low-power mode and how to prevent it.

Body

This issue only affected sites with Cisco Meraki MR53 wireless access points installed. Users in these locations often complained of poor throughput following power outages. The MR53 is a three-radio access point. One of the radios is a dedicated scanning and security radio.

Investigation of the wireless access points in the areas where complaints originated revealed that the RF Spectrum view of the access points was not reporting data as expected.

Figure 1 – Missing RF Spectrum Data

The real-time FFT and waterfall spectrograms were unavailable. This was an indication that the dedicated scanning radios were not running. See Figure 1.

It was also noted that the channel distribution and output power of neighboring access points were not as expected. At times, neighboring access points would be found using the same 5 GHz channel.

The issue was observed on switches serving as few as a single wireless access point.

The power section of the access point summary information page revealed that the access points were running in low-power mode. See Figure 2.

Figure 2 – Low Power Mode

This issue was not noted during installation or verification as the problem does not occur as frequently when access points power on individually or in small groups. Given that the switches are not provided power via UPS, the issue would occur after a site-based or localized power disruption.

Further analysis revealed multiple access points fed by the same PSE experiencing the same problem. Examination of the PSE showed it was only providing 20.0 watts, as listed under “Power drawn from the source:” and “Power available to the device:”, to access points in low-power mode instead of the expected 23.6 watts. Compare the details in Figures 3 and 4.

Figure 3 – CDP Negotiation

The PSE devices throughout the organization were various models of Cisco PoE capable switches with varying versions of firmware, but the same problem was noted on all models and firmware versions. The PSE was not oversubscribed and had sufficient available power to provide to the access points.

Several iterations of switch reboots were performed during maintenance windows, and the problem was confirmed as recurring after issuing a reload command. Following each reboot, a varying number of access points would boot in low-power mode, but not always the same ones.

Further analysis of the PSE showed that the access points running in low-power mode had negotiated power via CDP and access points running in full-power mode had negotiated via LLDP. See “Power Negotiation Used:” in Figures 3 and 4.
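The comparison described above can be sketched in Python. This is only an illustration, not tooling used during the investigation: the `PortPower` record, its field names, and the sample port names are hypothetical. Only the 20.0 W and 23.6 W figures and the CDP/LLDP distinction come from the observations above.

```python
from dataclasses import dataclass

# Power allocation observed for access points in full-power mode (from the text).
EXPECTED_WATTS = 23.6


@dataclass
class PortPower:
    """Hypothetical record of one PoE port, transcribed by hand from the PSE."""
    port: str
    protocol: str  # negotiation protocol reported by the PSE: "CDP" or "LLDP"
    watts: float   # power available to the device


def low_power_ports(ports, expected=EXPECTED_WATTS):
    """Return the ports whose allocated power is below the expected value."""
    return [p for p in ports if p.watts < expected]


# Hypothetical sample data mirroring the pattern seen in Figures 3 and 4.
sample = [
    PortPower("Gi1/0/1", "CDP", 20.0),
    PortPower("Gi1/0/2", "LLDP", 23.6),
    PortPower("Gi1/0/3", "CDP", 20.0),
]

flagged = low_power_ports(sample)
for p in flagged:
    print(f"{p.port}: {p.watts} W via {p.protocol}")
```

Grouping the flagged ports by protocol in this way is what makes the pattern stand out: in the sample, every under-powered port negotiated via CDP.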

Figure 4 – LLDP Negotiation

Figure 5 – Functional RF Spectrum Page

I then disabled CDP on all switch ports with access points and rebooted the switch. Through multiple reboots, the access points booted in full-power mode. This was confirmed via the access point summary page, RF Spectrum page, as well as on the PSE. See Figures 4 and 5.
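The exact commands used are not shown in this essay. As a rough sketch, the per-port change could be generated with a few lines of Python. It assumes Cisco IOS-style syntax, where `no cdp enable` under an interface disables CDP on that port, and it assumes LLDP is already running on the switch. The port names are hypothetical.

```python
def cdp_disable_config(ports):
    """Build interface-level config lines that disable CDP on each given port."""
    lines = []
    for port in ports:
        lines.append(f"interface {port}")
        lines.append(" no cdp enable")  # assumed IOS-style interface command
    return "\n".join(lines)


# Hypothetical list of switch ports that feed access points.
ap_ports = ["GigabitEthernet1/0/1", "GigabitEthernet1/0/2"]
config = cdp_disable_config(ap_ports)
print(config)
```

Generating the lines from a list of access point ports keeps the change limited to those ports, leaving CDP untouched elsewhere on the switch.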

I re-enabled CDP, rebooted the PSE, and the problem returned.

Having noted previously that warm reboots (reload) and cold reboots (power disruption) sometimes affect switches differently, I selected additional test-case switches in areas of the organization that experienced frequent power disruption. CDP was then disabled on switch ports feeding access points. The switches were monitored, and switches and access points were examined following confirmed power disruptions. The continued monitoring confirmed that disabling CDP forced power negotiation to occur via LLDP and prevented the access points from entering low-power mode. The test environment was then expanded to include an entire site, and I again waited for a power disruption to occur before disabling CDP at a larger scale. After the switches rebooted following a power disruption, the result was the same as before: the access points continued to negotiate power via LLDP and operate in full-power mode.

After the access points were returned to full-power mode and were operating at full capability, the users no longer complained of Wi-Fi connectivity issues after power disruptions.

Once confidence was gained that disabling CDP resolved the issue and did not introduce any unexpected behavior, CDP was disabled throughout the organization on switch ports feeding access points.

Summary

After extensive testing, verification, and monitoring, it was determined that Cisco Meraki MR53 access points negotiating power via CDP were entering low-power mode following a power disruption to the PSE. Disabling CDP on the switch ports forced the access points to negotiate power via LLDP. This prevented the access points from entering low-power mode following a reload or power disruption to the PSE. It kept the access points operating at full capability, with the scanning radio enabled and able to monitor the RF environment, which allowed auto-RRM to have accurate information from all access points in an area and make appropriate channel and output power changes.

Citations

https://documentation.meraki.com/MR/Monitoring_and_Reporting/Low_Power_Mode

https://meraki.cisco.com/lib/pdf/meraki_datasheet_MR53.pdf

Essay #2 – IPSK Design

Background

A large K-12 school system was removing site-based servers and migrating DHCP services back to centralized DHCP servers housed in its data center. Without site-based DHCP services, local network access was unavailable to network support staff (NSS) during outages that disrupted those services. Because NSS could not get an address on any of the wired or wireless networks when going onsite to troubleshoot, any local resources used to perform debugging were inaccessible from client devices and tools.

The only available SSIDs were a WPA2-PSK secured network and a captive-portal controlled BYOD network with internet-only connectivity. A design solution was required that would allow NSS to perform site-based analysis during outages without implementing major changes to the existing wireless network. The expectation was to maintain the current level of wireless network security without increasing the number of SSIDs. A Cisco Meraki Identity PSK (IPSK) without RADIUS solution was designed to layer additional configuration on top of the existing WPA2-PSK SSID already in use throughout the organization.

Body

With onsite file servers being deprecated in favor of data being stored in the cloud, DHCP services residing on those servers were no longer being provided onsite and were moved to centralized servers in the organization’s data center. At times, NSS, while onsite, required network access for their devices to debug various network outages. When outages disrupted DHCP services, troubleshooting was made more challenging. The only access available was via console ports on core network components. At times, even the console was not available. When buildings were inaccessible after hours, NSS could perform baseline triage using local network access through external wireless access points providing coverage to carpool lanes and bus lots.

A dedicated VLAN, Subnet, and DHCP scope were chosen, and the local core switch was employed to host DHCP services. That same VLAN was also mapped to a Group-Policy in the Meraki dashboard for use by NSS. The current WPA2-PSK SSID was changed to a WPA2-IPSK SSID. The IPSK SSID configuration then mapped the Group-Policy to the passphrase. The existing passphrase was mapped to a base Group-Policy providing the same connectivity users had previously. The NSS Group-Policy mirrored the base Group-Policy except it was bound to a new complex passphrase generated using the ‘openssl rand -base64’ command via OpenSSL.
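For readers without OpenSSL at hand, a comparable passphrase can be produced with Python's standard library. This is a sketch rather than the command that was actually run: the essay does not state how many random bytes were requested, so the 24-byte default below is an assumption. It yields a 32-character string, which fits within the 8 to 63 character limit for a WPA2 passphrase.

```python
import base64
import secrets


def generate_passphrase(num_bytes=24):
    """Return num_bytes of cryptographically random data, base64-encoded.

    The default of 24 bytes is an assumption; it encodes to 32 characters.
    """
    return base64.b64encode(secrets.token_bytes(num_bytes)).decode("ascii")


passphrase = generate_passphrase()
print(len(passphrase))
```

The `secrets` module draws from the operating system's secure random source, which makes it a reasonable stand-in for `openssl rand` in this context.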

The existing configuration used a single configuration template for all schools and the SSID was bound to different VLANs using access point tags. Due to the use of different VLAN schemes at Elementary, Middle, and High schools and IPSK configuration constraints within the Meraki dashboard, the original configuration template was cloned, the VLAN tags updated, and the appropriate IPSK-Passphrase/Group-Policy mappings were made to support each level site. The result was a separate template for each level of education. Sites were then rebound to the appropriate template to support the proper VLAN/Group-Policy bindings. See Template Designs in Figures 1 and 2.

Figure 1 – Old Template Design

Figure 2 – New Template Design

While testing the cloned templates, I discovered a dashboard bug that prevented the VLAN/Group-Policy mappings from working properly. After the VLAN tag within the Group-Policy was edited, clients were still being bound to the VLAN in the Group-Policy from the original template, but could get connectivity via the VLAN mapped to the base Group-Policy/VLAN binding. The policy applied would show as “unknown policy”. See Figures 3 and 4.

Figure 3 – Correct Policy

Figure 4 – Incorrect/Original Policy

A ticket was opened with Meraki support and escalated to their backend engineering team, where it is currently being investigated. While reproducing the issue to provide details to Meraki support, I established a work-around: the SSID-Passphrase to Group-Policy mappings needed to be deleted and re-created. The work-around had to be performed after the networks were re-bound to the new templates from the original. Any network bound from the original template to the new template would be in a failed state until the work-around was performed again. Networks bound previously were not impacted.

After testing, the new VLAN, Gateway and DHCP configuration elements were added to the necessary switches throughout the organization.

Following implementation, the small NSS team was able to reconfigure the wireless profiles on their devices to use their dedicated passphrase, so they could get user device connectivity even during periods when the centralized DHCP servers were unreachable.

Given the limited number of NSS members, the passphrase could be changed on a recurring basis, or in the event of staff turnover or a lost device, without imposing significant complexity. An additional option that was considered but ultimately not pursued was using a different passphrase for each team member.

Conclusion

A seemingly small alteration in the way DHCP services were implemented meant that NSS would not be able to get DHCP addresses on the wired or wireless networks when going onsite to perform troubleshooting. A solution was designed to provide DHCP addresses via the wireless network. The existing WPA2-PSK SSID was converted to a WPA2-IPSK SSID, adding the functionality necessary to provide local DHCP services to a small subset of individuals. Making a minor change to the SSID configuration allowed it to function as it had previously while maintaining the current level of wireless security and adding local DHCP services. During testing and implementation an unexpected bug was discovered that required a work-around while it was investigated by Meraki backend engineering, further reinforcing the need to always verify results.

Essay #3 – Rogue Detection

Background

A large K-12 public education institution consisting of more than 12,000 Cisco Meraki wireless access points, distributed across more than 180 sites, needed a way to identify rogue wireless access points for the purpose of mitigation. Given the size of the organization, recurring dashboard monitoring of the networks proved too time consuming and resource straining. A solution that was at least partially automated was needed to assist in detecting, classifying and mitigating threatening rogues connected to the wired network, while not over-burdening limited technical resources.

The use of dashboard-based configuration templates made enabling email alerts overwhelming, and, from experience, such alerts are regarded as spam if they arrive too frequently. All attempts to tune the WIDS proved futile, as the number of emails generated by detections of non-threatening rogues was overwhelming.

Body

The wireless access points deployed consisted of various indoor and outdoor Meraki cloud-managed models, all of which have a dedicated radio to provide integrated 24x7x365 WIDS/WIPS.

To assist in the automation process, I leveraged the Cisco Meraki API. I developed a Python script to iterate over rogue access points reported in the Meraki dashboard via Air Marshal from all networks in the organization. This script produced a spreadsheet of interesting rogue devices while ignoring any known, non-threatening rogues. The script was run manually on a recurring basis to establish baselines, and investigation and mitigation were performed when necessary. The script took less than 15 minutes to execute, and further analysis typically took no more than 15-30 minutes. Even when run on a weekly basis, the time investment was minimal. The spreadsheet listed the network name (obscured for security), the channel, and the serial number (listed in the ‘Seen By’ column and also obscured for security) of the access point reporting the strongest signal, along with that signal in dB. See Figure 1.
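The original script is not reproduced here. The sketch below shows only the filtering and spreadsheet step, working on records that have already been retrieved. The record layout (`network`, `ssid`, `channel`, `detected_by`, `serial`, `signal_db`) is a hypothetical simplification, not the actual shape of the Meraki API response, and the sample data is invented. A higher `signal_db` is assumed to mean a stronger signal.

```python
import csv
import io

FIELDS = ["network", "ssid", "channel", "seen_by", "signal_db"]


def interesting_rogues(rogues, known_ssids):
    """Skip known SSIDs; keep the strongest detection of each remaining rogue."""
    rows = []
    for rogue in rogues:
        if rogue["ssid"] in known_ssids:
            continue  # known, non-threatening rogue
        detections = rogue["detected_by"]
        if not detections:
            continue  # nothing to report without a detecting access point
        strongest = max(detections, key=lambda d: d["signal_db"])
        rows.append({
            "network": rogue["network"],
            "ssid": rogue["ssid"],
            "channel": rogue["channel"],
            "seen_by": strongest["serial"],
            "signal_db": strongest["signal_db"],
        })
    return rows


def to_csv(rows):
    """Render the rows as CSV text that a spreadsheet application can open."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical sample records.
sample = [
    {"network": "site-a", "ssid": "Guest-Printer", "channel": 6,
     "detected_by": [{"serial": "AP-0001", "signal_db": 12},
                     {"serial": "AP-0002", "signal_db": 41}]},
    {"network": "site-a", "ssid": "NeighborCafe", "channel": 11,
     "detected_by": [{"serial": "AP-0003", "signal_db": 5}]},
]
known = {"NeighborCafe"}

rows = interesting_rogues(sample, known)
report = to_csv(rows)
print(report)
```

Keeping the allow-list of known SSIDs separate from the filtering logic means the baseline can grow over successive runs without touching the code.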

Figure 1 – Rogue access point

Rogue devices reported on the spreadsheet were further analyzed to determine signal strengths from detecting access points. Rogues with low signal could be disregarded as coming from external neighboring networks. On devices with strong signal, manual investigation was performed to determine the threat of the detected rogue. The device referenced in Figure 1 and Figure 2 had an open SSID enabled.
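The signal-based triage can be expressed as a small helper. This is a sketch under stated assumptions: the essay does not give a numeric cutoff, so the 30 dB threshold is hypothetical, as are the `ssid` and `signal_db` field names and the sample rows.

```python
def triage(rows, strong_signal_db=30):
    """Split rows into (investigate, disregard) by strongest reported signal.

    The default threshold is a hypothetical value, not one from the essay.
    """
    investigate = [r for r in rows if r["signal_db"] >= strong_signal_db]
    disregard = [r for r in rows if r["signal_db"] < strong_signal_db]
    # Look at the strongest signals first; they are most likely to be on site.
    investigate.sort(key=lambda r: r["signal_db"], reverse=True)
    return investigate, disregard


# Hypothetical sample rows.
sample_rows = [
    {"ssid": "PocketRouter", "signal_db": 35},
    {"ssid": "NeighborHome", "signal_db": 8},
    {"ssid": "SmartTV", "signal_db": 52},
]

investigate, disregard = triage(sample_rows)
print([r["ssid"] for r in investigate])
print([r["ssid"] for r in disregard])
```

A fixed cutoff is a blunt instrument, which is why the strong-signal group still goes to manual investigation rather than automated mitigation.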

Given the large footprint of the organization and the proximity to neighboring residences and businesses, mitigation techniques performed via WIPS, particularly those automated in nature, were deemed too risky due to legal ramifications.

The device in Figure 1 was detected by a Cisco Meraki MR53 wireless access point installed inside one of the schools.

Further manual investigation of the rogue in Figure 1 via Air Marshal revealed that the rogue device was operating in the 2.4 GHz range, was connected to the wired LAN, and that its SSID did not have any security configured. See Figure 2. Note that the detected signal may vary depending on when and how it was reported. Also note that the device had first been seen a year earlier, which is far too long for a rogue device to be on any network.

Figure 2 – Unsecured SSID connected to the wired LAN

As a contractor, I lack the organizational authority to enforce organizational security policies. Therefore, an employee of the organization was notified to perform “last mile” mitigation.

Devices discovered via additional iterations of the script were misconfigured smart TVs, misconfigured wireless printers, and the occasional pocket router or consumer-grade wireless access point/router. To date, the rogues have not been malicious in intent; they were brought in by end users attempting to fulfill a need on their own rather than engaging IT for proper support.

An unintended benefit of this project was the discovery of an improperly deployed access point during a recent network upgrade. The install team deployed the access point but removed its assignment from the network in the Meraki dashboard. Since the access point had no configuration, it was broadcasting the ‘Meraki Setup’ SSID and was detected as a rogue. See Figure 3.

Figure 3 – Misconfigured access point

I was able to correct the configuration of this access point by adding the serial number back to its intended network and verifying its configuration.

Conclusion

Developing a rogue wireless device identification and mitigation plan proved to be successful in helping the organization identify and remove rogue devices. The plan as implemented accomplished the goals of making the organization more secure, while minimizing the amount of time utilized. During the initial iteration of this project, three rogue wireless devices were identified at three different sites. The appropriate school system staff were notified to visit the sites to have the devices removed from the network and to educate the users as to why they were not permitted. Additionally, a misconfigured access point was discovered, and the configuration was able to be corrected.

References

https://meraki.cisco.com/lib/pdf/meraki_datasheet_MR53.pdf