no subject

I once wrote a program, at a manager's insistence, that was supposed to analyse a list of tens of thousands of internal corporate IP addresses and DNS names, check that the addresses matched the naming conventions, and output any mismatches for the attention of the networking team.

Of course they didn't give me the official naming conventions.

I had the brainwave to read in the list of IP/name pairs and run a "closest match" on them. This would let me automatically build sets of matching data. I could then use the derived rules to project what the results should be, check those projections against the actual data, and highlight any discrepancies.

It had the advantage that I didn't need to hardcode any actual naming convention rules - the program should automatically derive them from the correct 99% of the data.

Well, it sort of worked.

The data went in. The program ground away to itself, testing and trying combinations of match rules and making notes of which rules had the most 'hits' in the raw data. Then it went through again and spat out the data which didn't match those rules.

Except that the data it spat out was for IP/name combos which were actually correct.

No, I hadn't coded the tests backwards. It's just that a closest-match process only works if more than 50% of the raw data fed into it follows the desired rules. It doesn't work when 75% of the company's network hosts are incorrectly named in the first place.
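For anyone who wants to see the failure in miniature, here is a rough sketch of the idea, not the original program. It assumes a made-up convention in which the hostname prefix (the text before the first hyphen) is supposed to follow the first three octets of the address. The function names and the sample hosts are all invented for illustration, and the "wrong" names here happen to share a pattern, which is what lets them outvote the correct one.

```python
from collections import Counter, defaultdict


def subnet_of(ip):
    # Assumption: the convention is keyed on the first three octets.
    return ip.rsplit(".", 1)[0]


def prefix_of(name):
    # Assumption: the convention lives in the text before the first hyphen.
    return name.split("-", 1)[0]


def derive_rules(pairs):
    # For each subnet, the most common prefix is taken to be "the rule".
    votes = defaultdict(Counter)
    for ip, name in pairs:
        votes[subnet_of(ip)][prefix_of(name)] += 1
    return {subnet: counts.most_common(1)[0][0] for subnet, counts in votes.items()}


def find_mismatches(pairs):
    # Report every host whose prefix disagrees with the derived rule.
    rules = derive_rules(pairs)
    return [(ip, name) for ip, name in pairs if prefix_of(name) != rules[subnet_of(ip)]]


# Hypothetical data: three of four hosts follow the intended "lon" convention.
mostly_right = [
    ("10.1.1.1", "lon-web01"),
    ("10.1.1.2", "lon-web02"),
    ("10.1.1.3", "lon-db01"),
    ("10.1.1.4", "bob-pc"),
]

# Hypothetical data: three of four hosts are misnamed, all in the same way.
mostly_wrong = [
    ("10.1.1.1", "lon-web01"),
    ("10.1.1.2", "test-a"),
    ("10.1.1.3", "test-b"),
    ("10.1.1.4", "test-c"),
]

print(find_mismatches(mostly_right))  # [('10.1.1.4', 'bob-pc')]
print(find_mismatches(mostly_wrong))  # [('10.1.1.1', 'lon-web01')]
```

With the first list the odd one out is the misnamed host. With the second, the majority vote crowns the wrong pattern as the rule, so the only correctly named host is the one that gets reported.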

