Assignment 4: Perl

Here is the updated top3.tar.gz with files from Tuesday's meeting and the problems below. Untar with gunzip -c top3.tar.gz | tar xf - and remove with /bin/rm -rf top3.
  1. Modify the word count example Perl script so that it counts only HTML tags. A tag starts with the less-than sign < and ends with the greater-than sign >. Test your program on a couple of web pages.
  2. Some HTML tags like <ol> need a matching end tag </ol>, while other tags like <li> do not require a matching end tag. Modify the example in 1 to print out each tag and its matching end tag (if any) next to each other.
  3. XML has stricter rules than HTML. For example, a tag without an end tag must have a slash (/) at the end. So a <li> tag would need either a matching end tag </li> or would have to be written as <li/>. Write a translator that finds all tags like <li> that have no matching end tag and rewrites the file with trailing-slash tags in their place.
  4. XML also has a stricter nesting rule. It does not allow overlapping tags like <b><i>TEXT</b></i> but requires them to be nested like <b><i>TEXT</i></b>. Write an error checker that catches the first such error.
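As a possible starting point for problems 1 to 3, the sketch below pulls tags out of a string with a regular expression and counts them by name. It is not the word count example from top3.tar.gz; the subroutine names extract_tags, tag_name and count_tags are made up for illustration, and the pattern assumes that no > character appears inside an attribute value.

```perl
use strict;
use warnings;

# Hypothetical helper: return every substring that starts with '<'
# and runs to the next '>'.
sub extract_tags {
    my ($text) = @_;
    return $text =~ /<[^>]*>/g;
}

# Hypothetical helper: reduce a tag such as '< /OL >' or '<a href="x">'
# to its lowercase name, keeping a leading slash for end tags.
# Comments and declarations such as '<!-- ... -->' yield ''.
sub tag_name {
    my ($tag) = @_;
    return $tag =~ m{^<\s*(/?)\s*([A-Za-z][A-Za-z0-9]*)} ? lc("$1$2") : '';
}

# Hypothetical helper: count how often each tag name occurs in the text.
sub count_tags {
    my ($text) = @_;
    my %count;
    for my $tag (extract_tags($text)) {
        my $name = tag_name($tag);
        $count{$name}++ if $name ne '';
    }
    return %count;
}

# Small demonstration on an inline sample rather than a real web page.
my %count = count_tags('<ol><li>one<li>two</ol>');
print "$_ $count{$_}\n" for sort keys %count;
```

To run this on a saved web page instead, the sample string could be replaced by the contents of a file read into a single scalar. Because start tags and end tags are kept under separate names (li versus /li), comparing the two counts is one way to see which tags lack end tags.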
Sample XML file from maple tmp.xml
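
For problem 4, one common way to detect overlapping tags is to keep a stack of open tag names and compare each end tag with the top of the stack. The sketch below is only an illustration under that assumption; first_nesting_error is a hypothetical name, tag names are compared case-sensitively as XML requires, and tags that happen to appear inside comments are not treated specially.

```perl
use strict;
use warnings;

# Hypothetical helper: return a message describing the first nesting
# error in $text, or undef if no error is found.
sub first_nesting_error {
    my ($text) = @_;
    my @open;    # stack of tag names that are currently open
    while ($text =~ m{<\s*(/?)\s*([A-Za-z_][\w:.-]*)[^>]*?(/?)\s*>}g) {
        my ($closing, $name, $selfclosed) = ($1, $2, $3);
        next if $selfclosed;    # <li/> opens and closes in one tag
        if (!$closing) {
            push @open, $name;                       # start tag
        } elsif (!@open) {
            return "</$name> has no matching start tag";
        } elsif ($open[-1] ne $name) {
            return "</$name> found while <$open[-1]> is still open";
        } else {
            pop @open;                               # properly nested end tag
        }
    }
    return @open ? "<$open[-1]> is never closed" : undef;
}

# Try the two examples from problem 4.
for my $doc ('<b><i>TEXT</i></b>', '<b><i>TEXT</b></i>') {
    my $err = first_nesting_error($doc);
    print "${doc}: ", (defined $err ? $err : 'ok'), "\n";
}
```

The stack works because the most recently opened tag must always be the first one closed; any end tag that names something other than the top of the stack marks an overlap.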