Assignment 4: Perl
Here is an updated top3.tar.gz with files from Tuesday's meeting and the
problems below. Untar it with gunzip -c top3.tar.gz | tar xf -
and remove it with /bin/rm -rf top3.
- Modify the word-count example Perl script so that it counts only HTML
tags. A tag starts with the less-than sign < and ends with the
greater-than sign >. Test your program on a couple of web pages.
- Some HTML tags, such as <ol>, need a matching </ol> end tag,
while other tags, such as <li>, need not have one. Modify your
program from the first problem to print each tag next to its
matching end tag (if any).
- XML has stricter rules than HTML. For example, a tag without an
end tag must end with a slash (/). So for <li> one needs either an
end tag </li> or the self-closing form <li/>. Write a translator that
finds all tags like <li> that have no end tag and rewrites the file
using the trailing-slash form for them.
- XML also has a stricter nesting rule: it does not allow overlapping
tags such as <b><i>TEXT</b></i>, but requires them to be properly
nested, as in <b><i>TEXT</i></b>.
Write an error checker that reports the first such error.
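A minimal Perl sketch for the first two problems follows. It is not the course's word-count script (which is not reproduced here); the subroutine names extract_tags, tag_name, count_tags and pair_tags are hypothetical. It assumes no '>' ever appears inside an attribute value, and it takes "matching end tag" to mean that an end tag with the same name appears somewhere in the input.

```perl
#!/usr/bin/perl
# Sketch for problems 1 and 2 (assumes no '>' inside attribute values).
use strict;
use warnings;

# Return every tag (text from '<' to the next '>') found in $text.
sub extract_tags {
    my ($text) = @_;
    return $text =~ /(<[^>]*>)/g;
}

# Reduce a tag such as "<A HREF='x'>" or "</a >" to its lowercase name "a".
# Comments and declarations like "<!DOCTYPE html>" give undef.
sub tag_name {
    my ($tag) = @_;
    return $tag =~ m{^<\s*/?\s*([A-Za-z][\w:.-]*)} ? lc $1 : undef;
}

# True if the tag is an end tag such as "</ol>".
sub is_end_tag { return $_[0] =~ m{^<\s*/} ? 1 : 0 }

# Problem 1: count each tag name; end tags are counted under "/name".
sub count_tags {
    my ($text) = @_;
    my %count;
    for my $tag (extract_tags($text)) {
        my $name = tag_name($tag);
        next unless defined $name;
        $count{ (is_end_tag($tag) ? '/' : '') . $name }++;
    }
    return %count;
}

# Problem 2: list each start-tag name once, in order of first appearance,
# followed by its end tag if one appears anywhere in the input.
sub pair_tags {
    my ($text) = @_;
    my (%seen_start, %seen_end, @order);
    for my $tag (extract_tags($text)) {
        my $name = tag_name($tag);
        next unless defined $name;
        if (is_end_tag($tag))         { $seen_end{$name} = 1 }
        elsif (!$seen_start{$name}++) { push @order, $name }
    }
    return map { $seen_end{$_} ? "<$_> </$_>" : "<$_>" } @order;
}

# Only run on files named on the command line.
if (@ARGV) {
    local $/;                         # read each file whole, so tags split across lines are found
    my $text = join '', <>;
    my %count = count_tags($text);
    printf "%6d %s\n", $count{$_}, $_ for sort keys %count;
    print "$_\n" for pair_tags($text);
}
```

Usage (the script name is hypothetical): perl tags.pl page.html. Reading whole files rather than line by line is a deliberate choice, since a single tag may be split across lines in real web pages.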
Sample XML file from Maple: tmp.xml
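For the two XML problems, here is a Perl sketch under stated assumptions: the subroutine names add_trailing_slashes and first_nesting_error and the -c flag are hypothetical, no '>' appears inside attribute values, and comments contain no tags. Treating every tag name that is never closed anywhere in the file as needing the trailing slash is a heuristic; it reproduces the <li> to <li/> rewrite described above but is not a full HTML parser. The nesting check uses a stack of open tags.

```perl
#!/usr/bin/perl
# Sketch for problems 3 and 4 (assumes no '>' inside attribute values
# and no tags inside comments).
use strict;
use warnings;

# Problem 3: rewrite start tags whose name never appears as an end tag
# in the trailing-slash form, e.g. <li> becomes <li/>.
sub add_trailing_slashes {
    my ($text) = @_;
    my %has_end;
    $has_end{ lc $1 } = 1 while $text =~ m{<\s*/\s*([A-Za-z][\w:.-]*)}g;
    # $1 = whole tag, $2 = name, $3 = attributes, $4 = existing '/' if any.
    # End tags, comments and <?xml ...?> do not match (no letter after '<').
    $text =~ s{(<([A-Za-z][\w:.-]*)([^>]*?)(/?)>)}{
        ($4 eq '/' || $has_end{ lc $2 }) ? $1 : "<$2$3/>"
    }ge;
    return $text;
}

# Problem 4: return a message describing the first overlapping or
# unmatched tag, or undef if all tags nest properly.
sub first_nesting_error {
    my ($text) = @_;
    my @open;    # stack of [name, line] for start tags not yet closed
    while ($text =~ m{<(/?)\s*([A-Za-z][\w:.-]*)[^>]*?(/?)>}g) {
        my ($is_end, $name, $self_closing) = ($1, $2, $3);
        my $before = substr $text, 0, $-[0];
        my $line = 1 + ($before =~ tr/\n//);    # count newlines before this tag
        next if $self_closing;                  # <li/> needs no end tag
        if (!$is_end) {
            push @open, [$name, $line];
        } elsif (!@open) {
            return "line $line: </$name> has no matching start tag";
        } elsif ($open[-1][0] ne $name) {
            return "line $line: </$name> closes <$open[-1][0]> opened on line $open[-1][1]";
        } else {
            pop @open;
        }
    }
    return @open ? "<$open[-1][0]> opened on line $open[-1][1] is never closed" : undef;
}

# perl xml.pl page.html  -> rewritten text on stdout
# perl xml.pl -c tmp.xml -> first nesting error, if any
if (@ARGV) {
    my $check = $ARGV[0] eq '-c' ? shift @ARGV : 0;
    local $/;                                   # read each file whole
    my $text = join '', <>;
    if ($check) {
        my $err = first_nesting_error($text);
        print(defined $err ? "first nesting error: $err\n" : "tags nest properly\n");
    } else {
        print add_trailing_slashes($text);
    }
}
```

The stack is what detects overlap: an end tag must close the most recently opened tag, so in <b><i>TEXT</b></i> the </b> arrives while <i> is still on top, and that is reported as the first error.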