Tuesday, September 18, 2007

I need help

Getting an academic degree in a discipline requires students to learn, if not everything, then at least an overview of everything in that field. That is why I opted to get a graduate degree instead of taking certifications first. I am still new to the IT industry, and I told myself: why not be a jack of all trades first, and once I know my strengths and interests, then I can be a master of one!

So, the misery begins. I used to work in networking. Taking a class in software development is basically harakiri. Taking a specialized course in client-server computing is suicide! Well, I’ve had programming experience from my undergrad studies, but graduate work requires more skill. I had to relearn old-school programming languages and distance myself from today’s in-demand languages. Yeah, yeah, there are a lot of tutorials online, but still!!!

Last night, I got to chat with some of my classmates. Good thing, I am not alone. They are having difficulties as well. Even some professional programmers are struggling. I need help. They need help. We all need help!

Help! Anyone able to strip HTML tags and other entities (scripts, CSS, etc.), please don’t be shy to lend a hand. This is what I have come up with so far in Java (it just strips the HTML tags, but not scripts and CSS):

string = string.replaceAll("\\<.*?\\>", "");

It basically just ignores anything inside <>. It is not a problem if you prefer to write a method/function, as long as I can display the readable text of an HTML page. I am implementing this in Java. I still can’t figure out how to do it in C, but anyhow, I’ll stick with Java as long as it gets done.

The hard part of making a pseudo-proxy server has been dealt with.

- - -

Update:
So far I have managed to ignore HTML tags with:

string = string.replaceAll("\\<.*?\\>", "");
This basically replaces each <> tag, including the text inside it, with nothing. What I do is read the HTML line by line (using a loop) and print each line after applying the code above.
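
One catch with going line by line: a tag that opens on one line and closes on the next never matches, because '.' stops at line breaks and each line is checked on its own. A minimal sketch of a workaround, assuming the whole page fits in memory: collect the lines into one string first, then run the same expression with the (?s) flag so '.' also matches line breaks. The class and method names (WholePageStripper, readAll, stripTags) are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class WholePageStripper {
    // Reads every line from the reader into one string, keeping the line breaks.
    static String readAll(Reader in) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        StringBuilder page = new StringBuilder();
        String line;
        while ((line = reader.readLine()) != null) {
            page.append(line).append('\n');
        }
        return page.toString();
    }

    // (?s) lets '.' match line breaks, so a tag split over two lines is still removed.
    static String stripTags(String html) {
        return html.replaceAll("(?s)<.*?>", "");
    }

    public static void main(String[] args) throws IOException {
        // The <a ...> tag below is deliberately split across two lines.
        String html = "<p>Hello <a\n href=\"x.html\">world</a></p>\n";
        System.out.print(stripTags(readAll(new StringReader(html)))); // prints "Hello world"
    }
}
```

In the proxy, the StringReader would be swapped for whatever reader wraps the response stream.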

I just taught myself Java and have limited knowledge of the "built-in" classes. I would appreciate it if you could help me find a way to search for text between <!-- and -->, which is basically where script text sits. This text may span multiple lines, so I think I need a loop that keeps searching until the closing --> is found.

I have read about the Pattern class, but I basically don't know how to use it. I think pattern matching would be the best thing to do here.
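
For reference, a minimal sketch of how java.util.regex.Pattern could handle the multi-line case, assuming the page is already in a single string. Pattern.DOTALL makes '.' match line breaks, so no explicit search loop is needed. CommentStripper and stripComments are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CommentStripper {
    // Compile once and reuse. DOTALL lets '.' cross line breaks, and the
    // reluctant ".*?" stops at the first "-->" instead of the last one.
    private static final Pattern COMMENT =
            Pattern.compile("<!--.*?-->", Pattern.DOTALL);

    static String stripComments(String html) {
        Matcher matcher = COMMENT.matcher(html);
        return matcher.replaceAll("");
    }

    public static void main(String[] args) {
        String html = "before<!--\nalert('hi');\n-->after";
        System.out.println(stripComments(html)); // prints "beforeafter"
    }
}
```

String.replaceAll compiles its pattern on every call, so a precompiled Pattern is the better fit when the same expression runs on every page the proxy fetches.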

I think I can deal with the CSS thingies once I'm able to solve the problem with scripts. I'll just change the expression.
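
A rough sketch of what "changing the expression" might look like, assuming scripts and CSS sit inside <script>...</script> and <style>...</style> elements. Those blocks have to go, contents and all, before the tags are stripped, or their text is left behind. This is regex-based guesswork rather than a real HTML parser, and PageText and visibleText are made-up names.

```java
import java.util.regex.Pattern;

class PageText {
    // (?i) ignores case, (?s) lets '.' match line breaks.
    // \1 refers back to whichever name (script or style) opened the block.
    private static final Pattern BLOCKS =
            Pattern.compile("(?is)<(script|style)\\b[^>]*>.*?</\\1\\s*>");

    // Any remaining tag, even one that spans several lines.
    private static final Pattern TAGS = Pattern.compile("(?s)<.*?>");

    static String visibleText(String html) {
        String withoutBlocks = BLOCKS.matcher(html).replaceAll("");
        return TAGS.matcher(withoutBlocks).replaceAll("");
    }

    public static void main(String[] args) {
        String html = "<style>p { color: red; }</style><p>Hi</p>"
                + "<SCRIPT type=\"text/javascript\">\nvar a = 1 < 2;\n</SCRIPT>";
        System.out.println(visibleText(html)); // prints "Hi"
    }
}
```

The order matters: stripping tags first would leave "p { color: red; }" and the script body in the output.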

Appreciate any help on this.
